GraphDB Benchmark Results

 

GraphDB™ 6.0 Performance Benchmark Results

Adequate benchmarking of semantic repositories is a complex exercise involving many factors. Ontotext participates in the LDBC project, an initiative that aims to establish industry cooperation between vendors of RDF and graph database technologies in developing, endorsing, and publishing reliable and insightful benchmark results.

The benchmark results presented here aim to show how GraphDB™ performs important tasks (such as loading, inference and querying) under variations in the size and nature of the data, inference, query types and other relevant factors. They also show the speed improvement of GraphDB™ over OWLIM 5.4.

| Task | Hardware (1) | Data size (2) (explicit triples) | Load time (sec.) | Loading speed (st./sec.) | Query performance | Query perf. measure | Load time speed up (3) | Query time speed up (3) | Comment |
|---|---|---|---|---|---|---|---|---|---|
| UNIPROT Aug'14 load | Rolle | 12 896 017 962 | 57 240 | 225 297 | | | 353% | | Loaded in a bit less than 16h. If data size is judged by the amount of triples in the input files (which is 17 billion), the loading speed is 295 000 st./sec. |
| BSBM 100M Explore | Leibniz | 99 892 000 | 536 | 186 366 | 7 349 | QMPH | 241% | 22% | Query performance measured with 16 clients. Results in Query Mixes Per Hour |
| BSBM 100M Explore & Update | Leibniz | | | | 9 154 | QMPH | | 7% | |
| BSBM 1B Explore | Leibniz | 998 782 000 | 5 581 | 178 961 | 1 083 | QMPH | 239% | 13% | |
| BSBM 1B Explore & Update | Leibniz | | | | 1 278 | QMPH | | 10% | |
| LDBC SPB 50M | Newton | 50 124 572 | 2 045 | 24 511 | 32 | read queries per second | 10% | 10% | Load time includes forward-chaining and materialization. 14 clients perform read queries, while in parallel 2 clients perform updates |
| | | | | | 12 | updates per second | | 33% | |
| LDBC SPB 50M | AWS c3.4xlarge | 50 124 572 | | | 27 | read queries per second | | 4% | Load time includes forward-chaining and materialization. 14 clients perform read queries, while in parallel 2 clients perform updates |
| | | | | | 11 | updates per second | | 38% | |
| LDBC SPB 1B | Newton server | 1 002 491 440 | 41 400 | 24 215 | 10 | read queries per second | 526% | -7% | |
| | | | | | 2 | updates per second | | 152% | |
| Wordnet load | Leibniz server | 2 724 000 | 576 | 4 729 | | | | -100% | Quite expressive reasoning is performed through forward-chaining |
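The loading speed figures appear to be the data size divided by the load time. A minimal Python sketch that recomputes them from the reported numbers is shown here; the dictionary layout and the `loading_speed` helper are illustrative names, not part of any GraphDB™ tooling.

```python
# Reported figures copied from the results above:
# (explicit statements, load time in seconds, reported loading speed in st./sec.)
REPORTED = {
    "UNIPROT Aug'14 load": (12_896_017_962, 57_240, 225_297),
    "BSBM 100M Explore": (99_892_000, 536, 186_366),
    "BSBM 1B Explore": (998_782_000, 5_581, 178_961),
    "LDBC SPB 50M": (50_124_572, 2_045, 24_511),
    "LDBC SPB 1B": (1_002_491_440, 41_400, 24_215),
    "Wordnet load": (2_724_000, 576, 4_729),
}


def loading_speed(statements: int, seconds: int) -> float:
    """Average loading speed in explicit statements per second."""
    return statements / seconds


for task, (statements, seconds, reported) in REPORTED.items():
    computed = loading_speed(statements, seconds)
    print(f"{task:22s} {computed:12,.0f} st./sec. (reported {reported:,})")
```

Under this assumption, each recomputed value rounds to the reported loading speed.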

 

Notes:

(1) The hardware configurations are as follows. Leibniz is a dual-CPU server with Xeon E5-2690 CPUs, 256 GB of RAM and an SSD storage array; its overall assembly cost was below $10,000. Rolle is the same as Leibniz, but with 512 GB of RAM. Newton is very similar to Leibniz. AWS c3.4xlarge is an Amazon cloud instance type with 16 vCPUs, 55 ECU, 30 GB of RAM and SSD storage.

(2) The Data size column gives the number of explicit statements in the repository after the initial data load. Inferred statements are excluded, because they are only relevant for forward-chaining engines. Some tests insert additional statements when update queries are part of the query mixes; such additional statements are not counted above. Some datasets include a substantial number of duplicate statements in the data dumps: for instance, the raw files of UNIPROT contain 17B statements, but only about 12.9B of those are unique.
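To illustrate how the choice between raw and unique statements changes the headline number, here is a small Python sketch using the UNIPROT figures. The raw count is the rounded "17B" from the note, so the results are approximate; the variable names are illustrative.

```python
LOAD_TIME_SEC = 57_240              # reported UNIPROT load time
UNIQUE_STATEMENTS = 12_896_017_962  # explicit statements after loading
RAW_STATEMENTS = 17_000_000_000     # rounded figure for the raw input files (assumption: exactly 17B)

load_time_hours = LOAD_TIME_SEC / 3600
unique_speed = UNIQUE_STATEMENTS / LOAD_TIME_SEC
raw_speed = RAW_STATEMENTS / LOAD_TIME_SEC
duplicate_share = 1 - UNIQUE_STATEMENTS / RAW_STATEMENTS

print(f"Load time: {load_time_hours:.1f} h")
print(f"Speed by unique statements: {unique_speed:,.0f} st./sec.")
print(f"Speed by raw statements:    {raw_speed:,.0f} st./sec.")
print(f"Share of duplicates in raw files: {duplicate_share:.0%}")
```

With these inputs, roughly a quarter of the raw statements are duplicates, which is why the speed judged by raw input is noticeably higher than the speed judged by unique statements.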

(3) Load and query performance of GraphDB™ is compared to OWLIM SE 5.4, running in the same environment. Loading in GraphDB™ is performed using the new Load Tool in 6.0.

Results Analysis:

  • GraphDB™ can load datasets of more than 10 billion statements on a single commodity database server at speeds exceeding 200 000 statements per second. In specific loading scenarios GraphDB™ loaded data sets on the scale of 1 billion triples at speeds of around 500 000 statements per second.
  • The loading speed of GraphDB™ does not degrade noticeably as the volume of data grows: for both BSBM and LDBC, the loading speed for the 50 to 100 million statement data sets was nearly the same as for the 1 billion statement data sets (186 366 vs. 178 961 st./sec. for BSBM, 24 511 vs. 24 215 st./sec. for LDBC SPB).
  • Under the LDBC Semantic Publishing Benchmark (SPB) with the 50-million statement dataset, GraphDB™ Standard can execute about 30 read queries per second while handling about 10 updates per second in a transactionally consistent and safe manner. This is also the case on the Amazon AWS instance with 30 GB of RAM. LDBC SPB is a benchmark derived from the BBC's Dynamic Semantic Publishing projects; it simulates a load similar to the one GraphDB™ experiences when serving web page generation for the BBC Sport website. Read query performance can be scaled up linearly through the cluster architecture of GraphDB™ Enterprise.
  • GraphDB's Loading Tool is much faster than any loading mechanism in OWLIM 5.4. For big data sets the speed-up can be more than 5 times.
  • GraphDB™ is faster on update queries: the speed-up varies between roughly 30% and 150%.
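The claim that loading speed holds up as data volume grows can be checked against the reported loading speeds. The following Python sketch computes the relative slowdown between the smaller and the 1 billion statement data sets; `relative_drop` is an illustrative helper, not a metric defined by the benchmarks.

```python
def relative_drop(small_speed: float, large_speed: float) -> float:
    """Fractional slowdown when moving from the smaller to the larger data set."""
    return (small_speed - large_speed) / small_speed


# Reported loading speeds in statements per second: (smaller data set, 1B data set).
SPEEDS = {
    "BSBM (100M vs. 1B)": (186_366, 178_961),
    "LDBC SPB (50M vs. 1B)": (24_511, 24_215),
}

for benchmark, (small, large) in SPEEDS.items():
    print(f"{benchmark}: {relative_drop(small, large):.1%} slower at 1B")
```

On the reported figures the slowdown is about 4% for BSBM and about 1% for LDBC SPB, while the data volume grows by a factor of 10 to 20.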
