GraphDB Benchmark Results

GraphDB™ 6.1 Performance Benchmark Results

Adequate benchmarking of semantic repositories is a complex exercise involving many factors. Ontotext participates in the LDBC project, an initiative that aims to establish industry cooperation between vendors of RDF and graph database technologies in developing, endorsing, and publishing reliable and insightful benchmark results.

The benchmark results presented here aim to show how GraphDB™ performs important tasks (such as loading, inference and querying) as the size and nature of the data, the inference, the query types and other relevant factors vary. They also show the speed improvement of GraphDB™ 6.1 over OWLIM 5.4.

| Task | Hardware (1) | Data size (2) (explicit triples) | Load time (sec.) | Loading speed (st./sec.) | Query performance | Query perf. measure | Load time speed-up (3) | Query time speed-up (3) | Comment |
|---|---|---|---|---|---|---|---|---|---|
| UNIPROT Aug’14 load | Rolle | 12,896,017,962 | 57,240 | 225,297 | | | 353% | | Loaded in a bit less than 16h. If data size is judged by the number of triples in the input files (which is 17 billion), the loading speed is 295,000 st./sec. |
| DBPedia 2014 load, English version | Leibniz | 566,076,449 | 3,147 | 179,905 | | | | | Loaded in 1 hour and 10 minutes from Turtle files |
| BSBM 100M Explore | Leibniz | 99,892,000 | 536 | 186,366 | 10,041 | QMPH | 241% | 67% | Query performance measured with 16 clients. Results in Query Mixes Per Hour |
| BSBM 100M Explore & Update | Leibniz | | | | 10,086 | QMPH | | 18% | |
| BSBM 1B Explore | Leibniz | 998,782,000 | 5,581 | 178,961 | 1,083 | QMPH | 239% | 13% | |
| BSBM 1B Explore & Update | Leibniz | | | | 1,278 | QMPH | | 10% | |
| LDBC SPB 50M | Newton | 50,124,572 | 2,045 | 24,511 | 40 | read queries per second | 10% | 38% | Load time includes forward-chaining and materialization. 10 clients perform read queries, while in parallel 2 clients perform updates |
| | | | | | 31 | updates per second | | 244% | |
| LDBC SPB 50M | AWS c3.4xlarge | 50,124,572 | | | 31 | read queries per second | | 19% | Load time includes forward-chaining and materialization. 14 clients perform read queries, while in parallel 2 clients perform updates |
| | | | | | 17 | updates per second | | 113% | |
| LDBC SPB 1B | Newton | 1,002,491,440 | 41,400 | 24,215 | 11 | read queries per second | 526% | -2% | |
| | | | | | 10 | updates per second | | 1415% | |
| Wordnet load | Leibniz | 2,724,000 | 576 | 4,729 | | | | | Quite expressive reasoning is performed through forward-chaining |

Notes:

(1) The hardware configurations are as follows. Leibniz is a dual-CPU server with Xeon E5-2690 CPUs, 256 GB of RAM and an SSD storage array; its overall assembly cost is below $10,000. Rolle is the same as Leibniz, but with 512 GB of RAM. Newton is very similar to Leibniz. AWS c3.4xlarge is an Amazon cloud instance type with 16 vCPUs, 55 ECU, 30 GB of RAM and SSD storage.

(2) The Data size column gives the number of explicit statements in the repository after the initial data load. Inferred statements are excluded, because they are only relevant for forward-chaining based engines. Some tests insert additional statements when update queries are part of the query mixes; these additional statements are not counted above. Some datasets include a substantial number of duplicate statements in the data dumps: for instance, the raw files of UNIPROT contain 17B statements, but only about 12.9B of those are unique.

(3) Load and query performance of GraphDB™ is compared to OWLIM SE 5.4, running in the same environment. Loading in GraphDB™ is performed using the new Load Tool.
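
The Loading speed column can be cross-checked against the other columns: for the rows in the table it matches the explicit statement count divided by the load time in seconds. The short Python sketch below redoes that arithmetic for a few rows. The function name `loading_speed` and the selection of rows are illustrative choices made here, not part of any benchmark tooling.

```python
def loading_speed(explicit_statements: int, load_seconds: float) -> float:
    """Average loading speed in statements per second."""
    if load_seconds <= 0:
        raise ValueError("load time must be positive")
    return explicit_statements / load_seconds


# (task, explicit statements, load time in seconds), copied from the table
ROWS = [
    ("UNIPROT Aug'14", 12_896_017_962, 57_240),
    ("BSBM 100M", 99_892_000, 536),
    ("BSBM 1B", 998_782_000, 5_581),
    ("LDBC SPB 1B", 1_002_491_440, 41_400),
]

for task, statements, seconds in ROWS:
    speed = loading_speed(statements, seconds)
    # Also show the load time in hours for easier reading
    print(f"{task}: {speed:,.0f} st./sec. ({seconds / 3600:.1f} h)")
```

For UNIPROT this gives about 225,297 st./sec. over roughly 15.9 hours, in line with the table entry and its "a bit less than 16h" comment.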

Results Analysis:

  • GraphDB™ can load datasets of more than 10 billion statements on a single commodity database server at speeds exceeding 200,000 statements per second. In specific loading scenarios GraphDB™ has loaded billion-triple-scale datasets at speeds of around 500,000 statements per second.
  • The loading speed of GraphDB™ does not degrade noticeably as the volume of the data grows: for both BSBM and LDBC, the loading speeds for the 50-100 million statement datasets were about the same as for the 1 billion statement datasets.
  • On the LDBC Semantic Publishing Benchmark (SPB) 50-million dataset, GraphDB™ Standard Edition executes 40 read queries per second while handling 31 updates per second in a consistent and transactionally safe manner. On the Amazon AWS instance with 30GB of RAM the figures are 31 read queries and 17 updates per second. LDBC SPB is a benchmark derived from BBC’s Dynamic Semantic Publishing projects. It simulates loads similar to the one experienced by GraphDB™ when serving web page generation for the BBC Sport website. Read query performance can be scaled up linearly through the cluster architecture of GraphDB™ Enterprise.
  • GraphDB’s Load Tool is much faster than any loading mechanism in OWLIM 5.4. For big datasets the speed-up can be more than 5 times.
  • GraphDB™ is faster than OWLIM 5.4 on update queries: the speed-up ranges from about 2 times (on SPB 50M) to about 15 times (on SPB 1B).
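
The point about loading speed and data volume can be checked directly from the reported figures. A minimal Python sketch, with the helper name `relative_change` chosen here purely for illustration, compares the loading speed of each smaller dataset with that of its 1 billion statement counterpart:

```python
def relative_change(small_speed: float, large_speed: float) -> float:
    """Change in loading speed from the small to the large dataset, in percent."""
    return (large_speed - small_speed) / small_speed * 100.0


# Reported loading speeds in st./sec., copied from the results table
PAIRS = {
    "BSBM 100M -> 1B": (186_366, 178_961),
    "LDBC SPB 50M -> 1B": (24_511, 24_215),
}

for name, (small, large) in PAIRS.items():
    print(f"{name}: {relative_change(small, large):+.1f}%")
```

Both differences stay within a few percent: about -4.0% for BSBM and about -1.2% for LDBC SPB.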
