GraphDB™ 6.1 Performance Benchmark Results
Adequate benchmarking of semantic repositories is a complex exercise involving many factors. Ontotext is involved in the LDBC project, an initiative that aims to establish industry cooperation between vendors of RDF and graph database technologies in developing, endorsing, and publishing reliable and insightful benchmark results.
The benchmark results presented here aim to show how GraphDB™ performs important tasks (such as loading, inference and querying) as the size and nature of the data, the inference, the query types and other relevant factors vary. They also show the speed improvement of GraphDB™ 6.1 over OWLIM 5.4.
|Task||Hardware (1)||Data size (2) (explicit triples)||Load time (sec.)||Loading speed (statements/sec.)||Query performance||Query perf. measure||Load time speed-up (3)||Query time speed-up (3)||Comment|
|UNIPROT Aug’14 load||Rolle||12,896,017,962||57,240||225,297|| || ||353%|| ||Loaded in a bit less than 16h. If data size is judged by the number of triples in the input files (17 billion), the loading speed is 295,000 statements/sec.|
|DBPedia 2014 load, English version||Leibniz||566,076,449||3,147||179,905|| || || || ||Loaded in 1 hour and 10 minutes from Turtle files|
|BSBM 100M Explore||Leibniz||99,892,000||536||186,366||10,041||QMPH||241%||67%||Query performance measured with 16 clients. Results in Query Mixes Per Hour (QMPH)|
|BSBM 100M Explore & Update||Leibniz|| || || ||10,086||QMPH|| ||18%|| |
|BSBM 1B Explore||Leibniz||998,782,000||5,581||178,961||1,083||QMPH||239%||13%|| |
|BSBM 1B Explore & Update||Leibniz|| || || ||1,278||QMPH|| ||10%|| |
|LDBC SPB 50M||Newton||50,124,572||2,045||24,511||40||read queries per second||10%||38%||Load time includes forward-chaining and materialization. 10 clients perform read queries, while in parallel 2 clients perform updates|
| || || || || ||31||updates per second|| ||244%|| |
|LDBC SPB 50M||AWS c3.4xlarge||50,124,572|| || ||31||read queries per second|| ||19%||Load time includes forward-chaining and materialization. 14 clients perform read queries, while in parallel 2 clients perform updates|
| || || || || ||17||updates per second|| ||113%|| |
|LDBC SPB 1B||Newton||1,002,491,440||41,400||24,215||11||read queries per second||526%||-2%|| |
| || || || || ||10||updates per second|| ||1415%|| |
|Wordnet load||Leibniz||2,724,000||576||4,729|| || || || ||Quite expressive reasoning is performed through forward-chaining|
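The loading speed column appears to be the explicit statement count divided by the load time in seconds. The short Python sketch below recomputes it from figures copied out of the table; the function name `loading_speed` and the `RUNS` dictionary are illustrative only and are not part of any GraphDB™ tooling.

```python
# Figures copied from the table above: (explicit statements, load time in seconds).
RUNS = {
    "UNIPROT Aug'14": (12_896_017_962, 57_240),
    "DBPedia 2014": (566_076_449, 3_147),
    "BSBM 100M": (99_892_000, 536),
    "BSBM 1B": (998_782_000, 5_581),
    "LDBC SPB 50M": (50_124_572, 2_045),
    "LDBC SPB 1B": (1_002_491_440, 41_400),
    "Wordnet": (2_724_000, 576),
}


def loading_speed(statements: int, seconds: int) -> float:
    """Average loading speed in statements per second."""
    return statements / seconds


for name, (statements, seconds) in RUNS.items():
    speed = loading_speed(statements, seconds)
    # Also show the load time in hours for easier reading.
    print(f"{name}: {speed:,.0f} statements/sec. over {seconds / 3600:.1f} h")
```

Under this assumption the recomputed values agree with the published column to within a fraction of a percent.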
(1) The hardware configurations are as follows. Leibniz is a dual-CPU server with Xeon E5-2690 CPUs, 256 GB of RAM and an SSD storage array; overall assembly cost is below $10,000. Rolle is the same as Leibniz, but with 512 GB of RAM. Newton is very similar to Leibniz. AWS c3.4xlarge is an Amazon cloud instance type with 16 vCPUs, 55 ECU, 30 GB of RAM and SSD storage.
(2) The Data size column gives the number of explicit statements in the repository after the initial data load. Inferred statements are excluded, because they are only relevant for engines based on forward-chaining. Some tests insert additional statements when update queries are part of the query mixes; these additional statements are not counted above. Some datasets include a substantial number of duplicate statements in the data dumps. For instance, the raw files of UNIPROT contain 17B statements, but only about 12.9B of those are unique.
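To make the effect of duplicates concrete, here is a small Python sketch based on the UNIPROT figures. The raw count is the rounded "17B" quoted in the text, so the results are approximate; the variable names are illustrative.

```python
RAW_STATEMENTS = 17_000_000_000      # rounded raw count quoted in the text ("17B")
UNIQUE_STATEMENTS = 12_896_017_962   # explicit statements after loading (from the table)
LOAD_SECONDS = 57_240                # UNIPROT load time (from the table)

# Share of the raw input that turned out to be duplicate statements.
duplicate_share = 1 - UNIQUE_STATEMENTS / RAW_STATEMENTS

# Loading speed if the raw input size is used instead of the unique count.
raw_speed = RAW_STATEMENTS / LOAD_SECONDS

print(f"duplicates: {duplicate_share:.1%}")
print(f"raw-input loading speed: {raw_speed:,.0f} statements/sec.")
```

With the rounded input, roughly a quarter of the raw statements are duplicates, and the raw-input loading speed comes out near the figure of about 295,000 statements per second quoted in the UNIPROT comment.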
(3) Load and query performance of GraphDB™ is compared to OWLIM SE 5.4, running in the same environment. Loading in GraphDB™ is performed using the new Load Tool.
- GraphDB™ can load datasets of more than 10 billion statements on a single commodity database server at speeds exceeding 200,000 statements per second. In specific loading scenarios GraphDB™ managed to load billion-triple-scale datasets at speeds of around 500,000 statements per second.
- The loading speed of GraphDB™ does not degrade as the volume of the data grows: for both BSBM and LDBC, the loading speeds for the 50-100 million statement datasets were about the same as for the 1 billion statement datasets.
- Under the LDBC Semantic Publishing Benchmark (SPB) with the 50-million-statement dataset, GraphDB™ Standard Edition can execute 30 read queries per second while handling more than 20 updates per second in a consistent and transactionally safe manner. This is also the case on the Amazon AWS instance with 30 GB of RAM. LDBC SPB is a benchmark derived from the BBC’s Dynamic Semantic Publishing projects. It simulates loads similar to the one experienced by GraphDB™ when serving web page generation for the BBC Sport website. Read query performance can be scaled up linearly through the cluster architecture of GraphDB™ Enterprise.
- The GraphDB™ Load Tool is much faster than any loading mechanism in OWLIM 5.4. For big datasets the speed-up can be more than 5 times.
- GraphDB™ is faster on update queries: the increase in speed varies between 2 times (on SPB 50M) and 15 times (on SPB 1B).
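The text does not define how the "speed up" percentages in the table are computed. One plausible reading, suggested by the presence of a negative value (-2%), is that they express relative improvement over OWLIM SE 5.4, so that 0% means equal speed. The Python sketch below converts percentages to multipliers under that assumption; `speedup_factor` is a hypothetical helper, and if the percentages were instead plain ratios, the multiplier would simply be `percent / 100`.

```python
def speedup_factor(percent: float) -> float:
    """Convert a speed-up percentage to a multiplier.

    Assumption (not stated in the source): the percentage is the relative
    improvement over the baseline, so 0% means "same speed as OWLIM SE 5.4".
    """
    return 1 + percent / 100


# Selected values from the speed-up columns of the table.
for label, percent in [
    ("LDBC SPB 1B load", 526),
    ("LDBC SPB 50M updates", 244),
    ("LDBC SPB 1B updates", 1415),
    ("LDBC SPB 1B reads", -2),
]:
    print(f"{label}: {percent}% -> x{speedup_factor(percent):.2f}")
```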