GraphDB™ 6.0 Benchmark Results
Adequate benchmarking of semantic repositories is a complex exercise involving many factors. Ontotext is involved in the LDBC project, an outstanding initiative that aims to establish industry cooperation between vendors of RDF and graph database technologies in developing, endorsing, and publishing reliable and insightful benchmark results.
The benchmark results presented here aim to provide sufficient information on how GraphDB™ performs important tasks (such as loading, inference and querying) with variations in size and nature of data, inference, query types and other relevant factors. They also show how much faster GraphDB™ is than OWLIM 5.4.
(1) The hardware configurations are as follows. Leibniz is a dual-CPU server with Xeon E5-2690 CPUs, 256 GB of RAM and an SSD storage array; overall assembly cost is below $10,000. Rolle is the same as Leibniz, but with 512 GB of RAM. Newton is very similar to Leibniz. AWS c3.4xlarge is an Amazon cloud instance type with 16 vCPUs, 55 ECU, 30 GB of RAM and SSD storage.
(2) In the Data size column we refer to the number of explicit statements in the repository after the initial data load. We exclude inferred statements, because they are relevant only for forward-chaining based engines. Some tests insert additional statements if update queries are part of the query mixes; such additional statements are ignored above. Some datasets include a substantial number of duplicate statements in the data dumps: for instance, the raw files of UNIPROT contain 17B statements, but only 12B of those are unique.
(3) Load and query performance of GraphDB™ is compared to OWLIM SE 5.4, running in the same environment. Loading in GraphDB™ is performed using the new Load Tool in 6.0.
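The distinction in note (2) between raw and unique explicit statements can be illustrated with a minimal sketch. It assumes the dump is in a line-based format such as N-Triples and treats two statements as duplicates only when their lines are textually identical; the function name `count_statements` and the sample data are hypothetical, and an in-memory set like this would not scale to billions of statements.

```python
def count_statements(lines):
    """Return (raw, unique) statement counts for line-based RDF data.

    Assumption: one statement per line (as in N-Triples), and duplicates
    are detected by exact string match after stripping whitespace.
    """
    raw = 0
    seen = set()
    for line in lines:
        stmt = line.strip()
        if not stmt or stmt.startswith("#"):
            continue  # skip blank lines and comments
        raw += 1
        seen.add(stmt)
    return raw, len(seen)


# Hypothetical three-line dump in which one statement appears twice.
sample = [
    '<http://example.org/a> <http://example.org/p> "1" .',
    '<http://example.org/b> <http://example.org/p> "2" .',
    '<http://example.org/a> <http://example.org/p> "1" .',
]
raw, unique = count_statements(sample)
duplicate_share = 1 - unique / raw

# The UNIPROT figures quoted in note (2): 17B raw, 12B unique.
uniprot_duplicate_share = 1 - 12 / 17
```

With the UNIPROT figures, roughly 29% of the raw statements are duplicates, which is why the Data size column reports the unique count.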
- GraphDB™ can load datasets of more than 10B statements on a single commodity database server at speeds exceeding 200 000 statements/second. In specific loading scenarios GraphDB™ managed to load billion-triple-scale datasets at speeds of around 500 000 statements per second;
- Loading speed of GraphDB™ does not degrade as the volume of the data grows – for both BSBM and LDBC, the loading speed for the 50-100 million datasets is the same as for the 1B datasets;
- Under the LDBC Semantic Publishing Benchmark (SPB) with the 50-million dataset, GraphDB™ Standard can execute 30 read queries per second, while handling about 10 updates each second in a consistent and safe manner from a transactional perspective. This is also the case on the Amazon AWS instance with 30 GB of RAM. LDBC SPB is a benchmark derived from BBC’s Dynamic Semantic Publishing projects; it simulates loads similar to those experienced by GraphDB™ when serving web page generation for the BBC Sport website. Read query performance can be scaled up linearly through the cluster architecture of GraphDB™ Enterprise;
- GraphDB’s Loading Tool is much faster than any loading mechanism in OWLIM 5.4. For big datasets the speedup can be more than 5 times;
- GraphDB™ is faster on update queries: the speedup varies between 30% and 150%.
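To put the loading rates and speedup figures above in concrete terms, the following is a small arithmetic sketch. The helper names are hypothetical, and the speedup formula (ratio of baseline time to new time, minus one) is an assumption, since the results do not state how the percentages were computed.

```python
def load_time_hours(statements, rate_per_second):
    """Hours needed to load `statements` at a constant rate."""
    return statements / rate_per_second / 3600


def speedup_percent(baseline_seconds, new_seconds):
    """Speedup as a percentage, assuming it means baseline/new - 1."""
    return 100 * (baseline_seconds / new_seconds - 1)


# 10B statements at 200 000 statements/second: about 13.9 hours.
hours_10b = load_time_hours(10_000_000_000, 200_000)

# 1B statements at 500 000 statements/second: about 0.56 hours.
hours_1b = load_time_hours(1_000_000_000, 500_000)

# Under the assumed formula, a 150% speedup means the same work
# finishes in 1/2.5 of the baseline time (e.g. 250 s down to 100 s).
example_speedup = speedup_percent(250, 100)
```

Under this reading, the reported 30% to 150% range for update queries corresponds to finishing in roughly 77% to 40% of the OWLIM 5.4 time.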