Ontotext

OWLIM Performance with Jena

The purpose of this page is to provide some indication of expected performance when comparing the different access methods of OWLIM as well as comparing OWLIM with other Jena repository implementations, specifically Jena TDB.

The Berlin SPARQL Benchmark (BSBM)

The BSBM 100 Million benchmark was executed against the three configurations shown in the table below. Apart from the loading time row, which is given in seconds, the figures are the number of Query Mixes executed per Hour (QMpH):

                    | Joseki/Jena/BigOWLIM | Tomcat/Sesame/BigOWLIM | Joseki/Jena/TDB
Loading time (sec.) | 1 823                | 1 849                  | 4 166
1 client            | 707                  | 749                    | 443
4 clients           | 2 432                | 1 996                  | 631
8 clients           | 2 859                | 2 473                  | 521
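As a reminder of what the QMpH unit means, the following is a minimal sketch of converting a measured run into query mixes per hour. The function name and the sample inputs (100 mixes in 360 seconds) are hypothetical and chosen only for illustration; they are not taken from the benchmark runs reported here.

```python
def query_mixes_per_hour(completed_mixes: int, elapsed_seconds: float) -> float:
    """Convert a count of completed query mixes and a wall-clock duration into QMpH."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Scale the observed rate (mixes per second) up to one hour.
    return completed_mixes * 3600 / elapsed_seconds


# Hypothetical run: 100 query mixes completed in 360 seconds.
example_qmph = query_mixes_per_hour(100, 360.0)
print(f"{example_qmph:.0f} QMpH")
```

For multi-client runs, the count would be the total number of mixes completed by all clients over the same wall-clock interval.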

As can be seen, the performance of BigOWLIM with either Sesame or Jena is consistently better than Jena/TDB. For the 8-client run, throughput is around five times higher (2 859 and 2 473 QMpH against 521). The above results were obtained in the ontosol test environment using the following components and versions:

BigOWLIM 3.4
Sesame 2.3.2
Tomcat 6.0.29
Jena 2.6.3
Joseki 3.4.0
TDB 0.8.7 (using the statistics-based BGP optimizer, which requires some extra configuration effort; without it, TDB results were 2-3 times worse)

The results for a single client for Sesame/BigOWLIM and Jena/TDB are similar to those obtained in an independent BSBM evaluation. These results suggest that:

  • For a single client, using BigOWLIM through Sesame is the fastest option; using BigOWLIM through Jena is marginally slower;
  • Perhaps unexpectedly, the Joseki/Jena/BigOWLIM configuration is noticeably faster than Tomcat/Sesame/BigOWLIM for multiple clients (about 22% with 4 clients and 16% with 8 clients); this suggests that the Tomcat/Sesame interfaces introduce some overhead when multiple clients connect remotely over HTTP;
  • Using BigOWLIM as a backend behind Joseki and Jena delivers a large improvement in performance compared to TDB; for multi-client loads, the results above show it to be roughly 4 times faster than Jena/TDB with 4 clients and more than 5 times faster with 8 clients.
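The ratios behind these observations can be checked directly from the table. The following is a small sketch that recomputes them; the QMpH figures are copied from the BSBM table above, while the dictionary layout and the `speedup` helper are illustrative names introduced here.

```python
# QMpH figures copied from the BSBM table above, keyed by number of clients.
qmph = {
    "Joseki/Jena/BigOWLIM": {1: 707, 4: 2432, 8: 2859},
    "Tomcat/Sesame/BigOWLIM": {1: 749, 4: 1996, 8: 2473},
    "Joseki/Jena/TDB": {1: 443, 4: 631, 8: 521},
}


def speedup(config_a: str, config_b: str, clients: int) -> float:
    """Return how many times more query mixes per hour config_a ran than config_b."""
    return qmph[config_a][clients] / qmph[config_b][clients]


for clients in (1, 4, 8):
    jena_vs_sesame = speedup("Joseki/Jena/BigOWLIM", "Tomcat/Sesame/BigOWLIM", clients)
    jena_vs_tdb = speedup("Joseki/Jena/BigOWLIM", "Joseki/Jena/TDB", clients)
    print(
        f"{clients} client(s): Jena/BigOWLIM vs Sesame/BigOWLIM = {jena_vs_sesame:.2f}x, "
        f"Jena/BigOWLIM vs Jena/TDB = {jena_vs_tdb:.2f}x"
    )
```

A ratio below 1 means the first configuration was slower, as is the case for Jena/BigOWLIM against Sesame/BigOWLIM with a single client.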

The Lehigh University Benchmark (LUBM)

The LUBM(50) benchmark was executed against the same component versions and in the same test environment as described in the previous section. For each configuration, the loading time is given in seconds, and the timing for each of the specimen queries is given in milliseconds, with the number of returned results shown in brackets.

                                   | Sesame/BigOWLIM | Jena/BigOWLIM   | Jena/TDB
Loading time (sec.)                | 200             | 260             | 229*
Query 1, msec. (number of results) | 2 (4)           | 2 (4)           | 3 880 (4)
Query 2                            | 1 873 (130)     | 1 922 (130)     | 9 634 (130)
Query 3                            | 1 (6)           | 2 (6)           | 2 (6)
Query 4                            | 4 (34)          | 5 (34)          | 21 (34)
Query 5                            | 6 (719)         | 14 (719)        | 84 (719)
Query 6                            | 257 (519 842)   | 4 432 (519 842) | 5 491 (393 730)**
Query 7                            | 2 (67)          | 3 (67)          | 9 (59)**
Query 8                            | 85 (7 790)      | 176 (7 790)     | 19 005 (5 916)**
Query 9                            | 3 256 (13 639)  | 3 620 (13 639)  | 24 447 (6 538)**
Query 10                           | 1 (4)           | 2 (4)           | 2 (0)**
Query 11                           | 1 (224)         | 3 (224)         | 1 (0)**
Query 12                           | 5 (15)          | 4 (15)          | 585 (0)**
Query 13                           | 8 (228)         | 10 (228)        | 1 (0)**
Query 14                           | 193 (393 730)   | 3 929 (393 730) | 4 685 (393 730)
Average (msec.)                    | 407             | 1 009           | 4 846

*The loading times include inferencing for BigOWLIM, but do not include inferencing for TDB. The inference step for TDB is conducted during the warm up phase of the benchmark and not reported here in accordance with normal practice.

**The inference engine component of TDB used here is the RDFSExptRuleReasoner. It is not expressive enough to capture the semantics expressed in the dataset, hence these queries do not return the complete set of expected answers. Note that this means that these tests are not a fair comparison, since the TDB configuration does less inference work. This inference mode was chosen for TDB because the more expressive ones make the loading time unreasonably slow (more than 100 min.).

The results shown above indicate that:

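  • BigOWLIM uses its own reasoning mechanisms (in this case the OWL Horst ruleset), which are applied during data loading regardless of whether it is accessed through Jena or Sesame. However, loading through Sesame turns out to be some 30% faster (200 sec. against 260 sec.);
  • Query evaluation using BigOWLIM is faster than Jena/TDB on average and for nearly all queries; Jena/TDB matches or beats a BigOWLIM configuration only on Queries 3, 10, 11 and 13, which it answers in 1-2 msec., and for three of these it returns no results;
  • When BigOWLIM is used through Jena, there is a considerable slowdown for the queries that return large result sets (Queries 6 and 14), but these are still faster than Jena/TDB.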
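The Average row and the slowdown for large result sets can be reproduced from the per-query timings. The sketch below copies the millisecond timings for Queries 1 to 14 from the LUBM table above; the variable names are illustrative.

```python
from statistics import mean

# Per-query timings in msec. for Queries 1..14, copied from the LUBM table above.
timings_ms = {
    "Sesame/BigOWLIM": [2, 1873, 1, 4, 6, 257, 2, 85, 3256, 1, 1, 5, 8, 193],
    "Jena/BigOWLIM": [2, 1922, 2, 5, 14, 4432, 3, 176, 3620, 2, 3, 4, 10, 3929],
    "Jena/TDB": [3880, 9634, 2, 21, 84, 5491, 9, 19005, 24447, 2, 1, 585, 1, 4685],
}

# Mean over the 14 queries, rounded to whole milliseconds as in the table.
averages = {name: round(mean(values)) for name, values in timings_ms.items()}
print(averages)

# Slowdown of Jena/BigOWLIM relative to Sesame/BigOWLIM for the two queries
# with the largest result sets (Queries 6 and 14).
large_result_queries = (6, 14)
slowdown = {
    q: timings_ms["Jena/BigOWLIM"][q - 1] / timings_ms["Sesame/BigOWLIM"][q - 1]
    for q in large_result_queries
}
for q, factor in slowdown.items():
    print(f"Query {q}: Jena/BigOWLIM takes {factor:.1f}x as long as Sesame/BigOWLIM")
```

Because the average is dominated by a handful of slow queries, it is worth reading alongside the per-query figures rather than on its own.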