Ontotext

owl:sameAs optimization

The performance of an OWLIM-SE repository is greatly improved by a specific optimization that allows it to handle owl:sameAs statements efficiently. owl:sameAs is an OWL predicate that declares that two different URIs identify one and the same resource. Most often, it is used to align the different identifiers that different data sources use for the same real-world entity.

However, using owl:sameAs is not without its problems, as shown in the following example.

Example

Asserting that there are three different URIs for Bulgaria and two for Sofia (which is part of Bulgaria) can be done using these RDF statements:

dbpedia:Sofia owl:sameAs geonames:727011
geonames:727011 geo-ont:parentFeature geonames:732800
dbpedia:Bulgaria owl:sameAs geonames:732800
dbpedia:Bulgaria owl:sameAs opencyc-en:Bulgaria

This is shown graphically in the following diagram:

RDF statements describing Sofia and Bulgaria

Standard OWL semantics states that owl:sameAs is:

  • transitive
  • symmetric
  • reflexive
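Taken together, these three properties make owl:sameAs an equivalence relation, so the URIs in the example fall into equivalence classes. The following Python sketch groups the example's URIs with a union-find structure. It is only an illustration of the semantics under the assumption that prefixed names can be treated as plain strings; it is not OWLIM-SE's internal implementation.

```python
# Minimal sketch: owl:sameAs equivalence classes via union-find.
# Prefixed names are plain strings here (an assumption for illustration).

SAME_AS = "owl:sameAs"

triples = [
    ("dbpedia:Sofia", SAME_AS, "geonames:727011"),
    ("geonames:727011", "geo-ont:parentFeature", "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "opencyc-en:Bulgaria"),
]


def find(parent: dict[str, str], x: str) -> str:
    """Return the representative of x's class (with path halving)."""
    parent.setdefault(x, x)  # reflexive: every URI starts in its own class
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent: dict[str, str], a: str, b: str) -> None:
    """Merge the classes of a and b; merging is symmetric and transitive."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def equivalence_classes(triples):
    parent: dict[str, str] = {}
    for s, p, o in triples:
        if p == SAME_AS:
            union(parent, s, o)
    classes: dict[str, set[str]] = {}
    for uri in list(parent):
        classes.setdefault(find(parent, uri), set()).add(uri)
    return list(classes.values())


for cls in equivalence_classes(triples):
    print(sorted(cls))
# ['dbpedia:Sofia', 'geonames:727011']
# ['dbpedia:Bulgaria', 'geonames:732800', 'opencyc-en:Bulgaria']
```

Union-find is used here only because it keeps class merging cheap as more owl:sameAs statements arrive; any equivalent grouping would give the same two classes.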

Consequently, a statement asserted using one URI from a set of equivalent URIs should be 'replicated', so that the same statement exists with every equivalent URI substituted in that position. The 4 statements in the example therefore lead to 10 inferred statements (not counting trivial reflexive statements such as dbpedia:Sofia owl:sameAs dbpedia:Sofia):

geonames:727011 owl:sameAs dbpedia:Sofia
geonames:732800 owl:sameAs dbpedia:Bulgaria
geonames:732800 owl:sameAs opencyc-en:Bulgaria
opencyc-en:Bulgaria owl:sameAs dbpedia:Bulgaria
opencyc-en:Bulgaria owl:sameAs geonames:732800
dbpedia:Sofia geo-ont:parentFeature geonames:732800
dbpedia:Sofia geo-ont:parentFeature opencyc-en:Bulgaria
dbpedia:Sofia geo-ont:parentFeature dbpedia:Bulgaria
geonames:727011 geo-ont:parentFeature opencyc-en:Bulgaria
geonames:727011 geo-ont:parentFeature dbpedia:Bulgaria
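The count of 10 can be checked with a naive materialization of the standard semantics. The Python sketch below assumes plain-string URIs and, like the list above, leaves out the reflexive statements; it illustrates what happens without any optimization and is not how OWLIM-SE stores data.

```python
from itertools import product

SAME_AS = "owl:sameAs"

# The four asserted statements from the example, as (subject, predicate, object)
asserted = [
    ("dbpedia:Sofia", SAME_AS, "geonames:727011"),
    ("geonames:727011", "geo-ont:parentFeature", "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "opencyc-en:Bulgaria"),
]


def equivalents(triples):
    """Map every URI to the shared set of URIs it is owl:sameAs-equivalent to."""
    eq = {}
    for triple in triples:
        for uri in triple:
            eq.setdefault(uri, {uri})  # reflexive: each URI is equivalent to itself
    for s, p, o in triples:
        if p == SAME_AS and eq[s] is not eq[o]:
            merged = eq[s] | eq[o]  # merging keeps the relation symmetric and transitive
            for uri in merged:
                eq[uri] = merged
    return eq


def naive_closure(triples):
    """Materialize every statement implied by standard owl:sameAs semantics."""
    eq = equivalents(triples)
    closure = set()
    # owl:sameAs between every ordered pair of distinct URIs in the same class
    for cls in {id(c): c for c in eq.values()}.values():
        closure.update((a, SAME_AS, b) for a, b in product(cls, repeat=2) if a != b)
    # every other statement, with all equivalent URIs substituted in each position
    for s, p, o in triples:
        if p != SAME_AS:
            closure.update(product(eq[s], eq[p], eq[o]))
    return closure


inferred = naive_closure(asserted) - set(asserted)
print(len(inferred))  # 10
```

The 10 come from 1 extra owl:sameAs statement in the Sofia class, 4 in the Bulgaria class, and 5 extra geo-ont:parentFeature statements (2 x 3 combinations minus the asserted one).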

The statements, including the inferred ones, are shown graphically in the following diagram:

RDF statements describing Sofia and Bulgaria with inferences

Optimization strategy

OWLIM-SE features an optimization that allows it to use a single 'master node' in its indices to represent an entire equivalence class of owl:sameAs URIs. This avoids inflating the indices with multiple equivalent statements.

For example, imagine a statement whose subject has 5 owl:sameAs equivalents, whose predicate has 2, and whose object has 3. Without such an optimization, forward-chaining would place 30 replica statements (5 x 2 x 3) in the indices.
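The difference can be sketched in Python. The URIs (`ex:s1` and so on) are hypothetical, and the rule for choosing the master node (here, the lexicographically smallest URI in each class) is an assumption made for illustration; the text does not say how OWLIM-SE picks it.

```python
from itertools import product

# Hypothetical equivalence classes: 5 subject URIs, 2 predicate URIs, 3 object URIs
subject_class = [f"ex:s{i}" for i in range(1, 6)]
predicate_class = ["ex:p1", "ex:p2"]
object_class = ["ex:o1", "ex:o2", "ex:o3"]

# Without the optimization: every combination of equivalent URIs is materialized
replicas = set(product(subject_class, predicate_class, object_class))


def master_map(*classes):
    """Map each URI to its class's master node (assumed: the smallest URI)."""
    mapping = {}
    for cls in classes:
        master = min(cls)
        for uri in cls:
            mapping[uri] = master
    return mapping


# With a master node per class, all 30 replicas collapse to one index entry
master = master_map(subject_class, predicate_class, object_class)
index = {(master[s], master[p], master[o]) for s, p, o in replicas}

print(len(replicas), len(index))  # 30 1
```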

A further advantage of this approach is that it allows more compact query results. The owl:sameAs equivalence can multiply the bindings of variables during query evaluation, with both forward- and backward-chaining. This expands the result set with rows that differ only in which URI from the same equivalence class they use. OWLIM-SE allows the selection of condensed or expanded results at query time.
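The following Python sketch contrasts the two result styles for the Sofia/Bulgaria example. It assumes a single stored statement written with master nodes and treats the first URI of each class as its master; the `expanded` flag and the `query_objects` helper are hypothetical, since the text does not describe how the mode is selected in OWLIM-SE.

```python
# Hypothetical master-node setup for the example's two equivalence classes
EQUIV_CLASSES = [
    ["dbpedia:Sofia", "geonames:727011"],
    ["dbpedia:Bulgaria", "geonames:732800", "opencyc-en:Bulgaria"],
]
master = {}   # URI -> master node
members = {}  # master node -> all URIs in its class
for cls in EQUIV_CLASSES:
    m = cls[0]  # assumption: the first URI listed acts as the master node
    members[m] = list(cls)
    for uri in cls:
        master[uri] = m


def canon(uri):
    return master.get(uri, uri)


def expand(uri):
    return members.get(canon(uri), [uri])


# "Sofia is part of Bulgaria" is stored once, in master-node form
index = {(canon("geonames:727011"), "geo-ont:parentFeature", canon("geonames:732800"))}


def query_objects(subject, predicate, expanded=False):
    """Answer `subject predicate ?o` against the master-node index."""
    rows = [o for s, p, o in index if s == canon(subject) and p == predicate]
    if expanded:
        rows = [alt for o in rows for alt in expand(o)]
    return rows


print(query_objects("dbpedia:Sofia", "geo-ont:parentFeature"))
# ['dbpedia:Bulgaria']
print(query_objects("dbpedia:Sofia", "geo-ont:parentFeature", expanded=True))
# ['dbpedia:Bulgaria', 'geonames:732800', 'opencyc-en:Bulgaria']
```

Querying with `geonames:727011` as the subject returns the same rows, because both Sofia URIs map to the same master node.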

The owl:sameAs optimization is carefully designed and implemented to make sure that:

  • All the inferences that follow from the application of the standard owl:sameAs semantics are inferable with the optimization

  • One can correctly determine the "original" version of the statement, i.e. which URIs were used when the statement was asserted

  • One can still retrieve all the variations of all statements, if desired: the standard semantics can be simulated upon retrieval, which makes the owl:sameAs optimization an implementation detail that is transparent to end users who are not concerned with obtaining condensed result sets
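These guarantees can be illustrated with a small Python sketch. It assumes that the asserted statements are kept separately from a condensed master-node index, which is one possible way to answer the "original" question; the text does not describe OWLIM-SE's actual mechanism, and `expand_all` and `is_original` are hypothetical helpers.

```python
from itertools import product

SAME_AS = "owl:sameAs"

asserted = {
    ("dbpedia:Sofia", SAME_AS, "geonames:727011"),
    ("geonames:727011", "geo-ont:parentFeature", "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "geonames:732800"),
    ("dbpedia:Bulgaria", SAME_AS, "opencyc-en:Bulgaria"),
}

# Hypothetical equivalence classes; the first URI of each acts as master node
classes = [
    ("dbpedia:Sofia", "geonames:727011"),
    ("dbpedia:Bulgaria", "geonames:732800", "opencyc-en:Bulgaria"),
]
master, members = {}, {}
for cls in classes:
    members[cls[0]] = cls
    for uri in cls:
        master[uri] = cls[0]


def canon(uri):
    return master.get(uri, uri)


def variants(uri):
    return members.get(canon(uri), (uri,))


# Condensed index: each non-sameAs statement stored once, in master-node form
index = {(canon(s), canon(p), canon(o)) for s, p, o in asserted if p != SAME_AS}


def expand_all():
    """Simulate the standard owl:sameAs semantics at retrieval time."""
    out = set()
    for s, p, o in index:
        out.update(product(variants(s), variants(p), variants(o)))
    for cls in classes:
        out.update((a, SAME_AS, b) for a, b in product(cls, repeat=2) if a != b)
    return out


def is_original(triple):
    """True if the statement was asserted with exactly these URIs."""
    return triple in asserted


print(len(expand_all()))  # 14: the 4 asserted + 10 inferred (non-reflexive)
```

The expanded set contains all 4 asserted statements plus the 10 inferred ones, while `is_original` still distinguishes `geonames:727011 geo-ont:parentFeature geonames:732800` (asserted) from its replicas.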

Without this optimization, reasoning with linked data becomes inefficient and query results become overly inflated.