The reasonability criteria for LDSR are defined with respect to OWL 2 RL. LDSR allows forward-chaining reasoning, e.g. entailment and consistency checking, within O(n log n) space and time. The integrated dataset of LDSR is consistent with respect to OWL 2 RL. Most of the results of the inference comply with common sense without specific assumptions about the context of interpretation.
The standard reasoning behaviour of OWLIM is to update the deductive closure when a transaction is committed to the repository. New explicit statements are added to the repository alongside the explicit statements from previous transactions and their closure. Forward-chaining is then performed with respect to the rules of the selected rule-set: it infers and adds to the repository all statements that are inferable from the repository in its current state. This allows for efficient incremental updates of the deductive closure. Consistency checking is performed by applying the checking rules after all new statements have been added and the deductive closure has been updated. When statements are deleted, the deductive closure is updated in order to withdraw statements that can no longer be inferred from the new state of the repository.
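The incremental update described above can be sketched as follows. This is a minimal illustration and not OWLIM's implementation: the function name forward_chain is hypothetical, the only rule modelled is transitivity, and neither consistency checking nor deletion is covered.

```python
def forward_chain(closure, new_statements, transitive_props):
    """Add new statements to `closure` and extend it to a fixpoint.

    Only one rule is modelled (an assumption of this sketch):
    (a p b), (b p c) => (a p c) for each transitive property p.
    """
    # Only statements that are not already known need to be processed.
    agenda = [t for t in new_statements if t not in closure]
    closure.update(agenda)
    while agenda:
        s, p, o = agenda.pop()
        if p not in transitive_props:
            continue
        derived = set()
        for s2, p2, o2 in closure:
            if p2 != p:
                continue
            if s2 == o:          # (s p o), (o p o2) => (s p o2)
                derived.add((s, p, o2))
            if o2 == s:          # (s2 p s), (s p o) => (s2 p o)
                derived.add((s2, p, o))
        for t in derived - closure:
            closure.add(t)
            agenda.append(t)     # new facts may trigger further inference
    return closure


PF = "gno:parentFeature"
closure = set()
# First transaction, then a second one; the closure is extended in place.
forward_chain(closure, {("geonames:2761369", PF, "geonames:2761367")}, {PF})
forward_chain(closure, {("geonames:2761367", PF, "geonames:2782113")}, {PF})
```

The second call only processes the newly added statement and what follows from it, which is the point of updating the closure incrementally rather than recomputing it.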
owl:sameAs is a predicate used to state that two different URIs denote one and the same resource. Most often, it is used to align the different identifiers of the same real-world entity across different datasets and data sources. owl:sameAs is heavily used for linking the datasets in the Linking Open Data (LOD) initiative, and can be considered the most important OWL predicate when it comes to merging data from different data sources. Its effects are illustrated with the following example.
The URI of Vienna in DBpedia is http://dbpedia.org/page/Vienna, while in Geonames its URI is http://sws.geonames.org/2761369/. In DBpedia, there is a statement:
(S1) dbpedia:Vienna owl:sameAs geonames:2761369
According to the formal definition of OWL 2 RL, whenever two URIs are declared to be equivalent, all statements which involve one of them should be "replicated" with the other URI as well. The inferencing process goes as follows.
The city of Vienna with URI http://sws.geonames.org/2761369/ in Geonames is defined as part of the first-order administrative division in Austria with the same name and with URI http://www.geonames.org/2761367/. This division is in turn part of the country Austria with URI http://www.geonames.org/2782113. This makes for the following RDF statements:
(S2) geonames:2761369 gno:parentFeature geonames:2761367
(S3) geonames:2761367 gno:parentFeature geonames:2782113
Since gno:parentFeature is a transitive relationship, in the course of the initial inference OWLIM will derive that the city of Vienna is also part of Austria, i.e.:
(S4) geonames:2761369 gno:parentFeature geonames:2782113
Given the semantics of owl:sameAs, OWLIM will infer in the subsequent inference from (S1) that statements (S2) and (S4) also hold for Vienna when it is referred to by its DBpedia URI, i.e.:
(S5) dbpedia:Vienna gno:parentFeature geonames:2761367
(S6) dbpedia:Vienna gno:parentFeature geonames:2782113
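Austria also has an equivalent URI in DBpedia, which is declared as follows:
(S7) geonames:2782113 owl:sameAs dbpedia:Austria
From (S7), OWLIM will additionally infer: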
(S8) dbpedia:Vienna gno:parentFeature dbpedia:Austria
(S9) dbpedia:Austria gno:parentFeature dbpedia:Austria
(S10) geonames:2761369 gno:parentFeature dbpedia:Austria
(S11) geonames:2761367 gno:parentFeature dbpedia:Austria
It is thus clear that the equivalence operator owl:sameAs generates many new statements, even when equivalence is declared between just two URIs in two distinct datasets. Seven new statements are generated by just two declarations of equivalence (between Vienna in DBpedia and Vienna in Geonames, and between Austria in DBpedia and Austria in Geonames), which is a 175% increase over the four explicit statements. The number of explicit and implicit statements will increase vastly as equivalences are declared between URIs in additional datasets. As an equivalence operator, owl:sameAs is transitive, reflexive, and symmetric; thus, a set of N equivalent URIs will generate N^2 owl:sameAs statements, one for each ordered pair of URIs in the set. For instance, Vienna also has a URI in UMBEL, which is also declared equivalent to the URI in DBpedia. This adds another 4 implicit statements.
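The replication effect of owl:sameAs on the Vienna example can be reproduced with a small script. This is only a sketch under stated assumptions: the function names are hypothetical, only gno:parentFeature statements are closed (under transitivity and replacement of equivalent URIs), and the owl:sameAs statements themselves are not materialised.

```python
from itertools import product

PF = "gno:parentFeature"
SAME = "owl:sameAs"

explicit = {
    ("dbpedia:Vienna", SAME, "geonames:2761369"),    # (S1)
    ("geonames:2761369", PF, "geonames:2761367"),    # (S2)
    ("geonames:2761367", PF, "geonames:2782113"),    # (S3)
    ("geonames:2782113", SAME, "dbpedia:Austria"),   # (S7)
}


def equivalence_classes(statements):
    """Map each URI that occurs in an owl:sameAs statement to its class."""
    classes = {}
    for s, p, o in statements:
        if p == SAME:
            merged = classes.get(s, {s}) | classes.get(o, {o})
            for uri in merged:
                classes[uri] = merged
    return classes


def parent_feature_closure(statements):
    """Close the gno:parentFeature statements under transitivity and
    under replacement of a URI by any equivalent URI."""
    classes = equivalence_classes(statements)
    facts = {t for t in statements if t[1] == PF}
    while True:
        new = set()
        for s, _, o in facts:
            # Replicate the statement for every equivalent subject/object.
            for s2, o2 in product(classes.get(s, {s}), classes.get(o, {o})):
                new.add((s2, PF, o2))
        for (a, _, b), (c, _, d) in product(facts, repeat=2):
            if b == c:           # transitivity of gno:parentFeature
                new.add((a, PF, d))
        if new <= facts:         # fixpoint reached
            return facts
        facts |= new


inferred = parent_feature_closure(explicit) - explicit
for statement in sorted(inferred):
    print(statement)
```

Under these assumptions the inferred set includes (S4), (S5), (S6), (S8), (S10) and (S11), all derived from two explicit gno:parentFeature statements and two equivalence declarations.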
Although owl:sameAs is useful for interlinking RDF datasets, its semantics causes considerable inflation of the number of implicit facts that should be considered during inference and query evaluation. This has performance implications and requires optimization.
The loading of LDSR benefits considerably from a specific feature of the BigTRREE engine, which allows it to handle owl:sameAs statements efficiently. In its indices, each set of equivalent URIs (an equivalence class with respect to owl:sameAs) is represented by a single super-node. BigTRREE can thus still enumerate all statements that should be inferred through the equivalence, but it does not have to inflate its indices. This approach can be considered a form of partial materialization. BigOWLIM also takes special care to make sure that this technique does not hinder the ability to distinguish explicit from implicit statements.
This optimisation allows OWLIM to efficiently handle large datasets where owl:sameAs is extensively used. In the case of LDSR, this technique allows OWLIM to deal with more than 7 billion statements at the computational costs required for 860 million statements.
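The super-node idea can be illustrated with a union-find structure in which statements are stored once per equivalence class and expanded only on enumeration. This is a hedged sketch, not BigTRREE's actual data structure: the class SuperNodeStore and its methods are hypothetical, and the distinction between explicit and implicit statements is not modelled.

```python
from itertools import product


class SuperNodeStore:
    """Stores each statement once, over class representatives."""

    def __init__(self):
        self.parent = {}         # union-find forest over URIs
        self.statements = set()  # statements over representatives only

    def find(self, uri):
        """Return the representative (super-node) of a URI."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def add(self, s, p, o):
        if p == "owl:sameAs":
            rs, ro = self.find(s), self.find(o)
            if rs != ro:
                self.parent[ro] = rs
                # Re-canonicalise statements that used the old representative.
                self.statements = {
                    (self.find(a), q, self.find(b))
                    for a, q, b in self.statements
                }
        else:
            self.statements.add((self.find(s), p, self.find(o)))

    def members(self, rep):
        return {u for u in list(self.parent) if self.find(u) == rep}

    def expand(self):
        """Enumerate every statement implied by the equivalences."""
        for s, p, o in self.statements:
            for s2, o2 in product(self.members(s), self.members(o)):
                yield (s2, p, o2)


PF = "gno:parentFeature"
store = SuperNodeStore()
store.add("dbpedia:Vienna", "owl:sameAs", "geonames:2761369")   # (S1)
store.add("geonames:2761369", PF, "geonames:2761367")           # (S2)
store.add("geonames:2761367", PF, "geonames:2782113")           # (S3)
store.add("geonames:2761369", PF, "geonames:2782113")           # (S4)
store.add("geonames:2782113", "owl:sameAs", "dbpedia:Austria")  # (S7)

print(len(store.statements), "stored,", len(set(store.expand())), "enumerated")
```

The index holds one entry per statement over super-nodes, while enumeration still yields every replicated statement, which mirrors the trade-off described above on a very small scale.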
The semantics of RDFS and OWL require a number of statements to be inferred that are of limited practical use:
<X,rdf:type,rdf:Resource> and <P,rdf:type,rdf:Property> should be inferred for all URIs which appear as subjects in triples and for all predicates of statements;
<X,rdf:type,owl:Thing> and <X,owl:sameAs,X> should be inferred for each URI in a subject position;
owl:Thing and owl:Nothing have to be asserted to be super- and sub-classes of all classes.
Given that in LDSR there are on average 3.5 explicit statements per URI, these 3 extra statements per URI appear as an unjustified overhead, especially with their limited utility. Thus, LDSR is loaded with the "partialRDFS" parameter of OWLIM which suppresses the inference of these extra statements coming from the features of the semantics of RDFS and OWL.
We have also "switched off" the RDFS rules which derive types of resources based on the domains and ranges of properties, because many properties are used without regard for (or knowledge of) their formal definitions. For instance, foaf:img has domain foaf:Person, but it is often used to denote images of all types of resources. The effect of this change on the result set was that OWLIM inferred 446 million fewer statements.
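The effect of the domain rule can be seen in a small sketch. The function infer_types, its use_domain_rule flag, and the ex: resources are hypothetical and serve only as illustration; they are not OWLIM configuration options.

```python
def infer_types(statements, domains, use_domain_rule=True):
    """Apply the RDFS domain rule: (s p o), (p rdfs:domain C) => (s rdf:type C).

    `domains` maps a property to its declared domain class.
    """
    inferred = set()
    if use_domain_rule:
        for s, p, _ in statements:
            if p in domains:
                inferred.add((s, "rdf:type", domains[p]))
    return inferred - set(statements)


domains = {"foaf:img": "foaf:Person"}
# Hypothetical data: foaf:img used on a resource that is not a person.
data = {("ex:EiffelTower", "foaf:img", "ex:tower.jpg")}

with_rule = infer_types(data, domains)
without_rule = infer_types(data, domains, use_domain_rule=False)
print(with_rule)     # the tower is typed as a foaf:Person
print(without_rule)  # no type statements are derived
```

With the rule enabled, the loosely used property yields a type statement that contradicts common sense; with the rule disabled, no such statement is derived.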