Linked Data Semantic Repository: Reason-able Views
Reason-able views (RAVs) represent a practical approach to reasoning with the web of linked data.
A reason-able view is an assembly of independent datasets
that can be used as a single body of knowledge (an integrated dataset) for reasoning and query evaluation.
The integrated dataset is designed to meet certain criteria of "reasonability", i.e. to have specific qualities
with respect to a specific reasoning task and language,
for example "consistent with OWL Lite" or "allows RDFS entailment within O(n) time and space".
A linked data reason-able view can be considered a special case in which:
- All the datasets in the view represent linked data
- A single reasonability criterion is imposed on all datasets
- Each dataset is connected to at least one of the others
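The three conditions above can be checked mechanically once the datasets and their links are described. The sketch below assumes a hypothetical `DatasetInfo` record per dataset and represents inter-dataset links as pairs of dataset names; neither is the API of any particular repository. Note that the third condition only requires every dataset to have at least one link to another dataset, not that the whole view forms one connected graph, and the check mirrors that.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical description of one dataset in a candidate view."""
    name: str
    is_linked_data: bool  # published as linked data
    criterion: str        # reasonability criterion the dataset must satisfy


def is_linked_rav(datasets: list[DatasetInfo],
                  links: Iterable[tuple[str, str]]) -> bool:
    """Check the three conditions of a linked data reason-able view.

    links: (source_name, target_name) pairs, one per inter-dataset link,
    e.g. an owl:sameAs statement between resources of the two datasets.
    """
    names = {d.name for d in datasets}

    # Condition 1: all datasets in the view represent linked data.
    if not all(d.is_linked_data for d in datasets):
        return False

    # Condition 2: a single reasonability criterion is imposed on all datasets.
    if len({d.criterion for d in datasets}) > 1:
        return False

    # Condition 3: each dataset is connected to at least one of the others.
    connected: set[str] = set()
    for a, b in links:
        if a != b and a in names and b in names:
            connected.update((a, b))
    return connected == names
```

For example, two datasets that share the criterion "consistent with OWL Lite" and have one link between them pass the check; removing the link makes it fail.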
Considering the size of the LOD datasets, in order to make query
evaluation and reasoning practically feasible, the integrated dataset of a linked RAV should be
loaded in a single repository (even if it employs some sort of distribution internally).
Such a linked RAV can be considered an index that caches parts of the LOD cloud and provides access
to the datasets included in it, in a manner similar to the way web search engines
index WWW pages and facilitate their use.
As a final practical consideration, to allow for caching and indexing, linked RAVs should include
only datasets that are more or less static; this excludes various types of wrappers or virtual datasets,
where RDF is generated in answer to retrieval requests (one can make an analogy with the dynamic
part of the WWW).
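The idea of a linked RAV as a single repository that caches and indexes static datasets can be sketched in a few lines. The following is a minimal in-memory illustration, not a description of any real repository: the names `Source`, `is_static` and `LinkedRAVStore` are hypothetical, and only two lookup indexes are kept for brevity. Dynamic sources (wrappers or virtual datasets) are simply refused, since their RDF only exists in answer to requests.

```python
from collections import defaultdict
from dataclasses import dataclass

Triple = tuple[str, str, str]


@dataclass
class Source:
    """Hypothetical dataset to be cached in the view."""
    name: str
    triples: list[Triple]
    is_static: bool  # False for wrappers/virtual datasets generating RDF per request


class LinkedRAVStore:
    """Single in-memory repository holding the integrated dataset of a linked RAV."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        # Indexes so that lookups by subject, or by predicate+object,
        # do not have to scan the whole store.
        self.by_subject: dict[str, set[Triple]] = defaultdict(set)
        self.by_pred_obj: dict[tuple[str, str], set[Triple]] = defaultdict(set)

    def load(self, source: Source) -> bool:
        """Cache a dataset; only (more or less) static datasets are accepted."""
        if not source.is_static:
            return False
        for t in source.triples:
            if t not in self.triples:
                self.triples.add(t)
                s, p, o = t
                self.by_subject[s].add(t)
                self.by_pred_obj[(p, o)].add(t)
        return True

    def match(self, s=None, p=None, o=None) -> set[Triple]:
        """Return triples matching a pattern; None acts as a wildcard."""
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif p is not None and o is not None:
            candidates = self.by_pred_obj.get((p, o), set())
        else:
            candidates = self.triples
        return {t for t in candidates
                if (p is None or t[1] == p) and (o is None or t[2] == o)}
```

Because all datasets end up in one store, a query pattern is answered from local indexes rather than by contacting the servers that publish each dataset.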
Standard Methods of Inference
The standard methods of sound and complete inference with respect to relatively rich fragments of
First Order Predicate Calculus (FOPC) are practically inapplicable to the web of linked data.
Some of the major obstacles are:
- The most popular FOPC fragments, such as Description Logics (DL), count on "closed-world" models developed under centralized control.
This assumption does not hold in a web context. Performing
sound and complete inference with respect to LOD-type data is heavily prone to
inconsistency, and because an inconsistent set of statements entails every statement, the results of such inference become useless.
- The semantics of languages like DL have prohibitively high computational complexity, since reasoning with them requires "satisfiability" checks. As a result,
the most scalable published experiments with sound and complete DL reasoning remain below 10 million statements, which is not enough for LOD-scale data.
- Some LOD datasets (or parts of them) are unsuitable for reasoning. Some data publishers seem to use the OWL and RDFS
vocabularies without regard for their formal semantics, so the results of inference over such datasets are of questionable utility. For instance, one dataset contains a subject hierarchy encoded via
rdfs:subClassOf with cycles tens of concepts long. Any reasoner
following the standard semantics of rdfs:subClassOf will infer that all the
concepts in such a cycle are equivalent, which does not seem to be what
the publishers intended.
- Reasoning with data distributed across different web servers is possible but much slower than reasoning with local data. The
fundamental reason is the so-called "remote join" problem known
from distributed database management systems (DBMS).
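The rdfs:subClassOf cycle problem in the list above follows directly from two properties of the standard semantics: rdfs:subClassOf is transitive, and two classes that are subclasses of each other have the same extension. In a cycle, every class is therefore a subclass of every other. The sketch below computes the transitive closure with plain Python sets and reports the groups of classes a reasoner would treat as equivalent; the class names in the example (`ex:Physics` and so on) are hypothetical.

```python
from collections import defaultdict


def subclass_closure(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Transitive closure of rdfs:subClassOf, given (sub, super) pairs."""
    graph: dict[str, set[str]] = defaultdict(set)
    nodes: set[str] = set()
    for sub, sup in edges:
        graph[sub].add(sup)
        nodes.update((sub, sup))
    closure: dict[str, set[str]] = {}
    for start in nodes:
        # Depth-first search collecting every (indirect) superclass.
        seen: set[str] = set()
        stack = [start]
        while stack:
            for nxt in graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        closure[start] = seen
    return closure


def equivalent_groups(edges: list[tuple[str, str]]) -> set[frozenset[str]]:
    """Groups of classes that are mutual subclasses, i.e. that the standard
    semantics forces to be equivalent."""
    closure = subclass_closure(edges)
    groups: set[frozenset[str]] = set()
    for c, supers in closure.items():
        group = frozenset({c} | {d for d in supers if c in closure[d]})
        if len(group) > 1:
            groups.add(group)
    return groups


hierarchy = [
    ("ex:Physics", "ex:Science"),
    ("ex:Science", "ex:Knowledge"),
    ("ex:Knowledge", "ex:Physics"),  # closes the cycle
    ("ex:Chemistry", "ex:Science"),
]
print(equivalent_groups(hierarchy))
```

Here `ex:Physics`, `ex:Science` and `ex:Knowledge` collapse into one group of equivalent classes, while `ex:Chemistry`, which only points into the cycle, stays distinct. With cycles tens of concepts long, the whole loop collapses the same way.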
Linking the Linked Data
Reasoning has the potential to enhance the interlinking between linked data datasets, as long
as it ensures that the semantics of the links are enforced. For instance, consider the link between the identifiers
for Vienna in DBPedia (dbpedia:Vienna) and in Geonames (geonames:2761369), and the statement linking Vienna to the corresponding
high-level administrative region in Austria (geonames:2761367):
dbpedia:Vienna owl:sameAs geonames:2761369
geonames:2761369 gno:parentFeature geonames:2761367
From these, simple inference derives the statement:
dbpedia:Vienna gno:parentFeature geonames:2761367
This allows the connection between the DBPedia entry for Vienna and the Geonames description of its Austrian administrative region
to appear when exploring dbpedia:Vienna, or to be considered during query evaluation.
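The derivation in this example relies on the replacement property of owl:sameAs: if x owl:sameAs y, any statement about y also holds for x. The sketch below is a naive forward chainer, in the spirit of the equality rules of the OWL 2 RL profile (symmetry, transitivity, and replacement in subject and object position), applied to the Vienna triples. It repeats the rules until no new statement appears; it is an illustration of the inference only, with a quadratic loop no real repository would use at LOD scale.

```python
SAME_AS = "owl:sameAs"


def sameas_closure(triples):
    """Naive fixpoint computation of owl:sameAs consequences."""
    facts = set(triples)
    while True:
        same = {(s, o) for s, p, o in facts if p == SAME_AS}
        new = set()
        for a, b in same:
            new.add((b, SAME_AS, a))              # symmetry
            for c, d in same:
                if b == c:
                    new.add((a, SAME_AS, d))      # transitivity
        for s, p, o in facts:
            if p == SAME_AS:
                continue
            for a, b in same:
                if s == a:
                    new.add((b, p, o))            # replace equal subject
                if o == a:
                    new.add((s, p, b))            # replace equal object
        if new <= facts:                          # nothing new: fixpoint reached
            return facts
        facts |= new


vienna = [
    ("dbpedia:Vienna", "owl:sameAs", "geonames:2761369"),
    ("geonames:2761369", "gno:parentFeature", "geonames:2761367"),
]
inferred = sameas_closure(vienna)
print(("dbpedia:Vienna", "gno:parentFeature", "geonames:2761367") in inferred)
```

After symmetry adds `geonames:2761369 owl:sameAs dbpedia:Vienna`, subject replacement turns the gno:parentFeature statement about geonames:2761369 into the same statement about dbpedia:Vienna, which is exactly the derived link above.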