Ontotext

Ontology Engineering and Linked Data Management

Building a Reference Structure for Managing Linked Open Data

The Linking Open Data (LOD) project facilitates the emergence of a web of linked data by publishing and interlinking open datasets on the web in RDF. The 203 datasets in LOD cover a wide spectrum of subject domains: biomedicine, science, geography, generic knowledge, entertainment and government. As their number keeps growing, we face the problem of accessing, manipulating and further developing them conveniently. Such a large set of interconnected data, equipped with ontologies that describe parts of it, calls for approaches to using it efficiently and integrating it better.

Case

FactForge, the largest body of heterogeneous generic knowledge on which inference has been performed, is an implementation of a reason-able view of the web of data. It gathers some of the most popular LOD datasets, e.g. a “refined” version of DBpedia and the original versions of Freebase, GeoNames, WordNet, CIA World Factbook, Lingvoj, MusicBrainz, RDF from Zitgist and New York Times, into a compound dataset totalling a couple of billion explicit statements and implicit facts. Queries can combine predicates from different datasets and return results drawn from different datasets. However, querying FactForge this way requires in-depth knowledge of the schemata and predicate names of all the datasets, which is practically impossible.
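To illustrate why cross-dataset querying demands knowledge of every dataset's predicates, here is a minimal sketch in plain Python of how a SPARQL-style basic graph pattern joins triple patterns whose predicates come from different sources. It is not FactForge's query engine; the triples, the "ex:" identifiers and the dataset-prefixed predicate names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy compound dataset. All identifiers are hypothetical; the prefixes only
# suggest that each predicate belongs to a different source vocabulary.
TRIPLES: List[Triple] = [
    ("ex:Sofia", "dbpedia:populationTotal", "1200000"),
    ("ex:Sofia", "geonames:parentCountry", "ex:Bulgaria"),
    ("ex:Bulgaria", "factbook:currency", "ex:Lev"),
    ("ex:Plovdiv", "dbpedia:populationTotal", "340000"),
]


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style with a leading '?'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns by nested-loop joining."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# The user must know three predicate names from three different datasets.
Q: List[Triple] = [
    ("?city", "dbpedia:populationTotal", "?pop"),
    ("?city", "geonames:parentCountry", "?country"),
    ("?country", "factbook:currency", "?cur"),
]

for row in query(Q, TRIPLES):
    print(row)
```

Only ex:Sofia satisfies all three patterns, because ex:Plovdiv has no parent-country triple in this toy data. Writing such a query requires knowing each predicate name in advance, which is exactly the burden the reference structures described below are meant to reduce.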

In the picture below (derived from the original LOD diagram), the datasets included in FactForge are shown in red and those included in LinkedLifeData in yellow.

 

Goal

The goal is to provide a more convenient way of obtaining useful information from FactForge, without having to know the individual predicates of each dataset, and to ensure the consistency of the results.

Solution and Benefits

Ontotext collaborated with Structured Dynamics to develop an innovative approach, based on ontology matching, for building reference structures for managing linked data. Two upper-level ontologies with different levels of generality, Ontotext’s PROTON and Structured Dynamics’ UMBEL, were mapped to each other and, unidirectionally, to the schemata of DBpedia, Freebase and GeoNames. As a result, DBpedia, Freebase and GeoNames instances can be accessed through the most generic concept in each chain of mapped concepts. The techniques employed make it possible to cope with hidden relationships and missing instances. Additionally, DBpedia instances were directly assigned UMBEL classes.
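How a chain of one-directional mappings lets a single generic concept reach instances typed in several datasets can be sketched as follows. This is a minimal illustration, assuming each mapping behaves like rdfs:subClassOf (a narrower class mapped to a broader one) and is followed in one direction only. Every class and instance name is hypothetical; this is not the actual PROTON/UMBEL mapping, and in a real deployment a reasoner, not hand-written code, would compute this closure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping chain: (narrower class, broader class), used in one
# direction only, in the manner of rdfs:subClassOf. Names are illustrative.
MAPPINGS: List[Tuple[str, str]] = [
    ("dbpedia:City", "umbel:City"),
    ("freebase:CityTown", "umbel:City"),
    ("geonames:PopulatedPlaceFeature", "umbel:City"),
    ("umbel:City", "proton:City"),
    ("proton:City", "proton:PopulatedPlace"),
]

# Hypothetical instances with their asserted types. The DBpedia instance
# also carries a direct UMBEL type, mirroring the direct assignment above.
INSTANCES: Dict[str, List[str]] = {
    "dbpedia:Sofia": ["dbpedia:City", "umbel:City"],
    "freebase:sofia_example": ["freebase:CityTown"],
    "geonames:sofia_example": ["geonames:PopulatedPlaceFeature"],
}


def build_index(mappings: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each class to the set of classes it is directly mapped to."""
    direct: Dict[str, Set[str]] = {}
    for narrow, broad in mappings:
        direct.setdefault(narrow, set()).add(broad)
    return direct


def ancestors(cls: str, direct: Dict[str, Set[str]]) -> Set[str]:
    """All broader classes reachable from `cls` (transitive closure)."""
    seen: Set[str] = set()
    stack = list(direct.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # the seen-set also guards against cycles
            seen.add(parent)
            stack.extend(direct.get(parent, ()))
    return seen


def instances_of(target: str) -> List[str]:
    """Instances whose asserted type is `target` or maps up to it."""
    direct = build_index(MAPPINGS)
    return sorted(
        inst
        for inst, types in INSTANCES.items()
        if any(t == target or target in ancestors(t, direct) for t in types)
    )


print(instances_of("proton:PopulatedPlace"))  # all three source datasets
print(instances_of("dbpedia:City"))           # only the DBpedia instance
```

Asking for the most generic concept returns instances from all three sources, while asking for a dataset-specific class does not pull in the others, because the mappings are only followed upwards.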

 

Loading these mappings together with all FactForge datasets produced a new instance of FactForge. The benefits are sixfold:

  • An infrastructure for exploring the role of the reference structures in providing pathways for easier and cheaper access to vast amounts of heterogeneous data has been put in place
  • The information available for retrieval has increased by 40%
  • Vast numbers of instances from different datasets can now be accessed through a manageable number of concepts and relationships
  • The results are consistent because the ontologies they rely on are consistent
  • Algorithms can make use of these heterogeneous data for reasoning, semantic annotation, natural-language querying and other knowledge-intensive applications
  • The mappings have been used as a gold standard for developing automated mapping approaches that help scale the solution to the entire Web of Data
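Using the manual mappings as a gold standard typically means scoring automatically produced mappings against them. The source does not say which measures were used; the sketch below assumes the common precision, recall and F1 comparison over sets of (source class, target class) pairs, and all pairs shown are hypothetical.

```python
from typing import Set, Tuple

Mapping = Tuple[str, str]  # (source class, target class)


def evaluate(predicted: Set[Mapping], gold: Set[Mapping]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted mappings against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard mappings (stand-ins for the manual ones).
GOLD: Set[Mapping] = {
    ("dbpedia:City", "umbel:City"),
    ("umbel:City", "proton:City"),
    ("geonames:PopulatedPlaceFeature", "umbel:City"),
    ("freebase:CityTown", "umbel:City"),
}

# Hypothetical output of an automated matcher: two correct, one wrong.
PREDICTED: Set[Mapping] = {
    ("dbpedia:City", "umbel:City"),
    ("umbel:City", "proton:City"),
    ("dbpedia:City", "proton:Organization"),
}

p, r, f = evaluate(PREDICTED, GOLD)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# prints: precision=0.67 recall=0.50 f1=0.57
```

Precision penalises wrong mappings that would pollute query results, while recall penalises missed mappings that would leave instances unreachable through the generic concepts.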