Building a Reference Structure for Managing Linked Open Data
The Linking Open Data (LOD) project facilitates the emergence of a web of linked data by publishing and interlinking open data on the web in RDF. The 203 datasets in LOD cover a wide spectrum of subject domains: biomedicine, science, geography, general knowledge, entertainment, and government. As their number constantly increases, we face the problem of accessing them conveniently, manipulating them, and developing them further. Such a large set of interconnected data, equipped with ontologies describing parts of it, calls for approaches to its efficient use and better integration.
FactForge, the largest set of heterogeneous generic knowledge on which inference has been performed, is an implementation of a reason-able view of the web of data. It gathers some of the most popular LOD datasets, e.g. a “refined” version of DBpedia and the original versions of Freebase, GeoNames, WordNet, CIA World Factbook, Lingvoj, MusicBrainz, RDF from Zitgist, and New York Times, into a compound dataset with a couple of billion explicit statements and implicit facts overall. One can query with a combination of predicates from different datasets and obtain results drawn from different datasets. However, querying FactForge requires in-depth knowledge of the schema and the predicate names of every dataset, which is practically impossible for a single user.
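To make the querying problem concrete, here is a minimal sketch in plain Python of a query that combines predicates from two datasets. It is not FactForge's query interface. The triples, the prefixes `dbp-ont:` and `geo-ont:`, the predicate names, and the numbers are all made up for illustration.

```python
# Toy triples (subject, predicate, object).
# Every identifier and value here is hypothetical; none comes from a real dataset.
TRIPLES = [
    ("ex:CityA", "dbp-ont:country", "ex:CountryX"),
    ("ex:CityA", "geo-ont:population", 500_000),
    ("ex:CityB", "dbp-ont:country", "ex:CountryX"),
    ("ex:CityB", "geo-ont:population", 40_000),
    ("ex:CityC", "dbp-ont:country", "ex:CountryY"),
    ("ex:CityC", "geo-ont:population", 900_000),
]


def subjects_with(predicate, obj):
    """Return all subjects that have the given predicate/object pair."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def value_of(subject, predicate):
    """Return the first object for subject/predicate, or None if absent."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None


def large_cities_in(country, min_population):
    """Join a predicate styled after one dataset with one styled after another."""
    result = []
    for city in subjects_with("dbp-ont:country", country):
        population = value_of(city, "geo-ont:population")
        if population is not None and population >= min_population:
            result.append(city)
    return sorted(result)
```

Note that `large_cities_in` only works because its author already knows both predicate names. With many datasets and thousands of predicates, that knowledge is exactly what a user cannot be expected to have.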
In the picture below (derived from the original LOD diagram), the datasets included in FactForge are shown in red and those included in LinkedLifeData in yellow.
The goal is to provide a more convenient way of obtaining useful information from FactForge without having to use single predicates from single datasets and to ensure consistency of the results.
Ontotext collaborated with Structured Dynamics to develop an innovative approach for building reference structures for managing linked data based on ontology matching. Two upper-level foundational ontologies with different levels of generality, Ontotext’s PROTON and Structured Dynamics’ UMBEL, have been mapped to each other and to the schemata of DBpedia, Freebase and GeoNames. The mappings are unidirectional, so DBpedia, Freebase and GeoNames instances become accessible through the most generic concept in a chain of mapped concepts. The techniques employed cope with hidden relationships and missing instances. Additionally, DBpedia instances have been assigned UMBEL classes directly.
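The idea of reaching instances through the most generic concept in a chain of mapped concepts can be sketched as follows. This is a simplified illustration in plain Python, assuming the mappings behave like one-directional subclass links. It is not the actual PROTON or UMBEL mapping set, and every class and instance name below is hypothetical.

```python
# Hypothetical one-directional mappings: specific class -> more generic classes.
# These names only imitate the style of the real vocabularies.
SUPERCLASSES = {
    "dbpedia:City": ["umbel:City"],
    "freebase:citytown": ["umbel:City"],
    "geonames:PopulatedPlace": ["umbel:PopulatedPlace"],
    "umbel:City": ["umbel:PopulatedPlace"],
    "umbel:PopulatedPlace": ["proton:Location"],
}

# Hypothetical instances, each typed with a class from its own dataset.
INSTANCE_TYPES = {
    "dbpedia:ExampleCityA": "dbpedia:City",
    "freebase:example_city_b": "freebase:citytown",
    "geonames:example_place_c": "geonames:PopulatedPlace",
}


def ancestors(cls):
    """Collect every class reachable by following the mappings upward."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUPERCLASSES.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls):
    """Return instances typed with cls or with any class mapped up to cls."""
    return sorted(
        instance
        for instance, declared in INSTANCE_TYPES.items()
        if declared == cls or cls in ancestors(declared)
    )
```

In this sketch, asking for `proton:Location` returns instances from all three toy datasets, while asking for `dbpedia:City` returns only the DBpedia one. Because the links run in one direction only, a generic class never inherits from a specific one.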
These mappings, loaded together with all FactForge datasets, produced a new instance of FactForge. The benefits of this are sixfold: