What is Semantic Data Integration?

Semantic data integration blends data from disparate sources by employing a data-centric architecture built upon an RDF model. The ability to easily import and harmonize heterogeneous data from multiple sources and interlink it as RDF statements in an RDF triplestore is essential for many knowledge management solutions.

Semantic data integration is the process of combining data from disparate sources and consolidating it into meaningful and valuable information through the use of Semantic Technology.

Integrating Heterogeneous Datasets

As organizations scale up in size, so does their data. Without the right data management strategy, intradepartmental or application-specific data silos quickly arise and hinder productivity and cooperation.

Semantic Data Integration offers a solution that goes beyond the standard enterprise application integration solutions. It employs a data-centric architecture built upon a standardized model for data publishing and interchange, namely the Resource Description Framework (RDF).

In this framework, the heterogeneous data of an organization (structured, semi-structured and unstructured) is expressed, stored and accessed in the same way. As the data structure is expressed through the links within the data itself, it is not constrained to a structure imposed by the database and does not become obsolete with the evolution of the data. When changes in the data structure occur, they are reflected in the database through changes in the links within the data.
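As a minimal illustration of this self-describing structure, the sketch below uses plain Python tuples in place of a real RDF library; all identifiers are made up. Because each statement is a subject–predicate–object triple, evolving the data model means asserting new triples rather than migrating a table schema.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The "schema" lives in the links themselves, so evolving the data
# model is just a matter of asserting new statements.
graph = {
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
}

# Later, a new attribute appears in the source data. No schema
# migration is needed: we simply add another statement.
graph.add(("ex:Alice", "ex:email", "alice@example.com"))

# Querying by pattern: every predicate used for ex:Alice.
predicates = {p for (s, p, o) in graph if s == "ex:Alice"}
print(sorted(predicates))  # → ['ex:email', 'ex:worksFor', 'rdf:type']
```

In a relational database, the new `ex:email` attribute would require an `ALTER TABLE`; here it is just one more link in the data.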

In addition, being the backbone of Semantic Technology, RDF enables the inference of new facts from the existing data as well as the enrichment of the available knowledge by accessing Linked Open Data (LOD) resources.
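The inference idea can be shown with a toy forward-chaining loop over two RDFS entailment rules (a production triplestore such as GraphDB applies such rulesets at scale; the class names below are illustrative):

```python
# Toy forward-chaining inference over rdfs:subClassOf, illustrating
# how RDF(S) semantics let new facts be derived from asserted ones.
triples = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Rex", "rdf:type", "ex:Dog"),
}

def infer(graph):
    graph = set(graph)
    changed = True
    while changed:  # repeat until no new facts appear (a fixed point)
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 != "rdfs:subClassOf" or s2 != o:
                    continue
                if p == "rdf:type":
                    # rdfs9: X type C, C subClassOf D  =>  X type D
                    new.add((s, "rdf:type", o2))
                elif p == "rdfs:subClassOf":
                    # rdfs11: subClassOf is transitive
                    new.add((s, "rdfs:subClassOf", o2))
        changed = not new <= graph
        graph |= new
    return graph

closed = infer(triples)
# Derived, never asserted: Rex is an Animal.
assert ("ex:Rex", "rdf:type", "ex:Animal") in closed
```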

Creating a 360 Degree View with Semantic Data Integration

In a world where complete visibility, accurate analysis and solving data complexity challenges dominate the business landscape, integrating disparate data into a synchronized 360-degree perspective is paramount. Today, organizations are looking for solutions that allow them to manage all of their data and make it consumable for decision-making purposes.

Whether their database operates standalone or is integrated into a larger database ecosystem, enterprises need a complete set of data integration tools that can perform complex tasks and are easy to use.

The ability to easily import and transform heterogeneous data from multiple sources, integrate and interlink the data as RDF statements into an RDF triplestore and merge two or more graph databases are all essential functions that support world-class semantic solutions.
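Because an RDF graph is a set of statements, merging two datasets is, at its simplest, a set union: identical triples deduplicate automatically, and a cross-link such as `owl:sameAs` can then stitch records from different sources into one logical entity. A sketch in Python, with triples as tuples and all identifiers invented for illustration:

```python
# Two sources describing the same customer under different identifiers.
crm = {
    ("crm:cust42", "foaf:name", "Alice Smith"),
    ("crm:cust42", "ex:segment", "enterprise"),
}
billing = {
    ("bill:acct9", "foaf:name", "Alice Smith"),
    ("bill:acct9", "ex:balance", "1200"),
}

# Merging RDF graphs is a set union: no schema reconciliation step,
# and duplicate statements collapse automatically.
merged = crm | billing

# An owl:sameAs link interlinks the two records.
merged.add(("crm:cust42", "owl:sameAs", "bill:acct9"))

# Follow sameAs to assemble a 360-degree view of the customer.
aliases = {"crm:cust42"} | {o for s, p, o in merged
                            if s == "crm:cust42" and p == "owl:sameAs"}
profile = {p: o for s, p, o in merged if s in aliases and p != "owl:sameAs"}
print(profile)
```

The resulting profile combines the CRM segment and the billing balance under a single customer, which is the essence of the 360-degree view described above.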

Semantic Integration Tools

The main tasks when performing semantic data integration are:

  • creating an Application Profile (RDF Shape) that describes the desired form of the final dataset;
  • reusing existing ontologies and engineering new ontologies as needed;
  • leveraging fully the available Linked Open Datasets in the domain;
  • designing a simple, logical and sustainable URI strategy;
  • using the variety of available conversion and ETL tools to perform the integration;
  • designing and implementing a data update strategy.
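A tiny ETL sketch ties several of these tasks together: it converts tabular source data into RDF statements according to an application profile that maps columns to predicates, minting subject URIs along the way. Column names, prefixes and predicates are illustrative, not a real mapping language.

```python
import csv
import io

# Hypothetical application profile: which predicate each column maps to.
PROFILE = {"name": "foaf:name", "dept": "ex:department"}

# Stand-in for a CSV file exported from a source system.
raw = io.StringIO("id,name,dept\n7,Alice,R&D\n8,Bob,Sales\n")

def rows_to_triples(fh, profile):
    """Convert each CSV row into a set of (s, p, o) statements."""
    triples = set()
    for row in csv.DictReader(fh):
        # A simple, sustainable URI strategy: stable id-based subjects.
        subject = f"ex:employee/{row['id']}"
        for column, predicate in profile.items():
            triples.add((subject, predicate, row[column]))
    return triples

triples = rows_to_triples(raw, PROFILE)
assert ("ex:employee/7", "foaf:name", "Alice") in triples
```

Real-world pipelines add datatype handling, ontology lookups and update logic, but the core transformation is this mapping step.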

To move smoothly through a full semantic data integration lifecycle, organizations need a set of easy-to-use semantic integration tools. With Ontotext’s semantic integration tools, users can quickly design data processing jobs and integrate massive amounts of data.

Ontotext Refine

Ontotext Refine is the next evolution of GraphDB’s OntoRefine tool. This free add-on is now distributed separately from GraphDB, which makes it easier to deploy and scale. Ontotext Refine is a data transformation tool for “RDF-izing” semi-structured data and can be used with or without a backing semantic repository. Data stored in Ontotext Refine can also be accessed through a REST API, which can be used to automate various tasks. In addition, Ontotext Refine ships with a command-line interface that wraps a dedicated Java client for easy integration with your own automated data pipeline.

GraphDB Workbench

GraphDB Workbench is a web interface and API that facilitates RDF database management, administration and application development tasks. The Workbench allows for easy configuration and operation of RDF databases. It supports the RDF4J (formerly Sesame) API, the W3C Linked Data Platform, the W3C SPARQL 1.1 Graph Store HTTP Protocol, the ability to create, reconfigure and delete repositories, security management, user setup, write permissions, creating and modifying linked data sources and more.
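Because the Workbench builds on standard W3C protocols, any HTTP client can query a repository. The sketch below builds a SPARQL 1.1 Protocol request in Python; the hostname and repository id are placeholders, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# SPARQL 1.1 Protocol: a query is an HTTP GET against the repository
# endpoint, with the query text percent-encoded in the `query` parameter.
# Endpoint and repository id below are placeholders for illustration.
ENDPOINT = "http://localhost:7200/repositories/my_repo"

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = ENDPOINT + "?" + urlencode({"query": query})

# Content negotiation picks the result format (JSON results here).
req = Request(url, headers={"Accept": "application/sparql-results+json"})
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` against a running repository would return the bindings as SPARQL JSON results.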

GraphDB Connectors

GraphDB Connectors allow users to connect GraphDB to external information retrieval engines such as Lucene, Solr and Elasticsearch, event streaming platforms such as Apache Kafka, and document databases such as MongoDB. Users can obtain updates from big data stores and write to external systems for backup or data replication. Today, GraphDB supports connectors to Solr, Lucene and Elasticsearch, as well as Kafka, and GraphDB can serve both as a Kafka producer and as a Kafka consumer. In addition, data stored in MongoDB can be queried through a dedicated plugin. We are always looking at ways to improve our coverage.

