Linked Data is a set of design principles for sharing machine-readable interlinked data on the Web. Open Data, on the other hand, is data that anyone can freely use and distribute, subject, at most, to the requirements to attribute and to share alike. Datasets that are both open and linked are Linked Open Data.
Just as there are principles and standards that define what linked data or open data is, we can also measure how linked and how open a dataset is.
In 2010, Sir Tim Berners-Lee, the inventor of the World Wide Web and the creator and advocate of the Semantic Web and Linked Data, suggested a 5-star deployment scheme for Linked Open Data. The rating starts at one star, and data earns additional stars as proprietary formats are removed and links are added.
Let’s take a look at what it takes to be ‘awarded’ each of the five stars and what benefits users of those datasets gain as the star count goes up.
One-star open data is data available on the Web, in whatever format, under an open license, which makes it Open Data. Consumers can look at, search, store, and change the data, and share it with anyone they like. For the data publisher, such data is simple to publish, and the organization does not need to keep explaining to others that they can use it.
To win a second star, open data needs to be available as machine-readable structured data, for example, an Excel spreadsheet instead of an image scan of a table. Users of 2-star open data can do everything they could with 1-star data, plus process it directly with proprietary software and export it into another structured format. However, this type of data is still locked up, because users depend on proprietary software to get the data out of the document.
The third star is therefore awarded to data that users can analyze without a proprietary software package. One example is the comma-separated values (CSV) format, which stores tabular data as plain text.
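To illustrate why plain-text CSV removes the dependence on proprietary software, here is a minimal sketch that parses a small CSV table using nothing but Python’s standard-library `csv` module. The column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical 3-star dataset: plain-text CSV that any tool can read.
CSV_TEXT = """region,visitors
North,120
South,80
"""

def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

rows = load_rows(CSV_TEXT)
# CSV carries no types: every value arrives as a string and is converted here.
total = sum(int(row["visitors"]) for row in rows)
print(len(rows), total)
```

The same file opens just as well in a text editor, a spreadsheet, or a database import tool, which is the point of the third star.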
The fourth star goes to data that uses open standards from the W3C, such as RDF and SPARQL, to identify things. RDF, which stands for Resource Description Framework, is the standard used in a semantic graph database. This graph database, also called an RDF triplestore, is a type of semantic technology for storing and managing interlinked data and making sense of those interconnections. Unlike a relational database, which organizes data in tables, a triplestore stores data as subject-predicate-object statements (triples) that map the relationships between entities as a graph. SPARQL is the W3C-standardized query language for RDF data.
The core concept of the triplestore, and the underlying Linked Data principle, is the Uniform Resource Identifier (URI), a unique identifier for every linked thing. Because each entity in the graph has its own URI, users can link to it from anywhere else or reuse parts of the data.
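The triple model can be illustrated with a toy in-memory store. The sketch below is not a real triplestore and does not implement SPARQL; it only mimics the idea behind a SPARQL basic graph pattern, where `None` plays the role of a variable that matches any value. The `ToyTripleStore` class and the `http://example.org/` URIs are hypothetical.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

class ToyTripleStore:
    """A minimal, hypothetical set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None matches anything."""
        for triple in self.triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple

EX = "http://example.org/"  # hypothetical namespace
store = ToyTripleStore()
store.add(EX + "alice", EX + "worksFor", EX + "acme")
store.add(EX + "bob", EX + "worksFor", EX + "acme")
store.add(EX + "acme", EX + "locatedIn", EX + "berlin")

# Roughly analogous to: SELECT ?who WHERE { ?who ex:worksFor ex:acme }
employees = sorted(s for s, _, _ in store.match(p=EX + "worksFor", o=EX + "acme"))
print(employees)
```

Every node is a URI, so the same `EX + "acme"` identifier connects the employment facts to the location fact without any join table.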
With the help of W3C standards and Linked Data principles, data publishers link their data to other people’s data to provide context. According to Berners-Lee, this linking is the requirement for the fifth star of Linked Open Data.
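Because each star builds on the previous ones, the scheme can be summarized as a small function. This is a hypothetical sketch: the `Dataset` fields are our own simplification of the five criteria, and counting stops at the first criterion that is not met.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    # Hypothetical flags summarizing the five criteria.
    open_license: bool          # 1 star: on the Web under an open license
    machine_readable: bool      # 2 stars: structured data, not an image scan
    non_proprietary: bool       # 3 stars: e.g. CSV instead of a proprietary format
    uses_uris_and_rdf: bool     # 4 stars: W3C open standards such as RDF and SPARQL
    linked_to_other_data: bool  # 5 stars: links to other people's data

def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first unmet criterion."""
    criteria = [ds.open_license, ds.machine_readable, ds.non_proprietary,
                ds.uses_uris_and_rdf, ds.linked_to_other_data]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars

print(star_rating(Dataset(True, True, True, False, False)))  # 3
```

A dataset with RDF links but no open license still scores zero under this reading, since openness is the first requirement.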
A semantic graph database can handle diverse datasets and map links to Linked Open Data sources such as DBpedia or GeoNames.
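Linking to an external source often means stating that a local resource and an external one identify the same thing. The sketch below uses the standard OWL `owl:sameAs` property to point a hypothetical local city resource (under `http://example.org/`) at a DBpedia resource URI, then merges the two graphs. Plain Python sets stand in for a triplestore, and the external triple is an illustrative placeholder, not data fetched from DBpedia.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"           # hypothetical local namespace
DBR = "http://dbpedia.org/resource/"

Graph = set[tuple[str, str, str]]

# Local data from one publisher (hypothetical), with one outgoing link.
local: Graph = {
    (EX + "office1", EX + "locatedIn", EX + "berlin"),
    (EX + "berlin", OWL_SAME_AS, DBR + "Berlin"),
}
# A triple as it might appear in the external dataset (illustrative only).
external: Graph = {
    (DBR + "Berlin", RDFS_LABEL, "Berlin"),
}

def facts_about(graph: Graph, subject: str) -> set[tuple[str, str]]:
    """Collect (predicate, object) pairs, following owl:sameAs one hop."""
    aliases = {subject} | {o for s, p, o in graph
                           if s == subject and p == OWL_SAME_AS}
    return {(p, o) for s, p, o in graph
            if s in aliases and p != OWL_SAME_AS}

merged = local | external  # shared URIs let the graphs combine directly
print(facts_about(merged, EX + "berlin"))
```

On its own, the local graph says nothing about the city beyond the link; after merging, the external fact becomes reachable through it.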
Users of five-star data can discover more and more interlinked information as they use the data. Because a semantic graph database can infer new links from existing facts, users can uncover additional relationships within their linked data.
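Inference can be sketched with two of the RDFS entailment rules: `rdfs:subClassOf` is transitive (rule rdfs11), and an instance of a class is also an instance of its superclasses (rule rdfs9). The forward-chaining loop below applies both until no new triples appear. It is a toy sketch, far simpler than a production reasoner, and the class names under `http://example.org/` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

Graph = set[tuple[str, str, str]]

def infer(triples: Graph) -> Graph:
    """Apply rdfs9 and rdfs11 repeatedly until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new: Graph = set()
        for s, p, o in closure:
            if p not in (SUBCLASS, RDF_TYPE):
                continue
            # Both rules share one shape: (s, p, B) and (B, subClassOf, C)
            # yield (s, p, C), with p being rdf:type (rdfs9)
            # or rdfs:subClassOf (rdfs11).
            for s2, p2, o2 in closure:
                if s2 == o and p2 == SUBCLASS:
                    new.add((s, p, o2))
        if new <= closure:
            return closure
        closure |= new

facts: Graph = {
    (EX + "berlin", RDF_TYPE, EX + "Capital"),
    (EX + "Capital", SUBCLASS, EX + "City"),
    (EX + "City", SUBCLASS, EX + "Place"),
}
inferred = infer(facts) - facts
print(sorted(inferred))
```

Three new triples appear: Berlin is also a City and a Place, and Capital is a subclass of Place, although none of these facts was stated explicitly.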
Simply put, five-star Linked Open Data is open data on the Web that is linked to other data, and the resulting network effect benefits both data consumers and data publishers.