The Semantic Web is a vision of an extension of the existing World Wide Web, one that provides software programs with machine-interpretable metadata about published information and data. In other words, we add further data descriptors to existing content and data on the Web. As a result, computers can make meaningful interpretations, similar to the way humans process information to achieve their goals.
The ultimate ambition of the Semantic Web, as its founder Tim Berners-Lee sees it, is to enable computers to better manipulate information on our behalf. He further explains that, in the context of the Semantic Web, the word “semantic” indicates machine-processable, or what a machine is able to do with the data, whereas “web” conveys the idea of a navigable space of interconnected objects with mappings from URIs to resources.
What’s behind the original vision of the Semantic Web comes under the umbrella of three things: Automation of information retrieval, the Internet of Things and Personal Assistants. You can read more about all three in the seminal article by Tim Berners-Lee, James Hendler and Ora Lassila, published in Scientific American: The Semantic Web.
With time, however, this vision came to be embodied in two important types of data, which, taken together, implement it today: Linked Open Data and Semantic Metadata.
For the Semantic Web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.
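As a minimal sketch of what such automated reasoning looks like, the following Python example represents facts as subject–predicate–object triples and applies a single transitivity rule until no new facts emerge. All names and facts here are illustrative placeholders, not drawn from any real dataset:

```python
# Minimal forward-chaining reasoner over subject-predicate-object triples.
# All names below are illustrative placeholders.

facts = {
    ("Varna", "locatedIn", "Bulgaria"),
    ("Bulgaria", "locatedIn", "Europe"),
}

def infer_transitive(triples, predicate):
    """Repeatedly apply the rule: (a p b) and (b p c) => (a p c)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = {
            (a, predicate, c)
            for (a, p1, b) in inferred if p1 == predicate
            for (b2, p2, c) in inferred if p2 == predicate and b2 == b
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closure = infer_transitive(facts, "locatedIn")
# The machine can now conclude a fact no one stated explicitly:
print(("Varna", "locatedIn", "Europe") in closure)  # True
```

This is, of course, a toy: real reasoners implement standardized rule sets (such as RDFS and OWL entailment) over RDF graphs, but the principle of deriving new statements from stated ones is the same.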
Linked Open Data (LOD) is structured data modeled as a graph and published in a way that allows interlinking across servers. Tim Berners-Lee formalized this in 2006 as the four rules of linked data:

- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
- Include links to other URIs, so that they can discover more things.
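These rules can be illustrated with a few lines of RDF in Turtle syntax. The DBpedia prefix is real, but the GeoNames identifier below is a made-up placeholder, not an actual record:

```turtle
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Things are named with HTTP URIs that clients can dereference for more data.
# The owl:sameAs link lets a client hop to another server's description of
# the same city (the GeoNames ID here is a placeholder, not a real one).
dbr:Varna owl:sameAs <http://sws.geonames.org/0000000/> .
```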
LOD enables both people and machines to access data across different servers and interpret its semantics more easily. As a result, the Semantic Web evolves from a space of linked documents into a space of linked information, which, in turn, empowers the creation of a richly interconnected network of machine-processable meaning.
Linked Open Data includes well-known public datasets such as DBpedia (structured data derived from Wikipedia), GeoNames and Wikidata.
Today, there are thousands of datasets published as LOD across different sectors: encyclopedic knowledge, geographic data, government data, scientific databases and articles, entertainment, travel, etc. In the Life Sciences alone, there are more than 100 scientific databases published as LOD.
Because they are interlinked, these datasets form a giant web of data, or a knowledge graph, which connects a vast number of descriptions of entities and concepts of general importance. For example, there are several descriptions of the city of Varna (e.g., one derived from Wikipedia, another from GeoNames, etc.).
Semantic metadata consists of semantic tags added to regular Web pages in order to better describe their meaning. For instance, the home page of the Bulgarian Institute for Oceanography can be semantically annotated with references to several relevant concepts and entities, e.g., Varna, Academic Institution and Oceanography.
Such metadata makes it much easier to find Web pages based on semantic criteria. It helps resolve ambiguity and ensures that when we search for Paris, the capital of France, we do not get pages about Paris Hilton.
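To sketch how explicit type information disambiguates such a search, the toy example below matches pages by both entity name and entity type. The pages, entities and types are illustrative placeholders, not taken from any real knowledge base:

```python
# Each annotation ties a page to an entity with an explicit type.
# All page URLs and entity names are illustrative placeholders.
annotations = [
    {"page": "https://example.com/travel-guide",
     "entity": "Paris", "type": "City"},
    {"page": "https://example.com/celebrity-news",
     "entity": "Paris Hilton", "type": "Person"},
]

def search(entity_name, entity_type):
    """Return pages whose annotations match both the name and the type."""
    return [
        a["page"] for a in annotations
        if entity_name in a["entity"] and a["type"] == entity_type
    ]

print(search("Paris", "City"))    # only the travel guide
print(search("Paris", "Person"))  # only the celebrity news page
```

A keyword-only search for "Paris" would return both pages; adding the type constraint is what removes the ambiguity.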
If we want a well-determined relationship between the subject of a Web page and the page or document itself, it is best to use one of the structured data metadata schemes. Currently, the most popular such scheme is Schema.org, established by Google, Yahoo, Microsoft and Yandex. According to a study by the University of Mannheim, in 2015, 30% of Web pages contained semantic metadata.
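As a sketch of what Schema.org markup looks like, the snippet below builds a JSON-LD description of a fictitious organization; such a JSON object is what publishers embed in a page. The @type and property names are real Schema.org terms, but the organization and URL are invented for illustration:

```python
import json

# A fictitious institute; "Organization", "name", "url" and "description"
# are real Schema.org terms, the values are made up.
markup = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Institute for Oceanography",
    "url": "https://example.org/oceanography",
    "description": "Research institution studying the Black Sea.",
}

# This JSON-LD string would be embedded in the page inside a
# <script type="application/ld+json"> element.
print(json.dumps(markup, indent=2))
```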
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners.
Fundamental for the adoption of the Semantic Web vision was the development of a set of standards established by the international standards body, the World Wide Web Consortium (W3C):

- RDF (Resource Description Framework) – the graph data model for describing resources;
- SPARQL – the query language for RDF data;
- OWL (Web Ontology Language) – the language for defining ontologies and enabling inference.
The availability of such standards fostered the development of an ecosystem of tools from various providers: database engines that manage RDF data (known as triplestores), such as GraphDB; ontology editors; tagging tools that use text analysis to automatically generate semantic metadata; semantic search engines; and more.
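To illustrate how such tools are used, here is a minimal SPARQL query of the kind any standards-compliant triplestore can answer. The prefix and data are hypothetical:

```sparql
PREFIX ex: <http://example.org/>

# Find everything located in Europe (hypothetical data).
SELECT ?place
WHERE {
  ?place ex:locatedIn ex:Europe .
}
```

Because SPARQL is a W3C standard, the same query runs unchanged against any conformant engine, which is precisely the interoperability the standards were designed to deliver.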
Although knowledge graphs came later, they quickly became a powerful driver for the adoption of Semantic Web standards and the semantic technologies that implement them. Knowledge graphs bring the Semantic Web paradigm to the enterprise: they introduce semantic metadata that drives data management and content management to new levels of efficiency, and they break silos so that these can synergize with various forms of knowledge management.
Enterprise knowledge graphs use ontologies to make explicit the various conceptual models (schemas, taxonomies, vocabularies, etc.) used across different systems in the enterprise. In enterprise data management parlance, knowledge graphs represent a premium sort of semantic reference data: a collection of interlinked descriptions of entities – objects, events or concepts.
In this way, knowledge graphs help organizations smarten up proprietary information by using global knowledge as context for interpretation and source for enrichment.
The Semantic Web is the web of connections between different forms of data that allow a machine to do something it wasn’t able to do directly.
Tim Berners-Lee, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web, p. 185
Thanks to their capacity to boost the generation, integration and understanding of data, Semantic Web concepts were rapidly adopted in data and information management. Today, many organizations use Linked Data as a mechanism for publishing master data internally. The Semantic Web standards are widely used in the development of knowledge graphs across different domains: government (for instance, Legislation.gov.uk), media (the BBC was a pioneer), science (both Elsevier and Springer Nature use GraphDB), financial services, etc.