The heart of the knowledge graph is a knowledge model – a collection of interlinked descriptions of concepts, entities, relationships and events where:
Knowledge graphs combine characteristics of several data management paradigms:
Knowledge graphs, represented in RDF, provide the best framework for data integration, unification, linking and reuse, because they combine:
Ontologies represent the backbone of the formal semantics of a knowledge graph. They can be seen as the data schema of the graph. They serve as a formal contract between the developers of the knowledge graph and its users regarding the meaning of the data in it. A user could be another human being or a software application that wants to interpret the data in a reliable and precise way. Ontologies ensure a shared understanding of the data and its meanings.
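To make the "ontology as data schema" idea concrete, here is a minimal sketch in plain Python, representing both the ontology and the data as (subject, predicate, object) triples. All identifiers (ex:Book, ex:hasAuthor, etc.) are hypothetical examples, not part of any real vocabulary beyond the standard rdf:/rdfs: terms.

```python
# A minimal sketch of an ontology acting as a data schema, using plain
# Python tuples as (subject, predicate, object) triples.

ontology = {
    ("ex:Book", "rdfs:subClassOf", "ex:Work"),
    ("ex:hasAuthor", "rdfs:domain", "ex:Book"),    # subjects of hasAuthor are Books
    ("ex:hasAuthor", "rdfs:range", "ex:Person"),   # objects of hasAuthor are Persons
}

data = {
    ("ex:Hamlet", "rdf:type", "ex:Book"),
    ("ex:Hamlet", "ex:hasAuthor", "ex:Shakespeare"),
    ("ex:Shakespeare", "rdf:type", "ex:Person"),
}

def expected_types(graph, ontology):
    """Derive the type expectations the ontology imposes on the data."""
    inferred = set()
    for s, p, o in graph:
        for prop, role, cls in ontology:
            if p == prop and role == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))   # subject must be a cls
            if p == prop and role == "rdfs:range":
                inferred.add((o, "rdf:type", cls))   # object must be a cls
    return inferred

print(expected_types(data, ontology))
```

This is the "formal contract" in miniature: any consumer of the data, human or program, can derive what the predicates imply about the entities they connect.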
When formal semantics are used to express and interpret the data of a knowledge graph, there are a number of representation and modeling instruments:
Not every RDF graph is a knowledge graph. For instance, a set of statistical data, e.g. the GDP figures for countries, represented in RDF is not a KG. A graph representation of data is often useful, but it may be unnecessary to capture the semantic knowledge behind the data. It may be sufficient for an application to associate the string ‘Italy’ with the string ‘GDP’ and the number ‘1.95 trillion’ without needing to define what countries are or what the ‘Gross Domestic Product’ of a country is. It is the connections and the graph that make the KG, not the language used to represent the data.
Not every knowledge base is a knowledge graph. A key feature of a KG is that entity descriptions are interlinked: the description of one entity refers to other entities, and this linking is how the graph forms (e.g. A is B; B is C; C has D; A has D). Knowledge bases that lack formal structure and semantics, e.g. a Q&A “knowledge base” about a software product, are also not KGs. Likewise, an expert system may hold a collection of data organized in a format that is not a graph, yet use automated deductive processes, such as a set of ‘if-then’ rules, to facilitate analysis.
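The interlinking described above can be sketched in a few lines of Python, using the toy statements from the example (A, B, C and D are placeholder entities):

```python
# A sketch of how interlinked entity descriptions form a graph. The toy
# statements mirror the "A is B. B is C. C has D. A has D." example.

from collections import defaultdict

triples = [
    ("A", "is", "B"),
    ("B", "is", "C"),
    ("C", "has", "D"),
    ("A", "has", "D"),
]

# Adjacency view: each entity's description points at other entities.
edges = defaultdict(list)
for s, p, o in triples:
    edges[s].append((p, o))

def reachable(start):
    """Entities reachable from `start` by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

print(reachable("A"))  # from A we can reach B, C and D
```

Because each description mentions other entities, a traversal from any node fans out across the whole structure; this connectedness is exactly what a flat Q&A knowledge base lacks.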
Google Knowledge Graph. Google made this term popular with the announcement of its knowledge graph in 2012. However, very few technical details about its organization, coverage and size have been published, and there are very limited means of using this knowledge graph outside Google’s own projects.
DBpedia. This project leverages the structure inherent in the infoboxes of Wikipedia to create an enormous dataset of 4.58 million things (https://wiki.dbpedia.org/about) and an ontology with encyclopedic coverage of entities such as people, places, films, books, organizations, species, diseases, etc. This dataset is at the heart of the Linked Open Data movement. It has been invaluable for organizations bootstrapping their internal knowledge graphs with millions of crowdsourced entities.
GeoNames. Under a Creative Commons license, users of the GeoNames dataset have access to 25 million geographical entities and features.
WordNet. One of the best-known lexical databases for the English language, providing definitions and synonyms. It is often used to enhance the performance of NLP and search applications.
FactForge. Drawing on years of expertise in the news publishing industry, Ontotext produced its knowledge graph of Linked Open Data and news articles about people, organizations and locations. It incorporates data from the KGs described above as well as specialized ontologies such as the Financial Industry Business Ontology.
Years ago, the conversation moved away from the buzzword Big Data toward Smart Data. Unprecedented amounts of data created the need for a data model that mirrors our own complex understanding of information.
To make data smart, machines could no longer be bound by inflexible data schemas defined ‘a priori’. We needed data repositories that could represent the ‘real world’ and the tangled relationships it entails. All of this had to be done in a machine-readable way, with formal semantics enabling automated reasoning that complements and facilitates our own.
RDF databases (also called RDF triplestores), such as Ontotext’s GraphDB, can smoothly integrate heterogeneous data from multiple sources and store hundreds of billions of facts about any conceivable concept. The RDF graph structure is very robust (it can handle massive amounts of data of all kinds and from various sources) and flexible (it does not need its schema re-defined every time we add new data).
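The integration and querying story can be illustrated with a toy triple-pattern matcher. GraphDB itself is queried with SPARQL; this stdlib-only Python sketch, with hypothetical ex: identifiers, only stands in for the idea that heterogeneous sources merge into one graph with no schema migration:

```python
# A minimal triple-pattern matcher, sketching how an RDF store lets data
# from different sources merge into one graph and be queried uniformly.

# Two "sources" describing the same entity simply union into one set.
source_crm  = {("ex:ACME", "ex:hasCEO", "ex:JaneDoe")}
source_news = {("ex:ACME", "ex:locatedIn", "ex:London"),
               ("ex:JaneDoe", "rdf:type", "ex:Person")}
graph = source_crm | source_news   # no schema re-definition needed to merge

def match(graph, pattern):
    """Match an (s, p, o) pattern; None acts as a wildcard variable."""
    s, p, o = pattern
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(graph, ("ex:ACME", None, None)))  # everything known about ex:ACME
```

The wildcard pattern plays the role of a SPARQL basic graph pattern: the same query works regardless of which source contributed each triple.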
As we have already seen, there are many freely available interlinked facts from sources such as DBpedia, GeoNames, Wikidata and so on, and their number continues to grow every day. However, the real power of knowledge graphs comes when we transform our own data into RDF triples and then connect our proprietary knowledge to open global knowledge.
Another important feature of RDF databases is inference, whereby new knowledge is derived from already existing facts. When such new facts are materialized and stored in the database, search results become much more relevant, opening new avenues for actionable insights.
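Materialization can be sketched as forward chaining: apply inference rules repeatedly until no new triples appear, then store the inferred triples alongside the asserted ones. The sketch below implements two standard RDFS entailment rules (subclass transitivity and type inheritance) over hypothetical ex: data:

```python
# A sketch of forward-chaining inference: rules are applied until a
# fixed point is reached, and the new facts are "materialized".

asserted = {
    ("ex:Labrador", "rdfs:subClassOf", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Rex", "rdf:type", "ex:Labrador"),
}

def materialize(triples):
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # rdfs11: subClassOf is transitive
                if p == p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # rdfs9: instances belong to all superclasses
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
        if not new <= graph:
            graph |= new
            changed = True
    return graph

graph = materialize(asserted)
print(("ex:Rex", "rdf:type", "ex:Animal") in graph)  # True
```

A query for all ex:Animal instances now finds ex:Rex even though that fact was never asserted, which is precisely why materialized inference makes search results more complete.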
But if we want to add even more power to our data, we can use text mining techniques to extract the important facts from free-flowing texts and then add them to the facts in our database.
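As a toy illustration of that last step, the sketch below spots known entity labels in free text via a gazetteer lookup (far simpler than a real text-mining pipeline) and turns each mention into a triple added to the graph. The labels, IRIs and the ex:mentions predicate are all hypothetical:

```python
# A toy sketch of text mining feeding a knowledge graph: find known
# entity labels in free text and record each mention as a triple.

gazetteer = {               # hypothetical label -> entity IRI mapping
    "Italy": "ex:Italy",
    "Rome": "ex:Rome",
}

def extract_mentions(text, doc_iri):
    triples = set()
    for label, iri in gazetteer.items():
        if label in text:   # naive substring match, for illustration only
            triples.add((doc_iri, "ex:mentions", iri))
    return triples

graph = set()
graph |= extract_mentions("Rome is the capital of Italy.", "ex:doc1")
print(sorted(graph))
```

Real systems replace the substring match with NLP pipelines that handle ambiguity and context, but the output is the same in kind: new triples linking documents to entities already described in the graph.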
It is no surprise that modern text analysis technology makes considerable use of knowledge graphs:
Ontotext Platform implements all flavors of this interplay, linking text to big knowledge graphs to enable solutions for content tagging, classification and recommendation. It is a platform for organizing enterprise knowledge into knowledge graphs, consisting of a set of databases, machine learning algorithms, APIs and tools for building solutions for specific enterprise needs.
One interesting example of semantic tagging of news against a big knowledge graph built around DBpedia is Ontotext’s NOW public news service.
A number of specific uses and applications rely on knowledge graphs. Examples include data and information-heavy services such as intelligent content and package reuse, responsive and contextually aware content recommendation, knowledge graph powered drug discovery, semantic search, investment market intelligence, information discovery in regulatory documents, advanced drug safety analytics, etc.