In the world of data, the machine is king, but humans live in a world of knowledge. It is knowledge, ideally supported by data, that enables decisions to be made and actions to be taken. But although supercomputers like Summit have astounding computational power, they are still as dumb as toasters when it comes to acquiring information or skills from experience.
To make them smarter, we need to provide them not only with the huge amounts of data humanity produces (according to a recent Forbes article, 2.5 quintillion bytes per day) but also with the knowledge of how to interpret that data – the experience.
People and organizations in the know understand the business value and strategic advantage of moving up the knowledge pyramid, and knowledge bases are designed to do just that.
A knowledge base is a collection of interlinked descriptions of entities (real-world objects, events, situations or abstract concepts), organized in a way that enables the storage, analysis and reuse of this knowledge in a machine-interpretable form. The Facebooks, Amazons, Netflixes and Googles (FANGs) of the world figured this out long ago and they all use knowledge bases.
Most people are familiar with traditional relational databases: cells and tables filled with letters and numbers. Years of refinements and optimizations have ensured that organizations can manage phenomenal amounts of data. But the American author Clifford Stoll said it best:
‘Data is not information, information is not knowledge’.
The world of databases is still very much a place governed by computers. It is left to legions of human DBAs, application developers and UI designers to decode the logic of those strings, numbers, columns and rows in a way that users can make sense of. That's increasingly impossible in a 2.5-quintillion-bytes-per-day world.
Knowledge bases, on the other hand, abstract away from a simple database to create an organized collection of data that is closer to how the human brain organizes information. Knowledge bases add a semantic model to the data, which includes a formal classification with classes, subclasses, relationships and instances (ontologies and dictionaries), on one hand, and rules for interpreting the data, on the other.
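Such a semantic model can be pictured as subject-predicate-object triples in which the schema (classes, subclasses, relationships) sits right next to the instance data. The sketch below is a minimal, hypothetical illustration – the `ex:` names are made up for the example and do not come from any real ontology:

```python
# A minimal sketch of a semantic model as subject-predicate-object triples.
# All ex:* identifiers are hypothetical, for illustration only.
triples = {
    ("ex:Supplier", "rdfs:subClassOf", "ex:Company"),   # class hierarchy (schema)
    ("ex:Acme",     "rdf:type",        "ex:Supplier"),  # an instance
    ("ex:Acme",     "ex:suppliesTo",   "ex:Globex"),    # a relationship
    ("ex:Globex",   "rdf:type",        "ex:Company"),
}

# The classification lives alongside the instance data, so the model can be
# adapted without touching the underlying records.
classes = {o for (_, p, o) in triples if p in ("rdf:type", "rdfs:subClassOf")}
print(sorted(classes))  # ['ex:Company', 'ex:Supplier']
```

The key point is that the class hierarchy is expressed in the same triple form as the data itself, which is what makes the model easy to extend.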
That's a very important difference from a traditional database, which stores everything in the same way. By using a very flexible semantic model, a knowledge base instead shapes data and information the way an organization understands it. If that understanding changes, the knowledge base can be adapted and augmented without having to change the data itself. This allows evolving expert knowledge to be kept along with the data and preserves the organizational memory in a formal, machine-interpretable way.
Together with the easily extendable classification of the data, there is also a formal inference layer built on top of the knowledge base. This is a series of rules and statistical models that instruct machines how to use the data and how to derive knowledge from it. As a result, it removes the need to pass data-specific knowledge among DBAs and application developers, and it pushes the organization further up the knowledge pyramid.
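A toy example of such an inference rule is the standard subclass entailment: if something is a Supplier and Supplier is a subclass of Company, the machine can derive that it is also a Company. The sketch below implements that single rule with naive forward chaining over hypothetical `ex:` triples (real reasoners are far more sophisticated):

```python
# A minimal forward-chaining sketch of one inference rule:
# if ?x rdf:type ?c and ?c rdfs:subClassOf ?d, then ?x rdf:type ?d.
def infer_types(triples):
    """Repeatedly apply the subclass rule until no new triples appear."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for (s, p, o) in facts if p == "rdfs:subClassOf"}
        for (x, p, c) in list(facts):
            if p == "rdf:type":
                for (c2, d) in subclass:
                    if c2 == c and (x, "rdf:type", d) not in facts:
                        facts.add((x, "rdf:type", d))
                        changed = True
    return facts

data = {
    ("ex:Acme", "rdf:type", "ex:Supplier"),
    ("ex:Supplier", "rdfs:subClassOf", "ex:Company"),
}
inferred = infer_types(data)
print(("ex:Acme", "rdf:type", "ex:Company") in inferred)  # True
```

Because the rule is stored with the data, no application developer has to re-encode "a supplier is a company" in application code.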
A major advantage of knowledge bases is that the logic of the data interpretation stays with the data and not only with the application.
Technically, this means that all applications can still use their specific application profiles (a set of integrity constraints for checking the physical and logical correctness of a dataset). In fact, the knowledge base is often a fully distributed, decentralized system in which multiple schemas and application profiles applied over the same data codify different application perspectives on that data.
Essentially, all applications still own the data they use, but as they comply with or extend the organization's understanding of the data (the formal semantic model), they make this data reusable.
Adopting a knowledge base helps organizations share both their data and the rules for interpreting it.
Combining data and data interpretation rules from various applications in this way enables a whole range of intelligent operations such as business intelligence, predictive maintenance, decision support and many more.
By going beyond mere keywords, concepts and tags, a knowledge base improves content indexing and advanced search. It provides the knowledge behind individual concepts and allows search engines and other content retrieval applications to interpret text and match it to advanced queries. The best-known knowledge base is Google's Knowledge Graph, which helps produce and optimize search engine results.
Huge global brands such as the BBC use Ontotext's technology to power enormous, complex and dynamic websites that require near-perfect knowledge base uptime (e.g., a 99.995% minimum).
There are different ways of making a knowledge base work for an organization. The standard (and most powerful) choice for enterprises of all sizes to house their knowledge base is a graph database. Graph databases implement W3C standards for describing data and its semantics, such as RDF (the model for representing graph data), SPARQL (the query language for RDF data), SKOS (the Simple Knowledge Organization System) and others.
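The essence of SPARQL is matching graph patterns in which some positions are variables. The toy sketch below mimics that idea in plain Python for a single triple pattern; the `ex:` identifiers are invented for the example, and a real graph database would of course evaluate full SPARQL over RDF:

```python
# A toy illustration of SPARQL-style triple-pattern matching.
# Strings starting with '?' act as variables; all ex:* names are made up.
def match(triples, pattern):
    """Yield a dict of variable bindings for each triple matching the pattern."""
    for triple in triples:
        binding = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val      # bind the variable
            elif pat != val:
                break                   # constant position mismatch
        else:
            yield binding

triples = [
    ("ex:Acme",   "ex:locatedIn", "ex:London"),
    ("ex:Globex", "ex:locatedIn", "ex:Paris"),
]
# Roughly: SELECT ?company WHERE { ?company ex:locatedIn ex:London }
results = list(match(triples, ("?company", "ex:locatedIn", "ex:London")))
print(results)  # [{'?company': 'ex:Acme'}]
```

Joining several such patterns on shared variables is what gives SPARQL its expressive power over interconnected data.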
The semantic graph database that Ontotext has developed, GraphDB, is fully compliant with the W3C standards, as well as many industry-led specifications, for the storage, querying and distribution of knowledge bases. It can integrate knowledge with other knowledge bases across the world in a seamless and secure way.
The next step when adopting a knowledge base is to choose the formal model for describing the data. Here again, it is best to use a de facto industry standard such as Schema.org, the shared vocabulary built by Google, Microsoft, Yahoo and Yandex. The initial intention of Schema.org's founders was to make it easier for webmasters and developers to structure metadata on their websites and to create enhanced descriptions that help search engines interpret the published content. However, the value of this collaborative initiative goes far beyond its original scope, as Schema.org provides a standardized vocabulary for most of the business objects an enterprise requires.
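Schema.org descriptions are commonly published as JSON-LD embedded in web pages. The sketch below builds one with nothing but the Python standard library; `Organization`, `name`, `url` and `sameAs` are real Schema.org terms, while the organization itself and its URLs are fictional:

```python
import json

# A minimal Schema.org description in JSON-LD. The vocabulary terms are real;
# "Acme Corp" and the example.org URLs are invented for illustration.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Corp",
    "url": "https://example.org",
    "sameAs": ["https://example.org/about-acme"],
}
print(json.dumps(org, indent=2))
```

Embedded in a page inside a `<script type="application/ld+json">` tag, such a snippet lets search engines read the organization's description directly.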
Once an organization has decided on a database for housing its knowledge base and on the data model describing it, it's important to know how to enrich the original dataset with more data. There are many public datasets that describe human knowledge in different areas (e.g., locations, drugs, publications) and follow the established standards for data publishing. The largest knowledge base in terms of retrievable data is the Linked Open Data Cloud (https://lod-cloud.net/).
Ontotext has made the most central datasets of the LOD Cloud available in a GraphDB instance called FactForge. FactForge is a hub for open data and news about people, organizations and locations, and offers a free public service for accessing this data represented as an RDF graph.
All in all, having the data model as part of the data enables the integration of knowledge bases and charges an organization's data and information with the wisdom of the world. And wisdom, as we well know, is the peak of the knowledge pyramid, offering an enterprise a 360-degree view.