Inference is the derivation of new knowledge from existing knowledge and axioms. In an RDF database, inference is used for deducing further knowledge based on existing RDF data and a formal set of inference rules.
It was Sherlock Holmes’s deductive powers that set him apart as the preeminent detective of Victorian London. He took the facts of the case and applied them to his encyclopaedic understanding of the world of crime. This enabled Holmes to discover new relationships between his current case and what he already knew about crimes and criminals, so he was gaining new facts all the time. That is inference.
Another example comes from Aristotle and features another famous figure, Socrates.
The famous syllogism traditionally attributed to Aristotle runs:
All men are mortal.
Socrates is a man.
Therefore, Socrates is mortal.
We can describe the first two statements, called premises by Aristotle, as a very simple database of statements, which could be easily represented as RDF triples. We have the ontology, sometimes also referred to as the vocabulary – All men are mortal. We also have some instance data – Socrates is a man.
Without inference, there is nothing more to be gained. Our database can only answer two questions. ‘Are all men mortal?’ And ‘Is Socrates a man?’ Pretty uninteresting, huh?
Funnily enough, that uninteresting database is how relational databases work. With a semantic database like Ontotext’s GraphDB, which has the power of inference, there is an extra relationship or fact that can be deduced: Therefore, Socrates is mortal. 50% more interesting! This very simple database can answer one extra question, ‘Is Socrates mortal?’ The answer rests on a relationship between Socrates the man and a general concept of mortality that did not exist beforehand. That is inference.
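To make the syllogism concrete, here is a minimal sketch in plain Python rather than in a real triplestore. The tuple layout and the predicate names "type" and "subClassOf" are illustrative assumptions loosely modelled on RDF and RDFS, not GraphDB's API.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
ontology = {("Man", "subClassOf", "Mortal")}     # "All men are mortal."
instance_data = {("Socrates", "type", "Man")}    # "Socrates is a man."


def infer_types(triples):
    """Apply one rule once: (x type C) and (C subClassOf D) => (x type D)."""
    inferred = set()
    for (x, p1, c) in triples:
        if p1 != "type":
            continue
        for (c2, p2, d) in triples:
            if p2 == "subClassOf" and c2 == c:
                inferred.add((x, "type", d))
    # Report only statements that were not already stated explicitly.
    return inferred - triples


explicit = ontology | instance_data
inferred = infer_types(explicit)
print(inferred)  # {('Socrates', 'type', 'Mortal')}
```

The explicit data holds two statements, and the single rule contributes the third: Socrates is mortal.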
The simple example demonstrated that inference is as clever as Sherlock at deducing new knowledge. Without inference, you incur the cost of finding, encoding, storing and maintaining an explicit statement of every possible fact you and your organization need. Even more interesting is the use of inference to identify inconsistencies. This is particularly useful if you are integrating new data internally or from the wider web. Just as inference can figure out that Socrates is mortal, it can figure out when there is a problem.
All men are mortal.
Socrates is a man.
Socrates is not mortal.
A database without inference would quite happily report back contradictory information when querying about Socrates’ mortality. Inference, however, would inevitably flag up the problem.
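The contradiction above can be sketched in a few lines of Python. This is only an illustration: the "notType" predicate is a made-up way to encode "Socrates is not mortal", and a real reasoner would express such constraints through its rule-set rather than through this ad hoc check.

```python
def type_closure(triples):
    """Repeat the rule (x type C), (C subClassOf D) => (x type D) until stable."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(facts):
            if p1 != "type":
                continue
            for (c2, p2, d) in list(facts):
                new_fact = (x, "type", d)
                if p2 == "subClassOf" and c2 == c and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


def conflicts(facts):
    """Pairs (x, C) asserted both as (x type C) and (x notType C)."""
    return sorted((x, c) for (x, p, c) in facts
                  if p == "notType" and (x, "type", c) in facts)


data = {
    ("Man", "subClassOf", "Mortal"),     # All men are mortal.
    ("Socrates", "type", "Man"),         # Socrates is a man.
    ("Socrates", "notType", "Mortal"),   # Socrates is not mortal. (hypothetical predicate)
}

without_inference = conflicts(data)               # looks at explicit facts only
with_inference = conflicts(type_closure(data))    # looks at explicit + inferred facts
```

Checked against the explicit statements alone, nothing clashes, because nobody ever stated that Socrates is mortal. The clash only becomes visible once that fact has been inferred.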
By using ontologies, a graph database makes your data model flexible. When the world, or how you want to represent it in data, changes, you change the ontology. You can add types of relationships and concepts. Inference is also flexible. You can change the rules by which you infer new knowledge.
Ultimately, inference rules are a set of relationships and concepts as well. In the first example, we could deduce that Socrates is mortal because there is an inference rule, described in the same subject-predicate-object language, that links Socrates to being mortal on the grounds that he is a man.
There are different inference methods, each with its advantages and disadvantages. You can start with the goal you want to achieve and work your way backward. This is called backward chaining: you start with a fact and then attempt to prove that this fact is true. A database does that by first checking to see if the fact is present and, if it is not, by using existing facts and rules to find a proof.
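A rough sketch of backward chaining in Python may help here. It assumes a single rule (class membership is inherited along "subClassOf"), and the Philosopher class is added purely to give the prover an extra step to work through; none of these names come from a real system.

```python
def prove(goal, facts, path=frozenset()):
    """Try to prove `goal` by working backward from it."""
    # Step 1: is the fact already present?
    if goal in facts:
        return True
    # Guard against circular class hierarchies: do not re-open a goal
    # that is already being proved further up the chain.
    if goal in path:
        return False
    x, predicate, d = goal
    if predicate != "type":
        return False  # the only rule in this sketch concludes "type" facts
    # Step 2: (x type D) holds if (x type C) holds for some C subClassOf D.
    return any(
        prove((x, "type", c), facts, path | {goal})
        for (c, p, d2) in facts
        if p == "subClassOf" and d2 == d
    )


facts = {
    ("Philosopher", "subClassOf", "Man"),   # illustrative extra class
    ("Man", "subClassOf", "Mortal"),
    ("Socrates", "type", "Philosopher"),
}

is_mortal = prove(("Socrates", "type", "Mortal"), facts)   # True
is_god = prove(("Socrates", "type", "God"), facts)         # False
```

Nothing is stored along the way: the work is done at query time, and only for the one fact being asked about.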
The process that GraphDB uses for inference is called forward chaining, and it starts with the known facts, not unlike Sherlock Holmes on a case. It then applies the inference rules to generate new facts, much as we deduced that Socrates is mortal. It then exhaustively reapplies the inference rules to the known facts and the newly inferred facts to produce yet more facts. This goes on and on until no new facts can be deduced. Though this initial exhaustive discovery of knowledge can be expensive in terms of time and run-time memory, the advantage is that when it comes time to query the database, it is extremely quick.
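The loop described above can be sketched as follows. This is a toy illustration of forward chaining to a fixpoint, not GraphDB's implementation; the two rules and the Philosopher class are assumptions chosen to force more than one round.

```python
def rule_type_inheritance(facts):
    """(x type C), (C subClassOf D) => (x type D)"""
    return {(x, "type", d)
            for (x, p, c) in facts if p == "type"
            for (c2, q, d) in facts if q == "subClassOf" and c2 == c}


def rule_subclass_transitivity(facts):
    """(A subClassOf B), (B subClassOf C) => (A subClassOf C)"""
    return {(a, "subClassOf", c)
            for (a, p, b) in facts if p == "subClassOf"
            for (b2, q, c) in facts if q == "subClassOf" and b2 == b}


RULES = [rule_type_inheritance, rule_subclass_transitivity]


def forward_chain(explicit):
    """Reapply every rule to known + inferred facts until nothing new appears."""
    facts = set(explicit)
    while True:
        new_facts = set()
        for rule in RULES:
            new_facts |= rule(facts)
        new_facts -= facts
        if not new_facts:       # fixpoint reached: no rule yields anything new
            return facts
        facts |= new_facts


explicit = {
    ("Socrates", "type", "Philosopher"),
    ("Philosopher", "subClassOf", "Man"),
    ("Man", "subClassOf", "Mortal"),
}
materialized = forward_chain(explicit)

# Query time: a plain lookup, with no reasoning left to do.
socrates_is_mortal = ("Socrates", "type", "Mortal") in materialized
```

Here three explicit statements grow into six after two productive rounds, and the question about Socrates becomes a simple membership test.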
When all new facts are deduced, GraphDB saves and indexes all of the inferred facts back to the database, so that it doesn’t require an exhaustive reapplication of inference rules to create all the possible facts. This approach is called total materialization and once you’ve done it, nothing else is required to perform reasoning, which makes query and retrieval fast and efficient. This also enables GraphDB, a very complex and sophisticated database, to benefit from the decades of query optimization techniques from regular relational databases.
Since GraphDB is optimized for handling very large datasets, the expressivity of its rules is limited to monotonic logic. Under monotonic entailment, newly asserted statements cannot invalidate previously existing statements. Thus, adding a single statement to a very big repository requires evaluating the inference rules only for that particular statement and whatever follows from it.
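As a hedged illustration of why monotonicity keeps updates cheap, the sketch below adds one statement to an already materialized set and derives only what follows from that statement. The function names and the single inheritance rule are assumptions for the example, not GraphDB internals.

```python
def consequences(triple, facts):
    """Facts derivable in one step by joining `triple` with existing `facts`."""
    s, p, o = triple
    if p == "type":
        # (s type o), (o subClassOf D) => (s type D)
        return {(s, "type", d) for (c, q, d) in facts
                if q == "subClassOf" and c == o}
    if p == "subClassOf":
        # (x type s), (s subClassOf o) => (x type o)
        return {(x, "type", o) for (x, q, c) in facts
                if q == "type" and c == s}
    return set()


def add_statement(materialized, triple):
    """Insert `triple` in place and return only the newly inferred facts."""
    added = set()
    queue = [triple]
    while queue:
        current = queue.pop()
        if current in materialized:
            continue            # already known: nothing to recompute
        materialized.add(current)
        added.add(current)
        queue.extend(consequences(current, materialized))
    return added - {triple}


# An already materialized repository: two explicit facts and one inferred fact.
store = {
    ("Man", "subClassOf", "Mortal"),
    ("Socrates", "type", "Man"),
    ("Socrates", "type", "Mortal"),
}
newly_inferred = add_statement(store, ("Plato", "type", "Man"))
print(newly_inferred)  # {('Plato', 'type', 'Mortal')}
```

Nothing about Socrates is touched or recomputed, because no existing fact can be invalidated by the new one.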
Hence, GraphDB also ensures that new facts added to the knowledge base do not cause inconsistencies and other problems with existing facts that have been produced by inference. In addition, it stores both the original and the inferred facts, so when any of the original facts needs to be removed from the database, it can also retract any facts that have been inferred from it.
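Retraction can be pictured with the deliberately naive sketch below: explicit and inferred facts are kept apart, and after a deletion the inferred set is simply recomputed. The Store class is hypothetical, and a production engine would almost certainly use something smarter than full recomputation.

```python
def closure(explicit):
    """All facts entailed by (x type C), (C subClassOf D) => (x type D)."""
    facts = set(explicit)
    while True:
        derived = {(x, "type", d)
                   for (x, p, c) in facts if p == "type"
                   for (c2, q, d) in facts if q == "subClassOf" and c2 == c}
        if derived <= facts:
            return facts
        facts |= derived


class Store:
    """Hypothetical store that tracks explicit and inferred facts separately."""

    def __init__(self, explicit):
        self.explicit = set(explicit)
        self.inferred = closure(self.explicit) - self.explicit

    def remove(self, triple):
        """Delete an explicit fact and retract inferences that depended on it."""
        self.explicit.discard(triple)
        still_supported = closure(self.explicit) - self.explicit
        retracted = self.inferred - still_supported
        self.inferred = still_supported
        return retracted


store = Store({
    ("Man", "subClassOf", "Mortal"),
    ("Socrates", "type", "Man"),
    ("Plato", "type", "Man"),
})
retracted = store.remove(("Socrates", "type", "Man"))
print(retracted)  # {('Socrates', 'type', 'Mortal')}
```

Plato's inferred mortality survives, since the facts that support it are still in the store.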
GraphDB enables users to select which set of inference rules they want to apply to their data by providing full standards-compliant reasoning for RDFS, OWL-Horst, OWL2-RL and OWL2-QL. Apart from the predefined rule-sets it offers, GraphDB can also be configured to use custom rule-sets whose semantics are better tuned to a particular domain.
The applications of inferencing span industries and use cases. Knowing that two people or two companies are connected through a series of other factual relationships can be helpful in identifying non-obvious networks for a variety of purposes. In the same way, when analyzing economic markets, the ability to infer trading price points for commodities using weather and regional data may provide a significant competitive advantage.
Deriving new knowledge from existing facts adds extra explanatory power to your knowledge discovery and can be one of the most useful tools for your data-driven business analytics. With a semantic graph database like GraphDB as the “smart brain” on top of your legacy systems, you can leverage knowledge, rules and inference to bring meaning to all of your data.