The answer to this question loops back to one of our previous posts: GraphDB is a forward-chaining database. This means that inferred statements are changed, inserted or removed, during data updates. Virtualized repositories are, as the name suggests, virtual: no data is ingested for them. You can think of them as a translator. You write a SPARQL query, they translate it into another language and hand you the resulting triples. At no point do they store those triples.
So, if the triples are not stored, is there no way to perform inference on them?
There’s another issue here as well. Even if data were inferred for virtualized repositories, it would probably involve literals. GraphDB implements simple inference based on statement patterns. To give you an example of a statement pattern:
Inference on gift giving
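To make the figure concrete, here is a hedged sketch of what such a rule could look like in GraphDB’s custom ruleset (.pie) format. The prefix, property names, and rule id below are illustrative, not taken from the original post; the point is only that premises and conclusions are plain statement patterns over subjects, predicates, and objects:

```
Id: gift_giving
    a  <ex:gaveGiftTo>  b
    -------------------------------
    a  <ex:isGenerousTo>  b
```

Note that every term in the rule is either a variable or an IRI — there is nowhere to write a condition such as “the gift’s value is above 1 euro”.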
That’s great. But what if we want to specify that the gift was non-trivial, i.e., that it cost more than 1 euro? Unfortunately, that’s not possible. GraphDB performs inference directly on the internal identifiers assigned to IRIs. Literals, like the price of the gift, are not stored in the same data structure. This keeps that data structure as small and as fast as possible, but it precludes us from doing inference with values.
So, is this a lost cause? Fortunately, it is not. Even before the advent of GraphDB 9.8, there was the notifications API, which listens for specific statement patterns and notifies the client that something has changed. And although the notifications API is hard to scale, Kafka is not.
GraphDB 9.8 introduced a connector that lets you feed data about a change into a Kafka topic whenever a triple is changed. The connector supports entity filters and complex triple patterns. The resulting messages can then be consumed by a script or, if they are relatively simple, fed directly into a Smart Updates template. With this, you can use a SPARQL insert to trigger a SPARQL insert to trigger a SPARQL insert…
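A connector instance is created with a SPARQL update carrying a JSON configuration, following the same pattern as GraphDB’s other connectors. The sketch below is illustrative only: the instance name, topic, broker address, class, and property chain are assumptions, not values from this post, so check them against the Kafka connector documentation for your GraphDB version:

```sparql
PREFIX kafka: <http://www.ontotext.com/connectors/kafka#>
PREFIX kafka-inst: <http://www.ontotext.com/connectors/kafka/instance#>

INSERT DATA {
  # Hypothetical connector watching instances of ex:Gift and
  # pushing their ex:hasValue literal to the "gift-changes" topic.
  kafka-inst:gift_changes kafka:createConnector '''
  {
    "kafkaNode": "localhost:9092",
    "kafkaTopic": "gift-changes",
    "types": ["http://example.org/Gift"],
    "fields": [
      { "fieldName": "someValue",
        "propertyChain": ["http://example.org/hasValue"] }
    ]
  }
  ''' .
}
```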
For a virtual repository, this would require configuring your own Kafka producer on the core database that you are virtualizing. Fortunately, most modern databases, just like GraphDB, have some sort of Kafka connector. What is great about Kafka is that you can run this recursive logic at scale: you can have as many partitions and as many consumers of the Kafka data as you want.
```python
import json
import os


def process_kafka_string(msg, graphdb_endpoint):
    if msg.value:
        # Insertion: the message key is the subject IRI and the
        # value carries the changed fields as JSON.
        lead_iri = msg.key.decode("utf-8")
        data = json.loads(msg.value.decode("utf-8"))
        qname = "queries/infer.rq"
        path = os.path.join(os.path.dirname(__file__), qname)
        with open(path, 'r') as request:
            update = request.read().replace("?template", "<" + lead_iri + ">")
        if "someValue" in data:
            update = update.replace("?someValueTemplate", str(data["someValue"]))
        _perform_update_request(graphdb_endpoint, lead_iri, update,
                                "performing inference",
                                "Successfully inferred data for")
    else:
        # Truth maintenance: a message with an empty value signals
        # that the triple was deleted.
        lead_iri = msg.key.decode("utf-8")
        qname = "queries/remove.rq"
        path = os.path.join(os.path.dirname(__file__), qname)
        with open(path, 'r') as request:
            update = request.read().replace("?template", "<" + lead_iri + ">")
        _perform_update_request(graphdb_endpoint, lead_iri, update,
                                "removing inferred statements",
                                "Successfully cleaned inferred statements for")
```
Sample Python-based Kafka consumer that infers statements
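The insertion-versus-deletion decision in the consumer above can be isolated into a small pure function, which makes it easy to unit-test without a running broker. The sketch below uses a stub message class in place of a real Kafka client record; the names `KafkaMessage` and `classify_change` are illustrative, not part of any Kafka library:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class KafkaMessage:
    # Mirrors the two fields the consumer relies on: key and value.
    key: bytes
    value: Optional[bytes]


def classify_change(msg):
    """Return (action, subject IRI, data) for a connector message.

    A non-empty value means an insertion to run inference on; an
    empty value means the triple was deleted, so previously inferred
    statements must be cleaned up.
    """
    lead_iri = msg.key.decode("utf-8")
    if msg.value:
        return "infer", lead_iri, json.loads(msg.value.decode("utf-8"))
    return "clean", lead_iri, None


# Example: an insertion carrying a literal value, and a deletion.
insert_msg = KafkaMessage(b"http://example.org/gift/1", b'{"someValue": 5}')
delete_msg = KafkaMessage(b"http://example.org/gift/1", None)
```

Keeping this decision separate from the HTTP calls to GraphDB means each Kafka partition can run its own consumer applying the same logic independently.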
And what about retraction of assertions? Or, in other words, when inferred data is no longer relevant, how do we clean it up? If you look at the example above, there is a section on “Truth maintenance”. When a triple is deleted, this also produces a Kafka message. The key of that message is the subject of the triple, and there is no value. So, we can use this to know that we should remove the relevant inferred statements.
The caveat of this is that connectors only listen to typed instances, i.e., entities that have an rdf:type. So, as always, looking into data quality is crucial. Perhaps you want to use SHACL for it?
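As a minimal sketch of that idea, a SHACL shape can flag untyped entities before they silently fall outside the connector’s view. The prefix, shape name, and property below are illustrative assumptions, not from the post:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Every node that gives a gift must declare at least one rdf:type;
# otherwise the Kafka connector will never report its updates.
ex:TypedGiverShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:gaveGiftTo ;
    sh:property [
        sh:path rdf:type ;
        sh:minCount 1 ;
    ] .
```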
Did this help you solve your issue? Your opinion is important not only to us but also to your peers.