We are pleased to announce the release of GraphDB 9.10, which offers smarter RDF updates through SPARQL templates, seamless integration of updates from Kafka, graph path search performance improvements, an OntoRefine command-line interface and numerous fixes.
Updating the content of RDF documents has always been tricky due to the very nature of RDF: there is no fixed schema and no standard way to manage multi-document graphs. Two strategies are widely used for managing RDF documents: storing each RDF document in its own named graph, or storing each RDF document as a collection of triples, with multiple RDF documents sharing the same graph.
When each RDF document has its own named graph, updates are easy: you simply replace the content of the named graph with the updated document, and GraphDB provides an optimization to do that efficiently. When there are multiple documents in a graph and a single document needs to be updated, the old content of the document must be removed first. This is typically done using a handcrafted SPARQL update that deletes only the triples that define the document. That SPARQL update needs to be the same on every client that updates data in order to get consistent behavior across the system.
GraphDB 9.10 introduces smart updates using server-side SPARQL templates. Each template corresponds to a single document type and defines the SPARQL update that needs to be executed in order to remove the previous content of the document. The template can also be used to generate any metadata triples that need to be added every time a document is updated, such as the exact timestamp of the update.
To initiate a smart update, the user provides the IRI identifying the template (i.e., the document type) and the IRI identifying the document. The new content of the document is then simply added to the database in any of the supported ways – SPARQL INSERT, add statements, etc.
Smart updates remove the burden of managing handcrafted SPARQL as part of individual applications or clients and push it to the server.
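As a rough illustration of the kind of update a template centralizes, here is a minimal Python sketch. It is not GraphDB's template syntax or API: the template text, the `$doc` placeholder, the use of `dct:modified` for the timestamp and the `render_update` helper are all assumptions made for this example. It only shows the idea of one shared definition that removes a document's previous triples and records when the update happened.

```python
import re
from string import Template

# Hypothetical template for one document type (not GraphDB's actual syntax).
# It deletes the triples whose subject is the document and records a
# modification timestamp for the update.
UPDATE_TEMPLATE = Template("""\
PREFIX dct: <http://purl.org/dc/terms/>
DELETE { <$doc> ?p ?o }
INSERT { <$doc> dct:modified ?now }
WHERE {
  OPTIONAL { <$doc> ?p ?o }
  BIND(NOW() AS ?now)
}""")

# Characters that must not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\s]')


def render_update(doc_iri: str) -> str:
    """Fill the document IRI into the shared update template."""
    if not doc_iri or _FORBIDDEN.search(doc_iri):
        raise ValueError(f"not a usable IRI: {doc_iri!r}")
    return UPDATE_TEMPLATE.substitute(doc=doc_iri)


if __name__ == "__main__":
    print(render_update("http://example.com/doc/1"))
```

Keeping a definition like this in one place, rather than in every client, is what makes the behavior consistent across the system.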
Modern businesses have an ever-increasing need to integrate data coming from multiple and diverse systems. Automating the update and continuous build of knowledge graphs from incoming streams of data can be cumbersome for a number of reasons, such as verbose functional code, numerous transactions per update, suboptimal usability of GraphDB’s RDF mapping language and the lack of a direct way to stream updates to knowledge graphs.
GraphDB’s open-source Kafka Sink connector, which supports smart updates with SPARQL templates, addresses these issues by reducing the amount of code needed for raw event data transformation, thus contributing to the automation of knowledge graph updates. It is a separately running process, which helps avoid database sizing. The connector allows for customization according to the user’s specific business logic and requires no GraphDB downtime during configuration.
With it, users can push update messages to Kafka, after which a Kafka consumer processes them and applies the updates in GraphDB.
GraphDB 9.10 introduces exportable graph pattern bindings that allow users to project any number of bindings from the graph pattern SERVICE clause. GraphDB combines the power of SPARQL graph pattern matching with path search algorithms: one can restrict the start and end nodes of the path search to those pairs that match a particular graph pattern defined as a SPARQL property path. With GraphDB 9.10, one can “export” bindings from such graph patterns and in this way get additional details about the found paths.
We’ve also added bidirectional search, which can be used to traverse paths as if the graph were undirected, i.e., as if the edges between the nodes had no direction. Technically, the bidirectional search traverses adjacent nodes in both S-P-O and O-P-S order, where the subject and object are the recursively evaluated start and end nodes. Bidirectional search can be combined with wildcard and graph pattern search as well as the new exportable graph pattern bindings.
Last but not least, we have redesigned the graph path search algorithm to evaluate adjacent nodes starting from both the source and the end node (rather than just the source node), which reduces the number of operations exponentially. The difference is particularly noticeable with large datasets, where a long path search query between two nodes could previously take minutes to evaluate, while with the new algorithm the query achieves sub-second performance.
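The general idea of searching from both ends can be sketched in a few lines of Python. This is a textbook breadth-first search over an in-memory edge list, not GraphDB's implementation; the function names and the `undirected` flag (which mimics traversing edges in both S-P-O and O-P-S order) are assumptions made for this example. The intuition is that with a branching factor b and a path of length d, a one-sided search visits on the order of b^d nodes, while two searches that meet in the middle visit on the order of 2·b^(d/2).

```python
def _expand(frontier, neighbours, dist_own, dist_other):
    """Advance one search side by a full level; report the best meeting point."""
    next_frontier, best = [], None
    for node in frontier:
        for nxt in neighbours.get(node, ()):
            if nxt in dist_own:
                continue  # already reached by this side
            dist_own[nxt] = dist_own[node] + 1
            next_frontier.append(nxt)
            if nxt in dist_other:  # the two searches meet here
                total = dist_own[nxt] + dist_other[nxt]
                best = total if best is None else min(best, total)
    return next_frontier, best


def shortest_path_length(edges, source, target, undirected=False):
    """Number of edges on a shortest path from source to target, or None."""
    if source == target:
        return 0
    forward, backward = {}, {}
    for s, o in edges:
        forward.setdefault(s, set()).add(o)
        backward.setdefault(o, set()).add(s)
        if undirected:  # also follow each edge against its direction
            forward.setdefault(o, set()).add(s)
            backward.setdefault(s, set()).add(o)

    dist_f, dist_b = {source: 0}, {target: 0}
    frontier_f, frontier_b = [source], [target]
    while frontier_f and frontier_b:
        # Expand the smaller frontier to keep the explored set small.
        if len(frontier_f) <= len(frontier_b):
            frontier_f, best = _expand(frontier_f, forward, dist_f, dist_b)
        else:
            frontier_b, best = _expand(frontier_b, backward, dist_b, dist_f)
        if best is not None:
            return best
    return None


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "a")]
    print(shortest_path_length(edges, "a", "d"))
    print(shortest_path_length(edges, "d", "a", undirected=True))
```

Each call to `_expand` processes a whole level before the result is returned, so the smallest meeting distance found in that level is the length of a shortest path.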
GraphDB comes with the latest RDF4J 3.7.3, which includes important bug fixes related to SPARQL, ND-JSONLD and RDF-star serialization.
Get ready to take advantage of all new GraphDB 9.10 features!
For more information, contact Doug Kimball, Chief Marketing Officer at Ontotext.