GraphDB Users Ask: What is the Best Way to Store the Triples’ History in the Database?

TESTED ON: GraphDB 10.1

March 31, 2023 5 mins. read GraphDB Q&As

ONTOTEXT ANSWER:

Keeping a historical record of the database state is a common challenge. To start off, keep in mind that any approach to this task has clear advantages and disadvantages:

  • If you keep such information, you can answer useful queries such as “which users updated this object on a given day?”
  • On the other hand, as this is extra data that you are keeping in the database, there will be an impact on the performance and resource utilization.

With that in mind, GraphDB offers at least three solutions to suit your needs. If none of them quite fits your use case, we also offer a flexible plugin API that lets you write your own custom logic.

The modeling approach

This is the most flexible approach and is highly recommended when you have full control over your ingestion. In that scenario, you can keep your data in timestamped named graphs. For example, data ingested on the 1st of May 2022 can be kept in the named graph <http://example.org/2022/05/01>. If you want finer granularity, you can use something more specific, like a Unix timestamp in seconds: <http://example.org/1651408271>.
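As a minimal sketch of this naming scheme, the Python helpers below derive both kinds of graph IRI from an ingestion time and wrap some triples in a SPARQL INSERT DATA request for that graph. The function names are hypothetical and the http://example.org/ namespace is simply the one used in this article; nothing here is part of GraphDB itself.

```python
from datetime import datetime, timezone

BASE = "http://example.org/"  # example namespace from the article, not a real endpoint


def daily_graph(ts: datetime) -> str:
    """Named graph IRI with one graph per ingestion day (UTC)."""
    return "<" + BASE + ts.astimezone(timezone.utc).strftime("%Y/%m/%d") + ">"


def epoch_graph(ts: datetime) -> str:
    """Named graph IRI with one graph per second (Unix timestamp)."""
    return "<" + BASE + str(int(ts.astimezone(timezone.utc).timestamp())) + ">"


def insert_into_graph(graph_iri: str, triples: str) -> str:
    """Wrap already serialized triples in an INSERT DATA for the given graph."""
    return f"INSERT DATA {{ GRAPH {graph_iri} {{ {triples} }} }}"


# Pass timezone-aware datetimes so the UTC conversion is unambiguous.
ingested_at = datetime(2022, 5, 1, 12, 31, 11, tzinfo=timezone.utc)
print(daily_graph(ingested_at))  # <http://example.org/2022/05/01>
print(epoch_graph(ingested_at))  # <http://example.org/1651408271>
```

Daily graphs keep the number of graphs small, while per-second graphs make it possible to tell individual ingestion runs apart.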

Now, this might be insufficient: you may want to use named graphs for another purpose, or you may not want each instance to be placed in its own named graph. In such a case, you can attach timestamp properties to each instance, for example <http://example.org/createdAt/> "2022-05-01T12:21:03Z"^^xsd:dateTime and <http://example.org/lastUpdated/> "2022-05-01T12:25:03Z"^^xsd:dateTime.

The approach above works for the instance as a whole. If you want to apply it to individual triples, you would have to combine it with RDF-star and its embedded (nested) triples:

<< <http://example.org/subject> <http://example.org/predicate> <http://example.org/object> >> <http://example.org/lastUpdated/> "2022-05-01T12:25:03Z"^^xsd:dateTime .
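If your ingestion code is under your control, the annotation can be generated alongside each triple. The sketch below builds such a Turtle-star statement as a string; the helper names are hypothetical, the lastUpdated IRI is the one used in this article, and it assumes all three terms of the annotated triple are IRIs.

```python
LAST_UPDATED = "http://example.org/lastUpdated/"  # predicate IRI used in the article
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value: str) -> str:
    """Serialize an IRI in angle brackets."""
    return f"<{value}>"


def annotate_last_updated(s: str, p: str, o: str, timestamp: str) -> str:
    """Return one Turtle-star statement that timestamps the triple (s, p, o)."""
    embedded = f"<< {iri(s)} {iri(p)} {iri(o)} >>"
    literal = f'"{timestamp}"^^{iri(XSD_DATETIME)}'
    return f"{embedded} {iri(LAST_UPDATED)} {literal} ."


statement = annotate_last_updated(
    "http://example.org/subject",
    "http://example.org/predicate",
    "http://example.org/object",
    "2022-05-01T12:25:03Z",
)
print(statement)
```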

If you want to achieve your goals by modeling, you need to control the whole ingestion process.

Entity change plugin

The change tracking plugin allows you to effectively give a timestamp to every triple that is being ingested. When you enable the plugin for a transaction, you specify an in-memory named graph. All triples persisted in this transaction are added to this graph as well as to the graph they were intended for. This gets around the issue of using named graphs for multiple purposes.

The plugin is enabled and the data is written in the same transaction, with a semicolon separating the two SPARQL update requests. Afterwards, you can query the two special “added” and “removed” named graphs to check what happened in this transaction.

The advantage of such an approach is that it’s easier to intercept all requests to the database and plug in a pre-commit step that enables the change tracking plugin. The disadvantage is that those graphs are in-memory, so you need to persist them yourself before a shutdown if you want to keep them.
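To make the idea of per-transaction “added” and “removed” graphs concrete, here is a small in-memory model in Python. It is only an illustration of the concept and does not use the plugin’s API; the class name is hypothetical, and netting out a triple that is inserted and deleted within the same transaction is an assumption of this sketch, not documented plugin behavior.

```python
class TrackedTransaction:
    """Toy model: apply changes to a triple store and record what changed."""

    def __init__(self, store: set):
        self.store = store    # the "real" data, a set of (s, p, o) tuples
        self.added = set()    # plays the role of the "added" graph
        self.removed = set()  # plays the role of the "removed" graph

    def insert(self, triple: tuple) -> None:
        if triple in self.store:
            return  # nothing changes, so nothing is recorded
        self.store.add(triple)
        if triple in self.removed:
            self.removed.discard(triple)  # delete followed by re-insert cancels out
        else:
            self.added.add(triple)

    def delete(self, triple: tuple) -> None:
        if triple not in self.store:
            return
        self.store.remove(triple)
        if triple in self.added:
            self.added.discard(triple)  # insert followed by delete cancels out
        else:
            self.removed.add(triple)


store = {("ex:a", "ex:status", "ex:old")}
tx = TrackedTransaction(store)
tx.delete(("ex:a", "ex:status", "ex:old"))
tx.insert(("ex:a", "ex:status", "ex:new"))
print(tx.added, tx.removed)
```

Because both sets live only in memory, anything you want to keep has to be written out before the process stops, which mirrors the disadvantage described above.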

History plugin

In cases where you don’t have any control over the ingestion, or where you don’t want to deal with modeling issues, you can use the history plugin instead. The history plugin allows you to keep track of certain triples and instance types. It is global – it keeps track of all transactions done by all users. Since it’s fully automated, you won’t have to touch anything after configuring the plugin. The downside of such an approach is that it is the least flexible.

The history plugin creates a new index. By default, GraphDB stores data in a PSO (predicate-subject-object) index and a POS index. With the history plugin, a DSPOCI index is also kept.

  • D stands for a date.
  • S, P, O and C stand for subject, predicate, object and context (the named graph).
  • I stands for insertion or deletion; it is a boolean flag.
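One way to picture such an index is as a time-ordered log of entries, each carrying a date, a quad and an insert/delete flag. The Python sketch below models that log and replays it to reconstruct the data as it was at a given moment. The class and function names are hypothetical, and this is a conceptual model only, not how GraphDB stores or queries the index.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HistoryEntry:
    date: datetime   # D: when the change happened
    s: str           # S: subject
    p: str           # P: predicate
    o: str           # O: object
    c: str           # C: context (named graph)
    inserted: bool   # I: True for an insertion, False for a deletion


def state_at(entries, moment: datetime) -> set:
    """Replay entries in date order and return the quads present at `moment`."""
    state = set()
    for entry in sorted(entries, key=lambda e: e.date):
        if entry.date > moment:
            break
        quad = (entry.s, entry.p, entry.o, entry.c)
        if entry.inserted:
            state.add(quad)
        else:
            state.discard(quad)
    return state


log = [
    HistoryEntry(datetime(2022, 5, 1, 12, 21, 3), "ex:s", "ex:p", "ex:old", "ex:g", True),
    HistoryEntry(datetime(2022, 5, 1, 12, 25, 3), "ex:s", "ex:p", "ex:old", "ex:g", False),
    HistoryEntry(datetime(2022, 5, 1, 12, 25, 3), "ex:s", "ex:p", "ex:new", "ex:g", True),
]
print(state_at(log, datetime(2022, 5, 1, 12, 22)))
```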

Naturally, such an index increases your database size. Assuming you keep historical data for all your triples and use only the default indexes, your database size would nearly double. To manage this, you can trim and compact the history index.

You can configure the history plugin to filter some specific data. It’s unlikely that you want history for each triple; just a few key objects should be enough. A filter can be set for each position: Subject, Predicate, Object or Context. Each filter can be either:

  • * – wildcard, add everything.
  • IRI, BNODE, LITERAL – only keep history if the value at this position is of the correct type.
  • Specific IRI, e.g. <http://example.org/test>.
  • IRI template, e.g. <http://example.org/*>
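The four filter kinds can be read as a simple matching rule applied to one position of a statement. The Python sketch below spells out that reading; the function name is hypothetical, and treating an IRI template as a prefix followed by a trailing * is an assumption based on the example above rather than the plugin’s documented matching rules.

```python
def position_matches(filter_value: str, term: str, term_type: str) -> bool:
    """Check one position of a statement against one filter.

    term_type is "IRI", "BNODE" or "LITERAL"; term is the bare value
    without angle brackets or quotes.
    """
    if filter_value == "*":
        return True  # wildcard: keep everything
    if filter_value in ("IRI", "BNODE", "LITERAL"):
        return term_type == filter_value  # type filter
    if term_type != "IRI":
        return False  # the remaining filter kinds only apply to IRIs
    pattern = filter_value.strip("<>")
    if pattern.endswith("*"):
        return term.startswith(pattern[:-1])  # IRI template (assumed prefix match)
    return term == pattern  # specific IRI


print(position_matches("<http://example.org/*>", "http://example.org/test", "IRI"))
```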

Once you have ingested data, you can query its history with SPARQL.

You can combine the three options as you like. Additionally, you can use any of the history tracking options together with the audit log to track down the user who made a change.

 
