What is RDF-Star?

RDF-star (also written as “RDF*”) allows descriptions such as scores, weights, temporal aspects and provenance to be added to the edges in a graph. Formally, RDF* extends the RDF graph model by allowing statements about statements: one can attach metadata that describes an edge in a graph, whereas standard RDF allows statements to be made only about nodes.

RDF-star (or RDF*) and the associated query language SPARQL-star (also written as SPARQL*) are the most widely supported extension of the existing standards and will be included in RDF 1.2. In RDF* one can make statements about statements, formally called statement-level annotations. For instance, one can provide a time span for a relationship, such as stating that Abraham Lincoln was the President of the United States (i.e. POTUS) from March 4th 1861.

RDF* goes beyond the expressivity of Property Graphs, where one can attach key-value pairs to relationships. The statement-level annotations enable a more efficient representation of scores, weights, temporal restrictions and provenance information.

Why Does RDF Need an Asterisk?

RDF stands for Resource Description Framework and is a standard for data interchange on the Web, developed via the W3C’s community process. Conceived as the foundation of the entire stack of Semantic Web standards, it has features that allow it to link data published without centralized control. An expressive and powerful language, it enables Ontotext and other vendors to deliver enterprise-strength knowledge graphs to solve the data management problems of some of the world’s top global brands in finance, pharmaceuticals and media.

RDF (without the star) is an abstract knowledge representation model that does not differentiate data from metadata. It provides enormous flexibility for the expression of multiple levels of metadata about nodes, classes, predicates (relationship types) and even (sub-)graphs. It is all about making statements about nodes such as <:man rdfs:label "Ivan">. Nodes and predicates are identified via URIs (Uniform Resource Identifiers, e.g., URLs). Here follows an example:

:hasSpouse rdf:type owl:TransitiveProperty ;
       rdfs:subPropertyOf :familyRelationship .
:Person rdfs:subClassOf :Agent ;
       rdfs:label "Person" ;
       rdfs:comment "A human being" .

GRAPH :myFamilyData {
       :man :hasSpouse :woman .
       :woman rdf:type :Person ;
              :hasGender "Female" ;
              :birthdate "2000-12-08"^^xsd:date .
}

:myFamilyData <http://purl.org/dc/elements/1.1/creator> :me ;
       <http://purl.org/dc/elements/1.1/date> "2021-02-15"^^xsd:date .

As we see above, one can declare relationship types such as :hasSpouse to be of a specific class (transitive properties) and to be a more specific version of another property, i.e., :familyRelationship. Similarly, one can define new classes of objects and attach metadata to them, e.g., human-readable labels and descriptions. One can link two nodes such as :man and :woman with a specific type of relationship, like :hasSpouse defined above. It is also straightforward to provide data or metadata about the nodes, for example, to state that :woman is an instance of :Person and has a specific gender and birth date.

Using so-called named graphs (we use the TriG format above), one can group several statements and designate them as a specific graph, named :myFamilyData in the example above. Again, it is easy to attach metadata to the entire graph, e.g., author and publication date.

In RDF (almost) everything has a URI: the specific objects, as well as the classes, the predicates, the graphs, etc. And one can make further statements that describe these resources (that’s why it is called Resource Description Framework!). It is a very flexible framework. Still, there is one thing that it offers no direct way to describe: the edges in the graph, i.e., the specific statements.

Current Approaches to Tagging Edges

Without the ability to express statement-level metadata annotations, engineers have had to develop a number of workarounds (i.e., hacks) to compensate for the lack of native support for edge-level properties in RDF. Each of them has certain advantages and disadvantages, which we will look at below.

Standard Reification

Reification means expressing an abstract construct with the existing concrete methods supported by the language. The RDF specification sets a standard vocabulary for representing references to statements like:

  :man :hasSpouse :woman .
  :id1 rdf:type rdf:Statement ;
    rdf:subject :man ;
    rdf:predicate :hasSpouse ;
    rdf:object :woman ;
    :startDate "2020-02-11"^^xsd:date .

Standard reification requires stating four additional triples to refer to the triple for which one wants to provide metadata. The subject of these four additional triples has to be a new identifier (IRI or blank node), which later on may be used for providing the metadata. The existence of a reference to a triple does not automatically assert it.

Advantage: This approach is compliant with published RDF standards and will be supported by any RDF store.
Disadvantage: This approach is inefficient for exchanging and persisting RDF data, and the syntax needed to access and match the four reification triples is cumbersome.
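
To make the overhead concrete, here is a minimal Python sketch (not part of any RDF library) that builds the reification triples for the example above. Triples are modeled as plain tuples of strings, and the helper name reify is our own assumption for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify(triple: Triple, statement_id: str) -> List[Triple]:
    """Return the four standard reification triples that reference `triple`."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]


base = (":man", ":hasSpouse", ":woman")
reference = reify(base, ":id1")
metadata = [(":id1", ":startDate", '"2020-02-11"^^xsd:date')]

# The base triple is added explicitly: the reference alone does not assert it.
graph = [base] + reference + metadata
print(len(graph), "triples are needed to carry one annotated edge")
```

Every additional annotated edge repeats the same four bookkeeping triples, which is where the exchange and storage overhead comes from.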

N-ary Relations

The approach for representing N-ary relations in RDF is to model them via a new relationship concept that connects all arguments, like:

 :Marriage1 rdf:type :Marriage ;
   :partner1 :man ;
   :partner2 :woman ;
   :startDate "2020-02-11"^^xsd:date .

Advantage: Similar to standard reification in terms of standard compliance, but it adopts a schema specific to the domain model that is presumably understood by its consumers.
Disadvantage: This approach increases the complexity of the ontology model and has proven to make it difficult to evolve models in a backward-compatible way.

Singleton Properties

Singleton properties are a hacky way to introduce statement identifiers as a part of the predicate like:

  :man :hasSpouse#1 :woman .
  :hasSpouse#1 :startDate "2020-02-11"^^xsd:date .

The local name of the predicate after the # encodes a unique identifier.

Advantage: This approach creates a more compact representation.
Disadvantage: It is highly inefficient for querying data. For example, a query to return all :hasSpouse links must parse all predicate values with a regular expression.
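
The querying cost can be illustrated with a small Python sketch. It is only a model of the problem, not how a triple store works internally: triples are tuples of strings, and the extra resources :bob, :alice and :acme are hypothetical.

```python
import re

triples = [
    (":man", ":hasSpouse#1", ":woman"),
    (":hasSpouse#1", ":startDate", '"2020-02-11"^^xsd:date'),
    (":bob", ":hasSpouse#2", ":alice"),
    (":bob", ":worksFor", ":acme"),
]

# Each link has its own predicate, so there is no single value to look up.
SPOUSE = re.compile(r"^:hasSpouse#\d+$")


def spouse_links(data):
    """Scan every predicate and keep the ones that encode a :hasSpouse link."""
    return [(s, o) for s, p, o in data if SPOUSE.match(p)]


print(spouse_links(triples))
```

Because no two links share a predicate, an index on predicates cannot answer the question directly, and every distinct predicate has to be tested against the pattern.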

Named Graphs

The named graph approach is a variation of singleton properties that uses named graphs, which are formally introduced in the SPARQL specification. Technically, the named graph is a fourth element that can be attached to the <subject, predicate, object> triple to designate that the statement is part of a specific named (sub)graph. The identifier of the named graph can be treated as a node in the RDF graph, so one can easily make statements about the entire named graph. A singleton named graph, containing a single statement, can be created to attach properties to that statement as follows:

:man :hasSpouse :woman :singletonGraph#1 .
:singletonGraph#1 :startDate "2020-02-11"^^xsd:date :metadata .

Advantage: The approach has multiple advantages over the singleton properties and eliminates the need for regular expression parsing.
Disadvantage: A significant drawback is the overload of the named graph parameter with an identifier instead of the file or source that produced the triple. The updates based on the triple source become more complicated and cumbersome to maintain. Also, if a repository stores a large number of named graphs, it is vital to enable the context indexes.

What RDF* Improves

RDF* is an extension of the RDF 1.1 standard that proposes a more efficient reification serialization syntax. The main advantages of this representation include reduced document size that increases the efficiency of data exchange as well as shorter SPARQL queries for improved comprehensibility.

  :man :hasSpouse :woman .
  <<:man :hasSpouse :woman>> :startDate "2020-02-11"^^xsd:date .

The RDF* extension captures the notion of an embedded triple by enclosing the referenced triple in the strings << and >>. Embedded triples, like blank nodes, may appear in the subject and object positions only. Their meaning is aligned with the semantics of standard reification, but they use a much more efficient serialization syntax. To simplify the querying of embedded triples, the RDF* proposal also extends the query syntax with SPARQL*, enabling queries like:

# List all metadata for the given reference to a statement
SELECT *
WHERE {
    <<:man :hasSpouse :woman>> ?p ?o
}

The embedded triple in SPARQL* also supports free variables for retrieving a list of reference statements:

# List all metadata for every statement that matches the embedded triple pattern
SELECT *
WHERE {
    <<?man :hasSpouse :woman>> ?p ?o
    FILTER (?man = :man)
}
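
The way an embedded triple pattern with free variables is matched can be sketched in a few lines of Python. This is a toy matcher under our own assumptions, not the GraphDB query engine: embedded triples are nested tuples, variables are strings starting with "?", and the :certainty annotation and the :bob / :alice data are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, term, bindings):
    """Unify a pattern with a data term; return the extended bindings or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple):
        # An embedded triple pattern only matches an embedded triple.
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            bindings = match(sub_pattern, sub_term, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def query(pattern, data):
    """Return one binding set per triple in `data` that matches `pattern`."""
    return [b for t in data if (b := match(pattern, t, {})) is not None]


embedded = (":man", ":hasSpouse", ":woman")
data = [
    embedded,
    (embedded, ":startDate", '"2020-02-11"^^xsd:date'),
    (embedded, ":certainty", "0.9"),
    ((":bob", ":hasSpouse", ":alice"), ":startDate", '"2019-05-01"^^xsd:date'),
]

# Counterpart of: << ?man :hasSpouse :woman >> ?p ?o
rows = query((("?man", ":hasSpouse", ":woman"), "?p", "?o"), data)
for row in rows:
    print(row)
```

Only the two annotations on the :man / :woman statement match; the plain triple and the annotation on the unrelated couple are skipped.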

Performance of the Approaches

To test the different approaches, we benchmark a subset of Wikidata, whose data model heavily uses statement-level metadata. The authors of the paper “Reifying RDF: What works well with Wikidata?” have done an excellent job of remodeling the dataset in various formats, and kindly shared the output datasets with our team. Depending on the modeling approach, the dataset includes:

Modeling approach      Total statements   Loading time (min)   Repository image size (MB)
Standard reification   391,652,270        52.4                 36,768
N-ary relations        334,571,877        50.6                 34,519
Named graphs           277,478,521        56                   35,146
RDF-star               220,375,702        34                   22,465

We did not test the singleton properties approach due to the high number of unique predicates.
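
To compare the approaches at a glance, the figures above can be normalized against standard reification. The short Python sketch below only restates the numbers from the table; the helper name relative_to_baseline is ours.

```python
# (total statements, loading time in minutes, repository image size in MB)
results = {
    "Standard reification": (391_652_270, 52.4, 36_768),
    "N-ary relations": (334_571_877, 50.6, 34_519),
    "Named graphs": (277_478_521, 56.0, 35_146),
    "RDF-star": (220_375_702, 34.0, 22_465),
}


def relative_to_baseline(table, baseline_name):
    """Express every measurement as a fraction of the baseline approach."""
    baseline = table[baseline_name]
    return {
        name: [value / base for value, base in zip(row, baseline)]
        for name, row in table.items()
    }


for name, ratios in relative_to_baseline(results, "Standard reification").items():
    print(f"{name:22s}" + "".join(f"{r:8.0%}" for r in ratios))
```

With these figures, RDF-star needs roughly 56% of the statements, 65% of the loading time and 61% of the storage of standard reification.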

Syntax and Examples

This section provides more in-depth details on how Ontotext’s RDF database GraphDB implements the RDF* syntax.

Let’s say we have a statement like the one above, together with the metadata fact that we are 90% certain about this statement. The RDF* syntax allows us to represent both the data and the metadata by using an embedded triple as follows:

<<:man :hasSpouse :woman>> ex:certainty 0.9 .

According to the formal semantics of RDF*, each embedded triple also asserts the referenced statement, and retracting the embedded triple deletes that statement. Unfortunately, this requirement breaks compatibility with standard reification and causes non-transparent behavior when dealing with triples stored in multiple named graphs. GraphDB implements embedded triples by introducing an additional RDF term type next to IRI, blank node, and literal. So in the previous example, the engine will store only a single triple.

Below are a few more examples of how this syntax can be utilized.

Object Relation Qualifiers:

<<:man :hasSpouse :woman>> :startDate "2020-02-11"^^xsd:date .

:hasSpouse is a symmetric relation, so the statement can be inferred in the opposite direction. However, the metadata for the opposite direction is not asserted automatically, so it needs to be added explicitly:

<<:woman :hasSpouse :man>> :startDate "2020-02-11"^^xsd:date .

Data Value Qualifiers:

<<:painting :height 32.1>>
  :unit :cm;
  :measurementTechnique :laserScanning;
  :measuredOn "2020-02-11"^^xsd:date.

Statement Sources/References:

<<:man :hasSpouse :woman>>
  :source :TheNationalEnquirer;
  :webpage <http://nationalenquirer.com/news/2020-02-12>;
  :retrieved "2020-02-13"^^xsd:date.

Nested Embedded Triples:

<< <<:man :hasSpouse :woman>> :startDate "2020-02-11"^^xsd:date >>
    :webpage <http://nationalenquirer.com/news/2020-02-12> .

Converting Standard Reification to RDF*

The RDF* support in GraphDB does not exclude any of the other modeling approaches. It is possible to independently maintain RDF* and standard reification statements in the same repository, like:

:man :hasSpouse :woman .
:id1 rdf:type rdf:Statement ;
    rdf:subject :man ;
    rdf:predicate :hasSpouse ;
    rdf:object :woman ;
    :startDate "2020-02-11"^^xsd:date .

<<:man :hasSpouse :woman>> :startDate "2020-02-11"^^xsd:date .

Still, mixing the two is likely to cause confusion, so GraphDB provides the reification-convert command line tool for converting standard reification to RDF* outside of the database. If the data is already imported, use the following SPARQL update for the conversion:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DELETE {
    ?reification a rdf:Statement .
    ?reification rdf:subject ?subject .
    ?reification rdf:predicate ?predicate .
    ?reification rdf:object ?object .
    ?reification ?p ?o .
} INSERT {
    <<?subject ?predicate ?object>> ?p ?o .
} WHERE {
    ?reification a rdf:Statement .
    ?reification rdf:subject ?subject .
    ?reification rdf:predicate ?predicate .
    ?reification rdf:object ?object .
    ?reification ?p ?o .
    # Copy every property of the reification node except the four reification triples themselves
    FILTER (?p NOT IN (rdf:subject, rdf:predicate, rdf:object) &&
    !(?p = rdf:type && ?o = rdf:Statement))
}
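
The intent of this update can also be sketched in plain Python over tuples. This is an illustration of the rewrite only, not the reification-convert tool or GraphDB code, and the function name convert is our own. Unlike the SPARQL update, the sketch also removes reification nodes that carry no metadata at all.

```python
RDF_TYPE, RDF_STATEMENT = "rdf:type", "rdf:Statement"
PARTS = ("rdf:subject", "rdf:predicate", "rdf:object")


def convert(triples):
    """Rewrite standard reification into embedded triples (nested tuples)."""
    triples = set(triples)
    statement_ids = {s for s, p, o in triples if p == RDF_TYPE and o == RDF_STATEMENT}
    result = set(triples)
    for node in statement_ids:
        about = {p: o for s, p, o in triples if s == node and p in PARTS}
        if len(about) != 3:
            continue  # incomplete reification: leave it untouched
        embedded = tuple(about[p] for p in PARTS)
        for s, p, o in triples:
            if s != node:
                continue
            result.discard((s, p, o))
            bookkeeping = p in PARTS or (p == RDF_TYPE and o == RDF_STATEMENT)
            if not bookkeeping:
                # Move the metadata from the reification node to the embedded triple.
                result.add((embedded, p, o))
    return result


converted = convert([
    (":man", ":hasSpouse", ":woman"),
    (":id1", "rdf:type", "rdf:Statement"),
    (":id1", "rdf:subject", ":man"),
    (":id1", "rdf:predicate", ":hasSpouse"),
    (":id1", "rdf:object", ":woman"),
    (":id1", ":startDate", '"2020-02-11"^^xsd:date'),
])
for triple in converted:
    print(triple)
```

The six input triples collapse into two: the asserted statement and one annotation whose subject is the embedded triple.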

GraphDB extends the existing RDF and query results formats with dedicated formats that encode embedded triples natively (for example, <<:subject :predicate :object>> in Turtle*). Each new format has its own MIME type and file extension, the details of which can be found in our documentation. For the benefit of older clients, in all other formats the embedded triples are serialized as special IRIs in the format urn:rdf4j:triple:xxx. Here, xxx stands for the Base64 URL-safe encoding of the N-Triples representation of the embedded triple. This is controlled by a boolean writer setting, and is ON by default. The setting is ignored by writers that support RDF* natively.

Such special IRIs are converted back to triples on parsing. This is controlled by a boolean parser setting, and is ON by default. It is respected by all parsers, including those with native RDF* support.
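
The round trip between an embedded triple and its special IRI can be sketched with the Python standard library. Treat this as an approximation: the exact N-Triples form that is encoded and the handling of Base64 padding are assumptions on our side, and the example.org IRIs are hypothetical.

```python
import base64

PREFIX = "urn:rdf4j:triple:"


def encode(ntriples_star: str) -> str:
    """Wrap the N-Triples text of an embedded triple into a special IRI."""
    raw = base64.urlsafe_b64encode(ntriples_star.encode("utf-8"))
    return PREFIX + raw.decode("ascii")


def decode(iri: str) -> str:
    """Recover the N-Triples text from a special IRI."""
    if not iri.startswith(PREFIX):
        raise ValueError("not an encoded embedded triple")
    return base64.urlsafe_b64decode(iri[len(PREFIX):]).decode("utf-8")


quoted = (
    "<< <http://example.org/man> <http://example.org/hasSpouse> "
    "<http://example.org/woman> >>"
)
iri = encode(quoted)
print(iri)
print(decode(iri) == quoted)
```

The URL-safe alphabet replaces + and / with - and _, so the encoded value can be used inside an IRI without further escaping.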

Great! When Can I Use RDF*?

Today is the answer.

Ontotext is working with the W3C and other technology companies to make RDF* and SPARQL* the standard for representing metadata with triples.
