SPARQL 1.1 and OWLIM version 4

One of the major new feature sets for OWLIM version 4 is the introduction of SPARQL 1.1 functionality via the native Sesame interfaces. Ontotext has continued to invest in the development of the Sesame openRDF framework. As a result, first SPARQL 1.1 Query and now SPARQL 1.1 Update functionality have been added to Sesame and are available with all editions of OWLIM.

Some of this functionality has been available since OWLIM 3.5 through the Jena adapter, used in combination with the Jena framework and the ARQ query engine. However, this was only available for BigOWLIM and could not be used with the replication cluster. The Jena adapter is also somewhat cumbersome for users whose solutions are built around Sesame, and it delivers less than optimal performance, especially with large result sets.

The older W3C SPARQL 1.0 recommendations cover just the query language and a protocol, whereas the emerging SPARQL 1.1 specifications also include, among others, an update language, federated queries and a graph store protocol.

OWLIM 4.0 was released in June 2011 and included SPARQL 1.1 Query functionality conformant with the October 2010 W3C working draft. OWLIM 4.2 was released at the end of August 2011 and added SPARQL 1.1 Update functionality, as well as SPARQL 1.1 Query support compliant with the May 2011 W3C working draft. SPARQL 1.1 Federation was added in OWLIM 4.3, and the Graph Store Protocol was included with OWLIM 5.0, which was released in April 2012.

SPARQL 1.1 Query

This version of the SPARQL Query definition adds many language features found in modern database query languages, such as aggregates. In addition, new features specific to RDF/graph data structures, such as property paths, and new functions and operators, such as EXISTS, have been added.

The end effect is that a lot of application functionality can now be pushed into query evaluation. Instead of applications aggregating results themselves, aggregates (COUNT, AVG, SUM, MIN, MAX) can now be computed in the query engine. For queries over large datasets, e.g. some billions of statements on a remote OWLIM cluster, the aggregation occurs at the server end without the need to transfer possibly billions of results to the client application. The potential performance improvements are significant; see below for another example.
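
As a minimal sketch of server-side aggregation, the query below counts how many people each person knows and returns only the ten highest counts. The choice of the FOAF vocabulary and the variable names are assumptions made for illustration; they are not taken from a particular OWLIM dataset.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# The grouping and counting happen inside the query engine, so only
# ten result rows travel to the client, however many foaf:knows
# statements the repository holds.
SELECT ?person (COUNT(?friend) AS ?friendCount)
WHERE {
  ?person foaf:knows ?friend .
}
GROUP BY ?person
ORDER BY DESC(?friendCount)
LIMIT 10
```

With SPARQL 1.0, the client would have to retrieve every ?person/?friend pair and do the counting itself.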

Property paths also provide a useful syntax for specifying arbitrary-length sequences of properties that connect resources. For example, the following query retrieves people who know a police officer, or who know somebody who knows a police officer, or who know somebody who knows somebody... (ad infinitum) who knows a police officer:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia: <http://dbpedia.org/resource/>

SELECT ?p WHERE {
  ?p foaf:knows* ?po .
  ?po a dbpedia:Police_officer .
}

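Note the use of '*' after foaf:knows to specify that the query engine should search transitively over foaf:knows properties. Because '*' matches paths of zero or more steps, the results also include the police officers themselves; writing foaf:knows+ instead requires at least one step. This kind of query cannot be expressed in SPARQL 1.0.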
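
Other path operators work in a similar way. As a small sketch using the same vocabulary as the query above, the '/' operator chains properties into a fixed-length sequence, so this variant finds only people who are exactly two foaf:knows steps away from a police officer:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia: <http://dbpedia.org/resource/>

# '/' builds a sequence path: ?p knows someone who knows ?po.
# No variable is needed for the intermediate person.
SELECT DISTINCT ?p WHERE {
  ?p foaf:knows/foaf:knows ?po .
  ?po a dbpedia:Police_officer .
}
```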

SPARQL 1.1 Update

This extension to SPARQL allows for the modification of an RDF database via a SPARQL endpoint. The language provides a means to:

  • add/remove statements from graphs
  • create/remove graphs
  • move/copy statements between graphs
  • group updates into transactions

These extensions allow applications to use a standard language to modify an RDF database, rather than relying on non-standard APIs and frameworks. The update language features are also much more expressive and powerful compared to the functionality provided by typical APIs. The end result is that application development time is reduced, application functionality improved and portability increased. For example:

PREFIX foaf:  <http://xmlns.com/foaf/0.1/>

WITH <http://example/addresses>
DELETE { ?person foaf:givenName 'Bill' }
INSERT { ?person foaf:givenName 'William' }
WHERE
  { ?person a foaf:Person .
    ?person foaf:givenName 'Bill'
  } 

This example updates the graph http://example/addresses to rename all people with the given name "Bill" to "William".
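
The graph-level operations listed earlier can be sketched in the same way. The request below is an illustration only: the person URI and the backup graph name are hypothetical, and whether the whole request is applied as a single transaction depends on the store. Operations are separated by ';' and sent together as one update request.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Add two statements to a named graph.
INSERT DATA {
  GRAPH <http://example/addresses> {
    <http://example/person/1> a foaf:Person ;
        foaf:givenName "Bill" .
  }
} ;

# Snapshot the graph; COPY overwrites whatever the destination held.
COPY <http://example/addresses> TO <http://example/addresses-backup> ;

# Remove one statement from the original graph.
DELETE DATA {
  GRAPH <http://example/addresses> {
    <http://example/person/1> foaf:givenName "Bill" .
  }
} ;

# Remove the snapshot graph once it is no longer needed.
DROP GRAPH <http://example/addresses-backup>
```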

Expressivity and Performance Benefits

In 2010 the UK Government contracted Ontotext to implement a semantic Knowledge Base (KB) for the Government's Web Archive. The project brings together publicly available linked data and open-source text mining technology for semantic indexing and search over archive contents. See The National Archives: Semantic Knowledge Base (SKB) success story.

The system was finalized in March 2011 and the full archive has been analyzed and indexed. The documents in the archive were annotated with references to a large KB (about 2 billion statements), which combines linked data (namely the integrated dataset of FactForge) with UK government specific datasets. The archive contains about 160 million unique documents, each of which is annotated with an average of 50 RDF metadata statements that link it to the pre-existing factual knowledge. This adds up to about 8 billion statements of metadata (160 million × 50), which, when added to the SKB repository, makes more than 10 billion explicit statements, or around 12 billion statements after materialization.

One of the search interfaces provided for the SKB project is the KIM semantic faceted search interface, which relies on lower level SPARQL queries executed by OWLIM.

A recent upgrade from BigOWLIM 3.5 to OWLIM-SE 4.1 provided a number of benefits, the first of which is the removal of the scalability boundary of 2^32 (about 4 billion) unique resources that could be indexed. This allowed the SKB metadata storage capacity to grow to nearly 12 billion statements.

On top of this, the new version of OWLIM allowed the search engine to use more expressive SPARQL 1.1 Query features that not only better capture the expressed query, but also deliver massive performance improvements.

One such example is the following query:

PREFIX pkm: <http://proton.semanticweb.org/2006/05/protonkm#>
PREFIX ptop: <http://proton.semanticweb.org/2006/05/protont#>
PREFIX ff: <http://factforge.net/>
PREFIX skb: <http://proton.semanticweb.org/skb-ont#>

SELECT DISTINCT ?E ?ML WHERE {
  ?D pkm:mentions ?E .
  ?ML <http://www.ontotext.com/owlim/lucene#> "alan*" .
  ?E ff:preferredLabel ?ML ;
    a skb:Officer .
}
LIMIT 25

This query retrieves officers whose names start with 'alan' and who are mentioned by any web-page in the archive. It is compliant with SPARQL 1.0, but during execution it can find many bindings for ?D that are not needed, because ?D does not appear in the SELECT clause. In effect, the designer of this query wanted to retrieve officers mentioned by at least one web-page, but due to limitations in the expressivity of SPARQL 1.0, it was necessary to iterate over all web-pages that mention officers and then discard them.

One of the new features of SPARQL 1.1 Query is the EXISTS keyword that exactly captures this intention. It is used in FILTER expressions in order to indicate when at least one binding exists, but without the need to iterate over all bindings. In a way, this is similar to the difference between evaluating SELECT and ASK queries, where ASK will terminate when the first solution is found, whereas SELECT will continue until all solutions are found.

Using OWLIM 4, the above query can be re-written as:

PREFIX pkm: <http://proton.semanticweb.org/2006/05/protonkm#>
PREFIX ptop: <http://proton.semanticweb.org/2006/05/protont#>
PREFIX ff: <http://factforge.net/>
PREFIX skb: <http://proton.semanticweb.org/skb-ont#>

SELECT DISTINCT ?E ?ML WHERE {
  FILTER EXISTS { ?D pkm:mentions ?E . } 
  ?ML <http://www.ontotext.com/owlim/lucene#> "alan*" . 
  ?E ff:preferredLabel ?ML ; 
    a skb:Officer .
}
LIMIT 25

The benefit of this new expressiveness on query performance is shown in the table below:

Query and OWLIM version         | Typical execution time
SPARQL 1.0 Query with OWLIM 3.5 | Greater than 10 minutes
SPARQL 1.1 Query with OWLIM 4.1 | A few seconds

In this example, the performance improvement was more than a hundredfold. However, there are many other new features in SPARQL 1.1 that dramatically improve the expressiveness of queries and their resulting evaluation performance. Ultimately, this improves application performance and reduces application complexity and development time.
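
One such feature is the negated form, NOT EXISTS. The sketch below reuses the prefixes and properties from the queries above to find officers that no archived page mentions. It is an illustration only, not a query from the SKB project, and the variable name ?label is an assumption.

```sparql
PREFIX pkm: <http://proton.semanticweb.org/2006/05/protonkm#>
PREFIX ff: <http://factforge.net/>
PREFIX skb: <http://proton.semanticweb.org/skb-ont#>

# NOT EXISTS keeps a solution only when the inner pattern has no
# match for the current binding of ?E.
SELECT DISTINCT ?E ?label WHERE {
  ?E a skb:Officer ;
     ff:preferredLabel ?label .
  FILTER NOT EXISTS { ?D pkm:mentions ?E . }
}
LIMIT 25
```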