What is SPARQL?

SPARQL is the standard query language and protocol for Linked Open Data and RDF databases. Designed to query a wide variety of data, it can efficiently extract information hidden in non-uniform data that is stored in various formats and sources.

SPARQL, pronounced ‘sparkle’, is short for “SPARQL Protocol and RDF Query Language”. It enables users to query Linked Open Data on the web, RDF triplestores, or any data source that can be mapped to RDF.

The SPARQL standard is designed and endorsed by the W3C and helps users and developers focus on what they would like to know instead of how a database is organized.

SPARQL vs SQL

Just as SQL allows users to retrieve and modify data in a relational database, SPARQL provides the same functionality for RDF graph databases (triplestores) such as Ontotext’s GraphDB.

In addition, a SPARQL query can be executed on any database that can be viewed as RDF through middleware. For example, a relational database can be queried with SPARQL by using Relational Database to RDF (RDB2RDF) mapping software.

Like SQL, SPARQL also supports computation, filtering, aggregation and subqueries, which makes it a powerful language in its own right.

In contrast to SQL, SPARQL queries are not constrained to working within one database: federated queries can access multiple data stores (endpoints). This is technically possible because SPARQL is more than just a query language. It is also an HTTP-based protocol, so any SPARQL endpoint can be accessed via a standardized transport layer. Query results can be returned in several data-interchange formats, and RDF entities are identified by Uniform Resource Identifiers (URIs).
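To make the protocol side more concrete, here is a minimal sketch of how a client might prepare a query request. It assumes the "query via GET" style of the SPARQL Protocol, where the query text travels in a `query` URL parameter and the `Accept` header asks for JSON results. The endpoint URL and the helper name `build_query_request` are made up for illustration, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

def build_query_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol 'query via GET' request."""
    # The query text is URL-encoded into the 'query' parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON serialization of SELECT/ASK results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_query_request(ENDPOINT, QUERY)
print(request.get_method())
print(request.full_url)
```

Because every endpoint accepts the same kind of request, the same helper could in principle be pointed at a different endpoint URL without other changes.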

Identifying data with URIs allows it to be unambiguously referenced across applications and overcomes the constraints posed by local search. Consequently, additional application-specific APIs can be developed that refer to that data.

These design choices – enabling queries over distributed sources on non-uniform data – are not accidental. SPARQL is designed to enable Linked Data for the Semantic Web. Its goal is to enrich data by linking it to other global semantic resources, thus sharing, merging and reusing data in a more meaningful way.

As a result, the power of SPARQL together with the flexibility of RDF can lower development costs by making it easier to merge results from multiple data sources.

SPARQL from Within

SPARQL sees your data as a directed, labeled graph that is internally expressed as triples consisting of subject, predicate and object.

Correspondingly, a SPARQL query consists of a set of triple patterns in which each element (the subject, predicate and object) can be a variable (wildcard). Solutions to the variables are then found by matching the patterns in the query to triples in the dataset.

SPARQL has four types of queries. It can be used to:

  1. ASK whether there is at least one match of the query pattern in the RDF graph data;
  2. SELECT all or some of those matches in a tabular form (including aggregation, sampling and pagination through OFFSET and LIMIT);
  3. CONSTRUCT an RDF graph by substituting the variables in these matches in a set of triple templates;
  4. DESCRIBE the matches found by constructing a relevant RDF graph.
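The first three query forms can be imitated over a small in-memory list of triples. This is only a toy sketch to show how the forms differ in what they return. The data, the predicate names and the helpers `solutions`, `ask`, `select` and `construct` are assumptions made for illustration, not part of any SPARQL library. DESCRIBE is left out because what counts as a relevant description is decided by the query service.

```python
# Toy data: each triple is (subject, predicate, object).
GRAPH = [
    ("John", "hasSon", "Bob"),
    ("John", "hasSon", "Michael"),
]

def solutions(pattern, graph):
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    found = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            found.append(binding)
    return found

def ask(pattern, graph):
    """ASK: is there at least one match?"""
    return bool(solutions(pattern, graph))

def select(variables, pattern, graph, limit=None, offset=0):
    """SELECT: a table of the chosen variables, paged with OFFSET and LIMIT."""
    rows = [tuple(b[v] for v in variables) for b in solutions(pattern, graph)]
    end = None if limit is None else offset + limit
    return rows[offset:end]

def construct(template, pattern, graph):
    """CONSTRUCT: fill a triple template with each solution's bindings."""
    return [tuple(b.get(term, term) for term in template)
            for b in solutions(pattern, graph)]

PATTERN = ("?parent", "hasSon", "?child")
print(ask(PATTERN, GRAPH))                                    # True
print(select(["?child"], PATTERN, GRAPH, limit=1, offset=1))  # [('Michael',)]
print(construct(("?child", "hasParent", "?parent"), PATTERN, GRAPH))
# [('Bob', 'hasParent', 'John'), ('Michael', 'hasParent', 'John')]
```

Row order here simply follows the order of the list. A real endpoint gives no ordering guarantee unless the query asks for one with ORDER BY.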

The leading semantic graph databases that support SPARQL have intuitive SPARQL editors with autocomplete, explorer and many other features that facilitate building powerful SPARQL queries.

SPARQL by Example

The greatest strength of SPARQL is navigating relationships in RDF graph data through graph pattern matching. In this process, simple patterns can be combined into more complex ones, which explore more elaborate relationships in the data.

The relationships can be explored by using basic patterns, pattern joins, unions, by adding optional patterns that may extend the information about the found solutions, etc. In addition, property paths allow sequential composition (sequencing), parallel composition (alternatives), iterations (Kleene star), inversion, etc.
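The property path operators can be pictured as operations on sets of (subject, object) pairs. The sketch below is a toy model under that assumption, with made-up data and helper names. It implements one-or-more repetition (`+` in SPARQL) rather than the Kleene star, which would additionally connect every node to itself.

```python
# Toy data: each triple is (subject, predicate, object).
GRAPH = [
    ("John", "hasSon", "Bob"),
    ("John", "hasDaughter", "Anna"),
    ("Bob", "hasSon", "Sam"),
    ("Anna", "hasSon", "Timmy"),
]

def step(predicate):
    """Pairs (subject, object) connected by a single predicate."""
    return {(s, o) for s, p, o in GRAPH if p == predicate}

def sequence(a, b):
    """a / b : follow a, then b."""
    return {(x, z) for x, y in a for y2, z in b if y == y2}

def alternative(a, b):
    """a | b : follow either a or b."""
    return a | b

def inverse(a):
    """^a : follow a backwards."""
    return {(y, x) for x, y in a}

def one_or_more(a):
    """a+ : follow a one or more times (transitive closure)."""
    closure = set(a)
    while True:
        extended = closure | sequence(closure, a)
        if extended == closure:
            return closure
        closure = extended

child = alternative(step("hasSon"), step("hasDaughter"))
print(sorted(sequence(child, step("hasSon"))))
# [('John', 'Sam'), ('John', 'Timmy')]
print(sorted(inverse(step("hasDaughter"))))
# [('Anna', 'John')]
print(sorted(o for s, o in one_or_more(child) if s == "John"))
# ['Anna', 'Bob', 'Sam', 'Timmy']
```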

As already mentioned, the basic graph pattern consists of a triple in which each element (subject, predicate and object) can be a variable (wildcard).

Let’s see an example.

The pattern ‘John’ (a subject)->‘has son’ (a predicate)->X (a wildcard object) will have as a solution each triple in the RDF graph that matches the subject and the predicate, and has any object.

So if John has two sons – Bob and Michael, the triples ‘John’->‘has son’->‘Bob’ and ‘John’->‘has son’->‘Michael’ will be the results of the SPARQL query.
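The John example can be sketched in a few lines of Python. This is a toy matcher over an in-memory list of triples, not how a triplestore evaluates queries. The function name `match_pattern` and the spelling of the predicates are assumptions made for illustration.

```python
# Toy data: each triple is (subject, predicate, object).
GRAPH = [
    ("John", "hasSon", "Bob"),
    ("John", "hasSon", "Michael"),
    ("John", "hasDaughter", "Anna"),
]

def match_pattern(pattern, graph):
    """Return one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    solutions = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within the pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions

# Rough SPARQL equivalent (prefix declarations omitted):
#   SELECT ?x WHERE { :John :hasSon ?x }
sons = match_pattern(("John", "hasSon", "?x"), GRAPH)
print(sons)  # [{'?x': 'Bob'}, {'?x': 'Michael'}]
```

Any position can hold a variable, so the same function also answers a pattern such as `("?s", "hasDaughter", "?o")`.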

A SPARQL query can also express a union of alternative graph patterns. Any solution to at least one of the patterns is a solution of the union.

For example, the union of the patterns ‘John’->‘has son’->X and ‘John’->‘has daughter’->X will have as solutions all of John’s sons and all of John’s daughters.
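As a rough sketch of the union just described, each alternative can be evaluated on its own and the solution lists concatenated. The data and the helper names `objects_of` and `union` are made up for illustration.

```python
# Toy data: each triple is (subject, predicate, object).
GRAPH = [
    ("John", "hasSon", "Bob"),
    ("John", "hasSon", "Michael"),
    ("John", "hasDaughter", "Anna"),
    ("Mary", "hasSon", "Tom"),
]

def objects_of(subject, predicate, graph):
    """Solutions for the pattern subject -> predicate -> ?x."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def union(*solution_lists):
    """A union keeps every solution of every alternative."""
    return [solution for solutions in solution_lists for solution in solutions]

# Rough SPARQL equivalent (prefix declarations omitted):
#   SELECT ?x WHERE { { :John :hasSon ?x } UNION { :John :hasDaughter ?x } }
children = union(
    objects_of("John", "hasSon", GRAPH),
    objects_of("John", "hasDaughter", GRAPH),
)
print(children)  # ['Bob', 'Michael', 'Anna']
```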

Figure: Union of two patterns

A group graph pattern is a join of two (or more) basic graph patterns. Unlike the union, it requires that both (or all) patterns are matched. So a join of ‘John’->‘has son’->Y and Y->‘has son’->Z will have as matching solutions the sons of John’s sons.

Figure: John has son X ∪ John has daughter Y

The sons of John’s daughters, however, will not be returned because the first basic pattern in the query, namely ‘John’->‘has son’->Y, will not be matched by a triple in the data such as ‘John’->‘has daughter’->‘Anna’.

So even if ‘Anna’->‘has son’->‘Timmy’ is in the data, Timmy will not show up as a solution of the above join. Luckily, an alternative graph pattern and a group graph pattern can easily be combined. So a union of ‘John’->‘has son’->Y and ‘John’->‘has daughter’->Y grouped with Y->‘has son’->Z will find all of John’s grandsons.
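The difference between the plain join and the union grouped with a join can be sketched as nested loops over a toy graph. The grandson ‘Sam’ and the helper names are assumptions added so that both branches have something to return.

```python
# Toy data: each triple is (subject, predicate, object).
GRAPH = [
    ("John", "hasSon", "Bob"),
    ("John", "hasDaughter", "Anna"),
    ("Bob", "hasSon", "Sam"),
    ("Anna", "hasSon", "Timmy"),
]

def objects_of(subject, predicate, graph):
    """Solutions for the pattern subject -> predicate -> ?x."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def grandsons(child_predicates, graph):
    """Pairs (?y, ?z) where John -> one of child_predicates -> ?y and ?y -> hasSon -> ?z."""
    results = []
    for predicate in child_predicates:                # union of alternatives for ?y
        for y in objects_of("John", predicate, graph):
            for z in objects_of(y, "hasSon", graph):  # join on the shared ?y
                results.append((y, z))
    return results

# Join only (prefix declarations omitted):
#   { :John :hasSon ?y . ?y :hasSon ?z }
print(grandsons(["hasSon"], GRAPH))
# [('Bob', 'Sam')]

# Union grouped with the join:
#   { { :John :hasSon ?y } UNION { :John :hasDaughter ?y } ?y :hasSon ?z }
print(grandsons(["hasSon", "hasDaughter"], GRAPH))
# [('Bob', 'Sam'), ('Anna', 'Timmy')]
```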

Figure: John has son Y ∪ John has daughter Y

Extensions of SPARQL

SPARQL is not just a query language, but a comprehensive set of specifications. SPARQL Update defines operations to delete data, insert data and manipulate graphs. The SPARQL Protocol defines how to access SPARQL endpoints, and companion specifications define the result formats. The language can also be extended to take advantage of the specifics of various data types.
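The insert and delete operations of SPARQL Update behave like set operations on the graph. The sketch below models that with a Python set. The helper names mirror the INSERT DATA and DELETE DATA keywords but are otherwise made up, and the data is a toy example.

```python
# A graph is a set of triples, so duplicates cannot occur.
graph = {("John", "hasSon", "Bob")}

def insert_data(graph, triples):
    """Like INSERT DATA: add ground triples; triples already present change nothing."""
    graph.update(triples)

def delete_data(graph, triples):
    """Like DELETE DATA: remove ground triples; triples not present are ignored."""
    graph.difference_update(triples)

insert_data(graph, [("John", "hasSon", "Michael"), ("John", "hasSon", "Bob")])
delete_data(graph, [("John", "hasSon", "Bob"), ("John", "hasSon", "Nobody")])
print(sorted(graph))  # [('John', 'hasSon', 'Michael')]
```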

Standardized extensions include GeoSPARQL for querying geospatial data. Custom extensions supported by GraphDB include full-text search, queries against external full-text and faceting engines (Lucene, Solr, Elasticsearch), RDFRank for ordering, SPARQL MM for multimedia, etc.

SPARQL-Star

RDF triplestores have often been criticized because they don’t allow descriptions or properties to be attached to the edges in the graph. (When a set of triples is joined together, the triples form a natural graph in which the predicates are interpreted as edges and the subjects and objects as nodes.) This is perceived by some as a disadvantage compared to Property Graphs. However, this concern has been addressed with RDF-Star (abbreviated RDF*), which allows one to make statements about other statements and in this way attach metadata to the edges in the graph.

SPARQL has been extended accordingly with SPARQL-star (abbreviated SPARQL*) to accommodate the RDF* additions to the RDF model, i.e. to allow querying of metadata about edges in the graph.
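One way to picture a statement about a statement is a triple whose subject is itself a triple. The sketch below models that with nested Python tuples. The predicates `statedBy` and `since`, their values and the helper `annotations` are invented for illustration, and the query in the comment only indicates the general shape of a SPARQL-star pattern.

```python
# The edge we want to annotate.
STATEMENT = ("John", "hasSon", "Bob")

# A quoted triple is modeled as a nested tuple in subject position.
GRAPH = [
    STATEMENT,
    (STATEMENT, "statedBy", "FamilyRegister"),
    (STATEMENT, "since", "1990"),
]

def annotations(statement, graph):
    """Metadata on one edge: predicate -> object for triples whose subject is that edge."""
    return {p: o for s, p, o in graph if s == statement}

# Rough SPARQL-star shape (prefix declarations omitted):
#   SELECT ?p ?o WHERE { << :John :hasSon :Bob >> ?p ?o }
print(annotations(STATEMENT, GRAPH))
# {'statedBy': 'FamilyRegister', 'since': '1990'}
```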

Why use SPARQL?

As you can see, a wide variety of graph patterns can be matched through SPARQL queries, which reflects the variety of the data that SPARQL was designed to query. As a result, SPARQL can efficiently extract information hidden in non-uniform data that is stored in various formats and sources.

As the inventor of the World Wide Web, the creator and advocate of the Semantic Web and W3C Director, Sir Tim Berners-Lee, puts it:

“Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL. SPARQL makes it possible to query information from databases and other diverse sources in the wild, across the Web.”
