Read about one of the key new features of GraphDB 8.3 that helps users create visualizations of RDF data with different starting points.
With data visualization, many complex structures and relationships have the potential to become an easy-to-grasp message that speaks to a wider audience, helping them understand data and the nature of the connections they exist in.
It is just like a few years ago when Jean-Baptiste Michel and Erez Lieberman Aiden showed us that a picture is not worth a thousand words but rather 500 billion words (see: What we learned from 5 million books).
In this blog post, in a few easy steps, we will transform your tabular data into Linked Data and we will visualize the connections. This way we will find out how the pieces of your data relate to each other and what is the nature of these relationships.
For the demo, we will use a simple table with data about movies – a cvs file with information about each movie, the actors starring in it and its director.
Using GraphDB’s powerful feature OntoRefine, we will convert this csv data into a graph. Once converted into RDF, the tabular data about the movies, actors, directors and the relationships between them will be easier to navigate and explore. It is visual graph exploration that will help us delve quickly into the data as it is.
If you are not familiar with our OntoRefine feature, OntoRefine enables you to transform legacy tabular data to RDF and directly load it in your GraphDB repository, using simple SPARQL queries and a SPARQL virtual endpoint.
Now, let’s do the magic step by step.
First, you need to create a new repository “movies”: Setup -> Repositories -> Create new repository.
Next, you need to put data inside. To do that, go to Import -> Tabular (OntoRefine) -> Create project and upload the csv file. movie_metadata_small.csv
Click Next to load the data into OntoRefine and then Create Project.
OntoRefine uses the first column to assign row identifiers. Since the first column is country, its values are not suitable for identifiers. We solve this by moving the title column to the front, and in this way OntoRefine will use it to generate row ids. Go to movie_title column -> Edit column -> Move column to the beginning.
Nice, we have identifiers for each row now.
In RDF, everything is IRI and in order to connect pieces of data, we need to convert each entity to IRI. Later, we will create IRI’s for each author and director from their names but now we need to remove the spaces since they are not valid IRI symbols.
Go to the column title -> Edit column -> Add column based on this column and remove the spaces in the values. Name the column actor_1.
Do the same for actor_2_name and director_name. This should be the result.
Now that we have the data prepared, click on the RDF button to obtain the simplest SPARQL query. Add BaseIRI and Base IRI prefix for the IRIs in the generated query.
It serves as a basis for data RDF-ization. For each row + column + value in our tabular data we have a triple as a result in the form <row_id> mdb:<column_title> ?value.
Use SPARQL to create a Graph Data model
Having this data now as RDF, we can use the power of SPARQL to model it the way we want.
Put in the editor the following SPARQL and let us explain to you what it does.
PREFIX spif: <http://spinrdf.org/spif#> PREFIX mdb: <http://example.org/movieDB/> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> # An INSERT query that maps the raw RDF data from OntoRefine to user-specified # RDF data (different IRIs, types, property names and dates as date-typed literals) # and inserts the data into the current GraphDB repository. CONSTRUCT {#graph <http://example.org/movieDB/> { ?movieIRI a mdb:Movie ; mdb:title ?title ; mdb:directedBy ?directorIRI ; mdb:leadActor ?actor1IRI; mdb:2ndActor ?actor2IRI; mdb:country ?countryIRI. ?directorIRI a mdb:Director ; foaf:name ?directorName. ?actor1IRI a mdb:Actor; foaf:name ?actor1Name. ?actor2IRI a mdb:Actor; foaf:name ?actor2Name. ?countryIRI a mdb:Country; rdfs:label ?country. #} } where { # Uses SERVICE to fetch the raw RDF data from OntoRefine ?movieRow a mdb:Row ; mdb:rowNumber ?movieRowNumber. OPTIONAL {?movieRow mdb:director_name ?directorName} OPTIONAL {?movieRow mdb:director ?director} OPTIONAL {?movieRow mdb:actor_1_name ?actor1Name} OPTIONAL {?movieRow mdb:actor_2_name ?actor2Name} OPTIONAL {?movieRow mdb:actor_1 ?actor1} OPTIONAL {?movieRow mdb:actor_2 ?actor2} OPTIONAL {?movieRow mdb:movie_title ?title} OPTIONAL {?movieRow mdb:country ?country} bind(iri(concat("http://example.org/movieDB/movie/", spif:encodeURL(?movieRowNumber))) as ?movieIRI). bind(iri(concat("http://example.org/movieDB/person/", spif:encodeURL(?director))) as ?directorIRI). bind(iri(concat("http://example.org/movieDB/person/", spif:encodeURL(?actor1))) as ?actor1IRI). bind(iri(concat("http://example.org/movieDB/person/", spif:encodeURL(?actor2))) as ?actor2IRI). bind(iri(concat("http://example.org/movieDB/country/", spif:encodeURL(?country))) as ?countryIRI). bind(iri(?imdb) as ?imdbURL). } LIMIT 5000
Using a CONSTRUCT query, we can construct a new graph based on the results that we have.
To do this, we want to create for each column value an IRI, i.e for the OrlandoBloom string we want to create the IRI <http://example.org/movieDB/OrlandoBloom>. This way, we can link all the Orlando Bloom’s results to obtain a graph with his connections.
Once we have an IRI for each value, we construct a new RDF by introducing new predicates such as mdb:directedBy, mdb:leadActor, mdb:2ndActor. Each row has also a rowNumber we can fetch to generate IRIs for each movie based on it.
So in simple words what we do is we transform:
mdb:RowNumber -> http://example.org/movieDB/movie/<number>
mdb:Row mdb:director ?director -> http://example.org/movieDB/movie/<number> mdb:director http://example.org/movieDB/person/<director name>
mdb:Row mdb:actor_1 ?actor -> http://example.org/movieDB/movie/<number> mdb:actor_1 http://example.org/movieDB/person/<actor_name>
http://example.org/movieDB/person/<actor_name> a mdb:Actor.
http://example.org/movieDB/person/<director name> a mdb:Director. etc.
Execute the query and see the new results.
Now, we have a visual graph representation on top of the csv data. We can materialize it by importing it in our repository. Click on Data -> Open in main SPARQL editor.
A service line is added in the where clause that acts as a bridge for us to fetch the data from the csv. The magic happens here. We write SPARQL in our query editor but the results come from the csv -> rdf mapping.
We transform the query from CONSTRUCT to INSERT and remove the commented code to INSERT the data into a new graph <http://example.org/movieDB/>. Execute the query.
Now, we have an RDF model of our movies data in our repository. Let’s see how to explore the Graph.
In GraphDB, you can search for IRIs by enabling autocompletion. Go to Setup -> Autocomplete.
Now go to Explore -> Visual Graph and search for an actor for example “Charlie” and select CharlieRow
Expand the nodes further by double-clicking them or by clicking the expand icon for a node. The blue nodes are the movies and the labels that come from the IRIs as numbers are not so meaningful. But this is also configurable.
Click on the info icon for a node of a movie and you can see that each movie has a movieTitle predicate.
Copy this predicate. We can configure the graph now so that it uses the movie title value for a label of a node, not its row id. Click the Back to visual graph home icon and then on Create graph config. Choose Start with a search box for a Starting point and go to the Node basics tab.
From the sample queries, choose the second one, and add the copied predicate as an alternative label. Enter a name for this graph configuration and save it.
Now, load this new graph configuration and search for CharlieRowe again.
After loading it, we see now that the movies have meaningful titles.
And this is how a custom visual graph is born.
Custom visual graphs are very powerful. And it is with advanced visual graph configuration that we can combine the power of visual representation and SPARQL to obtain easy-to-understand graph views with exploration designed by us to match our data and our particular needs.
Interested to learn more about data visualization with GraphDB and custom visual graphs?