From Disparate Data to Visualized Knowledge Part III: The Outsider Perspective

This series of blog posts constitutes a step-by-step guide for data ingestion, inference validation and visualization with GraphDB followed by GraphQL interface setup, search and federation with Ontotext Platform.

December 3, 2021 9 mins. read Radostin Nanov

In our previous blog posts of the series, we talked about how to ingest data from different sources into GraphDB, validate it and infer new knowledge from the extant facts as well as how to adapt and scale our basic solution.

But a successful system eventually has to “graduate” out of the internal development stage and be used in the wider world. The LAZY system from our previous blog posts is at the threshold of that important step, which is great. However, if you want to interact with others, you need the tooling for it. As LAZY isn’t a big player yet, we can’t expect other systems to provide ways of interacting with it.

So, let’s see what can be done here.

Integrating with other systems – federation

Data is interconnected, like some sort of worldwide web of knowledge. No matter how great the LAZY system is, it cannot possibly hold all the relevant data. Inevitably, some competitor would arise and LAZY would gain some of their clients (and lose some of its own; it’s a free market after all). Or, there would be a call to integrate a third party’s system. For example, LAZY’s customers may want to integrate with financial auditing systems that verify that the materials used are the ones paid for. Unfortunately for them, not everything runs on RDF. Translation software can always be written, but let’s look at what’s already available.

SPARQL federation

That data connectivity problem was recognized and addressed a long time ago: in 2013, with SPARQL 1.1. SPARQL is a query language that makes cross-database queries easy. Fetching data from a remote repository is as simple as declaring its address under the SERVICE keyword. What GraphDB brings to the table are two extra features: internal federation and FedX.

Internal federation is a way to skip the HTTP overhead when performing federation requests to a repository that is within the same GraphDB installation. LAZY, for example, may deploy three repositories for each customer: an inspections repository, a standards repository and a geolocations repository.

Using internal federation, requests to those repositories can be much faster:

Fetch a building, get its country and the relevant country standard
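A query of this kind might look like the sketch below. It is built as a Python string so that the repository names can be swapped per customer. The `SERVICE <repository:...>` form is how GraphDB addresses a sibling repository for internal federation; the `lazy:` namespace and the `locatedIn` and `appliesTo` predicates are hypothetical names made up for illustration.

```python
def building_standard_query(geo_repo: str = "geo", standards_repo: str = "standards") -> str:
    """Build a query that joins the local repository with two sibling repositories.

    Intended to run against the "inspections" repository.
    The lazy: vocabulary used here is hypothetical.
    """
    return f"""
PREFIX lazy: <http://lazy.org/compliance/>

SELECT ?building ?country ?standard WHERE {{
    # Local data: the inspected buildings
    ?building a lazy:Building .

    # Sibling repositories in the same GraphDB installation, no HTTP round trip
    SERVICE <repository:{geo_repo}> {{
        ?building lazy:locatedIn ?country .
    }}
    SERVICE <repository:{standards_repo}> {{
        ?standard lazy:appliesTo ?country .
    }}
}}
""".strip()


print(building_standard_query())
```

Replacing `repository:geo` with a full endpoint URL would turn the same pattern into an ordinary remote SERVICE call, at the cost of the HTTP overhead.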

FedX is exciting, because it allows us to integrate resources from several different sites seamlessly. Usually, we’d have to type the SERVICE keyword. Even worse, we would need to use basic authentication if the endpoint is secured and not internal. FedX does away with all that. Suppose that the “standards” repository is external and also password protected. And the same applies to “geo”. No problem, we can set up the “federated” repository that joins our local “inspections” repository with the other two repositories and query it directly.

Simpler query which does the same thing with FedX
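As a rough sketch, the FedX version of the query drops the SERVICE clauses entirely and is simply sent to the “federated” repository. The example below only prepares the HTTP request, using the standard library. The host, port and the `lazy:` vocabulary are assumptions for illustration; the `/repositories/<id>` path is the usual SPARQL endpoint of a GraphDB repository.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same question as before, but without any SERVICE keyword (hypothetical vocabulary)
FEDX_QUERY = """
PREFIX lazy: <http://lazy.org/compliance/>

SELECT ?building ?country ?standard WHERE {
    ?building a lazy:Building ;
              lazy:locatedIn ?country .
    ?standard lazy:appliesTo ?country .
}
""".strip()


def sparql_request(base_url: str, repository: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL query request for one repository."""
    return Request(
        url=f"{base_url.rstrip('/')}/repositories/{repository}",
        data=urlencode({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = sparql_request("http://localhost:7200", "federated", FEDX_QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute it. Note that the query carries no addresses or credentials for the member repositories; those live in the configuration of the federated repository.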

FedX has its limits, though. When repositories have similar schemas and use the same namespaces, the federation engine may get confused as to which repository to send which part of the query. It’s also naturally slower than native queries to GraphDB.

Relational data

There’s a lot of data stored in relational databases. How to interact with these? That’s another question that has a ready answer. Once again, the answer involves a mapping language: more specifically, OBDA, the Ontop-native mapping language.

OBDA allows you to define how your relational data would map to RDF. Suppose that LAZY has a legacy system with lots of inspections in it. Then, there would be a mapping file that specifies the translation process.

[PrefixDeclaration]
lazy:        http://lazy.org/compliance/
xsd:        http://www.w3.org/2001/XMLSchema#

[MappingDeclaration] @collection [[
mappingId    inspection
target        lazy:inspection/report/{report_id} a lazy:Report ; lazy:date {at_date}^^xsd:dateTime .
source        SELECT * FROM "inspections1"."inspection" WHERE "completed" = '1'
]]

Mapping the date and report ID to RDF
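To make the translation concrete, here is what the mapping does when read by hand: every completed row becomes a report resource with a type and a date. The sketch below emulates that in plain Python. It is not how Ontop works internally (Ontop rewrites queries rather than converting rows), and it assumes that the `lazy:` prefix expands to `http://lazy.org/compliance/` with a trailing slash and that the predicate is `lazy:date`.

```python
LAZY = "http://lazy.org/compliance/"  # assumed expansion of the lazy: prefix
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def map_inspection_rows(rows):
    """Emulate the mapping by hand: one completed row becomes two triples."""
    triples = []
    for row in rows:
        # Mirrors the WHERE clause of the source query
        if str(row["completed"]) != "1":
            continue
        subject = f"<{LAZY}inspection/report/{row['report_id']}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{LAZY}Report> .")
        triples.append(f'{subject} <{LAZY}date> "{row["at_date"]}"^^<{XSD_DATETIME}> .')
    return triples


rows = [
    {"report_id": 7, "at_date": "2021-12-03T10:00:00", "completed": 1},
    {"report_id": 8, "at_date": "2021-12-04T10:00:00", "completed": 0},
]
for triple in map_inspection_rows(rows):
    print(triple)
```

Only report 7 produces triples, because report 8 is not completed yet.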

Besides this, LAZY would also need to provide a suitable JDBC driver for their database. Then they can use federation to access that virtual repository from any other place, or even materialize it as native RDF. Each query is translated to SQL and executed in the relational database itself, so the data doesn’t have to be copied into GraphDB first.

GraphQL federation

As we’ve already lamented, the world does not run on RDF. There are many other data formats. One of the popular ways to query data is GraphQL. The great benefits of GraphQL are that it’s easy (compared to SPARQL, anyway), you only get what you request, it has a ready federation solution and there are a lot of services implementing it.

When you venture into answering queries over RDF data with GraphQL, you don’t need to look further than Ontotext Platform. Ontotext Platform allows you to define a simple model of your data – or to generate it from your pre-existent ontology. This model would contain a number of objects such as Report, Drone, Inspection, Building, etc. Each object would have a number of properties – for example, Report.date, Building.location, etc. Then, you can use GraphQL to query those properties, with powerful filtering capabilities.

Get all buildings built after 1990 that are not taller than 100 meters
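A request like this could be prepared along the lines of the sketch below, which builds the JSON body of a standard GraphQL-over-HTTP POST. Treat the query itself as an approximation: the `where` argument, the `GT` and `LTE` operator names, and the `builtOn` and `height` fields are assumptions for illustration, so check the Platform documentation and your own schema for the exact filter syntax.

```python
import json

# Hypothetical schema: Building objects with builtOn (year) and height (meters)
BUILDINGS_QUERY = """
query Buildings($year: Int!, $maxHeight: Int!) {
  building(where: {builtOn: {GT: $year}, height: {LTE: $maxHeight}}) {
    id
    builtOn
    height
  }
}
""".strip()


def graphql_payload(built_after: int, max_height_m: int) -> str:
    """Serialize the query and its variables as a GraphQL POST body."""
    return json.dumps(
        {
            "query": BUILDINGS_QUERY,
            "variables": {"year": built_after, "maxHeight": max_height_m},
        }
    )


print(graphql_payload(1990, 100))
```

Keeping the thresholds in variables rather than in the query text means the same query can be reused for other years and heights.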

What’s most important here is that you can also federate with external sources. You can use the extend and external flags to define a property as something that needs to be fetched from an external source. Suppose that LAZY has to integrate with another inspection software’s data. They also have a GraphQL type called “Building”. So, LAZY (or their partners) would have to declare that their “Building” extends the base type. Then, the federation service query resolvers would mix the data sources and provide the customer with a united answer.

Extend the base Building object with the builtOn property. Specify that inspectionRating is provided externally.
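Conceptually, the “united answer” is a merge of records that describe the same entity in two services. The sketch below shows that idea in plain Python, matching records on a shared `id`. It is only an illustration of the principle, not the Platform’s or any federation gateway’s actual resolver logic, and the field values are made up.

```python
def merge_by_id(local_buildings, external_buildings):
    """Combine records from two sources that describe the same entities, matched on "id"."""
    merged = {building["id"]: dict(building) for building in local_buildings}
    for external in external_buildings:
        # Add externally provided fields to the matching record, or start a new one
        merged.setdefault(external["id"], {"id": external["id"]}).update(external)
    return list(merged.values())


local = [{"id": "building/1", "builtOn": 1995}]               # LAZY's own data
external = [{"id": "building/1", "inspectionRating": 4.5}]    # the partner's data
print(merge_by_id(local, external))
```

The customer sees one Building object with both `builtOn` and `inspectionRating`, without needing to know which service supplied which field.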

There are many other benefits of the Platform such as RBAC security at the Object level or Elasticsearch integration with Semantic Search, but for now, LAZY is not interested in them.

Federation of different data sources is a tough subject. With Ontotext’s products, LAZY is firmly on the way towards finding a solution.

Visualization tools and data access

Data analytics can be a bit elitist. After all, it requires you to know specific query languages, sometimes even tools. This is why, when we move towards the business side, often some kind of visualization is required. Why have all that wonderful data if only a few people can navigate around it?

GraphDB is not primarily a data visualization tool. That does not mean that it cannot visualize data. Quite the contrary, the Workbench offers some excellent capabilities for exploring your data visually. However, the visualization still adheres to the RDF model and it isn’t very flexible. LAZY wouldn’t be able to overlay the locations of their buildings and drones on a map of the area, for example. For that, a dedicated visualization tool is needed. Fortunately, GraphDB and the Platform are very versatile when it comes to data access, so the time would be spent on perfecting the visualization and not on wrangling the infrastructure and hoping it would connect.

When visualizing directly from SPARQL, you can use the trusty Jupyter notebooks. Jupyter is a product that usually isn’t considered very visually appealing and it has a somewhat steep learning curve. However, as it directly utilizes Python and SPARQL, it can be really powerful.

Visualization with SPARQLWrapper
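The step between the query and the map is mostly data wrangling. In a notebook, SPARQLWrapper’s `query().convert()` with the JSON return format yields a dictionary in the standard SPARQL JSON results shape; the sketch below turns such a dictionary into coordinates that a plotting library can consume. The variable names `building`, `lat` and `long` and the sample values are assumptions, and the results are hard-coded here so that the example runs without a live endpoint.

```python
def bindings_to_points(results):
    """Turn SPARQL JSON results into (label, latitude, longitude) tuples."""
    points = []
    for binding in results["results"]["bindings"]:
        try:
            points.append(
                (
                    binding["building"]["value"],
                    float(binding["lat"]["value"]),
                    float(binding["long"]["value"]),
                )
            )
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
    return points


# Stand-in for what a SPARQL endpoint would return (made-up data)
sample = {
    "head": {"vars": ["building", "lat", "long"]},
    "results": {
        "bindings": [
            {
                "building": {"type": "uri", "value": "http://lazy.org/compliance/building/1"},
                "lat": {"type": "literal", "value": "42.6977"},
                "long": {"type": "literal", "value": "23.3219"},
            },
            # A building without coordinates is left out of the map
            {"building": {"type": "uri", "value": "http://lazy.org/compliance/building/2"}},
        ]
    },
}

print(bindings_to_points(sample))
```

From here, each tuple can be handed to whichever mapping or charting library the notebook already uses.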


The end result – buildings on a map using latitude and longitude

If Python and Jupyter’s rough default looks are not to a customer’s liking, LAZY can immediately turn to something more user-friendly such as Kibana visualizations. Kibana is built on Elasticsearch and offers a simple configuration mechanism. The Elasticsearch requirement is not a problem. GraphDB Enterprise comes with an Elasticsearch connector and the Platform can create it automatically from a SOML description.

Using SPARQL to create an Elasticsearch index (this can also be done through the Workbench UI)
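A connector is created by inserting a small JSON configuration through a SPARQL update. The sketch below generates such an update from Python. The prefixes and the `elasticsearchNode`, `types`, `fields`, `fieldName` and `propertyChain` options follow the GraphDB connector documentation as best recalled here, so verify them against your GraphDB version; the index name, node address, type and property IRIs are purely illustrative.

```python
import json


def create_connector_update(index_name, es_node, rdf_type, fields):
    """Build a SPARQL update that asks GraphDB to create an Elasticsearch connector.

    `fields` maps an index field name to the RDF property it is read from.
    """
    config = {
        "elasticsearchNode": es_node,
        "types": [rdf_type],
        "fields": [
            {"fieldName": name, "propertyChain": [prop]}
            for name, prop in fields.items()
        ],
    }
    return (
        "PREFIX elastic: <http://www.ontotext.com/connectors/elasticsearch#>\n"
        "PREFIX elastic-index: <http://www.ontotext.com/connectors/elasticsearch/instance#>\n"
        "INSERT DATA {\n"
        f"  elastic-index:{index_name} elastic:createConnector '''\n"
        f"{json.dumps(config, indent=2)}\n"
        "''' .\n"
        "}"
    )


print(
    create_connector_update(
        "buildings",
        "localhost:9200",
        "http://lazy.org/compliance/Building",  # hypothetical class
        {"name": "http://www.w3.org/2000/01/rdf-schema#label"},
    )
)
```

Generating the configuration with `json.dumps` avoids hand-editing JSON inside a SPARQL string, which is where most quoting mistakes creep in.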

Once you have that, Kibana dashboards are only a small step away.

A Kibana Dashboard map

Custom visualizations, of course, are always possible. Between SPARQL, GraphQL, Elasticsearch (or Solr, or Lucene) and even SQL access, GraphDB and the Platform can expose LAZY’s data in many formats that are sure to satisfy even the most demanding analyst.

Conclusion


The final LAZY Architecture that integrates with third party services

Just like most enterprises, LAZY started out relatively humble. Initially, its goal was simply to reduce the strain on building inspectors by automating some of their tasks. But the user base and revenue hopefully grow over time, and so do the requirements. At first, it is as simple as ingesting data from different sources. Quite soon, however, every enterprise has to face the questions that LAZY faced. How to scale? How to be resilient? How to play well with others? It is a scenario that plays out on a grand scale in most enterprises around the globe.

With GraphDB and Ontotext Platform, LAZY has found a solid foundation. Ontotext can be its one-stop shop for everything that relates to knowledge graphs. From ontologies through inference, virtualization and all the way to visualizations, there is a tool in Ontotext’s tool belt that can help.

What’s more, this solution can be reapplied. By deploying with Helm and Kubernetes, LAZY has come to an architecture that can work on any infrastructure. With it, surveyors can make the right decisions at a high pace, keeping up with the requirements of institutions and private customers. That’s the power of a knowledge graph that is not only properly constructed, but also well-deployed and well-visualized.


Solution/System Architect at Ontotext

Radostin Nanov has a MEng in Computer Systems and Software Engineering from the University of York. He joined Ontotext in 2017 and progressed through many of the company's teams as a software engineer working on the Ontotext Cognitive Cloud, GraphDB and finally Ontotext Platform before settling into his current role as a Solution Architect in the Knowledge Graph Solutions team.
