In the previous blog posts of this series, we talked about how to ingest data from different sources into GraphDB, validate it, and infer new knowledge from existing facts, as well as how to adapt and scale our basic solution.
But a successful system eventually has to “graduate” from internal development and be used in the wider world. The LAZY system from our previous blog posts is at the threshold of that important step, which is great. However, interacting with others requires the right tooling, and since LAZY isn’t big enough to set the rules, we can’t expect other systems to always adapt to it.
So, let’s see what can be done here.
Data is interconnected, like some sort of worldwide web of knowledge. No matter how great the LAZY system is, it cannot possibly hold all the relevant data. Inevitably, a competitor will arise, and LAZY will gain some of their clients and lose some of its own: it’s a free market, after all. Or there will be a call to integrate a third party’s system, for example, financial auditing software that verifies that the materials used are the ones paid for. Unfortunately for them, not everything runs on RDF. Translation software can always be written, but let’s first look at what’s already available.
That data connectivity problem was conceived and addressed a long time ago: in 2013, actually, with SPARQL 1.1. SPARQL 1.1 introduced federated queries, which make cross-database querying easy. Fetching data from a remote repository is as simple as declaring its address under the SERVICE keyword. What GraphDB brings to the table are two extra features: internal federation and FedX.
Internal federation is a way to skip the HTTP overhead when performing federation requests to a repository that is within the same GraphDB installation. LAZY, for example, may deploy three repositories for each customer: an inspections repository, a standards repository and a geolocations repository.
Using internal federation, requests to those repositories can be much faster:
Fetch a building, get its country and the relevant country standard
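A sketch of what such an internal-federation query could look like. The repository names follow the example above, while the lazy: vocabulary and property names are illustrative assumptions:

```sparql
PREFIX lazy: <http://lazy.org/compliance#>

SELECT ?building ?country ?standard WHERE {
    # Local "inspections" repository: the building under inspection
    ?building a lazy:Building .
    # Repositories in the same GraphDB installation can be addressed
    # via the repository: scheme, skipping the HTTP overhead
    SERVICE <repository:geo> {
        ?building lazy:locatedIn ?country .
    }
    SERVICE <repository:standards> {
        ?country lazy:hasStandard ?standard .
    }
}
```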
FedX is exciting, because it allows us to integrate resources from several different sites seamlessly. Usually, we’d have to type the SERVICE keyword. Even worse, we would need to use basic authentication if the endpoint is secured and not internal. FedX does away with all that. Suppose that the “standards” repository is external and also password protected. And the same applies to “geo”. No problem, we can set up the “federated” repository that joins our local “inspections” repository with the other two repositories and query it directly.
Simpler query which does the same thing with FedX
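With FedX, the same query could be run directly against the combined “federated” repository, with no SERVICE clauses and no credentials in the query itself. A sketch, again with an assumed lazy: vocabulary:

```sparql
PREFIX lazy: <http://lazy.org/compliance#>

# Executed against the "federated" repository: FedX decides which
# member repository answers which triple pattern.
SELECT ?building ?country ?standard WHERE {
    ?building a lazy:Building ;
              lazy:locatedIn ?country .
    ?country lazy:hasStandard ?standard .
}
```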
FedX has its limits, though. When repositories have similar schemas and use the same namespaces, the federation engine may get confused as to which repository to send which part of the query. It is also naturally slower than native queries against a single GraphDB repository.
There’s a lot of data stored in relational databases. How do we interact with it? That’s another question that has a ready answer. Once again, the answer involves a mapping language: in this case OBDA, the Ontop-native mapping language.
OBDA allows you to define how your relational data would map to RDF. Suppose that LAZY has a legacy system with lots of inspections in it. Then, there would be a mapping file that specifies the translation process.
```
[PrefixDeclaration]
lazy:    http://lazy.org/compliance

[MappingDeclaration] @collection [[
mappingId    inspection
target       lazy:inspection/report/{report_id} a lazy:Report ; lazy:date {at_date}^^xsd:dateTime .
source       SELECT * FROM "inspections1"."inspection" WHERE "completed" = '1'
]]
```
Mapping the data and report ID to RDF
Besides this, LAZY would also need to provide the relevant JDBC driver for its database. Then it can use federation to access that repository from any other place, or even materialize the data as native RDF. Either way, the query is executed at the relational database itself, avoiding overhead in GraphDB.
As we’ve already lamented, the world does not run on RDF. There are many other data formats. One of the popular ways to query data is GraphQL. The great benefits of GraphQL are that it’s easy (compared to SPARQL, anyway), you only get what you request, it has a ready federation solution and there are many services implementing it.
When you venture into answering queries over RDF data with GraphQL, you need look no further than Ontotext Platform. Ontotext Platform allows you to define a simple model of your data – or to generate it from your pre-existing ontology. This model would contain a number of objects such as Report, Drone, Inspection, Building, etc. Each object would have a number of properties – for example, Report.date, Building.location, etc. Then, you can use GraphQL to query those properties, with powerful filtering capabilities.
Get all buildings built after 1990 that are not taller than 100 meters
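A hedged sketch of what that query might look like. The field names (builtOn, height) and the exact shape of the filter argument are assumptions based on the model described above, not the Platform’s verbatim syntax:

```graphql
{
  building(where: {
    and: [
      { builtOn: { GT: "1990-01-01" } },
      { height: { LTE: 100 } }
    ]
  }) {
    id
    builtOn
    height
  }
}
```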
What’s most important here is that you can also federate with external sources. You can use the extend and external flags to define a property as something that needs to be fetched from an external source. Suppose that LAZY has to integrate with another inspection software’s data. They also have a GraphQL type called “Building”. So, LAZY (or their partners) would have to declare that their “Building” extends the base type. Then, the federation service query resolvers would mix the data sources and provide the customer with a united answer.
Extend the base Building object with the builtOn property. Specify that inspectionRating is provided externally.
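A sketch of such a schema declaration, assuming Apollo-style federation directives; the field names and the key field are illustrative:

```graphql
# Declared by LAZY's service: extend the partner's base Building type.
extend type Building @key(fields: "id") {
  id: ID! @external
  # Added by LAZY's own data
  builtOn: Date
  # Resolved by the partner's inspection service
  inspectionRating: Float @external
}
```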
There are many other benefits of the Platform such as RBAC security at the Object level or Elasticsearch integration with Semantic Search, but for now, LAZY is not interested in them.
Federation of different data sources is a tough subject. With Ontotext’s products, LAZY is firmly on the way towards finding a solution.
Data analytics can be a bit elitist. After all, it requires you to know specific query languages, sometimes even tools. This is why, when we move towards the business side, often some kind of visualization is required. Why have all that wonderful data if only a few people can navigate around it?
GraphDB is not primarily a data visualization tool. That does not mean it cannot visualize data. Quite the contrary, the Workbench offers some excellent capabilities for exploring your data visually. However, that visualization still adheres to the RDF model and isn’t very flexible: LAZY wouldn’t be able to overlay the locations of its buildings and drones on a map of the area, for example. Fortunately, GraphDB and the Platform are very versatile when it comes to data access, so a dedicated visualization tool can be plugged in. The time is then spent on perfecting the visualization, not on wrangling the infrastructure and hoping it connects.
When visualizing directly from SPARQL results, you can use the trusty Jupyter notebooks. Jupyter usually isn’t considered very visually appealing, and it has a somewhat steep learning curve. However, as it directly utilizes Python and SPARQL, it can be really powerful.
Visualization with SPARQLWrapper
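The original example used the SPARQLWrapper library; here is a dependency-free sketch using only the Python standard library. The endpoint URL and the lazy: latitude/longitude properties are assumptions:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical GraphDB repository endpoint
ENDPOINT = "http://localhost:7200/repositories/inspections"

QUERY = """
PREFIX lazy: <http://lazy.org/compliance#>
SELECT ?building ?lat ?long WHERE {
    ?building a lazy:Building ;
              lazy:latitude ?lat ;
              lazy:longitude ?long .
}
"""

def fetch_coordinates(endpoint: str = ENDPOINT) -> list[tuple[float, float]]:
    """Run the SELECT query and return (lat, long) pairs for plotting."""
    data = urllib.parse.urlencode({"query": QUERY}).encode()
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    return [
        (float(row["lat"]["value"]), float(row["long"]["value"]))
        for row in results["results"]["bindings"]
    ]

# The coordinate pairs can then be drawn on a map,
# e.g. with folium or matplotlib:
# points = fetch_coordinates()
```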
The end result – buildings on a map using latitude and longitude
If Python and Jupyter’s rough default looks are not to a customer’s liking, LAZY can immediately turn to something more user-friendly such as Kibana visualizations. Kibana is built on Elasticsearch and offers a simple configuration mechanism. The Elasticsearch requirement is not a problem. GraphDB Enterprise comes with an Elasticsearch connector and the Platform can create it automatically from a SOML description.
Using SPARQL to create an Elasticsearch index (this can also be done through the Workbench UI)
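A sketch of such a connector-creation update, following the general pattern of GraphDB’s connector SPARQL API. The Elasticsearch node address, the indexed type and the field definitions are all illustrative assumptions:

```sparql
PREFIX es: <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX es-inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>

INSERT DATA {
    es-inst:buildings es:createConnector '''
{
  "elasticsearchNode": "localhost:9200",
  "types": ["http://lazy.org/compliance#Building"],
  "fields": [
    {"fieldName": "location", "propertyChain": ["http://lazy.org/compliance#location"]},
    {"fieldName": "height",   "propertyChain": ["http://lazy.org/compliance#height"]}
  ]
}
''' .
}
```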
Once you have that, Kibana dashboards are only a small step away.
A Kibana Dashboard map
Custom visualizations, of course, are always possible. Between SPARQL, GraphQL, Elasticsearch (or Solr, or Lucene) and even SQL access, GraphDB and the Platform can expose LAZY’s data in many formats that are sure to satisfy even the most demanding analyst.
The final LAZY Architecture that integrates with third party services
Just like most enterprises, LAZY started out relatively humble. Initially, its goal was simply to reduce the strain on building inspectors by automating some of their tasks. But the user base and revenue hopefully grow over time, and so do the requirements. At first, it’s as simple as ingesting data from different sources. Quite soon, however, every growing system has to face the questions that LAZY faced. How to scale? How to be resilient? How to play well with others? It is a scenario that plays out on a grand scale in enterprises around the globe.
With GraphDB and Ontotext Platform, LAZY has found a solid foundation. Ontotext can be its one-stop shop for everything that relates to knowledge graphs. From ontologies through inference, virtualization and all the way to visualizations, there is a tool in Ontotext’s tool belt that can help.
What’s more, this solution can be reapplied. By deploying with Helm and Kubernetes, LAZY has come to an architecture that can work on any infrastructure. With it, surveyors can make the right decisions at a high pace, keeping up with the requirements of institutions and private customers. That’s the power of a knowledge graph that is not only properly constructed, but also well-deployed and well-visualized.
Ontotext’s GraphDB: give it a try today!