
RDF-Star: Metadata Complexity Simplified

Next-generation data and content management technology relies heavily on diverse, complex and dynamic metadata. Knowledge graphs, backed by translytical graph databases and formal semantics, offer the solution. RDF-Star brings the simplicity and usability of property graphs without sacrificing the essential semantics that enable correct interpretation and diligent management of the data.

June 11, 2021 · 9 min read · Jarred McGinnis

There are no easy answers in life or in Information Architecture. Design decisions come with tradeoffs. Relational databases (RDBMS) have been the workhorse of ICT for decades. Being able to sit down and define a complete schema, a blueprint of the database, gave everyone assurance and consistency. Sure, you have to ignore the edge cases and hope that they stay edge cases. And yeah, the real-world relationships among the entities represented in the data had to be fudged a bit to fit the counterintuitive model of tabular data, but, in trade, you got reliability and speed. Surely, business requirements don’t change over time, right?

Graph Databases vs Relational Databases

The simple, transactional data that relational databases handle well increasingly fails to reflect the hyper-connected, dynamic needs of today’s business environment. Ironically, relational databases only imply relationships between data points by whichever row or column they happen to occupy. In graph databases, representing relationships as data makes it possible to better represent data in real time, addressing newly discovered types of data and relationships. Relational databases benefit from decades of tweaks and optimizations that deliver performance. However, when it comes to queries over large and highly interconnected master data, the performance is solidly in favour of graph databases like GraphDB. This is why data-driven companies such as the FAANGs, global pharma brands and the financial industry switched to graph databases long ago.

For instance, the analysis of M&A transactions in order to derive investment insights requires the raw transaction data, in addition to information on the relationships of the companies involved in these transactions, e.g. subsidiaries, joint ventures, investors or competitors. This is a graph of millions of edges and vertices – in enterprise data management terms it is a giant piece of master/reference data. Now consider that transaction data is dynamic (thousands of equity transactions take place daily) and, to further complicate the scenario, the reference data is dynamic too, as transactions often imply new relationships in the company graph. To handle such scenarios you need a translytical graph database – a database engine that can deal with both frequent updates (OLTP workload) and graph analytics (OLAP).

Not Every Graph is a Knowledge Graph: Schemas and Semantic Metadata Matter


In order to have a competitive advantage in dynamic environments, enterprises need to enhance their proprietary information using global knowledge as context for interpretation and as a source for enrichment. They should be able to continuously integrate data across multiple internal systems and link it to data from external sources. To be able to automate these operations and maintain sufficient data quality, enterprises have started implementing so-called data fabrics, which employ diverse metadata sourced from different systems.

“The ability of the data fabric to continuously find, integrate, catalog, and share all forms of metadata: It should be able to do this across all environments, including hybrid and multicloud platforms, and at the edge. This metadata should then be represented, along with its intricate relationships, in a connected knowledge graph model that can be understood by the business teams”

Further, “ML-Augmented data integration is making active metadata analysis and semantic knowledge graphs pivotal parts of the data fabric”

Gartner, ‘Data Fabrics Add Augmented Intelligence to Modernize Your Data Integration’, Ehtisham Zaidi, Eric Thoo, Guido De Simoni, Mark Beyer, December 17, 2019.

The vital added value of KGs is the paradigm of using ontologies – explicit formal conceptual models – to provide consistent, unified access to data scattered across different systems. The key characteristic is that ontologies capture, integrate and operationalize knowledge across several disciplines and types of systems:

  • Database schemas;
  • Master and reference data, critical in enterprise data management;
  • Taxonomies and controlled vocabularies for content and knowledge management;
  • Scientific data, particularly important in pharma and healthcare;
  • Product catalogues and complex technical and configuration knowledge;
  • All sorts of metadata – from provenance to usage logs and access rights.

If you want to solve interesting problems beyond basic data analytics, you are going to need formal semantics, and that means schemas. Schemas are powerful. They create reliable, consistent and communicable models for representing data, and they provide meaning. Contrast this with the world of relational databases, where meaning is tacit and relies on a costly database architect to define the entire model a priori, hoping they make no mistakes and that the model never needs adapting in production.

The advantage of knowledge graphs over a relational database is that the schema is data too. It can be queried. It can also be modified as business needs require. Knowledge graphs use ontologies as semantic schemas in order to accommodate all the above types of knowledge in a way that allows both human experts and computers to understand and interpret them in an unambiguous manner.
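Because the schema lives in the graph alongside the data, it can be queried like any other data. The following sketch uses a hypothetical `ex:` namespace and assumes the ontology declares domains and ranges with RDFS:

```sparql
# A sketch of querying the schema itself: list every property
# together with its declared domain and range.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

SELECT ?property ?domain ?range
WHERE {
  ?property rdfs:domain ?domain ;
            rdfs:range  ?range .
}
```

The same SPARQL endpoint answers questions about instances and about the model, which is exactly what a rigid relational schema cannot do.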

The addition of formal semantics to the data model has a number of advantages:

  • It creates the ability to align with other schemas and reliably query diverse datasets. For example, with semantics, it’s possible to get results from a query where one schema uses ‘parentOf’ and another uses ‘childOf’ or ‘relativeOf’.
  • Using globally unique identifiers (URIs) for both types and instances further clarifies meaning and interoperability across the web.
  • With formal semantics, inference can provide answers to complex queries that are not explicitly stated in the data.
  • It enables a more robust validation of graph consistency and quality compared to property graphs.
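The schema-alignment point above can be made concrete with a small sketch. Assuming a hypothetical `ex:` vocabulary, a single OWL axiom lets a reasoner bridge data asserted with either property:

```turtle
# With an owl:inverseOf axiom, a query for ?x ex:parentOf ?y
# also returns answers asserted the other way round as childOf.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/> .

ex:parentOf owl:inverseOf ex:childOf .

ex:Alice ex:childOf ex:Bob .
# A reasoner infers: ex:Bob ex:parentOf ex:Alice .
```

This is the kind of inference that produces answers not explicitly stated in the data.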

RDF vs. Property Graphs

The stack of Semantic Web standards (RDF, RDFS, SPARQL, OWL, SHACL) is developed through the W3C community process, to make sure that the requirements of different actors are satisfied – all the way from logicians to enterprise data management professionals and system operations teams. It incorporates several interoperable schema languages to make sure different applications and types of data can be represented properly (e.g. open-world vs. closed-world assumptions).

On the other hand, the property graph model, implicitly defined by the Apache TinkerPop framework (referred to as Property Graph), is designed for efficient path traversal and similar tasks. It isn’t concerned with publishing or integrating data. Enterprise data management and governance require standards for schema and query languages, identification and serialization formats, and federation and management protocols, none of which is present in the Property Graph stack. Lacking any form of formal semantics, property graphs certainly aren’t a good choice for automated reasoning over data to provide data insights. This is why data architects and organisations interested in the sustainability of their data prefer RDF for the implementation of their knowledge graphs.

Metadata about Relationships Comes in Handy

At the low level of representing data in a graph, there is often a need to attach metadata to relationships, which are most naturally represented as edges in the graph. Examples include provenance (e.g. where a relationship is sourced from or who edited it last), access control and representation of context (e.g. a time span).

The ability to attach properties to relationships (i.e. edges) as easily as to entities (i.e. nodes) has been an advantage of Property Graph representations. It’s not impossible to do in RDF, but the workarounds all come with costs. In the diagram below you can see four different ways of doing it without extending the RDF specification: reification, singleton properties, named graphs, and N-ary relationships. Each approach’s advantages and disadvantages are detailed in What is RDF-Star?
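To see the cost, here is a sketch of the first workaround, standard RDF reification, using an illustrative `ex:` namespace. Annotating a single statement with its source takes an intermediate node and five triples:

```turtle
# Standard RDF reification of "Alice holds the position of CEO",
# annotated with a source. The original one-triple fact becomes
# an ex:Id1 node plus four bookkeeping triples.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .

ex:Id1 rdf:type      rdf:Statement ;
       rdf:subject   ex:Alice ;
       rdf:predicate ex:positionHeld ;
       rdf:object    ex:CEO ;
       ex:source     ex:CompanyRegistry .
```

Note that the fact itself is no longer asserted as a plain triple; it is only described, which is part of what makes reification so counter-intuitive.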

Natural and intuitive modelling is key for numerous reasons! All these approaches share several important problems: they are counter-intuitive, the representation is “hairy” and requires extra effort to comprehend. When people are faced with the challenging task of integrating data across multiple sources, many of them quite complex on their own, it is too burdensome to ask them to further complicate the representation. It changes an already difficult task into an impractical one. This unnecessary modelling complexity can impede the adoption of KGs in the large enterprises which could benefit most from their semantics.

Removing ungainly representation was an early lesson learned by the pioneers of the Semantic Web when Description Logics (DL) were a prevalent approach in the field. DL reasoning’s power came at the cost of performance, but that isn’t what made the semantic world move away from DLs. The true problem was that the DL semantics were too complex to comprehend at scale. Tracing inference chains on anything non-trivial became onerous. If it’s not intuitive to a KR-specialist with a pocketful of PhDs, what chance would a commercial developer have? RDFS with its simple entailment rules solved that issue.

RDF-Star: Simplifying What Need Not Be Complex

Recently an extension has been proposed, RDF-Star (sometimes referred to as RDF*), that addresses those issues by reducing the document size and increasing efficiency. Most importantly, it provides a representation that is human-friendly and intuitive.

For example, say your data has People entities and Job Titles, with a relationship between them called positionHeld. If you wanted to annotate positionHeld with a length of time, you would have to ‘reify’ the relationship with additional statements: an intermediate node such as Id1, which in turn would have relationships to other properties such as StartDate and EndDate. Id1 is neither intuitive nor useful. RDF-Star enables you to do directly what you want, with fewer statements – namely, label the relationship just as easily as you label entities.
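In RDF-Star the same annotation can be written directly against the edge, with no Id1 node at all. A sketch, with illustrative `ex:` names and dates:

```turtle
# RDF-Star: the quoted triple << s p o >> is itself the subject
# of the annotation, so no intermediate reification node is needed.
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<< ex:Alice ex:positionHeld ex:CEO >>
    ex:startDate "2019-01-01"^^xsd:date ;
    ex:endDate   "2021-06-11"^^xsd:date .
```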

RDF-Star comes with the corresponding extension of the query language SPARQL-Star – examples of modelling properties on edges in RDF-Star and querying them from SPARQL-Star are provided here.
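A flavour of what such a query looks like, again with illustrative `ex:` names: SPARQL-Star lets you match the quoted triple directly inside the pattern.

```sparql
# SPARQL-Star sketch: find people, their positions and the start
# date attached to each positionHeld edge.
PREFIX ex: <http://example.org/>

SELECT ?person ?position ?start
WHERE {
  << ?person ex:positionHeld ?position >> ex:startDate ?start .
}
```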

The Best of Both Worlds

RDF-Star provides a representation that is closer to the clean simplicity of property graphs, but without having to sacrifice semantics in the bargain. It also goes beyond the expressivity of Property Graphs, where you can only attach key-value pairs to relationships – in RDF-Star you can make a statement about an edge in the graph that refers to another RDF resource, e.g. a description of context shared with other edges.
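To illustrate that last point with a sketch (illustrative `ex:` names): the annotation value below is a full RDF resource describing a transaction, and it is shared as context by two different edges – something a flat key-value pair on an edge cannot express.

```turtle
# The object of an edge annotation is itself an RDF resource,
# shared as context by several edges.
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Acquisition42 a ex:Transaction ;
    ex:closedOn "2021-03-15"^^xsd:date .

<< ex:AcmeCorp ex:subsidiaryOf ex:MegaCorp >> ex:derivedFrom ex:Acquisition42 .
<< ex:AcmeCorp ex:hasInvestor  ex:FundX >>    ex:derivedFrom ex:Acquisition42 .
```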

Knowledge graphs have been shown to be essential for the dynamic, flexible and automated data management required today, with reusable data services, machine-readable semantic metadata and APIs that ensure the integration and orchestration of data across the organization and with third parties.

RDF-Star is furthering the open data philosophy of RDF to ensure interoperability and avoid the inevitable headaches of proprietary languages that dominate the world of Property Graphs. More than RDBMS or Property Graphs, knowledge graphs deliver unified data access, automation of data management tasks, and meaningful data in context.





Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.
