Choosing A Graph Data Model to Best Serve Your Use Case

This post compares the labeled property graph (LPG) and Resource Description Framework (RDF) graph models and explains why enterprises should invest in RDF-based knowledge graphs for their data management practices.

March 27, 2024 7 mins. read Sumit Pal

Graphs and knowledge graphs are all around us. We use them every day without realizing it. For example, GPS navigation, social media, and cell phone handoffs are modeled as graphs, while data catalogs, data lineage, and MDM tools leverage knowledge graphs for linking metadata with semantics. Knowledge graphs model the knowledge of a domain as a graph, that is, a network of entities and relationships.

There are two popular graph models: LPG and RDF.

LPG model

LPG uses labels on nodes and edges to characterize entities and relationships. Nodes are linked to other nodes through edges, which can be unidirectional or bidirectional. Both nodes and edges carry properties modeled as key-value pairs, where each value is single-valued and has a primitive data type.
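As a rough illustration of that structure, here is a minimal Python sketch of an LPG built from plain dataclasses. The class names (Node, Edge), the labels, and the property keys are all hypothetical and are not tied to any particular graph database.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Property values are restricted to primitive, single-valued types.
Primitive = Union[str, int, float, bool]


@dataclass
class Node:
    node_id: int
    label: str  # e.g. "Person"; just a string tag
    properties: Dict[str, Primitive] = field(default_factory=dict)


@dataclass
class Edge:
    source: int  # node_id of the start node
    target: int  # node_id of the end node
    label: str   # relationship type, e.g. "WORKS_AT"
    properties: Dict[str, Primitive] = field(default_factory=dict)


# Hypothetical sample data: two labeled nodes and one labeled edge,
# each carrying its own key-value properties.
alice = Node(1, "Person", {"name": "Alice", "age": 34})
acme = Node(2, "Company", {"name": "Acme"})
works_at = Edge(alice.node_id, acme.node_id, "WORKS_AT", {"since": 2019})
```

Note that nothing in this sketch ties the "Person" label to a required set of properties. The label is only a string.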

LPG strengths

LPGs support “index-free adjacency”, which makes them ideal for the graph traversals behind algorithms such as shortest path between nodes, clustering, and centrality. Most organizations leverage LPGs to address specific niche use cases.
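To make the traversal idea concrete, the following is a small sketch of a breadth-first shortest-path search over per-node neighbor lists. The graph contents and the function name shortest_path are assumptions for illustration. The Python dictionary only stands in for the neighbor references that an engine with index-free adjacency would store alongside each node.

```python
from collections import deque
from typing import Dict, List, Optional

# Each node lists its direct neighbors, so a traversal hops from node
# to node by following those references.
adjacency: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path by hop count, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(adjacency, "A", "E")
```

Because edges in this sketch are directed, searching from "E" back to "A" finds no path and returns None.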

LPG weaknesses

LPGs are not based on formal theoretical foundations and lack standards and a knowledge model. They also lack features for enterprise data management, such as a schema language, data validation capabilities, interoperable serialization formats, and a proper modeling language.

For example, the label on an LPG node guarantees nothing about the node's properties or their data types, because the label is just a string that carries no semantics. This makes LPGs inflexible. Because LPG lacks schema and semantics, it is inappropriate for publishing and sharing data. An LPG is not self-explanatory, as there is no contract about the meaning of the data. Without such a contract, graphs cannot be reused across applications and use cases.

The poor data modeling capabilities of LPGs, combined with vendor-specific constructs for expressing semantic constraints, hinder portability, expressibility, and semantic data integration. It is also harder to grow an LPG organically in response to business changes, because doing so requires restructuring the graph and refactoring queries.

RDF model

The RDF data model encodes semantic relationships between data items as triples, each composed of a Subject, a Predicate, and an Object. The Predicate is the relationship type assigned to the graph edge that connects the Subject and Object endpoints. RDF uses Internationalized Resource Identifiers (IRIs, of which web URLs are a familiar example), unique character sequences that identify the logical or physical resources in a triple. The RDF-star extension allows making statements about statements, for example about provenance, weight, or timespan.
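The triple structure can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the IRI and Triple classes, the http://example.org/ namespace, and the sample statements are all hypothetical, and a real RDF library separates IRIs from literals more strictly than this toy string subclass does. The last statement imitates the RDF-star idea by using a whole triple as the subject of another triple.

```python
from typing import NamedTuple


class IRI(str):
    """Marks a string as an identifier rather than a literal value."""


class Triple(NamedTuple):
    subject: object  # an IRI, or a quoted Triple in the RDF-star style
    predicate: IRI
    obj: object      # an IRI, or a literal such as "Alice" or 2019


EX = "http://example.org/"  # hypothetical namespace

alice = IRI(EX + "alice")
acme = IRI(EX + "acme")
works_at = IRI(EX + "worksAt")
name = IRI(EX + "name")
since = IRI(EX + "since")

# A graph is simply a set of triples.
graph = {
    Triple(alice, works_at, acme),  # object is another resource
    Triple(alice, name, "Alice"),   # object is a literal
}

# RDF-star style: a statement about a statement (here, when it began).
employment = Triple(alice, works_at, acme)
graph.add(Triple(employment, since, 2019))

# All statements whose subject is the resource identified by `alice`.
about_alice = [t for t in graph if t.subject == alice]
```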

RDF strengths

The primary value of RDF is that it allows making statements and connecting concepts and entities with relationships. It accelerates data projects by supporting data quality and lineage, and it contextualizes data through ontologies, taxonomies, and vocabularies, which makes integrations easier.

RDF is used extensively for data publishing and data interchange and is based on W3C and other industry standards. It supports schema evolution without requiring data transformation and loading.

The formalism of RDF gives rise to semantics. Adherence to standards promotes alignment of meaning, unambiguous interpretation, interoperability, and semantic integration of data sources.

RDF weaknesses

There are few real weaknesses of the RDF model. The use of IRIs as identifiers results in a slightly larger minimum representation footprint per edge in the graph.

Perceived weaknesses relative to LPG, such as the lack of node adjacency lists and of support for edge properties, have already been addressed by the relevant standard extensions and optimized engine implementations. People often assume that RDF graphs are harder to develop because one must develop a schema and deal with namespace definitions for the identifiers. In fact, none of this is mandatory: one can easily bootstrap an RDF graph without a schema and use blank nodes instead of IRIs.

Why & when organizations should move away from LPG models

LPG and RDF models are optimized for different use cases and complement rather than compete. Increasingly, organizations are using both. 

Let’s have a look at some of the differences:

  • RDF is used for managing ontologies and taxonomies, standards, data quality, and governance, while data pulled from RDF graphs into an LPG is used for graph traversal and graph data science applications.
  • RDF facilitates strategic integration while LPGs are best for tactical analytics. 
  • RDF uses unique identifiers (IRIs) to build reusable data models and standardized mapping languages to integrate external data sources. This sets it apart from LPG as the unifying access layer for disconnected data sources.
  • RDF is self-describing – a concept that is foreign to LPG.
  • The RDF stack is more mature than the LPG stack, with a solid foundation, rich tooling, proven best practices, and publicly available, reusable data models backed by standards for enterprise data.
  • RDF can enrich data through composition: nodes with the same IRI can be merged automatically, which allows data linking and sharing with semantics. In LPGs, by contrast, this has to be done manually, which involves extra code and business logic.
  • RDF can encode the provenance of metadata and provide semantics for both human and machine consumption, which improves data discoverability. With LPGs, again, this has to be done manually.
  • RDF graphs are more granular than LPGs. This verbosity allows schema, metadata, and instance data to live in one place, which improves accessibility and manageability. This is especially important when integrating diverse data sources.
  • Knowledge graphs based on RDF leverage IRIs for nodes and edges, ensuring universality and standards compliance, which LPG lacks.
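The composition point in the list above can be illustrated with a short sketch. It treats each graph as a set of (subject, predicate, object) tuples, so merging two sources is a plain set union, and statements about the same identifier end up attached to the same node with no matching logic. The source names (crm_triples, erp_triples), the IRIs, and the values are hypothetical.

```python
# Hypothetical IRIs shared by two independently maintained sources.
ACME = "http://example.org/company/acme"
NAME = "http://example.org/name"
INDUSTRY = "http://example.org/industry"
HEADQUARTERS = "http://example.org/headquarters"

crm_triples = {
    (ACME, NAME, "Acme Corp"),
    (ACME, INDUSTRY, "Manufacturing"),
}
erp_triples = {
    (ACME, NAME, "Acme Corp"),       # same statement as in the CRM source
    (ACME, HEADQUARTERS, "Berlin"),
}

# Merging is a set union: identical statements collapse into one, and
# all statements about the same IRI describe the same node.
merged = crm_triples | erp_triples

# Everything now known about the company, keyed by predicate.
facts_about_acme = {p: o for s, p, o in merged if s == ACME}
```

In an LPG, the equivalent step would typically need application code to decide which nodes from the two sources represent the same entity.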

Real-world knowledge graphs need RDF

The key difference between arbitrary graph data structures like LPG and RDF knowledge graphs is the presence or absence of a knowledge model. A knowledge model is what turns raw data into knowledge by combining schemas with ontologies, taxonomies, controlled vocabularies, domain-specific models, and business rules. It provides semantics as well as unambiguous representation and interpretation.

“People think RDF is a pain because it is complicated. The truth is even worse. RDF is painfully simplistic, but it allows you to work with real-world data and problems that are horribly complicated.” – Dan Brickley, Google

In general, LPGs fall short of these capabilities. Nodes in an LPG do not differentiate between kinds of data such as classes and categories. LPGs cannot easily incorporate data rules expressed with SKOS and OWL or conform to W3C standards, so they lose interoperability, which is imperative in modern data architectures.

LPGs are rudimentary knowledge graphs. They provide relationship detection between data elements but lack uniformity of nomenclature. They fail to identify concepts and semantics, which hinders unambiguous data sharing across organizations.

Many enterprises that embark on building an Enterprise Knowledge Graph (EKG) start their journey with LPG but quickly back out once they realize its shortcomings. This is especially true for enterprises with domain models where a large-scale effort is needed to consolidate and extract value from data across enterprise silos.

Given the complexity and multidimensionality of real-world data, RDF is essential for harnessing the capabilities of semantic graphs to construct robust knowledge graphs. Collectively, these factors establish RDF as the preferred choice for enterprises when building knowledge graphs.

To wrap it all up

Real-world enterprise data is complex, multi-dimensional, and constantly in flux. RDF inherently incorporates semantic web standards, industry ontologies, and taxonomies, which enable enterprises to accommodate frequent schema changes and new requirements.

Organizations looking to build reusable data management solutions with interoperability across the organization and industry should embrace RDF to deliver the intended promises of being data-driven.

Download GraphDB and start building RDF knowledge graphs for your data management practices! 


Strategic Technology Director at Ontotext

Sumit Pal is a former Gartner VP Analyst in the Data Management & Analytics space. Sumit has more than 30 years of experience in the data and software industry, in roles spanning companies from startups to enterprise organizations. He has built, managed, and guided teams and built scalable software systems across the stack, from the middle tier and data layer to analytics and UI, using Big Data, NoSQL, DB Internals, Data Warehousing, Data Modeling, and Data Science.