Strategically Approaching Graph Technologies

This is an abbreviated and updated version of a presentation from Ontotext’s Knowledge Graph Forum 2023 titled “Strategically Approaching Graph Technologies: Understanding the Bigger Picture of RDF and LPG” by Brandon Richards, General Manager APAC region at Ontotext.

February 26, 2024 11 mins. read Brandon Richards

Several years ago, the idea of reusable rockets was viewed as non-feasible. NASA and other aerospace organizations would build a rocket, fly it once, and that was it. Imagine building a 747, flying it once across the world, and then destroying it. Then came Elon Musk and SpaceX.

“If one can figure out how to effectively reuse rockets, just like airplanes, the cost of access to space will be reduced by as much as a factor of a hundred.” Elon Musk

SpaceX succeeded in building reusable rockets, drastically reducing the cost of sending them into orbit or taking astronauts to the International Space Station. Recently, they broke a record, sending the same Falcon 9 first-stage rocket on its 18th trip into space. This has become a testament to the power of innovation.

This innovation has enabled SpaceX to have a massive cost advantage in getting satellites into orbit, which has translated into having more satellites in space than the rest of the world combined. 

The power of reusable knowledge graphs

No doubt, building a reusable rocket is harder than building a reusable graph, but many of the same principles apply. The good news is that you don’t need to figure it out from scratch. Building reusable graphs is a solved problem. 

However, it still requires more effort than building a single-purpose graph, such as a labeled property graph (LPG), where you can whiteboard your graph model in an hour. Working through a unified conceptual model based on both industry and semantic standards, adopting controlled vocabularies, and merging the relevant ontologies and taxonomies takes time and careful thought. But the payoff is massive, especially in the context of a more strategic graph deployment across an organization.

The goal: a data-driven organization

Let’s zoom out for a bit. 

Every organization wants to be data-driven. The goal is to capture data, convert it into the right insights, and integrate those insights quickly and efficiently into your business decisions and processes. Companies that do this well have a significant competitive advantage over their peers. However, this is easier said than done and there are many challenges across this entire process.

Firstly, on the data maturity spectrum, the vast majority of organizations I’ve spoken with are stuck in the information stage. They have massive amounts of data they’re collecting and storing in their relational databases, document stores, data lakes, and data warehouses. But until they connect the dots across their data, they will stay in this stage. 

This is where graph technologies come in. They allow you to connect the dots across your data and progress on the spectrum toward knowledge, insight, and wisdom. For this reason, graph technologies have taken center stage in the data and analytics space as a critical enabler. And this includes the AI space. Before you can really maximize the value from data, you must first connect the dots. 

Strategic graph transformation

This message has been getting through, and most large organizations have at least a couple of graph use cases deployed. Some of the more mature ones have several use cases deployed. However, deploying one use case at a time, as different teams discover graph technology, is slow and costly. 

With this piecemeal approach, becoming data-driven will also be slow and costly. Remember that you need to streamline the process of capturing data, converting it into the right insights, and then quickly and efficiently implementing them into your business decisions and processes.

This happenstance approach may eventually get organizations to a reasonable data maturity level, but at massive cost. Until C-level executives start to take graph technologies more seriously, their organizations will struggle to deliver on the promises of their digital transformations and to become data-driven.

Organizations that understand that they have a data problem take a different approach. They maximize graph opportunities with the help of what we call a Graph Center of Excellence (CoE). Let’s have a quick look at what this is. 

A graph Center of Excellence

The main characteristics of a Graph CoE are:

  • Strategic prioritization of graph use cases – the organization systematically identifies the top use cases for their industry and prioritizes them based on their key initiatives and objectives. Each additional graph use case increases its derived value thanks to reducing costs, adding new capabilities, and minimizing risk.
  • C-suite sponsorship – it may start with a single C-level executive, but there will need to be broad consensus across the C-suite that this is a strategic initiative for the organization.
  • Effectively evangelizing to overcome organizational inertia – communicating the importance of graph technology across the organization.
  • Key technical, data, and solution architecture skills – the technical roles needed to deploy these projects, such as ontologists, semantic engineers, data architects and modelers, and domain experts.
  • Capturing and leveraging best practices – the ability to bring in the industry best practices, while capturing learnings internally so the organization can deploy each successive use case faster and cheaper than the previous one. 
  • Starting small – at least initially, they will need to start small and get quick incremental wins, and demonstrate that value to keep the momentum flowing.
  • Economies of scale – ultimately, they would benefit from economies of scale, saving the organization time, money, and effort by doing this in a centralized way. 
  • Building a foundational semantic graph layer – their first mission will be to build a reusable, foundational, semantic graph layer across the organization’s data.

Bad Data Tax and the Data Bill of Rights

So far, our discussion has been pretty theoretical, so we need a compelling business justification for moving in this direction. In the race to become data-driven, most efforts have resulted in a tangled web of data integrations and reconciliations across a sea of data silos. This adds up to between 40% and 60% of an enterprise’s annual technology spend. We call this the “bad data tax.” You can read more about it in a dedicated post on the subject by myself, Michael Atkin, and Sumit Pal.

And, if this wasn’t bad enough, the outcomes often don’t translate into the key insights needed to deliver better business decisions or more efficient processes. Even with this massive spend on moving and integrating data, reconciling, and validating, the results are quite subpar. 

This is the solid business justification every organization needs to fix their data. The cost of solving this problem is a small fraction of what they spend each year on the bad data tax. For some organizations with huge IT budgets, this bad data tax can be in the hundreds of millions of dollars every year. 

Yet, the solution is relatively simple. Build a foundational semantic graph layer across your data to reduce and even eliminate this key bottleneck in the process of becoming a data-driven organization. If you want to convert your data into the right insights to drive business decisions and processes, you need this data to be easily accessible and stored in a format that is flexible, accurate, and machine-readable. It must retain the context and insight of the original data and be traceable as it flows through the organization. My friend and colleague, Michael Atkin, refers to this as the data bill of rights.

RDF vs LPG?

For these reasons, the reusable graph foundational layer with a shared conceptual model needs to be built on RDF (Resource Description Framework). Labeled property graphs (LPGs) and RDF are optimized for different use cases, but this is one of those use cases that falls squarely on RDF.

That brings us to the age-old graph technologies holy war between RDF and LPG. Organizations looking to leverage graph technologies have ended up in hotly contested internal debates over RDF versus LPG. But I think we’re looking at this all the wrong way. The idea that an organization must decide either/or is completely wrong, and more organizations are choosing to use both. 

Each of these types of graph databases is optimized for different things and there’s very little overlap when you look at it. Let’s summarize very quickly.

RDF tends to be more data and metadata-centric. There’s an additional effort around controlled vocabularies, using IRIs to resolve entities, leveraging Linked Open Data to enrich metadata, and managing complex ontologies and taxonomies. All this upfront work pays off in the end. You’re building a standard for the organization, something that can be used and reused over and over again for a wide range of use cases. The reusability is a key feature here, as well as having interoperability across the organization or even the industry.
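
To make the role of IRIs and controlled vocabularies more concrete, here is a minimal sketch in plain Python rather than a real triple store. The `example.org` namespace, the identifiers, and the helper functions `validate` and `match` are hypothetical and exist only for illustration. The point it shows is that when two teams identify the same entity with the same IRI, their data merges with a simple set union, and a controlled vocabulary can be enforced with a simple check.

```python
# Hypothetical namespaces and identifiers, used only for illustration.
EX = "https://example.org/id/"
SCHEMA = "https://schema.org/"

# Controlled vocabulary: the only predicates this toy graph accepts.
VOCABULARY = {SCHEMA + "name", SCHEMA + "worksFor", SCHEMA + "location"}

# Triples contributed by two different teams. Both refer to the same
# company by the same IRI, so no reconciliation step is needed.
hr_triples = {
    (EX + "person/42", SCHEMA + "name", "Ada Example"),
    (EX + "person/42", SCHEMA + "worksFor", EX + "org/acme"),
}
sales_triples = {
    (EX + "org/acme", SCHEMA + "name", "Acme Corp"),
    (EX + "org/acme", SCHEMA + "location", "Sydney"),
}


def validate(triples, vocabulary):
    """Return the triples whose predicate is not in the vocabulary."""
    return {t for t in triples if t[1] not in vocabulary}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# Merging the two sources is a plain set union.
graph = hr_triples | sales_triples
invalid = validate(graph, VOCABULARY)

# Follow the shared IRI from the person to facts the other team published.
employer = next(iter(match(graph, s=EX + "person/42", p=SCHEMA + "worksFor")))[2]
employer_facts = match(graph, s=employer)
```

A production deployment would use an RDF database and SPARQL instead of Python sets, but the reuse argument is the same: shared identifiers and shared vocabulary are what let independently produced data fit together.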

With LPGs, the approach is very different. Usually, the graph is deployed for a specific use case, such as real-time fraud detection or a specialized recommendation engine. This means that the model doesn’t necessarily care about industry or even company standards and will be optimized for the queries the organization is trying to answer. This also means that it can be deployed very quickly. You develop the graph model, load the data into the model from your data sources, write your queries, tune the model, and there you go. However, there are many instances where LPG deployments can greatly benefit from all the upfront work done on the RDF side.
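
As a rough illustration of how use-case-driven an LPG model can be, the sketch below builds a tiny in-memory property graph in plain Python. The `Node` and `Edge` classes, the `Account` and `Device` labels, the sample data, and the `accounts_sharing_devices` function are all assumptions made up for this example, not the data model of any particular graph database. The model contains only what one fraud-style question needs: which accounts share a device?

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    rel_type: str
    props: dict = field(default_factory=dict)


# Hypothetical sample data: three accounts and two devices.
nodes = {
    "a1": Node("a1", "Account", {"owner": "Alice"}),
    "a2": Node("a2", "Account", {"owner": "Bob"}),
    "a3": Node("a3", "Account", {"owner": "Carol"}),
    "d1": Node("d1", "Device", {"os": "android"}),
    "d2": Node("d2", "Device", {"os": "ios"}),
}
edges = [
    Edge("a1", "d1", "USED", {"logins": 12}),
    Edge("a2", "d1", "USED", {"logins": 3}),
    Edge("a3", "d2", "USED", {"logins": 7}),
]


def accounts_sharing_devices(nodes, edges):
    """Map each device id to the accounts that used it, keeping shared devices only."""
    by_device = {}
    for e in edges:
        if e.rel_type == "USED" and nodes[e.src].label == "Account":
            by_device.setdefault(e.dst, []).append(e.src)
    return {d: sorted(accts) for d, accts in by_device.items() if len(accts) > 1}


shared = accounts_sharing_devices(nodes, edges)
```

Nothing in this model refers to company-wide identifiers or vocabularies, which is why it is quick to build and also why it tends to become a silo unless it is fed from a shared layer.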

RDF and LPG: the bigger picture

So, in what way are LPGs and RDF better together? As I mentioned, LPGs are typically built around a specific use case. You start with understanding the questions you want the graph to answer based on that specific use case. You need to figure out what data is needed and how to best model that data as a graph. Once that’s determined, you can begin to ingest data and for optimal outcomes, you need to ingest good, high-quality data. 

RDF can feed LPGs with higher-quality data, such as the unified conceptual models, with the ontologies and taxonomies, or at least the portions of the ontologies that will be useful in the LPG use case. You also have controlled vocabularies and better data hygiene with data validation. You can enrich the data and metadata with Linked Open Data. The data can be NLP and text-analysis-ready. All of that happens on the RDF side.

All this means that using RDF and LPGs together will not only feed the LPG graph with higher-quality data but will also bring in better integration and interoperability. The LPG will no longer be a graph silo that you are adding to your infrastructure. It will share the same meaning and context with the rest of the organization, making everything speak the same language. On top of that, for cases with complex ontologies and taxonomies, the LPG will benefit from a speedier deployment, capturing that rich context from the RDF. And, lastly, when you have higher-quality data, you’ll see better outcomes.
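
One way to picture RDF feeding an LPG is a projection step that turns triples into nodes, properties, and edges. The sketch below is a simplified illustration under stated assumptions: the `example.org` IRIs, the sample triples, and the `project_to_lpg` and `local_name` helpers are hypothetical, and an object is treated as a node only if it also appears as a subject. Real converters handle datatypes, blank nodes, and multi-valued properties, which this sketch ignores.

```python
# Hypothetical namespaces; only the rdf:type IRI is a real W3C identifier.
EX = "https://example.org/id/"
VOCAB = "https://example.org/vocab/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    (EX + "acct/1", RDF_TYPE, VOCAB + "Account"),
    (EX + "acct/1", VOCAB + "owner", "Alice"),
    (EX + "acct/1", VOCAB + "usedDevice", EX + "device/9"),
    (EX + "device/9", RDF_TYPE, VOCAB + "Device"),
    (EX + "device/9", VOCAB + "os", "android"),
]


def local_name(iri):
    """Return the last path or fragment segment of an IRI."""
    return iri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]


def project_to_lpg(triples):
    """Project triples into LPG-style nodes and edges.

    Simplifying assumption: an object is a node only if it also
    occurs as a subject; every other object becomes a property value.
    """
    subjects = {s for s, _, _ in triples}
    # Keep the IRI on each node so the LPG stays traceable to its source.
    nodes = {s: {"labels": [], "props": {"iri": s}} for s in subjects}
    edges = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            nodes[s]["labels"].append(local_name(o))
        elif o in subjects:
            edges.append((s, local_name(p), o))
        else:
            nodes[s]["props"][local_name(p)] = o
    return nodes, edges


lpg_nodes, lpg_edges = project_to_lpg(triples)
```

Because labels and relationship names are derived from the shared vocabulary, the resulting LPG uses the same terms as the rest of the organization instead of inventing its own.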

A semantic graph layer 

Coming back to our bad data tax use case, RDF lays the foundation for all kinds of RDF and LPG use cases.

In the diagram above, you can see the basic architecture of using a foundational semantic graph layer to resolve the bad data tax problem. At the bottom, we have the sea of data silos. Above them are our data catalogs, which we then map to our semantic graph layer. After that, all of our downstream applications can access the semantic graph layer for various LPG or RDF graph use cases. It can also enable the development and evolution of data products that aim to deliver trusted data and convert it into insights. These insights now can be quickly and efficiently integrated into business decisions and processes. 
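
The mapping step from cataloged silos to the semantic layer can be sketched as follows. Everything here is a hypothetical illustration: the `CATALOG` structure, the silo names, the field names, and the `to_triples` function are assumptions, not a description of any product. The sketch also assumes both silos already share a customer key; in practice, matching entities across silos is a substantial task of its own.

```python
BASE = "https://example.org/"  # hypothetical namespace

# Hypothetical catalog entries: for each silo, the key column and a
# mapping from local field names to shared property names.
CATALOG = {
    "crm": {
        "key": "cust_id",
        "fields": {"full_name": "name", "mail": "email"},
    },
    "billing": {
        "key": "customer_no",
        "fields": {"email_addr": "email", "balance": "balance"},
    },
}

crm_rows = [{"cust_id": "C7", "full_name": "Ada Example", "mail": "ada@example.org"}]
billing_rows = [{"customer_no": "C7", "email_addr": "ada@example.org", "balance": 120.5}]


def to_triples(silo, rows):
    """Translate silo records into triples that use the shared vocabulary."""
    mapping = CATALOG[silo]
    out = set()
    for row in rows:
        subject = BASE + "customer/" + str(row[mapping["key"]])
        for field_name, prop in mapping["fields"].items():
            if field_name in row:
                out.add((subject, BASE + "vocab/" + prop, row[field_name]))
    return out


# Both silos now describe the same customer in the same terms, and the
# duplicated email fact collapses into a single triple.
semantic_layer = to_triples("crm", crm_rows) | to_triples("billing", billing_rows)
```

Downstream applications then read from `semantic_layer` instead of integrating with each silo separately, which is the kind of repeated integration work the bad data tax refers to.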

Building such a foundational layer would be the first mission of the Graph Center of Excellence. As a result, other graph use cases would benefit from the economies of scale, whether they are RDF or LPG use cases.

Main Takeaways

Graph technologies have the potential to deliver on the key goals and aspirations of every organization seeking to be data-driven. This is becoming increasingly important as we enter the age of AI where data quality, context, and trust really matter. Forward-thinking organizations can empower their data teams by making data easier to find and access while enriching that data with the context that comes from connections and ontologies. 

By thinking more strategically about graph technologies, they can implement a powerful solution to their data management problems and eliminate the bad data tax once and for all. The solution doesn’t require a massive investment. On the contrary, adding a semantic graph layer can make their data investments deliver on their intended promises. 

Stay tuned for our next posts about Graph Center of Excellence!
