Case study: Policy Enforcement Automation With Semantics

This is an abbreviated and updated version of a presentation from Ontotext’s Knowledge Graph Forum 2023 by Nimit Mehta, CEO at TopQuadrant.

May 2, 2024 8 mins. read Nimit Mehta

Data leaders today face an almost impossible challenge, particularly those on “the create side of the house” who are tasked with delivering insights and analytics. They are expected to understand the entire data landscape and generate business-moving insights while facing the voracious needs of different teams and the constraints of technology architecture and compliance.

Evolution of data approaches

The data strategies we’ve had so far have led to a lot of challenges and pain points.

Application-centric approach

In the application-centric approach to data, people create an application to solve their problems today. But once that application has done its work, it leaves an important dataset behind a firewall and on a server instead of in the hands of users. So, even if it solves a problem for a while, the rapidly increasing needs of product and business teams lead to the proliferation of such applications. 

The result is problems with discovery as well as with consistency across applications. For example, when one application talks about “customers”, it isn’t clear whether other applications share the same view of the concept. Such inconsistencies lower trust in the outcomes that analytics and insights leaders try to deliver. In effect, while solving their problems today, they are hurting themselves for tomorrow.

Storage-centric approach

In the storage-centric approach, people try to address data silos by throwing everything into a data lake or a data warehouse. Although this helps somewhat in terms of architecture, these data lakes soon become unwieldy. Every new dataset and new user adds a little more friction, dragging the core metric, the velocity of data, down toward zero.

Another challenge is the data models that represent how insights and analytics teams see the world. These models are as important to companies as their frontline products and determine how data is managed, consumed, combined, joined, and analyzed.

Data-centric approach

In the data-centric approach, metadata serves as a layer of interoperability between the data sources. This powers numerous applications, insight generations, dashboards, and tools. It also enables business teams to have their data at their fingertips and have a uniform view of it.

This approach is fueled by enterprise semantics and makes data findable, consistent, interoperable, and reusable for generating faster insights. It also enables leaders to make decisions not based on the limitations of their technology, but on what data can do. As a result, they can orchestrate different systems around their organization to be the neural core of how data flows. 
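To make the idea of metadata as an interoperability layer more concrete, here is a minimal Python sketch. The system names (crm, billing), the field names, and the shared concept identifiers are all hypothetical, and the code illustrates the principle only; it is not how TopBraid EDG or GraphDB implement it.

```python
# Hypothetical mapping from source-specific field names to shared concepts.
# All system names, field names, and concept IDs are invented for illustration.
FIELD_TO_CONCEPT = {
    ("crm", "cust_name"): "Customer.name",
    ("crm", "cust_id"): "Customer.id",
    ("billing", "client"): "Customer.name",
    ("billing", "account_no"): "Customer.id",
}


def harmonize(source: str, record: dict) -> dict:
    """Re-key a source record by shared concept, dropping unmapped fields."""
    return {
        FIELD_TO_CONCEPT[(source, field)]: value
        for field, value in record.items()
        if (source, field) in FIELD_TO_CONCEPT
    }


crm_row = harmonize("crm", {"cust_id": 42, "cust_name": "Acme", "region": "EU"})
billing_row = harmonize("billing", {"account_no": 42, "client": "Acme"})

# Both systems now describe the customer with the same vocabulary.
print(crm_row == billing_row)
```

Because the mapping lives outside the applications, a new source can in principle be added by extending the mapping rather than by changing every consumer.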

Our Value Proposition

How does TopQuadrant fit into this picture? TopQuadrant is an enterprise data management software company built around helping companies realize the value of their data by solving tough problems with semantics. Our product, TopBraid EDG, is the semantic-modeling and application-powering tool of choice, enabling teams to create, maintain, govern, and deploy their data.

The other half of our value proposition is Ontotext GraphDB. GraphDB is a best-of-breed RDF database for knowledge graphs that allows linking diverse data, indexing it for semantic search, and enriching it via text analysis to build big knowledge graphs.

Thanks to our partnership with Ontotext, TopBraid EDG customers get access to industry-leading performance and scalability alongside analytics, features, and tools. At the same time, GraphDB customers gain the ability to take the data stored in the database and make it collaborative, visible, and discoverable to the analytics teams. This enables them to turn data into an enterprise asset and create industry accelerators that bring real outcomes.

The method

As shown in the diagram below, we start with enterprise harmonization and enrichment, which enable better insights. A unified metadata model and a semantic layer are then enhanced through discovery, auto-classification, tagging, inferencing, and so on.

Over time, this accelerates further: discovery becomes richer and, through semantic search, analysis becomes much quicker. Last but not least come reusability and exchange, which enable data products to be discovered and managed across an organization through a data marketplace.

Use case-driven growth

How do we take data and create value? The best way to drive value is through use cases. We can talk about three categories, each with different sophistication, but all valuable in and of themselves.

Data products – optimization of data assets that drive business, improving data quality and interoperability. These can be business glossaries, code lists, policy literature, and other things that are eminently reusable and valuable across an organization.

Semantic applications – combinations of data assets that enable data to be combined and reused to drive specific business outcomes. Business users usually want a black box: they don’t care how things are done, only that they are done better, faster, and cheaper. Technical users, on the other hand, want an open box, so semantic applications can be the bridge between the two. Some examples are recommendation systems, semantic search and integration, customer 360, and automated policy enforcement.

Semantic layer – a framework that enables ideal data usage across an organization to improve IT efficiency and data reuse and enables enterprise use cases like AI and machine learning. Here we talk about metadata management, catalog of catalogs, and so on.

A Pharma story

The first story we want to share is about the second-largest drug development Pharma organization (Fortune 50), which wanted better visibility into why its drug development was so slow. For each drug, they would run 30 trials, of which 15 or 17 candidates would drop off in stage 1, 9 in stage 2, and 1 would make it to stage 3 for an FDA drug trial. It took 4 months from deciding that a drug was valuable to submitting the report. That drug represented about $10 to $15 billion of enterprise value, so what was at stake was about $100 million a month.

Another complication was the existing data silos. Like many other Pharma companies, they kept data on R&D, clinical trials, reporting, and post-clinical trials in disparate data hubs. They also had about 12 analytics teams, each managing a different part of the process.

The semantic layer approach allowed the Pharma company to flip their applications and business units upside down and see what data-driven insights they could get. By creating a Trial Data Harmonizer, they had one point of access for all the data about a compound through the different stages of discovery, validation, testing, improvement in clinical trials, and post-clinical checks. As a result, they could cut the time from 4 months down to about 2 weeks.
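As a back-of-the-envelope check on what that speed-up might be worth, the sketch below combines the figures quoted in this story. It assumes that “about 2 weeks” is roughly half a month and that the $100 million a month at stake applies evenly across the delay. Both are simplifying assumptions, not figures reported by the company.

```python
# Figures quoted in the story above.
VALUE_AT_STAKE_PER_MONTH = 100_000_000  # about $100 million a month
MONTHS_BEFORE = 4.0                     # decision-to-submission time before

# Assumption: "about 2 weeks" is treated as half a month.
MONTHS_AFTER = 0.5

months_saved = MONTHS_BEFORE - MONTHS_AFTER
value_recovered = months_saved * VALUE_AT_STAKE_PER_MONTH

print(f"Months saved per drug: {months_saved}")
print(f"Approximate value recovered: ${value_recovered / 1e6:.0f} million")
```

Under these assumptions the reduction is on the order of 3.5 months, or roughly $350 million per drug, which should be read as an order-of-magnitude estimate only.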

The Trial Data Harmonizer also enabled them to create new data products like MedDRA Codeset & Internal Extension and a Trial Requirement library. The analytic opportunities led to the creation of new applications such as the trial submission report generation system and the international trial submission report generation systems. 

Lastly, the semantic layer empowered an end-to-end R&D life cycle, from drug discovery through FDA approval to go-to-market, that is continually being improved and powers more than one application.

A Top 3 Payments Company Story

The second story is about a Top 3 Payments company, which wanted to enable the creation of commercial data products and enhance customer insights. The main challenges were risk mitigation and agility in analytics. There were also a lot of complications around GDPR, CCPA, multiple internal domains, and an international footprint. So, we worked with them to create an automated view and discovery around data policy.

As you can see in the diagram above, this enabled them to organize all GDPR policies into sub-policies in the GDPR Policy Library and Sub-library. It also allowed them to create a company-wide business glossary, which defined core concepts for connecting policies. In this way, people around the world could visualize the glossary and the policy library tied together with workflows, best practices, and so on.
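One way to picture how a policy library tied to a business glossary can support automated enforcement is the Python sketch below. The class names, policy identifiers, glossary terms, and dataset are all hypothetical and heavily simplified; the sketch is not the company's model or the TopBraid EDG data model, only an illustration of linking sub-policies to data through shared concepts.

```python
from dataclasses import dataclass, field


@dataclass
class SubPolicy:
    policy_id: str                               # hypothetical identifier
    description: str
    concepts: set = field(default_factory=set)   # glossary terms it governs


@dataclass
class Dataset:
    name: str
    column_concepts: dict                        # column name -> glossary term


# A tiny, invented slice of a policy library keyed to glossary terms.
POLICY_LIBRARY = [
    SubPolicy("GDPR-erasure", "Honor erasure requests", {"PersonalData"}),
    SubPolicy("GDPR-transfer", "Restrict cross-border transfer",
              {"PersonalData", "Location"}),
    SubPolicy("CCPA-optout", "Honor opt-out of sale", {"PersonalData"}),
]


def applicable_policies(dataset: Dataset) -> dict:
    """Return, per column, the IDs of sub-policies whose concepts match its tag."""
    result = {}
    for column, concept in dataset.column_concepts.items():
        matches = sorted(
            p.policy_id for p in POLICY_LIBRARY if concept in p.concepts
        )
        if matches:
            result[column] = matches
    return result


payments = Dataset(
    "payments",
    {"email": "PersonalData", "country": "Location", "amount": "TransactionAmount"},
)
for column, policies in applicable_policies(payments).items():
    print(column, policies)
```

In this sketch the glossary term is the only join key between data and policy, so tagging a new column with an existing term is enough for the relevant sub-policies to apply to it.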

As a result, teams that didn’t normally talk well together improved their communication and produced better outcomes. It also enabled the creation of several applications such as Automated CCPA Enforcement, Fraud Detection, M&A Data Harmonizer, and Customer 360. On top of that, the semantic layer empowered a company-wide data privacy layer that powered applications and provided value and data-driven insights.

LLM integration

Semantics, the collective wisdom of your organization, will enable your teams to better utilize the large language models (LLMs) that have become so popular. As a start, semantic enrichment will give LLMs the power to create more context around core concepts that matter to your organization, use a corpus of data, process unstructured documents, and recommend new insights.

Over time, two other areas unique to graph technology will become evident: first, that semantics is the best training ground for generative AI and, second, that the way knowledge graphs are structured provides what we call “LLM rails” that help along the way.

So the technology of our time will depend on and will make semantics easier. And, when all the dust around the current GenAI hype settles, this is how semantics and knowledge graphs will empower LLMs.

Key Takeaways

The great value of semantics and knowledge graphs is in building technologies and accelerators that drive outcomes in Life Sciences and Financial Services.

To sum it up, best-of-breed technologies offer a way to drive the best value. What’s also important is that insights and applications garner the most attention and support. And, lastly, it’s never been easier to start. 





CEO at TopQuadrant

Nimit Mehta is the CEO of TopQuadrant, the applied knowledge graph collaboration platform. Its flagship product, EDG, brings life to metadata to solve the most important safety, security, and discovery problems for the world's largest organizations. It seamlessly connects teams, technology, and processes with a unique belief in shared knowledge to give a voice to the most valuable enterprise assets - the unsung, often overlooked, heroes who lead through data. Prior to TopQuadrant, Nimit guided ascendant technologies through the challenges of rapid growth as an operating executive, entrepreneur, and investor.