A Global Pharma Company Uses Ontotext’s Solution for Semantic Similarity Search in Documents

Ontotext's smart semantic similarity search solution enables a global Pharma company to quickly process large volumes of Regulatory questions and scale up the information extraction.

GraphDB’s Similarity plugin is one of the newest features of Ontotext’s leading semantic graph database. It brings cognitive awareness to GraphDB by leveraging the links in the knowledge graph.

The similarity indices are a fuzzy-match heuristic based on statistical semantics, which makes them especially useful for retrieving the most closely related texts. The plugin integrates the Semantic Vectors library and its underlying Random Indexing algorithm. For more information, see our Documentation.
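To give a feel for the idea behind Random Indexing, here is a minimal Python sketch. It is a toy illustration only, not the Semantic Vectors library or the GraphDB plugin API: the function names (`index_vector`, `train`, `cosine`), the values of `DIM` and `NONZERO`, and the three sample texts are all assumptions made up for this example. Each word gets a sparse random "index vector", and a word's "context vector" is the sum of the index vectors of the words it appears with, so words used in similar contexts end up with similar vectors.

```python
import hashlib
import math
import random
from collections import defaultdict

DIM = 2048     # size of the reduced vector space (assumed value)
NONZERO = 10   # number of +1/-1 entries per index vector (assumed value)


def index_vector(token: str) -> dict[int, int]:
    """Sparse ternary random vector, seeded by the token so runs are repeatable."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    positions = rng.sample(range(DIM), NONZERO)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def train(texts: list[str]) -> dict[str, list[float]]:
    """Build one context vector per word by summing the index vectors
    of every other word that occurs in the same text."""
    context = defaultdict(lambda: [0.0] * DIM)
    for text in texts:
        words = text.lower().split()
        for i, word in enumerate(words):
            for j, neighbour in enumerate(words):
                if i != j:
                    for pos, sign in index_vector(neighbour).items():
                        context[word][pos] += sign
    return dict(context)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented sample texts: "cancer" and "metastasis" never share a text,
# but both appear next to "tumor" and "growth".
texts = [
    "cancer tumor growth therapy",
    "metastasis tumor growth spread",
    "invoice payment bank transfer",
]
vectors = train(texts)
print("cancer vs metastasis:", cosine(vectors["cancer"], vectors["metastasis"]))
print("cancer vs invoice:   ", cosine(vectors["cancer"], vectors["invoice"]))
```

In this sketch the first score should come out clearly higher than the second, because the two medical terms share most of their neighbouring words even though they never occur together. A production index would work on far larger corpora and use tuned parameters.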

The Goal

A global Pharma company selected Ontotext to create a smart industry-specific solution for processing the large volumes of diverse questions coming from Regulatory Authorities that had to be answered in a short period of time.

Although the company had amassed a huge archive of questions answered in the past, the existing solution could not handle this process efficiently. The different formats and the various document management platforms that had stored the Q&As over the years made it very difficult to reuse the company’s knowledge. Even when answering repetitive and very often identical questions about the same product, company analysts regularly had to spend days searching for the answers.

The Challenge

One of the main challenges for the Pharma company was that the system in place was based on conventional search technologies and most of their documents were in an unsearchable PDF format. On top of that, these documents were not indexed and the provided metadata was fragmentary and of poor quality.

Another difficulty was that, to find relevant documents, the analysts had to write a series of complicated queries, trying to match keywords from the new question to keywords from existing documents containing answers. This was a complex, iterative process of figuring out how to make the query specific enough, but not too specific, so that it would yield meaningful results.

During this process, the analysts had to review long lists of results, weed out duplicates and unrelated items, and determine which (if any) of the documents would best serve their purposes. This method was time-consuming and required years of expertise, which also made onboarding new employees a demanding task.

The Solution – KG-powered Semantic Similarity Search

The solution ingests the various documents from the company’s archive and automatically extracts and categorizes Q&A pairs. The content of the questions is semantically indexed, so that the system is able to compare any new question to all previous questions, even when formulated differently (from a partial inversion or deletion to more significant alterations).

The processed data is used to build a knowledge graph that represents the relations between the different elements of the documents. Empowered by this knowledge graph, Ontotext’s solution uses GraphDB’s semantic text similarity search to match words that occur in similar contexts, that is, alongside the same neighboring words. (For example, even when “cancer” and “metastasis” appear in different texts, they can still be matched as semantically related.)

Finally, Ontotext’s solution returns the top 10 most similar Q&A pairs from the archive, so now company analysts only need to review them and, if necessary, make some modifications before sending their answer. They can also increase the weight of specific terms within a query to the system (to focus, for example, on the safety aspect of a question) and narrow down the results even more.
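The retrieval step described above can be sketched in a few lines of Python. This is a simplified stand-in, not Ontotext’s implementation: it uses plain bag-of-words cosine similarity in place of the semantic index, and the helper names (`vectorize`, `cosine`, `top_k`), the `weights` parameter, and the three archived Q&A pairs are all hypothetical. It shows how a ranked top-k list is produced and how boosting a term in the query can reorder the results.

```python
import math
from collections import Counter


def vectorize(text, weights=None):
    """Bag-of-words vector; `weights` multiplies the count of chosen terms
    (terms not listed keep a weight of 1.0)."""
    weights = weights or {}
    counts = Counter(text.lower().split())
    return {term: count * weights.get(term, 1.0) for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, archive, k=10, weights=None):
    """Return the k archived (score, question, answer) entries most similar
    to `question`. Weights are applied to the query only."""
    query = vectorize(question, weights)
    scored = [(cosine(query, vectorize(q)), q, a) for q, a in archive]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Invented archive of previously answered questions.
archive = [
    ("what is the recommended dose for adults", "archived answer A"),
    ("which safety data support the dose", "archived answer B"),
    ("describe the storage conditions for the product", "archived answer C"),
]
new_question = "what is the recommended dose and its safety"

plain = top_k(new_question, archive, k=2)
boosted = top_k(new_question, archive, k=2, weights={"safety": 5.0})
print("unweighted best match:", plain[0][1])
print("safety-weighted best match:", boosted[0][1])
```

With these sample entries, the unweighted query ranks the dosing question first, while raising the weight of "safety" moves the safety question to the top. The same pattern applies with `k=10` and a real similarity index behind `cosine`.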

Why Choose Ontotext?

With Ontotext’s smart semantic similarity search solution, the Pharma company analysts can now:

  • have full access to a comprehensive collection of Q&As;
  • identify similar questions and their relevant answers much more easily;
  • reuse the company’s knowledge by simply copying and pasting answers from previous questions;
  • respond much faster to questions – from 2 days to less than 1 hour.

Ontotext’s solution was built for a very specific Pharma Regulatory problem, but because it is based on generic technology, its functionality can be applied in any domain.

Do you think this case resembles your particular needs?

Contact Us Now