GraphDB’s Similarity plugin is one of the newest features of Ontotext’s leading semantic graph database. It brings cognitive awareness to GraphDB by leveraging the links in the knowledge graph.
The similarity indices are a fuzzy-match heuristic based on statistical semantics, which is especially useful for retrieving the most closely related texts. The plugin integrates the Semantic Vectors library and its underlying Random Indexing algorithm. For more information, see our Documentation.
A global Pharma company selected Ontotext to create a smart industry-specific solution for processing the large volumes of diverse questions coming from Regulatory Authorities that had to be answered in a short period of time.
Although the company had amassed a huge archive of questions answered in the past, the existing solution could not handle this process efficiently. The different formats and the various document management platforms that had stored the Q&As over the years made it very difficult to reuse the company’s knowledge. Even when answering repetitive and very often identical questions about the same product, company analysts regularly had to spend days searching for the answers.
One of the main challenges for the Pharma company was that the system in place was based on conventional search technologies and most of their documents were in an unsearchable PDF format. On top of that, these documents were not indexed and the provided metadata was fragmentary and of poor quality.
Another difficulty was that, to find relevant documents, the analysts had to write a series of complicated queries, trying to match keywords from the new question to keywords from existing documents containing answers. This was a complex, iterative process of figuring out how to make the query specific enough, but not too specific, so that it would yield meaningful results.
During this process, the analysts had to review long lists of results, weed out duplicates and unrelated items, and determine which (if any) of the documents would best serve their purposes. This method was time-consuming and required years of expert knowledge, which also made onboarding new employees a demanding task.
Ontotext’s smart semantic similarity search solution enables the Pharma company to quickly process large volumes of Regulatory questions and scale up the information extraction.
The solution ingests the various documents from the company’s archive and automatically extracts and categorizes Q&A pairs. The content of the questions is semantically indexed, so that the system is able to compare any new question to all previous questions, even when formulated differently (from a partial inversion or deletion to more significant alterations).
The processed data is used for building a knowledge graph that represents the relations between the different elements of the documents. Empowered by this knowledge graph, Ontotext’s solution uses GraphDB’s semantic text similarity search to match words that tend to co-occur with the same other words, that is, words used in similar contexts. (For example, even when “cancer” and “metastasis” appear in different texts, they can still be matched as semantically related.)
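To illustrate how words from different texts can be matched through the contexts they share, here is a minimal Python sketch of the general Random Indexing idea. It is not the Semantic Vectors implementation or GraphDB's API: the dimensionality, the number of non-zero entries, the function names, and the three sample texts are all assumptions chosen for illustration. Each word gets a sparse random "index vector", and a word's "context vector" is the sum of the index vectors of the words it appears with.

```python
import math
import random

DIM = 512      # vector dimensionality (illustrative value)
NONZERO = 10   # number of +1/-1 entries per index vector (illustrative value)


def make_index_vector(rng):
    """Return a sparse random vector: mostly zeros, a few +1/-1 entries."""
    vec = [0.0] * DIM
    for n, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1.0 if n % 2 == 0 else -1.0
    return vec


def train_context_vectors(texts, seed=0):
    """Build one context vector per word by summing the index vectors
    of every other word that appears in the same text."""
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = {}
    for text in texts:
        words = text.lower().split()
        # Assign an index vector the first time a word is seen.
        for word in words:
            if word not in index_vectors:
                index_vectors[word] = make_index_vector(rng)
                context_vectors[word] = [0.0] * DIM
        # Add each neighbour's index vector to the word's context vector.
        for i, word in enumerate(words):
            target = context_vectors[word]
            for j, neighbour in enumerate(words):
                if i != j:
                    source = index_vectors[neighbour]
                    for k in range(DIM):
                        target[k] += source[k]
    return context_vectors


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical sample texts: "cancer" and "metastasis" never co-occur,
# but they share most of their surrounding words.
texts = [
    "cancer tumor growth observed in patients",
    "metastasis tumor spread observed in patients",
    "invoice payment due before shipping date",
]
vectors = train_context_vectors(texts)
related = cosine(vectors["cancer"], vectors["metastasis"])
unrelated = cosine(vectors["cancer"], vectors["invoice"])
print(f"cancer ~ metastasis: {related:.2f}")
print(f"cancer ~ invoice:    {unrelated:.2f}")
```

In this toy setup, "cancer" and "metastasis" score as similar because their context vectors are built from largely the same neighbours, while "invoice" shares none of them and scores close to zero.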
Finally, Ontotext’s solution returns the top 10 most similar Q&A pairs from the archive, so now company analysts only need to review them and, if necessary, make some modifications before sending their answer. They can also increase the weight of specific terms within a query to the system (to focus, for example, on the safety aspect of a question) and narrow down the results even more.
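The retrieval step can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a plain bag-of-words cosine similarity as a stand-in for the semantic index, and the `top_matches` function, the `boosts` parameter, and the sample archive are hypothetical names and data, not part of Ontotext's solution. It shows how raising the weight of a term such as "safety" can change which archived Q&A pair ranks first.

```python
import heapq
import math
from collections import Counter


def weighted_vector(text, boosts=None):
    """Bag-of-words vector; `boosts` multiplies the weight of chosen terms."""
    boosts = boosts or {}
    counts = Counter(text.lower().split())
    return {term: count * boosts.get(term, 1.0) for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(question, archive, k=10, boosts=None):
    """Return up to k (score, question, answer) tuples, best match first."""
    query = weighted_vector(question, boosts)
    scored = (
        (cosine(query, weighted_vector(old_question)), old_question, answer)
        for old_question, answer in archive
    )
    return heapq.nlargest(k, scored, key=lambda item: item[0])


# Hypothetical archive of previously answered questions.
archive = [
    ("what is the recommended dose for adults", "answer A"),
    ("what safety data supports the adult dose", "answer B"),
    ("how is the product stored and shipped", "answer C"),
]
new_question = "what safety information exists for the recommended dose"

plain = top_matches(new_question, archive)
boosted = top_matches(new_question, archive, boosts={"safety": 3.0})
print("unweighted best match:", plain[0][2])
print("safety-weighted best match:", boosted[0][2])
```

Without weighting, the dosing question matches best because it shares the most words with the new question. With "safety" weighted three times higher, the safety-related question moves to the top.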
With Ontotext’s smart semantic similarity search solution, the Pharma company analysts can now:
Ontotext’s solution was built for a very specific Pharma Regulatory problem, but because it is based on generic technology, the functionality is applicable to other domains as well.
Do you think this case resembles your particular needs?