Ontotext Metadata Studio 3.7 Introduces Probably The Best AI Model for Linking Text to Wikidata

OMDS’s latest release brings to its users a state-of-the-art entity linking service that tags mentions to specific concepts in Wikidata

New York, Sofia, Basel, Friday, February 16, 2024

Ontotext is pleased to announce the new version of the Ontotext Metadata Studio (OMDS). It now enables you to tag your content with CEEL – our new-generation, class-leading text analytics service for Common English Entity Linking. This is part of Ontotext’s AI-in-Action initiative aimed at enabling data scientists and engineers to benefit from the AI capabilities of our products.

CEEL is trained to tag mentions of People, Organizations and Locations to their representation in Wikidata – the biggest global public knowledge graph, which includes close to 100 million entity instances. Wikidata entities have precise mappings to Wikipedia articles, where those exist – Wikipedia has about 7 million articles. Wikidata is also used continuously as a source for the enrichment of Google’s knowledge graph, which makes Wikidata popular for semantic SEO purposes. Focusing on the entity types of interest, CEEL is trained to recognize about 40 million of the Wikidata concepts.  

The purpose of models like CEEL is to streamline information extraction from text and the enrichment of databases and knowledge graphs. For instance, large language models (LLMs) are good at extracting specific types of company-related events from the news. They can recognize and classify the places in the text where events are reported and extract the names of the organizations involved. What LLMs cannot do is disambiguate those names to specific concepts in a graph or records in a database. An LLM can extract a relationship (for example, an acquisition, which results in a parent-subsidiary relation). But this new fact is not ready to be added to a database until a service like CEEL has selected the identifier of the right record from among multiple records for similarly named companies.
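The disambiguation step can be illustrated with a toy sketch. This is not CEEL’s method or API: the Candidate type, the link_mention function, the record identifiers and the company names below are all hypothetical, and the scoring (counting words shared between the mention’s context and each record’s description) stands in for what a trained entity linking model would do.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    record_id: str    # hypothetical database identifier, not a real Wikidata ID
    name: str
    description: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of punctuation-free words."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def link_mention(mention: str, context: str,
                 candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the candidate whose description shares the most words with the context.

    Returns None when no name-matching candidate shares any word with the context.
    """
    context_tokens = tokenize(context) - tokenize(mention)
    best, best_score = None, 0
    for cand in candidates:
        # Only consider records whose name contains the mention text.
        if mention.lower() not in cand.name.lower():
            continue
        score = len(context_tokens & tokenize(cand.description))
        if score > best_score:
            best, best_score = cand, score
    return best


# Two similarly named (made-up) companies in a reference database.
records = [
    Candidate("rec-001", "Acme Software", "enterprise software vendor based in Boston"),
    Candidate("rec-002", "Acme Foods", "packaged foods producer based in Chicago"),
]

sentence = "Globex acquired Acme, a Boston software vendor, for an undisclosed sum."
linked = link_mention("Acme", sentence, records)
print(linked.record_id if linked else "no confident match")
```

Only once the mention resolves to a single record identifier can the extracted acquisition be stored as a fact about that specific company.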

In keeping with the cross-domain nature of the product, CEEL now enables the following capabilities within OMDS:

  • Enhancing content discoverability by linking entity mentions in text to their corresponding Wikidata entries. This provides readers with instant access to additional global knowledge context. 
  • Aiding in the automated tagging and categorization of content. This facilitates more efficient discovery, reviews and knowledge synthesis. 
  • Enriching content with semantic metadata. This allows for more precise search, better SEO, and better performance of retrieval-augmented generation (RAG) with LLMs and of downstream analytics.
  • Streamlining information extraction from large volumes of unstructured content. This enables organizations to quickly analyze and comprehend market trends or signals.

Evaluation of CEEL’s accuracy on the most popular public benchmarks for this task shows that it performs on par with or better than state-of-the-art AI models. More details about CEEL’s architecture, evaluation, and general availability can be found in the dedicated blog post.

This latest offering supplements the pre-existing core feature of OMDS that enables users to perform entity linking against their own taxonomies and reference data. Now they can easily combine and interlink their organizational and domain knowledge with the global body of reference of Wikidata into a single cohesive knowledge graph.

Another highlight of this release is a set of UX improvements for the Form workflow. The UI has been refined and now makes it more evident exactly how much information a certain annotation or Form section contains. The workflow has also been slightly modified so that users can perform important actions, such as saving and canceling changes to annotations, with fewer clicks. In addition, OMDS 3.7 streamlines the way the quick search works, especially when transitioning to the detailed concept search, which visualizes the comprehensive information for a specific concept in the graph.

Last but not least, the new release expands upon the concept Highlight feature. It now allows users to easily see and “scroll” through each mention of a concept in the document. In this way, they can quickly grasp the impact and importance of that concept for the whole document. Other, smaller improvements include general stability and vulnerability updates, making this the best release of OMDS to date.

Want to see for yourself?


For more information, contact Doug Kimball, Chief Marketing Officer at Ontotext.