Assessing the abilities of Artificial Intelligence (AI) systems is a critical aspect of development that has become increasingly important as AI has expanded into various areas of application. We can evaluate their performance through comparisons to human benchmarks, to other AI systems or a combination of both. But no matter which approach serves our purpose best, it remains a specialized task, with varying levels of difficulty depending on the specific field of AI.
An area of AI that Ontotext has been working on for over 20 years is text analytics. It uses natural language processing (NLP) techniques to extract information from unstructured content and store it in a structured form. This data can then be easily analyzed to provide insights or used to train machine learning models.
Ontotext’s approach is to optimize models and algorithms through human contribution and benchmarking in order to create better and more accurate AI.
In text analytics, the human benchmark is a set of documents manually annotated by human experts. To be able to annotate the specified content consistently and unambiguously, these experts usually follow a set of specific conventions, referred to as “annotation guidelines”. A human benchmark dataset is also called a “quality baseline”, “gold standard corpus”, “ground truth”, etc. At Ontotext, we generally use the classic term “gold standard corpus”. You can read more about it in this blog post.
A gold standard corpus can differ depending on how it was produced. For example, it can be annotated manually (the classic approach) or semi-automatically (when automated annotations are validated and expanded upon by humans to bootstrap the process). It can also be a one-person job or be produced by a team of experts who work independently to eliminate subjective bias.
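To make this concrete, here is a minimal sketch of what a single gold-standard record might look like. The schema below (document text plus typed spans with character offsets and an annotator ID) is only an illustrative convention, not a prescribed Ontotext format; real projects define their own schema in the annotation guidelines.

```python
# A minimal, illustrative gold-standard record: the document text plus
# manually created annotations as typed character spans.
# All field names and values are hypothetical.
gold_document = {
    "doc_id": "contract-0001",
    "text": "Acme Corp signed the agreement with Globex Inc in Berlin.",
    "annotations": [
        {"start": 0,  "end": 9,  "type": "Organization", "annotator": "expert_a"},
        {"start": 36, "end": 46, "type": "Organization", "annotator": "expert_a"},
        {"start": 50, "end": 56, "type": "Location",     "annotator": "expert_a"},
    ],
}
```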
Using a team of annotators allows you to take different expert opinions into account, which indicates the level of ambiguity of the task. You’ll also be able to establish an inter-annotator agreement (IAA) metric, which measures the consistency of annotations when more than one person is involved in the process. In practice, the IAA also sets an upper bound on the quality that any future automated process can be expected to achieve.
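As a rough illustration, a common IAA metric for two annotators is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes it over per-token labels with scikit-learn; it assumes both annotators labelled the same tokens, which simplifies how IAA is measured over spans in practice.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-token entity labels from two annotators over the same text.
annotator_a = ["ORG", "O", "O", "LOC", "O", "PER", "PER", "O"]
annotator_b = ["ORG", "O", "O", "LOC", "O", "PER", "O",   "O"]

# Cohen's kappa: 1.0 means perfect agreement, 0.0 means agreement no better
# than chance. Low values usually point to ambiguous annotation guidelines.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Inter-annotator agreement (Cohen's kappa): {kappa:.2f}")
```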
Creating such a human benchmark from scratch with multiple annotators is usually expensive. It can often be seen as more practical to have a single expert go through the documents and establish what they deem “good enough” to satisfy the requirements of a specific use case. The right approach depends on the price-to-quality trade-off sought in each particular case.
Having a good human benchmark is crucial when it comes to developing NLP software. It provides the foundation on which you can build your automated text analytics services and processes, evaluate them and continuously monitor and improve the quality of the output.
You can find readily available annotated datasets for common tasks such as part-of-speech tagging, named entity recognition (e.g., people, organizations, locations) or sentiment analysis. If the NLP problem is popular and generic enough, there are many good, freely available resources that can be reused.
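For such generic tasks, a pretrained, general-purpose model often works out of the box. The short sketch below uses spaCy’s freely available en_core_web_sm model (assumed to be installed) simply to show what that reusable baseline looks like.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Acme Corp signed the agreement with Globex Inc in Berlin.")

# Print the entities the pretrained model finds, with their character offsets.
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```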
However, chances are that for a specific business domain or use case, you would need to build a text analytics service from scratch and create your own human annotations.
In the past few decades, there has been a significant increase in the amount of free text in digital formats. More and more companies are beginning to understand that the untapped potential in their text data can be used for various business applications and decision-making processes. We have extensive expertise in text analytics, having built solutions for companies across many different industry verticals, so we understand the typical pain points of this process.
In order to empower enterprises to turn their static documents into actionable data, we’ve developed Ontotext Metadata Studio – an all-in-one environment facilitating the creation, evaluation and improvement of the quality of text analytics services. It offers flexible modeling capabilities that can address the challenges of any specific use case. Its intuitive interface greatly simplifies the creation of human annotations. It also makes it easy to create various reports to monitor the quality and keep it in line with your requirements.
Ontotext Metadata Studio’s modeling power and flexibility enable rapid NLP prototyping and development out of the box. It allows you to iteratively create a text analytics service that best captures your domain knowledge. At any point, you can measure the output against your human benchmark and fine-tune the service to improve its quality. This makes for quick development with short feedback loops. You can apply the same iterative process when creating your human benchmark.
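At its core, such an evaluation compares the service’s annotations to the gold-standard spans. Below is a minimal, self-contained sketch of strict span-level precision, recall and F1 over (start, end, type) triples; a tool like Ontotext Metadata Studio produces such reports for you, so this is only meant to show what the numbers represent.

```python
def span_scores(gold, predicted):
    """Strict span-level precision/recall/F1 over (start, end, type) triples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotations for one document: human benchmark vs. service output.
gold_spans = [(0, 9, "Organization"), (36, 46, "Organization"), (50, 56, "Location")]
predicted_spans = [(0, 9, "Organization"), (50, 56, "Location"), (17, 29, "Organization")]

p, r, f1 = span_scores(gold_spans, predicted_spans)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```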
The simple interface enables non-technical domain experts to work in a wide variety of setups, even for the most complex domain models and use cases. It makes the whole process of reviewing and iterating on the output of the text analytics service or the human annotations transparent and easy to trace. The interface also shows you an overall quality metric score for all documents in the corpus as well as a drill-down view into individual documents. This makes it easy to see a detailed breakdown of quality discrepancies at the single-document and single-annotation level, and to assess what is actually in your data or content.
Being able to quickly annotate in a consistent way and measure the quality of your text analytics service’s output with a few clicks significantly speeds up the development of your text analytics solution. Ontotext Metadata Studio also includes an early warning system that notifies you about underlying issues in your text analytics that might otherwise go unnoticed. As a result, you can apply the correct solution depending on the root cause and resolve the issue quickly so it is not replicated throughout the document set.
Building a big, accurately annotated corpus of documents for developing or maintaining a text analytics solution can be a complex and expensive task. While you can do it with other specialized tools, or even in Excel, most don’t address the core problem of creating such a corpus. Every time your base use case gets more sophisticated, your annotation team grows, or the number of documents scheduled for annotation increases, the task becomes even more complex and more expensive.
Ontotext Metadata Studio addresses all of these problems head on. Its flexibility and easy configurability make it suitable for a wide range of use cases. It also alleviates workload constraints for your team and provides an easy solution for keeping track of your documents.
Lastly, Ontotext Metadata Studio provides an out-of-the-box configuration with some of the most popular text analytics services on the market. You can effortlessly measure the performance of Google’s NLP, Amazon Comprehend, IBM Watson or spaCy against your own benchmark, or compare them against each other. We make it easy to extend this list with other third-party services, including your own.
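Outside the Studio, such a comparison boils down to normalising each service’s output into the same span format and scoring it against the same benchmark. The sketch below does this for spaCy and Amazon Comprehend via boto3 (assuming AWS credentials and a region are configured); other clients would be wrapped the same way.

```python
import boto3
import spacy

text = "Acme Corp signed the agreement with Globex Inc in Berlin."

# spaCy: entities already carry character offsets.
nlp = spacy.load("en_core_web_sm")
spacy_spans = [(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]

# Amazon Comprehend: normalise its response into the same (start, end, type) shape.
comprehend = boto3.client("comprehend")
response = comprehend.detect_entities(Text=text, LanguageCode="en")
comprehend_spans = [(e["BeginOffset"], e["EndOffset"], e["Type"])
                    for e in response["Entities"]]

# Both lists can now be scored against the same gold standard
# (e.g., with the span_scores() sketch above) or against each other.
```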
Creating a good human benchmark for your text analytics solution involves many tasks, but with the help of Ontotext Metadata Studio this process becomes a lot more straightforward. Instead of being bogged down by administrative tasks, you can focus on writing clear, unambiguous annotation guidelines and keeping them up to date. You can annotate more documents with better quality, streamline the work of domain experts by standardizing their vocabulary and make your annotators’ lives much easier.
With its powerful modeling capabilities, intuitive interface and detailed reporting features, Ontotext Metadata Studio allows enterprises to save time and resources while improving the accuracy and quality of their text analytics services.