Learn why and how a Knowledge Graph significantly boosts text analytics processes and practices and makes text work for us in a more meaningful way.
Ontotext Platform synergizes knowledge graphs and text analysis as follows:
Call it symbiosis, a virtuous cycle or just good engineering: the Ontotext advantage is coupling two technologies (text analysis and knowledge graphs) that complement each other to better solve today’s content challenges. Text analysis helps machines parse and organise the messiness inherent in human language. Knowledge graphs reduce the semantic gaps between a human understanding of information and the computer’s structuring of that information.
Machines need consistency, constancy and unambiguity. They are built up from the clear, distinct states of on-off, one-zero, true-false. Human language, on the other hand, has emerged from millennia of history, geography, culture and happenstance. The meaning of words is often ambiguous, variable and context-dependent. Humans use language to communicate but also to obfuscate meaning. The endless expressivity of our language is why it is such a powerful tool for us and the reason it is so hard for machines to replicate our language abilities.
A full explanation of text analysis can be found in our fundamental. Ultimately, the goal of text analysis is to bring some tidiness to the messiness of language. The analysis includes tasks ranging from separating parts of speech (e.g. nouns, verbs and adjectives) to identifying concepts and entities such as people, organisations, place names, chemical compounds and products, as well as the relationships between them.
Identifying the entities, often the proper nouns, is an extremely useful shortcut to identifying what content is about. Humans, too, are particularly focused on proper nouns for understanding. Text analysis typically uses a list, a gazetteer, as a first step to identify and categorise these concepts. Making such a list is a relatively straightforward, if time-consuming, process. It is the maintenance of those lists that becomes unwieldy against that mercurial disorder of language. Languages are not static. The topics we discuss using language are also not static. Words appear and disappear. The meanings of words slip and slide constantly (see autoantonym).
One of the ways the Ontotext Platform solves this problem is by utilising knowledge graphs to create and maintain the gazetteer list. Harnessing the global and encyclopaedic information of the Linked Open Data (LOD), the platform provides a way for machines to organise text in a humanly useful way. Search and analytics are improved immediately. The machine not only identifies (or tags) entities and the relationships amongst them within a single document; it also imposes a universal structure across the organisation, so that machines and human users alike can understand the entities and their relationships across all documents. The normalised data becomes language- and system-independent. Whether the text refers to the capital of France as Paris, Париж or 巴黎, the computer knows the content or data is referring to the same entity. Additionally, the Ontotext Platform makes use of an organisation’s internal data to tag entities specific to the organisation and terms used within its industry.
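As a rough illustration of the idea, the sketch below derives a gazetteer from a tiny, made-up knowledge graph so that labels in different languages resolve to one entity. The entity IDs (`ex:Paris_France`, `ex:France`), the dictionary layout and the function names are assumptions for the example, not the Ontotext Platform's data model or API.

```python
# Hypothetical mini knowledge graph: entity ID -> type and labels in several languages.
# IDs and structure are illustrative assumptions only.
KNOWLEDGE_GRAPH = {
    "ex:Paris_France": {"type": "City", "labels": ["Paris", "Париж", "巴黎"]},
    "ex:France": {"type": "Country", "labels": ["France", "Франция", "法国"]},
}


def build_gazetteer(graph):
    """Invert the graph into a lookup from (case-folded) label to entity ID."""
    gazetteer = {}
    for entity_id, record in graph.items():
        for label in record["labels"]:
            gazetteer[label.casefold()] = entity_id
    return gazetteer


def tag(text, gazetteer):
    """Return (label, entity ID) pairs for gazetteer labels found in the text.

    Naive substring matching, ordered by first occurrence; a real tagger
    would tokenise the text and respect word boundaries.
    """
    lowered = text.casefold()
    hits = []
    for label, entity_id in gazetteer.items():
        position = lowered.find(label)
        if position != -1:
            hits.append((position, label, entity_id))
    return [(label, entity_id) for _, label, entity_id in sorted(hits)]


if __name__ == "__main__":
    gazetteer = build_gazetteer(KNOWLEDGE_GRAPH)
    for sentence in ("I love Paris", "Я люблю Париж", "我爱巴黎"):
        print(sentence, tag(sentence, gazetteer))
```

Because the gazetteer is generated from the graph rather than edited by hand, adding a label or an entity to the graph is enough to change what the tagger recognises.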
The knowledge graph also improves disambiguation. ‘Paris’ could be a person’s first name as well as a city in France or Texas, USA. By utilising the entities identified in the surrounding text and the graph (for more details on graphs), the knowledge base enables the text analysis to have more certainty in identifying the correct entity. For a specific example of how the knowledge graph provides context and concept awareness, refer to the webinar Graph Analytics on Company Data and News.
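One simple way to picture graph-based disambiguation is to score each candidate by how many of its graph neighbours also appear in the surrounding text. The sketch below does exactly that and nothing more; the candidate table, the entity IDs and the scoring rule are assumptions made for illustration, not a description of how the Ontotext Platform ranks candidates.

```python
# Hypothetical graph neighbourhoods: mention -> candidate entity -> related entities.
# All IDs are invented for this example.
CANDIDATES = {
    "Paris": {
        "ex:Paris_France": {"ex:France", "ex:Seine", "ex:Eiffel_Tower"},
        "ex:Paris_Texas": {"ex:Texas", "ex:United_States"},
        "ex:Person_named_Paris": {"ex:Some_Employer"},
    }
}


def disambiguate(mention, context_entities, candidates=CANDIDATES):
    """Pick the candidate whose graph neighbours overlap most with the context.

    Returns (entity_id, score), or (None, 0) when the context supports no candidate.
    On a tie, the first candidate encountered is kept.
    """
    context = set(context_entities)
    best_id, best_score = None, 0
    for entity_id, neighbours in candidates.get(mention, {}).items():
        score = len(neighbours & context)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id, best_score


if __name__ == "__main__":
    print(disambiguate("Paris", ["ex:Texas"]))
    print(disambiguate("Paris", ["ex:France", "ex:Seine"]))
```

The score doubles as a crude certainty measure: a mention with no supporting context returns `(None, 0)` and can be left untagged or sent for review.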
The knowledge graph suggests entities that are currently known. Inevitably, new people, companies and products appear in content, and these are likely to be the ones of most interest. The Ontotext Platform’s text analysis is also able to tag candidate entities that aren’t currently in the gazetteer. By making use of the entities already identified in the text, text analysis is able to suggest new entities and their type (e.g. person, location, company or brand).
Depending on its certainty, the platform can automatically or semi-automatically add the newly identified entities and infer new relationships. Take the following sentence as an example: ‘Ann Sarnoff has been appointed the new CEO of Warner Bros.’ Even if the knowledge graph isn’t aware of ‘Ann Sarnoff’ or of the company ‘Warner Bros.’, the Ontotext Platform will be able to add, with a high degree of certainty, a surprising amount of new information to the knowledge graph.
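To make the example concrete, here is a minimal sketch of the kind of statements such a sentence could yield. It uses a single hand-written pattern for this one sentence shape, whereas real text analysis relies on far more general linguistic processing; the predicate names (`ex:holdsRole`, `ex:worksFor`), the type names and the `extract` function are hypothetical.

```python
import re

# Illustrative pattern for one sentence shape only:
# "<Person> has been appointed the new <Role> of <Organisation>"
# The organisation is taken up to the end of the sentence, so a trailing
# full stop (as in "Warner Bros.") stays part of the name.
APPOINTMENT = re.compile(
    r"^(?P<person>.+?) has been appointed the new (?P<role>.+?) of (?P<org>.+)$"
)


def extract(sentence, known_entities):
    """Return (new candidate entities, triples) for an appointment sentence.

    Triples are (subject, predicate, object) tuples using made-up predicates.
    Returns ([], []) when the sentence does not match the pattern.
    """
    match = APPOINTMENT.match(sentence.strip())
    if match is None:
        return [], []
    person, role, org = match.group("person", "role", "org")
    new_entities = [name for name in (person, org) if name not in known_entities]
    triples = [
        (person, "rdf:type", "ex:Person"),
        (org, "rdf:type", "ex:Organisation"),
        (person, "ex:holdsRole", role),
        (person, "ex:worksFor", org),
    ]
    return new_entities, triples


if __name__ == "__main__":
    sentence = "Ann Sarnoff has been appointed the new CEO of Warner Bros."
    new_entities, triples = extract(sentence, known_entities=set())
    print("New candidates:", new_entities)
    for triple in triples:
        print(triple)
```

With an empty set of known entities, both names come back as new candidates, each with a suggested type and two relationships linking them.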
That is a lot of information extracted from one simple sentence. In practice, the text analysis processes large numbers of sentences and enriches the knowledge base with thousands or even millions of new pieces of information. This new information is added to the knowledge graph, making it richer, more exhaustive and more likely to identify new entities and relationships. For a more detailed and technical explanation of how semantic tagging is modelled in the Ontotext Platform, read Ontotext Platform: A Global View Across Knowledge Graphs and Content Annotations.