What is Text Analysis?

Text Analysis (TA) aims to extract machine-readable information from unstructured text in order to enable data-driven approaches towards managing content. To overcome the ambiguity of human language and achieve high accuracy for a specific domain, TA requires the development of customized text mining pipelines.

Information Extraction

Text Analysis parses texts in order to extract machine-readable facts from them. Its purpose is to create structured data out of free-text content. The process can be thought of as slicing and dicing heaps of unstructured, heterogeneous documents into data pieces that are easy to manage and interpret. Text Analysis is closely related to terms such as Text Mining, Text Analytics and Information Extraction, which are discussed below.
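As a rough illustration of turning free text into structured data, here is a minimal Python sketch. It assumes a single hand-written pattern for one kind of fact (company acquisitions); the pattern, the field names and the company names are all made up for the example, and real pipelines rely on far richer linguistic processing.

```python
import re

# Hypothetical pattern: "<Buyer> acquired <Target> for $<amount> million|billion".
# Company names are assumed to be runs of capitalized words.
NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"
ACQUISITION = re.compile(
    rf"(?P<buyer>{NAME}) acquired (?P<target>{NAME}) for "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<scale>million|billion)"
)
SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_acquisitions(text):
    """Return one structured record (a dict) per acquisition mention."""
    records = []
    for match in ACQUISITION.finditer(text):
        records.append({
            "buyer": match.group("buyer"),
            "target": match.group("target"),
            "amount_usd": float(match.group("amount")) * SCALE[match.group("scale")],
        })
    return records


# Invented sentence with fictional companies.
text = "In 2019, Acme Corp acquired Widget Works for $2.5 billion."
facts = extract_acquisitions(text)
print(facts)
```

Each record is a plain dictionary, so it can be serialized, stored or aggregated like any other piece of data.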

Red Sox Tame Bulls

The central challenge in Text Analysis is the ambiguity of human languages. Most people in the USA will easily understand that “Red Sox Tame Bulls” refers to a baseball match. Lacking this background knowledge, a computer will generate several linguistically valid interpretations that are very far from the intended meaning of this news title. People not interested in baseball will have trouble understanding it, too.

Achieving high accuracy for a specific domain and specific document types requires the development of a customized text mining pipeline that incorporates or reflects these specifics.

Knowledge Graphs Help Text Analysis

Modern Text Analysis technology interacts extensively with knowledge graphs (KG):

  • Big graphs provide background knowledge, that is, human-like awareness of concepts and entities, to enable a more accurate interpretation of the text;
  • The results of the analysis are semantic tags (annotations) that link references in the text to specific concepts in the graph. These tags represent structured metadata that enables better search and further analytics;
  • Facts extracted from the text can be added to enrich the Knowledge Graph.
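The first two points above can be sketched in a few lines of Python. This is a toy illustration, not how any particular platform works: the knowledge graph is a small dictionary, the concept IDs (the `ex:` prefix), labels and context words are invented, and disambiguation is reduced to counting overlapping context words.

```python
import re

# Toy knowledge graph: concept ID -> label, type and typical context words.
# Every entry here is made up for illustration.
KG = {
    "ex:RedSox": {"label": "Red Sox", "type": "SportsTeam",
                  "context": {"match", "season", "team"}},
    "ex:BullsTeam": {"label": "Bulls", "type": "SportsTeam",
                     "context": {"match", "season", "team"}},
    "ex:BullAnimal": {"label": "Bulls", "type": "Animal",
                      "context": {"farm", "cattle", "herd"}},
}


def tag(text):
    """Link label mentions in the text to the best-matching KG concept."""
    words = set(re.findall(r"\w+", text.lower()))
    annotations = []
    for label in sorted({concept["label"] for concept in KG.values()}):
        candidates = [cid for cid, c in KG.items() if c["label"] == label]
        for match in re.finditer(r"\b" + re.escape(label) + r"\b", text):
            # Background knowledge: prefer the candidate whose context
            # words overlap most with the words of the text.
            best = max(candidates, key=lambda cid: len(KG[cid]["context"] & words))
            annotations.append({"start": match.start(), "end": match.end(),
                                "text": match.group(), "concept": best})
    return sorted(annotations, key=lambda a: a["start"])


tags = tag("Red Sox tame Bulls in a tense match")
for annotation in tags:
    print(annotation)
```

The resulting annotations are the semantic tags: character offsets in the text paired with a concept identifier from the graph. In a different context, such as "The farm keeps Bulls in a herd", the same mention resolves to the animal concept instead.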

Ontotext Platform implements all flavors of this interplay linking text and big Knowledge Graphs to enable solutions for content tagging, classification and recommendation.

Semantic Tagging            Content Classification              Content Recommendation

Examples of the typical steps of Text Analysis, as well as intermediate and final results, are presented in the article What is Semantic Annotation?, which also features a short video. Ontotext’s NOW public news service demonstrates semantic tagging of news against a big knowledge graph developed around DBpedia.

Text Analysis vs. Text Mining vs. Text Analytics

Text Analysis and Text Mining are used as synonyms. Information Extraction is the name of the scientific discipline behind text mining. The article What is Information Extraction? provides a list of typical Text Analysis tasks.

All these terms refer to partial Natural Language Processing (NLP), where the final goal is not to fully understand the text, but rather to retrieve specific information from it in the most practical manner. This means striking a good balance between the effort needed to develop and maintain the analytical pipeline, its computational cost and performance (e.g., how much memory it needs and how long it takes to process one document) and its accuracy. Accuracy is measured with recall (how much of the relevant information was extracted), precision (how much of the extracted information is correct) and combined measures such as the F-Score.
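To make the accuracy measures concrete, here is a minimal Python sketch that compares a set of extracted facts with a manually prepared "gold" set. The function name and the sample facts are illustrative assumptions; the F-Score shown is the common balanced variant (F1), the harmonic mean of precision and recall.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted items with a gold standard; return (P, R, F1)."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold items that were actually found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: (mention, type) pairs.
gold = {("Rome", "City"), ("Roman Empire", "State"),
        ("Europe", "Continent"), ("Red Sox", "Team")}
extracted = {("Rome", "City"), ("Roman Empire", "State"), ("Bulls", "Animal")}

p, r, f1 = precision_recall_f1(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here two of the three extracted facts are correct and two of the four gold facts are found, which shows how a pipeline can be fairly precise while still missing half of the relevant information.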

You will often find Text Analysis used interchangeably with Text Analytics. And while to the untrained mind these might sound like synonyms, from the point of view of practice and experience, there is a subtle difference worth mentioning.

Text Analysis is the term describing the very process of computational analysis of texts, while Text Analytics involves a set of techniques and approaches towards bringing textual content to a point where it is represented as data and then mined for insights, trends and patterns.

Case in point, Text Analysis helps translate a text into the language of data. Once Text Analysis has “prepared” the content, Text Analytics kicks in to help make sense of these data.

text analysis roman roads

In this sentence, Text Analysis is what you do in order to transform the sentence into data and be able to present to computers what this text is about: Rome, the Roman Empire. Then, once presented in the universal language of data, this sentence can easily enter many analytical processes, Text Analytics included. With Text Analytics, you will be able to derive a conclusion about the percentage of texts that mention Rome in the context of the Roman Empire, and not in the context of vacations in Europe, for instance.
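The Text Analytics step described above can be sketched as a simple aggregation over the output of Text Analysis. In this hypothetical Python example each document has already been reduced to a set of concept identifiers; the identifiers and the sample documents are invented for illustration.

```python
# Assumed output of Text Analysis: one set of concept IDs per document.
annotated_docs = [
    {"ex:Rome", "ex:RomanEmpire"},
    {"ex:Rome", "ex:Vacation", "ex:Europe"},
    {"ex:Rome", "ex:RomanEmpire", "ex:Europe"},
    {"ex:Paris", "ex:Vacation"},
]


def share_with_context(docs, concept, context):
    """Percentage of documents mentioning `concept` that also mention `context`."""
    mentioning = [doc for doc in docs if concept in doc]
    if not mentioning:
        return 0.0
    return 100.0 * sum(context in doc for doc in mentioning) / len(mentioning)


empire_pct = share_with_context(annotated_docs, "ex:Rome", "ex:RomanEmpire")
vacation_pct = share_with_context(annotated_docs, "ex:Rome", "ex:Vacation")
print(f"Rome + Roman Empire: {empire_pct:.1f}%")
print(f"Rome + vacations:    {vacation_pct:.1f}%")
```

The analytics code never touches the raw text: it works entirely on the data that Text Analysis produced.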

How Can Text Analysis Help Enterprises?

Companies use Text Analysis to set the stage for a data-driven approach towards managing content. The moment textual sources are sliced into easy-to-automate data pieces, a whole new set of opportunities opens for processes like decision making, product development, marketing optimization, business intelligence and more.

In a business context, analyzing texts to capture data from them supports the broader tasks of:

  • content management;
  • semantic search;
  • content recommendation;
  • regulatory compliance.

When turned into data, textual sources can be further used for deriving valuable information, discovering patterns, automatically managing, using and reusing content, searching beyond keywords and more.

Using Text Analysis is one of the first steps in many data-driven approaches, as the process extracts machine-readable facts from large bodies of text and allows these facts to be entered automatically into a database or a spreadsheet. The database or spreadsheet is then used to analyze the data for trends, to produce a natural language summary, or to build an index for Information Retrieval applications.
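As a small sketch of this hand-over, the following Python example loads extracted facts into an in-memory SQLite database and runs a simple trend-style query. The table layout, column names and sample rows are assumptions made for the example, not a prescribed schema.

```python
import sqlite3

# Assumed output of Text Analysis: (document ID, entity, entity type).
facts = [
    ("doc-1", "Rome", "City"),
    ("doc-1", "Roman Empire", "State"),
    ("doc-2", "Rome", "City"),
    ("doc-3", "Europe", "Continent"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mention (doc_id TEXT, entity TEXT, entity_type TEXT)")
conn.executemany("INSERT INTO mention VALUES (?, ?, ?)", facts)

# In how many distinct documents does each entity appear?
rows = conn.execute(
    "SELECT entity, COUNT(DISTINCT doc_id) AS docs "
    "FROM mention GROUP BY entity ORDER BY docs DESC, entity"
).fetchall()

for entity, docs in rows:
    print(entity, docs)
```

Once the facts sit in a table like this, standard tooling (SQL queries, spreadsheets, BI dashboards, search indexes) can take over.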
