Ontotext

Text Analysis

Text Analysis, also called Information Extraction (IE), is the process of extracting salient facts about prespecified types of events, entities or relationships from documents (which may be in a variety of languages). These facts are then usually entered automatically into a database or spreadsheet, which may be used to analyze the data for trends, to produce a natural language summary, or for indexing in Information Retrieval (IR) applications.

It is instructive to compare IE and IR: whereas IR simply finds texts and presents them to the user, the typical IE application analyzes texts and presents only the specific information from them that the user is interested in. For example, a user of an IR system wanting information on the share price movements of companies with holdings in Bolivian raw materials would typically type in a list of relevant words and receive in return a set of documents (e.g. newspaper articles) which contain likely matches. The user would then read the documents and extract the requisite information themselves. They might then enter the information in a spreadsheet and produce a chart for a report or presentation. In contrast, an IE system user could, with a properly configured application, automatically populate their spreadsheet directly with the names of companies and the price movements.
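The contrast can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three toy documents, the company names and figures, the function names `retrieve` and `extract`, and the deliberately naive regular expression. A real IE system relies on far richer linguistic analysis than a single pattern.

```python
import re

# Hypothetical toy corpus; company names and figures are invented.
DOCS = [
    "Andes Mining shares rose 4.2% after its Bolivian tin holdings were revalued.",
    "Shares in Altiplano Metals fell 1.5% on news from its Bolivian lithium venture.",
    "Bolivian exports were discussed at a trade summit in La Paz.",
]


def retrieve(query_words, docs):
    """IR-style: return whole documents containing any of the query words."""
    words = [w.lower() for w in query_words]
    return [d for d in docs if any(w in d.lower() for w in words)]


# Naive pattern (an assumption, not a real IE grammar): a capitalised
# multi-word company name, a movement verb, and a percentage.
PATTERN = re.compile(
    r"(?:Shares in )?"
    r"(?P<company>[A-Z][a-z]+(?: [A-Z][a-z]+)+?)"
    r"(?: shares)? (?P<verb>rose|fell) (?P<pct>\d+(?:\.\d+)?)%"
)


def extract(docs):
    """IE-style: return (company, signed price movement) rows only."""
    rows = []
    for doc in docs:
        match = PATTERN.search(doc)
        if match:
            sign = 1 if match.group("verb") == "rose" else -1
            rows.append((match.group("company"), sign * float(match.group("pct"))))
    return rows


if __name__ == "__main__":
    # IR: the user still has to read every returned document.
    for doc in retrieve(["Bolivian"], DOCS):
        print("DOC:", doc)
    # IE: rows are ready to be pasted into a spreadsheet.
    for company, change in extract(DOCS):
        print("ROW:", company, change)
```

Here `retrieve` hands back all three documents, including one that mentions Bolivia but contains no share price information, while `extract` yields only the two company and price movement rows.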

Information Retrieval gets sets of relevant documents

Note: Our approach to this task is multi-paradigm search.

Information Extraction gets facts out of documents

Note: Besides IE we also do semantic annotation.

Why try to say something new when it has already been written by someone smarter? The above information is derived from Information Extraction - a User Guide, written by Hamish Cunningham, the visionary behind GATE and Research Professor of Internet Computing, Computer Science, University of Sheffield.