
Three’s Company Too: Metadata, Data and Text Analysis

A short history of how metadata grew more expressive as user needs grew more complex, and how text analysis made it possible to get metadata from our information and data.

August 20, 2020 · 6 min read · Jarred McGinnis

Metadata used to be a secret shared between system programmers and the data. Metadata described the data in terms of cardinality, data types such as strings versus integers, and primary or foreign key relationships. These were details never intended for, or even useful to, the end business user, let alone another program or database.

As connectivity became the rule rather than the exception for computers, the importance and usefulness of sharing data necessitated the ability to share how that data was defined. XML and later JSON were the languages that enabled data interchange by establishing a common data model: a standard description of the data being shared.
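
The idea can be sketched with a few lines of Python using JSON as the interchange format. The record and its field names are invented for illustration; the point is that a sender and a receiver agree on one standard description of the data.

```python
import json

# A hypothetical record whose shape both systems have agreed on.
# Field names and types here are assumptions for illustration.
record = {"symbol": "AAPL", "price": 175.3, "currency": "USD"}

# Serialize to JSON for interchange over the wire...
payload = json.dumps(record)

# ...and any other system, in any language, can reconstruct the
# same structure from the shared description.
received = json.loads(payload)
assert received == record
```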

Inevitably, the information that could and needed to be expressed by metadata increased in complexity. Metadata started including descriptions and source information. Separate documents called schemas let you describe the structure of and restrictions on the data. Separating the definitions of the metadata from the data itself simplified validation and introduced flexibility. Beyond ensuring an enterprise-wide data model, it also became possible to reuse the same data with different metadata and schemas.
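
A minimal sketch of that separation: the schema lives apart from the data, and validation is a generic check of the data against it. The toy schema language below is invented for illustration; real systems use standards such as XML Schema or JSON Schema.

```python
# A toy schema kept separate from the data it describes.
# Mapping field names to expected Python types is an assumption
# made for this sketch, not a real schema language.
schema = {"name": str, "founded": int}

def validate(record, schema):
    """Check a record has exactly the fields, with the types, the schema declares."""
    return (set(record) == set(schema)
            and all(isinstance(record[key], expected)
                    for key, expected in schema.items()))

print(validate({"name": "Ontotext", "founded": 2000}, schema))    # True
print(validate({"name": "Ontotext", "founded": "2000"}, schema))  # False
```

Because the schema is a separate artifact, the same data could be validated against a different schema, or the schema reused across datasets, exactly the flexibility described above.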

Metadata, the Lingua Franca of the Internet

As metadata grew in expressivity and ubiquity to facilitate data interchange, so did the general connectivity of computers. More and more content, such as documents, videos, and sound, was natively digital. As the amount of digital content increased, so did the difficulty of finding exactly what you were looking for.

Structural metadata offered no help, but text analysis did. Initially this was limited to keyword search, which was only as useful as the searcher's ability to guess the exact terms contained in what they were looking for. Once text analysis was coupled with taxonomies, it was possible to better understand what the content was about and, most importantly, what other concepts and content it was related to.


Content about ‘NASDAQ’, ‘FTSE’ and China’s ‘SSE’ would be understood to have a relationship to the broader concept of ‘stock market’. The relationships between terms, also called entities or concepts, in a taxonomy are limited to narrower and broader. The most famous taxonomy is the one in biology: domain, kingdom, phylum, class, order, family, genus, and species. Taxonomies eventually grew into ontologies.
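
A taxonomy's single broader/narrower relation can be sketched as a simple tree walk. The concepts follow the article's stock-market example; the 'financial market' root is an assumption added for illustration.

```python
# Each concept maps to its single broader concept: the only
# relationship a plain taxonomy can express.
broader = {
    "NASDAQ": "stock market",
    "FTSE": "stock market",
    "SSE": "stock market",
    "stock market": "financial market",  # assumed root for this sketch
}

def broader_chain(concept):
    """Walk up the tree from a concept toward the root."""
    chain = []
    while concept in broader:
        concept = broader[concept]
        chain.append(concept)
    return chain

print(broader_chain("NASDAQ"))  # ['stock market', 'financial market']
```

This is how a search for 'stock market' can match content that only mentions 'NASDAQ': the taxonomy links the narrow term to the broader one.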

On the other hand, ontologies are able to capture a multiplicity of relationships. Rather than a top-down tree structure branching out, ontologies resemble a graph. This level of complexity better captures how business users understand their content and data. It is more suited to the variety, complexity, and dynamic nature of the world our data is attempting to describe. Ontologies are able to capture the relationships between business processes and entities that reflect the real world. The consequence is that information becomes understood by both machine and human users.
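
The tree-versus-graph difference can be illustrated with subject-predicate-object triples, a simplification of how RDF-based ontologies represent knowledge. The specific triples and predicate names below are assumptions for the sketch.

```python
# An ontology is not limited to one 'broader' relation: each edge
# in the graph can carry its own relationship type.
triples = {
    ("Facebook", "listedOn", "NASDAQ"),
    ("NASDAQ", "locatedIn", "USA"),
    ("NASDAQ", "isA", "stock market"),
    ("stock market", "broader", "financial market"),
}

def objects(subject, predicate):
    """All objects linked to a subject by a given relationship."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects("NASDAQ", "locatedIn"))  # {'USA'}
print(objects("NASDAQ", "isA"))        # {'stock market'}
```

Note that the taxonomy's broader relation is still there, but it is now just one edge type among many in the graph.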

Text Analysis Makes Metadata Feasible. Semantics Makes It Powerful.

It is hard to overstate the value of metadata and schemas, but they are only part of the puzzle. When dealing with the volumes of information, content, and data that an average organization contends with daily, manually adding metadata is impossible. Text analysis is a ‘catch-all’ term for any process or algorithm that takes human language text as input and outputs additional metadata about that text. This metadata could include classifications, sentiment scores, extracted keywords, and so on.
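
The 'text in, metadata out' shape can be shown with a deliberately bare-bones keyword extractor: most-frequent non-stopword terms. This is a toy sketch of the input/output contract, not how a production text analysis service works.

```python
import re
from collections import Counter

# A tiny stopword list, assumed for this sketch.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}

def keywords(text, n=3):
    """Toy text analysis: human language text in, keyword metadata out."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Ties keep first-encountered order (documented Counter behavior).
    return [word for word, _ in counts.most_common(n)]

print(keywords("The stock market fell. Market analysts blamed the market."))
# ['market', 'stock', 'fell']
```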

To ensure the extracted metadata is of high quality, Ontotext uses knowledge graphs to improve the performance of its text analysis services. The knowledge graph holds the ontology, the description of the concepts and their relationships, and the existing instances of those concepts and relationships.

For example, the ontology would describe that every company is registered in a country. The data would state that the USA is a country and Facebook is a company that is registered in the USA. This machine-understandable but humanly sensible way of representing the world helps the text analysis deal with the ambiguity of language and mitigates errors such as confusing homonyms and homographs. Both are notoriously tricky for text analysis.
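
The paragraph's example can be written down as instance triples and queried, again using the plain-tuples simplification of RDF; the predicate names are assumptions for this sketch.

```python
# The article's example as instance data.
triples = {
    ("USA", "isA", "Country"),
    ("Facebook", "isA", "Company"),
    ("Facebook", "registeredIn", "USA"),
}

def holds(s, p, o):
    """Is this statement present in the graph?"""
    return (s, p, o) in triples

# The data conforms to the ontology's expectation that a company
# is registered in something that is a country:
country = next(o for s, p, o in triples
               if s == "Facebook" and p == "registeredIn")
print(holds(country, "isA", "Country"))  # True
```

It is this kind of background knowledge that lets text analysis disambiguate: a mention of 'Facebook' near 'registered' and 'USA' fits the company in the graph, not some unrelated sense of the word.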

The added advantage of this approach is that the metadata associated with the content is semantic. There is a link between the extracted concept and the ontology, which means the concept, and now the text too, has context, enabling more accurate search and further analytics. The text analysis is also able to identify gaps in the knowledge graph, such as previously unidentified concepts and relationships. This new knowledge is added to the graph, further enhancing the text analysis so it produces more and better metadata.

The Value of Text Analysis and Semantic Metadata

With text analysis producing valuable (and semantic) metadata, companies derive a number of benefits.

Better search

Without metadata, the chances of finding anything you are looking for are near nil. Finding what you need through the file structures on a personal computer is bad enough, never mind across an enterprise-wide ICT landscape. Faster and easier knowledge discovery has obvious cost benefits and reduces duplication of effort.

More useful organization of information

With good metadata, and especially good semantic metadata, you are able to organize and present information more closely to how a human user understands it. If the user’s role is also captured in metadata, it is possible to dynamically tailor information to that role’s needs and access privileges.
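
Role-aware filtering over metadata can be sketched in a few lines. The documents, field names, and role labels below are invented for illustration.

```python
# Content items carrying metadata about which roles may see them.
# Field names and roles are assumptions for this sketch.
documents = [
    {"title": "Q3 forecast", "roles": {"finance", "executive"}},
    {"title": "Onboarding guide", "roles": {"everyone"}},
]

def visible_to(role):
    """Titles a given role is allowed to see, per the metadata."""
    return [d["title"] for d in documents
            if role in d["roles"] or "everyone" in d["roles"]]

print(visible_to("finance"))  # ['Q3 forecast', 'Onboarding guide']
print(visible_to("sales"))    # ['Onboarding guide']
```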

Better understanding of information

Metadata makes for richer analytics. The time it takes to Extract, Transform and Load (ETL) data is significantly reduced when the metadata and its schema are already in place. That means more accurate data and more timely decision-making based on that data.

Data Governance

Metadata around provenance and permissions simplifies data governance and provides the full picture of data and its lifecycle.

Universal integration

Metadata has immediate SEO benefits. Just as metadata helps you find what you are looking for, it helps others find you. The standards for semantic metadata are well-developed by the W3C, enabling seamless integration of a company’s internal data and metadata with valuable third-party sources.

Machine understandable and human sensible

Enabling machines to do the heavy lifting of processing the huge amounts of information and data that pass through an organization lets business users more easily get to the information they need to make the right decisions in a timely manner.

Only with good metadata and text analysis is this all possible.

Want to learn more about how text analysis and semantic metadata work and can transform enterprise knowledge management?

White Paper: Text Analytics for Enterprise Use
Use the power of text analytics for your enterprise

Download Now


Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.
