Learn more about the importance of being metadata-driven today in our latest SlideShare presentation.
Metadata used to be a secret shared between system programmers and the database. It described the data in terms of cardinality, data types such as strings vs. integers, and primary and foreign key relationships: details that were never intended for, or even useful to, the end business user, let alone another program or database.
As connectivity became the rule rather than the exception for computers, the importance and usefulness of sharing data necessitated the ability to share how that data was defined. XML and later JSON enabled data interchange by establishing a common data model: a standard description of the data being shared.
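To make the idea concrete, here is a minimal Python sketch, using only the standard library and a hypothetical stock-quote record, of the same data serialized as both JSON and XML, with the field names traveling alongside the values:

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record describing a single data point.
record = {"symbol": "AAPL", "price": 178.25, "currency": "USD"}

# JSON: the field names travel with the data itself.
json_doc = json.dumps(record)

# XML: the same record expressed as elements under a root tag.
root = ET.Element("quote")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_doc = ET.tostring(root, encoding="unicode")

print(json_doc)
print(xml_doc)
```

Either serialization lets a receiving program reconstruct not just the values but the names and structure of the fields, which is exactly what makes the data shareable.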
Inevitably, the information that metadata could and needed to express grew in complexity. Metadata started including descriptions and source information. Separate documents called schemas let you describe the structure of and restrictions on the data. Separating the definitions of the metadata from the data itself simplified validation and introduced flexibility. Beyond ensuring an enterprise-wide data model, it also became possible to reuse the same data with different metadata and schemas.
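As a rough illustration of why a separate schema simplifies validation, the hand-rolled sketch below checks hypothetical records against a schema held apart from the data. Real systems would use a standard such as XML Schema or JSON Schema rather than this toy `validate` helper:

```python
# A toy schema, kept separate from the data it describes:
# each field name maps to the Python type it must have.
schema = {"name": str, "employees": int}

def validate(record, schema):
    """Check that every field required by the schema is present
    in the record and has the declared type."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in schema.items()
    )

good = {"name": "Acme", "employees": 120}
bad = {"name": "Acme", "employees": "many"}

print(validate(good, schema))  # True
print(validate(bad, schema))   # False: "many" is not an integer
```

Because the schema lives outside the data, the same records could be validated against a stricter or looser schema for a different consumer without touching the data itself.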
As metadata grew in expressivity and ubiquity to facilitate data interchange, so did the general connectivity of computers. More and more content, such as documents, video, and sound, was natively digital. As the amount of digital content increased, so did the difficulty of finding exactly what you were looking for.
Structural metadata offered no help here, but text analysis did. Initially this was limited to keyword search, which was only useful if the person searching could correctly guess the exact terms used in the content they were looking for. Once text analysis was coupled with taxonomies, it became possible to understand what the content was about and, most importantly, what other concepts and content it was related to.
Content about ‘NASDAQ’, ‘FTSE’ and China’s ‘SSE’ would be understood to relate to the broader concept of ‘stock market’. The relationships between terms, also called entities or concepts, in a taxonomy are limited to narrower and broader. The most famous taxonomy is the one in biology: domain, kingdom, phylum, class, order, family, genus, and species. Taxonomies eventually evolved into ontologies.
On the other hand, ontologies are able to capture a multiplicity of relationships. Rather than a top-down tree branching out, ontologies resemble a graph. This level of complexity better captures how business users understand their content and data, and it is better suited to the variety, complexity, and dynamic nature of the world our data attempts to describe. Ontologies can capture relationships between business processes and entities that reflect the real world. The consequence is that information becomes understandable to both machines and human users.
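The contrast can be sketched in a few lines of Python. The `broader` mapping below is a toy taxonomy limited to broader/narrower links, while the `triples` list is a toy ontology whose edges carry arbitrary relation names; all the entity and relation names here are illustrative assumptions, not a real vocabulary:

```python
# Taxonomy: each term has at most one broader term, forming a tree.
broader = {
    "NASDAQ": "stock market",
    "FTSE": "stock market",
    "SSE": "stock market",
    "stock market": "financial market",
}

# Ontology: subject-relation-object triples form a labeled graph,
# so entities can be linked by many different kinds of relationship.
triples = [
    ("NASDAQ", "locatedIn", "USA"),
    ("SSE", "locatedIn", "China"),
    ("NASDAQ", "broader", "stock market"),
    ("stock market", "trades", "equities"),
]

def related(entity, relation):
    """Return everything linked to an entity by a given relation."""
    return [o for s, p, o in triples if s == entity and p == relation]

print(broader["NASDAQ"])               # the only question a taxonomy answers
print(related("NASDAQ", "locatedIn"))  # a question only the graph can answer
```

The taxonomy can only say that NASDAQ is a kind of stock market; the graph can additionally say where it is located, what is traded on it, and anything else the ontology defines.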
It is hard to overstate the value of metadata and schemas, but they are only part of the puzzle. When dealing with the volumes of information, content, and data that an average organization contends with on a daily basis, manual processes for adding metadata are impossible. Text analysis is a ‘catch all’ term to describe any process or algorithm that takes human language text as an input and outputs additional metadata about that text. This metadata could include classification, sentiment, keyword extraction, etc.
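As an illustration of the simplest end of that spectrum, the sketch below extracts candidate keywords purely by word frequency against a small, made-up stopword list; production text analysis would use far more sophisticated linguistic processing than this:

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "as"}

def extract_keywords(text, top_n=3):
    """Return the most frequent non-stopword terms as candidate keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

doc = "The market rallied as the market index rose and trading volume grew."
print(extract_keywords(doc))
```

Even this crude frequency count yields usable metadata (‘market’ surfaces as the dominant term), but on its own it understands nothing about what ‘market’ means, which is where taxonomies and ontologies come in.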
To ensure the extracted metadata is of high quality, Ontotext uses knowledge graphs to increase the performance of its text analysis services. The knowledge graph holds the ontology, the description of the concepts and their relationships, and the existing instances of those concepts and relationships.
For example, the ontology would describe that every company is registered in a country. The data would state that the USA is a country and Facebook is a company that is registered in the USA. This machine understandable but humanly sensible way of representing the world helps the text analysis deal with the ambiguity of language and mitigates errors such as confusing homonyms and homographs. Both are notoriously tricky for text analysis.
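The example above can be written down as a handful of subject-predicate-object triples. The Python sketch below is a deliberate simplification (a real knowledge graph would use RDF and a triplestore, not an in-memory set), showing how a mention found in text can be checked against the graph before it is tagged:

```python
# Hypothetical triples mixing the ontology level and the instance level.
triples = {
    ("Company", "registeredIn", "Country"),  # ontology: companies register in countries
    ("USA", "isA", "Country"),               # instance data
    ("Facebook", "isA", "Company"),
    ("Facebook", "registeredIn", "USA"),
}

def types_of(entity):
    """Collect every type the graph asserts for an entity."""
    return {o for s, p, o in triples if s == entity and p == "isA"}

# Before tagging a mention of "Facebook" as a company, the text
# analysis can confirm the graph actually knows it as one.
print("Company" in types_of("Facebook"))
```

When a mention is ambiguous, this kind of lookup lets the graph's context (what the candidate entity is, what it is related to) break the tie instead of guessing from the surface form alone.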
The added advantage of this approach is that the metadata associated with the content is semantic. There is a link between the extracted concept and the ontology, which means that both the concept and the text now have context, enabling more accurate search and further analytics. The text analysis can also identify gaps in the knowledge graph, such as previously unidentified concepts and relationships. This new knowledge is added to the graph, further enhancing the text analysis to produce more and better metadata.
With text analysis producing valuable (and semantic) metadata, companies derive a number of benefits.
Without metadata, the chances of finding anything you are looking for are near nil. Finding what you need through the file structures on our personal computers is bad enough, never mind across an enterprise-wide ICT environment. Faster and easier knowledge discovery has obvious cost benefits and reduces duplication of effort.
With good metadata, and especially good semantic metadata, you are able to organize and present information closer to how a human user understands it. If the user's role is also captured in metadata, it is possible to dynamically tailor information to that role's needs and access privileges.
Metadata makes for richer analytics. The time it takes to Extract, Transform and Load (ETL) is significantly reduced when the metadata and its schema are in place. That means more accurate data and more timely decision-making based on that data.
Metadata around provenance and permissions simplifies data governance and provides the full picture of data and its lifecycle.
Metadata has immediate SEO benefits. Just as metadata helps you find what you are looking for, it helps others find you. The standards for semantic metadata are well developed by the W3C, enabling seamless integration of a company's internal data and metadata with valuable third-party sources.
Enabling machines to do the heavy lifting of processing the huge amounts of information and data that pass through an organization lets business users more easily get to the information they need to make the right decisions in a timely manner.
Only with good metadata and text analysis is this all possible.
Want to learn more about how text analysis and semantic metadata work and can transform enterprise knowledge management?
White Paper: Text Analytics for Enterprise Use. Use the power of text analytics for your enterprise.