Learn about semantic information extraction and how it pulls out meaningful data from textual sources, ready to be leveraged for insights, decisions and actions.
Semantic Annotation is about weaving data into textual sources. In semantically annotated texts, certain words (denoting things, people, locations, organizations, etc.) are linked to data – that is, to context and references that can be processed by an algorithm.
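To make the idea of "words linked to data" concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular annotation product works: the `Annotation` record, the `GAZETTEER` lookup table, the `annotate` function and the `example.org` identifiers are all hypothetical names made up for this example, and real systems resolve mentions against far larger knowledge bases with proper disambiguation.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """One note in the margin: a span of text linked to a data record."""
    start: int          # character offset where the mention begins
    end: int            # character offset just past the mention
    surface: str        # the words as they appear in the text
    entity_type: str    # e.g. Person, Location, Organization
    uri: str            # identifier of the thing the words denote


# Hypothetical lookup table: surface form -> (type, identifier).
# The example.org URIs are placeholders, not real dataset identifiers.
GAZETTEER = {
    "Aristotle": ("Person", "http://example.org/entity/Aristotle"),
    "Rome": ("Location", "http://example.org/entity/Rome"),
}


def annotate(text: str) -> List[Annotation]:
    """Find known mentions in the text and link each one to its data record."""
    found = []
    for surface, (entity_type, uri) in GAZETTEER.items():
        # Match whole words only, so "Rome" does not match inside "Romeo".
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            found.append(
                Annotation(match.start(), match.end(), surface, entity_type, uri)
            )
    # Return the annotations in reading order.
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    for note in annotate("Aristotle never visited Rome."):
        print(note)
```

The text itself is left untouched; the annotations sit beside it as offsets plus identifiers, which is what allows an algorithm to follow each mention to its context and references.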
The goal of semantic annotation is better information retrieval and smarter knowledge management. In particular, this translates into technologies that help content creators and consumers retrieve information faster and manage knowledge more easily. Semantic search, content aggregation, and automated relationship discovery are among the most common applications enabled by Semantic Annotation.
With data woven into texts, the “new readers” (introduced below) are able to interpret, combine, and use content in an automated way, thus making it easier for us to navigate, find, collect and analyze information.
Initially it [reading] was the simple faculty of extracting visual information from any encoded system and comprehending the respective meaning. Later it came to signify almost exclusively the comprehending of a continuous text of written signs on an inscribed surface. More recently it has included the extracting of encoded information from an electronic screen. And reading’s definition will doubtless continue to expand in future for, as with any faculty, it is also a measure of humanity’s own advancement.
Steven R. Fischer, A History of Reading
The definition of reading does expand every single day, following our growing need to manage more and more textual sources. So does the profile of the reader. Reading, in its very basic form (extracting information from any encoded system and comprehending meaning), is not a human-only territory anymore.
Take reading on the Web, for example. According to a recent report, humans are responsible for 51.5% of web traffic, while a significant 48.5% of all online traffic is attributed to bots. Machines assisting with automated tasks are everywhere, collecting data not only on the web but also across corporate intranets.
Come to think of it, in an ocean of digital content, reading and understanding depend heavily on using the right tools for handling texts: tools that allow efficient research, quick information retrieval, fact discovery, and the gathering and managing of information.
Such activities are unthinkable without the help of software agents. These agents have enormous processing power to navigate, process and manage huge volumes of content on our behalf, provided we show them around our content and help them make sense of it. For that to happen, we need to enrich texts with information presented in the formal language the new readers understand – that is, in the language of data.
A gloss (from Latin: glossa, from Greek: γλῶσσα glóssa “language”) is a brief notation, especially a marginal one or an interlinear one, of the meaning of a word or wording in a text. It may be in the language of the text, or in the reader’s language if that is different.
To get the benefit of understanding Semantic Annotation without the burden of the complexity it involves, it will help to view it as digital marginalia.
Marginalia, the medieval side notes, have served understanding for ages and have been an invaluable source of additional information to the reader. Semantic Annotations play the same role today, in our digital-everything age, except that today’s readers are not only human.
It is through Semantic Annotations that we can leave notes for smart agents to process, so that they can further assist us in managing our digital content. Written in the machine-interpretable formal language of data, these notes help computers classify, link, search through and filter texts and the data associated with them.
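As a rough illustration of how such notes can drive linking, searching and filtering, consider the sketch below. Everything in it is assumed for the sake of the example: the `ANNOTATED_DOCS` collection, the `example.org` identifiers and the helper functions `build_index`, `find_documents` and `filter_by_type` are hypothetical, and a production system would typically keep this information in a dedicated search index or graph database instead of in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical annotated collection: document id -> list of notes.
# Each note is (surface form, entity type, identifier); the example.org
# URIs are placeholders invented for this sketch.
ANNOTATED_DOCS = {
    "doc1": [("aspirin", "Drug", "http://example.org/drug/aspirin")],
    "doc2": [
        ("acetylsalicylic acid", "Drug", "http://example.org/drug/aspirin"),
        ("Bayer", "Organization", "http://example.org/org/bayer"),
    ],
    "doc3": [("Bayer", "Organization", "http://example.org/org/bayer")],
}


def build_index(docs):
    """Link documents through the identifiers they mention."""
    index = defaultdict(set)
    for doc_id, notes in docs.items():
        for _surface, _entity_type, uri in notes:
            index[uri].add(doc_id)
    return index


def find_documents(index, uri):
    """Search by meaning: every document mentioning the entity, however it is worded."""
    return sorted(index.get(uri, set()))


def filter_by_type(docs, entity_type):
    """Filter: keep only documents that mention at least one entity of the given type."""
    return sorted(
        doc_id
        for doc_id, notes in docs.items()
        if any(note_type == entity_type for _surface, note_type, _uri in notes)
    )


if __name__ == "__main__":
    index = build_index(ANNOTATED_DOCS)
    print(find_documents(index, "http://example.org/drug/aspirin"))
    print(filter_by_type(ANNOTATED_DOCS, "Organization"))
```

Because "aspirin" and "acetylsalicylic acid" carry the same identifier, a search for that identifier retrieves both documents even though they share no keyword. That is the kind of help a plain string match cannot give.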
When it comes to machine-readable texts, it is important to bear in mind that “understanding”, as of today, is still possible only within a limited, pre-defined context. Semantic Annotations help machines “to read” in the very basic sense of the word – that is, in the sense of deciphering strings of symbols. Nothing more, nothing less – a computer’s understanding is inseparable from the information and the formal knowledge it was fed.
Much to learn you still have.
From Quotes of Yoda, Star Wars: Episode V – The Empire Strikes Back (1980)
Algorithms do have a hard time understanding (encompassing and decoding) the richness and granular expressivity with which we describe the world. And while the expressiveness of the language of data (that is, the depth of the concepts and ideas represented and communicated with it) keeps growing, we still have a long way to go before we stop sounding to our machines the way Yoda sounds to us.
The good news is that in certain areas, Semantic Annotations do help machine-understanding. They are invaluable when it comes to bringing the significant automated analytical power of machines to help us navigate the ocean of digital content.
Interlinking texts with data is already widely used in fields where knowledge is formally described and explicitly recorded. Semantic Annotations support scientists, researchers, insurers, doctors and lawyers in facing the challenges of accurate research and unearthing precise information.
By enabling applications such as automatic relationship discovery, content aggregation and recommendation, and regulatory compliance detection, Semantic Annotation comes in handy when sifting through huge amounts of textual sources like scientific research, medical documents and health insurance claims.
Any domain of knowledge can benefit from creating digital marginalia, provided they are well described (in a standard data language) and properly linked (semantically indexed and connected to highly-structured and machine-readable datasets). Currently, among the successful users of semantically annotated content are publishers, pharmaceutical companies, financial institutions and health-care organizations.
Swamped in digital resources of all kinds, readers (new and traditional alike) crave relevancy. Semantic annotation provides a much-needed means of efficient document management. Weaving data into textual sources is what frees digital content from the restrictive organization into files and folders – a technological relic from an “archaic analogue age”, as Jarred McGinnis calls the era of gray filing cabinets, ring binders, and paper labels.
With Semantic Annotation, textual sources are given the notes machines need in order to organize and serve content in an accurate and efficient way. It is yet another step towards revolutionizing the way we approach information management and knowledge discovery.
Or better, it is yet another note in the margin for future generations of all kinds of readers.