Throwing Your Data Into the Ocean

Knowledge graphs remove the drudgery of data preparation for analysis tasks. By virtue of their formal semantics, they make contextual awareness and smart search of data possible.

January 6, 2021 · 7 min read · Jarred McGinnis

Through 2022, the application of graph processing and graph databases will grow at 100% annually to accelerate data preparation and integration, and enable more adaptive data science. GARTNER, INC: ‘DATA FABRICS ADD AUGMENTED INTELLIGENCE TO MODERNIZE YOUR DATA INTEGRATION’ (EHTISHAM ZAIDI ET AL, DECEMBER 2019)

Humans were stuck on this planet until they devised a way to travel at 11 kilometers per second, the escape velocity from Earth.

It’s only been very recently that humans figured out how to go that fast. It’s hard and it’s expensive. According to this article, it costs $54,500 for every kilogram you want to put into space. Think of the money you’ll save if you go before the holiday season bingeing! That was until commercial space companies like SpaceX took a different approach. It has been suggested that their Falcon 9 rocket has lowered the cost per kilo to $2,720. A cost reduction of nearly a factor of 20 is an astounding accomplishment for any industry, but it’s especially noteworthy for escaping our gravitational fetters. How did they do it?

They did it by doing a lot of hard and expensive work and then not throwing it into the ocean. They reuse the initial boosters and other parts of the rocket to achieve that incredible 11 kilometers per second. The problem of wasting time, effort and resources is so common that it’s a cliché.

‘Don’t Reinvent the Wheel’

Data analysis is an example where time and effort are being spent over and over only for the data and development to be chucked into the ocean after the work is done. This is particularly true for the initial data preparation stage of any analysis work. Before a data analyst can get to work, they have to gather the data because we have long passed the age where all the data we need is sitting in just one place.

Next, the data needs to be sifted through to better understand what you have and most importantly what you are missing. After that, the data needs to be cleaned. That means removing errors, filling in missing information and harmonizing the various data sources so that there is consistency. Once that is done, data can be transformed and enriched with metadata to facilitate analysis.

Finally, the data is uploaded and ready to be used, to do the actual analysis part of the data analysis task. What’s worse is that data analysts spend the majority of their time preparing the data rather than, you know, analyzing it. The same survey also found, unsurprisingly, that they aren’t too happy about it.

Enter KGs, the Falcon Boosters of Data Analysis

Finding relationships in combinations of diverse data, using graph techniques at scale, will form the foundation of modern data and analytics. This applies to knowledge graphs, to data fabrics, NLP, explainable AI, … GARTNER, INC: ‘Top 10 Trends in Data and Analytics, 2020’ (RITA SALLAM ET AL, May 2020)

Knowledge graphs represent a collection of interlinked descriptions of concepts and entities. These concepts use other concepts to describe each other. The connections made through these descriptions create context. It’s context that enriches meaning and enables understanding.

A knowledge graph can be used as a database because it structures data that can be queried, for example through a query language like SPARQL. It can be treated as a graph, a set of vertices and edges, to which you can apply graph operations and optimizations such as traversals and transformations. It is also a knowledge base, because the data in it bears formal semantics, which can be used to interpret the data and infer new facts. These semantics enable humans and machines to infer new information without introducing factual errors into the dataset.
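To make the ‘database’ view concrete, here is a minimal sketch of querying a knowledge graph over its SPARQL endpoint from Python. It assumes a locally running GraphDB instance; the repository name and the example vocabulary are hypothetical, for illustration only.

```python
# A minimal sketch (not production code) of querying a knowledge graph over SPARQL.
# The repository name and the ex: vocabulary are hypothetical; GraphDB exposes each
# repository's SPARQL endpoint at /repositories/<name>, by default on port 7200.
import requests

SPARQL_ENDPOINT = "http://localhost:7200/repositories/clinical-studies"  # hypothetical repository

# Ask for resources typed as a clinical study, together with their labels.
QUERY = """
PREFIX ex:   <http://example.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?study ?label WHERE {
  ?study a ex:ClinicalStudy ;
         rdfs:label ?label .
}
LIMIT 10
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["study"]["value"], "-", row["label"]["value"])
```

With a reasoning ruleset enabled, the same query also returns instances of any subclass of the hypothetical ex:ClinicalStudy class, which is the ‘infer new facts without introducing factual errors’ point above.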

Knowledge graphs help with data analysis in a number of ways. The use of metadata, and especially semantic metadata, creates a unified, standardized means to fuse diverse, proprietary and third-party data seamlessly, organized around how the data is used rather than what format it is in or where it is stored.
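As a small, self-contained illustration of that fusion, the sketch below uses the rdflib Python library and made-up IRIs: an internal dataset and a third-party dataset describe the same resource, and once they share an identifier a single query spans both sources.

```python
# Illustrative only: merging two sources that share semantic metadata.
# The ex: vocabulary and resource IRIs are invented for the example.
from rdflib import Graph

internal_data = """
@prefix ex: <http://example.org/> .
ex:study-42 a ex:ClinicalStudy ;
    ex:investigates ex:drug-X .
"""

third_party_data = """
@prefix ex: <http://example.org/> .
ex:drug-X ex:approvedIn "EU" ;
    ex:activeIngredient "compound-123" .
"""

graph = Graph()
graph.parse(data=internal_data, format="turtle")
graph.parse(data=third_party_data, format="turtle")  # merged into the same graph

# One query now spans both sources, because they share the ex:drug-X identifier.
results = graph.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?study ?region WHERE {
        ?study ex:investigates ?drug .
        ?drug  ex:approvedIn   ?region .
    }
""")
for study, region in results:
    print(study, region)
```

In practice the linking is done with shared ontologies, owl:sameAs statements or entity resolution rather than conveniently identical IRIs, but the effect is the same: queries cut across sources.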

A knowledge graph provides centralized information control for enterprises at a time when it is no longer possible to integrate all transactional information due to its extreme volume, velocity and variety. A formal, machine- and human-readable definition of enterprise-level models, describing the important concepts shared across all business departments, together with agreement on common metadata, reference and master data entities, has enormous value.

Why knowledge graphs are important:

  • Communicating business-critical concepts and their semantic context and meaning with precision.
  • Reusing knowledge from third-party data providers and establishing data quality principles for populating the graph.
  • Delivering tools that increase “digital dexterity” (the desire and ability of employees to embrace existing and emerging technologies to achieve better business outcomes) by proactively suggesting conceptually linked data and metadata.
  • Scoring significantly higher on enterprise-level entity disambiguation tasks, which enables new types of automatic data and content processing.
  • Supporting the execution of CIO and CDO strategic plans and communicating them more transparently to all internal users and partnering organizations.

GraphDB and Data Analysis in Biopharmaceuticals

The reuse of initial-stage boosters saved space travelers a factor of 20. In the world of knowledge graphs, we’ve seen factors of 100!

Ontotext worked with a global research-based biopharmaceutical company to solve the problem of inefficient search across dispersed and vast sources of unstructured data. It’s not rocket science, but biopharmaceutical research is just as complex. The company was facing three different data silos holding half a million documents full of clinical study data. Researchers trying to design new clinical studies first had to trudge through days of tedious processing, sifting through result sets of 1,000 to 10,000 hits to identify the relevant clinical studies.

By using Ontotext’s knowledge graph technology, they were able to achieve a number of benefits:

  • automatically segment and annotate documents;
  • filter and classify data;
  • integrate structured and unstructured data sources;
  • analyze and contextualize results;
  • perform semantic search.

This led to quicker access to data and more useful search results, which ultimately enabled better evidence-based decision-making and more efficient design of new clinical studies. Most incredibly, the time it took to retrieve the information required to answer regulatory questions was reduced from four person-days to less than one.

With Ontotext’s Platform, It’s Even Easier

Knowledge graphs are at the heart of Ontotext Platform. In the past, to ensure application developers didn’t have to deal with the complexities and nuances of knowledge graphs, it was necessary to build a middleware layer with lots of APIs and canned SPARQL queries underneath. That came with cost overheads, and the app server layer could mushroom into a bigger development task than implementing the knowledge graph itself.

Ontotext’s decades of experience have led to a better and simpler way. The use of GraphQL and Shapes ensures that application developers can blissfully avoid hacking SPARQL, without the bloated app server middleware layer.
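To give a flavor of what this looks like from the application side, here is a hedged sketch: the application posts a GraphQL query to the platform’s GraphQL endpoint and gets back plain JSON, with no SPARQL and no bespoke middleware in the application code. The endpoint URL and the object and field names are assumptions for illustration, not the actual schema of any deployment.

```python
# Illustrative only: a GraphQL query posted to a hypothetical Ontotext Platform
# endpoint. The object types, fields and filter syntax are placeholders standing in
# for whatever schema is generated from the shapes, not a documented API.
import requests

GRAPHQL_ENDPOINT = "http://localhost:9995/graphql"  # hypothetical endpoint URL

QUERY = """
{
  study(where: {phase: "Phase III"}) {
    id
    title
    condition {
      label
    }
  }
}
"""

response = requests.post(GRAPHQL_ENDPOINT, json={"query": QUERY})
response.raise_for_status()
print(response.json()["data"])
```

The appeal is that the query mirrors the shape of the data the front end actually needs, while the translation to SPARQL happens inside the platform.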

It ensures your enterprise can take advantage of the semantic approach while avoiding backend development of APIs, thanks to tools that simplify data consumption and processing. Strict adherence to open standards avoids unpalatable vendor lock-in and maximizes interoperability with third-party tools and data.

 Discover how to tap into the power of knowledge graphs with Ontotext Platform!



Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.
