Read about knowledge graphs and how many enterprises are already embracing the idea of benefiting from them.
There is only one thing the IT industry likes more than a clever acronym: a buzzword. ‘Data Fabric’ now treads where ‘Cloud Computing’ and ‘Grid Computing’ once trod, having made Gartner’s top ten in 2019. However, Data Fabric is not an application or a software package but a set of design principles and strategies for dealing with a very real and concrete truth: the days of centralized data storage and control are gone.
Today’s organizations are dealing with data of unprecedented diversity in terms of type, location and use, at equally unprecedented volumes, and no one is proposing that any of this will ever get simpler. This multiplicity of data leads to the growth of silos, which in turn increases the cost of integration. Added to this are the increasing demands being made on our data by event-driven and real-time requirements, the rise of business-led use and understanding of data, and the move toward automation of data integration, data and service-level management.
The purpose of weaving a Data Fabric is to remove the friction and cost from accessing and sharing data in the distributed ICT environment that is now the norm. It is the understanding that data management must be simplified and that the seams between cloud-based and local storage must be invisible. It must not force or predict how the data is to be used, but enable that decision to be made by today’s users (and tomorrow’s) while still maintaining organization-appropriate access control and security. And in an age where new technologies can go from cult usage to widespread adoption with astonishing rapidity, it is important to remember that a Data Fabric aims to orchestrate existing and future data services rather than replace existing infrastructure.
The Data Fabric paradigm combines design principles and methodologies for building efficient, flexible and reliable data management ecosystems. This means creating reusable data services, machine-readable semantic metadata and APIs that ensure the integration and orchestration of data across the organization and with external third-party data.
To implement any Data Fabric approach, it is essential to be able to understand the context of data. This means having the ability to define and relate all types of metadata. There must be a representation of the low-level technical and operational metadata as well as the ‘real world’ metadata of the business model or ontologies.
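As a minimal sketch of what such machine-readable metadata can look like, the following Python snippet uses the open-source rdflib library to describe a physical database table as a DCAT dataset and to tie it to a business-level concept. All of the names and URIs are illustrative assumptions, not part of any Ontotext API.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCAT, DCTERMS, RDF

EX = Namespace("http://example.org/metadata/")  # illustrative namespace

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

# Low-level technical metadata: a physical table registered as a dataset
g.add((EX.crm_customers, RDF.type, DCAT.Dataset))
g.add((EX.crm_customers, DCTERMS.title, Literal("CRM customer table")))

# 'Real world' metadata: the same table linked to a business concept
g.add((EX.crm_customers, DCTERMS.subject, EX.Customer))

print(g.serialize(format="turtle"))
```

Because both kinds of metadata live in one graph, a query can move from a business term to the physical systems that hold the matching data.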
Several characteristics of the knowledge graph make this possible.
In other words, knowledge graphs solve the data silo problem by making it irrelevant. The use of knowledge graphs doesn’t try to enforce yet another format on the data. Instead, it overlays a semantic data fabric that virtualizes the data at a level of abstraction closer to how users actually want to work with it. Multiple and varying ‘views’ of the data become possible without modifying the data at its source or the host system.
Users are freed from having to negotiate the particularities of where the data is, how to get it and how changes to that data affect others. With knowledge graphs, users can create on-the-fly views of the data, without duplication, that are independent of the idiosyncrasies of the data’s origins and tailored to each user’s security privileges, technical ability and needs.
With its cloud-agnostic infrastructure, Ontotext Platform can operate in cloud and on-premises environments. It is an enterprise-ready platform supporting LDAP integration, multi-factor authentication and performance monitoring dashboards.
Ontotext Platform ensures data is accessible to the people in the organization who need it, rather than depending on technical staff to package it and ferry it to them. Software architects often rely on customized APIs that require extra time and effort to develop. These additional software components need to be updated, tested and deployed, which runs counter to the Data Fabric goal of frictionless movement of data.
Ontotext uses an automatically generated GraphQL API to support efficient integration into presentation layers. Because the GraphQL API is generated from the ontology, changes made via the ontology management interface flow through smoothly as the schema evolves, and mashups can be generated for simple adaptation to changing integration requirements. Ontotext Platform thus provides efficient, consistent and easy access to the data of knowledge graphs via its GraphQL interfaces.
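To make this concrete, here is what calling such a GraphQL API from Python might look like. This is a hedged sketch: the endpoint URL and the person/name schema are invented for the example and depend entirely on the ontology the API was generated from; consult the Platform documentation for the actual details.

```python
import requests

# Hypothetical GraphQL endpoint; the real URL depends on your deployment
ENDPOINT = "http://localhost:9995/graphql"

# The query shape mirrors the ontology the API was generated from;
# 'person' and its fields are assumed example types, not a fixed schema
query = """
{
  person(limit: 3) {
    id
    name
  }
}
"""

response = requests.post(ENDPOINT, json={"query": query}, timeout=30)
response.raise_for_status()
print(response.json()["data"])
```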
To realize a Data Fabric, there are several practical steps for bringing knowledge graphs to the enterprise.
Establish the goal behind collecting the data and define what questions you want answered. If needed, Ontotext’s consultants and partners can advise you on your data management strategy and plans.
Discover what datasets, taxonomies and other information (proprietary, open or commercially available) would serve you best to achieve your goal in terms of domain, scope, provenance, maintenance, etc. Consider using data catalogs for this purpose.
Correct any data quality issues to make the data most applicable to your task. This includes removing invalid or meaningless entries, adjusting data fields to accommodate multiple values, fixing inconsistencies, etc.
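As a small illustration of this kind of cleanup, the sketch below uses pandas; the file name and column names are assumptions made for the example.

```python
import pandas as pd

# Illustrative input; the file and its columns are invented for this sketch
df = pd.read_csv("customers.csv")

# Drop records with no identifier, then drop exact duplicates
df = df.dropna(subset=["customer_id"]).drop_duplicates()

# A field holding several values in one cell ("retail;online") becomes a
# list, so each value can later become a separate statement in the graph
df["segments"] = df["segments"].fillna("").str.split(";")

# Normalize inconsistent casing before matching entities across sources
df["country"] = df["country"].str.strip().str.upper()
```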
Thoroughly analyze the different data schemata to prepare for harmonizing the data. Reuse or engineer ontologies, application profiles, RDF shapes or some other mechanism for using them together. Formalize your data model using standards like RDF Schema and OWL.
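A minimal example of such a formalization, built with the open-source rdflib library; the two classes and the property are placeholders for a real domain model.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/ontology/")  # illustrative namespace

g = Graph()
g.bind("ex", EX)

# Two illustrative classes and an object property connecting them
g.add((EX.Customer, RDF.type, OWL.Class))
g.add((EX.Order, RDF.type, OWL.Class))
g.add((EX.placedBy, RDF.type, OWL.ObjectProperty))
g.add((EX.placedBy, RDFS.domain, EX.Order))
g.add((EX.placedBy, RDFS.range, EX.Customer))

g.serialize(destination="ontology.ttl", format="turtle")
```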
Apply ETL tools to convert your data to RDF or use data virtualization to access it via technologies such as NoETL, OBDA, GraphQL Federation, etc. Gather metadata from the underlying systems and generate semantic metadata to make the data easier to integrate, update, discover and reuse.
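A toy ETL step in this spirit: each row of a CSV file becomes a set of triples typed against the illustrative ontology above. The file layout and URIs are assumptions of the sketch, and a production pipeline would of course use dedicated tooling.

```python
import csv

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/ontology/")
DATA = Namespace("http://example.org/data/")

g = Graph()

# Assumed CSV layout: customer_id,name (illustrative)
with open("customers.csv", newline="") as f:
    for row in csv.DictReader(f):
        subject = DATA["customer/" + row["customer_id"]]
        g.add((subject, RDF.type, EX.Customer))
        g.add((subject, RDFS.label, Literal(row["name"])))

g.serialize(destination="customers.ttl", format="turtle")
```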
Match descriptions of the same entity across datasets with overlapping scope and reconcile their attributes to merge the information.
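A deliberately naive sketch of such matching: entities whose normalized labels agree are declared identical with owl:sameAs. Real entity resolution would use richer features and fuzzy matching; the file names continue the illustrative examples above.

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDFS

def key(name: str) -> str:
    """Crude normalization used as a matching key."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

g = Graph()
g.parse("customers.ttl")
g.parse("crm_export.ttl")  # a second, overlapping dataset (illustrative)

# Link entities that share a normalized label
seen = {}
for subject, _, label in g.triples((None, RDFS.label, None)):
    k = key(str(label))
    if k in seen and seen[k] != subject:
        g.add((subject, OWL.sameAs, seen[k]))
    else:
        seen[k] = subject
```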
Merge different graphs flawlessly using the RDF data model. For locally stored data, Ontotext Platform can efficiently enforce the semantics of the data model via reasoning, consistency checking and validation. It can scale in a cluster and synchronize with search engines like Elasticsearch to match the anticipated usage and performance requirements.
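The merge itself is trivial because an RDF graph is just a set of triples; union is the merge. A sketch with rdflib, reusing the illustrative files above:

```python
from rdflib import Graph

g1, g2 = Graph(), Graph()
g1.parse("customers.ttl")
g2.parse("crm_export.ttl")

# Set union of triples: no schema migration, no column mapping;
# identical statements simply coincide
merged = g1 + g2
print(f"{len(merged)} triples after the merge")
```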
Enrich your data by extracting new entities and relationships from text and by applying inference and graph analytics to uncover new information. Now your graph has more data than the sum of its constituent datasets. It is also better interconnected, which brings more context and enables deeper analytics.
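One open-source way to materialize such inferred statements is the owlrl package, which expands a graph with its OWL 2 RL closure; for instance, the owl:sameAs links created above let properties propagate between the matched entities. The files are the illustrative ones from the previous steps.

```python
from owlrl import DeductiveClosure, OWLRL_Semantics
from rdflib import Graph

g = Graph()
g.parse("ontology.ttl")
g.parse("customers.ttl")

before = len(g)
# Materialize the OWL 2 RL entailments, e.g. statements implied by
# rdfs:domain/range declarations and owl:sameAs links
DeductiveClosure(OWLRL_Semantics).expand(g)
print(f"{len(g) - before} new triples inferred")
```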
Start delivering the answers to your original questions through different knowledge discovery tools such as powerful SPARQL queries, an easy-to-use GraphQL interface, semantic search, faceted search, data visualization, etc. Also, ensure that your data is FAIR (findable, accessible, interoperable and reusable).
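For instance, a question over the finished graph can be answered with SPARQL from Python via the SPARQLWrapper library. The repository URL below follows the usual GraphDB pattern of http://localhost:7200/repositories/<id>, but both the URL and the query vocabulary are assumptions carried over from the earlier sketches.

```python
from SPARQLWrapper import JSON, SPARQLWrapper

# Assumed GraphDB-style repository endpoint; adjust to your deployment
sparql = SPARQLWrapper("http://localhost:7200/repositories/kg")
sparql.setReturnFormat(JSON)

# How many orders has each customer placed? (illustrative vocabulary)
sparql.setQuery("""
    PREFIX ex: <http://example.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?name (COUNT(?order) AS ?orders)
    WHERE {
        ?order ex:placedBy ?customer .
        ?customer rdfs:label ?name .
    }
    GROUP BY ?name
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"], row["orders"]["value"])
```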
Finally, after you have crafted your knowledge graph and people have started using it, keep it alive by setting up maintenance procedures: how the graph will evolve and how updates from the different sources will be consumed without compromising data quality.
Ontotext technology provides the continuous data operations, data management for analytics and the metadata management that a Data Fabric requires. Each organization has different requirements, challenges and goals when implementing a Data Fabric. Often, this requires a tailored solution to ensure the Data Fabric meets the enterprise’s specific requirements and data strategy.
There is no universal approach. But as more and more organizations across industries have turned to Ontotext products and services for better enterprise knowledge management, data and content analytics, Ontotext has developed a methodology for transforming and interlinking huge amounts of data and for providing intuitive, useful interfaces for the frictionless access to data that today’s organizations need.
Learn more about enterprise knowledge graphs!