Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

This article was originally published in Dataversity.

August 4, 2023 · Doug Kimball · 6 min read

Generating actionable insights across growing data volumes and disconnected data silos is becoming increasingly challenging for organizations. Working across data islands leads to siloed thinking and the inability to implement critical business initiatives such as Customer, Product, or Asset 360. As data is generated, stored, and used across data centers, the edge, and cloud providers, managing a distributed storage environment is complex, leaving technology professionals without a map to guide them.

According to McKinsey, users often spend 30% of their time trying to find the right data. As a result, organizations are applying data fabrics to create a virtually unified environment so data consumers can access data splintered across applications and processes.

Data Fabric: Who and What?

According to Gartner, data fabric is a design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilizes an integrated data layer over existing, discoverable, and inferenced metadata assets to support the design, deployment, and utilization of data across enterprises, including hybrid and multi-cloud platforms.

This logical data architecture is designed to help organizations deal with growing volumes of data, spanning data silos with seamless connectivity and a knowledge layer. Using metadata, machine learning (ML), and automation, a data fabric provides a unified view of enterprise data across data formats and locations. It enables data federation and virtualization as well as seamless access and sharing in a distributed data environment. It also helps capture and connect data based on business or domains.

Using a data fabric, organizations can improve the usability and quality of their data assets and extend and enrich them with reusable services. Thanks to the metadata that the data fabric relies on, companies can also recognize different types of data, what is relevant, and what needs privacy controls, thereby improving the intelligence of the whole information ecosystem.

As a design concept, data fabric requires a combination of existing and emergent data management technologies beyond just metadata. Data fabric does not replace data warehouses, data lakes, or data lakehouses. Instead, it leverages AI and graph-based analytics as well as deeply integrated data management workflows and applications. A fabric aggregates data from heterogeneous sources with a virtualization layer that assimilates data with zero copy. The data fabric layer also ensures privacy and compliance with regulations.
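The zero-copy virtualization idea above can be reduced to a toy sketch: a fabric-style facade that answers queries by delegating to the live source systems at query time rather than copying their data into a central store. The source names, schemas, and classes here are hypothetical illustrations, not any vendor's actual API.

```python
# Minimal sketch of data virtualization: one facade, many live sources,
# no central copy. All names and record shapes are made up for the demo.

class SqlSource:
    """Stands in for a relational system; holds rows only for the demo."""
    def __init__(self, rows):
        self.rows = rows
    def fetch(self, predicate):
        return [r for r in self.rows if predicate(r)]

class ApiSource:
    """Stands in for a REST endpoint returning JSON-like dicts."""
    def __init__(self, records):
        self.records = records
    def fetch(self, predicate):
        return [r for r in self.records if predicate(r)]

class VirtualLayer:
    """Unified view: queries fan out to the sources at call time."""
    def __init__(self, **sources):
        self.sources = sources
    def query(self, predicate):
        results = []
        for name, source in self.sources.items():
            for row in source.fetch(predicate):
                results.append({**row, "_source": name})
        return results

crm = SqlSource([{"customer": "ACME", "region": "EMEA"}])
billing = ApiSource([{"customer": "ACME", "balance": 120.0}])
fabric = VirtualLayer(crm=crm, billing=billing)
acme = fabric.query(lambda r: r.get("customer") == "ACME")
# Records from both silos surface through one interface,
# each tagged with its origin.
```

A production fabric would add pushdown of filters, caching, and governed access on top of this pattern, but the core idea is the same: the consumer sees one interface while the data stays in place.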

Data Fabric: When, Where, and Why

Data fabric is best suited for large organizations with a rapidly growing data footprint that resides across a myriad of sources and includes a variety of formats stored across multiple data centers. Democratizing access to data to build competitive intelligence is another popular use case, as data fabrics help organizations with highly interrelated data needs to unify information across different business units and departments. After all, when a lack of domain context and unified semantics hinders data usage within the organization, a data fabric approach can be a game-changer.

Major goals of data fabric include:

  • Create smart semantic data integration and engineering: with governed access to improve findability and comprehensibility of data.
  • Enable tagging and annotations: supported by centralized policies for access, privacy, protection, and quality of data with enforcement of governance policies.
  • Reduce time to insight and streamline data access: across business intelligence, ML, and other use cases by simplifying data integration and distribution of data across systems.
  • Assimilate, aggregate, and unify heterogeneous siloed data: regardless of format, making it available for humans and machines to discover and consume unambiguously.
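The last goal, unifying siloed data regardless of format, can be illustrated with a small sketch: the same customer entity exported from two hypothetical silos, one as CSV and one as JSON, is normalized into a single common shape that downstream consumers can read unambiguously.

```python
import csv
import io
import json

# Hypothetical exports of the same entity from two silos in two formats.
CSV_EXPORT = "id,name\n42,ACME Corp\n"
JSON_EXPORT = '[{"customer_id": 42, "customer_name": "ACME Corp"}]'

def from_csv(text):
    """Map the CSV silo's columns onto the common record shape."""
    return [{"id": int(row["id"]), "name": row["name"]}
            for row in csv.DictReader(io.StringIO(text))]

def from_json(text):
    """Map the JSON silo's field names onto the same common shape."""
    return [{"id": rec["customer_id"], "name": rec["customer_name"]}
            for rec in json.loads(text)]

# After unification both silos yield identical records.
unified = from_csv(CSV_EXPORT) + from_json(JSON_EXPORT)
```

In a real fabric these per-source mappings are typically generated from metadata rather than hand-written, but the principle is the same: format differences disappear behind a shared semantic shape.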

Adopting a data fabric approach to enterprise data management challenges simplifies integration. It lowers data management costs by eliminating silos and reducing integration complexity. This also provides the flexibility to add new data sources, applications, and data services as needed without disrupting existing infrastructure.

Components of a Data Fabric Architecture

Data fabric implementations and deployment vary across organizations and, unlike traditional approaches, there is no one-size-fits-all solution. The approach is unique to each business, and organizations must choose from a variety of technologies and products to construct and assemble the data fabric that works best for them. Vendors often embellish data catalogs and sell them under a data fabric moniker. Organizations can buy pre-integrated tools from a vendor or incorporate best-of-breed components from different vendors and integrate them internally to build a data fabric.

Under the hood, a data fabric relies on universal data representation that allows efficient and effective search, automation, integration, and reuse of data across silos, applications, and use cases. At its core, data fabric incorporates ML-driven algorithms and processes to automate discovery, cataloging, and preparation so data teams can keep up with continuously evolving data and schema.
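The automated discovery and cataloging described above can be sketched in miniature: infer a column catalog from sample records so the catalog keeps up as schemas evolve. Production fabrics use ML-driven profiling; this toy uses simple type inspection and entirely made-up field names, purely for illustration.

```python
# Toy auto-discovery: build a field catalog from whatever records arrive,
# so new fields (schema drift) are cataloged without manual updates.

def infer_catalog(records):
    catalog = {}
    for rec in records:
        for field, value in rec.items():
            # Record every value type observed for each field.
            catalog.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in catalog.items()}

sample = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00, "coupon": "SPRING"},  # new field appears
]
catalog = infer_catalog(sample)
# The catalog now includes "coupon" even though the first record lacked it.
```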

Powered by a layer of software over existing systems, and composed of several services, data fabric leverages rules to automatically map and link policies to data assets that are managed using classification and business vocabularies and taxonomies.
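The rule-driven policy linking described here can be sketched as follows: assets carry classification tags drawn from a business vocabulary, and rules map those tags to governance policies automatically. The tag names and policy names are hypothetical stand-ins.

```python
# Hypothetical rules mapping classification tags to governance policies.
POLICY_RULES = {
    "pii": ["mask-on-read", "gdpr-retention"],
    "financial": ["sox-audit-log"],
}

def attach_policies(asset):
    """Automatically link policies to an asset based on its tags."""
    policies = []
    for tag in asset.get("tags", []):
        policies.extend(POLICY_RULES.get(tag, []))
    return {**asset, "policies": sorted(set(policies))}

asset = {"name": "customers.email", "tags": ["pii"]}
governed = attach_policies(asset)
# The asset now carries its policies without anyone assigning them by hand.
```

The value of the pattern is that governance scales with the vocabulary: tag a new asset once, and every rule keyed to that tag applies immediately.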

Knowledge Graphs: A Key Building Block for Data Fabric

A knowledge graph (KG) driven layer is the core of a strong data fabric. A KG adds semantics and context to the data pieces and links/interconnects data elements across diverse structured and unstructured datasets, enabling seamless integration and data interoperability. With a semantic KG, data is mapped to the semantic standards upon which the graph model is based. This aids data discovery and exploration by identifying patterns across all types of metadata.
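A knowledge graph can be reduced to its essence to show the linking at work: subject-predicate-object triples, where an explicit link between two records lets a traversal cross a silo boundary. The URIs and predicates below are invented for illustration; real fabrics use RDF/OWL standards and tooling such as rdflib or a graph database.

```python
# A knowledge graph as a plain set of (subject, predicate, object) triples.
# Two silos (crm and erp) describe the same organization; an owl:sameAs-style
# link connects them. All identifiers here are hypothetical.
triples = {
    ("crm:acme", "rdf:type", "schema:Organization"),
    ("crm:acme", "schema:name", "ACME Corp"),
    ("erp:cust-42", "schema:name", "ACME Corp"),
    ("crm:acme", "owl:sameAs", "erp:cust-42"),  # the cross-silo link
    ("erp:cust-42", "ex:balance", "120.00"),
}

def objects(subject, predicate):
    """All objects reachable from subject via predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Start from the CRM entity, follow the sameAs link, and reach a fact
# that only the ERP silo holds.
linked = objects("crm:acme", "owl:sameAs")
balances = {b for entity in linked for b in objects(entity, "ex:balance")}
```

This is the interoperability payoff the article describes: once entities are linked semantically, a question asked of one silo can be answered with facts from another.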

Using the concepts, entities, relationships, and semantics in the knowledge graph model, the data fabric blends diverse datasets and makes them meaningfully consumable across data products. Knowledge graph models that support semantics, standardization, and data and fact validation can be used to ensure semantic data quality, as well as data consistency, interoperability, and discoverability. A data fabric needs to continuously find, integrate, catalog, and share metadata across hybrid and multi-cloud platforms and the edge. This metadata, with its interconnections and relationships, is represented as a graph of connected entities and attributes governed by an ontology.

The semantic catalog core is curated and enhanced with metadata that defines data policies for privacy, data lineage, security, and compliance validations. Policies are then applied based on consumer profiles to automate enforcement. Automated data enrichment is applied to auto-discover, classify, and detect sensitive data, analyze data quality, and link business terms to technical metadata. The knowledge-based metadata core relies on AI and ML algorithms and augments the metadata to create and enrich the knowledge catalog. This facilitates discovery, enriches data assets, and performs analysis to extract insight for further automation using AI.
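One pass of that automated enrichment can be sketched as follows: scan sample values for a field, flag it as sensitive when a pattern matches, and link it to a business glossary term. The regex and glossary are deliberately simplistic stand-ins for the ML-driven classifiers a real fabric would use, and the field names are hypothetical.

```python
import re

# Crude stand-in for a sensitive-data classifier: an email pattern.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical business glossary linking technical names to business terms.
GLOSSARY = {"cust_email": "Customer Email Address"}

def enrich(field, samples):
    """Return enriched metadata for a field based on its sample values."""
    meta = {
        "field": field,
        "sensitive": False,
        "business_term": GLOSSARY.get(field),
    }
    if any(EMAIL.fullmatch(str(v)) for v in samples):
        meta["sensitive"] = True  # triggers privacy policies downstream
    return meta

meta = enrich("cust_email", ["jane@example.com", "j.doe@example.org"])
# The field is flagged sensitive and tied to its business term, with no
# human in the loop.
```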

Data fabric represents the evolution of enterprise data architecture with the goal of automating and reducing the two most challenging aspects of data in large organizations – data silos and data integration. A data fabric that leverages semantic knowledge graphs is the key to powering intelligent data catalogs and virtualization approaches that can let data remain in place, while providing uniform, governed access for enterprise consumption across data centers and organizational boundaries.
