What is Data Mesh?

Data mesh is still in its early stages, and both data personas and organizations are looking for clear, specific guidance. That makes it important to understand the why and the what of data mesh, and the role that knowledge graphs should play, before adopting a data mesh strategy.

The definition of a data mesh is still being debated. Most participants agree that it is not a platform or an off-the-shelf product, and that its primary goals should be to improve data accessibility, security, discoverability, and interoperability. Beyond the confusion over what it is, there is also the challenge of understanding the processes and data connections needed to implement a data mesh initiative successfully. Clarifying the essence and mechanics of data mesh is crucial, but the first step is to understand the underlying motivations.

Why do organizations need a data mesh?

The data mesh paradigm seeks to tackle some of the most significant pain points that many organizations face.

One such challenge involves scaling enterprise data infrastructure to accommodate diverse data types from various sources. This issue becomes particularly troublesome when dealing with data veracity. Data mesh addresses this by promoting data autonomy, enabling users to make domain-related decisions without relying on a central authority. Additionally, it enhances velocity by implementing improved data governance and access aligned with business requirements.

Another common obstacle is the inability to establish a domain-centric sense of data ownership, which is often disconnected from the business’s understanding of data. This makes it challenging for organizations to leverage their data as a strategic asset. A mesh approach aids in distributing data ownership and reducing interdependencies between services, fostering a data-driven mindset.

Effective coordination and communication across cross-functional teams present yet another challenge and can often lead to failures as the gap between data and business widens. By granting organizational autonomy to teams, data mesh eliminates central bottlenecks and delivers value from data.

Figure 1 shows the overall idea of a data mesh with the major components.

What is a data mesh and how does it work?

Think of data mesh as an operating model for organizations with a domain-driven, decentralized data architecture. It combines implementation and organizational patterns with a technology-agnostic set of principles. By moving ownership to the data product owners, it can better serve consumers, and it changes how data initiatives are managed within organizations.

According to Zhamak Dehghani, the innovator behind this paradigm, the fundamental principles of data mesh include:

  • Treating data as a product – Empowers business units to take ownership of their data and provide it as a product in its own right.
  • Domain-driven ownership of data – Allows users to discover, explore, create, and enrich new data sources based on specific use cases, all while maintaining centralized governance for security and privacy.
  • Self-serve data platforms – Enable domain teams to offer self-service capabilities that simplify the creation and consumption of data products.
  • Federated computational governance – Creates an ecosystem in which users derive value by aggregating and correlating independent data products. This is possible because the data mesh is built on interoperability standards.

To better understand the principles of data mesh and how best to enable it, let’s first discuss the key components: domain, data product, data contracts, and data sharing.

What is a domain?

In the context of a data mesh, the term “domain” refers to a logical grouping of organizational units that collectively serve a specific functional context. It encompasses a set of tasks that the domain is assigned to perform, essentially explaining why the domain exists within the constraints of the organization.

Within a data mesh framework, domains are represented by a node, which can take the form of an Operational Data Store (ODS), a data warehouse, or a data lake customized to meet the specific requirements of the domain. Domains can ingest operational data, create analytical data models as data products, and publish them with data contracts to fulfill the data needs of other domains.

The mesh emerges when teams use data products from other domains and the domains communicate with one another in a governed manner.
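One way to picture this is to treat each domain as a node and each use of another domain's data product as an edge. The sketch below is only an illustration under that assumption: the `Domain` class, the `mesh_edges` helper, and the domain and product names are all hypothetical and not part of any data mesh standard.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A hypothetical domain node that publishes and consumes data products."""
    name: str
    published: set[str] = field(default_factory=set)  # products this domain owns
    consumed: set[str] = field(default_factory=set)   # products used from other domains


def mesh_edges(domains: list[Domain]) -> set[tuple[str, str, str]]:
    """Return (consumer, producer, product) edges for every cross-domain use."""
    owner = {product: d.name for d in domains for product in d.published}
    edges = set()
    for d in domains:
        for product in d.consumed:
            producer = owner.get(product)
            # Only a published product owned by a different domain forms an edge.
            if producer is not None and producer != d.name:
                edges.add((d.name, producer, product))
    return edges


# Illustrative domains; the names are made up for this example.
sales = Domain("sales", published={"orders"}, consumed={"customer_profiles"})
marketing = Domain("marketing", published={"customer_profiles"}, consumed={"orders"})
finance = Domain("finance", published={"invoices"})

edges = mesh_edges([sales, marketing, finance])
```

In this example, sales and marketing are connected in both directions, while finance publishes a product that nobody consumes yet, so it is not part of the mesh so far.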

What is a data product?

A data product is the node on the mesh that encapsulates code, data, metadata, and infrastructure. These data products are created, curated, and offered to users as self-service, providing a dependable and trustworthy source for sharing data across the organization.

Teams aligned with specific domains take ownership of these data products and assume responsibility for managing aspects like Service Level Agreements (SLAs), data quality, and governance. The data product owner is accountable for establishing mechanisms that allow secure and dependable interactions and transactions between data producers and data consumers. Additionally, they provide the necessary infrastructure and mechanisms to permit such interactions.

Figure 2 shows the concept of a data product.

What is a data contract?

Data contracts enable domain developers to create products according to specifications. These contracts ensure interface compatibility and include terms of service and an SLA. They cover the utilization of data and specify the required data quality.

The primary objective of data contracts is to establish transparency for data usage and dependencies, while also outlining the terms of service and SLAs. However, implementing this requires a cultural shift, and users need time to familiarize themselves with the approach and understand the importance of data ownership. Data contracts should also include information on schema, semantics, and lineage.
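To make the idea more concrete, here is a minimal sketch of what a data contract might look like in code, assuming a contract carries a schema, an SLA, a quality threshold, and terms of service. The `DataContract` class, its field names, and the `violations` check are hypothetical; real contracts are usually richer and often expressed in a declarative format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract for one data product."""
    product: str
    owner_domain: str
    schema: dict[str, type]      # column name -> expected Python type
    max_staleness_hours: int     # SLA: how fresh the data must be
    min_completeness: float      # quality: required share of non-null values
    terms_of_service: str


def violations(contract: DataContract, rows: list[dict]) -> list[str]:
    """Check rows against the contract's schema and completeness threshold."""
    problems = []
    total = filled = 0
    for i, row in enumerate(rows):
        for column, expected in contract.schema.items():
            total += 1
            value = row.get(column)
            if value is None:
                continue  # missing values count against completeness only
            filled += 1
            if not isinstance(value, expected):
                problems.append(f"row {i}: {column} is not {expected.__name__}")
    if total and filled / total < contract.min_completeness:
        problems.append("completeness below contract threshold")
    return problems


# Illustrative contract and data; all values are made up.
contract = DataContract(
    product="orders",
    owner_domain="sales",
    schema={"order_id": str, "amount": float},
    max_staleness_hours=24,
    min_completeness=0.9,
    terms_of_service="internal analytics use only",
)
rows = [
    {"order_id": "A1", "amount": 10.5},
    {"order_id": "A2", "amount": "n/a"},  # wrong type for amount
]
problems = violations(contract, rows)
```

A producer could run a check like this before publishing, so that consumers only ever see data that meets the agreed terms.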

What is data sharing?

Data sharing lets domain teams connect and share data products without duplicating them. Ideally, data should not be copied, which reduces the proliferation of isolated data repositories and keeps ownership within the domain. To share datasets securely between producers and consumers, it’s advisable to employ a centralized data governance approach, ideally facilitated through metadata linking.
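The following sketch illustrates sharing by reference rather than by copy, assuming a central catalog that stores only metadata and access grants. The `Catalog` and `CatalogEntry` classes and the `warehouse://` location string are hypothetical and do not correspond to any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing where a data product lives and who may read it."""
    product: str
    owner_domain: str
    location: str                                  # the single physical copy
    granted_to: set[str] = field(default_factory=set)


class Catalog:
    """Hypothetical central catalog: governance is shared, the data stays put."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.product] = entry

    def grant(self, product: str, consumer_domain: str) -> None:
        self._entries[product].granted_to.add(consumer_domain)

    def resolve(self, product: str, consumer_domain: str) -> str:
        """Return a link to the owner's copy instead of duplicating the data."""
        entry = self._entries[product]
        allowed = (
            consumer_domain == entry.owner_domain
            or consumer_domain in entry.granted_to
        )
        if not allowed:
            raise PermissionError(f"{consumer_domain} has no access to {product}")
        return entry.location


catalog = Catalog()
catalog.register(CatalogEntry("orders", "sales", "warehouse://sales/orders"))
catalog.grant("orders", "marketing")
link = catalog.resolve("orders", "marketing")
```

The consumer receives a link to the producer's data, so ownership never leaves the sales domain, and a domain without a grant is refused.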

What is team structure?

Team and organizational structure are important aspects to consider within the context of data mesh. It is typical to organize teams around selected domains rather than have a centralized team.

Domain teams are responsible for all processes – data collection, transformations, cleaning, enrichment, and modeling. Within a domain, teams are arranged vertically and consist of roles required to deliver data such as DataOps engineers, Data Engineers, Data Scientists, Data Analysts, and Domain Experts.

Knowledge graphs and data mesh

The foundational principles of knowledge graphs, driven by semantics and context, position them as an ideal support system for enterprise data mesh and data fabric-based development. Knowledge graphs offer the means to ensure that data contracts are standardized, uniform, consistent, semantically correct, and aligned with data sets. They empower data-sharing platforms to connect data between users, systems, and applications consistently and unambiguously. This facilitates compliance with data contracts, guaranteeing data types, schema, entities, and their inter-relationships across data products are semantically valid.
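As a rough illustration of semantic validation, the sketch below stores a shared vocabulary as plain (subject, predicate, object) triples and checks whether the relationships declared by data products agree with it. This is a simplified stand-in written in plain Python: the vocabulary, the `invalid_links` helper, and all entity names are hypothetical, and a production setup would more likely rely on an RDF store and a validation language such as SHACL.

```python
# Shared vocabulary as (subject, predicate, object) triples.
# "domain" and "range" state which classes a property may connect.
ONTOLOGY = {
    ("placedBy", "domain", "Order"),
    ("placedBy", "range", "Customer"),
    ("billedTo", "domain", "Invoice"),
    ("billedTo", "range", "Customer"),
}


def invalid_links(ontology, declared_links):
    """Return declared (source, property, target) links the ontology does not allow."""
    bad = []
    for source, prop, target in declared_links:
        domain_ok = (prop, "domain", source) in ontology
        range_ok = (prop, "range", target) in ontology
        if not (domain_ok and range_ok):
            bad.append((source, prop, target))
    return bad


# Relationships declared in two hypothetical data contracts.
declared = [
    ("Order", "placedBy", "Customer"),    # consistent with the vocabulary
    ("Invoice", "placedBy", "Customer"),  # placedBy is not defined for Invoice
]
problems = invalid_links(ONTOLOGY, declared)
```

Because every data product is checked against the same vocabulary, an inconsistent relationship is caught before it reaches consumers in other domains.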

For domain-centric and enterprise data catalogs, leveraging a knowledge graph to store semantics with metadata is highly beneficial. Knowledge graphs help with automatic metadata extraction, the generation and enforcement of data quality standards, and the certification of data assets based on semantic rules and validation criteria.

The integration of knowledge graphs with data mesh can lead to the emergence of a semantic data mesh, which provides data across different domains in the mesh with context and meaning. This fosters semantic data discoverability, interoperability, augmentation, and enrichment and provides explainability for AI and ML use cases.

Conclusion

Data mesh is primarily a shift in culture, processes, and people, and these facets are hard to change quickly in larger organizations. It is not a concept that can easily be embraced and implemented like a data architecture. Some organizations choose to focus on specific aspects of data mesh or implement a simplified version of the architecture.

For organizations, considering data mesh is not a yes or no decision. Rather, it should be an exercise to identify the obstacles hindering them from delivering business value in a timely and efficient manner. The ability to connect, share, and access data effectively across the enterprise is a very likely answer and is one that a data mesh supported by knowledge graphs is set up to deliver.
