Data mesh is an operational framework that enables organizations to build a domain-centric, decentralized data architecture. It is a blend of implementation practices, organizational structures, and a technology-agnostic set of principles. By transferring ownership to data product owners, data mesh can better serve consumers and reshape how data initiatives are managed within organizations.
The data mesh paradigm aims to solve some of the most significant pain points organizations face.
One such challenge involves scaling enterprise data infrastructure to accommodate diverse data types from various sources. This issue becomes particularly troublesome when dealing with data veracity. Data mesh addresses this by promoting data autonomy, enabling users to make domain-related decisions without relying on a central authority. Additionally, it enhances velocity by implementing improved data governance and access aligned with business requirements.
Another common obstacle is the inability to establish a domain-centric sense of data ownership, leaving data disconnected from the business’s understanding of it. This makes it challenging for organizations to leverage data as a strategic asset. A mesh approach helps distribute data ownership and reduce interdependencies between services, fostering a data-driven mindset.
Effective coordination and communication across cross-functional teams present yet another challenge and can often lead to failures as the gap between data and business widens. By granting organizational autonomy to teams, data mesh eliminates central bottlenecks and delivers value from data.
According to Zhamak Dehghani, the originator of this paradigm, the fundamental principles of data mesh are:

- Domain-oriented, decentralized data ownership
- Data as a product
- Self-serve data infrastructure as a platform
- Federated computational governance
To better understand the principles of data mesh, and how best to enable it, let’s first discuss its key components: domain, data product, data contracts, and data sharing.
In data mesh, the term “domain” refers to a logical grouping of organizational units that collectively serve a specific functional context. It covers a set of tasks that the domain is assigned to perform, explaining why the domain exists within the constraints of the organization.
Each domain is represented by a node, which can take the form of an Operational Data Store (ODS), a data warehouse, or a data lake tailored to the domain’s specific requirements. Domains can ingest operational data, create analytical data models as data products, and publish them with data contracts to fulfill the data needs of other domains.
Data mesh emerges when teams use data products from other domains and domains communicate with one another in a governed manner.
A data product is the node on the mesh that encapsulates code, data, metadata, and infrastructure. These data products are created, curated, and offered to users as self-service, providing a dependable and trustworthy source for sharing data across the organization.
Teams aligned with specific domains take ownership of these data products and assume responsibility for aspects such as Service Level Agreements (SLAs), data quality, and governance. The data product owner is accountable for providing the infrastructure and mechanisms that allow secure and dependable interactions and transactions between data producers and data consumers.
Figure 2 shows the concept of a data product.
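To make the idea concrete, the sketch below models a data product as a descriptor that encapsulates code, data location, metadata, and SLA information. All field names and values here are hypothetical illustrations, not a standard API.

```python
from dataclasses import dataclass, field

# Illustrative data product descriptor; every name here is hypothetical.
@dataclass
class DataProduct:
    name: str                # e.g. "orders.daily_summary"
    owner: str               # accountable data product owner
    domain: str              # the domain this product belongs to
    output_port: str         # where consumers read the data
    metadata: dict = field(default_factory=dict)  # schema version, format, lineage
    sla: dict = field(default_factory=dict)       # freshness, availability targets

    def is_published(self) -> bool:
        # A product is consumable only once it exposes an output port and an SLA.
        return bool(self.output_port) and bool(self.sla)

product = DataProduct(
    name="orders.daily_summary",
    owner="orders-team",
    domain="sales",
    output_port="s3://mesh/sales/orders_daily_summary/",
    metadata={"schema_version": "1.0", "format": "parquet"},
    sla={"freshness": "24h", "availability": "99.9%"},
)
print(product.is_published())  # True
```

The point of the sketch is that ownership, metadata, and service guarantees travel together with the product, rather than living in a separate central system.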
Data contracts enable domain developers to create products according to specifications. These contracts ensure interface compatibility and include terms of service and an SLA. They cover the utilization of data and specify the required data quality.
The primary objective of data contracts is to establish transparency around data use and dependencies, while also outlining the terms of service and SLAs. However, implementing them requires a cultural shift, and users need time to understand the importance of data ownership. Data contracts should also include schema information, semantics, and lineage.
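A minimal sketch of what such a contract might look like in code, assuming a simple tabular product: the schema, terms of service, SLA, and quality rules are bundled together, and a validation function reports violations for a record. Column names and rules are invented for illustration.

```python
# Illustrative data contract; all field names and rules are hypothetical.
contract = {
    "product": "orders.daily_summary",
    "schema": {"order_id": str, "total": float, "currency": str},
    "terms_of_service": "internal analytics use only",
    "sla": {"freshness": "24h"},
    "quality": {"total_non_negative": lambda row: row["total"] >= 0},
}

def validate(row: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for a single record."""
    errors = []
    for column, expected_type in contract["schema"].items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            errors.append(f"bad type for {column}")
    # Only run quality rules on structurally valid records.
    for name, check in contract["quality"].items():
        if not errors and not check(row):
            errors.append(f"quality rule failed: {name}")
    return errors

print(validate({"order_id": "A1", "total": 9.5, "currency": "EUR"}, contract))   # []
print(validate({"order_id": "A2", "total": -1.0, "currency": "EUR"}, contract))  # ['quality rule failed: total_non_negative']
```

In practice contracts are usually expressed declaratively (for example in YAML) and enforced by platform tooling, but the responsibilities are the same: interface compatibility, terms of use, and required quality.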
Data sharing enables domain teams to connect and use data products without duplicating them. Ideally, data should not be copied; this reduces the proliferation of isolated data repositories and keeps ownership within the domain. To securely share datasets between producers and consumers, organizations need a centralized data governance approach, ideally facilitated through metadata linking.
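Sharing by reference can be sketched as a central catalog that maps product names to metadata and a storage location, so consumers read data in place rather than copying it. The catalog structure and names below are illustrative only.

```python
# Minimal sketch of metadata linking: the catalog holds references,
# never the data itself. All names here are hypothetical.
catalog: dict[str, dict] = {}

def publish(name: str, owner_domain: str, location: str) -> None:
    # The producing domain registers metadata about a product, not a copy of the data.
    catalog[name] = {"owner": owner_domain, "location": location}

def resolve(name: str) -> str:
    # Consumers resolve a product name to its location; because nothing
    # is duplicated, ownership stays with the producing domain.
    entry = catalog.get(name)
    if entry is None:
        raise KeyError(f"unknown data product: {name}")
    return entry["location"]

publish("orders.daily_summary", "sales", "s3://mesh/sales/orders_daily/")
print(resolve("orders.daily_summary"))  # s3://mesh/sales/orders_daily/
```

Real platforms implement this pattern with governed catalogs and access controls layered on top of the metadata link, but the design choice is the same: one owned copy, many governed references.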
Team and organizational structure are important aspects to weigh when adopting data mesh. It is typical to organize teams around selected domains rather than maintain a centralized team.
Domain teams are responsible for all processes – data collection, transformations, cleaning, enrichment, and modeling. Within a domain, teams are arranged vertically and consist of roles required to deliver data such as DataOps engineers, Data Engineers, Data Scientists, Data Analysts, and Domain Experts.
The foundational principles of knowledge graphs, driven by semantics and context, position them as an ideal support system for enterprise data mesh and data fabric development. Knowledge graphs offer the means to ensure that data contracts are standardized, uniform, consistent, semantically correct, and aligned with datasets. They empower data-sharing platforms to connect data between users, systems, and applications consistently and unambiguously. This facilitates compliance with data contracts, guaranteeing that data types, schemas, entities, and their inter-relationships across data products are semantically valid.
For domain-centric and enterprise data catalogs, leveraging a knowledge graph to store semantics with metadata is highly beneficial. Knowledge graphs help in automatic metadata extraction, generation and enforcement of data quality standards, and certifying data assets based on semantic rules and validation criteria.
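A toy example of this kind of semantic validation: a knowledge-graph fragment represented as (subject, predicate, object) triples, used to check that the entities a data contract references are actually defined in the shared vocabulary. This is entirely illustrative; real systems would use an RDF store and a constraint language such as SHACL.

```python
# Toy knowledge-graph fragment; vocabulary and predicates are invented.
triples = {
    ("Customer", "is_a", "Entity"),
    ("Order", "is_a", "Entity"),
    ("Order", "placed_by", "Customer"),
}

def entity_defined(name: str) -> bool:
    # An entity is valid if the graph declares it as such.
    return (name, "is_a", "Entity") in triples

def contract_entities_valid(entities: list[str]) -> list[str]:
    # Return the entities a data contract references that the graph
    # does not define, i.e. the semantic violations.
    return [e for e in entities if not entity_defined(e)]

print(contract_entities_valid(["Order", "Customer"]))  # []
print(contract_entities_valid(["Order", "Invoice"]))   # ['Invoice']
```

The same triple-based lookup generalizes to checking relationships and data types, which is how a graph-backed catalog can certify data assets against semantic rules.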
By integrating knowledge graphs with data mesh, a semantic data mesh can emerge. This provides data across the different domains in the mesh with context and meaning. It fosters semantic data discoverability, interoperability, augmentation, and enrichment, and provides explainability for AI and machine learning use cases.
Data mesh is most of all a shift in culture, processes, and people, and these facets take time to change in larger organizations. This concept might not be easy to embrace and implement as a data architecture. Some organizations choose to focus on specific aspects of data mesh or implement a simplified version of the architecture.
For organizations considering data mesh, this is not a yes-or-no decision. Instead, it should be an exercise in identifying the obstacles that prevent them from delivering business value in a timely and efficient manner. The ability to connect, share, and access data effectively across the enterprise is a very likely answer, and one that a data mesh supported by knowledge graphs is well positioned to deliver.