Data Mesh 101: How Data Mesh Can Be Used in an Organization

Best practices for successfully adopting the data mesh paradigm in your organization. This article was originally published on TDWI.org.

February 12, 2024 · 6 min read · Sumit Pal

Part one of this three-part series discussed the concept of data mesh and explored what it is and why an organization should care. Here, part two provides best practices for data mesh, including practical guidance, challenges, and limitations.

As discussed in part one, the data mesh paradigm is still a relatively new concept with implementations in the early stages. It requires a shift in mindset, changes in team structures, and maturity that doesn’t happen overnight. Most enterprises have years of legacy systems, processes, and practices that make it more demanding to quickly jump onto the data mesh bandwagon. How data mesh is implemented varies depending on each organization’s maturity and capabilities.

To make it easier to get started, here are some general best practices to help bring a successful data mesh initiative to life:

Avoid a mesh rush – transition to a data mesh incrementally and iteratively, with feedback loops. Assess each organization's data and technical maturity first; a data mesh is not suitable for every organization, such as start-ups or those with only a few domains.

Engage key stakeholders from the start. Getting buy-in from CDOs, CDAOs, CIOs, and others is critical for the eventual success of your data mesh.

Collaborate across domains – this is key for ensuring all teams understand the data services available and use and reuse them for data product development.

Determine ownership by making sure all teams involved in the data mesh own the quality of their domain data, ensure service-level agreements are met, and share that data with data contracts. Domain teams should continually monitor for data errors with data validation checks and incorporate data lineage to track usage.

Establish and enforce data governance by ensuring all data used is accurate, complete, and compliant with regulations. This includes establishing data quality checks, implementing processes for data maintenance, and developing guidelines for usage.

Empower teams with the right self-service capabilities to deliver data products quickly so they don’t get bogged down trying to implement everything a data mesh advocates. Enterprises should identify and adopt specific data mesh elements to achieve velocity.

Develop centralized data engineering and infrastructure teams and have them create reusable self-service capabilities for provisioning infrastructure, data ingestion, storage, and processing. Domain teams should not be creating these services.
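The ownership and governance practices above (data contracts, SLAs, and validation checks) can be sketched in a few lines of code. The following is a minimal, hypothetical illustration: the `DataContract` class, the "orders" product, its field names, and the 24-hour freshness SLA are all invented for this example and do not correspond to any specific platform or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an "orders" data product published by a domain team.
@dataclass(frozen=True)
class DataContract:
    name: str
    required_fields: dict      # field name -> expected Python type
    max_staleness: timedelta   # freshness SLA for the newest record

def validate_batch(contract: DataContract, records: list[dict], now: datetime) -> list[str]:
    """Return a list of contract violations (an empty list means the batch passes)."""
    violations = []
    # Schema check: every record must carry the agreed fields with the agreed types.
    for i, record in enumerate(records):
        for field_name, expected_type in contract.required_fields.items():
            if field_name not in record:
                violations.append(f"record {i}: missing field '{field_name}'")
            elif not isinstance(record[field_name], expected_type):
                violations.append(f"record {i}: '{field_name}' is not {expected_type.__name__}")
    # Freshness check against the SLA.
    timestamps = [r["updated_at"] for r in records if isinstance(r.get("updated_at"), datetime)]
    if timestamps and now - max(timestamps) > contract.max_staleness:
        violations.append(f"{contract.name}: data older than SLA of {contract.max_staleness}")
    return violations

orders_contract = DataContract(
    name="orders",
    required_fields={"order_id": str, "amount": float, "updated_at": datetime},
    max_staleness=timedelta(hours=24),
)

now = datetime(2024, 2, 12, tzinfo=timezone.utc)
good = [{"order_id": "A1", "amount": 19.99, "updated_at": now - timedelta(hours=2)}]
bad = [{"order_id": "A2", "amount": "oops", "updated_at": now - timedelta(days=3)}]

print(validate_batch(orders_contract, good, now))  # []
print(validate_batch(orders_contract, bad, now))   # type violation + stale-data violation
```

In practice, a check like this would run in the domain team's pipeline before a data product is published, with failures feeding the monitoring and lineage tooling mentioned above.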

Limitations of a Data Mesh

The biggest challenge in adopting a data mesh is the need to think in a domain-oriented way instead of a monolithic approach. This can be difficult, as it requires re-organizing teams, tools, and processes across the organization. With many independently managed services, it’s also important to have clear communication and coordination across domain teams to mitigate friction. This calls for additional planning, documentation, and testing. A data mesh will likely require more engineers to get started, so a critical mass is needed for successful adoption. Not all organizations are ready to make that investment.

Lastly, it can be difficult for organizations to carefully balance opposing forces — for example, the time needed to fix immediate issues versus the time to build necessary platforms and services for long-term benefits.

How to Adopt

Embarking on a data mesh journey is a significant undertaking that requires careful planning and consideration of culture, processes, technology, and governance. Ensure a successful implementation by simplifying how data products are accessed, used, and published.

  1. The first step is to assess your organizational readiness by evaluating the current architecture, infrastructure, and data teams’ skills. Understand the existing culture and willingness to embrace a decentralized approach. Then, identify potential domains and business units that are mature and ready to adopt this paradigm. Companies should organize teams around selected domains and invest in training teams on the principles and technologies that underpin the data mesh. These include domain-driven design, microservices, data governance, and modern data toolsets.
  2. The next stage is to set your goals, objectives, and a phased road map by defining what success looks like with data mesh. It's also critical to advocate for a smooth culture change, because data mesh involves shifting from thinking of data as tables to thinking of data as a combination of multiple elements, such as code, infrastructure, and metadata.
  3. Now it’s time to hold your domain owners and data producers responsible for data quality and treat data assets as products to be delivered to the organization. Each team should be accountable for providing their prepared data sets to downstream systems. Likewise, domain teams should oversee end-to-end processes from data collection, transformations, cleaning, and enrichment to modeling. Ensuring domain teams have a full stack of engineers and specialized roles such as DataOps, data engineers, data scientists, analysts, and domain experts is another consideration. Remember, specialists may work across more than one domain team.
  4. Once you address the human aspect, begin with one or two domains while keeping everything else constant, because data mesh can be adopted domain by domain, pillar by pillar. Collect feedback, continuously evaluate, and iterate by learning what works best for the organization. Empower centralized data engineering and infrastructure teams to build shared systems and self-service capabilities that let domain teams perform their work.
  5. Finally, set a governance framework for each domain team and at the global organization level. Establish processes for data quality checks and data maintenance, along with usage guidelines. Your business should also implement templates or blueprints to help domain teams onboard data. These templates should be customizable, documented, and incorporate security guardrails and policies. They should also be available as self-service across infrastructure, data pipeline orchestration, data processing, and data access.
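The templates or blueprints in the final step can be sketched as follows. This is a hypothetical illustration, not a real platform API: the class, field names, and policy values are invented. The idea it shows is that the central platform team fixes the governance guardrails while domain teams supply only their domain-specific inputs.

```python
from dataclasses import dataclass, field

# Hypothetical onboarding blueprint: the central platform team defines the
# guardrails; a domain team fills in only its domain-specific fields.
@dataclass
class DomainOnboardingBlueprint:
    domain: str
    owner_email: str
    source_uri: str
    # Guardrails fixed by the central governance framework (not settable by callers).
    encryption_at_rest: bool = field(default=True, init=False)
    pii_scan_required: bool = field(default=True, init=False)
    retention_days: int = field(default=365, init=False)

    def render(self) -> dict:
        """Produce the pipeline configuration handed to the self-service platform."""
        return {
            "pipeline": f"{self.domain}-ingest",
            "owner": self.owner_email,
            "source": self.source_uri,
            "policies": {
                "encryption_at_rest": self.encryption_at_rest,
                "pii_scan_required": self.pii_scan_required,
                "retention_days": self.retention_days,
            },
        }

# A domain team onboards its data without touching the security policies.
config = DomainOnboardingBlueprint(
    domain="marketing",
    owner_email="marketing-data@example.com",
    source_uri="s3://example-bucket/marketing/events/",
).render()
print(config["pipeline"])  # marketing-ingest
```

Because the guardrail fields are excluded from the constructor, a domain team cannot opt out of encryption or PII scanning, which is exactly the balance between self-service and federated governance that the step above describes.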

Data Mesh in Action

As data proliferates and sources diversify, data mesh is a great way to build data products and become data-driven at scale. Many reputable organizations have adopted it and successfully implemented several use cases.

For instance, JPMorgan Chase & Co. uses its data lake via a data mesh to determine wholesale credit risk. HSBC Securities Services reworked its data into a single-layer data-as-a-service offering, with a data mesh producing data products accessible to clients. Zalando, a well-known fashion retailer, has successfully leveraged a data mesh for its marketing domain by building domain expertise, automating data pipelines within a governance framework, and rolling it out organization-wide.

Although data mesh adoption is gaining in popularity, it is an endeavor that needs continuous evaluation and checkpoints to ensure success. Every organization will follow its own path to adoption and implementation. Collaboration and endorsement across business units are fundamental to your data mesh reaching its full potential and your organization maximizing its benefits. If done correctly, the rewards can be substantial.

In the final installment of this three-part series, I will explore how the data mesh can help bolster performance and share how it helps organizations and data teams work more effectively.

 


Strategic Technology Director at Ontotext

Sumit Pal is an ex-Gartner VP Analyst in the data management and analytics space. He has more than 30 years of experience in the data and software industry, in roles spanning startups to enterprise organizations, building, managing, and guiding teams and delivering scalable software systems across the stack (middle tier, data layer, analytics, and UI) using big data, NoSQL, database internals, data warehousing, data modeling, and data science.