How to adopt the principles of data hygiene and take advantage of semantic standards for identity, meaning and business rules
October 14, 2022 | 5 mins. read | Michael Atkin
This post is designed to demystify semantic standards and knowledge graphs for executive stakeholders. Semantic standards were developed by the US Department of Defense two decades ago to support flexible analysis of large datasets and the need to share data across federated and interdependent systems. They form the infrastructure for the World Wide Web and are being used by many leading companies to harmonize data, unravel risk, automate processes and capitalize on business opportunities.
The basic components include:
Identity Resolution (IRI): The semantic standard is based on the assignment of a unique web address to every data concept in the form of an Internationalized Resource Identifier (IRI). The IRI serves as both an ‘identifier’ (what something represents) and a ‘locator’ (where it resides). Think of the IRI as the Rosetta Stone for identity resolution, allowing firms to link data wherever it resides to one master ID.
Meaning Resolution (ontologies): Ontology is simply a data modeling and communication process used to ensure a shared understanding of requirements between business stakeholders and application developers. The Web standard uses conceptual data models (ontologies) to precisely describe what data means as well as how concepts are connected. Expressing data at a granular level – and unhooking meaning from structure – allows ultimate flexibility for it to be combined and aggregated.
Triple Expression (RDF, OWL): The method invented by the DoD to shift from columnar to semantic structure is known as triple store processing. Information is organized into groups of three, each consisting of a subject and an object linked together by a predicate. This basic sentence structure allows one to link data directly with meaning, ensuring that concepts are defined and understood at their most granular level.
Business Rules (SHACL): Business rules are ‘conditional expressions’ based on criteria defined by subject matter experts (SMEs). These rules are expressed in a standard language and linked to ontologies to ensure that meaning is shared (not obscured by vague terms or cryptic codes). The logic is captured and expressed as executable models and consistently enforced across all systems and processes.
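As a rough illustration of how these four building blocks fit together, the following Python sketch models triples as plain tuples of IRIs and applies one simple rule to them. It is not RDF or SHACL tooling, only an analogy in standard-library Python. The `https://example.com/...` namespaces, the `LegalEntity` class, the `hasLegalName` and `hasJurisdiction` properties and the `violations` helper are all hypothetical names chosen for illustration; only the `rdf:type` IRI comes from the RDF standard.

```python
from typing import NamedTuple

# Hypothetical namespaces, assumed for this sketch only.
EX = "https://example.com/id/"          # IRIs for individual things
ONT = "https://example.com/ontology/"   # IRIs for ontology concepts
# The standard RDF predicate that states "this thing is an instance of that class".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    subject: str    # IRI of the thing being described
    predicate: str  # IRI of the relationship (its meaning comes from the ontology)
    obj: str        # another IRI, or a literal value


triples = [
    Triple(EX + "acme", RDF_TYPE, ONT + "LegalEntity"),
    Triple(EX + "acme", ONT + "hasLegalName", "Acme Corp"),
    Triple(EX + "acme", ONT + "hasJurisdiction", EX + "delaware"),
    Triple(EX + "globex", RDF_TYPE, ONT + "LegalEntity"),
    Triple(EX + "globex", ONT + "hasLegalName", "Globex Inc"),
]


def instances_of(graph, class_iri):
    """Return the sorted subjects declared to be instances of class_iri."""
    return sorted(
        t.subject for t in graph
        if t.predicate == RDF_TYPE and t.obj == class_iri
    )


def violations(graph, class_iri, required_predicate):
    """Business rule: every instance of class_iri needs required_predicate.

    Returns the subjects that break the rule (loosely comparable to a
    minimum-count constraint in a rules language such as SHACL).
    """
    missing = []
    for subject in instances_of(graph, class_iri):
        has_value = any(
            t.subject == subject and t.predicate == required_predicate
            for t in graph
        )
        if not has_value:
            missing.append(subject)
    return missing


rule_failures = violations(triples, ONT + "LegalEntity", ONT + "hasJurisdiction")
print(rule_failures)
```

Here the rule flags the `globex` entity because no jurisdiction has been recorded for it. In practice the same idea would be expressed with standard RDF and SHACL tooling rather than hand-written Python.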
Using these four building blocks, semantic technology provides eight foundational capabilities that work together to create business value.
Quality by Math: Linking the data in your organization to the ontology ensures precision of meaning about concepts, systems, people and processes. This means that errors and definitional conflicts are caught before they are introduced into operational systems. As a result, users have confidence they are getting the information they need for context and to examine ad hoc business questions.
Concept Reuse: Using Web standards for modeling eliminates the problem of ‘hard-coded assumptions’ (i.e., doing the same thing in slightly different ways) because it focuses on concepts, not specific applications. Users always understand what the data represents at its most granular form. This enables an efficient reuse of important concepts across systems and processes.
Context: Semantic standards allow architects to separate business logic from code. Each piece of data is anchored to the ontology for its meaning and to a single IRI for its identity. Identity and meaning are then qualified by a timestamp, which records exactly when a fact was asserted, and by a source, which records where the data originated. With semantic standards, we can understand all data in context by examining these four dimensions of identity, meaning, time and source.
Access Control: Security is embedded into the design of the data and not constrained by either systems or administrative complexity. Rules can be modeled for all circumstances and controlled at both the application and data levels.
Lineage Traceability: All data is linked to a single identifier so that firms can trace the data as it flows through systems. Data can be transformed and renamed many times as it flows across systems without losing the knowledge of where it came from, what it represents and where it is going. Lineage and provenance objectives are automatic and fully auditable.
Governance: Semantic standards use the capabilities of resolvable identity, precise meaning, structural validation and lineage traceability to shift the governance focus from people-intensive data reconciliation to more automated data applications.
Machine-Readable: Semantic standards are written in a language that both humans and machines can understand. The use of machine-readable standards facilitates automatic validation and provides assurance of data quality.
Continuous Testing: Use cases and individual user objectives in semantic environments are linked to automated testing procedures and issue management. With these standards, every change is linked to a testing process that checks for both logical consistency and circular reasoning. If there are changes to authoritative sources, the downstream implications and dependencies are tracked and tested.
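To make the Context and Lineage Traceability capabilities more concrete, here is a minimal Python sketch under stated assumptions. Each statement carries the four dimensions described above (identity, meaning, time and source), plus the local field name each system happens to use. The namespaces, the `hasCreditRating` property, the system names (`crm_system`, `risk_engine`) and the `lineage` helper are hypothetical and not drawn from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical namespaces, assumed for this sketch only.
EX = "https://example.com/id/"
ONT = "https://example.com/ontology/"


@dataclass(frozen=True)
class Statement:
    subject: str           # identity: the IRI of the thing described
    predicate: str         # meaning: the ontology concept being asserted
    value: str             # the asserted value
    recorded_at: datetime  # time: when the fact was asserted
    source: str            # source: the system the data came from
    local_name: str        # whatever the source system calls this field


statements = [
    Statement(EX + "acme", ONT + "hasCreditRating", "BBB+",
              datetime(2022, 9, 15, tzinfo=timezone.utc), "risk_engine", "rating_cd"),
    Statement(EX + "acme", ONT + "hasCreditRating", "BBB",
              datetime(2022, 3, 1, tzinfo=timezone.utc), "crm_system", "CUST_RTG"),
    Statement(EX + "acme", ONT + "hasLegalName", "Acme Corp",
              datetime(2021, 1, 10, tzinfo=timezone.utc), "crm_system", "CUST_NM"),
]


def lineage(facts, subject, predicate):
    """Return every assertion of one concept about one thing, oldest first.

    Matching is done on the IRIs, so a field that was renamed between
    systems is still recognized as the same concept.
    """
    matches = [
        s for s in facts
        if s.subject == subject and s.predicate == predicate
    ]
    return sorted(matches, key=lambda s: s.recorded_at)


history = lineage(statements, EX + "acme", ONT + "hasCreditRating")
trail = [(s.source, s.local_name, s.value) for s in history]
latest = history[-1]

print(trail)
print(latest.value)
```

The two systems use different column names (`CUST_RTG` and `rating_cd`), yet both records resolve to the same identity and meaning, so the trail can be reconstructed by time and source.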
Semantic technology is designed to ensure flexibility of design and reusability of function. It represents a huge breakthrough for data management because it is cost-efficient, non-intrusive, standards-based and governed by trusted processes.
Michael Atkin has been an analyst and advocate for data management since 1985. His experience spans from the foundations of the information industry to the adoption of semantic technology. He has served as an advisor to financial institutions, global regulators, publishers, consulting firms and technology companies.