Ontotext talks to Gene Loh, Director of Software Development at Synaptica, and Vassil Momtchev, Ontotext CTO, about the RDF-star extension to the RDF graph data model,…
When people talk about Knowledge Organization Systems (KOS), they use a generic term that embraces taxonomies, thesauri, controlled vocabularies, ontologies, classification schemes, name authorities, topic maps and other structured terminologies. These are often presented on an ascending scale of complexity, with ontologies represented as a more complex version of taxonomies.
At Synaptica, we find it helpful to envision a semantic knowledge organization system, also frequently referred to as an ontology, as something that comprises both a semantic schema and a taxonomy. The schema is all the structural elements that provide the framework for creating individual named things. These named things can be an OWL class, an OWL named individual, or a SKOS concept. Together they comprise the superset of entities that belong inside the ontology and are subject to the rules of its semantic schema.
It is true that ontologies are often associated with OWL and taxonomies with SKOS. However, this is misleading because both OWL and SKOS are ontological schemas used to describe domains of knowledge whose concepts, classes and instances collectively form a taxonomy. The key distinction between OWL and SKOS is that use cases for OWL demand transitive relationships that support inferencing, whereas use cases for SKOS typically don't.
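The practical difference between the two readings can be sketched in a few lines of plain Python, using a hypothetical mini-taxonomy: a SKOS-style store holds only the asserted broader link, while an OWL-style transitive reading also infers every ancestor along the chain.

```python
# Hypothetical mini-taxonomy: asserted "broader" edges, child -> parent.
broader = {
    "Beagle": "Canines",
    "Canines": "Mammals",
    "Mammals": "Animals",
}

def transitive_broader(concept, edges):
    """OWL-style inferred closure: every ancestor, not just the direct parent."""
    ancestors = []
    while concept in edges:
        concept = edges[concept]
        ancestors.append(concept)
    return ancestors

# A SKOS reading gives you the direct link...
print(broader["Beagle"])                      # Canines
# ...while a transitive (OWL) reading infers the rest of the chain.
print(transitive_broader("Beagle", broader))  # ['Canines', 'Mammals', 'Animals']
```

This is the inference a reasoner performs when a property is declared transitive; for many retrieval-oriented vocabularies, the direct link alone is all the use case requires.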
So, SKOS is the right fit for some situations and OWL for others. That’s why, when we onboard new clients, we always start with a discussion to understand their particular knowledge domains, their business objectives and the knowledge modeling options for designing appropriate semantic schema.
There are several KOS editing tools in the marketplace. Synaptica developed Graphite, which we embed with Ontotext GraphDB to provide enterprises with easy-to-use tools for managing controlled vocabularies and ontology schema over an RDF graph database.
Reduced to three simple steps, building a KOS involves:

1. Modeling the semantic schema.
2. Importing and editing the taxonomies that populate it.
3. Connecting the graph to other systems.
In the following section we will look at the process in more detail.
Knowledge modeling is a similar process to data modeling. It involves methodically identifying how to describe a domain of knowledge. At a high level, you need to identify the fundamentally different types of thing in your domain, e.g. topical concepts, people, places, organizations, business processes, etc. Each fundamentally different type of thing will necessarily be described by different characteristics or properties, and is best managed as a separate KOS scheme, or as a top-level class within a KOS scheme.
For each scheme or top-level class, the next knowledge modeling exercise is to define the set of properties needed to describe entities of that type. For topical concepts, the lexical and annotation properties of SKOS may suffice. For people, by contrast, the properties will be very different and may include attributes such as name, birthplace and birthdate. In addition to schemes, classes and properties, you will also want to define the relationships used to link entities together within a scheme and between schemes.
Collectively, all these components form the knowledge model, and when they are defined using RDF classes and predicates, they become the semantic schema of an ontology.
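A semantic schema of this kind might look like the following Turtle fragment — a minimal, hypothetical sketch (all `ex:` names are invented for illustration) showing one class of named thing, a data property and an object property:

```turtle
@prefix ex:   <http://example.org/schema/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:Person a owl:Class ;
    rdfs:label "Person" .

ex:birthDate a owl:DatatypeProperty ;   # data property: stores a literal value
    rdfs:domain ex:Person ;
    rdfs:range  xsd:date .

ex:worksFor a owl:ObjectProperty ;      # object property: links two entities
    rdfs:domain ex:Person ;
    rdfs:range  ex:Organization .
```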
In Graphite, you can curate public and private namespaces and manage sets of classes and predicates within each namespace. Predicates include data properties for storing strings, numbers, dates, URLs, etc., and object properties for linking concepts together to form hierarchies and associations. Namespaces and their predicates serve as a library of reusable building blocks for constructing KOS schemes, which you can then populate with taxonomies of named things (concepts, classes and individuals).
The next step is to create a project, which in Graphite provides a collaborative workspace. In each project you can bring in taxonomy and ontology schemes and set permissions, giving you a single view of, and editorial control over, all your KOS schemes.
You can also create collections that help you tag concepts in one or more KOS schemes to identify subsets by topic, consuming system, workflow state, etc.
The next step is to visualize your project. This helps you see the interconnectedness of the KOS schemes that collectively form the semantic schema of a domain of knowledge. For example, the screenshot below visualizes the semantic schema of multiple KOS schemes wired together to form something like a super KOS. The nodes of this graph are not concepts or classes; rather, each node is an entire KOS scheme. The edges represent the ways each scheme is wired to every other scheme to form an overall semantic schema.
Each individual KOS scheme is the logical container for a discrete taxonomy. Each scheme comprises a schema, which defines the semantic structure and business rules of the taxonomy, together with the set of individual concepts that form the body of the taxonomy.
The screenshot below shows an individual KOS and how you can control it. You can pick properties from the library of namespaces and predicates to bring in relationships and data properties, control cardinality constraints, and so on. Graphite is designed for controlled vocabulary applications, so you can use it to nominate an individual data property as the unique descriptive field (preferred label) for a concept and enforce descriptor uniqueness.
And that’s the last stage of modeling the semantic schema. A triplestore without a controlled vocabulary interface like Graphite is very permissive and will let you write any kind of triple you want. The idea of using a tool like Graphite on top of a tool like GraphDB is to create and manage the business rules for controlled vocabularies and to manage permissions, ensuring those vocabularies are built and maintained according to the governance policies of your enterprise.
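To make the idea concrete, here is a plain-Python sketch (names and logic are illustrative, not Graphite's implementation) of the kind of business rule a controlled-vocabulary layer enforces before triples ever reach the permissive triplestore — in this case, descriptor (preferred label) uniqueness within a scheme:

```python
# Illustrative guard: reject a concept whose preferred label duplicates an
# existing descriptor in the scheme, before any triple is written to the store.
class SchemeGuard:
    def __init__(self):
        self.pref_labels = {}   # normalized label -> concept id

    def add_concept(self, concept_id, pref_label):
        key = pref_label.strip().lower()
        if key in self.pref_labels:
            raise ValueError(
                f"Duplicate descriptor {pref_label!r}: already used by "
                f"{self.pref_labels[key]}"
            )
        self.pref_labels[key] = concept_id
        # Only now would the triple be written to the triplestore, e.g.
        # <concept_id> skos:prefLabel "pref_label"@en .
        return concept_id

guard = SchemeGuard()
guard.add_concept("ex:c1", "Canines")
try:
    guard.add_concept("ex:c2", "canines")   # rejected: duplicate descriptor
except ValueError as err:
    print(err)
```

The triplestore itself would happily accept both labels; the rule lives in the vocabulary-management layer.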
Now let’s have a quick look at importing. Apart from importing RDF files, you can also import data from simple grids in Excel or CSV.
In the example below, you can see an Excel file that contains a concept hierarchy using a human readable nested indented format in the first three columns, plus additional columns for definitions, synonyms and associative relationships.
This file is then imported under an empty Living Things taxonomy, which transforms it into a SKOS- or OWL-based RDF KOS.
For example, if you look at the “Canines” concept in the hierarchy below, you will see that all the taxonomy elements have been imported: parents, children, synonyms, definitions and related concepts.
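The transformation behind such an import can be sketched in plain Python. The column layout and data here are illustrative (the first three columns carry the hierarchy level, as in the nested-indented format described above); each non-empty cell becomes a concept whose parent is the current concept one level up:

```python
# Illustrative rows from a nested-indented spreadsheet.
rows = [
    # level 1,   level 2,   level 3
    ["Animals",  "",        ""],
    ["",         "Mammals", ""],
    ["",         "",        "Canines"],
    ["",         "",        "Felines"],
]

def rows_to_broader(rows):
    """Return child -> parent pairs (skos:broader) from nested-indented columns."""
    broader = {}
    path = [None] * len(rows[0])       # current ancestor at each level
    for row in rows:
        level = next(i for i, cell in enumerate(row) if cell)
        label = row[level]
        if level > 0:
            broader[label] = path[level - 1]
        path[level] = label
    return broader

print(rows_to_broader(rows))
# {'Mammals': 'Animals', 'Canines': 'Mammals', 'Felines': 'Mammals'}
```

Additional columns for definitions, synonyms and associative relationships would map onto the corresponding annotation and object properties in the same pass.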
Changing anything is as simple as dragging and dropping it within the hierarchy, or dragging it from the discovery pane and dropping it in the workspace onto the predicate you want to link it to.
Now you can start exploring the taxonomy, not just as a simple tree structure, but as a multi-dimensional web of interconnected relationships that form a knowledge graph. For example, visualizing ‘Carbon dioxide’ in the Chemistry (SKOS) taxonomy exposes relationships to a web of concepts from an EU and a UN taxonomy.
In this graph, concepts in the Chemistry taxonomy link to concepts in GEMET (a European Union multilingual environmental sciences thesaurus) and to the UNESCO thesaurus. So, you can see how, in a knowledge graph, you can traverse from the chemical compound ‘carbon dioxide’ to ‘climate change mitigation’ and then to ‘climate policy’ in the EU’s environmental science taxonomy. It’s easy to visualize this in Graphite, so that when you build and edit the taxonomies, you have full sight of all that connectedness.
Collaboration, workflow and governance are another important part of managing controlled vocabularies. As mentioned earlier, a triplestore alone is a very permissive world without a lot of rules. In a controlled vocabulary environment, you need to control permissions, workflows, etc. So, we have built tooling for that.
In the Project Admin tool, you can define role-based user groups. For each group you can set hide, view or edit permissions, and the model is granular, down to the level of individual property fields. For example, you might have Senior Editors with one level of permissions, Junior Editors with another level of permissions, or you could create a user role called “Subject Matter Experts” with permissions to view the taxonomies but not change anything, except to make comments.
By the way, our entire permissions model was built on RDF-star. When Ontotext became an early adopter of RDF-star in GraphDB, we immediately followed suit and re-engineered our whole permissions model to work on RDF-star, which gave us more flexibility.
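What RDF-star buys a permissions model is the ability to attach metadata to a statement itself. The Turtle-star fragment below is a hypothetical sketch (the `ex:` and `acl:` names are invented, not Graphite's actual vocabulary): the quoted triple — a permission grant — becomes the subject of further facts about who granted it and what it covers.

```turtle
@prefix ex:  <http://example.org/> .
@prefix acl: <http://example.org/acl#> .

<< ex:JuniorEditors acl:mayEdit ex:ChemistryScheme >>
    acl:grantedBy ex:ProjectAdmin ;
    acl:scope     ex:prefLabelField .
```

Without RDF-star, expressing the same thing requires standard RDF reification — four triples per annotated statement — which is the verbosity the re-engineering avoided.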
Graphite also supports several standard workflow and governance controls, which can be quickly extended and customized to meet the needs of each specific client. For example, you can have controls like “Ready for the Approver”, “Rejected by the Approver”, “Approver comments”, “Subject Matter Expert comments”. New governance and workflow controls can be rapidly defined and deployed. They give enterprises fine control over their community of editors, authors, reviewers, commentators and what each group can see and do.
So, once you’ve built your semantic knowledge organization systems in Graphite and they are stored as RDF in the GraphDB triplestore, the next task is to connect the graph to other systems.
Graphite supports full REST-based, read-write APIs. Any task you can perform as an editor using Graphite’s UI can also be performed as an API call, for example: autocomplete search, hierarchical browse, associative graph browse, creating a new candidate concept, deleting a concept, etc. In addition to the REST APIs, Graphite also supports a SPARQL endpoint and options for GraphQL.
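As an illustration of the SPARQL endpoint, an autocomplete-style lookup of concepts by preferred label might be phrased like this (the query is a generic SKOS example, not a documented Graphite API call):

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (STRSTARTS(LCASE(STR(?label)), "carbon"))
}
LIMIT 10
```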
In the graph so far, we have the ontological schema and the taxonomies of named things that populate it. But these alone only define the domains of knowledge for an enterprise; they do not yet constitute a fully complete enterprise knowledge graph.
An enterprise knowledge graph also needs to be content-aware, that is, it needs to include reference data and/or content metadata. When a graph is content-aware, you can do analytics and discovery over the taxonomies, reference data and content, and you can do it through the lens of the ontology and the taxonomy.
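A plain-Python sketch (with invented data) shows what discovery “through the lens of the taxonomy” means in practice: documents are tagged with taxonomy concepts, and a search expands through the taxonomy's broader/narrower links rather than matching literal strings.

```python
# Illustrative taxonomy (child -> parent) and content tagged with its concepts.
broader = {"Canines": "Mammals", "Felines": "Mammals", "Mammals": "Animals"}
tagged_docs = {
    "doc1": {"Canines"},
    "doc2": {"Felines"},
    "doc3": {"Climate policy"},
}

def narrower_closure(concept):
    """All concepts whose broader-chain reaches `concept`, plus itself."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in broader.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def discover(concept):
    """Find documents tagged with the concept or any of its descendants."""
    wanted = narrower_closure(concept)
    return sorted(doc for doc, tags in tagged_docs.items() if tags & wanted)

print(discover("Mammals"))   # ['doc1', 'doc2']
```

A query for “Mammals” finds documents tagged only with “Canines” or “Felines” — something a keyword index cannot do without the taxonomy behind it.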
Synaptica has forged a very close alliance with Ontotext because enterprise knowledge graphs are the confluence of two disciplines, requiring the tight integration of two toolsets: information science (which Synaptica has been doing for 25 years) and data science (which Ontotext has been doing for over 20 years).
Together Synaptica’s Graphite and Ontotext’s GraphDB provide enterprises with a robust and tightly integrated set of tools to build semantic knowledge organization systems, and upon that foundation, to build enterprise knowledge graphs.