Relating the Things
I’ve covered a lot of ground and haven’t yet talked about relationships: how to relate all the things you’ve discovered to each other. The basic question words (who, what, when, where, why, and how) can be applied again here to track down and relate content.
Let’s take our invoice example: Who needs the invoice, for what process, when do they need it, where is the information located, why is it important, and how is the information provided?
We can answer these questions using simple subject-verb-object sentences built from the things we’ve already identified. The things are nouns, which are called subjects or objects in the world of semantic standards. We connect these nouns with verbs, which are called predicates or relationships. In doing so, we begin to model the concepts, properties, and relationships needed in an ontology model.
So:
Sales Team has content type Invoice
Sales (process) has content type Invoice
Sales (process) has content type Invoice (at date, time, or triggering event)
Invoice has system Customer Relationship Management System
Invoice has system Product Information System
Invoice has system Taxonomy Management System
Sales (process) has record Invoice
Sales (process) has system Invoicing System
Sales Team has system Invoicing System
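Statements like these map directly onto subject-predicate-object triples. The sketch below is a minimal, hypothetical in-memory version (plain Python tuples, not any particular triple store) that stores the statements above, leaving out the date/event qualifier, and answers a few of the question-word queries. It also lists the distinct relationships in use, which comes out to the three general ones discussed next.

```python
# Each statement above becomes a (subject, predicate, object) triple.
# The "(at date, time, or triggering event)" qualifier is left out of this sketch.
TRIPLES = [
    ("Sales Team", "has content type", "Invoice"),
    ("Sales (process)", "has content type", "Invoice"),
    ("Invoice", "has system", "Customer Relationship Management System"),
    ("Invoice", "has system", "Product Information System"),
    ("Invoice", "has system", "Taxonomy Management System"),
    ("Sales (process)", "has record", "Invoice"),
    ("Sales (process)", "has system", "Invoicing System"),
    ("Sales Team", "has system", "Invoicing System"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Where is it? Every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Who needs it? Every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def predicates(triples=TRIPLES):
    """The distinct relationships in use, in first-seen order."""
    return list(dict.fromkeys(p for _, p, _ in triples))


print(objects("Invoice", "has system"))
print(subjects("has content type", "Invoice"))
print(predicates())  # ['has content type', 'has system', 'has record']
```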
There are many possible uni-directional relationships, but I’ve narrowed them down to three more general relationships (has content type, has system, and has record) that can be reused in a variety of contexts. This brings me to some general principles for designing and managing relationships: reduce, reuse, recycle.
Reduce
In general, I think it’s good practice to reduce the number of relationships you identify and use to connect content. Fewer relationships are easier to manage and apply across systems, and they help ensure you consistently use one semantic relationship rather than several nearly synonymous variants with different names.
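One way to keep near-synonymous variants from creeping in is a governance check that folds known variants into an approved relationship and flags anything else for review. This is a minimal sketch; the approved list reuses the three relationships above, while the variant names (uses system, has application, and so on) are hypothetical examples.

```python
# Approved relationships, plus hypothetical near-synonymous variants
# that should be folded into them.
APPROVED = {"has content type", "has system", "has record"}

VARIANTS = {
    "has document type": "has content type",
    "uses system": "has system",
    "has application": "has system",
    "has records": "has record",
}


def normalize(triples):
    """Rewrite known variants to their approved relationship.

    Returns (clean, unresolved): triples that now use an approved
    relationship, and triples whose relationship needs a governance decision.
    """
    clean, unresolved = [], []
    for subject, predicate, obj in triples:
        predicate = VARIANTS.get(predicate, predicate)
        target = clean if predicate in APPROVED else unresolved
        target.append((subject, predicate, obj))
    return clean, unresolved


incoming = [
    ("Sales Team", "uses system", "Invoicing System"),
    ("Invoice", "has system", "Product Information System"),
    ("Invoice", "related to", "Sales Team"),
]
clean, unresolved = normalize(incoming)
```

Here the variant is folded into has system, while related to is left for a person to decide on rather than silently accepted.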
For example, do we need a relationship like has Sales (process) content type to distinguish what one group or process needs from what another needs? I doubt it. Every group and process across your organization needs content and information to do its job. If content types are identified as part of a taxonomy dedicated to electronic and physical assets, hierarchical and associative relationships will establish the context of something like an “Invoice”. There is no real need to be more specific when the subjects, predicates, and objects will do the work for you.
At one organization where I worked, employees wanted to specify types of meeting minutes: sales meeting minutes, engineering meeting minutes, financial meeting minutes, and so on. But why? Are meeting minutes inherently different depending on who holds the meeting or what it covers? Not really, so the more general “meeting minutes” can be used with a relationship to a specific team and/or topic. There is no need to pre-coordinate concepts in the subject, predicate, or object when they can be split into more elemental concepts and related to each other with simple, reusable relationships.
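The meeting minutes example can be sketched as a small decomposition step: each pre-coordinated label is split into the general content type plus a relationship to a team. This is only an illustration; the has team relationship, the team names, and the document IDs are hypothetical.

```python
# Hypothetical pre-coordinated labels, each split into a general content
# type plus the team it belongs to.
PRECOORDINATED = {
    "Sales Meeting Minutes": ("Meeting Minutes", "Sales Team"),
    "Engineering Meeting Minutes": ("Meeting Minutes", "Engineering Team"),
    "Financial Meeting Minutes": ("Meeting Minutes", "Finance Team"),
}


def decompose(doc_id, label):
    """Turn one pre-coordinated label into elemental, reusable triples."""
    if label not in PRECOORDINATED:
        # Already elemental: keep it as a plain content type.
        return [(doc_id, "has content type", label)]
    content_type, team = PRECOORDINATED[label]
    return [
        (doc_id, "has content type", content_type),
        (doc_id, "has team", team),  # hypothetical relationship name
    ]


print(decompose("doc-17", "Sales Meeting Minutes"))
```

All three labels collapse into a single “Meeting Minutes” content type, and the distinction that mattered (which team) is carried by one reusable relationship instead of three extra concepts.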
Reducing the number of relationships will probably save you a lot of governance work, but you don’t need to be stingy. Related to is a useful associative relationship, and every possible relationship could be rolled up into it to minimize the number in use. However, related to is vague and not semantically descriptive. Knowing how a subject and object are related is far more useful.
Reuse
Part of the work of reducing the number of relationships is reuse. Just as the general “meeting minutes” can be related to a team or topic, a single relationship like has topic can be used for a variety of content types without specific variants like has finance topic, has engineering topic, or has information technology topic. Instead, these topics are contextualized by their location in the Topics taxonomy that supplies the tagging terminology: financial concepts sit in taxonomy branches with other financial topics. It is this context that makes it clear which kinds of topics are being applied to content.
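To see how taxonomy location does the work that a has finance topic relationship would otherwise do, here is a minimal sketch. It assumes a hypothetical Topics taxonomy stored as child-to-broader links and a few hypothetical tags; the top-level branch of each topic supplies its context.

```python
# Hypothetical Topics taxonomy: each topic maps to its broader topic,
# with None marking a top-level branch.
BROADER = {
    "Finance": None,
    "Accounts Payable": "Finance",
    "Invoicing": "Accounts Payable",
    "Engineering": None,
    "Firmware": "Engineering",
}

# One reusable relationship, has topic, for every domain.
TAGS = [
    ("Invoice", "has topic", "Invoicing"),
    ("Release Notes", "has topic", "Firmware"),
]


def branch(topic):
    """Follow broader links up to the top-level branch that gives a topic its context."""
    while BROADER[topic] is not None:
        topic = BROADER[topic]
    return topic


def content_in_branch(root, tags=TAGS):
    """Find content tagged with any topic under `root`; no has finance topic needed."""
    return [s for s, p, o in tags if p == "has topic" and branch(o) == root]


print(content_in_branch("Finance"))
```

Adding a new domain then means adding a branch to the taxonomy, not a new relationship.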
Where reuse can fall short, however, is in the design of reciprocal relationships. Many taxonomy management systems do not allow the same one-way relationship to be reused in multiple inverse pairs. For example, you can have Character has film Film and Actor has film Film, but the single has film relationship cannot have both has character AND has actor as its reciprocal. In that case, each of these must be set up as a single, one-way relationship.
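The limitation can be modeled with a toy registry that, like the systems described, allows at most one reciprocal per relationship. This is a hypothetical sketch of the constraint, not the behavior or API of any specific taxonomy management product.

```python
class RelationshipRegistry:
    """Hypothetical tool that allows at most one reciprocal per relationship."""

    def __init__(self):
        # relationship name -> its reciprocal, or None for a one-way relationship
        self.inverse = {}

    def add_pair(self, forward, reverse):
        """Declare forward/reverse as reciprocals; refuse a second reciprocal."""
        for rel, other in ((forward, reverse), (reverse, forward)):
            existing = self.inverse.get(rel)
            if existing is not None and existing != other:
                raise ValueError(f"{rel!r} already has reciprocal {existing!r}")
        self.inverse[forward] = reverse
        self.inverse[reverse] = forward

    def add_one_way(self, rel):
        """Declare a relationship with no reciprocal."""
        self.inverse.setdefault(rel, None)


paired = RelationshipRegistry()
paired.add_pair("has film", "has character")  # Character has film Film / Film has character Character
try:
    paired.add_pair("has film", "has actor")  # Actor has film Film / Film has actor Actor
except ValueError as err:
    print(err)  # 'has film' already has reciprocal 'has character'

# Workaround: set each up as a single, one-way relationship.
one_way = RelationshipRegistry()
for rel in ("has film", "has character", "has actor"):
    one_way.add_one_way(rel)
```

With the one-way setup, both directions (Actor has film Film and Film has actor Actor) have to be stated explicitly rather than inferred from a reciprocal.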
Reducing and reusing relationships requires planning and governance. While it’s not possible to predict every use case for your ontology model, especially as domains change or expand over time with events like mergers or new markets, careful planning of relationship policies will help to mitigate future problems.
Recycle
Finally, consider recycling relationships. Recycling is slightly different from reusing. Relationships defined in system schemas often disappear when the system is sunset or removed. Salvage these relationships before the system disappears and recycle them to connect the content that is relocated from that system to its new location.
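Salvaging often amounts to reading the relationships implied by a legacy schema and restating them as triples before the data moves. The sketch below assumes a hypothetical export where certain columns encode relationships; the column names, rows, and the has team relationship are illustrative only.

```python
# Hypothetical rows exported from a legacy system that is being sunset.
LEGACY_ROWS = [
    {"doc_id": "INV-001", "owner_team": "Sales Team", "source_app": "Invoicing System"},
    {"doc_id": "INV-002", "owner_team": "Sales Team", "source_app": None},
]

# Hypothetical mapping from legacy schema columns to recycled relationships.
COLUMN_TO_PREDICATE = {
    "owner_team": "has team",
    "source_app": "has system",
}


def salvage(rows, key="doc_id"):
    """Convert the relationships implied by legacy columns into triples."""
    triples = []
    for row in rows:
        for column, predicate in COLUMN_TO_PREDICATE.items():
            value = row.get(column)
            if value:  # skip empty legacy values rather than inventing links
                triples.append((row[key], predicate, value))
    return triples


print(salvage(LEGACY_ROWS))
```

The mapping table itself is the recycled asset: it records what the old system’s fields meant, so the same relationships can be applied in the new location.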
Another way to recycle relationships is to apply them in new applications. While controlled vocabularies and their systems should not be a solution searching for a problem, they can often serve applications beyond those they were originally developed or purchased for. Advocating for repurposing existing controlled vocabularies and their semantic relationships often means extending existing semantic structures rather than building from the ground up, much like recycling existing material into a new product.
Which Things Do We Include?
Some of the most common controlled vocabulary and ontology modeling challenges involve deciding how each of your many things and verbs should be represented. For most things, identifying them as subjects and objects will be the best modeling choice. These subjects and objects will have names recorded as a descriptor or preferred label. When a thing is a less common or less desirable name for a thing you already have, it is a synonym, connected by a use/used for relationship or recorded in an alternative label field.
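Preferred and alternative labels can be sketched as a concept record with a lookup that sends any synonym to its preferred label (the “use” direction). This is a minimal illustration; the concepts and synonyms below are hypothetical, and the case-insensitive matching is a design assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str                                      # descriptor / preferred label
    alt_labels: list[str] = field(default_factory=list)  # synonyms the concept is "used for"


# Hypothetical vocabulary entries and synonyms.
VOCAB = [
    Concept("Invoice", ["Bill"]),
    Concept("Meeting Minutes", ["Minutes", "Meeting Notes"]),
]


def build_use_index(concepts):
    """Map every label, preferred or alternative, to the preferred label."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()] = concept.pref_label
    return index


USE = build_use_index(VOCAB)


def resolve(label):
    """Return the preferred label for `label`, or None if it is not in the vocabulary."""
    return USE.get(label.casefold())


print(resolve("bill"))
```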
What if a thing describes another thing within your domain? Then we have modeling choices to make. It could still be represented as concepts connected by relationships, such as Shirt has color Blue. Alternatively, this descriptive information could be an attribute of a concept: the concept “Shirt” would have a metadata field called “Color”, populated from a dropdown list of colors or through a free text field.
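The modeling options can be compared side by side. In this hypothetical sketch, the first option stores color as a concept linked by has color; the second stores it as an attribute, either constrained to a dropdown list (an `Enum` of made-up color values) or accepted as free text.

```python
from dataclasses import dataclass
from enum import Enum

# Option 1: color as a concept connected by a relationship.
TRIPLES = [("Shirt", "has color", "Blue")]


# Option 2a: color as an attribute constrained to a dropdown list.
class Color(Enum):
    """Hypothetical dropdown list of allowed colors."""
    BLUE = "Blue"
    RED = "Red"
    WHITE = "White"


@dataclass
class ShirtWithDropdown:
    name: str
    color: Color

    def __post_init__(self):
        # Accept a dropdown value such as "Blue"; unlisted values raise ValueError.
        self.color = Color(self.color)


# Option 2b: color as an attribute in a free text field.
@dataclass
class ShirtWithFreeText:
    name: str
    color: str  # any text is accepted, so "Blue", "blue" and "navy-ish" can all appear


print(ShirtWithDropdown("Shirt", "Blue").color)
```

The dropdown behaves like a small controlled list and keeps values consistent; free text is flexible but drifts; and the relationship option makes Blue a concept in its own right, which can carry its own labels and relationships.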
I mentioned briefly above that things like proper names may have many use cases in which they are not managed as part of a controlled vocabulary. There are other obvious concepts which are better left unmanaged in a controlled vocabulary: dates, addresses, some product information, rapidly changing data values, etc. There are always exceptions. Managing “09/11” as a concept is not the same as managing that information as a date. Similarly, “1600 Pennsylvania Avenue” is a valid synonym for the concept “The White House”.
Deciding what not to manage can be as important as deciding what to manage. Not everything in your semantic model is part of a controlled vocabulary, but data and content can be connected to those controlled vocabularies by relationships mapped out in an ontological domain model.
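A record can mix both kinds of information: dates, addresses, and changing values stay as plain data, while only the fields that point at controlled concepts become relationships. The sketch below is hypothetical throughout (the vocabulary contents, the has team relationship, and the sample record values are made up for illustration).

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Hypothetical controlled vocabulary concepts.
CONTROLLED = {"Invoice", "Meeting Minutes", "Sales Team", "Finance Team"}


@dataclass
class InvoiceRecord:
    record_id: str
    issued_on: date                 # a date: stored as data, not as a concept
    billing_address: str            # an address: stored as data
    total: Decimal                  # a changing data value: stored as data
    content_type: str = "Invoice"   # points at a controlled concept
    team: str = "Sales Team"        # points at a controlled concept

    def triples(self):
        """Emit relationships only for fields that point at controlled concepts."""
        links = [("has content type", self.content_type), ("has team", self.team)]
        for _, value in links:
            if value not in CONTROLLED:
                raise ValueError(f"{value!r} is not in the controlled vocabulary")
        return [(self.record_id, predicate, value) for predicate, value in links]


record = InvoiceRecord("INV-001", date(2024, 1, 31), "123 Example Street", Decimal("99.50"))
print(record.triples())
```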