The KIM Platform: Semantic Annotation

Here is what we consider semantic annotations:

Information about which entities appear in a text and where they appear. More precisely, references from the text to a semantic repository that contains further knowledge about those entities.

Annotation

'Annotation' has two meanings in contemporary English (according to WordNet, similar in Merriam-Webster):

In linguistics (and particularly in computational linguistics) an annotation is a formal note added to a specific part of a text. There are a number of alternatives for organizing, structuring, and preserving annotations. For instance, all markup languages (HTML, SGML, XML, etc.) can be considered schemata for embedded annotation. Conversely, there are models suggesting that annotations should be kept detached (non-embedded) from the content, i.e.

Semantic Annotations

We use the term semantic annotation for both (i) a kind of meta-data and (ii) the process of generating such meta-data.

While the name is open to debate (it could well be "entity annotation"), its nature is quite unambiguous: the named entities in the text are recognized and identified. The result is formally recorded and associated with the place in the text where the entity is mentioned. The identity of each entity is "verbalized" via a URI, which means that the mention can easily be linked to the entity's description within a semantic repository, as demonstrated below.

Although this is redundant, in keeping with the good NE recognition tradition of the IE community, the types of the entities are also explicitly indicated via the URIs of the respective (most specific) classes in the ontology.
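To make the description above concrete, here is a minimal sketch of a semantic annotation as a detached record: character offsets into the text, a URI for the entity, and a URI for its most specific class. This is an illustration only, not KIM's actual data model or API; the class name, the helper function, and all example.org URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticAnnotation:
    """One entity mention: where it is in the text and what it refers to."""
    start: int        # offset of the first character of the mention
    end: int          # offset just past the last character (exclusive)
    entity_uri: str   # identity of the entity in the semantic repository
    class_uri: str    # most specific ontology class of the entity

    def covered_text(self, text: str) -> str:
        """Return the span of the text that this annotation points at."""
        return text[self.start:self.end]


def annotate(text: str, mention: str, entity_uri: str, class_uri: str) -> SemanticAnnotation:
    """Annotate the first occurrence of `mention` in `text` (hypothetical helper)."""
    start = text.index(mention)  # raises ValueError if the mention is absent
    return SemanticAnnotation(start, start + len(mention), entity_uri, class_uri)


TEXT = "Sofia is the capital of Bulgaria."

# All URIs below are made up for illustration.
annotations = [
    annotate(TEXT, "Sofia",
             "http://example.org/entity/Sofia",
             "http://example.org/ontology#City"),
    annotate(TEXT, "Bulgaria",
             "http://example.org/entity/Bulgaria",
             "http://example.org/ontology#Country"),
]

for a in annotations:
    print(a.start, a.end, a.covered_text(TEXT), a.entity_uri, a.class_uri)
```

Because the record carries only offsets and URIs, the text itself stays unmodified, and everything else known about the entity is looked up in the repository through its URI.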

Named Entities

Named entities (NE) are people, organizations, locations, and other things referred to by name. Apples and bicycles are not considered NE, because they are not typically referred to by name.

Under a wider interpretation, some scalar values (numbers, amounts of money, dates) and addresses can also be considered NE.

A couple of principal comments:

What about words?

Words can also be formally marked up. A typical approach is to annotate the word with a designator of the word sense used in the specific case. For instance, the designator could be "link-v2", meaning that the word "link" is used as a verb (it could equally serve as a noun) in its second sense according to some register.
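As a rough illustration of such a designator, the sketch below splits "link-v2" into a word, a part-of-speech letter, and a sense number. The "word-letter-number" layout is inferred from that single example and is an assumption, as are the names `WordSense` and `parse_designator`; real sense registers use their own formats.

```python
import re
from typing import NamedTuple


class WordSense(NamedTuple):
    word: str          # the annotated word, e.g. "link"
    pos: str           # part-of-speech letter, e.g. "v" for verb
    sense_number: int  # index of the sense in some register


# Assumed layout: <word>-<one lowercase letter><number>, e.g. "link-v2".
_DESIGNATOR = re.compile(r"(?P<word>.+)-(?P<pos>[a-z])(?P<num>[0-9]+)")


def parse_designator(designator: str) -> WordSense:
    """Parse a hypothetical word-sense designator such as 'link-v2'."""
    match = _DESIGNATOR.fullmatch(designator)
    if match is None:
        raise ValueError(f"not a word-sense designator: {designator!r}")
    return WordSense(match["word"], match["pos"], int(match["num"]))


sense = parse_designator("link-v2")
print(sense.word, sense.pos, sense.sense_number)
```

A designator of this kind plays the same role for a word that an entity URI plays for a named entity: it points from a place in the text to an entry in an external register.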

There are a number of tough issues related to word meanings:

We respect the questions mentioned above and the complexity of their answers. For readers who wish to examine these phenomena further, we recommend projects such as WordNet and Cyc, and conferences such as SENSEVAL, FOIS, and OntoLex.

However, this is not our prime objective at present. At this stage our focus is on a much simpler problem: the basic semantics of named entities. We do believe that the general semantic annotation approach proposed and implemented in KIM can serve word-level semantics as well.