For some time now, KIM has been actively used for massive semantic annotation in the SWAN and SEKT projects. In both cases, scalability has been identified as a critical issue, at least for the following use cases:
To process large volumes of data for designing and training statistical information extraction (IE) methods;
To enable public metadata-on-demand services.
To make extensive scaling of the Platform's throughput possible, the development team extended KIM with the KIM Cluster Architecture.
The architecture of this extension was driven directly by the setup and requirements of the SWAN project: within its framework, KIM was deployed on a cluster of servers with much greater computing power than any similar setup in the Semantic Web research area. Its main features include:
Support for a virtually unlimited number of annotators (the machines or components that perform the most computationally expensive processing);
Centralized storage and querying of ontologies (and, more generally, knowledge);
Centralized storage, indexing, and querying of metadata (annotations) and documents;
Support for multiple crawlers (or other data sources);
Dynamic reconfiguration of the cluster (e.g. starting new crawlers or annotators on demand).
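The feature list describes the cluster only at a high level. As an illustration of how these roles fit together, the following is a minimal single-process sketch, assuming a shared work queue between data sources and annotators. Threads stand in for machines, and every name (`CentralStore`, `MiniCluster`, `toy_annotate`) is hypothetical: this is not KIM's API or its actual distributed implementation, and `toy_annotate` is a trivial placeholder for real IE processing.

```python
import queue
import threading


class CentralStore:
    """Hypothetical stand-in for the centralized annotation/document store."""

    def __init__(self):
        self._lock = threading.Lock()
        self.annotations = {}

    def index(self, doc_id, mentions):
        # Several annotators write concurrently, so serialize access.
        with self._lock:
            self.annotations[doc_id] = mentions


def toy_annotate(text):
    # Hypothetical placeholder for the expensive IE step:
    # treat capitalized tokens as entity mentions.
    return [tok for tok in text.split() if tok[:1].isupper()]


class MiniCluster:
    """Hypothetical cluster: crawlers feed a shared queue, annotators drain it."""

    _STOP = object()  # sentinel telling one annotator to exit

    def __init__(self, store):
        self.store = store
        self.queue = queue.Queue()
        self.crawlers = []
        self.annotators = []

    def _crawl(self, documents):
        # A crawler (or any other data source) enqueues (doc_id, text) pairs.
        for doc in documents:
            self.queue.put(doc)

    def _annotate_loop(self):
        while True:
            item = self.queue.get()
            if item is self._STOP:
                break
            doc_id, text = item
            self.store.index(doc_id, toy_annotate(text))

    def add_crawler(self, documents):
        thread = threading.Thread(target=self._crawl, args=(documents,))
        thread.start()
        self.crawlers.append(thread)

    def add_annotator(self):
        # Can be called while the cluster is running (dynamic reconfiguration).
        thread = threading.Thread(target=self._annotate_loop)
        thread.start()
        self.annotators.append(thread)

    def shutdown(self):
        for thread in self.crawlers:
            thread.join()  # after this, every document is in the queue
        for _ in self.annotators:
            self.queue.put(self._STOP)  # FIFO: stop signals come after all documents
        for thread in self.annotators:
            thread.join()


def run_demo():
    store = CentralStore()
    cluster = MiniCluster(store)
    for _ in range(3):
        cluster.add_annotator()
    cluster.add_crawler([("d1", "Sofia is in Bulgaria"), ("d2", "KIM runs on a cluster")])
    cluster.add_crawler([("d3", "the SWAN project")])
    cluster.add_annotator()  # scale out while documents are flowing
    cluster.shutdown()
    return store.annotations


print(sorted(run_demo().items()))
# prints [('d1', ['Sofia', 'Bulgaria']), ('d2', ['KIM']), ('d3', ['SWAN'])]
```

Because the queue decouples producers from consumers, crawlers and annotators can be added independently, and the single store gives one place to query results. In a real deployment the queue and the store would be network services shared by separate machines rather than in-process objects.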