KIM is designed to ensure high accuracy and throughput within a robust architecture. Indicative performance figures are given below:
Scale and throughput on a PC worth about $1000:
Annotation speed: 10 kb/s. The annotation speed depends primarily on the JAPE engine of GATE.
Indexing & Storage speed: 27 kb/s. Based on Lucene.
Documents (with annotations) stored: 300k. A document is retrieved by ID within a few milliseconds.
A test for scalability and speed of the semantic store, namely the Sesame RDF(S) repository, with support for custom axioms (rules):
The result: a repository of 15M explicit statements, describing 1.2M entities, is manageable with an indicative upload speed of 1300 statements/sec.
The experiment was performed on a 2xOpteron 240 (1.4GHz) server with 6GB of RAM, an unbranded machine worth about $3000. 64-bit beta versions of Windows Server 2003 and JDK 1.5.0 for AMD64 were used.
The Java VM running Sesame was allowed to take up to 6GB of heap.
Sesame was configured for in-memory reasoning with N3 persistence.
During the test, Sesame was managing a knowledge base (KB) of entity descriptions. The KB included an average of 12 statements for each entity, linking it with literals and a few auxiliary RDF(S) resources.
The initial KB contained 700k entity descriptions: the KIM World KB, extended with entities and properties extracted from the top news for the period 2002-2004.
Each test transaction added 1000 synthetically generated entity descriptions. Thus the average number of statements added per transaction was 12k.
The average time for committing a transaction was 9 sec.
There were no considerable delays related to the growing size of the repository. At 12k statements per 9-second commit, this indicates a sustained speed of about 1300 statements/sec.
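The throughput figure follows directly from the transaction size and the commit time reported above. A minimal Python sketch of that arithmetic is given here; the function and variable names are illustrative only and are not part of KIM or Sesame.

```python
def statements_per_second(entities_per_tx, statements_per_entity, commit_seconds):
    """Average upload speed implied by a single transaction."""
    statements_per_tx = entities_per_tx * statements_per_entity
    return statements_per_tx / commit_seconds


# Figures reported for the test: 1000 entity descriptions per transaction,
# about 12 statements per entity, and an average commit time of 9 seconds.
speed = statements_per_second(1000, 12, 9.0)
print(f"{speed:.0f} statements/sec")  # prints "1333 statements/sec"
```

The exact quotient is about 1333 statements/sec, which the text rounds to 1300.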
The maximum volume achieved within the 6GB RAM limit was 1.2M entity descriptions, or 15M explicit statements. The estimated total number of statements (including those inferred by forward chaining) is about 35M. Thus, the in-memory representation of a statement takes about 171 bytes.
The size of the final N3 storage was 2.2GB, i.e. an average of about 147 bytes per explicit statement.
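Both per-statement sizes can be reproduced with a short calculation. The sketch below assumes decimal gigabytes (1GB = 10^9 bytes), since that is the assumption under which the reported figures come out. It also assumes that the in-memory figure is divided by all 35M statements while the N3 figure is divided by the 15M explicit ones. The names are illustrative only.

```python
GB = 10**9  # assumption: decimal gigabytes, which reproduces the reported figures


def bytes_per_statement(total_bytes, statements):
    """Average storage cost of one statement."""
    return total_bytes / statements


# In memory: 6GB of heap holding about 35M statements (explicit plus inferred).
in_memory = bytes_per_statement(6 * GB, 35_000_000)

# On disk: a 2.2GB N3 file divided by the 15M explicit statements.
on_disk = bytes_per_statement(2.2 * GB, 15_000_000)

print(round(in_memory), round(on_disk))  # prints "171 147"
```

This is an upper estimate for the in-memory cost, since it attributes the whole 6GB heap limit to statement storage.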
Accuracy: precision of 90% and recall of 86% on a standard ACE-style evaluation of named-entity type recognition.