Ontotext

Release Notes

Release highlights

The most important changes and improvements cover the following areas:

KIM Components

  • Several stand-alone components that can be used outside the platform. For details, see the Developer’s Guide in the System Documentation.

Indexing and search

  • Fully compliant SPARQL support and full-text RDF indexing through the updated BigOWLIM 3.3.
  • Handles millions of documents and hundreds of millions of annotations using the latest BigOWLIM RDF database.
  • Highly configurable indexing. You can turn off the indexing features you do not need, in order to improve scalability.
  • Parallel indexing of multiple documents on multi-core machines.
  • No longer dependent on Oracle. The default CORE implementation is based on RDF and SPARQL only.

Text analysis

  • Improved text analysis capabilities based on GATE 5.2.
  • Automatic language identification for input documents. Multi-language analysis support.
  • Improved cross-document named entity linking.
  • Support for more input formats: Open Document Format and Microsoft Office 2007, including Word and PowerPoint. Support for PDF and the regular (pre-2007) Microsoft Word format is also improved, including extraction of table formatting from Word documents.

User interface (UI)

  • Overall UI facelift. Complex screens now update asynchronously using AJAX.
  • Smart autocomplete in the semantic Patterns search interface.
  • Improved timeline analysis and interface.
  • Uploading, exporting and deleting documents right from the UI. You can export any document result page as a package of annotated documents. Uploading and deleting can be secured separately from the read-only part of the interface.
  • Relevant snippets from documents in the document search results.
  • Bookmarkable semantic query results with live updates. These query results can be converted to personalized, automatically updated RSS 2.0 feeds, compatible with any RSS reader.

API, management and extensibility

  • Streamlined Java API.
  • Complete management support through JMX. Visual management using jVisualVM.
  • Partial management support through the web interface (password protected).
  • Updated companion tools: support for exporting extracted document metadata as RDF, population from the console, and easier population of large corpora.
  • Support for writing extensions (in Java) in the same virtual machine as the server, for the lowest communication overhead.
  • Fewer configuration options, even though there are more features. The existing configuration options are better documented.

Documentation

  • More detailed, easier-to-follow documentation and API Javadoc.
  • Task-driven user guides provided as single documents.

This is not the complete changelog, as the full list of improvements is too large. Please contact us if you are interested in any details.

Notes on upgrading

  • Although the Java API of the KIM Server has been altered significantly, the overall structure remains the same. If your client software uses a method that has been removed from an interface, consult that interface's Javadoc to find a replacement method.
  • The SOAP Web Service API has only a few very minor changes that will not affect any standard usage scenarios. Still, we recommend updating the auto-generated client code, if any.
  • Upgrading from an existing KIM 2.4 installation is possible, but complicated. We will post instructions as soon as the BigOWLIM team posts instructions for upgrading from 2.0 to 3.3.
  • This will be the last KIM version to support Java SE 5. We believe that since Sun Java SE 5 reached End-of-Life in 2009, the number of our customers who require Java 5 has decreased. KIM 3.5 and later will require Java 6 or Java 7.

Experimental features

  • Hybrid search in the UI allows you to search for documents through an autocomplete-assisted interface.
  • JMS-based communication with the server, instead of RMI-based. Disabled by default.
  • Running KIM 3 components in a lightweight service-oriented framework. Disabled by default.
  • Running the RDF database and the KIM Server separately, using JMS communication. Disabled by default.
  • Experimental integration with Ontotext MIMIR. Disabled by default.
  • Pure RDF document indexing and storage via the Semantic Annotation Repository (SAR). Can be enabled via the document repository configuration.
  • Tripleset support, in addition to named graphs in RDF via ORDI.

Known issues

  • Entity lookup is slow when it returns more than 50 000 results.
  • Searching for a person whose name contains punctuation, such as David C. U'Prichard, is not possible in the Structure screen.
  • The autocomplete feature in Patterns does not function properly. The Patterns screen can still be used even with malfunctioning suggestions.
  • The Patterns screen doesn't handle invalid characters such as !!@@##$$^^&&** well.
  • Creating timelines about "most popular entities" is slow.
  • Sometimes, the query from one search screen is not cleared when switching to another screen, causing incorrect results.
  • The KB Explorer doesn't show any information about annotated percentages in documents.
  • PDF and Microsoft Word document annotation through the web interface doesn't work.