GATE Components and Applications


GATE (General Architecture for Text Engineering) is the world's most popular software platform for language engineering, developed by the NLP group of the University of Sheffield.

Japec

Japec is a JAPE-to-Java compiler, packaged as a processing resource for GATE 3.1+. The processing resource (Ontotext Japec Transducer) is designed as a replacement for the standard JAPE Transducer, optimized for performance. The implementation is based on finite state machines, and the compiler applies the standard determinization and minimization algorithms to achieve better performance. The compiler itself is a standalone executable written in Haskell, because it involves complicated algorithms over dynamic data structures. The processing resource wraps the compiler and translates the grammar under the hood.
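To make the determinization step more concrete, here is a minimal sketch of the textbook subset construction, written in Python for brevity. It is not Japec's code (the real compiler is written in Haskell and works on JAPE grammars); the function names, the toy automaton and the symbols "First" and "Last" are assumptions made up for illustration. Minimization is omitted.

```python
from collections import deque


def determinize(nfa, start, accepting):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of next states.
    Each state of the resulting DFA is a frozenset of NFA states.
    """
    alphabet = {symbol for (_, symbol) in nfa}
    dfa_start = frozenset([start])
    dfa = {}
    dfa_accepting = set()
    queue = deque([dfa_start])
    seen = {dfa_start}
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            if not target:
                continue  # no transition: the DFA rejects on this symbol
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_accepting


def accepts(dfa, dfa_start, dfa_accepting, sequence):
    """Run the DFA: exactly one transition lookup per input symbol."""
    state = dfa_start
    for symbol in sequence:
        state = dfa.get((state, symbol))
        if state is None:
            return False
    return state in dfa_accepting


# Hypothetical toy NFA: accepts any sequence that ends in "First", "Last".
toy_nfa = {
    (0, "First"): {0, 1},
    (0, "Last"): {0},
    (1, "Last"): {2},
}
toy_dfa, toy_start, toy_accepting = determinize(toy_nfa, 0, {2})
```

After determinization, matching needs a single table lookup per input symbol instead of tracking a set of active states, which is the usual reason a deterministic automaton runs faster.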

Japec has proven to be 2 to 5 times faster than the standard JAPE Transducer. The Japec code has been donated to the GATE project and community under the LGPL licence. The GATE team have adopted Japec internally, and are now working with Ontotext to port Japec to Java for release 4 of the system.

Contributions to GATE 2.0

Ontotext took part in the development of GATE after release 2.0 with a number of tasks, among which:

Most of the above-mentioned enhancements are part of the GATE distribution and come free. There are also a number of OntoText proprietary tools and applications which are not free for commercial use. However, those are still free for research or education purposes.

Hash Gazetteer

A typical gazetteer is a lookup tool, here implemented in Java, that finds occurrences of strings from predefined lists in texts. The critical issues for such routines are speed and memory usage. A classical implementation approach is to build a finite state machine (FSM) recogniser or a kind of suffix tree.
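As a rough illustration of the classical approach, the sketch below builds a token-level trie, which behaves like a simple FSM recogniser, and reports the longest list entry starting at each position. It is written in Python for brevity and is not GATE's gazetteer code; the function names, the sample entries and the sample sentence are assumptions made up for this example.

```python
END = ""  # sentinel key; never a real token, since str.split() yields no empty strings


def build_trie(entries):
    """Build a nested-dict trie with one level per token of each entry."""
    root = {}
    for entry in entries:
        node = root
        for token in entry.split():
            node = node.setdefault(token, {})
        node[END] = entry  # mark a complete list entry
    return root


def find_matches(trie, tokens):
    """Return (start, end, entry) for the longest match at each start position."""
    matches = []
    for start in range(len(tokens)):
        node = trie
        longest = None
        for end in range(start, len(tokens)):
            node = node.get(tokens[end])
            if node is None:
                break  # no entry continues with this token
            if END in node:
                longest = (start, end + 1, node[END])
        if longest is not None:
            matches.append(longest)
    return matches


# Hypothetical sample data.
sample_trie = build_trie(["New York", "New York City", "Sheffield"])
sample_tokens = "She moved from Sheffield to New York City".split()
sample_matches = find_matches(sample_trie, sample_tokens)
```

Every trie node is a separate dictionary, which hints at why memory usage becomes a concern for large lists.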

OntoText's Hash Gazetteer is based on hash tables instead of an FSM. On average it uses about a quarter of the memory and runs three times faster than an optimized FSM implementation. It is currently available for free download (for non-commercial use) as a CREOLE component to be used within the GATE framework (see above).
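One way a hash-based lookup can work is sketched below in Python. This is an assumption about the general technique, not a description of the Hash Gazetteer's actual data structures: complete phrases are stored in one hash table, and every proper prefix of a phrase in a second one, so that the scan can stop as soon as no entry can continue. All names and sample data are hypothetical.

```python
def build_hash_gazetteer(lists):
    """lists maps a list name (e.g. "location") to its phrases."""
    entries = {}      # complete phrase -> list name
    prefixes = set()  # every proper token prefix of some phrase
    for name, phrases in lists.items():
        for phrase in phrases:
            tokens = phrase.split()
            entries[" ".join(tokens)] = name
            for i in range(1, len(tokens)):
                prefixes.add(" ".join(tokens[:i]))
    return entries, prefixes


def lookup(entries, prefixes, tokens):
    """Return (start, end, phrase, list name) for every match in tokens."""
    found = []
    for start in range(len(tokens)):
        candidate = ""
        for end in range(start, len(tokens)):
            candidate = tokens[end] if not candidate else candidate + " " + tokens[end]
            if candidate in entries:
                found.append((start, end + 1, candidate, entries[candidate]))
            if candidate not in prefixes:
                break  # no longer phrase starts with this candidate
    return found


# Hypothetical sample data.
gaz_entries, gaz_prefixes = build_hash_gazetteer({
    "location": ["New York", "New York City", "Sheffield"],
    "organization": ["University of Sheffield"],
})
gaz_tokens = "the University of Sheffield and New York City".split()
gaz_found = lookup(gaz_entries, gaz_prefixes, gaz_tokens)
```

Unlike a per-state transition structure, this layout needs only two flat tables, at the cost of rebuilding and hashing the candidate string at each step.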

Hidden Markov Model Learner

A stochastic module capable of filtering annotations, disambiguation, and other "soft" tasks based on confidence measures. It is based on OntoText's proprietary HMM implementation tuned for Information Extraction applications - see [Scheffer et al. 2002] for the most general ideas. Contact us at info-at-ontotext.com for more information and applications.
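The proprietary implementation is not described here, so the following Python sketch only illustrates the general idea of confidence-based filtering with an HMM. It uses the standard forward-backward algorithm to compute, for each token, the posterior probability of a hypothetical "entity" state, and keeps only candidate annotations above a threshold. The states, the feature symbols "cap" and "low", all probabilities and the threshold are assumptions invented for this example.

```python
def posteriors(observations, states, start_p, trans_p, emit_p):
    """Forward-backward: per-position posterior probability of each state."""
    n = len(observations)
    # Forward pass: probability of the prefix ending in state s at time t.
    fwd = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    for t in range(1, n):
        fwd.append({
            s: emit_p[s][observations[t]]
            * sum(fwd[t - 1][r] * trans_p[r][s] for r in states)
            for s in states
        })
    # Backward pass: probability of the remaining suffix given state s at time t.
    bwd = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(
                trans_p[s][r] * emit_p[r][observations[t + 1]] * bwd[t + 1][r]
                for r in states
            )
    total = sum(fwd[-1][s] for s in states)
    return [{s: fwd[t][s] * bwd[t][s] / total for s in states} for t in range(n)]


def filter_annotations(candidates, confidences, threshold=0.5):
    """Keep candidate token positions whose "entity" posterior reaches the threshold."""
    return [pos for pos in candidates if confidences[pos]["entity"] >= threshold]


# Hypothetical toy model: all numbers below are made up for illustration.
hmm_states = ("entity", "other")
hmm_start = {"entity": 0.2, "other": 0.8}
hmm_trans = {
    "entity": {"entity": 0.5, "other": 0.5},
    "other": {"entity": 0.2, "other": 0.8},
}
hmm_emit = {
    "entity": {"cap": 0.9, "low": 0.1},
    "other": {"cap": 0.2, "low": 0.8},
}
hmm_obs = ["low", "cap", "cap", "low"]  # one feature symbol per token
hmm_conf = posteriors(hmm_obs, hmm_states, hmm_start, hmm_trans, hmm_emit)
hmm_kept = filter_annotations([0, 1, 2, 3], hmm_conf)
```

With this toy model, the two capitalized tokens receive an "entity" posterior of roughly 0.63 each and are kept, while the lowercase tokens fall well below the threshold and are filtered out.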