GATE (General Architecture for Text Engineering) is the world's most popular software platform for language engineering, developed by the NLP group of the University of Sheffield.
Japec is a JAPE-to-Java compiler, packaged as a processing resource for GATE 3.1+. The processing resource (Ontotext Japec Transducer) is designed as a replacement for the standard JAPE Transducer that offers better performance. The implementation is based on finite state machines, and the compiler applies the standard determinization and minimization algorithms to them in order to achieve that performance. The compiler itself is a standalone executable written in Haskell, because it involves complicated algorithms over dynamic data structures. The processing resource wraps the compiler and translates the grammar under the hood.
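One of the standard algorithms referred to above, determinization by subset construction, can be sketched in a few lines. This is only an illustration under simplifying assumptions and is not Japec's code (the actual compiler is written in Haskell): the class name `SubsetConstruction`, the table layout, the choice of state 0 as start state, and the absence of epsilon moves and accepting-state bookkeeping are all assumptions made for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative subset construction (NFA without epsilon moves to DFA). Hypothetical, not Japec's code. */
final class SubsetConstruction {

    /**
     * Input:  nfa.get(state).get(symbol) is the set of successor states; the start state is 0.
     * Output: dfa.get(state).get(symbol) is the single successor state; the start state is 0.
     */
    static List<Map<Character, Integer>> determinize(List<Map<Character, Set<Integer>>> nfa) {
        Map<Set<Integer>, Integer> ids = new HashMap<>();      // set of NFA states -> DFA state id
        List<Map<Character, Integer>> dfa = new ArrayList<>(); // DFA transition table
        Deque<Set<Integer>> work = new ArrayDeque<>();         // state sets still to be expanded

        Set<Integer> start = new TreeSet<>(Set.of(0));
        ids.put(start, 0);
        dfa.add(new HashMap<>());
        work.add(start);

        while (!work.isEmpty()) {
            Set<Integer> current = work.remove();

            // Group the successors of every NFA state in the current set by input symbol.
            Map<Character, Set<Integer>> moves = new HashMap<>();
            for (int s : current) {
                for (Map.Entry<Character, Set<Integer>> e : nfa.get(s).entrySet()) {
                    moves.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
                }
            }

            // Each distinct successor set becomes one DFA state.
            for (Map.Entry<Character, Set<Integer>> e : moves.entrySet()) {
                Set<Integer> target = e.getValue();
                Integer id = ids.get(target);
                if (id == null) {
                    id = dfa.size();
                    ids.put(target, id);
                    dfa.add(new HashMap<>());
                    work.add(target);
                }
                dfa.get(ids.get(current)).put(e.getKey(), id);
            }
        }
        return dfa;
    }
}
```

After determinization every state has at most one successor per input symbol, so matching needs a single table lookup per step; minimization would then merge equivalent states to shrink the table.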
Japec has proven to be 2 to 5 times faster than the standard JAPE Transducer. The code of Japec has been donated to the GATE project and community under the LGPL license. The GATE team have adopted Japec internally, and are now working with Ontotext to port Japec to Java for release 4 of the system.
Ontotext has been a core participant in GATE development since release 2.0, with a number of tasks, among which:
Most of the above-mentioned enhancements are part of the GATE distribution and come free. There are also a number of Ontotext proprietary tools and applications which are not free for commercial use. However, those are still free for research or education purposes.
A gazetteer is a Java-implemented lookup tool that finds occurrences of strings from predefined lists in texts. The critical issues for such routines are speed and memory usage. A classical implementation approach uses a finite state machine (FSM) recognizer or a kind of suffix-tree. Ontotext has developed several gazetteers for GATE.
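The lookup idea can be illustrated with a minimal sketch. It uses a character trie, which behaves like a simple FSM recognizer, and is neither the GATE API nor one of the Ontotext gazetteers: the names `TrieGazetteer`, `Lookup`, `add` and `find` are hypothetical, and the sketch ignores word boundaries, case folding and memory optimization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical trie-based gazetteer sketch; not the GATE or Ontotext implementation. */
final class TrieGazetteer {

    /** One match: [start, end) character offsets plus the name of the list the entry came from. */
    static final class Lookup {
        final int start;
        final int end;
        final String listName;

        Lookup(int start, int end, String listName) {
            this.start = start;
            this.end = end;
            this.listName = listName;
        }
    }

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        String listName; // non-null when a list entry ends at this node
    }

    private final Node root = new Node();

    /** Registers one entry of a predefined list. */
    void add(String entry, String listName) {
        Node node = root;
        for (int i = 0; i < entry.length(); i++) {
            node = node.next.computeIfAbsent(entry.charAt(i), c -> new Node());
        }
        node.listName = listName;
    }

    /** Reports the longest entry starting at each offset of the text. */
    List<Lookup> find(String text) {
        List<Lookup> result = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            int bestEnd = -1;
            String bestList = null;
            for (int i = start; i < text.length(); i++) {
                node = node.next.get(text.charAt(i));
                if (node == null) {
                    break; // no entry continues with this character
                }
                if (node.listName != null) {
                    bestEnd = i + 1;
                    bestList = node.listName;
                }
            }
            if (bestEnd >= 0) {
                result.add(new Lookup(start, bestEnd, bestList));
            }
        }
        return result;
    }
}
```

For example, after `add("Sheffield", "city")` and `add("New York", "city")`, calling `find("From Sheffield to New York")` yields two lookups, one for each city name. The per-node hash maps keep the sketch short but are memory-hungry, which is exactly the trade-off between speed and memory usage that a production gazetteer has to address.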
A stochastic module capable of filtering annotations, disambiguation, and other "soft" tasks based on confidence measures. It is based on Ontotext's proprietary HMM implementation tuned for Information Extraction applications; see [Scheffer et al 2002] for the most general ideas. Contact us at info-at-ontotext.com for more information and applications.