Following the long tradition of supporting significant events in the development of the Semantic Web, Ontotext is a gold sponsor of this year's RR2010.
The International Conference on Web Reasoning and Rule Systems (RR) is a major
forum for discussion and dissemination of new results concerning Web Reasoning and Rule Systems. It will be held in Bressanone, Italy between
September 22 and 24, 2010.
RR 2010 builds on the success of the first three International Conferences on Web Reasoning and Rule Systems. They were held in Innsbruck,
Austria (2007), Karlsruhe, Germany (2008), and Chantilly, Virginia, USA
(2009), where they received enthusiastic support from the Web Reasoning community.
In 2010, RR will continue to attract the best Web Reasoning and Rules researchers from all over the world.
As is its tradition, Ontotext also supports this year's edition of the AIMSA conference, probably
the longest-running conference for AI research and development, held since 1984. It will take place in Varna, Bulgaria, from the 8th to the 10th of
September.
Ontotext is one of the three top sponsors this year and will welcome each participant with its new brochure, which features stories about the latest product releases.
Atanas Kiryakov (Ontotext's Executive Director) was invited earlier this year as a keynote speaker
and will present a talk on "FactForge – the Fast Track to the Centre of the Data Web or How reason-able views bring the Semantic
Web closer to its tipping point".
A tutorial led by Ontotext's managers will take place on Friday, 10th September, between 14:30 and 17:00. The topic is
"Semantic Technologies and Applications: Web Mining, Text Analysis,
Linked Data Search and Reasoning". This tutorial will demonstrate how vision, concise engineering and 10 years of experience can
deliver successful applications of semantic technology. The tutorial is free for participants who register for AIMSA and S3T events.
Mariana Damova will present the poster "Mapping Data Driven Ontology and
Upper Ontology" at AIMSA and the paper "Query-Based Summarization: A survey" on the afternoon of September 12 at S3T.
AIMSA is co-located with the S3T event, which will take place in Varna on the 11th and 12th of September.
S3T is a relatively new international conference that provides a forum for connecting researchers and international research communities for the worldwide
dissemination and sharing of ideas and results in the areas of Software and Services and Intelligent Content and Semantics.
Marin Dimitrov, CTO of Ontotext, is to lead the Semantic Technologies Track at the 3rd GATE
Training Course taking place between August 30th and September 3rd 2010 in Montreal, Canada. The event comprises 4 parallel tracks (16 modules)
covering an introduction to Text Mining and GATE, intermediate and advanced GATE programming and a track on Semantic Technologies. The Semantic
Technologies Track covers Semantic Web standards, ontology engineering, Linked Data management, RDF databases and semantic search.
GATE is an open source platform for text analysis and language engineering. The core GATE team from
the University of Sheffield has been developing and maturing the platform since 1995; with over 50 thousand engineers and linguists experienced in
GATE, it qualifies as today's most popular text analysis platform. The training course in Montreal builds on the experience and the popularity of
the GATE training courses held in July 2009 and May 2010 in Sheffield, UK.
Registration for the training course is still open. More information about the event is available at http://gate.ac.uk/conferences/montreal-2010/.
As Ontotext announced earlier, the company's own semantic repository BigOWLIM was successfully integrated
into the high performance Semantic Web publishing stack powering the BBC's
2010 World Cup website. BigOWLIM is used there as a triple-store performing OWL reasoning on continuously changing data and handling millions of
page requests per day.
A couple of recent blog posts from the technical team at BBC provide an insight into the business case for deployment of semantic technologies in
their World Cup website, the technical architecture of the publishing stack, the strategic importance of the project's success and the plans for usage
of semantic technology and linked data within the BBC.
In "The World Cup and a call to action
around Linked Data", John O'Donovan, Chief Technical Architect, Journalism and Knowledge, BBC Future Media & Technology, discusses the
business benefits of the implemented semantic solution:
"The World Cup site is our first major statement on how we think this (the Semantic Web) can work for mass market media and a showcase for the
benefits it brings. … Though we have been using RDF and linked data on some other sites (…) we believe this is the first large scale, mass media site
to be using concept extraction, RDF and a Triple store to deliver content."
"…we are not publishing pages, but publishing content as assets which are then organised by the metadata dynamically into pages, but could be
re-organised into any format we want much more easily than we could before. …There is also a change in editorial workflow for creating content and
managing the site. This changes from publishing stories and index pages, to one where you publish content and check the suggested tags are correct.
The index pages are published automatically. This process is what assures us of the highest quality output, but still saves large amounts of time in
managing the site and makes it possible for us to efficiently run so many pages for the World Cup."
"As more content has Linked Data principles applied to it … the vision of a Semantic Web moves closer. Importantly, what we have been able to
show with the World Cup, is that the technology behind this is ready to deliver large scale products."
"This is more than just a technical exercise - we have delivered real benefits back to the business as well as establishing a future model for
more dynamic publishing which we think will allow us to make best use of our content and also use Linked Data to more accurately share this content
and link out to other sites
and content, a key goal for the BBC. We look forward to seeing the use of
Linked Data grow as we move towards a more Semantic Web."
In a following post "BBC World Cup 2010
dynamic semantic publishing", Jem Rayfield, Senior Technical Architect, BBC News and Knowledge, provides more information on the technical
architecture of the high-performance publishing stack and the related data flows and data modelling:
"The World Cup 2010 website is a significant step
change in the way that content is published. … As you navigate through the site it becomes apparent that this is a far deeper and richer use of
content than can be achieved through traditional CMS-driven publishing solutions.
"The site features 700-plus team, group and player pages, which are powered by a high-performance dynamic semantic publishing framework. This
framework facilitates the publication of automated metadata-driven web pages that are light-touch, requiring minimal journalistic management, as they
automatically aggregate and render links to relevant stories."
"The foundation of these dynamic aggregations is a rich ontological domain model. The ontology describes entity existence, groups and
relationships between the things/concepts that describe the World Cup. For example, "Frank Lampard" is part of the "England Squad" and the
"England Squad" competes in "Group C" of the "FIFA World Cup 2010". The ontology also describes journalist-authored assets (stories, blogs, profiles,
images, video and statistics) and enables them to be associated to concepts within the domain model…."
"A RDF triplestore
(ref. BigOWLIM) and SPARQL
approach was chosen over and above traditional relational database technologies due to the requirements for interpretation of metadata with
respect to an ontological domain model. The high level goal is that the domain ontology allows for intelligent mapping of journalist assets to
concepts and queries. The chosen triple store provides reasoning following the forward-chaining model and thus implied inferred statements are
automatically derived from the explicitly applied journalist metadata concepts."
"This inference capability makes both the journalist tagging and the triple store powered SPARQL queries simpler and indeed quicker than a
traditional SQL approach. Dynamic aggregations based on inferred statements increase the quality and breadth of content across the site.
The RDF triple approach also facilitates agile modeling, whereas traditional relational schema modeling is less flexible and also increases query complexity."
"Our triple store is deployed multi-data center in a resilient, clustered, performant and horizontally scalable fashion, allowing future
expansion for additional ontologies and indeed linked open data (LOD) sets. … The triple store is abstracted via a JAVA/Spring/CXF JSR 311 compliant
REST service. ... The API is designed as a generic façade onto the triple store allowing RDF data to be re-purposed and re-used pan BBC. This service
orchestrates SPARQL queries and ensures that results are dynamically cached with a low 'time-to-live' (TTL) (1 minute) expiry cross data center using
memcached."
"This dynamic semantic publishing architecture has been serving millions of page requests a day throughout the World Cup with continually
changing OWL reasoned semantic RDF data. The platform currently serves an average of a million SPARQL queries a day with a peak RDF transaction rate
of 100s of player statistics per minute. …."
"The development of this new high-performance dynamic semantic publishing stack is a great innovation for the BBC as we are the first to use
this technology on such a high-profile site. It also puts us at the cutting edge of development for the next phase of the Internet, Web 3.0."
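The caching behaviour described in the quoted post (SPARQL results cached with a one-minute time-to-live) can be illustrated with a minimal in-process sketch. This is not the BBC's implementation, which according to the post used memcached behind a Java REST service; the class `TTLCache`, the `compute` callback and the injectable clock are hypothetical names introduced here only for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a memcached-style cache with expiry."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, object]] = {}

    def get_or_compute(self, key: str, compute: Callable[[str], object]) -> object:
        """Return the cached value for key, recomputing it once it is older than the TTL."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # fresh entry: the triple store is not queried
        value = compute(key)       # missing or expired: run the query again
        self._store[key] = (now, value)
        return value
```

With a short TTL, a burst of identical page requests within the same minute results in a single query against the store, while changes to the data become visible after at most one TTL period.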
The blog posts by the BBC set off a wave of enthusiasm in the community. Some of the reactions are cited below.
BBC World Cup Website Showcases Semantic
Technologies, post by Richard MacManus, the founder of ReadWrite Web. "…if there was a World Cup for the Semantic Web, then the BBC may have
lifted the trophy for its country"
First BBC microsite powered by a
triple-store, post by Yves Raimond at DBTune. "All this is very exciting, the World Cup Website proved that triple store technologies can be
used to drive a production website with significant traffic. I am expecting lots more parts of the BBC web infrastructure to evolve in the same way :-)"
These posts were accompanied by an avalanche of tweets and blog comments, dominated by the word "impressive". A few of the comments on the ReadWriteWeb post
were particularly positive: "excellent technology on both software and hardware", "It Begins ...", "It's really fantastic to see organizations like
the BBC building really exciting semantic applications which demonstrate quantifiable business value. Great stuff!..."
If you are lost in your masses of documents, unlinked data and scattered knowledge, KIM 3 might be the remedy for you!
KIM 3 gives you the ability to:
KIM 3 is faster than ever, modular, and based on the latest semantic data standards. It allows you to utilize and harmonize your conceptual models
and taxonomies, link them automatically to your documents or unstructured data, and thus more easily access, navigate, and reuse your enterprise
content.
Moreover, in order to make it more accessible, we can build a tailored
KIM-based system for you and train you how to go on from there.
The most important changes and improvements in KIM 3 cover the following areas:
- Stand-alone KIM components
  - Several stand-alone components to be used outside the platform
- Indexing and search
  - Fully compliant SPARQL support and full-text RDF indexing through the updated BigOWLIM 3.3.
  - Handles millions of documents and hundreds of millions of annotations using the latest BigOWLIM RDF database.
  - Highly configurable indexing. You can turn off the indexing features you do not need, in order to improve scalability.
  - Parallel indexing of multiple documents on multi-core machines.
  - No longer dependent on Oracle. The default CORE implementation is based on RDF and SPARQL only.
- Text analysis
  - Improved text analysis capabilities based on GATE 5.2.
  - Automatic input document language identification. Multi-language analysis support.
  - Improved cross-document named entity linking.
  - Support for more input formats: Open Document Format and Microsoft Office 2007, including Word and PowerPoint. The support for PDF and the regular Microsoft Word is also improved, including table formatting extraction from Word documents.
- User interface (UI)
  - Overall UI facelift. Complex screens now update asynchronously, with AJAX.
  - Smart autocomplete in the semantic Patterns search interface.
  - Improved timeline analysis and interface.
  - Uploading, exporting and deleting documents right from the UI. You can export any document result page as a package of annotated documents. Uploading and deleting can be secured separately from the read-only part of the interface.
  - Relevant snippets from documents in the document search results.
  - Bookmarkable semantic query results with live updates. These query results can be converted to personalized, automatically updated RSS 2.0 feeds, compatible with any RSS reader.
- API, management and extensibility
  - Streamlined Java API
  - Complete management support through JMX. Visual management using jVisualVM.
  - Partial management support through the web interface (password protected).
  - Updated companion tools: support for exporting extracted document metadata as RDF, population from the console, easier large corpus population.
  - Support for writing extensions (in Java) in the same virtual machine as the server, for the lowest communication overhead.
  - Fewer configuration options even though there are more features! The existing configuration options are better documented.
The National Archives has invested in an 'intelligent discovery tool'
to improve searches of archived UK Government websites. The contract for development of the Government Web Archive Semantic Knowledge Base was granted to a
consortium of semantic technology professionals led by Ontotext.
The consortium includes experts from the GATE team at the University of Sheffield, creators of the most comprehensive open-source text mining
ecosystem in the world, as well as System Simulation, a major UK integrator of images, digital assets, collection and content management systems.
The contract was awarded after a call for competition and a public tender process.
The project started in mid-June and is expected to finish before the end of March 2011. It aims to bring new methods of search, navigation and
information modeling to The National Archives and in doing so make the web archive a more valuable and popular resource.
The Government Web Archive Semantic Knowledge Base will also bring together publicly available linked data and open-source text mining technology
in a system that is easy to understand and can be managed and extended in a predictable and cost-efficient manner.
BigOWLIM 3.3 Handles Millions of Queries per Day, Supports OWL 2 RL Reasoning and
Provides Unmatched Linked Data Integration, Management and Retrieval Capabilities
Ontotext announces version 3.3 of BigOWLIM, the World's
most scalable semantic repository.
This release brings into the main product a number of advanced features
that previously existed only as bespoke developments. Considerable effort was also dedicated
to making BigOWLIM more robust and easier to use.
The development of this version was influenced by the requirements of
FactForge and LinkedLifeData
(two of the most advanced linked data portals) and the
BBC's 2010 World Cup website -
probably the most challenging real-world use case of semantic repositories implemented so far.
Within the LarKC project, BigOWLIM is used as the data layer
in a platform for Web-scale reasoning that features a range of diverse reasoning plug-ins.
The key characteristics and features of BigOWLIM include:
- Based on independent opinions, it is the
most efficient semantic repository in the World, in terms of speed of loading, inferencing, and query answering;
- Pure Java implementation and fully compatible with
Sesame 2;
- Clustering support brings resilience, failover and horizontally scalable parallel query processing;
- Customisable reasoning: in addition to RDFS, OWL-Horst, and most of the language features of OWL-Lite,
BigOWLIM 3.3 is the only semantic repository that provides comprehensive OWL 2 RL support today;
- Optimized owl:sameAs handling, which delivers dramatic improvements in performance and usability when huge volumes of data from multiple sources are integrated;
- Full-text search, based on either Lucene or proprietary techniques;
- High performance retraction of statements and their inferences;
- Powerful and expressive consistency/integrity constraint checking mechanisms;
- RDF rank, similar to Google's PageRank, can be calculated for the nodes in an
RDF graph and used for ordering query results by relevance and many other purposes;
- RDF Priming, based upon activation spreading, allows efficient data selection and context-aware query answering for handling huge datasets;
- Notification mechanism, to allow clients to react to statements in the update stream.
It was developed within the SOA4All project,
where OWLIM is used in the Semantic Spaces component and in the
iServe platform for publishing semantic web
services as linked data.
These features are already proven at FactForge
(previously known as LDSR), where BigOWLIM is used to load 8 of the central
LOD
datasets in a repository which contains 1.2 billion explicit and 0.8 billion implicit statements.
BigOWLIM's owl:sameAs optimization allows FactForge to deal with 'only' 2 billion statements
in its indices, while the number of distinct statements retrievable from the repository is 10 billion.
This feature allows FactForge to deliver non-inflated query results, while the semantics of owl:sameAs
is still fully accounted for during query evaluation.
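The idea behind this kind of optimization can be sketched as follows. This is an illustrative toy and not BigOWLIM's actual implementation: it assumes that resources linked by owl:sameAs are grouped into equivalence classes (here with a union-find structure), that each statement is stored once using one representative per class, and that equivalents are enumerated only on request. The class `SameAsIndex` and the example resource names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class SameAsIndex:
    """Stores each statement once, using one representative per owl:sameAs class."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}
        self._triples: Set[Triple] = set()

    def _find(self, node: str) -> str:
        """Return the representative of node's equivalence class."""
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]  # path halving
            node = self._parent[node]
        return node

    def add_same_as(self, a: str, b: str) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra
        # Re-canonicalise statements that were stored before the merge.
        self._triples = {(self._find(s), self._find(p), self._find(o))
                         for s, p, o in self._triples}

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((self._find(s), self._find(p), self._find(o)))

    def stored_count(self) -> int:
        return len(self._triples)

    def _members(self, rep: str) -> List[str]:
        return sorted(n for n in self._parent if self._find(n) == rep)

    def expanded(self) -> Set[Triple]:
        """All statements retrievable when every equivalent resource is enumerated."""
        out: Set[Triple] = set()
        for s, p, o in self._triples:
            out.update(product(self._members(s), self._members(p), self._members(o)))
        return out


index = SameAsIndex()
index.add_same_as("dbpedia:Sofia", "geonames:Sofia")
index.add_same_as("geonames:Sofia", "freebase:Sofia")
index.add("dbpedia:Sofia", "ex:capitalOf", "dbpedia:Bulgaria")
index.add("geonames:Sofia", "ex:capitalOf", "dbpedia:Bulgaria")  # same fact, other URI
index.add_same_as("dbpedia:Bulgaria", "geonames:Bulgaria")
```

In this toy example one statement is held in the index, while six distinct statements (three URIs for the city times two for the country) are retrievable by expansion. A query answered over the representatives returns one row per fact rather than one per URI combination, which is the sense in which results stay non-inflated.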
BigOWLIM version 3.3 is also at the heart of the LinkedLifeData
RDF warehouse (release 0.5), which combines 25 of the most popular biomedical databases in a repository
that contains more than 4 billion statements.
The latest version of the BigOWLIM repository has been successfully integrated into
the high performance Semantic Web publishing stack powering the
BBC's 2010 World Cup football website,
performing OWL reasoning with continuously
changing data and handling millions of page requests per day.
Included with version 3.3 is a thoroughly reworked documentation set that includes:
- User guide - updated with details of new features and all configuration parameters;
- Primer - updated with examples and recent trends in semantic technologies;
- Quick start guide - to help those new to BigOWLIM to get set up and running smoothly.
Some popular namespace prefixes come predefined in BigOWLIM 3.3 in order
to simplify query writing. These include the prefixes for the RDF, RDFS, and OWL schemata;
all the prefixes for linked data namespaces used in FactForge; and prefixes for projects
like Good Relations.
A detailed list of the predefined prefixes can be found in the prefixes.txt file in the distribution.
Furthermore, the OWLIM website has been
revised to include more relevant details, latest benchmark results, etc.
For further information, please contact OWLIM-info-at-ontotext.com.
Ontotext announces the re-branding of its Linked Data Semantic Repository (LDSR) linked data service as FactForge.
FactForge is designed as an index that provides a fast track to the center of
the web of linked data. It represents a reason-able view
including 8 of the LOD
datasets, namely: DBPedia, Geonames, Wordnet, Musicbrainz, Freebase, UMBEL, Lingvoj and the CIA World Factbook.
Those datasets are preprocessed and loaded into BigOWLIM to form an integrated dataset of about 1.2 billion explicit statements.
Forward-chaining is performed to materialise 0.8 billion implicit statements, in accordance with the semantics of
the ontologies used in the datasets.
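Forward-chaining materialisation can be illustrated with a minimal sketch. It assumes just two RDFS-style rules (transitivity of rdfs:subClassOf and type inheritance) and a naive fixpoint loop; the actual rule sets and algorithms used in BigOWLIM are not described here, and the function name `materialise` and the `ex:` resources are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialise(explicit: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS-style rules until no new statement appears (fixpoint)."""
    closure = set(explicit)
    while True:
        new: Set[Triple] = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:    # rdfs:subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:      # instances inherit their class's superclasses
                    new.add((s, TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


explicit = {
    ("ex:Sofia", TYPE, "ex:Capital"),
    ("ex:Capital", SUBCLASS, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Place"),
}
implicit = materialise(explicit) - explicit
```

Here three explicit statements yield three implicit ones. Because the inferred statements are stored alongside the explicit ones at load time, a query such as "all resources of type ex:Place" needs no reasoning at query time.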
For end-users, FactForge is a free public service delivering a fast and reliable single
point of access to the central LOD datasets. The data can be accessed in several ways:
- Go-to-resource: incremental URI auto-suggest;
- Keyword search: RDF Search, returning a ranked list of RDF snippets;
- Exploration: traversing the data, one resource at a time, e.g. Sofia;
- Structured queries: evaluation of queries in SPARQL (supported by plenty of sample queries);
- Remote server access: SPARQL end-point.
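Remote access through a SPARQL end-point generally follows the W3C SPARQL Protocol, in which the query is sent as a URL-encoded `query` parameter. The sketch below only builds such a request with the Python standard library and does not send it. The endpoint address is a placeholder, since the source does not state FactForge's actual end-point URL, and the function name `build_sparql_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address (assumption): substitute the real end-point URL of the service.
ENDPOINT = "http://example.org/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request for a SELECT query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SPARQL query results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = """
SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Sofia> ?p ?o } LIMIT 10
"""
request = build_sparql_request(QUERY)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and parsing the response body as JSON.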
Shortly after the release in May, Ontotext announced that version 0.5 of LinkedLifeData (LLD) is
available online.
LLD is a semantic data integration platform for the biomedical domain that interlinks 25
popular biomedical data sources. The current release includes more than 4 billion statements,
which represent correlations between 0.5 billion biomedical entities.
The main focus of this release was the improvement of the user interface of the system and
optimization of the auto-complete search functionality.
Changes in the 0.5 release:
- Enhancement of the Concept view screen: widget-like sections for different types of information;
- paging within the widgets for navigation in large lists
- tool-tips with semantic types and definition for the concepts
- ranked document list with all PubMed articles related to the concept of interest
- Exhibit view for all related documents with multiple filtering and grouping options
- new view for documents metadata - Document view
- added "View as triples" which displays the data in Concept and Document view in RDF triples
- enhancement of the auto-complete functionality
- improvement of concept ranking
- more concepts covered
For the fifth year running, Ontotext will participate in SemTech -
the world's largest, most authoritative and highest ranking Semantic Technology conference. The conference will take place in San Francisco,
from the 21st to the 25th of June.
A platinum sponsor of this event, Ontotext is pleased to welcome guests and clients at booth #101, where they can sample the technology,
make an enquiry and receive a quote, or just touch base.
"We are excited to have the opportunity to offer the latest developments in our technologies in three separate presentations on Wednesday,
23rd of June - the first day of the main conference program. What's more, on Friday afternoon, Ontotext will present a 360° view on its technology
and solutions in the course of a half-day seminar" - says Atanas Kiryakov, Executive Director of Ontotext.
The first presentation - "Using GPUs to
Browse through Billions of Linked Data Facts" will be delivered by A. Kiryakov and falls within the Linked Data Track of the conference.
The focus is on the need to devise a meaningful way to filter and represent a "good" portion of the information returned as results
in FactForge - a gateway to the LOD cloud, which allows querying and searching eight of its
central datasets. A mechanism, called "RDF priming" is introduced to tackle this task. Furthermore, the benefits of using the CUDA architecture of
NVIDIA graphic cards are demonstrated as a cost-effective alternative to dual-core servers.
"How to integrate 10 databases and interlink
them with 10 million documents in 10 weeks" - a presentation by Matthew Petrillo and Vassil Momtchev, group leader in Ontotext specializing in the
implementation of semantic solutions in the life sciences and healthcare domain. Matthew and Vassil will take us through the company's experience in
implementing Semantic Enterprise solutions and reusing common software components and best practices. The talk will cover the full path of adopting
semantic technology in the enterprise and demonstrate a fully operational infrastructure.
For the Industry Track of the conference, Atanas Kiryakov and Barry Bishop will turn the attention to publishing with their
"OWLIM Cluster Allows Resilient Handling of Millions of Queries per Day" presentation. At the heart of the talk is a real-world scenario where
BigOWLIM is used to handle millions of requests per day, serving as a back-end to seamlessly bring
the power of the Semantic Web to the website of one of the largest and most popular television production companies in the World.
On Friday afternoon the Ontotext team will conduct the
"360° Semantic Technologies: Web Mining, Text Analysis, Linked Data Search and Reasoning" workshop, which will conclude the Business Applications
track on the last day of the conference. This workshop will demonstrate how vision, concise engineering and 10 years of experience can deliver
successful applications of semantic technology. Three Ontotext partners will also present their services and technology as part of the workshop:
BPEng (Italy), Profium (Finland) and TopQuadrant (USA).
The W3C Workshop - "RDF Next Steps" starts on June 26 and is hosted by the National Center for
Biomedical Ontology (NCBO). In the space of a couple of days, the attendees and presenters will try to gather and analyse feedback from the
Web community on whether and, if so, in which direction RDF should evolve. One of the main issues the Workshop should help decide is whether it
is timely for W3C to start a new RDF Working Group to define and standardize a next version of RDF.
Ontotext's contribution to the Workshop is a position paper by Atanas Kiryakov and Vassil Momtchev called
"Triplesets: Tagging and Grouping in RDF Datasets". The paper argues that the need to augment the standard triple-based RDF
data model towards quadruples is widely recognized. Furthermore, Ontotext proposes semantics for the addition and removal of
quadruples from integrated datasets. The paper also argues for extending the RDF model further and proposes a specific mechanism called triplesets.
The proposed mechanism is already supported in the OWLIM semantic repository and is used in the data layer of LarKC - probably the most ambitious
large scale reasoning project.
The SemData@ESWC Workshop, the second event of the SemData series, took place in Crete,
Greece, on the 30th of May. The workshop was co-located with the Extended Semantic Web Conference and was kicked off with an introductory talk by Atanas
Kiryakov of Ontotext AD.
SemData@ESWC builds on the success of the first event in the series, hosted by Ontotext in Sofia in March this year. Only a few months into the
start of the SemData initiative in Jan 2010, it has already succeeded in bringing together many of the leading academic experts in the field, enlisting
more than half of the leading semantic repository vendors (ORACLE, IBM, OpenLink, Franz, SYSTAP, Ontotext) as well as the developers of YARS, MonetDB,
WebPIE and HEXASTORE.
The agenda and the presentations from the SemData@ESWC Workshop can be found at: http://semdata.org/events/2010/eswc
On May 27th, Ontotext released an enhanced version of LinkedLifeData (LLD), a unique
knowledge discovery platform, based on scalable semantic data integration in life sciences and health care. It supports heterogeneous data sources,
simple information updates, and easy incremental extension and integration of datasets.
The current release combines data from 25 different data sources covering information about genes, proteins, molecular interactions,
drugs, small chemical compounds, medications' clinical trials, side effects, scientific publications and many more. The LinkedLifeData repository
stores 4.2 billion statements, which interconnect 583 million different entities. The new release represents an enlarged knowledge repository with
information from 3 new data sets:
- LarKC Carcinogenesis research dataset
- Chemical Entities of Biological Interest (ChEBI)
- Literature-derived Human Gene-Disease Network (LHGDN)
In addition to the provided SPARQL end-point, RelFinder is integrated as a new data mining
tool. RelFinder is an open source application for interactive relationship discovery in RDF datasets. Researchers can access it at
http://linkedlifedata.com/relfinder to mine the huge repository of LLD for distant
relations between different entities, which may not be obvious when using conventional mining techniques.
LLD is developed as part of the EU-funded LarKC project in co-operation with AstraZeneca - a global, innovation-driven biopharmaceutical business
with a primary focus on the discovery, development and commercialization of prescription medicines. The primary focus of LinkedLifeData is to provide
innovative approaches to analyze the available scientific information and to combine it with the companies' internal knowledge, thus facilitating and
speeding up the drug development process at lower costs. LinkedLifeData (LLD) is freely available for the scientific community at:
http://linkedlifedata.com.
On 16/04/2010 Richard MacManus published on ReadWriteWeb a post, "The Modigliani Test: The Semantic Web's Tipping Point", which essentially argues that
linked data is not sufficiently linked. He wrote that "The tipping point for the long-awaited Semantic Web may be when you can query a set
of data about someone not too famous, and get a long list of structured results in return". Then he defined the "Modigliani Test" for
the Semantic Web: he wants to be able to ask a search engine "tell me the locations of all the original paintings of Modigliani" and
get back a large list of results.
In a LarKC blog post (http://blog.larkc.eu/) of 23/04/2010, Atanas Kiryakov showed how the LDSR reason-able view of the linked data web passes the test.
On the 11th and 12th of March Ontotext hosts the SemData@Sofia round table, the first in a series of events called SemData, dedicated to semantic data management.
The goal of this event series is to investigate various aspects of semantic databases and data management in the large. We seek expert discussions and
trans-disciplinary collaborations on issues such as semantic repositories, their virtualization and distribution, and interoperability with relational
solutions, XML and others. Moreover, there is a need for advanced mechanisms to "move the logic closer to the data".
The round table gathers representatives of most of the outstanding semantic repository and database vendors (Openlink, IBM, Systap, Ontotext,
CWI, MySQL) and many of the leading researchers in the field. The event is organized as a joint activity between three EC research projects: LarKC,
SOA4All, and PlanetData.
The main goal of the MOLTO project is to develop a set of tools for translating texts between the 23 official EU languages in real time and with high quality.
MOLTO takes a hybrid approach by combining grammar-based and statistical machine translation.
Ontotext's involvement in this project is to provide:
- knowledge representation and reasoning infrastructure
- conceptual modeling and ontology alignment for the needs of the use cases
- two-way interoperability between ontology standards (OWL) and GF grammars, to enable multilingual natural-language-based interaction with machine-readable knowledge
- Using multiple languages to acquire new knowledge base entries
- Executing queries against the semantic knowledge databases
The project aims to deliver pluggable open-source libraries enabling standard translation tools and workflows. This technology will be a basic building block in translation
and retrieval services for both information producers and consumers.
Ontotext is sponsoring the Semantic Technology Conference for the fifth consecutive year. The SemTech conference
will be held in San Francisco, USA, from the 21st to the 25th of June, 2010.
Ontotext will present its new product developments and particular business applications in several presentations during the main program and in one workshop held after the conference.
During the exhibit hours Ontotext will welcome all present and future partners, customers and friends at its large booth (#201).
Sofia, Bulgaria. Ontotext received the prestigious Pitagoras award 2010 for ‘a company that has most successfully mastered
new scientific research or provided specialized services for the benefit of society for the past 3 years’. These awards are presented by
the Ministry of Education to honor important achievements in 11 categories where scientists and project teams compete individually or as groups.
This initiative started in 2003 and has become the most esteemed award for special contributions to the development of scientific research in Bulgaria.
The assessment of the contestants in the group categories covers a range of indicators such as participation in national and European projects,
awards, and attracted resources. This is an important recognition for Ontotext. It proves that a small software company in Bulgaria can participate
in research projects at the highest worldwide level, apply their results in software products, and successfully compete with the global IT leaders.
Sofia, Bulgaria. Ontotext released the next version of the Linked Life Data
(LLD) service. LLD is focussed on the life science and biotechnology domain and facilitates the semantic integration of data silos.
The service is based on the OWLIM engine and is fully compliant with linked data standards. It performs complex queries that span
multiple data sources.
The major service improvements are:
- Semantic annotations that link structured knowledge to database text fields like article abstracts;
- Increased number of data sources resulting in more than 4 billion statements;
- New front-end functionality that significantly improves the search and navigation user experience.
The Web 3.0 Winter Conference will take place on 26th and 27th of January in Santa Clara,
California. Continuing its support for the Web 3.0 series of events, Ontotext sponsors
the winter conference, which gathers speakers from many of the leading vendors in the field
of search and data management technology.
Sofia, Bulgaria. Ontotext released version 2.0 of its LDSR reason-able view to the web of data.
It offers improved search, querying, and exploration facilities for eight of the central datasets in the
Linking Open Data (LOD) cloud.
LDSR uses the OWLIM engine to interpret the semantics of the data,
which allows for flexible matching of the query constraints regardless of the assertion syntax.
Users have the freedom to use any of the URIs which denote one and the same entity or relationship,
while the engine guarantees to always deliver extensive and semantically correct results.
Further, one can ask queries in different directions and at different levels of generality, and even
uncover relationships which require interpretation of data from multiple datasets.
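The idea of using any of several equivalent URIs can be illustrated with a minimal sketch. It is not OWLIM's actual implementation: it assumes equivalence is asserted with owl:sameAs, it only handles URIs in the subject position, and all dataset prefixes, URIs, and function names below are invented for illustration.

```python
# Toy triples; every URI and prefix here is hypothetical.
TRIPLES = [
    ("ds1:Sofia", "owl:sameAs", "ds2:sofia"),
    ("ds2:sofia", "owl:sameAs", "ds3:city42"),
    ("ds1:Sofia", "ex:country", "ds1:Bulgaria"),
    ("ds3:city42", "ex:label", "Sofia"),
]


def build_alias_lookup(triples):
    """Group URIs linked by owl:sameAs into equivalence classes (union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path compression
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)
    return find


def describe(uri, triples):
    """Return all (predicate, object) pairs stated about any alias of `uri`."""
    find = build_alias_lookup(triples)
    root = find(uri)
    return sorted(
        (p, o)
        for s, p, o in triples
        if p != "owl:sameAs" and find(s) == root
    )


if __name__ == "__main__":
    # Each alias retrieves facts that were asserted under the other URIs.
    for alias in ("ds1:Sofia", "ds2:sofia", "ds3:city42"):
        print(alias, describe(alias, TRIPLES))
```

In this sketch a query for any of the three aliases returns the same combined set of facts, even though each fact was asserted against only one of them.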
The major novelties in the new version can be summarized as follows:
- Two additional datasets are added to the index:
Freebase and MusicBrainz (RDF from Zitgist);
- The latest version 3.4 of DBPedia is indexed;
- The new version of LDSR includes about 1 billion explicit statements,
which, through inference, expand to 4 billion distinct retrievable statements.
Sheffield, UK; Vienna, Austria; Sofia, Bulgaria.
There is nothing one can do with semantic technology that cannot be done without it!
Semantic technologies enable the development of more advanced information management
systems with less effort. Still, their widespread adoption faces the barriers typical
of any new technology: immature tools, unspecified best practices, lack of industry
support. To address these issues Ontotext has formed a team which combines the most
mature text-mining platform with cutting-edge semantic database and search tools and
robust solution-providing organizations.
Ontotext today announces a strategic partnership with the GATE
team at the University
of Sheffield and Matrixware Information Services
that represents a leap forward in open source
web mining and semantic systems. Many organisations are turning to open source solutions for
their server and infrastructure requirements, but until now it has been difficult to deploy
open source semantic search or knowledge management, for example. Ontotext has a decade of
experience with GATE and has deployed GATE-based systems for clients ranging from top-10
pharmaceuticals to national financial intelligence units. Now we are teaming with Matrixware,
Austria's premier information services company, to create a team that can supply enterprise
level support, custom development and training for our customers.
Ontotext provides core semantic technology, distinctive for its performance, scale, and
standards compliance. Over the last 9 years, Ontotext has established itself as a major contributor
to a number of high-impact open source projects like GATE and
Sesame.
The General Architecture for Text Engineering (GATE) is an open source platform,
developed over the last 15 years by the Natural Language Processing (NLP) group of the
University of Sheffield. Today GATE is the most popular text-mining platform, used by
thousands around the world in areas ranging from identity theft prevention to
medical research. The GATE development team has built a reputation as a global leader in
NLP infrastructure, and has distilled its experience into a defined and repeatable process for
creating robust and maintainable text processing workflows.
Matrixware Information Services offers superior solutions and services for professional
information retrieval to the global market. These solutions and services help organizations to
face the information economy and thereby provide them with a distinct business advantage.
Matrixware builds strong, trusting relationships through cutting-edge, open science, open
source and open business concepts.
Berlin, Germany. At the end of November the authors of the BSBM benchmark
published the results from an evaluation of three of today's outstanding RDF
databases: "BSBM Results for Virtuoso, Jena TDB, BigOWLIM (November 2009)".
BigOWLIM proves to be several times faster than the other engines in loading the BSBM datasets.
On the 200M version of BSBM, BigOWLIM is some 20% slower than Virtuoso on
query evaluation with the complete query set and about 20% faster than it
with the reduced query set. The authors of BSBM recommend using the reduced query set
for datasets above 25M.
While there are critiques about the suitability of BSBM for benchmarking RDF
databases, it is one of the most popular benchmarks in the field. It measures
the performance of the engines with respect to a variety of SPARQL queries
executed against a relatively dense (non-sparse) dataset. In contrast to
LUBM,
answering the queries of BSBM requires no reasoning.
An archive of earlier Ontotext announcements is available here.