News

Following its long tradition of supporting significant events in the development of the Semantic Web, Ontotext is a gold sponsor of this year's RR 2010.

The International Conference on Web Reasoning and Rule Systems (RR) is a major forum for discussion and dissemination of new results concerning Web Reasoning and Rule Systems. It will be held in Bressanone, Italy, from September 22 to 24, 2010.

RR 2010 builds on the success of the first three International Conferences on Web Reasoning and Rule Systems, held in Innsbruck, Austria (2007), Karlsruhe, Germany (2008), and Chantilly, Virginia, USA (2009), which received enthusiastic support from the Web Reasoning community.

In 2010, RR will continue to attract the best Web Reasoning and Rules researchers from all over the world.

As in previous years, Ontotext supports this year's edition of the AIMSA conference, probably the oldest running conference on AI research and development, held since 1984. The conference will take place in Varna, Bulgaria, from the 8th to the 10th of September.

Ontotext is one of the three top sponsors this year and will welcome each participant with its new brochure, which features stories about the latest product releases.

Atanas Kiryakov, Ontotext's Executive Director, was invited earlier this year as a keynote speaker and will present a talk entitled "FactForge – the Fast Track to the Centre of the Data Web or How reason-able views bring the Semantic Web closer to its tipping point".

A tutorial led by Ontotext's managers will take place on Friday, 10th September, between 14:30 and 17:00. The topic is "Semantic Technologies and Applications: Web Mining, Text Analysis, Linked Data Search and Reasoning". This tutorial will demonstrate how vision, concise engineering and 10 years of experience can deliver successful applications of semantic technology. The tutorial is free for participants who register for the AIMSA and S3T events.

Mariana Damova will present the poster "Mapping Data Driven Ontology and Upper Ontology" at AIMSA, and the paper "Query-Based Summarization: A survey" at S3T on the afternoon of September 12.

AIMSA is co-located with the S3T event, which will take place in Varna on 11 and 12 September. S3T is a relatively new international conference that provides a forum for connecting researchers and international research communities and for sharing ideas and results worldwide in the areas of Software and Services, and Intelligent Content and Semantics.

Marin Dimitrov, CTO of Ontotext, will lead the Semantic Technologies Track at the 3rd GATE Training Course, taking place from August 30th to September 3rd 2010 in Montreal, Canada. The event comprises 4 parallel tracks (16 modules) covering an introduction to Text Mining and GATE, intermediate and advanced GATE programming, and Semantic Technologies. The Semantic Technologies Track covers Semantic Web standards, ontology engineering, Linked Data management, RDF databases and semantic search.

GATE is an open source platform for text analysis and language engineering. The core GATE team at the University of Sheffield has been developing and maturing the platform since 1995. With over 50,000 engineers and linguists experienced in GATE, it qualifies as today's most popular text analysis platform. The training course in Montreal builds on the experience and popularity of the GATE training courses held in Sheffield, UK, in July 2009 and May 2010.

Registration for the training course is still open. More information about the event is available at http://gate.ac.uk/conferences/montreal-2010/.

As Ontotext announced earlier, the company's own semantic repository BigOWLIM was successfully integrated into the high performance Semantic Web publishing stack powering the BBC's 2010 World Cup website. BigOWLIM is used there as a triple-store performing OWL reasoning on continuously changing data and handling millions of page requests per day.

A couple of recent blog posts from the technical team at BBC provide an insight into the business case for deployment of semantic technologies in their World Cup website, the technical architecture of the publishing stack, the strategic importance of the project's success and the plans for usage of semantic technology and linked data within the BBC.

In "The World Cup and a call to action around Linked Data", John O'Donovan, Chief Technical Architect, Journalism and Knowledge, BBC Future Media & Technology, discusses the business benefits of the implemented semantic solution:

"The World Cup site is our first major statement on how we think this (the Semantic Web) can work for mass market media and a showcase for the benefits it brings. … Though we have been using RDF and linked data on some other sites (…) we believe this is the first large scale, mass media site to be using concept extraction, RDF and a Triple store to deliver content."

"…we are not publishing pages, but publishing content as assets which are then organised by the metadata dynamically into pages, but could be re-organised into any format we want much more easily than we could before. …There is also a change in editorial workflow for creating content and managing the site. This changes from publishing stories and index pages, to one where you publish content and check the suggested tags are correct. The index pages are published automatically. This process is what assures us of the highest quality output, but still saves large amounts of time in managing the site and makes it possible for us to efficiently run so many pages for the World Cup."

"As more content has Linked Data principles applied to it … the vision of a Semantic Web moves closer. Importantly, what we have been able to show with the World Cup, is that the technology behind this is ready to deliver large scale products."

"This is more than just a technical exercise - we have delivered real benefits back to the business as well as establishing a future model for more dynamic publishing which we think will allow us to make best use of our content and also use Linked Data to more accurately share this content and link out to other sites and content, a key goal for the BBC. We look forward to seeing the use of Linked Data grow as we move towards a more Semantic Web."

In a follow-up post, "BBC World Cup 2010 dynamic semantic publishing", Jem Rayfield, Senior Technical Architect, BBC News and Knowledge, provides more information on the technical architecture of the high-performance publishing stack and the related data flows and data modelling:

"The World Cup 2010 website is a significant step change in the way that content is published. … As you navigate through the site it becomes apparent that this is a far deeper and richer use of content than can be achieved through traditional CMS-driven publishing solutions."

"The site features 700-plus team, group and player pages, which are powered by a high-performance dynamic semantic publishing framework. This framework facilitates the publication of automated metadata-driven web pages that are light-touch, requiring minimal journalistic management, as they automatically aggregate and render links to relevant stories."

"The foundation of these dynamic aggregations is a rich ontological domain model. The ontology describes entity existence, groups and relationships between the things/concepts that describe the World Cup. For example, "Frank Lampard" is part of the "England Squad" and the "England Squad" competes in "Group C" of the "FIFA World Cup 2010". The ontology also describes journalist-authored assets (stories, blogs, profiles, images, video and statistics) and enables them to be associated to concepts within the domain model…."

"A RDF triplestore (ref. BigOWLIM) and SPARQL approach was chosen over and above traditional relational database technologies due to the requirements for interpretation of metadata with respect to an ontological domain model. The high level goal is that the domain ontology allows for intelligent mapping of journalist assets to concepts and queries. The chosen triple store provides reasoning following the forward-chaining model and thus implied inferred statements are automatically derived from the explicitly applied journalist metadata concepts."

"This inference capability makes both the journalist tagging and the triple store powered SPARQL queries simpler and indeed quicker than a traditional SQL approach. Dynamic aggregations based on inferred statements increase the quality and breadth of content across the site. The RDF triple approach also facilitates agile modeling, whereas traditional relational schema modeling is less flexible and also increases query complexity."

"Our triple store is deployed multi-data center in a resilient, clustered, performant and horizontally scalable fashion, allowing future expansion for additional ontologies and indeed linked open data (LOD) sets. … The triple store is abstracted via a JAVA/Spring/CXF JSR 311 compliant REST service. ... The API is designed as a generic façade onto the triple store allowing RDF data to be re-purposed and re-used pan BBC. This service orchestrates SPARQL queries and ensures that results are dynamically cached with a low 'time-to-live' (TTL) (1 minute) expiry cross data center using memcached."

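The result caching described in the quote can be illustrated with a short Python sketch of a time-to-live (TTL) cache: each query result is stored with an expiry time one minute ahead, served while still fresh, and recomputed once stale. This is a single-process, in-memory stand-in written under that assumption; the deployment in the quote used memcached shared across data centers, and `run_sparql` is a hypothetical placeholder for the call to the triple store.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being stored.

    Illustrative single-process stand-in, not the BBC's memcached setup.
    """

    def __init__(self, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve the cached result
        value = compute()  # missing or expired: query the store again
        self._entries[key] = (now + self.ttl, value)
        return value


calls = []


def run_sparql(query: str) -> str:
    """Hypothetical placeholder for a SPARQL call to the triple store."""
    calls.append(query)
    return f"results for: {query}"


fake_now = [0.0]
cache = TTLCache(ttl=60.0, clock=lambda: fake_now[0])
query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
cache.get_or_compute(query, lambda: run_sparql(query))  # miss: calls the store
fake_now[0] = 30.0
cache.get_or_compute(query, lambda: run_sparql(query))  # hit: within 60 s
fake_now[0] = 61.0
cache.get_or_compute(query, lambda: run_sparql(query))  # expired: calls the store again
print(len(calls))
```

Injecting the clock keeps the example deterministic. A short TTL like this trades a little staleness in the served pages for fewer repeated queries against the store.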
"This dynamic semantic publishing architecture has been serving millions of page requests a day throughout the World Cup with continually changing OWL reasoned semantic RDF data. The platform currently serves an average of a million SPARQL queries a day with a peak RDF transaction rate of 100s of player statistics per minute. …."

"The development of this new high-performance dynamic semantic publishing stack is a great innovation for the BBC as we are the first to use this technology on such a high-profile site. It also puts us at the cutting edge of development for the next phase of the Internet, Web 3.0."

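To make the forward-chaining inference in Jem Rayfield's description concrete, here is a minimal Python sketch that materializes implicit statements from explicit ones, using the "Frank Lampard" example from the quote. The predicate names (`partOf`, `competesIn`, `participatesIn`) and the three rules are hypothetical illustrations, not the BBC's ontology or BigOWLIM's rule set.

```python
# Minimal forward-chaining (materialization) sketch.
# The predicates and rules are hypothetical, chosen to mirror the
# "Frank Lampard" example; they are not the BBC ontology or BigOWLIM's rules.
Triple = tuple[str, str, str]


def materialize(facts: set[Triple]) -> set[Triple]:
    """Apply the rules below until no new triples appear (a fixpoint).

    R1: (x partOf y), (y partOf z)      => (x partOf z)
    R2: (x competesIn y), (y partOf z)  => (x competesIn z)
    R3: (x partOf y), (y competesIn z)  => (x participatesIn z)
    """
    closure = set(facts)
    while True:
        derived: set[Triple] = set()
        # Naive join: pair every triple with every triple whose subject
        # is the first triple's object.
        for s1, p1, o1 in closure:
            for s2, p2, o2 in closure:
                if o1 != s2:
                    continue
                if p1 == "partOf" and p2 == "partOf":
                    derived.add((s1, "partOf", o2))
                elif p1 == "competesIn" and p2 == "partOf":
                    derived.add((s1, "competesIn", o2))
                elif p1 == "partOf" and p2 == "competesIn":
                    derived.add((s1, "participatesIn", o2))
        derived -= closure  # keep only genuinely new statements
        if not derived:
            return closure
        closure |= derived


# Explicit (journalist-style) statements modelled on the quoted example.
facts = {
    ("FrankLampard", "partOf", "EnglandSquad"),
    ("EnglandSquad", "competesIn", "GroupC"),
    ("GroupC", "partOf", "FIFAWorldCup2010"),
}
closure = materialize(facts)
for triple in sorted(closure - facts):
    print(triple)
```

Running it derives three implicit statements, among them that Frank Lampard participates in the FIFA World Cup 2010, even though that statement is not among the explicit facts. The nested loop re-joins every pair of triples on each pass, which is fine for a toy example; real forward-chaining engines typically rely on indexes and semi-naive evaluation, joining only newly derived triples.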
The blog posts by BBC set off a wave of enthusiasm in the community. Some of the reflections are cited below.

"BBC World Cup Website Showcases Semantic Technologies", a post by Richard MacManus, founder of ReadWriteWeb: "…if there was a World Cup for the Semantic Web, then the BBC may have lifted the trophy for its country"

"First BBC microsite powered by a triple-store", a post by Yves Raimond at DBTune: "All this is very exciting, the World Cup Website proved that triple store technologies can be used to drive a production website with significant traffic. I am expecting lots more parts of the BBC web infrastructure to evolve in the same way :-)"

These posts were accompanied by an avalanche of tweets and blog comments, dominated by the word "impressive". A few of the comments on the ReadWriteWeb post were particularly positive: "excellent technology on both software and hardware", "It Begins ...", "It's really fantastic to see organizations like the BBC building really exciting semantic applications which demonstrate quantifiable business value. Great stuff!..."

If you are lost in masses of documents, unlinked data and scattered knowledge, KIM 3 might be the remedy for you!

KIM 3 gives you the ability to:

KIM 3 is faster than ever, modular, and based on the latest semantic data standards. It allows you to use and harmonize your conceptual models and taxonomies, link them automatically to your documents or unstructured data, and thus more easily access, navigate and reuse your enterprise content.

Moreover, to make it more accessible, we can build a tailored KIM-based system for you and train your team to take it further from there.

The most important changes and improvements in KIM 3 cover the following areas:

The National Archives has invested in an 'intelligent discovery tool' to improve searches of archived UK Government websites. The contract for development of the Government Web Archive Semantic Knowledge Base was granted to a consortium of semantic technology professionals led by Ontotext.

The consortium includes experts from the GATE team at the University of Sheffield, creators of the most comprehensive open-source text mining ecosystem in the world, as well as System Simulation, a major UK integrator of images, digital assets, collection and content management systems.

The contract was awarded after a call for competition and a public tender process.

The project started in mid-June and is expected to finish before the end of March 2011. It aims to bring new methods of search, navigation and information modeling to The National Archives and, in doing so, make the web archive a more valuable and popular resource.

The Government Web Archive Semantic Knowledge Base will also bring together publicly available linked data and open-source text mining technology in a system that is easy to understand and can be managed and extended in a predictable and cost-efficient manner.

BigOWLIM 3.3 Handles Millions of Queries per Day, Supports OWL 2 RL Reasoning and Provides Unmatched Linked Data Integration, Management and Retrieval Capabilities

Ontotext announces version 3.3 of BigOWLIM, the world's most scalable semantic repository. This release incorporates into the main product a number of advanced features that previously existed as bespoke developments. Considerable effort was also dedicated to making BigOWLIM more robust and easier to use.

The development of this version was influenced by the requirements of FactForge and LinkedLifeData (two of the most advanced linked data portals) and the BBC's 2010 World Cup website - probably the most challenging real-world use case of semantic repositories implemented so far. Within the LarKC project, BigOWLIM is used as the data layer in a platform for Web-scale reasoning that features a range of diverse reasoning plug-ins.

The key characteristics and features of BigOWLIM include:

These features are already proven at FactForge (previously known as LDSR), where BigOWLIM is used to load 8 of the central LOD datasets into a repository that contains 1.2 billion explicit and 0.8 billion implicit statements. BigOWLIM's owl:sameAs optimization allows FactForge to deal with 'only' 2 billion statements in its indices, while the number of distinct statements retrievable from the repository is 10 billion. This feature allows FactForge to deliver non-inflated query results, while the semantics of owl:sameAs is still fully accounted for during query evaluation.
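The general idea behind such an owl:sameAs optimization can be illustrated with a small Python sketch: equivalent URIs are grouped into classes (here with a union-find structure), each statement is stored once against a class representative, and statements are expanded back over all class members at retrieval time. This is an illustration of the general technique under those assumptions, not BigOWLIM's actual index layout, and the prefixed URIs in the example are hypothetical.

```python
from itertools import product


class SameAsIndex:
    """Toy store keeping one statement per owl:sameAs equivalence class.

    Illustrative only. Predicates are not canonicalized, and the
    owl:sameAs links live in the union-find structure, not as statements.
    """

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.stored: set[tuple[str, str, str]] = set()

    def find(self, x: str) -> str:
        """Return the representative URI of x's equivalence class."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add(self, s: str, p: str, o: str) -> None:
        self.stored.add((self.find(s), p, self.find(o)))

    def same_as(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        self.parent[rb] = ra
        # Re-point stored statements at the surviving representative.
        self.stored = {(self.find(s), p, self.find(o)) for s, p, o in self.stored}

    def members(self, x: str) -> list[str]:
        root = self.find(x)
        return sorted(y for y in list(self.parent) if self.find(y) == root)

    def expand(self):
        """Yield every statement retrievable once owl:sameAs is accounted for."""
        for s, p, o in self.stored:
            for s2, o2 in product(self.members(s), self.members(o)):
                yield (s2, p, o2)


idx = SameAsIndex()
idx.add("dbpedia:Sofia", "capitalOf", "dbpedia:Bulgaria")
idx.same_as("dbpedia:Sofia", "geonames:Sofia")
idx.same_as("dbpedia:Sofia", "freebase:Sofia")
idx.same_as("dbpedia:Bulgaria", "geonames:Bulgaria")
print(len(idx.stored), len(set(idx.expand())))  # prints: 1 6
```

One stored statement stands for six retrievable ones here, which is the same effect, on a tiny scale, as FactForge indexing about 2 billion statements while 10 billion are retrievable.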

BigOWLIM version 3.3 is also at the heart of the LinkedLifeData RDF warehouse (release 0.5), which combines 25 of the most popular biomedical databases in a repository that contains more than 4 billion statements.

The latest version of the BigOWLIM repository has been successfully integrated into the high performance Semantic Web publishing stack powering the BBC's 2010 World Cup football website, performing OWL reasoning with continuously changing data and handling millions of page requests per day.

Included with version 3.3 is a thoroughly reworked documentation set that includes:

To simplify query writing, BigOWLIM 3.3 comes with a number of popular namespace prefixes predefined, such as the prefixes for the RDF, RDFS and OWL schemata, all the prefixes for linked data namespaces used in FactForge, and prefixes for projects like Good Relations. A detailed list of the predefined prefixes can be found in the prefixes.txt file in the distribution.
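Predefined prefixes let a query use names such as `owl:sameAs` without declaring the namespace first. The sketch below shows, from the client side, which declarations that saves: it prepends `PREFIX` lines for the standard RDF, RDFS and OWL namespaces when a query uses them undeclared. The function name and the three-entry table are illustrative assumptions; this does not reproduce how BigOWLIM resolves its built-in prefixes, and the full list lives in prefixes.txt.

```python
import re

# Standard W3C namespace URIs. The full list predefined in BigOWLIM 3.3 is in
# prefixes.txt in the distribution and is not reproduced here.
KNOWN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def with_prefix_declarations(query: str, known: dict[str, str] = KNOWN_PREFIXES) -> str:
    """Prepend PREFIX lines for known prefixes the query uses but does not declare.

    Uses simple regular expressions, so prefixed names inside IRIs or string
    literals may be misdetected; good enough for a sketch.
    """
    declared = set(re.findall(r"(?im)^\s*PREFIX\s+(\w*):", query))
    used = set(re.findall(r"\b(\w+):\w", query))
    missing = sorted((used - declared) & set(known))
    header = "".join(f"PREFIX {p}: <{known[p]}>\n" for p in missing)
    return header + query


query = "SELECT ?s ?label WHERE { ?s owl:sameAs ?o . ?s rdfs:label ?label }"
print(with_prefix_declarations(query))
```

With the prefixes predefined in the repository, the short form of the query can be sent as-is and the header lines become unnecessary.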

Furthermore, the OWLIM website has been revised to include more relevant details, latest benchmark results, etc. For further information, please contact OWLIM-info-at-ontotext.com.

Ontotext announces re-branding of its Linked Data Semantic Repository (LDSR) linked data service to FactForge.

FactForge is designed as an index that provides a fast track to the center of the web of linked data. It represents a reason-able view including 8 of the LOD datasets, namely: DBPedia, Geonames, Wordnet, Musicbrainz, Freebase, UMBEL, Lingvoj and the CIA World Factbook. These datasets are preprocessed and loaded into BigOWLIM to form an integrated dataset of about 1.2 billion explicit statements. Forward-chaining is performed to materialise 0.8 billion implicit statements, in accordance with the semantics of the ontologies used in the datasets.

For end-users, FactForge is a free public service delivering a fast and reliable single point of access to the central LOD datasets. The data can be accessed in several ways:

Shortly after the release in May, Ontotext announced that version 0.5 of LinkedLifeData (LLD) is available online.

LLD is a semantic data integration platform for the biomedical domain that interlinks 25 popular biomedical data sources. The current release includes more than 4 billion statements, which represent correlations between 0.5 billion biomedical entities. The main focus of this release was the improvement of the user interface of the system and optimization of the auto-complete search functionality.

Changes in 0.5 release:

For the fifth year running, Ontotext will participate in SemTech, the world's largest, most authoritative and highest-ranking Semantic Technology conference. The conference will take place in San Francisco from the 21st to the 25th of June.

A platinum sponsor of this event, Ontotext is pleased to welcome guests and clients at booth #101, where they can sample the technology, make an enquiry and receive a quote, or just touch base.

"We are excited to have the opportunity to offer the latest developments in our technologies in three separate presentations on Wednesday, 23rd of June - the first day of the main conference program. What's more, on Friday afternoon, Ontotext will present a 360° view on its technology and solutions in the course of a half-day seminar" - says Atanas Kiryakov, Executive Director of Ontotext.

The first presentation, "Using GPUs to Browse through Billions of Linked Data Facts", will be delivered by A. Kiryakov and falls within the Linked Data Track of the conference. It focuses on the need for a meaningful way to filter and present a "good" portion of the results returned by FactForge, a gateway to the LOD cloud that allows querying and searching eight of its central datasets. A mechanism called "RDF priming" is introduced to tackle this task. The talk also demonstrates the benefits of using the CUDA architecture of NVIDIA graphics cards as a cost-effective alternative to dual-core servers.

"How to integrate 10 databases and interlink them with 10 million documents in 10 weeks" is a presentation by Matthew Petrillo and Vassil Momtchev, a group leader at Ontotext specializing in the implementation of semantic solutions in the life sciences and healthcare domain. Matthew and Vassil will take us through the company's experience in implementing Semantic Enterprise solutions and reusing common software components and best practices. The talk will cover the full path of adopting semantic technology in the enterprise and demonstrate a fully operational infrastructure.

For the Industry Track of the conference, Atanas Kiryakov and Barry Bishop will focus on publishing with their presentation "OWLIM Cluster Allows Resilient Handling of Millions of Queries per Day". At the heart of the talk is a real-world scenario where BigOWLIM handles millions of requests per day, serving as a back-end that seamlessly brings the power of the Semantic Web to the website of one of the largest and most popular television production companies in the world.

On Friday afternoon the Ontotext team will conduct the "360° Semantic Technologies: Web Mining, Text Analysis, Linked Data Search and Reasoning" workshop, which will conclude the Business Applications track on the last day of the conference. This workshop will demonstrate how vision, concise engineering and 10 years of experience can deliver successful applications of semantic technology. Three Ontotext partners will also present their services and technology as part of the workshop: BPEng (Italy), Profium (Finland) and TopQuadrant (USA).

The W3C Workshop "RDF Next Steps" starts on June 26 and is hosted by the National Center for Biomedical Ontology (NCBO). Over a couple of days, attendees and presenters will gather and analyse feedback from the Web community on whether RDF should evolve and, if so, in which direction. One of the main questions the Workshop should help decide is whether the time is right for W3C to start a new RDF Working Group to define and standardize a next version of RDF.

Ontotext's involvement in the Workshop is the presentation by Atanas Kiryakov and Vassil Momtchev of a position paper called "Triplesets: Tagging and Grouping in RDF Datasets". The paper argues that the need to extend the standard triple-based RDF data model towards quadruples is widely recognized, and proposes semantics for adding quadruples to and removing them from integrated datasets. It also argues for extending the RDF model further and proposes a specific mechanism called triplesets. The proposed mechanism is already supported in the OWLIM semantic repository and is used in the data layer of LarKC, probably the most ambitious large-scale reasoning project.

The SemData@ESWC Workshop, the second event of the SemData series, took place in Crete, Greece, on 30th of May. The workshop was co-located with the Extended Semantic Web Conference and was kicked off with an introductory talk by Atanas Kiryakov of Ontotext AD.

SemData@ESWC builds on the success of the first event in the series, hosted by Ontotext in Sofia in March this year. Only a few months after the SemData initiative started in January 2010, it has already succeeded in bringing together many of the leading academic experts in the field, enlisting more than half of the leading semantic repository vendors (ORACLE, IBM, OpenLink, Franz, SYSTAP, Ontotext) as well as the developers of YARS, MonetDB, WebPIE and HEXASTORE.

The agenda and the presentations from the SemData@ESWC Workshop can be found at: http://semdata.org/events/2010/eswc

On May 27th, Ontotext released an enhanced version of LinkedLifeData (LLD), a unique knowledge discovery platform based on scalable semantic data integration in the life sciences and health care. It supports heterogeneous data sources, simple information updates, and easy incremental extension and integration of datasets.

The current release combines data from 25 different data sources covering information about genes, proteins, molecular interactions, drugs, small chemical compounds, clinical trials of medications, side effects, scientific publications and more. The LinkedLifeData repository stores 4.2 billion statements, which interconnect 583 million different entities. The new release represents an enlarged knowledge repository with information from 3 new datasets:

In addition to the existing SPARQL end-point, RelFinder is integrated as a new data mining tool. RelFinder is an open source application for interactive relationship discovery in RDF datasets. Researchers can access it at http://linkedlifedata.com/relfinder to mine the huge LLD repository for distant relations between entities, which may not be obvious with conventional mining techniques.

LLD is developed as part of the EU-funded project LarKC in co-operation with AstraZeneca, a global, innovation-driven biopharmaceutical business with a primary focus on the discovery, development and commercialization of prescription medicines. The primary focus of LinkedLifeData is to provide innovative approaches to analyzing the available scientific information and combining it with companies' internal knowledge, thus facilitating and speeding up the drug development process at lower cost. LinkedLifeData (LLD) is freely available to the scientific community at: http://linkedlifedata.com.

On 16/04/2010, Richard MacManus published a post on ReadWriteWeb, "The Modigliani Test: The Semantic Web's Tipping Point", which essentially argues that linked data is not sufficiently linked. He wrote that "The tipping point for the long-awaited Semantic Web may be when you can query a set of data about someone not too famous, and get a long list of structured results in return". He then defined the "Modigliani Test" for the Semantic Web: being able to ask a search engine to "tell me the locations of all the original paintings of Modigliani" and get back a large list of results.

In a LarKC blog post (http://blog.larkc.eu/) from 23/04/2010, Atanas Kiryakov showed how the LDSR reason-able view of the linked data web passes the test.

On 11th and 12th of March, Ontotext hosts the SemData@Sofia round table, the first of a series of events called SemData dedicated to semantic data management.

The goal of this event series is to investigate various aspects of semantic databases and data management in the large. We seek expert discussions and trans-disciplinary collaborations on issues such as semantic repositories, their virtualization and distribution, and interoperability with relational solutions, XML and others. Moreover, there is a need for advanced mechanisms to "move the logic closer to the data".

The round table gathers representatives of most of the outstanding semantic repository and database vendors (Openlink, IBM, Systap, Ontotext, CWI, MySQL) and many of the leading researchers in the field. The event is organized as a joint activity of three EC research projects: LarKC, SOA4All and PlanetData.

The main goal of the project is to develop a set of tools for translating texts between the 23 official EU languages in real time and with high quality. MOLTO takes a hybrid approach, combining grammar-based and statistical machine translation.

Ontotext's involvement in this project is to provide: The project aims to deliver pluggable open-source libraries enabling standard translation tools and workflows. This technology will be a basic building block in translation and retrieval services for both information producers and consumers.

This year Ontotext is sponsoring the Semantic Technology Conference for the fifth consecutive year. The SemTech conference will be held in San Francisco, USA, from 21 to 25 June 2010.

Ontotext will present new product developments and specific business applications in several presentations during the main program and in one workshop held after the conference.

During the exhibit hours, Ontotext will welcome all present and future partners, customers and friends at its large booth (#201).

Sofia, Bulgaria. Ontotext received the prestigious Pitagoras award 2010 for ‘a company that has most successfully mastered new scientific research or provided specialized services for the benefit of society for the past 3 years’. These awards are presented by the Ministry of Education to honor important achievements in 11 categories where scientists and project teams compete individually or as groups. This initiative started in 2003 and has become the most esteemed award for special contribution in the development of scientific research in Bulgaria.

The assessment of the contestants in the group categories covers a range of indicators such as participation in national and European projects, awards, and attracted resources. This is an important recognition for Ontotext. It proves that a small software company in Bulgaria can participate in research projects at the highest worldwide level, apply the results in software products, and successfully compete with the global IT leaders.

Sofia, Bulgaria. Ontotext released the next version of the Linked Life Data (LLD) service. LLD focuses on the life science and biotechnology domain and facilitates the semantic integration of data silos. The service is based on the OWLIM engine and is fully compliant with linked data standards. It supports complex queries that span multiple data sources.

The major service improvements are: