OWLIM Enterprise

OWLIM Enterprise (also known as OWLIM Replication Cluster) is an enterprise-grade, distributed database system that provides:

  • Scale-out handling of concurrent query requests: the query processing rate scales linearly with the number of worker nodes
  • Resilience in the event of hardware/software failure: automatic failover of cluster nodes, synchronization and dynamic configuration

A cluster is organized as one or more master nodes that manage one or more worker nodes. Failover and load-balancing between worker nodes are automatic. Multiple (stand-by) master nodes ensure continuous cluster operation even in the event of a master node failure. The cluster deployment can be modified while running, which allows worker nodes to be added during peak times, or released for maintenance, backups, etc.
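
OWLIM is built on the Sesame (openRDF) framework, so from a client's perspective the whole cluster behaves like a single remote Sesame repository reachable through a master node's endpoint. The minimal sketch below illustrates this using the standard Sesame 2.x client API; the endpoint URL and repository name are hypothetical.

    import org.openrdf.query.BindingSet;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.query.TupleQueryResult;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;

    public class ClusterClient {
        public static void main(String[] args) throws Exception {
            // Hypothetical master node endpoint; the cluster is addressed
            // like any remote Sesame repository, with failover and
            // load-balancing across workers happening behind this URL.
            Repository repo = new HTTPRepository(
                    "http://master1.example.com:8080/openrdf-sesame/repositories/myrepo");
            repo.initialize();
            RepositoryConnection con = repo.getConnection();
            try {
                // Evaluate a SPARQL query; the master forwards it to a worker.
                TupleQueryResult result = con.prepareTupleQuery(QueryLanguage.SPARQL,
                        "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10").evaluate();
                while (result.hasNext()) {
                    BindingSet row = result.next();
                    System.out.println(row);
                }
                result.close();
            } finally {
                con.close();
                repo.shutDown();
            }
        }
    }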

Such a setup guarantees a high-performance, always-on service that can handle millions of query requests per hour. See the cloud performance page for details.

Figure: Cluster deployment using two master nodes and three worker nodes

OWLIM Enterprise deployments keep a complete copy of the entire RDF database at each worker node. This allows any single worker to evaluate a client query on its own and avoids the need for distributed joins.

Managing the cluster

A master node does not store any data; rather, it is the entry point to the cluster - all read and write requests pass through a master node. The current read/write master is responsible for keeping the worker nodes synchronized. To do this, it dispatches each update to a randomly selected worker, verifies that the update succeeds there, and only then pushes it to the remaining worker nodes. Read requests are simply sent to the worker node with the shortest processing queue, which provides simple but practical load-balancing.
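
This dispatch logic can be pictured with a simplified model. The sketch below is illustrative only - the class and method names are invented for the example and do not reflect the actual OWLIM Enterprise implementation: reads go to the worker with the shortest queue, while an update is first tested on one randomly chosen worker and propagated to the rest only if it succeeds.

    import java.util.List;
    import java.util.Random;

    // Illustrative model of the master's dispatch logic; not the
    // actual OWLIM Enterprise implementation.
    class Master {
        private final List<Worker> workers;
        private final Random random = new Random();

        Master(List<Worker> workers) {
            this.workers = workers;
        }

        // Reads: pick the worker with the shortest processing queue.
        String read(String sparqlQuery) {
            Worker leastLoaded = workers.get(0);
            for (Worker w : workers) {
                if (w.queueLength() < leastLoaded.queueLength()) {
                    leastLoaded = w;
                }
            }
            return leastLoaded.evaluate(sparqlQuery);
        }

        // Writes: test the update on one randomly chosen worker; only
        // if it succeeds there is it pushed to the remaining workers.
        void write(String sparqlUpdate) {
            Worker probe = workers.get(random.nextInt(workers.size()));
            if (!probe.apply(sparqlUpdate)) {
                throw new IllegalStateException("Update rejected by test worker");
            }
            for (Worker w : workers) {
                if (w != probe) {
                    w.apply(sparqlUpdate);
                }
            }
        }
    }

    interface Worker {
        int queueLength();
        String evaluate(String sparqlQuery);
        boolean apply(String sparqlUpdate); // returns true on success
    }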

Figure: Cluster deployment showing read/write requests

Figure: Cluster deployment showing the processing queues of the master and worker nodes

OWLIM Enterprise in use

OWLIM Enterprise has a proven track record in dynamic environments, where non-trivial reasoning (through forward-chaining and materialization) is performed on top of data that is constantly updated.

In the summer of 2010, OWLIM Replication Cluster was deployed as part of the BBC's World Cup website, where OWLIM supported a stream of continuous updates while simultaneously processing huge numbers of queries. A blog post by Jem Rayfield, a Senior Technical Architect at the BBC, describes how OWLIM was integrated into the semantic publishing stack: a small number of machines (six workers and two masters) distributed across two data centers coped with more than a million SPARQL queries per day while handling hundreds of updates per hour - all in the context of non-trivial OWL reasoning based on materialization. To the best of our knowledge, no other RDF engine is capable of providing such performance.

OWLIM Replication Cluster was recently put through its paces with the Berlin SPARQL Benchmark (BSBM) running on the Amazon EC2 cloud infrastructure. The results show an almost linear improvement in parallel query throughput as worker nodes are added. Cluster deployments of up to 100 worker nodes were tested, scoring 200,000 BSBM query mixes per hour; at 25 queries per mix, this equates to 5 million SPARQL queries per hour. Measured against the total Amazon EC2 costs for such a setup, that works out at 100,000 SPARQL queries per dollar!