Extensive experiments were carried out on the Amazon EC2 infrastructure to answer the following questions:
- How does the performance of Amazon Cloud instances compare to that of "real" physical, non-virtualized server configurations?
- What are the limits of the horizontal scalability of the BigOWLIM Replication Cluster configuration?
- What is the cost of query evaluation when BigOWLIM is deployed on a cloud infrastructure?
The results of the experiments carried out with BigOWLIM versions 3.3 and 3.4 are reported in this presentation. They can be summarized as follows:
- The OWLIM Replication Cluster can handle 5 million SPARQL queries per hour, the best query performance reported for SPARQL and RDF to date. In a test based on BSBM, a cluster of 100 Amazon EC2 High-Memory Extra Large instances (HM-XL: 2x3.25 ECU, 17 GB of RAM) demonstrated a throughput of 200,000 BSBM query mixes (5,000,000 SPARQL queries) per hour. This result was achieved with 1000 concurrent clients issuing queries against a 100-million-triple BSBM dataset.
- Excellent horizontal scalability: almost linear up to a 100-node cluster (the largest cluster tested).
- Low parallelization overhead: each node in the 100-node cluster achieved 90% utilization relative to a standalone machine.
- The BSBM 100M query performance of BigOWLIM on an Amazon XL instance (2xECU and 15 GB of RAM) is comparable to that of a physical machine with a single Intel Core i7 CPU at 2.9 GHz and 12 GB of RAM (ontosol test environment).
- 100,000 SPARQL queries can be answered per $1 of Amazon EC2 infrastructure cost.
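
The headline figures above can be cross-checked with some simple arithmetic. The following Python sketch uses only the numbers quoted in this summary; the function and key names are illustrative. The cost figures are derived values, not measured ones: they assume that the "100,000 queries per $1" figure refers to the same 100-node run and that instance hours are the only cost counted.

```python
# Figures quoted in the summary above.
QUERY_MIXES_PER_HOUR = 200_000
QUERIES_PER_HOUR = 5_000_000
NODES = 100
QUERIES_PER_DOLLAR = 100_000


def derived_figures():
    """Derive per-mix, per-node and cost figures from the quoted numbers.

    The two cost entries are an assumption-laden back-of-the-envelope
    estimate (same 100-node run, instance hours only), not reported results.
    """
    cluster_cost_per_hour = QUERIES_PER_HOUR / QUERIES_PER_DOLLAR
    return {
        # Queries contained in one BSBM query mix, as implied by the totals.
        "queries_per_mix": QUERIES_PER_HOUR // QUERY_MIXES_PER_HOUR,
        # Average share of the hourly load handled by each node.
        "queries_per_node_per_hour": QUERIES_PER_HOUR // NODES,
        # Cluster-wide throughput expressed per second.
        "cluster_queries_per_second": QUERIES_PER_HOUR / 3600,
        # Implied hourly cost of the whole cluster (assumption, see above).
        "implied_cluster_cost_per_hour_usd": cluster_cost_per_hour,
        # Implied hourly cost of a single instance (assumption, see above).
        "implied_node_cost_per_hour_usd": cluster_cost_per_hour / NODES,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

Under these assumptions, each query mix contains 25 queries, each node answers about 50,000 queries per hour on average, and the implied price works out to roughly $0.50 per instance hour.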