Ontotext talks about Open Data at Data Summit Brussels

March 27, 2015 4 mins. read Marin Dimitrov


A few weeks ago Marin Dimitrov gave a presentation titled “Enabling low-cost Open Data Publication and Reuse” at Data Summit Brussels. The presentation was based on ongoing work in DaPaaS, one of the EC-funded research projects that Ontotext participates in, which aims to develop a platform for Open Data publishing and access.

In recent years, Open Data initiatives have been growing at a rapid pace worldwide. More and more data (mostly from government organizations) has been made available for open access. Organizations such as the Open Data Institute have set their mission to educate government organizations and SMEs on how to publish, utilize and monetize Open Data.

Open Data has significant potential for improving the transparency and quality of public services, as well as for optimizing costs and driving innovation in various industry sectors. In the McKinsey report titled “Open data: Unlocking innovation and performance with liquid information”, the analysts wrote:

Our research suggests that seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data, which is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations.


Open Data Challenges

At the same time, there are some challenges to the wider adoption of Open Data. For example, the quality of data is a significant problem. Lots of organizations are obligated to open up their data but lack the required expertise, tooling, resources or sustainability plans on how to make this data useful, how to improve its quality and how to maintain it with regular updates and enhancements.

Data Quality

Most of the Open Data available these days consists of plain CSV files of questionable quality, which are difficult to access and use. While some organizations point to the hundreds of thousands of open datasets available as proof of the success of the Open Data movement, very few have taken a more critical look at the quality, usage statistics and value that most of these datasets provide.

Additional factors limiting the adoption of Open Data include the lack of expertise, resources and commitment in many organizations to make data available as live data services and APIs that are easily accessible to third-party applications.

DaPaaS – Making it Easier to Publish and Reuse Open Data

DaPaaS is an EC-funded research project that has the goal of making it easier to publish and reuse Open Data. The partners in the project include: Ontotext (Bulgaria), SINTEF (Norway), Swirrl (UK), Open Data Institute (UK), and Sirma Mobile (Bulgaria), as well as an associated partner from South Korea: Saltlux.

The DaPaaS project has chosen the Linked Data paradigm as a way to publish and consume Open Data, so that the data can be better described, interlinked and queried in ways that are not possible with the traditional approaches of CSV files or very simple Web APIs. The key building blocks of the DaPaaS platform include:

  • Grafter – an open source suite of tools and a DSL for data cleaning and transformation. Grafter can easily transform from one tabular format to another or from a tabular format to RDF. It is designed for stream-like processing, so that even very large datasets can be processed efficiently. A key feature of Grafter is that the data transformation and cleanup workflows can be easily packaged as REST services and that the transformations are repeatable and reusable over the same dataset in the long term.
  • Grafterizer – another open source tool providing an IDE for the Grafter suite so that developers can easily create data transformation and cleanup workflows.
  • A scalable RDF database-as-a-service (DBaaS), based on Ontotext’s enterprise-grade GraphDB, which makes it possible to instantly deploy a large number of live data services (RDF databases and SPARQL endpoints) over the open datasets, which were cleaned up and RDF-ized with Grafter and Grafterizer.
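To give a feel for the kind of tabular-to-RDF transformation described above, here is a minimal sketch in plain Python. It is not Grafter's API and does not use any DaPaaS component; the `http://example.org/` namespace, the `id` column and all function names are assumptions made purely for illustration. It reads a CSV stream row by row, applies a simple cleanup step, and emits N-Triples lazily, so the whole file never has to be held in memory.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"


def clean_rows(rows: Iterable[Dict[str, str]], id_column: str = "id") -> Iterator[Dict[str, str]]:
    """Trim whitespace from headers and values; drop rows without an identifier."""
    for row in rows:
        cleaned = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore surplus cells that have no header
        }
        if cleaned.get(id_column):
            yield cleaned


def escape_literal(value: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(rows: Iterable[Dict[str, str]], id_column: str = "id") -> Iterator[str]:
    """Turn each non-empty cell into one triple about the row's resource."""
    for row in rows:
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


def transform(source: TextIO) -> Iterator[str]:
    """Chain the steps as generators so rows are processed one at a time."""
    return rows_to_ntriples(clean_rows(csv.DictReader(source)))


if __name__ == "__main__":
    sample = io.StringIO("id, name ,city\n1, Sofia Library ,Sofia\n,missing id,Oslo\n")
    for triple in transform(sample):
        print(triple)
```

Because every step is a generator, the same pipeline can be re-run unchanged whenever the source file is updated, which is the repeatability property the list above attributes to Grafter. The resulting triples could then be loaded into any RDF database and queried through a SPARQL endpoint.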

The DaPaaS Open Data platform will soon be open to the general public. More information is available via Twitter and email.


Marin's sharp mind can explain complex things in a simple way, making him an invaluable resource at Ontotext. He is a frequent speaker at semantic technology conferences, open data meetups and various technology-related events.