Datathon – Hacking the Bulgarian Commercial Register

Mar 20, 2017


At Datathon Bulgaria 2017, Ontotext will challenge teams of data enthusiasts to convert data from the Bulgarian Commercial Register into a Linked Open Data (LOD) format. The goal is to demonstrate how semantic graph databases can reveal relationships and uncover hidden facts in denormalized data, for example:

  • Identify and rank the biggest groups of related companies in Bulgaria or in a specific region;
  • Board-walk: analyze networks of influence through directors who sit on the boards of multiple companies.
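
As an illustration only, the two example questions above can be sketched in plain Python on a handful of made-up records. The company and director names, the `board_seats` list and both helper functions are hypothetical and not part of the challenge materials. The sketch also assumes a director can be identified by name alone, which real register data would not guarantee.

```python
from collections import defaultdict

# Hypothetical (company, director) pairs; real data would come from the register.
board_seats = [
    ("Alpha Ltd", "Ivan Petrov"),
    ("Beta Ltd", "Ivan Petrov"),
    ("Beta Ltd", "Maria Ivanova"),
    ("Gamma Ltd", "Maria Ivanova"),
    ("Delta Ltd", "Georgi Dimitrov"),
]


def find(parent, x):
    """Return the representative of x's group (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def company_groups(seats):
    """Group companies that are connected through shared directors."""
    parent = {}
    first_company_of = {}
    for company, director in seats:
        parent.setdefault(company, company)
        if director in first_company_of:
            # Same director seen before: merge the two companies' groups.
            a = find(parent, company)
            b = find(parent, first_company_of[director])
            parent[a] = b
        else:
            first_company_of[director] = company
    groups = defaultdict(set)
    for company in parent:
        groups[find(parent, company)].add(company)
    # Rank: biggest group first, ties broken alphabetically.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


def multi_board_directors(seats):
    """Map each director who sits on more than one board to those companies."""
    boards = defaultdict(set)
    for company, director in seats:
        boards[director].add(company)
    return {d: c for d, c in boards.items() if len(c) > 1}


groups = company_groups(board_seats)
shared = multi_board_directors(board_seats)
print(groups)
print(shared)
```

In a graph database the same questions would normally be asked as queries over the linked data rather than computed in application code; the sketch only shows what is being asked.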

Ontotext will take part in Datathon Bulgaria, the first practical data challenge in Central and Eastern Europe, with the case ‘Hacking the Bulgarian Commercial Register’. For this challenge, Ontotext will provide the teams with a subset of the Bulgarian Commercial Register covering 2008 to 2017 and will mentor them through the steps of converting the data into LOD using a simple RDF model and linking it to other open datasets.

Read the full Case Description.

Ontotext’s challenge at the Datathon will show how a big set of highly complex data such as the Bulgarian Commercial Register – currently organized as a set of daily updates in XML files – can be aggregated and converted into an LOD-suitable format that is accessible, open (based on open standards and recommendations by W3C) and interconnected (showing the relationships between companies, managers, locations, regulatory and court filings).
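
A rough sketch of the XML-to-RDF step described above, using only the Python standard library, might look like the following. The XML fragment, its element names, the `http://example.org/register/` namespace and the `to_ntriples` helper are all assumptions made for illustration; the actual register schema and the RDF model used in the challenge are described in the case materials and will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical daily-update fragment; the real register schema differs.
SAMPLE_XML = """
<update date="2017-03-20">
  <company id="123456789">
    <name>Alpha Ltd</name>
    <seat>Sofia</seat>
    <manager>Ivan Petrov</manager>
  </company>
</update>
"""

BASE = "http://example.org/register/"  # placeholder namespace, not an official one


def literal(value):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(xml_text):
    """Emit one N-Triples line per non-empty child element of each company."""
    root = ET.fromstring(xml_text)
    lines = []
    for company in root.iter("company"):
        subject = f"<{BASE}company/{company.get('id')}>"
        for child in company:
            text = (child.text or "").strip()
            if text:
                predicate = f"<{BASE}{child.tag}>"
                lines.append(f"{subject} {predicate} {literal(text)} .")
    return lines


triples = to_ntriples(SAMPLE_XML)
for line in triples:
    print(line)
```

Each company becomes a subject URI and each child element becomes a property with a plain literal value. A fuller conversion would map managers and locations to their own URIs so that they can be linked across companies and to other datasets.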

The resulting dataset will make it easy to link the data with additional open data sources such as GeoNames (all geographic objects on Earth), DBpedia (a structured version of Wikipedia), Wikidata, OpenCorporates and many others. Creating an LOD version of the Bulgarian Commercial Register has the potential to make the data more transparent and informative for businesses, as well as easier and more efficient for researchers and reporters to query, thus enhancing availability and helping fight corruption.

Mentoring for ‘hacking’ the Commercial Register will be provided by Dimitar Manov and Plamen Tarkalanov from Ontotext and Alex Angelov from OpenCorporates. One week before the event, Ontotext will supply the teams with free training video materials adapted from the one-day “Semantic technology Proof-of-Concept” live training.

To get familiar with the case in advance, all participants can access the materials listed below:

List of the Ontology Models

List of the Data Sets prepared for the case

The Datathon will take place from March 24 to 26, and awards will go to the teams that come up with the most precise, creative and elegant solutions to the data problems.

Every participant will get vouchers for free use of the Standard tier of GraphDB on the cloud, valid for three months after the event.

If you are visiting the conference and want to meet us, get in touch with us.