Ontotext to Showcase Automated Industry Classification for Open Data Company Graph at the Global Datathon 2018

Monday, September 17, 2018


Ontotext will take part in the Global Datathon 2.0, the worldwide online data science hackathon held from September 28th to 30th and organized by the Data Science Society for the third time. It is a 48-hour competition for all data enthusiasts who are ready to tackle real-world challenges alongside like-minded people and mentors, and to expand their professional network globally.

As part of the practical data challenges, Ontotext will present ‘Automated Industry Classification for Open Data Company Graph’. Classifying companies into industry sectors is crucial for unlocking advanced business intelligence capabilities. A standardized industry classification enables a range of data processing and analytical tasks, such as reconciling company records from different sources, measuring the similarity between companies and calculating company ranking scores.

However, high-quality commercial company data is often expensive, while the readily available (and rapidly growing) Open Data sources often lack a practical approach to industry classification. Moreover, Open Data and commercial sources alike use inconsistent classification systems.

Another problem is incomplete information in the data sources. Industry information is often recorded during the initial manual data collection, and if the classification offers no appropriate industry category, the field is often left empty. In another scenario, the data is extracted from textual documents that do not explicitly mention the company’s industry, so no industry is assigned.

With all this in mind, Ontotext’s big data challenge is to develop an automated, standardized classification model that can be applied to any source to enrich the original data with industry sector information. While many classification systems are in use today, each with its own pros and cons, Ontotext will use the Industry Classification Benchmark (ICB) as the company classification taxonomy for this challenge.

The Global Datathon is open to data science students as well as experts with diverse backgrounds and data science skills. Everyone will have a chance to experiment with new types of data presented in 6 real-world cases, covering domains such as communication engineering, the Internet of Things and environmental issues. Internationally renowned mentors will help participants develop their data models.

This year, for the first time, the Global Datathon will be hosted not only online but also at several physical locations in Asia and Europe. A dedicated platform will let participants collaborate and exchange ideas before, during and after the event, and the finalists’ presentations will be streamed online.

Get ready to choose a big data challenge and book your place online before September 24th to take part in the second Global Datathon!

