Ontotext’s Case for Datathon 2018: ML Prediction of Parent-Subsidiary Relations from News

Monday, January 29, 2018


For the second consecutive year, Ontotext is a sponsor of Datathon, a weekend-long challenge for all data enthusiasts willing to work on real-world business cases in the areas of computer vision, Natural Language Processing (NLP) and Artificial Intelligence (AI). This year, the third edition of the event will be on a larger scale as it will also be held online, making it the first digital Data Hackathon.

Datathon 2018 is organized by the Data Science Society and will take place from February 9th to 11th. You can join online or in person at Sofia University, Bulgaria, where the event will be held. All participants need to register before February 5th.

At this year’s event, Ontotext will participate with ‘Machine learning (ML) prediction of parent-subsidiary relations from news’. The idea for Ontotext’s case study springs from real business cases of our customers, who often have large text collections that need to be searched efficiently. Frequently, they are interested not only in key concepts such as Organizations, People and Locations, but also in the relationships between them. ML methods can learn to extract such relations from text on the basis of already annotated examples.

However, these methods need a large amount of expert annotations, which are expensive. One way to overcome this limitation is to teach AI to “read” both text and Open Data knowledge. Lately, it has become increasingly popular to use facts from open knowledge bases such as Wikipedia/DBpedia to automatically annotate much larger amounts of text than human experts could manage. Then, by applying ML methods and training AI to recognize relations between entities, new relations can be automatically identified, thus “filling the gaps” in open knowledge bases.
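
The automatic annotation step described above can be illustrated with a minimal sketch. It is not Ontotext's actual pipeline: the company names, the `KNOWN_PAIRS` set and the `label_sentences` helper are all hypothetical, and a real system would take entity mentions from an NLP tagger and facts from a knowledge base such as DBpedia. The sketch only shows the core idea, which is to label every ordered pair of organizations in a sentence as positive when the knowledge base lists it as a parent-subsidiary pair.

```python
from itertools import permutations

# Hypothetical knowledge-base facts, stored as (parent, subsidiary) pairs.
# In practice these would be queried from an open knowledge base.
KNOWN_PAIRS = {
    ("AcmeCorp", "AcmeLabs"),
    ("Globex", "Initech"),
}


def label_sentences(sentences):
    """Turn entity-tagged sentences into labelled training examples.

    `sentences` is a list of (text, organizations) tuples, where
    `organizations` lists the organization names found in the text.
    Every ordered pair of organizations becomes one example with
    label 1 if the knowledge base lists it as (parent, subsidiary),
    and 0 otherwise.
    """
    examples = []
    for text, organizations in sentences:
        for parent, subsidiary in permutations(organizations, 2):
            label = 1 if (parent, subsidiary) in KNOWN_PAIRS else 0
            examples.append((text, parent, subsidiary, label))
    return examples


if __name__ == "__main__":
    # Toy input; the sentences are invented for illustration.
    tagged = [
        ("AcmeLabs is a unit of AcmeCorp.", ["AcmeLabs", "AcmeCorp"]),
        ("Globex met with AcmeCorp.", ["Globex", "AcmeCorp"]),
    ]
    for example in label_sentences(tagged):
        print(example)
```

The labelled examples could then be fed to any text classifier. Labels produced this way are noisy, since a sentence may mention two related companies without stating the relation, and a pair missing from the knowledge base is marked negative even when the relation is real.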

Ontotext’s case will challenge a team of data enthusiasts to use such ML methods on a text that has been automatically annotated with concepts and relations from Open Data and infer relations that are missing or wrong in that data.

This year’s experienced mentors will be Ontotext’s Dr. Laura Tolosi-Halacheva, a senior data scientist, and Andrey Tagarev, a software developer (both members of the Research and Innovations team). After the event, Ontotext will give selected participants vouchers for free training video materials adapted from the one-day “Semantic technology Proof-of-Concept” live training.

The Datathon will award the teams that come up with the most precise, creative and elegant solutions to a data challenge.

If you want to participate, do not forget to register before the 5th of February.

For more information, contact Doug Kimball, Chief Marketing Officer at Ontotext.