Read about the first Datathon in Central and Eastern Europe and the case Ontotext's winning team worked on.
What better thing to do on a rainy working Saturday in Sofia than a hackathon? The teams are ready, the ideas are ambitious, and pizza is hardly enough to fuel the high-calorie demands of thinking outside the box.
Needless to say, we learned a lot. We teamed up with colleagues we had not worked closely with before. We were goal-oriented and pragmatic and, in a day and a half, we managed to deliver meaningful results, convincing the audience that the prototypes we developed had real value.
And maybe the most valuable takeaway was that Ontotext technologies are not rigid or limited to a set of commercial products, but can be exploited in new, innovative ways. Expect to see some of those put into practice soon!
Traditional market analysis is a tedious process that is customized and carried out manually for each beneficiary. Currently, keyword analysis of competitors’ websites determines the key concepts around which the competition is trying to position itself in the online market. Research analysts in a company gather market information from a variety of sources, including government agencies and primary and secondary research.
Semantic technology can automate much of the market research workflow. Given a set of resources identified as relevant to the company, our tools can extract the entities that denote the big players in the industry, their relations, their dynamics in the news, etc. This was a non-technical project, but it provided a good use case for a platform like FactForge-News, used in the Today’s News project (see below).
So far, social media analysis has been secondary to traditional publishing channels in Ontotext’s focus. We are involved in the Pheme project, which lays the groundwork for future development of social media analysis. This small toy project analyzed our own Twitter accounts and revealed the main topics that Ontotext cares and talks about.
Every year, sports authorities update and publish lists of substances banned for doping. Ontotext’s Linked Life Data (LLD) can be used to automatically recognize banned substances in text, be it a list of ingredients, a list of active substances in a drug, etc.
The team augmented LLD with additional data from PubChem and successfully implemented such a checker during the hackathon. As a test, the ITF prohibited drug list was normalized to 11,955 distinct compounds with a total of 97,330 literals. The implemented pipeline was able to identify both meldonium and its trade name mildronate, a substance recently added to the ITF banned drugs list and the reason Maria Sharapova was suspended.
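The checker itself is not described in detail, but the core idea of recognizing any of a compound’s names in free text can be illustrated as a dictionary lookup. The following is a minimal sketch, assuming a hypothetical in-memory index that maps each lower-cased literal to a compound identifier. The identifiers `COMPOUND_1` and `COMPOUND_2` and the function names `build_index` and `find_banned` are placeholders, not part of LLD or PubChem, and the real pipeline worked against the full normalized list.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of a literal -> compound index. The real list was
# normalized from LLD + PubChem; these IDs are illustrative placeholders.
SYNONYMS = {
    "meldonium": "COMPOUND_1",
    "Mildronate": "COMPOUND_1",
    "erythropoietin": "COMPOUND_2",
    "recombinant human erythropoietin": "COMPOUND_2",
}


def build_index(synonyms):
    """Lower-case every literal so that matching is case-insensitive."""
    return {name.lower(): cid for name, cid in synonyms.items()}


def find_banned(text, index):
    """Return {compound_id: sorted matched literals} for banned names in text."""
    # Naive tokenizer: lower-cased runs of letters, digits and hyphens.
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    # The longest literal (in tokens) bounds the n-gram window we need to try.
    max_len = max((len(name.split()) for name in index), default=0)
    hits = defaultdict(set)
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            phrase = " ".join(tokens[i:i + n])
            if phrase in index:
                hits[index[phrase]].add(phrase)
    return {cid: sorted(names) for cid, names in hits.items()}


if __name__ == "__main__":
    index = build_index(SYNONYMS)
    label = "Ingredients: water, Mildronate 250 mg, glucose."
    print(find_banned(label, index))  # {'COMPOUND_1': ['mildronate']}
```

Because every synonym points to the same identifier, a mention of the trade name is reported under the same compound as meldonium. A production version would need smarter tokenization for chemical names that contain commas, brackets or digits.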
Word vectors are a promising first step of deep learning into Natural Language Processing. We want to take Ontotext’s text analysis tools to a higher level of performance by implementing word vector features and deep neural networks for document classification (including sentiment analysis, topic classification and keyword assignment), Named Entity Recognition, document clustering, entity clustering, topic modeling, content recommendation, etc.
At the end of the day, we improved the current multi-label classification models by 4%! And they were pretty good to start with, reaching more than 80% F1 on a challenging use case that we developed for The IET. We also managed to cluster named entities occurring in interviews with Holocaust survivors (we are part of the EHRI project), which should give us a starting point for discovering rare spellings of locations, people and organizations involved in the Holocaust.
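The post does not say how F1 was averaged across labels. As a reference point, here is a minimal sketch of micro-averaged F1 for multi-label classification, assuming each document’s gold and predicted labels are given as Python sets; the function name and the label names in the example are made up for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 for multi-label classification.

    gold, predicted: equal-length lists of label sets, one pair per document.
    True positives, false positives and false negatives are pooled over all
    documents before precision and recall are computed.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels correctly assigned
        fp += len(p - g)  # labels assigned but absent from the gold standard
        fn += len(g - p)  # gold labels the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [{"power", "energy"}, {"transport"}]
    predicted = [{"power"}, {"transport", "energy"}]
    print(round(micro_f1(gold, predicted), 3))  # 0.667
```

Macro-averaging (computing F1 per label, then averaging) gives rare labels more weight, so whether a figure like "80% F1" is micro- or macro-averaged matters when comparing models.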
Ontotext’s news showcase NOW.ontotext.com has been accumulating semantically annotated news articles for more than a year, linking them to a rich collection of Linked Open Data such as DBpedia and GeoNames. Think of 120 thousand news articles, linked with 7 million tags to a Knowledge Graph of 500 million statements describing more than 7 million entities and resources.
On average, each news article carries 70 annotations with identifiers of specific concepts in the Knowledge Graph. The integrated resources reveal subtle and rich information, provided you know SPARQL well enough! And if you don’t, it’s definitely worth learning!
With a SPARQL query and some normalization of frequencies using z-scores, one can find the most relevant entity mentions of the day in the news. Here’s what was going on in the news between Feb 14th and Feb 20th, 2016 (the picture below). With a proper UI, all these entities would be clickable, leading to news and DBpedia articles.
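The exact query and normalization are not given, so the following is only one plausible reading. It assumes a SPARQL aggregation has already produced daily mention counts per entity over a window of days, and scores each entity by how far its count on the target day lies from its own mean over the other days, measured in standard deviations. The function name `daily_zscores`, the entity names and the counts are hypothetical.

```python
from statistics import mean, pstdev


def daily_zscores(counts, days, target):
    """Rank entities by the z-score of their mention count on `target`.

    counts: {entity: {day: mention_count}}, e.g. the result of a SPARQL
            GROUP BY over annotated news (days with no mentions may be absent).
    days:   every day in the window, including `target`.
    Returns (entity, z) pairs sorted from most to least unusual.
    """
    baseline = [d for d in days if d != target]
    if not baseline:
        raise ValueError("need at least one day besides the target day")
    scores = []
    for entity, per_day in counts.items():
        history = [per_day.get(d, 0) for d in baseline]  # missing day = 0
        sigma = pstdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, skip the entity
        z = (per_day.get(target, 0) - mean(history)) / sigma
        scores.append((entity, z))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    days = ["2016-02-17", "2016-02-18", "2016-02-19", "2016-02-20"]
    counts = {  # hypothetical mention counts
        "EntityA": {"2016-02-17": 10, "2016-02-18": 12, "2016-02-19": 11, "2016-02-20": 40},
        "EntityB": {"2016-02-17": 5, "2016-02-18": 5, "2016-02-19": 6, "2016-02-20": 6},
        "EntityC": {"2016-02-17": 3, "2016-02-18": 3, "2016-02-19": 3, "2016-02-20": 3},
    }
    for entity, z in daily_zscores(counts, days, "2016-02-20"):
        print(entity, round(z, 2))
```

Skipping entities with a flat history is a simplification: an entity that appears suddenly after several days without mentions gets no score at all, so a real ranking would need smoothing or a minimum standard deviation.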
This platform will be released for public access and experimentation during the March 24 webinar “Boost Your Data Analytics with Open Data and Public News Content”.
We have been writing SPARQL queries at Ontotext for a long time. It’s about time we analyzed them, too. We love the recursive logic here: let’s data mine our data mining tools!
GraphDB gives us access to log files that record submitted queries and their parse trees. We wanted to visualize the population of queries and to cluster them. So we used a trick: we represented SPARQL statements as sequences of protein amino acids in order to take advantage of the many algorithms for computing phylogenetic trees.
So we had our own population of queries, with their sisters, cousins and distant relatives. On a practical note, this kind of tooling can help our support team analyze the logs of our enterprise clients’ production systems faster and be much more efficient in identifying problematic query patterns.
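The encoding used in the project is not spelled out. A minimal sketch of the general idea, assuming a hypothetical mapping from 20 SPARQL keywords to the 20 standard one-letter amino-acid codes and using plain keyword tokens instead of GraphDB’s parse trees, turns each query into a "protein" string and compares queries by edit distance. The resulting FASTA sequences or distance matrix could then be passed to a distance-based tree method such as UPGMA or neighbor joining; that last step is not shown, and the names `encode`, `edit_distance` and `to_fasta` are illustrative.

```python
import re

# Hypothetical encoding: 20 SPARQL keywords -> the 20 standard one-letter
# amino-acid codes, so that every query becomes a "protein" sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
KEYWORDS = ["SELECT", "DISTINCT", "WHERE", "FILTER", "OPTIONAL", "UNION",
            "GROUP", "ORDER", "LIMIT", "OFFSET", "COUNT", "BIND", "VALUES",
            "MINUS", "GRAPH", "SERVICE", "CONSTRUCT", "ASK", "DESCRIBE", "REGEX"]
CODE = dict(zip(KEYWORDS, AMINO_ACIDS))


def encode(query):
    """Map a SPARQL query to a string of amino-acid letters, one per keyword."""
    # Drop IRIs and string literals so words inside them are not mistaken for
    # keywords (naive: this also swallows text between '<' and '>' operators).
    text = re.sub(r'<[^>]*>|"[^"]*"', " ", query)
    letters = []
    for token in re.findall(r"[?$]?[A-Za-z_][\w:-]*", text):
        if token[0] in "?$" or ":" in token:
            continue  # variables and prefixed names carry no structure here
        letter = CODE.get(token.upper())
        if letter:
            letters.append(letter)
    return "".join(letters)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def to_fasta(queries):
    """Write encoded queries in FASTA format for phylogenetics tools."""
    return "\n".join(f">q{i}\n{encode(q)}" for i, q in enumerate(queries))


if __name__ == "__main__":
    queries = [
        "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10",
        "SELECT (COUNT(?s) AS ?n) WHERE { ?s a dbo:Person } GROUP BY ?s",
    ]
    seqs = [encode(q) for q in queries]
    print(to_fasta(queries))
    for i in range(len(seqs)):
        print([edit_distance(seqs[i], seqs[j]) for j in range(len(seqs))])
```

Mapping constructs to amino-acid letters is what makes existing sequence tooling applicable: any method that accepts protein sequences or a pairwise distance matrix can then build a "family tree" of queries.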
Ontotext’s semantic analysis and search technologies are in the process of diversification. One important direction is multilingual support. We already have solid experience with German, Italian, Dutch, Bulgarian and French.
This project took the necessary steps towards publishing Ontotext’s NOW.ontotext.com semantic news showcase in Bulgarian. So far, it contains news from only one big Bulgarian online publisher (a client of ours) and is not public. Here’s an early preview:
So, the road ahead is full of exciting new opportunities to prove the value of embracing cognitive and semantic technology!
Ready to join us?