Q&As from Our Webinar: Graph Analytics on Company Data and News

March 29, 2018 6 mins. read Milen Yankulov


On March 14, 2018, Atanas Kiryakov, CEO of Ontotext, presented Graph Analytics on Company Data and News. This webinar, now available on-demand, highlighted some of the challenges in analyzing diverse data from multiple sources. Atanas demonstrated how graph analytics on top of Ontotext’s Knowledge Graph was able to provide entity awareness about People, Organizations and Locations (POL), which is part of the solution that Ontotext provides to overcome such challenges.

Ontotext’s Knowledge Graph (loaded with about 2 billion triples in Ontotext’s GraphDB) combines several open data sources. It is mapped to the FIBO ontology and its entities are interlinked with 1 million news articles.



Q & A from the Webinar

During the webinar, Atanas showed the power of cognitive graph analytics to create links between various datasets and lead to knowledge discovery.

Some really interesting technical questions were raised by the audience about the workings of FactForge (Ontotext’s public service for free access to POL data) and some of the intricacies behind knowledge graph analytics:

Q: Can you please say a bit more about FactForge? Is FactForge a proprietary product of Ontotext? Is it a free or a paid-for service product?

A: Yes, FactForge is a proprietary product of Ontotext. It is a public demonstrator that everyone can use for free. Like every public demonstrator, it has some limitations. In this case, the limits apply to the number of requests you can make per second and to the size of the results you can get from each query.

FactForge is also commercially available and you can pay for access to it through our cognitive cloud services. So, yes, you can also get the production version.

Also, we often use selected pieces of FactForge in commercial projects. FactForge provides open data and free news content. When we use it in a specific project, it is often complemented with commercial sources of company data or other data needed to guarantee the minimum requirements for coverage and quality.

Q: Is there a comprehensive list of sources that FactForge links to?

A: Yes, you can see them on the About page of FactForge.

Q: How do you update the knowledge graph in FactForge?

A: We have an automated procedure for ingestion of the updated versions of the different datasets that we use such as DBpedia or GeoNames. But what is more interesting is that we have a constant feed of news that we use for picking up new entities and new relationships. This data provides hints about how we can enrich the knowledge graph. So, it is regularly updated through loading newer versions of all these datasets and through the information we derive from news.

Q: Can FactForge extract new relations from texts?

A: The short answer is yes.

When you do this, you end up with plenty of candidate relationships for which you have relatively low confidence (think of 70%). So you need to figure out which of these relations can be trusted. You need to decide how candidate relationships are consolidated, because one and the same real-world relationship is often expressed in text with plenty of variations. You also need to decide above what level of importance you want to make them “first-class citizens” in the knowledge graph. It requires a bit of filtering, but, yes, we do extract relationships from texts.
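The consolidation and filtering steps described above can be sketched roughly as follows. This is only an illustration, not how FactForge does it: the candidate tuples, the `CANONICAL` mapping, the noisy-OR combination rule and the 0.9 threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical candidate relations extracted from text:
# (subject, surface predicate, object, extractor confidence)
candidates = [
    ("Acme Corp", "acquired", "Widget Ltd", 0.70),
    ("Acme Corp", "bought", "Widget Ltd", 0.65),
    ("Acme Corp", "takes over", "Widget Ltd", 0.60),
    ("Acme Corp", "partnered with", "Globex", 0.55),
]

# Hypothetical mapping from surface variations to one canonical relation.
CANONICAL = {
    "acquired": "acquires",
    "bought": "acquires",
    "takes over": "acquires",
    "partnered with": "partnerOf",
}


def consolidate(candidates, canonical=CANONICAL):
    """Group textual variations of the same relation and combine confidences."""
    grouped = defaultdict(list)
    for subj, pred, obj, conf in candidates:
        key = (subj, canonical.get(pred, pred), obj)
        grouped[key].append(conf)

    combined = {}
    for key, confs in grouped.items():
        # Noisy-OR: chance that at least one mention is right, assuming the
        # mentions are independent (a simplification chosen for this sketch).
        miss = 1.0
        for c in confs:
            miss *= 1.0 - c
        combined[key] = 1.0 - miss
    return combined


def promote(combined, threshold=0.9):
    """Keep only relations confident enough to enter the knowledge graph."""
    return {key: conf for key, conf in combined.items() if conf >= threshold}


if __name__ == "__main__":
    for key, conf in promote(consolidate(candidates)).items():
        print(key, round(conf, 3))
```

With this toy data, three weak mentions of the same acquisition reinforce each other and pass the threshold, while the single partnership mention stays a candidate.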

Q: Which countries are covered in FactForge and what is the language distribution of the news sources?

A: In this demonstrator, we have news with global coverage, but only news in English. We have also deployed this technology for Dutch, German, Russian and Bulgarian; for quite a number of languages, actually. But the demonstrator itself is just in English.

Q: Is it possible to train based on a corpus instead of Linked Open Data?

A: Yes, we are experimenting with combining word embedding techniques with knowledge graph analytics. So, yes, you can train on a corpus and combine modern text-based analytic techniques with Linked Open Data.

Q: How do you decide which value is the identifier when disambiguating entities?

A: Wow. That’s the secret sauce. Basically, we use all the information that we have about the entities in the knowledge graph. For instance, if we have “Paris” as a string in the text, first we check which of the candidates, such as Paris in France, Paris in Texas or Paris Hilton, has a type compatible with this reference. Most of the time, the document context “tells you” whether it is a person, an organization or a location.

If the document context helps us figure out that this is a location, then it cannot be Paris Hilton. After that, we have to figure out whether it is Paris in France, or Paris in Texas, or some other Paris. We do this by comparing the semantic fingerprint of the document with the semantic fingerprint of each of the candidates. What we call a semantic fingerprint is a sort of profile that represents context; one can also call it an embedding. And if this doesn’t help, we consider popularity, importance, etc. So, these are essentially the methods that we have in our arsenal for disambiguation. For specific projects and tasks, we use different combinations.
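The three stages in this answer (type filter, fingerprint comparison, popularity fallback) could be combined along these lines. It is a minimal sketch under stated assumptions: fingerprints are modelled as sparse term-weight dictionaries compared by cosine similarity, and the candidate records, popularity values and `min_margin` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical knowledge graph candidates for the string "Paris".
CANDIDATES = [
    {"id": "Paris_France", "type": "Location", "popularity": 0.9,
     "fingerprint": {"france": 1.0, "seine": 0.8, "louvre": 0.7}},
    {"id": "Paris_Texas", "type": "Location", "popularity": 0.2,
     "fingerprint": {"texas": 1.0, "lamar": 0.8, "county": 0.6}},
    {"id": "Paris_Hilton", "type": "Person", "popularity": 0.7,
     "fingerprint": {"celebrity": 1.0, "television": 0.7, "hotel": 0.6}},
]


def disambiguate(expected_type, doc_fingerprint, candidates, min_margin=0.05):
    # Step 1: keep only candidates whose type fits the document context.
    typed = [c for c in candidates if c["type"] == expected_type]
    if not typed:
        return None

    # Step 2: rank the remaining candidates by fingerprint similarity.
    scored = sorted(
        ((cosine(doc_fingerprint, c["fingerprint"]), c) for c in typed),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if len(scored) == 1 or scored[0][0] - scored[1][0] >= min_margin:
        return scored[0][1]["id"]

    # Step 3: similarity is inconclusive, so fall back to popularity.
    return max(typed, key=lambda c: c["popularity"])["id"]
```

For example, `disambiguate("Location", {"texas": 1.0, "county": 0.5}, CANDIDATES)` picks the Texas candidate on similarity, while a document fingerprint that overlaps with neither location falls through to the more popular one.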

Q: How is the relevance ratio calculated?

A: The baseline is similar to TF-IDF (term frequency-inverse document frequency). One can also augment the relevance score by taking into consideration the similarity between the document fingerprint and the fingerprint of the entity. In other words, this is comparing the information that we have in the knowledge graph about an entity and the information about it in the document. In this way, relevance is based not just on string statistics, but also on deeper knowledge about concepts and entities, which also helps with personalized recommendation.
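As a rough illustration of this idea, the sketch below computes a TF-IDF style baseline for an entity in a document and then boosts it by a fingerprint similarity value. The exact formula used in production is not given in the answer, so the multiplicative boost and the `alpha` weight are assumptions, and the similarity is taken as an input (for instance, a cosine similarity between the two fingerprints).

```python
import math


def tf_idf(mentions, doc_length, docs_with_entity, total_docs):
    """One common TF-IDF variant: relative frequency times log inverse document frequency."""
    if doc_length <= 0 or docs_with_entity <= 0:
        raise ValueError("doc_length and docs_with_entity must be positive")
    tf = mentions / doc_length
    idf = math.log(total_docs / docs_with_entity)
    return tf * idf


def relevance(baseline, fingerprint_similarity, alpha=1.0):
    """Boost the statistical baseline by how well the entity fits the document.

    fingerprint_similarity is expected in [0, 1]; alpha (hypothetical) controls
    how much weight the knowledge graph signal gets.
    """
    return baseline * (1.0 + alpha * fingerprint_similarity)


if __name__ == "__main__":
    # Entity mentioned 5 times in a 100-token document, and it appears
    # in 10 of the 1,000 documents in the collection.
    base = tf_idf(mentions=5, doc_length=100, docs_with_entity=10, total_docs=1000)
    print(relevance(base, fingerprint_similarity=0.0))  # baseline only
    print(relevance(base, fingerprint_similarity=0.8))  # boosted by context fit
```

With a similarity of zero the score reduces to the plain statistical baseline, so the knowledge graph signal can only raise the ranking of entities that actually fit the document context.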

Q: How do you fuse two entities that are the same?

A: Essentially, to find the right match for a company from one source to a company in another source, first we have to find the likely candidates in the second source. This is called pre-selection. Then we evaluate each of these candidates and score them. For example, if one of the companies is registered in the US and the other is registered in Italy, this is strong evidence that they are not the same entity. If one of them is in the phone industry and the other is a bank, again, they are probably not the same entity.
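The two steps described here, pre-selection followed by scoring, might look like this in a simplified form. The record fields, the name normalization, the token-overlap pre-selection and the penalty weights are all assumptions for the sake of the example, not the actual matching rules.

```python
LEGAL_SUFFIXES = {"inc", "inc.", "ltd", "ltd.", "corp", "corp.", "plc", "s.p.a."}


def normalize(name):
    """Lowercase a company name and drop common legal-form suffixes."""
    tokens = (t.strip(",") for t in name.lower().split())
    return {t for t in tokens if t and t not in LEGAL_SUFFIXES}


def preselect(company, other_source):
    """Pre-selection: keep records that share at least one name token."""
    tokens = normalize(company["name"])
    return [c for c in other_source if tokens & normalize(c["name"])]


def score(a, b):
    """Score a candidate pair; mismatched country or industry is evidence against."""
    ta, tb = normalize(a["name"]), normalize(b["name"])
    union = ta | tb
    s = len(ta & tb) / len(union) if union else 0.0  # Jaccard overlap of names
    if a["country"] != b["country"]:
        s -= 0.5  # hypothetical penalty: registered in different countries
    if a["industry"] != b["industry"]:
        s -= 0.3  # hypothetical penalty: different industries
    return s


def best_match(company, other_source, threshold=0.6):
    """Return the best-scoring pre-selected candidate, or None if none is convincing."""
    scored = [(score(company, c), c) for c in preselect(company, other_source)]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None


# Hypothetical records from a second source.
SOURCE_B = [
    {"id": "b1", "name": "Acme Telecom Inc.", "country": "US", "industry": "telecom"},
    {"id": "b2", "name": "Acme Telecom S.p.A.", "country": "IT", "industry": "telecom"},
    {"id": "b3", "name": "Acme Bank", "country": "US", "industry": "banking"},
    {"id": "b4", "name": "Globex Corp", "country": "US", "industry": "telecom"},
]

COMPANY = {"name": "ACME Telecom Corp", "country": "US", "industry": "telecom"}
```

Calling `best_match(COMPANY, SOURCE_B)` pre-selects the three "Acme" records, then penalizes the Italian registration and the bank, leaving the US telecom record as the match.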

The slides from this presentation are available on SlideShare and a recording of the presentation is available on demand.


Marketing Manager at Ontotext

Milen Yankulov has vast experience in both traditional and digital marketing communications. His professional interests are related but not limited to web and news media, semantic search, social channels and all digital disruptions that change the way we communicate and do business.
