Learn about Hercule: a platform developed by Ontotext to help journalists detect emerging news topics, check their veracity, track events, etc.
Over the past year, one issue has been dominating both the social media and the political realms: fake news. Twitter and Facebook users have been showered with all kinds of sensational reports, images and links claiming one breaking news scoop after the other.
The rapid, large-scale increase in fake news, often shared and spread via social media, has the potential to alter people’s perceptions or confirm their existing beliefs on one issue or another. Sensational fake news posts have often trumped credible, well-sourced and critical coverage of various topics, as users are often lost among the heaps of conflicting and contradictory information they are exposed to every day.
At the end of March, researchers at Oxford University published a report which found that in the ten days leading up to the U.S. presidential election, Twitter users based in the battleground state of Michigan shared as many links to fake news (or, as the researchers called it, ‘junk news’) as they shared links to news by reputable professional news organizations.
“The number of links to junk news alone is roughly equivalent to the number of links to professionally researched journalism,” the Oxford University researchers said in their report that had examined the Twitter behavior of nearly 140,000 Michigan-based potential voters.
Prior to the election, polls had predicted a win for the Democratic candidate Hillary Clinton in the state. Donald Trump won the state by the narrowest of margins to become the first Republican candidate to win Michigan since 1988.
Before and after the U.S. election, and a few months earlier at the time of Brexit, the issue of fake news was dominant in the global and regional digital and political debates. It still is.
People who want to look with a critical eye at the information they come across are searching for ways to detect which news is fake, which sources to trust, and to what extent they can trust the original source of the news.
Before ‘fake news’ became the latest buzzword, we at Ontotext had started working in January 2014 alongside eight other partners on an EU-funded project aimed at creating a computational framework for the fast, automatic discovery and verification of information at scale.
Project PHEME – Computing Veracity Across Media, Languages, and Social Networks, which launched in January 2014 and finished on March 31, 2017, focused on modeling, identifying, and verifying phemes (internet memes with added truthfulness or deception), as they spread across media, languages, and social networks. Aptly named after the goddess of rumors and fame in Greek mythology, Pheme, the project was aimed at developing a smart way to alert users to rumors and misinformation.
As a partner in the PHEME project, one of Ontotext’s main contributions has been its semantic graph database GraphDB, which served as a semantic repository with scalable lightweight reasoning. Datasets from the Linked Open Data (LOD) cloud such as FactForge, DBpedia, OpenCyc, and Linked Life Data were used as the factual knowledge sources.
Another major contribution has been to develop an algorithm for rumor classification, which tells users whether a tweet is a rumor or not. However, it is important to point out that the opposite of a rumor is not a fact. Some tweets, such as “I had a beer in the park”, are not rumors, but neither are they really claims that need to be proven. By classifying tweets as rumor or not rumor with a probability score (on a scale from 1 to 10), the algorithm aims to identify the ones that are intended to spread a rumor.
Using its extensive experience in text analytics, Ontotext has also provided concept extraction and enrichment. By interlinking people, organizations and locations from unchecked social-media streams to the rich contextual information about them in our knowledge base, we know who these people are, which organizations they are associated with, and where they are based. This helps us recognize whether something is a rumor or not.
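To make the idea of enrichment concrete, here is a minimal sketch in Python. It is not the PHEME or Ontotext implementation: the tiny in-memory knowledge base, its field names and the `enrich` function are all hypothetical, and plain substring matching stands in for real named-entity recognition and resolution.

```python
# Hypothetical mini knowledge base (an assumption for illustration only).
# A real system would draw on Linked Open Data rather than a hand-written dict.
KNOWLEDGE_BASE = {
    "hillary clinton": {"type": "Person", "affiliation": "Democratic Party"},
    "oxford university": {"type": "Organization", "location": "Oxford"},
    "michigan": {"type": "Location", "country": "United States"},
}


def enrich(tweet):
    """Return the knowledge-base entries whose names occur in the tweet.

    Matching is a case-insensitive substring lookup, which is far cruder
    than real entity recognition but shows the shape of the result.
    """
    text = tweet.lower()
    return {name: facts for name, facts in KNOWLEDGE_BASE.items() if name in text}


mentions = enrich("Hillary Clinton to speak in Michigan tomorrow")
for name, facts in mentions.items():
    print(name, facts)
```

Each matched name carries its contextual facts along with it, which is the kind of background a journalist would otherwise have to look up by hand.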
As part of the project, Ontotext and partners also developed an open-source digital journalism prototype that aims to harness the systems being developed within PHEME and present them in a dashboard geared specifically at journalists looking to quickly locate and verify information online.
In addition, Ontotext and its project partners developed the fact-checking assistant Hercule, a web-based portal that aims to help journalists with the daily tasks of sorting and retrieving newsworthy pieces of information from Twitter. With the help of the PHEME named-entity recognition and resolution tools (linking objects to the respective concept in Linked Open Data), and the application of high-confidence classifiers for rumor detection, veracity calculation and “check-worthiness” calculation, each tweet is enriched with new features and concepts.
The tweets are grouped into stories based on their similarity. Each story can be visualized together with the concepts mentioned in the individual tweets (for example, names of persons, organizations, and locations) and the news articles related to it.
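The article does not say which similarity measure Hercule uses, so the following Python sketch is only an illustration of grouping by similarity. It assumes Jaccard similarity over word sets and a simple greedy pass; the function names and the 0.3 threshold are hypothetical choices.

```python
import re


def tokens(text):
    """Lowercase word tokens of a tweet, as a set."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_into_stories(tweets, threshold=0.3):
    """Greedy single-pass grouping (an assumed technique, not Hercule's).

    Each tweet joins the first story whose seed tweet is similar enough;
    otherwise it starts a new story of its own.
    """
    stories = []  # list of (seed_tokens, member_tweets)
    for tweet in tweets:
        toks = tokens(tweet)
        for seed, members in stories:
            if jaccard(seed, toks) >= threshold:
                members.append(tweet)
                break
        else:
            stories.append((toks, [tweet]))
    return [members for _, members in stories]


tweets = [
    "Explosion reported near the central station",
    "Reports of an explosion near central station",
    "I had a beer in the park",
]
stories = group_into_stories(tweets)
for story in stories:
    print(story)
```

With this sample input, the two tweets about the explosion end up in one story and the unrelated tweet forms a story of its own. A production system would likely use richer features, such as the extracted concepts, rather than raw word overlap.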
These features provide a greater context to the story, thereby facilitating the verification of claims on the social network. Users can explore each related concept and news article with a single click. In this way, they can quickly get the information they need in order to fact-check the contents of a tweet. They can also view trends in the information about different concepts to see how often the media has mentioned them.
So, semantic technologies and machine learning algorithms help social media users and journalists to quickly check to what extent they can trust a post on social networks or a piece of news. These technologies help with analysis and fact-checking: they tell us how a piece of shared content has been referred to, to what extent the information in it has been verified, and whether the same or similar information can be found in highly reliable, reputable sources.
Technologies save users a lot of the time that they would otherwise spend fact-checking a post in the sea of newly generated content. If it takes you more than 5 minutes to google keywords from an unconfirmed report in order to pull some stories related to the people or organizations mentioned in the post you are fact-checking, chances are that you’ll give up trying because it’s frustrating and takes a lot of your valuable time. Semantic technology comes to the rescue here with referencing the story, providing more information on the topic, and pointing to relevant related stories.
On top of sowing fear and confusion, the widespread sharing of fake news lowers people’s trust in the media, and only by finding a way to fight it can the media hope to regain their credibility with users.
Although there is no universal remedy for eradicating the spread of fake news once and for all, semantic technology and Ontotext help the fight against its proliferation in an increasingly divisive society by trying to foster a culture of looking critically at any news report or sensational post on social media. Helping people stay informed and sharpen their critical judgment skills is the beginning of the pursuit of truth in our post-truth world.