Read about how knowledge graphs help prepare data for analysis tasks and enable contextual awareness and smart search of data by virtue of formal semantics.
A consistent electoral process is the foundation of every modern democracy. Public trust in this process and its fairness is among the most important ingredients of the social capital that makes a democracy function effectively. For these reasons, publishing election-related data is obligatory for all EU member states under Directive 2003/98/EC on the re-use of public sector information, and the Bulgarian Central Elections Committee (CEC) has released a complete export of every election database since 2011.
The data shared by the CEC represents the election process at the most granular level. It is a collection of the digital versions of the polling station protocols. Each protocol contains the results from the counting of a single ballot box. It lists all vote counts and who the votes were for, as well as various data points about the particular polling station, such as its geography, the election cycle it concerns, the number of voters and the number of invalid ballots cast. In parallel, the CEC shares the names and numbers of the candidates and their parties.
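To make the shape of a single protocol more concrete, here is a minimal sketch in Python of one protocol record and of a first level of aggregation over it. All class and field names are hypothetical and chosen only for illustration; they do not reflect the CEC export format or the knowledge graph's actual data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Protocol:
    """One polling station protocol (all field names are hypothetical)."""
    station_id: str
    election: str            # identifier of the election cycle
    municipality: str        # one level of the station's geography
    registered_voters: int
    invalid_ballots: int
    votes: Dict[str, int] = field(default_factory=dict)  # party -> vote count


def totals_by_municipality(protocols: Iterable[Protocol]) -> Dict[str, Counter]:
    """Sum the per-party vote counts of all protocols in each municipality."""
    totals: Dict[str, Counter] = {}
    for p in protocols:
        # Counter.update adds counts instead of overwriting them.
        totals.setdefault(p.municipality, Counter()).update(p.votes)
    return totals
```

Grouping by municipality is just one possible aggregation level; the same pattern would apply to any other territorial division a protocol can be assigned to.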
While we, as citizens, may be accustomed to hearing about the end results of a given election and (hopefully) to actually voting, we know little about the actual complexity of the process on a national level. Without appropriate data publishing and exploration platforms, it is too difficult to comprehend the different levels of aggregation the votes go through or how they map onto the administrative territorial divisions of the country. Unable to correlate their votes with the final results and to understand the data, some people may perceive a lack of transparency, which erodes trust in the fairness of the elections.
Although the data is comprehensive, it is difficult to process due to various reasons:
Furthermore, the export format and the process change slightly from election to election, which makes comparing data across elections almost impossible without substantial data wrangling and ad-hoc cleaning and matching.
For these reasons, we have applied semantic data integration and produced a coherent knowledge graph covering all Bulgarian elections from 2013 to the present day. These are the counts of the main entities in the graph:
These diagrams illustrate how the data is structured:
The data in the knowledge graph is harmonized along the most important dimensions:
The data is publicly available as a SPARQL endpoint at https://elections.ontotext.com/. On the back end, the data is hosted in Ontotext’s GraphDB engine. One can explore the data in GraphDB Workbench using its search, graph traversal and visualization facilities. A set of sample queries is provided to help users understand the data model and to shorten the learning curve.
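For readers who prefer to query the endpoint programmatically, the following is a minimal sketch in Python that uses only the standard library and the standard SPARQL 1.1 Protocol. It makes several assumptions: the function names are ours, the exact repository URL is not given in this article and has to be passed in by the caller, and the query is a generic "instances per class" count that does not rely on the graph's actual vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Generic query: count instances per class, without assuming the data model.
CLASS_COUNTS_QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint_url, query):
    """Build a SPARQL 1.1 Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def parse_bindings(results_json):
    """Flatten a SPARQL JSON result set into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint_url, query):
    """Send the query to the given endpoint URL (assumed, not documented here)."""
    request = build_request(endpoint_url, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(repository_url, CLASS_COUNTS_QUERY)` with the repository URL shown in GraphDB Workbench should return one dictionary per class. Note that all values arrive as strings, so counts need an explicit `int(...)` conversion.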
In the future, besides adding data about new election cycles as it becomes available, we plan to add even finer-grained geographical information for the polling stations, to deduplicate the individual candidates, and to match them to the LOD cloud and upload them there.
As we are not political experts, it is not our ambition to interpret the data and draw conclusions. However, by providing 5-star Linked Open Elections Data, we want this resource to become a go-to source of data about the electoral activity in Bulgaria and to ultimately become a tool that strengthens the public’s trust in the democratic process.
See for yourself!