Ontotext’s Perspective on an Energy Knowledge Graph

In this post, you will learn what existing energy data exchange standards currently lack, how they can be advanced with semantic technology, and what happens when some energy data is integrated into a knowledge graph.

January 7, 2022 · 9 mins. read · Vladimir Alexiev, Gergana Petkova

In our post Electrical Standards, Smart Grids and Your Air Conditioner, we talked about why accessing timely, relevant and reliable information is mission-critical to the European electricity grid and the Single energy market. We looked at the major challenges of electrical grids, the main standards guiding electricity data exchange and how these challenges could be addressed by applying Linked Data principles and knowledge graph (KG) technology to electricity data.

In the last few years, the EU has been actively promoting the development of Data Spaces for industrial data exchange in key funding programs such as Horizon Europe and the brand new Digital Europe. This work is crucial for increasing industry digitalization, efficiency and competitiveness, counteracting the economic downturn from the COVID crisis, and realizing the European Green Deal by 2050. But for any serious digitalization to happen, we need to be able to apply AI and ML in every industry sector, and for that we need data. Simply put, AI without data is like a human without oxygen.


In compliance with the EU market transparency regulation (Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets), ENTSO-E is doing a great job of collecting electricity market data (generation, transmission, consumption, balancing, congestion, outages, etc.) in its Transparency Platform. For example, Generation Forecast – Day Ahead must be submitted at the latest the previous day (D-1) at 18:00 Brussels time according to regulation articles 14.1.c and 14.2.c.

However, there is a huge difference between a collection of disconnected data points and an integrated, interoperable, dynamic database that is managed and accessed via a live knowledge graph.

RDF XML Messages Vs Knowledge Graphs

As we discussed in our previous post, using persistent identifiers for electricity data is a necessity for building an Energy Knowledge Graph, and in this respect the Common Information Model (CIM) still has a long way to go to fulfill its semantic potential. In particular, we believe that CIM should leverage:

  • Linked Data principles, i.e., basing data sharing on distributed semantic knowledge graphs where electricity-related data is available on demand and is as up to date as its master source/originator organization keeps it, rather than mere RDF XML message exchange that may become obsolete the moment a message is received;
  • Persistent identifiers and a standardized namespace for all energy resources and market participants, rather than blank nodes or temporary Globally Unique Identifiers (GUIDs). CIM offers an object registry mechanism to express the mapping between different coding schemes used at national, regional or pan-European level, but this mapping is not available globally;
  • Standard RDF serialization formats (in particular JSON-LD), rather than a customized RDF XML format.
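
To make the last two points more concrete, here is a minimal Python sketch of what a JSON-LD description with a persistent, EIC-based identifier might look like. The namespace https://example.org/energy/, the ex: vocabulary terms and the helper names are assumptions made for illustration; they are not defined by CIM or ENTSO-E. The EIC codes are the Kozloduy ones used in the example later in this post.

```python
import json
from typing import Optional

# Hypothetical namespace: a real deployment would need a standardized, resolvable one.
BASE = "https://example.org/energy/eic/"


def eic_iri(eic_code: str) -> str:
    """Build a persistent IRI from an Energy Identification Code (EIC)."""
    return BASE + eic_code


def describe_resource(eic_code: str, name: str,
                      parent_eic: Optional[str] = None) -> dict:
    """Return a small JSON-LD document for one energy resource."""
    doc = {
        # The "ex" vocabulary is an assumption for this sketch.
        "@context": {"ex": "https://example.org/energy/vocab#"},
        "@id": eic_iri(eic_code),
        "ex:eic": eic_code,
        "ex:name": name,
    }
    if parent_eic is not None:
        # A link to another resource is an IRI reference, not a blank node.
        doc["ex:parentResource"] = {"@id": eic_iri(parent_eic)}
    return doc


unit = describe_resource(
    "32W001100100017L",
    "NPP KOZLODUY UNIT 6 GEN 10",
    parent_eic="32W001100100217D",
)
print(json.dumps(unit, indent=2))
```

Because both the resource and its parent are named by stable IRIs, any other dataset can link to them without an out-of-band mapping table.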

Ontotext has recently won cascade funding under the EU project INTERRFACE, and we have started the Transparency Energy Knowledge Graph project, which creates a KG from ENTSO-E’s Transparency data. As part of the proposal writing, we made a small Transparency Energy KG (see the diagram of the architecture below). We are continuing work on the project and will have a final version in the middle of 2022.

Semantic graph databases are excellent for representing “master data” in any domain as they can integrate heterogeneous data from many sources and can make links between datasets. They also focus on the relationships between entities and can infer new knowledge from existing information. That is why we have used GraphDB, Ontotext Platform and our significant expertise in semantic data integration to show how we can improve the quality of ENTSO-E Transparency data and develop flexible analytics by leveraging the knowledge graph approach.

Let’s take a closer look.

Ontotext’s Transparency Energy Knowledge Graph

At this point, we have converted a small part of ENTSO-E Transparency data (4 out of 85 datasets) and have integrated it into our Transparency Energy Knowledge Graph. This part includes the Energy Identification Code (EIC) file, lookup codes, a knowledge base of data items, and configured Production Units (i.e. generation capacity).

Below is a diagram that provides an example of this data. It shows Bulgaria as biddingZone and controlArea, ESO as Bulgaria’s Transmission System Operator (TSO) and providerParticipant, the NPP KOZLODUY company as responsibleParticipant, the Kozloduy NPP nuclear power station (a Production Unit), G9 and G10 (its two Generation Units), highVoltageLimit and nominalP of these units, assetType (Nuclear fuel), CodeValues and CodeLists.

As you can see, pieces of data coming from different sources are depicted in different colors and marked with different letters in a circle:

  • In red (E) is baseline EIC data that describes market players and major assets (power plants, generation units, transmission lines, etc.):
    • “32XNPP-KOZLODUY2” represents NPP Kozloduy, the company responsible for Bulgaria’s nuclear power plant (NPP), with EU VAT BG106513772, which is a Balance Responsible Party.
    • “32W001100100217D” represents NPP KOZLODUY, which is a nuclear power plant.
    • “32W001100100017L” represents NPP KOZLODUY UNIT 6 GEN 10, which is a nuclear power reactor.
  • In light-blue (P) is the PRODUCTION_UNIT data: Kozloduy is a nuclear power station with two reactors (generation units). It has nominal power (production capacity) of 2000 MW, each unit having 1000 MW.
  • In green (C) are code lists, which show, for example, that KVT/kV is the symbol for kilovolts, MAW/MW is the symbol for megawatts, and both are part of the code list for symbols of units of measure.
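
The example data described above can be written down as plain subject-predicate-object triples. The following Python sketch is only an illustration: the property names follow the labels in the diagram, but the link directions and the short node names are assumptions, and a real knowledge graph would use IRIs in a triple store such as GraphDB rather than a Python list.

```python
# Triples as (subject, predicate, object). Property names follow the diagram's
# labels; the link directions are assumptions made for this sketch.
TRIPLES = [
    ("Kozloduy NPP", "type", "ProductionUnit"),
    ("Kozloduy NPP", "biddingZone", "Bulgaria"),
    ("Kozloduy NPP", "controlArea", "Bulgaria"),
    ("Kozloduy NPP", "providerParticipant", "ESO"),
    ("Kozloduy NPP", "responsibleParticipant", "NPP KOZLODUY"),
    ("Kozloduy NPP", "assetType", "Nuclear fuel"),
    ("Kozloduy NPP", "nominalP", 2000),  # MW
    ("Kozloduy NPP", "generatingResource", "G9"),
    ("Kozloduy NPP", "generatingResource", "G10"),
    ("G9", "type", "GenerationUnit"),
    ("G9", "nominalP", 1000),  # MW
    ("G10", "type", "GenerationUnit"),
    ("G10", "nominalP", 1000),  # MW
]


def objects(subject, predicate):
    """Return all objects linked from `subject` via `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


plant = "Kozloduy NPP"
units = objects(plant, "generatingResource")
unit_capacity = sum(objects(u, "nominalP")[0] for u in units)
plant_capacity = objects(plant, "nominalP")[0]

# The plant's nominal power should equal the sum of its generation units.
print(units, unit_capacity, plant_capacity)
```

Once the data is linked like this, a simple sanity check such as comparing the plant capacity with the sum of its units becomes a short traversal.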

Spotting Data Consistency Issues

We have integrated only a small part of the Transparency data, but have already found various data quality issues. This is possible because semantically interlinking different types of data from different sources lets us look at the bigger picture and easily spot problems in the data.

Now, if you go back to the diagram above, you can see a data inconsistency. Namely, the parentResource link of NPP Kozloduy G10 doesn’t point to NPP Kozloduy but to TPP MARITSA EAST 2. Although the blue (PRODUCTION_UNIT) file correctly shows the parent to be NPP Kozloduy (the generatingResource link in inverse direction), the red (EIC) file has a mistake in the field eICParent_MarketDocument.mRID:
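
A cross-source check of this kind can be sketched in a few lines of Python. The two dictionaries below stand in for the EIC and PRODUCTION_UNIT files; their shape, the function name parent_conflicts and the G9 entry in the EIC data are assumptions for illustration, while the G10 values are the ones described above.

```python
# Parent of each generation unit according to the EIC file
# (field eICParent_MarketDocument.mRID). The G9 entry is an assumption.
eic_parent = {
    "NPP Kozloduy G9": "NPP Kozloduy",
    "NPP Kozloduy G10": "TPP MARITSA EAST 2",
}

# Generation units of each production unit according to PRODUCTION_UNIT.
production_units = {
    "NPP Kozloduy": ["NPP Kozloduy G9", "NPP Kozloduy G10"],
}


def parent_conflicts(eic_parent, production_units):
    """List (unit, parent per EIC, parent per PRODUCTION_UNIT) disagreements."""
    conflicts = []
    for plant, units in production_units.items():
        for unit in units:
            claimed = eic_parent.get(unit)
            # Units without an EIC parent are skipped, not reported.
            if claimed is not None and claimed != plant:
                conflicts.append((unit, claimed, plant))
    return conflicts


for unit, per_eic, per_pu in parent_conflicts(eic_parent, production_units):
    print(f"{unit}: EIC says {per_eic}, PRODUCTION_UNIT says {per_pu}")
```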

Another example of ENTSO-E Transparency data inconsistency is that not all GenerationUnits are in the same country as their parent ProductionUnit, as you can see from the following SPARQL query:

which returns one data inconsistency:

Keep in mind that this query doesn’t check for missing country codes, of which there are plenty.
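
The logic of such a check can be approximated in plain Python. This is not the project's SPARQL query: the record shapes, the function name and all sample values (PU-1, GU-1a and so on) are hypothetical and do not reproduce real Transparency data. Like the query described above, the sketch skips units with a missing country code instead of reporting them.

```python
from typing import Dict, List, Optional, Tuple


def country_mismatches(
    unit_parent: Dict[str, str],
    country: Dict[str, Optional[str]],
) -> List[Tuple[str, str, str, str]]:
    """Find generation units whose country differs from their parent's."""
    mismatches = []
    for unit, parent in sorted(unit_parent.items()):
        unit_country = country.get(unit)
        parent_country = country.get(parent)
        # Missing country codes are skipped, not flagged.
        if unit_country is None or parent_country is None:
            continue
        if unit_country != parent_country:
            mismatches.append((unit, unit_country, parent, parent_country))
    return mismatches


# Hypothetical sample data: generation unit -> parent production unit.
unit_parent = {"GU-1a": "PU-1", "GU-1b": "PU-1", "GU-2a": "PU-2"}
# Hypothetical country codes; None marks a missing code.
country = {"PU-1": "BG", "GU-1a": "BG", "GU-1b": "RO", "PU-2": "BG", "GU-2a": None}

print(country_mismatches(unit_parent, country))
```

A separate check for missing country codes would be a natural addition, since those records pass silently here.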

There are many other examples of ENTSO-E Transparency data inconsistencies, but we’ll mention only a few more:

  • Some EIC resources are described multiple times and entries referring to the same resource have different descriptions;
  • Some resources have a null or invalid “function”, which is a key field describing the nature and market role of the resource;
  • Some resource functions have variations in spelling (e.g. “Power Unit” vs “Power Plant”);
  • Some VAT numbers of market participants are syntactically invalid or expired.
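
Several of the checks listed above are simple enough to sketch in Python. Everything here is illustrative: the record layout, the EIC-A/B/C identifiers and the function names are hypothetical, and the VAT pattern is a deliberately loose assumption (two letters followed by 2 to 12 letters or digits), not the official per-country rules. Detecting expired VAT numbers would require an external lookup and is not attempted.

```python
import re
from collections import defaultdict

# Loose syntactic pattern, assumed for this sketch only.
VAT_PATTERN = re.compile(r"^[A-Z]{2}[0-9A-Z]{2,12}$")

# Hypothetical records; the first VAT value is the well-formed one quoted earlier.
records = [
    {"eic": "EIC-A", "description": "Plant A", "function": "Power Plant", "vat": "BG106513772"},
    {"eic": "EIC-A", "description": "Plant A (old name)", "function": "Power Plant", "vat": "BG106513772"},
    {"eic": "EIC-B", "description": "Unit B", "function": None, "vat": None},
    {"eic": "EIC-C", "description": "Party C", "function": "Balance Responsible Party", "vat": "12-34"},
]


def conflicting_descriptions(records):
    """EIC codes that appear with more than one distinct description."""
    seen = defaultdict(set)
    for record in records:
        seen[record["eic"]].add(record["description"])
    return {eic: sorted(d) for eic, d in seen.items() if len(d) > 1}


def missing_function(records):
    """EIC codes whose 'function' field is null or empty."""
    return [r["eic"] for r in records if not r.get("function")]


def invalid_vat(records):
    """EIC codes whose VAT number is present but fails the loose pattern."""
    return [r["eic"] for r in records
            if r.get("vat") and not VAT_PATTERN.match(r["vat"])]


print(conflicting_descriptions(records))
print(missing_function(records))
print(invalid_vat(records))
```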

And this is what came to light from integrating only a tiny bit of ENTSO-E Transparency data in a knowledge graph. As part of our work on the Transparency EKG project, we will integrate and interlink more data in such a semantic model, add more data quality checks and create a data quality dashboard. Spotting errors, inconsistencies and other problems will then be much more straightforward. This could lead to tightening the Transparency data regulations and their implementation, improving the quality of collected data significantly, and providing the foundation for a better Energy Knowledge Graph in the future.

Semantically Integrated Data for Hybrid Access and Advanced Analytics

Another important point is that because ENTSO-E Transparency Platform offers data as disconnected CSV or XML messages, it is next to impossible to get a complete picture of the current state of the energy system. For a holistic view, one needs to fetch the latest XML messages about certain data items and then interconnect them in some way.

For example, there is an XML message showing that Kozloduy uses B14 as fuel, but you need to go to another XML message to find out that B14 stands for “nuclear fuel”. When you access this data as a knowledge graph, by contrast, all you need to do is follow the relationships and you will get all the information you require.
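
The difference can be illustrated with a tiny Python sketch. The triples and the follow helper are assumptions for illustration (in particular the "label" property name); in the actual knowledge graph the same traversal would be a SPARQL property path or a nested GraphQL field.

```python
# Two facts that arrive in separate XML messages, stored as linked triples.
# The "label" predicate name is an assumption for this sketch.
TRIPLES = [
    ("Kozloduy", "assetType", "B14"),   # from the production unit message
    ("B14", "label", "nuclear fuel"),   # from the code list message
]


def follow(start, *predicates):
    """Walk from `start` along the given predicates; None if a link is missing."""
    node = start
    for predicate in predicates:
        matches = [o for s, p, o in TRIPLES if s == node and p == predicate]
        if not matches:
            return None
        node = matches[0]
    return node


# One traversal answers "what fuel does Kozloduy use?" in readable form.
print(follow("Kozloduy", "assetType", "label"))
```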

Ontotext’s Transparency EKG will provide hybrid access to data through the SPARQL and GraphQL query languages. Each of these languages has its benefits and uses. It will also implement automatic data flows between best-in-class platforms: GraphDB for semantic storage and querying, Elasticsearch for full-text and faceted search, Kibana for visualizations and analytics.

A 360-degree view on high quality energy data will enable advanced analytics, providing a sound basis for deeper insights and better decision-making by market players. It will also empower the creation of new energy market analytics products, beyond the canned charts and reports available at ENTSO-E Transparency Platform.

Kibana, which is part of the Elastic Stack, is a very powerful analytics tool:

  • It’s easy to make individual charts, put them on dashboards, manage user rights and subscriptions, etc.
  • Kibana is especially strong for visualizing time-series data, as its original domain was log file analysis.

We’ve made a simple demo Kibana dashboard with 11 charts:

(You can go to SPARQL Analytical Queries, GraphQL Queries, Kibana Analytics and Faceted Search to see some more examples).

To Sum It Up

Leveraging Linked Data principles for CIM and Transparency electricity data can make it more comprehensible, consistent, interlinked and timely (on-demand). Having interoperable data managed and accessed via a live knowledge graph offers endless possibilities for the EU’s electricity market and for the exchange of master data of energy resources in general.

By building Ontotext’s Transparency Energy Knowledge Graph, we will demonstrate the benefits of the semantic approach and pave the way for similar future projects. Improving data quality and enabling advanced analytics not only provides a foundation for a better Energy KG. Having data fit for use in operations, decision-making and planning could also bring major technological benefits for the energy sector and significantly improve cross-industry data exchange.

Do you want to learn how Ontotext’s knowledge graph-based technology can help in your particular use case?



Vladimir Alexiev

Chief Data Architect at Ontotext

Vladimir’s passion is data modelling, ontologies and data representation standards. He is a member of the DBpedia and Europeana quality committees, and frequent speaker at conferences and events. His favourite topics are Linked Open Data and its application in cultural heritage and digital humanities.

Gergana Petkova

Marketing Content Manager at Ontotext

Gergana Petkova is a philologist and has more than 10 years of experience at Ontotext, working on technical documentation, Gold Standard corpus curation and preparing content about Semantic Technology and Ontotext's offerings.