Read about how applying Linked Data principles and semantic technology to electricity data can make for a more efficient, reliable and sustainable electricity market.
In our post Electrical Standards, Smart Grids and Your Air Conditioner, we talked about why accessing timely, relevant and reliable information is mission-critical to the European electricity grid and the single energy market. We looked at the major challenges of electrical grids, the main standards guiding electricity data exchange and how these challenges could be addressed by applying Linked Data principles and knowledge graph (KG) technology to electricity data.
In the last few years, the EU has been actively promoting the development of Data Spaces for industrial data exchange in key funding programs such as Horizon Europe and the brand-new Digital Europe. This work is crucial for increasing industry digitalization, efficiency and competitiveness, for counteracting the economic downturn caused by the COVID crisis, and for realizing the European Green Deal by 2050. But for any serious digitalization to happen, we need to be able to apply AI and ML in every industry sector, and for that we need data. Simply put, AI without data is like a human without oxygen.
In compliance with the EU market transparency regulation (Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets), ENTSO-E is doing a great job of collecting electricity market data (generation, transmission, consumption, balancing, congestion, outages, etc.) in its Transparency Platform. For example, according to Articles 14.1.c and 14.2.c of the regulation, the Generation Forecast – Day Ahead must be submitted no later than 18:00 Brussels time on the previous day (D-1).
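The D-1 rule is simple enough to state as code, and automated checks like this are what a data platform can run on every submission. Below is a minimal Python sketch. It assumes all timestamps are already expressed in Brussels local time, so time zone conversion is deliberately left out. The function names are hypothetical and not part of any ENTSO-E API.

```python
from datetime import date, datetime, time, timedelta

# Cut-off for day-ahead generation forecasts: 18:00 Brussels time on D-1.
# Assumption: all datetimes in this sketch are naive Brussels local times.
SUBMISSION_CUTOFF = time(18, 0)


def day_ahead_deadline(delivery_day: date) -> datetime:
    """Return the latest allowed submission time for delivery day D."""
    previous_day = delivery_day - timedelta(days=1)
    return datetime.combine(previous_day, SUBMISSION_CUTOFF)


def is_on_time(submitted_at: datetime, delivery_day: date) -> bool:
    """True if a forecast submitted at `submitted_at` meets the D-1 deadline."""
    return submitted_at <= day_ahead_deadline(delivery_day)


if __name__ == "__main__":
    d = date(2021, 6, 15)
    print(day_ahead_deadline(d))                         # 2021-06-14 18:00:00
    print(is_on_time(datetime(2021, 6, 14, 17, 45), d))  # True
    print(is_on_time(datetime(2021, 6, 14, 18, 30), d))  # False
```

A submission at exactly 18:00 counts as on time, matching "no later than".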
However, there is a huge difference between a collection of disconnected data points and an integrated, interoperable, dynamic database that is managed and accessed via a live knowledge graph.
As we discussed in our previous post, using persistent identifiers for electricity data is a necessity for building an Energy Knowledge Graph, and on this front CIM still has a long way to go to fulfill its semantic potential. In particular, we believe that CIM should leverage:
Ontotext recently won cascade funding under the EU project INTERRFACE and has started the Transparency Energy Knowledge Graph project, which creates a KG from ENTSO-E’s Transparency data. While writing the proposal, we built a small Transparency Energy KG (see the architecture diagram below). We are continuing work on the project and expect to have a final version by mid-2022.
Semantic graph databases are excellent for representing “master data” in any domain because they can integrate heterogeneous data from many sources and create links between datasets. They also focus on the relationships between entities and can infer new knowledge from existing information. That is why we have used GraphDB, Ontotext Platform and our significant expertise in semantic data integration to show how the knowledge graph approach can improve the quality of ENTSO-E Transparency data and support flexible analytics.
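Here is a concrete example of "inferring new knowledge". If a schema declares two properties to be inverses, a reasoner can add the missing direction of every link. The Python sketch below does this by hand over plain tuples. A semantic database such as GraphDB would instead do it through its reasoning rules (for example, `owl:inverseOf` in an OWL-based ruleset). The property names come from ENTSO-E-style data, but treating them as inverses and the node identifiers are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schema axiom: these two properties are inverses of each other
# (what owl:inverseOf would state in an ontology).
INVERSES: Dict[str, str] = {"generatingResource": "parentResource"}


def infer_inverses(triples: Set[Triple], inverses: Dict[str, str]) -> Set[Triple]:
    """Return the input triples plus every triple implied by the inverse axioms."""
    # Make the mapping symmetric so a link in either direction triggers the rule.
    both_ways = {**inverses, **{b: a for a, b in inverses.items()}}
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in both_ways:
            inferred.add((obj, both_ways[predicate], subject))
    return inferred


if __name__ == "__main__":
    facts = {("KozloduyNPP", "generatingResource", "KozloduyG9")}
    for triple in sorted(infer_inverses(facts, INVERSES)):
        print(triple)
```

For inverse properties, one pass is enough: applying the rule again would only regenerate the original triples.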
Let’s take a closer look.
At this point, we have converted a small part of the ENTSO-E Transparency data (4 out of 85 datasets) and integrated it into our Transparency Energy Knowledge Graph. This part includes the Energy Identification Code (EIC) file, lookup codes, a knowledge base of data items, and configured Production Units (i.e., generation capacity).
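Integrating these datasets hinges on a shared, persistent identifier: the EIC code. The minimal Python sketch below illustrates the idea. It assumes each file has already been parsed into dictionaries with an `eic` field. All field names and codes are hypothetical placeholders, not real EIC codes. Records describing the same EIC code are merged, and conflicting values are reported rather than silently overwritten, which is exactly where data quality issues start to surface.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]
Conflict = Tuple[str, str, str, str]  # (eic, field, first value, other value)


def merge_by_eic(
    sources: Iterable[Iterable[Record]], key: str = "eic"
) -> Tuple[Dict[str, Record], List[Conflict]]:
    """Merge records from several files into one entity per EIC code.

    Fields missing from an entity are filled in from later sources; a field
    that two sources disagree on keeps its first value and is reported.
    """
    merged: Dict[str, Record] = {}
    conflicts: List[Conflict] = []
    for source in sources:
        for record in source:
            eic = record[key]
            entity = merged.setdefault(eic, {key: eic})
            for field, value in record.items():
                if field not in entity:
                    entity[field] = value
                elif entity[field] != value:
                    conflicts.append((eic, field, entity[field], value))
    return merged, conflicts


if __name__ == "__main__":
    # Placeholder identifiers, not real EIC codes.
    eic_file = [{"eic": "X-G9", "name": "NPP Kozloduy G9"}]
    unit_file = [{"eic": "X-G9", "psrType": "B14"}]
    entities, issues = merge_by_eic([eic_file, unit_file])
    print(entities["X-G9"])  # {'eic': 'X-G9', 'name': 'NPP Kozloduy G9', 'psrType': 'B14'}
    print(issues)            # []
```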
Below is a diagram that provides an example of this data. It shows Bulgaria as biddingZone and controlArea, ESO as Bulgaria’s Transmission System Operator (TSO) and providerParticipant, the NPP KOZLODUY company as responsibleParticipant, the Kozloduy NPP nuclear power station (a Production Unit), G9 and G10 (its two Generation Units), highVoltageLimit and nominalP of these units, assetType (Nuclear fuel), CodeValues and CodeLists.
As you can see, pieces of data coming from different sources are depicted in different colors and marked with different letters in a circle:
Although we have integrated only a small part of the Transparency data, we have already found various data quality issues. This is possible because semantically interlinking different types of data from different sources lets us see the bigger picture and easily spot problems in the data.
Now, if you go back to the diagram above, you can see a data inconsistency: the parentResource link of NPP Kozloduy G10 points not to NPP Kozloduy but to TPP MARITSA EAST 2. The blue (PRODUCTION_UNIT) file correctly shows the parent to be NPP Kozloduy (via the generatingResource link in the inverse direction), but the red (EIC) file has a mistake in the field eICParent_MarketDocument.mRID.
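The cross-file check just described can be sketched in a few lines of Python. It compares the parent declared in the EIC file with the parent implied by reading the production unit file's generatingResource links in reverse. The sketch assumes both files have been reduced to simple dictionaries. Unit names stand in for EIC codes, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def parent_mismatches(
    eic_parent: Dict[str, str],
    pu_children: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Compare each unit's parent in the EIC file with the parent implied by
    the production unit file (its generatingResource links read in reverse).

    Returns (generation_unit, parent_in_eic_file, parent_in_pu_file) tuples.
    """
    implied_parent = {
        child: parent
        for parent, children in pu_children.items()
        for child in children
    }
    mismatches = []
    for unit, declared in sorted(eic_parent.items()):
        expected = implied_parent.get(unit)
        # Units missing from the production unit file are skipped, not flagged.
        if expected is not None and expected != declared:
            mismatches.append((unit, declared, expected))
    return mismatches


if __name__ == "__main__":
    eic_file = {
        "NPP Kozloduy G9": "NPP Kozloduy",
        "NPP Kozloduy G10": "TPP MARITSA EAST 2",
    }
    pu_file = {"NPP Kozloduy": ["NPP Kozloduy G9", "NPP Kozloduy G10"]}
    print(parent_mismatches(eic_file, pu_file))
```

In the knowledge graph, the same comparison becomes a query over two links between the same pair of nodes, because both files have already been mapped onto shared identifiers.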
Another example of an inconsistency in the ENTSO-E Transparency data is that not all GenerationUnits are in the same country as their parent ProductionUnit, as you can see from the following SPARQL query:
which returns one data inconsistency:
Keep in mind that this query doesn’t check for missing country codes, of which there are plenty.
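The SPARQL check runs inside the graph. For readers without a triple store at hand, here is a rough Python analogue of the same logic. It is not the query itself, and it assumes country codes and parent links have already been extracted into dictionaries. The identifiers and country codes in the test data are made up. The `report_missing` flag shows one way the check could be extended to also flag the missing country codes mentioned above.

```python
from typing import Dict, List, Optional, Tuple

Issue = Tuple[str, Optional[str], Optional[str]]  # (unit, unit country, parent country)


def country_mismatches(
    country: Dict[str, str],
    parent_of: Dict[str, str],
    report_missing: bool = False,
) -> List[Issue]:
    """Find GenerationUnits whose country differs from their parent ProductionUnit's.

    By default, pairs where either country code is missing are skipped, like
    the query described in the text; set report_missing=True to flag them too.
    """
    issues: List[Issue] = []
    for unit, parent in sorted(parent_of.items()):
        unit_cc, parent_cc = country.get(unit), country.get(parent)
        if unit_cc is None or parent_cc is None:
            if report_missing:
                issues.append((unit, unit_cc, parent_cc))
        elif unit_cc != parent_cc:
            issues.append((unit, unit_cc, parent_cc))
    return issues


if __name__ == "__main__":
    country = {"PU1": "BG", "GU1": "BG", "GU2": "RO", "PU2": "DE"}
    parent_of = {"GU1": "PU1", "GU2": "PU1", "GU3": "PU2"}
    print(country_mismatches(country, parent_of))                       # [('GU2', 'RO', 'BG')]
    print(country_mismatches(country, parent_of, report_missing=True))  # [('GU2', 'RO', 'BG'), ('GU3', None, 'DE')]
```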
There are many other examples of ENTSO-E Transparency data inconsistencies, but we’ll mention only a few more:
All of this surfaced from integrating only a tiny bit of ENTSO-E Transparency data into a knowledge graph. As part of our work on the Transparency EKG project, we will integrate and interlink more data in such a semantic model, add more data quality checks and create a data quality dashboard. Spotting errors, inconsistencies and other problems will then be straightforward and painless. This could lead to tighter Transparency data regulations and better enforcement of them, significantly improve the quality of the collected data, and provide the foundation for a better Energy Knowledge Graph in the future.
Another important point is that, because the ENTSO-E Transparency Platform offers data as disconnected CSV or XML messages, it is next to impossible to get a complete picture of the current state of the energy system. For a holistic view, one needs to fetch the latest XML messages about the relevant data items and then interconnect them somehow.
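Even the "fetch the latest messages" step takes some bookkeeping on the consumer's side. The minimal Python sketch below assumes messages have already been parsed from XML into records with a data-item key and a creation timestamp. The `Message` type and its field names are hypothetical. The sketch keeps only the newest message per item. Interconnecting the resulting snapshots is still left to the consumer, and that is the work a knowledge graph takes off their hands.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class Message(NamedTuple):
    item: str          # hypothetical data-item key, e.g. a unit identifier
    created: datetime  # when the message was created
    payload: dict      # content parsed from the XML message


def latest_state(messages: Iterable[Message]) -> Dict[str, dict]:
    """Keep only the most recent payload for each data item."""
    latest: Dict[str, Message] = {}
    for msg in messages:
        current = latest.get(msg.item)
        if current is None or msg.created > current.created:
            latest[msg.item] = msg
    return {item: msg.payload for item, msg in latest.items()}


if __name__ == "__main__":
    msgs = [
        Message("G9", datetime(2021, 5, 1), {"status": "available"}),
        Message("G9", datetime(2021, 5, 3), {"status": "outage"}),
        Message("G10", datetime(2021, 5, 2), {"status": "available"}),
    ]
    print(latest_state(msgs))
```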
For example, one XML message shows that Kozloduy uses B14 as fuel, but you need to go to another XML message to find out that B14 stands for “nuclear fuel”. When you access this data as a knowledge graph, by contrast, all you need to do is follow the relationships to get all the information you require.
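"Following the relationships" can be illustrated with a toy graph in Python. Here the production unit links to a code-list node instead of holding a bare code, so resolving B14 is one more hop rather than a separate download. The graph layout, node identifiers and `follow` helper are hypothetical. The label mirrors the wording used in this post.

```python
from typing import Dict

Graph = Dict[str, Dict[str, str]]


def follow(graph: Graph, start: str, *path: str) -> str:
    """Walk a chain of predicates from `start` and return the value reached."""
    node = start
    for predicate in path:
        node = graph[node][predicate]
    return node


# Toy graph: the unit points to a code-list node rather than storing "B14" alone.
graph: Graph = {
    "KozloduyNPP": {"label": "Kozloduy NPP", "assetType": "code/B14"},
    "code/B14": {"code": "B14", "label": "Nuclear fuel"},
}

if __name__ == "__main__":
    print(follow(graph, "KozloduyNPP", "assetType", "code"))   # B14
    print(follow(graph, "KozloduyNPP", "assetType", "label"))  # Nuclear fuel
```

In SPARQL, the same two-hop walk would be a property path from the unit through its asset type to the label.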
Ontotext’s Transparency EKG will provide hybrid access to data through the SPARQL and GraphQL query languages, each with its own benefits and uses. It will also implement automatic data flows between best-in-class platforms: GraphDB for semantic storage and querying, Elasticsearch for full-text and faceted search, and Kibana for visualizations and analytics.
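On the SPARQL side, any client that speaks the standard SPARQL 1.1 Protocol can query the graph. The minimal sketch below uses only the Python standard library and builds the POST request without sending it. It assumes a GraphDB instance on its default port 7200, which serves repositories under `/repositories/<id>`. The repository id `transparency-ekg` is hypothetical, and the query is a generic placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: a local GraphDB repository; adjust host and repository id.
ENDPOINT = "http://localhost:7200/repositories/transparency-ekg"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query via form-encoded POST."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


req = sparql_request(ENDPOINT, QUERY)
# To execute against a running server: urllib.request.urlopen(req),
# then json.load() the response to read the result bindings.
```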
A 360-degree view of high-quality energy data will enable advanced analytics, providing a sound basis for deeper insights and better decision-making by market players. It will also enable the creation of new energy market analytics products, beyond the canned charts and reports available on the ENTSO-E Transparency Platform. Kibana, part of the Elastic Stack, is a very powerful tool for building such analytics.
We’ve made a simple demo Kibana dashboard with 11 charts:
(You can go to SPARQL Analytical Queries, GraphQL Queries, Kibana Analytics and Faceted Search to see more examples.)
Applying Linked Data principles to CIM and Transparency electricity data can make it more comprehensible, consistent, interlinked and timely (available on demand). Having interoperable data managed and accessed via a live knowledge graph opens up many possibilities for the EU’s electricity market and for the exchange of master data about energy resources in general.
By building Ontotext’s Transparency Energy Knowledge Graph, we will demonstrate the benefits of the semantic approach and pave the way for similar future projects. Improving data quality and enabling advanced analytics does more than provide a foundation for a better Energy KG. Having data fit for use in operations, decision-making and planning could also bring major technological benefits to the energy sector and significantly improve cross-industry data exchange.