Project Equality in Leadership for Latin American STEM (ELLAS) Uses GraphDB to Help Fight Gender Inequality in Leadership Positions

GraphDB powers the ELLAS open data platform that collects, integrates, centralizes, and visualizes data about women in STEM in Latin America.

  • Centralization of data from different countries in one place
  • Easy visualization for decision makers
  • Ability to evaluate the results of existing policies
  • Improved visibility of policies and initiatives in the context of women in STEM

The Goal

The Latin American Open Data for Gender Equality Policies Focusing on Leadership in STEM project, known by the acronym ELLAS (Equality in Leadership for Latin American STEM), set out to develop a multi-country open data platform. The platform’s task was to provide accurate data and encourage its use for understanding the root causes of gender disparities, evaluating policies, implementing evidence-based interventions, and promoting accountability and transparency.

The overall aim of the project was to reduce the gender gap in Science, Technology, Engineering, and Mathematics (STEM), especially in leadership positions in universities, industry, and public institutions.

The Challenge

The main challenges the ELLAS project faced were both technical and managerial. 

For a start, it was difficult to create uniform processes for collecting, cleaning, and searching data across countries that differ in language, vocabulary, transparency policies, and the granularity with which data is collected and updated. On top of these differences, finding data sources of sufficient quality was not trivial.

Another challenge came from the project’s aspiration to make it easy for decision makers, responsible for creating public policies, to interact with the platform without prior technical knowledge. The team was aware that requiring knowledge of a specific language like SPARQL to access the data, for example, would impede the wide use of the platform.
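
To make the barrier concrete, the following is a minimal sketch of how an interface layer could build a SPARQL query from two simple inputs, so that the user never writes SPARQL by hand. It is an illustration only: the `ex:` vocabulary, the function name, and the data model are hypothetical and do not reflect the actual ELLAS ontology or user interface.

```python
# Sketch only: the ex: vocabulary and data model below are hypothetical,
# not the ELLAS ontology.
PREFIX = "PREFIX ex: <http://example.org/ellas-demo#>"


def build_leadership_query(country: str, year: int) -> str:
    """Build the SPARQL text that a non-technical user would otherwise write by hand."""
    if "\n" in country or "\r" in country:
        raise ValueError("country must be a single line of text")
    # Escape backslashes and double quotes so the input stays inside the string literal.
    safe_country = country.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        PREFIX,
        "SELECT ?institution (COUNT(?person) AS ?womenLeaders)",
        "WHERE {",
        "  ?position a ex:LeadershipPosition ;",
        "            ex:heldBy ?person ;",
        "            ex:institution ?institution ;",
        f'            ex:country "{safe_country}" ;',
        f"            ex:year {int(year)} .",
        '  ?person ex:gender "female" .',
        "}",
        "GROUP BY ?institution",
    ])


# A form with a country drop-down and a year field could call this directly.
query = build_leadership_query("Brazil", 2023)
print(query)
```

Even this small query requires knowing prefixes, triple patterns, and aggregation syntax, which is the kind of knowledge the team did not want to demand from policy makers.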

Last but not least, the project operated in a demanding managerial environment. Because of the enormity of the task, there were multiple teams responsible for different parts of the process as well as for training other team members. Also, the multidisciplinary nature of the project called for experts from different areas such as Computing, Education, Social Sciences, Business, Economics, and Psychology. The project still needs a lot of research on protocols and methodologies to understand how to better integrate technical knowledge of connected data with domain knowledge.

The Solution: A platform of connected open data to help reduce the gender gap for women in STEM fields in Latin America

The team chose knowledge graph technology for the ELLAS platform, in particular GraphDB, Ontotext’s RDF database for knowledge graphs. ELLAS is a web platform created to help integrate and centralize data on the presence of women in STEM careers in Latin America. At the moment, the data covers Brazil, Bolivia, and Peru, but the goal is to include other countries.

Some of the important features of the ELLAS platform include:

  • Collecting data from academic papers, gray (unofficial) literature, and surveys – structuring and curation processes for data integration from Brazil, Bolivia, and Peru. The data was originally present only in scientific articles and unofficial documents.
  • Collecting structured data from online open data sources – search, collection, and integration of data from open data platforms from Brazil, Bolivia, and Peru with automated collection of data available in non-RDF format.
  • Pipelines for creating the knowledge graph – building the knowledge graph by mapping the data from spreadsheets to the ontology model using Ontotext Refine. This pipeline orchestrates inserting the data into GraphDB by using Python and Pentaho.
  • User interface for non-technical users to facilitate their interaction with the knowledge graph.
  • Aggregating new data sources through quality checks using SHACL.

Diagram showing the process of data collection and integration using Ontotext Refine and GraphDB. On the left, data from academic papers, gray literature, and surveys is collected. On the right, structured data from online open data sources is gathered. Both data types are processed with Ontotext Refine, loaded into GraphDB, and ultimately displayed in the ELLAS user interface.

Figure 1: Summary view of the data collection and integration process in a knowledge graph by using Ontotext Refine and GraphDB until it reaches the user
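
The loading step of such a pipeline can be sketched in Python, one of the tools mentioned above. The sketch below is not the ELLAS pipeline: plain Python stands in for the Ontotext Refine mapping, and the column names, the namespace, the repository id, and the local GraphDB address are all assumptions made for illustration. It turns spreadsheet rows into N-Triples and prepares a POST to the RDF4J-style `/repositories/<id>/statements` endpoint that GraphDB exposes.

```python
import csv
import io
import urllib.request

BASE = "http://example.org/ellas-demo/"   # hypothetical namespace
GRAPHDB_URL = "http://localhost:7200"      # assumed local GraphDB instance
REPOSITORY = "ellas-demo"                  # hypothetical repository id
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def literal(value: str) -> str:
    """Serialize a plain string literal for N-Triples."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(csv_text: str) -> str:
    """Map each spreadsheet row to three triples (stand-in for the Refine mapping)."""
    lines = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        subject = f"<{BASE}record/{i}>"
        count = int(row["women_in_leadership"])  # fails early on non-numeric cells
        lines.append(f"{subject} <{BASE}country> {literal(row['country'])} .")
        lines.append(f"{subject} <{BASE}institution> {literal(row['institution'])} .")
        lines.append(f'{subject} <{BASE}womenInLeadership> "{count}"^^<{XSD_INTEGER}> .')
    return "\n".join(lines) + "\n"


def build_insert_request(ntriples: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST that adds the triples to the repository."""
    return urllib.request.Request(
        url=f"{GRAPHDB_URL}/repositories/{REPOSITORY}/statements",
        data=ntriples.encode("utf-8"),
        headers={"Content-Type": "application/n-triples"},
        method="POST",
    )


# Made-up sample row, not real ELLAS data.
SAMPLE = "country,institution,women_in_leadership\nBrazil,Example University,12\n"
request = build_insert_request(rows_to_ntriples(SAMPLE))
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Building the request separately from sending it keeps the mapping step testable without a running database, which is convenient when several teams prepare spreadsheets independently.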

As the ELLAS platform had to ingest both structured and unstructured data from various sources in the three countries, the process was divided into smaller tasks. Some teams were assigned to collect, select, and create spreadsheets with the data. Another team was responsible for curating the data before integrating it into the knowledge graph.

The main data sources for the ELLAS knowledge graph are academic publications and open data sources such as the UNESCO Core Data Portal, National Institute of Applied Studies and Research (INEP) – Brazil, Sistemas Información Universitaria (TUNI) – Peru, and Comite Ejecutivo de la Universidad Boliviana (CEUB) – Bolivia. In total, more than 40 sources are used for data extraction.

While the ELLAS platform is still a work in progress, with a planned launch date of February 2025, the processes for data collection, curation, and integration have already been developed and successfully implemented.

Business Benefits

  • Centralization of data coming from different countries 
  • Ability to evaluate the results of existing policies
  • Visibility of policies and initiatives otherwise not recognized in the context of women in STEM
  • Easy interaction with the ELLAS knowledge graph for users without prior technical knowledge

Why Choose Ontotext

“The scalable architecture for developing real applications with knowledge graphs and a simple and friendly configuration interface are the main reasons for using GraphDB. The ease of integration with tools such as Python and the quick learning curve are also essential for social impact projects. This is especially true for projects that interact with universities, which usually do not have large financial resources but have real social significance.”

Rodgers Fritoli (researcher) and Prof. Rita C. G. Berardi, Ph.D (supervisor)
