AstraZeneca: Early Hypotheses Testing Through Linked Data

As part of the LarKC project, AstraZeneca and Ontotext collaborated to build a large knowledge graph that supports a holistic view of the field of translational medicine. The resulting graph integrates various categories of information into a unified, explorable network of knowledge. The built-in causal relations ontology allows exploration of distant (indirect) relations between objects that are not obvious in a single data source, and thus the platform forms the foundation for early hypothesis testing in the life sciences.

AstraZeneca is a global research-based biopharmaceutical company with skills and resources focused on discovering, developing and marketing medicines for some of the world’s most serious illnesses, including cancer, heart disease, neurological disorders, respiratory disease and infection.

The Goal

AstraZeneca aimed to develop a platform for Interactive Relationship Discovery to enable the identification of long causal relationship chains between the biomedical objects in the Linked Life Data cloud. The industry-specific platform was to be used for early hypothesis testing, which requires identifying direct and indirect relationships between biomedical entities and suggesting possible mechanisms that usually remain hidden.

To facilitate the process of relationship discovery, the platform needed to provide an easy and intuitive tool that would allow the researchers to interactively mine and explore the causal relations.

The Challenge

In the pharmaceutical research and discovery process, success is highly dependent on the availability and accessibility of high-quality research data. The quality of the data can be assessed by its accuracy, correctness, completeness, currency and relevance. The accuracy and correctness of data are determined purely by the methods used to generate it. However, the latter three – completeness, currency and relevance – can be determined partially or completely by an effective semantic data integration approach, which:

  • aggregates all relevant information;
  • removes redundancy and ambiguities in the data;
  • interlinks the related entities.
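The three steps above can be illustrated with a minimal Python sketch. It is not the platform's actual implementation: the two sources, their identifiers, the `SAME_AS` mapping and the `integrate` function are all hypothetical, chosen only to show how aggregation, de-duplication and interlinking fit together.

```python
from collections import defaultdict

# Hypothetical records from two made-up sources; identifiers are illustrative only.
SOURCE_A = [
    {"id": "srcA:gene42", "label": "GENE-X", "related_to": ["srcA:dis7"]},
    {"id": "srcA:dis7", "label": "Disease Y", "related_to": []},
]
SOURCE_B = [
    {"id": "srcB:G-0042", "label": "gene-x", "related_to": ["srcB:D-0007"]},
    {"id": "srcB:D-0007", "label": "disease y", "related_to": []},
]

# Hypothetical mapping from source-specific identifiers to canonical ones.
SAME_AS = {
    "srcA:gene42": "gene:X", "srcB:G-0042": "gene:X",
    "srcA:dis7": "disease:Y", "srcB:D-0007": "disease:Y",
}


def integrate(*sources):
    """Merge records from all sources into one entity per canonical identifier."""
    entities = defaultdict(lambda: {"labels": set(), "links": set()})
    for source in sources:                      # aggregate all relevant information
        for record in source:
            entity = entities[SAME_AS[record["id"]]]
            # Sets remove redundancy: equivalent labels and links collapse into one.
            entity["labels"].add(record["label"].lower())
            for target in record["related_to"]:
                # Interlink entities through their canonical identifiers.
                entity["links"].add(SAME_AS[target])
    return dict(entities)


graph = integrate(SOURCE_A, SOURCE_B)
```

In this toy setup the four source records collapse into two entities, and the gene-to-disease link asserted separately by both sources is stored only once.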

Researchers gather information iteratively from a broad range of biomedical data sources in order to generate or expand a theory, to test hypotheses, and to make informed assertions about which relationships are causal and exactly how they are causal. They need a mechanism that allows them to mine data scattered among different relevant resources and to identify visible (direct) and invisible (distant) relations between the biomedical entities studied in the pharmaceutical research and discovery process.
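To make the difference between direct and distant relations concrete, the following Python sketch searches a tiny graph for relationship chains between two entities. It is a generic breadth-first search offered under stated assumptions, not the method used by the platform; the edges, relation names and the `find_chains` function are hypothetical.

```python
from collections import deque

# Hypothetical typed edges: (subject, relation, object). Not real biomedical data.
EDGES = [
    ("drug:A", "inhibits", "protein:P"),
    ("protein:P", "participates_in", "pathway:W"),
    ("pathway:W", "implicated_in", "disease:D"),
    ("drug:A", "causes", "side_effect:S"),
]


def build_index(edges):
    """Index outgoing edges by subject for quick lookup."""
    index = {}
    for subj, rel, obj in edges:
        index.setdefault(subj, []).append((rel, obj))
    return index


def find_chains(edges, start, goal, max_hops=4):
    """Return every cycle-free chain from start to goal with at most max_hops edges."""
    index = build_index(edges)
    chains = []
    queue = deque([(start, [], {start})])   # (current node, chain so far, visited nodes)
    while queue:
        node, chain, seen = queue.popleft()
        if node == goal and chain:
            chains.append(chain)
            continue
        if len(chain) >= max_hops:
            continue
        for rel, nxt in index.get(node, []):
            if nxt not in seen:             # avoid revisiting nodes within one chain
                queue.append((nxt, chain + [(node, rel, nxt)], seen | {nxt}))
    return chains


chains = find_chains(EDGES, "drug:A", "disease:D")
```

Here no single edge connects the drug to the disease, yet a three-step chain through a protein and a pathway does. The `max_hops` limit keeps the search bounded, which matters in a strongly interlinked graph where the number of candidate chains grows quickly.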

The Solution: Linked Life Data Cloud

Semantic warehousing helps researchers get an overview of the existing relationships within scientific and clinical data by utilizing causality data mining. Linked Life Data is used as a platform for Interactive Relationship Discovery between biomedical entities as it:

  • integrates over 25 diverse data sources;
  • aligns the data to more than 17 different types of biomedical objects (genes, proteins, molecular functions, biological processes/pathways, molecular interactions, cell localization, organisms, organs/tissues, cell lines, cell types, diseases, symptoms, drugs, drug side effects, small chemical compounds, clinical trials, scientific publications, etc.);
  • identifies explicit relationships between entities locked in the original data sets and categorizes them according to a causality relationship ontology;
  • mines unstructured data in order to identify relationships hidden within the text (inclusion/exclusion criteria for clinical studies).


Since the entities in Linked Life Data are usually strongly interlinked, simply crawling or querying the repository for relationships and listing them is in most cases not sufficient. That’s why Linked Life Data also provides a user-centered process and interactive tools that assist the discovery of even very large numbers of causal relations.

Business Benefits

  • Efficiently get an overview of the identified causal relationships between biomedical objects;
  • Interactively explore these relationships;
  • Easily spot and separate relationships that are of relevance in a certain use case.

Why Choose Ontotext?

With Ontotext’s Linked Life Data platform, researchers at AstraZeneca can increase their efficiency and reduce the time and resources spent exploring relationships. As a result, the biopharmaceutical company can now quickly resolve uncertainties about the early development of drugs with the help of data-driven hypothesis testing.
