Ever since Hippocrates founded his school of medicine in ancient Greece some 2,500 years ago, writes Hannah Fry in her book Hello World: Being Human in the Age of Algorithms, healthcare (or, as she calls it, "the fight to keep us healthy") has been built on observation, experimentation and the analysis of data. The entire history and practice of modern medicine, argues Fry, rests on finding patterns in data.
Today, in the hunt for patterns, we can call on a mighty army of iron soldiers to help us in the fight for health. These are the so-called supercomputers, commanded by a smart legion of researchers and practitioners in the field of data-driven knowledge discovery. Thanks to their might, scientists and practitioners can now develop innovative ways of collecting, storing, processing and, ultimately, finding patterns in data.
If you, like me, have a penchant for cyberpunk, maybe all this brings to mind Wowbagger – Douglas Adams's character who "after a period of total boredom, especially on Sunday afternoons, decided to insult everyone in the entire universe in alphabetical order". Wowbagger used the supercomputer on his spaceship to calculate the location of each and every soon-to-be-insulted living creature. Alternatively, if you like the history of computing, another association that may come to mind is Licklider's Intergalactic Computer Network, or the Galactic Network, envisioned in the 1960s.
Neither association is far from the general idea behind supercomputers, as both relate to connecting and working with massive amounts of data.
Supercomputers, also known under the umbrella term high-performance computing (HPC), are machines built to execute tasks that cannot be executed by general-purpose computers. Their main strength lies in the capability to solve a single large problem in the shortest possible time with the maximum computing power available. Such computationally intensive tasks are essential in fields like weather forecasting, molecular modeling, airplane and spacecraft aerodynamics, personalized medicine and self-driving cars.
To keep it simple, the idea behind HPC is to solve a problem by dividing it into chunks and planning how, and in what sequence, these chunks will be tackled. The capacity and performance of supercomputers is measured in FLOPS (floating point operations per second). And from the early 1960s, when one of the first supercomputers – the Livermore Atomic Research Computer (LARC) – was built, until today, when Facebook alone has already bought 26 supercomputers, the FLOPS have kept rising.
As of 2017, the fastest computers had reached a speed of 93 petaFLOPS, that is, 93×10^15 or 93,000,000,000,000,000 operations per second. And just when we might have thought FLOPS had hit their limit, the US supercomputer Summit reached another peak: 1.88×10^18 operations per second. This is where the so-called exascale computing enters the stage.
Exascale computing refers to systems capable of at least one exaFLOPS, that is, a billion billion (or, if you wish, a quintillion) floating point operations per second. Although still not very well known, exascale supercomputers are poised to dramatically change the way we approach the solutions (at least their computational facet) to the world's most vexing problems in areas such as climate, healthcare and national security. Another significant application of exascale supercomputers is research projects.
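To get a feel for these orders of magnitude, here is a quick back-of-the-envelope comparison using only the figures quoted above (nothing machine-specific is assumed):

```python
# Back-of-the-envelope FLOPS arithmetic using the figures quoted above.
PETA = 10**15
EXA = 10**18

fastest_2017 = 93 * PETA   # 93 petaFLOPS, the 2017 record
exascale = 1 * EXA         # 1 exaFLOPS = one quintillion operations per second

# An exascale machine does in one second roughly what the 2017
# record holder needed about eleven seconds for:
print(exascale / fastest_2017)   # ~10.75
print(f"{exascale:,}")           # 1,000,000,000,000,000,000
```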
One of them, the ExaMode project, is what we want to tell you about.
ExaMode, an acronym for Extreme-scale Analytics via Multimodal Ontology Discovery & Enhancement, is a project funded by the European Union's Horizon 2020 (H2020) programme. It aims to solve the challenges that healthcare faces as a result of the heterogeneity and the volume of biomedical data (more than 2,000 exabytes of biomedical data are expected to be produced by 2020).
The ExaMode project aims to develop new architectures and tools for pathologists and medical researchers. These tools will allow them to effectively and efficiently handle extremely large volumes of disparate data: digitized histopathology slides, the visual and textual content of patients' records, medical publications, diagnoses, etc.
The project is coordinated by the Institute of Information Systems of the HES-SO Valais-Wallis, Sierre, Switzerland. Besides the HES-SO, six other universities, hospitals and companies from Italy, the Netherlands, Poland and Bulgaria form the project consortium. These partners are: the Department of Information Engineering from the University of Padova (UNIPD), Radboud University Medical Center, MicroscopeIT, Cannizzaro Hospital (AOEC), SurfSara and Sirma AI, trading as Ontotext.
ExaMode’s main goal is to help professionals efficiently search and work with medical (and more specifically histopathology) data, developing a system for easy and fast knowledge discovery based on heterogeneous exascale data. With the help of such sophisticated tools, a physician will be able, for example, to access a constellation of information related to a patient’s case, including similar cases, the latest publications in the field, specific terms or visual features and related images in both scientific literature and hospital information systems.
Both the information inferred from the image analysis and the raw textual data in the EHR records need to be semantically normalized in order to be used for the generation of the multimodal knowledge graph.
And this is what Ontotext’s role in the project is about: knowledge graphs. More specifically, the semantically normalized annotation of images and textual resources that are further fed into a knowledge graph for easier search and discovery.
Together with the other partners, Ontotext will be leveraging text analysis in order to extract structured data from medical records and from annotated images related to histopathology information. Furthermore, the team will be working to normalize the data with established public medical ontologies to create a knowledge graph and thus enable knowledge discovery, identification of similar medical cases and referential cases described in the scientific literature.
Again, the overall aim is to extract knowledge from data and, through algorithms based on artificial intelligence, to assist medical professionals in routine diagnostics processes.
There are four types of data sources the team will work with. The first type is metadata from images. The second is the brief texts doctors write to summarize each of a patient's images and the findings in them – the so-called synopses. The third type comes from longer text forms such as discharge letters and EHRs (Electronic Health Records) – the clinical record of the patient's stay in the hospital information system, which includes the anamnesis, complaints, diagnoses, treatment, etc. The final one is research publication data from PubMed – the repository of the US National Library of Medicine, where the relevant scientific publications can be accessed in full-text format.
All these descriptions of images, brief summaries of patient records and articles from research databases (in various formats, be it a scientific publication, a description, a synopsis, etc.) will be processed. This will be done by mapping the extracted data to relevant ontologies – already existing ones as well as specialized ontologies developed by UNIPD and AOEC that cover histopathological conditions. The training of the image processing algorithms requires massive computing power, which will be provided by the supercomputer hosted at SurfSara in Amsterdam, one of the most energy-efficient data centers. The machine, called Cartesius, is one of the best high-performance computers in Europe.
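To make "mapping extracted data to ontologies" a bit more concrete, here is a minimal sketch of semantic normalization. The lexicon and the "ONTO:" identifiers below are invented for illustration only; the actual project relies on established public medical ontologies and far more sophisticated matching:

```python
# Hypothetical sketch of semantic normalization: free-text mentions extracted
# from medical records are mapped to canonical ontology identifiers, so that
# different surface forms of the same concept collapse into one node.
# The lexicon and the "ONTO:" IDs are made up for illustration.
LEXICON = {
    "colon adenocarcinoma": "ONTO:0001",
    "adenocarcinoma of the colon": "ONTO:0001",  # synonym, same concept
    "celiac disease": "ONTO:0002",
}

def normalize(mention):
    """Return the canonical ontology ID for a text mention, or None if unknown."""
    return LEXICON.get(mention.strip().lower())

# Two surface forms resolve to the same concept, so downstream search and
# knowledge-graph construction see one identifier instead of two strings.
print(normalize("Colon adenocarcinoma"))         # ONTO:0001
print(normalize("adenocarcinoma of the colon"))  # ONTO:0001
print(normalize("unknown finding"))              # None
```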
GraphDB and the ExaMode tools will blend the data and provide a semantic layer to it. Ontotext’s signature RDF database will create a powerful knowledge graph where all pieces will be put together to serve computer-aided diagnosis systems. At the end of the day, enriching text (and images) with semantic metadata will allow for better knowledge discovery, which will help doctors with their diagnosis and decision making.
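To illustrate the idea of a knowledge graph "putting all pieces together", here is a toy version of the underlying RDF triple model in plain Python. This is not GraphDB's API, and all identifiers are hypothetical; a real deployment stores RDF triples and queries them with SPARQL:

```python
# Toy illustration of the (subject, predicate, object) triple model behind
# a knowledge graph. Plain Python, not GraphDB's API; identifiers are invented.
triples = {
    ("case:42", "hasDiagnosis", "ONTO:0001"),
    ("case:42", "hasImage", "image:slide-7"),
    ("ONTO:0001", "describedIn", "pubmed:12345"),
}

def objects(subject, predicate):
    """Follow one edge type from a subject node to its object nodes."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Because everything is linked, a physician's query can hop from a patient
# case to its normalized diagnosis and on to related literature.
for diagnosis in objects("case:42", "hasDiagnosis"):
    print(diagnosis, "->", objects(diagnosis, "describedIn"))
```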
ExaMode and the objectives it sets are by no means about replacing the human doctor, but rather about doing the heavy lifting of processing data. Your doctor, though, will definitely have a richly interlinked archive to consult.
A supercomputer-powered decision support system will allow physicians to use a highly interconnected architecture of medical records, histopathological images and scientific publications. This will level up the processes of observation, experimentation and analysis, which are fundamental for medicine. More importantly, it will take the patient diagnosis and care to the next level.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825292 (ExaMode, http://www.examode.eu/).