Ontotext and Sofia University Team Ranks #7 in SNOMED CT Entity Linking Challenge

The Entity Linking Challenge focused on extracting medical entities and normalizing them to SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms)

Sofia, Friday, May 17, 2024

In March 2024, researchers from Ontotext and Sofia University formed a joint team named ENIGMA and applied their AI skills in the healthcare domain by participating in the SNOMED CT Entity Linking Challenge. The competition focused on extracting medical entities and normalizing them to SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), a comprehensive, multilingual clinical terminology used extensively in healthcare systems around the world.

Taking part in this shared task was an important step in staying up to date with medical Natural Language Processing (NLP) technologies, as the challenge offered the first set of real electronic health records manually annotated with SNOMED CT codes. Working with real documents, as opposed to the PubMed medical articles that are more typical for such tasks, poses many challenges. Because the documents are written by doctors and are not meant for a wider audience, they include many abbreviations, medical jargon, and typos.

In addition, annotation inconsistencies and ambiguities interfere with building high-quality entity linking solutions. For example, the words “right” and “left” were mapped to up to 50 different SNOMED concepts, so the team had to rely on context-aware methods for disambiguation. Another issue specific to the task is the long-tail distribution of the entities: out of 220,000 reference SNOMED entities, only 5,000 were mentioned in the training set, and the test set contained many unseen entities.
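To make the ambiguity problem concrete, here is a minimal Python sketch of dictionary lookup combined with a simple context check. It is an illustration only and not the ENIGMA team's method: the candidate dictionary, the cue words, the `link` function, and the concept IDs (placeholders, not real SNOMED CT codes) are all assumptions made for this example.

```python
import re

# Toy candidate dictionary for one ambiguous surface form.
# The IDs below are made-up placeholders, NOT real SNOMED CT codes.
# The generic reading is listed first so it wins when no cue matches.
CANDIDATES = {
    "left": [
        {"id": "C-0001", "label": "left (generic qualifier)", "cues": set()},
        {"id": "C-0002", "label": "left kidney structure", "cues": {"kidney", "renal"}},
        {"id": "C-0003", "label": "left lung structure", "cues": {"lung", "lobe", "pulmonary"}},
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def link(mention, context, window=5):
    """Return the candidate ID whose cue words best match the nearby context.

    Returns None when the mention is not in the dictionary at all.
    """
    candidates = CANDIDATES.get(mention.lower())
    if not candidates:
        return None

    tokens = tokenize(context)
    positions = [i for i, tok in enumerate(tokens) if tok == mention.lower()]
    if positions:
        i = positions[0]
        nearby = set(tokens[max(0, i - window): i + window + 1])
    else:
        nearby = set(tokens)

    # max() keeps the first candidate on ties, so the generic entry
    # is chosen when no cue word appears near the mention.
    best = max(candidates, key=lambda c: len(c["cues"] & nearby))
    return best["id"]


print(link("left", "Pain in the left kidney on palpation"))
print(link("left", "Opacity in the left lower lobe"))
print(link("left", "Patient left the clinic"))
```

A keyword overlap like this only hints at the idea. With dozens of candidate concepts per word and many entities never seen in training, a hand-written cue list does not scale, which is why learned, context-aware models are a natural fit for the task.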

The ENIGMA team combined advanced transformer models with dictionary-based approaches and put a strong emphasis on pre-processing the documents. The methods they used showcased the power of AI in the healthcare domain and demonstrated a commitment to pushing the boundaries of the field. As a result, out of about 40 teams that submitted solutions (with 500 participants registered initially), ENIGMA secured an impressive 7th place. Their achievement can be seen on the public leaderboard.

Despite the advancements in NLP, large language models still struggle to annotate text for medical entities and completely fail to map those entities to specific terminologies. Continued research in the healthcare domain is crucial, and this understanding is the reason behind Ontotext’s participation in projects like AIDAVA and the development of Ontotext Target Discovery. The approach applied at the SNOMED CT Challenge now offers exciting opportunities for reuse and further research. Looking ahead, Ontotext plans to address multilinguality as its next challenge in healthcare technology, building on its experience from projects like EXA MODE and AI4EU, which were previously recognized as important EC-funded innovations and success stories.

For more information, contact Doug Kimball, Chief Marketing Officer at Ontotext.