What is Entity Linking?

Unveiling what entity linking is, how it works, why it is vital for Natural Language Processing, and why its synergy with knowledge graphs facilitates deeper understanding and processing of textual content

Understanding text is a challenging task stemming from the complexities of language and communication. Lack of context, ambiguity, nuance, idioms, and figurative language can be quite hard to master for both humans and machines.

While humans can rely on intuition and lived experience, machines require sophisticated Natural Language Processing (NLP) algorithms and large datasets to approximate this understanding. A central part of that involves identifying distinct entities (such as people, places, organizations, and concepts) mentioned in a text and linking them to unique identifiers in a knowledge base. This task is called entity linking, and it is crucial for identifying the specific entities texts refer to, especially when they have ambiguous names or are mentioned in various forms.

The task

Consider the sentence “Jordan played exceptionally well against Phoenix last night.” It illustrates several challenges:

  • Ambiguous labels: The name “Jordan” could refer to multiple entities, the most obvious being Michael Jordan, the basketball player, but it could also be another athlete or even a non-public figure named Jordan. Without additional context, linking “Jordan” to the correct entity in a knowledge base is challenging. Similarly, taken out of context, the term “Phoenix” could refer to a sports club or, for example, to the capital city of Arizona.
  • Contextual clues for disambiguation: The word “played” provides a contextual clue, suggesting that the sentence is about sports, possibly basketball, if one assumes “Phoenix” refers to the Phoenix Suns, an NBA team. Such clues are crucial for disambiguation, but they require the entity linking system to understand the context and make connections between entities.

We see how entity linking must deal with ambiguity and leverage contextual clues for disambiguation to accurately identify and link entities to their corresponding entries in a knowledge base.
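
To make the disambiguation intuition concrete, here is a minimal Python sketch that scores candidate entities for each ambiguous mention by counting the words their descriptions share with the sentence. The candidate names and descriptions are simplified, made-up stand-ins for what a real knowledge base such as Wikidata would provide.

```python
import re

# Illustrative candidates per mention; a real knowledge base would supply
# identifiers, richer descriptions, types and relationships.
CANDIDATES = {
    "Jordan": {
        "Michael Jordan (basketball player)": "Michael Jordan, American basketball player who played for the Chicago Bulls",
        "Jordan (country)": "Jordan, country in Western Asia on the Jordan River",
    },
    "Phoenix": {
        "Phoenix Suns (NBA team)": "Phoenix Suns, professional basketball team that played its home games in Phoenix, Arizona",
        "Phoenix (city)": "Phoenix, capital city of the US state of Arizona",
    },
}

def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def best_candidate(sentence: str, candidates: dict[str, str]) -> str:
    """Pick the candidate whose description shares the most words with the sentence."""
    return max(candidates, key=lambda name: len(tokens(sentence) & tokens(candidates[name])))

sentence = "Jordan played exceptionally well against Phoenix last night."
for mention, candidates in CANDIDATES.items():
    print(mention, "->", best_candidate(sentence, candidates))
# The overlap on "played" (in addition to the mention itself) favors the
# basketball player and the Phoenix Suns, which is the intuition behind
# context-based disambiguation.
```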

The entity linking process typically involves two main steps:

  • Entity detection or Named Entity Recognition (NER): Identifying the spans of text that mention entities. This step involves recognizing that a piece of text refers to a specific type of entity, such as a person, location, or organization.
  • Entity disambiguation: Once entities are detected, they must be disambiguated to distinguish between entities with similar or identical names. For example, determining whether “Jordan” refers to the country, the river, a person’s name, or a brand. The disambiguated entities are then linked to a unique identifier in a knowledge base (such as Wikidata, DBpedia, or a domain-specific database). This linking provides a way to access a rich set of information about the entity, such as its attributes and relationships with other entities.

Entity linking is the core task in the process of semantic annotation of documents, which would typically also include further information extraction and generation of semantic metadata.
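
As an illustration of these two steps, the sketch below uses spaCy for entity detection and Wikidata’s public search API for candidate generation, leaving the final ranking of candidates out for brevity. It assumes the spacy and requests packages and the en_core_web_sm model are installed.

```python
import requests
import spacy

nlp = spacy.load("en_core_web_sm")

def wikidata_candidates(mention: str, limit: int = 3) -> list[dict]:
    """Fetch candidate entities for a mention from the Wikidata search API."""
    response = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": mention,
            "language": "en",
            "format": "json",
            "limit": limit,
        },
        timeout=10,
    )
    return [
        {"id": hit["id"], "label": hit.get("label"), "description": hit.get("description")}
        for hit in response.json().get("search", [])
    ]

text = "Jordan played exceptionally well against Phoenix last night."
doc = nlp(text)

for ent in doc.ents:  # step 1: entity detection (NER)
    print(ent.text, ent.label_)
    for candidate in wikidata_candidates(ent.text):  # step 2: candidate lookup
        print("   ", candidate)
# A disambiguation step would then rank these candidates using the sentence
# context, as sketched in the previous example.
```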

Why is entity linking important?

Entity linking is a crucial part of NLP, and it is especially important when large volumes of textual content need to be analyzed. To organize content, make it easily discoverable, and transform the information encoded in the text into structured knowledge, you need to attribute the mentions in the text to actual known objects or instances in your database.

Examples of applications that entity linking contributes to are:

  • Enhancing search engines: By understanding the specific entities mentioned in queries and documents, search engines can provide more accurate and relevant results, e.g. filtering out documents about Paris Hilton when searching for information about the capital of France. Entity linking can also help retrieve more complete results, e.g. returning documents that mention Peking for queries about Beijing (see the sketch after this list). Such an augmentation of traditional search approaches is often referred to as “semantic search”.
  • Information extraction and knowledge augmentation: Entity linking helps extract and transform information into structured form, such as identifying unknown properties of an entity or relationships between entities.
  • Semantic analysis and recommendations: Understanding the specific entities mentioned in texts enables deeper semantic analysis, which is essential for applications like sentiment analysis, content recommendation, and personalized services.
  • Natural language querying (NLQ) and retrieval augmented generation (RAG): Entity linking helps identify specific concepts to be passed as parameters to template-based NLQ. It can also help compile more precise queries for retrieval in some RAG implementations.
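
As a rough illustration of the semantic search point above, the sketch below indexes documents by the identifiers of their linked entities rather than by surface strings, so that a query about “Beijing” also retrieves documents that only mention “Peking”. The document annotations and the alias table are made up for the example; in practice they would come from an entity linking pipeline and a knowledge base such as Wikidata.

```python
from collections import defaultdict

# Hypothetical output of an entity linking step: document id -> linked entity ids
# (the identifiers follow Wikidata's style but are used purely for illustration).
doc_entities = {
    "doc1": {"Q956"},  # news article mentioning "Beijing"
    "doc2": {"Q956"},  # older article mentioning "Peking"
    "doc3": {"Q90"},   # travel piece about Paris, the city
}

# Alias table mapping surface forms to entity ids (normally derived from the knowledge base)
aliases = {"beijing": "Q956", "peking": "Q956", "paris": "Q90"}

# Inverted index from entity id to the documents linked to it
index = defaultdict(set)
for doc_id, entity_ids in doc_entities.items():
    for entity_id in entity_ids:
        index[entity_id].add(doc_id)

def search(query: str) -> set[str]:
    """Return the documents linked to the entity the query resolves to."""
    return index.get(aliases.get(query.lower()), set())

print(search("Peking"))  # {'doc1', 'doc2'}: both Beijing documents, regardless of surface form
```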

What are common entity linking approaches?

Early entity linking systems were rule-based – they often relied on hand-crafted rules to identify and disambiguate entities. These rules could include heuristic methods based on entity types, context keywords, and other linguistic features. While rule-based approaches can be highly accurate for specific domains or datasets, they tend to lack scalability and flexibility, especially across diverse or evolving datasets.

Machine learning approaches emerged as a more scalable and accurate alternative. Traditional machine learning approaches to entity linking involve feature engineering and labeled data to train models such as Support Vector Machines (SVM), Random Forests, or Gradient Boosting, which recognize and disambiguate entities based on features extracted from the text.
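
As a toy illustration of the feature-based approach, the sketch below trains a Gradient Boosting classifier (scikit-learn) to score (mention, candidate) pairs described by three hand-crafted features. The features and the tiny training set are synthetic placeholders; real systems learn from annotated corpora and use many more signals.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Features per (mention, candidate) pair:
# [name string similarity, candidate prior popularity, context keyword overlap]
X_train = [
    [0.90, 0.80, 0.70],  # strong match on all signals   -> correct link
    [0.90, 0.10, 0.00],  # same name but wrong context   -> incorrect
    [0.40, 0.90, 0.10],  # popular candidate, weak match -> incorrect
    [0.80, 0.60, 0.90],  # good name and context match   -> correct
    [0.30, 0.20, 0.20],  # weak across the board         -> incorrect
    [0.95, 0.70, 0.80],  # very strong match             -> correct
]
y_train = [1, 0, 0, 1, 0, 1]

model = GradientBoostingClassifier().fit(X_train, y_train)

# Score candidates for a new mention of "Jordan" and pick the most probable one
candidates = {
    "Michael Jordan":   [0.92, 0.75, 0.85],
    "Jordan (country)": [0.88, 0.60, 0.05],
}
scores = {name: model.predict_proba([features])[0][1] for name, features in candidates.items()}
print(max(scores, key=scores.get))
# With this toy data the context-overlap feature dominates, favoring the basketball player.
```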

Using large language models for entity linking

The development of neural networks, transformer models, and large language models has revolutionized NLP, including entity linking. These models excel at capturing deep contextual cues from text, making them highly effective for both detecting named entities and disambiguating them based on context. Still, language models will always require some customization for entity disambiguation – they need to be “educated” about the specific entity descriptions and identifiers from the reference knowledge base.

Transformers, such as RoBERTa, can be fine-tuned on entity linking tasks to achieve state-of-the-art performance. On the other hand, very large generative language models, such as ChatGPT, are less appropriate for entity linking, because they can be much more expensive to fine-tune and use.
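
A common transformer-based pattern is a bi-encoder: embed the mention in its sentence context and each candidate’s knowledge base description, then pick the closest candidate. The sketch below uses a general-purpose sentence-transformers model purely for illustration, with simplified candidate descriptions; a production linker would fine-tune such an encoder on entity linking data aligned with the target knowledge base.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # general-purpose encoder, not fine-tuned for linking

mention_in_context = "Jordan played exceptionally well against Phoenix last night."
candidate_descriptions = {
    "Michael Jordan": "American former professional basketball player",
    "Jordan (country)": "country in Western Asia",
    "Jordan River": "river in the Middle East flowing into the Dead Sea",
}

# Embed the mention context and all candidate descriptions
mention_embedding = model.encode(mention_in_context, convert_to_tensor=True)
candidate_embeddings = model.encode(list(candidate_descriptions.values()), convert_to_tensor=True)

# Cosine similarity between the mention context and each candidate description
similarities = util.cos_sim(mention_embedding, candidate_embeddings)[0].tolist()
best_name, best_score = max(zip(candidate_descriptions, similarities), key=lambda pair: pair[1])
print(best_name, round(best_score, 3))  # expected to favor the basketball player given the sports context
```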

How do knowledge graphs help entity linking?

Knowledge graphs prove to be a great foundation for entity linking, as they encode rich semantic information about entities, including their attributes, types, and relationships with other entities. This information can be leveraged by entity linking systems to disambiguate entities based on their context within the text and their semantic roles in the knowledge graph. 

Knowledge graphs can fit directly into the machine learning model architecture or training process, enhancing the model’s ability to link entities to the specific entries in a graph. This can involve encoding graph information into the model’s embeddings or using graph neural networks to leverage the structure of knowledge graphs. Integrating entity linking with knowledge graphs creates a feedback loop where the entity linking system benefits from the structured knowledge in the graph, and the graph, in turn, is enriched by the newly linked entities, relationships and facts extracted from text. This mutually beneficial relationship facilitates continuous improvement and learning.
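
To make the role of the graph concrete, the sketch below queries Wikidata’s public SPARQL endpoint for the types (“instance of”, P31) of two candidate entities. In a disambiguation setting, a candidate typed as “human” is far more plausible than one typed as “country” for a sentence about playing a game. It assumes the SPARQLWrapper package and uses the Wikidata identifiers for Michael Jordan (Q41421) and the country Jordan (Q810).

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="entity-linking-example")
sparql.setReturnFormat(JSON)

def entity_types(qid: str) -> list[str]:
    """Return the English labels of an entity's 'instance of' (P31) values."""
    sparql.setQuery(f"""
        SELECT ?typeLabel WHERE {{
          wd:{qid} wdt:P31 ?type .
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }}
    """)
    results = sparql.query().convert()
    return [row["typeLabel"]["value"] for row in results["results"]["bindings"]]

# Types help rank candidates: "human" fits a sports sentence, "country" does not.
print(entity_types("Q41421"))  # Michael Jordan, the basketball player
print(entity_types("Q810"))    # Jordan, the country
```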

Ontotext Metadata Studio provides out-of-the-box methods for entity linking.

Conclusion

Entity linking is a foundational task in text analysis that opens the door to multiple applications. By combining entity linking and knowledge graphs, the resulting system not only becomes more efficient at understanding and processing information but also evolves into a dynamic, self-improving knowledge repository, unlocking new possibilities for extracting, organizing, and leveraging knowledge.

Want to learn more about entity linking?

Dive into our AI in Action series!
