What is Extractive Question Answering?

Extractive Question Answering puts every question's answer at your fingertips – not buried in paragraphs of text, but presented in a clear and concise way.

Chatbots and generative models are widely adopted in different applications today, but the information they provide is not always reliable. (You can read more about how we use them in our post Enhancing Knowledge Discovery: Implementing Retrieval Augmented Generation with Ontotext Technologies.) Usually, they operate in a “closed book” setting, which means that such systems try to provide an answer based on general knowledge about the world.

However, in sensitive domains and applications, such answers are not reliable enough. We would like to be able to extract the information from the source itself, as if highlighting relevant passages or phrases in a book. This approach is called Extractive Question Answering (QA) and, although it’s not talked about as often nowadays as the generative approach to the problem, it can provide valuable insights and help structure data.

Some examples of Extractive QA applications

We actually use Extractive QA on a daily basis. Whenever we google something, the search engine not only finds the relevant page but also highlights the answer. In this way, we are shown the answer together with its evidence in a trustworthy source.

In the same way, we can look for answers in different domains and sets of documents. For instance, we can look for specific procedures, symptoms, and treatments in medical documents or instructions. 

We can even ask the question, “What is Extractive Question Answering?”

We would not yet trust large language models (LLMs) to answer such sensitive questions, but finding the exact response in a trustworthy source is quite helpful in this case.

So let’s dive deeper into what Extractive QA is, how it differs from other typical information extraction approaches, and when and where it can be used.

What is Extractive QA?

Formally, Extractive QA is a task within Natural Language Processing (NLP) that involves extracting relevant snippets of text from a given document to answer a user’s question. This approach often leverages additional machine learning models to determine the most appropriate passage containing the answer (information retrieval). The model can’t make up the answer. It’s particularly valuable for extracting heterogeneous information and works well with unseen questions.

Probably the most popular question answering technique is Generative QA, which aims to generate the answer based on a given text or context. This is the natural language querying technique used by chatbots and Generative AI models.

In contrast to Generative QA, which tries to formulate the answer in its own words, Extractive QA is like highlighting the answer in the given text. From a practical perspective, applying generative models to such tasks is more expensive in terms of computation and less precise due to the non-deterministic nature of their output.

What is the relationship between Extractive QA and Named Entity Recognition?

We may think that the examples above can be solved with the Named Entity Recognition (NER) approach, as many of the answers are named entities. However, Extractive QA is much more flexible when it comes to the type of information and the domain to which the models can be applied.

Let’s compare the approaches with respect to different aspects of the methods.


The primary goal of Extractive QA is to identify and extract relevant snippets of text that directly answer a given question. NER, on the other hand, focuses on identifying and classifying specific closed sets of entities (such as names, locations, and organizations) within a text, without any questions to prompt it.

In the screenshot above, you can see that the typical NER system highlights all mentioned locations (using Tag by Ontotext): Berlin, Germany, Brandenburg, and so on. We can’t ask for a specific location. 

In Extractive QA, as shown in the screenshot below, the setting is different. Even though the text mentions several locations, only the required one is extracted.


The task of Extractive QA requires a deeper understanding of the context to identify the most relevant passage that directly addresses the user’s question. The scope of the questions is almost unlimited as long as the answer is present in the text. If it isn’t, the model will return an empty output. The resulting pipeline can be easily applied to questions that were not seen during the training phase. With a single model we can extract any kind of information within the same domain. It’s usually robust with respect to the way the questions are formulated, allowing users a smoother interaction.

In the case of NER, the set of entity types produced is frozen once the algorithm is trained, and it depends on the training data. If the model has not been originally trained to extract TIME as an entity, it’s impossible to extend it without retraining it from scratch. In addition, NER is more sensitive to the domain, as it doesn’t receive additional context from the question and its grammatical structure.
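The contrast can be sketched in a few lines of toy Python (the label set and questions here are invented for illustration):

```python
# A trained NER model carries a frozen set of entity types; asking for a
# new type (e.g. TIME) fails unless the model is retrained from scratch.
NER_LABELS = {"PERSON", "LOCATION", "ORGANIZATION"}

def ner_supports(entity_type):
    return entity_type in NER_LABELS

# An extractive QA system is driven by the question instead, so a new
# information need is simply a new question string:
qa_questions = [
    "Where did the meeting take place?",  # location-like
    "When did the meeting take place?",   # time-like: no retraining needed
]
```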



How do Extractive QA models work?

Extractive QA models based on pre-trained transformers perform the following main steps:

  • Input: Two types of input data are taken – the text (context) and the question related to this text. 
  • Preprocessing: Both inputs are prepared for processing, which can include cleaning (removing unnecessary characters) and tokenization (splitting the text into tokens or words). Both are then converted into a numerical representation that the transformer model can use.
  • Encoding: The transformer model processes the preprocessed input using a self-attention mechanism. It captures the relationships between the words in the input text and comprehends the context in which they appear.
  • Scoring: The model calculates a probability score for each token in the passage to determine how likely it is to be part of the answer. These probabilities are produced independently for the start and end positions of the answer. 
  • Answer selection: The highest-scoring positions are selected as the answer boundaries – the start and end of the answer within the text or passage.
  • Post-processing (extracting the answer): The passage between the selected start and end positions is extracted from the text, and some post-processing may occur to refine the answer and convert it back into a textual representation.
  • Output: The extracted and post-processed answer is returned as the result.
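The scoring and answer-selection steps above can be sketched in a toy Python snippet. The tokens and scores below are invented; a real model (e.g. a fine-tuned BERT) would compute them from the encoded question and context:

```python
# Toy illustration of "Scoring" and "Answer selection": given independent
# start/end scores per token, pick the best valid answer span.

def best_span(tokens, start_scores, end_scores, max_len=10):
    """Return the (start, end) token indices with the highest combined score."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider spans starting at i that are at most max_len tokens long.
        for j in range(i, min(i + max_len, len(tokens))):
            score = s + end_scores[j]  # log-probabilities combine additively
            if score > best_score:
                best, best_score = (i, j), score
    return best

# Context: "Berlin is the capital of Germany"
# Question: "What is the capital of Germany?"
tokens = ["Berlin", "is", "the", "capital", "of", "Germany"]
start_scores = [2.5, 0.0, 0.0, 0.2, 0.0, 0.1]   # hypothetical start logits
end_scores   = [3.0, 0.0, 0.0, 0.1, 0.0, 0.3]   # hypothetical end logits

i, j = best_span(tokens, start_scores, end_scores)
answer = " ".join(tokens[i:j + 1])               # → "Berlin"
```

Because the start and end positions are scored independently, the selection step must enforce that the end does not precede the start, which the nested loop above guarantees.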

Extractive QA model high-level architecture


The Stanford Question Answering Dataset (SQuAD) is a benchmark dataset specifically designed for training Extractive QA models. It consists of more than 100,000 question–answer pairs [1] posed by crowdworkers on a set of Wikipedia articles. The answers to these questions are segments of text (spans or passages) from the corresponding articles and, since SQuAD 2.0, there are also questions that are unanswerable from the given context. SQuAD is used both for training machine learning models and for evaluating their performance. The original dataset is in English, but there are several versions of SQuAD translated and adapted for other languages.
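A SQuAD-style record can be sketched as a plain Python dictionary. The field names follow the public dataset; the context, question, and id here are invented for illustration:

```python
# Minimal SQuAD-style record: a context paragraph plus questions whose
# answers are given as character offsets into that context.
record = {
    "context": "SQuAD was created by researchers at Stanford University.",
    "qas": [
        {
            "id": "q1",
            "question": "Who created SQuAD?",
            "answers": [
                {"text": "researchers at Stanford University",
                 "answer_start": 21}
            ],
            "is_impossible": False,
        }
    ],
}

# Because answers are stored as character offsets, the gold span can be
# recovered by slicing the context:
ans = record["qas"][0]["answers"][0]
start = ans["answer_start"]
span = record["context"][start:start + len(ans["text"])]
```

Storing offsets rather than free-form answer strings is what makes the task extractive: the gold answer is, by construction, a span of the source text.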

Example of a SQuAD question and answer for a given context


Examples of state-of-the-art models

In today’s rapidly evolving landscape, some of the recent state-of-the-art models for the Extractive QA task are based on:

  • BERT (Bidirectional Encoder Representations from Transformers) [2] – a transformer-based deep learning model that can be fine-tuned for the Extractive QA task on QA datasets. This allows it to predict the start and end positions of the corresponding answers within a given text.
  • RoBERTa (A Robustly Optimized BERT Pretraining Approach) [3] – based on the BERT model with an optimized pre-training process: more training data, longer training time, and larger batches. This optimization significantly improves its performance across tasks, including Extractive QA.
  • ALBERT (A Lite BERT) [4] – a modification of BERT that significantly cuts down its size, reducing memory consumption and increasing training speed. Importantly, these optimizations don’t hurt the model’s performance and make it more efficient for Extractive QA tasks.
  • ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) [5] – introduces a new learning paradigm: some tokens in the input text are corrupted, and a discriminator model is trained to predict whether each token is “real” or “fake”. This approach improves the model’s grasp of linguistic nuances and context, which is crucial for the Extractive QA task.
  • T5 (Text-to-Text Transfer Transformer) [6] – an approach that treats Extractive QA as a text-to-text “translation” task, that is, transforming one text into another.
  • Longformer [7] – a model that extends the capabilities of transformers with a sliding-window attention mechanism, overcoming the small context windows of other models. This is quite useful for Extractive QA when large documents are used as context.
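Under the hood, the BERT-family models above all attach the same kind of QA head to the encoder: a linear layer that maps each token's hidden state to a start logit and an end logit. A minimal pure-Python sketch, where the dimensions and weights are random stand-ins rather than real model values:

```python
import math
import random

random.seed(0)
seq_len, hidden = 6, 8   # toy dimensions; real models use e.g. 768-dim states

# Encoder output: one hidden vector per token (random stand-ins here).
hidden_states = [[random.gauss(0, 1) for _ in range(hidden)]
                 for _ in range(seq_len)]

# QA head: a single linear layer producing 2 logits (start, end) per token.
W = [[random.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]

def qa_head(h):
    return [sum(h[k] * W[k][j] for k in range(hidden)) for j in range(2)]

logits = [qa_head(h) for h in hidden_states]   # shape: (seq_len, 2)
start_logits = [row[0] for row in logits]
end_logits = [row[1] for row in logits]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Independent probability distributions over start and end positions.
start_probs = softmax(start_logits)
end_probs = softmax(end_logits)
```

The architectures differ in how the encoder is pre-trained and scaled, but the extractive objective – two independent distributions over token positions – stays the same.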


Real world applications of Extractive QA

Extractive QA models have significant uses in various real-world applications like: 

  • Search engines – to prominently highlight the answer to the user’s question at the top of the search results 
  • Automated customer support – to accelerate automated customer help by swiftly addressing frequently asked questions
  • Content summarization – to extract key sentences or information from larger texts. This is especially useful for quickly summarizing news articles, scientific papers, or other large documents without reading the full text.
  • Educational tools – to assist the learning process by automatically creating tests or supporting students with practice questions
  • Healthcare information retrieval – to extract patient information about the treatment course and disease progression from clinical documents
  • Compliance and legal assistance – to quickly pinpoint specific details across large volumes of legal texts, for example extracting the relevant case law or statutes that answer a specific legal question
  • Business intelligence – to support decision making, identify opportunities and risks, and gain deeper insights into market trends by extracting information from extensive texts such as reports, emails, and other documents


Want to learn more about Extractive QA?

Dive into our AI in Action series!



[1] Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
[2] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[3] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
[4] Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.
[5] Clark, K., Luong, M. T., Le, Q. V., & Manning, C. D. (2020). ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.
[6] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., … & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140), 1-67. https://www.jmlr.org/papers/volume21/20-074/20-074.pdf 
[7] Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150.
