What is Graph RAG?

Discover the transformative potential of Retrieval Augmented Generation (RAG), a method that enhances large language models (LLMs) with external knowledge for more accurate, contextual question answering. See how RAG can evolve into Graph RAG (or GraphRAG), which uses knowledge graphs (KGs) as a source of context or factual information. Ontotext products and services streamline a range of Graph RAG patterns, opening up new possibilities for chatbots, natural language querying and information extraction.

The topic of interfacing with knowledge graphs in natural language has gained tremendous popularity. Moreover, according to Gartner, this is a lasting trend that will transform many of the computer interactions we are used to. The first major step in this direction seems to be natural language querying (NLQ) – lately everyone seems to want to ask natural language questions of their own data.

Using out-of-the-box large language model (LLM) chatbots for question answering in enterprises is rarely helpful, as they don’t encode the domain-specific proprietary knowledge about the organization’s activities that would actually bring value to a conversational interface for information extraction. This is where the Graph RAG approach enters the scene as an ideal way to tailor an LLM to your specific requirements.

What is RAG?

RAG is a natural language querying approach for enhancing existing LLMs with external knowledge, so that answers to questions requiring specific knowledge are more relevant. It includes an information retrieval component that fetches additional information from an external source, also known as the “grounding context”, which is then fed into the LLM prompt so that the question is answered with higher accuracy.
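
As a minimal sketch, the “grounding context” idea boils down to assembling a prompt from the retrieved text before calling the LLM. The function name and instruction wording below are illustrative, not a prescribed format:

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble an LLM prompt whose answer is grounded in retrieved context."""
    # Join the retrieved chunks into a single "grounding context" section.
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string is sent to the LLM as the prompt; the instruction to rely only on the provided context is what steers the model away from hallucinating.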

This approach is the cheapest and most standard way to enhance LLMs with additional knowledge for question answering. It has also been shown to reduce the tendency of LLMs to hallucinate, as the generation adheres more closely to the information in the context, which is generally reliable. For these reasons, RAG has emerged as the most popular way to augment the output of generative models.

Besides question answering, RAG can also be used for many other natural language processing tasks, such as information extraction from text, recommendations, sentiment analysis and summarization, to name a few.

How to do RAG?

To implement RAG for question answering, you need to select which part of the information available to you to send to the LLM. This is usually done by querying a database based on the intent of the user question. The most appropriate databases for this purpose are vector databases, which, via embeddings, capture the latent semantic meanings, syntactic structures and relationships between items in a continuous vector space. The enriched prompt contains the user question together with the pre-selected additional information, so the generated answer takes it into account.
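
The retrieval step can be illustrated with a toy in-memory index: embeddings are compared by cosine similarity and the closest chunks are returned. A real system would use a vector database and a trained embedding model; the two-dimensional vectors below are made up for the example:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec: list[float],
                   index: list[tuple[str, list[float]]],
                   k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are closest to the query vector."""
    ranked = sorted(index,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

The returned chunks are then concatenated into the enriched prompt together with the user question.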

As simple as the basic implementation is, you need to address a number of challenges and considerations to ensure good quality of the results:

  • Data quality and relevance are crucial for the effectiveness of RAG, so questions such as how to fetch the most relevant content to send to the LLM and how much content to send should be considered.
  • Handling dynamic knowledge is usually difficult as one needs to constantly update the vector index with new data. Depending on the size of the data this can impose further challenges such as efficiency and scalability of the system.
  • Transparency of the generated results is important to make the system trustworthy and usable. Prompt engineering techniques can be used to encourage the LLM to cite the source of the information included in the answer.

The Different Varieties of Graph RAG

Graph RAG is an enhancement of the popular RAG approach that includes a graph database as a source of the contextual information sent to the LLM. Textual chunks extracted from larger documents can lack the context, factual correctness and language accuracy the LLM needs to understand them in depth. Unlike plain text chunks, Graph RAG can also provide the LLM with structured entity information, combining an entity’s textual description with its many properties and relationships, thus encouraging deeper insights. With Graph RAG, each record in the vector database can have a contextually rich representation, increasing the understandability of specific terminology, so the LLM can make better sense of particular subject domains. Graph RAG can also be combined with the standard RAG approach to get the best of both worlds – the structure and accuracy of the graph representation combined with the vastness of textual content.

We can summarize several varieties of Graph RAG, depending on the nature of the questions, the domain and information in the knowledge graph at hand:

  • Graph as a Content Store: Extract relevant chunks of documents and ask the LLM to answer using them. This variety requires a KG containing relevant textual content and metadata about it as well as integration with a vector database.
  • Graph as a Subject Matter Expert: Extract descriptions of concepts and entities relevant to the natural language (NL) question and pass those to the LLM as additional “semantic context”. The description should ideally include relationships between the concepts. This variety requires a KG with a comprehensive conceptual model, including relevant ontologies, taxonomies or other entity descriptions. The implementation requires entity linking or another mechanism for the identification of concepts relevant to the question.
  • Graph as a Database: Map (part of) the NL question to a graph query, execute the query and ask the LLM to summarize the results. This variety requires a graph that holds relevant factual information. The implementation of such a pattern requires some sort of NL-to-Graph-query tool and entity linking.
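
For instance, the “Graph as a Database” variety can be sketched as two small helpers: one that builds a simplified, hypothetical SPARQL query for an entity identified by entity linking, and one that turns the query results into a summarization prompt for the LLM. A production NL-to-Graph-query tool would generate far richer queries than this template:

```python
def entity_facts_query(entity_iri: str, limit: int = 20) -> str:
    """Build a SPARQL query fetching an entity's outgoing relations.
    The single-triple-pattern template is illustrative only."""
    return (
        "SELECT ?predicate ?object WHERE { "
        f"<{entity_iri}> ?predicate ?object . "
        f"}} LIMIT {limit}"
    )

def summarization_prompt(question: str, rows: list[tuple[str, str]]) -> str:
    """Turn query results into a prompt asking the LLM to summarize them."""
    facts = "\n".join(f"{p} -> {o}" for p, o in rows)
    return f"Facts about the entity:\n{facts}\n\nUsing only these facts, answer: {question}"
```

The query runs against the graph database; its bindings become the factual context that the LLM is asked to summarize in natural language.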

How Ontotext GraphDB makes Graph RAG easier

Ontotext GraphDB provides numerous integrations that enable users to create their own Graph RAG implementation quickly and efficiently. GraphDB’s Similarity plugin is an easy, free way to create an embedding index of your content and use SPARQL to query this index for the top K entities or pieces of content closest to the user question.
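
A similarity search with the plugin is expressed as an ordinary SPARQL query. The sketch below builds such a query as a string; the predicate names follow the Similarity plugin’s namespace, but the index name is our own placeholder and the exact query shape should be verified against the GraphDB documentation for your version:

```python
def similarity_search(index_name: str, term: str, limit: int = 5) -> str:
    """Build a text-similarity SPARQL query for GraphDB's Similarity plugin.
    Predicates are taken from the plugin's namespace as documented; verify
    them against your GraphDB version. The index name is an assumption."""
    return f"""
PREFIX sim: <http://www.ontotext.com/graphdb/similarity/>
PREFIX sim-index: <http://www.ontotext.com/graphdb/similarity/instance/>
SELECT ?documentID ?score WHERE {{
    ?search a sim-index:{index_name} ;
            sim:searchTerm "{term}" ;
            sim:documentResult ?result .
    ?result sim:value ?documentID ;
            sim:score ?score .
}}
ORDER BY DESC(?score)
LIMIT {limit}
"""
```

Sending this query to the repository’s SPARQL endpoint returns the top K documents closest to the search term, ready to be passed to the LLM as grounding context.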

For more complex use cases that require higher precision of the results, GraphDB also provides the ChatGPT Retrieval Plugin Connector, through which you can index your content in a vector database using a state-of-the-art embedding model and run powerful queries against this vector database. Moreover, the plugin continuously syncs the state of the knowledge in GraphDB with the vector database in a transactionally safe manner, which means new data is immediately available for an LLM integration.

The ChatGPT Retrieval Plugin Connector, like other GraphDB plugins, allows you to configure precisely which data you want to pull from your knowledge graph and store as embeddings in an external vector database. It is not limited to textual fields: it can also convert structured data about RDF entities into text embeddings. The connection to the vector database is managed by the ChatGPT Retrieval Plugin – hence the name of the connector. This is an open-source tool, developed and maintained by OpenAI under an MIT license, whose function is to index and query content. To index content, the plugin receives a piece of text, splits it into manageable parts or “chunks”, creates vector embeddings for each chunk using OpenAI’s embedding model, and stores these embeddings in a vector database. The plugin is compatible with numerous databases, including some of the most popular ones, such as Weaviate, Pinecone and Elasticsearch.

GraphDB interacts with the plugin’s REST interface, which decouples it from the embedding model and the vector database. Ontotext has also developed an alternative to OpenAI’s solution that allows a custom embedding model to be plugged in, including Llama and Hugging Face models. The ChatGPT Retrieval Plugin also provides a REST interface for querying the vector database with a query text, which is available in GraphDB through SPARQL. If you prefer not to use a formal query language such as SPARQL, GraphDB offers Talk to Your Graph – an out-of-the-box interface for conversing with your own RDF data, built on top of a ChatGPT Retrieval Plugin Connector. Follow the GraphDB documentation of Talk to Your Graph for instructions on how to set it up.
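
Querying the plugin directly is a plain HTTP POST against its /query endpoint. The sketch below only builds the JSON request body; the shape follows the plugin’s published OpenAPI schema at the time of writing and should be verified against the version you deploy:

```python
import json

def retrieval_plugin_query(question: str, top_k: int = 3) -> str:
    """JSON body for the ChatGPT Retrieval Plugin's /query endpoint.
    Request shape per the plugin's OpenAPI schema; verify against your
    deployed version before relying on it."""
    return json.dumps({"queries": [{"query": question, "top_k": top_k}]})
```

POSTing this body (with a bearer token, as the plugin requires authentication) returns the top_k most similar chunks, which GraphDB surfaces through SPARQL as described above.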

For more information, check out our blog post that compares the various RAG approaches.


In conclusion, the Graph RAG approach represents a significant advancement in the enrichment of LLMs. By effectively combining the strengths of both retrieval-based and generative approaches, Graph RAG enhances the ability of LLMs to produce more accurate, relevant, and contextually informed responses. This technique not only improves the overall quality of outputs but also expands the capabilities of LLMs in handling complex and nuanced queries. As a result, Graph RAG opens up new possibilities in various applications, from advanced chatbots to sophisticated data analysis tools, making it a pivotal development in the field of natural language processing.

Want to learn more about retrieval augmented generation?

Dive into our AI in Action series!
