What is Natural Language Querying?

In an era where data shapes virtually all aspects of our lives, the ability to access and understand it is more critical than ever. Enter the rapidly evolving field of Natural Language Querying.

Natural Language Querying (NLQ) enables users to interact with complex databases – yes, including knowledge graphs – using ordinary human language, eliminating the need for specialized query language skills. Over the past year, NLQ has skyrocketed into prominence, proving to be an indispensable tool in unlocking the staggering potential of structured and unstructured data alike.

Why is NLQ important?

With the recent advancements in generative models, NLQ research has been dominated by approaches that use large language models (LLMs) to understand human questions and provide natural language answers. LLMs and conversational interfaces have clearly demonstrated the benefits of exploring and extracting information easily from very large knowledge structures.

LLMs and conversational interfaces have opened the door to a better user experience in searching for and consuming new knowledge. Enterprises that hold large amounts of data in various data stores, including knowledge graphs, want such interfaces for consuming the information represented there. The goal is to enhance knowledge discovery and enable non-technical users or employees to benefit from all the information for knowledge-driven decision-making.

NLQ systems can be seen as a subset of Question-Answering (QA) systems, which are designed to answer questions posed by users in natural language. A related term within the broader QA area is Extractive Question Answering. Whereas NLQ targets database querying, Extractive Question Answering aims to extract a specific segment of text from a provided document that directly answers the user’s question.

How Can You Do NLQ?

So, how do LLMs help in tackling NLQ tasks?
The following approaches to NLQ with the help of LLMs have recently emerged:

  • Enhancing the grounding context of an LLM with Retrieval Augmented Generation (RAG) for covering questions specific to a domain or proprietary knowledge. The way to achieve RAG is simply to provide additional context as part of the prompt with information that could be useful for the generation of the answer. The LLM then generates an answer by taking into account the provided context information.
  • Using the skills of LLMs to translate questions into structured database queries that are then executed against an on-premise database. This very powerful technique can unlock access to knowledge encoded in various database systems without needing to reindex or transform their content. The LLM takes care of parsing the user question, understanding its semantics, and generating a valid query in the query language it is instructed to use. Optionally, the results of the database query can be fed back to the LLM so that a natural language response is returned to the user.
  • Fine-tuning custom LLMs with proprietary data. This process tailors the LLM to understand and process queries specific to an organization’s datasets, improving its performance on domain-specific tasks. Proper data collection and preparation are key to achieving good results, as aspects such as data diversity, consistency, accuracy, and lack of bias need to be considered.
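
To make the RAG approach from the list above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production setup: the retriever is a toy bag-of-words cosine similarity standing in for a real embeddings index, and the function names (`top_k`, `build_rag_prompt`) and the sample passages are made up for this example. The resulting prompt would then be sent to whichever LLM you use.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question.

    Toy stand-in (an assumption of this sketch) for a real embeddings index.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:k]


def build_rag_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the LLM in the retrieved context."""
    context = "\n".join(f"- {p}" for p in top_k(question, passages, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Illustrative passages only; a real system would retrieve from your own data.
PASSAGES = [
    "GraphDB is an RDF database that supports SPARQL.",
    "Semantic Objects exposes knowledge graph data through GraphQL.",
    "Ontotext Metadata Studio supports manual annotation of documents.",
]

prompt = build_rag_prompt("Which product supports SPARQL?", PASSAGES, k=1)
print(prompt)
```

The important part is the shape of the flow: retrieve the few pieces of knowledge closest to the question, then place them in the prompt so the model answers from them rather than from its general training data.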

Many libraries and tutorials are emerging with helpful tools and guidelines to speed up such development; the fastest-growing of these is currently LangChain.

Challenges

All of these techniques help an out-of-the-box LLM answer questions better, including those that require external knowledge. The field of NLQ is advancing quickly, but there are still major challenges:

  • Contextual understanding – many queries depend on the context for their interpretation. This context can be immediate (based on the conversation history) or external (based on current events or user-specific knowledge).
  • Complex query interpretations – users may phrase their queries in complex ways that involve nested conditions, aggregations, or comparisons. Interpreting these correctly to form accurate database queries or data retrievals requires advanced understanding and processing capabilities.
  • Challenges in modeling and storing data – the data representation and schema determine the variety of questions that NLQ systems can cover. Depending on the types of questions an NLQ system needs to cover, the information needed to answer them must be indexed in an appropriate data store. Making the LLM aware of the structure of the data is also crucial, especially for query translation.
  • Transparency and trustworthiness – building systems that users trust and feel comfortable using requires not only technical accuracy but also transparency in how queries are interpreted and processed. Managing user expectations regarding the system’s capabilities and limitations is also crucial.

Knowledge graphs provide a helpful addition to these approaches thanks to their structured knowledge representation in the form of ontologies, as well as the contextual richness and connectedness of the data, which enable semantic reasoning. These features can be particularly useful in domain-specific NLQ systems where understanding the specific terminology and relationships is crucial. Semantic technologies and knowledge graphs can enhance LLMs by extending their context in a rich, accurate, and transparent manner.

How Do Ontotext Offerings Make NLQ Easier?

Ontotext’s products cover the full spectrum of foundational technologies to get you up to speed with your NLQ development.

GraphDB

  • By complying with the SPARQL standard, GraphDB is a natural fit for executing LLM-generated SPARQL queries.
  • GraphDB lets every user quickly and efficiently build their own RAG implementation, grounded in the descriptions of the entities, relationships, and content in the graph:
    • By using GraphDB’s Similarity plugin, you can create an embeddings index for free and use SPARQL to query this index for the top-K pieces of knowledge closest to the user question.
    • With the ChatGPT Retrieval Plugin Connector, you can index subsets of the graph in a vector database using a state-of-the-art embeddings generation model and run powerful queries against this vector database.
  • RDF data can be exposed via other interfaces thanks to the variety of connectors that GraphDB provides. For example, indexing data in Elasticsearch and using an LLM to generate Elasticsearch queries has been a very promising research area.
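
As a rough illustration of how LLM-generated SPARQL might reach a SPARQL endpoint, the Python sketch below builds a translation prompt, applies a coarse read-only check to the generated query, and prepares a standard SPARQL 1.1 Protocol request. Several things here are assumptions made for the example: the `ex:` schema summary, the function names, the `llm` callable (any function that takes a prompt and returns text), and the endpoint URL, which you would replace with your own repository address. The request is only constructed, not sent.

```python
import re
import urllib.request
from typing import Callable

# Hypothetical schema summary; in practice this would come from your ontology.
SCHEMA_HINT = (
    "Prefix ex: <http://example.org/>\n"
    "Classes: ex:Patient, ex:LabResult\n"
    "Properties: ex:hasLabResult (ex:Patient -> ex:LabResult), "
    "ex:takenOn (ex:LabResult -> xsd:date)"
)

# Coarse guard: optional PREFIX declarations followed by SELECT or ASK.
READ_ONLY = re.compile(
    r"^\s*(PREFIX\s+\S*\s*<[^>]*>\s*)*(SELECT|ASK)\b", re.IGNORECASE
)


def build_translation_prompt(question: str, schema_hint: str = SCHEMA_HINT) -> str:
    """Ask the LLM for one SPARQL query restricted to the given schema."""
    return (
        "Translate the question into a single SPARQL SELECT query.\n"
        "Use only the classes and properties listed in the schema.\n"
        f"Schema:\n{schema_hint}\n"
        f"Question: {question}\n"
        "SPARQL:"
    )


def is_read_only(query: str) -> bool:
    """Return True if the query looks like a SELECT or ASK query."""
    return bool(READ_ONLY.match(query))


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Prepare a SPARQL 1.1 Protocol POST request without sending it."""
    if not is_read_only(query):
        raise ValueError("refusing to send a non-SELECT/ASK query")
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def question_to_request(
    question: str, llm: Callable[[str], str], endpoint: str
) -> urllib.request.Request:
    """Translate a question with the supplied LLM and wrap it in a request."""
    query = llm(build_translation_prompt(question))
    return build_sparql_request(endpoint, query)
```

Passing the returned request to `urllib.request.urlopen` would execute it against a live endpoint. The read-only check is deliberately simple; it rejects generated updates, but it does not replace proper access control on the database.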

Semantic Objects

Ontotext’s Semantic Objects provides a GraphQL interface on top of the knowledge graph data that LLMs pick up very easily. GraphQL is one of the most developer-friendly interfaces for consuming knowledge graphs. Semantic Objects provides access to the schema of the data in a readable YAML format, which makes it easy for an LLM to parse and understand the model of the data. Alternatively, for bigger schemas, dynamic schema introspection can be integrated into the query generation process.

Ontotext Metadata Studio

To fine-tune an LLM for NLQ, you need a high-quality training dataset of question-answer pairs built on a representative sample of your proprietary knowledge. Ontotext Metadata Studio (OMDS) is the tool for making LLM fine-tuning easy via:

  • built-in text analytics capabilities driven by the semantic database
  • flexible annotation schema
  • user-friendly manual annotation capabilities, including ones for question-answering tasks, for developing custom training datasets for LLM fine-tuning
  • monitoring of how the quality of your fine-tuned LLM evolves over time

Using OMDS, general-purpose LLMs can be turned into specialized ones based on proprietary content, domain model, and expertise. A training dataset produced for a custom domain and enriched with high-quality semantic metadata can teach the user’s LLM instance to excel in the new area. With all these features made easier and right on top of your knowledge graph, OMDS is the perfect companion for this job.
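
For a sense of what a question-answer training dataset can look like once it has been collected, here is a small Python sketch that cleans a list of pairs and serializes them as JSON Lines, a format many fine-tuning pipelines accept. This is a generic illustration and not the OMDS export format; the `QAPair` fields (including `source_id`) and the function names are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    source_id: str  # hypothetical pointer back to the annotated document


def clean_pairs(pairs: list[QAPair]) -> list[QAPair]:
    """Drop pairs with empty fields and duplicate questions (case-insensitive)."""
    seen: set[str] = set()
    cleaned: list[QAPair] = []
    for pair in pairs:
        question = pair.question.strip()
        answer = pair.answer.strip()
        if not question or not answer:
            continue  # incomplete pairs would only add noise to training
        key = question.lower()
        if key in seen:
            continue  # keep the first occurrence of each question
        seen.add(key)
        cleaned.append(QAPair(question, answer, pair.source_id))
    return cleaned


def to_jsonl(pairs: list[QAPair]) -> str:
    """Serialize the pairs as one JSON object per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


# Illustrative pairs only; real ones would come from your annotated content.
raw = [
    QAPair("What does NLQ stand for?", "Natural Language Querying", "doc-1"),
    QAPair("  what does nlq stand for? ", "Duplicate entry", "doc-2"),
    QAPair("Which interface does Semantic Objects provide?", "GraphQL", "doc-3"),
    QAPair("Unanswered question?", "   ", "doc-4"),
]

print(to_jsonl(clean_pairs(raw)))
```

Simple checks like these address only consistency; diversity, accuracy, and lack of bias still need human review of the annotated content.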

Real-world Applications

NLQ has great potential in a wide range of industries such as Healthcare, Financial Services, Infrastructure, and Manufacturing. In practice, few domain experts in these areas have the deep technical understanding of query languages needed to take full advantage of graph data in their day-to-day analysis. By combining semantic graph technologies with LLMs, you can make knowledge graphs easier to enrich, consume, and understand, so the value they unlock becomes available to a wider audience.

In Healthcare, for example, semantic technologies and knowledge graphs improve personal healthcare by providing a comprehensive patient dossier for clinical decision-making and enabling access to high-quality data for clinical research. NLQ can be used to quickly retrieve patient information or research data, simply by asking questions like “What were the patient’s last lab results?” or “What are the recent studies on this disease?”. This can make data management more efficient and can help in making timely diagnosis and treatment decisions.

In Financial Services, knowledge graphs allow organizations to derive more value from their data by capturing unique relationships between data points, which is crucial for complex queries and analytics. NLQ enhanced by knowledge graphs can help data analysts trace and understand the origins of the information and access real-time insights to make informed knowledge-driven decisions.

Another example is Industry and Manufacturing, where knowledge graphs help bridge the gap between different industry sectors by revolutionizing how data is structured and analyzed, leading to better knowledge management and process automation. By querying systems in natural language, users can quickly find information relevant to their tasks, be it maintenance and troubleshooting, supply chain management, or security and compliance.

In all of the areas above, knowledge sharing and collaboration across organizations is key. NLQ can facilitate discovery of and access to relevant information, expertise, best practices, and lessons learned, thus promoting collaboration, encouraging knowledge exchange, and preventing silos of information.

Conclusion

It is no wonder that the emergence of LLMs has generated such unprecedented hype – people feel empowered by how accurately a machine can understand and meet their needs. Users will expect and demand such a seamless experience from more and more of the applications around us. Knowledge graphs provide key features to unlock NLQ capabilities powered by data connectedness, semantic context, and inference.
