RAG - PolyAI Platform

This page explains how your agent finds the right topic to answer a caller’s question. You do not need to configure RAG directly – it works automatically based on how you structure your Managed Topics and Connected Knowledge.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique where the system first searches a knowledge source for relevant content, then feeds that content to a language model to generate an accurate response.

Components of RAG

Retrieval component: Searches the knowledge for relevant information based on the input query.
Augmentation: Uses retrieved information to enhance the original query with additional context.
Generation component: Generates responses using a language model, integrating both the query and retrieved information.
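
The three components can be sketched in a few lines of Python. This is a toy illustration only, not PolyAI's implementation: the knowledge entries, the word-overlap scoring, and the function names (`retrieve`, `augment`, `generate`, `answer`) are all assumptions made for the example, and `generate` is a placeholder where a real system would call a language model.

```python
import re
from typing import Optional

# Toy knowledge source (topic name -> content). Entries are invented for illustration.
KNOWLEDGE = {
    "Opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "Parking": "Free parking is available behind the building.",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str) -> Optional[str]:
    """Retrieval: return the content of the topic sharing the most words with the query."""
    query_words = tokenize(query)
    best_content, best_score = None, 0
    for name, content in KNOWLEDGE.items():
        score = len(query_words & tokenize(name + " " + content))
        if score > best_score:
            best_content, best_score = content, score
    return best_content  # None when nothing overlaps


def augment(query: str, context: Optional[str]) -> str:
    """Augmentation: combine the query and the retrieved context into one prompt."""
    return f"Context: {context or 'none'}\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Generation: placeholder for a language model call (assumption: no real LLM here)."""
    context = prompt.splitlines()[0][len("Context: "):]
    return "I don't know." if context == "none" else context


def answer(query: str) -> str:
    return generate(augment(query, retrieve(query)))
```

Production retrievers typically rely on semantic similarity rather than word overlap; the overlap score is used here only to keep the sketch self-contained.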

How RAG works in Agent Studio

PolyAI uses RAG to match user queries to Knowledge topics and generate contextual responses. Here is how it works in Agent Studio:

Query processing: When a caller asks a question, the agent passes that query into the RAG process.
Retrieval: The retriever component searches the structured knowledge to find matching topics. Knowledge is organized into topics so that the retriever can match queries precisely.
Generation: The LLM uses the retrieved information to select and generate the response.

For best results, write clear, specific topic names and realistic sample questions. These are the key signals the retriever uses to find the right match. You can add up to 20 sample questions per topic, and each additional question gives the retriever more to match against.

For best retrieval and generation quality, use PolyAI’s Raven model. It is robust to irrelevant retrieved content and will say “I don’t know” rather than hallucinate when information isn’t available.

Managed Topic structure for RAG

Each Managed Topic is structured for effective retrieval. A topic includes:

Topic name: The FAQ name or category of the information.
Sample Questions: Example queries that callers might use. These help RAG understand user intent and improve matching accuracy.
Content: The information you want the agent to provide to users.
Action: Specific actions triggered by the query, such as calling a function, initiating a workflow, or handing off to a human agent.
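
As a rough mental model, the four fields can be pictured as a small record. The sketch below is hypothetical: `ManagedTopic`, its field names, and `retrieval_text` are invented for illustration and are not part of Agent Studio, where topics are configured in the UI. The 20-question limit is the one stated on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_SAMPLE_QUESTIONS = 20  # limit stated on this page


@dataclass
class ManagedTopic:
    """Hypothetical in-memory picture of a Managed Topic (not a PolyAI class)."""
    name: str
    content: str
    sample_questions: List[str] = field(default_factory=list)
    action: Optional[str] = None  # assumption: an action referenced by name

    def __post_init__(self) -> None:
        if len(self.sample_questions) > MAX_SAMPLE_QUESTIONS:
            raise ValueError(
                f"A topic can have at most {MAX_SAMPLE_QUESTIONS} sample questions"
            )

    def retrieval_text(self) -> str:
        """The signals a retriever might match on: topic name plus sample questions."""
        return "\n".join([self.name, *self.sample_questions])


topic = ManagedTopic(
    name="Opening hours",
    content="We are open 9am to 5pm, Monday to Friday.",
    sample_questions=["When do you open?", "Are you open on weekends?"],
)
```

Keeping the name and sample questions together in `retrieval_text` mirrors the advice above: those are the parts worth wording carefully, because they are what a caller's query is compared against.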

Disabling topics at runtime

For deterministic control over which topics are available during a conversation, you can disable specific Managed Topics from a Python function using conv.disable_kb_topics(). Disabled topics are excluded from retrieval until the conversation ends or you re-enable them. See Disable KB topics.
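
The effect of disabling a topic can be sketched with a stand-in conversation object. Only the method name `conv.disable_kb_topics()` comes from this page; the `FakeConversation` class, the list-of-topic-names argument, the `retrievable_topics` helper, and the `after_booking_confirmed` function are assumptions made for illustration. Check the Disable KB topics reference for the actual signature before relying on this.

```python
class FakeConversation:
    """Stand-in for the conversation object. NOT the PolyAI SDK."""

    def __init__(self, topic_names):
        self.topic_names = list(topic_names)
        self.disabled = set()

    def disable_kb_topics(self, topic_names):
        # Assumption: the method accepts a list of topic names.
        self.disabled.update(topic_names)

    def retrievable_topics(self):
        # Hypothetical helper: topics the retriever would still consider.
        return [name for name in self.topic_names if name not in self.disabled]


def after_booking_confirmed(conv):
    """Hypothetical function: once a booking exists, stop offering the booking topic."""
    conv.disable_kb_topics(["Make a booking"])


conv = FakeConversation(["Make a booking", "Opening hours", "Parking"])
after_booking_confirmed(conv)
remaining = conv.retrievable_topics()
```

In this sketch, `remaining` no longer contains "Make a booking", so no caller query can be matched to that topic for the rest of the conversation.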

Why RAG?

You do not need to retrain a model when you update your Knowledge. RAG retrieves from the current Knowledge at query time, so updates are available as soon as they are promoted to the target environment.
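
A toy sketch of why no retraining is needed: retrieval reads whatever Knowledge is currently in the target environment, so promoting an update changes the next answer immediately. The environment names (`sandbox`, `live`) and the `retrieve` and `promote` functions are hypothetical and do not correspond to a PolyAI API.

```python
import copy

# Hypothetical per-environment Knowledge stores (topic name -> content).
environments = {
    "sandbox": {"Opening hours": "We are open 9am to 5pm."},
    "live": {"Opening hours": "We are open 9am to 5pm."},
}


def retrieve(env: str, topic_name: str):
    """Look up content at query time; nothing is cached or baked into a model."""
    return environments[env].get(topic_name)


def promote(source: str, target: str) -> None:
    """Copy the source environment's Knowledge into the target environment."""
    environments[target] = copy.deepcopy(environments[source])


# Edit Knowledge in the sandbox: live callers still get the old answer.
environments["sandbox"]["Opening hours"] = "We are open 8am to 6pm."
answer_before_promotion = retrieve("live", "Opening hours")

# After promotion, the very next query sees the update.
promote("sandbox", "live")
answer_after_promotion = retrieve("live", "Opening hours")
```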

Behavior may vary depending on your agent’s configuration. For example, agents using the real-time (speech-to-speech) model may trigger retrieval differently than standard voice agents. For multilingual agents, see multilingual configuration for guidance on setting up Knowledge topics across languages.

Last modified on May 19, 2026
