Conversation flow - PolyAI Platform

Processing stages

A conversation moves through the following stages:

1. Input and processing

User: The user provides input as speech (voice) or text (webchat/SMS).
Input capture: For voice, the audio stream is captured and sent for transcription. For webchat/SMS, text is received directly.
ASR Provider (voice only): The ASR provider receives the raw audio.
ASR Service (voice only): Converts the audio into text using automatic speech recognition.
ASR Processing (voice only): Checks the transcript for recognition errors and applies any relevant corrections.
Transcript/Text → Processed Input: The processed input is passed to Retrieval.
Retrieval: Pulls relevant topics from the Knowledge area using RAG (retrieval-augmented generation) to provide context for the response.
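
The input stage above can be sketched roughly as follows. This is a minimal illustration, not PolyAI's implementation: `UserInput`, `ASR_CORRECTIONS`, `KNOWLEDGE`, `process_input`, and `retrieve_topics` are all hypothetical names, the voice payload stands in for a transcript already returned by the ASR service, and simple keyword overlap stands in for real RAG retrieval.

```python
from dataclasses import dataclass

# Hypothetical correction table for known ASR mistakes (assumption).
ASR_CORRECTIONS = {"opening ours": "opening hours"}

# Hypothetical Knowledge topics, keyed by topic name (assumption).
KNOWLEDGE = {
    "opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "refund policy": "Refunds are issued within 14 days of purchase.",
}


@dataclass
class UserInput:
    channel: str  # "voice", "webchat", or "sms"
    payload: str  # for voice: the transcript returned by the ASR service


def apply_asr_corrections(transcript: str) -> str:
    """Replace known mis-transcriptions with their corrected form."""
    for wrong, right in ASR_CORRECTIONS.items():
        transcript = transcript.replace(wrong, right)
    return transcript


def process_input(user_input: UserInput) -> str:
    """Voice transcripts go through ASR processing; text passes through unchanged."""
    if user_input.channel == "voice":
        return apply_asr_corrections(user_input.payload)
    return user_input.payload


def retrieve_topics(processed: str, top_k: int = 1) -> list[str]:
    """Rank topics by word overlap with the input (a stand-in for RAG retrieval)."""
    words = set(processed.lower().split())
    scored = []
    for name in KNOWLEDGE:
        overlap = len(words & set(name.split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```

Only the voice channel passes through the correction step, which mirrors the "voice only" labels in the stage list.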

2. Compute prompt and generate response

Compute Prompt: The system builds an LLM prompt using retrieved topics, system knowledge, and conversation history.
Run LLM: The LLM processes the request and determines whether to return:
- Returned Text: A direct text response.
- Returned Function: A tool call (if applicable).
Execute Function (if applicable): Runs the function and passes the result back to the LLM.
LLM Refinement: If a function result is returned, the LLM updates its response before proceeding.
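
As a rough sketch of this stage, the loop below builds a prompt, runs a model, executes a requested function, and feeds the result back until the model returns text. Everything here is an assumption made for illustration: `compute_prompt`, `FUNCTIONS`, `fake_llm`, and `run_turn` are hypothetical names, the prompt layout is invented, and `fake_llm` is a stand-in rather than a real LLM client.

```python
def compute_prompt(topics, system_knowledge, history):
    """Combine system knowledge, retrieved topics, and conversation history."""
    parts = [system_knowledge, "Relevant topics:"]
    parts += [f"- {topic}" for topic in topics]
    parts.append("Conversation:")
    parts += [f"{role}: {text}" for role, text in history]
    return "\n".join(parts)


# Hypothetical function registry (assumption): name -> callable.
FUNCTIONS = {
    "get_booking": lambda booking_id: {"booking_id": booking_id, "status": "confirmed"},
}


def fake_llm(prompt):
    """Stand-in for the LLM: requests a function first, then answers from its result."""
    if "function_result" in prompt:
        return {"type": "text", "text": "Your booking is confirmed."}
    return {"type": "function", "name": "get_booking", "args": {"booking_id": "A1"}}


def run_turn(llm, topics, system_knowledge, history, max_calls=3):
    """Run the LLM, executing any returned function and passing the result back."""
    history = list(history)  # copy so the caller's history is not mutated
    for _ in range(max_calls + 1):
        result = llm(compute_prompt(topics, system_knowledge, history))
        if result["type"] == "text":
            return result["text"]
        # Returned Function: execute it and add the result to the context.
        output = FUNCTIONS[result["name"]](**result["args"])
        history.append(("function_result", str(output)))
    raise RuntimeError("too many function calls in one turn")
```

The `max_calls` cap is a design choice in this sketch, added so a model that keeps requesting functions cannot loop forever.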

3. Streaming and chunking

Chunk LLM Output: The response is broken into chunks for delivery.
Postprocess Chunks: Applies rules such as stop keywords to remove unnecessary phrases.
Stream Partial Responses: The system sends chunks as soon as they are ready, rather than waiting for the full response.
TTS Service (voice only): Converts text chunks into speech using text-to-speech synthesis. Configure voices in voice settings.
Response delivery: For voice, synthesized speech is streamed to the user. For webchat/SMS, text responses are sent directly.
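
One way to picture chunking, postprocessing, and delivery is the sketch below. It assumes chunks are cut at sentence boundaries and that stop keywords are removed by plain phrase matching; the platform's actual chunking and stop-keyword rules may differ. `STOP_KEYWORDS`, `chunk_tokens`, `postprocess`, `fake_tts`, and `deliver` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, Tuple, Union

# Hypothetical stop keywords to strip from responses (assumption).
STOP_KEYWORDS = ["Certainly!"]


def chunk_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer LLM tokens and yield a chunk whenever a sentence ends."""
    buffer = ""
    for token in tokens:
        buffer += token
        if re.search(r"[.!?]\s*$", buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing partial sentence
        yield buffer.strip()


def postprocess(chunks: Iterable[str]) -> Iterator[str]:
    """Remove stop keywords and drop chunks that end up empty."""
    for chunk in chunks:
        for keyword in STOP_KEYWORDS:
            chunk = chunk.replace(keyword, "")
        chunk = " ".join(chunk.split())  # normalize leftover whitespace
        if chunk:
            yield chunk


def fake_tts(text: str) -> bytes:
    """Stand-in for a TTS service: returns bytes in place of synthesized audio."""
    return text.encode("utf-8")


def deliver(chunks: Iterable[str], channel: str) -> Iterator[Tuple[str, Union[str, bytes]]]:
    """Voice chunks go through TTS; webchat/SMS chunks are sent as text."""
    for chunk in chunks:
        if channel == "voice":
            yield ("audio", fake_tts(chunk))
        else:
            yield ("text", chunk)
```

Because every step is a generator, each chunk can be delivered as soon as it is ready instead of after the whole response is complete.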

4. Post-processing and handoff

Live Handoff (if applicable): If escalation is needed, the agent triggers a live handoff. For voice, this transfers the call; for webchat, this can route to a live chat agent.
Conversation Logs: The system stores conversation history and logs for analytics.
Final Response: The user receives the response as it streams, without waiting for the entire message to be generated.

Advanced: How response streaming works

PolyAI agents don’t wait for the full response before speaking. Instead, responses are processed and streamed in real time:

LLM Streaming: Words are generated and sent continuously.
Chunking: Responses are broken into chunks for controlled delivery.
Postprocessing: Stop keywords remove unnecessary phrases before delivery.
Response Streaming: For voice, users hear speech as soon as it’s processed via TTS. For webchat, text appears progressively as it’s generated.
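
To make the ordering concrete, the following sketch records when each token is generated and when each chunk is delivered. It is an illustration under stated assumptions, not the platform's code: `llm_stream`, `sentence_chunks`, and `run_demo` are hypothetical, and the token list stands in for real LLM output.

```python
def llm_stream(log):
    """Stand-in for a streaming LLM: logs each token as it is generated."""
    for token in ["Hello. ", "How can ", "I help?"]:
        log.append(f"generated {token!r}")
        yield token


def sentence_chunks(tokens):
    """Yield a chunk as soon as the buffered tokens complete a sentence."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()


def run_demo():
    """Return the interleaved order of generation and delivery events."""
    log = []
    for chunk in sentence_chunks(llm_stream(log)):
        log.append(f"delivered {chunk!r}")
    return log
```

In the returned log, the first chunk is delivered before the remaining tokens are generated, which is the behavior described above: delivery starts while generation is still in progress.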

Watch it in action

This video visualizes the conversation flow, showing how responses are processed, chunked, and streamed:
