GPT-4o Mini
The lightweight version of GPT-4o, offering faster performance and cost-efficiency. Best for simple queries and use-cases that prioritise speed and affordability.

GPT-4o
An optimised version of GPT-4 with high-quality responses and reduced latency. Ideal when you need both accuracy and responsiveness.

GPT-4
The most advanced model for generating detailed, high-quality responses. Recommended for complex tasks requiring precision and context.

GPT-3.5 (0125)
An enhanced build of GPT-3.5 with stability improvements for specific workloads. Balances performance and cost.

GPT-3.5
A reliable, cost-efficient option when speed and affordability are the main concerns. Good for straightforward interactions and real-time responses.

Bedrock Claude 3.5 Haiku
A lightweight version of Anthropic’s Claude model, hosted on AWS Bedrock. Suitable for simple, predictable tasks.

Raven
PolyAI’s proprietary model, optimised for real-time voice interactions.

Gemini 1.5 (coming soon)
Google’s next-generation LLM focused on reasoning and long context windows. Currently being integrated.

Mistral (coming soon)
An open-weight model designed for high-performance reasoning and coding tasks. Integration planned for a future release.

Configuring the model

- Open Agent Settings → Large Language Model.
- Select the desired model from the dropdown.
- Click Save to apply your changes.
 
Available providers:

- OpenAI Models
- Anthropic (Claude)
- Google DeepMind (Gemini)
- Mistral
- Amazon Nova Micro
- Contact PolyAI for information about Raven, PolyAI’s proprietary LLM.
 
Bring Your Own Model (BYOM)
PolyAI supports bring-your-own-model (BYOM) via a simple API integration. If you run your own LLM, expose an endpoint that follows the OpenAI `chat/completions` schema and PolyAI will treat it like any other provider.
Overview
- Expose an API endpoint that accepts and returns data in the OpenAI `chat/completions` format.
- Provide authentication: PolyAI can send either an `x-api-key` header or a Bearer token.
- (Optional) Support streaming responses using `stream: true`.
API endpoint
Request format
Optional sampling parameters such as `frequency_penalty`, `presence_penalty`, etc. may also be included in the request; your endpoint can honour or ignore them.
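As an illustration (the model name and field values below are placeholders, not values PolyAI guarantees to send), a minimal request body in the `chat/completions` format looks like:

```json
{
  "model": "my-custom-model",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "stream": false
}
```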
Response format
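Your endpoint should reply with a body shaped like an OpenAI `chat/completions` response. A minimal sketch, with illustrative IDs and token counts:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "my-custom-model",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}
```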
Streaming support (optional)
If `stream` is `true`, send Server-Sent Events (SSE) mirroring OpenAI’s format:
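For example, a short streamed reply could be delivered as the following `text/event-stream` chunks (IDs and content are illustrative), ending with a `[DONE]` sentinel:

```text
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```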
Authentication
| Method | Header sent by PolyAI | 
|---|---|
| API Key | x-api-key: YOUR_API_KEY | 
| Bearer | Authorization: Bearer YOUR_TOKEN | 
Sample implementation (Python / Flask)
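A minimal sketch of a BYOM endpoint, assuming Flask is installed and using the `x-api-key` authentication method. The `generate_reply` helper is a hypothetical placeholder for a call to your own model:

```python
import time
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)

API_KEY = "YOUR_API_KEY"  # placeholder: the credential you share with PolyAI


def generate_reply(messages):
    # Placeholder: call your own LLM here using the incoming messages.
    return "Hello from my custom model!"


@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    # Verify the header PolyAI is configured to send (API Key method shown).
    if request.headers.get("x-api-key") != API_KEY:
        return jsonify({"error": "unauthorized"}), 401

    body = request.get_json(force=True)
    reply = generate_reply(body.get("messages", []))

    # Respond in the OpenAI chat/completions format.
    return jsonify({
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", "my-custom-model"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    })


if __name__ == "__main__":
    app.run(port=8000)
```

For Bearer authentication, check `request.headers.get("Authorization")` against `"Bearer YOUR_TOKEN"` instead.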
Final checklist
- Endpoint reachable via POST.
- Request/response match the OpenAI `chat/completions` schema.
- Authentication header configured (API Key or Bearer token).
- (Optional) Streaming supported if needed.
 
Share the following with PolyAI:

- Endpoint URL
- Model ID
- Auth method & credential
 

