> ## Documentation Index
> Fetch the complete documentation index at: https://docs.poly.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Guardrails

> Platform safety guardrails that protect your agent in production, with observability in conversation transcripts.

Platform guardrails are pre-built safety protections that PolyAI applies to every conversation. Each one targets a common production risk. All five are enabled by default and can be toggled off at any time.

Manage guardrails in **Behavior**, in the **Guardrails** section.

<Frame caption="Guardrails section in Behavior">
  <img src="https://mintcdn.com/polyai/OemEpDllMhPcDtbh/images/agent-settings/guardrails.png?fit=max&auto=format&n=OemEpDllMhPcDtbh&q=85&s=aeb70437e564f98634d9deec2158bcb9" alt="Guardrails settings showing five toggleable platform guardrails" width="1903" height="544" data-path="images/agent-settings/guardrails.png" />
</Frame>

<Note>
  Platform guardrails are applied automatically, standardized across projects, and maintained by PolyAI — no per-agent prompt engineering required.
</Note>

## The five guardrails

The underlying prompts are managed by PolyAI and are not currently visible or editable in Agent Studio.

| Guardrail                         | What it does                                                                                                                                                                  |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Jailbreak & Prompt Defence**    | Blocks attempts to extract your agent's instructions, override its behavior, or impersonate a different AI.                                                                   |
| **Scope & Hallucination Control** | Restricts the agent to its knowledge base. Prevents fabrication of phone numbers, prices, or policies.                                                                        |
| **AI Identity & Confidentiality** | Prevents the agent from disclosing which LLM, provider, or platform powers it.                                                                                                |
| **Emergency & Crisis Escalation** | Escalates immediately if a caller expresses suicidal ideation, self-harm, threats, or a medical emergency. Catches conversational distress signals that content filters miss. |
| **Tool Call Integrity**           | Prevents the agent from speaking internal function calls or tool names aloud.                                                                                                 |

## Enable or disable a guardrail

1. Open **Behavior** then sccroll to **Guardrails**.
2. Toggle a guardrail off or on. Disabling prompts you to confirm.
3. Test with **Chat with Agent** before promoting to a higher environment.

## Observe when guardrails fire

Guardrail events are recorded on every conversation.

* **In a transcript:** open a conversation in [Conversations](/analytics/conversations/review), open transcript display options, and toggle on **Guardrails**. Each turn where a guardrail fired is annotated inline.
* **Across conversations:** filter by guardrail in the **QA category** of the conversation filters.
* Guardrails are stored per-project and travel through [environments and versions](/environments-and-versions/introduction) – the configuration is part of each published version.

## How guardrails fit with safety filters

Platform guardrails are **prompt-level** instructions to the LLM. They run alongside the input/output [safety filters](/behavior/guardrails/safety-filters):

* **Safety filters** classify each user input and agent output against hate, violence, sexual, and self-harm categories at the model layer. Configure thresholds per category in **Behavior** and override per channel.
* **Jailbreak detection** is always-on at the model layer and is independent of the Jailbreak & Prompt Defence guardrail. The guardrail tells the LLM how to respond; the detector blocks input upstream.
* **Emergency & Crisis Escalation** catches conversational distress signals that content filters miss – for example, "I don't want to be here anymore" said in a measured tone.

Use guardrails and safety filters together. They protect different layers.

## Related pages

<CardGroup cols={2}>
  <Card title="Behavior rules" icon="list-check" href="/behavior/general/rules">
    Add custom rules for terminology, tone, and edge cases on top of the platform guardrails.
  </Card>

  <Card title="Conversation review" icon="comments" href="/analytics/conversations/review">
    See guardrail events inline in transcripts and filter by guardrail in QA category.
  </Card>

  <Card title="Safety filters" icon="filter" href="/behavior/guardrails/safety-filters">
    Per-channel content filters for hate, sexual, violence, and self-harm.
  </Card>

  <Card title="Safety dashboard" icon="shield" href="/analytics/dashboards/safety">
    Track jailbreak attempts and other safety signals across all conversations.
  </Card>

  <Card title="Studio Assistant" icon="message" href="/studio-assistant/introduction">
    Validate guardrail behavior in the preview before promoting a version.
  </Card>
</CardGroup>
