Level 2 — Lesson 5 of 8 — Understand and manage the audio cache for optimal performance.
Audio Management controls how TTS output is generated, cached, and replayed. Without understanding caching, teams often think changes “didn’t apply” when they’re hearing old audio.

Understanding the audio cache

What caching is

Cached audio stores previously generated TTS so it can be replayed instantly, reducing latency and keeping repeated phrases consistent.

Cache requirements

Audio is only cached if the same utterance is generated at least twice within a 24-hour window.
One-off utterances will not persist in cache by default.
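
The caching rule above can be modelled in a few lines. This is only a sketch for reasoning about which utterances are likely to persist; it is not the platform's implementation. The function name `is_cache_eligible` is hypothetical, and treating exactly 24 hours as inside the window is an assumption.

```python
from datetime import datetime, timedelta

CACHE_WINDOW = timedelta(hours=24)  # window stated in the lesson
MIN_GENERATIONS = 2                 # "at least twice"


def is_cache_eligible(generated_at):
    """Return True if MIN_GENERATIONS generations fall inside one window.

    `generated_at` is a list of datetimes at which the same utterance
    was generated. Hypothetical helper, for illustration only.
    """
    times = sorted(generated_at)
    span = MIN_GENERATIONS - 1
    # After sorting, check every run of MIN_GENERATIONS consecutive timestamps.
    return any(
        times[i + span] - times[i] <= CACHE_WINDOW
        for i in range(len(times) - span)
    )


first = datetime(2026, 1, 1, 9, 0)
print(is_cache_eligible([first, first + timedelta(hours=3)]))   # repeated greeting
print(is_cache_eligible([first]))                               # one-off utterance
```

Under this model, a greeting spoken on every call qualifies almost immediately, while a sentence generated once does not.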

Managing cached audio

1

Open Audio Management

Navigate to Channels > Voice > Audio management in the platform.
2

Review cached utterances

Check the list of cached utterances:
  • Greeting
  • Transfer / handoff language
  • SMS offer phrasing
  • Closings and confirmations
3

Adjust individual utterances

For any high-frequency utterance:
  • Open it and review how often it has been used
  • Adjust stability and clarity for that utterance only if needed
  • Use the play button to preview changes
4

Ensure stability for critical phrases

If an utterance must remain stable:
  • Generate it multiple times within 24 hours, or
  • Upload a static audio file to overwrite the cached version

Interaction style (response latency)

Interaction style controls how quickly the agent responds after detecting user speech. This directly affects interruption rate and perceived naturalness.
~400ms latency: Extremely fast, higher interruption risk.

Barge-in

Barge-in determines whether callers can interrupt the agent mid-speech.
  • Useful for Turbo mode
  • Can feel chaotic if enabled without careful phrasing and latency tuning

Pronunciations

Ensure domain-specific terms are spoken clearly and correctly in Call.
Pronunciations are defined in the Pronunciations tab under Channels > Voice > Response Control and applied globally. They modify how text is converted to speech, without changing the underlying text.

When to use pronunciations

Brand names

Product names that are mispronounced

Proper nouns

Locations, people, departments

Numbers or IDs

Structured read-back requirements

Pacing

Phrases where pacing matters for comprehension

How pronunciations work

Matching is done using regular expressions. Replacements can be:
International Phonetic Alphabet: For precise pronunciation control

Examples

IPA replacement:
  • Regex: \bLouvre\b
  • Replacement: /ˈluːvrə/
  • Case sensitive: FALSE
Phone number with pauses:
  • Regex: (\d{3})[ -]?(\d{3})[ -]?(\d{4})
  • Replacement: \1 <break time="0.5s" /> \2 <break time="0.5s" /> \3
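
Before adding a rule to the platform, it can help to try the phone number pattern locally. The sketch below uses Python's `re` module and assumes the platform's regex engine treats capture groups and `\1`-style backreferences the same way, which should be confirmed in Call. The function name `add_phone_pauses` is hypothetical.

```python
import re

# Pattern and replacement copied from the phone number example in this lesson.
PHONE = re.compile(r"(\d{3})[ -]?(\d{3})[ -]?(\d{4})")
PHONE_REPLACEMENT = r'\1 <break time="0.5s" /> \2 <break time="0.5s" /> \3'


def add_phone_pauses(text):
    """Insert SSML <break> tags between the three digit groups."""
    return PHONE.sub(PHONE_REPLACEMENT, text)


print(add_phone_pauses("Call 555-123-4567 now"))
# Call 555 <break time="0.5s" /> 123 <break time="0.5s" /> 4567 now

# The pattern has no boundaries, so it also fires inside longer digit runs.
print(add_phone_pauses("Order 123456789012"))
# Order 123 <break time="0.5s" /> 456 <break time="0.5s" /> 789012
```

The second call shows a side effect worth testing for: a 12-digit order ID is partly rewritten because the pattern matches its first ten digits. If IDs like that appear in your agent's speech, a stricter pattern (for example, one anchored with `\b`) may be needed.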

Best practices

Incremental

Add pronunciations one at a time

Test thoroughly

Test each change in Call before adding more

Keep it simple

Prefer clarity over cleverness: overly complex regex is hard to maintain

Verification checklist

After any voice or phrasing change:
  • Start a new call session
  • Confirm you are hearing updated audio, not a cached version
  • Validate that turn-taking still feels natural after changing latency or barge-in
  • Confirm that mispronounced terms are corrected consistently
  • Check that pauses improve comprehension rather than slowing the call excessively

Try it yourself

1

Challenge: Fix a mispronounced brand name

Your agent says “Hopper”, but it is consistently mispronounced (it sounds like “Hooper”). You also want phone numbers read back with a natural pause between each segment. Write both pronunciation configurations:
  1. IPA correction for “Hopper”
  2. Phone number formatting with 0.5s pauses
For the IPA, write out what “Hopper” sounds like phonetically. For the phone number, use regex capture groups to split the digits and insert SSML <break> tags.
Brand name correction:
  • Regex: \bHopper\b
  • Replacement: /ˈhɒpər/
  • Case sensitive: FALSE
Phone number with pauses:
  • Regex: (\d{3})[ -]?(\d{3})[ -]?(\d{4})
  • Replacement: \1 <break time="0.5s" /> \2 <break time="0.5s" /> \3
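
Because pronunciations are applied globally, it is worth checking what the brand name regex does and does not match. The sketch below is a local approximation in Python: it assumes the platform's `\b` word boundary and "Case sensitive: FALSE" setting behave like Python's `\b` and `re.IGNORECASE`. The helper `matched_terms` is hypothetical.

```python
import re

# "Case sensitive: FALSE" is approximated here with re.IGNORECASE.
HOPPER = re.compile(r"\bHopper\b", re.IGNORECASE)


def matched_terms(text):
    """Return every substring the brand name rule would rewrite."""
    return [m.group(0) for m in HOPPER.finditer(text)]


print(matched_terms("Welcome to Hopper."))             # ['Hopper']
print(matched_terms("hopper's app is ready"))          # ['hopper']
print(matched_terms("a grasshopper and two Hoppers"))  # []
```

The word boundaries keep the rule from rewriting “grasshopper”, but they also skip the plural “Hoppers”. If your agent uses the plural, it would need its own entry.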

Last modified on March 31, 2026