Voice

The PolyAI platform supports flexible voice selection for external providers such as ElevenLabs, AWS Polly, and Microsoft Azure TTS.

Provider classes

To pick a model, adjust stability, or access a third-party provider, use the provider-specific TTSVoice classes. You can also optionally adjust clarity and latency_mode to match the agent’s interaction style.

Example: ElevenLabs

from polyai.voice import ElevenLabsVoice

conv.set_voice(
    ElevenLabsVoice(
        provider_voice_id="gDnGxUcsitTxRiGHr904",
        model_id="eleven_flash_v2",
        stability=0.5,
        similarity_boost=0.7,
        clarity=0.8,            # Optional: controls crispness of enunciation
        latency_mode="swift",   # Optional: aligns with interaction-style settings
    )
)

Example: AWS Polly

from polyai.voice import PollyVoice

conv.set_voice(
    PollyVoice(
        provider_voice_id="Joanna",
        engine="neural",
        clarity=0.6,
    )
)

Example: Microsoft Azure TTS

from polyai.voice import AzureVoice

conv.set_voice(
    AzureVoice(
        provider_voice_id="en-US-JennyNeural",
        style="cheerful",
        role="customer-service-rep",
        clarity=0.9,
    )
)

Example: Cartesia

from polyai.voice import CartesiaVoice, Emotion, EmotionKind, EmotionIntensity

conv.set_voice(
    CartesiaVoice(
        provider_voice_id="a1b2c3d4",
        speed=0.0,  # -1.0 (slowest) to 1.0 (fastest)
        emotions=[
            Emotion(EmotionKind.POSITIVITY, EmotionIntensity.HIGH)
        ],
        model_id="sonic"  # or "sonic-preview"
    )
)

Emotion options:

EmotionKind: ANGER, POSITIVITY, SURPRISE
EmotionIntensity: LOWEST, LOW, HIGH, HIGHEST

Example: Rime

from polyai.voice import RimeVoice

conv.set_voice(
    RimeVoice(
        provider_voice_id="voice_id",
        speech_alpha=1.0,  # <1.0 faster, >1.0 slower
        model_id="mistv2"  # or "mist"
    )
)

Example: Minimax

from polyai.voice import MinimaxVoice

conv.set_voice(
    MinimaxVoice(
        model_id="speech-02-hd",  # or speech-02-turbo, speech-01-hd, speech-01-turbo
        voice_id="voice_id",
        speed=1.0,      # 0.5-2.0
        vol=1.0,        # 0-10
        pitch=0,        # -12 to 12
        emotion="happy" # happy, sad, angry, fearful, disgusted, surprised, neutral
    )
)
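
The ranges in the comments above can be checked before a voice is constructed. The following is a minimal sketch: check_minimax_params is a hypothetical helper, not part of polyai.voice, and it assumes the documented bounds are inclusive.

```python
# Hypothetical helper, not part of polyai.voice. It only mirrors the ranges
# listed in the comments of the Minimax example and assumes they are inclusive.
MINIMAX_EMOTIONS = {
    "happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral",
}


def check_minimax_params(speed=1.0, vol=1.0, pitch=0, emotion="neutral"):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if not 0.5 <= speed <= 2.0:
        problems.append(f"speed={speed} is outside 0.5-2.0")
    if not 0 <= vol <= 10:
        problems.append(f"vol={vol} is outside 0-10")
    if not -12 <= pitch <= 12:
        problems.append(f"pitch={pitch} is outside -12 to 12")
    if emotion not in MINIMAX_EMOTIONS:
        problems.append(f"emotion={emotion!r} is not a listed emotion")
    return problems


# Values from the example above pass; out-of-range values are reported.
print(check_minimax_params(speed=1.0, vol=1.0, pitch=0, emotion="happy"))
print(check_minimax_params(speed=3.0, pitch=20))
```

Returning a list of problems rather than raising on the first one makes it easier to report every out-of-range value at once.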

Example: Hume

from polyai.voice import HumeVoice

conv.set_voice(
    HumeVoice(
        provider_voice_id="voice_uuid_or_name",
        voice_description="patient, empathetic counselor",  # Optional
        version="2",        # "1" for octave-1, "2" for octave-2
        instant_mode=False, # Ultra-low latency mode
        provider="HUME_AI"  # "CUSTOM_VOICE" or "HUME_AI"
    )
)

Example: Google TTS

from polyai.voice import GoogleVoice

conv.set_voice(
    GoogleVoice(
        provider_voice_id="ja-JP-Neural2-B",
        gender="male"  # "male", "female", or "neutral"
    )
)

Example: Custom provider

from polyai.voice import CustomVoice

conv.set_voice(
    CustomVoice(
        provider="MY_PROVIDER",
        provider_voice_id="voice_id",
        custom_param="value"  # Any additional kwargs
    )
)

Voice randomization

Use VoiceWeighting to randomly select a voice based on weighted probabilities:

from polyai.voice import VoiceWeighting, ElevenLabsVoice

conv.randomize_voice([
    VoiceWeighting(
        voice=ElevenLabsVoice(provider_voice_id="voice1"),
        weight=0.7
    ),
    VoiceWeighting(
        voice=ElevenLabsVoice(provider_voice_id="voice2"),
        weight=0.3
    ),
])

Weights must sum to 1.0.
If some voices omit a weight, those voices share the remaining probability equally.
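
To make the weighting rule concrete, here is a minimal sketch of how missing weights could be filled in and a voice picked. It uses plain voice ID strings rather than the platform classes; resolve_weights and pick_voice are hypothetical names, and representing a missing weight as None is an assumption made for illustration only.

```python
import random


def resolve_weights(weights):
    """Fill in missing (None) weights so that the result sums to 1.0.

    Hypothetical helper: voices without a weight split whatever probability
    the explicit weights leave over, in equal shares.
    """
    explicit_total = sum(w for w in weights if w is not None)
    missing = sum(1 for w in weights if w is None)
    if explicit_total > 1.0 + 1e-9:
        raise ValueError("explicit weights exceed 1.0")
    if missing == 0:
        if abs(explicit_total - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1.0")
        return list(weights)
    share = (1.0 - explicit_total) / missing
    return [share if w is None else w for w in weights]


def pick_voice(voice_ids, weights, rng=random):
    """Pick one voice ID at random according to the resolved weights."""
    return rng.choices(voice_ids, weights=resolve_weights(weights), k=1)[0]


# One explicit weight of 0.5; the two unweighted voices get 0.25 each.
print(resolve_weights([0.5, None, None]))
print(pick_voice(["voice1", "voice2"], [0.7, 0.3]))
```

This only illustrates the arithmetic of the rule; the platform's own selection logic may differ in detail.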

Cache behavior

Changing model_id does not automatically invalidate cached audio.
To reset cached audio, do one of the following:
- Go to Audio → Cache and delete existing entries.
- Create a new voice entry with a different voice_id.
To isolate caches across models, you can prepend the model ID to the voice ID (e.g. eleven_flash_v2/a1b2c3...).
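
The effect of prefixing the voice ID can be illustrated with a toy cache. This is a sketch under stated assumptions: the real cache key is not documented here, so the (voice ID, text) key, the synthesize function, and the example_voice ID are all hypothetical stand-ins.

```python
# Toy model of the caching behaviour. ASSUMPTION: audio is cached per
# (voice ID, text), so the model ID alone does not change the cache key.
cache = {}


def synthesize(voice_id, model_id, text):
    """Return cached audio if present; otherwise 'render' it and cache it."""
    key = (voice_id, text)
    if key not in cache:
        cache[key] = f"audio<{model_id}:{text}>"  # stand-in for a real TTS call
    return cache[key]


def cache_scoped_voice_id(model_id, voice_id):
    """Prepend the model ID to the voice ID so each model gets its own entries."""
    return f"{model_id}/{voice_id}"


first = synthesize("example_voice", "model_a", "Hello")
stale = synthesize("example_voice", "model_b", "Hello")  # same key: old audio
fresh = synthesize(
    cache_scoped_voice_id("model_b", "example_voice"), "model_b", "Hello"
)
print(stale == first)  # the model change alone did not refresh the audio
print(fresh == first)  # the prefixed voice ID produced a separate entry
```

With the model ID in the voice ID, switching models never reuses audio rendered by a different model, at the cost of rebuilding the cache for each model.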

Additional options

clarity – fine-tunes articulation sharpness per utterance (0.0–1.0).
latency_mode – chooses a response profile (“swift”, “balanced”, “precise”, “turbo”) consistent with Interaction style.
stability – controls tone variability across runs.
randomize_voice() – supports external providers for weighted selection.
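
The documented values for clarity and latency_mode can be validated before they are passed to a voice class. The sketch below is illustrative only: validate_voice_options is a hypothetical helper, and it assumes the 0.0–1.0 range for clarity is inclusive.

```python
# Hypothetical helper, not part of polyai.voice. It checks the two optional
# settings against the values listed in this section.
LATENCY_MODES = ("swift", "balanced", "precise", "turbo")


def validate_voice_options(clarity=None, latency_mode=None):
    """Return the options that were set, raising ValueError on invalid values."""
    if clarity is not None and not 0.0 <= clarity <= 1.0:
        raise ValueError(f"clarity must be within 0.0-1.0, got {clarity}")
    if latency_mode is not None and latency_mode not in LATENCY_MODES:
        raise ValueError(
            f"latency_mode must be one of {LATENCY_MODES}, got {latency_mode!r}"
        )
    options = {"clarity": clarity, "latency_mode": latency_mode}
    # Drop unset options so that only explicit choices are forwarded.
    return {name: value for name, value in options.items() if value is not None}


print(validate_voice_options(clarity=0.8, latency_mode="swift"))
```

The returned dictionary could then be unpacked as keyword arguments into a voice class, assuming that class accepts these parameters as in the ElevenLabs example.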
