# Tutorial: Polishing the voice experience

> PolyAcademy Level 3 – Master filler, turn-taking, voice quality, and personalization.
**Level 3 – Lesson 5 of 5** – Go beyond usability to create voice experiences that sound genuinely good.

After building an agent that works and is easy to use, the final layer is polish: selecting voices that perform well in practice, adding natural filler and hesitation, managing turn-taking, and personalizing based on user context.

## The layers of a good voice experience

A good voice experience builds up in three layers:

1. **It works** – speech recognition transcribes correctly, APIs respond, and the task can be completed.
2. **It's easy to use** – the interaction is efficient, intuitive, and follows the [design principles](/learn/guides/design-principles).
3. **It's polished** – copywriting, voice quality, turn-taking, and personalization make the experience enjoyable. This layer is the focus of this lesson.

## Voice selection and quality

Pick a voice that sounds good in practice, not just in samples. If you need to regenerate a line 50 times to get one good take, that voice won't produce consistent quality in a live deployment.

**After selecting a voice:**

* Listen to the things the agent says most often: the greeting, "how can I help", "anything else", and the main flow prompts.
* The LLM often generates similar phrasing for repeated scenarios, and those responses get cached, so make sure they sound good.
* Regenerate cached audio until it sounds right.

Written copy tends to look more informal on the page than it sounds when spoken. Don't let reviews of written scripts push you to over-formalize. When in doubt, build a short audio prototype and listen back.

## Natural filler and hesitation

Real humans pause, say "um", and hesitate, especially when they're thinking. Adding small amounts of this to agent speech makes it sound more natural.

In linguistics this is called **disfluency**. It includes filled pauses ("um", "uh"), slight repetitions, and drawn-out sounds ("hmm").
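If your prompts pass through code before text-to-speech, one way to keep disfluency to small amounts is to gate it. The sketch below is a minimal illustration under stated assumptions, not a PolyAI API: `FillerPolicy`, `render`, and `prefix_filler` are hypothetical names, and the spacing between fillers is an assumed default. The caller decides *when* filler fits the moment (for example, just before a lookup); the policy only keeps it occasional.

```python
from typing import Optional


def prefix_filler(filler: str, text: str) -> str:
    """Join a filler such as "Um," onto the start of a prompt.

    Lower-cases the prompt's first letter so "Let me..." becomes
    "Um, let me...", but leaves "I", "I'm", "I'll" and similar alone.
    This casing rule is naive: a leading proper noun would also be
    lower-cased.
    """
    first_word = text.split(" ", 1)[0]
    if first_word == "I" or first_word.startswith("I'"):
        return f"{filler} {text}"
    return f"{filler} {text[:1].lower()}{text[1:]}"


class FillerPolicy:
    """Hypothetical helper: allow filler only occasionally, never back to back."""

    def __init__(self, min_turns_between: int = 3) -> None:
        # Number of filler-free turns required between two fillers
        # (an illustrative default; tune it by listening to real calls).
        self.min_turns_between = min_turns_between
        # Start "eligible" so the first requested filler is allowed.
        self._turns_since_filler = min_turns_between

    def render(self, text: str, filler: Optional[str] = None) -> str:
        """Return the prompt to speak, with `filler` prefixed if allowed."""
        eligible = self._turns_since_filler >= self.min_turns_between
        if filler and eligible:
            self._turns_since_filler = 0
            return prefix_filler(filler, text)
        # No filler this turn: either none was requested or it is too soon.
        self._turns_since_filler += 1
        return text
```

A request for filler that arrives too soon after the last one is silently dropped, which errs on the side of sounding clear rather than confused.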
### When to use it

| Context                  | What to add                 | Example                                                        |
| ------------------------ | --------------------------- | -------------------------------------------------------------- |
| API call / lookup        | Filler phrase               | "Um, let me just have a look at what space we have..."         |
| Complex instructions     | Slight hesitation           | "So what you'll want to do is, uh, go to settings and then..." |
| After a misunderstanding | Drawn-out sound, regrouping | "Hmm, what was it I can do for you?"                           |

### Why it works

* **During API calls**: filler sounds like someone checking another screen, which matches what the user expects is happening.
* **After misunderstandings**: hesitation sounds like someone regrouping after a miscommunication, which is exactly what's happening.
* **In general**: small pauses signal that the agent is "thinking", which makes silence less awkward.

Keep it subtle. Too much filler makes the agent sound confused rather than natural. Use it situationally, not on every turn.

## Turn-taking

Turn-taking, meaning how the agent and user take turns speaking, is one of the most impactful aspects of voice experience, and one of the hardest to control at the project level. Three common problems:

* **Too much latency** – the agent takes too long to respond after the user finishes speaking, and users disengage.
* **Interruptions** – the agent starts speaking before the user has finished, and users get frustrated.
* **No barge-in** – the user can't interrupt the agent, even when the agent is saying something wrong or irrelevant.

Many turn-taking issues need platform-level improvements rather than project-level fixes. If you encounter persistent turn-taking problems, document specific examples and contact support.
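When documenting examples, it helps to point at exact moments in a call. A minimal sketch, assuming you can get each turn's speaker and start/end times in seconds (the `Turn` format here is hypothetical, not a PolyAI export format): it flags agent responses that start too long after the user stops (latency) and agent turns that start before the user has finished (interruptions). The 1.5-second threshold is an illustrative assumption, not a platform default.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Turn:
    speaker: str  # "user" or "agent"
    start: float  # seconds from the start of the call
    end: float


@dataclass(frozen=True)
class Issue:
    kind: str   # "latency" or "interruption"
    at: float   # start time of the agent turn that triggered it
    gap: float  # agent start minus user end; negative means overlap


def find_turn_issues(turns: List[Turn], max_gap: float = 1.5) -> List[Issue]:
    """Flag slow agent responses and agent turns that start mid-user-turn."""
    issues: List[Issue] = []
    ordered = sorted(turns, key=lambda t: t.start)
    for prev, cur in zip(ordered, ordered[1:]):
        # Only user -> agent handovers matter for these two problems.
        if prev.speaker != "user" or cur.speaker != "agent":
            continue
        gap = cur.start - prev.end
        if gap < 0:
            issues.append(Issue("interruption", cur.start, gap))
        elif gap > max_gap:
            issues.append(Issue("latency", cur.start, gap))
    return issues
```

Each `Issue` carries a timestamp, so you can pair it with the call recording when you report it. Barge-in problems are not detected here, because they depend on the user's attempts to interrupt rather than on turn boundaries alone.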
### What you can control

* **Response length** – shorter responses reduce the chance of the agent and user talking over each other.
* **Interaction style settings** – adjust latency thresholds in [audio management](/learn/guides/advanced/audio-management).
* **Barge-in configuration** – enable or disable barge-in based on the interaction type.
* **Front-load key information** – put the important part first, so even if the user interrupts, they've already heard what matters.

## Personalization

Personalization uses information about the user to tailor the experience. It works at three levels.

### From the current conversation

If the user gives their name, you can use it, but not on every turn. LLMs tend to overuse names, which sounds scripted. Use names sparingly, for warmth.

### From API data

If you can see a user's recent activity, use it to shortcut the conversation:

> "I can see you just canceled a flight. Is that what you're calling about?"

This demonstrates competence immediately and shortens the interaction.

### From previous calls

If the user called before, was sent an SMS for self-service, and is now calling back:

> "I see you were calling about this earlier. Was that text not working for you?"

This kind of continuity across calls signals attentiveness and builds the user's confidence in the system.

Personalization can feel intrusive if overdone. Use it when it clearly helps the user reach their goal faster, and avoid making users feel surveilled.

## Matching the user's style

People naturally adjust how they speak depending on who they're talking to. In voice agents, this happens partly through the LLM, which adjusts vocabulary and formality based on user input. For now, focus on:

* **Word choice** – if the user uses informal language, the agent should match.
* **Pacing** – if the user speaks slowly, don't rush them with rapid-fire responses.
* **Formality** – match the user's level of formality.

## Try it yourself

A user asks to track their order.
The flow collects the tracking number and then makes an API call that takes 2-3 seconds. Design:

1. What does the agent say while the API call runs?
2. How do you handle a successful lookup?
3. How do you handle a failed lookup?

For each, consider filler, tone, brevity, and what information to say first.

**Example answer**

**During the API call:**

> "Okay, let me just pull that up for you..."

Subtle filler that sounds like someone checking a screen.

**Successful lookup:**

> "Got it – your order's been shipped and should arrive Thursday. Want me to send you the tracking link?"

Brief, key information first, with a natural offer for follow-up.

**Failed lookup:**

> "Hmm, I'm not finding anything for that number. Could you double-check it and try again?"

Hesitation signals regrouping, and the wording blames the number rather than the user.
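The exercise above can also be sketched as a single function, which makes the ordering explicit: filler first, then the lookup, then either the key information or a gentle retry prompt. This is a minimal sketch under assumptions: `lookup` and `say` are hypothetical placeholders passed in by the caller, not Agent Studio APIs, and whether the filler actually plays while the lookup runs depends on the platform streaming speech before the call returns.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class OrderStatus:
    state: str        # e.g. "shipped"; the template assumes an in-transit state
    arrival_day: str  # e.g. "Thursday"


# Hypothetical lookup: returns None when no order matches the number.
Lookup = Callable[[str], Optional[OrderStatus]]


def track_order(tracking_number: str, lookup: Lookup,
                say: Callable[[str], None]) -> bool:
    """Run the order-tracking turn; return True if the lookup succeeded."""
    # Filler first, so the caller hears something during the 2-3 s lookup.
    say("Okay, let me just pull that up for you...")
    status = lookup(tracking_number)
    if status is None:
        # Hesitation signals regrouping; the wording blames the number, not the caller.
        say("Hmm, I'm not finding anything for that number. "
            "Could you double-check it and try again?")
        return False
    # Key information first, then a brief offer for follow-up.
    say(f"Got it – your order's been {status.state} and should arrive "
        f"{status.arrival_day}. Want me to send you the tracking link?")
    return True
```

Passing `lookup` and `say` in as functions keeps the conversational wording separate from any particular API client, so you can listen to the prompts with a stubbed lookup before wiring up the real integration.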