Read this before the hands-on lessons. It covers why agent design works the way it does.
Design priorities
Every design decision should be evaluated against these priorities, in order:
1. Complete the task
The user must be able to complete their goal. Right integrations, edge-case fallbacks, no dead ends. If the task is impossible, nothing else matters.
2. Make it easy
Once the task is possible, remove friction. Each step should be obvious, unnecessary steps should be cut, and the path to completion should be short.
When priorities conflict
Collecting a long reference number by voice has a high transcription error rate. Asking for DTMF (keypad) input is less conversational, but if the alternative is three failed attempts and a handoff, completing the task wins. Use DTMF.
Why sound human
PolyAI agents sound human by design. Not to deceive, but because humans know how to talk to other humans. Open questions, natural turn-taking, and a human-like voice remove the need for users to learn a new interaction model. Even when users know it’s automated, they’re more comfortable and more successful when the agent meets them on familiar territory.
Design guidelines
Use these during design, before build, and when reviewing live conversations. Later lessons reference these by number; they’re the shared vocabulary for evaluating agent quality.
1. Guide the user
The user should always know what to say next. Two ways to do this:
- Implicit guidance: sound conversational, and users respond conversationally. If the agent says “What can I help you with?”, most people describe their problem naturally. No instructions needed.
- Explicit guidance: tell the user the expected format. “Your order number should be six characters, starting with a letter” prevents three rounds of “sorry, I didn’t catch that.”
Good: “Could you read me your booking reference? It’s six characters: two letters, then four numbers.”
Bad: “What’s your booking reference?”
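Explicit guidance pairs naturally with a format check on what the user says back. The sketch below assumes the booking reference format from the example above (two letters, then four digits). The function name `normalize_booking_reference` and the cleanup rules are illustrative assumptions, not part of any PolyAI API.

```python
import re
from typing import Optional

# Assumed format, taken from the example prompt: two letters, then four digits.
BOOKING_REF_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")


def normalize_booking_reference(transcript: str) -> Optional[str]:
    """Return a cleaned booking reference, or None if the format does not match."""
    # Drop spaces and hyphens that transcription often inserts, then uppercase.
    candidate = re.sub(r"[\s\-]", "", transcript).upper()
    return candidate if BOOKING_REF_PATTERN.fullmatch(candidate) else None
```

When the function returns `None`, the agent can repeat the format hint rather than a bare “sorry, I didn’t catch that.”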
2. Listen robustly
Real users don’t answer one question at a time. They say “tomorrow at 6 for three people” when you asked for the date. They answer “both” when you gave two options. They say “actually, never mind that, can you check my balance instead?” mid-flow. A well-designed agent handles all of this:
- Information in any order: if the user gives name, date, and party size in one sentence, capture all three.
- Or-questions: “yes”, “no”, “both”, “neither”, “the first one”, “whichever’s cheaper.”
- Topic switches: the user can abandon one request and start another without the agent getting confused.
- Predictable out-of-scope requests: if users often ask about parking during a restaurant booking, handle it gracefully even if it’s not part of the flow.
A user says “I want to book for tomorrow at 6, there’ll be three of us, and my kid has a nut allergy.” Your flow collects date, time, party size, and dietary needs separately. The agent should accept all four values from that single utterance and skip ahead.
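The booking example above can be sketched as a small slot-filling routine. This is a minimal illustration, assuming an upstream language-understanding step has already turned the utterance into a dictionary of values. The names `SLOT_ORDER`, `merge_slots`, and `next_missing_slot` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical slot order for the restaurant booking flow in the example.
SLOT_ORDER = ["date", "time", "party_size", "dietary_needs"]


def merge_slots(collected: Dict[str, str], extracted: Dict[str, str]) -> Dict[str, str]:
    """Merge newly extracted values into what has already been collected."""
    merged = dict(collected)
    for name, value in extracted.items():
        # Ignore unknown slot names and empty values.
        if name in SLOT_ORDER and value:
            merged[name] = value
    return merged


def next_missing_slot(collected: Dict[str, str]) -> Optional[str]:
    """Return the first slot still missing, or None when nothing is left to ask."""
    for name in SLOT_ORDER:
        if name not in collected:
            return name
    return None


# One utterance fills all four slots, so the flow can skip ahead.
state = merge_slots({}, {
    "date": "tomorrow",
    "time": "18:00",
    "party_size": "3",
    "dietary_needs": "nut allergy",
})
print(next_missing_slot(state))  # None
```

The point of the design is that the flow asks only for what is still missing, instead of walking through every question in a fixed order.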
3. Give feedback
Users need to know the agent heard them correctly and that something is happening. Two types:
- Implicit confirmation: weave the user’s input into the next question. “To look up the booking under 07700 900123, I’ll just need your surname.” This confirms the phone number without asking “Did you say 07700 900123?”
- Process feedback: when something takes time, say so. “Let me pull that up” is better than silence. Silence on a voice call feels like a dropped connection.
Implicit: “Great, so that’s a table for three tomorrow at 6. And you mentioned a nut allergy, so I’ll add that to the booking.”
Explicit (avoid unless critical): “You said three people. Is that correct?” / “You said tomorrow. Is that correct?” / “You said 6pm. Is that correct?”
Reserve explicit confirmation for high-stakes values: payment amounts, medical details, irreversible actions.
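One way to apply the rule above is to pick the confirmation style per value. The sketch below is illustrative only: `HIGH_STAKES_SLOTS` and `confirmation_prompt` are hypothetical names, and which values count as high-stakes is an assumption to adjust per deployment.

```python
# Hypothetical set of slots treated as high-stakes.
HIGH_STAKES_SLOTS = {"payment_amount", "medical_details"}


def confirmation_prompt(slot: str, value: str, next_question: str) -> str:
    """Build the agent's next utterance, confirming `value` along the way."""
    if slot in HIGH_STAKES_SLOTS:
        # Explicit confirmation: stop and ask before moving on.
        return f"Just to check, that's {value}. Is that right?"
    # Implicit confirmation: echo the value inside the next question.
    return f"Got it, {value}. {next_question}"
```

For example, `confirmation_prompt("party_size", "a table for three", "What time would you like?")` keeps the call moving, while a payment amount triggers a yes/no check.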
4. Support correction
Users make mistakes. They also change their minds. The agent should handle both without restarting the entire flow. This means:
- Correct a value: “Actually, it’s the 15th, not the 14th” should update the date without re-collecting everything else.
- Switch workflows: if a user starts booking a table and then says “wait, I actually want to cancel a reservation”, the agent should pivot cleanly.
- Undo an action: if possible, let the user reverse what just happened. If not possible (e.g., an API call already fired), say so clearly.
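Correcting a single value can be sketched as an in-place update on the conversation state. This is a minimal illustration under stated assumptions: `BookingState`, its `submitted` flag, and `apply_correction` are hypothetical, and the wording of the replies is only an example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BookingState:
    slots: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False  # True once the (hypothetical) booking API call has fired


def apply_correction(state: BookingState, slot: str, new_value: str) -> str:
    """Update one slot in place and return what the agent should say."""
    if state.submitted:
        # The action cannot be silently undone, so say so clearly.
        return "That booking has already gone through, so I can't change it here."
    state.slots[slot] = new_value  # every other slot is left untouched
    return f"No problem, I've changed that to {new_value}."
```

Because only the named slot changes, “Actually, it’s the 15th” does not force the agent to ask for the time or party size again.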
5. Prevent errors
Two parts: confirm before irreversible actions, and plan for things going wrong.
Before irreversible actions:
- Booking submissions, payments, cancellations, account changes: always read back the details and get a “yes” before executing.
- This adds one turn to the call, but the cost of undoing an incorrect booking is far higher.
When things go wrong:
- APIs time out. Build a fallback (“I wasn’t able to process that. Let me connect you with someone who can.”).
- Speech recognition fails. Design retry logic that doesn’t sound robotic (“Sorry, I didn’t quite catch that. Could you say it one more time?”).
- The user gives an answer you didn’t expect. Don’t dead-end. Route to a sensible default.
Good: “Just to confirm: a table for three on Thursday the 15th at 6pm, with a note about nut allergies. Should I go ahead and book that?”
Bad: The agent silently submits the booking after collecting the last field.
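Both halves of this guideline, confirming first and falling back when the call fails, can be sketched in one function. This is a minimal illustration: `confirm_and_execute`, the list of affirmative replies, and the `submit` callable are assumptions standing in for whatever your platform provides.

```python
from typing import Callable

HANDOFF_MESSAGE = "I wasn't able to process that. Let me connect you with someone who can."


def confirm_and_execute(user_reply: str, submit: Callable[[], str]) -> str:
    """Run `submit` only after an explicit yes; fall back to a handoff on timeout."""
    # Hypothetical, deliberately small set of replies treated as a clear yes.
    affirmatives = {"yes", "yeah", "yep", "go ahead", "please do"}
    if user_reply.strip().lower().rstrip(".!") not in affirmatives:
        # No clear yes: do not execute the irreversible action.
        return "Okay, I won't book that yet. What would you like to change?"
    try:
        return submit()
    except TimeoutError:
        # The API timed out: avoid a dead end by offering a handoff.
        return HANDOFF_MESSAGE
```

The irreversible call sits behind the confirmation check, so an unclear reply can never trigger it by accident.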
6. Act efficiently
Every unnecessary turn costs time, patience, and containment rate. Remove unnecessary steps wherever possible:
- If the user already gave information, don’t ask for it again.
- If only one option makes sense, don’t present it as a choice — proceed directly.
- If an explanation isn’t needed for the user to make a decision, skip it.
- Shorten utterances. “What’s your phone number?” not “Could you please provide me with the phone number associated with your account?”
7. Speak clearly and naturally
Err on the side of informality. Voice agents that sound like legal documents or corporate emails feel unnatural, and users disengage. Practical rules:
- Use contractions: “I’ll”, “we’re”, “that’s”
- Use short sentences. One idea per sentence.
- Avoid filler preambles: “In order to assist you with your request” → cut.
- Avoid hedging: “I believe”, “It seems like”, “I think” → state the fact or say you don’t know.
- Match how real humans speak, not how they write.
| Avoid | Prefer |
|---|---|
| “Could you please provide me with your account number?” | “What’s your account number?” |
| “I apologize for the inconvenience.” | “Sorry about that.” |
| “I’m going to go ahead and process that for you.” | “Done.” or “All set.” |
8. Behave consistently
Users build expectations fast. If the agent is warm and casual in the greeting, it should stay that way throughout the call. If it uses “we” to refer to the company, it should always use “we.” Consistency applies to:
- Voice: same voice model, same speed, same warmth throughout the call.
- Phrasing style: if you use contractions, always use them. If you don’t, never use them.
- Response length: if most answers are 1-2 sentences, a sudden 5-sentence answer feels wrong.
- Turn-taking rhythm: if the agent usually waits for a pause before speaking, one instance of cutting the user off is jarring.
9. Be flexible
Not every user fits the happy path. Design for the edges:
- Can’t receive SMS: offer an alternative (email, verbal read-back, transfer to a human).
- Can’t spell their name: accept phonetic spelling, offer letter-by-letter confirmation.
- Doesn’t have the expected information: “I don’t have my booking reference” should not dead-end the conversation. Offer lookup by name, date, or phone number.
- Accessibility: some users need more time, need information repeated, or have speech patterns that challenge ASR. The agent should be patient.
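The “doesn’t have the expected information” case above can be sketched as an ordered list of fallbacks. The identifiers in `LOOKUP_ORDER` and the function `next_lookup_method` are hypothetical. The actual alternatives depend on what your booking system can search by.

```python
from typing import List, Optional

# Hypothetical identifiers, ordered from most to least specific.
LOOKUP_ORDER: List[str] = ["booking_reference", "phone_number", "name_and_date"]


def next_lookup_method(unavailable: List[str]) -> Optional[str]:
    """Return the next identifier to ask for, or None when a human handoff is needed."""
    for method in LOOKUP_ORDER:
        if method not in unavailable:
            return method
    return None
```

If the caller says they don’t have their booking reference, `next_lookup_method(["booking_reference"])` moves on to the phone number instead of ending the conversation.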
10. Adapt to the user
Use what you know. If the system has context about the user (their account, their recent activity, their location), use it to skip steps and personalize the conversation. Examples:
- Caller has a cancelled flight → “I can see your flight was cancelled. Are you calling about rebooking?”
- Caller authenticated via IVR → don’t ask for their account number again.
- Caller is on a mobile number you recognize → “Is this about the account ending in 4821?”
- Return caller within 24 hours → “Are you calling back about the same issue?”
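The examples above amount to choosing an opening line from whatever context is available. The sketch below is illustrative: the `CallerContext` fields and the priority order between them are assumptions, and real context depends on your integrations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallerContext:
    # All fields are hypothetical placeholders for data your systems may provide.
    cancelled_flight: bool = False
    called_within_24h: bool = False
    account_last4: Optional[str] = None


def opening_line(ctx: CallerContext) -> str:
    """Pick the most specific opening the available context supports."""
    if ctx.cancelled_flight:
        return "I can see your flight was cancelled. Are you calling about rebooking?"
    if ctx.called_within_24h:
        return "Are you calling back about the same issue?"
    if ctx.account_last4:
        return f"Is this about the account ending in {ctx.account_last4}?"
    # No usable context: fall back to an open question.
    return "What can I help you with?"
```

With no context at all, the agent falls back to the open question from guideline 1, so the personalized path never becomes a requirement.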

