Read this before the hands-on lessons. It covers why agent design works the way it does.

Design priorities

Every design decision should be evaluated against these priorities, in order:

1. Complete the task

The user must be able to complete their goal. Right integrations, edge-case fallbacks, no dead ends. If the task is impossible, nothing else matters.

2. Make it easy

Once the task is possible, remove friction. Each step should be obvious, unnecessary steps should be cut, and the path to completion should be short.

3. Add delight

Once it works and it’s easy, polish it. Natural pacing, warm tone, good turn-taking.
The order matters. If there’s a conflict, the earlier priority wins. A polished agent that can’t complete the task is a failure.

When priorities conflict

Collecting a long reference number by voice has a high transcription error rate. Asking for DTMF (keypad) input is less conversational — but if the alternative is three failed attempts and a handoff, completing the task wins. Use DTMF.
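
The trade-off above can be sketched as a small input-mode policy. This is a minimal sketch, not a description of how the platform decides: the function name, the length threshold, and the retry limit are all hypothetical values chosen for illustration.

```python
# Hypothetical policy for choosing how to collect a reference number.
# Both constants are illustrative assumptions, not product defaults.
LONG_REFERENCE_LENGTH = 8   # assumed: values this long are error-prone by voice
MAX_VOICE_ATTEMPTS = 1      # assumed: switch to the keypad after one failed attempt


def choose_input_mode(expected_length: int, failed_voice_attempts: int) -> str:
    """Return "dtmf" or "voice" for the next collection attempt."""
    if expected_length >= LONG_REFERENCE_LENGTH:
        return "dtmf"  # completing the task outranks sounding conversational
    if failed_voice_attempts >= MAX_VOICE_ATTEMPTS:
        return "dtmf"  # fall back before the caller hits repeated failures
    return "voice"
```

Short values stay conversational, and the keypad only appears where voice is likely to fail or already has.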

Why sound human

PolyAI agents sound human by design. Not to deceive — because humans know how to talk to other humans. Open questions, natural turn-taking, and a human-like voice remove the need for users to learn a new interaction model. Even when users know it’s automated, they’re more comfortable and more successful when the agent meets them on familiar territory.

Design guidelines

Use these during design, before build, and when reviewing live conversations. Later lessons reference these by number — they’re the shared vocabulary for evaluating agent quality.

1. Guide the user

The user should always know what to say next. Two ways to do this:
  • Implicit guidance — sound conversational, and users respond conversationally. If the agent says “What can I help you with?”, most people describe their problem naturally. No instructions needed.
  • Explicit guidance — tell the user the expected format. “Your order number should be six digits starting with a letter” prevents three rounds of “sorry, I didn’t catch that.”
Where this matters most: collecting structured data (reference numbers, dates, addresses). If the agent asks “What’s your booking reference?” without hinting at format, users guess — and guess wrong.
Good: “Could you read me your booking reference? It’s six characters, starting with two letters, then four numbers.”
Bad: “What’s your booking reference?”
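
The format the agent announces is also the format it can validate against. A minimal sketch, assuming the two-letters-then-four-digits reference from the example prompt; the function name and the cleanup rules (dropping spaces and hyphens, uppercasing) are assumptions, not a documented API.

```python
import re
from typing import Optional

# Format taken from the example prompt: two letters, then four digits.
BOOKING_REF = re.compile(r"[A-Z]{2}\d{4}")


def normalize_booking_ref(transcript: str) -> Optional[str]:
    """Strip spaces and hyphens, uppercase, and return the reference if it matches."""
    candidate = re.sub(r"[\s-]", "", transcript).upper()
    return candidate if BOOKING_REF.fullmatch(candidate) else None
```

A `None` result is the cue to repeat the format hint rather than a bare "sorry, I didn't catch that."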

2. Listen robustly

Real users don’t answer one question at a time. They say “tomorrow at 6 for three people” when you asked for the date. They answer “both” when you gave two options. They say “actually, never mind that — can you check my balance instead?” mid-flow. A well-designed agent handles all of this:
  • Information in any order — if the user gives name, date, and party size in one sentence, capture all three.
  • Or-questions — “yes”, “no”, “both”, “neither”, “the first one”, “whichever’s cheaper.”
  • Topic switches — the user can abandon one request and start another without the agent getting confused.
  • Predictable out-of-scope requests — if users often ask about parking during a restaurant booking, handle it gracefully even if it’s not part of the flow.
A user says “I want to book for tomorrow at 6, there’ll be three of us, and my kid has a nut allergy.” Your flow collects date, time, party size, and dietary needs separately. The agent should accept all four values from that single utterance and skip ahead.
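
The "accept everything and skip ahead" behavior can be sketched as slot filling. This is an illustration only: it assumes some upstream step (not shown) has already turned the utterance into a dictionary of extracted values, and the slot names and helper functions are hypothetical.

```python
from typing import Optional

# Slot order from the example flow: date, time, party size, dietary needs.
SLOTS = ["date", "time", "party_size", "dietary_needs"]


def merge_slots(collected: dict, extracted: dict) -> dict:
    """Copy the collected state and add every known slot found in one utterance."""
    merged = dict(collected)
    for slot in SLOTS:
        if extracted.get(slot) is not None:
            merged[slot] = extracted[slot]
    return merged


def next_missing_slot(collected: dict) -> Optional[str]:
    """Return the first slot still unanswered, or None when nothing is left to ask."""
    for slot in SLOTS:
        if slot not in collected:
            return slot
    return None
```

Because the next question is derived from what is missing rather than from a fixed step counter, a caller who gives four values at once is never asked for any of them again.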

3. Give feedback

Users need to know the agent heard them correctly and that something is happening. Two types:
  • Implicit confirmation — weave the user’s input into the next question. “To look up the booking under 07700 900123, I’ll just need your surname.” This confirms the phone number without asking “Did you say 07700 900123?”
  • Process feedback — when something takes time, say so. “Let me pull that up” is better than silence. Silence on a voice call feels like a dropped connection.
Implicit confirmation is almost always better than explicit. Asking “Did you say X?” on every turn doubles the call length and makes the agent sound like a bad phone tree.
Implicit: “Great, so that’s a table for three tomorrow at 6. And you mentioned a nut allergy, so I’ll add that to the booking.”
Explicit (avoid unless critical): “You said three people. Is that correct?” / “You said tomorrow. Is that correct?” / “You said 6pm. Is that correct?”
Reserve explicit confirmation for high-stakes values — payment amounts, medical details, irreversible actions.
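
One way to keep this rule consistent across a flow is to decide the confirmation style per value. A minimal sketch; the slot names in the high-stakes set are assumptions based on the categories listed above, and a real deployment would define its own.

```python
# Assumed names for high-stakes values (payments, medical details, irreversible actions).
HIGH_STAKES = {"payment_amount", "medical_detail", "cancellation"}


def confirmation_style(slot: str) -> str:
    """Return "explicit" only for high-stakes values; everything else is implicit."""
    return "explicit" if slot in HIGH_STAKES else "implicit"
```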

4. Support correction

Users make mistakes. They also change their minds. The agent should handle both without restarting the entire flow. This means:
  • Correct a value — “Actually, it’s the 15th, not the 14th” should update the date without re-collecting everything else.
  • Switch workflows — if a user starts booking a table and then says “wait, I actually want to cancel a reservation”, the agent should pivot cleanly.
  • Undo an action — if possible, let the user reverse what just happened. If not possible (e.g., an API call already fired), say so clearly.
A common anti-pattern: the agent says “I’m sorry, let’s start over” and drops all collected information. This is a design failure — the agent should update only the corrected value and retain everything else.
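
The difference between correcting and restarting can be sketched in a few lines. This is an illustration with a hypothetical helper name, assuming collected values are held in a simple dictionary.

```python
def apply_correction(collected: dict, slot: str, new_value) -> dict:
    """Return a copy of the collected values with only the corrected slot changed."""
    updated = dict(collected)
    updated[slot] = new_value
    return updated


booking = {"date": "14th", "time": "18:00", "party_size": 3}
corrected = apply_correction(booking, "date", "15th")
# Time and party size carry over untouched; only the date changes.
```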

5. Prevent errors

Two parts: confirm before irreversible actions, and plan for things going wrong.
Before irreversible actions:
  • Booking submissions, payments, cancellations, account changes — always read back the details and get a “yes” before executing.
  • This adds one turn to the call, but the cost of undoing an incorrect booking is far higher.
Plan for failure:
  • APIs time out. Build a fallback (“I wasn’t able to process that — let me connect you with someone who can”).
  • Speech recognition fails. Design retry logic that doesn’t sound robotic (“Sorry, I didn’t quite catch that. Could you say it one more time?”).
  • The user gives an answer you didn’t expect. Don’t dead-end. Route to a sensible default.
Good: “Just to confirm: a table for three on Thursday the 15th at 6pm, with a note about nut allergies. Should I go ahead and book that?”
Bad: The agent silently submits the booking after collecting the last field.
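
Both halves of this guideline, the confirmation gate and the failure fallback, fit in one small function. A minimal sketch under stated assumptions: `call_api` stands in for whatever booking integration is in use, it is assumed to raise `TimeoutError` when the backend is slow, and the success wording is illustrative.

```python
from typing import Callable

FALLBACK = "I wasn't able to process that. Let me connect you with someone who can."


def submit_booking(details: dict, user_confirmed: bool,
                   call_api: Callable[[dict], str]) -> str:
    """Execute only after an explicit yes; hand off instead of dead-ending on a timeout."""
    if not user_confirmed:
        return "Should I go ahead and book that?"
    try:
        reference = call_api(details)  # hypothetical integration call
    except TimeoutError:
        return FALLBACK
    return f"You're booked. Your reference is {reference}."
```

The irreversible call sits behind the confirmation check, so there is no code path that submits silently.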

6. Act efficiently

Every unnecessary turn costs time, patience, and containment rate. Remove unnecessary steps wherever possible:
  • If the user already gave information, don’t ask for it again.
  • If only one option makes sense, don’t present it as a choice — proceed directly.
  • If an explanation isn’t needed for the user to make a decision, skip it.
  • Shorten utterances. “What’s your phone number?” not “Could you please provide me with the phone number associated with your account?”
The most common efficiency failure: the agent explains why before acting. “In order to look up your booking, I’ll need your reference number. Could you please provide that?” Ask directly: “What’s your booking reference?” This is revisited in detail in Level 3: Writing agent speech.

7. Speak clearly and naturally

Err on the side of informality. Voice agents that sound like legal documents or corporate emails feel unnatural, and users disengage. Practical rules:
  • Use contractions: “I’ll”, “we’re”, “that’s”
  • Use short sentences. One idea per sentence.
  • Avoid filler preambles: “In order to assist you with your request” → cut.
  • Avoid hedging: “I believe”, “It seems like”, “I think” → state the fact or say you don’t know.
  • Match how real humans speak, not how they write.
Avoid | Prefer
“Could you please provide me with your account number?” | “What’s your account number?”
“I apologize for the inconvenience.” | “Sorry about that.”
“I’m going to go ahead and process that for you.” | “Done.” or “All set.”
This is the guideline that matters most for voice. Long, formal sentences create awkward pacing and make users more likely to interrupt or disengage. This is revisited in detail in Level 3: Writing agent speech.

8. Behave consistently

Users build expectations fast. If the agent is warm and casual in the greeting, it should stay that way throughout the call. If it uses “we” to refer to the company, it should always use “we.” Consistency applies to:
  • Voice — same voice model, same speed, same warmth throughout the call.
  • Phrasing style — if you use contractions, always use them. If you don’t, never use them.
  • Response length — if most answers are 1-2 sentences, a sudden 5-sentence answer feels wrong.
  • Turn-taking rhythm — if the agent usually waits for a pause before speaking, one instance of cutting the user off is jarring.
The one exception: deliberate mode shifts. Reading a legal disclaimer in a different tone signals “this part is important and different.” That’s intentional inconsistency with a purpose.

9. Be flexible

Not every user fits the happy path. Design for the edges:
  • Can’t receive SMS — offer an alternative (email, verbal read-back, transfer to a human).
  • Can’t spell their name — accept phonetic spelling, offer letter-by-letter confirmation.
  • Doesn’t have the expected information — “I don’t have my booking reference” should not dead-end the conversation. Offer lookup by name, date, or phone number.
  • Accessibility — some users need more time, repeat information, or have speech patterns that challenge ASR. The agent should be patient.
The test: can every user who has a legitimate reason to call actually complete their task? If any common user profile is locked out, the design also fails design priority 1 (complete the task).
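
The missing-reference case above can be sketched as an ordered list of lookup methods. This is an illustration only: the method names and their order are assumptions, and the final branch stands for a human handoff.

```python
from typing import Optional

# Assumed order: most specific identifier first.
LOOKUP_ORDER = ["booking_reference", "phone_number", "name_and_date"]


def choose_lookup(available: set) -> Optional[str]:
    """Pick the first lookup method the caller can actually supply."""
    for method in LOOKUP_ORDER:
        if method in available:
            return method
    return None  # nothing usable: transfer to a human rather than dead-end
```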

10. Adapt to the user

Use what you know. If the system has context about the user — their account, their recent activity, their location — use it to skip steps and personalize the conversation. Examples:
  • Caller has a cancelled flight → “I can see your flight was cancelled. Are you calling about rebooking?”
  • Caller authenticated via IVR → don’t ask for their account number again.
  • Caller is on a mobile number you recognize → “Is this about the account ending in 4821?”
  • Return caller within 24 hours → “Are you calling back about the same issue?”
This saves time and signals competence. However, avoid surfacing information in a way that feels intrusive — if you reference data the user didn’t expect you to have, briefly explain the source (“I can see from your account that…”). The risk of not adapting: the agent asks for information it already has, which wastes time and makes the user feel like they’re talking to a system, not a service.
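
The examples above can be sketched as a context-aware opening. A minimal sketch: the context keys and the priority order are hypothetical, and the prompts are taken from the examples in this section.

```python
def opening_prompt(context: dict) -> str:
    """Choose the first question from whatever is already known about the caller."""
    if context.get("cancelled_flight"):
        return "I can see your flight was cancelled. Are you calling about rebooking?"
    if context.get("called_within_24h"):
        return "Are you calling back about the same issue?"
    if context.get("account_last4"):
        return f"Is this about the account ending in {context['account_last4']}?"
    return "What can I help you with?"  # no context: fall back to an open question
```

With no context the agent falls back to the open question from guideline 1, so adapting never blocks a caller the system knows nothing about.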
Ready to build? Start with Level 1: Get started.
Last modified on March 31, 2026