VerticalAI docs

Voice and fillers

How the agent sounds — voice, speed, tone, language — plus the greeting, response length, ambience, and filler phrases.

Two pages shape how the agent sounds and behaves out loud: Voice picks the voice and greeting, and Conversation tunes how it speaks on a call. Both matter for whether a call feels right to a human ear.

Voice

The Voice page sets the agent's voice (powered by Cartesia) and its greeting — the first thing a caller hears when the agent picks up. A clear, short greeting that names who they have reached sets the tone for the whole call.

Conversation

The Conversation page is where the agent's speaking behaviour lives, grouped into five categories.

Speech

  • Speed — playback rate from 0.5× to 2.0×. A pace close to natural speech (around 1.0×) is usually best; too fast reads as rushed, too slow as robotic.
  • Tone — an emotional colour for the voice. Options include neutral, content, happy, excited, calm, confident, sympathetic, curious, sad, angry, and scared. Match it to the job — a support line wants calm or sympathetic, not excited.
  • Language — the spoken language. English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Hindi, and Arabic are available.
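
The ranges and options above can be written down as a small validation sketch. It is illustrative only: the `SpeechSettings` class, its field names, and its defaults are hypothetical and are not a VerticalAI API. Only the speed range and the option lists come from this page.

```python
from dataclasses import dataclass

# Values copied from this page. The page says tone options "include" these,
# so the real list may be longer.
TONES = frozenset({
    "neutral", "content", "happy", "excited", "calm", "confident",
    "sympathetic", "curious", "sad", "angry", "scared",
})
LANGUAGES = frozenset({
    "English", "Spanish", "French", "German", "Italian", "Portuguese",
    "Japanese", "Korean", "Chinese", "Hindi", "Arabic",
})


@dataclass(frozen=True)
class SpeechSettings:
    """Hypothetical container for the Speech controls (not a product API)."""

    speed: float = 1.0         # assumed default: 1.0x, i.e. unmodified pace
    tone: str = "neutral"      # assumed default
    language: str = "English"  # assumed default

    def __post_init__(self) -> None:
        # Reject values outside the documented 0.5x to 2.0x range.
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be between 0.5 and 2.0, got {self.speed}")
        if self.tone not in TONES:
            raise ValueError(f"unknown tone: {self.tone!r}")
        if self.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {self.language!r}")


# A support line, following the advice above: calm or sympathetic, not excited.
support_line = SpeechSettings(speed=1.0, tone="sympathetic", language="English")
```

Validating at construction time means an out-of-range speed or a misspelled tone fails immediately instead of surfacing on a live call.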

Response length

How long the agent's replies run, as a token budget:

  • Brief — 150 tokens. Tight, transactional answers.
  • Standard — 300 tokens. A good default.
  • Detailed — 500 tokens.
  • Thorough — 800 tokens.

On a phone call, shorter is usually better — long monologues are hard to follow by ear and slow the back-and-forth.
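
To get a feel for why shorter budgets suit a call, the sketch below maps each preset to its token budget and gives a rough upper bound on speaking time. The function names are hypothetical, and the conversion factors (about 0.75 English words per token, about 150 spoken words per minute) are rule-of-thumb assumptions, not figures from this product.

```python
# Token budgets as listed on this page.
RESPONSE_LENGTH_TOKENS = {
    "brief": 150,
    "standard": 300,
    "detailed": 500,
    "thorough": 800,
}

# Rough assumptions for illustration only; real values vary by
# tokenizer, language, voice, and the Speed setting.
WORDS_PER_TOKEN = 0.75
WORDS_PER_MINUTE = 150


def token_budget(preset: str) -> int:
    """Return the token budget for a preset name (case-insensitive)."""
    try:
        return RESPONSE_LENGTH_TOKENS[preset.lower()]
    except KeyError:
        raise ValueError(f"unknown response length preset: {preset!r}") from None


def max_spoken_seconds(preset: str) -> float:
    """Estimate how long a reply that uses the whole budget would take to say."""
    words = token_budget(preset) * WORDS_PER_TOKEN
    return words / WORDS_PER_MINUTE * 60


for name in RESPONSE_LENGTH_TOKENS:
    print(f"{name}: up to ~{max_spoken_seconds(name):.0f} s of speech")
```

Under these assumptions a reply that fills the Thorough budget could run for several minutes, which is a long time for a caller to listen without a turn.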

Office ambience

An optional, low-volume background audio bed so the agent does not sound like it is in a vacuum. The setting is a simple on/off toggle.

Filler phrases

Fillers are the short, natural sounds and phrases the agent says while it works — "let me check that for you", "one moment" — so the caller is not met with silence while a tool runs. This is the single biggest lever on perceived latency: a filler at the right moment makes a slow lookup feel responsive.

  • On/off — whether the agent uses fillers at all.
  • Delay — how long it waits (0.5–5.0s) before reaching for a filler, so it does not interrupt a quick reply.
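
The interaction between the two controls can be sketched with `asyncio`. This is a minimal illustration of the timing idea, not the product's implementation: `run_with_filler`, `tool`, and `speak` are hypothetical names, and the default delay is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_filler(
    tool: Callable[[], Awaitable[T]],
    speak: Callable[[str], Awaitable[None]],
    filler: str = "One moment.",
    delay_s: float = 1.0,   # assumed default; the page only gives the range
    enabled: bool = True,   # mirrors the on/off control
) -> T:
    """Run a tool call and speak a filler only if it outlasts the delay."""
    if not 0.5 <= delay_s <= 5.0:
        raise ValueError(f"delay must be between 0.5 and 5.0 seconds, got {delay_s}")

    # Start the tool immediately so the filler never delays the real work.
    task = asyncio.ensure_future(tool())

    # Wait up to delay_s. asyncio.wait does not cancel the task on timeout.
    done, _pending = await asyncio.wait({task}, timeout=delay_s)

    # A quick reply finishes inside the window, so no filler is spoken.
    if enabled and not done:
        await speak(filler)

    return await task
```

A tool that returns within the delay produces no filler at all, which is the "does not interrupt a quick reply" behaviour described above; a slower tool gets exactly one filler while it keeps running.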

Pronunciation

Teach the voice how to say brand names, places, or jargon it would otherwise mangle. Add the word and how it should be pronounced.

Where filler phrases are edited

The on/off and delay controls are here on Conversation, but the actual phrases the agent draws from are managed on the Tools page:

  • Default fillers — the agent-wide pool, edited from the Tools page.
  • Per-tool fillers — phrases specific to one tool ("looking up your order now"), edited inside that tool's editor.
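
One way to picture the two pools is a lookup with a fallback. The sketch below assumes that a tool's own fillers take priority over the default pool and that a phrase is picked at random; this page does not state either rule, and `pick_filler`, the `lookup_order` tool name, and the dictionary layout are all hypothetical.

```python
import random

# Agent-wide pool; phrases taken from the examples on this page.
DEFAULT_FILLERS = ["Let me check that for you.", "One moment."]

# Per-tool pools keyed by a hypothetical tool name.
TOOL_FILLERS = {
    "lookup_order": ["Looking up your order now."],
}


def pick_filler(tool_name, tool_fillers=TOOL_FILLERS, default_fillers=DEFAULT_FILLERS):
    """Pick a filler for a tool, falling back to the default pool.

    Assumption: per-tool phrases win when present. An empty or missing
    per-tool list falls through to the defaults.
    """
    pool = tool_fillers.get(tool_name) or default_fillers
    return random.choice(pool)
```

With this layout, `pick_filler("lookup_order")` returns the order-specific phrase, while a tool with no phrases of its own draws from the default pool.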

Going live

All of these settings land in a draft and only affect live calls once you publish.
