Model
Picking the language model that powers the agent, and the latency-versus-quality trade-off.
The Model page is where you pick the language model that drives the agent. The model is the brain: it reads the prompt and the conversation so far, decides what to say, and decides which tools to call. Everything else (voice, tools, fillers) sits around it.
Choosing a model
The picker lists the models available to your workspace with live pricing pulled from OpenRouter, so the cost you see is current. The right choice is a trade-off between three things:
- Latency — how fast the model starts responding. On a phone call this is the difference between a natural exchange and an awkward pause. Faster models feel more human.
- Quality — how well it follows the prompt, picks the right tool, and recovers when a call goes sideways.
- Cost — the per-token price, shown live in the picker.
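To make the cost side of the trade-off concrete, here is a minimal sketch of the arithmetic, assuming prices are quoted per million tokens and split into input and output rates. The `ModelPrice` type, the `estimate_call_cost` helper, and every number below are hypothetical illustrations, not values from the picker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical per-million-token prices in USD (not real quotes)."""
    input_per_million: float
    output_per_million: float


def estimate_call_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of the tokens used during one call."""
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Illustrative prices only; read the real ones from the picker.
mid_tier = ModelPrice(input_per_million=0.50, output_per_million=1.50)
large = ModelPrice(input_per_million=5.00, output_per_million=15.00)

# Assumed usage for one call: 20,000 input tokens and 1,000 output tokens.
mid_cost = estimate_call_cost(mid_tier, input_tokens=20_000, output_tokens=1_000)
large_cost = estimate_call_cost(large, input_tokens=20_000, output_tokens=1_000)

print(f"mid-tier: ${mid_cost:.4f} per call, large: ${large_cost:.4f} per call")
```

Input tokens tend to dominate on a multi-turn call because the conversation so far is re-read on each turn, so the input price usually matters more than it first appears.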
A sensible default
Start with a fast, capable mid-tier model and move up only if you see the agent making reasoning or tool-selection mistakes that a stronger model would avoid. A slower model that gives a marginally better answer often makes the call worse, because the caller feels the lag. Latency is part of call quality, not separate from it.
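The lag a caller feels is roughly the time until the reply starts, not the time until it finishes. A minimal sketch of measuring that, assuming the reply arrives as a stream of text chunks: `time_to_first_chunk` and `fake_stream` are hypothetical names, and the fake stream stands in for whatever streaming client you actually use.

```python
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the seconds elapsed until the stream yields its first chunk."""
    start = clock()
    for _ in stream:
        # Stop at the first chunk; the rest of the reply is not timed.
        return clock() - start
    raise ValueError("stream produced no chunks")


def fake_stream(delay_s: float) -> Iterator[str]:
    """Stand-in for a streaming model reply that pauses before the first chunk."""
    time.sleep(delay_s)
    yield "Hello"
    yield ", how can I help?"


latency = time_to_first_chunk(fake_stream(0.05))
print(f"time to first chunk: {latency:.3f}s")
```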
When to change it
Change the model when you have evidence, not a hunch:
- The agent picks the wrong tool or misreads the caller → try a stronger model.
- Replies are correct but the caller is waiting too long → try a faster one.
Either way, use the test panel and evals to compare before and after rather than judging from a single call.
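A before-and-after comparison can be as simple as summarizing the same set of test calls under each model. The sketch below assumes you have recorded, per call, whether it passed and how long the caller waited. `CallResult`, `summarize`, and the sample data are hypothetical and do not reflect the format of the product's eval output.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class CallResult:
    passed: bool       # did the call meet the eval's criteria?
    latency_s: float   # seconds the caller waited for the reply to start


def summarize(results: list[CallResult]) -> dict[str, float]:
    """Return the pass rate and median latency for a batch of test calls."""
    if not results:
        raise ValueError("need at least one result")
    return {
        "pass_rate": sum(r.passed for r in results) / len(results),
        "median_latency_s": median([r.latency_s for r in results]),
    }


# Made-up results for the same four test calls under two models.
before = [
    CallResult(True, 0.6), CallResult(False, 0.7),
    CallResult(True, 0.5), CallResult(False, 0.8),
]
after = [
    CallResult(True, 1.2), CallResult(True, 1.4),
    CallResult(True, 1.1), CallResult(False, 1.3),
]

before_summary = summarize(before)
after_summary = summarize(after)
print("before:", before_summary)
print("after: ", after_summary)
```

In this made-up data the stronger model passes more calls but roughly doubles the wait, which is the kind of trade-off a single call would not reveal.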
The model choice lands in a draft and only affects live calls once you publish.