Best starting models for Support chatbot, priced per call.
A conversation accumulates. Every turn re-sends the transcript so the model keeps context, which means input grows with the dialogue and you pay to reprocess the history on each reply.
The chain is intent, retrieve, generate. The cheap classify step runs every turn; the generate step does the answering and is the only place that needs a capable model.
- Re-sent transcript grows with every turn.
- The classify step runs once per turn, cheaply.
- The stable prefix is highly cacheable.
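The growth described above can be sketched in a few lines. This is a rough illustration only: the function name and the 50-token user message are hypothetical, while the 400-token system prompt and 250-token reply are borrowed from the generate reply shape in the pipeline. It assumes the full transcript is re-sent unchanged on every turn.

```python
def input_tokens_per_turn(turns, system_tokens, user_tokens, reply_tokens):
    """Return the input size of each turn when the whole transcript is re-sent."""
    sizes = []
    history = 0  # tokens of all earlier user messages and model replies
    for _ in range(turns):
        # Each call carries the system prompt, the history so far, and the new message.
        sizes.append(system_tokens + history + user_tokens)
        history += user_tokens + reply_tokens
    return sizes


# Hypothetical conversation: 5 turns, 50-token user messages (an assumption).
sizes = input_tokens_per_turn(turns=5, system_tokens=400, user_tokens=50, reply_tokens=250)
print(sizes)       # input tokens sent on each turn
print(sum(sizes))  # total input tokens processed across the conversation
```

Under these assumptions the per-turn input rises by a fixed amount each turn, so the total processed over a conversation grows roughly with the square of the number of turns.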
The pipeline
A feature is a chain of calls, each with a different job. Steps run top to bottom.
- 01 intent / route: classify the message and pick a path (FAQ, handoff, tool). Per-call shape: 400 sys + 300 in + 5 out.
- 02 retrieve: pull relevant help-centre passages for grounding. Per-call shape: 200 sys + 1.5K in + 30 out.
- 03 generate reply: answer in context, re-sending the accumulating transcript each turn. Per-call shape: 400 sys + 3.1K in + 250 out.
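To see where the tokens go, the per-call shapes above can be added up per turn. The sketch below assumes that "sys" and "in" are separate and both count as input, and it reads "1.5K" and "3.1K" as 1500 and 3100. The function names are illustrative. It counts tokens only and applies no prices.

```python
# Per-call shapes copied from the pipeline list above.
STEPS = {
    "intent / route": {"sys": 400, "in": 300, "out": 5},
    "retrieve": {"sys": 200, "in": 1500, "out": 30},
    "generate reply": {"sys": 400, "in": 3100, "out": 250},
}


def step_totals(steps):
    """Return {step: (input_tokens, output_tokens)}, counting sys + in as input."""
    return {name: (s["sys"] + s["in"], s["out"]) for name, s in steps.items()}


def share_of(step, steps):
    """Return one step's share of the per-turn input and output tokens."""
    totals = step_totals(steps)
    total_in = sum(i for i, _ in totals.values())
    total_out = sum(o for _, o in totals.values())
    step_in, step_out = totals[step]
    return step_in / total_in, step_out / total_out


in_share, out_share = share_of("generate reply", STEPS)
print(f"generate reply: {in_share:.0%} of input tokens, {out_share:.0%} of output tokens")
```

With these shapes, generate reply accounts for roughly 59% of a turn's input tokens and 88% of its output tokens.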
How to choose for Support chatbot
Every turn runs intent / route, then retrieve, then generate reply, and the transcript grows each turn. The cost-driver step and the capable-model step are the same one: intent / route and retrieve run on every turn but are cheap per call, while generate reply carries the largest per-call shape and is the only step that needs a capable model.
Start generate reply on a small or mid model and reserve a step up for genuinely hard threads. Keep intent / route and retrieve small; they run constantly and should cost almost nothing per call. The stable prefix of the transcript is highly cacheable, which is the single biggest lever on this shape.
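The size of the caching lever can be estimated by asking how much of each call repeats what an earlier call in the same conversation already sent. This is a minimal sketch under stated assumptions: the function name and the 50-token user message are hypothetical, the transcript is re-sent byte-for-byte so the prefix stays stable, and whether a given provider actually caches that prefix (and at what discount) is not modelled.

```python
def cacheable_fraction(turns, system_tokens, user_tokens, reply_tokens):
    """Return, per turn, the fraction of input tokens that an earlier call already sent."""
    fractions = []
    prefix = system_tokens  # system prompt plus the transcript so far
    for turn in range(turns):
        total = prefix + user_tokens
        # On the first turn nothing has been sent before in this conversation.
        reused = prefix if turn > 0 else 0
        fractions.append(reused / total)
        # The new message and the reply join the prefix for the next turn.
        prefix += user_tokens + reply_tokens
    return fractions


# Hypothetical conversation using the generate reply shape (400 sys, 250 out)
# and an assumed 50-token user message.
for turn, frac in enumerate(cacheable_fraction(5, 400, 50, 250), start=1):
    print(f"turn {turn}: {frac:.0%} of input is a repeated prefix")
```

Under these assumptions the repeated share climbs quickly after the first turn, which is consistent with the prefix being the biggest lever on this shape.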
The takeaway
The cost-driver step and the capable-model step are the same one: generate reply. Spend there; keep the rest small.
No fabricated bills, no rankings.