Guide

Best starting models for Support chatbot, priced per call.

A conversation accumulates. Every turn re-sends the transcript so the model keeps context, which means input grows with the dialogue and you pay to reprocess the history on each reply.

The chain is intent, retrieve, generate. The cheap classify step runs every turn; the generate step does the answering and is the only place that needs a capable model.

Re-sent transcript grows with every turn.
The classify step runs once per turn, cheaply.
The stable prefix is highly cacheable.

The pipeline

A feature is a chain of calls, each with a different job. Steps run top to bottom.

01

intent / route

classify the message and pick a path (FAQ, handoff, tool)

Small

per-call shape 400 sys + 300 in + 5 out

cheap default GPT-4.1 Nano ≈ <$0.0001 per call

step-up for quality Claude Haiku 4.5 ≈ $0.0007 per call

open-weight option Mistral Small 4 ≈ <$0.0001 per call
See all small-tier models in the price table
02

retrieve

pull relevant help-centre passages for grounding

Small

per-call shape 200 sys + 1.5K in + 30 out

cheap default Claude Haiku 4.5 ≈ $0.0018 per call

step-up for quality Gemini 3.5 Flash ≈ $0.0028 per call

open-weight option Llama 4 Scout ≈ $0.0001 per call
See all small-tier models in the price table
03

generate reply

answer in context, re-sending the accumulating transcript each turn

Mid cost-driver step capable-model step

per-call shape 400 sys + 3.1K in + 250 out

cheap default Claude Haiku 4.5 ≈ $0.0048 per call

step-up for quality Claude Sonnet 4.6 ≈ $0.014 per call

open-weight option Llama 4 Maverick ≈ $0.0007 per call
See all mid-tier models in the price table

How to choose for Support chatbot

Every turn runs intent / route, then retrieve, then generate reply, and the transcript grows each turn. The cost-driver step and the capable-model step are different: the cheap classify and retrieve steps run on every turn, while generate reply is the only one that needs a capable model.

Start generate reply on a small or mid model and reserve a step up for genuinely hard threads. Keep intent / route and retrieve small; they run constantly and should cost almost nothing per call. The stable prefix of the transcript is highly cacheable, which is the single biggest lever on this shape.

The takeaway

The cost-driver step and the capable-model step are the same one: generate reply. Spend there; keep the rest small.

No fabricated bills, no rankings.

Go deeper

Explainer See the full cost breakdown What this task costs and why, worked through line by line with live prices. Price table Every model, priced per 1M tokens Sort and filter the full catalog the options above link into.

All tasks in the guide