Best starting models for a coding agent, priced per call.
An agent reads a repo, plans a change, edits, runs tools, then re-checks. Each loop re-sends a growing context, so the same tokens get billed again and again.
Three drivers stack at once: a long re-sent context, a tool loop that repeats, and reasoning that bills as output. The plan step is where capability earns its keep; the looping edit step is where the spend piles up.
- A long context, re-sent on every loop step.
- A tool loop that repeats many times per task.
- Reasoning tokens, billed as output, often longer than the visible action.
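To make the first two drivers concrete, here is a minimal sketch of how billed input accumulates when a loop re-sends its whole context. The function name, the fixed per-step growth, and the step count are illustrative assumptions, not measurements.

```python
def cumulative_input_tokens(base_context: int, growth_per_step: int, steps: int) -> int:
    """Total input tokens billed across a loop that re-sends its full context.

    Assumes (hypothetically) that the context grows by a fixed amount per step.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context             # the full context is sent, and billed, again
        context += growth_per_step   # tool output and edits are appended
    return total


# Illustrative numbers only: a 24K context growing by 2K per step over 10 steps.
single_call = 24_000
whole_loop = cumulative_input_tokens(single_call, 2_000, 10)
print(whole_loop, whole_loop / single_call)
```

The ratio printed at the end is the point: the loop bills many multiples of what a single call of the same starting size would.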
The pipeline
A feature is a chain of calls, each with a different job. Steps run top to bottom.
- 01 plan: decompose the task and decide what to change. Per-call shape: 2.5K sys + 8K in + 1.2K out.
- 02 edit / tool-call: make changes and run tools, re-sending the growing context each step. Per-call shape: 2.5K sys + 24K in + 2K out.
- 03 verify: read tool output and decide whether the change is correct. Per-call shape: 1K sys + 6K in + 200 out.
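The per-call shapes above can be written down as data, which makes it easy to compare steps. This is a sketch: the `CallShape` type and the `PIPELINE` name are hypothetical, and only the token counts come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallShape:
    system: int   # system-prompt tokens
    context: int  # remaining input tokens
    output: int   # output tokens (reasoning is billed here too)

    @property
    def input_total(self) -> int:
        """All tokens billed as input for one call."""
        return self.system + self.context


# Token counts taken from the pipeline steps; the structure itself is illustrative.
PIPELINE = {
    "plan": CallShape(system=2_500, context=8_000, output=1_200),
    "edit / tool-call": CallShape(system=2_500, context=24_000, output=2_000),
    "verify": CallShape(system=1_000, context=6_000, output=200),
}

for step, shape in PIPELINE.items():
    print(f"{step}: {shape.input_total} in, {shape.output} out")

# The step with the largest single-call input, before any looping is counted.
heaviest = max(PIPELINE, key=lambda name: PIPELINE[name].input_total)
print("heaviest per call:", heaviest)
```

Even before repetition is counted, edit / tool-call carries the largest input per call, and it is also the step that loops.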
How to choose for a coding agent
Three steps, three jobs: plan decides what to change, edit / tool-call makes it across a looping context, and verify checks the result. The cost-driver step and the capable-model step are different here, and getting that split right is the whole game.
Put the capable model on plan, the step that decides the change. The spend piles up on edit / tool-call, where a long context is re-sent on every loop, so the lever there is cached input, not a bigger model. Keep verify on a small model. A frontier model across the whole loop pays frontier rates for steps that never needed it.
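A rough sketch of why cached input is the lever on the looping step. Everything here is an assumption made for illustration: the function name, the split between a stable prefix and fresh tokens, and the relative rates (1.0 for regular input, 0.1 for cached input). These are placeholders, not any provider's prices, and the sketch ignores any surcharge a provider may apply when writing to the cache.

```python
def loop_input_cost(calls: int, cached_prefix: int, fresh_tokens: int,
                    rate: float, cached_rate: float) -> float:
    """Relative input cost of a loop whose stable prefix is cached after call one.

    Rates are in arbitrary units per million tokens (hypothetical values).
    """
    if calls <= 0:
        return 0.0
    first = (cached_prefix + fresh_tokens) * rate  # first call pays full rate
    later = (calls - 1) * (cached_prefix * cached_rate + fresh_tokens * rate)
    return (first + later) / 1_000_000


# Placeholder rates and a hypothetical 20K stable prefix out of 26.5K input.
RATE, CACHED_RATE = 1.0, 0.1
uncached = loop_input_cost(10, 0, 26_500, RATE, CACHED_RATE)
cached = loop_input_cost(10, 20_000, 6_500, RATE, CACHED_RATE)
print(f"cached / uncached = {cached / uncached:.2f}")
```

Under these assumptions the saving comes entirely from the repeated prefix, which is why it applies to edit / tool-call and does little for a step that runs once.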
The takeaway
The cost-driver step is edit / tool-call. The capable-model step is plan. They are different, so put the capable model on plan and keep the rest small.
No fabricated bills, no rankings.