Best starting models for an agentic workflow, priced per call.

An orchestrator delegates repeated search and tool work to cheap subagents, then synthesises what they find. The fan-out is where the calls multiply.

The mismatch is the point: the money sits on the small, looping subagent step, while the capability sits on the separate orchestrator. The cost-driver step and the capable-model step are different steps, and they call for different models.

  • Subagent fan-out multiplies the cheap calls.
  • The looping search step carries most of the spend.
  • The orchestrator runs less often but needs capability.
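A rough sketch of the three points above, assuming one orchestrator call, twenty subagent calls and one final-answer call per request. The call counts are illustrative assumptions, not figures from this page; the per-call costs are the cheap-default figures listed in the pipeline section. The helper name `spend_share` is hypothetical.

```python
# Per-call costs in USD: the cheap defaults quoted in the pipeline section.
COST_PER_CALL = {
    "orchestrate / plan": 0.058,
    "subagent search": 0.0068,
    "final answer": 0.010,
}


def spend_share(calls):
    """Return each step's share of one request's total spend.

    `calls` maps a step name to how many times it runs per request.
    """
    totals = {step: COST_PER_CALL[step] * n for step, n in calls.items()}
    grand_total = sum(totals.values())
    return {step: cost / grand_total for step, cost in totals.items()}


# ASSUMPTION: illustrative call counts, not measured traffic.
calls = {"orchestrate / plan": 1, "subagent search": 20, "final answer": 1}
shares = spend_share(calls)

for step, share in shares.items():
    print(f"{step}: {share:.0%} of spend")
```

With these assumed counts the subagent step comes to roughly two thirds of the spend. With a much smaller fan-out the orchestrator would dominate instead, so it is worth plugging in your own call counts.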

The pipeline

A feature is a chain of calls, each with a different job. Steps run top to bottom.

  1. orchestrate / plan

    decide the next move and synthesise subagent results

    tier: Frontier (capable-model step)
    per-call shape: 3K sys + 9K in + 1.5K out
    cheap default: Claude Sonnet 4.6 ≈ $0.058 per call
    step-up for quality: Claude Opus 4.8 ≈ $0.098 per call
    open-weight option: Qwen 3.7 Max ≈ $0.041 per call
    See all frontier-tier models in the price table
  2. subagent search

    fan out cheap exploration/tool calls, repeated many times

    tier: Small (repeats; cost-driver step)
    per-call shape: 800 sys + 4K in + 400 out
    cheap default: Claude Haiku 4.5 ≈ $0.0068 per call
    step-up for quality: Gemini 3.5 Flash ≈ $0.011 per call
    open-weight option: Mistral Small 4 ≈ $0.0006 per call
    See all small-tier models in the price table
  3. final answer

    produce the user-facing result from gathered evidence

    tier: Mid
    per-call shape: 1K sys + 6K in + 700 out
    cheap default: Claude Haiku 4.5 ≈ $0.010 per call
    step-up for quality: Claude Sonnet 4.6 ≈ $0.032 per call
    open-weight option: Llama 4 Maverick ≈ $0.0015 per call
    See all mid-tier models in the price table
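The per-call figures in the pipeline follow from each step's token shape. A minimal sketch of that arithmetic, assuming system and input tokens are both billed at the input rate, with no caching or batch discounts. The function name `per_call_cost` is hypothetical, and the rates of $3 and $15 per million tokens are placeholders, not quoted from this page or its price table; they are used only because they land near the ≈ $0.058 shown for the orchestrate / plan cheap default.

```python
def per_call_cost(sys_tokens, in_tokens, out_tokens, usd_per_m_in, usd_per_m_out):
    """Estimate the USD cost of one call from its token shape.

    ASSUMPTION: system and input tokens share the input rate,
    and no caching or batch discount applies.
    """
    prompt_tokens = sys_tokens + in_tokens
    total = prompt_tokens * usd_per_m_in + out_tokens * usd_per_m_out
    return total / 1_000_000


# Shape from step 1: 3K sys + 9K in + 1.5K out.
# ASSUMPTION: placeholder rates in USD per million tokens.
cost = per_call_cost(3_000, 9_000, 1_500, usd_per_m_in=3.0, usd_per_m_out=15.0)
print(f"orchestrate / plan ≈ ${cost:.4f} per call")
```

Swapping in the shape and rates for another step gives that step's per-call estimate; always take the rates from the current price table rather than from this sketch.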

How to choose for an agentic workflow

An orchestrator delegates repeated work to cheap subagents, then a final step turns the gathered evidence into the user-facing answer. The cost-driver step and the capable-model step are different, and the mismatch is sharp: the money sits on the small, looping subagent search step, while the capability sits on the separate orchestrate / plan step.

Put the capable model on orchestrate / plan and keep subagent search on a small model, because that step loops many times and carries most of the spend. Cutting the number of fan-out calls typically moves the bill more than changing the model on any single step. Reach for a bigger subagent only when cheap exploration keeps coming back wrong.
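To compare the two levers, a rough sketch: the bill grows linearly with fan-out, while a model swap on a step that runs once per request adds a fixed amount. Per-call costs are the cheap-default and step-up figures from the pipeline section; the fan-out of 20, the single orchestrator call and the function name `request_cost` are assumptions for illustration.

```python
# Per-call costs in USD, taken from the pipeline section.
ORCH_DEFAULT = 0.058   # orchestrate / plan, cheap default
ORCH_STEP_UP = 0.098   # orchestrate / plan, step-up for quality
SUB_DEFAULT = 0.0068   # subagent search, cheap default
FINAL_DEFAULT = 0.010  # final answer, cheap default


def request_cost(orch, sub, final, fan_out, orch_calls=1):
    """USD cost of one request: orchestrator calls + fan-out calls + final answer."""
    return orch * orch_calls + sub * fan_out + final


# ASSUMPTION: 20 subagent calls per request as the baseline.
baseline = request_cost(ORCH_DEFAULT, SUB_DEFAULT, FINAL_DEFAULT, fan_out=20)
half_fan_out = request_cost(ORCH_DEFAULT, SUB_DEFAULT, FINAL_DEFAULT, fan_out=10)
orch_upgraded = request_cost(ORCH_STEP_UP, SUB_DEFAULT, FINAL_DEFAULT, fan_out=20)

print(f"baseline:            ${baseline:.3f}")
print(f"halve fan-out:       ${half_fan_out - baseline:+.3f}")
print(f"step up orchestrator: ${orch_upgraded - baseline:+.3f}")
```

Under these assumed counts, halving fan-out shifts the bill by more than the orchestrator step-up does. A subagent step-up is a different case, because its extra cost also scales with fan-out, so it is worth running through the same function with your own numbers.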

The takeaway

The cost-driver step is subagent search. The capable-model step is orchestrate / plan. They are different, so put the capable model on orchestrate / plan and keep the rest small.

No fabricated bills, no rankings.