Best starting models for Summarization, priced per call.

The most input-heavy shape there is: a long document in, a short summary out. Every page you feed in is billed at the input rate.

Output is modest and well-bounded, so the input meter scaled by length sets the cost. Cheap long-context models handle routine summaries; no step here reaches for a frontier model.

  • Document length scales the input meter directly.
  • Output is modest and bounded.
  • Over-sending boilerplate inflates every call.
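
To make the first two points concrete, here is a minimal sketch of how much of a call's cost comes from input. It assumes a hypothetical shape of 22,300 input tokens and 600 output tokens, and it works from the ratio of output price to input price, so no real price list is needed. The function name and the ratios tried are illustrative only.

```python
def input_share(input_tokens: int, output_tokens: int,
                output_to_input_price_ratio: float) -> float:
    """Fraction of one call's cost that comes from input tokens.

    The ratio is (price per output token) / (price per input token),
    so the absolute rates cancel out of the result.
    """
    weighted_output = output_tokens * output_to_input_price_ratio
    return input_tokens / (input_tokens + weighted_output)


# Hypothetical summarisation shape: 22,300 tokens in, 600 tokens out.
for ratio in (1.0, 4.0, 8.0):
    share = input_share(22_300, 600, ratio)
    print(f"output priced {ratio:.0f}x input -> input is {share:.0%} of the call")
```

Even when an output token is priced at eight times an input token, input still accounts for roughly four fifths of the call under this shape.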

The pipeline

A feature is a chain of calls, each with a different job. Steps run top to bottom.

  1. summarise: compress a long document into a short, faithful summary

    Tier: Small (cost-driver step)
    Per-call shape: 300 system + 22K input + 600 output tokens
    Cheap default: GPT-4.1 Mini ≈ $0.0099 per call
    Step-up for quality: Claude Haiku 4.5 ≈ $0.025 per call
    Open-weight option: Llama 4 Scout ≈ $0.0020 per call
    See all small-tier models in the price table
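
The per-call figures above follow from simple arithmetic: input tokens times the input rate, plus output tokens times the output rate. The sketch below shows that calculation. The per-million-token rates in it are assumptions, not taken from this page. They were chosen because they reproduce the quoted figures, and "22K" is read as 22,000 tokens. Check the price table for current rates before relying on them.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def cost_per_call(rates: Rates, system_tokens: int,
                  document_tokens: int, output_tokens: int) -> float:
    """USD cost of one call: system prompt and document are both billed as input."""
    input_tokens = system_tokens + document_tokens
    input_cost = input_tokens * rates.input_per_million / TOKENS_PER_MILLION
    output_cost = output_tokens * rates.output_per_million / TOKENS_PER_MILLION
    return input_cost + output_cost


# ASSUMED rates (USD per 1M tokens), picked to match the figures quoted above.
ASSUMED_RATES = {
    "GPT-4.1 Mini": Rates(0.40, 1.60),
    "Claude Haiku 4.5": Rates(1.00, 5.00),
    "Llama 4 Scout": Rates(0.08, 0.30),
}

for name, rates in ASSUMED_RATES.items():
    cost = cost_per_call(rates, system_tokens=300,
                         document_tokens=22_000, output_tokens=600)
    print(f"{name}: ${cost:.4f} per call")
```

Under these assumed rates the loop prints $0.0099, $0.0253, and $0.0020 per call. Doubling document_tokens roughly doubles each figure, which is the sense in which document length sets the cost.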

How to choose for Summarization

One step, summarise: a long document in, a short summary out. No step here needs a frontier model. The input meter scaled by document length sets the cost, and the cost-driver step is summarise itself.

Cheap long-context models have made routine summarisation a small-tier task, so start there. After tier choice, the next lever is to avoid over-sending: strip boilerplate, headers, and repeated front-matter before the document goes in, because every token you send is billed on every call. Pay for a higher tier only when missing a clause buried mid-document carries a real cost.
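
One way to avoid over-sending is a pre-processing pass that drops lines repeated across pages, such as running headers and footers. The following is a minimal sketch, assuming the document is already split into pages of plain text. The function names, the 50% threshold, the sample text, and the four-characters-per-token estimate are all illustrative assumptions, not a tokenizer or a method this page prescribes.

```python
import math
from collections import Counter


def strip_repeated_lines(pages: list[str], min_fraction: float = 0.5) -> str:
    """Drop lines that recur on at least min_fraction of pages (likely boilerplate)."""
    page_lines = [[line.strip() for line in page.splitlines()] for page in pages]
    # Count each distinct non-empty line once per page it appears on.
    pages_seen_on = Counter()
    for lines in page_lines:
        pages_seen_on.update({line for line in lines if line})
    # Require at least 2 pages so a single-page document loses nothing.
    threshold = max(2, math.ceil(min_fraction * len(pages)))
    boilerplate = {line for line, n in pages_seen_on.items() if n >= threshold}
    kept = [line for lines in page_lines for line in lines
            if line and line not in boilerplate]
    return "\n".join(kept)


def rough_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token for English text.
    return len(text) // 4


# Hypothetical three-page document with a running header on every page.
pages = [
    "ACME Corp - Confidential\nSection 1: Scope of services.",
    "ACME Corp - Confidential\nClause 7: Termination requires 90 days notice.",
    "ACME Corp - Confidential\nSection 9: Governing law.",
]
cleaned = strip_repeated_lines(pages)
print(rough_tokens("\n".join(pages)), "->", rough_tokens(cleaned), "tokens (rough)")
```

Exact-match filtering like this also removes any substantive line that happens to repeat across pages, so it is worth spot-checking what gets dropped before trusting it on contracts or similar documents.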

The takeaway

No step here needs a frontier model. The bill concentrates on the cost-driver step (summarise); a small model handles it.

No fabricated bills, no rankings.