Best starting models for Summarization, priced per call.
The most input-heavy shape there is: a long document in, a short summary out. Every page you feed in is billed at the input rate.
Output is modest and well-bounded, so the input meter scaled by length sets the cost. Cheap long-context models handle routine summaries; no step here reaches for a frontier model.
- Document length scales the input meter directly.
- Output is modest and bounded.
- Over-sending boilerplate inflates every call.
The pipeline
A feature is a chain of calls, each with a different job. Steps run top to bottom.
- 01 summarise: compress a long document into a short, faithful summary. Per-call shape: 300 sys + 22K in + 600 out.
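To see why the input meter dominates, the per-call shape above can be turned into arithmetic. The sketch below is illustrative only: the function name `per_call_cost` and both rates are placeholders chosen for the example, not prices from any provider, and it assumes system and document tokens are billed at the same input rate.

```python
def per_call_cost(sys_tokens, in_tokens, out_tokens, input_rate, output_rate):
    """Return (input_cost, output_cost, total) for one call.

    Rates are expressed per one million tokens, in whatever currency
    the caller uses. System and document tokens are both treated as input.
    """
    input_cost = (sys_tokens + in_tokens) * input_rate / 1_000_000
    output_cost = out_tokens * output_rate / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# Shape of the summarise step: 300 sys + 22K in + 600 out.
# PLACEHOLDER rates for illustration only; substitute your model's real rates.
INPUT_RATE = 1.00   # hypothetical, per million input tokens
OUTPUT_RATE = 4.00  # hypothetical, per million output tokens

input_cost, output_cost, total = per_call_cost(
    300, 22_000, 600, INPUT_RATE, OUTPUT_RATE
)
print(f"input share of call cost: {input_cost / total:.0%}")
```

With these placeholder rates, input accounts for roughly nine tenths of the call even though each output token is priced at four times an input token. Doubling the document length roughly doubles the call cost; doubling the summary length barely moves it.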
How to choose for Summarization
One step, summarise: a long document in, a short summary out. No step here needs a frontier model. The input meter scaled by document length sets the cost, and the cost-driver step is summarise itself.
Cheap long-context models have made routine summarisation a small-tier task, so start there. After tier choice, the next lever is to avoid over-sending: strip boilerplate, headers, and repeated front-matter before the document goes in, because every token you send is billed as input on every call. Pay for a higher tier only when missing a clause buried mid-document carries a real cost.
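One way to avoid over-sending is to drop lines that repeat across pages before the document goes into the call. The sketch below is a minimal heuristic, not a complete cleaner: the function name `strip_repeated_lines`, the `min_share` threshold, and the assumption that the document arrives as a list of page strings are all hypothetical choices made for the example.

```python
from collections import Counter


def strip_repeated_lines(pages, min_share=0.6):
    """Drop lines that recur on at least min_share of pages.

    Such lines are likely running headers, footers, or repeated
    front-matter. A line must appear on at least two pages to be dropped,
    so a single-page document is returned with only blank lines removed.
    """
    # Count each distinct non-blank line once per page.
    pages_containing = Counter()
    for page in pages:
        pages_containing.update(
            {line.strip() for line in page.splitlines() if line.strip()}
        )

    threshold = max(2, min_share * len(pages))
    repeated = {line for line, n in pages_containing.items() if n >= threshold}

    # Keep the remaining lines in their original order.
    kept = []
    for page in pages:
        for line in page.splitlines():
            text = line.strip()
            if text and text not in repeated:
                kept.append(text)
    return "\n".join(kept)


# Hypothetical three-page document with a running header and footer.
pages = [
    "ACME Corp Confidential\nClause 1: payment is due in 30 days.\nPage footer",
    "ACME Corp Confidential\nClause 2: either party may terminate.\nPage footer",
    "ACME Corp Confidential\nClause 3: governing law.\nPage footer",
]
print(strip_repeated_lines(pages))
```

The heuristic is deliberately conservative, since dropping a real clause is worse than sending a few extra tokens. Spot-check the stripped text against the source before relying on it, particularly for documents where a repeated line is itself meaningful.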
The takeaway
No step here needs a frontier model. The bill concentrates on the cost-driver step (summarise); a small model handles it.
No fabricated bills, no rankings.