Guide

Best starting models for Summarization, priced per call.

About as input-heavy as a task gets: a long document in, a short summary out. The input rate is charged on every token of every page you feed in.

Output is modest and well bounded, so input cost, which grows with document length, sets the bill. Cheap long-context models handle routine summaries; no step here calls for a frontier model.

Document length scales the input meter directly.
Output is modest and bounded.
Over-sending boilerplate inflates every call.
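The arithmetic behind each per-call estimate can be written out as below. This is a sketch: the symbols are introduced here for illustration, and it assumes rates are quoted in USD per 1M tokens (as in the price table), with the system prompt billed at the input rate.

```latex
C_{\text{call}} = \frac{(T_{\text{sys}} + T_{\text{in}})\,p_{\text{in}} + T_{\text{out}}\,p_{\text{out}}}{10^{6}}
```

For a call of 300 system, 22,000 input, and 600 output tokens, 22,300 tokens are billed at the input rate against 600 at the output rate, so the input term is the larger one whenever the output rate is less than about 37 times the input rate. Doubling the document doubles the input term and leaves the output term unchanged.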

The pipeline

A feature is a chain of calls, each with a different job. Steps run top to bottom.

01

summarise

compress a long document into a short, faithful summary

Small tier, cost-driver step

Per-call shape: 300 system + 22K input + 600 output tokens

Cheap default: GPT-4.1 Mini, ≈ $0.0099 per call

Step-up for quality: Claude Haiku 4.5, ≈ $0.025 per call

Open-weight option: Llama 4 Scout, ≈ $0.0020 per call
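To check the per-call figures, here is a minimal Python sketch. The rates are assumptions (USD per 1M tokens), not values read from this guide's live price table, and Llama 4 Scout pricing differs between hosts. With these assumed rates the function returns figures consistent with the estimates above. `Rates`, `ASSUMED_RATES`, and `cost_per_call` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_m: float   # USD per 1M input tokens (system prompt + document)
    output_per_m: float  # USD per 1M output tokens (the summary)


# ASSUMED rate snapshot for illustration only; check the price table for
# live values. Llama 4 Scout is an open-weight model and its price varies by host.
ASSUMED_RATES = {
    "GPT-4.1 Mini": Rates(0.40, 1.60),
    "Claude Haiku 4.5": Rates(1.00, 5.00),
    "Llama 4 Scout": Rates(0.08, 0.30),
}


def cost_per_call(rates: Rates, sys_tokens: int, in_tokens: int, out_tokens: int) -> float:
    """Estimate USD cost of one call; the system prompt is billed as input."""
    billed_input = sys_tokens + in_tokens
    return (billed_input * rates.input_per_m + out_tokens * rates.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Per-call shape of the summarise step: 300 sys + 22K in + 600 out.
    for name, rates in ASSUMED_RATES.items():
        print(f"{name}: ${cost_per_call(rates, 300, 22_000, 600):.4f} per call")
```

A longer document changes only `in_tokens`, which is why the input side dominates once the model is chosen.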
See all small-tier models in the price table

How to choose for Summarization

There is one step, summarise: a long document in, a short summary out. It needs no frontier model. Input cost, which scales with document length, sets the bill, so summarise is also the cost-driver step.

Cheap long-context models have made routine summarisation a small-tier task, so start there. After tier choice, the next lever is to stop over-sending: strip boilerplate, running headers, and repeated front matter before the document goes in, because every token you send is billed on every call. Pay for a higher tier only when missing a clause buried mid-document carries a real cost.
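One way to cut over-sending is to drop lines that repeat across pages before the call. The sketch below assumes the document has already been split into a list of page strings; `strip_repeated_lines` and its `min_share` threshold are hypothetical, and the repeat-across-pages rule is a simple heuristic, not a complete cleaner.

```python
import math
from collections import Counter


def strip_repeated_lines(pages: list[str], min_share: float = 0.5) -> str:
    """Drop running headers/footers and bare page numbers from a paged document.

    A line counts as boilerplate if it appears on at least two pages and on
    at least `min_share` of all pages. Blank lines are also dropped, so
    paragraph breaks are not preserved.
    """
    if not pages:
        return ""
    # Count each distinct (stripped) line once per page it appears on.
    page_counts: Counter[str] = Counter()
    for page in pages:
        page_counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})
    threshold = max(2, math.ceil(min_share * len(pages)))

    kept = []
    for page in pages:
        for ln in page.splitlines():
            s = ln.strip()
            if not s:
                continue  # blank line
            if page_counts[s] >= threshold:
                continue  # repeats across pages: header, footer, notice
            if s.isdigit():
                continue  # bare page number
            kept.append(s)
    return "\n".join(kept)


pages = [
    "ACME Corp Confidential\nSection 1 text.\n1",
    "ACME Corp Confidential\nSection 2 text.\n2",
    "ACME Corp Confidential\nSection 3 text.\n3",
]
print(strip_repeated_lines(pages))
```

Requiring a line to recur on at least two pages keeps one-off sentences safe; raise `min_share` if legitimately repeated text (a recurring defined term on its own line, say) is being dropped.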

The takeaway

No step here needs a frontier model. The bill concentrates on the cost-driver step (summarise); a small model handles it.

No fabricated bills, no rankings.

Go deeper

Explainer: See the full cost breakdown. What this task costs and why, worked through line by line with live prices.

Price table: Every model, priced per 1M tokens. Sort and filter the full catalog the options above link into.
