Price your workload across every model.
Describe it in a sentence or fill in the fields. You'll see the bill for every model, the cheapest equivalent that fits, and where the money goes. Prices pull from the live index.
Ranked cheapest first · green = cheapest equivalent, your baseline marked
Where the money goes
Llama 4 Maverick · $43.80/mo
This workload, priced through history
cheapest model that fit, on each date
Cut it further
contextual levers for this workload
Send easy calls to a small model and reserve a frontier model for the hard ones. A 70/30 split often halves spend.
Non-urgent jobs (eval, backfill, summarization) run ~50% cheaper on async batch endpoints.
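To see how these two levers combine, here is a rough arithmetic sketch. It is not the estimator's own calculation. The function names, the 0.25 price ratio, and the 40% batch share are hypothetical, and it assumes easy and hard calls use about the same number of tokens.

```python
def routed_cost(frontier_cost, easy_share, small_price_ratio):
    """Monthly cost after routing `easy_share` of calls to a small model.

    frontier_cost: bill if every call went to the frontier model (USD/mo).
    small_price_ratio: small-model price as a fraction of the frontier price
    (hypothetical input, not taken from any price list).
    """
    hard_share = 1.0 - easy_share
    return frontier_cost * (hard_share + easy_share * small_price_ratio)


def batched_cost(cost, batch_share, batch_discount=0.5):
    """Apply an async-batch discount to the share of work that can wait.

    batch_discount=0.5 mirrors the ~50% figure quoted above; actual
    discounts vary by provider.
    """
    return cost * (1.0 - batch_share * batch_discount)


# Example: $100/mo on a frontier model, 70% of calls are easy,
# and the small model costs a quarter as much per call (assumed).
after_routing = routed_cost(100.0, easy_share=0.7, small_price_ratio=0.25)
after_batch = batched_cost(after_routing, batch_share=0.4)
print(round(after_routing, 2), round(after_batch, 2))
```

With these assumed inputs the routed bill lands a little under half the original, which is where the "often halves spend" rule of thumb comes from. A smaller price gap between the two models shrinks the saving.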
Assumptions
- Prices are USD per 1M tokens, at current list rates.
- Dimensions priced: input, output, and cached input where offered.
- Cache hit rate applies to your system / reused tokens only.
- Excludes batch discounts, free tiers, image/audio, and rate-limit effects.
- “Cheapest equivalent” respects your context-window and minimum-capability constraints.
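
The assumptions above can be turned into a small cost formula. This is a minimal sketch of one plausible reading, not the estimator's actual code: the `Price` type, the `monthly_cost` function, and all prices and token counts in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Price:
    """USD per 1M tokens. Values used below are made up for illustration."""
    input: float
    output: float
    cached_input: Optional[float] = None  # None when no cached rate is offered


def monthly_cost(price, calls, system_tokens, user_tokens, output_tokens,
                 cache_hit_rate=0.0):
    """Estimate a monthly bill from per-call token counts.

    The cache hit rate is applied to the system (reused) tokens only;
    user tokens are always billed at the normal input rate.
    """
    # Fall back to the normal input rate if the model has no cached-input price.
    cached_rate = price.cached_input if price.cached_input is not None else price.input
    cached = system_tokens * cache_hit_rate
    uncached = system_tokens * (1.0 - cache_hit_rate) + user_tokens
    per_call = (uncached * price.input
                + cached * cached_rate
                + output_tokens * price.output) / 1_000_000
    return per_call * calls


# Hypothetical model: $1.00 input, $2.00 output, $0.50 cached input per 1M tokens.
example = Price(input=1.0, output=2.0, cached_input=0.5)
print(monthly_cost(example, calls=1000, system_tokens=1000,
                   user_tokens=1000, output_tokens=500, cache_hit_rate=0.5))
```

Batch discounts, free tiers, and image or audio pricing are left out here, matching the exclusions listed above.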
Measure your real usage
Paste a production trace or connect your provider data and get your exact cost per call — then the same cheapest-equivalent recommendation, on the calls you actually ship.
No spam — one note when it ships. This is the separate measure product, not part of the estimator.
Alert me when this estimate changes
Get an email when a price change shifts this bill — we'll name the model and the new number.