June 2026

LLM costs are hard to estimate

When a team picks the model for a new feature, price often barely figures in the decision — not because it doesn't matter, but because it's hard to compute ahead of time.

The cost of an LLM feature isn't printed on any pricing page. It's price × token volume × traffic shape × caching × batch discounts — per model — and every one of those variables is opaque before launch, uncertain during the build, and tedious to re-derive once a feature is live. As a result, the figure tends to go unestimated: it's guessed at, or the question is skipped entirely.
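To make the multiplication concrete, here is a minimal Python sketch of a monthly estimate. Everything in it is an assumption for illustration: the `ModelPrice` and `Workload` names, the prices (in dollars per million tokens), the traffic figures, the 50% batch discount, and the choice to let caching and batch discounts stack multiplicatively. Real providers may combine those discounts differently.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass
class ModelPrice:
    """Hypothetical list prices in USD per million tokens."""
    input: float
    output: float
    cached_input: float  # rate for input tokens read from a prompt cache


@dataclass
class Workload:
    """Hypothetical traffic shape for one feature."""
    requests_per_month: int
    input_tokens: int            # average input tokens per request
    output_tokens: int           # average output tokens per request
    cache_hit_rate: float        # share of input tokens served from cache (0..1)
    batch_share: float = 0.0     # share of requests sent via a batch API (0..1)
    batch_discount: float = 0.5  # assumed: batch requests billed at half price


def monthly_cost(p: ModelPrice, w: Workload) -> float:
    """Estimated USD per month: price x token volume x traffic x caching x batch."""
    cached = w.input_tokens * w.cache_hit_rate
    uncached = w.input_tokens - cached
    per_request = (uncached * p.input
                   + cached * p.cached_input
                   + w.output_tokens * p.output) / PER_MILLION
    # Batched requests pay (1 - discount) of full price; the rest pay full price.
    batch_factor = (1 - w.batch_share) + w.batch_share * (1 - w.batch_discount)
    return per_request * w.requests_per_month * batch_factor


price = ModelPrice(input=3.00, output=15.00, cached_input=0.30)
traffic = Workload(requests_per_month=1_000_000, input_tokens=2_000,
                   output_tokens=500, cache_hit_rate=0.5)
print(f"${monthly_cost(price, traffic):,.0f} per month")  # $10,800 per month
```

Under these assumptions, moving half the requests to batch (`batch_share=0.5`) cuts the estimate by a quarter. Every input here is exactly the kind of number that is hard to pin down before launch.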

That tends to produce two outcomes, both common. One is over-paying: defaulting to a frontier model for a task a model twenty times cheaper would handle, because "the good one" looks like the safe choice when the trade-off can't be quantified. The other is under-building: shelving a feature over a bill that was never actually estimated. Either way the cost decision gets made without much information — and it's rarely revisited, even though it's one of the few engineering decisions that compounds with every request.

Two things make this hard.

The first is that sticker prices don't compare directly. Every provider quotes pricing in its own format: per million tokens here, per-thousand tables lingering in docs there. Cached input is priced separately; some providers discount only cache reads, while others price cache writes too. Batch sits at half price, often in a footnote. Thinking models bill their reasoning as output tokens, so the number on the page isn't the number on the invoice: a request can be billed for far more output than it actually returns. Even the first step, "which of these two models is cheaper?", usually means opening a spreadsheet.
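Two of those traps fit in a few lines. The Python sketch below converts a per-thousand quote to the per-million unit, then prices a request whose hidden reasoning tokens are billed at the output rate. The function names and every price and token count are hypothetical. They were chosen only to show how a model with half the per-token price can still cost more per request.

```python
PER_MILLION = 1_000_000


def per_million(price: float, per_tokens: int) -> float:
    """Convert a price quoted per `per_tokens` tokens into USD per million tokens."""
    return price * PER_MILLION / per_tokens


def request_cost(input_per_m: float, output_per_m: float,
                 input_tokens: int, visible_output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """USD for one request. Reasoning tokens are billed at the output rate
    even though they never appear in the response."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_per_m + billed_output * output_per_m) / PER_MILLION


# A $0.03-per-thousand quote is $30 per million.
print(per_million(0.03, 1_000))

# Hypothetical: model B lists half of model A's per-token prices,
# but spends 3,000 reasoning tokens before answering.
cost_a = request_cost(1.00, 4.00, input_tokens=1_000, visible_output_tokens=500)
cost_b = request_cost(0.50, 2.00, input_tokens=1_000, visible_output_tokens=500,
                      reasoning_tokens=3_000)
print(f"A: ${cost_a:.4f}  B: ${cost_b:.4f}")
```

With these made-up numbers, B costs 2.5 times as much per request as A despite the lower sticker price.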

The second is that the answer keeps changing, and the history isn't recorded anywhere. This market reprices itself more often than most software markets do. In May, DeepSeek made a 75% discount on its frontier model permanent overnight: $1.74 per million input tokens became $0.435. In 2024, Alibaba cut Qwen prices by up to 97% in a single day, and Baidu made models free within hours. GPT-4 launched at $30 per million input tokens just over three years ago; comparable capability now costs well under a dollar. A model committed to in February may have a cheaper equivalent by June, and the provider's pricing page, which only ever shows the current price, won't surface that.

So tokenprice.fyi does two straightforward things:

  1. Normalise. Every model, every provider, one table, one unit: dollars per million tokens for input, output and cached input, with a single blended figure to sort by. The comparison step becomes a glance.
  2. Remember. Prices here are kept as an append-only history rather than a single number that gets overwritten. Every price point is dated and carries its source. When DeepSeek cuts 75%, the launch price doesn't disappear — it becomes the start of a line on a chart, next to the market events that explain it. Trajectory can inform decisions too: whether a provider tends to cut after launch, whether it's worth waiting, whether the answer settled on last quarter still holds.
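The normalise step reduces to a unit conversion plus a weighted average. The post doesn't say how the blended figure is weighted, so the Python sketch below assumes a 3:1 input-to-output token mix. That weighting, the `Row` type, the model names and the prices are all hypothetical, and cached-input prices are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


def blended(row: Row, input_share: float = 0.75) -> float:
    """Weighted USD per million tokens. input_share=0.75 assumes a 3:1
    input:output mix; this weighting is an assumption for illustration."""
    return input_share * row.input_per_m + (1 - input_share) * row.output_per_m


rows = [
    Row("model-a", 3.00, 15.00),
    Row("model-b", 1.25, 10.00),
    Row("model-c", 2.50, 5.00),
]

# Cheapest first, by blended price.
for row in sorted(rows, key=blended):
    print(f"{row.model:<8} in {row.input_per_m:>6.2f}  out {row.output_per_m:>6.2f}"
          f"  blended {blended(row):>6.2f}")
```

With these made-up prices, model-c sorts cheapest at a 3:1 mix. At `input_share=0.95`, an input-heavy workload, model-b overtakes it. A single blended number is useful for sorting, but the weighting behind it is itself a workload assumption.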

A note on accuracy, since it's central to the product: Anthropic figures are curated by hand. Other providers are best-effort — public pricing pages plus a daily sync, sourced on every data point, and occasionally lagging. When something is wrong, it gets fixed and noted rather than silently overwritten. tokenprice.fyi isn't affiliated with any provider, which is part of the point. If a price looks stale or wrong, the source sits right there on the price point — it can be checked, and flagged.
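An append-only log with a source on every point and explicit corrections is simple to model. The Python sketch below is one hypothetical way to do it, not tokenprice.fyi's actual schema. `PricePoint`, `PriceHistory`, `pct_change`, the model name, the dates and the source strings are placeholders, and the mistyped entry exists only to show a correction. The $1.74 and $0.435 figures are the DeepSeek input prices quoted earlier. A wrong point is never edited; a later point names it as the one it corrects, and the timeline skips it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PricePoint:
    model: str
    effective: date                 # date the price took effect
    input_per_m: float              # USD per million input tokens
    source: str                     # where the figure was read from
    corrects: Optional[int] = None  # index of an earlier, wrong point this one fixes


class PriceHistory:
    """Append-only log of price points; nothing is overwritten or deleted."""

    def __init__(self) -> None:
        self._points: list[PricePoint] = []

    def append(self, point: PricePoint) -> int:
        """Record a point and return its index in the log."""
        self._points.append(point)
        return len(self._points) - 1

    def timeline(self, model: str) -> list[PricePoint]:
        """Points for one model in date order, skipping entries later marked wrong."""
        wrong = {p.corrects for p in self._points if p.corrects is not None}
        live = [p for i, p in enumerate(self._points)
                if p.model == model and i not in wrong]
        return sorted(live, key=lambda p: p.effective)

    def current(self, model: str) -> Optional[PricePoint]:
        points = self.timeline(model)
        return points[-1] if points else None


def pct_change(old: float, new: float) -> float:
    return (new - old) / old * 100


h = PriceHistory()
h.append(PricePoint("example-model", date(2026, 1, 1), 1.74, "pricing page"))
typo = h.append(PricePoint("example-model", date(2026, 5, 1), 0.453, "daily sync"))
h.append(PricePoint("example-model", date(2026, 5, 1), 0.435, "announcement",
                    corrects=typo))

first, latest = h.timeline("example-model")[0], h.current("example-model")
print(f"{pct_change(first.input_per_m, latest.input_per_m):.0f}%")  # -75%
```

The mistyped 0.453 stays in the log, so the correction itself is part of the record.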

The site doesn't try to say which model is smartest — there are plenty of benchmarks for that. It also can't yet say what a given workload will cost, though that's the direction it's heading. What it does today is the prerequisite for any of that: what the market is charging, what it charged before, and how fast that's moving. A cost decision is hard to make without that information, and the information is hard to use without a record. This is the record.