June 2026

Reasoning & "thinking" tokens.

Reasoning models think before they answer, and that thinking bills at the output rate. It's usually the biggest line on the invoice and the easiest one to misjudge.

A standard model reads your prompt and writes a reply. A reasoning model adds a step in between: before it answers, it generates an internal chain of thought — planning, working through cases, checking itself. Those are thinking tokens, and they're billed as output, at the output rate, exactly like the visible reply. The difference is you mostly don't see them. The answer is 200 tokens; the thinking behind it might be 4,000.

So the meter runs on tokens you never read. That's what makes reasoning cost hard to estimate from a pricing page: the per-token rate looks ordinary, but the token count is several times what the visible output suggests.
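A minimal sketch of that gap, using the illustrative counts above (a 200-token answer with 4,000 tokens of thinking behind it). The function name is hypothetical and not part of any provider's API.

```python
def billed_output_tokens(visible_tokens: int, thinking_tokens: int) -> int:
    """Output tokens on the invoice: the visible reply plus the hidden thinking."""
    return visible_tokens + thinking_tokens


visible = 200      # tokens in the reply you actually read
thinking = 4_000   # tokens of reasoning you mostly don't see

billed = billed_output_tokens(visible, thinking)
multiple = billed / visible  # how far the bill exceeds what the reply suggests

print(f"billed output tokens: {billed}")                # 4200
print(f"multiple of visible output: {multiple:.0f}x")   # 21x
```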

Effort is a volume dial, not a price dial

Every provider gives you a way to turn reasoning up or down. The exact control varies, but it comes in three shapes:

Named effort levels

A setting like minimal / low / medium / high. Higher means more thinking tokens spent before the answer.

e.g. OpenAI reasoning_effort

Token budgets

A cap on how many tokens the model may think for. Spend more budget, get more deliberation.

e.g. Claude extended thinking, Gemini thinking budget

Off

Many reasoning models can also run with thinking disabled, behaving like a standard model. For easy work that's the cheapest setting by a wide margin.

no thinking tokens billed

None of these change the price per token. They change how many output tokens the model spends. There is no separate reasoning rate: to the invoice, a thinking token is an output token like any other. Effort is a volume control on the most expensive meter there is.
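One way to picture the three shapes is as a small tagged type. This is a sketch only: the class and function names are hypothetical, and real provider parameters differ in name and detail. It shows that a budget and "off" put an explicit number on thinking volume, while a named level carries no token count in the setting itself.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class EffortLevel:
    level: str  # e.g. "minimal", "low", "medium", "high"


@dataclass(frozen=True)
class TokenBudget:
    max_thinking_tokens: int  # cap on tokens the model may think for


@dataclass(frozen=True)
class ThinkingOff:
    pass


ReasoningControl = Union[EffortLevel, TokenBudget, ThinkingOff]


def explicit_thinking_cap(control: ReasoningControl) -> Optional[int]:
    """Token cap stated by the control, or None when it names a level only."""
    if isinstance(control, ThinkingOff):
        return 0  # no thinking tokens billed
    if isinstance(control, TokenBudget):
        return control.max_thinking_tokens
    return None  # a named level steers volume without stating a token count


for control in (ThinkingOff(), TokenBudget(2_000), EffortLevel("high")):
    print(type(control).__name__, explicit_thinking_cap(control))
```

None of the three variants carries a price field, which mirrors the point above: the control changes volume, and the per-token rate stays where the pricing page put it.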

Live · prices today

Output costs 4.0× input, on average

Output usually costs more than input. Across 283 models the multiple ranges from 0.1× to 12.2×, so at least one model prices output below input.

Model · input / 1M tokens · output / 1M tokens · output ÷ input
Ling-2.6-flash · $0.01 · $0.03 · 3.0×
Llama 3.1 8B Instruct · $0.02 · $0.03 · 1.5×
Mistral Nemo · $0.02 · $0.03 · 1.5×
Llama 3 8B Lunaris · $0.04 · $0.05 · 1.3×
MythoMax 13B · $0.06 · $0.06 · 1.0×
Granite 4.0 Micro · $0.017 · $0.112 · 6.6×

Live from the index — thinking bills at the output rate, so this spread is the reasoning tax.

What the dial costs

Take the cheapest frontier model on the index today, Llama 4 Maverick, at $0.6 output per 1M tokens. Hold the visible answer at 300 tokens and move only the thinking:

Same answer — low vs high effort · Llama 4 Maverick

Visible answer (both): 300 tokens
Low effort: + thinking, 500 tokens
High effort: + thinking, 7,000 tokens
Low: 800 out × $0.6 / 1M = $0.0005
High: 7,300 out × $0.6 / 1M = $0.0044
High costs 9.1× the low call

Output rate is live; the token counts are illustrative. The visible answer is identical — the gap is entirely thinking you don't read.
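The arithmetic behind that comparison, as a short sketch. The rate and token counts are the ones quoted in the example; the function name is hypothetical.

```python
OUTPUT_RATE_PER_MILLION = 0.60  # USD per 1M output tokens, from the example above


def output_cost(visible_tokens: int, thinking_tokens: int, rate_per_million: float) -> float:
    """USD for one call's output: visible and thinking tokens bill at the same rate."""
    return (visible_tokens + thinking_tokens) * rate_per_million / 1_000_000


low = output_cost(300, 500, OUTPUT_RATE_PER_MILLION)     # 800 output tokens
high = output_cost(300, 7_000, OUTPUT_RATE_PER_MILLION)  # 7,300 output tokens

print(f"low:   ${low:.4f}")           # $0.0005
print(f"high:  ${high:.4f}")          # $0.0044
print(f"ratio: {high / low:.1f}x")    # 9.1x
```

The rate cancels out of the ratio, so 9.1× is just 7,300 ÷ 800: the multiple depends only on token counts, whichever model's rate you plug in.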

Why it's often the biggest line

Output is already the expensive meter. It runs several times the input rate because the model generates each token sequentially. Reasoning multiplies that meter specifically, so a reasoning-heavy call can spend more on thinking the reader never sees than on its entire input and visible answer combined.

The effect is largest in agent loops. An agent makes many calls per task, and when each step reasons, the thinking tax lands on every step, on top of a transcript that grows each turn. The per-step reasoning and the lengthening transcript compound, so the cost of a reasoning agent climbs faster than either effect would on its own.
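A rough sketch of an agent loop's bill, to show both meters running on every step. Everything here is an assumption for illustration: the $0.15 input rate, the 1,000-token starting prompt, the 500-token tool result per step, and the choice that only the visible reply and tool result (not the thinking) carry forward into the transcript.

```python
def agent_task_cost(
    steps: int,
    thinking_per_step: int,
    *,
    visible_per_step: int = 300,
    initial_prompt: int = 1_000,   # hypothetical starting transcript size
    tool_result: int = 500,        # hypothetical tokens appended per step
    input_rate: float = 0.15,      # hypothetical USD per 1M input tokens
    output_rate: float = 0.60,     # USD per 1M output tokens, as in the example
) -> float:
    """Total USD for one multi-step task under the assumptions above."""
    transcript = initial_prompt
    total = 0.0
    for _ in range(steps):
        output_tokens = visible_per_step + thinking_per_step
        total += transcript * input_rate / 1_000_000      # whole transcript re-read as input
        total += output_tokens * output_rate / 1_000_000  # thinking billed as output
        # Assumption: thinking is not carried forward, only the reply and tool result.
        transcript += visible_per_step + tool_result
    return total


low = agent_task_cost(steps=10, thinking_per_step=500)
high = agent_task_cost(steps=10, thinking_per_step=7_000)
print(f"10 steps, low effort:  ${low:.4f}")
print(f"10 steps, high effort: ${high:.4f}")
```

Under these assumptions the input side is the same in both runs, so the whole difference between them is thinking tokens paid once per step.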

There's no fixed multiplier per model

How much more high effort costs on a given model depends on the task, not on the model alone. The same model might spend twice its baseline thinking on a simple lookup and eight times as much on a hard proof. Effort sets a ceiling and a tendency, and the prompt decides where within that range a given call lands.

That's why this site prices what it can verify, the published per-token rates, and doesn't publish a per-model "effort multiplier." Any single number would be wrong for most workloads. The reliable way to know your multiplier is to measure it: run a representative sample of your own prompts at each effort level and read the output-token counts back from the API's usage field. Once you have those counts, the cost of a call is its output-token count times the published output rate.
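A sketch of that measurement, assuming you have already collected output-token counts per effort level from the usage field of each response. The sample numbers and the function name are made up for illustration; no API is called here.

```python
from statistics import mean


def effort_multipliers(output_tokens_by_effort: dict, baseline: str = "low") -> dict:
    """Mean output tokens at each effort level, relative to the baseline level."""
    means = {level: mean(counts) for level, counts in output_tokens_by_effort.items()}
    return {level: m / means[baseline] for level, m in means.items()}


# Hypothetical counts, one entry per prompt in a representative sample.
samples = {
    "low": [700, 900, 800],
    "high": [5_000, 9_000, 7_600],
}

multipliers = effort_multipliers(samples)
for level, multiple in multipliers.items():
    print(f"{level}: {multiple:.1f}x the baseline output tokens")
```

The spread inside each list matters as much as the mean: a wide range at one effort level is the task-dependence described above showing up in your own data.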

Turning it down

Reasoning effort is one of the largest levers on a bill, because thinking tokens routinely outnumber the visible answer by 10–20×. There are a few ways to bring it down:

Turn it off for easy work. Classification, extraction, formatting, and routing rarely need deliberation. Run them with thinking disabled or at the minimum.
Set effort per task, not per app. Reserve high effort for the calls where correctness pays for the tokens, and dial the rest down.
Cap the budget. Where the provider takes a token budget, a ceiling stops a hard prompt from running the meter unbounded.
Route the easy share. Send high-volume, low-difficulty traffic to a cheaper or non-reasoning model entirely.
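The first three levers can be combined in a small per-task policy. This is a sketch under assumptions: the task labels come from the list above, but the setting keys, the default cap, and the function name are hypothetical and would need mapping onto whatever control your provider exposes.

```python
# Task types the list above says rarely need deliberation.
EASY_TASKS = {"classification", "extraction", "formatting", "routing"}


def reasoning_settings(task_type: str, correctness_critical: bool = False,
                       budget_cap: int = 4_000) -> dict:
    """Pick reasoning settings per task rather than once per app."""
    if task_type in EASY_TASKS:
        # Turn thinking off for easy work.
        return {"effort": "off", "max_thinking_tokens": 0}
    # Reserve high effort for calls where correctness pays for the tokens.
    effort = "high" if correctness_critical else "low"
    # A ceiling stops a hard prompt from running the meter unbounded.
    return {"effort": effort, "max_thinking_tokens": budget_cap}


print(reasoning_settings("classification"))
print(reasoning_settings("code_review"))
print(reasoning_settings("proof_check", correctness_critical=True))
```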

The catch is the same one that applies to every cost lever: hard tasks genuinely benefit from reasoning, and turning it down too far costs accuracy on exactly the problems you reached for a reasoning model to solve. The decision is per-task, and only your eval settles it.
