Learn the index, explained

Understand what you're paying for.

How LLM API billing actually works — explained, with live prices and a direct jump into the estimator.

Start here · the foundation

How LLM API pricing works

Tokens, the input/output split, cached reads, and why the same answer costs several times more to write than to read.

01 / 07 · 6 min read · live data inside

4.0×

output-to-input price ratio, live, across all tracked models.
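As a rough illustration of the input/output split, the sketch below prices a single request from its token counts. The per-million-token prices are hypothetical placeholders chosen to match a 4× output-to-input ratio; they are not any provider's actual rates, and `request_cost` is an illustrative helper, not part of the estimator.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost of one request in dollars.

    Prices are quoted per million tokens, billed separately for
    tokens the model reads (input) and tokens it writes (output).
    """
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: $1 per million input tokens, $4 per million output tokens.
INPUT_PRICE = 1.0
OUTPUT_PRICE = 4.0

# The same 1,000 tokens cost 4x more when written than when read.
cost_to_read = request_cost(1_000, 0, INPUT_PRICE, OUTPUT_PRICE)
cost_to_write = request_cost(0, 1_000, INPUT_PRICE, OUTPUT_PRICE)
print(f"read: ${cost_to_read:.4f}  write: ${cost_to_write:.4f}")
```

With real prices substituted in, the same function shows how much of a bill comes from output tokens alone.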

Next up

Reuse a big system prompt or document across calls and pay up to 90% less for the repeated part.

02 · 5 min

Trade latency for ~50% off. When a job can wait minutes, the async batch endpoint halves the bill.

03 · 4 min

Reasoning models bill the hidden thinking they do before answering — often the biggest line on the invoice.

04 · 6 min

An agent makes many model calls per task, each carrying a growing transcript. The cost compounds fast.

05 · 7 min
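To make the compounding in agent workloads concrete, here is a minimal sketch that assumes each call resends the full transcript as input and that the transcript grows by a fixed number of tokens per step. The function name and all token counts are hypothetical, and caching is ignored.

```python
def agent_input_tokens(base_prompt: int, added_per_step: int, steps: int) -> int:
    """Total input tokens billed across an agent run.

    Assumes call k (starting at 0) sends the base prompt plus
    everything appended by the k earlier steps.
    """
    total = 0
    transcript = base_prompt
    for _ in range(steps):
        total += transcript           # the whole transcript is billed as input
        transcript += added_per_step  # tool output and replies extend it
    return total


# Hypothetical run: 1,000-token prompt, 500 tokens added per step, 10 steps.
compounded = agent_input_tokens(1_000, 500, 10)
single_call = 1_000 + 500 * 9  # size of the final transcript alone
print(compounded, single_call)
```

Under these assumptions the run bills 32,500 input tokens, even though the final transcript is only 5,500 tokens long, because earlier content is paid for again on every call.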

RAG, chat, classification, summarization, a coding agent — each has a different cost shape. Here's why.

Tiering, routing, caching, shorter outputs, batch. The levers that move a bill the most — and by how much.

Describe your workload in a sentence and see the bill across every model — cheapest equivalent included.