the index, explained

Understand what you're paying for.

How LLM API billing actually works — explained, with live prices and a direct jump into the estimator.

Start here · the foundation

How LLM API pricing works

Tokens, the input/output split, cached reads, and why the same answer costs several times more to write than to read.

01 / 07 · 6 min read · live data inside
Read the explainer
Live data: 4.0× output-to-input price ratio across all tracked models.
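
As a rough illustration of the input/output split, a per-request bill can be sketched as tokens multiplied by a per-million-token price, summed over input and output. The prices below are hypothetical placeholders, chosen only so that output costs 4× input to mirror the ratio above; they are not live prices for any model, and `request_cost` is an illustrative name.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Return the dollar cost of one request, billed per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Hypothetical prices (assumptions, not real quotes):
# $1.00 per million input tokens, $4.00 per million output tokens.
INPUT_PRICE = 1.00
OUTPUT_PRICE = 4.00

read_cost = request_cost(1_000, 0, INPUT_PRICE, OUTPUT_PRICE)   # 1,000 tokens read
write_cost = request_cost(0, 1_000, INPUT_PRICE, OUTPUT_PRICE)  # 1,000 tokens written

print(f"read:  ${read_cost:.4f}")
print(f"write: ${write_cost:.4f}")
```

Under these assumed prices, the same 1,000 tokens cost four times as much to generate as to send, which is the asymmetry the explainer unpacks.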

The series

01

How LLM API pricing works

Tokens, the input/output split, cached reads, and why the same answer costs several times more to write than to read.

01 · 6 min · Read
Next up

Prompt caching

Reuse a big system prompt or document across calls and pay up to 90% less for the repeated part.

02 · 5 min
Next up

Batch processing

Trade latency for ~50% off. When a job can wait minutes, the async batch endpoint halves the bill.

03 · 4 min
Next up

Reasoning & “thinking” tokens

Reasoning models bill the hidden thinking they do before answering — often the biggest line on the invoice.

04 · 6 min
Next up

What an AI agent actually costs

An agent makes many model calls per task, each carrying a growing transcript. The cost compounds fast.

05 · 7 min
06

What drives the cost of common features

RAG, chat, classification, summarization, a coding agent — each has a different cost shape. Here's why.

06 · 8 min · Read
07

Cost-cutting strategies & savings

Tiering, routing, caching, shorter outputs, batch. The levers that move a bill the most — and by how much.

07 · 6 min · Read
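
To make the series' headline numbers concrete, here is a minimal sketch of how a growing transcript, cached reads, and a batch discount could combine on an agent-style task. Every figure is an assumption for illustration: the prices, the 90% cached-read discount, the 50% batch discount, and the token counts are hypothetical. The model is also simplified: it ignores any cache-write surcharge, bills hidden reasoning tokens as ordinary output, and treats the batch discount as a flat multiplier. Real providers differ on all three.

```python
def agent_task_cost(steps, system_tokens, tool_tokens_per_step, output_tokens_per_step,
                    input_price, output_price,
                    cached_discount=0.0, batch_discount=0.0):
    """Dollar cost of one agent task in which every call resends the transcript.

    Prices are per million tokens. cached_discount applies only to tokens that
    were already sent on an earlier call; batch_discount applies to the total.
    """
    total = 0.0
    transcript = system_tokens  # tokens sent as input on the current call
    already_sent = 0            # part of the transcript sent on an earlier call
    for _ in range(steps):
        fresh = transcript - already_sent
        billed_input = fresh + already_sent * (1 - cached_discount)
        total += billed_input / 1_000_000 * input_price
        total += output_tokens_per_step / 1_000_000 * output_price
        # The model's output and the tool result both join the next call's input.
        already_sent = transcript
        transcript += output_tokens_per_step + tool_tokens_per_step
    return total * (1 - batch_discount)


# Hypothetical workload: 10 calls, 2,000-token system prompt, 500 tokens of
# tool results and 300 output tokens per call, at $1 / $4 per million tokens.
workload = dict(steps=10, system_tokens=2_000, tool_tokens_per_step=500,
                output_tokens_per_step=300, input_price=1.00, output_price=4.00)

baseline = agent_task_cost(**workload)
with_cache = agent_task_cost(**workload, cached_discount=0.90)
with_batch = agent_task_cost(**workload, batch_discount=0.50)

print(f"baseline:   ${baseline:.5f}")
print(f"with cache: ${with_cache:.5f}")
print(f"with batch: ${with_batch:.5f}")
```

In this simplified model the input side grows roughly quadratically with the number of steps, because each call resends everything before it. That is why the cached-read discount moves the total more than the output price does for long-running agents.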

Price your own workload.

Describe your workload in a sentence and see the bill across every model — cheapest equivalent included.

Open the estimator