Understand what you're paying for.
How LLM API billing actually works — explained, with live prices and a direct jump into the estimator.
How LLM API pricing works
Tokens, the input/output split, cached reads, and why the same answer costs several times more to write than to read.
Read the explainer
Output-to-input price ratio, live, across all tracked models.
The series
Prompt caching
Reuse a big system prompt or document across calls and pay up to 90% less for the repeated part.
Batch processing
Trade latency for ~50% off. When a job can wait minutes to hours, the async batch endpoint roughly halves the bill.
Reasoning & “thinking” tokens
Reasoning models bill the hidden thinking they do before answering — often the biggest line on the invoice.
What an AI agent actually costs
An agent makes many model calls per task, each carrying a growing transcript. The cost compounds fast.
What drives the cost of common features
RAG, chat, classification, summarization, a coding agent — each has a different cost shape. Here's why.
Cost-cutting strategies & savings
Tiering, routing, caching, shorter outputs, batch. The levers that move a bill the most — and by how much.
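To make the arithmetic behind the series concrete, here is a minimal Python sketch of how these pieces might combine into one bill. Every number in it is a hypothetical placeholder rather than a live price for any real model, and the names `Prices`, `call_cost`, and `agent_cost` are invented for illustration. It assumes reasoning tokens are billed at the output rate and that the batch discount applies to the whole call, which may not hold for every provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Hypothetical USD prices per million tokens; not any real model's rates."""
    input_per_m: float = 2.00
    output_per_m: float = 8.00           # output priced several times above input
    cached_read_discount: float = 0.90   # "up to 90% less" on the repeated part
    batch_discount: float = 0.50         # "~50% off" for async batch jobs


def call_cost(p, input_tokens, output_tokens, cached_tokens=0,
              reasoning_tokens=0, batch=False):
    """Estimate the cost of one call in USD.

    Assumption: reasoning tokens are billed like output tokens, and the
    batch discount applies to the total for the call.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_tokens
    cached_rate = p.input_per_m * (1 - p.cached_read_discount)
    billed_output = output_tokens + reasoning_tokens
    cost = (fresh_input * p.input_per_m
            + cached_tokens * cached_rate
            + billed_output * p.output_per_m) / 1_000_000
    return cost * (1 - p.batch_discount) if batch else cost


def agent_cost(p, steps, system_tokens, tokens_added_per_step, output_per_step):
    """Estimate an agent task where every step resends the growing transcript."""
    total = 0.0
    transcript = system_tokens
    for _ in range(steps):
        total += call_cost(p, transcript, output_per_step)
        # The next call carries this step's output plus any new tool results.
        transcript += tokens_added_per_step + output_per_step
    return total


if __name__ == "__main__":
    p = Prices()
    print(f"single call:  ${call_cost(p, 10_000, 1_000):.4f}")
    print(f"with caching: ${call_cost(p, 10_000, 1_000, cached_tokens=8_000):.4f}")
    print(f"agent task:   ${agent_cost(p, 20, 2_000, 1_500, 300):.4f}")
```

Because the transcript is resent on every step, the input side of `agent_cost` grows roughly with the square of the step count, which is why agent bills compound.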
Price your own workload.
Describe your workload in a sentence and see the bill across every model — cheapest equivalent included.