Learn › Feature anatomy · June 2026

What an AI feature is actually made of.

A feature is a chain of calls, each with a different job. The step that drives the bill and the step that needs the capable model are usually different ones.

The thing people picture as one model call is almost never one call. A support chatbot classifies the message, retrieves help-centre passages, then writes a reply. A coding agent plans, edits, runs tools, and re-checks across many steps. Each link in the chain has a different job, a different token shape, and a different right model.

Two of those steps matter most, and they pull in opposite directions. One step is the cost-driver step: where the tokens, and so the bill, concentrate. Another is the capable-model step: the one that actually needs a frontier model to get the job right. The common mistake is putting one frontier model across the whole chain because a single step needs it.

The fix is to read the chain step by step. Below are four common shapes, each rendered from the same data the guide uses. For each, the cost-driver step and the capable-model step are named from the chain itself.
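One way to read a chain step by step is to write it down as data and ask the two questions of it directly. The sketch below is illustrative only: the `Step` class, the token and call counts, and the rule "the capable-model step is the highest tier above Small" are assumptions made for the example, not the guide's own data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    tier: str                # "Small", "Mid", or "Frontier", as labelled in the shapes
    tokens_per_call: int     # hypothetical estimate, not a measured figure
    calls_per_request: int = 1  # hypothetical: how often the step runs per request

TIER_RANK = {"Small": 0, "Mid": 1, "Frontier": 2}

def cost_driver(chain):
    """Return the step where the most tokens, and so the bill, concentrate."""
    return max(chain, key=lambda s: s.tokens_per_call * s.calls_per_request)

def capable_model_step(chain):
    """Return the highest-tier step, or None when every step is Small."""
    top = max(chain, key=lambda s: TIER_RANK[s.tier])
    return top if TIER_RANK[top.tier] > 0 else None

# Hypothetical numbers for an orchestrator with cheap, repeated subagents.
agentic = [
    Step("orchestrate / plan", "Frontier", tokens_per_call=4_000),
    Step("subagent search", "Small", tokens_per_call=3_000, calls_per_request=20),
    Step("final answer", "Mid", tokens_per_call=5_000),
]

print(cost_driver(agentic).name)         # subagent search
print(capable_model_step(agentic).name)  # orchestrate / plan
```

With these made-up counts the two answers land on different steps, which is the case where one model for the whole chain is most wasteful.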

Live · prices today

Output costs 4.0× input, on average

Output usually costs more than input, but not on every model. Across 283 models the multiple ranges from 0.1× to 12.2×.

Model · input / 1M · output / 1M · output multiple
Ling-2.6-flash · $0.01 in · $0.03 out · 3.0×
Llama 3.1 8B Instruct · $0.02 in · $0.03 out · 1.5×
Mistral Nemo · $0.02 in · $0.03 out · 1.5×
Llama 3 8B Lunaris · $0.04 in · $0.05 out · 1.3×
MythoMax 13B · $0.06 in · $0.06 out · 1.0×
Granite 4.0 Micro · $0.017 in · $0.112 out · 6.6×

Prices are in USD per 1M tokens.

Live from the index — the per-1M spread every chain below is multiplied against.
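The multiple in each row is the output price divided by the input price. A minimal sketch using only the six rows shown above (the `PRICES` dictionary and `output_multiple` function are names invented for this example):

```python
# USD per 1M tokens as (input, output), copied from the rows shown above.
PRICES = {
    "Ling-2.6-flash": (0.01, 0.03),
    "Llama 3.1 8B Instruct": (0.02, 0.03),
    "Mistral Nemo": (0.02, 0.03),
    "Llama 3 8B Lunaris": (0.04, 0.05),
    "MythoMax 13B": (0.06, 0.06),
    "Granite 4.0 Micro": (0.017, 0.112),
}

def output_multiple(price_in, price_out):
    """How many times more an output token costs than an input token."""
    return price_out / price_in

multiples = {model: output_multiple(*p) for model, p in PRICES.items()}

for model, m in multiples.items():
    print(f"{model}: {m:.2f}x")
```

These six rows are only a sample, so their mean will not match the index-wide average quoted above.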

That spread sets the stakes for the capable-model step. Today the cheapest frontier model on the index is Llama 4 Maverick, at $0.15 in / $0.60 out per 1M tokens. Put that rate on the one step that needs it, not on every step that doesn't.
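To see what that choice is worth, the sketch below prices one request two ways: the frontier rate on every step, and the frontier rate only on the step that needs it. The two rates come from the figures above (Llama 4 Maverick and Llama 3.1 8B Instruct); the step list, token counts, and call counts are hypothetical, and for simplicity every non-frontier step is priced at the small rate.

```python
FRONTIER = (0.15, 0.60)  # Llama 4 Maverick, USD per 1M tokens (input, output)
SMALL = (0.02, 0.03)     # Llama 3.1 8B Instruct, USD per 1M tokens (input, output)

def call_cost(tokens_in, tokens_out, rate):
    """Cost in USD of one call at a per-1M-token (input, output) rate."""
    price_in, price_out = rate
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Hypothetical chain: (name, tokens in, tokens out, calls, needs frontier model)
steps = [
    ("orchestrate / plan", 4_000, 500, 1, True),
    ("subagent search", 3_000, 200, 20, False),
    ("final answer", 5_000, 600, 1, False),
]

all_frontier = sum(n * call_cost(i, o, FRONTIER) for _, i, o, n, _ in steps)
targeted = sum(
    n * call_cost(i, o, FRONTIER if needs else SMALL)
    for _, i, o, n, needs in steps
)

print(f"frontier everywhere: ${all_frontier:.5f} per request")
print(f"frontier where needed: ${targeted:.5f} per request")
```

The gap depends entirely on the assumed token counts; the point of the sketch is the method, which is to price each step at its own rate and sum.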

Support chatbot

A multi-turn conversation that routes, retrieves, and replies.

intent / route

classify the message and pick a path (FAQ, handoff, tool)

Small

↓

retrieve

pull relevant help-centre passages for grounding

Small

↓

generate reply

answer in context, re-sending the accumulating transcript each turn

Mid · cost-driver step · capable-model step

Here the cost-driver step and the capable-model step are the same one: generate reply. Spend there, keep the rest small.
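Re-sending the accumulating transcript is why generate reply drives the bill: input tokens grow with the square of the turn count, not linearly. A rough sketch, assuming every turn adds the same number of tokens to the transcript (a simplification; `transcript_input_tokens` is a name invented here):

```python
def transcript_input_tokens(turns, tokens_per_turn):
    """Total input tokens when turn t re-sends all t turns so far."""
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

resent = transcript_input_tokens(turns=10, tokens_per_turn=200)
sent_once = 10 * 200

print(resent)     # 11000
print(sent_once)  # 2000
```

Under this assumption a ten-turn conversation pays for 5.5 times the tokens it actually contains, and the ratio keeps rising with conversation length.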

RAG support bot

Answer questions over retrieved documents, grounded and citable.

embed query

turn the question into a vector to search the index

Small

↓

retrieve / rerank

score and order candidate passages so only the best go in

Small

↓

generate answer

read the retrieved context and answer without inventing

Mid · cost-driver step · capable-model step

Here the cost-driver step and the capable-model step are the same one: generate answer. Spend there, keep the rest small.
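The rerank step is what keeps the generate step's input bounded: only the top-scoring passages are passed on. A minimal sketch, assuming passages arrive as hypothetical `(score, text)` pairs from some scorer that is not shown:

```python
import heapq

def top_k_passages(scored, k):
    """Keep the k highest-scoring (score, text) pairs, best first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

# Hypothetical scores and passages, for illustration only.
candidates = [
    (0.41, "How to change your billing address"),
    (0.92, "How to reset your password"),
    (0.15, "Our company history"),
    (0.77, "Password requirements"),
]

context = "\n".join(text for _, text in top_k_passages(candidates, k=2))
print(context)
```

Lowering `k` shrinks the context the generate step has to read, which trades cost against the risk of leaving out a passage the answer needed.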

Agentic workflow

An orchestrator delegating repeated search and tool work to cheap subagents.

orchestrate / plan

decide the next move and synthesise subagent results

Frontier · capable-model step

↓

subagent search

fan out cheap exploration/tool calls, repeated many times

Small · loops · cost-driver step

↓

final answer

produce the user-facing result from gathered evidence

Mid

The cost-driver step is subagent search. The capable-model step is orchestrate / plan. They are different steps, so the capable model goes on orchestrate / plan and the rest stay small.
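In code, that split amounts to a per-step routing table instead of one model name for the whole workflow. The sketch below is illustrative: the model names are placeholders, `call_model` is a stand-in that makes no network call, and the loop is a simplified single round of plan, search, synthesise, answer.

```python
# Placeholder model names; substitute whatever each tier means in your stack.
MODEL_FOR_STEP = {
    "orchestrate / plan": "frontier-model",
    "subagent search": "small-model",
    "final answer": "mid-model",
}

def call_model(model, prompt):
    """Stand-in for a real model call; just tags the prompt with the model."""
    return f"[{model}] {prompt}"

def run_workflow(question, n_searches=3):
    calls = []  # record of (step, model) so the routing can be inspected

    def step(name, prompt):
        model = MODEL_FOR_STEP[name]
        calls.append((name, model))
        return call_model(model, prompt)

    plan = step("orchestrate / plan", f"plan searches for: {question}")
    findings = [
        step("subagent search", f"search {i} under plan: {plan}")
        for i in range(n_searches)
    ]
    summary = step("orchestrate / plan", f"synthesise {len(findings)} findings")
    answer = step("final answer", f"answer '{question}' from: {summary}")
    return answer, calls

answer, calls = run_workflow("why did checkout fail?", n_searches=3)
for name, model in calls:
    print(f"{name} -> {model}")
```

The repeated subagent calls all resolve to the small model, so raising `n_searches` multiplies the cheap rate rather than the frontier one.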

Classification / extraction

A document in, a label or small JSON object out. Output is tiny.

classify / extract

label the document or pull structured fields to JSON

Small · cost-driver step

The bill concentrates on classify / extract. No step here needs a capable model, so a small model runs the whole chain.
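Because the output is tiny, nearly all of this step's cost is input, so the input price matters far more than the output multiple. A quick check with the Llama 3.1 8B Instruct rate listed earlier ($0.02 in / $0.03 out per 1M tokens) and hypothetical token counts:

```python
def input_share(tokens_in, tokens_out, price_in, price_out):
    """Fraction of a call's cost that comes from input tokens."""
    cost_in = tokens_in * price_in
    cost_out = tokens_out * price_out
    return cost_in / (cost_in + cost_out)

# Hypothetical: a 2,000-token document in, a 50-token JSON object out.
share = input_share(2_000, 50, price_in=0.02, price_out=0.03)
print(f"{share:.1%} of the cost is input")
```

With these assumed counts, input accounts for roughly 96% of the call's cost.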

The cost-driver step and the capable-model step are usually different.

Where this goes next

The shapes above are the skeleton. Two pages put numbers and choices on them.

The worked cost breakdown behind these shapes

See starting options in the guide