GLM 4.7 Flash

Z.ai · Released Jan 2026

GLM 4.7 Flash is Z.ai's optimized inference variant of the GLM 4.7 foundation model, designed for faster response times with a 202K-token context window.

Strengths: Delivers reduced latency while retaining the reasoning and code-generation capabilities of the full GLM 4.7 model.
Best for: Latency-sensitive applications that need strong reasoning and code handling without the computational overhead of the full model.
Limitations: As a speed-optimized variant, it may give up some accuracy or depth relative to the full GLM 4.7 foundation model.

Input / 1M: $0.06
Output / 1M: $0.40
Cached input / 1M: $0.01
Context window: 202K tokens
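To make the per-1M-token prices concrete, here is a minimal sketch of a per-request cost estimate. The rates are the ones listed in the 23 Jul 2026 snapshot on this page; the function name `estimate_cost` is hypothetical, and the billing rule (cached tokens are a subset of input tokens and are charged at the cached rate instead of the input rate) is an assumption, not something this page confirms.

```python
# Per-1M-token prices in USD, copied from the 23 Jul 2026 snapshot on this page.
INPUT_PER_M = 0.06
OUTPUT_PER_M = 0.40
CACHED_INPUT_PER_M = 0.01


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cached rate instead of the normal input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total_per_million = (
        uncached_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total_per_million / 1_000_000


if __name__ == "__main__":
    # 100K input tokens and 2K output tokens, with and without a cache hit.
    print(f"no cache:   ${estimate_cost(100_000, 2_000):.4f}")
    print(f"80K cached: ${estimate_cost(100_000, 2_000, cached_tokens=80_000):.4f}")
```

Under these assumptions, output tokens dominate the bill only for generation-heavy requests; for long prompts with short answers, the cached-input rate is the main lever.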

Price history

Price change

30d: input 0.0% · output 0.0% (no change)
90d: input 0.0% · output 0.0% (no change)
1y: input 0.0% · output 0.0% (no change)
Since launch: input 0.0% · output 0.0% (no change)

Snapshots

Effective	Input / 1M	Output / 1M	Cached input / 1M	Note	Source
23 Jul 2026	$0.06	$0.4	$0.01	Imported from OpenRouter	openrouter.ai
17 Jul 2026	$0.0605	$0.4	—	Imported from OpenRouter	openrouter.ai
16 Jul 2026	$0.0605	$0.4	—	Imported from OpenRouter	openrouter.ai
15 Jul 2026	$0.0605	$0.4	—	Imported from OpenRouter	openrouter.ai
11 Jun 2026	$0.06	$0.4	$0.01	Imported from OpenRouter	openrouter.ai
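The relative change between any two snapshots can be computed directly from the table. The sketch below assumes the usual definition of percent change, (new - old) / old × 100; the page does not state how its own figures are derived, and the names `percent_change` and `SNAPSHOTS` are hypothetical.

```python
# (effective date, input price, output price) in USD per 1M tokens,
# copied from the snapshot table, oldest first.
SNAPSHOTS = [
    ("11 Jun 2026", 0.06, 0.4),
    ("15 Jul 2026", 0.0605, 0.4),
    ("16 Jul 2026", 0.0605, 0.4),
    ("17 Jul 2026", 0.0605, 0.4),
    ("23 Jul 2026", 0.06, 0.4),
]


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means cheaper)."""
    if old == 0:
        raise ValueError("old price must be non-zero")
    return (new - old) / old * 100.0


if __name__ == "__main__":
    latest_date, latest_in, latest_out = SNAPSHOTS[-1]
    for date, in_price, out_price in SNAPSHOTS[:-1]:
        print(
            f"{date} -> {latest_date}: "
            f"input {percent_change(in_price, latest_in):+.2f}%, "
            f"output {percent_change(out_price, latest_out):+.2f}%"
        )
```

Comparing the earliest and latest rows gives no change for either price, while the mid-July rows show the input price briefly sitting slightly above the current $0.06.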

More from Z.ai

GLM 5.2: in $0.7686 · out $2.42
GLM 5.1: in $0.966 · out $3.04
GLM 5V Turbo: in $1.20 · out $4.00
GLM 5 Turbo: in $1.20 · out $4.00

Data updated Jul 23, 2026