Qwen3 VL 32B Instruct

Alibaba · Multimodal · Released Oct 2025

A 32B multimodal model from Alibaba that processes text, images, and video with a 131k-token context window.

Strengths: Handles vision-language tasks over images and video, and its 131k-token context window fits long documents or several inputs in a single request.
Best for: Multimodal workloads that need visual understanding from a mid-size model, balancing capability against efficiency relative to the larger 235B variant.
Limitations: Smaller than the 235B A22B Instruct model released in the same period, so it is less suited to tasks that demand maximum reasoning depth or involve the most complex visual scenarios.

Input / 1M

$0.104

Output / 1M

$0.416

Cached input / 1M

—

Context window

131k tokens

Price history

Effective	Input	Output	Cached in	Note	Source
11 Jun 2026	$0.104	$0.416	—	Imported from OpenRouter	openrouter.ai
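
As a rough illustration of how the per-1M-token prices listed here translate into a per-request cost, the sketch below multiplies token counts by the listed input and output rates. The function name `estimate_cost` and the token counts are hypothetical, chosen only for the example; it assumes no cached-input discount (none is listed) and does not model how a provider converts images or video into tokens.

```python
# Prices in USD per 1M tokens, copied from the listing (effective 11 Jun 2026).
INPUT_PRICE_PER_M = 0.104
OUTPUT_PRICE_PER_M = 0.416


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, with no caching discount applied."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 100,000 input tokens and 2,000 output tokens.
cost = estimate_cost(100_000, 2_000)
print(f"${cost:.6f}")  # prints $0.011232
```

Because output tokens cost four times as much as input tokens at these rates, a long generated response can outweigh a large prompt in the total.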

Qwen3.7 Flash

in $0.03 · out $0.13

Qwen3.7 Plus

in $0.32 · out $1.28

Qwen 3.7 Max

in $2.50 · out $7.50

Qwen3.5 Plus 2026-04-20

in $0.30 · out $1.80

Data updated Jul 28, 2026