Llama 3.2 11B Vision Instruct

by Meta | Rank #283 | Score 40.0

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Performance Overview

Metric	Value
Score	40.0
Rank	#283
24h Change	-
7d Change	-
State	stable
Confidence	high

Signal Scores

Signal	Key	Normalized	Weight	Contribution	Freshness
Capabilities	capability	50.0	30%	15.0	2026-06-24T10:05:54.307Z
Pricing	pricing_tier	99.7	25%	24.9	2026-06-24T10:05:54.307Z
Context Window	context_window	73.1	15%	11.0	2026-06-24T10:05:54.307Z
Recency	recency	16.8	15%	2.5	2026-06-24T10:05:54.307Z
Output Capacity	output_capacity	70.2	15%	10.5	2026-06-24T10:05:54.307Z
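
The Contribution column appears to be each signal's normalized value multiplied by its weight. The Python sketch below checks that relationship against the figures in the table above; the names `SIGNALS` and `contribution` are illustrative and not part of the tracker, and the formula itself is an assumption inferred from the numbers. Under that assumption the contributions sum to about 63.9, which differs from the displayed score of 40.0, so the overall score evidently involves a step this page does not show.

```python
# Rows copied from the Signal Scores table:
# (signal key, normalized value, weight as a fraction, listed contribution).
SIGNALS = [
    ("capability", 50.0, 0.30, 15.0),
    ("pricing_tier", 99.7, 0.25, 24.9),
    ("context_window", 73.1, 0.15, 11.0),
    ("recency", 16.8, 0.15, 2.5),
    ("output_capacity", 70.2, 0.15, 10.5),
]


def contribution(normalized: float, weight: float) -> float:
    """Assumed relationship (not confirmed by the page): normalized * weight."""
    return normalized * weight


for name, normalized, weight, listed in SIGNALS:
    computed = contribution(normalized, weight)
    # Listed values are rounded to one decimal place, so small gaps are expected.
    print(f"{name:16s} computed={computed:7.3f} listed={listed:5.1f}")

# Sum of the per-signal contributions under the assumed formula.
total = sum(contribution(n, w) for _, n, w, _ in SIGNALS)
print(f"sum of contributions = {total:.2f}")
```

Each computed value matches its listed contribution to within rounding, which supports reading the Weight column as a simple multiplier on the normalized value.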

Top Drivers

Driver	Impact	Detail	Value
Pricing	positive	$0.34/M output tokens	$0.34
Capabilities	neutral	Supports vision, JSON mode, streaming	3/7
Context Window	positive	131K token context window	131K
Output Capacity	positive	Up to 16K output tokens per request	16K

Capabilities

Capability	Supported
Vision	Yes
Reasoning	No
JSON Mode	Yes
Streaming	Yes
Function Calling	No
Web Search	No

Pricing

Item	Value
Input / 1M tokens	$0.34
Output / 1M tokens	$0.34
Context Window	131K
Max Output	16K
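
To make the listed rates concrete, the sketch below estimates the cost of a single request at $0.34 per million tokens for both input and output, and checks the request against the listed limits. The function name `estimate_cost` is hypothetical, the exact limits (131,072 and 16,384 tokens) assume that "131K" and "16K" are rounded powers of two, and whether output tokens count against the context window depends on the provider.

```python
INPUT_PRICE_PER_M = 0.34   # USD per 1M input tokens (from the listing)
OUTPUT_PRICE_PER_M = 0.34  # USD per 1M output tokens (from the listing)
CONTEXT_WINDOW = 131_072   # assumption: "131K" means 131,072 tokens
MAX_OUTPUT = 16_384        # assumption: "16K" means 16,384 tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request, or raise if it exceeds the limits."""
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the maximum output tokens")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the context window")
    total_usd = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total_usd / 1_000_000


# Example: a 100,000-token prompt with a 2,000-token reply.
cost = estimate_cost(100_000, 2_000)
print(f"${cost:.4f}")
```

Because input and output are priced identically here, the cost depends only on the total token count; the two rates are kept as separate constants so the sketch still works if they diverge.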
