Language Model by Alibaba
Qwen3-1.7B represents Alibaba's aggressive play in the efficiency-optimized LLM segment, trained on 36 trillion tokens across 119 languages while maintaining a deployment footprint roughly 43% smaller than Qwen2.5-3B. The model achieves 50.99% on MMLU (14.63 points behind Qwen2.5-3B) but surprises with only a 3.05 point gap on EvalPlus coding benchmarks, suggesting selective capability preservation through Strong-to-Weak Distillation. At $0.11 per million input tokens, it undercuts most sub-2B models while introducing a hybrid thinking mode that can switch between chain-of-thought and direct response patterns on a per-request basis.
| Benchmark | Qwen3-1.7B | Qwen2.5-3B |
|---|---|---|
| MMLU | 50.99% | 65.62% |
| GSM8K | 43.97% | 79.08% |
| MATH | 26.1% | 42.64% |
| EvalPlus | 43.23% | 46.28% |
| MultiPL-E | 28.06% | 39.65% |
| MBPP | 46.4% | 54.6% |
| GPQA | 24.24% | 26.26% |
| BBH | 51.7% | 56.3% |
| MMLU-Pro | 29.23% | 34.61% |
| MGSM | 33.11% | 47.53% |
| Artificial Analysis Intelligence Index | 7 | 8 |
Qwen3 series officially released including 1.7B variant
Technical report published with comprehensive benchmarks
Qwen3-1.7B is available. Once it appears on our tracked API providers, it will be added to the LLM Leaderboard with full scoring, benchmarks, and pricing.
Qwen3-1.7B shows predictable degradation in reasoning-heavy tasks: GSM8K drops 35.11 points (43.97% vs 79.08%), MATH loses 16.54 points (26.1% vs 42.64%), and MMLU falls 14.63 points (50.99% vs 65.62%). However, coding benchmarks show surprising resilience with EvalPlus at 43.23% (only 3.05 points behind) and MBPP at 46.4% (8.2 point gap), indicating the Strong-to-Weak Distillation process successfully prioritized code generation capabilities during compression.
At $0.11 per million input tokens and $0.42 per million output tokens, Qwen3-1.7B charges roughly 3.8x more for output than for input. For a typical production scenario processing 100M tokens daily (70% input, 30% output), that works out to about $20.30 per day, or roughly $609 over a 30-day month, compared to roughly $1,260 for GPT-3.5-Turbo at similar volumes. The 32K context window also means fewer prompt truncations than 8K-limited alternatives, potentially reducing total token usage by 15-20% for document-heavy applications.
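A minimal sketch of that cost arithmetic, assuming a flat 30-day month and the 70/30 split above (the daily volume is an illustrative assumption, not measured workload data):

```python
# Rough monthly-cost estimate for Qwen3-1.7B at the listed prices.
# Volume and input/output split are illustrative assumptions.
INPUT_PRICE = 0.11   # USD per million input tokens
OUTPUT_PRICE = 0.42  # USD per million output tokens

def monthly_cost(tokens_per_day_millions: float,
                 input_share: float = 0.70,
                 days: int = 30) -> float:
    """Estimated monthly spend in USD for a steady daily token volume."""
    daily = (tokens_per_day_millions * input_share * INPUT_PRICE
             + tokens_per_day_millions * (1 - input_share) * OUTPUT_PRICE)
    return daily * days

# 100M tokens/day at 70% input -> $20.30/day, about $609/month
print(f"${monthly_cost(100):,.2f}")
```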
The Qwen3 architecture adds QK-Norm to its attention layers and exposes a hybrid thinking mode: reasoning-heavy queries can be routed through internal chain-of-thought generation (similar to o1-preview), while simple factual queries bypass that overhead. The mode is selected per request through the chat template's enable_thinking flag or the /think and /no_think soft switches rather than inferred from the attention mechanism itself. Performance data shows that skipping the thinking stage reduces latency by 40-60% on straightforward queries, while thinking mode maintains the 26.1% accuracy on MATH problems that require multi-step reasoning.
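A minimal sketch of toggling that mode with Hugging Face transformers, assuming the enable_thinking chat-template flag documented for the Qwen3 series (the generation settings below are illustrative defaults, not tuned values):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

def ask(prompt: str, thinking: bool) -> str:
    """Generate a reply with chain-of-thought enabled or bypassed."""
    messages = [{"role": "user", "content": prompt}]
    # enable_thinking controls whether the <think>...</think> block is produced
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True,
        enable_thinking=thinking,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.input_ids.shape[1]:],
                            skip_special_tokens=True)

# Reasoning-heavy query: keep the thinking stage; simple lookup: skip it.
print(ask("A train covers 120 km in 1.5 hours. What is its average speed?", thinking=True))
print(ask("What is the capital of France?", thinking=False))
```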
The 1.7B parameter count creates clear capability ceilings: GPQA scientific reasoning comes in at 24.24% (essentially chance level for a four-option format), multilingual math (MGSM) hits just 33.11% compared to 47.53% for the 3B variant, and the Artificial Analysis Intelligence Index rates it 7 versus 8 for that larger sibling. The model particularly struggles with tasks requiring extensive world knowledge or complex multi-hop reasoning, making it unsuitable for research-grade applications or high-stakes decision support.
The model's strength lies in code completion and simple programming tasks (46.4% MBPP, 43.23% EvalPlus) combined with basic multilingual support across 119 languages. Optimal deployments include IDE code suggestions, API response generation, multilingual customer support automation, and structured data extraction where the 32K context window provides advantages. The 51.7% BBH score indicates competence at logical puzzles and pattern matching, making it suitable for rule-based workflow automation.
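For the structured data extraction case, a minimal sketch against an OpenAI-compatible endpoint serving the model (the base URL, API key, and invoice fields are placeholders, and production use would need JSON validation and retries):

```python
import json
from openai import OpenAI

# Placeholder endpoint and credentials; point these at whichever
# OpenAI-compatible server hosts Qwen3-1.7B in your environment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

INVOICE_TEXT = """Invoice #4821 issued 2025-03-02 by Acme GmbH.
Total due: EUR 1,240.50, payment terms net 30."""

response = client.chat.completions.create(
    model="Qwen/Qwen3-1.7B",
    temperature=0.0,
    messages=[
        {"role": "system",
         "content": ("Extract the invoice as JSON with keys invoice_id, issuer, "
                     "currency, total, net_terms_days. Reply with JSON only.")},
        {"role": "user", "content": INVOICE_TEXT},
    ],
)

# A well-behaved response is a bare JSON object; guard the parse in real use.
record = json.loads(response.choices[0].message.content)
print(record["invoice_id"], record["total"])
```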