Language Model by Alibaba
Qwen3-Next-80B-A3B represents a radical departure from traditional transformer scaling, achieving 80B-parameter performance while activating only 3B parameters per token through a hybrid architecture that combines Gated DeltaNet linear attention, standard gated attention, and high-sparsity MoE routing. The model delivers 10x the inference throughput of dense transformers on 32K+ contexts, scores 90.9% on MMLU-Redux (matching much larger models), and offers both thinking and non-thinking modes while keeping input costs as low as $0.09 per million tokens. Most critically, Alibaba achieved this at just 10% of the training compute required for their previous Qwen3-32B-Base, potentially redefining the economics of frontier model development.
| Benchmark | Qwen3-Next-80B-A3B | Comparison |
|---|---|---|
| SWE-Bench Verified | 70% | - |
| MMLU-Redux | 90.9% | - |
| MultiPL-E | 87.8% | - |
| IFEval | 87.6% | 90.4% (Gemma 3 27B) |
| WritingBench | 87.3% | - |
| Creative Writing v3 | 85.3% | - |
| Artificial Analysis Intelligence Index | 27 | 15 (median of similar models) |
| Output Speed | 168.7 tokens/s | 84.2 tokens/s |
- Qwen3 series announced
- Qwen3-Next-80B-A3B released
- FP8 quantized version released
Qwen3-Next-80B-A3B is available. Once it appears on our tracked API providers, it will be added to the LLM Leaderboard with full scoring, benchmarks, and pricing.
On SWE-Bench Verified, Qwen3-Next-80B-A3B achieves 70% accuracy, positioning it among the top coding models despite using only 3B active parameters. While the model trails larger frontier models such as DeepSeek-V3, it significantly outperforms most models in its parameter class on MultiPL-E with 87.8% accuracy. The combination of strong coding performance with $0.09-0.50 per million input tokens makes it particularly compelling for code generation pipelines where Claude's $3-15 pricing creates budget constraints.
The Hybrid Attention mechanism combines Gated DeltaNet (a linear attention variant) with traditional Gated Attention in an interleaved layer pattern, enabling the model to process 256K token contexts with 10x the throughput of dense transformers at 32K+ sequence lengths. This architecture activates only 3B of the total 80B parameters per token through high-sparsity MoE routing, resulting in 168.7 tokens/second output speed versus 84.2 for similarly sized models. For production deployments handling long documents or multi-turn conversations, this translates to dramatically lower GPU memory requirements and faster response times without sacrificing quality.
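The long-context throughput advantage comes from linear attention's constant-size recurrent state: where softmax attention keeps a KV cache that grows with sequence length, a DeltaNet-style layer updates a fixed d_k × d_v matrix per token. A minimal NumPy sketch of plain (ungated) linear attention — not the actual Gated DeltaNet update rule, which adds gating and a delta correction — showing that the recurrent and parallel forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 64, 8, 8
q = rng.standard_normal((seq_len, d_k))
k = rng.standard_normal((seq_len, d_k))
v = rng.standard_normal((seq_len, d_v))

# Recurrent form: S_t = S_{t-1} + k_t v_t^T, o_t = q_t S_t.
# The state S never grows, regardless of sequence length.
S = np.zeros((d_k, d_v))
out_recurrent = np.empty((seq_len, d_v))
for t in range(seq_len):
    S += np.outer(k[t], v[t])        # constant-size state update
    out_recurrent[t] = q[t] @ S

# Equivalent causal "parallel" form: o_t = q_t @ sum_{i<=t} k_i v_i^T.
scores = np.tril(q @ k.T)            # causal mask, no softmax
out_parallel = scores @ v

assert np.allclose(out_recurrent, out_parallel)
print("recurrent and parallel linear attention match; state shape:", S.shape)
```

At inference time the recurrent form means each token costs O(d_k × d_v) memory and compute regardless of context length, which is why interleaving such layers with a few full-attention layers pays off most at 32K+ sequence lengths.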
Qwen3-Next-80B-A3B implements dual inference modes: a standard mode for rapid responses and a thinking mode that performs internal chain-of-thought reasoning before generating outputs. Based on the Artificial Analysis Intelligence Index score of 27 (versus 15 median for similar models), the thinking mode appears to provide substantial reasoning improvements, though specific latency penalties aren't disclosed. The model seamlessly switches between modes based on query complexity, making it suitable for mixed workloads without manual prompt engineering.
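With an OpenAI-compatible endpoint, mode selection is typically a per-request flag rather than prompt engineering. A hedged sketch, assuming the provider exposes a Qwen-style `enable_thinking` switch via non-standard request fields — the flag name, model id, and field placement are assumptions to check against your provider's documentation:

```python
import json

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat-completions payload. `enable_thinking` is an assumed
    Qwen-style provider extension, not a standard OpenAI field."""
    return {
        "model": "qwen3-next-80b-a3b",  # assumed model id; varies by provider
        "messages": [{"role": "user", "content": prompt}],
        # Thinking mode trades latency for internal chain-of-thought.
        "extra_body": {"enable_thinking": thinking},
    }

fast = build_request("Summarize this paragraph.", thinking=False)
deep = build_request("Prove that sqrt(2) is irrational.", thinking=True)
print(json.dumps(deep, indent=2))
```

In practice you would route short factual queries through the fast path and reserve thinking mode for reasoning-heavy requests, since the hidden chain-of-thought tokens add both latency and output cost.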
Pricing ranges from $0.09-0.50 per million input tokens and $1.10-6.00 per million output tokens depending on the provider and thinking mode usage. For a typical RAG application processing 10K input tokens and generating 500 output tokens per query, costs range from $0.0014-0.0080 per request, compared to GPT-4o's $0.0275 or Claude 3.5 Sonnet's $0.0325. That works out to a roughly 3-23x cost advantage depending on the provider tier, making Qwen3-Next particularly attractive for high-volume applications, though the upper pricing tier approaches GPT-4o-mini territory.
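The per-request figures follow directly from the token counts and per-million-token prices quoted above. A quick check of the arithmetic:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for one request at per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Typical RAG query: 10K input tokens, 500 output tokens.
low = request_cost(10_000, 500, 0.09, 1.10)   # cheapest provider tier
high = request_cost(10_000, 500, 0.50, 6.00)  # most expensive tier
print(f"Qwen3-Next per request: ${low:.4f} to ${high:.4f}")
print(f"vs GPT-4o $0.0275, Claude 3.5 Sonnet $0.0325")
```

Note the cheap tier's cost is dominated by input tokens while the expensive tier's output price ($6.00/M) contributes more than a third of the total, so thinking mode (which inflates output tokens) erodes the advantage fastest on the upper tiers.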
The model shows a 2.8 percentage point deficit on IFEval (87.6% vs Gemma 3 27B's 90.4%), suggesting slightly weaker instruction-following capabilities for complex multi-step tasks. Creative Writing v3 scores of 85.3% and WritingBench at 87.3% indicate solid but not frontier-level creative text generation. The Artificial Analysis Intelligence Index of 27 places it well below GPT-4o or Claude 3.5 Sonnet, confirming that despite efficiency gains, there remains a quality gap to top-tier models on reasoning-intensive tasks.