Language Model by Alibaba
Qwen3.6-35B-A3B uses a sparse MoE architecture with only 3B active parameters out of 35B total, achieving 49.5% on SWE-bench Pro (13.8 points above Gemma4-31B) while costing just $0.16/$0.90 per million tokens. The model implements Gated Delta Networks combined with early fusion multimodal training, enabling native vision-language capabilities with 92% accuracy on RefCOCO spatial reasoning tasks. With both thinking and non-thinking modes plus a 262K context window, it targets the gap between open-source models and GPT-5.2 (which beats it by only 4 points on AIME 2026) at 1/50th the typical cost of frontier models.
| Benchmark | Qwen3.6-35B-A3B-UD-Q4_K_S.gguf | Comparison |
|---|---|---|
| SWE-bench Verified | 73.4% | 75% (Qwen3.5-27B) |
| SWE-bench Pro | 49.5% | 35.7% (Gemma4-31B) |
| Terminal-Bench 2.0 | 51.5% | 42.9% (Gemma4-31B) |
| AIME 2026 | 92.7% | 96.7% (GPT-5.2) |
| GPQA Diamond | 86% | 93.3% |
| MMLU Pro | 85.2% | - |
| HMMT Feb 2026 | 83.6% | 94.8% |
| Humanity's Last Exam (HLE) | 21.4% | - |
| RefCOCO (Spatial Intelligence) | 92% | - |
| ODInW13 (Object Detection) | 50.8% | - |
| Artificial Analysis Intelligence Index | 37% | - |
Release timeline:
- Qwen3.5 series announcement
- Qwen3.5-35B-A3B medium series release
- Qwen3.6-35B-A3B release with enhanced agentic coding
Qwen3.6-35B-A3B-UD-Q4_K_S.gguf is available. Once it appears on our tracked API providers, it will be added to the LLM Leaderboard with full scoring, benchmarks, and pricing.
The 35B total/3B active parameter design achieves 85.2% on MMLU Pro while activating only about 3B parameters per token, so per-token compute is comparable to a 3B dense model even though all 35B weights remain resident in memory. This sparse activation pattern enables deployment on a single A100 GPU with INT4 quantization, delivering 73.4% on SWE-bench Verified (within 1.6 points of the larger Qwen3.5-27B). Expert routing is handled per token by the MoE gating layer, which selects experts based on the input, while the Gated Delta Network layers provide efficient sequence mixing over long contexts; together they help explain why it excels at coding tasks (51.5% on Terminal-Bench 2.0) despite its compact active size.
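For intuition, here is a minimal, illustrative sketch of top-k expert gating as used in sparse MoE layers generally; it is not Qwen's actual router code, and the gate weights, expert shapes, and function names are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Illustrative top-k expert routing for one token.

    x       : (d_model,) token hidden state
    gate_w  : (d_model, n_experts) router weights
    experts : list of callables, each mapping (d_model,) -> (d_model,)
    Only the top_k experts run, so compute scales with active (not total) parameters.
    """
    logits = x @ gate_w                    # router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 8 experts, each token routed to 2 of them
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda h, W=rng.normal(size=(d, d)) * 0.1: h @ W for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
out = moe_forward(rng.normal(size=d), gate_w, experts, top_k=2)
print(out.shape)  # (16,)
```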
At $0.16 input/$0.90 output per million tokens, Qwen3.6-35B-A3B costs roughly 20-40x less than GPT-4o's typical $5-15 per million token pricing. For a workload processing 100M tokens daily (70/30 input/output split), that's $38.20/day versus $800-1500/day. The tradeoff: you lose 4-11 points on mathematical reasoning benchmarks (92.7% vs 96.7% on AIME 2026) but gain self-hosting capability and 262K context windows without rate limits.
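As a rough check on those figures, the arithmetic below recomputes the daily cost under the stated 100M-token, 70/30 workload; the GPT-4o prices use the approximate $5/$15 per-million-token range cited above rather than exact published rates.

```python
# Daily-cost comparison for 100M tokens/day at a 70/30 input/output split.
# Prices are dollars per million tokens.
def daily_cost(tokens_per_day, input_share, price_in, price_out):
    input_tok = tokens_per_day * input_share
    output_tok = tokens_per_day * (1 - input_share)
    return (input_tok * price_in + output_tok * price_out) / 1e6

qwen = daily_cost(100e6, 0.7, 0.16, 0.90)    # -> 38.20
gpt_low = daily_cost(100e6, 0.7, 5.0, 15.0)  # -> 800.00 (low end of the quoted range)
print(f"Qwen3.6-35B-A3B: ${qwen:,.2f}/day vs GPT-4o (low end): ${gpt_low:,.2f}/day")
```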
The model scores 49.5% on SWE-bench Pro (versus Gemma4-31B's 35.7%) and 51.5% on Terminal-Bench 2.0 (versus 42.9%), indicating strong performance on repository-level code understanding and terminal-based development workflows. Because thinking traces can be carried across turns, it supports iterative debugging over long agentic sessions, while the 262K context window handles entire codebases. However, it trails Qwen3.5-27B by 1.6 points on SWE-bench Verified (73.4% vs 75%), suggesting the larger model may be better for complex refactoring tasks.
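If the model follows the same chat-template convention as the Qwen3 series (an assumption; the model ID below is illustrative), switching between thinking and non-thinking modes is a single flag when building the prompt:

```python
from transformers import AutoTokenizer

# Assumes Qwen3.6 keeps the Qwen3 convention of an `enable_thinking` switch
# in the chat template; the repository name is a placeholder.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-35B-A3B")

messages = [{"role": "user", "content": "Why does this regex fail on multiline input?"}]

# Thinking mode: the model emits an internal reasoning trace before its answer.
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: direct answer, lower latency for simple turns.
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```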
With 92% accuracy on RefCOCO spatial intelligence tasks and 50.8% on ODInW13 object detection, Qwen3.6-35B-A3B's early fusion training delivers competitive vision performance without requiring separate encoders. The unified vision-language foundation means you can process images and text in the same context window without mode switching. Because vision is native rather than bolted on post-training, there is no separate vision-encoder stack to load or route through at inference time.
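As a sketch of what this looks like in practice, the call below sends an image and a spatial question in one request, assuming the model is served behind an OpenAI-compatible endpoint (for example via vLLM); the base URL, API key, model name, and image URL are placeholders.

```python
from openai import OpenAI

# Placeholder endpoint: any OpenAI-compatible server hosting the model works here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen3.6-35B-A3B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which shelf is the red mug on, counting from the top?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/kitchen.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```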
The Q4_K_S quantization reduces the 35B parameter model to approximately 20GB, fitting on consumer GPUs with 24GB VRAM. Benchmark degradation is minimal: expect 2-3% drops on reasoning tasks based on the architecture's robustness to quantization. The sparse MoE design means only 3B parameters activate per token, maintaining inference speeds of 40-50 tokens/second on RTX 4090 hardware even with the full 262K context loaded.
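A minimal way to try the GGUF locally is through llama-cpp-python, sketched below; the file path is a placeholder, and the context size is set well under the 262K maximum as a conservative starting point for 24GB cards.

```python
from llama_cpp import Llama

# Load the Q4_K_S GGUF (~20GB) and offload all layers to the GPU.
llm = Llama(
    model_path="./Qwen3.6-35B-A3B-UD-Q4_K_S.gguf",  # placeholder path
    n_gpu_layers=-1,   # -1 = offload every layer
    n_ctx=32768,       # context window requested for this session
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize sparse MoE routing in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```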