Phi-4-mini: Language Model by Microsoft
Phi-4-mini achieves 88.6% on GSM8K (4 percentage points above Phi-3.5-Mini) with just 3.8B parameters, making it among the highest-performing models under 5B parameters for mathematical reasoning. The model uses grouped-query attention and a 200K-token vocabulary to support multilingual capability across its 128K context window, with training on synthetic data concluding in December 2024. Released March 3, 2025 on Azure AI Foundry and HuggingFace, it outperforms Phi-3.5-Mini on 8 of 12 benchmarks, including a 7.3 percentage point gain on BBH, though it underperforms on MBPP by 4.7 points and HellaSwag by 3.1 points.
| Benchmark | Phi-4-mini | Baseline (Phi-3.5-Mini; Llama-3.2-3B for BoolQ, OpenBookQA, PIQA) |
|---|---|---|
| GSM8K | 88.6% | 84.6% |
| ARC-Challenge | 83.7% | 84.6% |
| HumanEval | 74.4% | 70.1% |
| HumanEval+ | 68.3% | 62.8% |
| MBPP | 65.3% | 70% |
| MBPP+ | 63.8% | 63.8% |
| MMLU | 67.3% | 65.5% |
| MMLU-Pro | 52.8% | 47.4% |
| BBH (BigBench-Hard) | 70.4% | 63.1% |
| GPQA | 30.4% | 25.2% |
| HellaSwag | 69.1% | 72.2% |
| LiveCodeBench | 19.9% | 15.7% |
| BoolQ | 81.2% | 71.4% |
| OpenBookQA | 79.2% | 72.6% |
| PIQA | 77.6% | 68.2% |
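The percentage-point deltas cited throughout this page can be recomputed directly from the table above. A minimal sketch (scores copied from the table; this is just arithmetic):

```python
# Benchmark scores from the table: (Phi-4-mini, baseline), in percent.
scores = {
    "GSM8K": (88.6, 84.6),
    "BBH": (70.4, 63.1),
    "MBPP": (65.3, 70.0),
    "HellaSwag": (69.1, 72.2),
    "BoolQ": (81.2, 71.4),
}

# Percentage-point difference (positive = Phi-4-mini ahead of the baseline).
deltas = {name: round(ours - base, 1) for name, (ours, base) in scores.items()}
print(deltas)
# GSM8K +4.0, BBH +7.3, MBPP -4.7, HellaSwag -3.1, BoolQ +9.8
```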
- November 2024: Training began
- December 2024: Training completed
- March 3, 2025: Official release announced on Azure AI Foundry and HuggingFace
Phi-4-mini is available. Once it appears on our tracked API providers, it will be added to the LLM Leaderboard with full scoring, benchmarks, and pricing.
Phi-4-mini shows mixed coding performance: it improves on HumanEval (74.4% vs 70.1%) and HumanEval+ (68.3% vs 62.8%), but drops on MBPP (65.3% vs 70%) while matching MBPP+ at 63.8%. Its largest relative coding gain is on LiveCodeBench, at 19.9% versus Phi-3.5-Mini's 15.7%, suggesting better performance on contemporary coding challenges than on traditional benchmarks.
Phi-4-mini implements grouped-query attention within its dense decoder-only transformer architecture, reducing memory overhead compared to full multi-head attention. The model uses a shared embedding architecture across its 200K-token vocabulary, improving parameter efficiency. This design allows processing a 128K-token context window, 32x the typical 4K windows of similar-sized models, while keeping the parameter count at 3.8B.
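The memory benefit of grouped-query attention comes from caching keys and values for only a small number of KV heads instead of one per query head. The sketch below illustrates the KV-cache arithmetic at a 128K context; the layer, head, and group counts are illustrative assumptions, not Phi-4-mini's published internals:

```python
# Hypothetical sketch: KV-cache size under full multi-head attention vs
# grouped-query attention (GQA). All shapes below are assumed for
# illustration (32 layers, 24 query heads of dim 128, fp16 cache).

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Two cached tensors (K and V) per layer,
    # each of shape [seq_len, n_kv_heads * head_dim].
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

# Full MHA: one KV head per query head (24).
full_mha = kv_cache_bytes(seq_len=128_000, n_layers=32, n_kv_heads=24, head_dim=128)
# GQA: query heads share 8 KV groups.
gqa = kv_cache_bytes(seq_len=128_000, n_layers=32, n_kv_heads=8, head_dim=128)

print(f"full MHA KV cache: {full_mha / 2**30:.1f} GiB")   # 46.9 GiB
print(f"GQA KV cache:      {gqa / 2**30:.1f} GiB")        # 15.6 GiB, 3x smaller
```

With 24 query heads sharing 8 KV groups, the cache shrinks by 3x; the attention computation itself is unchanged apart from each KV head being broadcast to its group of query heads.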
Phi-4-mini demonstrates substantial reasoning gains over Llama-3.2-3B: BoolQ improves by 9.8 percentage points (81.2% vs 71.4%), OpenBookQA by 6.6 points (79.2% vs 72.6%), and PIQA by 9.4 points (77.6% vs 68.2%). These improvements suggest Phi-4-mini's synthetic training data effectively targets common-sense reasoning tasks despite having only 0.6B more parameters.
Phi-4-mini struggles with graduate-level reasoning (GPQA: 30.4%) and shows regression on MBPP (-4.7 points) and HellaSwag (-3.1 points) versus Phi-3.5-Mini. The ARC-Challenge score also drops slightly to 83.7% from 84.6%. These patterns suggest the model trades some general language understanding for improved mathematical and coding capabilities.
Training began in November 2024 and completed in December 2024, with a knowledge cutoff of June 2024. The model was officially released on March 3, 2025, making it one of the newest entries in the sub-5B parameter class. The five-month gap between data cutoff and training start follows Microsoft's pattern of extensive data curation for the Phi series.