Google DeepMind released Gemma 4 31B on April 2, 2026: the fourth generation of their open-weight model family and the most capable Gemma to date. At 30.7 billion parameters with Apache 2.0 licensing, it sits at an interesting intersection: powerful enough to compete with mid-tier proprietary models, small enough to self-host on a single GPU, and open enough to fine-tune for any use case. It currently ranks #14 out of 315 coding models with a composite score of 87/100.
Architecture: Dense 30.7B with Configurable Thinking
Gemma 4 31B is a dense transformer: all 30.7 billion parameters are active on every forward pass. This is a deliberate choice that contrasts with the MoE trend (see our MoE report). Dense models trade raw parameter efficiency for predictability: consistent latency, simpler deployment, and more uniform behavior across tasks.
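The dense-vs-MoE trade-off above can be made concrete with the standard rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token. A minimal sketch (the MoE figures are hypothetical, purely for comparison):

```python
# Rough per-token inference compute using the ~2 * N_active FLOPs rule of thumb.
# The MoE configuration below is hypothetical, for illustration only.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 FLOPs per active weight)."""
    return 2 * active_params

dense_31b = flops_per_token(30.7e9)       # dense: every parameter is active, every token
moe_sketch = flops_per_token(20e9)        # e.g. a 120B-total MoE activating 20B per token

print(f"Dense 30.7B: {dense_31b:.2e} FLOPs/token")
print(f"Hypothetical MoE (20B active): {moe_sketch:.2e} FLOPs/token")
```

The MoE spends fewer FLOPs per token despite holding more total weights, which is the "raw parameter efficiency" the dense design gives up in exchange for uniform latency and simpler serving.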
The standout architectural feature is a configurable thinking/reasoning mode: chain-of-thought-style deliberation built natively into the model, which lets Gemma 4 allocate more compute to harder problems. When reasoning mode is enabled, the model produces intermediate thinking steps before its final answer, improving accuracy on complex coding, math, and logic tasks at the cost of higher latency and token usage.
This is similar to what models like o3 and DeepSeek R1 do, but Gemma 4 makes it configurable rather than always-on: you choose when to pay the reasoning tax.
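In practice, "paying the reasoning tax" looks like a per-request toggle. A minimal sketch of building such a request payload; the model slug and the `reasoning` field name are assumptions for illustration, so check your provider's documentation for the actual parameter:

```python
# Sketch: toggle reasoning mode per request in an OpenRouter-style chat payload.
# "google/gemma-4-31b" and the "reasoning" field are hypothetical names.
import json

def build_request(prompt: str, think: bool) -> dict:
    payload = {
        "model": "google/gemma-4-31b",  # hypothetical model slug
        "messages": [{"role": "user", "content": prompt}],
    }
    if think:
        # Only pay for thinking tokens + latency on problems that need them.
        payload["reasoning"] = {"enabled": True}
    return payload

fast = build_request("Rename this variable.", think=False)
hard = build_request("Find the race condition in this scheduler.", think=True)
print(json.dumps(hard, indent=2))
```

A simple routing policy, then, is to keep reasoning off for boilerplate edits and flip it on for debugging, math, and multi-step refactors.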
Why 31B? The Self-Hosting Sweet Spot
The 30.7B parameter count is not arbitrary. Here is the hardware reality:
This makes Gemma 4 31B the largest open-weight model that can realistically run on consumer hardware. Larger models (70B+) require multi-GPU setups or expensive cloud instances. Smaller models (7-13B) sacrifice too much quality. The 31B size maximizes the quality-to-deployability ratio. For self-hosting guidance, see our self-hosted AI models guide and best local LLM for coding rankings.
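The single-GPU claim follows from back-of-envelope weight-memory math. A quick sketch (weights only; real deployments also need headroom for KV cache and activations):

```python
# Back-of-envelope weight memory for a 30.7B dense model at common precisions.
# KV-cache and activation memory come on top of these figures.
PARAMS = 30.7e9

def weight_gb(bits_per_param: float) -> float:
    """Weight storage in decimal gigabytes."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("FP16/BF16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name:>9}: {weight_gb(bits):5.1f} GB")
```

FP16 lands around 61 GB (multi-GPU territory), while a 4-bit quantization lands around 15 GB, which is what puts the model within reach of a single 24 GB consumer card with room left for KV cache.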
Signal Breakdown: What Drives the Score
Here is how Gemma 4 31B performs across each scoring dimension:
Competitive Positioning
Here is where Gemma 4 31B sits in the current coding model rankings:
19 models score within 5 points of Gemma 4 31B, including GPT-5.4 Pro (91), GPT-5.4 (91), GPT-5.1 (90), and 16 others. Use our compare tool to see detailed head-to-head breakdowns.
Pricing Analysis: $0.14/$0.40 Per Million Tokens
Through OpenRouter, Gemma 4 31B costs $0.14/M input tokens and $0.40/M output tokens. To put this in perspective:
These costs are among the lowest for any model scoring above 82/100. And with self-hosting, the marginal cost drops to zero (only infrastructure costs remain). See our pricing trends report for broader market context, or use the pricing calculator to estimate costs for your specific workload.
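For a rough estimate before reaching for the pricing calculator, the arithmetic is straightforward. A minimal sketch using the quoted OpenRouter rates and a hypothetical monthly workload:

```python
# Estimate API cost at the quoted OpenRouter rates (USD per 1M tokens).
IN_RATE, OUT_RATE = 0.14, 0.40

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * IN_RATE + output_tokens / 1e6 * OUT_RATE

# Hypothetical workload: 50M input + 10M output tokens per month.
monthly = cost_usd(50_000_000, 10_000_000)
print(f"${monthly:.2f}/month")  # 50 * $0.14 + 10 * $0.40 = $11.00
```

Even a fairly heavy coding-assistant workload stays in the low tens of dollars per month at these rates.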
Open-Weight Landscape: Where Gemma 4 Fits
The open-weight AI model ecosystem is increasingly competitive. Here are the top open-source/open-weight models in our coding rankings:
Gemma 4 31B's main advantage over competitors like DeepSeek and Qwen is licensing clarity. Apache 2.0 is the most permissive widely used open-source license: no usage restrictions, no commercial limitations, no requirement to share modifications. Compare this against models with more restrictive licenses (Llama's community license) or models whose licensing terms have been debated (some Chinese-origin models). For the full ranking, visit our open-source AI models page.
Size Class Analysis: ~30B Parameter Models
Models in the 27-35B parameter range represent a popular size class that balances quality and deployability. Here is how Gemma 4 31B compares against similarly-sized competitors:
131K Max Output: The Hidden Advantage
A detail that often gets overlooked: Gemma 4 31B supports up to 131,072 output tokens per response. Most competing models cap output at 4K-16K tokens. This 8-16x difference matters enormously for tasks like:
For models with long output support, see our long-output models ranking.
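The practical impact of the output cap is easy to quantify: it determines how many continuation round-trips a long artifact requires. A small sketch with an assumed 120K-token artifact (e.g. one large generated file):

```python
# How many API round-trips does a long artifact need under a per-response cap?
import math

MAX_OUT = 131_072  # Gemma 4 31B's per-response output limit

def calls_needed(artifact_tokens: int, cap: int) -> int:
    """Continuation calls required to emit an artifact under a per-response cap."""
    return math.ceil(artifact_tokens / cap)

artifact = 120_000  # hypothetical: one large generated file
for cap in (4_096, 16_384, MAX_OUT):
    print(f"cap {cap:>7}: {calls_needed(artifact, cap)} call(s)")
```

Under a 4K cap that artifact takes 30 stitched-together calls (each risking a seam error at the boundary); under 131K it fits in one response.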
140+ Languages: Multilingual by Default
Gemma 4 31B supports over 140 languages natively - by far the broadest multilingual coverage of any open-weight model in its size class. For global teams, this means code comments, documentation, and natural-language interfaces can be generated in the target language without separate translation steps. For specific multilingual model rankings, visit our best AI for translation page.
Google's Model Lineup
Gemma 4 31B exists alongside Google's proprietary Gemini models. Here is how Google's full lineup ranks:
The Gemini models (proprietary) consistently outrank Gemma (open-weight), but the gap has narrowed with each Gemma generation. Gemma 4 is Google's way of participating in the open-source ecosystem while maintaining its premium positioning with Gemini. See the full provider breakdown at Google provider page.
Fine-Tuning Potential
The Apache 2.0 license means Gemma 4 31B can be fine-tuned without restrictions. Common fine-tuning approaches for a 31B model:
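To see why parameter-efficient methods like LoRA dominate at this scale, it helps to count what they actually train. A minimal sketch; the hidden size, layer count, and per-layer target matrices below are assumptions for illustration, not Gemma 4's actual architecture:

```python
# Rough LoRA adapter size for a ~31B model. HIDDEN, LAYERS, and
# TARGETS_PER_LAYER are assumed values, not Gemma 4's real dimensions.
HIDDEN, LAYERS, RANK = 6144, 48, 16
TARGETS_PER_LAYER = 4  # e.g. the q/k/v/o attention projections

def lora_params(hidden: int, layers: int, rank: int, targets: int) -> int:
    # Each adapted (hidden x hidden) matrix gains two low-rank factors:
    # a (hidden x rank) down-projection and a (rank x hidden) up-projection.
    return layers * targets * 2 * hidden * rank

adapter = lora_params(HIDDEN, LAYERS, RANK, TARGETS_PER_LAYER)
print(f"~{adapter / 1e6:.0f}M trainable params "
      f"({adapter / 30.7e9:.3%} of the full 30.7B)")
```

Under these assumptions the adapter is on the order of tens of millions of parameters, a small fraction of a percent of the full model, which is what makes single-GPU fine-tuning of a 31B model feasible.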