Skip to content
Review
April 3, 20267 min read

Gemma 4 Review: Google DeepMind's Open-Weight Challenger

Google DeepMind launches Gemma 4 31B with Apache 2.0 licensing, multimodal capabilities, and a 256K context window. We test its performance against closed and open-source competitors.

Google DeepMind released Gemma 4 31B on April 2, 2026 - the fourth generation of their open-weight model family and the most capable Gemma to date. At 30.7 billion parameters with Apache 2.0 licensing, it sits at an interesting intersection: powerful enough to compete with mid-tier proprietary models, small enough to self-host on a single GPU, and open enough to fine-tune for any use case. It currently ranks #45 out of 317 coding models with a composite score of 80/100.

#45
Rank
of 317 models
80
Score
/100
30.7B
Parameters
dense
256K
Context
tokens
131K
Max Output
tokens
$0.14
Input Price
/M tokens
$0.40
Output Price
/M tokens
140+
Languages
supported

Architecture: Dense 30.7B with Configurable Thinking

Gemma 4 31B is a dense transformer - all 30.7 billion parameters are active during every forward pass. This is a deliberate choice that contrasts with the MoE trend (see our MoE report). Dense models trade raw parameter efficiency for predictability: consistent latency, simpler deployment, and more uniform behavior across tasks.

The standout architectural feature is configurable thinking/reasoning mode. Like chain-of-thought prompting but built into the model natively, this allows Gemma 4 to allocate more compute to harder problems. When reasoning mode is enabled, the model produces intermediate thinking steps before its final answer, improving accuracy on complex coding, math, and logic tasks at the cost of higher latency and token usage.

This is similar to what models like o3 and DeepSeek R1 do, but Gemma 4 makes it configurable rather than always-on - you choose when to pay the reasoning tax.

Why 31B? The Self-Hosting Sweet Spot

The 30.7B parameter count is not arbitrary. Here is the hardware reality:

FP16 (full precision)
~62 GB VRAM
Single A100 80GB or 2x RTX 4090 - Maximum quality, highest memory
INT8 quantized
~31 GB VRAM
Single A100 40GB or 1x RTX 4090 - Minimal quality loss, practical for most
INT4/GPTQ quantized
~16 GB VRAM
RTX 4080 or M2 Ultra Mac - Some quality degradation, consumer hardware viable
GGUF Q4_K_M (llama.cpp)
~18 GB VRAM
RTX 3090 / M1 Pro Mac - CPU offloading possible, slowest but most accessible

This makes Gemma 4 31B the largest open-weight model that can realistically run on consumer hardware. Larger models (70B+) require multi-GPU setups or expensive cloud instances. Smaller models (7-13B) sacrifice too much quality. The 31B size maximizes the quality-to-deployability ratio. For self-hosting guidance, see our self-hosted AI models guide and best local LLM for coding rankings.

Signal Breakdown: What Drives the Score

Here is how Gemma 4 31B performs across each scoring dimension:

Benchmarks
86.1/100 (weight: 30%)
Contribution: 25.8 pts - composite across MMLU, HumanEval, GPQA, SWE-bench, and more
Capabilities
83.3/100 (weight: 20%)
Contribution: 16.7 pts - 5 of 7 capabilities (vision, function calling, streaming, JSON mode, reasoning)
Recency
100.0/100 (weight: 15%)
Contribution: 15.0 pts - released 93 days ago (newer models score higher)
Pricing
99.7/100 (weight: 15%)
Contribution: 14.9 pts - $0.35/M output tokens
Output Capacity
90.3/100 (weight: 10%)
Contribution: 9.0 pts - 262.1K max output tokens (2x higher than most competitors)
Context Window
77.4/100 (weight: 10%)
Contribution: 7.7 pts - 262.1K tokens

Competitive Positioning

Here is where Gemma 4 31B sits in the current coding model rankings:

34 models score within 5 points of Gemma 4 31B, including Claude Sonnet 4.6 (85), Claude Opus 4.5 (85), Gemini 2.5 Pro (83), and 31 others. Use our compare tool to see detailed head-to-head breakdowns.

Pricing Analysis: $0.14/$0.40 Per Million Tokens

Through OpenRouter, Gemma 4 31B costs $0.14/M input tokens and $0.40/M output tokens. To put this in perspective:

1,000 code reviews (~5K tokens each)
$0.70 input + $2.00 output = $2.70 total
10,000 function generations (~500 tokens each)
$0.07 input + $2.00 output = $2.07 total
Full day of pair-programming (100 turns, ~2K tokens each)
$0.03 input + $0.08 output = $0.11 total
Processing 100 documents (~10K tokens each)
$0.14 input + $0.40 output = $0.54 total

These costs are among the lowest for any model scoring above 75/100. And with self-hosting, the marginal cost drops to zero (only infrastructure costs remain). See our pricing trends report for broader market context, or use the pricing calculator to estimate costs for your specific workload.

Open-Weight Landscape: Where Gemma 4 Fits

The open-weight AI model ecosystem is increasingly competitive. Here are the top open-source/open-weight models in our coding rankings:

Gemma 4 31B's main advantage over competitors like DeepSeek and Qwen is licensing clarity. Apache 2.0 is the most permissive widely-used open-source license - no usage restrictions, no commercial limitations, no requirement to share modifications. Compare this against models with more restrictive licenses (Llama's community license) or models where the licensing terms have been debated (some Chinese-origin models). For the full ranking, visit our open-source AI models page.

Size Class Analysis: ~30B Parameter Models

Models in the 27-35B parameter range represent a popular size class that balances quality and deployability. Here is how Gemma 4 31B compares against similarly-sized competitors:

131K Max Output: The Hidden Advantage

A detail that often gets overlooked: Gemma 4 31B supports up to 131,072 output tokens per response. Most competing models cap output at 4K-16K tokens. This 8-16x difference matters enormously for tasks like:

Generating entire file implementations (not just snippets)
Producing comprehensive documentation or technical specifications
Creating long-form analysis reports from data inputs
Multi-step code generation where the model builds complete features
Translation of long documents without truncation

For models with long output support, see our long-output models ranking.

140+ Languages: Multilingual by Default

Gemma 4 31B supports over 140 languages natively - by far the broadest multilingual coverage of any open-weight model in its size class. For global teams, this means code comments, documentation, and natural-language interfaces can be generated in the target language without separate translation steps. For specific multilingual model rankings, visit our best AI for translation page.

Google's Model Lineup

Gemma 4 31B exists alongside Google's proprietary Gemini models. Here is how Google's full lineup ranks:

The Gemini models (proprietary) consistently outrank Gemma (open-weight), but the gap has narrowed with each Gemma generation. Gemma 4 is Google's way of participating in the open-source ecosystem while maintaining its premium positioning with Gemini. See the full provider breakdown at Google provider page.

Fine-Tuning Potential

The Apache 2.0 license means Gemma 4 31B can be fine-tuned without restrictions. Common fine-tuning approaches for a 31B model:

LoRA/QLoRA
The most practical approach. Train adapter layers on a single GPU (16-24GB VRAM) while keeping base weights frozen. Achieves 90%+ of full fine-tuning quality at a fraction of the compute cost. Ideal for domain adaptation (legal, medical, proprietary codebase conventions).
Full Fine-Tuning
Requires 4-8x A100s or equivalent. Only justified for major distribution shifts (e.g., entirely new programming languages, specialized output formats). Most teams should start with LoRA.
Distillation
Use Gemma 4 31B as a teacher to train smaller models (7B, 2B) for edge deployment. The open license explicitly permits this, unlike some competitors.

Limitations and Tradeoffs

!
256K context vs 1M+ competitors: Sufficient for most tasks, but models like Gemini 3 Pro and Qwen 3.6 offer larger windows for extreme use cases.
!
Dense architecture: All 30.7B parameters are active per token, making it more expensive per-token than equivalent MoE models. The tradeoff is simpler deployment and more consistent behavior.
!
No web search capability: Unlike some competing models, Gemma 4 does not natively support web search or real-time information retrieval.
!
Single size at launch: Only 31B is currently available. Teams needing smaller (for edge) or larger (for maximum quality) variants will need to wait.
!
No image output: Gemma 4 accepts images and video as input but only produces text output. It cannot generate or edit images.

Bottom Line

Gemma 4 31B ranks #45 with 80/100 - competitive for an open-weight 31B model.
Apache 2.0 licensing is the clearest competitive advantage - no other model in this performance tier offers the same freedom.
At $0.14/$0.40 per M tokens via API, or zero marginal cost self-hosted, it is among the most cost-effective options available.
131K max output tokens is a sleeper advantage - most competitors cap at 4-16K, limiting their usefulness for long-form generation.
The 31B dense architecture is the optimal size for single-GPU self-hosting with quantization, making it accessible to individual developers.
140+ languages and configurable reasoning mode add practical versatility that goes beyond raw benchmark scores.
Frequently Asked Questions

Gemma 4 is Google DeepMind's fourth-generation open-weight model family. The 31B variant launched on April 2, 2026 with Apache 2.0 licensing, 256K context window, and native multimodal capabilities.

Gemma 4 is released under the Apache 2.0 license as open weights. This means you can freely download, fine-tune, distill, and deploy it commercially without licensing restrictions or usage fees.

Both are open-weight models targeting developers. Gemma 4 31B is a dense transformer focused on efficiency and deployability, while Llama 4 Maverick uses a MoE architecture for higher raw parameter counts. Check our leaderboard for live score comparisons.

The Gemma 4 31B is the first variant released. Google DeepMind has indicated additional sizes will follow, continuing the multi-size strategy from previous Gemma generations.

Gemma 4 Review: Google DeepMind's Open-Weight Challenger | LM Market Cap