135 lightweight AI models priced under $1 per 1M tokens. Small language models (SLMs) are optimized for speed, low cost, and edge deployment - ideal for mobile apps, IoT, chatbots, and high-volume production workloads.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Gemma 4 31B (free) | Google | 81 |
| 2 | MiniMax M2.5 (free) | MiniMax | 78 |
| 3 | Gemma 4 31B | Google | 81 |
| 4 | Gemini 2.5 Flash Lite Preview 09-2025 | Google | 79 |
| 5 | Gemini 2.5 Flash Lite | Google | 79 |
| 6 | Gemma 4 26B A4B (free) | Google | 73 |
| 7 | Grok 4.1 Fast | xAI | 78 |
| 8 | Gemma 2 27B | Google | 77 |
| 9 | GLM 4.5 Air (free) | Zhipu AI | 71 |
| 10 | Gemma 4 26B A4B | Google | 73 |
| 11 | DeepSeek V4 Flash | DeepSeek | 72 |
| 12 | Gemini 2.0 Flash | Google | 72 |
| 13 | Grok 4 Fast | xAI | 73 |
| 14 | DeepSeek V4 Pro | DeepSeek | 76 |
| 15 | Qwen3 Next 80B A3B Instruct (free) | Alibaba | 67 |
| 16 | DeepSeek V3.2 | DeepSeek | 70 |
| 17 | Hy3 preview | Tencent | 69 |
| 18 | Qwen3.5-Flash | Alibaba | 69 |
| 19 | DeepSeek V3.2 Exp | DeepSeek | 70 |
| 20 | Llama 3.3 70B Instruct (free) | Meta | 66 |
| 21 | Qwen3.5-9B | Alibaba | 67 |
| 22 | DeepSeek V3 0324 | DeepSeek | 72 |
| 23 | Step 3.5 Flash | StepFun | 67 |
| 24 | Qwen3 235B A22B Instruct 2507 | Alibaba | 65 |
| 25 | Llama 3.3 70B Instruct | Meta | 67 |
| 26 | GPT-4o-mini | OpenAI | 69 |
| 27 | GLM 4.5 Air | Zhipu AI | 71 |
| 28 | DeepSeek V3.1 | DeepSeek | 69 |
| 29 | Llama 3.1 70B Instruct | Meta | 65 |
| 30 | Llama 4 Maverick | Meta | 67 |
Processing millions of requests per day? SLMs cost 10-100x less than premium models. A chatbot handling 1M messages/month costs ~$100 with budget models vs $10,000+ with premium ones.
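The chatbot arithmetic above can be sketched as a quick back-of-the-envelope calculation. The per-1M-token prices and the 500-token message size below are illustrative assumptions, not quotes from any provider's price list:

```python
# Back-of-the-envelope monthly cost for the chatbot example.
# Prices and tokens-per-message are assumed, not real price-list values.

def monthly_cost(messages: int, tokens_per_message: int,
                 price_per_1m_tokens: float) -> float:
    """Estimated monthly spend in dollars."""
    total_tokens = messages * tokens_per_message
    return total_tokens / 1_000_000 * price_per_1m_tokens

MESSAGES = 1_000_000          # 1M messages/month, as in the text
TOKENS_PER_MESSAGE = 500      # assumed prompt + completion tokens

slm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 0.20)   # assumed $0.20/1M
llm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 20.00)  # assumed $20/1M

print(f"SLM: ${slm_cost:,.0f}/month, premium LLM: ${llm_cost:,.0f}/month")
# 500M tokens/month -> $100 vs $10,000 under these assumptions
```

The 100x gap comes entirely from the per-token price ratio, so the conclusion holds regardless of the exact message size you plug in.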
Open-source SLMs can run on consumer hardware - laptops, phones, or edge devices. Models like Phi, Gemma, and small Llama variants fit in 4-8GB of RAM.
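A rough weight-size estimate shows why small models fit in that 4-8GB window. This sketch counts only model weights and ignores KV cache and runtime overhead, so real usage is somewhat higher; the 8B parameter count is an example, not a specific model's spec:

```python
# Rough RAM/VRAM footprint of model weights at different precisions.
# Ignores KV cache and runtime overhead, so real usage runs higher.

def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weight storage in GiB for a model of the given size and precision."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# An 8B-parameter model, e.g. a small Llama variant:
fp16 = weight_footprint_gb(8, 2.0)   # 16-bit weights
q4 = weight_footprint_gb(8, 0.5)     # 4-bit quantized weights

print(f"fp16: {fp16:.1f} GiB, 4-bit: {q4:.1f} GiB")
# ~14.9 GiB at fp16, ~3.7 GiB at 4-bit
```

At 4-bit quantization an 8B model's weights drop to under 4 GiB, which is what makes laptop and phone deployment practical.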
Smaller models respond faster. For real-time applications like autocomplete, classification, or chat, SLMs deliver sub-100ms responses.
Many tasks - classification, extraction, summarization, translation - don't need the largest models. A well-chosen SLM can match premium model quality on focused tasks.
Small language models are AI models with fewer parameters, typically under 10 billion. They run faster, cost less, and can operate on edge devices while still handling many common tasks like text generation, summarization, and simple coding assistance.
Use SLMs when you need low latency, low cost, or on-device deployment. Use full LLMs when you need complex reasoning, creative writing, or state-of-the-art accuracy. SLMs are ideal for chatbots, simple Q&A, and high-volume applications where cost matters more than peak performance.
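One common way to get the best of both is a model router: simple requests go to the SLM, and longer or reasoning-heavy ones escalate to the full LLM. The heuristics and model names below are placeholders for illustration, not a production policy:

```python
# Minimal routing sketch: cheap SLM by default, escalate to a premium
# LLM on reasoning hints or very long prompts. Names are placeholders.

REASONING_HINTS = ("prove", "step by step", "debug", "plan")

def pick_model(prompt: str) -> str:
    """Choose a model tier for a prompt using simple heuristics."""
    needs_reasoning = any(hint in prompt.lower() for hint in REASONING_HINTS)
    if needs_reasoning or len(prompt) > 2000:
        return "premium-llm"   # placeholder model name
    return "budget-slm"        # placeholder model name

print(pick_model("Translate 'hello' to French"))              # budget-slm
print(pick_model("Debug this failing test step by step..."))  # premium-llm
```

Real routers often use a small classifier model for this decision instead of keyword heuristics, but the cost logic is the same: most traffic stays on the cheap tier.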
SLMs trade some capability for speed and efficiency. Modern SLMs like Phi-3 and Gemma 2 can match older large models on many benchmarks. For specialized tasks, a fine-tuned SLM can outperform a general-purpose LLM while being 10-100x cheaper to run.