135 lightweight AI models priced under $1 per 1M tokens. Small language models (SLMs) are optimized for speed, low cost, and edge deployment - ideal for mobile apps, IoT, chatbots, and high-volume production workloads.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Gemma 4 31B (free) | Google | 81 |
| 2 | MiniMax M2.5 (free) | MiniMax | 78 |
| 3 | Gemma 4 31B | Google | 81 |
| 4 | Gemini 2.5 Flash Lite Preview 09-2025 | Google | 79 |
| 5 | Gemini 2.5 Flash Lite | Google | 79 |
| 6 | Gemma 4 26B A4B (free) | Google | 73 |
| 7 | Grok 4.1 Fast | xAI | 78 |
| 8 | Gemma 2 27B | Google | 77 |
| 9 | GLM 4.5 Air (free) | Zhipu AI | 71 |
| 10 | Gemma 4 26B A4B | Google | 73 |
| 11 | DeepSeek V4 Flash | DeepSeek | 72 |
| 12 | Gemini 2.0 Flash | Google | 72 |
| 13 | Grok 4 Fast | xAI | 73 |
| 14 | DeepSeek V4 Pro | DeepSeek | 76 |
| 15 | Qwen3 Next 80B A3B Instruct (free) | Alibaba | 67 |
| 16 | DeepSeek V3.2 | DeepSeek | 70 |
| 17 | Hy3 preview | Tencent | 69 |
| 18 | Qwen3.5-Flash | Alibaba | 69 |
| 19 | DeepSeek V3.2 Exp | DeepSeek | 70 |
| 20 | Llama 3.3 70B Instruct (free) | Meta | 66 |
| 21 | Qwen3.5-9B | Alibaba | 67 |
| 22 | DeepSeek V3 0324 | DeepSeek | 72 |
| 23 | Step 3.5 Flash | StepFun | 67 |
| 24 | Qwen3 235B A22B Instruct 2507 | Alibaba | 65 |
| 25 | Llama 3.3 70B Instruct | Meta | 67 |
| 26 | GPT-4o-mini | OpenAI | 69 |
| 27 | GLM 4.5 Air | Zhipu AI | 71 |
| 28 | DeepSeek V3.1 | DeepSeek | 69 |
| 29 | Llama 3.1 70B Instruct | Meta | 65 |
| 30 | Llama 4 Maverick | Meta | 67 |
Processing millions of requests per day? SLMs cost 10-100x less than premium models. A chatbot handling 1M messages/month costs ~$100 with budget models vs $10,000+ with premium ones.
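The chatbot arithmetic above can be sketched as a quick back-of-the-envelope calculation. The per-1M-token prices and the 500-token message size below are illustrative assumptions, not quotes from any provider's price list:

```python
# Back-of-the-envelope monthly cost for the chatbot example.
# Prices and tokens-per-message are assumed, not real price-list values.

def monthly_cost(messages: int, tokens_per_message: int,
                 price_per_1m_tokens: float) -> float:
    """Estimated monthly spend in dollars."""
    total_tokens = messages * tokens_per_message
    return total_tokens / 1_000_000 * price_per_1m_tokens

MESSAGES = 1_000_000          # 1M messages/month, as in the text
TOKENS_PER_MESSAGE = 500      # assumed prompt + completion tokens

slm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 0.20)   # assumed $0.20/1M
llm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 20.00)  # assumed $20/1M

print(f"SLM: ${slm_cost:,.0f}/month, premium LLM: ${llm_cost:,.0f}/month")
# 500M tokens/month -> $100 vs $10,000 under these assumptions
```

The 100x gap comes entirely from the per-token price ratio, so the conclusion holds regardless of the exact message size you plug in.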
Open-source SLMs can run on consumer hardware - laptops, phones, or edge devices. Models like Phi, Gemma, and small Llama variants fit in 4-8GB of RAM.
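A rough weight-size estimate shows why small models fit in that 4-8GB window. This sketch counts only model weights and ignores KV cache and runtime overhead, so real usage is somewhat higher; the 8B parameter count is an example, not a specific model's spec:

```python
# Rough RAM/VRAM footprint of model weights at different precisions.
# Ignores KV cache and runtime overhead, so real usage runs higher.

def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weight storage in GiB for a model of the given size and precision."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# An 8B-parameter model, e.g. a small Llama variant:
fp16 = weight_footprint_gb(8, 2.0)   # 16-bit weights
q4 = weight_footprint_gb(8, 0.5)     # 4-bit quantized weights

print(f"fp16: {fp16:.1f} GiB, 4-bit: {q4:.1f} GiB")
# ~14.9 GiB at fp16, ~3.7 GiB at 4-bit
```

At 4-bit quantization an 8B model's weights drop to under 4 GiB, which is what makes laptop and phone deployment practical.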
Smaller models respond faster. For real-time applications like autocomplete, classification, or chat, SLMs deliver sub-100ms responses.
Many tasks - classification, extraction, summarization, translation - don't need the largest models. A well-chosen SLM can match premium model quality on focused tasks.
Small language models are AI models with fewer parameters, typically under 10 billion. They run faster, cost less, and can operate on edge devices while still handling many common tasks like text generation, summarization, and simple coding assistance.
Use SLMs when you need low latency, low cost, or on-device deployment. Use full LLMs when you need complex reasoning, creative writing, or state-of-the-art accuracy. SLMs are ideal for chatbots, simple Q&A, and high-volume applications where cost matters more than peak performance.
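One common way to get the best of both is a model router: simple requests go to the SLM, and longer or reasoning-heavy ones escalate to the full LLM. The heuristics and model names below are placeholders for illustration, not a production policy:

```python
# Minimal routing sketch: cheap SLM by default, escalate to a premium
# LLM on reasoning hints or very long prompts. Names are placeholders.

REASONING_HINTS = ("prove", "step by step", "debug", "plan")

def pick_model(prompt: str) -> str:
    """Choose a model tier for a prompt using simple heuristics."""
    needs_reasoning = any(hint in prompt.lower() for hint in REASONING_HINTS)
    if needs_reasoning or len(prompt) > 2000:
        return "premium-llm"   # placeholder model name
    return "budget-slm"        # placeholder model name

print(pick_model("Translate 'hello' to French"))              # budget-slm
print(pick_model("Debug this failing test step by step..."))  # premium-llm
```

Real routers often use a small classifier model for this decision instead of keyword heuristics, but the cost logic is the same: most traffic stays on the cheap tier.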
SLMs trade some capability for speed and efficiency. Modern SLMs like Phi-3 and Gemma 2 can match older large models on many benchmarks. For specialized tasks, a fine-tuned SLM can outperform a general-purpose LLM while being 10-100x cheaper to run.