The top AI models for every use case, ranked by our composite scoring system. Covering 364+ models across 55+ providers. Data refreshed hourly from live benchmarks, pricing, and capabilities.
Claude Opus 4.7 (Fast) by Anthropic - 1M context, $150.00/1M output
Models with reasoning capabilities, function calling, and top benchmark scores for code generation.
High-quality language models with streaming support, large context windows, and strong text generation.
Models with dedicated reasoning capabilities for complex problem-solving and logical tasks.
The cheapest models that still deliver strong quality. Maximum performance per dollar.
Top-performing open-weight models you can self-host, fine-tune, and deploy without vendor lock-in.
Dedicated image generation models for creating visuals, art, and design assets from text prompts.
The highest-scoring models across all categories, ranked by composite score.
Our scoring system is benchmark-driven: 90% of a model's score comes from benchmark performance (Arena Elo ratings, MMLU, GPQA, HumanEval, SWE-bench, and 15+ other standardized evaluations), with capabilities (5%) and context window (5%) serving as tiebreakers. Scores range from 0 to 100.
Benchmark scores are aggregated from multiple independent sources including head-to-head Arena evaluations, academic leaderboards, and curated official results. Models without benchmark data are capped at a score of 40, ensuring empirically evaluated models always rank higher.
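The weighting and the no-benchmark cap described above can be sketched in a few lines. The 90/5/5 weights and the cap of 40 come from the text; the function name, arguments, and the example component scores are our own illustration, not the site's actual implementation:

```python
def composite_score(benchmark, capabilities, context, has_benchmarks=True):
    """Hypothetical composite score on a 0-100 scale.

    benchmark, capabilities, context: component scores, each 0-100.
    Weights from the text: benchmarks 90%, capabilities 5%, context 5%.
    Models without benchmark data are capped at a score of 40.
    """
    score = 0.90 * benchmark + 0.05 * capabilities + 0.05 * context
    if not has_benchmarks:
        score = min(score, 40.0)
    return round(score, 1)

# Strong benchmark results dominate the composite:
print(composite_score(96, 90, 80))                        # 94.9
# Without benchmark data, the score is capped at 40:
print(composite_score(96, 90, 80, has_benchmarks=False))  # 40.0
```

Because benchmarks carry 90% of the weight, a model's capabilities and context window can only nudge its rank, never rescue a weak benchmark record.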
Data is aggregated from multiple live API sources, covering 364+ models from 55+ providers. Scores refresh hourly so rankings always reflect the latest model releases and pricing changes.
We rank 364+ models using a composite scoring system that weights benchmarks (90%), capabilities (5%), and context window (5%). Each use-case category (coding, writing, math, image generation) gets its own ranking, so the top model varies by task. Currently, Claude Opus 4.7 (Fast) by Anthropic leads the overall rankings with a score of 95.
Claude Opus 4.7 (Fast) by Anthropic leads our coding rankings with a score of 95. It excels at code generation thanks to its reasoning capabilities and 1M context window. Other strong coding models include Claude Opus 4.7 and GPT-5.5.
Qwen3 235B A22B Instruct 2507 by Alibaba offers excellent value at just $0.100 per million output tokens while maintaining a quality score of 65. For truly free options, several models from providers like Google and Meta are available at zero cost.
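One concrete way to read "performance per dollar" is output price divided by quality score, i.e. cost per quality point. A quick sketch using the two price/quality figures quoted in this page; the metric itself and the dictionary layout are our illustration:

```python
# Price is USD per 1M output tokens; quality is the 0-100 composite score.
models = {
    "Qwen3 235B A22B Instruct 2507": {"price": 0.100, "quality": 65},
    "Claude Opus 4.7 (Fast)":        {"price": 150.00, "quality": 95},
}

for name, m in models.items():
    # Cents of output spend per quality point: lower means better value.
    cents_per_point = 100 * m["price"] / m["quality"]
    print(f"{name}: {cents_per_point:.2f} cents per quality point")
```

By this measure the Qwen model costs a fraction of a cent per quality point while the frontier model costs over a hundred, which is why "cheapest model that still delivers strong quality" can be a very different pick from "highest absolute score".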
Open source models have closed the gap significantly. DeepSeek V4 Pro scores 87, competitive with many proprietary options. Models from DeepSeek, Meta (Llama), and Alibaba (Qwen) now rival GPT-4o and Claude on many benchmarks. The main advantages of open source are self-hosting flexibility, fine-tuning, no vendor lock-in, and often lower API costs. Proprietary models like GPT-4o and Claude still lead on some enterprise features and ecosystem integrations.
Dive deeper into rankings, compare models head-to-head, or filter by price, category, and capabilities.