300 models ranked for scientific research. Scores include bonuses for reasoning (complex analysis), large context (papers), vision (diagrams), web search (literature), and JSON mode (structured data).
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5.4 Pro | OpenAI | 94 |
| 2 | GPT-5.4 | OpenAI | 94 |
| 3 | GPT-5.4 Mini | OpenAI | 93 |
| 4 | GPT-5.2 Pro | OpenAI | 93 |
| 5 | GPT-5.2 | OpenAI | 93 |
| 6 | Claude Opus 4.6 | Anthropic | 92 |
| 7 | GPT-5 Pro | OpenAI | 92 |
| 8 | o3 Deep Research | OpenAI | 92 |
| 9 | Claude Opus 4.5 | Anthropic | 90 |
| 10 | GPT-5 | OpenAI | 90 |
| 11 | Claude Sonnet 4.6 | Anthropic | 89 |
| 12 | Claude Sonnet 4.5 | Anthropic | 89 |
| 13 | o3 Pro | OpenAI | 88 |
| 14 | Grok 4.1 Fast | xAI | 87 |
| 15 | Gemini 3 Flash Preview | Google | 89 |
| 16 | Grok 4.20 Beta | xAI | 86 |
| 17 | Grok 4 | xAI | 86 |
| 18 | o3 | OpenAI | 86 |
| 19 | GPT-5.1 | OpenAI | 85 |
| 20 | GPT-5.4 Nano | OpenAI | 85 |
| 21 | GPT-5.3-Codex | OpenAI | 85 |
| 22 | GPT-5.2-Codex | OpenAI | 85 |
| 23 | GPT-5.1-Codex-Max | OpenAI | 85 |
| 24 | Sonar Pro Search | Perplexity | 85 |
| 25 | o4 Mini Deep Research | OpenAI | 85 |
| 26 | o4 Mini High | OpenAI | 85 |
| 27 | o4 Mini | OpenAI | 84 |
| 28 | Grok 4 Fast | xAI | 83 |
| 29 | Claude Haiku 4.5 | Anthropic | 83 |
| 30 | Gemini 3.1 Pro Preview | Google | 86 |
Large context models (128K+) can process entire research papers. Combined with reasoning, they extract key findings, identify methodology gaps, and synthesize across multiple sources.
Vision models analyze charts, plots, and experimental images. Reasoning models work through complex statistical analyses, helping researchers validate findings and spot patterns.
Reasoning models help design experiments, identify confounding variables, and suggest controls. Web search keeps research informed by the latest published methods and protocols.
Large output models draft sections of scientific papers with proper structure. They can also review drafts for logical consistency, suggest improvements, and check claims against current literature.
Based on our composite scoring updated hourly, the top-ranked models are shown at the top of this page. Rankings consider benchmarks, pricing, capabilities, and community adoption.
Yes, several models listed on this page offer free tiers or are fully open-source. Look for models marked as Free in the pricing column above.
We use a composite scoring system combining benchmark performance, capability matching, pricing, context window size, and community adoption. Scores are updated hourly.
Rankings refresh every hour using real-time data from benchmarks, API testing, and community metrics. The data shown always reflects the most current performance.
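The composite scoring described above can be illustrated with a minimal Python sketch. The page does not publish its actual formula, so everything here is an assumption made for illustration: the `BONUSES` weights, the `ModelInfo` fields, the `research_score` and `rank_models` helpers, and the idea that benchmark, pricing, and adoption signals are already combined into a single `base_score`. Only the five bonus categories and the 128K large-context threshold come from the text.

```python
from dataclasses import dataclass, field

# Hypothetical bonus points per capability; the real weights are not published.
BONUSES = {
    "reasoning": 5,      # complex analysis
    "large_context": 4,  # whole papers
    "vision": 3,         # diagrams and plots
    "web_search": 3,     # literature lookup
    "json_mode": 2,      # structured data
}

# "128K+" from the page, taken here as 128,000 tokens (an assumption).
LARGE_CONTEXT_TOKENS = 128_000


@dataclass
class ModelInfo:
    name: str
    base_score: float            # assumed pre-combined benchmark/pricing/adoption score
    context_window: int          # in tokens
    capabilities: set = field(default_factory=set)


def research_score(model: ModelInfo) -> float:
    """Add capability bonuses to the base score, capped at 100."""
    caps = set(model.capabilities)
    if model.context_window >= LARGE_CONTEXT_TOKENS:
        caps.add("large_context")
    bonus = sum(points for cap, points in BONUSES.items() if cap in caps)
    return min(100.0, model.base_score + bonus)


def rank_models(models: list) -> list:
    """Return models sorted from highest to lowest research score."""
    return sorted(models, key=research_score, reverse=True)


# Made-up example entries, not real leaderboard data.
example_a = ModelInfo("example-a", 70, 200_000, {"reasoning", "vision"})
example_b = ModelInfo("example-b", 75, 32_000, {"json_mode"})
ranked = rank_models([example_a, example_b])
```

In this sketch a model with a lower base score can still rank higher once its capability bonuses are added, which is the kind of capability matching the composite score is said to reward.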