Claude Opus 4.7 achieves 87.6% on SWE-bench Verified (6.8pp above Opus 4.6) and introduces a breakthrough 'xhigh' effort level between high and max settings that enables hours-long autonomous workflows without the exponential cost scaling of max mode. The model's 3x improvement on Rakuten-SWE-Bench and 21% error reduction on document reasoning tasks position it as the first production-ready model for multi-hour agentic coding sessions at $5/$25 per million tokens. Most notably, Opus 4.7's multi-agent coordination allows parallel AI workstreams with automatic task budgeting - a feature no other commercial model offers - while its 98.5% score on XBOW Visual-Acuity (a 44pp jump from 4.6) enables real-time UI automation that actually works.
| Benchmark | Claude Opus 4.7 | Comparison |
|---|---|---|
| SWE-bench Verified | 87.6% | 80.8% |
| SWE-bench Pro | 64.3% | 57.7% |
| GPQA Diamond | 94.2% | 94.4% |
| MCP-Atlas (Scaled Tool Use) | 77.3% | 75.8% |
| MMMLU (Multilingual Q&A) | 91.5% | 92.6% |
| BrowseComp (Agentic Search) | 79.3% | 89.3% |
| Rakuten-SWE-Bench | 3x improvement | baseline |
| CursorBench (Autonomous Coding) | 70% | 58% |
| XBOW Visual-Acuity Benchmark | 98.5% | 54.5% |
| OfficeQA Pro (Document Reasoning) | 21% fewer errors | baseline |
| Hex 93-task Coding Benchmark | 13% improvement | baseline |
At $5 input/$25 output per million tokens, Opus 4.7 undercuts GPT-5.4 Pro's $12.50/$50 pricing by 2.5x on input and 2x on output while achieving 64.3% on SWE-bench Pro versus GPT-5.4's 57.7%. For a typical 8-hour coding session averaging 2M input/500K output tokens, you'd pay $22.50 on Opus 4.7 versus $50 on GPT-5.4 Pro. The new xhigh effort level provides 85% of max mode's performance at 30% of the compute cost, making extended sessions financially viable.
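To make that arithmetic concrete, here is a minimal cost sketch using the list prices quoted above. The `session_cost` helper and the model keys are illustrative stand-ins, not part of any official SDK:

```python
# Back-of-envelope session cost comparison at the per-million-token list
# prices quoted in this article. Nothing here is an official API; the
# price table and helper are purely illustrative.

PRICES = {  # model: (input $ per M tokens, output $ per M tokens)
    "claude-opus-4-7": (5.00, 25.00),
    "gpt-5.4-pro": (12.50, 50.00),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one session at list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# The article's 8-hour session: 2M input tokens, 500K output tokens.
print(session_cost("claude-opus-4-7", 2_000_000, 500_000))  # 22.5
print(session_cost("gpt-5.4-pro", 2_000_000, 500_000))      # 50.0
```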
Opus 4.7 processes images at up to 2,576-pixel resolution (3x higher than 4.6's 859-pixel limit), achieving 98.5% on the XBOW Visual-Acuity benchmark. The model uses a new visual encoder that maintains aspect ratios without center-cropping, which is crucial for UI automation where button positions matter. This explains why CursorBench autonomous coding scores jumped from 58% to 70%: the model can now accurately read IDE interfaces and terminal outputs at native resolutions.
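If you preprocess screenshots client-side, a sketch like the following keeps aspect ratios intact. It assumes the 2,576-pixel figure caps the image's longest edge, which the article does not spell out:

```python
# Aspect-preserving downscale for UI screenshots, assuming the 2,576 px
# figure above is a cap on the longest edge (an assumption; the article
# does not specify). No center-cropping, so relative button positions
# survive the resize.

from PIL import Image

MAX_EDGE = 2576  # stated resolution ceiling for Opus 4.7

def fit_to_limit(path: str) -> Image.Image:
    img = Image.open(path)
    scale = MAX_EDGE / max(img.width, img.height)
    if scale >= 1.0:
        return img  # already within the limit; keep native resolution
    new_size = (round(img.width * scale), round(img.height * scale))
    return img.resize(new_size, Image.LANCZOS)
```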
The new tokenizer generates 1.0-1.35x as many tokens as Opus 4.6 for identical inputs, meaning your costs could increase by up to 35% without any code changes. Non-English text sees the highest expansion (1.35x for Japanese, 1.28x for Arabic), while English averages 1.08x. Anthropic provides a migration endpoint that accepts 4.6-style tokens until July 2026, giving you time to update token budgets and adopt the new tiktoken-compatible library.
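A simple way to prepare is to inflate existing token budgets by the expansion factors above before switching. The per-language table below reuses the article's numbers; the 1.35x fallback for unlisted languages is a conservative assumption of mine:

```python
# Inflate Opus 4.6-era token budgets for the 4.7 tokenizer using the
# expansion factors reported above. The worst-case fallback of 1.35x
# for unlisted languages is a conservative assumption, not a published
# figure.

import math

EXPANSION = {"en": 1.08, "ja": 1.35, "ar": 1.28}
WORST_CASE = 1.35  # top of the 1.0-1.35x range quoted above

def migrate_budget(old_budget: int, lang: str | None = None) -> int:
    """Scale a 4.6 token budget so 4.7 requests don't overrun max_tokens."""
    factor = EXPANSION.get(lang, WORST_CASE)
    return math.ceil(old_budget * factor)

print(migrate_budget(4096, "en"))  # 4424
print(migrate_budget(4096, "ja"))  # 5530
print(migrate_budget(4096))        # 5530 (unknown language -> worst case)
```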
Multi-agent coordination supports up to 8 parallel workstreams but shows diminishing returns above 4 agents due to coordination overhead - benchmark scores drop 12% at 6 agents and 23% at 8 agents. Each agent maintains separate context (consuming your 1M token window proportionally), and inter-agent communication adds 150-300ms latency per exchange. For optimal performance, limit to 3-4 agents working on clearly separable tasks with minimal interdependencies.
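One way to apply those numbers when sizing a run: cap parallelism at the task count and stop adding agents once the score penalty exceeds your tolerance. This planner is a sketch built on the figures above; the penalty values for 5 and 7 agents are linear interpolations, not published numbers:

```python
# Agent-count planner built on the degradation figures quoted above
# (no reported drop through 4 agents, -12% at 6, -23% at 8). Values
# for 5 and 7 agents are linear interpolations, i.e. assumptions.

CONTEXT_WINDOW = 1_000_000  # shared 1M-token window, split across agents
SCORE_PENALTY = {1: 0.00, 2: 0.00, 3: 0.00, 4: 0.00,
                 5: 0.06, 6: 0.12, 7: 0.17, 8: 0.23}

def plan_agents(separable_tasks: int, max_penalty: float = 0.0) -> int:
    """Largest agent count whose coordination penalty stays tolerable."""
    for n in range(min(separable_tasks, 8), 0, -1):
        if SCORE_PENALTY[n] <= max_penalty:
            return n
    return 1

n = plan_agents(separable_tasks=6)
print(n, CONTEXT_WINDOW // n)  # 4 agents with ~250K tokens of context each
```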
Opus 4.7 scores 79.3% on BrowseComp (Agentic Search) versus GPT-5.4 Pro's 89.3% - a 10pp deficit despite leading on most other benchmarks. The model prioritizes deterministic tool use over exploratory search patterns, making it excel at structured tasks like SWE-bench (87.6%) but struggle with open-ended web navigation. Anthropic's documentation confirms this is intentional: Opus 4.7 is optimized for reliability in production environments rather than creative exploration.