Evaluates AI coding assistants on real-world multi-file editing tasks across diverse codebases. Tests the ability to understand project context and make coordinated changes.
Why it matters: The first benchmark designed specifically for agentic coding assistants that edit multiple files. More realistic than single-function benchmarks like HumanEval.
Top model
61.3%
Composer 2
Average score
61.3%
Across 1 model
Models tested
1
Metric: pass rate
Human baseline
-
Score range: 0%–100%
CursorBench Scores - Top Model
Ranked by CursorBench score (%)
All models with a reported CursorBench score, ranked by highest pass rate.
CursorBench is a standardized evaluation that measures AI model performance on real-world multi-file coding tasks. It provides comparable scores across different models, helping developers choose the right model for their needs.
Composer 2 currently holds the top score on the CursorBench benchmark. See our full rankings table above for the complete leaderboard with 1 model.
We update benchmark data from multiple sources including HuggingFace open-source model leaderboards and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
No. While CursorBench is an important indicator, real-world performance depends on many factors including pricing, latency, context window, and specific task requirements. We recommend using our composite score which weighs multiple benchmarks and practical factors.