ARC-AGI v2 Benchmark Leaderboard
ARC-AGI-2 is an upgraded benchmark for measuring abstract reasoning and problem-solving abilities in AI systems through visual grid transformation tasks. It evaluates fluid intelligence via input-output grid pairs (from 1x1 to 30x30 cells) whose cells take color values 0-9, requiring models to identify the underlying transformation rule from demonstration examples and apply it to test cases. The tasks are designed to be easy for humans but challenging for AI, and they focus on core cognitive abilities such as spatial reasoning, pattern recognition, and compositional generalization.
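To make the task format concrete, here is a minimal Python sketch of a grid check and a toy task. It is an illustration under stated assumptions, not the benchmark's official tooling: the `train`/`test` dictionary layout, the rectangular-grid check, the helper names (`is_valid_grid`, `flip_horizontal`, `rule_fits`), and the toy task itself are all hypothetical. Only the size range (1x1 to 30x30) and the cell values (0-9) come from the description above.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]


def is_valid_grid(grid: Grid) -> bool:
    """Check the stated constraints: 1x1 to 30x30, cell values 0-9.

    Assumption: grids are rectangular (every row has the same width).
    """
    if not 1 <= len(grid) <= 30:
        return False
    width = len(grid[0])
    if not 1 <= width <= 30:
        return False
    return all(
        len(row) == width
        and all(isinstance(cell, int) and 0 <= cell <= 9 for cell in row)
        for row in grid
    )


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical toy rule: mirror every row left to right."""
    return [row[::-1] for row in grid]


def rule_fits(rule: Callable[[Grid], Grid], pairs: List[Dict[str, Grid]]) -> bool:
    """A candidate rule is accepted only if it reproduces every demonstration."""
    return all(rule(pair["input"]) == pair["output"] for pair in pairs)


# Hypothetical toy task: two demonstration pairs and one test input.
toy_task = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[4, 5, 6]], "output": [[6, 5, 4]]},
    ],
    "test": [{"input": [[7, 0, 0], [0, 8, 0]]}],
}

prediction = None
if rule_fits(flip_horizontal, toy_task["train"]):
    # Apply the inferred rule to the unseen test input.
    prediction = flip_horizontal(toy_task["test"][0]["input"])
    print(prediction)  # [[0, 0, 7], [0, 8, 0]]
```

Real tasks are far harder than this mirror rule: the solver is not handed a candidate rule and must compose one from the demonstrations alone, which is where compositional generalization comes in.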
Leaderboard
Top 16 models on ARC-AGI v2 Benchmark Leaderboard (scores from public evaluations).
- 1. GPT-5.5: 85.0%
- 2. Gemini 3.1 Pro: 77.1%
- 3. GPT-5.4: 73.3%
- 4. Gemini 3.5 Flash: 72.1%
- 5. Claude Opus 4.6: 68.8%
- 6. Claude Sonnet 4.6: 58.3%
- 7. GPT-5.2 Pro: 54.2%
- 8. GPT-5.2: 52.9%
- 9. Muse Spark: 42.5%
- 10. Claude Opus 4.5: 37.6%
- 11. Gemini 3 Flash: 33.6%
- 12. Gemini 3 Pro: 31.1%
- 13. Grok-4: 15.9%
- 14. Claude Opus 4: 8.6%
- 15. o3: 6.5%
- 16. Gemini 2.5 Pro: 4.9%
Models tracked
Models with arc-agi-v2 in their evaluation profile.
- No models linked yet.