ARC-AGI Benchmark Leaderboard
The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a benchmark that tests general intelligence and abstract reasoning through visual, grid-based transformation tasks. Each task consists of 2-5 demonstration pairs, each showing an input grid transformed into an output grid according to an underlying rule; the test-taker must infer that rule and apply it to novel test inputs. Grids are up to 30x30 cells, and each cell holds one of 10 discrete colors/symbols. The benchmark is designed to measure human-like general fluid intelligence and skill-acquisition efficiency with minimal prior knowledge.
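The task structure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not an official loader or solver: the dictionary layout with `train` and `test` keys, the helper names `is_valid_grid` and `rule_fits_demos`, and the toy rule `flip_horizontal` are all assumptions made for this example. Colors are assumed to be encoded as integers 0-9.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

MAX_SIDE = 30     # grids are at most 30x30 (from the description above)
NUM_COLORS = 10   # 10 discrete colors, assumed encoded as integers 0..9


def is_valid_grid(grid: Grid) -> bool:
    """Check that a grid is rectangular, non-empty, within 30x30, and uses colors 0..9."""
    if not grid or len(grid) > MAX_SIDE:
        return False
    width = len(grid[0])
    if width == 0 or width > MAX_SIDE:
        return False
    return all(
        len(row) == width
        and all(isinstance(c, int) and 0 <= c < NUM_COLORS for c in row)
        for row in grid
    )


def rule_fits_demos(rule: Callable[[Grid], Grid], demos: List[Dict[str, Grid]]) -> bool:
    """Return True if the candidate rule reproduces every demonstration output."""
    return all(rule(pair["input"]) == pair["output"] for pair in demos)


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


# A toy task (invented for illustration) with two demonstration pairs and one test input.
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [{"input": [[0, 5], [7, 0]]}],
}

assert all(is_valid_grid(p["input"]) and is_valid_grid(p["output"]) for p in task["train"])

# Only apply the rule to the test input if it explains all demonstrations.
if rule_fits_demos(flip_horizontal, task["train"]):
    prediction = flip_horizontal(task["test"][0]["input"])
    print(prediction)  # [[5, 0], [0, 7]]
```

A real solver would have to search for the rule rather than be handed one; the sketch only shows the shape of the data and the "fit the demonstrations, then apply to the test input" loop.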
Leaderboard
Top 7 models on the ARC-AGI Benchmark Leaderboard (scores from public evaluations).
- 1. GPT-5.5: 95.0%
- 2. GPT-5.4: 93.7%
- 3. GPT-5.2 Pro: 90.5%
- 4. o3: 88.0%
- 5. GPT-5.2: 86.2%
- 6. LongCat-Flash-Thinking: 50.3%
- 7. Qwen3-235B-A22B-Instruct-2507: 41.8%