ARC-AGI Benchmark Leaderboard

The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a benchmark that tests abstract reasoning through visual, grid-based transformation tasks. Each task provides 2-5 demonstration pairs, each showing an input grid transformed into an output grid by the same underlying rule; the test-taker must infer that rule and apply it to novel test inputs. Grids are at most 30x30 cells, and each cell holds one of 10 discrete colors/symbols. The benchmark is designed to measure human-like general fluid intelligence and skill-acquisition efficiency while assuming minimal prior knowledge.
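The task structure described above can be illustrated with a minimal sketch. The dictionary layout (`train`, `test`, `input`, `output`), the encoding of the 10 colors as integers 0-9, the toy task itself, and the helper names `is_valid_grid`, `mirror_left_right`, and `fits_demonstrations` are all assumptions made for illustration; they are not taken from this page or from any official tooling.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # assumed encoding: one integer 0-9 per cell

# Hypothetical toy task: two demonstration pairs and one test input.
task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [{"input": [[5, 0, 6]]}],
}


def is_valid_grid(grid: Grid) -> bool:
    """Check the size limit (at most 30x30) and the 10-symbol alphabet."""
    if not 1 <= len(grid) <= 30:
        return False
    width = len(grid[0])
    if not 1 <= width <= 30:
        return False
    return all(
        len(row) == width and all(isinstance(c, int) and 0 <= c <= 9 for c in row)
        for row in grid
    )


def mirror_left_right(grid: Grid) -> Grid:
    """Candidate rule: reverse every row (a left-right mirror)."""
    return [list(reversed(row)) for row in grid]


def fits_demonstrations(rule: Callable[[Grid], Grid], task: Dict[str, list]) -> bool:
    """A candidate rule is accepted only if it reproduces every demonstration output."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# Apply the rule to the test inputs only if it explains all demonstrations.
if fits_demonstrations(mirror_left_right, task):
    predictions = [mirror_left_right(t["input"]) for t in task["test"]]
else:
    predictions = []

print(predictions)
```

A real solver has to search for the rule rather than being handed one; the sketch only shows the check-then-apply step that the demonstration pairs make possible.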

Leaderboard

The top 7 models on the ARC-AGI benchmark, with scores taken from public evaluations.

Rank	Model	Score	Lab
1	GPT-5.5	95.0%	—
2	GPT-5.4	93.7%	—
3	GPT-5.2 Pro	90.5%	—
4	o3	88.0%	—
5	GPT-5.2	86.2%	—
6	LongCat-Flash-Thinking	50.3%	—
7	Qwen3-235B-A22B-Instruct-2507	41.8%	—
