AIME 2025 Benchmark Leaderboard
This benchmark comprises all 30 problems from the 2025 American Invitational Mathematics Examination (15 each from AIME I and AIME II). The problems test olympiad-level mathematical reasoning, and every answer is an integer from 000 to 999. As an AI benchmark, it evaluates how well large language models solve complex mathematical problems that require multi-step logical deduction and structured symbolic reasoning.
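Because every answer is an integer from 000 to 999, grading can be done by exact match. The following is a minimal sketch of that idea, not the leaderboard's actual harness: the function names `normalize_answer` and `accuracy` are hypothetical, and the assumption that a model's final answer arrives as a plain digit string is ours.

```python
def normalize_answer(raw):
    """Parse an answer such as '042' or ' 42 ' into an int in 0..999.

    Returns None when the text is not a plain non-negative integer
    in the AIME answer range (hypothetical helper, for illustration).
    """
    text = str(raw).strip()
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if 0 <= value <= 999 else None


def accuracy(predictions, references):
    """Percentage of problems whose predicted answer matches exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    correct = 0
    for pred, ref in zip(predictions, references):
        parsed = normalize_answer(pred)
        # An unparseable prediction counts as wrong.
        if parsed is not None and parsed == normalize_answer(ref):
            correct += 1
    return 100.0 * correct / len(references)
```

With 30 problems, a single pass can only yield multiples of roughly 3.33%, so fractional scores such as 99.8% presumably come from averaging over several runs or from a different reporting setup; the source does not say which.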
Leaderboard
Top 50 models on AIME 2025 Benchmark Leaderboard (scores from public evaluations).
- 1. Grok-4 Heavy: 100.0%
- 1. GPT-5.2: 100.0%
- 1. Kimi K2-Thinking-0905: 100.0%
- 1. GPT-5.2 Pro: 100.0%
- 1. Gemini 3 Pro: 100.0%
- 6. Claude Opus 4.6: 99.8%
- 7. Gemini 3 Flash: 99.7%
- 8. LongCat-Flash-Thinking-2601: 99.6%
- 8. GPT-5.1 High: 99.6%
- 10. Nemotron 3 Nano (30B A3B): 99.2%
- 11. GPT OSS 20B High: 98.7%
- 12. GPT-5.1 Medium: 98.4%
- 13. Seed 2.0 Pro: 98.3%
- 14. Step-3.5-Flash: 97.3%
- 15. Sarvam-30B: 96.7%
- 15. GPT-5.1 Codex High: 96.7%
- 15. Sarvam-105B: 96.7%
- 18. Kimi K2.5: 96.1%
- 19. DeepSeek-V3.2-Speciale: 96.0%
- 20. GLM-4.7: 95.7%
- 21. GPT-5: 94.6%
- 21. GPT-5 High: 94.6%
- 23. MiMo-V2-Flash: 94.1%
- 24. GPT-5.1 Thinking: 94.0%
- 24. GPT-5.1: 94.0%
- 24. GPT-5.1 Instant: 94.0%
- 27. GLM-4.6: 93.9%
- 28. Grok-3: 93.3%
- 29. DeepSeek-V3.2 (Thinking): 93.1%
- 29. DeepSeek-V3.2: 93.1%
- 31. Seed 2.0 Lite: 93.0%
- 32. K-EXAONE-236B-A23B: 92.8%
- 33. o4-mini: 92.7%
- 34. GPT OSS 120B High: 92.5%
- 35. Qwen3-235B-A22B-Thinking-2507: 92.3%
- 36. Grok 4 Fast: 92.0%
- 37. Grok-4: 91.7%
- 38. GLM-4.7-Flash: 91.6%
- 39. Mercury 2: 91.1%
- 39. GPT-5 mini: 91.1%
- 41. Grok-3 Mini: 90.8%
- 42. LongCat-Flash-Thinking: 90.6%
- 43. Nemotron 3 Super (120B A12B): 90.2%
- 44. Qwen3 VL 235B A22B Thinking: 89.7%
- 45. DeepSeek-V3.2-Exp: 89.3%
- 46. GPT-5 Medium: 88.9%
- 47. Gemini 2.5 Pro Preview 06-05: 88.0%
- 48. Qwen3-Next-80B-A3B-Thinking: 87.8%
- 49. Step3-VL-10B: 87.7%
- 50. DeepSeek-R1-0528: 87.5%
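The ranks in this list follow what is usually called standard competition ranking: models with equal scores share a rank, and the next distinct score takes the rank it would have had without ties (five models at rank 1, then rank 6). A minimal sketch of that rule, assuming the scores are already sorted in descending order; `competition_ranks` is a hypothetical name, and this is not necessarily how the leaderboard itself computes ranks.

```python
def competition_ranks(scores):
    """Return 1-based ranks for scores sorted in descending order.

    Tied scores share the rank of the first entry in the tie, and the
    following rank skips ahead by the size of the tie.
    """
    ranks = []
    for position, score in enumerate(scores):
        if position > 0 and score == scores[position - 1]:
            # Same score as the previous entry: reuse its rank.
            ranks.append(ranks[-1])
        else:
            # New score: rank equals its 1-based position in the list.
            ranks.append(position + 1)
    return ranks
```

For example, `competition_ranks([100.0, 100.0, 99.8])` gives `[1, 1, 3]`.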
Models tracked
Models with aime-2025 in their evaluation profile.
- ChatGPT-4o Latest
- Claude 3.5 Haiku (Anthropic)
- Claude 3.5 Sonnet
- Claude 3.7 Sonnet
- Claude 3 Haiku
- Claude 3 Opus (Anthropic)
- Claude 3 Sonnet
- Claude Haiku 4.5 (Anthropic)
- Claude Mythos Preview (Anthropic)
- Claude Opus 4.1
- Claude Opus 4 (Anthropic)
- Claude Opus 4.5
- Claude Opus 4.6 (Anthropic)
- Claude Opus 4.7 (Anthropic)
- Claude Sonnet 4
- Claude Sonnet 4.5
- Claude Sonnet 4.6
- Codestral-22B
- Command R+
- DeepSeek-V3.2 (Non-thinking) (DeepSeek)
- DeepSeek-R1-0528 (DeepSeek)
- DeepSeek R1 Distill Llama 70B (DeepSeek)
- DeepSeek R1 Distill Llama 8B (DeepSeek)
- DeepSeek R1 Distill Qwen 14B (DeepSeek)
- DeepSeek R1 Distill Qwen 32B (DeepSeek)
- DeepSeek R1 Distill Qwen 7B (DeepSeek)
- DeepSeek R1 Zero (DeepSeek)
- DeepSeek-V3.2 (Thinking) (DeepSeek)
- DeepSeek-V2.5 (DeepSeek)
- DeepSeek-V3 0324
- DeepSeek-V3.1 (DeepSeek)
- DeepSeek-V3.2-Exp (DeepSeek)
- DeepSeek-V3.2-Speciale (DeepSeek)
- DeepSeek-V3.2 (DeepSeek)
- DeepSeek-V3
- DeepSeek-V4-Flash-Max (DeepSeek)
- DeepSeek-V4-Pro-Max (DeepSeek)
- DeepSeek VL2 Small (DeepSeek)
- DeepSeek VL2 Tiny (DeepSeek)
- DeepSeek VL2 (DeepSeek)
- ERNIE 4.5
- ERNIE 5.0
- Gemini 1.0 Pro
- Gemini 1.5 Flash 8B