MMMU Benchmark Leaderboard
MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering) that span 30 subjects and 183 subfields.
Leaderboard
Top 50 models on MMMU Benchmark Leaderboard (scores from public evaluations).
- 1. Qwen3.6 Plus: 86.0%
- 2. GPT-5.1 Instant: 85.4%
- 2. GPT-5.1: 85.4%
- 2. GPT-5.1 Thinking: 85.4%
- 5. GPT-5: 84.2%
- 6. Qwen3.5-122B-A10B: 83.9%
- 7. Qwen3.6-27B: 82.9%
- 7. o3: 82.9%
- 9. Qwen3.5-27B: 82.3%
- 10. Gemini 2.5 Pro Preview 06-05: 82.0%
- 11. Qwen3.6-35B-A3B: 81.7%
- 12. o4-mini: 81.6%
- 13. Qwen3.5-35B-A3B: 81.4%
- 14. Gemini 2.5 Flash: 79.7%
- 15. Gemini 2.5 Pro: 79.6%
- 16. Step3-VL-10B: 78.1%
- 17. Grok-3: 78.0%
- 18. o1: 77.6%
- 19. Gemini 2.0 Flash Thinking: 75.4%
- 20. GPT-4.5: 75.2%
- 21. Claude 3.7 Sonnet: 75.0%
- 22. GPT-4.1: 74.8%
- 23. Claude Sonnet 4: 74.4%
- 24. Llama 4 Maverick: 73.4%
- 25. Gemini 2.5 Flash-Lite: 72.9%
- 26. GPT-4.1 mini: 72.7%
- 27. GPT-4o: 72.2%
- 28. Gemini 2.0 Flash: 70.7%
- 29. QvQ-72B-Preview: 70.3%
- 30. Qwen2.5 VL 72B Instruct: 70.2%
- 31. Qwen2.5 VL 32B Instruct: 70.0%
- 31. Kimi-k1.5: 70.0%
- 33. Llama 4 Scout: 69.4%
- 34. Claude 3.5 Sonnet: 68.3%
- 35. Gemini 2.0 Flash-Lite: 68.0%
- 36. Grok-2: 66.1%
- 37. Gemini 1.5 Pro: 65.9%
- 38. Pixtral Large: 64.0%
- 39. Grok-2 mini: 63.2%
- 40. Mistral Small 3.2 24B Instruct: 62.5%
- 41. Gemini 1.5 Flash: 62.3%
- 42. Nova Pro: 61.7%
- 43. Llama 3.2 90B Instruct: 60.3%
- 44. GPT-4o mini: 59.4%
- 45. Mistral Small 3.1 24B Instruct: 59.3%
- 45. Mistral Small 3.1 24B Base: 59.3%
- 47. Qwen2.5-Omni-7B: 59.2%
- 48. Qwen2.5 VL 7B Instruct: 58.6%
- 49. Nova Lite: 56.2%
- 50. GPT-4.1 nano: 55.4%
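The ranks in the list above repeat for tied scores and then skip ahead (2, 2, 2, 5 and 7, 7, 9), which is consistent with standard competition ranking. The following is a minimal sketch of that scheme, assuming ties are decided by exact score equality; the function name `competition_rank` is hypothetical and not taken from the leaderboard's own tooling.

```python
from typing import List, Tuple


def competition_rank(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign competition ranks: tied scores share a rank, the next rank skips.

    Hypothetical helper for illustration; the leaderboard's actual ranking
    code is not documented in the source.
    """
    # sorted() is stable, so tied entries keep their input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous entry: reuse its rank
        else:
            rank = position  # otherwise the rank is the 1-based position
        ranked.append((rank, name, score))
    return ranked


# First five entries of the leaderboard above.
sample = [
    ("Qwen3.6 Plus", 86.0),
    ("GPT-5.1 Instant", 85.4),
    ("GPT-5.1", 85.4),
    ("GPT-5.1 Thinking", 85.4),
    ("GPT-5", 84.2),
]
ranked = competition_rank(sample)
for rank, name, score in ranked:
    print(f"{rank}. {name}: {score:.1f}%")
```

Run on the first five entries, the sketch assigns ranks 1, 2, 2, 2, 5, matching the listing.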
Models tracked
Models with mmmu in their evaluation profile.
- ChatGPT-4o Latest
- Claude 3.5 Haiku, Anthropic
- Claude 3.5 Sonnet
- Claude 3.7 Sonnet
- Claude 3 Haiku
- Claude 3 Opus, Anthropic
- Claude 3 Sonnet
- Claude Haiku 4.5, Anthropic
- Claude Mythos Preview, Anthropic
- Claude Opus 4.1
- Claude Opus 4, Anthropic
- Claude Opus 4.5
- Claude Opus 4.6, Anthropic
- Claude Opus 4.7, Anthropic
- Claude Sonnet 4
- Claude Sonnet 4.5
- Claude Sonnet 4.6
- Codestral-22B
- Command R+
- DeepSeek-V3.2 (Non-thinking), DeepSeek
- DeepSeek-R1-0528, DeepSeek
- DeepSeek R1 Distill Llama 70B, DeepSeek
- DeepSeek R1 Distill Llama 8B, DeepSeek
- DeepSeek R1 Distill Qwen 14B, DeepSeek
- DeepSeek R1 Distill Qwen 32B, DeepSeek
- DeepSeek R1 Distill Qwen 7B, DeepSeek
- DeepSeek R1 Zero, DeepSeek
- DeepSeek-V3.2 (Thinking), DeepSeek
- DeepSeek-V2.5, DeepSeek
- DeepSeek-V3 0324
- DeepSeek-V3.1, DeepSeek
- DeepSeek-V3.2-Exp, DeepSeek
- DeepSeek-V3.2-Speciale, DeepSeek
- DeepSeek-V3.2, DeepSeek
- DeepSeek-V3
- DeepSeek-V4-Flash-Max, DeepSeek
- DeepSeek-V4-Pro-Max, DeepSeek
- DeepSeek VL2 Small, DeepSeek
- DeepSeek VL2 Tiny, DeepSeek
- DeepSeek VL2, DeepSeek
- ERNIE 4.5
- ERNIE 5.0
- Gemini 1.0 Pro
- Gemini 1.5 Flash 8B