MMLU Benchmark Leaderboard
Massive Multitask Language Understanding benchmark testing knowledge across 57 diverse subjects including STEM, humanities, social sciences, and professional domains
Leaderboard
Top 50 models on MMLU Benchmark Leaderboard (scores from public evaluations).
- 1. GPT-5: 92.5%
- 2. o1: 91.8%
- 3. GPT-4.5: 90.8%
- 3. o1-preview: 90.8%
- 5. Sarvam-105B: 90.6%
- 5. Qwen3 VL 235B A22B Thinking: 90.6%
- 7. Claude 3.5 Sonnet: 90.4%
- 7. Claude 3.5 Sonnet: 90.4%
- 9. GPT-4.1: 90.2%
- 9. Kimi K2 0905: 90.2%
- 11. GPT OSS 120B: 90.0%
- 12. LongCat-Flash-Chat: 89.7%
- 13. Kimi K2-Instruct-0905: 89.5%
- 13. Kimi K2 Instruct: 89.5%
- 15. Qwen3 VL 235B A22B Instruct: 88.8%
- 16. Qwen3 VL 32B Thinking: 88.7%
- 16. GPT-4o: 88.7%
- 18. DeepSeek-V3: 88.5%
- 19. Qwen3 235B A22B: 87.8%
- 20. Kimi K2 Base: 87.8%
- 21. Qwen3 VL 30B A3B Thinking: 87.6%
- 22. GPT-4.1 mini: 87.5%
- 22. Grok-2: 87.5%
- 24. Kimi-k1.5: 87.4%
- 25. Llama 3.1 405B Instruct: 87.3%
- 26. o3-mini: 86.9%
- 27. Claude 3 Opus: 86.8%
- 28. GPT-4 Turbo: 86.5%
- 29. Qwen3 VL 32B Instruct: 86.4%
- 29. GPT-4: 86.4%
- 31. Grok-2 mini: 86.2%
- 32. Llama 3.2 90B Instruct: 86.0%
- 32. Llama 3.3 70B Instruct: 86.0%
- 34. Nova Pro: 85.9%
- 34. Gemini 1.5 Pro: 85.9%
- 36. GPT-4o: 85.7%
- 37. LongCat-Flash-Lite: 85.5%
- 38. Llama 4 Maverick: 85.5%
- 39. GPT OSS 20B: 85.3%
- 40. Qwen3 VL 8B Thinking: 85.2%
- 40. o1-mini: 85.2%
- 42. Sarvam-30B: 85.1%
- 43. Qwen3 VL 30B A3B Instruct: 85.0%
- 44. Phi 4: 84.8%
- 45. Mistral Large 2: 84.0%
- 46. Llama 3.1 70B Instruct: 83.6%
- 47. Qwen2.5 32B Instruct: 83.3%
- 48. Qwen2 72B Instruct: 82.3%
- 49. GPT-4o mini: 82.0%
- 50. Qwen3 VL 4B Thinking: 81.5%
Models tracked
Models with MMLU in their evaluation profile.
- ChatGPT-4o Latest
- Claude 3.5 Haiku (Anthropic)
- Claude 3.5 Sonnet
- Claude 3.5 Sonnet
- Claude 3.7 Sonnet
- Claude 3 Haiku
- Claude 3 Opus (Anthropic)
- Claude 3 Sonnet
- Claude Haiku 4.5 (Anthropic)
- Claude Mythos Preview (Anthropic)
- Claude Opus 4.1
- Claude Opus 4 (Anthropic)
- Claude Opus 4.5
- Claude Opus 4.6 (Anthropic)
- Claude Opus 4.7 (Anthropic)
- Claude Sonnet 4
- Claude Sonnet 4.5
- Claude Sonnet 4.6
- Codestral-22B
- Command R+
- DeepSeek-V3.2 (Non-thinking) (DeepSeek)
- DeepSeek-R1-0528 (DeepSeek)
- DeepSeek R1 Distill Llama 70B (DeepSeek)
- DeepSeek R1 Distill Llama 8B (DeepSeek)
- DeepSeek R1 Distill Qwen 14B (DeepSeek)
- DeepSeek R1 Distill Qwen 32B (DeepSeek)
- DeepSeek R1 Distill Qwen 7B (DeepSeek)
- DeepSeek R1 Zero (DeepSeek)
- DeepSeek-V3.2 (Thinking) (DeepSeek)
- DeepSeek-V2.5 (DeepSeek)
- DeepSeek-V3 0324
- DeepSeek-V3.1 (DeepSeek)
- DeepSeek-V3.2-Exp (DeepSeek)
- DeepSeek-V3.2-Speciale (DeepSeek)
- DeepSeek-V3.2 (DeepSeek)
- DeepSeek-V3
- DeepSeek-V4-Flash-Max (DeepSeek)
- DeepSeek-V4-Pro-Max (DeepSeek)
- DeepSeek VL2 Small (DeepSeek)
- DeepSeek VL2 Tiny (DeepSeek)
- DeepSeek VL2 (DeepSeek)
- ERNIE 4.5
- ERNIE 5.0
- Gemini 1.0 Pro
- Gemini 1.5 Flash 8B