MMLU Benchmark Leaderboard

Massive Multitask Language Understanding (MMLU) is a benchmark that tests knowledge across 57 diverse subjects, including STEM, humanities, social sciences, and professional domains.

Leaderboard

Top 50 models on the MMLU Benchmark Leaderboard (scores from public evaluations).

  1. GPT-5 (rank 1): 92.5%
  2. o1 (rank 2): 91.8%
  3. GPT-4.5 (rank 3): 90.8%
  4. o1-preview (rank 3): 90.8%
  5. Sarvam-105B (rank 5): 90.6%
  6. Qwen3 VL 235B A22B Thinking (rank 5): 90.6%
  7. Claude 3.5 Sonnet (rank 7): 90.4%
  8. Claude 3.5 Sonnet (rank 7): 90.4%
  9. GPT-4.1 (rank 9): 90.2%
  10. Kimi K2 0905 (rank 9): 90.2%
  11. GPT OSS 120B (rank 11): 90.0%
  12. LongCat-Flash-Chat (rank 12): 89.7%
  13. Kimi K2-Instruct-0905 (rank 13): 89.5%
  14. Kimi K2 Instruct (rank 13): 89.5%
  15. Qwen3 VL 235B A22B Instruct (rank 15): 88.8%
  16. Qwen3 VL 32B Thinking (rank 16): 88.7%
  17. GPT-4o (rank 16): 88.7%
  18. DeepSeek-V3 (rank 18): 88.5%
  19. Qwen3 235B A22B (rank 19): 87.8%
  20. Kimi K2 Base (rank 20): 87.8%
  21. Qwen3 VL 30B A3B Thinking (rank 21): 87.6%
  22. GPT-4.1 mini (rank 22): 87.5%
  23. Grok-2 (rank 22): 87.5%
  24. Kimi-k1.5 (rank 24): 87.4%
  25. Llama 3.1 405B Instruct (rank 25): 87.3%
  26. o3-mini (rank 26): 86.9%
  27. Claude 3 Opus (rank 27): 86.8%
  28. GPT-4 Turbo (rank 28): 86.5%
  29. Qwen3 VL 32B Instruct (rank 29): 86.4%
  30. GPT-4 (rank 29): 86.4%
  31. Grok-2 mini (rank 31): 86.2%
  32. Llama 3.2 90B Instruct (rank 32): 86.0%
  33. Llama 3.3 70B Instruct (rank 32): 86.0%
  34. Nova Pro (rank 34): 85.9%
  35. Gemini 1.5 Pro (rank 34): 85.9%
  36. GPT-4o (rank 36): 85.7%
  37. LongCat-Flash-Lite (rank 37): 85.5%
  38. Llama 4 Maverick (rank 38): 85.5%
  39. GPT OSS 20B (rank 39): 85.3%
  40. Qwen3 VL 8B Thinking (rank 40): 85.2%
  41. o1-mini (rank 40): 85.2%
  42. Sarvam-30B (rank 42): 85.1%
  43. Qwen3 VL 30B A3B Instruct (rank 43): 85.0%
  44. Phi 4 (rank 44): 84.8%
  45. Mistral Large 2 (rank 45): 84.0%
  46. Llama 3.1 70B Instruct (rank 46): 83.6%
  47. Qwen2.5 32B Instruct (rank 47): 83.3%
  48. Qwen2 72B Instruct (rank 48): 82.3%
  49. GPT-4o mini (rank 49): 82.0%
  50. Qwen3 VL 4B Thinking (rank 50): 81.5%

Models tracked

Models with mmlu in their evaluation profile.
