MMMU Benchmark Leaderboard

MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks. The questions cover six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering), spanning 30 subjects and 183 subfields.

Leaderboard

The top 50 models on the MMMU benchmark, with scores taken from public evaluations. Models with equal scores share a rank.

  1. Qwen3.6 Plus: 86.0%
  2. GPT-5.1 Instant: 85.4% (tied for rank 2)
  3. GPT-5.1: 85.4% (tied for rank 2)
  4. GPT-5.1 Thinking: 85.4% (tied for rank 2)
  5. GPT-5: 84.2%
  6. Qwen3.5-122B-A10B: 83.9%
  7. Qwen3.6-27B: 82.9% (tied for rank 7)
  8. o3: 82.9% (tied for rank 7)
  9. Qwen3.5-27B: 82.3%
  10. Gemini 2.5 Pro Preview 06-05: 82.0%
  11. Qwen3.6-35B-A3B: 81.7%
  12. o4-mini: 81.6%
  13. Qwen3.5-35B-A3B: 81.4%
  14. Gemini 2.5 Flash: 79.7%
  15. Gemini 2.5 Pro: 79.6%
  16. Step3-VL-10B: 78.1%
  17. Grok-3: 78.0%
  18. o1: 77.6%
  19. Gemini 2.0 Flash Thinking: 75.4%
  20. GPT-4.5: 75.2%
  21. Claude 3.7 Sonnet: 75.0%
  22. GPT-4.1: 74.8%
  23. Claude Sonnet 4: 74.4%
  24. Llama 4 Maverick: 73.4%
  25. Gemini 2.5 Flash-Lite: 72.9%
  26. GPT-4.1 mini: 72.7%
  27. GPT-4o: 72.2%
  28. Gemini 2.0 Flash: 70.7%
  29. QvQ-72B-Preview: 70.3%
  30. Qwen2.5 VL 72B Instruct: 70.2%
  31. Qwen2.5 VL 32B Instruct: 70.0% (tied for rank 31)
  32. Kimi-k1.5: 70.0% (tied for rank 31)
  33. Llama 4 Scout: 69.4%
  34. Claude 3.5 Sonnet: 68.3%
  35. Gemini 2.0 Flash-Lite: 68.0%
  36. Grok-2: 66.1%
  37. Gemini 1.5 Pro: 65.9%
  38. Pixtral Large: 64.0%
  39. Grok-2 mini: 63.2%
  40. Mistral Small 3.2 24B Instruct: 62.5%
  41. Gemini 1.5 Flash: 62.3%
  42. Nova Pro: 61.7%
  43. Llama 3.2 90B Instruct: 60.3%
  44. GPT-4o mini: 59.4%
  45. Mistral Small 3.1 24B Instruct: 59.3% (tied for rank 45)
  46. Mistral Small 3.1 24B Base: 59.3% (tied for rank 45)
  47. Qwen2.5-Omni-7B: 59.2%
  48. Qwen2.5 VL 7B Instruct: 58.6%
  49. Nova Lite: 56.2%
  50. GPT-4.1 nano: 55.4%
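
The ranks in the list above appear to follow standard competition ranking: models with equal scores share a rank, and the next distinct score takes the rank of its position (so three models at rank 2 are followed by rank 5). The following is a minimal sketch of that scheme, assuming scores are already sorted in descending order; the function name `competition_ranks` is hypothetical and not part of any leaderboard tooling.

```python
def competition_ranks(scores):
    """Return 1-based ranks for scores sorted in descending order.

    Tied scores share the rank of the first tied entry; the next
    distinct score is ranked by its position in the list.
    """
    ranks = []
    for position, score in enumerate(scores, start=1):
        if ranks and score == scores[position - 2]:
            # Same score as the previous entry: reuse its rank.
            ranks.append(ranks[-1])
        else:
            # New score: the rank equals the 1-based position.
            ranks.append(position)
    return ranks


# Scores of the first five entries in the list above.
top_five = [86.0, 85.4, 85.4, 85.4, 84.2]
print(competition_ranks(top_five))  # prints [1, 2, 2, 2, 5]
```

Applied to the first five scores, this reproduces the rank sequence 1, 2, 2, 2, 5 shown in the leaderboard.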
