MMMU-Pro Benchmark Leaderboard

MMMU-Pro is a more robust version of the MMMU multi-discipline multimodal understanding benchmark. It is built from MMMU in three steps: filtering out questions that text-only models can answer, augmenting the candidate options, and introducing a vision-only input setting in which the question is embedded in the image. Models score substantially lower on MMMU-Pro than on the original MMMU (drops of 16.8% to 26.9%), so the benchmark provides a more rigorous evaluation that more closely mimics real-world scenarios.
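As an illustration of the first step only (removing questions that can be answered without the image), here is a minimal sketch. It is not the benchmark authors' pipeline: the question layout, the toy "models", and the `min_correct` threshold are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical question layout: {"id", "text", "options", "answer"}.
Question = Dict[str, object]
# A text-only "model" sees the question text and options, never the image.
TextOnlyModel = Callable[[str, List[str]], str]


def is_text_answerable(question: Question,
                       models: List[TextOnlyModel],
                       min_correct: int) -> bool:
    """True if at least `min_correct` text-only models pick the right option.

    `min_correct` is an assumed threshold, not a value from the benchmark.
    """
    correct = sum(
        1 for model in models
        if model(question["text"], question["options"]) == question["answer"]
    )
    return correct >= min_correct


def filter_text_answerable(questions: List[Question],
                           models: List[TextOnlyModel],
                           min_correct: int) -> List[Question]:
    """Keep only questions that the text-only models fail to answer."""
    return [q for q in questions
            if not is_text_answerable(q, models, min_correct)]


# Toy stand-ins for text-only language models (purely illustrative).
def always_first(text: str, options: List[str]) -> str:
    return options[0]


def always_last(text: str, options: List[str]) -> str:
    return options[-1]


def keyword_guesser(text: str, options: List[str]) -> str:
    # Picks the first option that literally appears in the question text.
    for option in options:
        if option.lower() in text.lower():
            return option
    return options[0]


questions = [
    {"id": "q1",
     "text": "The capital of France is Paris. Which city is the capital of France?",
     "options": ["Berlin", "Paris"], "answer": "Paris"},
    {"id": "q2",
     "text": "What trend does the chart in the image show?",
     "options": ["growth", "decline"], "answer": "decline"},
]

models = [always_first, always_last, keyword_guesser]
kept = filter_text_answerable(questions, models, min_correct=2)
print([q["id"] for q in kept])
```

Here q1 is dropped because two of the three toy models answer it from the text alone, while q2 is kept because only one does. A real pipeline would replace the toy callables with actual text-only language models.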

Leaderboard

Top 49 models on MMMU-Pro Benchmark Leaderboard (scores from public evaluations).

  1. Gemini 3.5 Flash: 83.6%
  2. GPT-5.5: 83.2%
  3. Gemini 3 Flash: 81.2% (tied for rank 3)
  4. GPT-5.4: 81.2% (tied for rank 3)
  5. Gemini 3 Pro: 81.0%
  6. Gemini 3.1 Pro: 80.5%
  7. Muse Spark: 80.4%
  8. Kimi K2.6: 80.1%
  9. GPT-5.2: 79.5%
  10. Qwen3.6 Plus: 78.8%
  11. Kimi K2.5: 78.5%
  12. GPT-5: 78.4%
  13. Claude Opus 4.6: 77.3%
  14. Qwen3.5-122B-A10B: 76.9% (tied for rank 14)
  15. Gemma 4 31B: 76.9% (tied for rank 14)
  16. Gemini 3.1 Flash-Lite: 76.8%
  17. GPT-5.4 mini: 76.6%
  18. o3: 76.4%
  19. GPT-5.5 Instant: 76.0%
  20. Qwen3.6-27B: 75.8%
  21. Claude Sonnet 4.6: 75.6%
  22. Qwen3.6-35B-A3B: 75.3%
  23. Qwen3.5-35B-A3B: 75.1%
  24. Qwen3.5-27B: 75.0%
  25. Gemma 4 26B-A4B: 73.8%
  26. Qwen3 VL 235B A22B Thinking: 69.3%
  27. Qwen3 VL 235B A22B Instruct: 68.1% (tied for rank 27)
  28. Qwen3 VL 32B Thinking: 68.1% (tied for rank 27)
  29. GPT-5.4 nano: 66.1%
  30. Qwen3 VL 32B Instruct: 65.3%
  31. Qwen3 VL 30B A3B Thinking: 63.0%
  32. Qwen3 VL 30B A3B Instruct: 60.4% (tied for rank 32)
  33. Qwen3 VL 8B Thinking: 60.4% (tied for rank 32)
  34. Mistral Small 4: 60.0%
  35. GPT-4o: 59.9%
  36. Llama 4 Maverick: 59.6%
  37. Qwen3 VL 4B Thinking: 57.0%
  38. Qwen3 VL 8B Instruct: 55.9%
  39. Qwen3 VL 4B Instruct: 53.2%
  40. Gemma 4 E4B: 52.6%
  41. Qwen2.5 VL 72B Instruct: 51.1%
  42. Qwen2.5 VL 32B Instruct: 49.5%
  43. Qwen2-VL-72B-Instruct: 46.2%
  44. Llama 3.2 90B Instruct: 45.2%
  45. Gemma 4 E2B: 44.2%
  46. Phi-4-multimodal-instruct: 38.5%
  47. Qwen2.5 VL 7B Instruct: 38.3%
  48. Qwen2.5-Omni-7B: 36.6%
  49. Llama 3.2 11B Instruct: 33.0%
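
Models with equal scores in the list above share a rank, and the following rank is skipped (for example 3, 3, 5). That pattern matches standard competition ranking. The sketch below reproduces it for the first five entries; the function name and data layout are assumptions for illustration, not the leaderboard site's code.

```python
from typing import List, Tuple


def competition_ranks(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign standard competition ranks ("1, 2, 3, 3, 5") by descending score."""
    # sorted() is stable, so tied models keep their input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]   # tie: reuse the previous rank
        else:
            rank = position        # no tie: rank equals list position
        ranked.append((rank, name, score))
    return ranked


# First five rows of the leaderboard above.
rows = [
    ("Gemini 3.5 Flash", 83.6),
    ("GPT-5.5", 83.2),
    ("Gemini 3 Flash", 81.2),
    ("GPT-5.4", 81.2),
    ("Gemini 3 Pro", 81.0),
]

for rank, name, score in competition_ranks(rows):
    print(f"{rank}. {name}: {score}%")
```

Because rank 4 is never assigned, the rank of a model always equals one plus the number of models that scored strictly higher.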
