MMMU-Pro Benchmark Leaderboard
MMMU-Pro is a more robust multi-discipline multimodal understanding benchmark that strengthens MMMU through a three-step process: filtering out questions that can be answered from text alone, augmenting the candidate options, and introducing a vision-only input setting. Models score substantially lower on MMMU-Pro than on the original MMMU (16.8-26.9% lower), which makes it a more rigorous evaluation that more closely mimics real-world scenarios.
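The first step, filtering out questions answerable from text alone, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and not the benchmark's actual code: the `Question` record, the `text_only_predictions` field, and the `min_correct_rate` and `min_models` thresholds are hypothetical. The idea is to pose each question to several text-only models with the image withheld and drop the questions that most of them still answer correctly.

```python
from dataclasses import dataclass


@dataclass
class Question:
    qid: str
    answer: str
    # Hypothetical field: for each text-only model, the answers it gave
    # over repeated attempts with the image withheld.
    text_only_predictions: dict[str, list[str]]


def is_text_answerable(q: Question,
                       min_correct_rate: float = 0.5,
                       min_models: int = 3) -> bool:
    """True if at least `min_models` text-only models answer correctly
    in more than `min_correct_rate` of their attempts.
    Both thresholds are illustrative assumptions."""
    solved_by = 0
    for preds in q.text_only_predictions.values():
        if not preds:
            continue
        rate = sum(p == q.answer for p in preds) / len(preds)
        if rate > min_correct_rate:
            solved_by += 1
    return solved_by >= min_models


def filter_questions(questions: list[Question], **thresholds) -> list[Question]:
    """Keep only questions that appear to need the image."""
    return [q for q in questions if not is_text_answerable(q, **thresholds)]
```

With these assumed thresholds, a question that three text-only models reliably get right is dropped, while one they all miss is kept.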
Leaderboard
Top 49 models on the MMMU-Pro benchmark, ranked by score (scores from public evaluations). Tied models share a rank.
- 1. Gemini 3.5 Flash: 83.6%
- 2. GPT-5.5: 83.2%
- 3. Gemini 3 Flash: 81.2%
- 3. GPT-5.4: 81.2%
- 5. Gemini 3 Pro: 81.0%
- 6. Gemini 3.1 Pro: 80.5%
- 7. Muse Spark: 80.4%
- 8. Kimi K2.6: 80.1%
- 9. GPT-5.2: 79.5%
- 10. Qwen3.6 Plus: 78.8%
- 11. Kimi K2.5: 78.5%
- 12. GPT-5: 78.4%
- 13. Claude Opus 4.6: 77.3%
- 14. Qwen3.5-122B-A10B: 76.9%
- 14. Gemma 4 31B: 76.9%
- 16. Gemini 3.1 Flash-Lite: 76.8%
- 17. GPT-5.4 mini: 76.6%
- 18. o3: 76.4%
- 19. GPT-5.5 Instant: 76.0%
- 20. Qwen3.6-27B: 75.8%
- 21. Claude Sonnet 4.6: 75.6%
- 22. Qwen3.6-35B-A3B: 75.3%
- 23. Qwen3.5-35B-A3B: 75.1%
- 24. Qwen3.5-27B: 75.0%
- 25. Gemma 4 26B-A4B: 73.8%
- 26. Qwen3 VL 235B A22B Thinking: 69.3%
- 27. Qwen3 VL 235B A22B Instruct: 68.1%
- 27. Qwen3 VL 32B Thinking: 68.1%
- 29. GPT-5.4 nano: 66.1%
- 30. Qwen3 VL 32B Instruct: 65.3%
- 31. Qwen3 VL 30B A3B Thinking: 63.0%
- 32. Qwen3 VL 30B A3B Instruct: 60.4%
- 32. Qwen3 VL 8B Thinking: 60.4%
- 34. Mistral Small 4: 60.0%
- 35. GPT-4o: 59.9%
- 36. Llama 4 Maverick: 59.6%
- 37. Qwen3 VL 4B Thinking: 57.0%
- 38. Qwen3 VL 8B Instruct: 55.9%
- 39. Qwen3 VL 4B Instruct: 53.2%
- 40. Gemma 4 E4B: 52.6%
- 41. Qwen2.5 VL 72B Instruct: 51.1%
- 42. Qwen2.5 VL 32B Instruct: 49.5%
- 43. Qwen2-VL-72B-Instruct: 46.2%
- 44. Llama 3.2 90B Instruct: 45.2%
- 45. Gemma 4 E2B: 44.2%
- 46. Phi-4-multimodal-instruct: 38.5%
- 47. Qwen2.5 VL 7B Instruct: 38.3%
- 48. Qwen2.5-Omni-7B: 36.6%
- 49. Llama 3.2 11B Instruct: 33.0%
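The rank numbers in the list above skip after a tie (3, 3, 5 and 14, 14, 16), which matches standard competition ranking. The following is a minimal Python sketch of that scheme, assuming scores are already sorted from highest to lowest; the function name `competition_ranks` is illustrative and not taken from the leaderboard's own code.

```python
def competition_ranks(scores: list[float]) -> list[int]:
    """Assign 1-based ranks to scores sorted in descending order.

    Equal scores share a rank, and the next distinct score takes the
    rank given by its position (so 3, 3 is followed by 5, not 4).
    """
    ranks: list[int] = []
    for i, score in enumerate(scores):
        if i > 0 and score == scores[i - 1]:
            ranks.append(ranks[-1])   # tie: reuse the previous rank
        else:
            ranks.append(i + 1)       # otherwise rank equals position
    return ranks


# The five highest scores from the list above.
top_scores = [83.6, 83.2, 81.2, 81.2, 81.0]
print(competition_ranks(top_scores))  # [1, 2, 3, 3, 5]
```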