MathVista Benchmark Leaderboard

MathVista evaluates the mathematical reasoning of foundation models in visual contexts. It consists of 6,141 examples derived from 28 existing multimodal datasets and 3 newly created datasets (IQTest, FunctionQA, and PaperQA). The examples combine challenges from diverse mathematical and visual tasks to assess how well models understand complex figures and reason rigorously.

Leaderboard

The 36 models ranked on the MathVista benchmark, by accuracy (scores from public evaluations). Models with equal scores share a rank.

  1. o3: 86.8%
  2. o4-mini: 84.3%
  3. Step3-VL-10B: 84.0%
  4. Kimi-k1.5: 74.9%
  5. Llama 4 Maverick: 73.7%
  6. GPT-4.1 mini: 73.1%
  7. GPT-4.5: 72.3%
  8. GPT-4.1: 72.2%
  9. o1: 71.8%
  10. QvQ-72B-Preview: 71.4%
  11. Llama 4 Scout: 70.7%
  12. Pixtral Large: 69.4%
  13. Grok-2: 69.0%
  14. Grok-2 mini: 68.1% (tied)
  14. Gemini 1.5 Pro: 68.1% (tied)
  16. Qwen2.5-Omni-7B: 67.9%
  17. Claude 3.5 Sonnet: 67.7%
  18. Mistral Small 3.2 24B Instruct: 67.1%
  19. Gemini 1.5 Flash: 65.8%
  20. GPT-4o: 63.8%
  21. DeepSeek VL2: 62.8%
  22. Phi-4-multimodal-instruct: 62.4%
  23. GPT-4o: 61.4%
  24. DeepSeek VL2 Small: 60.7%
  25. Pixtral-12B: 58.0%
  26. Llama 3.2 90B Instruct: 57.3%
  27. GPT-4o mini: 56.7%
  28. GPT-4.1 nano: 56.2%
  29. Gemini 1.5 Flash 8B: 54.7%
  30. DeepSeek VL2 Tiny: 53.6%
  31. Grok-1.5: 52.8% (tied)
  31. Grok-1.5V: 52.8% (tied)
  33. Llama 3.2 11B Instruct: 51.5%
  34. Gemini 1.0 Pro: 46.6%
  35. Phi-3.5-vision-instruct: 43.9%
  36. GPT-3.5 Turbo: 0.0%
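
The ranks in the list above appear to follow standard competition ranking: models with equal scores share a rank, and the next rank is skipped (14, 14, 16 and 31, 31, 33). The Python sketch below shows one way such ranks could be computed. The function name `competition_rank` and its `start` parameter are hypothetical, introduced only for illustration; this is not the leaderboard's actual code.

```python
def competition_rank(entries, start=1):
    """Rank (model, score) pairs from highest to lowest score.

    Equal scores share a rank and the following rank is skipped.
    `start` is the rank given to the first entry (an illustrative
    parameter so a slice of a longer list can be ranked in place).
    """
    # sorted() is stable, so tied models keep their input order.
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
    ranked = []
    for position, (model, score) in enumerate(ordered, start=start):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous entry: reuse its rank
        else:
            rank = position  # otherwise the rank is the position in the list
        ranked.append((rank, model, score))
    return ranked


# Four consecutive entries copied from the list above, starting at rank 13.
sample = [
    ("Grok-2", 69.0),
    ("Grok-2 mini", 68.1),
    ("Gemini 1.5 Pro", 68.1),
    ("Qwen2.5-Omni-7B", 67.9),
]

for rank, model, score in competition_rank(sample, start=13):
    print(f"{rank}. {model}: {score}%")
```

Scores are compared for exact equality here, which is adequate for values entered to one decimal place; scores produced by arithmetic would need rounding before comparison.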
