MathVista Benchmark Leaderboard
MathVista evaluates mathematical reasoning of foundation models in visual contexts. It consists of 6,141 examples derived from 28 existing multimodal datasets and 3 newly created datasets (IQTest, FunctionQA, and PaperQA), combining challenges from diverse mathematical and visual tasks to assess models' ability to understand complex figures and perform rigorous reasoning.
Leaderboard
Top 36 models on MathVista Benchmark Leaderboard (scores from public evaluations).
- 1. o3: 86.8%
- 2. o4-mini: 84.3%
- 3. Step3-VL-10B: 84.0%
- 4. Kimi-k1.5: 74.9%
- 5. Llama 4 Maverick: 73.7%
- 6. GPT-4.1 mini: 73.1%
- 7. GPT-4.5: 72.3%
- 8. GPT-4.1: 72.2%
- 9. o1: 71.8%
- 10. QvQ-72B-Preview: 71.4%
- 11. Llama 4 Scout: 70.7%
- 12. Pixtral Large: 69.4%
- 13. Grok-2: 69.0%
- 14. Grok-2 mini: 68.1%
- 14. Gemini 1.5 Pro: 68.1%
- 16. Qwen2.5-Omni-7B: 67.9%
- 17. Claude 3.5 Sonnet: 67.7%
- 18. Mistral Small 3.2 24B Instruct: 67.1%
- 19. Gemini 1.5 Flash: 65.8%
- 20. GPT-4o: 63.8%
- 21. DeepSeek VL2: 62.8%
- 22. Phi-4-multimodal-instruct: 62.4%
- 23. GPT-4o: 61.4%
- 24. DeepSeek VL2 Small: 60.7%
- 25. Pixtral-12B: 58.0%
- 26. Llama 3.2 90B Instruct: 57.3%
- 27. GPT-4o mini: 56.7%
- 28. GPT-4.1 nano: 56.2%
- 29. Gemini 1.5 Flash 8B: 54.7%
- 30. DeepSeek VL2 Tiny: 53.6%
- 31. Grok-1.5: 52.8%
- 31. Grok-1.5V: 52.8%
- 33. Llama 3.2 11B Instruct: 51.5%
- 34. Gemini 1.0 Pro: 46.6%
- 35. Phi-3.5-vision-instruct: 43.9%
- 36. GPT-3.5 Turbo: 0.0%
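Tied scores share a rank in the list above, and the next rank skips ahead. For example, two models sit at 14th with 68.1%, and the next model is 16th. This pattern matches standard competition ranking ("1224" style). The sketch below reproduces that convention from raw scores. It is an assumption inferred from the tied ranks, not the leaderboard's confirmed method, and the helper name `competition_rank` is hypothetical. The sample takes four scores from the list and re-ranks them within that subset, so the ranks start at 1.

```python
from typing import List, Tuple


def competition_rank(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Hypothetical helper: assign standard competition ranks ("1224" style).

    Tied scores share the same rank, and the next distinct score gets a rank
    equal to its 1-based position in the sorted order (so ranks are skipped).
    """
    # Sort by score, highest first; Python's sort is stable, so ties keep input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    prev_score = None
    prev_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != prev_score:
            # New distinct score: its rank is its position in the sorted list.
            prev_rank = position
            prev_score = score
        ranked.append((prev_rank, name, score))
    return ranked


# Four scores taken from the leaderboard, re-ranked within this subset only.
sample = [
    ("Grok-2", 69.0),
    ("Grok-2 mini", 68.1),
    ("Gemini 1.5 Pro", 68.1),
    ("Qwen2.5-Omni-7B", 67.9),
]

for rank, name, score in competition_rank(sample):
    print(f"{rank}. {name}: {score:.1f}%")
```

The two 68.1% entries share rank 2, and the next entry is ranked 4. This mirrors how 14th is followed by 16th in the full list.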