GSM8k Benchmark Leaderboard
GSM8k (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems that require multi-step reasoning and elementary arithmetic operations.
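The percentages on this page are accuracy scores. As a rough illustration of how such a score can be computed, here is a minimal sketch that assumes a problem counts as solved when the last number in the model's output equals the reference answer. In the published dataset, reference solutions end with a line of the form `#### <answer>`. The helper names `final_number` and `accuracy` and the toy data are hypothetical, and the evaluations behind the listed scores may use different prompting and answer extraction.

```python
import re


def final_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators such as the comma in "1,200".
    return float(matches[-1].replace(",", ""))


def accuracy(predictions, references):
    """Percentage of predictions whose final number matches the reference's."""
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = final_number(pred), final_number(ref)
        if p is not None and r is not None and abs(p - r) < 1e-6:
            correct += 1
    return 100.0 * correct / len(references)


# Hypothetical toy data, not taken from the dataset.
references = [
    "4 + 3 = 7 apples. 7 * 2 = 14 #### 14",
    "1,200 / 4 = 300 #### 300",
]
predictions = [
    "She has 7 apples, doubled is 14.",
    "The answer is 250.",
]

print(accuracy(predictions, references))
```

Comparing only the final number keeps the check independent of how the intermediate reasoning is worded, at the cost of accepting a right answer reached by wrong steps.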
Leaderboard
The top 47 models on the GSM8k benchmark, ranked by score (scores from public evaluations).
- 1. Kimi K2 Instruct: 97.3%
- 2. o1: 97.1%
- 3. GPT-4.5: 97.0%
- 4. Llama 3.1 405B Instruct: 96.8%
- 5. Claude 3.5 Sonnet: 96.4%
- 5. Claude 3.5 Sonnet: 96.4%
- 7. Gemma 3 27B: 95.9%
- 7. Qwen2.5 32B Instruct: 95.9%
- 9. Qwen2.5 72B Instruct: 95.8%
- 10. DeepSeek-V2.5: 95.1%
- 11. Claude 3 Opus: 95.0%
- 12. Nova Pro: 94.8%
- 12. Qwen2.5 14B Instruct: 94.8%
- 14. Nova Lite: 94.5%
- 15. Gemma 3 12B: 94.4%
- 16. Qwen3 235B A22B: 94.4%
- 17. Mistral Large 2: 93.0%
- 18. Claude 3 Sonnet: 92.3%
- 18. Nova Micro: 92.3%
- 20. Kimi K2 Base: 92.1%
- 21. Qwen2.5 7B Instruct: 91.6%
- 22. Llama 3.1 Nemotron 70B Instruct: 91.4%
- 23. Qwen2.5-Coder 32B Instruct: 91.1%
- 23. Qwen2 72B Instruct: 91.1%
- 25. Gemini 1.5 Pro: 90.8%
- 26. Grok-1.5: 90.0%
- 27. Gemma 3 4B: 89.2%
- 28. Claude 3 Haiku: 88.9%
- 29. Phi-3.5-MoE-instruct: 88.7%
- 29. Qwen2.5-Omni-7B: 88.7%
- 31. Phi 4 Mini: 88.6%
- 32. Jamba 1.5 Large: 87.0%
- 33. Phi-3.5-mini-instruct: 86.2%
- 33. Gemini 1.5 Flash: 86.2%
- 35. Qwen2.5-Coder 7B Instruct: 83.9%
- 36. Qwen2 7B Instruct: 82.3%
- 37. Granite 3.3 8B Instruct: 80.9%
- 38. Mistral Small 3 24B Base: 80.7%
- 39. Llama 3.2 3B Instruct: 77.7%
- 40. Jamba 1.5 Mini: 75.8%
- 41. Gemma 2 27B: 74.0%
- 42. Command R+: 70.7%
- 43. IBM Granite 4.0 Tiny Preview: 70.1%
- 44. Gemma 2 9B: 68.6%
- 45. Gemma 3 1B: 62.8%
- 46. Granite 3.3 8B Base: 59.0%
- 47. ERNIE 4.5: 25.2%