IFEval Benchmark Leaderboard
The Instruction-Following Evaluation (IFEval) benchmark tests how well large language models follow verifiable instructions. It covers 25 instruction types across roughly 500 prompts, each containing one or more verifiable constraints.
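To make "verifiable constraints" concrete, here is a minimal sketch of how such constraints could be checked programmatically. The helper names (`min_words`, `keyword_at_least`, `follows_all`, `accuracy`) and the thresholds are hypothetical and chosen for illustration; this is not the benchmark's own checker, and the percentage it computes is only one plausible way to aggregate results.

```python
import re
from typing import Callable, List

# Hypothetical checkers for illustration only; not IFEval's actual code.
Check = Callable[[str], bool]


def min_words(n: int) -> Check:
    """Constraint: the response has at least n whitespace-separated words."""
    return lambda response: len(response.split()) >= n


def keyword_at_least(keyword: str, times: int) -> Check:
    """Constraint: keyword occurs at least `times` times (whole word, any case)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return lambda response: len(pattern.findall(response)) >= times


def follows_all(response: str, checks: List[Check]) -> bool:
    """A prompt counts as followed only if every attached constraint passes."""
    return all(check(response) for check in checks)


def accuracy(results: List[bool]) -> float:
    """Percentage of prompts whose constraints were all satisfied."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Assumed example: one prompt with two constraints, answered by two responses.
checks = [min_words(5), keyword_at_least("AI", 2)]
responses = [
    "AI systems help people, and AI tools keep improving.",
    "Short reply about AI.",
]
results = [follows_all(r, checks) for r in responses]
print(results, accuracy(results))
```

Because each constraint is a plain function from response text to a boolean, a prompt with several constraints can be scored without a human or model judge, which is what makes the instructions verifiable.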
Leaderboard
Top 50 models on the IFEval benchmark, ranked by score (scores from public evaluations).
- 1. Qwen3.5-27B: 95.0%
- 2. Qwen3.6 Plus: 94.3%
- 3. o3-mini: 93.9%
- 4. Qwen3.5-122B-A10B: 93.4%
- 5. Claude 3.7 Sonnet: 93.2%
- 6. Qwen3.5-397B-A17B: 92.6%
- 7. Llama 3.3 70B Instruct: 92.1%
- 7. Nova Pro: 92.1%
- 9. Qwen3.5-35B-A3B: 91.9%
- 10. Qwen3.5-9B: 91.5%
- 11. Gemma 3 27B: 90.4%
- 12. Nemotron Nano 9B v2: 90.3%
- 13. Gemma 3 4B: 90.2%
- 14. Kimi K2-Instruct-0905: 89.8%
- 14. Kimi K2 Instruct: 89.8%
- 14. Qwen3.5-4B: 89.8%
- 17. Nova Lite: 89.7%
- 18. LongCat-Flash-Chat: 89.6%
- 19. Llama 3.1 Nemotron Ultra 253B v1: 89.5%
- 20. Gemma 3 12B: 88.9%
- 20. Qwen3-Next-80B-A3B-Thinking: 88.9%
- 22. Qwen3-235B-A22B-Instruct-2507: 88.7%
- 23. Llama 3.1 405B Instruct: 88.6%
- 24. GPT-4.5: 88.2%
- 24. Qwen3 VL 235B A22B Thinking: 88.2%
- 26. Qwen3-235B-A22B-Thinking-2507: 87.8%
- 26. Qwen3 VL 235B A22B Instruct: 87.8%
- 26. Qwen3 VL 32B Thinking: 87.8%
- 29. Qwen3-Next-80B-A3B-Instruct: 87.6%
- 30. Llama 3.1 70B Instruct: 87.5%
- 31. GPT-4.1: 87.4%
- 32. Nova Micro: 87.2%
- 32. Kimi-k1.5: 87.2%
- 34. DeepSeek-V3: 86.1%
- 35. Qwen3 VL 30B A3B Instruct: 85.8%
- 36. Phi 4 Reasoning Plus: 84.9%
- 37. Sarvam-105B: 84.8%
- 38. Qwen3 VL 32B Instruct: 84.7%
- 39. Qwen2.5 72B Instruct: 84.1%
- 39. GPT-4.1 mini: 84.1%
- 41. QwQ-32B: 83.9%
- 42. Qwen3 VL 8B Instruct: 83.7%
- 43. Phi 4 Reasoning: 83.4%
- 44. Qwen3 VL 8B Thinking: 83.2%
- 45. Mistral Small 3 24B Instruct: 82.9%
- 46. Qwen3 VL 4B Thinking: 82.6%
- 47. Qwen3 VL 4B Instruct: 82.3%
- 48. Qwen3 VL 30B A3B Thinking: 81.7%
- 49. GPT-4o: 81.0%
- 50. Llama 3.1 8B Instruct: 80.4%
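The list above gives tied models the same rank and then skips ahead (two models at rank 7 are followed by rank 9). That pattern matches standard competition ranking. The sketch below shows one way to compute it; the function name `competition_rank` is hypothetical, the sample uses four entries taken from the list, and the ranks it produces are relative to the sample rather than to the full leaderboard.

```python
from typing import List, Tuple


def competition_rank(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Rank by descending score; ties share a rank and the next rank is skipped."""
    # sorted() is stable, so tied models keep their input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous entry: reuse its rank
        else:
            rank = position  # otherwise the rank is the 1-based position
        ranked.append((rank, name, score))
    return ranked


# Four entries from the leaderboard, including one tie at 92.1%.
sample = [
    ("Qwen3.5-397B-A17B", 92.6),
    ("Llama 3.3 70B Instruct", 92.1),
    ("Nova Pro", 92.1),
    ("Qwen3.5-35B-A3B", 91.9),
]
for rank, name, score in competition_rank(sample):
    print(f"{rank}. {name}: {score}%")
```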