IFEval Benchmark Leaderboard

IFEval (Instruction-Following Evaluation) is a benchmark for large language models that focuses on verifiable instructions. It covers 25 instruction types across roughly 500 prompts, each containing one or more verifiable constraints.
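
To make "verifiable constraint" concrete, here is a minimal sketch of how such constraints could be checked programmatically. The function names (`min_words`, `keyword_at_least`, `follows_all`, `prompt_level_accuracy`) and the two constraint types are hypothetical illustrations, not IFEval's actual code. This page does not state which IFEval metric its scores use, so the accuracy helper below assumes a prompt counts as passed only when every one of its constraints is satisfied.

```python
import re
from typing import Callable, List

Check = Callable[[str], bool]

# Hypothetical checkers for illustration only; not IFEval's implementation.
def min_words(n: int) -> Check:
    """Constraint: the response contains at least n whitespace-separated words."""
    return lambda response: len(response.split()) >= n

def keyword_at_least(keyword: str, times: int) -> Check:
    """Constraint: keyword occurs at least `times` times (whole word, case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return lambda response: len(pattern.findall(response)) >= times

def follows_all(response: str, checks: List[Check]) -> bool:
    """A prompt passes only if every attached constraint is satisfied."""
    return all(check(response) for check in checks)

def prompt_level_accuracy(results: List[bool]) -> float:
    """Assumed metric: percentage of prompts whose constraints all passed."""
    return 100.0 * sum(results) / len(results) if results else 0.0
```

Because each constraint is a deterministic function of the response text, scoring needs no human or model judge, which is what makes the instructions "verifiable".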

Leaderboard

Top 50 models on the IFEval benchmark, with scores taken from public evaluations.

  1. Qwen3.5-27B: 95.0%
  2. Qwen3.6 Plus: 94.3%
  3. o3-mini: 93.9%
  4. Qwen3.5-122B-A10B: 93.4%
  5. Claude 3.7 Sonnet: 93.2%
  6. Qwen3.5-397B-A17B: 92.6%
  7. Llama 3.3 70B Instruct: 92.1% (tied for rank 7)
  8. Nova Pro: 92.1% (tied for rank 7)
  9. Qwen3.5-35B-A3B: 91.9%
  10. Qwen3.5-9B: 91.5%
  11. Gemma 3 27B: 90.4%
  12. Nemotron Nano 9B v2: 90.3%
  13. Gemma 3 4B: 90.2%
  14. Kimi K2-Instruct-0905: 89.8% (tied for rank 14)
  15. Kimi K2 Instruct: 89.8% (tied for rank 14)
  16. Qwen3.5-4B: 89.8% (tied for rank 14)
  17. Nova Lite: 89.7%
  18. LongCat-Flash-Chat: 89.6%
  19. Llama 3.1 Nemotron Ultra 253B v1: 89.5%
  20. Gemma 3 12B: 88.9% (tied for rank 20)
  21. Qwen3-Next-80B-A3B-Thinking: 88.9% (tied for rank 20)
  22. Qwen3-235B-A22B-Instruct-2507: 88.7%
  23. Llama 3.1 405B Instruct: 88.6%
  24. GPT-4.5: 88.2% (tied for rank 24)
  25. Qwen3 VL 235B A22B Thinking: 88.2% (tied for rank 24)
  26. Qwen3-235B-A22B-Thinking-2507: 87.8% (tied for rank 26)
  27. Qwen3 VL 235B A22B Instruct: 87.8% (tied for rank 26)
  28. Qwen3 VL 32B Thinking: 87.8% (tied for rank 26)
  29. Qwen3-Next-80B-A3B-Instruct: 87.6%
  30. Llama 3.1 70B Instruct: 87.5%
  31. GPT-4.1: 87.4%
  32. Nova Micro: 87.2% (tied for rank 32)
  33. Kimi-k1.5: 87.2% (tied for rank 32)
  34. DeepSeek-V3: 86.1%
  35. Qwen3 VL 30B A3B Instruct: 85.8%
  36. Phi 4 Reasoning Plus: 84.9%
  37. Sarvam-105B: 84.8%
  38. Qwen3 VL 32B Instruct: 84.7%
  39. Qwen2.5 72B Instruct: 84.1% (tied for rank 39)
  40. GPT-4.1 mini: 84.1% (tied for rank 39)
  41. QwQ-32B: 83.9%
  42. Qwen3 VL 8B Instruct: 83.7%
  43. Phi 4 Reasoning: 83.4%
  44. Qwen3 VL 8B Thinking: 83.2%
  45. Mistral Small 3 24B Instruct: 82.9%
  46. Qwen3 VL 4B Thinking: 82.6%
  47. Qwen3 VL 4B Instruct: 82.3%
  48. Qwen3 VL 30B A3B Thinking: 81.7%
  49. GPT-4o: 81.0%
  50. Llama 3.1 8B Instruct: 80.4%
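
Tied scores in the list above share a rank, and the rank that follows a tie is skipped: two models at 92.1% both hold rank 7, and the next model is rank 9. That pattern is consistent with standard competition ranking. The sketch below shows one way to compute it; the function name `competition_rank` is a hypothetical choice, and this is an assumption about how the ranks could have been derived, not the leaderboard's actual code.

```python
from typing import List, Tuple

def competition_rank(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign competition ranks: equal scores share a rank, later ranks skip ahead."""
    # sorted() is stable, so tied models keep their input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]   # same score as previous entry: reuse its rank
        else:
            rank = position        # otherwise the rank is the 1-based position
        ranked.append((rank, name, score))
    return ranked

# Four consecutive entries taken from the list above.
sample = [
    ("Qwen3.5-397B-A17B", 92.6),
    ("Llama 3.3 70B Instruct", 92.1),
    ("Nova Pro", 92.1),
    ("Qwen3.5-35B-A3B", 91.9),
]
for rank, name, score in competition_rank(sample):
    print(f"{rank}. {name}: {score}%")
```

Within this four-entry sample the ranks come out as 1, 2, 2, 4, mirroring the 6, 7, 7, 9 pattern in the full list.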

