Humanity's Last Exam Benchmark Leaderboard

Humanity's Last Exam (HLE) is a multi-modal academic benchmark of 2,500 questions spanning mathematics, the humanities, and the natural sciences. It is designed to test LLM capabilities at the frontier of human knowledge, and every question has an unambiguous, verifiable solution.
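Because each question has a single verifiable answer, a model's score can be read as an accuracy: the share of questions it answers correctly. The sketch below shows that calculation under simplifying assumptions. It assumes one reference answer per question and grades by normalized exact-string match, which is a stand-in; the official HLE grading procedure may differ. The names `Item`, `normalize`, and `accuracy_percent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question with a single verifiable reference answer."""
    question_id: str
    reference: str


def normalize(answer: str) -> str:
    """Hypothetical normalization: trim, lowercase, collapse internal whitespace."""
    return " ".join(answer.strip().lower().split())


def accuracy_percent(items: list[Item], predictions: dict[str, str]) -> float:
    """Percentage of items whose prediction matches the reference after normalization.

    Questions with no prediction are counted as incorrect.
    """
    if not items:
        raise ValueError("no items to score")
    correct = sum(
        1
        for item in items
        if normalize(predictions.get(item.question_id, "")) == normalize(item.reference)
    )
    return 100.0 * correct / len(items)


if __name__ == "__main__":
    # Toy data for illustration only; these are not HLE questions.
    items = [Item("q1", "42"), Item("q2", "B"), Item("q3", "Paris"), Item("q4", "x^2")]
    predictions = {"q1": " 42 ", "q2": "b", "q3": "London"}  # q4 left unanswered
    print(accuracy_percent(items, predictions))  # 50.0
```

Every question carries equal weight. On the full 2,500-question set, each additional correct answer therefore adds 100 / 2,500 = 0.04 percentage points.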

Leaderboard

Top 50 models by accuracy on Humanity's Last Exam (scores from public evaluations; tied scores share a rank).

  1. Claude Mythos Preview: 64.7%
  2. Muse Spark: 58.4%
  3. GPT-5.5 Pro: 57.2%
  4. Claude Opus 4.7: 54.7%
  5. Claude Opus 4.6: 53.1%
  6. GLM-5.1: 52.3%
  7. GPT-5.5: 52.2%
  8. Gemini 3.1 Pro: 51.4%
  9. Kimi K2-Thinking-0905: 51.0%
  10. Grok-4 Heavy: 50.7%
  11. Kimi K2.5: 50.2%
  12. Claude Sonnet 4.6: 49.0%
  13. Qwen3.5-27B: 48.5%
  14. DeepSeek-V4-Pro-Max: 48.2%
  15. Qwen3.5-122B-A10B: 47.5%
  16. Qwen3.5-35B-A3B: 47.4%
  17. Gemini 3 Pro: 45.8%
  18. DeepSeek-V4-Flash-Max: 45.1%
  19. Gemini 3 Flash: 43.5%
  20. GLM-4.7: 42.8%
  21. DeepSeek-V3.2: 40.8%
  22. Gemini 3.5 Flash: 40.2%
  23. Grok-4: 40.0%
  24. GPT-5.4: 39.8%
  25. ERNIE 5.0: 39.0%
  26. GPT-5.2 Pro: 36.6%
  27. Kimi K2.6: 36.4%
  28. GPT-5.2: 34.5%
  29. DeepSeek-V3.2-Speciale: 30.6%
  30. Qwen3.6 Plus: 28.8%
  31. Qwen3.5-397B-A17B: 28.7%
  32. GPT-5.4 mini: 28.2%
  33. Gemma 4 31B: 26.5%
  34. LongCat-Flash-Thinking-2601: 25.2%
  35. DeepSeek-V3.2 (Thinking): 25.1%
  36. GPT-5: 24.8%
  37. GPT-5.4 nano: 24.3%
  38. Qwen3.6-27B: 24.0%
  39. Nemotron 3 Super (120B A12B): 22.8%
  40. MiMo-V2-Flash: 22.1%
  41. MiniMax M2.1: 22.0%
  42. Gemini 2.5 Pro Preview 06-05: 21.6%
  43. Qwen3.6-35B-A3B: 21.4%
  44. Grok 4 Fast: 20.0%
  45. DeepSeek-V3.2-Exp: 19.8%
  46. Qwen3-235B-A22B-Thinking-2507: 18.2%
  47. Gemini 2.5 Pro: 17.8%
  48. DeepSeek-R1-0528: 17.7%
  49. GLM-4.6: 17.2% (tied)
  50. Gemma 4 26B-A4B: 17.2% (tied for rank 49)
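GLM-4.6 and Gemma 4 26B-A4B both score 17.2% and share rank 49. This matches standard competition ranking ("1224" ranking), in which tied scores share the rank of their first position and the next distinct score skips ahead. The site does not state its tie-breaking rule, so the sketch below assumes that rule. It treats scores as tied only when the reported values are exactly equal. The function name `competition_ranks` is hypothetical.

```python
def competition_ranks(scores: list[tuple[str, float]]) -> list[tuple[int, str, float]]:
    """Sort (name, score) pairs by score, highest first, and assign competition ranks.

    Tied scores (exactly equal reported values) share the rank of the first tied
    position; the next distinct score takes its own 1-based position, skipping
    the ranks consumed by the tie. sorted() is stable, so tied entries keep
    their input order.
    """
    ordered = sorted(scores, key=lambda pair: pair[1], reverse=True)
    ranked: list[tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]  # same score as previous entry: reuse its rank
        else:
            rank = position       # new score: rank equals list position
        ranked.append((rank, name, score))
    return ranked


if __name__ == "__main__":
    # The last four entries of the leaderboard above.
    bottom_four = [
        ("Gemini 2.5 Pro", 17.8),
        ("DeepSeek-R1-0528", 17.7),
        ("GLM-4.6", 17.2),
        ("Gemma 4 26B-A4B", 17.2),
    ]
    for rank, name, score in competition_ranks(bottom_four):
        print(f"{rank}. {name}: {score}%")
```

The printed ranks (1, 2, 3, 3) are relative to that four-entry subset. Adding 46 recovers ranks 47, 48, 49, and 49 from the full list.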
