Humanity's Last Exam Benchmark Leaderboard
Humanity's Last Exam (HLE) is a multimodal academic benchmark of 2,500 questions spanning mathematics, the humanities, and the natural sciences. It is designed to test LLM capabilities at the frontier of human knowledge, and every question has an unambiguous, verifiable solution.
Leaderboard
Top 50 models on Humanity's Last Exam, ranked by score (scores taken from public evaluations).
- #1 Claude Mythos Preview: 64.7%
- #2 Muse Spark: 58.4%
- #3 GPT-5.5 Pro: 57.2%
- #4 Claude Opus 4.7: 54.7%
- #5 Claude Opus 4.6: 53.1%
- #6 GLM-5.1: 52.3%
- #7 GPT-5.5: 52.2%
- #8 Gemini 3.1 Pro: 51.4%
- #9 Kimi K2-Thinking-0905: 51.0%
- #10 Grok-4 Heavy: 50.7%
- #11 Kimi K2.5: 50.2%
- #12 Claude Sonnet 4.6: 49.0%
- #13 Qwen3.5-27B: 48.5%
- #14 DeepSeek-V4-Pro-Max: 48.2%
- #15 Qwen3.5-122B-A10B: 47.5%
- #16 Qwen3.5-35B-A3B: 47.4%
- #17 Gemini 3 Pro: 45.8%
- #18 DeepSeek-V4-Flash-Max: 45.1%
- #19 Gemini 3 Flash: 43.5%
- #20 GLM-4.7: 42.8%
- #21 DeepSeek-V3.2: 40.8%
- #22 Gemini 3.5 Flash: 40.2%
- #23 Grok-4: 40.0%
- #24 GPT-5.4: 39.8%
- #25 ERNIE 5.0: 39.0%
- #26 GPT-5.2 Pro: 36.6%
- #27 Kimi K2.6: 36.4%
- #28 GPT-5.2: 34.5%
- #29 DeepSeek-V3.2-Speciale: 30.6%
- #30 Qwen3.6 Plus: 28.8%
- #31 Qwen3.5-397B-A17B: 28.7%
- #32 GPT-5.4 mini: 28.2%
- #33 Gemma 4 31B: 26.5%
- #34 LongCat-Flash-Thinking-2601: 25.2%
- #35 DeepSeek-V3.2 (Thinking): 25.1%
- #36 GPT-5: 24.8%
- #37 GPT-5.4 nano: 24.3%
- #38 Qwen3.6-27B: 24.0%
- #39 Nemotron 3 Super (120B A12B): 22.8%
- #40 MiMo-V2-Flash: 22.1%
- #41 MiniMax M2.1: 22.0%
- #42 Gemini 2.5 Pro Preview 06-05: 21.6%
- #43 Qwen3.6-35B-A3B: 21.4%
- #44 Grok 4 Fast: 20.0%
- #45 DeepSeek-V3.2-Exp: 19.8%
- #46 Qwen3-235B-A22B-Thinking-2507: 18.2%
- #47 Gemini 2.5 Pro: 17.8%
- #48 DeepSeek-R1-0528: 17.7%
- #49 GLM-4.6: 17.2%
- #49 Gemma 4 26B-A4B: 17.2%
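Two models share rank 49 because they tie at 17.2%. The sketch below shows one way such ranks can be assigned from raw scores, assuming standard competition ranking ("1224" style: tied entries share a rank and the following rank is skipped). The page does not say which tie rule it uses, and `competition_rank` is a hypothetical helper, not part of any leaderboard API.

```python
from typing import List, Tuple


def competition_rank(entries: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign standard competition ranks: tied scores share a rank, and the
    next distinct score gets its 1-based position in the sorted list."""
    # Sort by score, highest first. Python's sort is stable (also with
    # reverse=True), so tied entries keep their input order.
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for i, (name, score) in enumerate(ordered):
        if i > 0 and score == ordered[i - 1][1]:
            rank = ranked[-1][0]  # same score as the previous entry: reuse its rank
        else:
            rank = i + 1  # otherwise the rank is the 1-based sorted position
        ranked.append((rank, name, score))
    return ranked


if __name__ == "__main__":
    # Four rows from the tail of the leaderboard, deliberately out of order.
    sample = [
        ("DeepSeek-R1-0528", 17.7),
        ("GLM-4.6", 17.2),
        ("Gemini 2.5 Pro", 17.8),
        ("Gemma 4 26B-A4B", 17.2),
    ]
    for rank, name, score in competition_rank(sample):
        print(f"{rank}. {name}: {score:.1f}%")
```

On this four-row sample the ranks come out as 1, 2, 3, 3. Within the full list the same rule yields 47, 48, 49, 49. With only one tie on the board, a dense-ranking rule would produce identical numbers, so the example cannot distinguish between the two.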