LiveCodeBench Benchmark Leaderboard
LiveCodeBench is a holistic, contamination-free benchmark for evaluating large language models on code. It continuously collects new problems from programming contests (LeetCode, AtCoder, CodeForces) and evaluates models in four scenarios: code generation, self-repair, code execution, and test output prediction. Each problem is annotated with its release date, so a model can be evaluated only on problems released after its training cutoff.
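The release-date filtering described above can be sketched as follows. This is a minimal illustration, not LiveCodeBench's own code: the `Problem` record, its field names, the sample problems, and the cutoff date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative, not LiveCodeBench's schema.
    title: str
    source: str
    release_date: date


def unseen_problems(problems, training_cutoff):
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


# Made-up sample problems, one per contest platform named in the text.
problems = [
    Problem("two-pointer-warmup", "LeetCode", date(2023, 6, 1)),
    Problem("grid-paths", "AtCoder", date(2024, 2, 10)),
    Problem("interval-merge", "CodeForces", date(2024, 9, 5)),
]

cutoff = date(2024, 1, 1)  # assumed training cutoff for a hypothetical model
fresh = unseen_problems(problems, cutoff)
print([p.title for p in fresh])
```

Scoring a model only on `fresh` is what makes the comparison contamination-free in principle: none of those problems existed when the model's training data was collected.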
Leaderboard
Top 50 models on the LiveCodeBench Benchmark Leaderboard (scores from public evaluations).
- 1. DeepSeek-V4-Pro-Max: 93.5%
- 2. DeepSeek-V4-Flash-Max: 91.6%
- 3. DeepSeek-V3.2 (Thinking): 83.3%
- 3. DeepSeek-V3.2: 83.3%
- 5. MiniMax M2: 83.0%
- 6. LongCat-Flash-Thinking-2601: 82.8%
- 7. Nemotron 3 Super (120B A12B): 81.2%
- 8. Grok-3 Mini: 80.4%
- 9. Grok 4 Fast: 80.0%
- 10. Grok-3: 79.4%
- 10. Grok-4 Heavy: 79.4%
- 10. LongCat-Flash-Thinking: 79.4%
- 13. Grok-4: 79.0%
- 14. MiniMax M2.1: 78.0%
- 15. DeepSeek-V3.2-Exp: 74.1%
- 16. DeepSeek-R1-0528: 73.3%
- 17. GLM-4.5: 72.9%
- 18. Nemotron Nano 9B v2: 71.1%
- 19. Qwen3 235B A22B: 70.7%
- 19. GLM-4.5-Air: 70.7%
- 21. Gemini 2.5 Pro Preview 06-05: 69.0%
- 22. Mercury 2: 67.0%
- 23. Llama 3.1 Nemotron Ultra 253B v1: 66.3%
- 24. Qwen3 32B: 65.7%
- 25. MiniMax M1 80K: 65.0%
- 26. Ministral 3 (14B Reasoning 2512): 64.6%
- 27. Mistral Small 4: 63.6%
- 28. QwQ-32B: 63.4%
- 29. Qwen3 30B A3B: 62.6%
- 30. MiniMax M1 40K: 62.3%
- 31. Ministral 3 (8B Reasoning 2512): 61.6%
- 32. DeepSeek R1 Distill Llama 70B: 57.5%
- 33. DeepSeek R1 Distill Qwen 32B: 57.2%
- 34. DeepSeek-V3.1: 56.4%
- 35. Qwen2.5 72B Instruct: 55.5%
- 36. Ministral 3 (3B Reasoning 2512): 54.8%
- 37. Phi 4 Reasoning: 53.8%
- 38. Kimi K2-Instruct-0905: 53.7%
- 39. Phi 4 Reasoning Plus: 53.1%
- 39. DeepSeek R1 Distill Qwen 14B: 53.1%
- 41. Magistral Small 2506: 51.3%
- 42. Magistral Medium: 50.3%
- 43. QwQ-32B-Preview: 50.0%
- 43. DeepSeek R1 Zero: 50.0%
- 45. DeepSeek-V3 0324: 49.2%
- 46. LongCat-Flash-Chat: 48.0%
- 47. Llama 4 Maverick: 43.4%
- 48. DeepSeek R1 Distill Llama 8B: 39.6%
- 49. DeepSeek-V3: 37.6%
- 49. DeepSeek R1 Distill Qwen 7B: 37.6%
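The repeated rank numbers in the list above (for example, two entries at rank 3 followed by rank 5) are consistent with standard competition ranking, where tied scores share a rank and the next distinct score skips ahead. The sketch below shows that scheme; it is an assumption about how the ranks were assigned, not the leaderboard's confirmed method, and the function name `competition_ranks` is illustrative.

```python
def competition_ranks(scores):
    """Rank scores that are already sorted in descending order.

    Tied scores share the rank of the first entry in the tie, and the next
    distinct score takes its 1-based position in the list.
    """
    ranks = []
    for position, score in enumerate(scores, start=1):
        if ranks and score == scores[position - 2]:
            # Same score as the previous entry: reuse its rank.
            ranks.append(ranks[-1])
        else:
            ranks.append(position)
    return ranks


top_five = [93.5, 91.6, 83.3, 83.3, 83.0]  # first five scores from the list
print(competition_ranks(top_five))  # [1, 2, 3, 3, 5]
```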
Models tracked
Models with livecodebench in their evaluation profile.
- ChatGPT-4o Latest
- Claude 3.5 Haiku (Anthropic)
- Claude 3.5 Sonnet
- Claude 3.5 Sonnet
- Claude 3.7 Sonnet
- Claude 3 Haiku
- Claude 3 Opus (Anthropic)
- Claude 3 Sonnet
- Claude Haiku 4.5 (Anthropic)
- Claude Mythos Preview (Anthropic)
- Claude Opus 4.1
- Claude Opus 4 (Anthropic)
- Claude Opus 4.5
- Claude Opus 4.6 (Anthropic)
- Claude Opus 4.7 (Anthropic)
- Claude Sonnet 4
- Claude Sonnet 4.5
- Claude Sonnet 4.6
- Codestral-22B
- Command R+
- DeepSeek-V3.2 (Non-thinking) (DeepSeek)
- DeepSeek-R1-0528 (DeepSeek)
- DeepSeek R1 Distill Llama 70B (DeepSeek)
- DeepSeek R1 Distill Llama 8B (DeepSeek)
- DeepSeek R1 Distill Qwen 14B (DeepSeek)
- DeepSeek R1 Distill Qwen 32B (DeepSeek)
- DeepSeek R1 Distill Qwen 7B (DeepSeek)
- DeepSeek R1 Zero (DeepSeek)
- DeepSeek-V3.2 (Thinking) (DeepSeek)
- DeepSeek-V2.5 (DeepSeek)
- DeepSeek-V3 0324
- DeepSeek-V3.1 (DeepSeek)
- DeepSeek-V3.2-Exp (DeepSeek)
- DeepSeek-V3.2-Speciale (DeepSeek)
- DeepSeek-V3.2 (DeepSeek)
- DeepSeek-V3
- DeepSeek-V4-Flash-Max (DeepSeek)
- DeepSeek-V4-Pro-Max (DeepSeek)
- DeepSeek VL2 Small (DeepSeek)
- DeepSeek VL2 Tiny (DeepSeek)
- DeepSeek VL2 (DeepSeek)
- ERNIE 4.5
- ERNIE 5.0
- Gemini 1.0 Pro
- Gemini 1.5 Flash 8B