BrowseComp Benchmark Leaderboard
BrowseComp is a benchmark of 1,266 questions that challenge AI agents to navigate the internet persistently in search of hard-to-find, entangled information. It measures an agent's ability to persist in gathering information, navigate the web creatively, and find concise, verifiable answers. Although the questions are difficult, the benchmark is simple to use: predicted answers are short and easy to check against reference answers.
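Because answers are short and checked against references, a score on this benchmark reduces to a percentage of correct answers. The sketch below illustrates that idea only. It assumes exact matching after light normalization, which is a simplification chosen for illustration; this page does not describe the grading procedure, and the helper names `normalize` and `accuracy` are hypothetical.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(answer.lower().translate(table).split())


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions that match their reference.

    Assumption: a prediction counts as correct only if it equals the
    reference after normalization. Real graders may be more lenient.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)
```

For example, `accuracy(["Paris.", "1984"], ["paris", "1985"])` counts one of two answers as correct. A stricter or more lenient matching rule would change which answers count, so scores computed this way are not comparable to the leaderboard figures.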
Leaderboard
Top 46 models on BrowseComp Benchmark Leaderboard (scores from public evaluations).
- #1 GPT-5.5 Pro: 90.1%
- #2 Claude Mythos Preview: 86.9%
- #3 Kimi K2.6: 86.3%
- #4 Gemini 3.1 Pro: 85.9%
- #5 GPT-5.5: 84.4%
- #6 Claude Opus 4.6: 84.0%
- #7 DeepSeek-V4-Pro-Max: 83.4%
- #8 GPT-5.4: 82.7%
- #9 Claude Opus 4.7: 79.3%
- #9 GLM-5.1: 79.3%
- #11 GPT-5.2 Pro: 77.9%
- #12 Seed 2.0 Pro: 77.3%
- #13 MiniMax M2.5: 76.3%
- #14 GLM-5: 75.9%
- #15 Kimi K2.5: 74.9%
- #16 Claude Sonnet 4.6: 74.7%
- #17 DeepSeek-V4-Flash-Max: 73.2%
- #18 Qwen3.5-397B-A17B: 69.0%
- #18 Step-3.5-Flash: 69.0%
- #20 GPT-5.2: 65.8%
- #21 Qwen3.5-122B-A10B: 63.8%
- #22 MiniMax M2.1: 62.0%
- #23 Qwen3.5-35B-A3B: 61.0%
- #23 Qwen3.5-27B: 61.0%
- #25 Kimi K2-Thinking-0905: 60.2%
- #26 MiMo-V2-Flash: 58.3%
- #27 LongCat-Flash-Thinking-2601: 56.6%
- #28 GPT-5: 54.9%
- #29 GLM-4.7: 52.0%
- #30 o4-mini: 51.5%
- #31 DeepSeek-V3.2: 51.4%
- #31 DeepSeek-V3.2 (Thinking): 51.4%
- #33 o3: 49.7%
- #34 Sarvam-105B: 49.5%
- #35 Mistral Medium 3.5: 48.6%
- #36 GLM-4.6: 45.1%
- #37 Grok 4 Fast: 44.9%
- #38 MiniMax M2: 44.0%
- #39 GLM-4.7-Flash: 42.8%
- #40 DeepSeek-V3.2-Exp: 40.1%
- #41 Sarvam-30B: 35.5%
- #42 Nemotron 3 Super (120B A12B): 31.3%
- #43 DeepSeek-V3.1: 30.0%
- #44 GLM-4.5: 26.4%
- #45 GLM-4.5-Air: 21.3%
- #46 DeepSeek-R1-0528: 8.9%
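The rank numbers in the list repeat for tied scores and then skip ahead (two entries at rank 9 are followed by rank 11). That pattern is consistent with standard competition ranking. The sketch below shows one way to compute such ranks; it is an illustration, not the leaderboard's own code, and the function name `competition_ranks` is hypothetical.

```python
def competition_ranks(scores: list[float]) -> list[int]:
    """Assign competition ranks: ties share a rank, the next rank skips.

    The rank of a score is the 1-based position of its first occurrence
    when all scores are sorted from highest to lowest.
    """
    ordered = sorted(scores, reverse=True)
    first_position: dict[float, int] = {}
    for position, score in enumerate(ordered, start=1):
        # setdefault keeps the earliest position seen for each score
        first_position.setdefault(score, position)
    return [first_position[score] for score in scores]
```

Applied to the four scores 82.7, 79.3, 79.3, and 77.9 on their own, this yields ranks 1, 2, 2, and 4, the same tie-then-skip pattern those entries show at positions 8, 9, 9, and 11 in the full list.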