BrowseComp Long Context 128k Benchmark Leaderboard

A challenging benchmark for evaluating web browsing agents' ability to persistently navigate the internet and find hard-to-locate, entangled information. It comprises 1,266 questions that require strategic reasoning, creative search, and interpretation of retrieved content; the answers are short and easily verifiable.

Leaderboard

Top 5 models on the BrowseComp Long Context 128k benchmark (scores from public evaluations).

Models tracked

Models with browsecomp-long-128k in their evaluation profile.

  • No models linked yet.
