OSWorld-Verified Benchmark Leaderboard
OSWorld-Verified is a verified subset of OSWorld, a scalable real computer environment for multimodal agents supporting task setup, execution-based evaluation, and interactive learning across Ubuntu, Windows, and macOS.
Leaderboard
Top 13 models on the OSWorld-Verified benchmark, ranked by score (scores from public evaluations).
- 1. Claude Mythos Preview: 79.6%
- 2. GPT-5.5: 78.7%
- 3. Gemini 3.5 Flash: 78.4%
- 4. Claude Opus 4.7: 78.0%
- 5. GPT-5.4: 75.0%
- 6. Kimi K2.6: 73.1%
- 7. GPT-5.4 mini: 72.1%
- 8. GPT-5.3 Codex: 64.7%
- 9. Qwen3.6 Plus: 62.5%
- 10. Qwen3.5-122B-A10B: 58.0%
- 11. Qwen3.5-27B: 56.2%
- 12. Qwen3.5-35B-A3B: 54.5%
- 13. GPT-5.4 nano: 39.0%
Models tracked
Models with osworld-verified in their evaluation profile.
- No models linked yet.