OSWorld-G Benchmark Leaderboard

OSWorld-G (Grounding) evaluates screenshot grounding accuracy for OS automation tasks.

Leaderboard

Top model on the OSWorld-G benchmark (scores from public evaluations).

  1. Qwen3 VL 235B A22B Thinking: 0.68

Models tracked

Models with osworld-g in their evaluation profile.

  • No models linked yet.
