SWE-Bench Multimodal Benchmark Leaderboard

SWE-Bench Multimodal extends SWE-Bench to evaluate language models on software engineering tasks that combine code understanding with visual inputs such as screenshots, UI mockups, and diagrams.

Leaderboard

Top-ranked model on the SWE-Bench Multimodal Benchmark Leaderboard (scores from public evaluations).

  1. Claude Mythos Preview: 59.0% on SWE-Bench Multimodal

Models tracked

Models with swe-bench-multimodal in their evaluation profile.

  • No models linked yet.
