SWE-Bench Multimodal Benchmark Leaderboard
SWE-Bench Multimodal extends SWE-Bench to evaluate language models on software engineering tasks that combine visual inputs, such as screenshots, UI mockups, and diagrams, with code understanding.
Leaderboard
Top models on SWE-Bench Multimodal (scores from public evaluations).
Models tracked
Models that list swe-bench-multimodal in their evaluation profile.
- No models linked yet.