CyberGym Benchmark Leaderboard

CyberGym is a benchmark for evaluating AI agents on cybersecurity tasks, testing their ability to identify vulnerabilities, perform security analysis, and complete security-related challenges in a controlled environment.

Leaderboard

Top 6 models on the CyberGym Benchmark Leaderboard (scores from public evaluations).

Models tracked

Models with cybergym in their evaluation profile.

  • No models linked yet.

View task leaderboards →