CyberGym Benchmark Leaderboard
CyberGym is a benchmark for evaluating AI agents on cybersecurity tasks, testing their ability to identify vulnerabilities, perform security analysis, and complete security-related challenges in a controlled environment.
Leaderboard
Top 6 models on the CyberGym benchmark, with scores taken from public evaluations.
- 1. Claude Mythos Preview: 83.1%
- 2. GPT-5.5: 81.8%
- 3. Claude Opus 4.6: 73.8%
- 4. Claude Opus 4.7: 73.1%
- 5. GLM-5.1: 68.7%
- 6. Kimi K2.5: 41.3%