GLM-5V-Turbo

GLM-5V-Turbo is Z.AI's first multimodal coding foundation model, built for vision-based coding tasks. It natively processes multimodal inputs including images, video, text, and files, while excelling at long-horizon planning, complex coding, and action execution. Deeply optimized for agent workflows, it works seamlessly with agents such as Claude Code and OpenClaw to complete the full loop of perceiving, planning, and executing tasks. It features systematic upgrades across model architecture, training methods, data construction, and tooling, including native multimodal fusion, 30+ task joint reinforcement learning, and an expanded multimodal toolchain.

Context 32K

Benchmarks