GLM-4.7-Flash

GLM-4.7-Flash is a high-speed, cost-efficient variant of GLM-4.7 optimized for fast inference and lower latency. It retains the coding-centric capabilities of GLM-4.7 including thinking before acting, preserved reasoning across turns, and per-request thinking control for speed or accuracy trade-offs. Ideal for applications requiring quick responses while maintaining strong performance on coding, agentic workflows, and general reasoning tasks.

Context

Benchmarks