Grok-1.5V

Grok-1.5V is a multimodal model that processes text and visual information, including documents, diagrams, charts, screenshots, and photographs. It is notable for strong real-world spatial understanding.

Context —

Benchmarks

GPQA
MMLU
MMLU-Pro
AIME 2025
MATH
HumanEval
MMMU
LiveCodeBench
SWE-Bench Verified
