CharXiv-R Benchmark Leaderboard
CharXiv-R is the reasoning component of the CharXiv benchmark, focusing on complex reasoning questions that require synthesizing information across visual chart elements. It evaluates multimodal large language models on their ability to understand and reason about scientific charts from arXiv papers through various reasoning tasks.
Leaderboard
The top 36 models on the CharXiv-R benchmark, ranked by score (scores from public evaluations).
- #1 Claude Mythos Preview: 93.2%
- #2 Claude Opus 4.7: 91.0%
- #3 Kimi K2.6: 86.7%
- #4 Muse Spark: 86.4%
- #5 Gemini 3.5 Flash: 84.2%
- #6 GPT-5.2: 82.1%
- #7 GPT-5.5 Instant: 81.6%
- #8 Qwen3.6 Plus: 81.5%
- #9 Gemini 3 Pro: 81.4%
- #10 GPT-5: 81.1%
- #11 Gemini 3 Flash: 80.3%
- #12 Qwen3.5-27B: 79.5%
- #13 o3: 78.6%
- #14 Qwen3.6-27B: 78.4%
- #15 Qwen3.6-35B-A3B: 78.0%
- #16 Qwen3.5-35B-A3B: 77.5%
- #16 Kimi K2.5: 77.5%
- #18 Claude Opus 4.6: 77.4%
- #19 Qwen3.5-122B-A10B: 77.2%
- #20 Gemini 3.1 Flash-Lite: 73.2%
- #21 o4-mini: 72.0%
- #22 Qwen3 VL 235B A22B Thinking: 66.1%
- #23 Qwen3 VL 32B Thinking: 65.2%
- #24 Qwen3 VL 32B Instruct: 62.8%
- #25 Qwen3 VL 235B A22B Instruct: 62.1%
- #26 GPT-4o: 58.8%
- #27 GPT-4.1 mini: 56.8%
- #28 GPT-4.1: 56.7%
- #29 Qwen3 VL 30B A3B Thinking: 56.6%
- #30 GPT-4.5: 55.4%
- #31 Qwen3 VL 8B Thinking: 53.0%
- #32 Qwen3 VL 4B Thinking: 50.3%
- #33 Qwen3 VL 30B A3B Instruct: 48.9%
- #34 Qwen3 VL 8B Instruct: 46.4%
- #35 GPT-4.1 nano: 40.5%
- #36 Qwen3 VL 4B Instruct: 39.7%