Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is the first open-weight variant of the Qwen3.6 series: a multimodal Mixture-of-Experts model with 35B total parameters, of which 3B are activated per token. It pairs a vision encoder with a hybrid 40-layer language model that interleaves Gated DeltaNet linear-attention blocks with Gated Attention blocks (10 × (3 × DeltaNet + 1 × Attention)). The MoE layers draw on 256 experts, with 8 routed + 1 shared activated per token (expert dim 512). The release prioritizes stability and real-world utility, with substantial gains in agentic coding (frontend workflows, repo-level reasoning) and a new option to preserve reasoning context across turns. Native context length is 262K tokens, extensible to ~1M via YaRN, and thinking mode is on by default.
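The layer arithmetic in the description can be checked with a short sketch. This is illustrative only, not the model's configuration code: the names (`layer_types`, `YARN_FACTOR`, and so on) are hypothetical, and two values are assumptions rather than facts from the description, namely that "262K" means 262,144 tokens and that the ~1M extension corresponds to a YaRN scaling factor of 4.

```python
# Hybrid stack as described: 10 repeats of (3 x Gated DeltaNet + 1 x Gated Attention).
REPEATS = 10
BLOCK = ["deltanet"] * 3 + ["attention"]


def layer_types(repeats=REPEATS, block=BLOCK):
    """Return the block type of every layer, in order (hypothetical helper)."""
    return [kind for _ in range(repeats) for kind in block]


layers = layer_types()
num_layers = len(layers)                    # total layers in the language model
num_deltanet = layers.count("deltanet")     # linear-attention layers
num_attention = layers.count("attention")   # full gated-attention layers

# Experts activated per token: 8 routed + 1 shared.
active_experts = 8 + 1

# ASSUMPTION: "262K" is taken to mean 2**18 = 262,144 tokens.
NATIVE_CONTEXT = 2 ** 18
# ASSUMPTION: a YaRN scaling factor of 4 is what yields the ~1M figure.
YARN_FACTOR = 4
extended_context = NATIVE_CONTEXT * YARN_FACTOR

print(num_layers, num_deltanet, num_attention, active_experts, extended_context)
```

Under these assumptions, only one layer in four uses full attention, which is why the linear-attention blocks dominate the cost profile at long context.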

Context 262K

Benchmarks

GPQA
MMLU
MMLU-Pro
AIME 2025
MATH
HumanEval
MMMU
LiveCodeBench
SWE-Bench Verified
