DeepSeek-V3
A powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens with strong performance in reasoning, math, and code tasks.
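As a rough illustration of what "37B activated per token" means next to 671B total parameters, the sketch below computes the share of the model used for any single token. It relies only on the two figures quoted in the description. The helper name `active_fraction` is hypothetical and is not part of any DeepSeek tooling.

```python
# Figures taken from the model description; everything else is illustrative.
TOTAL_PARAMS_B = 671   # total parameters, in billions
ACTIVE_PARAMS_B = 37   # parameters activated per token, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used for a single token (hypothetical helper)."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%}")  # prints "Active per token: 5.5%"
```

In other words, only about one parameter in eighteen takes part in processing a given token, which is the general idea behind a sparse Mixture-of-Experts design.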
LLMs
Free tier
Intelligence
Popularity: 49/100
Monthly visits: n/a
Growth: n/a
Updated: 2026-05-21
Features
GPQA
MMLU
MMLU-Pro
AIME 2025
MATH
HumanEval
Pros
Cons
Use cases
API inference · Fine-tuning · Benchmarking
AI models used
DeepSeek-V3
FAQ
How much does DeepSeek-V3 cost?
Free tier
Does DeepSeek-V3 have a free plan?
Limited or no free tier
Is there an API?
Yes