DeepSeek-V3
Version: 1
Learn more: original model announcement
DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
DeepSeek-V3 was pre-trained on 14.8 trillion diverse, high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.
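The auxiliary-loss-free load-balancing strategy mentioned above is easiest to see in code. Below is a minimal PyTorch sketch, not DeepSeek's implementation: a per-expert bias steers top-k expert selection, while the gating weights themselves come from the unbiased scores. The expert count, the update step `gamma`, and the softmax gating (the original report normalizes sigmoid affinities over the selected experts) are illustrative simplifications.

```python
import torch

n_experts, top_k, gamma = 8, 2, 0.001  # toy sizes; DeepSeek-V3 uses far more routed experts

# Per-expert bias used only to choose experts, never to weight their outputs.
bias = torch.zeros(n_experts)

def route(affinity: torch.Tensor):
    """affinity: (num_tokens, n_experts) token-to-expert scores."""
    # Biased scores decide WHICH experts fire ...
    _, idx = (affinity + bias).topk(top_k, dim=-1)
    # ... but the gate weights come from the unbiased scores.
    gates = torch.gather(affinity, -1, idx).softmax(dim=-1)
    return idx, gates

def update_bias(idx: torch.Tensor):
    """After each step, nudge overloaded experts' biases down, underloaded ones up."""
    load = torch.bincount(idx.flatten(), minlength=n_experts).float()
    bias.sub_(gamma * torch.sign(load - load.mean()))
```

Because no auxiliary loss term enters the gradient, this style of balancing does not trade off directly against the language-modeling objective.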
Model alignment
Microsoft and external researchers have found DeepSeek-V3 to be less aligned than other models -- meaning the model appears to have undergone less refinement designed to make its behavior and outputs safer and more appropriate for users -- resulting in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.
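As one way to act on that recommendation, below is a minimal sketch of screening a model completion with the Azure AI Content Safety Python SDK (`azure-ai-contentsafety`) before returning it to users. The endpoint, key, and severity threshold are placeholders; the appropriate threshold depends on your application.

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

def is_safe(completion: str, max_severity: int = 2) -> bool:
    """Return False if any harm category exceeds the chosen severity level."""
    result = client.analyze_text(AnalyzeTextOptions(text=completion))
    return all(
        (c.severity or 0) <= max_severity for c in result.categories_analysis
    )
```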
Content filtering
When deployed via Azure AI Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.
Extract from the original model evaluation
| Category | Benchmark (Metric) | # Shots | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | DeepSeek-V3 |
|---|---|---|---|---|---|---|
| Architecture | - | - | MoE | Dense | Dense | MoE |
| # Activated Params | - | - | 21B | 72B | 405B | 37B |
| # Total Params | - | - | 236B | 72B | 405B | 671B |
| English | Pile-test (BPB) | - | 0.606 | 0.638 | 0.542 | 0.548 |
| | BBH (EM) | 3-shot | 78.8 | 79.8 | 82.9 | 87.5 |
| | MMLU (Acc.) | 5-shot | 78.4 | 85.0 | 84.4 | 87.1 |
| | MMLU-Redux (Acc.) | 5-shot | 75.6 | 83.2 | 81.3 | 86.2 |
| | MMLU-Pro (Acc.) | 5-shot | 51.4 | 58.3 | 52.8 | 64.4 |
| | DROP (F1) | 3-shot | 80.4 | 80.6 | 86.0 | 89.0 |
| | ARC-Easy (Acc.) | 25-shot | 97.6 | 98.4 | 98.4 | 98.9 |
| | ARC-Challenge (Acc.) | 25-shot | 92.2 | 94.5 | 95.3 | 95.3 |
| | HellaSwag (Acc.) | 10-shot | 87.1 | 84.8 | 89.2 | 88.9 |
| | PIQA (Acc.) | 0-shot | 83.9 | 82.6 | 85.9 | 84.7 |
| | WinoGrande (Acc.) | 5-shot | 86.3 | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | 74.2 | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | 56.8 | 51.3 |
| | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | 82.9 |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | 41.5 | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | 79.6 |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | 65.2 |
| | MBPP (Pass@1) | 3-shot | 65.0 | 72.6 | 68.4 | 75.4 |
| | LiveCodeBench-Base (Pass@1) | 3-shot | 11.6 | 12.9 | 15.5 | 19.4 |
| | CRUXEval-I (Acc.) | 2-shot | 52.5 | 59.1 | 58.5 | 67.3 |
| | CRUXEval-O (Acc.) | 2-shot | 49.8 | 59.9 | 59.9 | 69.8 |
| Math | GSM8K (EM) | 8-shot | 81.6 | 88.3 | 83.5 | 89.3 |
| | MATH (EM) | 4-shot | 43.4 | 54.4 | 49.0 | 61.6 |
| | MGSM (EM) | 8-shot | 63.6 | 76.2 | 69.9 | 79.8 |
| | CMath (EM) | 3-shot | 78.7 | 84.5 | 77.3 | 90.7 |
| Chinese | CLUEWSC (EM) | 5-shot | 82.0 | 82.5 | 83.0 | 82.7 |
| | C-Eval (Acc.) | 5-shot | 81.4 | 89.2 | 72.5 | 90.1 |
| | CMMLU (Acc.) | 5-shot | 84.0 | 89.5 | 73.7 | 88.8 |
| | CMRC (EM) | 1-shot | 77.4 | 75.8 | 76.0 | 76.3 |
| | C3 (Acc.) | 0-shot | 77.4 | 76.7 | 79.7 | 78.6 |
| | CCPM (Acc.) | 0-shot | 93.0 | 88.5 | 78.6 | 92.0 |
| Multilingual | MMMLU-non-English (Acc.) | 5-shot | 64.0 | 74.8 | 73.8 | 79.4 |
Model Specifications
Context Length: 128,000 tokens
License: Custom
Last Updated: March 2025
Input Type: Text
Output Type: Text
Publisher: DeepSeek
Languages: 2 languages
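For reference, below is a minimal sketch of sending a text-in, text-out chat request to a DeepSeek-V3 deployment in Azure AI Foundry using the `azure-ai-inference` Python SDK. The endpoint URL, key, and message contents are placeholders for your own deployment.

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the DeepSeek-V3 architecture."),
    ],
    max_tokens=512,  # prompt plus completion must fit in the 128K-token context window
)
print(response.choices[0].message.content)
```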