DeepSeek-V3
Version: 1
About this model
DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. DeepSeek-V3 was pre-trained on 14.8 trillion diverse, high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.
Key model capabilities
Based on the evaluation results, DeepSeek-V3 demonstrates strong performance across multiple categories, including English language tasks, coding, mathematics, Chinese language tasks, and multilingual capabilities. The model performs particularly well on benchmarks such as BBH (87.5%), MMLU (87.1%), HumanEval (65.2%), GSM8K (89.3%), and MATH (61.6%).
Use cases
See the Responsible AI considerations section for additional guidance on responsible use.
Key use cases
The provider has not supplied this information.
Out of scope use cases
Microsoft and external researchers have found DeepSeek-V3 to be less aligned than other models, meaning it appears to have undergone less of the refinement designed to make a model's behavior and outputs safe and appropriate for users. This results in (i) a higher risk that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend that customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.
Pricing
Pricing is based on a number of factors, including deployment type and the number of tokens used; see the pricing details page.
Technical specs
DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.
Training cut-off date
The provider has not supplied this information.
Training time
The provider has not supplied this information.
Input formats
The provider has not supplied this information.
Output formats
The provider has not supplied this information.
Supported languages
The provider has not supplied this information.
Sample JSON response
The provider has not supplied this information.
Model architecture
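As a minimal, illustrative sketch of how a DeepSeekMoE-style layer routes a token (toy dimensions, not the real 671B/37B configuration; the per-expert bias stands in for the auxiliary-loss-free load-balancing strategy this card describes, and the bias-update rule between steps is omitted):

```python
import numpy as np

def moe_route(token, centroids, bias, k=2):
    """Toy top-k expert routing for one token.

    The per-expert bias influences which experts are *selected*
    (DeepSeek-V3's auxiliary-loss-free load balancing adjusts such a
    bias to keep expert load even), but the gating weights that mix
    expert outputs come from the unbiased affinity scores.
    """
    scores = 1.0 / (1.0 + np.exp(-centroids @ token))  # sigmoid affinities
    chosen = np.argsort(scores + bias)[-k:]            # bias used for selection only
    gates = scores[chosen] / scores[chosen].sum()      # normalized over chosen experts
    return chosen, gates

rng = np.random.default_rng(0)
n_experts, d_model = 8, 16                  # toy sizes for illustration
centroids = rng.normal(size=(n_experts, d_model))
bias = np.zeros(n_experts)                  # tuned between steps in the real scheme
chosen, gates = moe_route(rng.normal(size=d_model), centroids, bias)
print(chosen, gates)
```

Only the `k` chosen experts run for this token, which is why per-token compute tracks the 37B activated parameters rather than the 671B total.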
DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
Long context
The provider has not supplied this information.
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
DeepSeek-V3 was pre-trained on 14.8 trillion diverse, high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.
Distribution
Distribution channels
The provider has not supplied this information.
More information
Learn more: [original model announcement]
Responsible AI considerations
Safety techniques
As noted above, Microsoft and external researchers have found DeepSeek-V3 to be less aligned than other models; we recommend that customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems. When deployed via Azure AI Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.
Safety evaluations
The provider has not supplied this information.
Known limitations
The provider has not supplied this information.
Acceptable use
Acceptable use policy
The provider has not supplied this information.
Quality and performance evaluations
Source: DeepSeek. Extract from the original model evaluation:

| Category | Benchmark (Metric) | # Shots | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | DeepSeek-V3 |
|---|---|---|---|---|---|---|
| | Architecture | - | MoE | Dense | Dense | MoE |
| | # Activated Params | - | 21B | 72B | 405B | 37B |
| | # Total Params | - | 236B | 72B | 405B | 671B |
| English | Pile-test (BPB) | - | 0.606 | 0.638 | 0.542 | 0.548 |
| | BBH (EM) | 3-shot | 78.8 | 79.8 | 82.9 | 87.5 |
| | MMLU (Acc.) | 5-shot | 78.4 | 85.0 | 84.4 | 87.1 |
| | MMLU-Redux (Acc.) | 5-shot | 75.6 | 83.2 | 81.3 | 86.2 |
| | MMLU-Pro (Acc.) | 5-shot | 51.4 | 58.3 | 52.8 | 64.4 |
| | DROP (F1) | 3-shot | 80.4 | 80.6 | 86.0 | 89.0 |
| | ARC-Easy (Acc.) | 25-shot | 97.6 | 98.4 | 98.4 | 98.9 |
| | ARC-Challenge (Acc.) | 25-shot | 92.2 | 94.5 | 95.3 | 95.3 |
| | HellaSwag (Acc.) | 10-shot | 87.1 | 84.8 | 89.2 | 88.9 |
| | PIQA (Acc.) | 0-shot | 83.9 | 82.6 | 85.9 | 84.7 |
| | WinoGrande (Acc.) | 5-shot | 86.3 | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | 74.2 | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | 56.8 | 51.3 |
| | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | 82.9 |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | 41.5 | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | 79.6 |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | 65.2 |
| | MBPP (Pass@1) | 3-shot | 65.0 | 72.6 | 68.4 | 75.4 |
| | LiveCodeBench-Base (Pass@1) | 3-shot | 11.6 | 12.9 | 15.5 | 19.4 |
| | CRUXEval-I (Acc.) | 2-shot | 52.5 | 59.1 | 58.5 | 67.3 |
| | CRUXEval-O (Acc.) | 2-shot | 49.8 | 59.9 | 59.9 | 69.8 |
| Math | GSM8K (EM) | 8-shot | 81.6 | 88.3 | 83.5 | 89.3 |
| | MATH (EM) | 4-shot | 43.4 | 54.4 | 49.0 | 61.6 |
| | MGSM (EM) | 8-shot | 63.6 | 76.2 | 69.9 | 79.8 |
| | CMath (EM) | 3-shot | 78.7 | 84.5 | 77.3 | 90.7 |
| Chinese | CLUEWSC (EM) | 5-shot | 82.0 | 82.5 | 83.0 | 82.7 |
| | C-Eval (Acc.) | 5-shot | 81.4 | 89.2 | 72.5 | 90.1 |
| | CMMLU (Acc.) | 5-shot | 84.0 | 89.5 | 73.7 | 88.8 |
| | CMRC (EM) | 1-shot | 77.4 | 75.8 | 76.0 | 76.3 |
| | C3 (Acc.) | 0-shot | 77.4 | 76.7 | 79.7 | 78.6 |
| | CCPM (Acc.) | 0-shot | 93.0 | 88.5 | 78.6 | 92.0 |
| Multilingual | MMMLU-non-English (Acc.) | 5-shot | 64.0 | 74.8 | 73.8 | 79.4 |
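A quick sketch of reading the table programmatically, using the scores reported above to compute DeepSeek-V3's margin over LLaMA3.1 405B on the benchmarks highlighted earlier in this card:

```python
# Scores copied from the evaluation table above (higher is better).
scores = {
    "BBH (EM)":           {"DeepSeek-V3": 87.5, "LLaMA3.1 405B": 82.9},
    "MMLU (Acc.)":        {"DeepSeek-V3": 87.1, "LLaMA3.1 405B": 84.4},
    "HumanEval (Pass@1)": {"DeepSeek-V3": 65.2, "LLaMA3.1 405B": 54.9},
    "GSM8K (EM)":         {"DeepSeek-V3": 89.3, "LLaMA3.1 405B": 83.5},
    "MATH (EM)":          {"DeepSeek-V3": 61.6, "LLaMA3.1 405B": 49.0},
}

# Point difference per benchmark, rounded to one decimal place.
deltas = {b: round(v["DeepSeek-V3"] - v["LLaMA3.1 405B"], 1)
          for b, v in scores.items()}
for bench, d in deltas.items():
    print(f"{bench}: {d:+.1f} points")
```

The largest gaps in this selection are on math and coding benchmarks (MATH and HumanEval), consistent with the category-level summary above.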
Benchmarking methodology
Source: DeepSeek. The provider has not supplied this information.
Public data summary
Source: DeepSeek. Microsoft and external researchers have found DeepSeek-V3 to be less aligned than other models; see Responsible AI considerations above. We recommend that customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.
Model Specifications
Context Length: 128,000
License: Custom
Last Updated: November 2025
Input Type: Text
Output Type: Text
Provider: DeepSeek
Languages: 2 languages
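Given the 128,000-token context length listed above, a rough pre-flight check on prompt size might look like the sketch below. The 4-characters-per-token ratio is a common rule of thumb, not DeepSeek's actual tokenizer; use a real tokenizer for production counts.

```python
CONTEXT_LENGTH = 128_000  # tokens, from the Model Specifications above

def fits_in_context(prompt: str, max_output_tokens: int = 4096,
                    chars_per_token: float = 4.0) -> bool:
    """Rough check that prompt + reserved output fits the context window.

    Token count is estimated from character length (a heuristic only);
    the reserved output budget is a hypothetical default, not a model limit.
    """
    estimated_prompt_tokens = len(prompt) / chars_per_token
    return estimated_prompt_tokens + max_output_tokens <= CONTEXT_LENGTH

print(fits_in_context("Summarize this document."))  # prints True
```

A check like this helps decide when to chunk or truncate long inputs before sending a request, while leaving room for the completion.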