DeepSeek-V3
Version: 1
Provider: DeepSeek
Last updated: November 2025
A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
Coding
Agents

Key capabilities

About this model

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. DeepSeek-V3 was pre-trained on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.
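The "671B total, 37B activated" distinction can be illustrated with a toy sketch (not DeepSeek-V3's actual code): an MoE router scores all experts per token but only the top-k participate in that token's forward pass, so only a small fraction of the total parameters is exercised at a time. All numbers and function names below are illustrative.

```python
def activated_fraction(total_params_b: float, activated_params_b: float) -> float:
    """Share of parameters that participate in a single token's forward pass."""
    return activated_params_b / total_params_b

def top_k_route(scores, k):
    """Toy router: pick the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# DeepSeek-V3's figures: 671B total parameters, 37B activated per token.
frac = activated_fraction(671, 37)
print(f"{frac:.1%} of parameters active per token")  # ~5.5%

# 8 toy experts, each token routed to 2 of them.
scores = [0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4]
print(top_k_route(scores, 2))  # experts 1 and 3 win
```

The point of the sketch is the ratio: sparse activation keeps per-token compute close to that of a ~37B dense model while the full 671B parameter pool provides capacity.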

Key model capabilities

Based on the evaluation results, DeepSeek-V3 demonstrates strong performance across multiple categories including English language tasks, coding, mathematics, Chinese language tasks, and multilingual capabilities. The model shows particularly strong performance on benchmarks such as BBH (87.5%), MMLU (87.1%), HumanEval (65.2%), GSM8K (89.3%), and MATH (61.6%).

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

The provider has not supplied this information.

Out of scope use cases

Microsoft and external researchers have found DeepSeek-V3 to be less aligned than other models, meaning the model appears to have undergone less refinement designed to make its behavior and outputs safe and appropriate for users. This results in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

The provider has not supplied this information.

Output formats

The provider has not supplied this information.

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
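A hedged sketch of the auxiliary-loss-free load-balancing idea mentioned above: instead of adding a balancing term to the loss, a per-expert bias is applied to the routing scores when selecting experts, and that bias is nudged up for underloaded experts and down for overloaded ones across batches. The function names, the update rule, and the step size below are illustrative, not the model's implementation.

```python
def select_experts(scores, bias, k):
    """Select top-k experts using bias-adjusted scores (bias affects
    selection only, not how expert outputs would be weighted)."""
    adjusted = [s + b for s, b in zip(scores, bias)]
    top = sorted(range(len(scores)), key=lambda i: adjusted[i], reverse=True)[:k]
    return sorted(top)

def update_bias(bias, load, gamma=0.01):
    """Raise the bias of experts below mean load, lower it above mean load."""
    mean = sum(load) / len(load)
    return [b + gamma if l < mean else b - gamma for b, l in zip(bias, load)]

bias = [0.0] * 4
load = [10, 2, 7, 1]          # tokens routed to each expert in one batch
bias = update_bias(bias, load)
print(bias)                   # underloaded experts 1 and 3 get a positive bias
```

Over many batches, the biases steer tokens toward underused experts, balancing load without a gradient-interfering auxiliary loss term.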

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

DeepSeek-V3 was pre-trained on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.

Distribution

Distribution channels

The provider has not supplied this information.

More information

Learn more: see the original model announcement.

Responsible AI considerations

Safety techniques

Microsoft and external researchers have found DeepSeek-V3 to be less aligned than other models, meaning the model appears to have undergone less refinement designed to make its behavior and outputs safe and appropriate for users. This results in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems. When deployed via Azure AI Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.
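As a hedged sketch of pairing this model with Azure AI Content Safety, the snippet below assembles a request for the text-analysis REST endpoint and applies a severity threshold to its response. The endpoint, `api-version`, response shape, and threshold here are assumptions to verify against the current Azure AI Content Safety documentation; no network call is made.

```python
import json

def build_analyze_request(endpoint: str, text: str):
    """Assemble the URL and JSON body for a Content Safety text-analysis call
    (api-version is a placeholder; check the docs for the current one)."""
    url = f"{endpoint}/contentsafety/text:analyze?api-version=2023-10-01"
    body = json.dumps({"text": text})
    return url, body

def is_blocked(categories_analysis, threshold=2):
    """Block when any returned category meets or exceeds the severity threshold.
    Assumes a response list of {"category": ..., "severity": ...} entries."""
    return any(c["severity"] >= threshold for c in categories_analysis)

url, body = build_analyze_request(
    "https://example.cognitiveservices.azure.com", "user prompt here"
)
print(is_blocked([{"category": "Hate", "severity": 0}]))  # False: below threshold
```

In production you would send `body` to `url` with your resource key, apply `is_blocked` to both the prompt and the model's completion, and tune the threshold per category to your use case.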

Safety evaluations

The provider has not supplied this information.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: DeepSeek. Extract from the original model evaluation.
| Category | Benchmark (Metric) | # Shots | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | DeepSeek-V3 |
|---|---|---|---|---|---|---|
| | Architecture | - | MoE | Dense | Dense | MoE |
| | # Activated Params | - | 21B | 72B | 405B | 37B |
| | # Total Params | - | 236B | 72B | 405B | 671B |
| English | Pile-test (BPB) | - | 0.606 | 0.638 | 0.542 | 0.548 |
| | BBH (EM) | 3-shot | 78.8 | 79.8 | 82.9 | 87.5 |
| | MMLU (Acc.) | 5-shot | 78.4 | 85.0 | 84.4 | 87.1 |
| | MMLU-Redux (Acc.) | 5-shot | 75.6 | 83.2 | 81.3 | 86.2 |
| | MMLU-Pro (Acc.) | 5-shot | 51.4 | 58.3 | 52.8 | 64.4 |
| | DROP (F1) | 3-shot | 80.4 | 80.6 | 86.0 | 89.0 |
| | ARC-Easy (Acc.) | 25-shot | 97.6 | 98.4 | 98.4 | 98.9 |
| | ARC-Challenge (Acc.) | 25-shot | 92.2 | 94.5 | 95.3 | 95.3 |
| | HellaSwag (Acc.) | 10-shot | 87.1 | 84.8 | 89.2 | 88.9 |
| | PIQA (Acc.) | 0-shot | 83.9 | 82.6 | 85.9 | 84.7 |
| | WinoGrande (Acc.) | 5-shot | 86.3 | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | 74.2 | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | 56.8 | 51.3 |
| | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | 82.9 |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | 41.5 | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | 79.6 |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | 65.2 |
| | MBPP (Pass@1) | 3-shot | 65.0 | 72.6 | 68.4 | 75.4 |
| | LiveCodeBench-Base (Pass@1) | 3-shot | 11.6 | 12.9 | 15.5 | 19.4 |
| | CRUXEval-I (Acc.) | 2-shot | 52.5 | 59.1 | 58.5 | 67.3 |
| | CRUXEval-O (Acc.) | 2-shot | 49.8 | 59.9 | 59.9 | 69.8 |
| Math | GSM8K (EM) | 8-shot | 81.6 | 88.3 | 83.5 | 89.3 |
| | MATH (EM) | 4-shot | 43.4 | 54.4 | 49.0 | 61.6 |
| | MGSM (EM) | 8-shot | 63.6 | 76.2 | 69.9 | 79.8 |
| | CMath (EM) | 3-shot | 78.7 | 84.5 | 77.3 | 90.7 |
| Chinese | CLUEWSC (EM) | 5-shot | 82.0 | 82.5 | 83.0 | 82.7 |
| | C-Eval (Acc.) | 5-shot | 81.4 | 89.2 | 72.5 | 90.1 |
| | CMMLU (Acc.) | 5-shot | 84.0 | 89.5 | 73.7 | 88.8 |
| | CMRC (EM) | 1-shot | 77.4 | 75.8 | 76.0 | 76.3 |
| | C3 (Acc.) | 0-shot | 77.4 | 76.7 | 79.7 | 78.6 |
| | CCPM (Acc.) | 0-shot | 93.0 | 88.5 | 78.6 | 92.0 |
| Multilingual | MMMLU-non-English (Acc.) | 5-shot | 64.0 | 74.8 | 73.8 | 79.4 |

Benchmarking methodology

Source: DeepSeek

The provider has not supplied this information.

Public data summary

Source: DeepSeek

See the Out of scope use cases section above.
Model Specifications
Context Length: 128,000 tokens
License: Custom
Last Updated: November 2025
Input Type: Text
Output Type: Text
Provider: DeepSeek
Languages: 2 languages