DeepSeek-R1
Version: 1
Provider: DeepSeek
Last updated: December 2025
DeepSeek-R1 excels at reasoning tasks such as language, scientific reasoning, and coding, thanks to a step-by-step training process.
Reasoning
Coding
Agents

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless provisioned throughput unit (PTU) portability across models hosted on Azure, all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

DeepSeek-R1 excels at reasoning tasks such as language, scientific reasoning, and coding, thanks to a step-by-step training process.

Key model capabilities

DeepSeek-R1 achieves state-of-the-art performance on reasoning benchmarks spanning language, scientific reasoning, and coding through a refined training pipeline that blends reasoning-oriented reinforcement learning (RL) with supervised fine-tuning on curated datasets.

Use cases

See the Responsible AI considerations section below for additional guidance on responsible use.

Key use cases

DeepSeek-R1 excels at reasoning tasks such as language, scientific reasoning, and coding, thanks to a step-by-step training process.

Out of scope use cases

Microsoft and external researchers have found DeepSeek-R1 to be less aligned than other models, meaning the model appears to have undergone less refinement designed to make its behavior and outputs safe and appropriate for users, resulting in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems. The model's reasoning output (contained within <think> tags) may contain more harmful content than the model's final response. Consider how your application will use or display the reasoning output; you may want to suppress it in a production setting.
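If your application should not surface the chain of thought, one option is to strip the reasoning block from the completion before display. Below is a minimal sketch; it assumes the reasoning arrives between <think> and </think> tags in the completion text, which you should verify against your deployment's raw output.

```python
import re

# Assumption: DeepSeek-R1 emits its chain of thought between <think> and
# </think> tags ahead of the final answer; check your deployment's raw output.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(completion: str) -> str:
    """Remove the reasoning block so only the final response is shown."""
    return THINK_BLOCK.sub("", completion).strip()

raw = "<think>Let me check the edge cases first...</think>The answer is 42."
print(strip_reasoning(raw))  # -> "The answer is 42."
```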

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

Architecture: Mixture of Experts (MoE), 671B total parameters with 37B activated per token
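For intuition, a Mixture-of-Experts layer routes each token to a small subset of expert networks, so only a fraction of the total parameters is active per token. The toy sketch below illustrates top-k routing only; the dimensions, router, and expert shapes are illustrative and bear no relation to DeepSeek-R1's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; DeepSeek-R1's real MoE is vastly larger (671B total params).
d_model, n_experts, top_k = 16, 8, 2

# One linear "expert" per index: weights shaped [n_experts, d_model, d_model].
experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.02
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs.

    Only top_k of n_experts run, which is why an MoE model "activates"
    a small fraction of its total parameters per token.
    """
    logits = x @ router                      # router scores, shape [n_experts]
    top = np.argsort(logits)[-top_k:]        # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,)
```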

Training disclosure

Training, testing and validation

DeepSeek-R1 follows a refined training pipeline that blends reasoning-oriented reinforcement learning (RL) with supervised fine-tuning on curated datasets. It evolved from an earlier version, DeepSeek-R1-Zero, which relied solely on RL and showed strong reasoning skills but had issues such as hard-to-read outputs and inconsistent language use. To address these limitations, DeepSeek-R1 incorporates a small amount of cold-start data and combines RL with fine-tuning on carefully chosen datasets.

Distribution

Distribution channels

The provider has not supplied this information.

More information

When deployed via Microsoft Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.
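For reference, a Foundry deployment can typically be called with the azure-ai-inference client. This is a sketch under assumptions: the environment variable names and the model name shown are placeholders, and the API surface should be checked against the current Azure documentation.

```python
import os

# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key; substitute your own deployment's values.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    model="DeepSeek-R1",  # model/deployment name is an assumption
    messages=[
        SystemMessage(content="You are a careful math tutor."),
        UserMessage(content="What is the sum of the first 20 odd numbers?"),
    ],
    temperature=0.6,  # sampling settings DeepSeek reports using in evaluations
    top_p=0.95,
)
print(response.choices[0].message.content)
```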

Responsible AI considerations

Safety techniques

Microsoft and external researchers have found DeepSeek-R1 to be less aligned than other models, meaning the model appears to have undergone less refinement designed to make its behavior and outputs safe and appropriate for users, resulting in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems. The model's reasoning output (contained within <think> tags) may contain more harmful content than the model's final response. Consider how your application will use or display the reasoning output; you may want to suppress it in a production setting. When deployed via Microsoft Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.

Safety evaluations

The provider has not supplied this information.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: DeepSeek. For DeepSeek-R1, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, a temperature of 0.6 and a top-p value of 0.95 were used, with 64 responses generated per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |
|---|---|---|---|---|---|---|---|
| | Architecture | - | - | MoE | - | - | MoE |
| | # Activated Params | - | - | 37B | - | - | 37B |
| | # Total Params | - | - | 671B | - | - | 671B |
| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
| | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | 92.9 |
| | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | 84.0 |
| | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 |
| | IF-Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | - | 83.3 |
| | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 |
| | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 |
| | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | 82.5 |
| | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | 87.6 |
| | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | 92.3 |
| Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | 65.9 |
| | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 |
| | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 |
| | SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |
| | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 |
| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
| | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 |
| | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | 78.8 |
| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | 92.8 |
| | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | 91.8 |
| | C-SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | - | 63.7 |

Benchmarking methodology

Source: DeepSeek. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
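Concretely, pass@1 here is estimated by sampling k = 64 responses per query (temperature 0.6, top-p 0.95), grading each response, and averaging the per-query success rates. A minimal sketch with hypothetical grading results:

```python
import statistics

def pass_at_1(correct_flags: list[bool]) -> float:
    """Pass@1 estimate for one query: the fraction of the k sampled
    responses that were graded correct."""
    return sum(correct_flags) / len(correct_flags)

# Hypothetical results: 64 sampled responses for each of 3 queries,
# graded True/False (grading logic is task-specific and omitted here).
per_query_flags = [
    [True] * 50 + [False] * 14,
    [True] * 60 + [False] * 4,
    [True] * 40 + [False] * 24,
]

# Benchmark-level score: average the per-query pass@1 estimates.
per_query_scores = [pass_at_1(flags) for flags in per_query_flags]
print(f"benchmark pass@1: {statistics.mean(per_query_scores):.3f}")
```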

Public data summary

Source: DeepSeek. Microsoft and external researchers have found DeepSeek-R1 to be less aligned than other models, meaning the model appears to have undergone less refinement designed to make its behavior and outputs safe and appropriate for users, resulting in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks.
Model specifications
Context length: 128,000 tokens
License: MIT
Last updated: December 2025
Input type: Text
Output type: Text
Provider: DeepSeek
Languages: 2