DeepSeek-R1
Version: 1
Publisher: DeepSeek
Last updated: January 2025
DeepSeek-R1 uses a step-by-step training process to excel at reasoning tasks, including language, scientific reasoning, and coding.
Reasoning
Coding
Agents
Learn more: [original model announcement]

DeepSeek-R1 features 671B total parameters, 37B active parameters per token, and a 128k context length. It builds on the progress of earlier reasoning-focused models that improved performance by extending Chain-of-Thought (CoT) reasoning, and takes things further by combining reinforcement learning (RL) with fine-tuning on carefully chosen datasets. It evolved from an earlier version, DeepSeek-R1-Zero, which relied solely on RL and showed strong reasoning skills but had issues such as hard-to-read outputs and language inconsistencies. To address these limitations, DeepSeek-R1 incorporates a small amount of cold-start data and follows a refined training pipeline that blends reasoning-oriented RL with supervised fine-tuning on curated datasets, resulting in state-of-the-art performance on reasoning benchmarks.
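The gap between total and active parameters comes from the model's Mixture-of-Experts (MoE) design: a router activates only a small subset of expert networks per token. The sketch below is a generic top-k MoE illustration of that idea, not DeepSeek's actual router; the gating scheme, sizes, and names here are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Generic top-k mixture-of-experts routing sketch.

    Only the top_k highest-scoring experts run per token, which is
    why the active parameter count is far below the total.
    """
    scores = x @ gate_w                   # router logits, one per expert
    top = np.argsort(scores)[-top_k:]     # indices of the selected experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over selected experts only
    # Weighted sum of the selected experts' outputs; unselected experts
    # contribute no compute for this token.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 8 tiny "experts", only 2 active per token.
rng = np.random.default_rng(0)
d = 16
expert_mats = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(8)]
experts = [lambda x, M=M: x @ M for M in expert_mats]
gate_w = rng.standard_normal((d, 8))
token = rng.standard_normal(d)
print(moe_forward(token, experts, gate_w).shape)  # (16,)
```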

Usage Recommendations

We recommend adhering to the following configurations when using the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance (a minimal request sketch follows the list):
  • Avoid adding a system prompt; all instructions should be contained within the user prompt.
  • For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
  • When evaluating model performance, it is recommended to conduct multiple tests and average the results.
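As a concrete illustration, here is a minimal sketch that applies all three recommendations over an OpenAI-compatible chat endpoint; the endpoint URL, environment variables, model name, and sampling settings (borrowed from the evaluation section below) are assumptions to adapt to your deployment.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and model name; adjust for your deployment.
client = OpenAI(
    base_url=os.environ["DEEPSEEK_R1_ENDPOINT"],
    api_key=os.environ["DEEPSEEK_R1_API_KEY"],
)

prompt = (
    "What is 23 * 17? "
    "Please reason step by step, and put your final answer within \\boxed{}."
)

answers = []
for _ in range(4):  # multiple samples, as recommended for evaluation
    response = client.chat.completions.create(
        model="DeepSeek-R1",
        # No system message: all instructions live in the user prompt.
        messages=[{"role": "user", "content": prompt}],
        temperature=0.6,
        top_p=0.95,
        max_tokens=2048,
    )
    answers.append(response.choices[0].message.content)

# Aggregate over samples (e.g., majority vote, or average a per-sample score).
print(answers[0])
```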

Model alignment

Microsoft and external researchers have found DeepSeek-R1 to be less aligned than other models, meaning it appears to have undergone less of the refinement designed to make its behavior and outputs safe and appropriate for users. This results in (i) higher risk that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend that customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.
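For instance, a minimal screening pass over model output with the Azure AI Content Safety text API might look like the following sketch; the endpoint, key handling, and severity threshold are placeholder assumptions, so check the current SDK documentation for your setup.

```python
import os
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint/key; supply your Content Safety resource values.
client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

model_output = "...text returned by DeepSeek-R1..."
result = client.analyze_text(AnalyzeTextOptions(text=model_output))

# Each category (e.g., Hate, SelfHarm, Sexual, Violence) gets a severity
# score; block or route to review when any severity exceeds your threshold.
for item in result.categories_analysis:
    if item.severity and item.severity >= 2:  # threshold is an assumption
        print(f"Flagged: {item.category} (severity {item.severity})")
```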

Reasoning outputs

The model's reasoning output (contained within <think> and </think> tags) may contain more harmful content than the model's final response. Consider how your application will use or display the reasoning output; you may want to suppress it in a production setting.
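One way to do this, assuming the response arrives as a single string with the reasoning inlined in <think> tags (some serving stacks instead return it in a separate field), is to strip that span before display:

```python
import re

def strip_reasoning(response_text: str) -> str:
    """Remove the <think>...</think> reasoning span before display."""
    return re.sub(r"<think>.*?</think>", "", response_text,
                  flags=re.DOTALL).strip()

raw = "<think>The user asked for 2+2, so the sum is 4.</think>The answer is 4."
print(strip_reasoning(raw))  # -> "The answer is 4."
```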

Content filtering

When deployed via Azure AI Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.
Extract from the original model evaluation
For DeepSeek-R1, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, a temperature of 0.6 and a top-p value of 0.95 were used, with 64 responses generated per query to estimate pass@1.
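Under that setup, pass@1 is estimated as the average correctness over the k sampled responses. A minimal sketch of this estimator, with a stand-in grading function as an assumption:

```python
def estimate_pass_at_1(responses, is_correct):
    """Estimate pass@1 from k sampled responses as the
    fraction of samples graded correct."""
    grades = [is_correct(r) for r in responses]
    return sum(grades) / len(grades)

# Toy usage with a stand-in grader: 3 of 4 samples reach the right answer.
samples = ["\\boxed{42}", "\\boxed{42}", "\\boxed{41}", "\\boxed{42}"]
print(estimate_pass_at_1(samples, lambda r: "\\boxed{42}" in r))  # 0.75
```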
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |
|---|---|---|---|---|---|---|---|
| | Architecture | - | - | MoE | - | - | MoE |
| | # Activated Params | - | - | 37B | - | - | 37B |
| | # Total Params | - | - | 671B | - | - | 671B |
| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
| | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | 92.9 |
| | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | 84.0 |
| | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 |
| | IF-Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | - | 83.3 |
| | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 |
| | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 |
| | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | 82.5 |
| | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | 87.6 |
| | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | 92.3 |
| Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | 65.9 |
| | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 |
| | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 |
| | SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |
| | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 |
| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
| | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 |
| | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | 78.8 |
| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | 92.8 |
| | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | 91.8 |
| | C-SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | - | 63.7 |
Model Specifications
| Attribute | Value |
|---|---|
| Context Length | 128,000 |
| Quality Index | 0.87 |
| License | MIT |
| Last Updated | January 2025 |
| Input Type | Text |
| Output Type | Text |
| Publisher | DeepSeek |
| Languages | 2 languages |