DeepSeek-V3.1
DeepSeek-V3.1
Version: 1
DeepSeekLast updated January 2026
DeepSeek-V3.1 is a hybrid model that enhances tool usage, thinking efficiency, and supports both thinking and non-thinking modes via chat template switching
Coding
Agents

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry; reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models .

Key capabilities

About this model

DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved. DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.

Key model capabilities

  • Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.
  • Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
  • Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

The provider has not supplied this information.

Out of scope use cases

Microsoft and external researchers have found Deepseek V3.1 to be less aligned than other models -- meaning the model appears to have undergone less refinement designed to make its behavior and outputs more safe and appropriate for users -- resulting in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format on both model weights and activations to ensure compatibility with microscaling data formats. Please refer to DeepGEMM for more details.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

The provider has not supplied this information.

Output formats

The provider has not supplied this information.

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

DeepSeek-V3.1 is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format on both model weights and activations to ensure compatibility with microscaling data formats.

Distribution

Distribution channels

The provider has not supplied this information.

More information

Learn more: original model announcement

Content filtering

When deployed via Microsoft Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety . Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more .

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

Microsoft and external researchers have found Deepseek V3.1 to be less aligned than other models -- meaning the model appears to have undergone less refinement designed to make its behavior and outputs more safe and appropriate for users -- resulting in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: DeepSeek
CategoryBenchmark (Metric)DeepSeek V3.1-NonThinkingDeepSeek V3 0324DeepSeek V3.1-ThinkingDeepSeek R1 0528
General
MMLU-Redux (EM)91.890.593.793.4
MMLU-Pro (EM)83.781.284.885.0
GPQA-Diamond (Pass@1)74.968.480.181.0
Humanity's Last Exam (Pass@1)--15.917.7
Search Agent
BrowseComp--30.08.9
BrowseComp_zh--49.235.7
Humanity's Last Exam (Python + Search)--29.824.8
SimpleQA--93.492.3
Code
LiveCodeBench (2408-2505) (Pass@1)56.443.074.873.3
Codeforces-Div1 (Rating)--20911930
Aider-Polyglot (Acc.)68.455.176.371.6
Code Agent
SWE Verified (Agent mode)66.045.4-44.6
SWE-bench Multilingual (Agent mode)54.529.3-30.5
Terminal-bench (Terminus 1 framework)31.313.3-5.7
Math
AIME 2024 (Pass@1)66.359.493.191.4
AIME 2025 (Pass@1)49.851.388.487.5
HMMT 2025 (Pass@1)33.529.284.279.4

Benchmarking methodology

Source: DeepSeek The provider has not supplied this information.

Public data summary

Source: DeepSeek Microsoft and external researchers have found Deepseek V3.1 to be less aligned than other models -- meaning the model appears to have undergone less refinement designed to make its behavior and outputs more safe and appropriate for users -- resulting in (i) higher risks that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.
Model Specifications
Context Length128000
Quality Index0.85
LicenseCustom
Last UpdatedJanuary 2026
Input TypeText
Output TypeText
ProviderDeepSeek
Languages2 Languages