DeepSeek-V3.1
Version: 1
Learn more: original model announcement
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
- Hybrid thinking mode: One model supports both thinking mode and non-thinking mode by changing the chat template.
- Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
- Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
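Because the mode switch lives in the chat template rather than in separate model weights, a single deployment can serve both modes on a per-request basis. A minimal sketch, assuming an OpenAI-compatible endpoint that accepts a `chat_template_kwargs` field (the convention used by common serving stacks such as vLLM); the deployment name and flag name here are illustrative, so verify the exact parameter against your deployment's documentation:

```python
# Sketch: selecting DeepSeek-V3.1's thinking or non-thinking mode per request.
# `chat_template_kwargs` and its `thinking` flag are assumptions modeled on
# common OpenAI-compatible serving stacks, not a confirmed Azure parameter.

def build_chat_request(prompt: str, thinking: bool) -> dict:
    """Build a chat-completions payload toggling the hybrid thinking mode."""
    return {
        "model": "deepseek-v3.1",  # hypothetical deployment name
        "messages": [{"role": "user", "content": prompt}],
        # The same weights serve both modes; only the chat template changes.
        "chat_template_kwargs": {"thinking": thinking},
    }

slow_req = build_chat_request("Prove there are infinitely many primes.", thinking=True)
fast_req = build_chat_request("Name the capital of France.", thinking=False)
```

The practical upside of this design is operational: one endpoint handles both latency-sensitive and reasoning-heavy traffic, with the caller deciding per request.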
Model alignment
Microsoft and external researchers have found DeepSeek-V3.1 to be less aligned than other models -- meaning the model appears to have undergone less of the refinement designed to make its behavior and outputs safe and appropriate for users -- resulting in (i) a higher risk that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend that customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.
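The recommendation to pair the model with Azure AI Content Safety amounts to a screening step between the model and the user. A sketch of constructing such a call, using the public Content Safety `text:analyze` REST operation and its documented harm categories; the endpoint host is a placeholder, and only the request is built here (sending it requires a real resource and key):

```python
# Sketch: screening a model completion with the Azure AI Content Safety
# "text:analyze" REST operation before returning it to the user.
# The URL shape and category names follow the public REST reference;
# the endpoint below is a placeholder, not a real resource.

def build_analyze_request(endpoint: str, completion: str) -> tuple[str, dict]:
    """Return the URL and JSON body for a text:analyze call."""
    url = f"{endpoint}/contentsafety/text:analyze?api-version=2023-10-01"
    body = {
        "text": completion,
        # The four documented harm categories:
        "categories": ["Hate", "SelfHarm", "Sexual", "Violence"],
    }
    return url, body

url, body = build_analyze_request(
    "https://example.cognitiveservices.azure.com",  # placeholder endpoint
    "candidate model output to screen",
)
```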
Content filtering

When deployed via Azure AI Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.

Evaluation
| Category | Benchmark (Metric) | DeepSeek-V3.1 (Non-Thinking) | DeepSeek-V3 0324 | DeepSeek-V3.1 (Thinking) | DeepSeek-R1 0528 |
|---|---|---|---|---|---|
| General | MMLU-Redux (EM) | 91.8 | 90.5 | 93.7 | 93.4 |
| General | MMLU-Pro (EM) | 83.7 | 81.2 | 84.8 | 85.0 |
| General | GPQA-Diamond (Pass@1) | 74.9 | 68.4 | 80.1 | 81.0 |
| General | Humanity's Last Exam (Pass@1) | - | - | 15.9 | 17.7 |
| Search Agent | BrowseComp | - | - | 30.0 | 8.9 |
| Search Agent | BrowseComp_zh | - | - | 49.2 | 35.7 |
| Search Agent | Humanity's Last Exam (Python + Search) | - | - | 29.8 | 24.8 |
| Search Agent | SimpleQA | - | - | 93.4 | 92.3 |
| Code | LiveCodeBench (2408-2505) (Pass@1) | 56.4 | 43.0 | 74.8 | 73.3 |
| Code | Codeforces-Div1 (Rating) | - | - | 2091 | 1930 |
| Code | Aider-Polyglot (Acc.) | 68.4 | 55.1 | 76.3 | 71.6 |
| Code Agent | SWE Verified (Agent mode) | 66.0 | 45.4 | - | 44.6 |
| Code Agent | SWE-bench Multilingual (Agent mode) | 54.5 | 29.3 | - | 30.5 |
| Code Agent | Terminal-bench (Terminus 1 framework) | 31.3 | 13.3 | - | 5.7 |
| Math | AIME 2024 (Pass@1) | 66.3 | 59.4 | 93.1 | 91.4 |
| Math | AIME 2025 (Pass@1) | 49.8 | 51.3 | 88.4 | 87.5 |
| Math | HMMT 2025 (Pass@1) | 33.5 | 29.2 | 84.2 | 79.4 |
Model Specifications
- Context Length: 128,000 tokens
- Quality Index: 0.85
- License: Custom
- Last Updated: September 2025
- Input Type: Text
- Output Type: Text
- Publisher: DeepSeek
- Languages: 2 languages