DeepSeek-V4-Flash
Version: 2026-04-23
Provider: DeepSeek
Last updated: May 2026
DeepSeek V4 is an efficient MoE model family with 1M context and near state-of-the-art open-source reasoning performance.

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all as part of one Microsoft Foundry platform.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline. The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, integrating distinct proficiencies across diverse domains into a single model.

DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance on coding benchmarks and substantially narrows the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max achieves reasoning performance comparable to the Pro version when given a larger thinking budget, though its smaller parameter scale naturally places it slightly behind on pure knowledge tasks and the most complex agentic workflows.
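The RL stage mentioned above uses GRPO, which scores each sampled response relative to its own sampling group rather than against a learned value baseline. As a rough illustration of that idea only (the reward values below are made up, and DeepSeek's actual implementation details are not described here), the group-relative advantage can be sketched as:

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalize each sampled response's reward
    by the mean and standard deviation of its own group, removing the need
    for a separate critic model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in rewards]

# Four sampled completions for one prompt, scored by a reward model
# (illustrative values only):
print(grpo_advantages([1.0, 0.0, 0.5, 0.5]))
```

Responses above the group mean get positive advantages and those below get negative ones, so the policy update pushes probability mass toward the better samples within each group.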

Key model capabilities

  • Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.
  • Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
  • Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.
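To put the long-context efficiency numbers above in perspective, here is a minimal arithmetic sketch. Only the ratios (27% of FLOPs, 10% of KV cache versus DeepSeek-V3.2 at a 1M-token context) come from this card; the baseline figures in the example are hypothetical placeholders:

```python
# Illustrative arithmetic for the efficiency claims above. Only the
# 27% / 10% ratios come from the model card; the baseline values are
# hypothetical placeholders used for demonstration.

def relative_cost(baseline: float, ratio: float) -> float:
    """Cost of V4-Pro given a V3.2 baseline and the quoted ratio."""
    return baseline * ratio

v32_flops_per_token = 1.0  # normalized V3.2 baseline at 1M-token context
v32_kv_cache_gb = 100.0    # hypothetical V3.2 KV-cache footprint

v4_flops = relative_cost(v32_flops_per_token, 0.27)  # 27% of baseline FLOPs
v4_kv = relative_cost(v32_kv_cache_gb, 0.10)         # 10% of baseline cache

print(f"FLOPs: {v4_flops:.2f}x baseline, KV cache: {v4_kv:.0f} GB")
```

In other words, at the full 1M-token context the hybrid attention design would cut per-token compute to roughly a quarter and KV-cache memory to a tenth of the V3.2 cost, under the ratios quoted above.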

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

The provider has not supplied this information.

Out of scope use cases

Microsoft and external researchers have found DeepSeek-V4-Flash to be less aligned than other models, meaning the model appears to have undergone less refinement designed to make its behavior and outputs safe and appropriate for users. This results in (i) a higher risk that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. We recommend that customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

Text

Output formats

Text

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

Learn more: original model announcement.

This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Refer to the encoding folder for full documentation.
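Since no Jinja chat template ships with the release, applications must encode messages themselves following the scripts in the encoding folder. As a rough sketch of the general shape of such an encoder only — the role markers below are hypothetical placeholders, not the model's actual special tokens, which are documented in the encoding folder:

```python
# Hypothetical sketch of flattening OpenAI-style chat messages into a
# single prompt string. The <|...|> delimiters are invented placeholders;
# consult the release's encoding folder for the model's real format.

def encode_messages(messages: list[dict]) -> str:
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        parts.append(f"<|{role}|>{content}<|end|>")  # placeholder markers
    parts.append("<|assistant|>")  # cue the model to begin its reply
    return "".join(parts)

prompt = encode_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The real encoding scripts will also need to handle the reasoning-mode delimiters and output parsing described elsewhere on this card, which is why the release pairs the encoder with test cases rather than a template alone.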

Content filtering

When deployed via Microsoft Foundry, prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

As noted under "Out of scope use cases," Microsoft and external researchers have found DeepSeek-V4-Flash to be less aligned than other models. We recommend that customers use Azure AI Content Safety in conjunction with this model and conduct their own evaluations on production systems.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Base Model

| Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
| --- | --- | --- | --- | --- |
| Architecture | - | MoE | MoE | MoE |
| # Activated Params | - | 37B | 13B | 49B |
| # Total Params | - | 671B | 284B | 1.6T |
| World Knowledge | | | | |
| AGIEval (EM) | 0-shot | 80.1 | 82.6 | 83.1 |
| MMLU (EM) | 5-shot | 87.8 | 88.7 | 90.1 |
| MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | 90.8 |
| MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | 73.5 |
| MMMLU (EM) | 5-shot | 87.9 | 88.8 | 90.3 |
| C-Eval (EM) | 5-shot | 90.4 | 92.1 | 93.1 |
| CMMLU (EM) | 5-shot | 88.9 | 90.4 | 90.8 |
| MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | 51.1 |
| Simple-QA verified (EM) | 25-shot | 28.3 | 30.1 | 55.2 |
| SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | 53.9 |
| FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | 62.6 |
| TriviaQA (EM) | 5-shot | 83.3 | 82.8 | 85.6 |
| Language & Reasoning | | | | |
| BBH (EM) | 3-shot | 87.6 | 86.9 | 87.5 |
| DROP (F1) | 1-shot | 88.2 | 88.6 | 88.7 |
| HellaSwag (EM) | 0-shot | 86.4 | 85.7 | 88.0 |
| WinoGrande (EM) | 0-shot | 78.9 | 79.5 | 81.5 |
| CLUEWSC (EM) | 5-shot | 83.5 | 82.2 | 85.2 |
| Code & Math | | | | |
| BigCodeBench (Pass@1) | 3-shot | 63.9 | 56.8 | 59.2 |
| HumanEval (Pass@1) | 0-shot | 62.8 | 69.5 | 76.8 |
| GSM8K (EM) | 8-shot | 91.1 | 90.8 | 92.6 |
| MATH (EM) | 4-shot | 60.5 | 57.4 | 64.5 |
| MGSM (EM) | 8-shot | 81.3 | 85.7 | 84.4 |
| CMath (EM) | 3-shot | 92.6 | 93.6 | 90.9 |
| Long Context | | | | |
| LongBench-V2 (EM) | 1-shot | 40.2 | 44.7 | 51.5 |

Instruct Model

DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:
| Reasoning Mode | Characteristics | Typical Use Cases | Response Format |
| --- | --- | --- | --- |
| Non-think | Fast, intuitive responses | Routine daily tasks, low-risk decisions | </think> summary |
| Think High | Conscious logical analysis, slower but more accurate | Complex problem-solving, planning | <think> thinking </think> summary |
| Think Max | Push reasoning to its fullest extent | Exploring the boundary of model reasoning capability | Special system prompt + <think> thinking </think> summary |

DeepSeek-V4-Pro-Max vs Frontier Models

| Benchmark (Metric) | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
| --- | --- | --- | --- | --- | --- | --- |
| Knowledge & Reasoning | | | | | | |
| MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 |
| SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 |
| GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 |
| HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 |
| LiveCodeBench (Pass@1) | 88.8 | - | 91.7 | 89.6 | - | 93.5 |
| Codeforces (Rating) | - | 3168 | 3052 | - | - | 3206 |
| HMMT 2026 Feb (Pass@1) | 96.2 | 97.7 | 94.7 | 92.7 | 89.4 | 95.2 |
| IMOAnswerBench (Pass@1) | 75.3 | 91.4 | 81.0 | 86.0 | 83.8 | 89.8 |
| Apex (Pass@1) | 34.5 | 54.1 | 60.9 | 24.0 | 11.5 | 38.3 |
| Apex Shortlist (Pass@1) | 85.9 | 78.1 | 89.1 | 75.5 | 72.4 | 90.2 |
| Long Context | | | | | | |
| MRCR 1M (MMR) | 92.9 | - | 76.3 | - | - | 83.5 |
| CorpusQA 1M (ACC) | 71.7 | - | 53.8 | - | - | 62.0 |
| Agentic | | | | | | |
| Terminal Bench 2.0 (Acc) | 65.4 | 75.1 | 68.5 | 66.7 | 63.5 | 67.9 |
| SWE Verified (Resolved) | 80.8 | - | 80.6 | 80.2 | - | 80.6 |
| SWE Pro (Resolved) | 57.3 | 57.7 | 54.2 | 58.6 | 58.4 | 55.4 |
| SWE Multilingual (Resolved) | 77.5 | - | - | 76.7 | 73.3 | 76.2 |
| BrowseComp (Pass@1) | 83.7 | 82.7 | 85.9 | 83.2 | 79.3 | 83.4 |
| HLE w/ tools (Pass@1) | 53.1 | 52.0 | 51.6 | 54.0 | 50.4 | 48.2 |
| GDPval-AA (Elo) | 1619 | 1674 | 1314 | 1482 | 1535 | 1554 |
| MCPAtlas Public (Pass@1) | 73.8 | 67.2 | 69.2 | 66.6 | 71.8 | 73.6 |
| Toolathlon (Pass@1) | 47.2 | 54.6 | 48.8 | 50.0 | 40.7 | 51.8 |

Comparison across Modes

| Benchmark (Metric) | V4-Flash Non-Think | V4-Flash High | V4-Flash Max | V4-Pro Non-Think | V4-Pro High | V4-Pro Max |
| --- | --- | --- | --- | --- | --- | --- |
| Knowledge & Reasoning | | | | | | |
| MMLU-Pro (EM) | 83.0 | 86.4 | 86.2 | 82.9 | 87.1 | 87.5 |
| SimpleQA-Verified (Pass@1) | 23.1 | 28.9 | 34.1 | 45.0 | 46.2 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 71.5 | 73.2 | 78.9 | 75.8 | 77.7 | 84.4 |
| GPQA Diamond (Pass@1) | 71.2 | 87.4 | 88.1 | 72.9 | 89.1 | 90.1 |
| HLE (Pass@1) | 8.1 | 29.4 | 34.8 | 7.7 | 34.5 | 37.7 |
| LiveCodeBench (Pass@1) | 55.2 | 88.4 | 91.6 | 56.8 | 89.8 | 93.5 |
| Codeforces (Rating) | - | 2816 | 3052 | - | 2919 | 3206 |
| HMMT 2026 Feb (Pass@1) | 40.8 | 91.9 | 94.8 | 31.7 | 94.0 | 95.2 |
| IMOAnswerBench (Pass@1) | 41.9 | 85.1 | 88.4 | 35.3 | 88.0 | 89.8 |
| Apex (Pass@1) | 1.0 | 19.1 | 33.0 | 0.4 | 27.4 | 38.3 |
| Apex Shortlist (Pass@1) | 9.3 | 72.1 | 85.7 | 9.2 | 85.5 | 90.2 |
| Long Context | | | | | | |
| MRCR 1M (MMR) | 37.5 | 76.9 | 78.7 | 44.7 | 83.3 | 83.5 |
| CorpusQA 1M (ACC) | 15.5 | 59.3 | 60.5 | 35.6 | 56.5 | 62.0 |
| Agentic | | | | | | |
| Terminal Bench 2.0 (Acc) | 49.1 | 56.6 | 56.9 | 59.1 | 63.3 | 67.9 |
| SWE Verified (Resolved) | 73.7 | 78.6 | 79.0 | 73.6 | 79.4 | 80.6 |
| SWE Pro (Resolved) | 49.1 | 52.3 | 52.6 | 52.1 | 54.4 | 55.4 |
| SWE Multilingual (Resolved) | 69.7 | 70.2 | 73.3 | 69.8 | 74.1 | 76.2 |
| BrowseComp (Pass@1) | - | 53.5 | 73.2 | - | 80.4 | 83.4 |
| HLE w/ tools (Pass@1) | - | 40.3 | 45.1 | - | 44.7 | 48.2 |
| MCPAtlas (Pass@1) | 64.0 | 67.4 | 69.0 | 69.4 | 74.2 | 73.6 |
| GDPval-AA (Elo) | - | - | 1395 | - | - | 1554 |
| Toolathlon (Pass@1) | 40.7 | 43.5 | 47.8 | 46.3 | 49.0 | 51.8 |
Model Specifications

Context Length: 1,000,000 tokens
Quality Index: 0.72
License: Custom
Last Updated: May 2026
Input Type: Text
Output Type: Text
Provider: DeepSeek
Languages: 2 languages