DeepSeek V3.1
Version: 1
Fireworks • Last updated April 2026
DeepSeek V3.1 is a 685B-parameter Mixture-of-Experts model with dual-mode thinking and non-thinking chat, featuring two-phase long context extension up to 163K tokens.
Coding
Agents

Fireworks on Foundry

Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Preview subject to Azure Preview terms and the following supplemental terms: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, and different compliance and data handling rules will apply. See documentation for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization’s compliance requirements.

Key capabilities

About this model

DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. The dataset was expanded with additional long documents, and both training phases were substantially extended: the 32K extension phase was increased 10-fold to 630B tokens, and the 128K extension phase was extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
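
The UE8M0 scale format mentioned above is an 8-bit, exponent-only encoding (no sign and no mantissa bits) used for block scales in the OCP microscaling (MX) formats: each scale is a power of two with a bias of 127, and the all-ones bit pattern is reserved for NaN. As a rough illustration only (this is not DeepSeek's training code, and the helper names are invented for this example), such a scale can be encoded and decoded like this:

    # Illustrative sketch of a UE8M0 (E8M0) block scale, per the OCP MX spec.
    # Not DeepSeek training code; helper names are invented for this example.
    import math

    E8M0_BIAS = 127   # exponent bias defined by the MX specification
    E8M0_NAN = 0xFF   # all-ones bit pattern is reserved for NaN

    def encode_ue8m0(scale: float) -> int:
        """Round a positive scale to the nearest power of two and encode it."""
        if scale <= 0.0 or math.isnan(scale):
            return E8M0_NAN
        exponent = round(math.log2(scale))              # E8M0 stores only an exponent
        return max(0, min(0xFE, exponent + E8M0_BIAS))  # clamp to the representable range

    def decode_ue8m0(bits: int) -> float:
        """Recover the power-of-two scale from its 8-bit encoding."""
        return float("nan") if bits == E8M0_NAN else 2.0 ** (bits - E8M0_BIAS)

    # A block scale of 1/512 round-trips exactly because it is a power of two.
    assert decode_ue8m0(encode_ue8m0(2 ** -9)) == 2 ** -9

Because every scale is a pure power of two, multiplying or dividing by it does not introduce rounding error into the quantized values, which is part of what makes exponent-only scales convenient for microscaling block formats.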

Key model capabilities

  • Dual-mode architecture with "thinking" and "non-thinking" chat modes for both fast inference and complex agentic behaviors
  • Two-phase long context extension with 32K extension (630B tokens) and 128K extension (209B tokens)
  • Trained using the UE8M0 FP8 scale data format for compatibility with microscaling data formats
  • Function calling support including custom tools, code agents, search agents, and multi-turn tool use (see the illustrative sketch after this list)
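
As a hedged illustration of the custom-tool calling listed above, the sketch below uses the openai Python SDK against an OpenAI-compatible chat endpoint. The base URL, API key, model identifier, and the search_docs tool are placeholders and assumptions for this example, not documented values for this deployment; whether a request runs in thinking or non-thinking mode is determined by the deployment's chat template rather than by anything shown here.

    # Hedged sketch: calling DeepSeek V3.1 through an OpenAI-compatible chat endpoint.
    # The base_url, api_key, model id, and tool definition are placeholders/assumptions.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://<your-endpoint>/v1",  # placeholder: your deployment URL
        api_key="<your-api-key>",               # placeholder credential
    )

    # Declare a custom tool the model may decide to call.
    tools = [{
        "type": "function",
        "function": {
            "name": "search_docs",  # hypothetical tool name for illustration
            "description": "Search the internal knowledge base.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="deepseek-v3p1",  # assumption: check your catalog for the exact id
        messages=[{"role": "user", "content": "Find last quarter's incident reports."}],
        tools=tools,
    )

    message = response.choices[0].message
    if message.tool_calls:
        # Standard multi-turn tool use: execute the call, append the result as a
        # "tool" message, and send the conversation back to the model.
        call = message.tool_calls[0]
        print(call.function.name, call.function.arguments)
    else:
        print(message.content)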

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

  • Conversational AI
  • Code assistance
  • Agentic systems
  • Enterprise RAG (retrieval-augmented generation)
  • Multimodal workflows

Out of scope use cases

The provider has not supplied this information.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

Text

Output formats

Text

Supported languages

English, Chinese

Sample JSON response

The provider has not supplied this information.

Model architecture

DeepSeek-V3.1 is a hybrid large language model (LLM) with a Mixture-of-Experts (MoE) architecture.
  • Total Parameters: 685B
  • Activated Parameters: 37B
  • Architecture: Mixture-of-Experts (MoE)
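
To make the total versus activated parameter counts concrete, the back-of-the-envelope arithmetic below (an illustration, not provider-supplied figures) shows why all 685B parameters drive the memory footprint while only the roughly 37B activated per token drive per-token compute:

    # Rough illustration of total vs. activated parameters in an MoE model.
    # Order-of-magnitude estimates only, not provider-supplied numbers.
    TOTAL_PARAMS = 685e9    # all experts must be stored in memory
    ACTIVE_PARAMS = 37e9    # parameters actually used per token

    print(f"FP8 weights : ~{TOTAL_PARAMS * 1 / 1e9:.0f} GB")   # ~685 GB at 1 byte/param
    print(f"BF16 weights: ~{TOTAL_PARAMS * 2 / 1e9:.0f} GB")   # ~1370 GB at 2 bytes/param
    print(f"~{2 * ACTIVE_PARAMS / 1e9:.0f} GFLOPs per token")  # roughly 2 * activated params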

Long context

Maximum context length is 163,840 tokens. The base model was extended through 32K and 128K long-context training phases.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

Responsible AI considerations

This model is sourced from Fireworks AI. It is a Non-Microsoft Product under the Product Terms, and has not been tested or evaluated by Microsoft. Customers should ensure that the model is appropriate for their specific use, including by evaluating any legal or export-control considerations and conducting their own model risk and safety evaluations. You can learn about Foundry risk and safety evaluations here.

Model Specifications

  • Context Length: 163,840 tokens
  • License: Other
  • Last Updated: April 2026
  • Input Type: Text
  • Output Type: Text
  • Provider: Fireworks
  • Languages: 2 (English, Chinese)