FW-DeepSeek-V4-Pro

DeepSeek V4 Pro is a 1.6T-parameter Mixture-of-Experts model with 49B activated parameters, supporting a 1M token context window with highly efficient long-context intelligence.

Fireworks

Version: 1

Fireworks on Foundry

Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Non-Microsoft Product. The following terms apply to a Customer's use of Fireworks on Foundry: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, Customer Data will be sent outside of Microsoft systems, Customer Data will not be processed pursuant to any Foundry data residency documentation, and different compliance and data handling rules will apply. See Trust Center - Fireworks AI for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization's compliance requirements.

Key capabilities

About this model

DeepSeek-V4-Pro is a large language model from DeepSeek AI featuring a Mixture-of-Experts (MoE) architecture with 1.6 trillion total parameters and 49 billion activated parameters, supporting a context length of one million tokens. It incorporates several key architectural innovations including a Hybrid Attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for dramatically improved long-context efficiency, Manifold-Constrained Hyper-Connections (mHC) for enhanced signal propagation stability, and the Muon optimizer for faster convergence. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. The model is pre-trained on more than 32T diverse and high-quality tokens and supports three reasoning effort modes: Non-think, Think High, and Think Max.

Key model capabilities

Mixture-of-Experts (MoE) architecture with 1.6T total parameters and 49B activated parameters
One million token context window with highly efficient long-context processing
Hybrid Attention (CSA + HCA) for dramatically reduced inference FLOPs and KV cache usage
Three reasoning modes: Non-think, Think High, and Think Max
Function calling support for tool use and agentic workflows
Top-tier performance on coding, reasoning, and agentic benchmarks

Use cases

Pricing

Technical specs

Training disclosure

Distribution

More information

Quick facts

Model providerFireworks

TypeChat completion

LifecycleGenerally available (GA)

Input typetext

Output typetext

Context window1048.576k

PricingView pricing

FW-DeepSeek-V4-Pro

About this model

Key model capabilities

Quick facts

Quick start