FW-DeepSeek-V4-Pro
DeepSeek V4 Pro is a 1.6T-parameter Mixture-of-Experts model with 49B activated parameters, supporting a 1M token context window with highly efficient long-context intelligence.
Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Non-Microsoft Product. The following terms apply to a Customer's use of Fireworks on Foundry: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, Customer Data will be sent outside of Microsoft systems, Customer Data will not be processed pursuant to any Foundry data residency documentation, and different compliance and data handling rules will apply. See Trust Center - Fireworks AI for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization's compliance requirements.
About this model
DeepSeek-V4-Pro is a large language model from DeepSeek AI featuring a Mixture-of-Experts (MoE) architecture with 1.6 trillion total parameters and 49 billion activated parameters, supporting a context length of one million tokens. It incorporates several key architectural innovations including a Hybrid Attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for dramatically improved long-context efficiency, Manifold-Constrained Hyper-Connections (mHC) for enhanced signal propagation stability, and the Muon optimizer for faster convergence. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. The model is pre-trained on more than 32T diverse and high-quality tokens and supports three reasoning effort modes: Non-think, Think High, and Think Max.Key model capabilities
- Mixture-of-Experts (MoE) architecture with 1.6T total parameters and 49B activated parameters
- One million token context window with highly efficient long-context processing
- Hybrid Attention (CSA + HCA) for dramatically reduced inference FLOPs and KV cache usage
- Three reasoning modes: Non-think, Think High, and Think Max
- Function calling support for tool use and agentic workflows
- Top-tier performance on coding, reasoning, and agentic benchmarks
Quick facts
Model providerFireworks
TypeChat completion
LifecycleGenerally available (GA)
Input typetext
Output typetext
Context window1048.576k
PricingView pricing