FW-DeepSeek-V4-Flash
Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Non-Microsoft Product. The following terms apply to a Customer's use of Fireworks on Foundry: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, Customer Data will be sent outside of Microsoft systems, Customer Data will not be processed pursuant to any Foundry data residency documentation, and different compliance and data handling rules will apply. See Trust Center - Fireworks AI for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization's compliance requirements.
About this model
DeepSeek-V4-Flash is a large language model from DeepSeek AI featuring a Mixture-of-Experts (MoE) architecture with 284 billion total parameters and 13 billion activated parameters, supporting a context length of one million tokens. It incorporates several key architectural innovations including a Hybrid Attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for dramatically improved long-context efficiency, Manifold-Constrained Hyper-Connections (mHC) for enhanced signal propagation stability, and the Muon optimizer for faster convergence. The model is pre-trained on more than 32T diverse and high-quality tokens and supports three reasoning effort modes: Non-think, Think High, and Think Max. DeepSeek-V4-Flash-Max achieves comparable reasoning performance to the Pro version when given a larger thinking budget.
Key model capabilities
- Mixture-of-Experts (MoE) architecture with 284B total parameters and 13B activated parameters
- One million token context window with highly efficient long-context processing
- Hybrid Attention (CSA + HCA) for dramatically reduced inference FLOPs and KV cache usage
- Three reasoning modes: Non-think, Think High, and Think Max
- Function calling support for tool use and agentic workflows
- Strong performance on coding, reasoning, and agentic benchmarks with a compact activated parameter count