FW-DeepSeek-V4-Flash

DeepSeek V4 Flash is a 284B-parameter Mixture-of-Experts model with 13B activated parameters, supporting a 1M token context window with highly efficient long-context intelligence.

Fireworks

Version: 1

Fireworks on Foundry

Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Non-Microsoft Product. The following terms apply to a Customer's use of Fireworks on Foundry: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, Customer Data will be sent outside of Microsoft systems, Customer Data will not be processed pursuant to any Foundry data residency documentation, and different compliance and data handling rules will apply. See Trust Center - Fireworks AI for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization's compliance requirements.

Key capabilities

About this model

DeepSeek-V4-Flash is a large language model from DeepSeek AI featuring a Mixture-of-Experts (MoE) architecture with 284 billion total parameters and 13 billion activated parameters, supporting a context length of one million tokens. It incorporates several key architectural innovations including a Hybrid Attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for dramatically improved long-context efficiency, Manifold-Constrained Hyper-Connections (mHC) for enhanced signal propagation stability, and the Muon optimizer for faster convergence. The model is pre-trained on more than 32T diverse and high-quality tokens and supports three reasoning effort modes: Non-think, Think High, and Think Max. DeepSeek-V4-Flash-Max achieves comparable reasoning performance to the Pro version when given a larger thinking budget.

Key model capabilities

Mixture-of-Experts (MoE) architecture with 284B total parameters and 13B activated parameters
One million token context window with highly efficient long-context processing
Hybrid Attention (CSA + HCA) for dramatically reduced inference FLOPs and KV cache usage
Three reasoning modes: Non-think, Think High, and Think Max
Function calling support for tool use and agentic workflows
Strong performance on coding, reasoning, and agentic benchmarks with a compact activated parameter count

Use cases

Pricing

Technical specs

Training disclosure

Distribution

More information

Quick facts

Model providerFireworks

TypeChat completion

LifecycleGenerally available (GA)

Input typetext

Output typetext

Context window1048.576k

PricingView pricing

FW-DeepSeek-V4-Flash

About this model

Key model capabilities

Quick facts

Quick start