Kimi K2 Thinking
Version: 1
Fireworks on Foundry
Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Preview subject to Azure Preview terms and the following supplemental terms: when you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, and different compliance and data-handling rules will apply. See documentation for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization’s compliance requirements.
Key capabilities
About this model
Kimi K2 Thinking is the latest and most capable version of the Kimi open-source thinking model. Built on Kimi K2 as a thinking agent, it reasons step by step while dynamically invoking tools. It sets a new state of the art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by substantially scaling multi-step reasoning depth while maintaining stable tool use across 200–300 sequential calls. K2 Thinking is also a native INT4 quantization model with a 256K-token context window, reducing inference latency and GPU memory usage without loss of accuracy.
Key model capabilities
- Step-by-step chain-of-thought reasoning with autonomous tool use
- Native INT4 quantization via Quantization-Aware Training (QAT) for lossless performance
- Stable tool use across 200–300 sequential calls
- 256K token context window
- Function calling support (OpenAI-style)
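OpenAI-style function calling follows a simple loop: the model either answers or returns `tool_calls`, the client runs each requested tool and appends a `role: "tool"` message, and the conversation is sent again. The sketch below shows that loop in Python with no network code. `send` is a hypothetical callable you would implement with whatever OpenAI-compatible client your deployment uses (the endpoint and model identifier are not given on this card), and `get_weather` is a stand-in tool. The `max_steps` cap of 300 echoes the 200–300 sequential calls mentioned above; it is a client-side safeguard, not a model setting.

```python
import json
from typing import Any, Callable

def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical stand-in tool; a real agent might call a search or database API."""
    return {"city": city, "forecast": "sunny"}  # stubbed result for illustration

# OpenAI-style tool schema: each tool is described by a JSON Schema for its arguments.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may emit to local Python callables.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def run_tool_calls(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Execute every tool call in an assistant message and return the
    role="tool" messages to append before the next request."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

def agent_loop(send: Callable[[list, list], dict], messages: list[dict], max_steps: int = 300) -> dict:
    """Alternate model turns and tool executions until the model replies
    without requesting a tool, or until max_steps model turns have run."""
    for _ in range(max_steps):
        reply = send(messages, TOOLS)  # hypothetical wrapper around a chat-completions call
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply
        messages.extend(run_tool_calls(reply))
    raise RuntimeError("step budget exhausted")
```

Keeping the dispatch table (`REGISTRY`) separate from the schemas sent to the model (`TOOLS`) lets you add tools without touching the loop itself.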
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
- Agentic systems and tool-augmented reasoning
- Coding
- Autonomous search
- Long-form writing and conversational AI
- Enterprise RAG and complex reasoning tasks, such as those in the AIME25 and GPQA benchmarks
Out of scope use cases
The provider has not supplied this information.
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See the pricing details for this model.
Technical specs
The provider has not supplied this information.
Training cut-off date
The provider has not supplied this information.
Training time
The provider has not supplied this information.
Input formats
Text
Output formats
Text
Supported languages
English
Sample JSON response
The provider has not supplied this information.
Model architecture
Kimi K2 Thinking is built on the Kimi K2 base, a Mixture-of-Experts (MoE) language model with 1 trillion total parameters, of which 32 billion are activated per token. It is a native INT4 quantization model trained with Quantization-Aware Training (QAT).
| Property | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA (Multi-head Latent Attention) |
| Activation Function | SwiGLU |
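The two ideas in the architecture description above, top-k expert routing and INT4 quantization-aware training, can be sketched in plain Python using the table's numbers. This is a minimal illustration under stated assumptions, not Kimi K2's implementation: softmax gating over the selected experts and a quantization group size of 32 are assumptions, and real QAT runs fake-quantized weights through an autograd framework (typically with a straight-through estimator for the rounding step), which is omitted here. `route`, `fake_quant_int4`, and `int4_weight_bytes` are hypothetical helper names.

```python
import math

# Values from the architecture table above.
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B activated per token
NUM_EXPERTS = 384                  # routed experts per MoE layer
TOP_K = 8                          # routed experts selected per token

def route(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Select the k highest-scoring routed experts for one token and
    renormalise their weights to sum to 1. Softmax gating is an assumption;
    the card does not specify the gating function. The single shared expert
    is applied to every token in addition to these k experts."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}  # subtract max for numerical stability
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]

def fake_quant_int4(weights: list[float], group_size: int = 32) -> list[float]:
    """Symmetric per-group INT4 fake quantization: map each group onto the
    integers [-8, 7] with one scale per group, then dequantize back to floats.
    group_size=32 is an illustrative assumption."""
    out: list[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # all-zero group: any nonzero scale works
        out.extend(max(-8, min(7, round(w / scale))) * scale for w in group)
    return out

def int4_weight_bytes(num_params: int) -> int:
    """Bytes needed if every weight were stored in 4 bits, ignoring per-group scales."""
    return num_params * 4 // 8

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # share of parameters used per token
```

With the table's figures, about 3.2% of the parameters (32B of 1T) are activated per token, and storing all 1T weights at 4 bits would take roughly 500 GB before scales. This is back-of-envelope arithmetic to show why native INT4 matters at this scale, not a published memory figure.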
Long context
Context length: 256K tokens (262,144). The model maintains coherence across long sequences and sustains 200–300 sequential tool-use steps without degradation.
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
The provider has not supplied this information.
Distribution
Distribution channels
The provider has not supplied this information.
More information
The provider has not supplied this information.
Model Specifications
Context Length: 262144
License: Other
Last Updated: April 2026
Input Type: Text
Output Type: Text
Provider: Fireworks
Languages: 1 language
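The 262144-token context length listed above equals 256 × 1024, the "256K" quoted elsewhere on this card. When planning long agentic sessions, a simple budget check helps keep the prompt plus the reply inside that window. This is a minimal sketch: it assumes you already have token counts from the model's tokenizer (not shown) and that reasoning tokens occupy the same window as the visible reply; `remaining_budget` and `fits` are hypothetical helper names.

```python
CONTEXT_LENGTH = 262_144  # "Context Length" from Model Specifications; equals 256 * 1024

def remaining_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for the reply (reasoning plus answer) after the prompt is placed."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_length - prompt_tokens)

def fits(prompt_tokens: int, max_output_tokens: int, context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the prompt plus the requested output cap fit in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_length
```

For example, a 200,000-token prompt leaves 62,144 tokens for the model's reasoning and answer combined.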