Kimi K2 Thinking
Version: 1
Fireworks · Last updated April 2026
Kimi K2 Thinking is an open-source thinking model that reasons step-by-step while dynamically invoking tools, with native INT4 quantization for lossless reductions in inference latency and memory usage.
Coding
Agents

Fireworks on Foundry

Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Preview subject to Azure Preview terms and the following supplemental terms: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, and different compliance and data handling rules will apply. See documentation for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization’s compliance requirements.

Key capabilities

About this model

Kimi K2 Thinking is the latest and most capable model in the open-source Kimi thinking line. Built on Kimi K2, it is a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state of the art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and by maintaining stable tool use across 200–300 sequential calls. K2 Thinking is also natively quantized to INT4 with a 256K context window, achieving lossless reductions in inference latency and GPU memory usage.

Key model capabilities

  • Step-by-step chain-of-thought reasoning with autonomous tool use
  • Native INT4 quantization via Quantization-Aware Training (QAT) for lossless performance
  • Stable tool-use across 200–300 sequential calls
  • 256K token context window
  • Function calling support (OpenAI-style), as shown in the sketch below
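
The OpenAI-style function-calling interface can be exercised with any OpenAI-compatible client. A minimal sketch follows; the base URL, model identifier, and the get_weather tool are illustrative assumptions, not values confirmed by this card:

```python
# Minimal function-calling sketch against an OpenAI-compatible endpoint.
# The base_url, model id, and get_weather tool are illustrative
# assumptions; substitute the values from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-thinking",  # assumed model id
    messages=[{"role": "user", "content": "What is the weather in Tokyo?"}],
    tools=tools,
)
# If the model decides a tool is needed, the call shows up here instead
# of a plain-text answer.
print(response.choices[0].message.tool_calls)
```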

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

  • Agentic systems and tool-augmented reasoning
  • Coding
  • Autonomous search
  • Longform writing and conversational AI
  • Enterprise RAG and complex reasoning tasks such as those measured by AIME25 and GPQA

Out of scope use cases

The provider has not supplied this information.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used; see the model's pricing details for specifics.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

Text

Output formats

Text

Supported languages

English

Sample JSON response

The provider has not supplied this information.

Model architecture

Kimi K2 Thinking is built on the Kimi K2 base, a Mixture-of-Experts (MoE) language model with 1 trillion total parameters and 32 billion parameters activated per forward pass. It is natively quantized to INT4 via Quantization-Aware Training (QAT).
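
For intuition about what INT4 weight storage involves, below is a minimal sketch of symmetric round-to-nearest 4-bit quantization. This is a generic scheme for illustration only, not Moonshot's actual recipe; QAT additionally simulates this rounding during training so the weights learn to tolerate it:

```python
# Generic symmetric INT4 round-trip for intuition; not the model's
# actual quantization recipe. Each weight maps to an integer in
# [-8, 7] (4 bits); real kernels pack two such values per byte.
import numpy as np

def quantize_int4(w: np.ndarray):
    # One scale per output row: map the largest |w| in the row to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.maximum(scale, 1e-12)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int4(w)
print("max round-trip error:", np.abs(w - dequantize_int4(q, s)).max())
```
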
Architecture: Mixture-of-Experts (MoE)
Total Parameters: 1T
Activated Parameters: 32B
Number of Layers (dense layer included): 61
Number of Dense Layers: 1
Attention Hidden Dimension: 7168
MoE Hidden Dimension (per expert): 2048
Number of Attention Heads: 64
Number of Experts: 384
Selected Experts per Token: 8
Number of Shared Experts: 1
Vocabulary Size: 160K
Context Length: 256K
Attention Mechanism: MLA (Multi-head Latent Attention)
Activation Function: SwiGLU
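
A back-of-envelope reading of these numbers, assuming the expert FFNs dominate the parameter count (attention, embedding, and norm weights are ignored), shows how the 1T-total/32B-activated and INT4 figures fit together:

```python
# Rough arithmetic from the table above; expert FFNs only, so totals
# are approximate. A SwiGLU expert has three projection matrices
# (gate, up, down), each hidden_dim x moe_hidden_dim.
HIDDEN, MOE_HIDDEN = 7168, 2048
MOE_LAYERS = 61 - 1                 # 61 layers, of which 1 is dense
EXPERTS, ROUTED, SHARED = 384, 8, 1

per_expert = 3 * HIDDEN * MOE_HIDDEN                         # ~44M
total_expert = MOE_LAYERS * EXPERTS * per_expert             # ~1.0T
active_expert = MOE_LAYERS * (ROUTED + SHARED) * per_expert  # ~24B

print(f"params per expert:       ~{per_expert / 1e6:.0f}M")
print(f"total expert params:     ~{total_expert / 1e12:.2f}T of the 1T")
print(f"activated expert params: ~{active_expert / 1e9:.1f}B "
      f"(attention and embeddings make up the rest of the 32B)")

# Weight memory: INT4 stores two parameters per byte.
print(f"INT4 weights: ~{1e12 * 0.5 / 1e9:.0f} GB")
print(f"BF16 weights: ~{1e12 * 2.0 / 1e9:.0f} GB")
```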

Long context

Context Length: 256K tokens (262,144). The model maintains coherence across long sequences and supports 200–300 sequential tool-use steps without performance degradation.
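
Sequential tool use of this kind is typically driven by a loop that feeds every tool result back into the conversation, so the 256K window bounds how many of those 200–300 steps fit in context. A minimal sketch follows; the endpoint, model id, get_weather schema, and run_tool dispatcher are illustrative assumptions:

```python
# Sequential tool-use loop sketch. Endpoint, model id, tool schema,
# and run_tool are illustrative assumptions, not confirmed values.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

tools = [{  # same hypothetical get_weather schema as in the sketch above
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]

def run_tool(name: str, args: dict) -> dict:
    """Hypothetical dispatcher: route a tool call to real code."""
    if name == "get_weather":
        return {"city": args["city"], "temp_c": 21}  # stub result
    raise ValueError(f"unknown tool: {name}")

messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]

for step in range(300):  # ceiling taken from the 200-300 figure above
    resp = client.chat.completions.create(
        model="accounts/fireworks/models/kimi-k2-thinking",  # assumed id
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)  # keep the assistant turn, tool calls included
    if not msg.tool_calls:
        break  # plain-text answer: the model is done with tools
    for call in msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(
                run_tool(call.function.name,
                         json.loads(call.function.arguments))),
        })

print(msg.content)
```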

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

The provider has not supplied this information.

Model Specifications

Context Length: 262,144 tokens
License: Other
Last Updated: April 2026
Input Type: Text
Output Type: Text
Provider: Fireworks
Languages: 1 (English)