Kimi-K2-Thinking
Version: 1
Moonshot AI
Last updated December 2025
Kimi K2 Thinking is the latest and most capable version of Moonshot AI's open-source thinking model.
Reasoning
Multilingual

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure, all as part of one Microsoft Foundry platform.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

Kimi K2 Thinking is the latest and most capable version of Moonshot AI's open-source thinking model. Building on Kimi K2, we trained it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state of the art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4-quantized model with a 256K context window, achieving lossless reductions in inference latency and GPU memory usage.
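
The tool-invoking behavior described above can be driven from any OpenAI-compatible chat-completions client. Below is a minimal sketch of such an agentic tool loop, assuming an OpenAI-compatible Foundry endpoint; the endpoint URL, the deployment name `Kimi-K2-Thinking`, and the `get_weather` tool are illustrative placeholders, not part of this model card.

```python
# Minimal agentic tool loop (sketch). Assumptions: an OpenAI-compatible
# endpoint for the deployed model; `get_weather` is a hypothetical tool.
import json
from openai import OpenAI

client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<key>")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
while True:
    resp = client.chat.completions.create(
        model="Kimi-K2-Thinking",  # placeholder deployment name
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:  # no tool requested: the model has answered
        print(msg.content)
        break
    for call in msg.tool_calls:  # execute each requested tool, return results
        args = json.loads(call.function.arguments)
        result = {"city": args["city"], "temp_c": 18}  # stubbed tool output
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
```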

Key model capabilities

  • Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that run for hundreds of steps without drift.
  • Native INT4 Quantization: Quantization-Aware Training (QAT) is employed in the post-training stage to achieve a lossless 2x speed-up in low-latency mode (see the quantization sketch after this list).
  • Stable Long-Horizon Agency: Maintains coherent, goal-directed behavior across up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.
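
As a rough illustration of what native INT4 storage means, the sketch below round-trips a weight matrix through symmetric per-channel INT4 quantization. This is illustrative arithmetic only, not Moonshot's QAT pipeline.

```python
# Symmetric per-channel INT4 round-trip (illustration, not Moonshot's QAT).
import numpy as np

def quantize_int4(w: np.ndarray):
    # One scale per output channel so each row maps onto the INT4 range [-8, 7].
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_int4(w)
print("max abs round-trip error:", np.abs(w - dequantize_int4(q, s)).max())
```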

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

The provider has not supplied this information.

Out of scope use cases

The provider has not supplied this information.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
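
As a worked example of token-based billing, the sketch below estimates a pay-as-you-go charge; the per-token rates are placeholders, not published prices for this model, so consult the pricing page for actual figures.

```python
# Hypothetical pay-as-you-go cost estimate; rates are placeholders,
# not published prices for this model.
INPUT_RATE_PER_1K = 0.001   # USD per 1K input tokens (placeholder)
OUTPUT_RATE_PER_1K = 0.004  # USD per 1K output tokens (placeholder)

input_tokens, output_tokens = 120_000, 8_000
cost = (input_tokens / 1000) * INPUT_RATE_PER_1K \
     + (output_tokens / 1000) * OUTPUT_RATE_PER_1K
print(f"estimated cost: ${cost:.4f} USD")
```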

Technical specs

Architecture: Mixture-of-Experts (MoE)
Total Parameters: 1T
Activated Parameters: 32B
Number of Layers (Dense layer included): 61
Number of Dense Layers: 1
Attention Hidden Dimension: 7168
MoE Hidden Dimension (per Expert): 2048
Number of Attention Heads: 64
Number of Experts: 384
Selected Experts per Token: 8
Number of Shared Experts: 1
Vocabulary Size: 160K
Context Length: 256K
Attention Mechanism: MLA
Activation Function: SwiGLU
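
These specs imply that only a small slice of the 1T total parameters runs per token: 8 routed experts out of 384, plus 1 always-on shared expert, which is why the activated count is 32B. Below is a minimal sketch of that top-k routing step, using a random placeholder router rather than the model's real weights.

```python
# Top-k MoE routing sketch matching the specs above: 384 experts, 8 routed
# per token (plus 1 shared expert, not shown). Router weights are random
# placeholders, not the model's.
import numpy as np

N_EXPERTS, TOP_K, HIDDEN = 384, 8, 7168
router = np.random.randn(HIDDEN, N_EXPERTS).astype(np.float32)

def route(x: np.ndarray):
    logits = x @ router                # affinity of this token to each expert
    top = np.argsort(logits)[-TOP_K:]  # indices of the 8 highest-scoring experts
    g = np.exp(logits[top] - logits[top].max())
    return top, g / g.sum()            # normalized gate weights over the selected 8

x = np.random.randn(HIDDEN).astype(np.float32)
experts, gates = route(x)
print("routed experts:", sorted(experts.tolist()), "gates sum:", gates.sum())
```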

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

The provider has not supplied this information.

Output formats

The provider has not supplied this information.

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

Please see MoonshotAI's Kimi-K2-Thinking model card here.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

The provider has not supplied this information.

Responsible AI considerations

Safety techniques

Kimi-K2-Thinking poses an elevated risk of producing content that would be blocked by the Foundry Models Protected Material Detection filter. When deployed via Microsoft Foundry, prompts and completions are passed through a default configuration of classification models to detect and prevent the output of harmful content. We recommend customers use the Protected Material Detection filter in conjunction with this model. As with any model, customers should conduct thorough evaluations of production systems before launch, as well as appropriate post-launch monitoring. All customers must comply with the Microsoft Enterprise AI Services Code of Conduct. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.
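
Protected Material Detection itself is configured in the Foundry content-filter settings. As a complementary post-generation check, the sketch below runs a completion through the Azure AI Content Safety harm-category analyzer (azure-ai-contentsafety SDK); the endpoint, key, and sample text are placeholders.

```python
# Post-generation harm-category check via Azure AI Content Safety (sketch).
# Protected Material Detection is configured in Foundry's content-filter
# settings and is not invoked here; endpoint and key are placeholders.
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<key>"),
)

completion_text = "model output to screen"  # placeholder completion
result = client.analyze_text(AnalyzeTextOptions(text=completion_text))
for item in result.categories_analysis:
    # severity 0 means no detected harm in that category
    print(item.category, item.severity)
```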

Safety evaluations

The provider has not supplied this information.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Reasoning Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 | Grok-4 |
|---|---|---|---|---|---|---|---|
| HLE (Text-only) | no tools | 23.9 | 26.3 | 19.8* | 7.9 | 19.8 | 25.4 |
| | w/ tools | 44.9 | 41.7* | 32.0* | 21.7 | 20.3* | 41.0 |
| | heavy | 51.0 | 42.0 | - | - | - | 50.7 |
| AIME25 | no tools | 94.5 | 94.6 | 87.0 | 51.0 | 89.3 | 91.7 |
| | w/ python | 99.1 | 99.6 | 100.0 | 75.2 | 58.1* | 98.8 |
| | heavy | 100.0 | 100.0 | - | - | - | 100.0 |
| HMMT25 | no tools | 89.4 | 93.3 | 74.6* | 38.8 | 83.6 | 90.0 |
| | w/ python | 95.1 | 96.7 | 88.8* | 70.4 | 49.5* | 93.9 |
| | heavy | 97.5 | 100.0 | - | - | - | 96.7 |
| IMO-AnswerBench | no tools | 78.6 | 76.0* | 65.9* | 45.8 | 76.0* | 73.1 |
| GPQA | no tools | 84.5 | 85.7 | 83.4 | 74.2 | 79.9 | 87.5 |
General Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|
| MMLU-Pro | no tools | 84.6 | 87.1 | 87.5 | 81.9 | 85.0 |
| MMLU-Redux | no tools | 94.4 | 95.3 | 95.6 | 92.7 | 93.7 |
| Longform Writing | no tools | 73.8 | 71.4 | 79.8 | 62.8 | 72.5 |
| HealthBench | no tools | 58.0 | 67.2 | 44.2 | 43.8 | 46.9 |
Agentic Search Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|
| BrowseComp | w/ tools | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
| BrowseComp-ZH | w/ tools | 62.3 | 63.0* | 42.4* | 22.2 | 47.9 |
| Seal-0 | w/ tools | 56.3 | 51.4* | 53.4* | 25.2 | 38.5* |
| FinSearchComp-T3 | w/ tools | 47.4 | 48.5* | 44.0* | 10.4 | 27.0* |
| Frames | w/ tools | 87.0 | 86.0* | 85.0* | 58.1 | 80.2* |
Coding Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|
| SWE-bench Verified | w/ tools | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
| SWE-bench Multilingual | w/ tools | 61.1 | 55.3* | 68.0 | 55.9 | 57.9 |
| Multi-SWE-bench | w/ tools | 41.9 | 39.3* | 44.3 | 33.5 | 30.6 |
| SciCode | no tools | 44.8 | 42.9 | 44.7 | 30.7 | 37.7 |
| LiveCodeBenchV6 | no tools | 83.1 | 87.0* | 64.0* | 56.1* | 74.1 |
| OJ-Bench (cpp) | no tools | 48.7 | 56.2* | 30.4* | 25.5* | 38.2* |
| Terminal-Bench | w/ simulated tools (JSON) | 47.1 | 43.8 | 51.0 | 44.5 | 37.7 |

Benchmarking methodology

Source: MoonshotAI
The provider has not supplied this information.

Public data summary

Source: MoonshotAI
The provider has not supplied this information.
Model Specifications

Context Length: 262144
License: Other
Last Updated: December 2025
Input Type: Text
Output Type: Text
Provider: Moonshot AI
Languages: 1 Language