Kimi-K2-Thinking
Version: 1
Direct from Azure models
Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
- Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
- Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all as part of one Microsoft Foundry platform.
- Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
- Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Key capabilities
About this model
Kimi K2 Thinking is the latest and most capable version of Moonshot AI's open-source thinking model. Building on Kimi K2, it is trained as a thinking agent that reasons step by step while dynamically invoking tools. It sets a new state of the art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with a 256K context window, achieving lossless reductions in inference latency and GPU memory usage.
Key model capabilities
- Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that run for hundreds of steps without drift (a tool-calling sketch follows this list).
- Native INT4 Quantization: Quantization-Aware Training (QAT) is employed in the post-training stage to achieve a lossless 2x speed-up in low-latency mode.
- Stable Long-Horizon Agency: Maintains coherent, goal-directed behavior for up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.
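The agentic loop these capabilities describe can be exercised through any OpenAI-compatible chat completions client. Below is a minimal sketch, assuming an OpenAI-compatible endpoint and SDK; the endpoint URL, deployment name, and the get_weather tool are illustrative placeholders, not part of this model card.

```python
# Minimal tool-calling loop sketch for Kimi-K2-Thinking, assuming an
# OpenAI-compatible endpoint. URL, key, deployment name, and the
# weather tool are all placeholders for illustration.
import json
from openai import OpenAI

client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<your-key>")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp_c": 21})  # stub result

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
while True:
    resp = client.chat.completions.create(
        model="Kimi-K2-Thinking",  # placeholder deployment name
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:  # no more tool requests: final answer
        print(msg.content)
        break
    for call in msg.tool_calls:  # run each requested tool, feed back result
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
```

In a real long-horizon run this loop would iterate hundreds of times, which is the regime the stability claims above refer to.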
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
The provider has not supplied this information.
Out of scope use cases
The provider has not supplied this information.
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
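Because billing is token-based, a rough cost estimate needs only token counts and your deployment's published rates. The rates in this sketch are deliberately left as zero placeholders; substitute the actual per-1M-token prices from the Azure pricing page.

```python
# Back-of-the-envelope token cost estimator. The PRICE_* values are
# placeholders, not real prices; take actual rates from the pricing page.
PRICE_INPUT_PER_1M = 0.0   # USD per 1M input tokens (placeholder)
PRICE_OUTPUT_PER_1M = 0.0  # USD per 1M output tokens (placeholder)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * PRICE_INPUT_PER_1M \
         + (output_tokens / 1e6) * PRICE_OUTPUT_PER_1M

# Example: a 50K-token prompt producing a 4K-token completion.
print(f"${estimate_cost(50_000, 4_000):.4f}")
```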
Technical specs
| Attribute | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |
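As a sanity check, the figures above are mutually consistent. With a SwiGLU feed-forward block (three weight matrices per expert), the expert weights alone come to roughly 1T parameters in total and about 24B activated per token, with attention, embeddings, and the dense layer accounting for most of the remaining gap to the headline 32B. The sketch below is illustrative arithmetic, not Moonshot AI's own accounting.

```python
# Back-of-the-envelope check that the spec table adds up.
hidden, moe_hidden = 7168, 2048
layers, dense_layers = 61, 1
experts, routed, shared = 384, 8, 1
swiglu_mats = 3  # SwiGLU: gate, up, and down projections

per_expert = swiglu_mats * hidden * moe_hidden               # ~44M params
moe_layers = layers - dense_layers                           # 60 MoE layers
total_expert = per_expert * experts * moe_layers             # ~1.01T
active_expert = per_expert * (routed + shared) * moe_layers  # ~23.8B

print(f"expert params, total : {total_expert / 1e12:.2f}T")
print(f"expert params, active: {active_expert / 1e9:.1f}B")

# Native INT4 weights need ~0.5 bytes/param vs 2 bytes/param for BF16,
# i.e. roughly 500 GB vs 2,000 GB of weight memory for ~1T parameters.
print(f"INT4 ~{0.5 * 1e12 / 1e9:.0f} GB vs BF16 ~{2 * 1e12 / 1e9:.0f} GB")
```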
Training cut-off date
The provider has not supplied this information.
Training time
The provider has not supplied this information.
Input formats
The provider has not supplied this information.
Output formats
The provider has not supplied this information.
Supported languages
The provider has not supplied this information.
Sample JSON response
The provider has not supplied this information.
Model architecture
The provider has not supplied this information.
Long context
The provider has not supplied this information.
Optimizing model performance
The provider has not supplied this information.
Additional assets
Please see MoonshotAI's Kimi-K2-Thinking model card here.
Training disclosure
Training, testing and validation
The provider has not supplied this information.
Distribution
Distribution channels
The provider has not supplied this information.
More information
The provider has not supplied this information.
Responsible AI considerations
Safety techniques
Kimi-K2-Thinking poses an elevated risk of producing content that would be blocked by the Foundry Models Protected Material Detection filter. When deployed via Microsoft Foundry, prompts and completions are passed through a default configuration of classification models to detect and prevent the output of harmful content. We recommend customers use the Protected Material Detection filter in conjunction with this model. As with any model, customers should conduct thorough evaluations on production systems before launching, as well as appropriate post-launch monitoring. All customers must comply with the Microsoft Enterprise AI Services Code of Conduct. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.
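When a filter trips, Azure-hosted endpoints generally reject the request with an HTTP 400 whose error code is content_filter. The sketch below assumes the OpenAI Python SDK against an Azure-hosted deployment; the deployment name is a placeholder and error payload details can vary, so treat this as a pattern rather than a contract.

```python
# Sketch of handling a content-filter rejection, assuming the OpenAI
# Python SDK against an Azure-hosted deployment (names are placeholders).
from openai import OpenAI, BadRequestError

client = OpenAI(base_url="https://<your-endpoint>/v1", api_key="<your-key>")

try:
    resp = client.chat.completions.create(
        model="Kimi-K2-Thinking",  # placeholder deployment name
        messages=[{"role": "user", "content": "..."}],
    )
    print(resp.choices[0].message.content)
except BadRequestError as err:
    # Content filtering typically surfaces as a 400 with code "content_filter".
    if getattr(err, "code", None) == "content_filter":
        print("Blocked by content filter; revise the prompt or filter config.")
    else:
        raise
```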
Safety evaluations
The provider has not supplied this information.
Known limitations
The provider has not supplied this information.
Acceptable use
Acceptable use policy
The provider has not supplied this information.
Quality and performance evaluations
Reasoning Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 | Grok-4 |
|---|---|---|---|---|---|---|---|
| HLE (Text-only) | no tools | 23.9 | 26.3 | 19.8* | 7.9 | 19.8 | 25.4 |
| | w/ tools | 44.9 | 41.7* | 32.0* | 21.7 | 20.3* | 41.0 |
| | heavy | 51.0 | 42.0 | - | - | - | 50.7 |
| AIME25 | no tools | 94.5 | 94.6 | 87.0 | 51.0 | 89.3 | 91.7 |
| | w/ python | 99.1 | 99.6 | 100.0 | 75.2 | 58.1* | 98.8 |
| | heavy | 100.0 | 100.0 | - | - | - | 100.0 |
| HMMT25 | no tools | 89.4 | 93.3 | 74.6* | 38.8 | 83.6 | 90.0 |
| | w/ python | 95.1 | 96.7 | 88.8* | 70.4 | 49.5* | 93.9 |
| | heavy | 97.5 | 100.0 | - | - | - | 96.7 |
| IMO-AnswerBench | no tools | 78.6 | 76.0* | 65.9* | 45.8 | 76.0* | 73.1 |
| GPQA | no tools | 84.5 | 85.7 | 83.4 | 74.2 | 79.9 | 87.5 |
General Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|
| MMLU-Pro | no tools | 84.6 | 87.1 | 87.5 | 81.9 | 85.0 |
| MMLU-Redux | no tools | 94.4 | 95.3 | 95.6 | 92.7 | 93.7 |
| Longform Writing | no tools | 73.8 | 71.4 | 79.8 | 62.8 | 72.5 |
| HealthBench | no tools | 58.0 | 67.2 | 44.2 | 43.8 | 46.9 |
Agentic Search Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|
| BrowseComp | w/ tools | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
| BrowseComp-ZH | w/ tools | 62.3 | 63.0* | 42.4* | 22.2 | 47.9 |
| Seal-0 | w/ tools | 56.3 | 51.4* | 53.4* | 25.2 | 38.5* |
| FinSearchComp-T3 | w/ tools | 47.4 | 48.5* | 44.0* | 10.4 | 27.0* |
| Frames | w/ tools | 87.0 | 86.0* | 85.0* | 58.1 | 80.2* |
Coding Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|
| SWE-bench Verified | w/ tools | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
| SWE-bench Multilingual | w/ tools | 61.1 | 55.3* | 68.0 | 55.9 | 57.9 |
| Multi-SWE-bench | w/ tools | 41.9 | 39.3* | 44.3 | 33.5 | 30.6 |
| SciCode | no tools | 44.8 | 42.9 | 44.7 | 30.7 | 37.7 |
| LiveCodeBenchV6 | no tools | 83.1 | 87.0* | 64.0* | 56.1* | 74.1 |
| OJ-Bench (cpp) | no tools | 48.7 | 56.2* | 30.4* | 25.5* | 38.2* |
| Terminal-Bench | w/ simulated tools (JSON) | 47.1 | 43.8 | 51.0 | 44.5 | 37.7 |
Benchmarking methodology
Source: MoonshotAI
The provider has not supplied this information.
Public data summary
Source: MoonshotAI
The provider has not supplied this information.
Model Specifications
| Attribute | Value |
|---|---|
| Context Length | 262,144 tokens |
| License | Other |
| Last Updated | December 2025 |
| Input Type | Text |
| Output Type | Text |
| Provider | Moonshot AI |
| Languages | 1 language |