Phi-4-reasoning
State-of-the-art open-weight reasoning model.
1. Memory/compute constrained environments.
2. Latency bound scenarios.
3. Reasoning and logic.
Models from Microsoft, Partners, and Community models are a select portfolio of curated models both general-purpose and niche models across diverse scenarios by developed by Microsoft teams, partners, and community contributors
- Managed by Microsoft: Purchase and manage models directly through Azure with a single license, world class support and enterprise grade Azure infrastructure
- Validated by providers: Each model is validated and maintained by its respective provider, with Azure offering integration and deployment guidance.
- Innovation and agility: Combines Microsoft research models with rapid, community-driven advancements.
- Seamless Azure integration: Standard Microsoft Foundry experience, with support managed by the model provider.
- Flexible deployment: Deployable as Managed Compute or Serverless API, based on provider preference.
About this model
Our model is designed to accelerate research on language models, for use as a building block for generative AI powered features. It provides uses for general purpose AI systems and applications (primarily in English) which require:1. Memory/compute constrained environments.
2. Latency bound scenarios.
3. Reasoning and logic.
Key model capabilities
Overall, Phi-4-Reasoning, with only 14B parameters, performs well across a wide range of reasoning tasks, outperforming significantly larger open-weight models such as DeepSeek-R1 distilled 70B model and approaching the performance levels of full DeepSeek R1 model. We also test the models on multiple new reasoning benchmarks for algorithmic problem solving and planning, including 3SAT, TSP, and BA-Calendar. These new tasks are nominally out-of-domain for the models as the training process did not intentionally target these skills, but the models still show strong generalization to these tasks. Furthermore, when evaluating performance against standard general abilities benchmarks such as instruction following or non-reasoning tasks, we find that our new models improve significantly from Phi-4, despite the post-training being focused on reasoning skills in specific domains.Quick facts
Model providerMicrosoft
TypeChat completion
LifecycleGenerally available (GA)
Input typetext
Output typetext
Context window32768
Token limits32768 output
PricingView pricing