Phi-4-mini-reasoning

Lightweight math reasoning model optimized for multi-step problem solving

Microsoft

Version: 1

Models from Microsoft, Partners, and Community

Models from Microsoft, Partners, and Community models are a select portfolio of curated models both general-purpose and niche models across diverse scenarios by developed by Microsoft teams, partners, and community contributors

Managed by Microsoft: Purchase and manage models directly through Azure with a single license, world class support and enterprise grade Azure infrastructure
Validated by providers: Each model is validated and maintained by its respective provider, with Azure offering integration and deployment guidance.
Innovation and agility: Combines Microsoft research models with rapid, community-driven advancements.
Seamless Azure integration: Standard Microsoft Foundry experience, with support managed by the model provider.
Flexible deployment: Deployable as Managed Compute or Serverless API, based on provider preference.

Learn more about models from Microsoft, Partners, and Community

Key capabilities

About this model

Built on synthetic and high-quality math datasets, the model leverages advanced fine-tuning techniques such as supervised fine-tuning and preference modeling to enhance reasoning capabilities. Its training incorporates safety and alignment protocols, ensuring robust and reliable performance across supported use cases.

Key model capabilities

Phi-4-mini-reasoning is designed for multi-step, logic-intensive mathematical problem-solving tasks. Some of the use cases include formal proof generation, symbolic computation, advanced word problems, and a wide range of mathematical reasoning scenarios. These models excel at maintaining context across steps, applying structured logic, and delivering accurate, reliable solutions in domains that require deep analytical thinking.

To understand the capabilities, the Phi-4-mini-reasoning model was evaluated using a variety of popular math reasoning benchmarks. These benchmarks assess the model's performance on complex mathematical reasoning and problem-solving tasks, particularly focused on multi-step, logic-intensive questions. The model demonstrated strong performance in reasoning and problem-solving, particularly in contexts requiring advanced mathematical understanding.

We evaluate the model with three of the most popular math benchmarks where the strongest reasoning models are competing together. Specifically:

Math-500: This benchmark consists of 500 challenging math problems designed to test the model's ability to perform complex mathematical reasoning and problem-solving.
AIME 2024: The American Invitational Mathematics Examination (AIME) is a highly regarded math competition that features a series of difficult problems aimed at assessing advanced mathematical skills and logical reasoning.
GPQA Diamond: The Graduate-Level Google-Proof Q&A (GPQA) Diamond benchmark focuses on evaluating the model's ability to understand and solve a wide range of mathematical questions, including both straightforward calculations and more intricate problem-solving tasks.

Benchmark	o1-mini*	DeepSeek-R1-Distill-Qwen-7B	DeepSeek-R1-Distill-Llama-8B	Bespoke-Stratos-7B*	OpenThinker-7B*	Llama-3.2-3B-Instruct	Phi-4-Mini (base model, 3.8B)	Phi-4-mini-reasoning (3.8B)
AIME	63.6	53.3	43.3	20.0	31.3	6.7	10.0	57.5
MATH-500	90.0	91.4	86.9	82.0	83.0	44.4	71.8	94.6
GPQA Diamond	60.0	49.5	47.3	37.8	42.4	25.3	36.9	52.0

Overall, Phi-4-mini-reasoning demonstrates competitive reasoning capabilities across these benchmarks, excelling particularly in mathematical reasoning tasks. The model outperforms comparable models in multiple benchmarks, reinforcing its suitability for applications requiring high-quality mathematical problem-solving.

Use cases

Pricing

Technical specs

Training disclosure

Distribution

More information

Quick facts

Model providerMicrosoft

TypeChat completion

LifecyclePreview

Input typetext

Output typetext

Context window128k

Token limits128k output

PricingView pricing

Phi-4-mini-reasoning

About this model

Key model capabilities

Quick facts

Quick start