Phi-3-mini-4k-instruct

Tiniest member of the Phi-3 family. Optimized for both quality and low latency.

Microsoft

Version: 15

Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters. It was trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense content.
The model belongs to the Phi-3 family. The Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) that each supports.

The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization to improve instruction following and safety.
When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Mini-4K-Instruct showed robust, state-of-the-art performance among models with fewer than 13 billion parameters.

Resources

🏡 Phi-3 Portal

📰 Phi-3 Microsoft Blog

📖 Phi-3 Technical Report

🛠️ Phi-3 on Azure AI Studio

👩‍🍳 Phi-3 Cookbook

Model Architecture

Phi-3-Mini-4K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. It is fine-tuned with supervised fine-tuning (SFT) and direct preference optimization (DPO) to align it with human preferences and safety guidelines.
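To make the DPO step more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is illustrative only: the function name, the argument names, and the `beta` value of 0.1 are assumptions, and this page does not describe the hyperparameters or the implementation actually used to train the model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one taken from this page.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical log-probabilities, chosen only for illustration.
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The loss falls below log 2 once the policy assigns relatively more probability to the preferred response than the reference model does, which is the behavior the preference-optimization step rewards.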

Training Datasets

Our training data includes a wide variety of sources, totaling 4.9 trillion tokens, and is a combination of:

Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code;
Newly created synthetic, "textbook-like" data for teaching math, coding, common-sense reasoning, and general knowledge of the world (science, daily activities, theory of mind, etc.);
High-quality chat-format supervised data covering various topics to reflect human preferences on aspects such as instruction following, truthfulness, honesty, and helpfulness.

We focus on data quality that could improve the model's reasoning ability, and we filter publicly available documents so that they contain the appropriate level of knowledge. For example, the result of a Premier League game on a particular day might be good training data for frontier models, but we remove such information to leave more capacity for reasoning in small models. More details about the data can be found in the Phi-3 Technical Report.

Quick facts

Model provider: Microsoft

Type: Chat completion

Lifecycle: Retired

Input type: text

Output type: text

Context window: 4096 tokens

Token limits: 4096 output tokens

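The 4096-token context window listed above is generally shared between the prompt and the generated output, so a long prompt leaves less room for the response. The sketch below shows one way to clamp a requested output length accordingly. The helper name `max_new_tokens` and its parameters are hypothetical, and it assumes the prompt has already been tokenized and counted with the model's own tokenizer.

```python
CONTEXT_WINDOW = 4096  # context window from the quick facts above


def max_new_tokens(prompt_tokens, requested, context_window=CONTEXT_WINDOW):
    """Clamp a requested output length to what still fits in the context.

    prompt_tokens: number of tokens already used by the prompt.
    requested: number of output tokens the caller would like.
    Returns 0 when the prompt alone fills or exceeds the window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_window - prompt_tokens
    return max(0, min(requested, remaining))


if __name__ == "__main__":
    # A 3000-token prompt leaves room for at most 1096 output tokens.
    print(max_new_tokens(3000, 2000))
```

A caller could pass the result as the output-length limit of a chat completion request; the exact parameter name depends on the serving API and is not specified on this page.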
