Phi-3-mini-128k-instruct

Same Phi-3-mini model, but with a larger context size for RAG or few shot prompting.

Microsoft

Version: 13

The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.

After initial training, the model underwent a post-training process that involved supervised fine-tuning and direct preference optimization to enhance its ability to follow instructions and adhere to safety measures.
When evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning, the Phi-3 Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters.

Resources

🏡 Phi-3 Portal

📰 Phi-3 Microsoft Blog

📖 Phi-3 Technical Report

🛠️ Phi-3 on Azure AI Studio

👩‍🍳 Phi-3 Cookbook

Model Architecture

Phi-3 Mini-128K-Instruct has 3.8B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.

Training Datasets

Our training data includes a wide variety of sources, totaling 4.9 trillion tokens, and is a combination of

Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code;
Newly created synthetic, "textbook - like" data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (science, daily activities, theory of mind, etc.);
High quality chat format supervised data covering various topics to reflect human preferences on different aspects such as instruct-following, truthfulness, honesty and helpfulness.

We are focusing on the quality of data that could potentially improve the reasoning ability for the model, and we filter the publicly available documents to contain the correct level of knowledge. As an example, the result of a game in premier league in a particular day might be good training data for frontier models, but we need to remove such information to leave more model capacity for reasoning for the small size models. More details about data can be found in the Phi-3 Technical Report .

Quick facts

Model providerMicrosoft

TypeChat completion

LifecycleRetired

Input typetext

Output typetext

Context window131.072k

Token limits4096 output

PricingView pricing

Phi-3-mini-128k-instruct

Resources

Model Architecture

Training Datasets

Quick facts

Quick start