Phi-3-mini-4k-instruct
Tiniest member of the Phi-3 family. Optimized for both quality and low latency.
Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters. It was trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense content.
The model belongs to the Phi-3 family. The Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) that each supports. Post-training combined supervised fine-tuning and direct preference optimization to improve instruction following and safety.
When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3 Mini-4K-Instruct demonstrated robust, state-of-the-art performance among models with fewer than 13 billion parameters.
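To make "lightweight" concrete, the stated 3.8B parameter count can be turned into a rough estimate of the memory needed for the weights alone. The sketch below is illustrative arithmetic, not a published requirement: the bytes-per-parameter values are the standard sizes of common numeric formats, the names `BYTES_PER_PARAM` and `weight_memory_gib` are hypothetical, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Standard storage sizes for common weight formats (an assumption of this
# sketch; the model card does not state which formats are offered).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

PHI3_MINI_PARAMS = 3.8e9  # parameter count stated in the model card


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        gib = weight_memory_gib(PHI3_MINI_PARAMS, dtype)
        print(f"{dtype}: about {gib:.1f} GiB for weights")
```

Actual memory use at inference time will be higher than these figures, since they cover the weights only.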
Resources
🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
🛠️ Phi-3 on Azure AI Studio
👩‍🍳 Phi-3 Cookbook
Model Architecture
Phi-3 Mini-4K-Instruct is a dense decoder-only Transformer model with 3.8B parameters. It is fine-tuned with supervised fine-tuning (SFT) and direct preference optimization (DPO) to align it with human preferences and safety guidelines.
Training Datasets
Our training data includes a wide variety of sources, totaling 4.9 trillion tokens, and is a combination of:
- Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code;
- Newly created synthetic, "textbook-like" data for the purpose of teaching math, coding, common sense reasoning, and general knowledge of the world (science, daily activities, theory of mind, etc.);
- High-quality chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction following, truthfulness, honesty, and helpfulness.
Quick facts
Model provider: Microsoft
Type: Chat completion
Lifecycle: Generally available (GA)
Input type: text
Output type: text
Context window: 4096 tokens
Token limits: 4096 output
Pricing: View pricing
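The context window and output limit listed above interact when sizing a request. The following is a minimal sketch of that budgeting. It assumes that prompt and completion tokens share the same 4096-token window, which is the usual behavior for decoder-only chat models but is not spelled out here. The function name `max_new_tokens` is hypothetical, and the prompt token count is expected to come from whatever tokenizer the caller uses.

```python
CONTEXT_WINDOW = 4096     # "Context window" from Quick facts
MAX_OUTPUT_TOKENS = 4096  # "Token limits" (output) from Quick facts


def max_new_tokens(prompt_tokens: int, requested: int) -> int:
    """Clamp a requested completion length to what still fits.

    Assumption: prompt and completion tokens count against the same
    context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt fills or exceeds the context window")
    return min(requested, MAX_OUTPUT_TOKENS, remaining)


if __name__ == "__main__":
    # A 1000-token prompt leaves room for at most 3096 completion tokens.
    print(max_new_tokens(prompt_tokens=1000, requested=4096))
```

Under this assumption, the full 4096-token output limit is only reachable with a near-empty prompt, so longer prompts call for a smaller requested completion length.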