OpenAI GPT-4o mini
Version: 2024-07-18
Direct from Azure models
Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
- Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
- Streamlined operations: Benefit from unified billing, governance, and seamless provisioned throughput unit (PTU) portability across models hosted on Azure, all part of Microsoft Foundry.
- Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
- Cost control and optimization: Scale on demand with pay-as-you-go flexibility, or reserve PTUs for predictable performance and savings.
Key capabilities
About this model
GPT-4o mini surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o. It also performs strongly at function calling, which lets developers build applications that fetch data or take actions with external systems, and it shows improved long-context performance compared to GPT-3.5 Turbo.
Key model capabilities
GPT-4o mini has been evaluated across several key benchmarks.
Reasoning tasks: GPT-4o mini outperforms other small models on reasoning tasks involving both text and vision. It scores 82.0% on MMLU, a textual intelligence and reasoning benchmark, compared to 77.9% for Gemini Flash and 73.8% for Claude Haiku.
Math and coding proficiency: GPT-4o mini excels at mathematical reasoning and coding tasks, outperforming previous small models on the market. On MGSM, which measures math reasoning, it scores 87.0%, compared to 75.5% for Gemini Flash and 71.7% for Claude Haiku. On HumanEval, which measures coding performance, it scores 87.2%, compared to 71.5% for Gemini Flash and 75.9% for Claude Haiku.
Multimodal reasoning: GPT-4o mini also performs strongly on MMMU, a multimodal reasoning benchmark, scoring 59.4% compared to 56.1% for Gemini Flash and 50.2% for Claude Haiku.
| Benchmark | GPT-4o mini | Gemini Flash | Claude Haiku |
|---|---|---|---|
| MMLU (Textual Intelligence and Reasoning) | 82.0% | 77.9% | 73.8% |
| MGSM (Math Reasoning) | 87.0% | 75.5% | 71.7% |
| HumanEval (Coding Performance) | 87.2% | 71.5% | 75.9% |
| MMMU (Multimodal Reasoning) | 59.4% | 56.1% | 50.2% |
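Function calling works by describing tools to the model as JSON schemas. The model replies with a tool name and JSON-encoded arguments, and the application runs the tool and sends the result back. The sketch below assumes the Chat Completions `tools` request format and `role: "tool"` reply messages. The `get_order_status` function and its order data are hypothetical placeholders, and no API call is made.

```python
import json

# Hypothetical local function the model may ask to call; a real application
# would query a database or an external service here.
def get_order_status(order_id: str) -> dict:
    fake_orders = {"A100": "shipped", "B200": "processing"}
    return {"order_id": order_id, "status": fake_orders.get(order_id, "unknown")}

# Tool definition in the Chat Completions "tools" format (JSON Schema parameters).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]

# Map tool names to the Python callables that implement them.
REGISTRY = {"get_order_status": get_order_status}

def run_tool_call(call_id: str, name: str, arguments_json: str) -> dict:
    """Run one tool call requested by the model and build the tool reply message."""
    func = REGISTRY.get(name)
    if func is None:
        # Report unknown tools back to the model instead of crashing.
        result = {"error": f"unknown tool: {name}"}
    else:
        args = json.loads(arguments_json)  # arguments arrive as a JSON string
        result = func(**args)
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}

# In a real request, TOOLS would be passed as tools=TOOLS to
# client.chat.completions.create(...), and each entry in
# response.choices[0].message.tool_calls would be handled with run_tool_call.
if __name__ == "__main__":
    print(run_tool_call("call_1", "get_order_status", '{"order_id": "A100"}'))
```

Keeping the registry as a plain dictionary makes it easy to refuse tool names the application does not implement. A production handler would also validate the arguments against the schema before calling the function.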
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
GPT-4o mini enables a broad range of tasks with its low cost and latency, such as applications that chain or parallelize multiple model calls (e.g., calling multiple APIs), pass a large volume of context to the model (e.g., a full code base or conversation history), or interact with customers through fast, real-time text responses (e.g., customer support chatbots).
Out of scope use cases
The provider has not supplied this information.
Pricing
Pricing depends on several factors, including deployment type and the number of tokens used. See the Azure OpenAI pricing page for details.
Technical specs
The provider has not supplied this information.
Training cut-off date
The model has knowledge up to October 2023.
Training time
The provider has not supplied this information.
Input formats
Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future.
Output formats
Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future.
Supported languages
GPT-4o mini supports the same range of languages as GPT-4o. Thanks to the improved tokenizer shared with GPT-4o, handling non-English text is now even more cost effective.
Sample JSON response
The provider has not supplied this information.
Model architecture
The provider has not supplied this information.
Long context
The model has a context window of 128K tokens and improved long-context performance compared to GPT-3.5 Turbo.
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
Built-in safety measures: Safety is built into our models from the beginning, and reinforced at every step of our development process. In pre-training, we filter out information that we do not want our models to learn from or output, such as hate speech, adult content, sites that primarily aggregate personal information, and spam. In post-training, we align the model's behavior to our policies using techniques such as reinforcement learning with human feedback (RLHF) to improve the accuracy and reliability of the models' responses.
Distribution
Distribution channels
This model is provided through the Azure OpenAI service.
More information
Responsible AI considerations
Safety techniques
GPT-4o mini has the same built-in safety mitigations as GPT-4o, which we carefully assessed using both automated and human evaluations according to our Preparedness Framework and in line with our voluntary commitments. Building on these learnings, our teams also worked to improve the safety of GPT-4o mini using new techniques informed by our research. GPT-4o mini in the API is the first model to apply our instruction hierarchy method, which helps improve the model's ability to resist jailbreaks, prompt injections, and system prompt extractions. This makes the model's responses more reliable and helps make it safer to use in applications at scale.
Prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. See the Azure AI Content Safety documentation for details. Additional classification models and configuration options are available when you deploy an Azure OpenAI model in production.
Safety evaluations
More than 70 external experts in fields such as social psychology and misinformation tested GPT-4o to identify potential risks. We have addressed these risks and plan to share the details in the forthcoming GPT-4o system card and Preparedness scorecard. Insights from these expert evaluations have helped improve the safety of both GPT-4o and GPT-4o mini. We'll continue to monitor how GPT-4o mini is being used and improve the model's safety as we identify new risks.
Known limitations
The provider has not supplied this information.
Acceptable use
Acceptable use policy
The following documents are applicable:
Quality and performance evaluations
Source: OpenAI. See Key model capabilities for the benchmark results.
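The benchmark scores can also be read as percentage-point margins. This small Python sketch uses only the values reported for MMLU, MGSM, HumanEval, and MMMU and computes how far GPT-4o mini leads each comparison model on each benchmark. These are simple differences; they say nothing about statistical significance or evaluation settings, which the provider has not documented.

```python
# Scores (%) as reported in this model card's benchmark table.
SCORES = {
    "MMLU": {"GPT-4o mini": 82.0, "Gemini Flash": 77.9, "Claude Haiku": 73.8},
    "MGSM": {"GPT-4o mini": 87.0, "Gemini Flash": 75.5, "Claude Haiku": 71.7},
    "HumanEval": {"GPT-4o mini": 87.2, "Gemini Flash": 71.5, "Claude Haiku": 75.9},
    "MMMU": {"GPT-4o mini": 59.4, "Gemini Flash": 56.1, "Claude Haiku": 50.2},
}

def margins(scores: dict, model: str = "GPT-4o mini") -> dict:
    """Percentage-point lead of `model` over every other model, per benchmark."""
    result = {}
    for bench, row in scores.items():
        result[bench] = {
            # Round to one decimal place to match the precision of the inputs.
            other: round(row[model] - score, 1)
            for other, score in row.items()
            if other != model
        }
    return result

for bench, leads in margins(SCORES).items():
    print(bench, leads)
```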
Benchmarking methodology
Source: OpenAI. The provider has not supplied this information.
Public data summary
Source: OpenAI. The provider has not supplied this information.
Model Specifications
| Specification | Value |
|---|---|
| Context length | 131,072 tokens |
| Quality index | 0.72 |
| License | Custom |
| Training data | September 2023 |
| Last updated | December 2025 |
| Input type | Text, image, audio |
| Output type | Text |
| Provider | OpenAI |
| Languages | 27 |
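The context length of 131,072 tokens (128 × 1,024, the "128K" figure cited under Long context) is shared between the prompt and the generated completion. Below is a minimal budgeting sketch, assuming you already know the prompt's token count. The 4,096-token output budget in the usage lines is a hypothetical value, not a documented limit. The optional `count_tokens` helper assumes the `tiktoken` package is installed and that GPT-4o mini uses the `o200k_base` encoding shared with GPT-4o. It also ignores the few extra tokens each chat message adds.

```python
CONTEXT_LENGTH = 131_072  # "Context length" from the specification table (128 * 1024)

def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the prompt plus the requested completion budget fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_length

def max_prompt_tokens(max_output_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt that still leaves room for `max_output_tokens` of output."""
    return max(context_length - max_output_tokens, 0)

def count_tokens(text: str) -> int:
    # Assumption: tiktoken is installed and o200k_base matches this model's tokenizer.
    # Counts raw text only; per-message chat overhead is not included.
    import tiktoken
    return len(tiktoken.get_encoding("o200k_base").encode(text))

if __name__ == "__main__":
    print(fits_in_context(120_000, 4_096))  # True
    print(max_prompt_tokens(4_096))         # 126976
```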