OpenAI o1-mini
Version: 2024-09-12
Direct from Azure models
Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
- Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
- Streamlined operations: Benefit from unified billing, governance, and seamless PTU (provisioned throughput unit) portability across models hosted on Azure - all part of Microsoft Foundry.
- Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
- Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Key capabilities
About this model
The OpenAI o1 series models are specifically designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, math, and similar fields.
Key model capabilities
- Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
- Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
- Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
- Instruction Following and Workflow Management: Particularly effective for managing workflows requiring shorter contexts.
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.
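As an illustration of the developer workflow use case above, the following is a minimal sketch of calling an o1-mini deployment through the Azure OpenAI chat completions API with the openai Python SDK. The endpoint, API version, key, and deployment name are placeholders, and details such as the use of max_completion_tokens and the absence of a system message reflect general o1-series preview behavior rather than information supplied on this page.

```python
# Minimal sketch: calling an o1-mini deployment via the Azure OpenAI chat
# completions API (openai Python SDK >= 1.x). All identifiers below are
# placeholders, not values taken from this model card.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder endpoint
    api_key="<your-api-key>",                                   # placeholder key
    api_version="2024-09-01-preview",                           # assumed preview API version
)

response = client.chat.completions.create(
    model="o1-mini",  # your deployment name
    messages=[
        # o1-mini in preview accepts user/assistant turns; system messages are not supported.
        {"role": "user", "content": "Plan and outline a three-step data-cleaning workflow."}
    ],
    # o1-series models take max_completion_tokens (reasoning + visible output) instead of max_tokens.
    max_completion_tokens=4000,
)

print(response.choices[0].message.content)
print(response.usage)  # completion token counts include hidden reasoning tokens
```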
Out of scope use cases
The o1-mini model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable. Note: Configurable content filters are currently not available for o1-preview and o1-mini.
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
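As a rough illustration of token-based billing, the sketch below estimates the cost of a single request from its prompt and completion token counts. The per-1,000-token rates are hypothetical placeholders rather than actual Azure prices, and for o1-series models the hidden reasoning tokens generally count toward the completion (output) total.

```python
# Rough per-request cost estimate under token-based billing.
# The rates below are hypothetical placeholders, not actual Azure prices.
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Cost = input tokens at the input rate + output tokens (incl. reasoning) at the output rate."""
    return (prompt_tokens / 1000) * input_rate_per_1k \
         + (completion_tokens / 1000) * output_rate_per_1k

# Example with placeholder rates:
print(estimate_cost(prompt_tokens=1_200, completion_tokens=3_500,
                    input_rate_per_1k=0.003, output_rate_per_1k=0.012))
```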
Technical specs
Large language models such as o1 are pre-trained on vast text datasets. While these high-capacity models have broad world knowledge, they can be expensive and slow for real-world applications. In contrast, o1-mini is a smaller model optimized for STEM reasoning during pretraining. After training with the same high-compute reinforcement learning (RL) pipeline as o1, o1-mini achieves comparable performance on many useful reasoning tasks, while being significantly more cost efficient.
Training cut-off date
The provider has not supplied this information.
Training time
The provider has not supplied this information.
Input formats
The provider has not supplied this information.
Output formats
The provider has not supplied this information.
Supported languages
The provider has not supplied this information.
Sample JSON response
The provider has not supplied this information.
Model architecture
The provider has not supplied this information.
Long context
The provider has not supplied this information.
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
Large language models such as o1 are pre-trained on vast text datasets. While these high-capacity models have broad world knowledge, they can be expensive and slow for real-world applications. In contrast, o1-mini is a smaller model optimized for STEM reasoning during pretraining. After training with the same high-compute reinforcement learning (RL) pipeline as o1, o1-mini achieves comparable performance on many useful reasoning tasks, while being significantly more cost efficient.
Distribution
Distribution channels
The provider has not supplied this information.
More information
Responsible AI considerations
Safety techniques
OpenAI has incorporated additional safety measures into the o1 models, including new techniques to help the models refuse unsafe requests. These advancements make the o1 series some of the most robust models available. Note: Configurable content filters are currently not available for o1-preview and o1-mini.
Safety evaluations
One way OpenAI measures safety is by testing how well its models continue to follow safety rules when a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100) while the o1-preview model scored 84. You can read more about this in OpenAI's system card and research post.
| Metric | GPT-4o | o1-mini |
|---|---|---|
| % Safe completions refusal on harmful prompts (standard) | 0.99 | 0.99 |
| % Safe completions on harmful prompts (Challenging: jailbreaks & edge cases) | 0.714 | 0.932 |
| % Compliance on benign edge cases ("not over-refusal") | 0.91 | 0.923 |
| [email protected] StrongREJECT jailbreak eval (Souly et al. 2024) | 0.22 | 0.83 |
| Human sourced jailbreak eval | 0.77 | 0.95 |
Known limitations
The o1-mini model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable. Large language models such as o1 are pre-trained on vast text datasets. While these high-capacity models have broad world knowledge, they can be expensive and slow for real-world applications.
Acceptable use
Acceptable use policy
The provider has not supplied this information.
Quality and performance evaluations
Source: OpenAI
The following page is an extract from the OpenAI o1-mini model announcement. Please refer to the original source for a full benchmark report. Large language models such as o1 are pre-trained on vast text datasets. While these high-capacity models have broad world knowledge, they can be expensive and slow for real-world applications. In contrast, o1-mini is a smaller model optimized for STEM reasoning during pretraining. After training with the same high-compute reinforcement learning (RL) pipeline as o1, o1-mini achieves comparable performance on many useful reasoning tasks, while being significantly more cost efficient.
Evals
| Task | Dataset | Metric | GPT-4o | o1-mini | o1-preview |
|---|---|---|---|---|---|
| Coding | Codeforces | Elo | 900 | 1650 | 1258 |
| | HumanEval | Accuracy | 90.2% | 92.4% | 92.4% |
| | Cybersecurity CTFs | Accuracy (Pass@12) | 20.0% | 28.7% | 43.0% |
| STEM | MMLU (0-shot CoT) | Accuracy | 88.7% | 85.2% | 90.8% |
| | GPQA (Diamond, 0-shot CoT) | Accuracy | 53.6% | 60.0% | 73.3% |
| | MATH-500 (0-shot CoT) | Accuracy | 60.3% | 90.0% | 85.5% |
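The Pass@12 figure in the coding rows is presumably the standard pass@k metric over repeated samples. For reference, the sketch below implements the unbiased pass@k estimator from Chen et al. (2021); whether OpenAI computed the reported number this exact way is an assumption.

```python
# Unbiased pass@k estimator (Chen et al., 2021): probability that at least one
# of k sampled attempts passes, given n attempts of which c passed.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k subset must contain a passing attempt
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 20 attempts per task, 5 passing, reported at k=12.
print(pass_at_k(n=20, c=5, k=12))
```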
Safety
| Metric | GPT-4o | o1-mini |
|---|---|---|
| % Safe completions refusal on harmful prompts (standard) | 0.99 | 0.99 |
| % Safe completions on harmful prompts (Challenging: jailbreaks & edge cases) | 0.714 | 0.932 |
| % Compliance on benign edge cases ("not over-refusal") | 0.91 | 0.923 |
| [email protected] StrongREJECT jailbreak eval (Souly et al. 2024) | 0.22 | 0.83 |
| Human sourced jailbreak eval | 0.77 | 0.95 |
Benchmarking methodology
Source: OpenAI
The provider has not supplied this information.
Public data summary
Source: OpenAI
The provider has not supplied this information.
Model Specifications
Context Length: 128,000 tokens
License: Custom
Training Data: September 2023
Last Updated: December 2025
Input Type: Text
Output Type: Text
Provider: OpenAI
Languages: 27 languages
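As a practical note on the 128,000-token context length above, the sketch below counts prompt tokens before sending a request; using tiktoken's o200k_base encoding for o1-mini is an assumption, and the remaining budget must also cover reasoning and output tokens.

```python
# Minimal sketch: check a prompt against the 128,000-token context window.
# The choice of the o200k_base encoding for o1-mini is an assumption.
import tiktoken

CONTEXT_LENGTH = 128_000  # from the model specifications above

enc = tiktoken.get_encoding("o200k_base")
prompt = "Compare these two contract clauses and list the substantive differences."
n_prompt_tokens = len(enc.encode(prompt))

remaining = CONTEXT_LENGTH - n_prompt_tokens
print(f"{n_prompt_tokens} prompt tokens; {remaining} tokens left for reasoning and output")
```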