OpenAI o1-mini
Version: 2024-09-12
OpenAI's o1 Series Models: Enhanced Reasoning and Problem Solving on Azure
The OpenAI o1 series models are specifically designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate the complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

o1-mini was developed to provide a faster, cheaper reasoning model that is particularly effective at coding. As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective choice for applications that require reasoning but not broad world knowledge. (A minimal API usage sketch appears after the capabilities list below.)

Note: Configurable content filters are currently not available for o1-preview and o1-mini.

IMPORTANT: The o1-mini model is available for limited access. To try the model in the playground, registration is required, and access will be granted based on Microsoft's eligibility criteria.

Key Capabilities of the o1 Series
- Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
- Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
- Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
- Instruction Following and Workflow Management: Particularly effective for managing workflows requiring shorter contexts.
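To ground these capabilities, here is a minimal sketch of calling an o1-mini deployment through the Azure OpenAI chat completions API with the official `openai` Python SDK. The endpoint, API version, and deployment name are illustrative placeholders; note that o1-series models take `max_completion_tokens` rather than `max_tokens`, and at launch o1-mini did not accept system messages or sampling parameters such as `temperature`.

```python
import os
from openai import AzureOpenAI  # pip install openai

# Endpoint, key, and deployment name below are placeholders; substitute
# the values from your own Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-09-01-preview",  # assumed o1-capable API version
)

response = client.chat.completions.create(
    model="o1-mini",  # your deployment name for o1-mini
    messages=[
        # o1-mini initially supported only user/assistant roles,
        # so any instructions go in the user message.
        {"role": "user", "content": "Write a Python function that merges "
                                    "two sorted lists in O(n) time."}
    ],
    # o1 models reason internally before answering, so budget completion
    # tokens generously; the parameter is max_completion_tokens, not max_tokens.
    max_completion_tokens=4000,
)

print(response.choices[0].message.content)
```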
Model Variants
- o4-mini: The most efficient reasoning model in the o model series, well suited for agentic solutions. Now generally available.
- o3: The most capable reasoning model in the o model series, and the first to offer full tool support for agentic solutions. Now generally available.
- o3-mini: A faster and more cost-efficient option in the o3 series, ideal for coding tasks requiring speed and lower resource consumption.
- o1: The most capable model in the o1 series, offering enhanced reasoning abilities. Now generally available.
- o1-mini: A faster and more cost-efficient option in the o1 series, ideal for coding tasks requiring speed and lower resource consumption.
Limitations
The o1-mini model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable.

Resources
Model provider
This model is provided through the Azure OpenAI Service.

Relevant documents
The following documents are applicable:
- Overview of Responsible AI practices for Azure OpenAI models
- Transparency Note for Azure OpenAI Service
Safety
OpenAI has incorporated additional safety measures into the o1 models, including new techniques to help the models refuse unsafe requests. These advancements make the o1 series some of the most robust models available. One way OpenAI measures safety is by testing how well models continue to follow their safety rules when a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100) while the o1-preview model scored 84. You can read more about this in OpenAI's system card and research post.

The following section is an extract from the OpenAI o1-mini model announcement. Please refer to the original source for the full benchmark report.
Large language models such as o1 are pre-trained on vast text datasets. While these high-capacity models have broad world knowledge, they can be expensive and slow for real-world applications. In contrast, o1-mini is a smaller model optimized for STEM reasoning during pretraining. After training with the same high-compute reinforcement learning (RL) pipeline as o1, o1-mini achieves comparable performance on many useful reasoning tasks, while being significantly more cost efficient.
Evals
| Task | Dataset | Metric | GPT-4o | o1-mini | o1-preview |
|---|---|---|---|---|---|
| Coding | Codeforces | Elo | 900 | 1650 | 1258 |
| Coding | HumanEval | Accuracy | 90.2% | 92.4% | 92.4% |
| Coding | Cybersecurity CTFs | Accuracy (Pass@12) | 20.0% | 28.7% | 43.0% |
| STEM | MMLU (0-shot CoT) | Accuracy | 88.7% | 85.2% | 90.8% |
| STEM | GPQA (Diamond, 0-shot CoT) | Accuracy | 53.6% | 60.0% | 73.3% |
| STEM | MATH-500 (0-shot CoT) | Accuracy | 60.3% | 90.0% | 85.5% |
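The Pass@12 figure in the CTF row presumably follows the standard pass@k convention: the probability that at least one of k sampled attempts succeeds. A common unbiased estimator, introduced with HumanEval in Chen et al. (2021), is sketched below; its use here is an assumption, since the extract does not specify the estimator.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated per task
    c: number of correct samples among them
    k: budget of attempts counted
    Returns the expected probability that at least one of k randomly
    chosen samples (out of the n generated) is correct.
    """
    if n - c < k:
        return 1.0  # too few failures to fill all k slots without a success
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 5 correct out of 40 samples, scored at k = 12.
print(f"pass@12 = {pass_at_k(40, 5, 12):.3f}")
```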
Safety
| Metric | GPT-4o | o1-mini |
|---|---|---|
| % Safe completions refusal on harmful prompts (standard) | 0.99 | 0.99 |
| % Safe completions on harmful prompts (challenging: jailbreaks & edge cases) | 0.714 | 0.932 |
| % Compliance on benign edge cases ("not over-refusal") | 0.91 | 0.923 |
| Goodness@0.1 StrongREJECT jailbreak eval (Souly et al. 2024) | 0.22 | 0.83 |
| Human-sourced jailbreak eval | 0.77 | 0.95 |
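For context, Goodness@0.1 from the StrongREJECT evaluation reports a model's safety against the most effective 10% of jailbreak techniques per prompt. A minimal sketch of that aggregation follows; the array shape and the scoring convention (1.0 = fully safe refusal) are assumptions for illustration, not the benchmark's actual harness.

```python
import numpy as np

def goodness_at_k(safety_scores: np.ndarray, k: float = 0.1) -> float:
    """Aggregate per-attempt safety scores into Goodness@k.

    safety_scores: array of shape (num_prompts, num_jailbreak_techniques)
    with values in [0, 1], where 1.0 means a fully safe refusal.
    For each prompt, only the worst-performing k fraction of techniques
    (the most effective jailbreaks) count toward the score.
    """
    n_techniques = safety_scores.shape[1]
    n_worst = max(1, int(n_techniques * k))
    # Sort ascending per prompt and keep the lowest (least safe) scores.
    worst = np.sort(safety_scores, axis=1)[:, :n_worst]
    return float(worst.mean())

# Toy example: 3 prompts x 20 jailbreak techniques, mostly-safe model.
rng = np.random.default_rng(0)
scores = rng.uniform(0.5, 1.0, size=(3, 20))
print(f"Goodness@0.1 = {goodness_at_k(scores):.2f}")
```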
Model Specifications
- Context Length: 128,000 tokens
- Quality Index: 0.82
- License: Custom
- Training Data: September 2023
- Last Updated: March 2025
- Input Type: Text
- Output Type: Text
- Publisher: OpenAI
- Languages: 27 languages
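Given the 128,000-token context length, it can be useful to check prompt size before sending a request. A minimal sketch using the `tiktoken` library follows; assuming the `o200k_base` encoding (used by recent OpenAI models) applies to o1-mini is an approximation, and o1 models also consume hidden reasoning tokens, so leave headroom.

```python
import tiktoken  # pip install tiktoken

CONTEXT_LENGTH = 128_000  # o1-mini context window, per the specs above

def fits_in_context(prompt: str, completion_budget: int = 4000) -> bool:
    """Rough pre-flight check that a prompt plus the completion budget
    fits in the model's context window. Token counts are approximate:
    o200k_base is assumed here, and o1 models spend additional hidden
    reasoning tokens, so keep a safety margin."""
    enc = tiktoken.get_encoding("o200k_base")
    n_prompt_tokens = len(enc.encode(prompt))
    return n_prompt_tokens + completion_budget <= CONTEXT_LENGTH

print(fits_in_context("Summarize the o1-mini model card."))
```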