OpenAI o1-mini
Version: 2024-09-12
OpenAI | Last updated March 2025
Smaller, faster, and 80% cheaper than o1-preview; performs well at code generation and small-context operations.
Reasoning
Multilingual
Coding

OpenAI's o1 Series Models: Enhanced Reasoning and Problem Solving on Azure

The OpenAI o1 series models are specifically designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate the complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

o1-mini was developed to provide a faster, cheaper reasoning model that is particularly effective at coding. As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge.

Note: Configurable content filters are currently not available for o1-preview and o1-mini.

IMPORTANT: The o1-mini model is available for limited access. To try the model in the playground, registration is required, and access will be granted based on Microsoft's eligibility criteria.
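As a concrete starting point, here is a minimal sketch of calling the model through the Azure OpenAI Service chat completions API in Python. The deployment name, endpoint, prompt, and API version below are placeholder assumptions to adapt to your environment, not part of this page:

    # Minimal sketch, not an official sample: assumes an Azure OpenAI resource
    # with a deployment named "o1-mini"; endpoint, key, and API version are
    # placeholders.
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-09-01-preview",  # assumed; use a version your resource supports
    )

    response = client.chat.completions.create(
        model="o1-mini",  # the deployment name, which may differ from the model name
        # o1-mini (2024-09-12) accepts user messages only: no system message,
        # and sampling settings such as temperature are fixed.
        messages=[
            {"role": "user", "content": "Write a Python function that merges overlapping intervals."}
        ],
        # o1-series models take max_completion_tokens rather than max_tokens;
        # hidden reasoning tokens also draw from this budget.
        max_completion_tokens=4000,
    )
    print(response.choices[0].message.content)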

Key Capabilities of the o1 Series

  • Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
  • Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
  • Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
  • Instruction Following and Workflow Management: Particularly effective for managing workflows that require shorter contexts (see the sketch after this list).
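The workflow-management capability lends itself to chaining calls: have the model produce a plan, then execute it. Below is a hedged sketch reusing the client from the previous example; the prompts and the two-step split are illustrative, not a prescribed pattern:

    # Illustrative two-step workflow (plan, then implement), reusing `client`
    # from the previous sketch; prompts are placeholders.
    plan = client.chat.completions.create(
        model="o1-mini",
        messages=[{"role": "user", "content": (
            "List, as numbered steps, how to deduplicate a 10M-row CSV by email address."
        )}],
        max_completion_tokens=2000,
    ).choices[0].message.content

    script = client.chat.completions.create(
        model="o1-mini",
        messages=[{"role": "user", "content": (
            "Implement the following plan as a single Python script:\n" + plan
        )}],
        max_completion_tokens=4000,
    ).choices[0].message.content
    print(script)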

Model Variants

  • o4-mini: The most efficient reasoning model in the o model series, well suited for agentic solutions. Now generally available.
  • o3: The most capable reasoning model in the o model series, and the first one to offer full tools support for agentic solutions. Now generally available.
  • o3-mini: A faster and more cost-efficient option in the o3 series, ideal for coding tasks requiring speed and lower resource consumption.
  • o1: The most capable model in the o1 series, offering enhanced reasoning abilities. Now generally available.
  • o1-mini: A faster and more cost-efficient option in the o1 series, ideal for coding tasks requiring speed and lower resource consumption.

Limitations

The o1-mini model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable.
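Because structured outputs are unavailable on o1-mini, one common workaround is to request JSON in the prompt and parse defensively. The sketch below assumes the client from the earlier example; nothing here guarantees valid JSON, and GPT-4o with response_format remains the more reliable route:

    # Workaround sketch: ask for JSON in the prompt, then parse defensively.
    # Unlike structured outputs, this gives no guarantee of valid JSON.
    import json

    raw = client.chat.completions.create(
        model="o1-mini",
        messages=[{"role": "user", "content": (
            'Classify the sentiment of "great build quality, terrible battery". '
            'Reply with JSON only, e.g. {"sentiment": "mixed", "confidence": 0.9}'
        )}],
        max_completion_tokens=1000,
    ).choices[0].message.content

    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        result = None  # retry, or fall back to a GPT-4o call with response_format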

Resources

Model provider

This model is provided through the Azure OpenAI Service.

Relevant documents

The following documents are applicable:

Safety

OpenAI has incorporated additional safety measures into the o1 models, including new techniques to help the models refuse unsafe requests. These advancements make the o1 series some of the most robust models available. One way OpenAI measures safety is by testing how well models continue to follow their safety rules when a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100) while the o1-preview model scored 84. You can read more about this in OpenAI's system card and research post.
The following is an extract from the OpenAI o1-mini model announcement; please refer to the original source for the full benchmark report. Large language models such as o1 are pre-trained on vast text datasets. While these high-capacity models have broad world knowledge, they can be expensive and slow for real-world applications. In contrast, o1-mini is a smaller model optimized for STEM reasoning during pretraining. After training with the same high-compute reinforcement learning (RL) pipeline as o1, o1-mini achieves comparable performance on many useful reasoning tasks while being significantly more cost efficient.

Evals

Task | Dataset | Metric | GPT-4o | o1-mini | o1-preview
Coding | Codeforces | Elo | 900 | 1650 | 1258
Coding | HumanEval | Accuracy | 90.2% | 92.4% | 92.4%
Coding | Cybersecurity CTFs | Accuracy (Pass@12) | 20.0% | 28.7% | 43.0%
STEM | MMLU (0-shot CoT) | Accuracy | 88.7% | 85.2% | 90.8%
STEM | GPQA (Diamond, 0-shot CoT) | Accuracy | 53.6% | 60.0% | 73.3%
STEM | MATH-500 (0-shot CoT) | Accuracy | 60.3% | 90.0% | 85.5%

Safety

Metric | GPT-4o | o1-mini
% Safe completions refusal on harmful prompts (standard) | 0.99 | 0.99
% Safe completions on harmful prompts (challenging: jailbreaks & edge cases) | 0.714 | 0.932
% Compliance on benign edge cases ("not over-refusal") | 0.91 | 0.923
[email protected] StrongREJECT jailbreak eval (Souly et al. 2024) | 0.22 | 0.83
Human-sourced jailbreak eval | 0.77 | 0.95
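[email protected] above comes from the StrongREJECT benchmark (Souly et al. 2024) and summarizes how safe a model remains against the most effective jailbreak techniques. As a rough illustration only, assuming the metric averages per-prompt safety over the worst-scoring 10% of jailbreak techniques (the paper is the authoritative definition):

    # Rough illustration, not the benchmark's reference implementation: assumes
    # Goodness@0.1 = mean safety over the worst 10% of jailbreak techniques per
    # prompt, averaged across prompts. See Souly et al. 2024 for the real metric.
    def goodness_at_fraction(scores, fraction=0.1):
        # scores: one list per prompt of safety scores in [0, 1], one score per
        # jailbreak technique (1.0 = fully safe response)
        per_prompt = []
        for prompt_scores in scores:
            k = max(1, int(len(prompt_scores) * fraction))
            worst = sorted(prompt_scores)[:k]
            per_prompt.append(sum(worst) / len(worst))
        return sum(per_prompt) / len(per_prompt)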
Model Specifications
Context Length: 128,000 tokens
Quality Index: 0.82
License: Custom
Training Data: September 2023
Last Updated: March 2025
Input Type: Text
Output Type: Text
Publisher: OpenAI
Languages: 27 languages
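The 128,000-token context length above is shared by the prompt and the completion (including the model's hidden reasoning tokens). Below is a minimal pre-flight budgeting sketch using the tiktoken library; the o200k_base encoding is an assumption about o1-mini's tokenizer, so verify it for your deployment:

    # Pre-flight token budgeting sketch; o200k_base is an assumption about
    # o1-mini's tokenizer (it is the encoding used by recent OpenAI models).
    import tiktoken

    CONTEXT_LENGTH = 128_000
    enc = tiktoken.get_encoding("o200k_base")

    def fits(prompt: str, completion_budget: int = 4_000) -> bool:
        # Reasoning tokens also count against the completion budget on
        # o1-series models, so leave generous headroom.
        return len(enc.encode(prompt)) + completion_budget <= CONTEXT_LENGTH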