OpenAI o1-preview
Version: 1
Provider: OpenAI | Last updated: December 2025
Focused on advanced reasoning and solving complex problems, including math and science tasks. Ideal for applications that require deep contextual understanding and agentic workflows.
Reasoning
Multilingual
Coding

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry; reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).
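Deployments of this model are called through the standard Azure OpenAI chat-completions API. The sketch below is a minimal example, assuming an Azure deployment named `o1-preview` and the usual `AZURE_OPENAI_ENDPOINT` / `AZURE_OPENAI_API_KEY` environment variables; note that o1-series reasoning models do not accept a system message and use `max_completion_tokens` rather than `max_tokens` (the reasoning tokens count toward this budget).

```python
import os

def build_o1_request(prompt: str, max_completion_tokens: int = 4096) -> dict:
    """Build a chat-completion request for an o1-preview deployment.

    o1-preview does not accept a system message, and reasoning models
    take `max_completion_tokens` rather than `max_tokens`.
    """
    return {
        "model": "o1-preview",  # your Azure deployment name may differ
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,
    }

if __name__ == "__main__" and os.getenv("AZURE_OPENAI_ENDPOINT"):
    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-09-01-preview",
    )
    response = client.chat.completions.create(
        **build_o1_request("Prove that the square root of 2 is irrational.")
    )
    print(response.choices[0].message.content)
```

The request is built separately from the network call so the parameter restrictions can be checked without hitting the API.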

Key model capabilities

  • Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
  • Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
  • Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
  • Instruction Following and Workflow Management: Particularly effective for managing workflows requiring shorter contexts.

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

The provider has not supplied this information.

Out of scope use cases

The o1-preview model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable. Note: Configurable content filters are currently not available for o1-preview and o1-mini.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See the Azure pricing page for details.
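Token-based pricing makes per-request cost a simple rate calculation. The sketch below uses placeholder per-1M-token rates (not actual prices; check the Azure pricing page), and note that for reasoning models the hidden reasoning tokens are billed as output tokens:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate a request's cost in USD given per-1M-token rates.

    Rates vary by model and deployment type; the values used below
    are placeholders, not real prices.
    """
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

# Example with placeholder rates of $15 / $60 per 1M tokens:
cost = estimate_cost(2_000, 8_000, input_rate=15.0, output_rate=60.0)
print(f"${cost:.4f}")  # → $0.5100
```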

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

The provider has not supplied this information.

Output formats

The provider has not supplied this information.

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

The following documents are applicable:

Responsible AI considerations

Safety techniques

OpenAI has incorporated additional safety measures into the o1 models, including new techniques to help the models refuse unsafe requests. These advancements make the o1 series some of the most robust models available. Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. Note: Configurable content filters are currently not available for o1-preview and o1-mini.

Safety evaluations

One way OpenAI measures safety is by testing how well its models continue to follow their safety rules when a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100), while the o1-preview model scored 84. You can read more in OpenAI's system card and research post.
Metric                                                                        GPT-4o   o1-preview
% Safe completions on harmful prompts (standard)                              0.990    0.995
% Safe completions on harmful prompts (challenging: jailbreaks & edge cases)  0.714    0.934
  ↳ Harassment (severe)                                                       0.845    0.900
  ↳ Exploitative sexual content                                               0.483    0.949
  ↳ Sexual content involving minors                                           0.707    0.931
  ↳ Advice about non-violent wrongdoing                                       0.688    0.961
  ↳ Advice about violent wrongdoing                                           0.778    0.963
% Safe completions for top 200 prompts with highest Moderation API
  scores per category in WildChat (Zhao et al., 2024)                         0.945    0.971
Goodness@0.1 on StrongREJECT jailbreak eval (Souly et al., 2024)              0.220    0.840
Human-sourced jailbreak eval                                                  0.770    0.960
% Compliance on internal benign edge cases ("not over-refusal")               0.910    0.930
% Compliance on benign edge cases in XSTest (Röttger et al., 2023)            0.924    0.976
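The "% safe completions" figures above are simple fractions: the share of evaluated prompts whose completion was judged safe in that category. A minimal sketch, assuming one binary safety judgment per prompt:

```python
def safe_completion_rate(labels: list[bool]) -> float:
    """Fraction of prompts whose completion was judged safe.

    `labels` holds one binary judgment per evaluated prompt;
    e.g. 9 safe out of 10 gives 0.900.
    """
    if not labels:
        raise ValueError("no evaluations")
    return sum(labels) / len(labels)
```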

Known limitations

The o1-preview model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable. IMPORTANT: The o1-preview model is available for limited access. To try the model in the playground, registration is required, and access will be granted based on Microsoft's eligibility criteria.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: OpenAI

OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users.

Evals

Dataset                  Metric       gpt-4o   o1-preview
Competition Math
  (AIME 2024)            cons@64      13.4     56.7
                         pass@1       9.3      44.6
Competition Code
  (CodeForces)           Elo          808      1,258
                         Percentile   11.0     62.0
GPQA Diamond             cons@64      56.1     78.3
                         pass@1       50.6     73.3
Biology                  cons@64      63.2     73.7
                         pass@1       61.6     65.9
Chemistry                cons@64      43.0     60.2
                         pass@1       40.2     59.9
Physics                  cons@64      68.6     89.5
                         pass@1       59.5     89.4
MATH                     pass@1       60.3     85.5
MMLU                     pass@1       88.0     92.3
MMMU (val)               pass@1       69.1     n/a
MathVista (testmini)     pass@1       63.8     n/a
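The two metrics in the table differ in how samples are scored: pass@1 treats each generated answer as an independent attempt, while cons@64 ("consensus") takes the majority answer among 64 samples and scores that single answer. A sketch under the assumption that consensus means majority vote over final answers:

```python
from collections import Counter

def pass_at_1(samples: list[str], correct: str) -> float:
    """Mean accuracy treating each sample as an independent attempt."""
    return sum(s == correct for s in samples) / len(samples)

def cons_at_k(samples: list[str], correct: str) -> bool:
    """Whether the majority-vote (consensus) answer over k samples is correct."""
    majority, _ = Counter(samples).most_common(1)[0]
    return majority == correct

answers = ["42", "42", "41", "42"]
print(pass_at_1(answers, "42"))  # 0.75
print(cons_at_k(answers, "42"))  # True
```

This is why cons@64 is consistently higher than pass@1 in the table: majority voting filters out minority wrong answers.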

Safety

The safety metrics comparing GPT-4o and o1-preview are listed in the Safety evaluations section above.

Benchmarking methodology

Source: OpenAI

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them. One way OpenAI measures safety is by testing how well its models continue to follow their safety rules when a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100), while the o1-preview model scored 84.

Public data summary

The provider has not supplied this information.
Model Specifications

Context length       128,000 tokens
License              Custom
Training data        Up to September 2023
Last updated         December 2025
Input type           Text
Output type          Text
Provider             OpenAI
Languages            27 languages
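The 128,000-token context window bounds the combined size of the prompt and the completion (including hidden reasoning tokens). For a quick pre-flight sanity check without a tokenizer, a rough 4-characters-per-token heuristic for English text can be used; for accurate counts use a real tokenizer:

```python
CONTEXT_WINDOW = 128_000  # tokens, per the specifications above

def rough_token_count(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English)."""
    return max(1, len(text) // 4)

def fits_context(prompt: str, reserved_output: int = 4_096) -> bool:
    """Check that the prompt plus a reserved completion budget fits."""
    return rough_token_count(prompt) + reserved_output <= CONTEXT_WINDOW
```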