OpenAI o1-preview
Version: 1
Direct from Azure models
Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
- Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
- Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
- Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
- Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Key capabilities
About this model
OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).
Key model capabilities
- Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers (see the API sketch after this list).
- Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
- Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
- Instruction Following and Workflow Management: Particularly effective for managing workflows requiring shorter contexts.
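As a concrete illustration of the code-generation capability, here is a minimal sketch of calling an o1-preview deployment through the Azure OpenAI Python SDK (openai>=1.x). The endpoint, environment variable names, deployment name, and API version are placeholders; substitute the values from your own Azure resource.

```python
import os

from openai import AzureOpenAI

# Placeholder credential handling; use your own Azure OpenAI resource values.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-09-01-preview",  # placeholder; use a version that supports o1 models
)

response = client.chat.completions.create(
    model="o1-preview",  # your *deployment* name, which may differ from the model name
    messages=[
        # o1-preview does not support the "system" role; put any instructions
        # in the user message instead.
        {
            "role": "user",
            "content": (
                "Write a Python function that returns the longest strictly "
                "increasing subsequence of a list of integers."
            ),
        }
    ],
    # o1 models take max_completion_tokens (which also covers the hidden
    # reasoning tokens) rather than max_tokens.
    max_completion_tokens=4000,
)

print(response.choices[0].message.content)
```

Because part of the completion budget is spent on hidden reasoning tokens, leave max_completion_tokens generous for hard problems.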
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
The provider has not supplied this information.
Out of scope use cases
The o1-preview model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable. Note: Configurable content filters are currently not available for o1-preview and o1-mini.
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
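Since billing is token-based, a rough per-call cost can be estimated from the usage block of an API response. The per-1K-token rates below are hypothetical placeholders, not Azure's actual prices; consult the pricing page for current rates.

```python
# Hypothetical placeholder rates -- NOT actual Azure prices.
HYPOTHETICAL_INPUT_RATE_PER_1K = 0.015   # USD per 1K prompt tokens (assumed)
HYPOTHETICAL_OUTPUT_RATE_PER_1K = 0.060  # USD per 1K completion tokens (assumed)

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Rough cost of one call. For o1 models the hidden reasoning tokens are
    billed as output tokens and are already included in completion_tokens."""
    return (prompt_tokens / 1000) * HYPOTHETICAL_INPUT_RATE_PER_1K + \
           (completion_tokens / 1000) * HYPOTHETICAL_OUTPUT_RATE_PER_1K

# Usage, given a `response` object from the SDK sketch above:
#   u = response.usage
#   print(f"~${estimate_cost(u.prompt_tokens, u.completion_tokens):.4f}")
```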
Technical specs
The provider has not supplied this information.
Training cut-off date
The provider has not supplied this information.
Training time
The provider has not supplied this information.
Input formats
The provider has not supplied this information.
Output formats
The provider has not supplied this information.
Supported languages
The provider has not supplied this information.
Sample JSON response
The provider has not supplied this information.
Model architecture
The provider has not supplied this information.
Long context
The provider has not supplied this information.
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
The provider has not supplied this information.
Distribution
Distribution channels
The provider has not supplied this information.
More information
The following documents are applicable:
Responsible AI considerations
Safety techniques
OpenAI has incorporated additional safety measures into the o1 models, including new techniques that help the models refuse unsafe requests. These advancements make the o1 series some of the most robust models available. Note: Configurable content filters are currently not available for o1-preview and o1-mini.
Safety evaluations
One way OpenAI measures safety is by testing how well models continue to follow its safety rules if a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100) while the o1-preview model scored 84. You can read more about this in OpenAI's system card and research post.
| Metric | GPT-4o | o1-preview |
|---|---|---|
| % Safe completions on harmful prompts (standard) | 0.990 | 0.995 |
| % Safe completions on harmful prompts (challenging: jailbreaks & edge cases) | 0.714 | 0.934 |
| ↳ Harassment (severe) | 0.845 | 0.900 |
| ↳ Exploitative sexual content | 0.483 | 0.949 |
| ↳ Sexual content involving minors | 0.707 | 0.931 |
| ↳ Advice about non-violent wrongdoing | 0.688 | 0.961 |
| ↳ Advice about violent wrongdoing | 0.778 | 0.963 |
| % Safe completions for the top 200 prompts with the highest Moderation API scores per category in WildChat (Zhao et al., 2024) | 0.945 | 0.971 |
| Goodness@0.1 on the StrongREJECT jailbreak eval (Souly et al., 2024) | 0.220 | 0.840 |
| Human-sourced jailbreak eval | 0.770 | 0.960 |
| % Compliance on internal benign edge cases (not over-refusal) | 0.910 | 0.930 |
| % Compliance on benign edge cases in XSTest (Röttger et al., 2023) | 0.924 | 0.976 |
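For readers unfamiliar with the Goodness@0.1 row above, here is a hedged sketch of the aggregation used by the StrongREJECT jailbreak eval (Souly et al., 2024), as we understand it: each prompt is attacked with many jailbreak techniques, each response receives a safety ("goodness") score, and the metric averages the scores from the 10% of techniques most effective against the model. The data layout below is an assumption for illustration.

```python
from typing import List

def goodness_at_fraction(scores_per_prompt: List[List[float]],
                         fraction: float = 0.1) -> float:
    """scores_per_prompt[i] holds one safety score in [0, 1] per jailbreak
    technique applied to prompt i; lower means the jailbreak worked better.
    Returns the mean, over prompts, of the average score among the worst
    `fraction` of techniques."""
    per_prompt = []
    for scores in scores_per_prompt:
        k = max(1, int(len(scores) * fraction))
        worst = sorted(scores)[:k]  # the most effective jailbreaks
        per_prompt.append(sum(worst) / k)
    return sum(per_prompt) / len(per_prompt)
```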
Known limitations
The o1-preview model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable. IMPORTANT: The o1-preview model is available for limited access. To try the model in the playground, registration is required, and access will be granted based on Microsoft's eligibility criteria.
Acceptable use
Acceptable use policy
The provider has not supplied this information.
Quality and performance evaluations
Source: OpenAI
OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users.
Evals
| Dataset | Metric | gpt-4o | o1-preview |
|---|---|---|---|
| Competition Math AIME (2024) | cons@64 | 13.4 | 56.7 |
| Competition Math AIME (2024) | pass@1 | 9.3 | 44.6 |
| Competition Code CodeForces | Elo | 808 | 1,258 |
| Competition Code CodeForces | Percentile | 11.0 | 62.0 |
| GPQA Diamond | cons@64 | 56.1 | 78.3 |
| GPQA Diamond | pass@1 | 50.6 | 73.3 |
| Biology | cons@64 | 63.2 | 73.7 |
| Biology | pass@1 | 61.6 | 65.9 |
| Chemistry | cons@64 | 43.0 | 60.2 |
| Chemistry | pass@1 | 40.2 | 59.9 |
| Physics | cons@64 | 68.6 | 89.5 |
| Physics | pass@1 | 59.5 | 89.4 |
| MATH | pass@1 | 60.3 | 85.5 |
| MMLU | pass@1 | 88.0 | 92.3 |
| MMMU (val) | pass@1 | 69.1 | n/a |
| MathVista (testmini) | pass@1 | 63.8 | n/a |
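For reference, pass@1 in the table above scores one independent sample per question, while cons@64 draws 64 samples and scores the majority-vote (consensus) answer. Here is a minimal sketch of the two metrics, assuming a hypothetical sample_answer helper that queries the model once; the exact-match grading is a simplification, since real evals normalize answers before comparison.

```python
from collections import Counter
from typing import Callable, List, Tuple

def pass_at_1(items: List[Tuple[str, str]],
              sample_answer: Callable[[str], str]) -> float:
    """Fraction of (question, answer) pairs solved by a single sample each."""
    return sum(sample_answer(q) == a for q, a in items) / len(items)

def cons_at_k(items: List[Tuple[str, str]],
              sample_answer: Callable[[str], str], k: int = 64) -> float:
    """Draw k samples per question and grade the majority-vote answer."""
    correct = 0
    for q, a in items:
        votes = Counter(sample_answer(q) for _ in range(k))
        consensus = votes.most_common(1)[0][0]
        correct += consensus == a
    return correct / len(items)
```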
Safety
The safety metrics reported here are identical to those in the Safety evaluations table above.
Benchmarking methodology
Source: OpenAI
Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them. One way OpenAI measures safety is by testing how well models continue to follow its safety rules if a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100) while the o1-preview model scored 84.
Public data summary
Source: OpenAI
The provider has not supplied this information.
Model Specifications
Context Length: 128,000 tokens
License: Custom
Training Data: September 2023
Last Updated: December 2025
Input Type: Text
Output Type: Text
Provider: OpenAI
Languages: 27 languages