OpenAI o1-preview
Version: 1
OpenAI's o1 Series Models: Enhanced Reasoning and Problem Solving on Azure
The OpenAI o1 series models are specifically designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, and math. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate the complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

Note: Configurable content filters are currently not available for o1-preview and o1-mini.

IMPORTANT: The o1-preview model is available for limited access. To try the model in the playground, registration is required, and access will be granted based on Microsoft's eligibility criteria.

Key Capabilities of the o1 Series
- Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
- Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
- Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
- Instruction Following and Workflow Management: Particularly effective for managing workflows requiring shorter contexts.
Model Variants
- o1-preview: The most capable model in the o1 series, offering enhanced reasoning abilities.
- o1-mini: A faster and more cost-efficient option in the o1 series, ideal for coding tasks requiring speed and lower resource consumption.
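As an illustration, here is a minimal sketch of calling an o1-preview deployment through the Azure OpenAI Python SDK (openai v1.x). The endpoint, key, API version, and deployment name are placeholders for your own resource; note that the o1 preview models accept only user and assistant messages and use `max_completion_tokens` in place of `max_tokens`.

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint, key, and API version for your Azure resource.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-09-01-preview",
)

response = client.chat.completions.create(
    model="o1-preview",  # the name of your deployment (assumed here)
    # o1 preview models do not support system messages or temperature.
    messages=[{"role": "user", "content": "Plan a three-step data-cleaning workflow."}],
    max_completion_tokens=2_000,  # bounds visible output plus hidden reasoning tokens
)
print(response.choices[0].message.content)
```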
Limitations
The o1-preview model is currently in preview and does not include some features available in other models, such as the image understanding and structured outputs found in the GPT-4o and GPT-4o-mini models. For many tasks, the generally available GPT-4o models may still be more suitable.

Resources
Model provider
This model is provided through the Azure OpenAI Service.

Relevant documents
The following documents are applicable:
- Overview of Responsible AI practices for Azure OpenAI models
- Transparency Note for Azure OpenAI Service
Safety
OpenAI has incorporated additional safety measures into the o1 models, including new techniques to help the models refuse unsafe requests. These advancements make the o1 series some of the most robust models available. One way OpenAI measures safety is by testing how well models continue to follow their safety rules when a user tries to bypass them (known as "jailbreaking"). In OpenAI's internal tests, GPT-4o scored 22 (on a scale of 0-100) while the o1-preview model scored 84. You can read more about this in OpenAI's system card and research post.

The following page is an extract from Learning to Reason with LLMs (OpenAI blog, September 2024). Please refer to the original source for a full benchmark report.
OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users.
Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.
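The hidden chain of thought is not directly configurable, but one rough way to vary test-time compute through the public API is the completion-token budget, which for o1 models covers reasoning tokens as well as visible output. A sketch under that assumption (deployment name and API version are placeholders):

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-09-01-preview",  # placeholder API version
)

question = "How many primes lie between 100 and 200?"

# For o1 models, max_completion_tokens caps visible output *plus* hidden
# reasoning tokens, so raising it gives the model more room to think.
for budget in (1_000, 4_000, 16_000):
    resp = client.chat.completions.create(
        model="o1-preview",  # deployment name (assumed)
        messages=[{"role": "user", "content": question}],
        max_completion_tokens=budget,
    )
    print(budget, resp.usage.completion_tokens,
          resp.choices[0].message.content[:80])
```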
Evals
| Dataset | Metric | gpt-4o | o1-preview |
|---|---|---|---|
| Competition Math AIME (2024) | cons@64 | 13.4 | 56.7 |
| | pass@1 | 9.3 | 44.6 |
| Competition Code CodeForces | Elo | 808 | 1,258 |
| | Percentile | 11.0 | 62.0 |
| GPQA Diamond | cons@64 | 56.1 | 78.3 |
| | pass@1 | 50.6 | 73.3 |
| Biology | cons@64 | 63.2 | 73.7 |
| | pass@1 | 61.6 | 65.9 |
| Chemistry | cons@64 | 43.0 | 60.2 |
| | pass@1 | 40.2 | 59.9 |
| Physics | cons@64 | 68.6 | 89.5 |
| | pass@1 | 59.5 | 89.4 |
| MATH | pass@1 | 60.3 | 85.5 |
| MMLU | pass@1 | 88.0 | 92.3 |
| MMMU (val) | pass@1 | 69.1 | n/a |
| MathVista (testmini) | pass@1 | 63.8 | n/a |
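In the table above, pass@1 scores a single sampled answer per problem, while cons@64 (consensus) takes a majority vote over 64 samples per problem. A minimal sketch of both aggregations, with hypothetical sampled answers standing in for real model output:

```python
from collections import Counter
from statistics import mean

# Hypothetical data: for each problem, 64 sampled final answers and the truth.
problems = [
    {"truth": "204", "samples": ["204"] * 40 + ["102"] * 24},
    {"truth": "17",  "samples": ["17"] * 20 + ["19"] * 44},
]

def pass_at_1(p):
    # pass@1: probability a single sample is correct, estimated here as
    # the fraction of correct samples.
    return mean(s == p["truth"] for s in p["samples"])

def cons_at_k(p):
    # cons@64: take the most common answer across the samples, score it once.
    winner, _ = Counter(p["samples"]).most_common(1)[0]
    return float(winner == p["truth"])

print("pass@1 :", mean(pass_at_1(p) for p in problems))
print("cons@64:", mean(cons_at_k(p) for p in problems))
```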
Safety
| Metric | GPT-4o | o1-preview |
|---|---|---|
| % Safe completions on harmful prompts (standard) | 0.990 | 0.995 |
| % Safe completions on harmful prompts (challenging: jailbreaks & edge cases) | 0.714 | 0.934 |
| ↳ Harassment (severe) | 0.845 | 0.900 |
| ↳ Exploitative sexual content | 0.483 | 0.949 |
| ↳ Sexual content involving minors | 0.707 | 0.931 |
| ↳ Advice about non-violent wrongdoing | 0.688 | 0.961 |
| ↳ Advice about violent wrongdoing | 0.778 | 0.963 |
| % Safe completions for top 200 with highest Moderation API scores per category in WildChat (Zhao et al., 2024) | 0.945 | 0.971 |
| Goodness@0.1 on StrongREJECT jailbreak eval (Souly et al., 2024) | 0.220 | 0.840 |
| Human-sourced jailbreak eval | 0.770 | 0.960 |
| % Compliance on internal benign edge cases ("not over-refusal") | 0.910 | 0.930 |
| % Compliance on benign edge cases in XSTest (Röttger et al., 2023) | 0.924 | 0.976 |
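Goodness@0.1 is the StrongREJECT summary statistic (Souly et al., 2024), which measures safety against roughly the most effective 10% of jailbreak techniques per prompt. The aggregation below is a sketch under that reading, not the paper's exact procedure, with hypothetical harmfulness scores in [0, 1]:

```python
from statistics import mean

def goodness_at(scores_by_prompt: dict[str, list[float]], frac: float = 0.1) -> float:
    """Average safety (1 - harmfulness) over the worst `frac` of jailbreak
    attempts per prompt. Aggregation details are an assumption; see
    Souly et al. (2024) for the exact definition."""
    per_prompt = []
    for scores in scores_by_prompt.values():
        worst = sorted(scores, reverse=True)      # highest harmfulness first
        k = max(1, int(len(worst) * frac))        # top ~10% of attempts
        per_prompt.append(mean(1.0 - s for s in worst[:k]))
    return mean(per_prompt)

# Hypothetical harmfulness scores, one per jailbreak technique and prompt.
scores = {"prompt_a": [0.9, 0.2, 0.1, 0.0], "prompt_b": [0.4, 0.3, 0.0, 0.0]}
print(goodness_at(scores))  # higher is safer
```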
Model Specifications
- Context Length: 128,000 tokens
- Quality Index: 0.71
- License: Custom
- Training Data: September 2023
- Last Updated: September 2024
- Input Type: Text
- Output Type: Text
- Publisher: OpenAI
- Languages: 27 languages
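A quick pre-flight check against the 128,000-token context length, assuming the o200k_base tiktoken encoding and a placeholder completion reserve (both are assumptions, not specified by this card):

```python
import tiktoken

CONTEXT_LENGTH = 128_000  # from the specifications above

enc = tiktoken.get_encoding("o200k_base")  # assumed encoding for o1 models

def fits(prompt: str, completion_reserve: int = 32_768) -> bool:
    """True if the prompt leaves `completion_reserve` tokens of headroom
    for visible output and hidden reasoning within the context window."""
    return len(enc.encode(prompt)) + completion_reserve <= CONTEXT_LENGTH

print(fits("Compare clause 4.2 of contract A with clause 4.2 of contract B."))
```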