Overview
OpenAI, Inc aims to create "safe and beneficial" artificial general intelligence, defined as highly autonomous systems that outperform humans at most economically valuable work. The organization leads today's AI boom with some of the world's most advanced multimodal foundation models—including GPT‑4o, its real‑time "omni" flagship—and continues to compress state‑of‑the‑art performance into smaller, cheaper variants like GPT‑4o‑mini. The latest GPT‑4o‑transcribe models replace Whisper for speech tasks, while GPT‑Image‑1 overtakes DALL·E 3 with native image generation and editing.Key Azure AI Foundry Models (July 2025)
- GPT‑4o – Best‑in‑class reasoning, code, audio & vision in one API.
- GPT‑4o‑mini – Small‑footprint 20B model that outperforms GPT‑3.5‑Turbo at half the cost.
- GPT‑4o‑transcribe / TTS – Upgraded speech models with lower error rates and customizable voices.
- GPT‑Image‑1 – Next‑gen text‑to‑image + in‑painting that supersedes DALL·E 3.
Why OpenAI on Azure
All OpenAI endpoints inherit enterprise‑grade Content Safety, flexible serverless or provisioned‑throughput deployments, Azure billing, and private‑network inference—so you can move from prototype to production in days, not months.gpt-5 is designed for logic-heavy and multi-step tasks.
gpt-5-mini is a lightweight version for cost-sensitive applications.
gpt-5-nano is optimized for speed, ideal for applications requiring low latency.
gpt-5-chat (preview) is an advanced, natural, multimodal, and context-aware conversations for enterprise applications.
The o3 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.
codex-mini is a fine-tuned variant of the o4-mini model, designed to deliver rapid, instruction-following performance for developers working in CLI workflows. Whether you're automating shell commands, editing scripts, or refactoring repositories, Codex-Min
An efficient AI solution to generate videos
o3 includes significant improvements on quality and safety while supporting the existing features of o1 and delivering comparable or better performance.
o4-mini includes significant improvements on quality and safety while supporting the existing features of o3-mini and delivering comparable or better performance.
An efficient AI solution for diverse text and image tasks, including text to image, image to image, inpainting, and prompt transformation.
gpt-4.1 outperforms gpt-4o across the board, with major gains in coding, instruction following, and long-context understanding
gpt-4.1-mini outperform gpt-4o-mini across the board, with major gains in coding, instruction following, and long-context handling
gpt-4.1-nano provides gains in coding, instruction following, and long-context handling along with lower latency and cost
the largest and strongest general purpose model in the gpt model family up to date, best suited for diverse text and image tasks.
o3-mini includes the o1 features with significant cost-efficiencies for scenarios requiring high performance.
An advanced text-to-speech solution designed to convert written text into natural-sounding speech.
A cutting-edge speech-to-text solution that deliverables reliable and accurate transcripts.
A highly efficient and cost effective speech-to-text solution that deliverables reliable and accurate transcripts.
computer-use-preview is the model for Computer Use Agent for use in Responses API. You can use computer-use-preview model to get instructions to control a browser on your computer screen and take action on a user's behalf.
Best suited for rich, asynchronous audio input/output interactions, such as creating spoken summaries from text.
Best suited for rich, asynchronous audio input/output interactions, such as creating spoken summaries from text.
Focused on advanced reasoning and solving complex problems, including math and science tasks. Ideal for applications that require deep contextual understanding and agentic workflows.
Smaller, faster, and 80% cheaper than o1-preview, performs well at code generation and small context operations.
OpenAI's most advanced multimodal model in the gpt-4o family. Can handle both text and image inputs.
An affordable, efficient AI solution for diverse text and image tasks.
Best suited for rich, asynchronous audio input/output interactions, such as creating spoken summaries from text.
The gpt4orealtimepreview model introduces a new era in AI interaction by incorporating the new audio modality powered by gpt4o. This new modality allows for seamless speechtospeech and texttospeech applications, providing a richer and more engaging user experience. Engineered for speed and e
Focused on advanced reasoning and solving complex problems, including math and science tasks. Ideal for applications that require deep contextual understanding and agentic workflows.
gpt4 is a large multimodal model that accepts text or image inputs and outputs text. It can solve complex problems with greater accuracy than any of our previous models, thanks to its extensive general knowledge and advanced reasoning capabilities. gpt4 provides a wide range of model versions to
DALLE 3 generates images from text prompts that are provided by the user. DALLE 3 is generally available for use on Azure OpenAI. The image generation API creates an image from a text prompt. It does not edit existing images or create variations. Learn more at: <https://learn.microsoft.com/azur
Davinci002 is the latest versions of Davinci, gpt3 base models. Davinci002 replaces the deprecated Curie and Davinci models. It is a smaller, faster model that is primarily used for fine tuning tasks. This model supports 16384 max input tokens and training data is up to Sep 2021. Davinci002 su
gpt3.5 models can understand and generate natural language or code. The most capable and cost effective model in the gpt3.5 family is gpt3.5turbo, which has been optimized for chat and works well for traditional completions tasks as well. gpt3.5turbo is available for use with the Chat Completi
gpt3.5 models can understand and generate natural language or code. The most capable and cost effective model in the gpt3.5 family is gpt3.5turbo, which has been optimized for chat and works well for traditional completions tasks as well. gpt3.5turbo is available for use with the Chat Completi
The gpt35turbo (also known as ChatGPT) is the most capable and costeffective model in the gpt3.5 family which has been optimized for chat using the Chat Completions API. It is a language model designed for conversational interfaces and the model behaves differently than previous gpt3 models. Pr
Push the open model frontier with GPT-OSS models, released under the permissive Apache 2.0 license, allowing anyone to use, modify, and deploy them freely.
TTS is a model that converts text to natural sounding speech. TTS is optimized for realtime or interactive scenarios. For offline scenarios, TTSHD provides higher quality. The API supports six different voices. Max request data size: 4,096 chars can be converted from text to speech per API request
textembeddingada002 outperforms all the earlier embedding models on text search, code search, and sentence similarity tasks and gets comparable performance on text classification. Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers
Text-embedding-3 series models are the latest and most capable embedding model from OpenAI.
The o3 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers.
Push the open model frontier with GPT-OSS models, released under the permissive Apache 2.0 license, allowing anyone to use, modify, and deploy them freely.
gpt4 can solve difficult problems with greater accuracy than any of the previous OpenAI models. Like gpt35turbo, gpt4 is optimized for chat but works well for traditional completions tasks. The gpt4 supports 8192 max input tokens and the gpt432k supports up to 32,768 tokens. Note: this model
TTSHD is a model that converts text to natural sounding speech. TTS is optimized for realtime or interactive scenarios. For offline scenarios, TTSHD provides higher quality. The API supports six different voices. Max request data size: 4,096 chars can be converted from text to speech per API requ
The Whisper models are trained for speech recognition and translation tasks, capable of transcribing speech audio into the text in the language it is spoken (automatic speech recognition) as well as translated into English (speech translation). Researchers at OpenAI developed the models to study th
Text-embedding-3 series models are the latest and most capable embedding model from OpenAI.