OpenAI GPT-4o
Version: 2024-11-20
gpt-4o offers a shift in how AI models interact with multimodal inputs. By seamlessly combining text, images, and audio, gpt-4o provides a richer, more engaging user experience.
Matching the intelligence of GPT-4 Turbo, it is markedly more efficient, generating text at twice the speed and half the cost. GPT-4o also achieves the highest vision performance and the strongest non-English results of any previous OpenAI model.
GPT-4o is engineered for speed and efficiency. Its ability to handle complex queries with minimal resources translates into cost savings and performance gains.
The introduction of gpt-4o opens numerous possibilities for businesses in various sectors:
- Enhanced customer service: By integrating diverse data inputs, gpt-4o enables more dynamic and comprehensive customer support interactions.
- Advanced analytics: Leverage gpt-4o's capability to process and analyze different types of data to enhance decision-making and uncover deeper insights.
- Content innovation: Use gpt-4o's generative capabilities to create engaging and diverse content formats, catering to a broad range of consumer preferences.
Updates
gpt-4o-2024-11-20
: This is the latest version of gpt-4o. It supports the same maximum output size as previous versions (16,384 tokens) and features such as:
- Text and image processing
- JSON mode
- Parallel function calling
- Enhanced accuracy and responsiveness
- Parity with GPT-4 Turbo with Vision on English text and coding tasks
- Superior performance in non-English languages and in vision tasks
- Support for enhancements
- Support for complex structured outputs
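As an illustration of two of the features above, the sketch below builds the shape of a Chat Completions request that combines JSON mode (`response_format` of type `json_object`) with a tool the model may invoke via parallel function calling. This is a hand-assembled payload, not an official SDK snippet; the `get_weather` function and its schema are hypothetical, and you would normally send this through the OpenAI or Azure OpenAI client library.

```python
import json

# Sketch of a Chat Completions request body exercising JSON mode and
# function calling. The tool below is a hypothetical example.
payload = {
    "model": "gpt-4o-2024-11-20",
    "messages": [
        {"role": "system", "content": "Reply in JSON."},
        {"role": "user", "content": "What is the weather in Paris and Tokyo?"},
    ],
    # JSON mode: constrains the model to emit syntactically valid JSON.
    "response_format": {"type": "json_object"},
    # With parallel function calling, the model can request this tool
    # for both cities in a single turn.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function name
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The payload serializes cleanly, as any request body must.
body = json.dumps(payload)
```

When the model responds with multiple `tool_calls` in one message, each carries its own arguments, so both city lookups can be dispatched concurrently before returning results to the model.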
Resources
Model Provider
This model is provided through the Azure OpenAI Service.
Relevant documents
The following documents are applicable:
- Overview of Responsible AI practices for Azure OpenAI models
- Transparency Note for Azure OpenAI Service
Responsible AI Considerations
GPT-4o has safety built in by design across modalities, through techniques such as filtering training data and refining the model's behavior through post-training. We have also created new safety systems to provide guardrails on voice outputs.
We've evaluated GPT-4o according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT-4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.
GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions and improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they're discovered.
We recognize that GPT-4o's audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we'll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.
Content Filtering
Prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Additional classification models and configuration options are available when you deploy an Azure OpenAI model in production; learn more.
As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities.
Source: the OpenAI announcement.
| Model | MMLU | GPQA | MATH | MGSM | DROP | HumanEval |
|---|---|---|---|---|---|---|
| GPT-4o (2024-08-06) | 88.7 | 53.6 | 76.6 | 90.5 | 83.4 | 90.2 |
| GPT-4T | 86.5 | 48.0 | 72.6 | 88.5 | 86.0 | 87.1 |
| GPT-4 | 86.4 | 35.7 | 42.5 | 74.5 | 80.9 | 67.0 |
| Claude3 Opus | 86.8 | 50.4 | 60.1 | 90.7 | 83.1 | 84.9 |
| Gemini Pro 1.5 | 81.9 | -- | 58.5 | 88.7 | 78.9 | 71.9 |
| Gemini Ultra 1.0 | 83.7 | -- | 53.2 | 79.0 | 82.4 | 74.4 |
| Llama3 400b | 86.1 | 48.0 | 57.8 | -- | 83.5 | 84.1 |
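Returning to the content filtering described earlier: Azure OpenAI responses can carry per-category filter annotations alongside the completion. The sketch below inspects a hand-written sample modeled on the shape of `prompt_filter_results`; the sample data and field layout are assumptions for illustration, not a captured API response.

```python
# Hand-written sample mirroring the assumed shape of Azure OpenAI's
# prompt filter annotations (illustrative only, not a real response).
sample_response = {
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "hate": {"filtered": False, "severity": "safe"},
                "self_harm": {"filtered": False, "severity": "safe"},
                "sexual": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": True, "severity": "medium"},
            },
        }
    ]
}

def flagged_categories(response: dict) -> list[str]:
    """Return the names of categories the default filter flagged."""
    results = response["prompt_filter_results"][0]["content_filter_results"]
    return [name for name, verdict in results.items() if verdict["filtered"]]

print(flagged_categories(sample_response))  # ['violence']
```

An application might log or surface these categories before deciding whether to retry with a reworded prompt or a stricter filter configuration.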
Model Specifications
Context Length: 131,072
Quality Index: 0.75
License: Custom
Training Data: September 2023
Last Updated: December 2024
Input Type: Text, Image, Audio
Output Type: Text
Publisher: OpenAI
Languages: 27 languages
Related Models