OpenAI GPT-4o
Version: 2024-11-20
gpt-4o offers a shift in how AI models interact with multimodal inputs. By seamlessly combining text, images, and audio, gpt-4o provides a richer, more engaging user experience.
Matching the intelligence of GPT-4 Turbo, it is markedly more efficient, generating text at twice the speed and half the cost. GPT-4o also achieves the highest vision performance and the strongest non-English results of any previous OpenAI model.
GPT-4o is engineered for speed and efficiency. Its ability to handle complex queries with minimal resources translates into cost savings and performance gains.
The introduction of gpt-4o opens numerous possibilities for businesses in various sectors:
- Enhanced customer service: By integrating diverse data inputs, gpt-4o enables more dynamic and comprehensive customer support interactions.
- Advanced analytics: Leverage gpt-4o's capability to process and analyze different types of data to enhance decision-making and uncover deeper insights.
- Content innovation: Use gpt-4o's generative capabilities to create engaging and diverse content formats, catering to a broad range of consumer preferences.
Updates
gpt-4o-2024-11-20
: This is the latest version of gpt-4o. It supports the same maximum output size as previous versions (16,384 tokens) and features such as:
- Text and image processing
- JSON mode
- Parallel function calling
- Enhanced accuracy and responsiveness
- Parity with GPT-4 Turbo with Vision on English text and coding tasks
- Superior performance in non-English languages and in vision tasks
- Support for enhancements
- Support for complex structured outputs
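As an illustration of two of the features above, the sketch below builds the shape of a Chat Completions request that combines JSON mode (`response_format` of type `json_object`) with a tool the model may invoke via parallel function calling. This is a hand-assembled payload, not an official SDK snippet; the `get_weather` function and its schema are hypothetical, and you would normally send this through the OpenAI or Azure OpenAI client library.

```python
import json

# Sketch of a Chat Completions request body exercising JSON mode and
# function calling. The tool below is a hypothetical example.
payload = {
    "model": "gpt-4o-2024-11-20",
    "messages": [
        {"role": "system", "content": "Reply in JSON."},
        {"role": "user", "content": "What is the weather in Paris and Tokyo?"},
    ],
    # JSON mode: constrains the model to emit syntactically valid JSON.
    "response_format": {"type": "json_object"},
    # With parallel function calling, the model can request this tool
    # for both cities in a single turn.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function name
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The payload serializes cleanly, as any request body must.
body = json.dumps(payload)
```

When the model responds with multiple `tool_calls` in one message, each carries its own arguments, so both city lookups can be dispatched concurrently before returning results to the model.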
Resources
Model Provider
This model is provided through the Azure OpenAI Service.
Relevant documents
The following documents are applicable:
- Overview of Responsible AI practices for Azure OpenAI models
- Transparency Note for Azure OpenAI Service
Responsible AI Considerations
GPT-4o has safety built in by design across modalities, through techniques such as filtering training data and refining the model's behavior through post-training. We have also created new safety systems to provide guardrails on voice outputs.
We've evaluated GPT-4o according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT-4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.
GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions and improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they're discovered.
We recognize that GPT-4o's audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we'll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.
Content Filtering
Prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Additional classification models and configuration options are available when you deploy an Azure OpenAI model in production; learn more.
As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities.
Source: the OpenAI announcement.
| Model | MMLU | GPQA | MATH | MGSM | DROP | HumanEval |
|---|---|---|---|---|---|---|
| GPT-4o (2024-08-06) | 88.7 | 53.6 | 76.6 | 90.5 | 83.4 | 90.2 |
| GPT-4T | 86.5 | 48.0 | 72.6 | 88.5 | 86.0 | 87.1 |
| GPT-4 | 86.4 | 35.7 | 42.5 | 74.5 | 80.9 | 67.0 |
| Claude3 Opus | 86.8 | 50.4 | 60.1 | 90.7 | 83.1 | 84.9 |
| Gemini Pro 1.5 | 81.9 | -- | 58.5 | 88.7 | 78.9 | 71.9 |
| Gemini Ultra 1.0 | 83.7 | -- | 53.2 | 79.0 | 82.4 | 74.4 |
| Llama3 400b | 86.1 | 48.0 | 57.8 | -- | 83.5 | 84.1 |
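Returning to the content filtering described earlier: Azure OpenAI responses can carry per-category filter annotations alongside the completion. The sketch below inspects a hand-written sample modeled on the shape of `prompt_filter_results`; the sample data and field layout are assumptions for illustration, not a captured API response.

```python
# Hand-written sample mirroring the assumed shape of Azure OpenAI's
# prompt filter annotations (illustrative only, not a real response).
sample_response = {
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "hate": {"filtered": False, "severity": "safe"},
                "self_harm": {"filtered": False, "severity": "safe"},
                "sexual": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": True, "severity": "medium"},
            },
        }
    ]
}

def flagged_categories(response: dict) -> list[str]:
    """Return the names of categories the default filter flagged."""
    results = response["prompt_filter_results"][0]["content_filter_results"]
    return [name for name, verdict in results.items() if verdict["filtered"]]

print(flagged_categories(sample_response))  # ['violence']
```

An application might log or surface these categories before deciding whether to retry with a reworded prompt or a stricter filter configuration.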
Model Specifications
Context Length: 131,072
Quality Index: 0.75
License: Custom
Training Data: September 2023
Last Updated: December 2024
Input Type: Text, Image, Audio
Output Type: Text
Publisher: OpenAI
Languages: 27 languages
Related Models