gpt-4o
OpenAI's most advanced multimodal model in the gpt-4o family. Can handle both text and image inputs.
Source: the OpenAI announcement .
Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
- Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
- Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
- Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry; reducing integration effort.
- Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
About this model
As measured on traditional benchmarks, gpt-4o achieves gpt-4 turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities.Key model capabilities
- Text, image processing
- JSON Mode
- parallel function calling
- Enhanced accuracy and responsiveness
- Parity with English text and coding tasks compared to GPT-4 Turbo with Vision
- Superior performance in non-English languages and in vision tasks
- Support for enhancements
- Support for complex structured outputs.
| Model | MMLU | GPQA | MATH | MGSM | DROP | HumanEval |
|---|---|---|---|---|---|---|
| GPT-4o (2024-08-06) | 88.7 | 53.6 | 76.6 | 90.5 | 83.4 | 90.2 |
| GPT-4T | 86.5 | 48.0 | 72.6 | 88.5 | 86.0 | 87.1 |
| GPT-4 | 86.4 | 35.7 | 42.5 | 74.5 | 80.9 | 67.0 |
| Claude3 Opus | 86.8 | 50.4 | 60.1 | 90.7 | 83.1 | 84.9 |
| Gemini Pro 1.5 | 81.9 | -- | 58.5 | 88.7 | 78.9 | 71.9 |
| Gemini Ultra 1.0 | 83.7 | -- | 53.2 | 79.0 | 82.4 | 74.4 |
| Llama3 400b | 86.1 | 48.0 | 57.8 | -- | 83.5 | 84.1 |
Quick facts
Model providerAzure OpenAI
TypeChat completion, Responses
LifecycleGenerally available (GA)
Input typetext, image, audio
Output typetext
Context window131.072k
Token limits16384 output
PricingView pricing