OpenAI gpt-4o-audio-preview
Version: 2024-12-17
OpenAI | Last updated: December 2025
Best suited for rich, asynchronous audio input/output interactions, such as creating spoken summaries from text.

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

Currently, the GPT-4o-audio-preview model focuses on text and audio and does not support existing GPT-4o features such as image modality. For many tasks, the generally available GPT-4o models may still be more suitable.

Key model capabilities

These audio features can be used in a variety of ways:
  • Create spoken summaries from text, offering a more engaging way to present information (see the sketch after this list).
  • Analyze the sentiment of audio recordings, converting vocal nuances into text-based insights.
  • Facilitate asynchronous speech-in, speech-out interactions.
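
The sketch below illustrates the first of these scenarios: requesting a spoken summary from text through the Chat Completions API. It is a minimal sketch only, not an official sample; the endpoint, API version, deployment name, and voice are placeholders and may differ in your environment.

```python
# A minimal sketch (not the official sample): creating a spoken summary from text
# with gpt-4o-audio-preview on Azure OpenAI. The endpoint, API key variable names,
# API version, deployment name, and voice are placeholders.
import base64
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2025-01-01-preview",  # assumed preview API version; check your resource
)

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",               # use your deployment name
    modalities=["text", "audio"],               # request a transcript plus spoken audio
    audio={"voice": "alloy", "format": "wav"},  # output voice and audio format
    messages=[
        {
            "role": "user",
            "content": "Summarize these release notes in two spoken sentences: ...",
        }
    ],
)

# The spoken summary arrives as base64-encoded audio with a text transcript.
audio_bytes = base64.b64decode(response.choices[0].message.audio.data)
with open("summary.wav", "wb") as f:
    f.write(audio_bytes)
print(response.choices[0].message.audio.transcript)
```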

Use cases

See the Responsible AI considerations section below for additional guidance on responsible use.

Key use cases

These audio features can be used in a variety of ways:
  • Create spoken summaries from text, offering a more engaging way to present information.
  • Analyze the sentiment of audio recordings, converting vocal nuances into text-based insights (see the sketch after this list).
  • Facilitate asynchronous speech-in, speech-out interactions.
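
The sketch below illustrates the second scenario: sending an audio recording as input and receiving a text-only sentiment description. Again, this is a minimal sketch; the file name, endpoint, API version, and deployment name are placeholders.

```python
# A minimal sketch (not the official sample): audio-in, text-out sentiment analysis
# with gpt-4o-audio-preview. The file name, endpoint, API version, and deployment
# name are placeholders.
import base64
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2025-01-01-preview",  # assumed preview API version; check your resource
)

# Base64-encode the recording so it can be passed as an input_audio content part.
with open("customer_call.wav", "rb") as f:
    encoded_audio = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # use your deployment name
    modalities=["text"],           # text-only output: the sentiment described in prose
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the speaker's sentiment and tone."},
                {
                    "type": "input_audio",
                    "input_audio": {"data": encoded_audio, "format": "wav"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```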

Out of scope use cases

Currently, the gpt-4o-audio-preview model focuses on text and audio and does not support existing GPT-4o features such as the image modality. For many tasks, the generally available GPT-4o models may still be more suitable. Important: at this time, gpt-4o-audio-preview usage limits are suitable only for test and development workloads; to prevent abuse and preserve service integrity, rate limits will be adjusted as needed.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See the Azure OpenAI Service pricing page for details.
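
Because audio and text tokens are metered separately for audio-capable models, it can help to inspect the usage block returned with each response. A minimal sketch, assuming the standard Chat Completions usage schema (prompt_tokens_details and completion_tokens_details with an audio_tokens field):

```python
# A minimal sketch, assuming the standard Chat Completions usage schema in which
# audio tokens are reported separately from text tokens for audio-capable models.
def print_token_usage(response) -> None:
    """Print text and audio token counts from a chat completion response."""
    usage = response.usage
    print("prompt tokens:", usage.prompt_tokens)
    print("completion tokens:", usage.completion_tokens)
    # Per-modality breakdown; field names assumed from the standard usage schema.
    if usage.prompt_tokens_details:
        print("prompt audio tokens:", usage.prompt_tokens_details.audio_tokens)
    if usage.completion_tokens_details:
        print("completion audio tokens:", usage.completion_tokens_details.audio_tokens)
```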

Technical specs

2024-12-17: Introducing our new multimodal AI, which now supports both text and audio modalities. As this is a preview version, it is designed for testing and feedback purposes and is not yet optimized for production traffic.

Training cut-off date

The provider has not stated a cut-off date explicitly; the model specifications below list training data through September 2023.

Training time

The provider has not supplied this information.

Input formats

Text and audio. The gpt-4o-audio-preview model accepts audio inputs as prompts in addition to standard text prompts.

Output formats

Text and audio. The model can generate spoken audio responses in addition to text output.
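
Returned audio can also be referenced in later turns by its id instead of resending the encoded bytes. A minimal sketch of a follow-up turn, assuming the `client` and `response` objects from the spoken-summary sketch above and the id-reference pattern of the standard Chat Completions audio output:

```python
# A minimal sketch: referencing a previously returned audio response by id in a
# follow-up turn instead of resending the encoded bytes. Assumes the `client` and
# `response` objects from the spoken-summary sketch earlier on this page.
follow_up = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # use your deployment name
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {"role": "user", "content": "Summarize these release notes in two spoken sentences: ..."},
        # Point at the earlier spoken answer by id rather than re-uploading it.
        {"role": "assistant", "audio": {"id": response.choices[0].message.audio.id}},
        {"role": "user", "content": "Now give a shorter, more upbeat version."},
    ],
)
print(follow_up.choices[0].message.audio.transcript)
```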

Supported languages

The provider has not listed the languages individually; the model specifications below indicate support for 27 languages.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets


Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

This model is provided through the Azure OpenAI service.

Responsible AI considerations

Safety techniques

gpt-4o-audio-preview has safety built-in by design across modalities, through techniques such as filtering training data and refining the model's behavior through post-training. We have also created new safety systems to provide guardrails on voice outputs.

Safety evaluations

We've evaluated gpt-4o-audio-preview according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that gpt-4o-audio-preview does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities. gpt-4o-audio-preview has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with gpt-4o-audio-preview. We will continue to mitigate new risks as they're discovered.

Known limitations

Currently, the gpt-4o-audio-preview model focuses on text and audio and does not support existing GPT-4o features such as the image modality. For many tasks, the generally available GPT-4o models may still be more suitable. Important: at this time, gpt-4o-audio-preview usage limits are suitable only for test and development workloads; to prevent abuse and preserve service integrity, rate limits will be adjusted as needed.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: OpenAI. The provider has not supplied this information.

Benchmarking methodology

Source: OpenAI. The benchmarking methodology is described under Safety evaluations in the Responsible AI considerations section above (Preparedness Framework assessments, automated and human evaluations, and external red teaming).

Public data summary

Source: OpenAI. The provider has not supplied this information.
Model Specifications
  • Context length: 128,000 tokens
  • License: Custom
  • Training data: September 2023
  • Last updated: December 2025
  • Input type: Audio, Text
  • Output type: Audio, Text
  • Provider: OpenAI
  • Languages: 27 languages