gpt-audio

Best suited for rich, asynchronous audio input/output interactions, such as creating spoken summaries from text.

Azure OpenAI

Direct from Azure

Version: 2025-08-28

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:

Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry; reducing integration effort.
Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.

Learn more about Direct from Azure models .

Key capabilities

About this model

gpt-audio enables voice-based interaction by processing spoken prompts and generating responses, capturing subtle audio cues for deeper, more immersive experiences.

Key model capabilities

These audio features can be utilized in various ways:

Create spoken summaries from text, offering a more engaging method to present information.
Analyze the sentiment of audio recordings, converting vocal nuances into text-based insights.
Facilitate asynchronous speech-in, speech-out interactions

Use cases

Pricing

Technical specs

Training disclosure

Distribution

More information

Quick facts

Model providerAzure OpenAI

TypeAudio generation

LifecycleGenerally available (GA)

Input typeaudio, text

Output typeaudio, text

Context window128k

Token limits16384 output

PricingView pricing

gpt-audio

About this model

Key model capabilities

Quick facts

Quick start