gpt-4o-mini-tts
Version: 2025-12-15
Provider: OpenAI
Last updated: December 2025
An advanced text-to-speech solution designed to convert written text into natural-sounding speech.

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

The gpt-4o-mini-tts model is an advanced text-to-speech solution designed to convert written text into natural-sounding speech. Leveraging the capabilities of GPT-4o, this model offers customizable voice output, allowing developers to instruct the model to speak in specific ways, such as "talk like a sympathetic customer service agent."
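As a rough illustration of instruction-driven voice output, the sketch below assembles a request body that pairs the text to speak with a speaking-style instruction. This page does not document the request schema, so the field names ("model", "input", "voice", "instructions") and the "alloy" voice are assumptions based on OpenAI's public audio speech API, and `build_speech_request` is a hypothetical helper. The code only builds the payload; it does not call any endpoint.

```python
import json


def build_speech_request(text, instructions, voice="alloy", model="gpt-4o-mini-tts"):
    """Assemble a JSON-serializable body for a text-to-speech request.

    Assumption: the field names below follow OpenAI's public audio speech
    API. They are not specified on this page and may differ per deployment.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": model,                # model (or deployment) name
        "input": text,                 # the text to convert to speech
        "voice": voice,                # assumed voice identifier
        "instructions": instructions,  # how the model should speak
    }


payload = build_speech_request(
    "Thanks for calling. I'm sorry about the delay with your order.",
    instructions="Talk like a sympathetic customer service agent.",
)
print(json.dumps(payload, indent=2))
```

Keeping the style instruction separate from the spoken text means the same text can be rendered in different tones by changing only one field.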

Key model capabilities

  • Customizable voice output with the ability to instruct the model to speak in specific ways
  • Natural-sounding speech generation ideal for audiobooks, podcasts, and interactive voice agents
  • Expressive and dynamic voice generation capabilities
  • Processing of substantial text inputs with support for up to 2,000 tokens

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

  1. Customer Service Automation: gpt-4o-mini-tts can be integrated into customer service systems to provide dynamic and empathetic voice responses. By instructing the model to speak in specific ways, such as "talk like a sympathetic customer service agent," businesses can enhance customer interactions and improve satisfaction.
  2. Content Creation and Publishing: The model is ideal for converting written content into engaging audio formats. This can be particularly useful for creating audiobooks, podcasts, and other spoken content, allowing creators to reach a broader audience and cater to different consumer preferences.
  3. Accessibility Enhancements: gpt-4o-mini-tts can be used to make digital content more accessible to individuals with visual impairments or reading difficulties. By converting text into natural-sounding speech, the model helps ensure that information is available to everyone, promoting inclusivity.

Out of scope use cases

Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

This model accepts text inputs of up to 2,000 tokens per request.
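Text longer than the 2,000-token limit has to be split before it is sent. The following is a minimal sketch of one way to do that: it packs whole sentences greedily into chunks that stay within a token budget. The page does not name the tokenizer, so `approx_token_count` (a whitespace word count) is only a stand-in, and both it and `split_for_tts` are hypothetical helpers rather than part of any SDK.

```python
import re

MAX_INPUT_TOKENS = 2000  # input limit stated on this page


def approx_token_count(text):
    # Assumption: whitespace-separated words as a crude proxy for tokens.
    # A real tokenizer will usually report more tokens than this.
    return len(text.split())


def split_for_tts(text, max_tokens=MAX_INPUT_TOKENS, count_tokens=approx_token_count):
    """Greedily pack sentences into chunks of at most max_tokens each.

    A single sentence that already exceeds max_tokens is kept as its own
    oversized chunk; callers would need to handle that case separately.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


demo = split_for_tts("One two three. Four five six. Seven eight nine.", max_tokens=6)
print(demo)
```

Because the word count underestimates real token counts, passing a `max_tokens` value somewhat below 2,000, or swapping in an actual tokenizer through `count_tokens`, would be the safer choice in practice.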

Output formats

The provider has not supplied this information.

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

gpt-4o-mini-tts has been pretrained on diverse and high-quality text and audio datasets, ensuring a deep understanding of speech nuances and natural intonation. The training process incorporates rigorous enhancement techniques, including supervised fine-tuning and reinforcement learning, to optimize performance and accuracy.

Distribution

Distribution channels

This model is provided through the Azure OpenAI Service.

More information

The following documents are applicable:

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

The provider has not supplied this information.

Known limitations

Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Model Specifications

Context Length: 2000
License: Custom
Last Updated: December 2025
Input Type: Text, Audio
Output Type: Text, Audio
Provider: OpenAI
Languages: 57 languages