AI Model Catalog | Azure AI Foundry Models

OpenAI gpt-4o-transcribe-diarize

Version: 2025-10-15

OpenAI • Last updated October 2025

A speech-to-text model that delivers reliable and accurate transcripts, now with diarization support: identifying the different speakers in a transcription.

The gpt-4o-transcribe-diarize model is a speech-to-text solution that builds on the capabilities of GPT-4o to deliver highly accurate audio transcriptions. It offers significant improvements in word error rate and language recognition, and it now supports diarization, that is, identifying the different speakers in a transcription. Designed for precision and efficiency, gpt-4o-transcribe-diarize aims to provide reliable and accurate transcripts across a variety of applications.

The model has been pretrained on specialized audio-centric datasets that include diverse, high-quality audio samples, which helps it capture the nuances of speech. It supports a context window of 16,000 tokens, allowing it to process longer audio inputs, and a maximum output of 2,000 tokens for detailed transcriptions. Training also incorporates supervised fine-tuning and reinforcement learning to optimize performance and accuracy.

Intended Use

Primary Use Cases

Enhanced Customer Service: gpt-4o-transcribe-diarize can be integrated into customer support systems to transcribe customer calls in real time. This allows for more dynamic and comprehensive interactions, enabling support agents to quickly understand and resolve customer issues.
Meeting Transcription: The model is well suited to transcribing meetings, capturing detailed discussions and, with diarization support, identifying which speaker said what. This is particularly useful for creating accurate meeting records, so that all participants have access to the information discussed.
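
As an illustration of the meeting-transcription use case, the sketch below shows one way diarized output might be post-processed into readable speaker turns. It makes no call to the service. The `Segment` shape (speaker label, start and end times in seconds, text) and the helper names `merge_speaker_turns` and `format_transcript` are assumptions made for this example, not the model's documented response schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Hypothetical shape of one diarized segment (not the official schema)."""
    speaker: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def merge_speaker_turns(segments: Iterable[Segment]) -> List[Segment]:
    """Merge consecutive segments spoken by the same speaker into one turn."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Same speaker continues: extend the previous turn.
            turns[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}".strip(),
            )
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text.strip()))
    return turns


def format_transcript(turns: Iterable[Segment]) -> str:
    """Render turns as one line each: [start-end] speaker: text."""
    return "\n".join(
        f"[{t.start:.1f}s-{t.end:.1f}s] {t.speaker}: {t.text}" for t in turns
    )
```

Given segments parsed from a diarized response, `format_transcript(merge_speaker_turns(segments))` would yield one line per speaker turn, which is often easier to read in meeting records than raw per-segment output.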

Out-of-Scope Use Cases

Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case.

Model provider

This model is provided through the Azure OpenAI Service.

Relevant documents

The following documents are applicable:

Model Specifications

Context Length: 16,000 tokens

License: Custom

Training Data: July 2025

Last Updated: October 2025

Input Type: Text, Audio

Output Type: Text

Publisher: OpenAI

Languages: 57 languages

Quick Start