OpenAI gpt-realtime-translate
Version: 2026-05-07
OpenAI
Last updated May 2026
gpt-realtime-translate is a low-latency streaming model that converts spoken audio into translated output in real time, enabling live cross-language communication within voice applications.

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure, all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

gpt-realtime-translate is a low-latency streaming model for real-time speech translation, designed to convert spoken language into translated output during live audio interactions. It processes continuous audio streams and lets applications translate speech across languages as it is spoken, supporting multilingual scenarios such as live conversations, voice assistants, and cross-language interactions.

The model is part of a broader set of speech capabilities that includes transcription and translation, allowing developers to build end-to-end voice pipelines that operate in real time.

Supported regions: Canada Central, France Central, and India South. More regions are coming soon.
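As an illustration of feeding a continuous audio stream, the sketch below slices raw audio into short fixed-length frames and pads the final frame with silence. The audio format (mono 16-bit PCM at 24 kHz), the 20 ms frame size, and the function name `frame_pcm16` are assumptions chosen for the example; this page does not specify the model's input format.

```python
from typing import Iterator


def frame_pcm16(pcm: bytes, sample_rate_hz: int = 24_000, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-duration frames of mono 16-bit PCM audio.

    The sample rate and frame duration are illustrative assumptions,
    not documented requirements of the model.
    """
    # Samples per frame times 2 bytes per 16-bit sample.
    bytes_per_frame = sample_rate_hz * frame_ms // 1000 * 2
    for start in range(0, len(pcm), bytes_per_frame):
        frame = pcm[start:start + bytes_per_frame]
        # Pad a short final frame with zero bytes (silence) so that
        # every frame sent downstream has the same length.
        yield frame.ljust(bytes_per_frame, b"\x00")
```

Fixed-length frames keep the sender's pacing simple: one frame can be sent per frame interval, including frames of silence during pauses, so the stream stays continuous.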

Key model capabilities

Key Features:
  • Real-time speech-to-speech translation
    Converts incoming audio into translated speech output during live, streaming interactions.
  • Simultaneous translated transcription
    Always provides a text transcript of the translated audio in the target language.
  • Low-latency streaming operation
    Processes continuous audio input and returns translated audio in small streaming chunks for near real-time responsiveness.
  • Continuous audio input handling
    Designed for ongoing audio streams, including pauses (expects continuous input rather than discrete clips).
  • Target-language translation control
    Translates speech into a specified output language configured per session.
  • Optional input transcription (via separate model)
    Can include source-language transcripts, enabled through an external transcription model integrated into the pipeline.
  • Streaming audio + text outputs
    Emits both audio and transcript deltas incrementally, enabling synchronized playback and display.
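The incremental audio and transcript outputs listed above could be consumed along the following lines. This is a minimal sketch, not the service's API: the event shape (a dict with `type` and `delta` keys), the event names `audio.delta` and `transcript.delta`, and the base64 encoding of audio are all hypothetical placeholders, since this page does not document the wire format.

```python
import base64
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class TranslationOutput:
    """Accumulated translated audio bytes and target-language transcript."""
    audio: bytearray = field(default_factory=bytearray)
    transcript: str = ""


def collect_deltas(events: Iterable[dict]) -> TranslationOutput:
    """Fold a stream of delta events into one audio buffer and one transcript.

    Event names and fields are hypothetical; adapt them to the real API.
    """
    out = TranslationOutput()
    for event in events:
        kind = event.get("type")
        if kind == "audio.delta":
            # Assumed: audio chunks arrive base64-encoded.
            out.audio.extend(base64.b64decode(event["delta"]))
        elif kind == "transcript.delta":
            # Assumed: transcript chunks arrive as plain text fragments.
            out.transcript += event["delta"]
        # Any other event type is ignored in this sketch.
    return out
```

In a live application the audio chunks would be queued for playback as they arrive rather than buffered to the end; buffering here only keeps the example short.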

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

The provider has not supplied this information.

Out of scope use cases

The provider has not supplied this information.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

The provider has not supplied this information.

Output formats

The provider has not supplied this information.

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

This model is provided through the Azure OpenAI Service.

More information

The following documents are applicable:

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

The provider has not supplied this information.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: OpenAI
The provider has not supplied this information.

Benchmarking methodology

Source: OpenAI
The provider has not supplied this information.

Public data summary

Source: OpenAI
The provider has not supplied this information.

Model Specifications

Context Length: 128000
License: Custom
Training Data: April 2026
Last Updated: May 2026
Input Type: Audio
Output Type: Audio, Text
Provider: OpenAI
Languages: 27 Languages