OpenAI gpt-realtime-2
Version: 2026-05-07
OpenAI
Last updated May 2026
gpt-realtime-2 is a next-generation speech-to-speech reasoning model that processes live audio input and generates audio responses with built-in reasoning, enabling low-latency conversational voice interactions.

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure - all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

gpt-realtime-2 is a next-generation, low-latency streaming model designed for real-time speech-to-speech and conversational AI scenarios. It processes audio input continuously and generates responses during live interactions, enabling applications such as voice assistants and interactive agents. As part of the latest set of voice models, it builds on existing realtime capabilities to support more advanced conversational experiences within live audio pipelines that combine speech recognition, reasoning, and response generation in a unified flow.

Supported regions: Sweden Central, East US 2, Central US, France Central, South India, Canada Central

Key model capabilities

Key Features:
  • Speech‑to‑speech (S2S) interaction
    Accepts audio input and produces audio output, enabling full conversational voice flows.
  • Built‑in reasoning capabilities
    Incorporates internal reasoning during each turn, improving instruction following and response quality without exposing reasoning traces.
  • Reasoning‑first interaction model
    Designed with reasoning‑centric turn behavior, allowing the model to think before producing final responses.
  • Configurable reasoning effort
    Supports a reasoning.effort parameter with the values minimal, low, medium, and high, enabling control over the tradeoff between reasoning depth and latency.
  • Structured response phases
    Outputs responses in phases:
    commentary → preamble (thinking / filler / tool signals)
    final_answer → final response after reasoning
  • Low‑latency streaming for real‑time use
    Optimized for interactive voice applications where responses must be generated quickly during live conversations.
  • Long conversational context support
    Supports extended context windows (up to ~128k tokens) to maintain richer multi-turn conversations.
  • Improved instruction following vs earlier models
    Designed to provide stronger adherence to system prompts and user instructions compared to earlier realtime models.
  • Successor to gpt‑realtime‑1.5
    Represents a reasoning-enabled upgrade to the previous realtime S2S model stack.
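
The reasoning.effort values and the two response phases listed above can be illustrated with a minimal sketch. It does not call any real SDK or service: the payload shape returned by build_session_config, the event dictionaries with "phase" and "text" keys, and the helper names are all hypothetical and chosen only for illustration. Only the effort values (minimal, low, medium, high) and the phase names (commentary, final_answer) come from this model card.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Effort values as listed in the model card for reasoning.effort.
ALLOWED_EFFORTS = ("minimal", "low", "medium", "high")


def build_session_config(effort: str = "low") -> dict:
    """Return a hypothetical session payload carrying reasoning.effort.

    The nesting under "reasoning" mirrors the dotted parameter name;
    the overall payload shape is an assumption, not a documented schema.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    return {"model": "gpt-realtime-2", "reasoning": {"effort": effort}}


@dataclass
class TurnTranscript:
    """Text collected for one turn, grouped by response phase."""
    commentary: list[str] = field(default_factory=list)
    final_answer: list[str] = field(default_factory=list)


def route_by_phase(events: list[dict]) -> TurnTranscript:
    """Group hypothetical {"phase": ..., "text": ...} events by phase."""
    turn = TurnTranscript()
    for event in events:
        phase = event.get("phase")
        if phase == "commentary":
            # Preamble material: thinking, filler, or tool signals.
            turn.commentary.append(event["text"])
        elif phase == "final_answer":
            # The response produced after reasoning.
            turn.final_answer.append(event["text"])
        # Events with any other phase are skipped rather than guessed at.
    return turn


if __name__ == "__main__":
    print(build_session_config("medium"))
    demo_events = [
        {"phase": "commentary", "text": "Let me check that."},
        {"phase": "final_answer", "text": "Your order ships tomorrow."},
    ]
    print(route_by_phase(demo_events))
```

In a voice application, a split like this would let a client treat commentary as interruptible filler while treating final_answer as the content to log or act on. A lower effort value would be the natural choice where latency matters more than reasoning depth.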

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

The provider has not supplied this information.

Out of scope use cases

The provider has not supplied this information.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

The provider has not supplied this information.

Output formats

The provider has not supplied this information.

Supported languages

The provider has not supplied this information.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

This model is provided through the Azure OpenAI Service.

More information

The following documents are applicable:

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

The provider has not supplied this information.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: OpenAI
The provider has not supplied this information.

Benchmarking methodology

Source: OpenAI
The provider has not supplied this information.

Public data summary

Source: OpenAI
The provider has not supplied this information.

Model Specifications

Context Length: 128000
License: Custom
Training Data: April 2026
Last Updated: May 2026
Input Type: Audio, Text, Image
Output Type: Audio, Text
Provider: OpenAI
Languages: 27