Azure-Speech-Voice-Live

Version: 1

Microsoft • Last updated December 2025

Voice Live API is a single unified API that enables low-latency, high-quality speech to speech interactions for voice agents.

Azure Speech

Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and voice live AI. It enables developers to build intelligent voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.

Key capabilities

About this model

Voice Live API is designed for developers seeking scalable and efficient voice-driven experiences as it eliminates the need to manually orchestrate multiple components. By integrating speech recognition, generative AI, and text to speech functionalities into a single, unified interface, it provides an end-to-end solution for creating seamless voice conversation experiences.

Key model capabilities

Voice Live API includes a comprehensive set of features to support diverse use cases and ensure superior voice interactions: broad language coverage, customizable speech input and output, flexible GenAI model options, advanced noise suppression, echo cancellation, semantic voice activity detection (VAD), avatar, and function calling.
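As a rough illustration of how the features listed above might be switched on for a session, the sketch below builds a JSON configuration event using only the Python standard library. The event type, every field name, the type strings, and the voice name are assumptions made for illustration and are not confirmed by this page; check the service documentation for the actual schema.

```python
import json


def build_session_update(voice_name, instructions):
    """Return a session-configuration event as a plain dict.

    Every key and type string below is an assumed, illustrative name,
    not a confirmed part of the Voice Live API schema.
    """
    return {
        "type": "session.update",  # assumed event name
        "session": {
            "instructions": instructions,
            # Semantic VAD decides when the speaker has finished a turn.
            "turn_detection": {"type": "azure_semantic_vad"},
            # Noise suppression and echo cancellation on the input audio.
            "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
            "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
            # Customizable speech output: which voice renders the reply.
            "voice": {"name": voice_name},
        },
    }


# "example-voice" is a hypothetical placeholder, not a real voice name.
event = build_session_update("example-voice", "You are a helpful assistant.")
payload = json.dumps(event)
```

Keeping the configuration in one plain dict makes it easy to toggle a single feature (for example, dropping the echo cancellation entry) without touching the rest of the session setup.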

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

Voice Live API is ideal for scenarios where voice-driven interactions improve user experience. Examples include:
Contact centers: Develop interactive voice bots for customer support, product catalog navigation, and self-service solutions.
Automotive assistants: Enable hands-free, in-car voice assistants for command execution, navigation, and general inquiries.
Education: Create voice-enabled learning companions and virtual tutors for interactive training and education.
Public services: Build voice agents to assist citizens with administrative queries and public service information.
Human resources: Enhance HR processes with voice-enabled tools for employee support, career development, and training.

Out of scope use cases

We encourage developers to leverage Voice Live API in their innovative solutions or applications. However, here are some considerations when choosing a use case:
Avoid scenarios in which the use or misuse of the system could have a consequential impact on life opportunities or legal status: Examples include but are not limited to scenarios in which the AI system could affect an individual's legal status, legal rights, or their access to credit, education, employment, healthcare, housing, insurance, social welfare benefits, services, opportunities, or the terms on which these items are available.
Carefully consider all use cases in high-stakes domains or industries: Examples include but are not limited to healthcare, education, finance, and legal.
Legal and regulatory considerations: Organizations need to evaluate potential specific legal and regulatory obligations when using any AI services and solutions, which may not be appropriate for use in every industry or scenario. Restrictions may vary based on regional or local regulatory requirements. Additionally, AI services or solutions are not designed for and may not be used in ways prohibited in applicable terms of service and relevant codes of conduct.

Pricing

Pricing is based on a number of factors. See pricing details here.

Technical specs

Voice Live API is fully managed, eliminating the need for customers to handle backend orchestration or component integration. Developers provide audio input and receive audio output, avatar visuals, and action triggers—all with minimal latency. With the natively supported GenAI models, developers don't need to deploy or manage any generative AI models, as the API handles all the underlying infrastructure.
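To make the "audio in, audio out" flow above more concrete, here is a minimal sketch of how raw audio might be split into chunks and wrapped as JSON messages before being sent to the service. The event name `input_audio_buffer.append`, the `audio` field, and the chunk size are assumptions for illustration; this page does not specify the wire format.

```python
import base64
import json


def audio_to_events(pcm_bytes, chunk_size=4800):
    """Split raw audio bytes into base64-encoded JSON messages.

    The event type and field names are assumed for illustration only.
    """
    events = []
    for start in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[start:start + chunk_size]
        events.append(json.dumps({
            "type": "input_audio_buffer.append",  # assumed event name
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    return events


# 9,700 bytes of silence stand in for captured microphone audio.
silence = bytes(9700)
events = audio_to_events(silence)
```

Small chunks keep latency low because the service can start processing speech before the speaker has finished; the right chunk size depends on the audio format in use.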

Training cut-off date

This information is not available.

Input formats

Text, audio

Supported language

Supported input human languages.

Supported Azure regions

List of supported Azure regions

Sample JSON response

This information is not available.

Model architecture

Voice Live API is composed of multiple AI systems, including language models (both large and small), speech to text models, text to speech models, and more.

Long context

gpt-realtime and gpt-realtime-mini: 32k. For other models, refer to the official Azure OpenAI documentation.

Optimizing model performance

This information is not available.

Additional assets

How to use Voice Live API

Distribution

Voice Live is available through the following distribution methods:

Voice Live SDK
Integrate Voice Live capabilities directly into applications using the Voice Live SDK, available for platforms including .NET, Python, Java, and JavaScript.
WebSocket API
Access Voice Live functionality via a public, subscription-based API for flexible integration into web services, mobile apps, and backend systems.
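
For the WebSocket route, a client first needs a connection URL. The sketch below assembles one with the standard library; the host name, the `/voice-live/realtime` path, the query parameter names, and the version string are all hypothetical placeholders rather than values confirmed by this page.

```python
from urllib.parse import urlencode, urlunsplit


def build_ws_url(resource_host, model, api_version):
    """Assemble a secure WebSocket URL for a voice session.

    The path and query parameter names are assumptions for illustration.
    """
    query = urlencode({"api-version": api_version, "model": model})
    return urlunsplit(("wss", resource_host, "/voice-live/realtime", query, ""))


# Placeholder host and version; substitute the values for your own resource.
url = build_ws_url("example-resource.example.com", "gpt-realtime", "YYYY-MM-DD")
```

Building the URL from parts rather than by string concatenation ensures that the query values are escaped correctly.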

More information

Learn more in the full Azure AI Speech Service documentation.

Responsible AI considerations

Safety techniques

Please refer to the safety techniques of the GenAI model of your choice.

Safety evaluations

Please refer to the safety evaluations of the GenAI model of your choice.

Known limitations

Speech models can exhibit varying levels of accuracy across different demographic groups and languages. View details here.

Acceptable use

Acceptable use policy

Intended use cases

Terms of Service

Terms of Service Link

Voice Live API stores and processes data to provide the service and to monitor for violations of the applicable Product Terms. See also the Microsoft Products and Services Data Protection Addendum, which governs data processing by the Azure AI services, including Voice Live API. Voice Live API is an Azure service; learn more about applicable Azure compliance offerings.

Model Specifications

Last Updated: December 2025

Input Type: Text, Audio

Output Type: Text, Audio

Provider: Microsoft

Quick Start