Azure-Speech-Voice-Live
Version: 1
Azure Speech
Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and voice live AI. It enables developers to build intelligent voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.
Key capabilities
About this model
Voice Live API is designed for developers seeking scalable and efficient voice-driven experiences as it eliminates the need to manually orchestrate multiple components. By integrating speech recognition, generative AI, and text to speech functionalities into a single, unified interface, it provides an end-to-end solution for creating seamless voice conversation experiences.
Key model capabilities
Voice Live API includes a comprehensive set of features to support diverse use cases and ensure superior voice interactions: broad language coverage, customizable speech input and output, flexible GenAI model options, advanced noise suppression, echo cancelation and semantic VAD, avatar, and function calling.
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
Azure AI Voice Live API is ideal for scenarios where voice-driven interactions improve user experience. Examples include:
Contact centers: Develop interactive voice bots for customer support, product catalog navigation, and self-service solutions.
Automotive assistants: Enable hands-free, in-car voice assistants for command execution, navigation, and general inquiries.
Education: Create voice-enabled learning companions and virtual tutors for interactive training and education.
Public services: Build voice agents to assist citizens with administrative queries and public service information.
Human resources: Enhance HR processes with voice-enabled tools for employee support, career development, and training.
Out of scope use cases
We encourage developers to leverage Voice Live API in their innovative solutions or applications. However, here are some considerations when choosing a use case:
Avoid scenarios in which the use or misuse of the system could have a consequential impact on life opportunities or legal status: Examples include but are not limited to scenarios in which the AI system could affect an individual's legal status, legal rights, or their access to credit, education, employment, healthcare, housing, insurance, social welfare benefits, services, opportunities, or the terms on which these items are available.
Carefully consider all use cases in high-stakes domains or industries: Examples include but are not limited to healthcare, education, finance, and legal.
Legal and regulatory considerations: Organizations need to evaluate potential specific legal and regulatory obligations when using any AI services and solutions, which may not be appropriate for use in every industry or scenario. Restrictions may vary based on regional or local regulatory requirements. Additionally, AI services or solutions are not designed for and may not be used in ways prohibited in applicable terms of service and relevant codes of conduct.
Pricing
Pricing is based on a number of factors. See pricing details here.
Technical specs
Voice Live API is fully managed, eliminating the need for customers to handle backend orchestration or component integration. Developers provide audio input and receive audio output, avatar visuals, and action triggers, all with minimal latency. With the natively supported GenAI models, developers don't need to deploy or manage any generative AI models, as the API handles all the underlying infrastructure.
Training cut-off date
This information is not available.
Input formats
Text, audio
Supported language
Supported input human languages.
Supported Azure regions
List of supported Azure regions
Sample JSON response
This information is not available.
Model architecture
Voice Live API is composed of multiple AI systems, including language models (both large and small), speech to text models, text to speech models, and more.
Long context
gpt-realtime and gpt-realtime-mini: 32k. For other models, please refer to the official Azure OpenAI documentation.
Optimizing model performance
This information is not available.
Additional assets
How to use Voice Live API
Distribution
Voice Live is available through the following distribution methods:
Voice Live SDK
Integrate Voice Live capabilities directly into applications using the Voice Live SDK, available for platforms including .NET, Python, Java, and JavaScript.
WebSocket API
Access Voice Live functionality via a public, subscription-based API for flexible integration into web services, mobile apps, and backend systems.
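As a rough illustration of what working against a WebSocket-style interface involves, the following minimal sketch builds a connection URL and a session-configuration message using only the Python standard library. It opens no connection. Every specific in it is an assumption made for illustration and is not taken from this page: the URL path, the query parameter names, the `session.update` event type, and the field names inside the message, as well as the helper names `build_voice_live_url` and `build_session_update`. Check the official Voice Live documentation for the actual endpoint and message schema before relying on any of it.

```python
import json
from urllib.parse import urlencode

# NOTE: the path, query parameters, event type, and field names below are
# illustrative assumptions, not the documented Voice Live schema.


def build_voice_live_url(resource_host: str, api_version: str, model: str) -> str:
    """Return a wss:// URL for a hypothetical realtime endpoint."""
    query = urlencode({"api-version": api_version, "model": model})
    return f"wss://{resource_host}/voice-live/realtime?{query}"


def build_session_update(voice_name: str, instructions: str) -> str:
    """Serialize a hypothetical session-configuration event as JSON text."""
    event = {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "modalities": ["text", "audio"],  # matches the card's input/output types
            "voice": {"name": voice_name},
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Placeholder host, version, and voice name; substitute real values.
    print(build_voice_live_url("example.invalid", "2025-01-01", "gpt-realtime"))
    print(build_session_update("example-voice", "You are a helpful assistant."))
```

Keeping URL and message construction in small pure functions lets them be unit-tested without network access; a WebSocket client of your choice would then send the serialized message once the connection is open.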
More information
Learn more in the full Azure AI Speech Service documentation.
Responsible AI considerations
Safety techniques
Please refer to the safety techniques of the GenAI model of your choice.
Safety evaluations
Please refer to the safety evaluations of the GenAI model of your choice.
Known limitations
Speech models can exhibit varying levels of accuracy across different demographic groups and languages. View details here.
Acceptable use
Acceptable use policy
Intended use cases
Terms of Service
Terms of Service Link
Voice Live API stores and processes data to provide the service and to monitor for violations of the applicable Product Terms. See also the Microsoft Products and Services Data Protection Addendum, which governs data processing by the Azure AI services, including Voice Live API. Voice Live API is an Azure service; learn more about applicable Azure compliance offerings.
Model Specifications
Last Updated: December 2025
Input Type: Text, Audio
Output Type: Text, Audio
Provider: Microsoft