Azure-Speech-Voice-Live

The Voice Live API is a single, unified API that enables low-latency, high-quality speech-to-speech interactions for voice agents.
Microsoft
Version: 1
Azure Speech is a suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and Voice Live. It enables developers to build voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.

About this model

The Voice Live API is designed for developers who want scalable, efficient voice-driven experiences without manually orchestrating multiple components. It integrates speech recognition, generative AI, and text to speech into a single interface, providing an end-to-end solution for building voice conversation experiences.

Key model capabilities

The Voice Live API includes a set of features that support diverse use cases and improve the quality of voice interactions: broad language coverage, customizable speech input and output, flexible generative AI model options, advanced noise suppression, echo cancellation, semantic voice activity detection (VAD), avatars, and function calling.
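
As a rough illustration of how several of these capabilities might be enabled together, the sketch below builds a session-configuration message as JSON. It only constructs the payload and does not open a connection. The event name `session.update`, the field names, and the option values such as `azure_semantic_vad` are assumptions modeled on realtime-style session events, and `build_session_update` is a hypothetical helper; verify all of them against the official Voice Live API reference before use.

```python
import json


def build_session_update(voice_name: str, instructions: str) -> str:
    """Build a session-configuration message as a JSON string.

    Hypothetical helper: every field name and option value below is an
    assumption for illustration, not a confirmed Voice Live API schema.
    """
    message = {
        "type": "session.update",  # assumed event name
        "session": {
            # System-style prompt for the generative AI model.
            "instructions": instructions,
            # Assumed switch for semantic voice activity detection.
            "turn_detection": {"type": "azure_semantic_vad"},
            # Assumed switch for noise suppression on the input audio.
            "input_audio_noise_reduction": {
                "type": "azure_deep_noise_suppression"
            },
            # Assumed switch for server-side echo cancellation.
            "input_audio_echo_cancellation": {
                "type": "server_echo_cancellation"
            },
            # Output voice; the voice name is only an example value.
            "voice": {"name": voice_name, "type": "azure-standard"},
        },
    }
    return json.dumps(message)


payload = build_session_update(
    "en-US-AvaNeural", "You are a helpful voice agent."
)
print(payload)
```

Keeping the configuration in one message mirrors the single-interface design described above: recognition, generation, and synthesis options travel together instead of being set on separate services.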

Quick facts

Model provider: Microsoft
Type: Conversational AI, Speech to text, Text to speech
Lifecycle: Generally available (GA)
Input type: text, audio
Output type: text, audio