Azure-Speech-Speech-Translation
Translates streaming or recorded audio into text or audio across 140+ languages and dialects. Accuracy can be further optimized with custom models for your specialized use cases.
Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and Voice Live. It enables developers to build voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.
About this model
Speech Translation enables real-time translation of spoken audio across multiple language pairs, producing both translated text and synthesized audio output. It supports two speak-out modes:
- Realtime Speech Translation: Translate spoken audio in real time with streaming transcription and translated-text display, using a prebuilt neural voice for audio output.
- Live Interpreter (Limited Access): Translate spoken audio in real time using Personal Voice, a voice generated on the fly from the user's own speech input, for a personalized interpretation experience. Requires Limited Access approval before use (see Access requirements below).
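The two speak-out modes differ mainly in where the output voice comes from and whether approval is needed. The sketch below expresses those rules as a plain Python configuration object so the constraints are easy to see. It is a minimal sketch under stated assumptions: `SpeakOutMode`, `TranslationSessionConfig`, and all of their fields are hypothetical names, not part of the Azure Speech SDK; leaving `source_language` unset stands in for auto-detect; and `de-DE-KatjaNeural` is used only as an example prebuilt voice name.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SpeakOutMode(Enum):
    """Hypothetical labels for the two speak-out modes."""
    REALTIME_SPEECH_TRANSLATION = "prebuilt_neural_voice"
    LIVE_INTERPRETER = "personal_voice"


@dataclass
class TranslationSessionConfig:
    """Hypothetical session settings; this is NOT the Azure Speech SDK API."""
    target_languages: List[str]
    mode: SpeakOutMode
    source_language: Optional[str] = None  # None stands in for auto-detect (open-range LID)
    voice_name: Optional[str] = None       # only meaningful for prebuilt neural voices
    limited_access_approved: bool = False  # assumption: approval tracked as a simple flag

    def validate(self) -> None:
        """Raise if the settings break the mode rules described in the text."""
        if not self.target_languages:
            raise ValueError("at least one target language is required")
        if self.mode is SpeakOutMode.REALTIME_SPEECH_TRANSLATION:
            if not self.voice_name:
                raise ValueError("Realtime Speech Translation needs a prebuilt neural voice name")
        else:
            if self.voice_name is not None:
                raise ValueError("Live Interpreter speaks with Personal Voice; leave voice_name unset")
            if not self.limited_access_approved:
                raise PermissionError("Live Interpreter requires Limited Access approval")

    def describe(self) -> str:
        """Return a one-line summary of the session settings."""
        source = self.source_language or "auto-detect"
        if self.mode is SpeakOutMode.REALTIME_SPEECH_TRANSLATION:
            voice = self.voice_name
        else:
            voice = "Personal Voice"
        return f"{source} -> {', '.join(self.target_languages)} | voice: {voice}"


cfg = TranslationSessionConfig(
    target_languages=["de"],
    mode=SpeakOutMode.REALTIME_SPEECH_TRANSLATION,
    voice_name="de-DE-KatjaNeural",  # example prebuilt neural voice name
)
cfg.validate()
print(cfg.describe())  # auto-detect -> de | voice: de-DE-KatjaNeural
```

Keeping the mode rules in one `validate` method means a misconfigured session (for example, Live Interpreter without approval) fails before any audio is streamed.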
Key model capabilities
- Real-time streaming speech translation with simultaneous transcription and translated text display
- Personal Voice speak-out generated on the fly from the user's real-time voice input (Live Interpreter)
- Prebuilt neural voice speak-out with voice name selection (Realtime Speech Translation)
- Automatic language identification (open-range LID) for the source language
- Playback of the original and translated audio after the session completes
- API documentation and sample code accessible directly from the playground
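Simultaneous transcription and translated-text display usually involves two kinds of results: partial hypotheses that change while the speaker is still talking, and final results for a completed utterance. The sketch below shows one common way to render live captions from such results, where partials overwrite the in-progress line and finals are committed. It assumes a hypothetical `TranslationEvent` shape (`is_final`, `source_text`, `translations`) and a hypothetical `LiveCaptionView` class; these are not SDK types, though the pattern loosely mirrors the partial (`recognizing`) and final (`recognized`) events in the Speech SDK.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TranslationEvent:
    """Hypothetical event shape: one partial or final recognition result."""
    is_final: bool
    source_text: str
    translations: Dict[str, str]  # target language code -> translated text


class LiveCaptionView:
    """Keeps committed lines plus the current in-progress line for one target language."""

    def __init__(self, target_language: str) -> None:
        self.target_language = target_language
        self.committed: List[str] = []
        self.in_progress: str = ""

    def on_event(self, event: TranslationEvent) -> None:
        line = f"{event.source_text} => {event.translations.get(self.target_language, '')}"
        if event.is_final:
            # A final result replaces the partial text and is committed permanently.
            self.committed.append(line)
            self.in_progress = ""
        else:
            # Partial results overwrite each other as the utterance grows.
            self.in_progress = line

    def render(self) -> str:
        lines = self.committed + ([self.in_progress] if self.in_progress else [])
        return "\n".join(lines)


view = LiveCaptionView("de")
for ev in [
    TranslationEvent(False, "Hello", {"de": "Hallo"}),
    TranslationEvent(False, "Hello every", {"de": "Hallo all"}),
    TranslationEvent(True, "Hello everyone.", {"de": "Hallo zusammen."}),
    TranslationEvent(False, "How", {"de": "Wie"}),
]:
    view.on_event(ev)
print(view.render())
```

Overwriting rather than appending partials keeps the display stable: only the last line flickers while the speaker talks, and committed lines never change.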
Quick facts
Model provider: Microsoft
Type: Translation, Speech translation
Lifecycle: Generally available (GA)
Input type: audio
Output type: text, audio
Pricing: see the pricing page