Azure-Speech-Speech-Translation

Translates streaming or recorded audio into text or audio across 140+ languages and dialects. Accuracy can be further optimized with custom models for your specialized use cases.
Microsoft
Version: 1
Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and voice live AI. It enables developers to build intelligent voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.

About this model

Speech Translation enables real-time translation of spoken audio across multiple language pairs, producing both translated text and synthesized audio output. It supports two speak-out modes:

Realtime Speech Translation: Translate spoken audio in real time with streaming transcription and translation text display, using a prebuilt neural voice for audio output.

Live Interpreter (Limited Access): Translate spoken audio in real time using Personal Voice — a voice generated on-the-fly from the user's own speech input — delivering a personalized interpretation experience. Requires Limited Access approval before use (see Access requirements below).

Key model capabilities

  • Real-time streaming speech translation with simultaneous transcription and translated text display
  • Personal Voice speak-out generated on the fly from the user's real-time voice input (Live Interpreter)
  • Prebuilt neural voice speak-out with voice name selection (Realtime Speech Translation)
  • Automatic language identification (open-range LID) for the source language
  • Original and translated audio playback after session completes
  • API documentation and sample code accessible directly from the playground
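
As a rough illustration of the Realtime Speech Translation mode with a prebuilt neural voice, the following is a minimal sketch using the Azure Speech SDK for Python (the `azure-cognitiveservices-speech` package). The function name `translate_wav_once`, the default languages, and the default voice `de-DE-KatjaNeural` are illustrative assumptions, not values taken from this page; the key, region, and WAV path must be supplied by the caller. It translates a single utterance from a file rather than a live stream.

```python
def translate_wav_once(key, region, wav_path,
                       source_lang="en-US", target_lang="de",
                       voice_name="de-DE-KatjaNeural"):
    """Translate the first utterance in a WAV file (hypothetical helper).

    Returns (recognized_text, translated_text, synthesized_audio_bytes).
    """
    # Imported lazily so this module loads even if the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(
        subscription=key, region=region)
    config.speech_recognition_language = source_lang  # spoken language
    config.add_target_language(target_lang)           # translation target
    config.voice_name = voice_name                    # prebuilt neural voice

    audio = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=config, audio_config=audio)

    # Collect synthesized audio chunks as they are delivered. Assumption:
    # chunks may still be arriving when recognize_once() returns, so a
    # production caller would wait for the final (empty) chunk.
    chunks = []
    recognizer.synthesizing.connect(lambda evt: chunks.append(evt.result.audio))

    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.TranslatedSpeech:
        raise RuntimeError(f"Translation did not succeed: {result.reason}")

    return result.text, result.translations[target_lang], b"".join(chunks)
```

A call such as `translate_wav_once(my_key, "westus2", "sample.wav")` would need a valid Speech resource key and region. For continuous streaming input, the SDK's `start_continuous_recognition()` with the `recognized` event is the usual alternative to `recognize_once()`.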

Quick facts

Model provider: Microsoft
Type: Translation, Speech translation
Lifecycle: Generally available (GA)
Input type: audio
Output type: text, audio