Azure-Speech-Speech-Translation
Version: 1
Azure Speech
Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and voice live AI. It enables developers to build intelligent voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.
Key capabilities
About this model
Speech Translation enables real-time translation of spoken audio across multiple language pairs, producing both translated text and synthesized audio output. It supports two speak-out modes:
- Realtime Speech Translation: Translate spoken audio in real time with streaming transcription and translation text display, using a prebuilt neural voice for audio output.
- Live Interpreter (Limited Access): Translate spoken audio in real time using Personal Voice, a voice generated on-the-fly from the user's own speech input, delivering a personalized interpretation experience. Requires Limited Access approval before use (see Access requirements below).
Key model capabilities
- Real-time streaming speech translation with simultaneous transcription and translated text display
- Personal Voice speak-out generated on-the-fly from the user's real-time voice input (Live Interpreter)
- Prebuilt neural voice speak-out with voice name selection (Realtime Speech Translation)
- Auto-detect Language ID (open-range LID) for source language
- Original and translated audio playback after the session completes
- API documentation and sample code accessible directly from the playground
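As a rough illustration of the prebuilt neural voice mode listed above, the following sketch uses the Speech SDK for Python (the `azure-cognitiveservices-speech` package) to translate a single utterance from the default microphone. It is a minimal sketch under stated assumptions, not the playground's own sample code: `TranslationSettings` and `translate_once` are hypothetical names introduced here, and the subscription key, region, and voice name are placeholders you would supply yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationSettings:
    """Illustrative settings holder; the field names belong to this sketch only."""
    source_language: str  # locale of the spoken input, for example "en-US"
    target_language: str  # translation target, for example "de"
    voice_name: str       # a prebuilt neural voice for the target language

    def __post_init__(self):
        # Reject blank values early, before any service call is attempted.
        for field_name in ("source_language", "target_language", "voice_name"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")


def translate_once(settings: TranslationSettings, key: str, region: str):
    """Translate one utterance from the default microphone; returns (source text, translated text)."""
    # Imported lazily so the settings class stays usable without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(subscription=key, region=region)
    config.speech_recognition_language = settings.source_language
    config.add_target_language(settings.target_language)
    config.voice_name = settings.voice_name  # requests synthesized speak-out

    audio = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=config, audio_config=audio
    )

    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.TranslatedSpeech:
        raise RuntimeError(f"translation did not complete: {result.reason}")
    return result.text, result.translations[settings.target_language]
```

Synthesized audio is delivered through the recognizer's `synthesizing` event, which this sketch does not handle. Calling `translate_once` needs a valid Speech resource key and region, and each call is billed like any other translation request.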
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
| Use case | Scenario | Solution |
|---|---|---|
| Enterprise POC validation | An enterprise developer needs to evaluate Speech Translation quality and latency before committing to API integration. | Use the no-code playground to test Realtime Speech Translation across language pairs, assess latency, and review sample code — without writing any code upfront. |
| Live Interpreter demo for sales | A sales engineer needs to demo Live Interpreter with Personal Voice to a customer's non-technical decision-makers during a POC presentation. | Use the playground's Live Interpreter tab with a preset Personal Voice demo — no API key or code required — to show a compelling, personalized translation experience in-meeting. |
| Personal Voice evaluation | An enterprise customer wants to validate Personal Voice output quality for a live interpretation use case before applying for Limited Access. | Use pre-canned Personal Voice demos in the playground to experience the output without requiring Limited Access approval. |
| Multilingual conference interpretation | An event organizer needs real-time spoken translation for a multilingual audience. | Use Realtime Speech Translation with a prebuilt neural voice to translate the speaker's audio and play translated speech to the audience in real time. |
| API integration pre-validation | A solution architect wants to understand the Speech Translation API capability and review sample code before designing a system integration. | Use the playground to run a live translation session, then access the API docs and sample code linked directly from the playground page. |
Out of scope use cases
The Speech Translation Playground is designed for real-time, interactive translation try-out and POC validation. The following are explicitly out of scope:
- Batch translation: Processing large volumes of pre-recorded audio files is not supported in this playground experience.
- Offline mode: The playground requires an active internet connection and cannot be used offline.
- Mobile support: The playground is a web-only experience; mobile platforms are not supported.
- New language pair additions: Only language pairs currently supported by the Speech Translation API are available. New pairs are not added via the playground.
Pricing
Pay-As-You-Go & Commitment Tiers
See pricing details here. Important: Playground try-out sessions incur real Azure translation costs billed to the customer's subscription. For multi-target language translation pricing, see Speech Translation pricing.
Technical specs
Speech Translation Playground offers the following core capabilities:
- Realtime Speech Translation: Stream audio input and receive simultaneous translated text and audio output using a prebuilt neural voice.
- Live Interpreter (Limited Access): Stream audio input and receive translated audio output using Personal Voice generated on-the-fly from the user's own voice. Requires Limited Access approval (see Access requirements below).
- Auto-detect Language ID: Open-range language identification for the source language, so no manual selection is required.
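Streamed audio has to match the formats this page lists under Input formats (8 kHz or 16 kHz, mono). As a minimal sketch, assuming the audio is available as a PCM WAV file, the check below reads the header with Python's standard `wave` module and reports mismatches before any audio is sent. The function names `make_silent_wav` and `find_format_problems` are hypothetical helpers for this illustration and are not part of any Azure SDK.

```python
import io
import wave

# Sample rates this page lists for streaming input.
SUPPORTED_RATES_HZ = (8000, 16000)


def make_silent_wav(rate_hz: int, channels: int, seconds: float = 0.1) -> bytes:
    """Build a short PCM WAV of silence, used here only as sample input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples: an assumption, the page states no bit depth
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * int(rate_hz * seconds))
    return buffer.getvalue()


def find_format_problems(wav_bytes: bytes) -> list[str]:
    """Return readable problems; an empty list means the header looks compatible."""
    try:
        with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
            channels = wav.getnchannels()
            rate_hz = wav.getframerate()
    except (wave.Error, EOFError):
        # The wave module only parses PCM WAV data, so anything else ends up here.
        return ["not a readable PCM WAV file"]

    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate_hz not in SUPPORTED_RATES_HZ:
        problems.append(f"unsupported sample rate: {rate_hz} Hz")
    return problems


if __name__ == "__main__":
    print(find_format_problems(make_silent_wav(16000, 1)))
    print(find_format_problems(make_silent_wav(44100, 2)))
```

Because the standard library reader handles PCM only, ALAW, MULAW, and G722 input would be reported as unreadable by this sketch even though the service lists those encodings as supported.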
Training cut-off date
This information is not available.
Input formats
Realtime Speech Translation and Live Interpreter: 8 kHz or 16 kHz mono audio in PCM, ALAW, MULAW, or G722 encoding.
Supported language
Speech Translation supported language pairs.
Supported Azure regions
List of supported Azure regions.
Sample response
Refer to the Speech Translation REST API documentation and the sample code accessible directly from the playground page.
Model architecture
This information is not available.
Access requirements
- Live Interpreter (Personal Voice): This feature is Limited Access. Users must apply before use: Limited Access application form. Users without approval can still experience Personal Voice output via pre-canned demos in the playground.
- Voice consent: Before Personal Voice processing begins, users must explicitly approve a voice consent prompt. Consent is captured and recorded server-side.
Additional assets
This information is not available.
Distribution
You can deploy Azure AI Speech features in the cloud. Speech service deployment in sovereign clouds is available for some government entities and their partners. For example, the Azure Government cloud is available to US government entities and their partners. Microsoft Azure operated by 21Vianet cloud is available to organizations with a business presence in China. For more information, see sovereign clouds.
The Speech CLI is a command-line tool for using Speech service without having to write any code. Most features in the Speech SDK are available in the Speech CLI, and some advanced features and customizations are simplified in the Speech CLI.
The Speech SDK exposes many of the Speech service capabilities you can use to develop speech-enabled applications. The Speech SDK is available in many programming languages and across all platforms.
In some cases, you can't or shouldn't use the Speech SDK. In those cases, you can use REST APIs to access the Speech service. For example, use REST APIs for speech translation.
More information
Learn more in the full Azure AI Speech Translation documentation.
Responsible AI considerations
Safety techniques
Refer to the guidance for integration and responsible use with speech to text.
Safety evaluations
This information is not available.
Known limitations
Speech translation recognizes what's spoken in an audio input, and then generates translation outputs. Accurate results require configuring the service for the languages and speaking styles expected in the audio input. Non-optimal settings might lead to lower accuracy. Refer to Technical limitations, operational factors, and ranges for more details.
Acceptable use
Acceptable use policy
The speech translation API offers convenient options for developing voice-enabled applications, but you must consider the context in which you integrate the API. Ensure that you comply with all laws and regulations that apply to your application. This includes understanding your obligations under privacy and communication laws, including national and regional privacy, eavesdropping, and wiretap laws that apply to your jurisdiction. Collect and process only audio that is within the reasonable expectations of your users. This includes ensuring that you have all necessary and appropriate consents from users for you to collect, process, and store their audio data. Refer to Technical limitations, operational factors, and ranges for more details.
Terms of Service
Terms of Service Link
Azure Speech - Speech translation is provided under Microsoft’s proprietary licensing terms. Access to the model is subscription-based and governed by Microsoft’s product licensing policies.
- License Type: Proprietary
- Access Model: Subscription-based via Azure services
- Terms of Service: https://microsoft.com/licensing/terms/
Model Specifications
Last Updated: April 2026
Input Type: Audio
Output Type: Text, Audio
Provider: Microsoft