Azure-Speech-Text-to-speech-Avatar
Text to speech avatar converts text into a digital video of a human (either a standard avatar or a custom text to speech avatar) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time.
Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and voice live AI. It enables developers to build intelligent voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.
About this model
Text to speech avatar combines advanced neural network models with the VASA-1 model, so users can build low-latency, real-time live chat avatars and produce lifelike, high-quality synthetic talking avatar videos for a range of applications.
Key model capabilities
- Converts text into a digital video of a human speaking with natural-sounding voices powered by Azure AI text to speech.
- Provides a collection of standard avatars.
- Azure AI text to speech generates the voice of the avatar. For more information, see Avatar voice and language.
- Synthesizes text to speech avatar video asynchronously with the batch synthesis API or in real time.
- Allows users to create avatar video content in the text to speech avatar playground in AI Foundry.
- Enables real-time interactive avatars through Voice live in AI Foundry.
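As a rough illustration of the asynchronous path, the sketch below assembles the URL and JSON body that a batch avatar synthesis job might use. It sends nothing over the network. The route, the `api-version` value, the field names, the voice `en-US-AvaMultilingualNeural`, and the avatar character and style are assumptions that do not come from this page, and the helper names are hypothetical; check all of them against the current batch synthesis reference before use.

```python
import json
import uuid


def build_avatar_batch_request(text, voice="en-US-AvaMultilingualNeural",
                               character="lisa", style="graceful-sitting"):
    """Return (job_id, payload) for a batch avatar synthesis job.

    Field names and default values are assumptions for illustration only.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {
        "inputKind": "PlainText",                # assumed: plain text rather than SSML
        "inputs": [{"content": text}],           # the text the avatar will speak
        "synthesisConfig": {"voice": voice},     # assumed voice name
        "avatarConfig": {
            "talkingAvatarCharacter": character,  # assumed standard avatar
            "talkingAvatarStyle": style,          # assumed avatar style
        },
    }
    # The caller chooses the job ID; a random UUID keeps it unique.
    return str(uuid.uuid4()), payload


def build_job_url(endpoint, job_id, api_version="2024-08-01"):
    """Compose the job URL; the path and api-version are assumptions."""
    base = endpoint.rstrip("/")
    return f"{base}/avatar/batchsyntheses/{job_id}?api-version={api_version}"


if __name__ == "__main__":
    job_id, payload = build_avatar_batch_request("Hello from the avatar.")
    # Placeholder endpoint; a real one comes from your Speech resource.
    print(build_job_url("https://example.invalid", job_id))
    print(json.dumps(payload, indent=2))
```

A real client would send this body with an authenticated HTTP request and then poll the job until the video is ready. Real-time synthesis follows a different, streaming flow and is not covered by this sketch.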
Quick facts
Model provider: Microsoft
Type: Video generation, Audio generation
Lifecycle: Generally available (GA)
Input type: text
Output type: video
Pricing: View pricing