Azure-Speech-Text-to-speech-Avatar

Text to speech avatar converts text into a digital video of a human (either a standard avatar or a custom text to speech avatar) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Deve
Microsoft
Version: 1
Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and voice live AI. It enables developers to build intelligent voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.

About this model

Text to speech avatar combines advanced neural network models with the VASA-1 model, so users can create low-latency, real-time live chat avatars and produce lifelike, high-quality synthetic talking avatar videos for a range of applications.

Key model capabilities

  1. Converts text into a digital video of a human speaking with natural-sounding voices powered by Azure AI text to speech.
  2. Provides a collection of standard avatars.
  3. Generates the avatar's voice with Azure AI text to speech. For more information, see Avatar voice and language.
  4. Synthesizes text to speech avatar video asynchronously with the batch synthesis API, or in real time.
  5. Allows users to create avatar video content in the text to speech avatar playground in AI Foundry.
  6. Enables real-time interactive avatars through Voice live in AI Foundry.
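
As a rough illustration of the asynchronous path in item 4, the sketch below assembles the URL and JSON body for a batch synthesis job without sending anything over the network. This page does not give the endpoint path, API version, field names (inputKind, synthesisConfig, avatarConfig), voice name, or avatar character and style values, so treat all of them as assumptions and check them against the batch synthesis API reference. The helper build_avatar_batch_request is hypothetical.

```python
import json
import uuid


def build_avatar_batch_request(text, region="westus2", voice="en-US-JennyNeural",
                               character="lisa", style="graceful-sitting",
                               api_version="2024-08-01", job_id=None):
    """Return (url, body) for a batch avatar synthesis job.

    Hypothetical helper: the URL layout and every body field name below are
    assumptions, not values confirmed by this page.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    # Each job needs a caller-chosen identifier; generate one if none is given.
    job_id = job_id or str(uuid.uuid4())
    url = (f"https://{region}.api.cognitive.microsoft.com"
           f"/avatar/batchsyntheses/{job_id}?api-version={api_version}")
    body = {
        "inputKind": "PlainText",                # assumed field name and value
        "inputs": [{"content": text}],           # the text the avatar speaks
        "synthesisConfig": {"voice": voice},     # assumed voice selection field
        "avatarConfig": {                        # assumed standard avatar selection
            "talkingAvatarCharacter": character,
            "talkingAvatarStyle": style,
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = build_avatar_batch_request("Hello from the avatar.", job_id="demo-job")
    print(url)
    print(json.dumps(body, indent=2))
```

Under these assumptions, a client would send the body in an HTTP PUT to the URL with the Speech resource key in a request header, then poll the job until the video is ready. Keeping the builder free of network calls makes the payload easy to inspect before any request is made.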

Quick facts

Model provider: Microsoft
Type: Video generation, Audio generation
Lifecycle: Generally available (GA)
Input type: text
Output type: video