Azure-Speech-Text-to-speech-Avatar

Text to speech avatar converts text into a digital video of a human (either a standard avatar or a custom text to speech avatar) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Deve

Microsoft

Version: 1

Azure Speech

Azure Speech is a comprehensive suite of AI-powered speech capabilities that includes speech to text, text to speech, speech translation, and voice live AI. It enables developers to build intelligent voice-enabled applications with high accuracy, multilingual support, and customizable voice experiences.

About this model

Text to speech avatar combines advanced neural network models with the VASA-1 model, letting users build low-latency, real-time live chat avatars and produce lifelike, high-quality synthetic talking avatar videos for a range of applications.

Key model capabilities

Converts text into a digital video of a human speaking with natural-sounding voices powered by Azure AI text to speech.
Provides a collection of standard avatars.
Uses Azure AI text to speech to generate the avatar's voice. For more information, see Avatar voice and language.
Synthesizes text to speech avatar video asynchronously with the batch synthesis API, or in real time.
Lets users create avatar video content in the text to speech avatar playground in AI Foundry.
Enables real-time interactive avatars through Voice live in AI Foundry.
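
As a rough illustration of the asynchronous path in the list above, the sketch below builds (but does not send) a request that would create one batch avatar synthesis job. The page itself gives no API details, so everything here is an assumption to check against the current REST reference: the endpoint shape, the API version string, the header name, the JSON field names, the voice `en-US-JennyNeural`, and the avatar character and style values. The helper `build_avatar_batch_request` is a hypothetical name, and the region, key, and job ID are placeholders.

```python
import json
import urllib.request

# Assumption: API version and endpoint shape are recalled from the public
# batch avatar synthesis REST docs and may have changed; verify before use.
API_VERSION = "2024-08-01"


def build_avatar_batch_request(region, key, job_id, text,
                               voice="en-US-JennyNeural",
                               character="lisa",
                               style="graceful-sitting"):
    """Return an unsent PUT request describing one batch avatar job.

    Hypothetical helper for illustration; the field names are assumed.
    """
    url = (f"https://{region}.api.cognitive.microsoft.com"
           f"/avatar/batchsyntheses/{job_id}?api-version={API_VERSION}")
    payload = {
        "inputKind": "PlainText",              # plain text rather than SSML
        "inputs": [{"content": text}],         # text the avatar will speak
        "synthesisConfig": {"voice": voice},   # text to speech voice
        "avatarConfig": {
            "talkingAvatarCharacter": character,  # assumed standard avatar
            "talkingAvatarStyle": style,
            "videoFormat": "mp4",
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


# Placeholder region, key, and job ID; nothing is sent over the network.
req = build_avatar_batch_request(
    "westus2", "<your-key>", "demo-job-1", "Hello from the avatar.")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would submit the job. Under the same assumptions, a client would then poll the job resource until it reports completion and read the video URL from the response. The real-time path uses a streaming connection instead and is not covered by this sketch.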

Quick facts

Model provider: Microsoft

Type: Video generation, Audio generation

Lifecycle: Generally available (GA)

Input type: text

Output type: video
