MAI-Voice-1

MAI-Voice-1 is a text-to-speech (TTS) model that generates high-quality single-speaker speech and, soon, multi-speaker speech for public preview. It produces audio that strictly follows the input transcript and supports per-turn emotion control as well as

Microsoft

Version: 2025-12-18

MAI-Voice-1 is a text-to-speech (TTS) model that generates high-fidelity, natural, and expressive speech. It captures human-like intonation, rhythm, and emotional nuance, enabling more engaging and lifelike conversational experiences. It strictly follows the provided transcript and supports per-turn emotion control.

About this model

There are two ways to set the voice for your project.
• Curated voice library: Licensed voices designed to work straight out of the box.
• Voice prompting: Provide an audio clip a few seconds long with your request, and the model matches the voice instantly.

Key capabilities

• Natural voice synthesis.
• High-fidelity, high-clarity voice output.
• Licensed voices designed to work straight out of the box.
• Voice prompting: Instantly generate natural speech in any consented voice, without additional training/fine-tuning.
• Long-form content generation that maintains speaker consistency.

Key model capabilities

High-fidelity natural voice synthesis
Produces speech with the intonation, rhythm, and emotional range of a real speaker.
State-of-the-art voice prompting
Provide a short audio clip (a few seconds, up to 120 seconds) and the model clones the voice instantly. No fine-tuning is required, so you can onboard a consented voice of your choice easily. Access requires Microsoft approval, and guardrails are in place to prevent misuse.
Fine-grained control
Shape delivery at the turn or sentence level by controlling the emotion and tone of the output.
Long-form content
Built for extended content such as audiobooks, lectures, podcasts, training materials, and long-form narration.

Together, these capabilities give developers the building blocks to ship voice at scale across accessibility, virtual assistants, media narration, and customer service.
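The 120-second upper bound on voice-prompt clips mentioned above can be checked locally before a clip is submitted. The following is a minimal sketch using only Python's standard `wave` module. It assumes the clip is an uncompressed PCM WAV file; this page does not state which audio formats the service accepts, and the helper names `wav_duration_seconds` and `is_valid_voice_prompt` are hypothetical, not part of any Microsoft SDK.

```python
import wave

# Upper bound on voice-prompt length, as stated on this page.
MAX_PROMPT_SECONDS = 120.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_valid_voice_prompt(path: str, max_seconds: float = MAX_PROMPT_SECONDS) -> bool:
    """Return True if the clip is non-empty and no longer than max_seconds.

    Hypothetical local pre-check only; the service may apply further
    requirements (format, minimum length, consent) not modeled here.
    """
    duration = wav_duration_seconds(path)
    return 0.0 < duration <= max_seconds
```

A check like this only catches clips that are too long or empty. It does not replace the approval and consent requirements described above.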

Use cases

Text to speech offers a variety of features catering to a wide range of intended uses across industries and domains. All text to speech features are subject to the terms and conditions applicable to customers’ Azure subscription, including the Azure Acceptable Use Policy and the Code of conduct for Azure AI Speech text to speech.

Key use cases

Media: Entertainment - Give characters a voice. Generate expressive, lifelike audio for games, films, podcasts, audiobooks, and immersive AR/VR experiences.
Virtual Assistants and Chatbots - Make your assistant sound like it belongs in your product. Power conversational agents across apps, vehicles, appliances, and customer service with a branded voice.
Accessibility Features - Build products that more people can use. Add audio narration for visually impaired users and voice support for individuals with speech impairments.
Educational and Interactive Learning - Build character and brand voices for online courses, interactive lessons, simulations, and guided tours.
Media: Marketing and Advertising - Develop a consistent, recognizable voice across product launches, campaigns, and ads.
Self-authored Content - Voice talent can bring blogs, books, social media content, and personal stories to life using a custom voice built from their own.
Interactive Voice Response (IVR) Systems - Build dynamic, natural and expressive voices for call centers and automated phone interactions.
Public Service and Informational Announcements - Deliver clear and engaging voice messages for public venues, traffic updates, weather alerts, event information, and schedules.

Out of scope use cases

Using the service in any way that is inconsistent with the Code of Conduct is not permitted.

Quick facts

Model provider: Microsoft

Type: Text to speech, Audio generation

Lifecycle: Preview

Input type: text

Output type: audio