MAI-Transcribe-1.5
MAI-Transcribe-1.5 is the second iteration of Microsoft's best-in-class speech-to-text model family. It delivers consistently strong transcription accuracy across 43 languages and across accents, speaking styles, and noisy environments, with faster inference.
MAI-Transcribe-1.5 is now even more robust for real-world audio. It provides consistently strong transcription across accents, speaking styles, and noisy environments, giving developers a strong foundation for building high-quality voice understanding into their applications. It also now supports entity biasing: domain-aware transcription that better recognizes industry and scientific terms, proper names, and other domain-specific terminology.
About this model
MAI-Transcribe-1.5 is a speech-to-text model built in-house by the Microsoft AI team, designed to deliver reliable transcription across 43 languages. It powers a wide range of use cases, including video captions, meeting transcription, accessibility tools, call analysis, content creation workflows, and voice agents. The model is optimized to be robust across diverse accents, dialects, and real-world acoustic conditions, giving developers a transcription system they can rely on.
Key capabilities
- Best-in-class transcription accuracy across 43 languages.
- 25 languages already covered by MAI-Transcribe-1: English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Norwegian Bokmål, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, and Vietnamese.
- 18 additional languages: Bulgarian, Catalan, Greek, Estonian, Lithuanian, Slovak, Slovenian, Ukrainian, Assamese, Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
- Robust in noisy, real-world conditions.
- Faster inference: substantially lower latency than MAI-Transcribe-1 on long-form audio (up to 5.7x faster).
- Automatic language identification.
- Keyword/entity biasing (up to 200 keywords) to improve transcription in domain-specific contexts.
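The 200-keyword limit on entity biasing can be checked on the client before any audio is submitted. The following is a minimal sketch only: this page does not describe the request format, so the helper name `prepare_bias_keywords` is hypothetical and the code does not call the service. Treating keywords as case-insensitive duplicates is also an assumption, not documented behavior.

```python
# Hypothetical client-side helper; not part of any documented SDK.
MAX_BIAS_KEYWORDS = 200  # limit stated in the capability list above


def prepare_bias_keywords(terms, limit=MAX_BIAS_KEYWORDS):
    """Trim, de-duplicate, and validate a list of biasing keywords."""
    seen = set()
    cleaned = []
    for term in terms:
        # Collapse runs of whitespace and strip leading/trailing spaces.
        normalized = " ".join(term.split())
        if not normalized:
            continue  # skip empty entries
        # Assumption: "Contoso" and "contoso" count as the same keyword.
        key = normalized.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    if len(cleaned) > limit:
        raise ValueError(
            f"{len(cleaned)} keywords exceed the limit of {limit}"
        )
    return cleaned


if __name__ == "__main__":
    keywords = prepare_bias_keywords(
        ["  Contoso ", "contoso", "", "atrial  fibrillation"]
    )
    print(keywords)
```

Raising an error rather than silently truncating keeps the caller in control of which terms to drop when a domain glossary is larger than the limit.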
Quick facts
Model provider: Microsoft
Type: Automatic speech recognition, speech to text
Lifecycle: Generally available (GA)
Input type: audio
Output type: text
Pricing: View pricing