MAI-Transcribe-1.5

Microsoft

Version: 2026-06-02

MAI-Transcribe-1.5 is the second iteration of our best-in-class speech-to-text model family and is
more robust on real-world audio than its predecessor. It provides consistently strong transcription
across accents, speaking styles, and noisy environments, giving developers a solid foundation for
building high-quality voice understanding into their applications. MAI-Transcribe-1.5 also adds
entity biasing: domain-aware transcription that better recognizes industry and scientific terms,
proper names, and other domain-specific terminology.

Overview

About this model

MAI-Transcribe-1.5 is a speech-to-text model built in-house by the Microsoft AI team and designed to
deliver reliable transcription across 43 languages. It powers a wide range of use cases, including
video captions, meeting transcription, accessibility tools, call analysis, content creation workflows,
and voice agents. The model is optimized to be robust across diverse accents, dialects, and
real-world acoustic conditions, giving developers a transcription system they can rely on.

Key capabilities

- Best-in-class transcription accuracy across 43 languages.
  - 25 languages already covered by MAI-Transcribe-1: English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Norwegian Bokmål, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, and Vietnamese.
  - 18 additional languages: Bulgarian, Catalan, Greek, Estonian, Lithuanian, Slovak, Slovenian, Ukrainian, Assamese, Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
- Robust in noisy, real-world conditions.
- Faster inference: substantially lower latency than MAI-Transcribe-1 on long-form audio (up to 5.7x faster).
- Automatic language identification.
- Keyword/entity biasing (up to 200 keywords) to improve transcription in domain-specific contexts.
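
Because keyword/entity biasing accepts at most 200 keywords, it can help to normalize a term list on the client before sending it. The sketch below is illustrative only: `prepare_keywords` and `MAX_KEYWORDS` are hypothetical names, and this page does not describe the request format or whether the service itself trims, de-duplicates, or treats case as significant. It assumes a plain list of strings and shows one way to clean it and enforce the stated cap.

```python
MAX_KEYWORDS = 200  # limit stated in the capability list; the constant name is hypothetical


def prepare_keywords(terms, limit=MAX_KEYWORDS):
    """Return a cleaned keyword list that respects the stated cap.

    Collapses runs of whitespace, skips blank entries, and drops
    case-insensitive duplicates while keeping the first spelling seen.
    Raises ValueError instead of silently truncating an oversized list.
    """
    seen = set()
    cleaned = []
    for term in terms:
        text = " ".join(term.split())  # trim and collapse inner whitespace
        if not text:
            continue
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    if len(cleaned) > limit:
        raise ValueError(
            f"{len(cleaned)} keywords supplied; the stated limit is {limit}"
        )
    return cleaned


if __name__ == "__main__":
    print(prepare_keywords(["  Contoso ", "contoso", "", "atrial  fibrillation"]))
```

Raising an error rather than truncating is a deliberate choice in this sketch: silently dropping terms could remove exactly the domain vocabulary the list was meant to bias toward.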

Quick facts

Model provider: Microsoft

Type: Automatic speech recognition, speech to text

Lifecycle: Generally available (GA)

Input type: Audio

Output type: Text