Mistral Document AI (25.12)
Version: 1
Mistral AI
Last updated February 2026
Document conversion to markdown with interleaved images and text
Vision
Low latency

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU (provisioned throughput unit) portability across models hosted on Azure, all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

About this model

Mistral Document AI comes with an improved Document OCR (Optical Character Recognition) processor, powered by our latest OCR model, mistral-ocr-2512, which enables you to extract text and structured content from PDFs and a variety of document types. Mistral Document AI offers enterprise-level document processing, combining cutting-edge OCR technology with advanced structured data extraction. Experience faster processing speeds, unparalleled accuracy, and cost-effective solutions, all scalable to meet your needs. Unlock the full potential of your documents with our multilingual support, annotations and adaptable workflows for many document types, enabling you to extract, comprehend, and analyze information with ease.

Enterprise OCR with superior document accuracy

Digitize text from images, PDFs, and a variety of document formats. Extract and understand complex text, handwriting, tables, forms, and images from any document, with benchmark-leading accuracy across global languages.

SOTA doc AI

Our latest model is designed to excel at:
  • Handwriting: Mistral OCR accurately interprets cursive, mixed-content annotations, and handwritten text layered over printed forms.
  • Forms: Improved detection of boxes, labels, handwritten entries, and dense layouts. Works well on invoices, receipts, compliance forms, government documents, and similar material.
  • Scanned & complex documents: Significantly more robust to compression artifacts, skew, distortion, low DPI, and background noise.
  • Complex tables: Reconstructs table structures with headers, merged cells, multi-row blocks, and column hierarchies. Outputs HTML table tags with colspan/rowspan to fully preserve layout.
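When tables come back as HTML, downstream code usually has to expand `colspan`/`rowspan` into a regular grid before loading it into a spreadsheet or dataframe. The sketch below does this with Python's standard `html.parser`; it is a generic illustration, not part of the Mistral API, and it assumes a single, non-nested table. Copying a spanned cell's text into every slot it covers is one design choice among several (leaving the extra slots empty is another).

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect <td>/<th> cells per <tr>, with their colspan/rowspan values."""

    def __init__(self):
        super().__init__()
        self.rows = []     # each row: list of (text, colspan, rowspan)
        self._cell = None  # [text_parts, colspan, rowspan] of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = [[], int(a.get("colspan") or 1), int(a.get("rowspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, colspan, rowspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), colspan, rowspan))
            self._cell = None


def to_grid(html: str) -> list:
    """Expand spanned cells so every row has exactly one entry per column."""
    parser = TableCells()
    parser.feed(html)
    grid = {}  # (row, col) -> cell text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, colspan, rowspan in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    n_cols = max((c for _, c in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

For example, a header cell with `colspan="2"` next to one with `rowspan="2"` yields a 3-column grid in which both spanned labels repeat across the slots they cover.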

Advanced extraction

With our latest OCR update, table extraction output is configurable: default markdown, markdown tables, or HTML tables, allowing for advanced table extraction. In addition, headers and footers can be extracted explicitly through configurable parameters.
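A minimal sketch of how these options might be set in a request body. The field names `table_format` (assumed values `"markdown"` or `"html"`, omitted for the default output), `extract_header`, and `extract_footer`, as well as the model name `mistral-document-ai-2512`, are assumptions modeled on Mistral's public OCR API rather than taken from this page; check them against the current API reference before use.

```python
from typing import Optional

# Assumed table_format values; None keeps the default markdown output.
TABLE_FORMATS = (None, "markdown", "html")


def build_ocr_body(document_url: str,
                   table_format: Optional[str] = None,
                   extract_header: bool = False,
                   extract_footer: bool = False,
                   model: str = "mistral-document-ai-2512") -> dict:
    """Build a hypothetical OCR request body with table/header/footer options.

    All field names and the default model name are assumptions, not a
    confirmed API contract.
    """
    if table_format not in TABLE_FORMATS:
        raise ValueError(f"table_format must be one of {TABLE_FORMATS}")
    body = {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
        "extract_header": extract_header,
        "extract_footer": extract_footer,
    }
    if table_format is not None:  # leave the field out to get the default output
        body["table_format"] = table_format
    return body


# Ask for HTML tables (keeps colspan/rowspan) and an explicitly extracted header.
body = build_ocr_body("https://example.com/invoice.pdf",
                      table_format="html", extract_header=True)
```

Validating `table_format` on the client side surfaces a typo before a round trip to the service.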

Multilingual, multimodal

World-class multilingual OCR: outperforms other solutions with benchmark-leading accuracy across 25+ languages.

Fastest in category

Lightweight and blazing fast, Mistral OCR outperforms bulkier alternatives without sacrificing accuracy.

For industries needing precision, speed, and compliance in document workflows

  • Regulated sectors needing audit-ready data extraction.
  • Global enterprises processing multilingual documents in large volumes.
  • Researchers and academic institutions transforming PDFs into structured datasets.
  • Compliance-first organizations requiring secure deployment.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Intended Use

Primary Use Cases

  • Document-to-data, at scale. Convert physical documents (contracts, invoices, forms, and reports) to custom-structured digital copies in minutes.
  • Extract and analyze. Enable AI-powered insights: detect patterns, validate data, and enhance enterprise search across scanned documents.
  • Translate and localize. Quickly localize contracts, reports, and correspondence across languages, with compliance-ready accuracy.
  • Automate workflows with AI. Build end-to-end document pipelines, from OCR digitization to natural language querying, with fully automated structuring in between.
  • Monitor compliance and manage risk. Automatically audit document flows, redact sensitive data, or enforce retention policies, while keeping full traceability.

Basic OCR

Mistral Document AI comes with an improved Document OCR (Optical Character Recognition) processor, powered by our latest OCR model, mistral-ocr-2512, which enables you to extract text and structured content from PDFs and a variety of document types. Our latest model improves upon the previous OCR version (mistral-ocr-2507), with a 74% win rate over it on forms, scanned documents, complex tables, and handwriting.

Key Features

  • Extracts text content while maintaining document structure and hierarchy
  • Preserves formatting like headers, paragraphs, lists and tables
  • Returns results in markdown format for easy parsing and rendering
  • Handles complex layouts including multi-column text and mixed content
  • Processes documents at scale with high accuracy
  • Supports multiple document formats including:
    • image_url: png, jpeg/jpg, avif, tiff, gif, heic/heif, bmp, webp
    • document_url: pdf, pptx, docx, txt, epub, xml, rtf, odt, bib, fb2, ipynb, tex, opml, man
The OCR processor returns the extracted text content, image bounding boxes (bboxes), and metadata about the document structure, making it easy to work with the recognized content programmatically.
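As an illustration of working with the result programmatically, the sketch below stitches per-page markdown together and lists image bounding boxes. The response shape it assumes (a `pages` list whose entries have `index`, `markdown`, and `images`, with each image carrying an `id` and `top_left_x`/`top_left_y`/`bottom_right_x`/`bottom_right_y` coordinates) follows Mistral's public OCR API and is an assumption here; `SAMPLE_RESPONSE` is a hand-written toy, not real model output.

```python
# Hand-written toy response (not real output); its shape is an assumption.
SAMPLE_RESPONSE = {
    "pages": [
        {"index": 1, "markdown": "## Results", "images": []},
        {"index": 0,
         "markdown": "# Report\n\n![img-0.jpeg](img-0.jpeg)",
         "images": [{"id": "img-0.jpeg", "top_left_x": 10, "top_left_y": 20,
                     "bottom_right_x": 110, "bottom_right_y": 220}]},
    ]
}


def join_markdown(response: dict) -> str:
    """Concatenate page markdown in page order, separated by blank lines."""
    pages = sorted(response.get("pages", []), key=lambda p: p["index"])
    return "\n\n".join(page.get("markdown", "") for page in pages)


def image_boxes(response: dict) -> list:
    """Return (page_index, image_id, (x0, y0, x1, y1)) for every extracted image."""
    boxes = []
    for page in response.get("pages", []):
        for img in page.get("images", []):
            coords = (img["top_left_x"], img["top_left_y"],
                      img["bottom_right_x"], img["bottom_right_y"])
            boxes.append((page["index"], img["id"], coords))
    return boxes


print(join_markdown(SAMPLE_RESPONSE))
print(image_boxes(SAMPLE_RESPONSE))
```

Sorting by `index` rather than trusting list order keeps the joined markdown in reading order even if pages arrive out of sequence.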
See the Mistral AI OCR documentation for further usage details.
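The two input types listed above suggest a simple client-side dispatch: image files go in an `image_url` chunk and the other supported formats in a `document_url` chunk. The sketch below assumes, following Mistral's public OCR API, that local files can be sent inline as base64 `data:` URLs and that `include_image_base64` requests embedded image data; the model name `mistral-document-ai-2512` is a placeholder. `post_json` is a generic standard-library helper: the endpoint URL and authentication header depend on your Foundry deployment and are left to the caller.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path

# Extensions taken from the supported-format lists above.
IMAGE_EXTS = {"png", "jpeg", "jpg", "avif", "tiff", "gif", "heic", "heif", "bmp", "webp"}
DOCUMENT_EXTS = {"pdf", "pptx", "docx", "txt", "epub", "xml", "rtf", "odt",
                 "bib", "fb2", "ipynb", "tex", "opml", "man"}


def document_chunk(filename: str, data: bytes) -> dict:
    """Wrap raw file bytes as an image_url or document_url chunk (data URL)."""
    ext = Path(filename).suffix.lower().lstrip(".")
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    url = f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"
    if ext in IMAGE_EXTS:
        return {"type": "image_url", "image_url": url}
    if ext in DOCUMENT_EXTS:
        return {"type": "document_url", "document_url": url}
    raise ValueError(f"unsupported file type: .{ext}")


def ocr_request(filename: str, data: bytes,
                model: str = "mistral-document-ai-2512") -> dict:
    """Assemble a request body; field and model names are assumptions."""
    return {"model": model,
            "document": document_chunk(filename, data),
            "include_image_base64": True}


def post_json(url: str, body: dict, headers: dict, timeout: float = 120.0) -> dict:
    """POST a JSON body and decode the JSON reply (standard library only)."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers}, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like `post_json(endpoint, ocr_request("scan.pdf", Path("scan.pdf").read_bytes()), auth_headers)`, where `endpoint` and `auth_headers` come from your own deployment.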

Annotations

In addition to the basic OCR functionality, the Mistral Document AI API adds annotations, which let you extract information in a structured JSON format that you provide. It offers two types of annotations:
  • bbox_annotation: annotates each bounding box (bbox) the OCR model extracts, such as charts and figures, according to your requirements and the bbox/image annotation format you provide. For instance, you can ask the model to describe or caption each figure.
  • document_annotation: returns the annotation of the entire document based on the provided document annotation format.
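A hedged sketch of an annotation request. It assumes, following Mistral's public Document AI API, that both annotation formats are passed as JSON Schema wrapped in a `{"type": "json_schema", ...}` envelope via `bbox_annotation_format` and `document_annotation_format`, and that the document-level annotation comes back as a JSON string in a `document_annotation` field. The schemas (a caption per figure, a few invoice fields per document) and the model name are hypothetical examples.

```python
import json

# Hypothetical schemas: one caption per extracted figure, a few invoice fields per document.
FIGURE_SCHEMA = {
    "type": "object",
    "properties": {"caption": {"type": "string"}},
    "required": ["caption"],
}
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total"],
}


def json_schema_format(name: str, schema: dict) -> dict:
    """Wrap a JSON Schema in the assumed json_schema envelope."""
    return {"type": "json_schema",
            "json_schema": {"name": name, "schema": schema, "strict": True}}


def annotation_request(document_url: str,
                       model: str = "mistral-document-ai-2512") -> dict:
    """Request both bbox and document annotations (field names assumed)."""
    return {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
        "bbox_annotation_format": json_schema_format("figure", FIGURE_SCHEMA),
        "document_annotation_format": json_schema_format("invoice", INVOICE_SCHEMA),
    }


def parse_document_annotation(response: dict) -> dict:
    """Decode the document-level annotation, assumed to arrive as a JSON string."""
    return json.loads(response["document_annotation"])
```

Keeping the schemas as plain dictionaries makes them easy to version alongside the rest of a pipeline; a validation library could generate them instead.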

Key Features

  • Labeling and annotating data
  • Extraction and structuring of specific information from documents into a predefined JSON format
  • Automation of data extraction to reduce manual entry and errors
  • Efficient handling of large document volumes for enterprise-level applications
See the Mistral AI Annotations documentation for further usage details.

Limitations & Known Issues

  • Mistral Document AI on Foundry can process documents of up to 30 MB and 30 pages.
  • Document annotations are limited to 8 pages.
  • While pure OCR runs quickly and efficiently, annotation can occasionally be slower and may time out. An optimization is expected within a few weeks.
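Because requests beyond these limits fail and annotation calls may time out, a client can check documents up front and retry slow calls. This is a sketch under stated assumptions: it reads "30 MB" conservatively as 30,000,000 bytes, expects the caller to supply the page count, and retries only on Python's built-in `TimeoutError`, so adapt the exception type to whatever your HTTP client raises.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

MAX_BYTES = 30_000_000     # "30 MB", read conservatively as decimal megabytes
MAX_PAGES = 30             # OCR page limit on Foundry
MAX_ANNOTATION_PAGES = 8   # document annotation page limit


def check_limits(size_bytes: int, page_count: int,
                 document_annotation: bool = False) -> List[str]:
    """Return human-readable limit violations (empty list if the document fits)."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"{page_count} pages; OCR limit is {MAX_PAGES}")
    if document_annotation and page_count > MAX_ANNOTATION_PAGES:
        problems.append(f"{page_count} pages; document annotation limit "
                        f"is {MAX_ANNOTATION_PAGES}")
    return problems


def with_retries(call: Callable[[], T], attempts: int = 3,
                 base_delay: float = 2.0) -> T:
    """Call `call`, retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Checking limits locally avoids spending a request (and its latency) on a document the service would reject anyway.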

Supported languages:

de, fr, es, nl, it, pt, hu, pl, cs, da, ro, no, sv, id, th, vi, tl, ar, he, hi, bn, gu, kn, ta, te, ml, en, ru, uk, ko, ja, zh, tr, hy, ka

Preview Terms

This Direct from Azure model is in preview and is subject to the Supplemental Terms of Use for Microsoft Azure Previews.

Quality and performance evaluations

Top-tier benchmarks

To raise the bar, we introduced more challenging internal benchmarks based on real business use cases from customers. We then evaluated several models across the domains below, comparing their outputs to ground truth with a fuzzy-match accuracy metric:

| Model | Forms | Handwritten | Invoices | Complex Tables | Historical Scanned |
|---|---|---|---|---|---|
| DeepSeek OCR | 82.6 | 57.2 | 70.5 | 84.4 | 81.1 |
| Google Document AI | 79.6 | 73.9 | 72.4 | 75.9 | 87.1 |
| Azure OCR | 86.2 | 78.2 | 80.2 | 85.9 | 83.7 |
| AWS Textract | 84.5 | 72.4 | 78.4 | 84.8 | 81.0 |
| Mistral Document AI | 95.9 | 88.9 | 91.8 | 96.6 | 96.7 |
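This page does not define its fuzzy-match metric, so the following is only an illustrative stand-in for running this kind of evaluation on your own documents: it scores each OCR output against its ground truth with Python's `difflib.SequenceMatcher` similarity ratio (scaled to 0-100 after collapsing whitespace) and averages over a set of documents. It is not Mistral's scoring code and will not reproduce the figures above.

```python
from difflib import SequenceMatcher
from statistics import mean


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout-only differences are not penalized."""
    return " ".join(text.split())


def fuzzy_accuracy(prediction: str, truth: str) -> float:
    """Similarity of prediction to ground truth on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, normalize(prediction), normalize(truth)).ratio()


def corpus_accuracy(pairs) -> float:
    """Mean per-document fuzzy accuracy over (prediction, truth) pairs."""
    return mean(fuzzy_accuracy(pred, truth) for pred, truth in pairs)
```

Averaging per document weights every document equally; concatenating all text first would instead weight long documents more heavily.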
Benchmarks per language
| Language Group | DeepSeek OCR | Azure OCR | Google Doc AI | AWS Textract | Mistral Document AI |
|---|---|---|---|---|---|
| Chinese | 90.5 | 87.0 | 83.3 | N/A | 97.1 |
| East-Asian | 90.7 | 89.9 | 80.7 | N/A | 97.6 |
| Eastern Europe | 90.3 | 94.5 | 91.4 | 88.9 | 98.6 |
| English | 94.6 | 93.5 | 91.6 | 93.9 | 98.6 |
| Western Europe | 94.6 | 94.3 | 92.2 | 94.3 | 98.8 |

Model Specifications

Context Length: 128000
License: Custom
Last Updated: February 2026
Input Type: PDF, Image
Output Type: Text
Provider: Mistral AI
Languages: 35 languages