Mistral Document AI (25.05)
Version: 1
Mistral AI
Last updated: August 2025
Document conversion to markdown with interleaved images and text
Vision
Low latency
Mistral Document AI offers enterprise-level document processing, combining cutting-edge OCR technology with advanced structured data extraction. Experience faster processing speeds, unparalleled accuracy, and cost-effective solutions, all scalable to meet your needs. Unlock the full potential of your documents with our multilingual support, annotations and adaptable workflows for many document types, enabling you to extract, comprehend, and analyze information with ease.

Why Mistral for Document AI?

Enterprise OCR with superior document accuracy

Digitize text from images or PDF files. Extract and understand complex text, handwriting, tables, and images from any document, with 99%+ accuracy across global languages.

State of the Art Document AI

Go beyond raw text extraction. Our AI interprets tables, forms, invoices, and complex layouts with unprecedented accuracy and cognition.

Advanced extraction

Extract to structured JSON with customizable schema, parse forms, classify documents, and process images (text, charts, signatures). Convert charts to tables, extract fine print from figures, or define custom image types.

Multilingual, multimodal

World-class multilingual OCR: outperforms other solutions with 99%+ accuracy across 25+ languages.

Fastest in category

Lightweight and blazing fast, Mistral OCR outperforms bulkier alternatives without sacrificing accuracy.

For industries needing precision, speed, and compliance in document workflows

  • Regulated sectors needing audit-ready data extraction.
  • Global enterprises processing multilingual documents in large volumes.
  • Researchers and academic institutions transforming PDFs into structured datasets.
  • Compliance-first organizations requiring secure deployment.

Content Safety

Content safety is applied to annotations only; it is not enforced for OCR outputs.

Intended Use

Primary Use Cases

  • Document-to-data, at scale. Convert physical documents (contracts, invoices, forms, and reports) to custom-structured digital copies in minutes.
  • Extract and analyze. Enable AI-powered insights: detect patterns, validate data, and enhance enterprise search out of scanned documents.
  • Translate and localize. Quickly localize contracts, reports, and correspondence across languages, with compliance-ready accuracy.
  • Automate workflows with AI. Build end-to-end document pipelines — from OCR digitization to natural language querying, with fully automated structuring in-between.
  • Monitor compliance and manage risk. Automatically audit document flows, redact sensitive data, or enforce retention policies, while keeping full traceability.

Basic OCR

Mistral Document AI comes with a Document OCR (Optical Character Recognition) processor, powered by our latest OCR model mistral-ocr-2505, which enables you to extract text and structured content from PDF documents.

Key Features

  • Extracts text content while maintaining document structure and hierarchy
  • Preserves formatting like headers, paragraphs, lists and tables
  • Returns results in markdown format for easy parsing and rendering
  • Handles complex layouts including multi-column text and mixed content
  • Processes documents at scale with high accuracy
  • Supports multiple document formats including:
    • image_url: png, jpeg/jpg, avif and more...
    • document_url: pdf, pptx, docx and more...

The OCR processor returns the extracted text content, image bounding boxes (bboxes), and metadata about the document structure, making it easy to work with the recognized content programmatically.
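
As a rough illustration of how the two input kinds listed above might be submitted, the sketch below builds a JSON request body with the Python standard library only. The field names (`model`, `document`, `type`, `include_image_base64`) and the response shape (`pages`, each carrying a `markdown` string) are assumptions modeled on Mistral's public OCR API. The helper names and the extension list are hypothetical, and the endpoint and authentication for your deployment are deliberately left out.

```python
import json

OCR_MODEL = "mistral-ocr-2505"  # model name taken from the text above

# Hypothetical routing rule: these extensions are sent as image_url inputs,
# everything else (pdf, pptx, docx, ...) as document_url inputs.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".avif")


def build_ocr_request(url: str, include_images: bool = False) -> dict:
    """Build an OCR request body (assumed shape) for a document or image URL."""
    path = url.split("?", 1)[0].lower()  # ignore any query string
    kind = "image_url" if path.endswith(IMAGE_EXTENSIONS) else "document_url"
    return {
        "model": OCR_MODEL,
        "document": {"type": kind, kind: url},
        "include_image_base64": include_images,
    }


def join_markdown(response: dict) -> str:
    """Concatenate per-page markdown from a response dict (assumed shape)."""
    return "\n\n".join(page["markdown"] for page in response.get("pages", []))


if __name__ == "__main__":
    body = build_ocr_request("https://example.com/report.pdf")
    print(json.dumps(body, indent=2))
```

Keeping request construction separate from the HTTP call makes the payload easy to inspect and unit-test before anything is sent.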

Annotations

In addition to basic OCR, the Mistral Document AI API provides annotations, which extract information into a structured JSON format that you define. Two types of annotation are available:
  • bbox_annotation: annotates each bounding box extracted by the OCR model (charts, figures, etc.) according to the bbox/image annotation format you provide. For instance, you can ask for a description or caption of each figure.
  • document_annotation: annotates the entire document according to the document annotation format you provide.
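
To make the two annotation types more concrete, here is a minimal sketch of what the user-provided formats could look like when expressed as JSON schemas. The wrapper shape (`type: "json_schema"`), the request field names `bbox_annotation_format` and `document_annotation_format`, and both example schemas (a figure caption and an invoice summary) are assumptions for illustration, not a confirmed contract of this deployment.

```python
import json


def json_schema_format(name: str, properties: dict) -> dict:
    """Wrap a flat set of properties as a JSON-schema format (assumed shape)."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


# Hypothetical schema applied to each extracted figure or chart (bbox_annotation).
bbox_format = json_schema_format("figure", {
    "image_type": {"type": "string", "description": "e.g. chart, photo, signature"},
    "caption": {"type": "string", "description": "One-sentence description"},
})

# Hypothetical schema applied to the whole document (document_annotation).
document_format = json_schema_format("invoice", {
    "vendor": {"type": "string"},
    "total": {"type": "number"},
    "currency": {"type": "string"},
})

# Assumed request field names; these would sit alongside the OCR request body.
annotation_fields = {
    "bbox_annotation_format": bbox_format,
    "document_annotation_format": document_format,
}

if __name__ == "__main__":
    print(json.dumps(annotation_fields, indent=2))
```

Marking every property as required and disallowing extra properties keeps the returned JSON predictable for downstream parsing.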

Key Features

  • Labeling and annotating data
  • Extraction and structuring of specific information from documents into a predefined JSON format
  • Automation of data extraction to reduce manual entry and errors
  • Efficient handling of large document volumes for enterprise-level applications

Limitations & Known Issues

  • Mistral Document AI on Foundry can process documents of up to 30 MB and 30 pages.
  • Document annotations are limited to 8 pages.
  • Plain OCR is fast and efficient, but annotation can occasionally be slower and may result in timeouts.
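
The limits above lend themselves to a pre-flight check before any upload. The sketch below is one possible way to do that in Python: the constants come from the list above, while the function names, the reading of "30 MB" as 30,000,000 bytes, and the idea of grouping pages into batches of 8 for document annotation are assumptions. Whether a given deployment accepts a page selection per request is not stated here.

```python
MAX_BYTES = 30 * 1000 * 1000  # "30 MB" read as decimal megabytes (assumption)
MAX_PAGES = 30
MAX_ANNOTATION_PAGES = 8


def check_limits(size_bytes: int, page_count: int,
                 annotate_document: bool = False) -> list:
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"document has {page_count} pages, limit is {MAX_PAGES}")
    if annotate_document and page_count > MAX_ANNOTATION_PAGES:
        problems.append(
            f"document annotation covers at most {MAX_ANNOTATION_PAGES} pages, "
            f"got {page_count}"
        )
    return problems


def page_batches(page_count: int,
                 batch_size: int = MAX_ANNOTATION_PAGES) -> list:
    """Split 0-based page indices into consecutive batches of at most batch_size."""
    return [
        list(range(start, min(start + batch_size, page_count)))
        for start in range(0, page_count, batch_size)
    ]


if __name__ == "__main__":
    print(check_limits(size_bytes=12_000_000, page_count=20, annotate_document=True))
    print(page_batches(20))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit at once.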

Top-tier benchmarks

Mistral Document AI has consistently outperformed other leading OCR models in rigorous benchmark tests. Its superior accuracy across multiple aspects of document analysis is illustrated below. We can extract both images and text embedded in documents, unlike the other models. To ensure fairness in our comparison, we evaluated using a test set that is text-only, containing a variety of sources including papers and PDFs sourced from the web:

Model                | Overall | Math  | Multilingual | Scanned | Tables
Azure OCR            | 89.52   | 85.72 | 87.52        | 94.65   | 89.52
Mistral Document AI  | 94.89   | 94.29 | 89.55        | 98.96   | 96.12
GPT-4o-2024-11-20    | 89.77   | 87.55 | 86.00        | 94.58   | 91.70
Gemini-1.5-Flash-002 | 90.23   | 89.11 | 86.76        | 94.87   | 90.48
Gemini-1.5-Pro-002   | 89.92   | 88.48 | 86.33        | 96.15   | 89.71
Gemini-2.0-Flash-001 | 88.69   | 84.18 | 85.80        | 95.11   | 91.46
Google Document AI   | 83.42   | 80.29 | 86.42        | 92.77   | 78.16

Natively multilingual

Since Mistral’s founding, we have aspired to serve the world with our models, and consequently strived for multilingual capabilities across our offerings. Mistral Document AI takes this to a new level, being able to parse, understand, and transcribe thousands of scripts, fonts, and languages across all continents. This versatility is crucial for both global organizations that handle documents from diverse linguistic backgrounds, as well as hyperlocal businesses serving niche markets.

Model                | Fuzzy Match in Generation
Azure OCR            | 97.31
Mistral Document AI  | 99.02
Google-Document-AI   | 95.88
Gemini-2.0-Flash-001 | 96.53

Benchmarks per language

Language | Azure OCR | Google Doc AI | Gemini-2.0-Flash-001 | Mistral Document AI
ru       | 97.35     | 95.56         | 96.58                | 99.09
fr       | 97.50     | 96.36         | 97.06                | 99.20
hi       | 96.45     | 95.65         | 94.99                | 97.55
zh       | 91.40     | 90.89         | 91.85                | 97.11
pt       | 97.96     | 96.24         | 97.25                | 99.42
de       | 98.39     | 97.09         | 97.19                | 99.51
es       | 98.54     | 97.52         | 97.75                | 99.54
tr       | 95.91     | 93.85         | 94.66                | 97.00
uk       | 97.81     | 96.24         | 96.70                | 99.29
it       | 98.31     | 97.69         | 97.68                | 99.42
ro       | 96.45     | 95.14         | 95.88                | 98.79
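
For readers who want to compare the per-language figures programmatically, the short sketch below copies the scores from the table above and computes, for each language, how far Mistral Document AI sits ahead of the best of the other three systems. The variable and function names are illustrative only; no data beyond the table is used.

```python
# Scores copied from the per-language table, in column order:
# (Azure OCR, Google Doc AI, Gemini-2.0-Flash-001, Mistral Document AI)
SCORES = {
    "ru": (97.35, 95.56, 96.58, 99.09),
    "fr": (97.50, 96.36, 97.06, 99.20),
    "hi": (96.45, 95.65, 94.99, 97.55),
    "zh": (91.40, 90.89, 91.85, 97.11),
    "pt": (97.96, 96.24, 97.25, 99.42),
    "de": (98.39, 97.09, 97.19, 99.51),
    "es": (98.54, 97.52, 97.75, 99.54),
    "tr": (95.91, 93.85, 94.66, 97.00),
    "uk": (97.81, 96.24, 96.70, 99.29),
    "it": (98.31, 97.69, 97.68, 99.42),
    "ro": (96.45, 95.14, 95.88, 98.79),
}


def mistral_margin(row: tuple) -> float:
    """Mistral's score minus the best score among the other systems."""
    *others, mistral = row
    return round(mistral - max(others), 2)


margins = {lang: mistral_margin(row) for lang, row in SCORES.items()}

if __name__ == "__main__":
    # Print languages from the largest margin to the smallest.
    for lang, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{lang}: +{margin:.2f}")
```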

Model Specifications

Context Length: 128000
License: Custom
Last Updated: August 2025
Input Type: PDF, Image
Output Type: Text
Publisher: Mistral AI
Languages: 27 Languages