mistral-ocr-4-0

Document conversion to markdown with interleaved images and text

Mistral AI

Direct from Azure

Version: 1

Why Mistral for Document AI?

Mistral Document AI comes with an improved Document OCR (Optical Character Recognition) processor, powered by our latest OCR model, mistral-ocr-4-0, which enables you to extract text and structured content from PDFs and a variety of document types.

Mistral Document AI offers enterprise-level document processing, combining cutting-edge OCR technology with advanced structured data extraction. Experience faster processing speeds, unparalleled accuracy, and cost-effective solutions, all scalable to meet your needs. Unlock the full potential of your documents with our multilingual support, annotations and adaptable workflows for many document types, enabling you to extract, comprehend, and analyze information with ease.

Enterprise OCR with superior document accuracy

Digitize text from images, PDFs, and a variety of document formats. Extract and understand complex text, handwriting, tables, forms, and images from any document, with benchmark-leading accuracy across global languages.

SOTA doc AI

Our latest model is designed to excel at:

Handwriting: Mistral OCR accurately interprets cursive, mixed-content annotations, and handwritten text layered over printed forms.
Forms: Improved detection of boxes, labels, handwritten entries, and dense layouts. Works well on invoices, receipts, compliance forms, government documents, and similar paperwork.
Scanned & complex documents: Significantly more robust to compression artifacts, skew, distortion, low DPI, and background noise.
Complex tables: Reconstructs table structures with headers, merged cells, multi-row blocks, and column hierarchies. Outputs HTML table tags with colspan/rowspan to fully preserve layout.
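As an illustration of why colspan/rowspan output matters downstream, the sketch below expands an HTML table into a rectangular grid using only the Python standard library. The sample markup, the class name TableGridParser, and the function html_table_to_grid are hypothetical and not part of any Mistral SDK. It assumes a single, non-nested table per HTML fragment, and the exact markup the model emits may differ.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collects <td>/<th> cells as (text, rowspan, colspan) tuples, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell tuples per <tr>
        self._cell = None   # [text_parts, rowspan, colspan] while a cell is open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:           # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def html_table_to_grid(html):
    """Expand merged cells so every (row, column) position holds a value."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> cell text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:       # skip slots filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical sample markup with one merged header cell and one rowspan.
SAMPLE = (
    '<table>'
    '<tr><th colspan="2">Name</th><th rowspan="2">Total</th></tr>'
    '<tr><td>First</td><td>Last</td></tr>'
    '<tr><td>Ada</td><td>Lovelace</td><td>3</td></tr>'
    '</table>'
)
grid = html_table_to_grid(SAMPLE)
```

Repeating the merged value in every covered position is one design choice among several; a pipeline that needs to distinguish the origin cell from its copies could store a marker instead.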

Advanced extraction

OCR4 introduces paragraph-level bounding boxes and detailed block classification (e.g., title, header, footer, code, table, equation, paragraph, list, signature, image, caption, references), moving beyond raw text extraction to enable true structured document understanding. This addresses the most consistent customer request and unlocks advanced use cases like source-grounded citations and human verification pipelines.
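To make the source-grounded citation use case concrete, here is a minimal sketch that maps a quoted phrase back to the page and bounding box of the block that contains it. The Block record, its field names, the coordinate convention, and the cite helper are all assumptions made for illustration; the real response schema is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical shape of one classified block (not the actual API schema)."""
    page: int
    kind: str      # e.g. "title", "paragraph", "table", "signature"
    bbox: tuple    # assumed (x0, y0, x1, y1) in page coordinates
    text: str


def _normalize(s):
    # Collapse whitespace and lowercase so line breaks do not defeat matching.
    return " ".join(s.split()).lower()


def cite(blocks, quote, kinds=("paragraph", "list", "table", "caption")):
    """Return page and bounding box for every body block containing the quote."""
    needle = _normalize(quote)
    if not needle:
        return []
    hits = []
    for block in blocks:
        if block.kind not in kinds:
            continue  # ignore headers, footers, and other non-body blocks
        if needle in _normalize(block.text):
            hits.append({"page": block.page, "kind": block.kind, "bbox": block.bbox})
    return hits


# Hypothetical blocks standing in for a parsed two-block page.
blocks = [
    Block(page=1, kind="header", bbox=(40, 20, 560, 40), text="Invoice 2024 Net 30 days"),
    Block(page=1, kind="paragraph", bbox=(40, 120, 560, 180),
          text="Payment terms:\nnet 30 days from the invoice date."),
]
citations = cite(blocks, "Net 30 days")
```

A reviewer interface could then draw each returned bbox over the page image, so a human verifier sees exactly where an extracted value came from.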

With our latest OCR updates, table extraction formatting is configurable between default markdown, markdown tables, and HTML tables, allowing for advanced table extraction support. In addition, explicit header and footer extraction is available via configurable parameters.

Multilingual, multimodal

Supports 170 languages across 10 language groups, enabling global enterprise deployment and addressing diverse linguistic needs with confidence in high-quality outputs.

Fastest in category

Lightweight and blazing fast, Mistral OCR outperforms bulkier alternatives without sacrificing accuracy. Its small model size makes it deployable on a single container, optimizing for both cost-efficiency and scalability in enterprise environments.

For industries needing precision, speed, and compliance in document workflows

Regulated sectors needing audit-ready data extraction.
Global enterprises processing multilingual documents in large volumes.
Researchers and academic institutions transforming PDFs into structured datasets.
Compliance-first organizations requiring secure deployment.

Targeted Improvements

Paragraph-Level Bounding Boxes: Addresses the #1 customer request by enabling precise text localization, allowing customers to highlight text and create downstream data pipelines with human verifiers. This is table stakes for competing in the enterprise OCR market.
Block Classification: Classifies each document block (e.g., title, header, footer, table, signature), empowering advanced use cases like form extraction, redactions, and structured data pipelines. Enables customers to evaluate specific elements (e.g., signatures) as true/false.
Formatting Fixes: Resolves previous limitations such as nested list support, ensuring accurate and reliable document parsing for complex layouts.
Enhanced Quality and Reliability: Builds on Q1 improvements (confidence scores, image bounding boxes) to deliver a more robust and accurate OCR solution, with a focus on enterprise readiness.
Future-Proof API Design: Redesigns the API response shape to support streaming and reduced time-to-first-token, unlocking unique offerings in the OCR space and enabling agents to begin citing, extracting, or acting before the full document finishes parsing.
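The true/false evaluation of specific elements and the idea of acting before a document finishes parsing can be sketched together. The example below simulates a stream of classified blocks with a plain Python generator and stops reading as soon as a signature block appears. The block dictionaries, the "kind" key, and the functions simulated_block_stream and has_signature are hypothetical stand-ins; the actual streaming interface is not specified here.

```python
def simulated_block_stream(blocks, consumed):
    """Yield blocks one at a time, recording how many were actually read.

    Stands in for a streaming response; `consumed` exists only so the
    example can show that reading stops early.
    """
    for block in blocks:
        consumed.append(block)
        yield block


def first_block_of_kind(stream, kind):
    """Return the first block of the given kind, or None if the stream ends."""
    for block in stream:
        if block["kind"] == kind:
            return block  # stop iterating; the rest of the stream is never pulled
    return None


def has_signature(stream):
    """True/false check: does the document contain a signature block?"""
    return first_block_of_kind(stream, "signature") is not None


# Hypothetical document: the signature arrives before the final footer block.
document = [
    {"kind": "title", "text": "Service Agreement"},
    {"kind": "paragraph", "text": "The parties agree to the terms below."},
    {"kind": "signature", "text": ""},
    {"kind": "footer", "text": "Page 1 of 1"},
]
consumed = []
signed = has_signature(simulated_block_stream(document, consumed))
```

Because the generator is lazy, the footer block is never read once the signature is found, which mirrors how an agent could act on partial results from a streamed response.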

Quick facts

Model provider: Mistral AI

Type: Image to text

Lifecycle: Generally available (GA)

Input type: pdf, image

Output type: text

Context window: 128k

Token limits: 4096 output
