mistral-document-ai-2512
Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
- Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
- Streamlined operations: Benefit from unified billing, governance, and seamless PTU (provisioned throughput unit) portability across models hosted on Azure, all part of Microsoft Foundry.
- Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
- Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.
About this model
Mistral Document AI comes with an improved Document OCR (Optical Character Recognition) processor, powered by our latest OCR model, mistral-ocr-2512, which enables you to extract text and structured content from PDFs and a variety of document types.
Mistral Document AI offers enterprise-level document processing, combining cutting-edge OCR technology with advanced structured data extraction. Experience faster processing speeds, high accuracy, and cost-effective solutions, all scalable to meet your needs. Unlock the full potential of your documents with multilingual support, annotations, and adaptable workflows for many document types, so you can extract, comprehend, and analyze information with ease.
Enterprise OCR with superior document accuracy
Digitize text from images, PDFs, and a variety of document formats. Extract and understand complex text, handwriting, tables, forms, and images from any document, with benchmark-leading accuracy across global languages.
State-of-the-art document AI
Our latest model is designed to excel at:
- Handwriting: Mistral OCR accurately interprets cursive, mixed-content annotations, and handwritten text layered over printed forms.
- Forms: Improved detection of boxes, labels, handwritten entries, and dense layouts. Works well on invoices, receipts, compliance forms, government documents, and similar paperwork.
- Scanned & complex documents: Significantly more robust to compression artifacts, skew, distortion, low DPI, and background noise.
- Complex tables: Reconstructs table structures with headers, merged cells, multi-row blocks, and column hierarchies. Outputs HTML tables with colspan/rowspan attributes to preserve the full layout.
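Because complex tables come back as HTML with colspan/rowspan attributes, downstream code usually has to expand them into a rectangular grid before loading them into a dataframe or database. The following is a minimal sketch using only Python's standard html.parser, assuming the output is a single well-formed table; TableGridParser and html_table_to_grid are hypothetical helper names, not part of any Mistral or Azure SDK.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collects table cells as rows of (text, rowspan, colspan) tuples.

    Assumes a single, well-formed <table> (hypothetical helper, not an SDK class).
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # one list per <tr>
        self._cell = None  # [text_parts, rowspan, colspan] for the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            # Missing or empty span attributes default to 1.
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell[0]).strip()
            self.rows[-1].append((text, self._cell[1], self._cell[2]))
            self._cell = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell[0].append(data)


def html_table_to_grid(html):
    """Expand rowspan/colspan so every grid position holds its cell's text."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, col) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    n_cols = max((c for _, c in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

For a header cell "Item" spanning two rows next to a "Q1" header spanning two columns, the function repeats "Item" in both rows and "Q1" in both columns, which keeps every data row aligned with its headers. Repeating merged text (rather than leaving blanks) is a design choice; swap in empty strings if your downstream tooling prefers that.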
Advanced extraction
With our latest OCR update, the table extraction format is configurable: default Markdown, Markdown tables, or HTML tables, enabling advanced table extraction. Header and footer extraction can also be enabled explicitly through configurable parameters.
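A minimal sketch of how these options might be set on a request body, in Python. The field names table_format, extract_header, and extract_footer, the mapping of the three formats to request values, and the document/document_url layout (modeled on Mistral's public OCR request shape) are all assumptions; check your Azure deployment's API reference before relying on them, since the Azure endpoint may, for example, expect the document as a base64 data URL. build_ocr_request is a hypothetical helper.

```python
import json

# ASSUMPTION: hypothetical mapping of the three table formats described above
# to request values; None means "omit the field and use the service default".
TABLE_FORMATS = {"default": None, "markdown": "markdown", "html": "html"}


def build_ocr_request(document_url, table_format="default",
                      extract_header=False, extract_footer=False):
    """Build a JSON-serializable OCR request body (field names are assumptions)."""
    if table_format not in TABLE_FORMATS:
        raise ValueError(f"table_format must be one of {sorted(TABLE_FORMATS)}")
    body = {
        "model": "mistral-document-ai-2512",
        "document": {"type": "document_url", "document_url": document_url},
        "extract_header": extract_header,  # ASSUMPTION: flag name
        "extract_footer": extract_footer,  # ASSUMPTION: flag name
    }
    if TABLE_FORMATS[table_format] is not None:
        body["table_format"] = TABLE_FORMATS[table_format]  # ASSUMPTION: field name
    return body


if __name__ == "__main__":
    print(json.dumps(build_ocr_request("https://example.com/invoice.pdf",
                                       table_format="html",
                                       extract_header=True), indent=2))
```

Sending the body (endpoint URL, authentication header) is deliberately left out, because those details depend on your Microsoft Foundry deployment.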
Multilingual, multimodal
World-class multilingual OCR: outperforms other solutions with benchmark-leading accuracy across 25+ languages.
Fastest in category
Lightweight and blazing fast, Mistral OCR outperforms bulkier alternatives without sacrificing accuracy.
For industries that need precision, speed, and compliance in document workflows
- Regulated sectors needing audit-ready data extraction.
- Global enterprises processing multilingual documents in large volumes.
- Researchers and academic institutions transforming PDFs into structured datasets.
- Compliance-first organizations requiring secure deployment.