Mistral Small 3.1
Version: 1
Mistral Small 3.1 (25.03) is the enhanced version of Mistral Small 3 (25.01), featuring multimodal capabilities and an extended context length of up to 128k tokens. It can now process and understand visual inputs as well as long documents, further expanding its range of applications. Like its predecessor, Mistral Small 3.1 (25.03) is a versatile model designed for tasks such as programming, mathematical reasoning, document understanding, and dialogue. Mistral Small 3.1 (25.03) was designed with low-latency applications in mind and delivers best-in-class efficiency compared to models of the same quality.
Mistral Small 3.1 (25.03) has undergone a full post-training process to align the model with human preferences and needs, so it is suitable out-of-the-box for applications that require chat or precise instruction following.
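As a quick illustration of that out-of-the-box chat behavior, the sketch below sends a simple instruction through Mistral's Python client. This is a minimal sketch, not an official recipe: the `mistralai` package, the `MISTRAL_API_KEY` environment variable, and the `mistral-small-latest` model ID are assumptions that may differ in your deployment.

```python
import os

from mistralai import Mistral  # assumed: the official `mistralai` SDK (v1+)

# Assumed setup: an API key in the environment and a model ID that routes
# to Mistral Small 3.1 (25.03); adjust both for your deployment.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-small-latest",  # assumed alias for Small 3.1 (25.03)
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the key idea of binary search in two sentences."},
    ],
)

print(response.choices[0].message.content)
```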
Intended Use
Primary Use Cases
Mistral Small 3.1 (25.03) is a strong, versatile model for tasks such as the following (see the usage sketch after this list):
- Programming
- Math reasoning
- Dialogue
- Long document understanding
- Visual understanding
- Summarization
- Low-latency applications
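To make the visual-understanding use case concrete, here is a hedged sketch of a multimodal request. The message content mixes a text part with an `image_url` part, following the pattern Mistral's API documentation uses for vision models; the image URL is a placeholder and the model ID is assumed.

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Mixed text-and-image content; the URL is a placeholder, and a base64
# data URI can be used in its place.
response = client.chat.complete(
    model="mistral-small-latest",  # assumed alias for Small 3.1 (25.03)
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the chart in this image."},
                {"type": "image_url", "image_url": "https://example.com/chart.png"},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```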
Vision Evals
With our improved training methodologies, we observe strong vision capabilities in the model. Not only does Mistral Small 3.1 (25.03) outperform lightweight models like GPT4o mini, it also rivals larger models like Qwen2-VL 72B on visual knowledge and reasoning tasks. Moreover, with our innovations of the past few months, Mistral Small 3.1 (25.03) performs on par with Pixtral Large, which we released last year, across the board.
NOTE: For competitors, we ran all evals with our own stack, as they did not report these results.
| Model | MMMU | MMMU Pro | MathVista | ChartQA | DocVQA | AI2D |
|---|---|---|---|---|---|---|
| Mistral Small 3.1 (25.03) Instruct | 64.00 | 49.25 | 68.91 | 86.24 | 94.08 | 93.72 |
| GPT4o mini | 60.00 | 37.60 | 52.50 | - | - | - |
| Qwen2-VL 7B | 54.10 | 30.50 | 58.20 | 83.00 | 94.50 | 83.00 |
| Qwen2.5-VL 7B | 58.60 | 38.30 | 68.20 | 87.30 | 95.70 | 83.90 |
| Qwen2-VL 72B | 64.50 | 46.20 | 70.50 | 88.30 | 96.50 | 88.10 |
| Qwen2.5-VL 72B | 70.20 | 51.10 | 74.80 | 89.50 | 96.40 | 88.70 |
| Claude 3.5 Haiku | 60.50 | - | 61.60 | 87.20 | 90.00 | 92.10 |
| Gemini 2.0 Flash-Lite | 68.00 | - | - | - | - | - |
| Pixtral Large | 64.00 | - | 69.40 | 88.10 | 93.30 | 93.80 |
Text Pretrain Evals
| Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | GPQA Main (5-shot CoT) | TriviaQA (5-shot) |
|---|---|---|---|---|
| Mistral Small 3.1 (25.03) Base | 81.01% | 56.03% | 37.50% | 80.50% |
| Mistral Small 3 Base | 80.73% | 54.37% | 34.37% | 80.32% |
| Gemma 2 27B | 75.20% | - | - | 83.70% |
| Qwen 2.5 32B | 83.30% | 55.10% | 48.00% | - |
| Llama 3.1 70B | 79.30% | 53.80% | - | - |
Text Instruct Evals
In addition to its strong multimodal capabilities, Mistral Small 3.1 (25.03) retains the robust text performance of Mistral Small 3. It excels at knowledge benchmarks (MMLU, MMLU-Pro), graduate-level question answering (GPQA), reading comprehension (TriviaQA), and math and coding tasks (MATH, HumanEval). Mistral Small 3.1 (25.03) often matches or outperforms much larger models, including 70B-parameter Llama models, as well as closed-source models like GPT4o mini and Claude 3.5 Haiku.
| Model | MMLU Pro (5-shot CoT) | MATH | HumanEval | GPQA Main (5-shot CoT) |
|---|---|---|---|---|
| Mistral Small 3.1 (25.03) Instruct | 66.76% | 69.30% | 88.41% | 44.42% |
| Mistral Small 3 (25.01) Instruct | 66.30% | 70.60% | 84.80% | 45.30% |
| Gemma 2 27B Instruct | - | - | - | - |
| Qwen2.5 32B Instruct | 69.00% | 83.10% | 88.40% | 49.50% |
| Llama 3.3 70B Instruct | 68.90% | 77.00% | 88.40% | - |
| GPT4o mini | - | 70.20% | 87.20% | 40.20% |
| Claude 3.5 Haiku | 65.00% | 69.40% | 88.10% | - |
| Gemini 2.0 Flash-Lite | 71.60% | 86.80% | - | - |
Long-context Evals
Mistral Small 3.1 (25.03) is our best generalist model for long-context tasks. It demonstrates 100% retrieval on passkey evaluations up to a 128k context. Compared to both closed-source and open-source competitors, Mistral Small 3.1 (25.03) excels at question answering over long documents (LongBench v2) and at reasoning over entire contexts with challenging latent structure (Michelangelo Latent List). Most notably, Mistral Small 3.1 (25.03) improves upon Mistral Large in long-context capabilities. A sketch of the passkey protocol follows the table.
| Model | Michelangelo Latent List 128k | LongBench v2 128k |
|---|---|---|
| Mistral Small 3.1 (25.03) Instruct | 23.59% | 36.78% |
| GPT4o mini (up to 128k context) | 10.53% | 29.30% |
| Gemini 2.0 Flash-Lite (up to 1M context) | 25.22% | - |
| Qwen2.5 7B Instruct (with YaRN) (up to 128k context) | - | 30.00% |
| Qwen2.5 32B Instruct (with YaRN) (up to 128k context) | 31.66% | - |
| Qwen2.5 72B Instruct (with YaRN) (up to 128k context) | - | 42.10% |
| Llama 3.3 70B Instruct (up to 128k context) | 3.98% | 29.80% |
| Mistral Large (24.11) | 15.85% | 34.40% |
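The passkey evaluation referenced above follows a simple protocol: bury a random code deep inside filler text, send the full context to the model, and check whether the code comes back. The sketch below is an illustrative approximation under stated assumptions (the filler text, prompt wording, and model ID are all invented here), not Mistral's actual harness.

```python
import os
import random

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Build a long "haystack" with a passkey buried in the middle. Scale the
# repeat count up toward the 128k-token window for a stricter test.
passkey = str(random.randint(10_000, 99_999))
filler = "The grass is green. The sky is blue. The sun is bright. " * 2_000
midpoint = len(filler) // 2
haystack = filler[:midpoint] + f" The passkey is {passkey}. " + filler[midpoint:]

response = client.chat.complete(
    model="mistral-small-latest",  # assumed alias for Small 3.1 (25.03)
    messages=[{"role": "user", "content": haystack + "\nWhat is the passkey?"}],
)

answer = response.choices[0].message.content
print("retrieved" if passkey in answer else "missed", "->", answer)
```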
Model Specifications
- Context Length: 128,000 tokens
- Quality Index: 0.70
- License: Custom
- Training Data: Oct 2023
- Last Updated: March 2025
- Input Type: Text, Image
- Output Type: Text
- Publisher: Mistral AI
- Languages: 27 Languages
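Because the 128,000-token window is a hard limit, it can help to estimate prompt size before sending a long document. The check below uses a rough four-characters-per-token heuristic, which is an assumption for English text rather than the model's real tokenization; a proper tokenizer (for example, from Mistral's `mistral-common` package) gives exact counts.

```python
CONTEXT_LIMIT = 128_000  # tokens, per the specifications above


def fits_in_context(text: str, reply_budget: int = 4_000) -> bool:
    """Roughly check that a prompt plus a reply budget fits the 128k window.

    Uses a crude ~4 characters/token estimate (an assumption for English
    text); swap in a real tokenizer for exact counts.
    """
    estimated_tokens = len(text) // 4
    return estimated_tokens + reply_budget <= CONTEXT_LIMIT


# A ~600,000-character document is roughly 150k tokens and will not fit.
print(fits_in_context("x" * 600_000))  # False
```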