Mistral Small 3.1
Version: 1
Mistral AI
Last updated March 2025
Enhanced Mistral Small 3 with multimodal capabilities and a 128k context length.
Multipurpose
Vision
Multimodal
Mistral Small 3.1 (25.03) is the enhanced version of Mistral Small 3 (25.01), adding multimodal capabilities and an extended context length of up to 128k tokens. It can now process and understand visual inputs as well as long documents, further expanding its range of applications. Like its predecessor, it is a versatile model designed for tasks such as programming, mathematical reasoning, document understanding, and dialogue.

Mistral Small 3.1 (25.03) was designed with low-latency applications in mind and delivers best-in-class efficiency compared to models of the same quality. It has undergone a full post-training process to align the model with human preferences and needs, so it is suitable out of the box for applications that require chat or precise instruction following.

Intended Use

Primary Use Cases

Mistral Small 3.1 (25.03) is a versatile model well suited to tasks such as the following (see the usage sketch after this list):
  • Programming
  • Math reasoning
  • Dialogue
  • Long document understanding
  • Visual understanding
  • Summarization
  • Low-latency applications
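
The example below is a minimal sketch of calling the model for a chat and instruction-following task. It assumes the mistralai Python client (v1.x), an API key in the MISTRAL_API_KEY environment variable, and a model alias such as mistral-small-latest that resolves to Mistral Small 3.1 (25.03); check your platform's documentation for the exact model ID.

```python
import os

from mistralai import Mistral  # pip install mistralai (v1.x assumed)

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# "mistral-small-latest" is an assumed alias for Mistral Small 3.1 (25.03);
# substitute the exact model ID used by your deployment.
response = client.chat.complete(
    model="mistral-small-latest",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {
            "role": "user",
            "content": "Write a Python function that checks whether a string is a palindrome.",
        },
    ],
    temperature=0.3,
)

print(response.choices[0].message.content)
```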
Vision Evals

With our improved training methodologies, we observe strong vision capabilities in the model. Not only does Mistral Small 3.1 (25.03) outperform lightweight models like GPT4o mini, it also rivals larger models like Qwen2-VL on visual knowledge and reasoning tasks. Moreover, with our innovations in the past few months, Mistral Small 3.1 (25.03) performs on par with Pixtral Large, which we released last year, across the board.
Model | MMMU | MMMU Pro | MathVista | ChartQA | DocVQA | AI2D
Mistral Small 3.1 (25.03) Instruct | 64.00 | 49.25 | 68.91 | 86.24 | 94.08 | 93.72
GPT4o mini | 60.00 | 37.60 | 52.50 | - | - | -
Qwen2-VL 7B | 54.10 | 30.50 | 58.20 | 83.00 | 94.50 | 83.00
Qwen2.5-VL 7B | 58.60 | 38.30 | 68.20 | 87.30 | 95.70 | 83.90
Qwen2-VL 72B | 64.50 | 46.20 | 70.50 | 88.30 | 96.50 | 88.10
Qwen2.5-VL 72B | 70.20 | 51.10 | 74.80 | 89.50 | 96.40 | 88.70
Claude 3.5 Haiku | 60.50 | - | 61.60 | 87.20 | 90.00 | 92.10
Gemini 2.0 Flash-Lite | 68.00 | - | - | - | - | -
Pixtral Large | 64.00 | - | 69.40 | 88.10 | 93.30 | 93.80
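
As an illustration of the visual-understanding use case, the sketch below sends a document image together with a question, DocVQA-style. It makes the same client and model-ID assumptions as the earlier example; the file name is hypothetical, and the base64 data-URL form of image_url shown here is one common way to pass local images, with plain https:// URLs as an alternative.

```python
import base64
import os

from mistralai import Mistral  # pip install mistralai (v1.x assumed)

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Hypothetical local file; encode it as a base64 data URL for the API.
with open("invoice.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.complete(
    model="mistral-small-latest",  # assumed alias, as above
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is the total amount due on this invoice?"},
                {"type": "image_url", "image_url": f"data:image/png;base64,{image_b64}"},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```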
Text Pretrain Evals
Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | GPQA Main (5-shot CoT) | TriviaQA (5-shot)
Mistral Small 3.1 (25.03) Base | 81.01% | 56.03% | 37.50% | 80.50%
Mistral Small 3 Base | 80.73% | 54.37% | 34.37% | 80.32%
Gemma 2 27B | 75.20% | - | - | 83.70%
Qwen 2.5 32B | 83.30% | 55.10% | 48.00% | -
Llama 3.1 70B | 79.30% | 53.80% | - | -
Text Instruct Evals

In addition to its strong multimodal capabilities, Mistral Small 3.1 (25.03) retains the robust text performance of Mistral Small 3. It excels in knowledge benchmarks like MMLU and MMLU-Pro, graduate-level question answering (GPQA), reading comprehension (TriviaQA), and math and coding tasks (MATH, HumanEval). Mistral Small 3.1 (25.03) often matches or outperforms much larger models, including 70B-parameter Llama models, as well as closed-source models like GPT4o mini and Claude 3.5 Haiku.
Model | MMLU Pro (5-shot CoT) | MATH | HumanEval | GPQA Main (5-shot CoT)
Mistral Small 3.1 (25.03) Instruct | 66.76% | 69.30% | 88.41% | 44.42%
Mistral Small 3 (25.01) Instruct | 66.30% | 70.60% | 84.80% | 45.30%
Gemma 2 27B Instruct | - | - | - | -
Qwen2.5 32B Instruct | 69.00% | 83.10% | 88.40% | 49.50%
Llama 3.3 70B Instruct | 68.90% | 77.00% | 88.40% | -
GPT4o mini | - | 70.20% | 87.20% | 40.20%
Claude 3.5 Haiku | 65.00% | 69.40% | 88.10% | -
Gemini 2.0 Flash-Lite | 71.60% | 86.80% | - | -
Long-context Evals

Mistral Small 3.1 (25.03) is our best generalist model for long-context tasks. It demonstrates 100% retrieval on passkey evaluations up to 128k context (a minimal version of this setup is sketched after the table below). Compared to both closed-source and open-source competitor models, it excels at question answering over long documents (LongBench v2) and at reasoning over entire contexts with challenging latent-structure evaluations (Michelangelo Latent List). Most notably, Mistral Small 3.1 (25.03) improves upon Mistral Large in long-context capabilities.

Note: for competitor models, we ran all evaluations using our own stack, since these results were not reported.
Model | Michelangelo Latent List (128k) | LongBench v2 (128k)
Mistral Small 3.1 (25.03) Instruct | 23.59% | 36.78%
GPT4o mini (up to 128k context) | 10.53% | 29.30%
Gemini 2.0 Flash-Lite (up to 1M context) | 25.22% | -
Qwen2.5 7B Instruct (with YaRN, up to 128k context) | - | 30.00%
Qwen2.5 32B Instruct (with YaRN, up to 128k context) | 31.66% | -
Qwen2.5 72B Instruct (with YaRN, up to 128k context) | - | 42.10%
Llama 3.3 70B Instruct (up to 128k context) | 3.98% | 29.80%
Mistral Large (24.11) | 15.85% | 34.40%
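
To make the passkey claim above concrete, here is a minimal sketch of such a retrieval test: a random passkey is hidden at a random depth inside a long stretch of filler text, and the model is asked to recover it. This is an illustration under the same client and model-ID assumptions as the earlier examples, not the exact harness behind the reported numbers, and the filler here is far shorter than 128k tokens.

```python
import os
import random

from mistralai import Mistral  # pip install mistralai (v1.x assumed)

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Build a long filler document and hide a passkey at a random position.
passkey = str(random.randint(10000, 99999))
filler = "The grass is green. The sky is blue. The sun is yellow. " * 2000
insert_at = random.randint(0, len(filler))
haystack = (
    filler[:insert_at]
    + f" The passkey is {passkey}. Remember it. "
    + filler[insert_at:]
)

response = client.chat.complete(
    model="mistral-small-latest",  # assumed alias for Mistral Small 3.1 (25.03)
    messages=[{"role": "user", "content": haystack + "\n\nWhat is the passkey?"}],
)

print("expected:", passkey, "| model said:", response.choices[0].message.content)
```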
Model Specifications
Context Length: 128,000
Quality Index: 0.70
License: Custom
Training Data: Oct 2023
Last Updated: March 2025
Input Type: Text, Image
Output Type: Text
Publisher: Mistral AI
Languages: 27 languages