Mistral Small 3.1
Version: 1
Mistral AI
Last updated: August 2025
Enhanced Mistral Small 3 with multimodal capabilities and a 128k context length.
Multipurpose
Vision
Multimodal

Models from Microsoft, Partners, and Community

Models from Microsoft, Partners, and Community are a select portfolio of curated models, both general-purpose and niche, spanning diverse scenarios and developed by Microsoft teams, partners, and community contributors.
  • Managed by Microsoft: Purchase and manage models directly through Azure with a single license, world-class support, and enterprise-grade Azure infrastructure.
  • Validated by providers: Each model is validated and maintained by its respective provider, with Azure offering integration and deployment guidance.
  • Innovation and agility: Combines Microsoft research models with rapid, community-driven advancements.
  • Seamless Azure integration: Standard Azure AI Foundry experience, with support managed by the model provider.
  • Flexible deployment: Deployable as Managed Compute or Serverless API, based on provider preference; a minimal serverless call is sketched below.
Learn more about models from Microsoft, Partners, and Community
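
For the Serverless API option, a deployment exposes a chat completions endpoint that the azure-ai-inference Python package can call. The sketch below is illustrative only: the endpoint URL and environment variable names are placeholders for your own deployment, not values from this model card.

```python
import os

# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key: substitute the values from your own
# serverless deployment in Azure AI Foundry.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

# One round of dialogue against the deployed model.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain what a 128k context window means."),
    ]
)
print(response.choices[0].message.content)
```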

Key capabilities

About this model

Mistral Small 3.1 (25.03) is a versatile model designed for various tasks such as programming, mathematical reasoning, document understanding, and dialogue. In addition to its strong multimodal capabilities, Mistral Small 3.1 (25.03) retains the robust text performance of Mistral Small 3. It excels in knowledge benchmarks like MMLU and MMLU-Pro, graduate-level question answering (GPQA), reading comprehension (TriviaQA), and math and coding tasks (MATH, HumanEval). Mistral Small 3.1 (25.03) often matches or outperforms much larger models, including 70B-parameter Llama models, as well as closed-source models like GPT4o mini and Claude 3.5 Haiku. With our improved training methodologies, we observe strong vision capabilities in the model. Not only does Mistral Small 3.1 (25.03) outperform lightweight models like GPT4o mini, it also rivals larger models like Qwen2-VL on visual knowledge and reasoning tasks. Moreover, with our innovations in the past few months, Mistral Small 3.1 (25.03) demonstrates performance on par with Pixtral Large, which we released last year, across the board.

Key model capabilities

Mistral Small 3.1 (25.03) is a great versatile model for tasks such as:
  • Programming
  • Math reasoning
  • Dialogue
  • Long document understanding
  • Visual understanding
  • Summarization
  • Low-latency applications

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

Mistral Small 3.1 (25.03) is a versatile model for tasks such as programming, mathematical reasoning, dialogue, long document understanding, visual understanding, and summarization. It was designed with low-latency applications in mind and delivers best-in-class efficiency compared to models of the same quality.
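
Since low latency is a design goal, streaming the response reduces time-to-first-token in interactive applications. A minimal sketch, reusing the `client` from the deployment example above; the prompt is illustrative:

```python
from azure.ai.inference.models import UserMessage

# Streaming returns partial updates as they are generated, keeping
# perceived latency low in dialogue and summarization front ends.
stream = client.complete(
    stream=True,
    messages=[UserMessage(content="Summarize the rules of chess in three sentences.")],
)
for update in stream:
    # Some updates (e.g. the final chunk) may carry no content delta.
    if update.choices and update.choices[0].delta.content:
        print(update.choices[0].delta.content, end="", flush=True)
print()
```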

Out of scope use cases

The provider has not supplied this information.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used; see the pricing details page for this model for specifics.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider lists a training data date of October 2023 in the model specifications below.

Training time

The provider has not supplied this information.

Input formats

Mistral Small 3.1 (25.03) accepts text and image inputs (per the model specifications below) and can process and understand visual inputs as well as long documents.
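
To pass an image alongside text, the chat payload carries a list of content parts. A sketch using the typed content items from azure-ai-inference, reusing the `client` constructed earlier; the file name is a placeholder:

```python
from azure.ai.inference.models import (
    ImageContentItem,
    ImageUrl,
    TextContentItem,
    UserMessage,
)

# ImageUrl.load embeds the local image as a base64 data URL in the request;
# chart.png is a placeholder file.
response = client.complete(
    messages=[
        UserMessage(
            content=[
                TextContentItem(text="What trend does this chart show?"),
                ImageContentItem(
                    image_url=ImageUrl.load(image_file="chart.png", image_format="png")
                ),
            ]
        )
    ]
)
print(response.choices[0].message.content)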

Output formats

Text (per the Output Type entry in the model specifications below).

Supported languages

The provider has not supplied details here, though the model specifications below list 27 supported languages.

Sample JSON response

The provider has not supplied this information.

Model architecture

The provider has not supplied this information.

Long context

Mistral Small 3.1 (25.03) features an extended context length of up to 128k tokens, so it can process and understand long documents, further expanding its range of applications. It demonstrates 100% retrieval capability on passkey evaluations up to the full 128k context and improves upon Mistral Large in long-context capabilities; detailed results appear under Long-context Evals in the Quality and performance evaluations section below.
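
In practice, the 128k window lets a sizeable document travel in a single request, with no chunking or retrieval pipeline. A minimal sketch, assuming the `client` from the earlier examples; the file path and the rough four-characters-per-token heuristic are assumptions, not provider guidance:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

# Long-document QA in one request: the entire document rides in the
# prompt, relying on the 128k-token context window instead of chunking.
with open("annual_report.txt", encoding="utf-8") as f:
    document = f.read()

# Crude size check: ~4 characters per token is a common rule of thumb.
if len(document) / 4 > 128_000:
    raise ValueError("Document likely exceeds the 128k-token context window")

response = client.complete(
    messages=[
        SystemMessage(content="Answer using only the provided document."),
        UserMessage(content=f"{document}\n\nQuestion: What are the key findings?"),
    ]
)
print(response.choices[0].message.content)
```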

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

The provider has not supplied this information.

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

The provider has not supplied this information.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: Mistral AI

Vision Evals

With our improved training methodologies, we observe strong vision capabilities in the model. Not only does Mistral Small 3.1 (25.03) outperform lightweight models like GPT4o mini, it also rivals larger models like Qwen2-VL on visual knowledge and reasoning tasks. Moreover, with our innovations in the past few months, Mistral Small 3.1 (25.03) demonstrates performance on par with Pixtral Large, which we released last year, across the board.
Model | MMMU | MMMU Pro | Mathvista | ChartQA | DocVQA | AI2D
Mistral Small 3.1 (25.03) Instruct | 64.00 | 49.25 | 68.91 | 86.24 | 94.08 | 93.72
GPT4o mini | 60.00 | 37.60 | 52.50 | - | - | -
Qwen2-VL 7B | 54.10 | 30.50 | 58.20 | 83.00 | 94.50 | 83.00
Qwen2.5-VL 7B | 58.60 | 38.30 | 68.20 | 87.30 | 95.70 | 83.90
Qwen2-VL 72B | 64.50 | 46.20 | 70.50 | 88.30 | 96.50 | 88.10
Qwen2.5-VL 72B | 70.20 | 51.10 | 74.80 | 89.50 | 96.40 | 88.70
Claude 3.5 Haiku | 60.50 | - | 61.60 | 87.20 | 90.00 | 92.10
Gemini 2.0 Flash-Lite | 68.00 | - | - | - | - | -
Pixtral Large | 64.00 | - | 69.40 | 88.10 | 93.30 | 93.80
Text Pretrain Evals
Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | GPQA Main (5-shot CoT) | TriviaQA (5-shot)
Mistral Small 3.1 (25.03) Base | 81.01% | 56.03% | 37.50% | 80.50%
Mistral Small 3 Base | 80.73% | 54.37% | 34.37% | 80.32%
Gemma 2 27B | 75.20% | - | - | 83.70%
Qwen 2.5 32B | 83.30% | 55.10% | 48.00% | -
Llama 3.1 70B | 79.30% | 53.80% | - | -
Text Instruct Evals

In addition to its strong multimodal capabilities, Mistral Small 3.1 (25.03) retains the robust text performance of Mistral Small 3. It excels in knowledge benchmarks like MMLU and MMLU-Pro, graduate-level question answering (GPQA), reading comprehension (TriviaQA), and math and coding tasks (MATH, HumanEval). Mistral Small 3.1 (25.03) often matches or outperforms much larger models, including 70B-parameter Llama models, as well as closed-source models like GPT4o mini and Claude 3.5 Haiku.
Model | MMLU Pro (5-shot CoT) | MATH | HumanEval | GPQA Main (5-shot CoT)
Mistral Small 3.1 (25.03) Instruct | 66.76% | 69.30% | 88.41% | 44.42%
Mistral Small 3 (25.01) Instruct | 66.30% | 70.60% | 84.80% | 45.30%
Gemma 2 27B Instruct | - | - | - | -
Qwen2.5 32B Instruct | 69.00% | 83.10% | 88.40% | 49.50%
Llama 3.3 70B Instruct | 68.90% | 77.00% | 88.40% | -
GPT4o mini | - | 70.20% | 87.20% | 40.20%
Claude 3.5 Haiku | 65.00% | 69.40% | 88.10% | -
Gemini 2.0 Flash-Lite | 71.60% | 86.80% | - | -
Long-context Evals

Mistral Small 3.1 (25.03) is our best generalist model for long-context tasks. It demonstrates 100% retrieval capability on passkey evaluations up to 128k context. Compared to both closed-source and open-source competitor models, Mistral Small 3.1 (25.03) excels in question-answering over long documents (LongBench v2) and in reasoning over entire contexts with challenging latent-structure evaluations (Michelangelo Latent List). Most notably, Mistral Small 3.1 (25.03) improves upon Mistral Large in long-context capabilities. Note: for competitors, we ran all evals using our own stack, as they did not report these results.
Model | Michelangelo Latent List 128k | LongBench v2 128k
Mistral Small 3.1 (25.03) Instruct | 23.59% | 36.78%
GPT4o mini (up to 128k context) | 10.53% | 29.30%
Gemini 2.0 Flash-Lite (up to 1M context) | 25.22% | -
Qwen2.5 7B Instruct (with YaRN, up to 128k context) | - | 30.00%
Qwen2.5 32B Instruct (with YaRN, up to 128k context) | 31.66% | -
Qwen2.5 72B Instruct (with YaRN, up to 128k context) | - | 42.10%
Llama 3.3 70B Instruct (up to 128k context) | 3.98% | 29.80%
Mistral Large (24.11) | 15.85% | 34.40%

Benchmarking methodology

Source: Mistral AI

The provider has not supplied this information.

Public data summary

Source: Mistral AI

The provider has not supplied this information.

Model Specifications

  • Context Length: 128,000
  • Quality Index: 0.70
  • License: Custom
  • Training Data: October 2023
  • Last Updated: August 2025
  • Input Type: Text, Image
  • Output Type: Text
  • Provider: Mistral AI
  • Languages: 27 languages