Mistral Small 3.1
Version: 1
Models from Microsoft, Partners, and Community
Models from Microsoft, Partners, and Community are a select portfolio of curated models, both general-purpose and niche, spanning diverse scenarios and developed by Microsoft teams, partners, and community contributors.
- Managed by Microsoft: Purchase and manage models directly through Azure with a single license, world-class support, and enterprise-grade Azure infrastructure.
- Validated by providers: Each model is validated and maintained by its respective provider, with Azure offering integration and deployment guidance.
- Innovation and agility: Combines Microsoft research models with rapid, community-driven advancements.
- Seamless Azure integration: Standard Azure AI Foundry experience, with support managed by the model provider.
- Flexible deployment: Deployable as Managed Compute or Serverless API, based on provider preference.
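For the Serverless API path, a minimal sketch using the `azure-ai-inference` Python SDK might look like the following. The endpoint and key environment variable names are placeholders; substitute the target URI and key from your own Azure AI Foundry deployment.

```python
# Minimal sketch: calling a Serverless API deployment of Mistral Small 3.1
# with the azure-ai-inference SDK. Environment variable names below are
# hypothetical placeholders for your own deployment's endpoint and key.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the key features of Mistral Small 3.1."),
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```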
Key capabilities
About this model
Mistral Small 3.1 (25.03) is a versatile model designed for tasks such as programming, mathematical reasoning, document understanding, and dialogue. In addition to its strong multimodal capabilities, Mistral Small 3.1 (25.03) retains the robust text performance of Mistral Small 3. It excels in knowledge benchmarks like MMLU and MMLU-Pro, graduate-level question answering (GPQA), reading comprehension (TriviaQA), and math and coding tasks (MATH, HumanEval). Mistral Small 3.1 (25.03) often matches or outperforms much larger models, including 70B-parameter Llama models, as well as closed-source models like GPT4o mini and Claude 3.5 Haiku. With our improved training methodologies, we observe strong vision capabilities in the model. Not only does Mistral Small 3.1 (25.03) outperform lightweight models like GPT4o mini, it also rivals larger models like Qwen2-VL on visual knowledge and reasoning tasks. Moreover, with our innovations in the past few months, Mistral Small 3.1 (25.03) demonstrates performance on par with Pixtral Large, which we released last year, across the board.
Key model capabilities
Mistral Small 3.1 (25.03) is a versatile model for tasks such as:
- Programming
- Math reasoning
- Dialogue
- Long document understanding
- Visual understanding
- Summarization
- Low-latency applications
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
Mistral Small 3.1 (25.03) is a versatile model for tasks such as programming, mathematical reasoning, dialogue, long document understanding, visual understanding, summarization, and low-latency applications. Mistral Small 3.1 (25.03) was designed with low-latency applications in mind and delivers best-in-class efficiency compared to models of the same quality.
Out of scope use cases
The provider has not supplied this information.
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
Technical specs
The provider has not supplied this information.
Training cut-off date
The provider has not supplied this information.
Training time
The provider has not supplied this information.
Input formats
Mistral Small 3.1 (25.03) can process and understand visual inputs as well as long documents.
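To illustrate the multimodal input format, here is a hedged sketch of a single request mixing text and an image, again using the `azure-ai-inference` SDK; the image URL is a placeholder, not a real asset.

```python
# Sketch of a multimodal request: one text part and one image part in a
# single user message. The image URL below is a placeholder.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ImageContentItem,
    ImageUrl,
    TextContentItem,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    messages=[
        UserMessage(content=[
            TextContentItem(text="Describe the chart in this image."),
            # Point this at your own hosted image.
            ImageContentItem(image_url=ImageUrl(url="https://example.com/chart.png")),
        ]),
    ],
)
print(response.choices[0].message.content)
```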
Output formats
The provider has not supplied this information.
Supported languages
The provider has not supplied this information.
Sample JSON response
The provider has not supplied this information.
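Since the provider has not published a sample, the following is a hypothetical sketch of the chat-completions JSON shape commonly returned by OpenAI-compatible serverless endpoints. Every field value here, including the `id`, model identifier, timestamp, content, and token counts, is an illustrative placeholder, not provider-confirmed output.

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1742000000,
  "model": "mistral-small-2503",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Mistral Small 3.1 is a multimodal model with a 128k context window."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 21,
    "completion_tokens": 17,
    "total_tokens": 38
  }
}
```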
Model architecture
The provider has not supplied this information.
Long context
Mistral Small 3.1 (25.03) features an extended context length of up to 128k tokens. It can now process and understand long documents, further expanding its range of applications. Mistral Small 3.1 (25.03) is our best generalist model for long-context tasks. It demonstrates 100% retrieval capability on passkey evaluations up to 128k context (a sketch of this kind of evaluation appears after the table below). Compared to both closed-source and open-source competitor models, Mistral Small 3.1 (25.03) excels in question answering over long documents (LongBench v2) and in reasoning over entire contexts with challenging latent-structure evaluations (Michelangelo Latent List). Most notably, Mistral Small 3.1 (25.03) improves upon Mistral Large in long-context capabilities. NOTE: For competitor models, we ran all evals using our own stack, as they did not report these evals.
| Model | Michelangelo Latent List 128k | LongBench v2 128k |
|---|---|---|
| Mistral Small 3.1 (25.03) Instruct | 23.59% | 36.78% |
| GPT4o mini (up to 128k context) | 10.53% | 29.30% |
| Gemini 2.0 Flash-Lite (up to 1M context) | 25.22% | - |
| Qwen2.5 7B Instruct (with YaRN) (up to 128k context) | - | 30% |
| Qwen2.5 32B Instruct (with YaRN) (up to 128k context) | 31.66% | - |
| Qwen2.5 72B Instruct (with YaRN) (up to 128k context) | - | 42.10% |
| Llama 3.3 70B Instruct (up to 128k context) | 3.98% | 29.80% |
| Mistral Large (24.11) | 15.85% | 34.40% |
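The passkey evaluation referenced above is straightforward to reproduce in spirit. The sketch below is a minimal, hypothetical harness, not Mistral's actual evaluation code: it hides a random passkey at a random position inside a long haystack of filler text, then checks whether the model can recall it. `query_model` is an assumed stand-in for any chat-completion call, such as the `azure-ai-inference` snippet shown earlier.

```python
# Hypothetical passkey-retrieval harness: bury a passkey in filler text and
# measure how often the model recalls it. query_model(prompt) -> str is any
# chat-completion call of your choosing.
import random


def build_passkey_prompt(passkey: str, filler_repeats: int = 8000) -> str:
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    needle = f"The passkey is {passkey}. Remember it. "
    # Insert the needle at a random position within the haystack.
    position = random.randint(0, filler_repeats)
    haystack = filler * position + needle + filler * (filler_repeats - position)
    return haystack + "\nWhat is the passkey? Answer with the passkey only."


def passkey_accuracy(query_model, trials: int = 20) -> float:
    hits = 0
    for _ in range(trials):
        passkey = str(random.randint(10000, 99999))
        answer = query_model(build_passkey_prompt(passkey))
        hits += passkey in answer
    return hits / trials
```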
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
The provider has not supplied this information.
Distribution
Distribution channels
The provider has not supplied this information.
More information
The provider has not supplied this information.
Responsible AI considerations
Safety techniques
The provider has not supplied this information.
Safety evaluations
The provider has not supplied this information.
Known limitations
The provider has not supplied this information.
Acceptable use
Acceptable use policy
The provider has not supplied this information.
Quality and performance evaluations
Source: Mistral AI
Vision evals
| Model | MMMU | MMMU Pro | Mathvista | ChartQA | DocVQA | AI2D |
|---|---|---|---|---|---|---|
| Mistral Small 3.1 (25.03) Instruct | 64.00 | 49.25 | 68.91 | 86.24 | 94.08 | 93.72 |
| GPT4o mini | 60.00 | 37.60 | 52.50 | - | - | - |
| Qwen2-VL 7B | 54.10 | 30.50 | 58.20 | 83.00 | 94.50 | 83.00 |
| Qwen2.5-VL 7B | 58.60 | 38.30 | 68.20 | 87.30 | 95.70 | 83.90 |
| Qwen2-VL 72B | 64.50 | 46.20 | 70.50 | 88.30 | 96.50 | 88.10 |
| Qwen2.5-VL 72B | 70.20 | 51.10 | 74.80 | 89.50 | 96.40 | 88.70 |
| Claude 3.5 Haiku | 60.50 | - | 61.60 | 87.20 | 90.00 | 92.10 |
| Gemini 2.0 Flash-Lite | 68.00 | - | - | - | - | - |
| Pixtral Large | 64.00 | - | 69.40 | 88.10 | 93.30 | 93.80 |
Base model text evals
| Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | GPQA Main (5-shot CoT) | TriviaQA (5-shot) |
|---|---|---|---|---|
| Mistral Small 3.1 (25.03) Base | 81.01% | 56.03% | 37.50% | 80.50% |
| Mistral Small 3 Base | 80.73% | 54.37% | 34.37% | 80.32% |
| Gemma 2 27B | 75.20% | - | - | 83.70% |
| Qwen 2.5 32B | 83.30% | 55.10% | 48.00% | - |
| Llama 3.1 70B | 79.30% | 53.80% | - | - |
Instruct model text evals
| Model | MMLU Pro (5-shot CoT) | MATH | HumanEval | GPQA Main (5-shot CoT) |
|---|---|---|---|---|
| Mistral Small 3.1 (25.03) Instruct | 66.76% | 69.30% | 88.41% | 44.42% |
| Mistral Small 3 (25.01) Instruct | 66.30% | 70.60% | 84.80% | 45.30% |
| Gemma 2 27B Instruct | - | - | - | - |
| Qwen2.5 32B Instruct | 69.00% | 83.10% | 88.40% | 49.50% |
| Llama 3.3 70B Instruct | 68.90% | 77.00% | 88.40% | - |
| GPT4o mini | - | 70.20% | 87.20% | 40.20% |
| Claude 3.5 Haiku | 65.00% | 69.40% | 88.10% | - |
| Gemini 2.0 Flash-Lite | 71.60% | 86.80% | - | - |
Benchmarking methodology
Source: Mistral AI
The provider has not supplied this information.
Public data summary
Source: Mistral AI
The provider has not supplied this information.
Model Specifications
Context Length: 128000
Quality Index: 0.70
License: Custom
Training Data: October 2023
Last Updated: August 2025
Input Type: Text, Image
Output Type: Text
Provider: Mistral AI
Languages: 27 languages
Related Models