Mistral Medium 3 (25.05)
Version: 1
Mistral Medium 3 is a state-of-the-art, versatile model designed for a wide range of tasks, including programming, mathematical reasoning, long-document understanding, summarization, and dialogue.
It is multimodal, able to process visual inputs alongside text, and it supports dozens of natural languages as well as more than 80 programming languages. It also supports function calling and agentic workflows.
Mistral Medium 3 is optimized for single-node inference, particularly for long-context applications; its size allows it to sustain high throughput on a single node.
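To make the function-calling capability concrete, below is a minimal request sketch. It assumes an OpenAI-compatible chat completions endpoint; the endpoint URL, API key, model identifier (`mistral-medium-2505`), and the `get_weather` tool are illustrative placeholders, not values taken from this card.

```python
# Minimal sketch of a function-calling request against an OpenAI-compatible
# chat completions endpoint. ENDPOINT, API_KEY, the model id, and the
# get_weather tool are placeholders -- substitute your deployment's values.
import os
import requests

ENDPOINT = os.environ["ENDPOINT"]  # e.g. https://<your-deployment>/v1/chat/completions
API_KEY = os.environ["API_KEY"]

payload = {
    "model": "mistral-medium-2505",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "What is the weather in Paris right now?"}
    ],
    # A single tool definition; the model may answer with a tool call
    # instead of a plain text message.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# Either a tool call the caller should execute, or a direct text answer.
print(message.get("tool_calls") or message["content"])
```

In an agentic loop, a returned `tool_calls` entry would be executed locally and its result sent back as a `tool` message in a follow-up request.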
Intended Use
Primary Use Cases
Mistral Medium 3 (25.05) is a strong, versatile model for tasks such as the following (see the multimodal request sketch after the list):
- Programming
- Math reasoning
- Dialogue
- Long document understanding
- Visual understanding
- Summarization
- Low-latency applications
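As referenced above, here is a minimal sketch of a multimodal request, pairing an image with a text prompt via the OpenAI-style content-parts format. The endpoint, API key, model identifier, and image URL are again illustrative placeholders.

```python
# Minimal sketch of a multimodal (image + text) chat request against an
# OpenAI-compatible endpoint. All identifiers and URLs are placeholders.
import os
import requests

ENDPOINT = os.environ["ENDPOINT"]
API_KEY = os.environ["API_KEY"]

payload = {
    "model": "mistral-medium-2505",  # placeholder model identifier
    "messages": [
        {
            "role": "user",
            # Content parts: one text part and one image part.
            "content": [
                {"type": "text", "text": "Summarize the chart in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```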
Academic Evals
Coding
Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
---|---|---|---|---|---|---|
HumanEval 0-shot | 0.921 | 0.854 | 0.915 | 0.921 | 0.829 | 0.933 |
LiveCodeBench (v6) 0-shot | 0.303 | 0.287 | 0.314 | 0.360 | 0.263 | 0.429 |
MultiPL-E average 0-shot | 0.814 | 0.764 | 0.798 | 0.834 | 0.731 | 0.849 |
Instruction Following
Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
---|---|---|---|---|---|---|
ArenaHard 0-shot | 0.971 | 0.918 | 0.954 | 0.932 | 0.951 | 0.973 |
IFEval 0-shot | 0.894 | 0.889 | 0.872 | 0.918 | 0.897 | 0.891 |
Math
Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
---|---|---|---|---|---|---|
Math500 Instruct 0-shot | 0.910 | 0.900 | 0.764 | 0.830 | 0.820 | 0.938 |
Knowledge
Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
---|---|---|---|---|---|---|
GPQA Diamond 5-shot CoT | 0.571 | 0.611 | 0.525 | 0.697 | 0.465 | 0.611 |
MMLU Pro 5-shot CoT | 0.772 | 0.804 | 0.758 | 0.800 | 0.689 | 0.811 |
Long Context
Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
---|---|---|---|---|---|---|
RULER 32K | 0.960 | 0.948 | 0.960 | 0.957 | 0.956 | 0.958 |
RULER 128K | 0.902 | 0.867 | 0.889 | 0.938 | 0.912 | 0.919 |
Multimodal
Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
---|---|---|---|---|---|---|
MMMU 0-shot | 0.661 | 0.718 | 0.661 | 0.713 | - | - |
DocVQA 0-shot | 0.953 | 0.941 | 0.859 | 0.843 | - | - |
AI2D 0-shot | 0.937 | 0.844 | 0.933 | 0.788 | - | - |
ChartQA 0-shot | 0.826 | 0.904 | 0.860 | 0.763 | - | - |
Human Evals
Mistral Win Rates vs Llama 4 Maverick
Domain | Mistral Win Rate (%) | Llama 4 Maverick Win Rate (%) |
---|---|---|
Coding | 81.82 | 18.18 |
Multimodal | 53.85 | 46.15 |
English | 66.67 | 33.33 |
French | 71.43 | 28.57 |
Spanish | 73.33 | 26.67 |
German | 62.50 | 37.50 |
Arabic | 64.71 | 35.29 |
Mistral Win Rates vs Competitors (Coding)
Model | Mistral Wins (%) | Other Model Wins (%) |
---|---|---|
claude_3_7 | 40.00 | 60.00 |
deepseek_v3_1 | 37.50 | 62.50 |
gpt_4o | 50.00 | 50.00 |
command_a | 69.23 | 30.77 |
llama_4_maverick | 81.82 | 18.18 |
Model Specifications
Context Length: 128,000 tokens
Quality Index: 0.77
License: Custom
Last Updated: May 2025
Input Type: Text, Image
Output Type: Text
Publisher: Mistral AI
Languages: 27 languages