Mistral Medium 3 (25.05)
Version: 1
Mistral AI · Last updated May 2025
Mistral Medium 3 is an advanced Large Language Model (LLM) with state-of-the-art reasoning, knowledge, coding and vision capabilities.
Multipurpose
Vision
Multimodal
Mistral Medium 3 is a state-of-the-art, versatile model designed for a wide range of tasks, including programming, mathematical reasoning, long-document understanding, summarization, and dialogue. It is multimodal, able to process visual inputs, and supports dozens of natural languages as well as more than 80 programming languages. It also supports function calling and agentic workflows. Mistral Medium 3 is optimized for single-node inference, particularly for long-context applications; its size allows it to achieve high throughput on a single node.
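As a sketch of how the multimodal capability described above is typically exercised, the request below mixes text and an image reference in one chat message. The model identifier, endpoint schema, and image URL are illustrative assumptions in the common chat-completions style, not details confirmed by this card.

```python
import json

# Hypothetical chat-completion request combining text and an image input.
# The model id "mistral-medium-2505" and the message content schema are
# assumptions modeled on common chat APIs, not taken from this card.
payload = {
    "model": "mistral-medium-2505",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the chart in this image."},
                {"type": "image_url", "image_url": "https://example.com/chart.png"},
            ],
        }
    ],
    "max_tokens": 512,
}

# Serialize to the JSON body that would be POSTed to the inference endpoint.
body = json.dumps(payload)
```

The same message list can carry multiple images or interleave text and image parts, which is how long-document and chart-understanding tasks are usually submitted.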

Intended Use

Primary Use Cases

Mistral Medium 3 (25.05) is a versatile model well suited to tasks such as:
  • Programming
  • Math reasoning
  • Dialogue
  • Long document understanding
  • Visual understanding
  • Summarization
  • Low-latency applications
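Since the card highlights function calling among the supported capabilities, here is a minimal sketch of a tool definition in the JSON-schema style used by most chat-completion APIs. The function name, its parameters, and the model id are illustrative assumptions, not part of this card.

```python
import json

# Illustrative tool (function) definition; the name "get_weather" and its
# schema are made up for this example.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# A request that lets the model decide whether to call the tool.
request = {
    "model": "mistral-medium-2505",  # assumed model id
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}
```

In an agentic workflow, the model's tool-call response would be executed by the client and the result appended to `messages` for a follow-up turn.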

Academic Evals

Coding

| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| HumanEval (0-shot) | 0.921 | 0.854 | 0.915 | 0.921 | 0.829 | 0.933 |
| LiveCodeBench v6 (0-shot) | 0.303 | 0.287 | 0.314 | 0.360 | 0.263 | 0.429 |
| MultiPL-E average (0-shot) | 0.814 | 0.764 | 0.798 | 0.834 | 0.731 | 0.849 |

Instruction Following

| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| ArenaHard (0-shot) | 0.971 | 0.918 | 0.954 | 0.932 | 0.951 | 0.973 |
| IFEval (0-shot) | 0.894 | 0.889 | 0.872 | 0.918 | 0.897 | 0.891 |

Math

| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| Math500 Instruct (0-shot) | 0.910 | 0.900 | 0.764 | 0.830 | 0.820 | 0.938 |

Knowledge

| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| GPQA Diamond (5-shot CoT) | 0.571 | 0.611 | 0.525 | 0.697 | 0.465 | 0.611 |
| MMLU Pro (5-shot CoT) | 0.772 | 0.804 | 0.758 | 0.800 | 0.689 | 0.811 |

Long Context

| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| RULER 32K | 0.960 | 0.948 | 0.960 | 0.957 | 0.956 | 0.958 |
| RULER 128K | 0.902 | 0.867 | 0.889 | 0.938 | 0.912 | 0.919 |

Multimodal

| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| MMMU (0-shot) | 0.661 | 0.718 | 0.661 | 0.713 | - | - |
| DocVQA (0-shot) | 0.953 | 0.941 | 0.859 | 0.843 | - | - |
| AI2D (0-shot) | 0.937 | 0.844 | 0.933 | 0.788 | - | - |
| ChartQA (0-shot) | 0.826 | 0.904 | 0.860 | 0.763 | - | - |

Human Evals

Mistral Wins vs Llama 4 Maverick

| Domain | Mistral Win Rate (%) | Llama 4 Maverick Win Rate (%) |
|---|---|---|
| Coding | 81.82 | 18.18 |
| Multimodal | 53.85 | 46.15 |
| English | 66.67 | 33.33 |
| French | 71.43 | 28.57 |
| Spanish | 73.33 | 26.67 |
| German | 62.50 | 37.50 |
| Arabic | 64.71 | 35.29 |

Mistral Wins vs Competitor Wins for Coding

| Model | Mistral Wins (%) | Other Model Wins (%) |
|---|---|---|
| claude_3_7 | 40.00 | 60.00 |
| deepseek_v3_1 | 37.50 | 62.50 |
| gpt_4o | 50.00 | 50.00 |
| command_a | 69.23 | 30.77 |
| llama_4_maverick | 81.82 | 18.18 |

Model Specifications

| Specification | Value |
|---|---|
| Context Length | 128,000 tokens |
| Quality Index | 0.77 |
| License | Custom |
| Last Updated | May 2025 |
| Input Types | Text, Image |
| Output Type | Text |
| Publisher | Mistral AI |
| Languages | 27 languages |