Claude Haiku 4.5
Version: 20251001
Models from Partners and Community

These models constitute the vast majority of Azure AI Foundry Models and are provided by trusted third-party organizations, partners, research labs, and community contributors. They offer specialized and diverse AI capabilities covering a wide array of scenarios, industries, and innovations. One example is the family of large language models developed by Anthropic: the Claude family of state-of-the-art models supports text and image input, text output, multilingual capabilities, and vision. See Anthropic's privacy policy to learn more about privacy. Learn how to deploy Anthropic models.

Characteristics of models from Partners and Community:
- Developed and supported by external partners and community contributors.
- Diverse range of models covering many scenarios and industries.
Responsible AI considerations
Safety techniques
The Claude Haiku 4.5 system card describes in detail the evaluations Anthropic ran to assess the model's safety and alignment.

Safety evaluations
Claude Haiku 4.5 shows large safety improvements over its predecessor, Claude Haiku 3.5, and its safety profile also compares favorably with other current Anthropic models. The Claude Haiku 4.5 system card includes details of safety evaluations, including assessments of: the model's safeguards; its safety profile when working autonomously in “agentic” roles; its broad alignment; its own potential welfare; its tendency to “reward hack” by finding shortcuts to complete tests; and its potential to be misused to produce dangerous weapons.

Known limitations
Please refer to the Claude Haiku 4.5 system card for known limitations.

Quality and performance evaluations
Claude Haiku 4.5 delivers near-frontier performance across a wide range of use cases, and stands out as one of the best coding and agent models, with the right speed and cost to power free products and scaled sub-agents.

| Benchmark | Test Name | Haiku 4.5 Score |
|---|---|---|
| Agentic coding | SWE-bench Verified | 73.3% |
| Agentic terminal coding | Terminal-bench | 41.0% |
| Agentic tool use | τ²-bench | Retail 83.2%, Airline 63.6%, Telecom 83.0% |
| Computer use | OSWorld | 50.7% |
| High school math competition | AIME 2025 | 96.3% (Python), 80.7% (no tools) |
| Graduate-level reasoning | GPQA Diamond | 73.0% |
| Multilingual Q&A | MMLU | 83.0% |
| Visual reasoning | MMMU (validation) | 73.2% |
Benchmarking methodology
SWE-bench Verified: All Claude results were reported using a simple scaffold with two tools: bash and file editing via string replacements. We report 73.3%, averaged over 50 trials with no test-time compute.

Model Specifications
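To make the averaging in this methodology concrete, here is a minimal sketch of computing a mean resolve rate over repeated trials. The helper name and the boolean trial data are illustrative, not part of the actual evaluation harness:

```python
# Hypothetical sketch: averaging a benchmark score over repeated trials,
# as in the SWE-bench methodology above (which averages over 50 trials).
# Each trial is a list of per-task pass/fail outcomes; the data here is
# made up for illustration, not real evaluation output.

def mean_resolve_rate(trials: list[list[bool]]) -> float:
    """Average the per-trial resolve rate (fraction of tasks solved)."""
    per_trial = [sum(t) / len(t) for t in trials]
    return sum(per_trial) / len(per_trial)

# Three toy trials over four tasks each.
trials = [
    [True, True, False, True],   # 75% resolved
    [True, False, False, True],  # 50% resolved
    [True, True, True, True],    # 100% resolved
]
print(f"{mean_resolve_rate(trials):.1%}")  # 75.0%
```

Averaging over many trials smooths out run-to-run variance, which is why a single-trial score can differ from the reported figure.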
- Context Length: 200,000
- Quality Index: 0.84
- Training Data: July 2025
- Last Updated: November 2025
- Input Type: Text, Image
- Output Type: Text
- Provider: Anthropic
- Languages: 8 Languages