Claude Opus 4.1

Version: 20250805

Anthropic • Last updated November 2025

Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.

Reasoning

Models from Partners and Community

These models constitute the vast majority of the Azure AI Foundry Models and are provided by trusted third-party organizations, partners, research labs, and community contributors. They offer specialized and diverse AI capabilities, covering a wide array of scenarios, industries, and innovations. One example is the Claude family of state-of-the-art large language models developed by Anthropic, which supports text and image input, text output, multilingual capabilities, and vision. See Anthropic's privacy policy to learn more about privacy. Learn how to deploy Anthropic models.

Characteristics of Models from Partners and Community:

Developed and supported by external partners and community contributors.
Diverse range of specialized models catering to niche or broad use cases.
Typically validated by providers themselves, with integration guidelines provided by Azure.
Community-driven innovation and rapid availability of cutting-edge models.
Standard Azure AI integration, with support and maintenance managed by the respective providers.

Models from Partners and Community can be deployed through Managed Compute or serverless API deployment options. The model provider selects how each model can be deployed.

Key capabilities

About this model

Key model capabilities

Extended thinking: Extended thinking gives Claude enhanced reasoning capabilities for complex tasks.
Image & text input: With state-of-the-art vision capabilities, Claude Opus 4.1 can process images and return text outputs to analyze and understand charts, graphs, technical diagrams, reports, and other visual assets.
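
As a rough illustration of the two capabilities above, the sketch below builds a request body that enables extended thinking and attaches one image. It assumes the Anthropic Messages API request shape (a `thinking` object with `budget_tokens`, and base64 `image` content blocks). The helper name `build_request`, the default token values, and the placeholder image bytes are hypothetical, and the model or deployment name used in an Azure AI Foundry deployment may differ.

```python
import base64
import json


def build_request(prompt: str, image_bytes: bytes, media_type: str = "image/png",
                  max_tokens: int = 4096, thinking_budget: int = 2048) -> dict:
    """Build a Messages-style request body with extended thinking and one image.

    Hypothetical helper: field names follow the Anthropic Messages API as an
    assumption. Nothing is sent over the network here.
    """
    # Assumption: the thinking budget has to be smaller than max_tokens.
    if thinking_budget >= max_tokens:
        raise ValueError("thinking_budget must be smaller than max_tokens")
    return {
        "model": "claude-opus-4-1-20250805",  # model ID taken from this page
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


if __name__ == "__main__":
    body = build_request("Summarize the trend in this chart.", b"placeholder-image-bytes")
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be serialized as JSON and posted to whichever endpoint the deployment exposes; authentication and the endpoint URL are left out on purpose because they depend on the deployment.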

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

Claude Opus 4.1 is an industry leader for coding and agent capabilities, especially agentic search. It excels for customers needing frontier intelligence:

Advanced coding: Independently plan and execute complex development tasks end-to-end. It adapts to your style, thoughtfully plans and pivots, and maintains high code quality throughout.
Long-horizon tasks and complex problem solving (virtual collaborator): Unlock new use cases involving long-horizon tasks that require memory, sustained reasoning, and long chains of actions.
AI agents: Enable agents to tackle complex, multi-step tasks that require peak accuracy.
Agentic search and research: Connect to multiple data sources to synthesize comprehensive insights across repositories.
Content creation: Create human-quality content with natural prose. Produce long-form creative content, technical documentation, marketing copy, and frontend design mockups.
Memory and context management: Incorporates memory capabilities that allow it to effectively summarize and reference previous interactions.

Out of scope use cases

Please refer to the Claude Opus 4.1 system card.

Pricing

Pricing is based on a number of factors. See pricing details here.

Technical specs

Please refer to the Claude Opus 4.1 system card.

Training cut-off date

March 2025

Input formats

Image & text input: With powerful vision capabilities, Claude Opus 4.1 can process images and return text outputs to analyze and understand charts, graphs, technical diagrams, reports, and other visual assets.
Text output: Claude Opus 4.1 can output text of a variety of types and formats, such as prose, lists, Markdown tables, JSON, HTML, code in various programming languages, and more.

Supported languages

Claude Opus 4.1 can understand and output a wide variety of languages, such as French, Standard Arabic, Mandarin Chinese, Japanese, Korean, Spanish, and Hindi. Performance will vary based on how well-resourced the language is.

Sample JSON response

200:
{
  "content": [
    {
      "text": "Hi! My name is Claude.",
      "type": "text"
    }
  ],
  "id": "msg_313Zva2CMHLNnXjNJJKqJ2EH",
  "model": "claude-opus-4-1-20250805",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "type": "message",
  "usage": {
    "input_tokens": 31,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": { "ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 0 },
    "output_tokens": 25,
    "service_tier": "standard"
  }
}

4XX:
{
  "error": {
    "message": "Invalid request",
    "type": "invalid_request_error"
  },
  "request_id": "<string>",
  "type": "error"
}
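
The two payload shapes above can be handled with a small amount of client code. The following is a minimal sketch, assuming the response has already been decoded into a Python dictionary. The function names `extract_text` and `total_tokens` are hypothetical and not part of any SDK.

```python
def extract_text(payload: dict) -> str:
    """Return the concatenated text blocks of a success payload.

    Raises RuntimeError when the payload has the error shape
    ({"type": "error", "error": {...}}) shown in the 4XX sample.
    """
    if payload.get("type") == "error":
        err = payload.get("error", {})
        raise RuntimeError(f"{err.get('type', 'unknown_error')}: {err.get('message', '')}")
    # Keep only blocks whose type is "text"; other block types are skipped.
    return "".join(
        block["text"]
        for block in payload.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(payload: dict) -> int:
    """Sum input and output tokens from the usage object (0 if absent)."""
    usage = payload.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


if __name__ == "__main__":
    ok = {
        "content": [{"text": "Hi! My name is Claude.", "type": "text"}],
        "type": "message",
        "usage": {"input_tokens": 31, "output_tokens": 25},
    }
    print(extract_text(ok))
    print(total_tokens(ok))
```

Checking the top-level `type` field first keeps the success path free of error handling, and callers can catch `RuntimeError` to log the `request_id` if they need it.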

Model architecture

Please refer to the Claude Opus 4.1 system card.

Long context

Claude Opus 4.1 has a 200K token context window.
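
A simple budgeting check can help when planning long prompts. The sketch below assumes that "200K" means 200,000 tokens and that both the prompt and the requested output count against the window. Both points are assumptions made for illustration, and the helper names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # assumption: "200K" read as 200,000 tokens


def fits_context(input_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if prompt plus requested output fits in the window."""
    if input_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return input_tokens + max_output_tokens <= window


def max_output_budget(input_tokens: int,
                      window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Largest output budget that still fits after the prompt (never negative)."""
    return max(0, window - input_tokens)


if __name__ == "__main__":
    print(fits_context(150_000, 50_000))
    print(fits_context(190_000, 16_000))
    print(max_output_budget(190_000))
```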

Optimizing model performance

Please refer to the Claude Opus 4.1 system card.

Additional assets

Claude Documentation: Visit Anthropic's Claude documentation for a wealth of resources on model capabilities, prompting techniques, use case guidelines, and more.
Extended Thinking Guide: Understand how best to use extended thinking with Claude.
Claude Prompting Resources: Check out Anthropic's prompting tools and guides to learn how to craft prompts that elicit more helpful, nuanced responses.
Claude Cookbooks: Check out example code for a variety of complex tasks, such as RAG from various web sources, making SQL queries, function calling, multimodal prompting, and more.

Distribution channels

Claude API: For developers interested in building agents, Opus 4.1 is available on the Claude Developer Platform.
Claude Code: Use Opus 4.1 with Anthropic's industry-leading coding agent, Claude Code.

More information

Data handling

By default, we may process customer data in select countries in the US, Europe, Asia and Australia. We will only store data in data centers located in the United States. For more on data handling and retention, see our Privacy Center.
By default, we will not use your inputs or outputs from our commercial products (Anthropic API and Claude Code Enterprise) to train our models. If you explicitly report feedback or bugs to us or otherwise choose to allow us to use your data, then we may use your chats and coding sessions to train our models.
To find out more information regarding your use of an Anthropic commercial offering, or if you would like to know how to contact us regarding a privacy related topic, see our Trust Center and Commercial Terms.

Responsible AI considerations

Safety techniques

The Claude Opus 4.1 system card describes in detail the evaluations Anthropic ran to assess the model's safety and alignment.

Safety evaluations

Claude Opus 4.1 represents incremental improvements over Claude Opus 4, with enhancements in reasoning quality, instruction-following, and overall performance. The Claude Opus 4.1 system card includes details of safety evaluations, including safeguards, agentic safety, alignment and welfare assessments, and reward hacking. The Claude Opus 4 system card includes details of a wide range of pre-deployment safety tests conducted in line with the commitments in our Responsible Scaling Policy; tests of the model's behavior around violations of our Usage Policy; evaluations of specific risks such as “reward hacking” behavior; and agentic safety evaluations for computer use and coding capabilities. In addition, it includes a detailed alignment assessment covering a wide range of misalignment risks identified in our research, and a model welfare assessment.

Known limitations

Please refer to the Claude Opus 4.1 system card and the Claude Opus 4 system card.

Acceptable use

Acceptable use policy

Anthropic's Usage Policy is intended to help our users stay safe and promote the responsible use of our products and services.

Quality and performance evaluations

Benchmark	Test Name	Opus 4.1 Score
Agentic coding	SWE-bench Verified	74.5% / 79.4% with parallel test-time compute
Agentic terminal coding	Terminal-bench	46.5%
Agentic tool use	τ²-bench	Retail 86.8%, Airline 63.0%, Telecom 71.5%
Computer use	OSWorld	44.4%
High school math competition	AIME 2025	78.0%
Graduate-level reasoning	GPQA Diamond	81.0%
Multilingual Q&A	MMMLU	89.5%
Visual reasoning	MMMU (validation)	77.1%
Financial analysis	Finance Agent	50.9%

Benchmarking methodology

Claude models are hybrid reasoning models. The benchmarks reported here show the highest scores achieved with or without extended thinking. For each result, we note below whether extended thinking was used:

No extended thinking: SWE-bench Verified, Terminal-bench
Extended thinking (up to 64K tokens): TAU-bench, GPQA Diamond, MMMLU, MMMU, and AIME

TAU-bench methodology: Scores were achieved with a prompt addendum to both the Airline and Retail Agent Policy instructing Claude to better leverage its reasoning abilities while using extended thinking with tool use. The model is encouraged to write down its thoughts as it solves the problem, distinct from our usual thinking mode, during the multi-turn trajectories. To accommodate the additional steps Claude incurs by utilizing more thinking, the maximum number of steps (counted by model completions) was increased from 30 to 100 (most trajectories completed in under 30 steps, with only one trajectory exceeding 50 steps).

SWE-bench methodology: For the Claude 4 family of models, we continue to use the same simple scaffold that equips the model with solely the two tools described in our prior releases: a bash tool and a file editing tool that operates via string replacements. We no longer include the third "planning tool" used by Claude 3.7 Sonnet. On all Claude 4 models, we report scores out of the full 500 problems.
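
To make the idea of a file editing tool that operates via string replacements concrete, here is a minimal sketch. It is not Anthropic's scaffold. The function name `str_replace_edit` and the rule that the target text must occur exactly once are assumptions chosen for illustration.

```python
def str_replace_edit(source: str, old: str, new: str) -> str:
    """Replace one exact occurrence of `old` in `source` with `new`.

    Assumption for this sketch: the edit is rejected unless `old` appears
    exactly once, so an ambiguous edit cannot silently hit the wrong spot.
    """
    if not old:
        raise ValueError("old text must be non-empty")
    count = source.count(old)
    if count == 0:
        raise ValueError("old text not found")
    if count > 1:
        raise ValueError(f"old text is ambiguous ({count} occurrences)")
    return source.replace(old, new, 1)


if __name__ == "__main__":
    code = "def add(a, b):\n    return a - b\n"
    fixed = str_replace_edit(code, "return a - b", "return a + b")
    print(fixed)
```

An edit expressed this way carries its own context: the caller has to quote enough surrounding text to make the match unique, which is why no line numbers are needed.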