Grok 4.3
Version: 1
xAI
Last updated: May 2026
Grok 4.3 is the latest model from xAI, with advanced reasoning, productivity, and multi-agent capabilities, enabling it to achieve state-of-the-art performance across challenging academic and industry benchmarks.
Low latency
Agents
Multimodal

Direct from Azure models

Direct from Azure models are a select portfolio curated for their market-differentiated capabilities:
  • Secure and managed by Microsoft: Purchase and manage models directly through Azure with a single license, consistent support, and no third-party dependencies, backed by Azure's enterprise-grade infrastructure.
  • Streamlined operations: Benefit from unified billing, governance, and seamless PTU portability across models hosted on Azure, all part of Microsoft Foundry.
  • Future-ready flexibility: Access the latest models as they become available, and easily test, deploy, or switch between them within Microsoft Foundry, reducing integration effort.
  • Cost control and optimization: Scale on demand with pay-as-you-go flexibility or reserve PTUs for predictable performance and savings.
Learn more about Direct from Azure models.

Key capabilities

Key model capabilities

Grok 4.3 provides a broad set of capabilities centered on reasoning, long-context understanding, and agentic execution. It features always-on reasoning that enables the model to handle complex, multi-step tasks with improved accuracy and instruction following. The model supports a large context window, allowing it to process extensive documents, conversations, or datasets in a single interaction. It also includes native function calling and tool integration, enabling workflows that combine reasoning with actions such as web search, code execution, and data retrieval. In addition, Grok 4.3 supports multimodal inputs—including text, images, and video—and can produce structured outputs, making it well suited for building reliable, production-grade systems that require automation, data analysis, and end-to-end task orchestration.
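The function-calling workflow described above can be sketched as a plain-data round trip. The `tools` schema and message shapes below follow the common chat-completions convention; they are assumptions for illustration, not the documented Grok 4.3 API, and the tool itself (`get_stock_price`) is a mocked placeholder.

```python
import json

# Hypothetical tool schema in the common chat-completions format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # hypothetical tool name
        "description": "Look up the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

def run_tool(name: str, arguments: str) -> str:
    """Dispatch a tool call requested by the model (mocked here)."""
    args = json.loads(arguments)
    if name == "get_stock_price":
        return json.dumps({"ticker": args["ticker"], "price": 123.45})  # placeholder data
    raise ValueError(f"unknown tool: {name}")

# A model reply requesting a tool call (shape assumed, values illustrative).
assistant_turn = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_stock_price",
                     "arguments": json.dumps({"ticker": "MSFT"})},
    }],
}

# Execute each requested call and append the results for the next turn.
messages = [{"role": "user", "content": "What is MSFT trading at?"}, assistant_turn]
for call in assistant_turn["tool_calls"]:
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": run_tool(call["function"]["name"], call["function"]["arguments"]),
    })
```

The model then receives the appended `tool` messages and composes its final answer from the returned data.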

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

Grok 4.3 is a general-purpose model intended as a helpful, maximally truth-seeking AI assistant for a wide range of everyday and professional tasks, including answering questions, reasoning, research, writing, coding, translation, creative ideation, and multimodal reasoning over text and images.

Out of scope use cases

It is not designed or intended for high-risk applications (e.g., autonomous decision-making in medicine, law, finance, or safety-critical systems) without appropriate human oversight and domain-expert validation.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
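Because pricing scales with tokens used, a back-of-envelope estimator can help compare pay-as-you-go costs across workloads. The per-token rates below are placeholders, not Grok 4.3's actual prices; substitute the rates from the Azure pricing page.

```python
# Hypothetical per-1K-token rates for illustration only.
INPUT_RATE_PER_1K = 0.002   # USD per 1,000 input tokens (placeholder)
OUTPUT_RATE_PER_1K = 0.010  # USD per 1,000 output tokens (placeholder)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the placeholder rates."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_RATE_PER_1K

# e.g. a 10,000-token prompt producing a 1,000-token answer:
cost = estimate_cost(10_000, 1_000)
```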

Technical specs

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

Preferred input is structured text prompts, including natural language queries or tool-use instructions. The model also supports multimodal inputs such as images. Example: "Search for the latest stock prices and summarize." Clear, intent-explicit prompts yield the best performance.
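A mixed text-and-image request might be structured as below, using the widely adopted chat-completions "content parts" shape. The exact schema accepted by a Grok 4.3 deployment may differ, and the model identifier and image URL are assumptions; check the API reference for your endpoint.

```python
import json

# Illustrative multimodal message: one text part and one image part.
message = {
    "role": "user",
    "content": [
        {"type": "text",
         "text": "Search for the latest stock prices and summarize."},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/chart.png"}},  # placeholder URL
    ],
}

# Serialized request body (model id "grok-4.3" is an assumption).
body = json.dumps({"model": "grok-4.3", "messages": [message]})
```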

Output formats

The provider has not supplied this information.

Supported languages

English (primary), with multilingual support including Spanish, Chinese, Japanese, Arabic, Russian.

Sample JSON response

The provider has not supplied this information.
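Since the provider supplies no sample, the following is a purely illustrative response body in the common OpenAI-compatible chat-completions format; field names and values are assumptions and may not match the actual Grok 4.3 API.

```python
import json

# Illustrative response shape only; not from provider documentation.
sample_response = {
    "id": "chatcmpl-123",   # placeholder id
    "object": "chat.completion",
    "model": "grok-4.3",    # model id assumed
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello! How can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 8, "total_tokens": 17},
}

print(json.dumps(sample_response, indent=2))
```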

Model architecture

The provider has not supplied this information.

Long context

Context length of 256,000 tokens, supporting extensive conversational histories, document analysis, and agentic tool-use workflows in a single session.
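Before sending a large document, it can be useful to estimate whether it fits the 256,000-token window. The 4-characters-per-token heuristic below is a common rule of thumb for English text, not a property of Grok 4.3's tokenizer; use a real tokenizer for precise counts.

```python
CONTEXT_WINDOW = 256_000   # tokens, per the model's stated context length
CHARS_PER_TOKEN = 4        # rough heuristic for English text (assumption)

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """Estimate whether `text` plus an output budget fits the context window."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

# ~1,000,000 characters is roughly 250k tokens, just inside the window
# once the default output budget is reserved.
```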

Optimizing model performance

Use the default model for instant responses; enable the reasoning variant (grok-4-20-reasoning) when deeper analysis is needed. Leverage parallel tool calling for efficiency in agent setups.
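One way to exploit parallel tool calling on the client side is to fan out the model's requested calls concurrently instead of running them one by one. The sketch below uses threads for the dispatch; the tools themselves are mocked placeholders, not real integrations.

```python
from concurrent.futures import ThreadPoolExecutor

def web_search(query: str) -> str:        # mocked tool
    return f"results for {query}"

def fetch_price(ticker: str) -> str:      # mocked tool
    return f"{ticker}: 123.45"

TOOL_REGISTRY = {"web_search": web_search, "fetch_price": fetch_price}

def dispatch_parallel(calls):
    """Run each (tool_name, argument) pair in its own thread and
    return results in request order."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(TOOL_REGISTRY[name], arg) for name, arg in calls]
        return [f.result() for f in futures]

results = dispatch_parallel([("web_search", "xAI news"), ("fetch_price", "MSFT")])
```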

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The training dataset comprises a general-purpose pre-training corpus (publicly available Internet data, third-party data acquired for xAI, user/contractor data, and internally generated data), filtered for quality and safety (e.g., de-duplication, classification). Specialized post-training emphasized reinforcement learning for tool calling, reduced hallucinations, speed optimization, and alignment. Testing focused on agentic benchmarks, tool-use accuracy, latency, and safety evaluations. No public data summary is available.

Distribution

Distribution channels

The provider has not supplied this information.

More information

Microsoft's safety and responsible AI evaluations found Grok 4.3 to be less aligned than other models evaluated and offered through Direct from Azure, resulting in (i) higher risk that the model will produce potentially harmful content and (ii) lower scores on safety and jailbreak benchmarks. To improve safety and enterprise reliability, Microsoft's deployment of Grok 4.3 features a system-applied safety prompt that cannot be disabled. Customers are expected to operate the model without attempting to bypass or interfere with this feature. Grok 4.3 may be capable of producing explicit content, and may do so with a higher propensity than other models. Customers should use both system safety messages and the Azure AI Content Safety (AACS) service to manage model behavior and comply with the Microsoft Enterprise AI Services Code of Conduct. Additionally, there may be categories of harm this model can produce that are not covered by Azure AI Content Safety. Accordingly, customers should conduct their own evaluations before deploying Grok 4.3 in production systems.

Responsible AI considerations

Safety techniques

Post-training alignment included refusals for harmful requests (e.g., CBRN, cyber weapons, self-harm, CSAM) and robustness to adversarial inputs. Techniques featured supervised fine-tuning on refusal demonstrations, reinforcement learning for policy adherence, and system prompt safeguards for honesty and reduced misuse. Input filters block abuse attempts.
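The input-filtering pattern mentioned above can be illustrated with a minimal pre-screening gate. This toy keyword filter is emphatically not xAI's actual mechanism; real deployments should rely on a trained classifier or the Azure AI Content Safety service rather than string matching.

```python
# Illustrative blocklist only; a real filter uses trained classifiers.
BLOCKED_MARKERS = {"build a bioweapon", "make a bomb"}

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the (toy) pre-screening filter."""
    lowered = prompt.lower()
    return not any(marker in lowered for marker in BLOCKED_MARKERS)
```

A gate like this would run before the request is forwarded to the model, with blocked prompts receiving a refusal instead.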

Safety evaluations

Evaluations assessed abuse potential, concerning propensities (deception, sycophancy, bias), and dual-use risks using internal benchmarks and human reviews. Mitigations reduced attack success rates significantly. Detailed metrics from near-final checkpoints are available in xAI publications.

Known limitations

The model prioritizes speed over depth, so it may not handle highly complex multi-step reasoning as effectively as the reasoning variant. Residual risks exist in adversarial or dual-use scenarios; user verification is recommended for sensitive outputs. Optimized primarily for English/general queries; performance may vary in niche contexts. Not suited for high-risk applications without safeguards.

Acceptable use

Acceptable use policy

Developers must comply with xAI's acceptable use policy, avoiding harmful outputs. For high-risk use cases, implement monitoring, truthfulness checks, and human oversight. Prohibited uses include generating harmful, illegal, or disallowed content (e.g., CSAM, violent crimes), as outlined in xAI's acceptable use policy.

Quality and performance evaluations

Source: xAI. During model development, xAI evaluated Grok 4.3's safety profile throughout training, in both single-agent and multi-agent settings, each with access to xAI's internal tools. Results are reported on the final deployed checkpoint unless otherwise stated.
xAI releases Grok 4.3 with safeguards appropriate for its capability threshold, such as refusal training; further implementation details of these safeguards are available in Section 2. With these safeguards in place, xAI assesses that Grok 4.3 does not pose significantly more risk than prior generations of models.
View the xAI model card here.

Benchmarking methodology

Source: xAI. Benchmarking used standardized prompts and agentic evaluations for fair comparison, focusing on latency, accuracy, tool-calling success, and safety. Human and automated testing supplemented the metrics. Further details on methodology are not publicly available.
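Aggregates of the kind this methodology reports (tool-calling success rate, latency percentiles) can be computed from run logs as sketched below. The log format and the numbers are invented for illustration only.

```python
# Invented benchmark log: one entry per evaluated run.
runs = [
    {"tool_call_ok": True,  "latency_ms": 420},
    {"tool_call_ok": True,  "latency_ms": 510},
    {"tool_call_ok": False, "latency_ms": 980},
    {"tool_call_ok": True,  "latency_ms": 460},
]

def percentile(values, p):
    """Nearest-rank percentile (no interpolation, deterministic)."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

success_rate = sum(r["tool_call_ok"] for r in runs) / len(runs)
latencies = [r["latency_ms"] for r in runs]
p50, p90 = percentile(latencies, 50), percentile(latencies, 90)
```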

Public data summary

Source: xAI. The provider has not supplied this information.
Model Specifications
Context Length: 256,000 tokens
License: Custom
Training Data: Up to September 2025
Last Updated: May 2026
Input Types: Text, Image
Output Type: Text
Provider: xAI
Languages: English (primary), with multilingual support