Grok 3
Version: 1
xAI
Last updated May 2025
Grok 3 is xAI's debut non-reasoning model, pretrained on the Colossus supercluster at supermassive scale to excel in specialized domains like finance, healthcare, and law.
Understanding, Instruction, Summarization
Grok 3 is xAI's debut non-reasoning model, pretrained on the Colossus datacenter at supermassive scale to excel in enterprise domains like finance, healthcare, and law. It has exceptional instruction-following capabilities and is purpose-built for common business use cases like data extraction, coding, and text summarization. Grok 3 supports a 131,072-token context window, enabling it to process extensive inputs and generate responses to them while maintaining coherence and depth. Trained on a diverse dataset emphasizing high-quality, reasoning-rich content, it is particularly strong at drawing connections across domains and languages.

Model developer: xAI
Supported languages: English, Spanish, French, Afrikaans, Arabic, Bengali, Welsh, German, Greek, Indonesian, Icelandic, Italian, Japanese, Korean, Latvian, Marathi, Nepali, Punjabi, Polish, Russian, Swahili, Telugu, Thai, Turkish, Ukrainian, Urdu, and Chinese.
Model release date: May 19, 2025

Intended Use

Primary Use Cases

Grok 3 blends unparalleled intelligence with vast pretraining knowledge, honed on xAI’s Colossus supercluster. The model comes equipped with enterprise features, such as:
  • Deep domain expertise: strong world knowledge in finance, healthcare, and law.
  • Instruction-following: follows the chain of command and is less likely to refuse queries.
  • Document reasoning: can process extensive and complicated professional documents.
Grok 3 is purpose-built to be the workhorse model for the enterprise, providing a strong foundation for business workflows in any professional field.

Core Capabilities

  • Deep domain expertise: Drawing on its knowledge of finance, healthcare, law, and science, Grok 3 excels at enterprise tasks like financial forecasting, medical diagnosis support, legal document analysis, and scientific research assistance, delivering precise, domain-specific solutions.
  • Extended Context Length: With a context window of 131,072 tokens, Grok 3 can take in large documents or long, multi-step workflows in a single pass, which makes it well suited to comprehensive analysis of lengthy material.
  • Steerability & Chain of Command: Grok 3 is extremely steerable and follows instructions closely. The model is less likely to refuse queries, providing more helpful responses while maintaining safety and ethical standards.
  • Structured outputs: Grok 3 supports structured outputs, enabling developers to specify JSON schemas for AI-powered automations.
  • Functions and Tools support: Like other xAI models, Grok 3 supports functions and external tools that enable enterprises to build agentic workflows.
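
As a rough illustration of the last two capabilities, the sketch below defines a JSON schema for a data-extraction task and a small dispatcher for tool calls. It makes no call to any xAI API. The schema, the `validate_invoice` and `dispatch_tool_call` helpers, the `get_exchange_rate` tool, and the shape of the tool-call dictionary are all hypothetical, and only show the kind of contract a developer might set up on the application side.

```python
import json

# Hypothetical JSON schema a developer might supply for structured output.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total", "currency"],
}

# Map JSON-schema type names to Python types for a minimal check.
_JSON_TYPES = {"string": str, "number": (int, float), "object": dict}


def validate_invoice(raw_text, schema=INVOICE_SCHEMA):
    """Parse model output and check required keys and basic types.

    A minimal stand-in for a full JSON Schema validator.
    """
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in schema["required"]:
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema["properties"].items():
        if key in data:
            value = data[key]
            expected = _JSON_TYPES[spec["type"]]
            # bool is a subclass of int in Python, so reject booleans
            # explicitly (this schema has no boolean fields).
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f"field {key!r} is not of type {spec['type']}")
    return data


def get_exchange_rate(base, quote):
    """Hypothetical local tool; returns a fixed placeholder rate."""
    return {"base": base, "quote": quote, "rate": 1.0}


# Registry of tools the application exposes (names are assumptions).
TOOLS = {"get_exchange_rate": get_exchange_rate}


def dispatch_tool_call(call):
    """Run a tool call of the assumed form {"name": str, "arguments": JSON string}."""
    func = TOOLS.get(call["name"])
    if func is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return func(**json.loads(call["arguments"]))
```

In an agentic workflow, the application would typically validate every structured response before acting on it and feed each tool result back to the model as a follow-up message. The exact request and response formats depend on the API being used and are not specified here.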

Grok 3 Benchmark Performance Overview

To evaluate Grok 3's capabilities, xAI compared its performance against a set of models across various benchmarks using its internal benchmark platform. Below is a high-level overview of Grok 3's quality on representative benchmarks.

Benchmark Results

Category                            Benchmark          Grok 3 Score (%)
Math Competition                    AIME 2024          60.0
Graduate-Level Reasoning            GPQA               79.1
Code Generation                     LiveCodeBench      65.5
Multi-Task Language Understanding   MMLU-Pro           83.1
Factuality                          SimpleQA           44.5
Instruction Following               IFEval             91.1
Agentic Shopping                    TauBench-Retail    77.4
Agentic Flight Booking              TauBench-Airline   43.0
Average                                                68.0
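
The Average row appears to be the unweighted mean of the eight benchmark scores. A minimal check, assuming equal weighting (the source does not state how the average was computed):

```python
# Scores copied from the benchmark table above.
scores = {
    "AIME 2024": 60.0,
    "GPQA": 79.1,
    "LiveCodeBench": 65.5,
    "MMLU-Pro": 83.1,
    "SimpleQA": 44.5,
    "IFEval": 91.1,
    "TauBench-Retail": 77.4,
    "TauBench-Airline": 43.0,
}

# Unweighted mean across all benchmarks (equal weighting is an assumption).
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}")  # prints 68.0
```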

Key Highlights

  • State-of-the-Art Performance: Grok 3 achieves top-tier results among non-reasoning models on diverse academic benchmarks, including graduate-level science knowledge (GPQA), general knowledge (MMLU-Pro), and math competition problems (AIME).
  • Document Processing: Grok 3 excels at processing extensive documents and handling complex prompts while maintaining high instruction-following accuracy.
  • Factuality and Style: Grok 3 demonstrates improved factual accuracy and enhanced stylistic control.

Model Specifications

Context Length: 131,072 tokens
License: Custom
Last Updated: May 2025
Input Type: Text
Output Type: Text
Publisher: xAI
Languages: 27 languages
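
A practical consequence of the context length listed above is that long inputs need a budget check before they are sent. The sketch below is a rough pre-flight estimate only. The four-characters-per-token ratio is a common heuristic for English text, not Grok 3's actual tokenizer, and the helper names and the assumption that prompt and response share one window are illustrative.

```python
CONTEXT_WINDOW = 131_072  # tokens, from the specifications above
CHARS_PER_TOKEN = 4       # rough heuristic, NOT the model's real tokenizer


def estimate_tokens(text):
    """Estimate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(prompt, max_output_tokens=1024):
    """Return True if the estimated prompt plus reserved output fits the window.

    Assumes input and output tokens share the same window.
    """
    return estimate_tokens(prompt) + max_output_tokens <= CONTEXT_WINDOW
```

For exact counts, the provider's own tokenizer should replace the heuristic; the estimate is mainly useful for deciding early whether a document needs to be split or summarized.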