Grok 3
Version: 1
Grok 3 is xAI's debut non-reasoning model, pre-trained on the Colossus datacenter at supermassive scale to excel in enterprise domains like finance, healthcare, and legal. It has exceptional instruction-following capabilities and is purpose-built for common business use cases like data extraction, coding, and text summarization.
Grok 3 supports a 131,072 token context window, enabling it to process and generate responses for extensive inputs while maintaining coherence and depth. Trained on a diverse dataset emphasizing high-quality, reasoning-rich content, it is particularly strong at drawing connections across domains and languages.
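As a rough illustration of budgeting against the 131,072-token window, the sketch below estimates whether a prompt leaves room for a response. The 4-characters-per-token ratio, the reserved output budget, and the helper names are assumptions made for illustration; they are not xAI's tokenizer or API, and the sketch assumes the window is shared between input and output.

```python
CONTEXT_WINDOW_TOKENS = 131_072  # context window stated in this model card
CHARS_PER_TOKEN = 4              # assumption: crude heuristic, not xAI's tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """Return True if the estimated prompt size plus an output budget fits.

    Assumes input and output tokens share the same window; the default
    output budget is a hypothetical value.
    """
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

For production use, a real token count from the serving platform would be more reliable than a character-based estimate.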
Model developer: xAI
Supported languages: English, Spanish, French, Afrikaans, Arabic, Bengali, Welsh, German, Greek, Indonesian, Icelandic, Italian, Japanese, Korean, Latvian, Marathi, Nepali, Punjabi, Polish, Russian, Swahili, Telugu, Thai, Turkish, Ukrainian, Urdu, and Chinese.
Model Release Date: May 19, 2025
Intended Use
Primary Use Cases
Grok 3 blends unparalleled intelligence with vast pretraining knowledge, honed on xAI’s Colossus supercluster. The model comes equipped with enterprise features, such as:
- Deep domain expertise: strong world knowledge in finance, healthcare, and law.
- Instruction-following: follows chain of command and is less likely to refuse queries.
- Document reasoning: can process extensive and complicated professional documents.
Core Capabilities
- Deep domain expertise: With deep domain expertise in finance, healthcare, law and science, Grok 3 excels at enterprise tasks like financial forecasting, medical diagnosis support, legal document analysis, and scientific research assistance—delivering precise, domain-specific solutions.
- Extended Context Length: With an extended context length of up to 16k tokens (131K coming soon), Grok 3 processes and understands vast datasets in a single pass—ideal for comprehensive analysis of large documents or complex workflows.
- Steerability & Chain of Command: Grok 3 is extremely steerable and follows instructions closely. The model is less likely to refuse queries, providing more helpful responses while maintaining safety and ethical standards.
- Structured outputs: Grok 3 supports structured outputs, enabling developers to specify JSON schemas for AI-powered automations.
- Functions and Tools support: Like other xAI models, Grok 3 supports functions and external tools that enable enterprises to build agentic workflows.
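The structured-output and tool-support items above can be sketched on the client side as follows. This is a minimal illustration, not xAI's API: the invoice schema, the `validate` helper (which covers only a small subset of JSON Schema), the `word_count` tool, and the `{"name": ..., "arguments": ...}` tool-call shape are all hypothetical, and no request is sent to any model.

```python
import json

# Hypothetical schema for a data-extraction task (not taken from the model card).
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total", "currency"],
}

_PY_TYPES = {"string": str, "number": (int, float), "object": dict}


def validate(instance, schema) -> list[str]:
    """Check a small subset of JSON Schema: type, properties, required."""
    expected = _PY_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(instance, bool) or not isinstance(instance, expected):
        return [f"expected {schema['type']}, got {type(instance).__name__}"]
    errors = []
    if schema["type"] == "object":
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required field: {key}")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {e}" for e in validate(instance[key], sub))
    return errors


# Hypothetical tool registry: maps a tool name to a local Python callable.
TOOLS = {"word_count": lambda text: len(text.split())}


def dispatch(tool_call_json: str):
    """Run a tool call shaped like {"name": ..., "arguments": {...}} (assumed shape)."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])
```

Validating a model response against the schema before using it, and routing tool calls through an explicit registry, keeps an automation from acting on malformed output or unknown tool names.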
Grok 3 Benchmark Performance Overview
To evaluate Grok 3's capabilities, xAI compared its performance against a set of models across various benchmarks using their internal benchmark platform. Below is a high-level overview of Grok 3's quality on representative benchmarks.
Benchmark Results
Category | Benchmark | Grok 3 Score (%) |
---|---|---|
Math Competition | AIME 2024 | 60.0 |
Graduate-Level Reasoning | GPQA | 79.1 |
Code Generation | LiveCodeBench | 65.5 |
Multi-Task Language Understanding | MMLU-Pro | 83.1 |
Factuality | SimpleQA | 44.5 |
Instruction Following | IFEval | 91.1 |
Agentic Shopping | TauBench-Retail | 77.4 |
Agentic Flight Booking | TauBench-Airline | 43.0 |
Average | | 68.0 |
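The Average row is consistent with an unweighted mean of the eight scores in the table. The check below assumes that is how the figure was computed, which the source does not state explicitly.

```python
# Scores copied from the benchmark table above.
scores = {
    "AIME 2024": 60.0,
    "GPQA": 79.1,
    "LiveCodeBench": 65.5,
    "MMLU-Pro": 83.1,
    "SimpleQA": 44.5,
    "IFEval": 91.1,
    "TauBench-Retail": 77.4,
    "TauBench-Airline": 43.0,
}

# Unweighted mean (assumption: every benchmark counts equally).
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}")  # prints 68.0
```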
Key Highlights
- State-of-the-Art Performance: Grok 3 achieves top-tier results among non-reasoning models on diverse academic benchmarks, including graduate-level science knowledge (GPQA), general knowledge (MMLU-Pro), and math competition problems (AIME).
- Document Processing: Grok 3 excels at processing extensive documents and handling complex prompts while maintaining high instruction-following accuracy.
- Factuality and Style: Grok 3 demonstrates improved factual accuracy and enhanced stylistic control.
Model Specifications
Context Length: 131072
License: Custom
Last Updated: May 2025
Input Type: Text
Output Type: Text
Publisher: xAI
Languages: 27 languages