Grok 4 Fast Non Reasoning
Version: 1
xAI
Last updated: September 2025
Grok 4 Fast is an efficiency-focused large language model developed by xAI, pre-trained on general-purpose data and post-trained on task demonstrations and tool use, with built-in safety features including refusal behaviors, a fixed system prompt enforcing xAI's safety policy, and input filters to safeguard against abuse.
Tags: Coding, Agents, Low latency
Grok 4 Fast, developed by xAI, is an efficiency-focused large language model launched on September 19, 2025. It was pre-trained on a general-purpose data corpus and post-trained on various tasks and tool use, including demonstrations of correct refusal behaviors according to xAI's default safety policy. It is deployed in the xAI API with a fixed system prompt prefix that reminds the model of the safety policy, plus input filters to safeguard against abuse. It offers reasoning capabilities near the level of Grok 4 but with much lower latency and cost, including the option to skip reasoning for the lowest-latency applications. Evaluations cover abuse potential, concerning propensities, and dual-use capabilities, all conducted on a near-final release checkpoint. It differs from larger models like Grok 4 by prioritizing speed and efficiency over maximum capability depth.

Model developer: xAI
Supported languages: English
Model release date: September 19, 2025

Intended Use

Alignment approach

Post-training alignment focused on safety, including refusals for harmful requests (e.g., CBRN or cyber weapons, self-harm, CSAM) and robustness to adversarial inputs like jailbreaks. Techniques included supervised fine-tuning on demonstrations of correct refusal behaviors, reinforcement learning for policy adherence, and system prompt injections for honesty and political objectivity. Human and automated evaluations ensured reduced deception, bias, and misuse risks, with emphasis on agentic tool-calling safeguards. Safety objectives targeted compliance with xAI's policy, preventing foreseeable harm while allowing non-malicious queries.

Usage

Primary use cases

Grok 4 Fast is designed for low-latency reasoning and tool-calling applications, excelling in conversational AI, API integrations, and agentic workflows requiring near-Grok 4 capabilities at reduced cost. It supports general-purpose tasks like query response, factual answering, and tool use (e.g., code execution, web search) on platforms like grok.com, x.com, and mobile apps. Its efficiency makes it ideal for high-throughput scenarios such as real-time chat, content generation, and lightweight automation.
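As a sketch of the agentic tool-calling workflows described above, the xAI API follows an OpenAI-compatible chat-completions convention. The model name ("grok-4-fast-non-reasoning") and the weather tool schema below are illustrative assumptions, not values specified in this model card:

```python
import json

# Sketch of an OpenAI-compatible tool-calling request body.
# Model name and tool schema are illustrative assumptions; consult
# xAI's API documentation for authoritative values.
payload = {
    "model": "grok-4-fast-non-reasoning",
    "messages": [
        {"role": "user", "content": "What's the weather in Austin right now?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize for transmission; the model decides whether to call the tool.
body = json.dumps(payload)
```

Exposing tools this way lets the model emit structured tool calls rather than free text, which the calling application executes before returning results for a final answer.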

Out-of-scope use cases

The model is not suited for high-risk, mission-critical applications without additional safeguards, such as unrestricted dual-use research (e.g., advanced CBRN planning) or unfiltered adversarial testing. It may underperform in extremely long-context tasks or non-English languages due to its generalist training. Prohibited uses include generating harmful, illegal, or disallowed content (e.g., CSAM, violent crimes), as outlined in xAI’s acceptable use policy.

Input formats

Preferred input is structured text prompts, including natural language queries or tool-use instructions. Example: "Explain the steps to solve a quadratic equation." The model expects clear, intent-explicit prompts for optimal performance, as detailed in xAI's system prompts repository.
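A prompt like the example above can be sent as a minimal chat-completions request. This is a sketch under assumptions: the endpoint URL and model name are not stated in this card, so verify both against xAI's API documentation:

```python
import json
import os
import urllib.request

# Assumed endpoint and model name for illustration only.
payload = {
    "model": "grok-4-fast-non-reasoning",
    "messages": [
        {
            "role": "user",
            "content": "Explain the steps to solve a quadratic equation.",
        }
    ],
}

api_key = os.environ.get("XAI_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        "https://api.x.ai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Keeping the prompt as a single, intent-explicit user message matches the "clear, structured text" guidance above.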

Responsible AI considerations

The model may exhibit residual risks in dual-use scenarios or adversarial prompts, requiring user verification for sensitive applications. It is optimized for English and general queries, potentially underperforming in niche or biased contexts. Risks include unintended deception or bias, mitigated by system prompts and input filters. Developers must comply with xAI’s acceptable use policy, avoiding harmful outputs. For high-risk use cases, implement robust monitoring, truthfulness instructions, and human oversight to ensure reliability.

Data overview

Training, testing, and validation datasets

The training dataset comprises a general-purpose pre-training corpus (publicly available Internet data, third-party data obtained by xAI, user/contractor data, internally generated data) with filtering for quality and safety (e.g., de-duplication, classification). Post-training used reinforcement learning (human feedback, verifiable rewards, model grading) and supervised fine-tuning on tasks, tool use, and refusal demonstrations. Testing and validation used internal benchmarks (e.g., refusal datasets, AgentHarm, MASK) and human evaluations. No public data summary is available.

Long context

The model supports a 2,000,000-token context window (see Model Specifications), accommodating general conversational and tool-use workflows. Performance is strongest in single-session reasoning, simplifying iterative queries compared to larger models.

Safety evaluation and red-teaming

Safety evaluations included automated tests and human reviews to assess abuse potential (refusals, agentic harm, hijacking), concerning propensities (deception, sycophancy, bias), and dual-use capabilities (CBRN/cyber knowledge, persuasiveness). Red-teaming focused on jailbreaks, prompt injections, and policy circumvention, with mitigations like system prompts reducing attack success rates to near-zero. Collaboration with internal teams refined safeguards for API deployment. No public details on specific risk categories beyond reported metrics were disclosed.

Grok 4 Fast Benchmark Performance Overview

Grok 4 Fast scored low on abuse metrics (e.g., 0.00 answer rate on refusals, 0.08 on AgentHarm) and propensities (e.g., 0.47 dishonesty rate on MASK, 0.10 sycophancy rate), with dual-use capabilities below Grok 4 (e.g., 85.2% on WMDP Bio, 30.0% on CyBench). Human evaluations focused on safety robustness and truthfulness, complementing benchmarks like WMDP and AgentDojo. Limitations include a higher dishonesty rate with reasoning disabled (0.63), mitigated by enabling reasoning and honesty prompts. The model's efficiency makes it well suited to latency-sensitive tasks.

Appendix

Benchmarking used standardized prompts (e.g., refusal datasets, WMDP) for fair comparison. Human evaluations supplemented quantitative metrics, focusing on safety and robustness. No prompt adaptations were allowed to ensure consistency. Further details on methodology are not publicly available.
Model Specifications

Context Length: 2,000,000 tokens
Quality Index: 0.78
License: Custom
Training Data: September 2025
Last Updated: September 2025
Input Type: Text, Image
Output Type: Text
Publisher: xAI
Languages: 1 (English)