FW-GLM-5.1

GLM 5.1 is zAI's next-generation flagship model for agentic engineering with 754B total parameters (40B activated), achieving state-of-the-art performance on SWE-Bench Pro and leading on NL2Repo and Terminal-Bench 2.0.

Fireworks

Version: 1

Fireworks on Foundry

Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Non-Microsoft Product. The following terms apply to a Customer's use of Fireworks on Foundry: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, Customer Data will be sent outside of Microsoft systems, Customer Data will not be processed pursuant to any Foundry data residency documentation, and different compliance and data handling rules will apply. See Trust Center - Fireworks AI for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization's compliance requirements.

Key capabilities

About this model

GLM-5.1 is Z.ai's next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks). GLM-5.1 shares the same Mixture of Experts (MoE) architecture as GLM-5 with 754 billion total parameters, activating only 40 billion per token for efficient inference. It integrates DeepSeek Sparse Attention (DSA), which selects only the most relevant tokens for attention to reduce the cost of long-context processing. Unlike previous models that plateau early, GLM-5.1 is built to stay effective on agentic tasks over much longer horizons — it breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision.

Key model capabilities

Mixture of Experts architecture: 754B total parameters with 40B activated per token
DeepSeek Sparse Attention for efficient long-context processing
Reasoning and thinking capabilities
Sustained optimization over hundreds of rounds and thousands of tool calls
Multilingual support (English and Chinese)
Function calling and tool use (file search, code interpreter)
Streaming support

Use cases

Pricing

Technical specs

Training disclosure

Distribution

More information

Quick facts

Model providerFireworks

TypeChat completion

LifecycleGenerally available (GA)

Input typetext

Output typetext

Context window202.752k

PricingView pricing

FW-GLM-5.1

About this model

Key model capabilities

Quick facts

Quick Start