FW-GLM-5.1
GLM 5.1 is zAI's next-generation flagship model for agentic engineering with 754B total parameters (40B activated), achieving state-of-the-art performance on SWE-Bench Pro and leading on NL2Repo and Terminal-Bench 2.0.
Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Non-Microsoft Product. The following terms apply to a Customer's use of Fireworks on Foundry: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, Customer Data will be sent outside of Microsoft systems, Customer Data will not be processed pursuant to any Foundry data residency documentation, and different compliance and data handling rules will apply. See Trust Center - Fireworks AI for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization's compliance requirements.
About this model
GLM-5.1 is Z.ai's next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks). GLM-5.1 shares the same Mixture of Experts (MoE) architecture as GLM-5 with 754 billion total parameters, activating only 40 billion per token for efficient inference. It integrates DeepSeek Sparse Attention (DSA), which selects only the most relevant tokens for attention to reduce the cost of long-context processing. Unlike previous models that plateau early, GLM-5.1 is built to stay effective on agentic tasks over much longer horizons — it breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision.Key model capabilities
- Mixture of Experts architecture: 754B total parameters with 40B activated per token
- DeepSeek Sparse Attention for efficient long-context processing
- Reasoning and thinking capabilities
- Sustained optimization over hundreds of rounds and thousands of tool calls
- Multilingual support (English and Chinese)
- Function calling and tool use (file search, code interpreter)
- Streaming support
Quick facts
Model providerFireworks
TypeChat completion
LifecycleGenerally available (GA)
Input typetext
Output typetext
Context window202.752k
PricingView pricing