FW-GLM-5

FW-GLM-5

GLM 5 is a 744B-parameter Mixture-of-Experts model targeting complex systems engineering and long-horizon agentic tasks, using DeepSeek Sparse Attention for efficient long-context processing.
Fireworks
Version: 1
Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Preview subject to Azure Preview terms and the following supplemental terms: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, and different compliance and data handling rules will apply. See documentation for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization’s compliance requirements.

About this model

GLM-5 is Z.ai's latest model targeting complex systems engineering and long-horizon agentic tasks. It uses a Mixture of Experts (MoE) architecture with 744 billion total parameters, activating only 40 billion per token for efficient inference. Scaling up from GLM-4.7 (355B/32B), GLM-5 was pre-trained on 28.5 trillion tokens and integrates DeepSeek Sparse Attention (DSA), which selects only the most relevant tokens for attention to reduce the cost of long-context processing. GLM-5 delivers significant improvements over GLM-4.7 across reasoning, coding, and agentic tasks.

Key model capabilities

  • Mixture of Experts architecture: 744B total parameters with 40B activated per token
  • DeepSeek Sparse Attention for efficient long-context processing
  • Advanced thinking controls: interleaved, preserved, and turn-level thinking via the reasoning_history field
  • Multilingual support (English and Chinese)
  • Function calling and tool use (file search, code interpreter)
  • Streaming support

Quick facts

Model providerFireworks
TypeChat completion
LifecycleGenerally available (GA)
Input typetext
Output typetext
Context window202.752k