GLM 5
Version: 1
Fireworks on Foundry
Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Preview subject to Azure Preview terms and the following supplemental terms: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, and different compliance and data handling rules will apply. See documentation for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization’s compliance requirements.
Key capabilities
About this model
GLM-5 is Z.ai's latest model targeting complex systems engineering and long-horizon agentic tasks. It uses a Mixture of Experts (MoE) architecture with 744 billion total parameters, activating only 40 billion per token for efficient inference. Scaling up from GLM-4.7 (355B/32B), GLM-5 was pre-trained on 28.5 trillion tokens and integrates DeepSeek Sparse Attention (DSA), which selects only the most relevant tokens for attention to reduce the cost of long-context processing. GLM-5 delivers significant improvements over GLM-4.7 across reasoning, coding, and agentic tasks.
Key model capabilities
- Mixture of Experts architecture: 744B total parameters with 40B activated per token
- DeepSeek Sparse Attention for efficient long-context processing
- Advanced thinking controls: interleaved, preserved, and turn-level thinking via the `reasoning_history` field
- Multilingual support (English and Chinese)
- Function calling and tool use (file search, code interpreter)
- Streaming support
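The capabilities above map onto a standard OpenAI-compatible chat completions call. The sketch below is a minimal example assuming an OpenAI-compatible endpoint; the base URL, environment variable names, and model identifier are placeholders, not official values, and should be taken from your Fireworks on Foundry deployment in Azure AI Foundry.

```python
# Minimal sketch of calling GLM-5 through an OpenAI-compatible endpoint.
# The base_url, api_key source, and model id below are assumptions --
# use the real values from your Fireworks on Foundry deployment.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["FOUNDRY_ENDPOINT"],  # assumed env var, not official
    api_key=os.environ["FOUNDRY_API_KEY"],    # assumed env var, not official
)

# Streaming chat completion (streaming support is listed above).
stream = client.chat.completions.create(
    model="glm-5",  # placeholder model id; check your deployment's model name
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Outline a plan to refactor a large service."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Function calling and the `reasoning_history` thinking controls are passed through the same request; their exact parameter names and schemas should be taken from the provider's documentation rather than this sketch.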
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
- Complex systems engineering
- Long-horizon agentic tasks
- Coding and multilingual software engineering
- Reasoning and complex problem solving
- Document generation for enterprise workloads
Out of scope use cases
The provider has not supplied this information.
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
Technical specs
The provider has not supplied this information.
Training cut-off date
The provider has not supplied this information.
Training time
The provider has not supplied this information.
Input formats
Text
Output formats
Text
Supported languages
English, Chinese
Sample JSON response
The provider has not supplied this information.
Model architecture
GLM-5 is a Mixture of Experts (MoE) language model developed by Z.ai. It uses DeepSeek Sparse Attention for efficient long-context processing.

| Property | Value |
|---|---|
| Architecture | Mixture of Experts (MoE) with DeepSeek Sparse Attention |
| Total Parameters | 744B |
| Activated Parameters | 40B |
| Number of Experts | 256 |
| Selected Experts per Token | 8 |
| Number of Layers (Dense layer included) | 78 |
| Number of Dense Layers | 3 |
| Number of Shared Experts | 1 |
| Number of Attention Heads | 64 |
| Context Length | 202,752 |
| Vocabulary Size | 154,880 |
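To make the parameter counts concrete: with 256 routed experts, 8 selected per token, and one shared expert that always runs, only about 40B of the 744B parameters (roughly 5%) participate in each forward step. The sketch below illustrates generic top-k expert routing in that shape; it is a toy example, not Z.ai's implementation, and the tensor sizes are made up purely to show the selection pattern.

```python
# Toy illustration of top-k Mixture-of-Experts routing in the shape GLM-5
# reports (256 routed experts, top-8 per token, 1 shared expert).
# Dimensions and the random router are for illustration only.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 256   # routed experts (from the table above)
TOP_K = 8           # experts selected per token
D_MODEL = 64        # toy hidden size, not the real model width

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# One toy token representation and a random router matrix.
token = rng.normal(size=(D_MODEL,))
router_weights = rng.normal(size=(D_MODEL, NUM_EXPERTS))

# Router scores -> pick the 8 highest-scoring experts for this token.
scores = softmax(token @ router_weights)
top_idx = np.argsort(scores)[-TOP_K:]
top_gate = scores[top_idx] / scores[top_idx].sum()  # renormalize gate weights

# Toy routed experts and one shared expert (random linear layers).
experts = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_MODEL)) * 0.02
shared_expert = rng.normal(size=(D_MODEL, D_MODEL)) * 0.02

# Only the selected experts run; the remaining routed experts are skipped,
# which is how a 744B-parameter model activates only ~40B per token.
routed_out = sum(g * (token @ experts[i]) for g, i in zip(top_gate, top_idx))
output = routed_out + token @ shared_expert

print("selected experts:", sorted(top_idx.tolist()))
print("fraction of routed experts used:", TOP_K / NUM_EXPERTS)
```

In the real model this routing happens inside each MoE layer and sits alongside DeepSeek Sparse Attention on the attention side; neither detail is reflected in this toy sketch.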
Long context
Context Length: 202,752 tokens
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
The provider has not supplied this information.
Distribution channels
The provider has not supplied this information.
More information
The provider has not supplied this information.
Model Specifications
- Context Length: 202,752
- License: Other
- Last Updated: April 2026
- Input Type: Text
- Output Type: Text
- Provider: Fireworks
- Languages: 2 (English, Chinese)