GLM 5
Version: 1
Fireworks · Last updated April 2026
GLM 5 is a 744B-parameter Mixture-of-Experts model targeting complex systems engineering and long-horizon agentic tasks, using DeepSeek Sparse Attention for efficient long-context processing.
Coding
Agents

Fireworks on Foundry

Models available for use with Fireworks on Foundry deliver optimized, best-in-class performance on the Fireworks Inference Cloud. Fireworks on Foundry is a Preview subject to Azure Preview terms and the following supplemental terms: When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, and different compliance and data handling rules will apply. See documentation for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organization’s compliance requirements.

Key capabilities

About this model

GLM-5 is Z.ai's latest model targeting complex systems engineering and long-horizon agentic tasks. It uses a Mixture of Experts (MoE) architecture with 744 billion total parameters, activating only 40 billion per token for efficient inference. Scaling up from GLM-4.7 (355B total / 32B activated), GLM-5 was pre-trained on 28.5 trillion tokens and integrates DeepSeek Sparse Attention (DSA), which attends only to the most relevant tokens to reduce the cost of long-context processing. GLM-5 delivers significant improvements over GLM-4.7 across reasoning, coding, and agentic tasks.
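
The token-selection idea behind DSA can be shown with a minimal sketch. Note this is an illustrative simplification, not Z.ai's implementation: real DSA uses a cheap learned indexer to pick tokens without scoring every key, whereas the toy version below computes full scores and then keeps only the top-k.

```python
import numpy as np

def sparse_attention(q, K, V, k_top=4):
    """Toy single-query sparse attention: attend only to the k_top
    highest-scoring tokens instead of the whole sequence.

    Illustrative only: real DSA selects tokens with a low-cost learned
    indexer; this toy computes full scores and then keeps the top-k.
    """
    scores = K @ q / np.sqrt(q.shape[0])          # relevance of each token
    top = np.argsort(scores)[-k_top:]             # indices of the k_top tokens
    w = np.exp(scores[top] - scores[top].max())   # softmax over the subset only
    w /= w.sum()
    return w @ V[top]                             # weighted sum of selected values

rng = np.random.default_rng(0)
n, d = 16, 8
out = sparse_attention(rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d)))
print(out.shape)  # (8,)
```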

Key model capabilities

  • Mixture of Experts architecture: 744B total parameters with 40B activated per token
  • DeepSeek Sparse Attention for efficient long-context processing
  • Advanced thinking controls: interleaved, preserved, and turn-level thinking via the reasoning_history field
  • Multilingual support (English and Chinese)
  • Function calling and tool use (file search, code interpreter)
  • Streaming support (see the request sketch after this list)
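
As a concrete illustration of the streaming surface, here is a minimal request sketch against an OpenAI-compatible Chat Completions endpoint. The base URL and model identifier are placeholders, and the exact parameter shape for thinking controls such as reasoning_history is provider-specific; take the actual names from the Fireworks on Foundry documentation.

```python
# Minimal sketch of a streaming chat request against an OpenAI-compatible
# endpoint. The base_url and model id below are placeholders -- this is an
# assumed shape, not the documented Fireworks on Foundry API.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-foundry-endpoint>/v1",  # placeholder endpoint
    api_key="<your-api-key>",                       # placeholder key
)

stream = client.chat.completions.create(
    model="glm-5",  # assumed model id; take the real one from your deployment
    messages=[{"role": "user", "content": "Summarize the tradeoffs of MoE models."}],
    stream=True,
)
for chunk in stream:
    # Some providers emit keep-alive chunks with no choices; guard for both.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```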

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

  • Complex systems engineering
  • Long-horizon agentic tasks
  • Coding and multilingual software engineering
  • Reasoning and complex problem solving
  • Document generation for enterprise workloads

Out of scope use cases

The provider has not supplied this information.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See the pricing page for details.

Technical specs

The provider has not supplied this information.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

Text

Output formats

Text

Supported languages

English, Chinese

Sample JSON response

The provider has not supplied this information.

Model architecture

GLM-5 is a Mixture of Experts (MoE) language model developed by Z.ai. It uses DeepSeek Sparse Attention for efficient long-context processing.
Architecture: Mixture of Experts (MoE) with DeepSeek Sparse Attention
Total Parameters: 744B
Activated Parameters: 40B
Number of Experts: 256
Selected Experts per Token: 8
Number of Layers (dense layers included): 78
Number of Dense Layers: 3
Number of Shared Experts: 1
Number of Attention Heads: 64
Context Length: 202,752
Vocabulary Size: 154,880
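
The gap between total and activated parameters follows directly from the routing numbers above. A back-of-envelope check (illustrative arithmetic only; per-layer parameter breakdowns are not published in this card):

```python
total_params = 744e9   # total parameters (table above)
active_params = 40e9   # parameters activated per token (table above)
routed_experts = 256   # routed experts per MoE layer
selected = 8           # experts chosen per token

# Only 8 of 256 routed experts fire per token (~3.1% of routed-expert
# weights); the always-on shared expert, attention, and the 3 dense
# layers add the rest, giving ~5.4% of all weights active per token.
print(f"routed-expert fraction: {selected / routed_experts:.1%}")
print(f"overall active fraction: {active_params / total_params:.1%}")
```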

Long context

Context Length: 202,752 tokens

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

The provider has not supplied this information.
Model Specifications
Context Length: 202,752
License: Other
Last Updated: April 2026
Input Type: Text
Output Type: Text
Provider: Fireworks
Languages: 2 (English, Chinese)