Fireworks
Total Models: 9
FW-MiniMax-M2.5

MiniMax M2.5 is a Mixture-of-Experts model built for state-of-the-art coding, agentic tool use, and search, trained with reinforcement learning across hundreds of thousands of real-world environments.

chat-completion
FW-GLM-5

GLM 5 is a 744B-parameter Mixture-of-Experts model targeting complex systems engineering and long-horizon agentic tasks, using DeepSeek Sparse Attention for efficient long-context processing.

chat-completion
FW-GPT-OSS-120B

gpt-oss-120b is an open-weight Mixture-of-Experts model from OpenAI with 117B total parameters, designed for powerful reasoning, agentic tasks, and production-grade general-purpose use cases.

chat-completion
FW-Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base.

chat-completion
FW-DeepSeek-V3.2

DeepSeek V3.2 is a 675.2B-parameter Mixture-of-Experts model that combines high computational efficiency with superior reasoning and agent performance, supporting a 163.8K token context window.

chat-completion
FW-GLM-4.7

GLM 4.7 is a general-purpose model optimized for coding, reasoning, and agentic workflows, featuring advanced thinking controls with interleaved, preserved, and turn-level thinking modes.

chat-completion
FW-Kimi-K2-Instruct-0905

Kimi K2 Instruct is an updated Mixture-of-Experts model with 1 trillion total parameters, featuring improved coding abilities, agentic tool use, and an extended 256K token context window.

chat-completion
FW-Kimi-K2-Thinking

Kimi K2 Thinking is an open-source thinking model that reasons step-by-step while dynamically invoking tools, using native INT4 quantization to reduce inference latency and memory usage without quality loss.

chat-completion
FW-DeepSeek-V3.1

DeepSeek V3.1 is a 685B-parameter Mixture-of-Experts model supporting both thinking and non-thinking chat modes, with two-phase long-context extension up to 163K tokens.

chat-completion
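Every model above exposes the chat-completion capability. A minimal sketch of calling one of them, assuming Fireworks' OpenAI-compatible chat-completions endpoint; the exact model identifier expected by the API is an assumption here (the `FW-…` catalog names may not be the literal API IDs, so check the provider's documentation):

```python
import json
import urllib.request

# OpenAI-compatible Fireworks inference endpoint.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat-completion payload usable with any model in the catalog."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(payload: dict, api_key: str) -> dict:
    """POST the payload to the endpoint; requires a valid API key."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Model ID below is a placeholder taken from the catalog name, not a confirmed API ID.
payload = build_chat_request("FW-DeepSeek-V3.1", "Summarize MoE routing in two sentences.")
# send_chat_request(payload, api_key="YOUR_API_KEY")  # uncomment with a real key
```

The same `build_chat_request` helper works for any entry in the list, since all nine models share the chat-completion interface.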