Mercury

Mercury is a diffusion LLM (dLLM) that generates text in parallel, yielding a 5-10X speed-up for coding workflows.

Inception

Version: 1

Mercury is a diffusion LLM that makes AI feel instant. The model's breakthrough diffusion-based generation process delivers responses 5-10x faster than traditional LLMs. As a small model, with a quality comparable to GPT 4.1 Nano and Claude 3.5 Haiku (as measured by Artificial Analysis), Mercury is ideal for speed-sensitive tasks.

Mercury supports a 128K context length, tool calling, structured outputs - plus a rich set of tuning options to tailor deployments to your needs.

Quick facts

Model providerInception

TypeChat completion, Completions, Text generation, Summarization

LifecycleGenerally available (GA)

Input typetext

Output typetext

PricingUnit price varies depending on your deployment type

Mercury

Quick facts

Quick start