Mercury

Mercury

Mercury is a diffusion LLM (dLLM) that generates text in parallel, yielding a 5-10X speed-up for coding workflows.
Inception
Version: 1

Mercury is a diffusion LLM that makes AI feel instant. The model's breakthrough diffusion-based generation process delivers responses 5-10x faster than traditional LLMs. As a small model, with a quality comparable to GPT 4.1 Nano and Claude 3.5 Haiku (as measured by Artificial Analysis), Mercury is ideal for speed-sensitive tasks.

Mercury supports a 128K context length, tool calling, structured outputs - plus a rich set of tuning options to tailor deployments to your needs.

Quick facts

Model providerInception
TypeChat completion, Completions, Text generation, Summarization
LifecycleGenerally available (GA)
Input typetext
Output typetext
PricingUnit price varies depending on your deployment type