Cohere Command R 08-2024
Version: 1
Cohere · Last updated August 2025
Command R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.
RAG
Multilingual

Models from Microsoft, Partners, and Community

Models from Microsoft, Partners, and Community are a select portfolio of curated models, both general-purpose and niche, spanning diverse scenarios and developed by Microsoft teams, partners, and community contributors.
  • Managed by Microsoft: Purchase and manage models directly through Azure with a single license, world-class support, and enterprise-grade Azure infrastructure.
  • Validated by providers: Each model is validated and maintained by its respective provider, with Azure offering integration and deployment guidance.
  • Innovation and agility: Combines Microsoft research models with rapid, community-driven advancements.
  • Seamless Azure integration: Standard Azure AI Foundry experience, with support managed by the model provider.
  • Flexible deployment: Deployable as Managed Compute or Serverless API, based on provider preference.
Learn more about models from Microsoft, Partners, and Community

Key capabilities

About this model

Command R 08-2024 is a highly performant generative large language model, optimized for a variety of use cases including reasoning, summarization, and question answering.

Key model capabilities

Command R 08-2024 has been specifically trained with conversational tool use capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will likely reduce performance, but we encourage experimentation. Command R's tool use functionality takes a conversation as input (with an optional user-supplied system preamble), along with a list of available tools. The model then generates a JSON-formatted list of actions to execute on a subset of those tools. Command R may use one of its supplied tools more than once. The model has been trained to recognize a special directly_answer tool, which it uses to indicate that it doesn't want to use any of its other tools. The ability to abstain from calling a specific tool can be useful in a range of situations, such as greeting a user or asking clarifying questions. We recommend including the directly_answer tool, but it can be removed or renamed if required.

Command R 08-2024 has also been specifically trained with grounded generation capabilities. This means it can generate responses based on a list of supplied document snippets, and it will include grounding spans (citations) in its response indicating the source of the information. This can be used to enable behaviors such as grounded summarization and the final step of Retrieval Augmented Generation (RAG). This behavior has been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but we encourage experimentation. Command R's grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble indicating task, context, and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs; the keys should be short descriptive strings, and the values can be text or semi-structured. By default, Command R generates grounded responses by first predicting which documents are relevant, then predicting which ones it will cite, then generating an answer, and finally inserting grounding spans into the answer. See below for an example. This is referred to as accurate grounded generation. The model is trained with a number of other answering modes, which can be selected by prompt changes. A fast citation mode is supported in the tokenizer, which directly generates an answer with grounding spans in it, without first writing the answer out in full. This sacrifices some grounding accuracy in favor of generating fewer tokens.

Command R 08-2024 has been optimized to interact with your code by requesting code snippets, code explanations, or code rewrites. It might not perform well out of the box for pure code completion. For better performance, we also recommend using a low temperature (and even greedy decoding) for code-generation-related instructions.
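As a concrete illustration of the grounded-generation flow described above, here is a minimal sketch of passing document snippets alongside a user message. The endpoint URL, the /v1/chat route, the environment-variable names, and the example documents are assumptions modeled on Cohere's documented Chat API rather than a definitive recipe; check your deployment's API reference for the exact request shape.

```python
# Minimal sketch of grounded generation with document snippets.
# Endpoint, route, and field names are assumptions based on Cohere's Chat API;
# verify against your Azure deployment's API reference before relying on them.
import os
import requests

ENDPOINT = os.environ["AZURE_COMMAND_R_ENDPOINT"]  # placeholder, e.g. https://<deployment>.<region>.models.ai.azure.com
API_KEY = os.environ["AZURE_COMMAND_R_KEY"]        # placeholder

# Document snippets: short chunks (roughly 100-400 words) expressed as
# key-value pairs with short, descriptive keys.
documents = [
    {"title": "PTO policy", "snippet": "Full-time employees accrue 20 days of paid time off per year."},
    {"title": "Holiday schedule", "snippet": "The office is closed on national holidays; see the HR portal for the full list."},
]

payload = {
    "message": "How many vacation days do full-time employees get?",
    "documents": documents,
}

response = requests.post(
    f"{ENDPOINT}/v1/chat",  # assumed Cohere-native route; your deployment may differ
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
result = response.json()

# With accurate grounded generation, the answer carries citation spans that
# point back into the supplied snippets.
print(result.get("text"))
print(result.get("citations"))
```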
Structured Outputs ensures that outputs from Cohere's Command R 08-2024 model adhere to a user-defined response format. It supports JSON response format, including user-defined JSON schemas. This enables developers to reliably and consistently generate model outputs for programmatic usage and reliable function calls. Some examples include extracting data, formulating queries, and displaying model outputs in the UI.
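The conversational tool-use behavior described at the top of this section can be exercised through the Azure AI Inference client. The sketch below is a hedged example rather than the definitive integration: the tool definition uses the generic JSON-schema function format that the client exposes, the endpoint and key variables are placeholders, and the get_weather tool is hypothetical. When calling through Cohere's native API instead, tools are declared with Cohere's own parameter_definitions format, including the special directly_answer tool mentioned above.

```python
# Hedged sketch of conversational tool use via the azure-ai-inference client.
# Tool name, endpoint, and key variables are placeholders for illustration.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ChatCompletionsToolDefinition,
    FunctionDefinition,
    SystemMessage,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_COMMAND_R_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COMMAND_R_KEY"]),
)

# Hypothetical tool the model may choose to call.
weather_tool = ChatCompletionsToolDefinition(
    function=FunctionDefinition(
        name="get_weather",
        description="Look up the current weather for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    )
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant with access to tools."),
        UserMessage(content="What's the weather in Toronto right now?"),
    ],
    tools=[weather_tool],
)

message = response.choices[0].message
if message.tool_calls:
    # The model requested one or more tool invocations.
    for call in message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    # The model abstained from tool use and answered directly.
    print(message.content)
```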

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

Command R 08-2024 is a highly performant generative large language model, optimized for a variety of use cases including reasoning, summarization, and question answering.

Out of scope use cases

Prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.

Training cut-off date

The provider has not supplied this information.

Training time

The provider has not supplied this information.

Input formats

The provider has not supplied this information.

Output formats

Structured Outputs ensures that outputs from Cohere's Command R 08-2024 model adhere to a user-defined response format. It supports JSON response format, including user-defined JSON schemas. This enables developers to reliably and consistently generate model outputs for programmatic usage and reliable function calls. Some examples include extracting data, formulating queries, and displaying model outputs in the UI.
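As a hedged illustration of JSON-mode output over a serverless deployment's chat-completions route: the endpoint, headers, and the exact response_format payload accepted by a given API version are assumptions (schema-constrained json_schema variants may or may not be exposed), so treat this as a sketch to validate against the current API reference.

```python
# Hedged sketch of requesting JSON-formatted output from the model.
# The endpoint and the response_format shape accepted by your API version
# are assumptions; consult the deployment's reference for JSON-schema support.
import json
import os
import requests

ENDPOINT = os.environ["AZURE_COMMAND_R_ENDPOINT"]  # placeholder
API_KEY = os.environ["AZURE_COMMAND_R_KEY"]        # placeholder

payload = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Extract the product name and price from this sentence as JSON "
                'with keys "product" and "price_usd": '
                "The new standing desk retails for $499."
            ),
        }
    ],
    # Basic JSON mode; user-defined schemas may be exposed as a json_schema
    # response format depending on the API version in use.
    "response_format": {"type": "json_object"},
    "temperature": 0.2,
}

resp = requests.post(
    f"{ENDPOINT}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
content = resp.json()["choices"][0]["message"]["content"]
print(json.loads(content))  # parsed, programmatically usable output
```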

Supported languages

The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. Pre-training data additionally included the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, Persian.

Sample JSON response

The provider has not supplied this information.

Model architecture

This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.

Long context

The provider has not supplied this information.

Optimizing model performance

For better performance, we also recommend using a low temperature (and even greedy decoding) for code-generation related instructions.
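A small sketch of that recommendation using the azure-ai-inference client, with temperature pinned at zero for close-to-greedy decoding; the endpoint and key environment variables are placeholders and the prompt is only illustrative.

```python
# Low-temperature (near-greedy) decoding for a code-rewrite instruction.
# Endpoint and key variable names are placeholders for illustration.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_COMMAND_R_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COMMAND_R_KEY"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a careful coding assistant."),
        UserMessage(
            content=(
                "Rewrite this Python loop as a list comprehension:\n"
                "result = []\n"
                "for x in values:\n"
                "    if x > 0:\n"
                "        result.append(x * 2)"
            )
        ),
    ],
    temperature=0.0,  # deterministic, near-greedy decoding for code edits
)
print(response.choices[0].message.content)
```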

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

The provider has not supplied this information.

Distribution

Distribution channels

The provider has not supplied this information.

More information

Responsible AI considerations

Safety techniques

After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. Prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety. Configuration options for content filtering vary when you deploy a model for production in Azure AI; learn more.

Safety evaluations

The provider has not supplied this information.

Known limitations

The provider has not supplied this information.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: Cohere. The provider has not supplied this information.

Benchmarking methodology

Source: Cohere. The provider has not supplied this information.

Public data summary

Source: Cohere. The provider has not supplied this information.
Model Specifications

  • Context Length: 131,072
  • License: Custom
  • Last Updated: August 2025
  • Input Type: Text
  • Output Type: Text
  • Provider: Cohere
  • Languages: 10 languages