Cohere Embed 4
Version: 6
Cohere’s Embed 4 is a multilingual, multimodal embedding model. It transforms different modalities, such as images, text, and interleaved images and text, into a single vector representation. Embed 4 offers state-of-the-art performance across all modalities (text, images, and interleaved text and images) and in both English and multilingual settings.
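As a hedged illustration of this shared vector space, the sketch below embeds a text passage and an image with the same model using the Cohere Python SDK. The `ClientV2` client, the `embed-v4.0` model name, and the base64 data-URI image format follow the public Embed API; the API key placeholder and the file name are assumptions for illustration.

```python
import base64

import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")  # placeholder key

# Embed a text passage for retrieval.
text_resp = co.embed(
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float"],
    texts=["Quarterly revenue grew 12% year over year."],
)

# Embed a screenshot (e.g. a rendered PDF page or slide) into the same space.
with open("report_page.png", "rb") as f:  # assumed local file
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")

image_resp = co.embed(
    model="embed-v4.0",
    input_type="image",
    embedding_types=["float"],
    images=[data_uri],
)

# Both responses carry float vectors of the same dimensionality, so text
# queries can be scored directly against image (or interleaved) documents.
print(text_resp.embeddings)
print(image_resp.embeddings)
```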
Embed 4 supports a 128k context length, and images can contain up to 2 million pixels. Embed 4 can vectorize interleaved text and images and capture key visual features from screenshots of PDFs, slides, tables, figures, and more, eliminating the need for complex document parsing. Embed 4 also offers several compression options, covering both the number of dimensions and the numeric precision: byte and binary quantization as well as Matryoshka embeddings.
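The sketch below illustrates these compression options, again assuming the Python SDK: `embedding_types` requests byte (`int8`) and bit-packed (`binary`) quantizations alongside the full floats, and a Matryoshka-style truncation is applied client-side to the returned float vector. The 256-dimension target and the `float_` response attribute are illustrative assumptions; check the SDK version you use.

```python
import numpy as np

import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")  # placeholder key

# Ask the API for several precisions of the same embedding at once:
# full floats, byte (int8) quantization, and bit-packed binary quantization.
resp = co.embed(
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float", "int8", "binary"],
    texts=["Embed 4 captures key visual features from screenshots of PDFs."],
)

# Matryoshka embeddings: the leading dimensions carry most of the signal,
# so a float vector can be truncated and re-normalized for cheaper storage.
# The attribute name and the 256-dim target below are assumptions.
full = np.asarray(resp.embeddings.float_[0], dtype=np.float32)
truncated = full[:256]
truncated = truncated / np.linalg.norm(truncated)
```

Binary quantization stores one bit per dimension (roughly a 32x reduction over float32), int8 trades a 4x reduction for higher fidelity, and Matryoshka truncation further cuts the dimension count with modest quality loss.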
Embed-v4.0 Evaluations
The following tables showcase Embed-v4.0 evaluations against other embedding models. We break down datasets by public/academic benchmarks as well as by dataset modality.
Evaluation Datasets
Our evaluations span text-only, image-only, mixed-modality, and fused datasets.
Generic Academic Datasets
BEIR
BEIR is a standard benchmark for general-domain information retrieval. It features a monolingual setup: English queries against an English corpus. The domains are diverse, covering 18 tasks across areas such as fact-checking, biomedicine, news, and question answering. The corpora are drawn from various sources.
Model Specifications
Context Length: 131072
License: Custom
Last Updated: December 2025
Input Type: Image, Text
Output Type: Image, Text
Provider: Cohere
Languages: 10