Cohere Embed 4
Version: 4
Cohere’s Embed 4 is a multilingual, multimodal embedding model. It can transform different modalities, such as images, text, and interleaved images and text, into a single vector representation. Embed 4 offers state-of-the-art performance across all modalities (text, images, and interleaved text and images) and in both English and multilingual settings.
Embed 4 supports a 128k context length, and images can have a maximum of 2 million pixels. Embed 4 can vectorize interleaved text and images and capture key visual features from screenshots of PDFs, slides, tables, figures, and more, thereby eliminating the need for complex document parsing. Embed 4 offers compression along two axes: the number of dimensions and the number-format precision. The model supports byte and binary quantization as well as Matryoshka embeddings for further compression.
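The two compression axes can be illustrated with a minimal sketch. This is not Cohere's implementation or API; the function names are purely illustrative, and the toy 8-dimensional vector stands in for a real embedding.

```python
# Illustrative sketch (not the Cohere API): how the two compression
# axes described above combine on a float embedding.

def matryoshka_truncate(embedding, dims):
    """Keep only the first `dims` dimensions of a Matryoshka-style
    embedding, where leading dimensions carry the most information."""
    return embedding[:dims]

def binary_quantize(embedding):
    """Map each float dimension to a single bit by sign (1 if >= 0),
    shrinking storage per dimension from 32 bits to 1."""
    return [1 if x >= 0 else 0 for x in embedding]

# A toy 8-dimensional vector standing in for a real 1536-d embedding.
vec = [0.12, -0.40, 0.33, 0.05, -0.07, 0.91, -0.22, 0.48]

short = matryoshka_truncate(vec, 4)  # fewer dimensions
bits = binary_quantize(short)        # lower precision per dimension
print(short)  # [0.12, -0.4, 0.33, 0.05]
print(bits)   # [1, 0, 1, 1]
```

Truncation and quantization compose, which is why the model can trade retrieval quality for index size along both axes at once.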
Embed-v4.0 Evaluations
The following tables showcase Embed-v4.0 evaluations against other embedding models. We break down datasets into public/academic benchmarks as well as by dataset modality.

Evaluation Datasets:

Our evaluations range across text-only, image-only, mixed-modality, and fused datasets.

Generic Academic Datasets
BEIR
BEIR is a standard benchmark dataset for general-domain information retrieval. It features the following monolingual setup: English queries to an English corpus. The domain is diverse, covering 18 tasks across areas such as fact-checking, biomedical, news, and question answering. The corpora are drawn from various sources including Wikipedia, scientific articles, and web forums. The queries are a mix of natural user queries, questions, and information needs, depending on the dataset. All metrics are NDCG@10; the Differential column reports each model's average relative to Embed V4's.

Model | Number of Dimensions | nfcorpus | scifact | arguana | scidocs | fiqa | trec-covid | webis-touche2020 | quora | nq | dbpedia-entity | hotpotqa | fever | climate-fever | Average | Differential |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Open AI Text Embedding Large | 1536 | 42.07 | 77.77 | 57.99 | 23.07 | 55.00 | 79.56 | 23.36 | 89.05 | 61.27 | 44.76 | 71.58 | 87.94 | 30.28 | 57.21 | -0.36 |
Cohere - Embed V3 | 1024 | 38.43 | 72.55 | 56.83 | 20.27 | 42.17 | 79.09 | 32.40 | 86.40 | 61.60 | 43.40 | 70.70 | 89.00 | 25.80 | 55.28 | -2.28 |
Cohere - Embed V4 | 1536 | 40.25 | 77.10 | 57.32 | 20.97 | 54.50 | 69.36 | 33.67 | 89.22 | 68.46 | 46.53 | 73.50 | 84.63 | 32.84 | 57.56 | 0.00 |
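All of the scores above are NDCG@10, which rewards placing highly relevant documents near the top of the first ten results. A self-contained sketch of the metric (standard definition, not tied to any particular benchmark harness):

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: graded relevance of each of the
    top-k results, discounted by log2 of its (1-based) rank + 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the actual ranking divided by the DCG of the
    ideal (relevance-sorted) ranking; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of retrieved documents, in retrieved order:
# one relevant document (grade 1) is ranked too low at position 4.
ranking = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(ranking), 4))  # 0.9854 - close to ideal
```

A perfect ranking scores 1.0, so the table values (e.g. 57.56) can be read as percentages of the ideal ordering.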
MIRACL
MIRACL is a standard multilingual information retrieval dataset. It features the following monolingual setups: queries in each language to that language's Wikipedia corpus. The domain is encyclopedic knowledge, with the corpus sourced from Wikipedia in each supported language. The queries are crowdsourced, modeled after real user search intents, and are available in both the original languages and English translations for cross-lingual evaluation. We benchmarked on a subset of MIRACL, focusing on the most popular languages and only in a monolingual setting. All metrics are NDCG@10.

Model | Number of Dimensions | Arabic | German | Spanish | French | Hindi | Japanese | Korean | Russian | Chinese | Average | Differential |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ColQwen | Multivector | 74.2 | 67.4 | 78.2 | 67.3 | 65.8 | 66.0 | 73.8 | 68.5 | 69.7 | 70.09 | -10.72 |
GME (2bn) | 1536 | 78.9 | 74.0 | 80.9 | 72.0 | 64.6 | 72.6 | 73.3 | 75.7 | 78.8 | 74.54 | -6.27 |
GME (7bn) | 3584 | 83.2 | 79.9 | 84.5 | 77.8 | 72.1 | 77.2 | 76.3 | 81.4 | 80.6 | 79.21 | -1.59 |
Open AI Text Embedding Large | 1536 | 83.6 | 81.6 | 86.0 | 80.0 | 65.2 | 79.7 | 74.6 | 82.9 | 81.7 | 79.48 | -1.33 |
Cohere - Embed V3 | 1024 | 86.0 | 80.3 | 84.0 | 76.9 | 77.7 | 77.3 | 79.9 | 80.6 | 80.1 | 80.30 | -0.51 |
Cohere - Embed V4 | 1024 | 86.1 | 81.5 | 86.5 | 78.3 | 75.5 | 78.4 | 78.3 | 81.2 | 81.5 | 80.81 | 0.00 |
NeuCLIR
NeuCLIR is a standard cross-lingual information retrieval dataset. It features the following cross-lingual setups:
- English queries to a Chinese corpus
- English queries to a Russian corpus
- English queries to a Farsi corpus
The domain is news articles, with corpora sourced from multilingual news sources. The queries were created by human annotators to reflect realistic information needs across languages. All metrics are NDCG@10.
Model | Number of Dimensions | English Queries / Farsi Corpus | English Queries / Russian Corpus | English Queries / Chinese Corpus | Average | Differential |
---|---|---|---|---|---|---|
ColQwen | Multivector | 26.7 | 33.6 | 32.7 | 30.98 | -16.33 |
GME (2bn) | 1536 | 43.4 | 41.0 | 40.2 | 41.51 | -5.80 |
GME (7bn) | 3584 | 45.1 | 47.4 | 45.3 | 45.94 | -1.37 |
Open AI Text Embedding Large | 1536 | 43.7 | 48.4 | 42.1 | 44.75 | -2.56 |
Cohere - Embed V3 | 1024 | 45.7 | 47.4 | 42.0 | 45.04 | -2.27 |
Cohere - Embed V4 | 1024 | 48.6 | 49.6 | 43.7 | 47.31 | 0.00 |
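To make the Average and Differential columns concrete: Differential is a model's average minus Embed V4's average (hence 0.00 for Embed V4 itself). Recomputing it from the rounded per-language scores shown above reproduces the published values to within rounding, since the published averages use unrounded scores:

```python
# Recompute GME (7bn)'s NeuCLIR Differential from the table above.
# The published Average (45.94) is computed from unrounded scores, so
# recomputing from the displayed one-decimal values lands within
# ~0.01 of the published -1.37.
gme_7b = [45.1, 47.4, 45.3]        # Farsi, Russian, Chinese columns
embed_v4_avg = 47.31               # Embed V4's Average column

avg = sum(gme_7b) / len(gme_7b)    # ~45.93
differential = avg - embed_v4_avg  # ~-1.38 vs the published -1.37
print(round(avg, 2), round(differential, 2))
```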
ViDoRe Benchmark v2
ViDoRe Benchmark v2 is a comprehensive evaluation suite for visual document retrieval systems. It features multilingual and multimodal setups. The benchmark spans diverse domains, including biomedical research, economics, and environmental, social, and governance (ESG) reports. The corpora are sourced from publicly available documents such as academic papers, government reports, and industry publications. Queries are generated through a hybrid approach of synthetic generation and human-in-the-loop refinement, ensuring they reflect realistic and complex information needs.
- Axa Insurance contains Axa Group insurance policy documents in French; the queries are in English, French, German, and Spanish
- MIT Biomedical contains MIT anatomy course lecture slides in English; the queries are in English, French, German, and Spanish
- RSE Restaurant contains ESG reports from companies in the fast casual / quick serve industry in English; the queries are in English, French, German, and Spanish
- Synthetic Macro contains world economic reports in English; the queries are in English, French, German, and Spanish
Model | Number of Dimensions | Axa Insurance | MIT Biomedical | RSE Restaurant | Synthetic Macro | Average | Differential |
---|---|---|---|---|---|---|---|
ColQwen | Multivector | 52.0 | 53.6 | 46.7 | 48.5 | 50.20 | -5.91 |
GME (2bn) | 1536 | 59.6 | 54.3 | 54.4 | 54.0 | 55.59 | -0.51 |
GME (7bn) | 3584 | 59.9 | 53.3 | 47.4 | 53.7 | 53.58 | -2.53 |
Cohere - Embed V3 | 1024 | 52.7 | 55.3 | 54.5 | 50.1 | 53.15 | -2.96 |
Cohere - Embed V4 | 1536 | 63.9 | 58.3 | 53.3 | 48.9 | 56.11 | 0.00 |
Industry-Specific Datasets:
MPMQA
MPMQA is a specialized information retrieval dataset focused on product manuals. It features the following monolingual setup: English queries to an English product manual corpus. The domain is technical support and product documentation, with the corpus sourced from real-world manuals across various consumer electronics and appliances. The queries are natural language questions derived from customer support scenarios, reflecting realistic user information needs. All metrics are NDCG@10.

Model | Number of Dimensions | MPMQA | Differential |
---|---|---|---|
ColQwen | Multivector | 67.1 | -6.48 |
GME (2bn) | 1536 | 55.9 | -17.69 |
GME (7bn) | 3584 | 53.5 | -20.06 |
Open AI Text Embedding Large | 1536 | 58.5 | -15.07 |
Cohere - Embed V3 | 1024 | 56.7 | -16.90 |
Cohere - Embed V4 | 1024 | 73.6 | 0.00 |
BioASQ
BioASQ is a standard biomedical information retrieval dataset. It features the following monolingual setup: English queries to an English biomedical corpus. The domain is biomedical research, with the corpus sourced from PubMed abstracts. The queries are expert-annotated, derived from real biomedical questions. All metrics are NDCG@10.

Model | Number of Dimensions | BioASQ | Differential |
---|---|---|---|
ColQwen | Multivector | 72.8 | 2.46 |
GME (2bn) | 1536 | 46.4 | -23.99 |
GME (7bn) | 3584 | 49.7 | -20.66 |
Open AI Text Embedding Large | 1536 | 61.1 | -9.24 |
Cohere - Embed V3 | 1024 | 44.5 | -25.89 |
Cohere - Embed V4 | 1024 | 70.4 | 0.00 |
FinanceBench
FinanceBench is a benchmark dataset designed for evaluating large language models (LLMs) in financial question answering. It features the following monolingual setup: English queries to English financial documents (SEC filings such as 10-Ks, 10-Qs, and 8-Ks). The domain encompasses financial data, with the corpus containing information about publicly traded companies. The dataset comprises 10,231 questions that are ecologically valid, covering a diverse set of scenarios related to financial question answering. These questions are intended to be clear-cut and straightforward to answer, serving as a minimum performance standard for LLMs. All metrics are NDCG@10.

Model | Number of Dimensions | FinanceBench |
---|---|---|
ColQwen | Multivector | 14.4 |
GME (2bn) | 1536 | 25.0 |
GME (7bn) | 3584 | 37.1 |
Open AI Text Embedding Large | 1536 | 34.1 |
Cohere - Embed V3 | 1024 | 32.8 |
Cohere - Embed V4 | 1536 | 61.4 |
Model Specifications
Context Length: 131072
License: Custom
Last Updated: May 2025
Input Type: Image, Text
Output Type: Image, Text
Publisher: Cohere
Languages: 10 Languages
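As a hedged illustration of the Image and Text input types above, the sketch below assembles a request payload for a multimodal embedding call. The model name, `input_type`, `embedding_types`, and the `inputs`/`content` structure are assumptions modeled on Cohere's v2 embed API and are not verified here; consult the current Cohere API reference before relying on this schema.

```python
import base64
import json

# Hypothetical payload for embedding interleaved text + image with
# Embed 4. Field names below are assumptions, not a verified schema.
def build_embed_payload(text: str, image_bytes: bytes,
                        mime: str = "image/png") -> str:
    # Images are commonly sent as base64 data URIs.
    data_uri = f"data:{mime};base64," + \
        base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": "embed-v4.0",
        "input_type": "search_document",  # documents being indexed
        "embedding_types": ["float"],
        "inputs": [
            {
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": data_uri}},
                ]
            }
        ],
    }
    return json.dumps(payload)

# Example: a slide screenshot plus its caption in one input.
body = build_embed_payload("Q3 revenue slide", b"\x89PNG...", "image/png")
```

Because text and image travel in one `content` list, both modalities land in a single vector, matching the interleaved behavior described at the top of this page.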