Llama-3.2-NV-embedqa-1b-v2-NIM-microservice
Version: 2
NVIDIA NeMo™ Retriever Llama3.2 embedding model is optimized for multilingual and cross-lingual text question-answering retrieval with support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka Embeddings). This model was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
In addition to enabling multilingual and cross-lingual question-answering retrieval, this model reduces the data storage footprint by 35x through dynamic embedding sizing and support for longer token lengths, making it feasible to handle large-scale datasets efficiently. This model is ready for commercial use.
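To illustrate the dynamic embedding size, the snippet below requests a truncated embedding from a locally deployed NIM. This is a minimal sketch that assumes the microservice exposes its OpenAI-compatible /v1/embeddings endpoint on localhost:8000 and accepts `dimensions` and `input_type` request fields; adjust the URL, model name, and parameters to match your deployment.

```python
import requests

# Assumed local NIM endpoint and model name; adjust to your deployment.
NIM_URL = "http://localhost:8000/v1/embeddings"

payload = {
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["How do Matryoshka embeddings reduce storage costs?"],
    "input_type": "query",   # use "passage" when embedding documents
    "dimensions": 384,       # request a truncated (Matryoshka) embedding
}

response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()

embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # expected: 384
```

Truncated vectors (for example, 384 dimensions instead of 2048) occupy proportionally less index storage while remaining usable for retrieval, which is the mechanism behind the reduced storage footprint.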
The Llama 3.2 1B embedding model is part of the NVIDIA NeMo Retriever collection of NIM microservices, included in NVIDIA AI Enterprise, which provides state-of-the-art, commercially ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for domain-specific use cases, such as information technology, human resources help assistants, and research & development assistants.
NVIDIA AI Enterprise
NVIDIA AI Enterprise is an end-to-end, cloud-native software platform that accelerates data science pipelines and streamlines development and deployment of production-grade co-pilots and other generative AI applications. Easy-to-use microservices provide optimized model performance with enterprise-grade security, support, and stability to ensure a smooth transition from prototype to production for enterprises that run their businesses on AI.
Intended Use
Primary Use Cases
The NeMo Retriever Llama3.2 embedding model is most suitable for users who want to build a multilingual question-and-answer application over a large text corpus, leveraging the latest dense retrieval technologies.
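As a sketch of such an application, the example below embeds a small multilingual corpus and a query through the NIM and ranks passages by cosine similarity. The endpoint URL, model name, and `input_type` values are assumptions about a typical local deployment, not a prescribed production pipeline.

```python
import numpy as np
import requests

NIM_URL = "http://localhost:8000/v1/embeddings"  # assumed local deployment
MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"      # assumed model name

def embed(texts, input_type):
    """Embed a list of texts as either 'query' or 'passage' vectors."""
    resp = requests.post(
        NIM_URL,
        json={"model": MODEL, "input": texts, "input_type": input_type},
        timeout=60,
    )
    resp.raise_for_status()
    return np.array([item["embedding"] for item in resp.json()["data"]])

# A tiny multilingual "corpus" (English, Spanish, and German passages).
passages = [
    "NVIDIA NeMo Retriever provides embedding and reranking microservices.",
    "La torre Eiffel se encuentra en París.",
    "Der Eiffelturm wurde 1889 fertiggestellt.",
]
query = "Where is the Eiffel Tower located?"

passage_vecs = embed(passages, "passage")
query_vec = embed([query], "query")[0]

# Rank passages by cosine similarity to the query.
scores = passage_vecs @ query_vec / (
    np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(query_vec)
)
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {passages[idx]}")
```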
Responsible AI Considerations
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Training Data
The development of large-scale public open-QA datasets has enabled tremendous progress in powerful embedding models. However, one popular dataset, MS MARCO, restricts commercial licensing, limiting the use of these models in commercial settings. To address this, NVIDIA created its own training dataset blend based on public QA datasets, each of which carries a license permitting commercial applications. Training consisted of semi-supervised pre-training on 12M samples from public datasets, followed by fine-tuning on 1M samples from public datasets.

We evaluated the NeMo Retriever embedding model against open and commercial retriever models from the literature on academic question-answering benchmarks: NQ, HotpotQA, and FiQA (Finance Q&A) from the BeIR benchmark, plus the TechQA dataset. Note that the model was evaluated offline on A100 GPUs using the model's PyTorch checkpoint. In this benchmark, the metric used was Recall@5.
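For reference, Recall@5 is the fraction of each query's relevant documents that appear among the top 5 retrieved results, averaged over all queries. The sketch below is a generic implementation of the metric, not NVIDIA's evaluation harness.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Average, over queries, of the fraction of relevant documents
    that appear in the top-k retrieved results.

    retrieved: dict mapping query id -> ranked list of document ids
    relevant:  dict mapping query id -> set of relevant document ids
    """
    per_query = []
    for qid, ranked in retrieved.items():
        gold = relevant[qid]
        if not gold:
            continue
        hits = len(set(ranked[:k]) & gold)
        per_query.append(hits / len(gold))
    return sum(per_query) / len(per_query)

# Toy example: q1 recovers 1 of its 2 relevant docs (0.5), q2 recovers its
# single relevant doc (1.0); the average Recall@5 is 0.75.
retrieved = {"q1": ["d3", "d7", "d1", "d9", "d4"], "q2": ["d2", "d8"]}
relevant = {"q1": {"d1", "d5"}, "q2": {"d2"}}
print(recall_at_k(retrieved, relevant, k=5))  # 0.75
```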
We evaluated the multilingual capabilities on the academic benchmark MIRACL across 15 languages and translated the English and Spanish versions of MIRACL into 11 additional languages. The reported scores are based on an internal version of MIRACL in which hard negatives were selected for each query to reduce the corpus size.
We evaluated the cross-lingual capabilities on the academic benchmark MLQA across 7 languages (Arabic, Chinese, English, German, Hindi, Spanish, and Vietnamese). We consider only the evaluation sets in which the query and the documents are in different languages, and report the average Recall@5 across the 42 resulting language pairs (7 × 6 ordered source-target combinations).
The evaluation datasets are based on MTEB/BEIR, TextQA, TechQA, MIRACL, MLQA, and MLDR. Corpus sizes range from tens of thousands of documents up to 5M, depending on the dataset.
Evaluation on the English question-answering benchmarks (NQ, HotpotQA, FiQA, and TechQA), average Recall@5:
| Model | Average Recall@5 |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 68.60% |
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 64.48% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 68.97% |
| nv-embedqa-mistral-7b-v2 | 72.97% |
| nv-embedqa-mistral-7B-v1 | 64.93% |
| nv-embedqa-e5-v5 | 62.07% |
| nv-embedqa-e5-v4 | 57.65% |
| e5-large-unsupervised | 48.03% |
| BM25 | 44.67% |
Multilingual evaluation on MIRACL, average Recall@5:

| Model | Average Recall@5 |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 60.75% |
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 58.62% |
| llama-3.2-nv-embedqa-1b-v1 | 60.07% |
| nv-embedqa-mistral-7b-v2 | 50.42% |
| BM25 | 26.51% |
Cross-lingual evaluation on MLQA, average Recall@5:

| Model | Average Recall@5 |
|---|---|
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 79.86% |
| llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 71.61% |
| llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 78.77% |
| nv-embedqa-mistral-7b-v2 | 68.38% |
| BM25 | 13.01% |
Llama 3.2 NVEmbedQA 1B-v2 NIM is optimized to run best on the following compute:

| GPU | Total GPU memory (GB) | Azure VM compute | # GPUs on VM |
|---|---|---|---|
| A100 | 80 | Standard_NC24ads_A100_v4 | 1 |
| A100 | 160 | Standard_NC48ads_A100_v4 | 2 |
| A100 | 320 | Standard_NC96ads_A100_v4 | 4 |
| A100 | 640 | Standard_ND96amsr_A100_v4 | 8 |
Model Specifications
License: Custom
Last Updated: March 2025
Publisher: NVIDIA