Deepseek-R1-Distill-Llama-8B-NIM-microservice
Version: 1
DeepSeek AI has developed a range of distilled models based on the Llama and Qwen architectures, with sizes spanning from 1.5 to 70 billion parameters, all starting from the foundation of DeepSeek-R1. This distillation process trains smaller models to replicate the behavior and reasoning of the larger 671 billion parameter DeepSeek-R1 model, effectively transferring its knowledge into more compact forms.
The resulting models, including DeepSeek-R1-Distill-Llama-8B (derived from Llama-3.1-8B) and DeepSeek-R1-Distill-Llama-70B (from Llama-3.3-70B-Instruct), offer varying balances between performance and resource usage. While these distilled models may exhibit slightly reduced reasoning capabilities compared to the original 671B model, they significantly enhance inference speed and lower computational costs. For example, smaller models like the 8B version process requests more quickly and use fewer resources, making them more cost-effective for production use. This NIM efficiently deploys the distilled Llama 3.1 8B variant of DeepSeek R1 models on NVIDIA GPUs.
DeepSeek R1 Distill Llama 3.1 8B is available as an NVIDIA NIM™ microservice, part of NVIDIA AI Enterprise. NVIDIA NIM offers prebuilt containers for large language models (LLMs) that can be used to develop chatbots, content analyzers, or any application that needs to understand and generate human language. Each NIM consists of a container and a model and uses a CUDA-accelerated runtime for all NVIDIA GPUs, with special optimizations available for many configurations.
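A typical NIM deployment pulls the container from the NGC registry and runs it locally with GPU access. The sketch below follows the general NIM launch pattern; the exact image path, tag, and cache location are assumptions, so check this model's NGC catalog entry for the authoritative values.

```shell
# Authenticate with the NGC registry (API key from ngc.nvidia.com).
export NGC_API_KEY=<your-ngc-api-key>
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"

# Launch the microservice; image path/tag below are assumed, verify on NGC.
docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b:latest
```

Once the container reports it is ready, the service exposes an OpenAI-compatible HTTP API on port 8000 of the host.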
Intended Use
Primary Use Cases
DeepSeek R1 Distill Llama 3.1 8B NIM can be used for text-to-text conversation, reasoning, and agentic tool calling. DeepSeek-R1 distilled models retain much of the original model's reasoning capability, making them suitable for tasks that require logical reasoning, such as math, code generation, and complex problem-solving. Additional use cases for this NIM include:
- Orchestrator for agentic flows
- Chatbots & Virtual Assistants: Empower bots with human-like language understanding and responsiveness.
- Content Generation & Summarization: Generate high-quality content or distill lengthy articles into concise summaries with ease.
- Sentiment Analysis: Understand user sentiments in real-time, driving better business decisions.
- Language Translation: Break language barriers with efficient and accurate translation services.
And many more… The potential applications of NVIDIA NIM for LLMs are vast, spanning across various industries and use-cases.
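For any of the use cases above, the microservice is consumed through its OpenAI-compatible chat completions endpoint. The following is a minimal sketch using only the Python standard library; the base URL, port, and model identifier are assumptions for a locally deployed container and should be adjusted to match your deployment.

```python
# Minimal client sketch for a locally running NIM LLM microservice,
# using its OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

NIM_BASE_URL = "http://localhost:8000"  # assumed local endpoint
MODEL_NAME = "deepseek-ai/deepseek-r1-distill-llama-8b"  # assumed model id


def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }


def chat(prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{NIM_BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running NIM container):
#   print(chat("What is 17 * 24? Think step by step."))
```

Because the API follows the OpenAI chat schema, existing OpenAI SDK clients can also be pointed at the NIM base URL instead of hand-building requests as above.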
Responsible AI Considerations
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure it meets the requirements of the relevant industry and use case and addresses unforeseen product misuse.
Model Limitations: The base model was trained on data that contains toxic language and societal biases originally crawled from the internet. The model may therefore amplify those biases and return toxic responses, especially when given toxic prompts. It may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable output even when the prompt contains nothing explicitly offensive.
Training Data
Overview: Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets as well as over 25M synthetically generated examples.
Data Freshness: The pretraining data has a cutoff of December 2023.
Model Specifications
License: Custom
Last Updated: March 2025
Publisher: NVIDIA