Overview
NVIDIA’s Nemotron family supplies open-weight reasoning models tuned for agentic workflows. The lineup scales from Nemotron Nano (8B) for edge devices to Nemotron Ultra (253B), currently the top open model on reasoning leaderboards, while keeping permissive licensing.
Key NVIDIA Models (July 2025)
- Nemotron Nano 8B – FP8 inference for mobile and IoT scenarios.
- Nemotron Super 49B – Balanced accuracy vs. single‑GPU cost.
- Nemotron Ultra 253B – State-of-the-art open reasoning model with 128K context.
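Any of the models above can be reached through an OpenAI-compatible chat completions endpoint once deployed. The sketch below only builds the JSON request body, so it runs without credentials; the endpoint URL and deployment name are placeholders, not real values.

```python
import json

# Hypothetical deployment details -- substitute your own endpoint and model name.
ENDPOINT = "https://<your-resource>.services.ai.azure.com/models/chat/completions"
MODEL = "Llama-3.3-Nemotron-Super-49B-v1.5"

def build_chat_request(prompt: str, model: str = MODEL) -> str:
    """Build the JSON body for an OpenAI-compatible chat completions call."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.6,
        "max_tokens": 512,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize the Nemotron model family.")
```

Sending `body` as a POST with your API key in the headers is all that remains; the message schema is the standard chat format, so the same builder works for any model in the catalog.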
Why NVIDIA on Azure
Combine Nemotron with Azure GPU SKUs, Triton Inference Server, and MLOps tooling to build high-throughput agents without commercial licensing hurdles.
Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model in a 70B size (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models.
Microsoft TRELLIS 3D is an asset generation model capable of producing detailed meshes directly from text prompts or images. With multiple size variants, TRELLIS offers options for users aiming to maximize quality and/or speed. This model is ready for non-commercial/commercial use.
NVIDIA Nemotron-Parse NIM microservice is a general-purpose text-extraction model designed specifically for documents. Given an image, Nemotron-Parse extracts formatted text with bounding boxes and the corresponding semantic class, which benefits several downstream tasks.
NVIDIA NeMo™ Retriever Llama 3.2 embedding model is optimized for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8,192 tokens) and dynamic embedding size (Matryoshka embeddings). The model was evaluated on 26 languages, including English, Arabic, and Bengali.
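The "dynamic embedding size" mentioned above is the key property of Matryoshka embeddings: a prefix of the full vector is itself a usable embedding. A minimal sketch of the client-side step, assuming you already have a full-size embedding from the model (the toy vector here is made up):

```python
import math

def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` components,
    then re-normalize so cosine similarity stays meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5]          # stand-in for a model embedding
small = truncate_embedding(full, 2)  # keep 2 of 4 dimensions
```

Smaller prefixes trade a little retrieval quality for proportionally lower storage and faster similarity search, which is why the dimension is a deployment-time choice rather than a model property.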
RFdiffusion (RoseTTAFold Diffusion) is a generative model that creates novel protein structures for protein scaffolding and protein binder design tasks. This model generates entirely new protein backbones and designs proteins that can be specifically tailored to bind to target molecules.
DeepSeek AI has developed a range of distilled models based on Meta's Llama architectures, with sizes spanning from 1.5 to 70 billion parameters, starting from the foundation of DeepSeek-R1. This distillation process trains smaller models to replicate the behavior and reasoning of the larger model.
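The distillation objective described above is commonly implemented as a KL divergence between temperature-softened teacher and student distributions. A minimal sketch with made-up logits (the real training loop, batching, and tokenization are omitted):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions -- the classic
    distillation loss. Zero when the student matches the teacher."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss = distill_kl([2.0, 1.0, 0.1], [1.5, 1.2, 0.3])
```

Minimizing this loss pushes the student's full output distribution (not just its top prediction) toward the teacher's, which is what lets small models inherit the larger model's reasoning behavior.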
Versatile vision-language model for querying and summarizing images and video, deployable from data center to edge (via AWQ 4-bit TinyChat), with key findings that interleaved image-text data, LLM unfreezing, and re-blended text-only data are essential for strong multimodal performance.
Advanced LLM for reasoning, math, general knowledge, and function calling.
MSA Search NIM supports GPU-accelerated Multiple Sequence Alignment (MSA) of a query amino acid sequence against a set of protein sequence databases. These databases are searched for sequences similar to the query, and the resulting collection of sequences is aligned to establish similar regions.
Boltz-2 NIM is a next-generation structural biology foundation model that shows strong performance for both structure and affinity prediction. Boltz-2 is the first deep learning model to approach the accuracy of free energy perturbation (FEP) methods in predicting binding affinities.
Mixtral 8x7B Instruct is a language model that can follow instructions, complete requests, and generate creative text formats. Mixtral 8x7B is a high-quality sparse mixture-of-experts model (SMoE) with open weights. This model has been optimized through supervised fine-tuning and direct preference optimization.
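A sparse mixture of experts activates only a few expert subnetworks per token: a gating network scores all experts, the top-k are selected, and their weights are renormalized. A minimal sketch of that routing step with made-up gate logits (Mixtral routes each token to 2 of 8 experts):

```python
import math

def top_k_route(gate_logits, k=2):
    """Softmax the gate logits, keep the k highest-scoring experts,
    and renormalize their weights to sum to 1."""
    exps = [math.exp(g) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]

# One gate logit per expert (8 experts, values invented for illustration).
routing = top_k_route([0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.5, 0.2])
```

The token's output is then the weighted sum of the chosen experts' outputs, so compute per token scales with k, not with the total expert count -- the reason an 8x7B model is far cheaper to run than a dense model of the same total parameter count.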
Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow user control of model outputs during inference.
NVIDIA Cosmos Reason, an open, customizable, 7B-parameter reasoning vision language model (VLM) for physical AI and robotics, enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world.
Evo 2 is a biological foundation model that is able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes. At 40 billion parameters, the model understands the genetic code for all domains of life and is the largest AI model for biology to date.
Llama-3.3-Nemotron-Super-49B-v1.5 is a significantly upgraded version of Llama-3.3-Nemotron-Super-49B-v1 and is a large language model (LLM) derived from Meta's Llama-3.3-70B-Instruct (the reference model). It is a reasoning model that is post-trained for reasoning and human chat preferences.
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). These models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models.
The NVIDIA Nemotron Nano 12B v2 VL NIM microservice enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A, and summarization capabilities. This model is ready for commercial use. Nemotron Nano 12B v2 VL is a model for multimodal understanding.
Mistral-7B-Instruct-v0.3 is a language model that can follow instructions, complete requests, and generate creative text formats. It is an instruct version of the Mistral-7B-v0.3 generative text model, fine-tuned using a variety of publicly available conversation datasets. This model is ready for commercial use.
NVIDIA NeMo™ Retriever Llama 3.2 reranking model is optimized for providing a logit score that represents how relevant a document is to a given query. The model was fine-tuned for multilingual, cross-lingual text question-answering retrieval, with support for long documents (up to 8,192 tokens).
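In a retrieval pipeline the reranker's logit is used only to reorder a candidate list: higher score means more relevant. A minimal sketch of that final step, with invented document IDs and scores standing in for real reranker output:

```python
def rerank(scored_docs):
    """Order (doc_id, relevance_logit) pairs by score, highest first.
    The scores here are placeholders for what the reranking model returns."""
    return sorted(scored_docs, key=lambda pair: pair[1], reverse=True)

ranked = rerank([("doc-a", -1.2), ("doc-b", 3.4), ("doc-c", 0.7)])
```

A typical setup retrieves a broad candidate set with the embedding model, then applies the reranker to the top few dozen hits, since scoring every query-document pair with a cross-encoder is too expensive for the full corpus.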
OpenFold2 is a protein structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. The model is a PyTorch reimplementation of Google DeepMind’s AlphaFold2.
Nemotron-3-8B-Base-4k is a large language foundation model for enterprises to build custom LLMs. This foundation model has 8 billion parameters and supports a context length of 4,096 tokens. Nemotron-3-8B-Base-4k is part of Nemotron-3, a family of enterprise-ready generative text models.
Advanced LLM for reasoning, math, general knowledge, and function calling.
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.
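When consuming such a model's output, the reasoning trace usually needs to be separated from the final response before showing it to users. A minimal sketch, assuming the trace is wrapped in `<think>…</think>` tags (check the model card for the exact delimiter your deployment emits):

```python
import re

def split_reasoning(output: str):
    """Separate a <think>...</think> reasoning trace from the final answer.
    Returns (trace, answer); trace is empty if no tags are found."""
    m = re.search(r"<think>(.*?)</think>\s*(.*)", output, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", output.strip()

trace, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
```

Keeping the trace server-side while returning only `answer` is a common pattern: the trace is useful for logging and debugging but rarely belongs in the end-user response.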
Nemotron-3-8B-Chat-4k-RLHF is a large language model instruct-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Reinforcement Learning from Human Feedback (RLHF).
Nemotron-3-8B-Chat-4k-SFT is a large language model instruct-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Supervised Fine-tuning (SFT).
Nemotron-3-8B-QA-4k is an 8-billion-parameter generative language model customized on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Supervised Fine-tuning (SFT).
ProteinMPNN (Protein Message Passing Neural Network) is a cutting-edge, deep-learning-based graph neural network designed to predict amino acid sequences for given protein backbones. This network leverages evolutionary, functional, and structural information to generate sequences that are likely to fold into the desired structures.