Nvidia
Offers GPU-optimized models and tools for high-performance AI applications across various domains.

Overview

NVIDIA’s Nemotron family supplies open-weight reasoning models tuned for agentic workflows. The lineup scales from Nemotron Nano (8B) for edge devices to Nemotron Ultra (253B), currently the top open model on reasoning leaderboards, while keeping permissive licensing.

Key NVIDIA Models (July 2025)

  • Nemotron Nano 8B – FP8 inference for mobile and IoT scenarios.
  • Nemotron Super 49B – Balanced accuracy vs. single‑GPU cost.
  • Nemotron Ultra 253B – State-of-the-art open reasoning model with 128K context.

Why NVIDIA on Azure

Combine Nemotron with Azure GPU SKUs, Triton inference, and ML ops tooling to build high‑throughput agents without commercial licensing hurdles.
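Deployed NIM microservices expose an OpenAI-compatible chat-completions API, so an agent can talk to them with nothing beyond the standard library. A minimal sketch of assembling such a request — the endpoint URL, API key, and model ID below are placeholders for illustration, not real deployment values:

```python
import json
import urllib.request

def build_chat_request(endpoint: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-compatible /v1/chat/completions request.

    `endpoint`, `api_key`, and `model` are placeholders -- substitute
    the values from your own Azure deployment.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=f"{endpoint}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Hypothetical deployment values -- replace with your own.
req = build_chat_request(
    "https://my-nim.example.azure.com",
    "AZURE_API_KEY",
    "llama-3.3-nemotron-super-49b-v1",
    "Summarize the benefits of sparse attention.",
)
```

Sending the request is then a single `urllib.request.urlopen(req)` call; any OpenAI-compatible client library works the same way against the endpoint.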
Total Models: 28
Nemotron-3-8B-Chat-4k-SteerLM

Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow for user control of model outputs during inference.

text-generation
Llama-3.3-70B-Instruct-NIM-microservice

Meta Llama 3.3 is a multilingual large language model (LLM): a pretrained and instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

chat-completion
Trellis-NIM-microservice

Microsoft TRELLIS 3D is an asset-generation model capable of producing detailed meshes directly from text prompts or images. With multiple size variants, TRELLIS offers options for users aiming to maximize quality and/or speed. This model is ready for non-commercial and commercial use.

image-to-3D
text-to-3D
3D-generation
NVIDIA-Nemotron-Parse-NIM-microservice

NVIDIA Nemotron Parse is a general-purpose text-extraction model, specifically designed to handle documents. Given an image, Nemotron Parse extracts formatted text, with bounding boxes and the corresponding semantic class. This has downstream benefits for several tasks.

DocumentAnalysis
Llama-3.2-NV-embedqa-1b-v2-NIM-microservice

NVIDIA NeMo™ Retriever Llama 3.2 embedding model is optimized for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka embeddings). This model was evaluated on 26 languages, including English, Arabic, and Bengali.

embeddings
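Dynamic embedding size (Matryoshka embeddings) means a full-size vector can be cut down to its leading components and re-normalized, trading some retrieval accuracy for smaller storage. A minimal sketch of that truncation step, using a synthetic toy vector rather than real model output:

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka embedding
    and re-normalize to unit length for cosine similarity."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-dimensional embedding (real vectors are much longer).
full = [0.5, -0.25, 0.8, 0.1, 0.05, -0.3, 0.2, 0.15]
small = truncate_embedding(full, 4)
```

Because Matryoshka-trained models concentrate the most informative signal in the leading dimensions, the truncated vector remains usable for cosine-similarity search at a fraction of the index size.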
Rfdiffusion-NIM-microservice

RFdiffusion (RoseTTAFold Diffusion) is a generative model that creates novel protein structures for protein scaffolding and protein-binder design tasks. The model generates entirely new protein backbones and designs proteins that can be specifically tailored to bind to target molecules.

protein-binder
Deepseek-R1-Distill-Llama-8B-NIM-microservice

DeepSeek AI has developed a range of distilled models based on Meta's Llama architectures, with sizes spanning from 1.5 to 70 billion parameters, starting from the foundation of DeepSeek-R1. This distillation process involves training smaller models to replicate the behavior and reasoning of the larger DeepSeek-R1 model.

chat-completion
Llama-3.1-Nemotron-Nano-VL-8B-v1-NIM-microservice

Versatile vision-language model for querying and summarizing images and video, deployable from data center to edge (via AWQ 4-bit TinyChat), with key findings that interleaved image-text data, LLM unfreezing, and re-blended text-only data are essential for strong performance.

image-classification
image-to-text
summarization
visual-question-answering
zero-shot-image-classification
Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice

Advanced LLM for reasoning, math, general knowledge, and function calling.

chat-completion
MSA-search-NIM-microservice

MSA Search NIM supports GPU-accelerated Multiple Sequence Alignment (MSA) of a query amino acid sequence against a set of protein sequence databases. These databases are searched for sequences similar to the query, and the resulting collection of sequences is aligned to establish regions of similarity.

protein-binder
Boltz2-NIM-microservice

Boltz-2 NIM is a next-generation structural biology foundation model that shows strong performance for both structure and affinity prediction. Boltz-2 is the first deep learning model to approach the accuracy of free energy perturbation (FEP) methods in predicting binding affinities.

Structure-Prediction
Mixtral-8x7B-Instruct-v0.1-NIM-microservice

Mixtral 8x7B Instruct is a language model that can follow instructions, complete requests, and generate creative text formats. Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights. This model has been optimized through supervised fine-tuning and direct preference optimization (DPO).

chat-completion
Nemotron-3-8B-Chat-SteerLM

Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow for user control of model outputs during inference.

text-generation
Cosmos-reason1-NIM-microservice

NVIDIA Cosmos Reason is an open, customizable, 7B-parameter reasoning vision-language model (VLM) for physical AI and robotics. It enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world.

task-completion-verification
action-affordance
next-plausible-action-prediction
Evo2-40b-NIM-microservice

Evo 2 is a biological foundation model able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes. At 40 billion parameters, the model understands the genetic code for all domains of life and is the largest AI model for biology to date.

Genomics
Llama-3.3-Nemotron-Super-49B-v1.5-NIM-microservice

Llama-3.3-Nemotron-Super-49B-v1.5 is a significantly upgraded version of Llama-3.3-Nemotron-Super-49B-v1. It is a large language model (LLM) derived from Meta Llama-3.3-70B-Instruct (the reference model), post-trained for reasoning and human chat preferences.

chat-completion
summarization
Llama-3.1-8B-Instruct-NIM-microservice

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). These models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models.

chat-completion
NVIDIA-Nemotron-Nano-12B-v2-VL-NIM-microservice

The NVIDIA Nemotron Nano 12B v2 VL NIM microservice enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A, and summarization capabilities. This model is ready for commercial use.

chat-completion
Mistral-7B-Instruct-v0.3-NIM-microservice

Mistral-7B-Instruct-v0.3 is a language model that can follow instructions, complete requests, and generate creative text formats. It is an instruct version of the Mistral-7B-v0.3 generative text model, fine-tuned using a variety of publicly available conversation datasets. This model is ready for commercial use.

chat-completion
Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice

NVIDIA NeMo™ Retriever Llama 3.2 reranking model is optimized for providing a logit score that represents how relevant a document is to a given query. The model was fine-tuned for multilingual, cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens).

text-classification
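A reranker of this kind returns one unbounded logit per (query, document) pair; candidates are ordered by that score, and a sigmoid can squash the logit into a 0–1 relevance value when a bounded score is more convenient. A sketch of that post-processing step, assuming the scores arrive as plain floats:

```python
import math

def rank_documents(scores):
    """Order document indices by descending relevance logit and
    attach a sigmoid-squashed score in [0, 1] for readability."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, 1.0 / (1.0 + math.exp(-scores[i]))) for i in ranked]

# Toy logits for three candidate passages.
ranked = rank_documents([2.3, -0.7, 0.0])
```

Note that the ordering depends only on the raw logits; the sigmoid is monotonic, so it changes the presentation of the scores without changing the ranking.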
Openfold2-NIM-microservice

OpenFold2 is a protein structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. The model is a PyTorch reimplementation of Google DeepMind's AlphaFold2.

protein-binder
Nemotron-3-8B-Base-4k

Nemotron-3-8B-Base-4k is a large language foundation model for enterprises to build custom LLMs. This foundation model has 8 billion parameters and supports a context length of 4,096 tokens. Nemotron-3-8B-Base-4k is part of Nemotron-3, a family of enterprise-ready generative language models.

text-generation
Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice

Advanced LLM for reasoning, math, general knowledge, and function calling.

chat-completion
NVIDIA-Nemotron-Nano-9b-v2-NIM-microservice

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.

chat-completion
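Models that emit a reasoning trace before the final answer typically wrap the trace in delimiter tokens; the exact tokens vary by model, so the `<think>`/`</think>` pair below is an assumption chosen to illustrate the split, not a guaranteed part of this model's output format:

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Separate an assumed delimiter-wrapped reasoning trace from the
    final response; returns (trace, answer)."""
    start = text.find(open_tag)
    end = text.find(close_tag)
    if start == -1 or end == -1:
        return "", text.strip()  # no trace emitted
    trace = text[start + len(open_tag):end].strip()
    answer = text[end + len(close_tag):].strip()
    return trace, answer

trace, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
```

Separating the trace this way lets an application log or discard the model's intermediate reasoning while showing users only the final response.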
Nemotron-3-8B-Chat-RLHF

Nemotron-3-8B-Chat-4k-RLHF is a large language model instruction-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Reinforcement Learning from Human Feedback (RLHF).

text-generation
Nemotron-3-8B-Chat-SFT

Nemotron-3-8B-Chat-4k-SFT is a large language model instruction-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using supervised fine-tuning (SFT).

text-generation
Nemotron-3-8B-QA-4k

Nemotron-3-8B-QA-4k is an 8-billion-parameter generative language model customized on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using supervised fine-tuning (SFT).

text-generation
ProteinMPNN-NIM-microservice

ProteinMPNN (Protein Message Passing Neural Network) is a cutting-edge, deep-learning-based graph neural network designed to predict amino acid sequences for given protein backbones. The network leverages evolutionary, functional, and structural information to generate sequences that are likely to fold into the given backbone.

protein-binder