NVIDIA
Offers GPU-optimized models and tools for high-performance AI applications across various domains.
Total Models: 18
Llama-3.3-70B-Instruct-NIM-microservice

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.

chat-completion
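
The chat-completion NIM microservices in this catalog are typically consumed through an OpenAI-compatible HTTP API once deployed. The sketch below is illustrative only: the base URL, API key handling, and model identifier are assumptions for a locally hosted instance and should be checked against the microservice's own documentation (for example via its /v1/models listing).

```python
# Hypothetical client call against a locally deployed chat-completion NIM.
# Assumptions: the microservice listens on http://localhost:8000/v1 and
# registers the model id shown below; adjust both for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",  # assumed model id; query /v1/models to confirm
    messages=[{"role": "user", "content": "Summarize the NIM deployment model in two sentences."}],
    max_tokens=256,
    temperature=0.2,
)
print(response.choices[0].message.content)
```

The same call pattern applies to the other chat-completion entries below; only the model identifier changes.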
Nemotron-3-8B-Chat-4k-SteerLM

Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow for user control of model outputs during inference.

text-generation
Llama-3.1-8B-Instruct-NIM-microservice

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). These models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

chat-completion
Llama-3.2-NV-embedqa-1b-v2-NIM-microservice

The NVIDIA NeMo™ Retriever Llama 3.2 embedding model is optimized for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8,192 tokens) and dynamic embedding size (Matryoshka Embeddings). The model was evaluated on 26 languages, including English, Arabic, and Bengali.

embeddings
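
As a minimal sketch of how an embedding NIM like this one is commonly queried: the route is assumed to be OpenAI-compatible (/v1/embeddings), and the model id plus the input_type/truncate fields are assumptions for an asymmetric query/passage retrieval setup, not taken from this catalog.

```python
# Hypothetical embedding request against a locally deployed retriever embedding NIM.
# Assumptions: OpenAI-compatible /v1/embeddings route, the model id below, and
# NIM-specific extra fields distinguishing queries from passages.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

result = client.embeddings.create(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",  # assumed model id
    input=["What is the warranty period for the device?"],
    extra_body={"input_type": "query", "truncate": "END"},  # assumed NIM-specific fields
)
print(len(result.data[0].embedding))  # embedding dimensionality
```

Documents are embedded the same way with the passage-side input type, and both sets of vectors are then compared (e.g., by cosine similarity) at retrieval time.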
Rfdiffusion-NIM-microservice

RFdiffusion (RoseTTAFold Diffusion) is a generative model that creates novel protein structures for protein scaffolding and protein binder design tasks. The model generates entirely new protein backbones and designs proteins that can be specifically tailored to bind to target molecules.

protein-binder
Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice

Advanced LLM for reasoning, math, general knowledge, and function calling.

chat-completion
Deepseek-R1-Distill-Llama-8B-NIM-microservice

DeepSeek AI has developed a range of distilled models based on Meta's Llama architectures, with sizes spanning from 1.5 to 70 billion parameters, built on the foundation of DeepSeek-R1. The distillation process involves training smaller models to replicate the behavior and reasoning of the larger DeepSeek-R1 model.

chat-completion
Mixtral-8x7B-Instruct-v0.1-NIM-microservice

Mixtral 8x7B Instruct is a language model that can follow instructions, complete requests, and generate creative text formats. Mixtral 8x7B is a high-quality sparse mixture-of-experts model (SMoE) with open weights. This model has been optimized through supervised fine-tuning and direct preference optimization (DPO).

chat-completion
Nemotron-3-8B-Chat-SteerLM

Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow for user control of model outputs during inference.

text-generation
Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice

The NVIDIA NeMo™ Retriever Llama 3.2 reranking model is optimized for providing a logit score that represents how relevant a document is to a given query. The model was fine-tuned for multilingual, cross-lingual text question-answering retrieval, with support for long documents (up to 8,192 tokens).

text-classification
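
Reranking NIMs are generally queried with a single question and a list of candidate passages, and they return one relevance score (logit) per passage. The route and payload shape below are assumptions for a locally deployed instance; consult the microservice's API reference for the actual contract.

```python
# Hypothetical reranking request against a locally deployed retriever reranking NIM.
# Assumptions: a /v1/ranking route and the query/passages payload shape shown here.
import requests

payload = {
    "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",  # assumed model id
    "query": {"text": "How long is the device warranty?"},
    "passages": [
        {"text": "The warranty covers manufacturing defects for 24 months."},
        {"text": "The device ships with a USB-C charging cable."},
    ],
}
resp = requests.post("http://localhost:8000/v1/ranking", json=payload, timeout=30)
resp.raise_for_status()
# Expect one relevance score per passage; the highest score marks the most relevant passage.
print(resp.json())
```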
Mistral-7B-Instruct-v0.3-NIM-microservice

Mistral-7B-Instruct-v0.3 is a language model that can follow instructions, complete requests, and generate creative text formats. It is an instruct version of the Mistral-7B-v0.3 generative text model, fine-tuned using a variety of publicly available conversation datasets. This model is ready for commercial use.

chat-completion
Openfold2-NIM-microservice

OpenFold2 is a protein structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. The model is a PyTorch reimplementation of Google DeepMind's AlphaFold2.

protein-binder
Nemotron-3-8B-Base-4k

Nemotron-3-8B-Base-4k is a large language foundation model for enterprises to build custom LLMs. This foundation model has 8 billion parameters and supports a context length of 4,096 tokens. Nemotron-3-8B-Base-4k is part of Nemotron-3, a family of enterprise-ready generative text models.

text-generation
Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice

Advanced LLM for reasoning, math, general knowledge, and function calling.

chat-completion
ProteinMPNN-NIM-microservice

ProteinMPNN (Protein Message Passing Neural Network) is a cutting-edge, deep learning-based graph neural network designed to predict amino acid sequences for given protein backbones. The network leverages evolutionary, functional, and structural information to generate sequences that are likely to fold into the given backbone structures.

protein-binder
Nemotron-3-8B-Chat-RLHF

Nemotron-3-8B-Chat-4k-RLHF is a large language model instruction-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Reinforcement Learning from Human Feedback (RLHF).

text-generation
Nemotron-3-8B-Chat-SFT

Nemotron-3-8B-Chat-4k-SFT is a large language model instruction-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Supervised Fine-tuning (SFT).

text-generation
Nemotron-3-8B-QA-4k

Nemotron-3-8B-QA-4k is an 8-billion-parameter generative language model customized on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Supervised Fine-tuning (SFT).

text-generation