Nvidia
Offers GPU-optimized models and tools for high-performance AI applications across various domains.

Overview

NVIDIA’s Nemotron family supplies open-weight reasoning models tuned for agentic workflows. The lineup scales from Nemotron Nano (8B) for edge devices to Nemotron Ultra (253B), currently the top open model on reasoning leaderboards, while keeping permissive licensing.

Key NVIDIA Models (July 2025)

  • Nemotron Nano 8B – FP8 inference for mobile and IoT scenarios.
  • Nemotron Super 49B – Balanced accuracy vs. single‑GPU cost.
  • Nemotron Ultra 253B – State-of-the-art open reasoning model with 128K context.

Why NVIDIA on Azure

Combine Nemotron with Azure GPU SKUs, Triton inference, and ML ops tooling to build high‑throughput agents without commercial licensing hurdles.
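Deployed NIM microservices expose an OpenAI-compatible chat-completions API, so an agent can talk to them with nothing beyond the standard library. A minimal sketch of assembling such a request — the endpoint URL, API key, and model ID below are placeholders for illustration, not real deployment values:

```python
import json
import urllib.request

def build_chat_request(endpoint: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-compatible /v1/chat/completions request.

    `endpoint`, `api_key`, and `model` are placeholders -- substitute
    the values from your own Azure deployment.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=f"{endpoint}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Hypothetical deployment values -- replace with your own.
req = build_chat_request(
    "https://my-nim.example.azure.com",
    "AZURE_API_KEY",
    "llama-3.3-nemotron-super-49b-v1",
    "Summarize the benefits of sparse attention.",
)
```

Sending the request is then a single `urllib.request.urlopen(req)` call; any OpenAI-compatible client library works the same way against the endpoint.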
Total Models: 28
Nemotron-3-8B-Chat-4k-SteerLM

Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow for user control of model outputs during inference.

text-generation
Llama-3.3-70B-Instruct-NIM-microservice

Meta Llama 3.3 is a multilingual large language model (LLM): a pretrained and instruction-tuned generative model in 70B (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

chat-completion
Trellis-NIM-microservice

Microsoft TRELLIS 3D is an asset-generation model capable of producing detailed meshes directly from text prompts or images. With multiple size variants, TRELLIS offers options for users aiming to maximize quality and/or speed. This model is ready for non-commercial and commercial use.

image-to-3D
text-to-3D
3D-generation
NVIDIA-Nemotron-Parse-NIM-microservice

NVIDIA Nemotron Parse is a general-purpose text-extraction model, specifically designed to handle documents. Given an image, Nemotron Parse extracts formatted text, with bounding boxes and the corresponding semantic class. This has downstream benefits for several tasks.

DocumentAnalysis
Llama-3.2-NV-embedqa-1b-v2-NIM-microservice

NVIDIA NeMo™ Retriever Llama 3.2 embedding model is optimized for multilingual and cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka embeddings). This model was evaluated on 26 languages, including English, Arabic, and Bengali.

embeddings
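Dynamic embedding size (Matryoshka embeddings) means a full-size vector can be cut down to its leading components and re-normalized, trading some retrieval accuracy for smaller storage. A minimal sketch of that truncation step, using a synthetic toy vector rather than real model output:

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka embedding
    and re-normalize to unit length for cosine similarity."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-dimensional embedding (real vectors are much longer).
full = [0.5, -0.25, 0.8, 0.1, 0.05, -0.3, 0.2, 0.15]
small = truncate_embedding(full, 4)
```

Because Matryoshka-trained models concentrate the most informative signal in the leading dimensions, the truncated vector remains usable for cosine-similarity search at a fraction of the index size.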
Rfdiffusion-NIM-microservice

RFdiffusion (RoseTTAFold Diffusion) is a generative model that creates novel protein structures for protein scaffolding and protein-binder design tasks. The model generates entirely new protein backbones and designs proteins that can be specifically tailored to bind to target molecules.

protein-binder
Deepseek-R1-Distill-Llama-8B-NIM-microservice

DeepSeek AI has developed a range of distilled models based on Meta's Llama architectures, with sizes spanning from 1.5 to 70 billion parameters, starting from the foundation of DeepSeek-R1. This distillation process involves training smaller models to replicate the behavior and reasoning of the larger DeepSeek-R1 model.

chat-completion
Llama-3.1-Nemotron-Nano-VL-8B-v1-NIM-microservice

Versatile vision-language model for querying and summarizing images and video, deployable from data center to edge (via AWQ 4-bit TinyChat), with key findings that interleaved image-text data, LLM unfreezing, and re-blended text-only data are essential for strong performance.

image-classification
image-to-text
summarization
visual-question-answering
zero-shot-image-classification
Llama-3.3-Nemotron-Super-49B-v1-NIM-microservice

Advanced LLM for reasoning, math, general knowledge, and function calling.

chat-completion
MSA-search-NIM-microservice

MSA Search NIM supports GPU-accelerated Multiple Sequence Alignment (MSA) of a query amino acid sequence against a set of protein sequence databases. These databases are searched for sequences similar to the query, and the resulting collection of sequences is aligned to establish regions of similarity.

protein-binder
Boltz2-NIM-microservice

Boltz-2 NIM is a next-generation structural biology foundation model that shows strong performance for both structure and affinity prediction. Boltz-2 is the first deep learning model to approach the accuracy of free energy perturbation (FEP) methods in predicting binding affinities.

Structure-Prediction
Mixtral-8x7B-Instruct-v0.1-NIM-microservice

Mixtral 8x7B Instruct is a language model that can follow instructions, complete requests, and generate creative text formats. Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights. This model has been optimized through supervised fine-tuning and direct preference optimization (DPO).

chat-completion
Nemotron-3-8B-Chat-SteerLM

Nemotron-3-8B-SteerLM is an 8-billion-parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow for user control of model outputs during inference.

text-generation
Cosmos-reason1-NIM-microservice

NVIDIA Cosmos Reason is an open, customizable, 7B-parameter reasoning vision-language model (VLM) for physical AI and robotics. It enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world.

task-completion-verification
action-affordance
next-plausible-action-prediction
Evo2-40b-NIM-microservice

Evo 2 is a biological foundation model able to integrate information over long genomic sequences while retaining sensitivity to single-nucleotide changes. At 40 billion parameters, the model understands the genetic code for all domains of life and is the largest AI model for biology to date.

Genomics
Llama-3.3-Nemotron-Super-49B-v1.5-NIM-microservice

Llama-3.3-Nemotron-Super-49B-v1.5 is a significantly upgraded version of Llama-3.3-Nemotron-Super-49B-v1. It is a large language model (LLM) derived from Meta Llama-3.3-70B-Instruct (the reference model), post-trained for reasoning and human chat preferences.

chat-completion
summarization
Llama-3.1-8B-Instruct-NIM-microservice

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). These models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models.

chat-completion
NVIDIA-Nemotron-Nano-12B-v2-VL-NIM-microservice

The NVIDIA Nemotron Nano 12B v2 VL NIM microservice enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A, and summarization capabilities. This model is ready for commercial use.

chat-completion
Mistral-7B-Instruct-v0.3-NIM-microservice

Mistral-7B-Instruct-v0.3 is a language model that can follow instructions, complete requests, and generate creative text formats. It is an instruct version of the Mistral-7B-v0.3 generative text model, fine-tuned using a variety of publicly available conversation datasets. This model is ready for commercial use.

chat-completion
Llama-3.2-NV-rerankqa-1b-v2-NIM-microservice

NVIDIA NeMo™ Retriever Llama 3.2 reranking model is optimized for providing a logit score that represents how relevant a document is to a given query. The model was fine-tuned for multilingual, cross-lingual text question-answering retrieval, with support for long documents (up to 8192 tokens).

text-classification
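A reranker of this kind returns one unbounded logit per (query, document) pair; candidates are ordered by that score, and a sigmoid can squash the logit into a 0–1 relevance value when a bounded score is more convenient. A sketch of that post-processing step, assuming the scores arrive as plain floats:

```python
import math

def rank_documents(scores):
    """Order document indices by descending relevance logit and
    attach a sigmoid-squashed score in [0, 1] for readability."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, 1.0 / (1.0 + math.exp(-scores[i]))) for i in ranked]

# Toy logits for three candidate passages.
ranked = rank_documents([2.3, -0.7, 0.0])
```

Note that the ordering depends only on the raw logits; the sigmoid is monotonic, so it changes the presentation of the scores without changing the ranking.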
Openfold2-NIM-microservice

OpenFold2 is a protein structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. The model is a PyTorch reimplementation of Google DeepMind's AlphaFold2.

protein-binder
Nemotron-3-8B-Base-4k

Nemotron-3-8B-Base-4k is a large language foundation model for enterprises to build custom LLMs. This foundation model has 8 billion parameters and supports a context length of 4,096 tokens. Nemotron-3-8B-Base-4k is part of Nemotron-3, a family of enterprise-ready generative language models.

text-generation
Llama-3.1-Nemotron-Nano-8B-v1-NIM-microservice

Advanced LLM for reasoning, math, general knowledge, and function calling.

chat-completion
NVIDIA-Nemotron-Nano-9b-v2-NIM-microservice

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.

chat-completion
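Models that emit a reasoning trace before the final answer typically wrap the trace in delimiter tokens; the exact tokens vary by model, so the `<think>`/`</think>` pair below is an assumption chosen to illustrate the split, not a guaranteed part of this model's output format:

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Separate an assumed delimiter-wrapped reasoning trace from the
    final response; returns (trace, answer)."""
    start = text.find(open_tag)
    end = text.find(close_tag)
    if start == -1 or end == -1:
        return "", text.strip()  # no trace emitted
    trace = text[start + len(open_tag):end].strip()
    answer = text[end + len(close_tag):].strip()
    return trace, answer

trace, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
```

Separating the trace this way lets an application log or discard the model's intermediate reasoning while showing users only the final response.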
Nemotron-3-8B-Chat-RLHF

Nemotron-3-8B-Chat-4k-RLHF is a large language model instruction-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Reinforcement Learning from Human Feedback (RLHF).

text-generation
Nemotron-3-8B-Chat-SFT

Nemotron-3-8B-Chat-4k-SFT is a large language model instruction-tuned on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using supervised fine-tuning (SFT).

text-generation
Nemotron-3-8B-QA-4k

Nemotron-3-8B-QA-4k is an 8-billion-parameter generative language model customized on an 8B base model. It takes input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using supervised fine-tuning (SFT).

text-generation
ProteinMPNN-NIM-microservice

ProteinMPNN (Protein Message Passing Neural Network) is a cutting-edge, deep-learning-based graph neural network designed to predict amino acid sequences for given protein backbones. The network leverages evolutionary, functional, and structural information to generate sequences that are likely to fold into the given backbone.

protein-binder