Hugging Face
Hugging FaceHosts a vast repository of open-source models and tools for natural language processing tasks.
Total Models: 10909
qwen-qwen3.5-9b
qwen-qwen3.5-9b

Qwen/Qwen3.59B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token. ba

chat-completion
qwen-qwen3.5-35b-a3b
qwen-qwen3.5-35b-a3b

Qwen/Qwen3.535BA3B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML tok

chat-completion
coherelabs-command-a-plus-05-2026-w4a4
coherelabs-command-a-plus-05-2026-w4a4

CohereLabs/commandaplus052026w4a4 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azur

chat-completion
coherelabs-command-a-plus-05-2026-bf16
coherelabs-command-a-plus-05-2026-bf16

CohereLabs/commandaplus052026bf16 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azur

chat-completion
qwen-qwen3.6-27b
qwen-qwen3.6-27b

Qwen/Qwen3.627B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token.

chat-completion
qwen-qwen3.6-35b-a3b
qwen-qwen3.6-35b-a3b

Qwen/Qwen3.635BA3B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML tok

chat-completion
google-gemma-4-31b-it
google-gemma-4-31b-it

Invisible google/gemma431Bit is not visible on the Microsoft Foundry and Azure Machine Learning catalogs until 20260414T17:00:00.000Z, when it will automatically switch to visible i.e., public, unless either the invisibleUntil tag is removed or the invisibleUntil date is modified.

chat-completion
google-gemma-4-e4b-it
google-gemma-4-e4b-it

Invisible google/gemma4E4Bit is not visible on the Microsoft Foundry and Azure Machine Learning catalogs until 20260414T17:00:00.000Z, when it will automatically switch to visible i.e., public, unless either the invisibleUntil tag is removed or the invisibleUntil date is modified.

chat-completion
unsloth-qwen3.6-35b-a3b-gguf
unsloth-qwen3.6-35b-a3b-gguf

unsloth/Qwen3.635BA3BGGUF with UDIQ3S quantization, powered by llama.cpp Original Model Card llama.cpp Documentation GGUF GGUF is a binary file format optimized for quick loading and

chat-completion
tongyi-mai-z-image-turbo
tongyi-mai-z-image-turbo

TongyiMAI/ZImageTurbo powered by Hugging Face API Original Model Card texttoimage Task on Hugging Face Send Request You can use cURL or any REST Client to send a request to the Azure

text-to-image
openai-whisper-large-v3
openai-whisper-large-v3

openai/whisperlargev3 powered by Hugging Face Inference Toolkit Original Model Card automaticspeechrecognition Task on Hugging Face Send Request You can use cURL or any R

automatic-speech-recognition
google-gemma-4-26b-a4b-it
google-gemma-4-26b-a4b-it

Invisible google/gemma426BA4Bit is not visible on the Microsoft Foundry and Azure Machine Learning catalogs until 20260414T17:00:00.000Z, when it will automatically switch to visible i.e., public, unless either the invisibleUntil tag is removed or the invisibleUntil date is modified

chat-completion
facebook-sam3
facebook-sam3

Gated Model Access Required facebook/sam3 requires special access approval from the authors through Hugging Face. To use this model, you must: 1. Request access through the model page on Hugging Face and wait for approval from the model authors. 2. [Cr

mask-generation
coherelabs-command-a-plus-05-2026-fp8
coherelabs-command-a-plus-05-2026-fp8

CohereLabs/commandaplus052026fp8 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure

chat-completion
sentence-transformers-all-minilm-l6-v2
sentence-transformers-all-minilm-l6-v2

sentencetransformers/allMiniLML6v2 powered by Text Embeddings Inference Original Model Card Text Embeddings Inference Documentation [featureextraction Task

embeddings
baai-bge-m3
baai-bge-m3

BAAI/bgem3 powered by Text Embeddings Inference (TEI) Original Model Card Text Embeddings Inference Documentation Send Request You can use cURL or any REST Client to send a request to the Azu

embeddings
google-gemma-4-e2b-it
google-gemma-4-e2b-it

Invisible google/gemma4E2Bit is not visible on the Microsoft Foundry and Azure Machine Learning catalogs until 20260414T17:00:00.000Z, when it will automatically switch to visible i.e., public, unless either the invisibleUntil tag is removed or the invisibleUntil date is modified.

chat-completion
openai-gpt-oss-20b
openai-gpt-oss-20b

Model Card for openai/gptoss20b in Azure <p align="center" <img alt="gptoss20b" src="https://raw.githubusercontent.com/openai/gptoss/main/docs/gptoss20b.svg" </p <p align="center" <a href="https://gptoss.com"<strongTry gptoss</strong</a � <a href="https://cookbook.openai.c

chat-completion
stabilityai-stable-diffusion-xl-base-1.0
stabilityai-stable-diffusion-xl-base-1.0

stabilityai/stablediffusionxlbase1.0 powered by Hugging Face API Original Model Card texttoimage Task on Hugging Face Send Request You can use cURL or any REST Client

text-to-image
nvidia-parakeet-tdt-0.6b-v3
nvidia-parakeet-tdt-0.6b-v3

nvidia/parakeettdt0.6bv3 powered by Hugging Face + NeMo Original Model Card Send Request You can use cURL or any REST Client to send a request to the AzureML endpoint with your AzureML token. bash curl <AZUREMLENDPOINTURL \

automatic-speech-recognition
nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-bf16
nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-bf16

nvidia/Nemotron3NanoOmni30BA3BReasoningBF16 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to se

chat-completion
openai-gpt-oss-120b
openai-gpt-oss-120b

Model Card for openai/gptoss120b in Azure <p align="center" <img alt="gptoss120b" src="https://raw.githubusercontent.com/openai/gptoss/main/docs/gptoss120b.svg" </p <p align="center" <a href="https://gptoss.com"<strongTry gptoss</strong</a � <a href="https://cookbook.opena

chat-completion
qwen-qwen3.5-4b
qwen-qwen3.5-4b

Qwen/Qwen3.54B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token. ba

chat-completion
qwen-qwen-image-2512
qwen-qwen-image-2512

Qwen/QwenImage2512 powered by Hugging Face API Original Model Card texttoimage Task on Hugging Face Send Request You can use cURL or any REST Client to send a request to the Azure ML endpo

text-to-image
qwen-qwen3-0.6b
qwen-qwen3-0.6b

Qwen/Qwen30.6B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token. bas

chat-completion
tongyi-mai-z-image
tongyi-mai-z-image

TongyiMAI/ZImage powered by Hugging Face API Original Model Card texttoimage Task on Hugging Face Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint

text-to-image
black-forest-labs-flux.1-schnell
black-forest-labs-flux.1-schnell

Gated Model Access Required blackforestlabs/FLUX.1schnell requires special access approval from the authors through Hugging Face. To use this model, you must: 1. Request access through the model page on Hugging Face and wait for a

text-to-image
qwen-qwen3-embedding-0.6b
qwen-qwen3-embedding-0.6b

Qwen/Qwen3Embedding0.6B powered by Text Embeddings Inference Original Model Card Text Embeddings Inference Documentation [featureextraction Task on Hugging Face](https://h

embeddings
nousresearch-hermes-3-llama-3.1-8b
nousresearch-hermes-3-llama-3.1-8b

NousResearch/Hermes3Llama3.18B powered by Text Generation Inference (TGI) Example Notebook Original Model Card [Text Generation Inference Documentation](

text-generation
qwen-qwen2.5-7b-instruct
qwen-qwen2.5-7b-instruct

Qwen/Qwen2.57BInstruct powered by Text Generation Inference (TGI) Example Notebook Original Model Card [Text Generation Inference Documentation](https://huggingface.

text-generation
qwen-qwen3-asr-1.7b
qwen-qwen3-asr-1.7b

Qwen/Qwen3ASR1.7B powered by vLLM Original Model Card vLLM Documentation Transcriptions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token.

automatic-speech-recognition
qwen-webworld-8b
qwen-webworld-8b

Qwen/WebWorld8B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token.

chat-completion
qwen-webworld-32b
qwen-webworld-32b

Qwen/WebWorld32B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token.

chat-completion
qwen-qwen3.6-27b-fp8
qwen-qwen3.6-27b-fp8

Qwen/Qwen3.627BFP8 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML tok

chat-completion
openai-community-gpt2
openai-community-gpt2

openaicommunity/gpt2 powered by Text Generation Inference (TGI) Example Notebook Original Model Card [Text Generation Inference Documentation](https://huggingface.co/doc

text-generation
openai-whisper-large-v3-turbo
openai-whisper-large-v3-turbo

openai/whisperlargev3turbo powered by Hugging Face Inference API Original Model Card automaticspeechrecognition Task on Hugging Face Send Request You can use cURL

automatic-speech-recognition
qwen-qwen3-coder-30b-a3b-instruct
qwen-qwen3-coder-30b-a3b-instruct

Qwen/Qwen3Coder30BA3BInstruct powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoi

chat-completion
qwen-qwen3.5-2b
qwen-qwen3.5-2b

Qwen/Qwen3.52B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token. ba

chat-completion
lordx64-qwen3.6-35b-a3b-claude-4.7-opus-reasoning-distilled
lordx64-qwen3.6-35b-a3b-claude-4.7-opus-reasoning-distilled

lordx64/Qwen3.635BA3BClaude4.7OpusReasoningDistilled powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any

chat-completion
nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-nvfp4
nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-nvfp4

nvidia/Nemotron3NanoOmni30BA3BReasoningNVFP4 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to

chat-completion
meta-llama-meta-llama-3-8b-instruct
meta-llama-meta-llama-3-8b-instruct

metallama/MetaLlama38BInstruct powered by Text Generation Inference. Example Notebook Original Model Card Send Request You can use cURL or any REST

text-generation
dphn-dolphin-mistral-24b-venice-edition
dphn-dolphin-mistral-24b-venice-edition

dphn/DolphinMistral24BVeniceEdition powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azu

chat-completion
nousresearch-hermes-4.3-36b
nousresearch-hermes-4.3-36b

NousResearch/Hermes4.336B powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your

chat-completion
unsloth-qwen3.6-27b-nvfp4
unsloth-qwen3.6-27b-nvfp4

unsloth/Qwen3.627BNVFP4 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Az

chat-completion
mistralai-mistral-7b-instruct-v0.2
mistralai-mistral-7b-instruct-v0.2

mistralai/Mistral7BInstructv0.2 powered by Text Generation Inference. Example Notebook Original Model Card Send Request You can use cURL or any REST Cl

text-generation
qwen-qwen2.5-vl-7b-instruct
qwen-qwen2.5-vl-7b-instruct

Qwen/Qwen2.5VL7BInstruct powered by Text Generation Inference (TGI) Example Notebook Original Model Card [Text Generation Inference Documentation](https://huggin

image-text-to-text
nvidia-gemma-4-31b-it-nvfp4
nvidia-gemma-4-31b-it-nvfp4

nvidia/Gemma431BITNVFP4 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with you

chat-completion
google-bert-bert-base-uncased
google-bert-bert-base-uncased

googlebert/bertbaseuncased powered by Hugging Face Inference Toolkit Original Model Card fillmask Task on Hugging Face Send Request You can use cURL or any REST Client to send a reque

fill-mask
sentence-transformers-paraphrase-multilingual-minilm-l12-v2
sentence-transformers-paraphrase-multilingual-minilm-l12-v2

sentencetransformers/paraphrasemultilingualMiniLML12v2 powered by Text Embeddings Inference Original Model Card [Text Embeddings Inference Documentation](https://huggingface.co/docs/textembeddingsinfe

embeddings
qwen-qwen3-coder-next
qwen-qwen3-coder-next

Qwen/Qwen3CoderNext powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML t

chat-completion
qwen-qwen3.6-35b-a3b-fp8
qwen-qwen3.6-35b-a3b-fp8

Qwen/Qwen3.635BA3BFP8 powered by vLLM Original Model Card vLLM Documentation Chat Completions API Send Request You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azur

chat-completion
1