Overview
Microsoft’s Phi family proves that small language models can deliver big‑league reasoning: Phi‑3‑mini (3.8B) runs on a single GPU or even a smartphone, while Phi‑4‑mini‑Flash introduces a hybrid “SambaY” architecture for up to 10× faster responses with a 64K context. The multimodal Phi‑3 Vision adds image understanding for edge and robotics scenarios.
Key Microsoft Models (July 2025)
- Phi‑3‑mini‑128K‑Instruct – 3.8B params, 128K context; ideal for copilots and on‑device AI.
- Phi‑3‑small‑8K / 128K – 7B params with higher throughput for chat and RAG.
- Phi‑3 Vision – compact multimodal model for text + image tasks.
- Phi‑4‑mini‑Flash‑Reasoning – latency‑optimized 3.8B model announced July 2025 (a minimal client sketch follows this list).
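As a concrete illustration of how these catalog entries are consumed once deployed, below is a minimal sketch that calls a serverless Phi‑3 endpoint through the `azure-ai-inference` Python client. The environment variable names are placeholders for your own deployment's endpoint and key, not values taken from this catalog.

```python
# Minimal sketch: chat completion against a deployed Phi-3 endpoint via
# the azure-ai-inference client. Endpoint and key are placeholders.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],  # your Phi-3 serverless endpoint
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="In two sentences, why do small language models matter?"),
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```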
Why Microsoft Models on Azure
Because they are born on Azure, Phi models offer first‑party managed compute, granular quota, and fine‑tuning with zero data egress, which makes them a strong fit for latency‑critical and cost‑sensitive workloads.
Model router is a deployable AI model that is trained to select the most suitable large language model (LLM) for a given prompt.
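To make the routing behavior concrete, here is a hedged sketch of calling a model router deployment with the OpenAI Python SDK: the router is exposed like any other chat-completions deployment, and the deployment name `model-router` and API version below are assumptions, not fixed values.

```python
# Minimal sketch: prompting a model-router deployment. The deployment
# name "model-router" and the API version are assumptions.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",  # the router deployment, not a specific LLM
    messages=[{"role": "user", "content": "Plan a 3-step test strategy for a REST API."}],
)

# The router selects an underlying model per prompt; the response's
# `model` field reports which one actually answered.
print(response.model)
print(response.choices[0].message.content)
```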
MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team to fill in information gaps in the previous version of the model and improve its harm protections while maintaining R1 reasoning capabilities.
EvoDiff can unconditionally sample diverse, structurally plausible proteins, generate intrinsically disordered regions, and scaffold structural motifs using only sequence information, challenging a paradigm in structure-based protein design.
State-of-the-art open-weight reasoning model.
Lightweight math reasoning model optimized for multi-step problem solving.
A 3.8B-parameter small language model that outperforms larger models in reasoning, math, coding, and function calling.
The first small multimodal model with three input modalities (text, audio, and image), excelling in quality and efficiency.
Phi-4 14B, a highly capable model for low-latency scenarios.
Adapted AI model for financial reports analysis, based on Phi-4.
Adapted AI model for supply chain trade regulations, based on Phi-4.
Muse is a World and Human Action Model (WHAM), a generative model of gameplay (visuals and/or controller actions).
Phi-3 Vision is a lightweight, state-of-the-art open multimodal model built upon datasets that include synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data in both text and vision. The model belongs to the Phi-3 model family.
Text-to-speech enables your applications, tools, or devices to convert text into natural synthesized speech. It leverages advanced out-of-the-box [prebuilt neural voices](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support).
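As a brief illustration, here is a minimal text-to-speech sketch with the Azure Speech SDK; the key, region, and voice name are placeholders, and `en-US-JennyNeural` is just one of the prebuilt neural voices.

```python
# Minimal sketch: synthesize speech to the default speaker with the
# Azure Speech SDK. Key and region are placeholders.
import os

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"],
)
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Hello from Azure text to speech.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesis completed.")
```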
Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. 3D medical images such as CT and MRI play unique roles in clinical practice. MedImageParse 3D is a foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition in 3D medical images.
Orca 2 is a fine-tuned version of LLAMA 2. Orca 2’s training data is a synthetic dataset that was created to enhance the small model’s reasoning abilities. All synthetic training data was moderated using the Microsoft Azure content filters. More details about the model can be found in the Orca 2 paper.
Most medical imaging AI today is narrowly built to detect a small set of individual findings on a single modality like chest X-rays. This training approach is data- and compute-inefficient, requiring ~6-12 months per finding[1], and often fails to generalize in real-world environments.
A 14B-parameter model that delivers better quality than Phi-3-mini, with a focus on high-quality, reasoning-dense data.
Aurora is a machine learning model that can predict general environmental variables.
Language detection quickly and accurately identifies the language of any text, supporting over 100 languages and dialects, including the ISO 15924 standard for a select number of languages.
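A minimal sketch of calling language detection through the `azure-ai-textanalytics` client; the endpoint and key environment variables are placeholders.

```python
# Minimal sketch: detect the language of a document with the Azure AI
# Language client. Endpoint and key are placeholders.
import os

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint=os.environ["LANGUAGE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["LANGUAGE_KEY"]),
)

result = client.detect_language(documents=["Ce document est rédigé en français."])[0]

print(result.primary_language.name)              # e.g. "French"
print(result.primary_language.iso6391_name)      # e.g. "fr"
print(result.primary_language.confidence_score)  # 0.0-1.0
```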
DeepSeek-R1-Distilled-NPU-Optimized is a downloadable package of DeepSeek-R1-Distilled-Qwen-1.5B that is specifically optimized for the Neural Processing Unit (NPU). NPU-optimized models let developers run AI experiences locally on NPU-equipped devices.
A new mixture-of-experts model.
A refresh of the Phi-3-vision model.
A 7B-parameter model that delivers better quality than Phi-3-mini, with a focus on high-quality, reasoning-dense data.
LLaVA-Med v1.5, using mistralai/Mistral-7B-Instruct-v0.2 as the LLM for a better commercial license. Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum learning method.
This model is an optimized version of gpt-oss-20b that enables local inference on CPUs, using RTN quantization. Developed by Microsoft and packaged as an ONNX model, it is released under the Apache-2.0 license.
Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. MedImageParse is a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, and recognition across 9 imaging modalities.
TamGen is a 100-million-parameter model that can generate compounds based on input protein information. TamGen is pre-trained on 10 million compounds from PubChem and fine-tuned on the CrossDocked and PDB datasets. Evaluated on existing benchmarks, TamGen achieves top performance.
Boltz-1 is an open-source biomolecular structure prediction model for proteins, protein-protein assemblies, and protein-ligand complexes, providing high-quality 3D structural hypotheses to accelerate drug discovery, structural biology, and biotechnology workflows.
Transcribes streaming or recorded audio into readable text across 140+ languages and dialects. Accuracy can be further optimized with custom models for your specialized use cases.
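A minimal sketch of one-shot file transcription with the Azure Speech SDK; the key, region, and WAV file path are placeholders.

```python
# Minimal sketch: transcribe a short WAV file once with the Azure
# Speech SDK. Key, region, and filename are placeholders.
import os

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"],
)
audio_config = speechsdk.audio.AudioConfig(filename="sample.wav")

recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, audio_config=audio_config
)
result = recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
```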
Voice Live API is a single unified API that enables low-latency, high-quality speech-to-speech interactions for voice agents.
The adapted AI model for financial reports analysis (preview) is a state-of-the-art small language model (SLM) based on the Phi-3-small-128k architecture, designed specifically for analyzing financial reports. It has been fine-tuned on a few hundred million tokens derived from financial documents.
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles[^1],[^2],[^3]. Previous models often rely predominantly on tile-level predictions, which can overlook critical slide-level context and spatial dependencies.
BiomedCLIP is a biomedical vision-language foundation model that is pre-trained on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder.
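Because BiomedCLIP ships as open weights, a short zero-shot matching sketch may help; it follows the `open_clip` usage pattern from the model's Hugging Face card, with an illustrative image path and candidate labels.

```python
# Minimal sketch: zero-shot image-text matching with BiomedCLIP via
# open_clip. The image path and labels are illustrative placeholders.
import torch
from open_clip import create_model_from_pretrained, get_tokenizer
from PIL import Image

hub_id = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(hub_id)
tokenizer = get_tokenizer(hub_id)
model.eval()

image = preprocess(Image.open("chest_xray.png")).unsqueeze(0)
texts = tokenizer(["chest X-ray", "histopathology slide"], context_length=256)

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(texts)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    probs = (100.0 * img @ txt.T).softmax(dim=-1)

print(probs)  # similarity of the image to each candidate caption
```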
Same Phi-3-medium model, but with a larger context size for RAG or few-shot prompting.
The Swin Transformer V2 model is a type of Vision Transformer, pre-trained on ImageNet-21k at a resolution of 192x192. It was introduced in the [research paper](https://arxiv.org/abs/2111.09883) “Swin Transformer V2: Scaling Up Capacity and Resolution” by Liu et al.
A generative model for inorganic materials design.
Content Understanding Read provides fast, reliable extraction of text and basic content elements from documents, enabling simple ingestion workflows without layout interpretation. It’s ideal for scenarios where clean text output is needed for downstream automation.
Azure AI Content Understanding empowers you to transform unstructured multimodal data, such as text, images, audio, and video, into structured, actionable insights by streamlining content processing with advanced AI techniques like schema extraction.
Phi-2 is a language model with 2.7 billion parameters. It was trained using the same data sources as Phi-1, augmented with a new data source that consists of various synthetic NLP texts and websites filtered for safety and educational value.
Same Phi-3-mini model, but with a larger context size for RAG or few-shot prompting.
RAD-DINO is a vision transformer model trained to encode chest X-rays using the self-supervised learning method DINOv2. RAD-DINO is described in detail in the paper “RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision.”
Fara is a multimodal web agent model that observes the browser and acts on behalf of the user by emitting tool calls (e.g., click(x,y), type, scroll, select) to complete web tasks end-to-end. Fara is trained on data generated by a scalable multi-agent pipeline that synthesizes diverse web tasks and executes them to produce training trajectories.
Text-to-speech avatar converts text into a digital video of a human (either a standard avatar or a custom text-to-speech avatar) speaking with a natural-sounding voice. The text-to-speech avatar video can be synthesized asynchronously or in real time.
The Azure AI Vision service gives you access to advanced algorithms that process images and videos and return insights based on the visual features and content you are interested in. Azure AI Vision can power a diverse set of scenarios, including digital asset management.
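To ground the description, here is a minimal sketch using the `azure-ai-vision-imageanalysis` Python client to caption an image and read its text; the endpoint, key, and image URL are placeholders.

```python
# Minimal sketch: caption + OCR on a public image with Azure AI Vision
# Image Analysis. Endpoint, key, and image URL are placeholders.
import os

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint=os.environ["VISION_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["VISION_KEY"]),
)

result = client.analyze_from_url(
    image_url="https://example.com/sample.jpg",
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
)

if result.caption is not None:
    print(f"Caption: {result.caption.text} ({result.caption.confidence:.2f})")
if result.read is not None:
    for block in result.read.blocks:
        for line in block.lines:
            print(line.text)
```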