qwen-qwen3-vl-32b-thinking-fp8
Version: 1
Qwen/Qwen3-VL-32B-Thinking-FP8 powered by vLLM
Chat Completions API
Send Request
You can use cURL or any REST client to send a request to the Azure ML endpoint with your Azure ML token.

curl <AZUREML_ENDPOINT_URL> \
  -X POST \
  -H "Authorization: Bearer <AZUREML_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen3-VL-32B-Thinking-FP8","messages":[{"role":"user","content":"What is Deep Learning?"}]}'
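The same request can be sent from Python using only the standard library. A minimal sketch, assuming the placeholder endpoint URL and token are replaced with your real values:

```python
import json
import urllib.request

AZUREML_ENDPOINT_URL = "<AZUREML_ENDPOINT_URL>"  # placeholder: your endpoint URL
AZUREML_TOKEN = "<AZUREML_TOKEN>"  # placeholder: your Azure ML token

# Build the same Chat Completions payload as the cURL example above.
payload = {
    "model": "Qwen/Qwen3-VL-32B-Thinking-FP8",
    "messages": [{"role": "user", "content": "What is Deep Learning?"}],
}

def send_chat_request(url: str, token: str, body: dict) -> dict:
    """POST the JSON body to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Uncomment once the placeholders above hold real values:
# print(send_chat_request(AZUREML_ENDPOINT_URL, AZUREML_TOKEN, payload))
```

The response follows the standard OpenAI Chat Completions shape, so the generated text is reachable at `choices[0]["message"]["content"]`.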
Supported Parameters
The following are the only mandatory parameters to send in the HTTP POST request to /v1/chat/completions.
- model (string): Model ID used to generate the response. Since only a single model is deployed within this endpoint, you can either set it to Qwen/Qwen3-VL-32B-Thinking-FP8 or leave it blank.
- messages (array): A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, like text, images, and audio.
Refer to /openapi.json on the current Azure ML endpoint for the full API specification.
Example payload
{
"model": "Qwen/Qwen3-VL-32B-Thinking-FP8",
"messages": [
{"role":"user","content":"What is Deep Learning?"}
],
"max_completion_tokens": 256,
"temperature": 0.6
}
Vision Support
This model supports vision capabilities. You can send images in your chat completion requests:

curl <AZUREML_ENDPOINT_URL> \
-X POST \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"model":"Qwen/Qwen3-VL-32B-Thinking-FP8","messages":[{"role":"user","content":[{"type":"text","text":"What do you see in this image?"},{"type":"image_url","image_url":{"url":"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"}}]}],"temperature":0.7,"top_p":0.95,"max_tokens":128,"stream":false}'
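In Python, a multimodal message is just a content list mixing text and image_url parts. A sketch of the payload construction, reusing the sampling parameters from the cURL example (the `vision_message` helper is hypothetical, for illustration only):

```python
import json

# Hypothetical helper: builds a user message combining text and an image URL.
def vision_message(text: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

payload = {
    "model": "Qwen/Qwen3-VL-32B-Thinking-FP8",
    "messages": [
        vision_message(
            "What do you see in this image?",
            "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png",
        )
    ],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 128,
    "stream": False,
}

# Serialized body, ready to be POSTed with the same headers as the text-only example.
body = json.dumps(payload).encode("utf-8")
```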
Responses API
Alternatively, given that Qwen/Qwen3-VL-32B-Thinking-FP8 is a reasoning model, the recommended API is the OpenAI Responses API rather than the default OpenAI Chat Completions API described above.
Send Request
curl <AZUREML_ENDPOINT_URL>/v1/responses \
-X POST \
-d '{"model":"Qwen/Qwen3-VL-32B-Thinking-FP8","input":"What is Deep Learning?","reasoning":{"effort":"medium"}}' \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json"
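The equivalent Responses API payload can be built the same way in Python. A minimal sketch of the body sent to /v1/responses, mirroring the cURL example above:

```python
import json

payload = {
    "model": "Qwen/Qwen3-VL-32B-Thinking-FP8",
    "input": "What is Deep Learning?",
    "reasoning": {"effort": "medium"},  # reasoning effort level, as in the cURL example
}

# Serialized body, ready to be POSTed to <AZUREML_ENDPOINT_URL>/v1/responses
# with the same Authorization and Content-Type headers as before.
body = json.dumps(payload).encode("utf-8")
```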
Supported Parameters
That said, the following are the only mandatory parameters to send in the HTTP POST request to /v1/responses.
- model (string): Model ID used to generate the response. Since only a single model is deployed within this endpoint, you can either set it to Qwen/Qwen3-VL-32B-Thinking-FP8 or leave it blank.
- input (string or array): Text, image, or file inputs to the model, or a list of messages comprising the conversation so far, used to generate the response. Depending on the model you use, different message types (modalities) are supported; this model accepts text and image inputs, while audio inputs are not supported.
Refer to /openapi.json on the current Azure ML endpoint for the full API specification.
Example Payload
{
"model": "Qwen/Qwen3-VL-32B-Thinking-FP8",
"input": "What is Deep Learning?",
"max_output_tokens": 1024,
"temperature": 0.6,
"reasoning": {
"effort": "medium"
}
}
Model Specifications
License: Apache-2.0
Last Updated: October 2025
Provider: Hugging Face