zai-org-glm-4.7-flash
Version: 1
zai-org/GLM-4.7-Flash powered by vLLM
Chat Completions API
Send Request
You can use cURL or any REST client to send a request to the Azure ML endpoint with your Azure ML token.

curl <AZUREML_ENDPOINT_URL>/v1/chat/completions \
-X POST \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"model":"zai-org/GLM-4.7-Flash","messages":[{"role":"user","content":"What is Deep Learning?"}]}'
Supported Parameters
The following are the only mandatory parameters to send in the HTTP POST request to /v1/chat/completions.
- model (string): Model ID used to generate the response. Since only a single model is deployed on this endpoint, you can either set it to zai-org/GLM-4.7-Flash or leave it empty.
- messages (array): A list of messages comprising the conversation so far. Depending on the model, different message types (modalities) are supported, such as text, images, and audio; in this case only text inputs are supported.
For the full list of supported parameters, check the /openapi.json specification of the current Azure ML Endpoint.
Example Payload
{
  "model": "zai-org/GLM-4.7-Flash",
  "messages": [
    {"role": "user", "content": "What is Deep Learning?"}
  ],
  "max_completion_tokens": 256,
  "temperature": 0.6
}
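The same payload can be sent from Python without any SDK; the sketch below assumes the standard requests library and the /v1/chat/completions route shown in the cURL example above.

# Minimal sketch: raw HTTP POST of the example payload above.
import requests

payload = {
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [{"role": "user", "content": "What is Deep Learning?"}],
    "max_completion_tokens": 256,
    "temperature": 0.6,
}
response = requests.post(
    "<AZUREML_ENDPOINT_URL>/v1/chat/completions",
    headers={
        "Authorization": "Bearer <AZUREML_TOKEN>",
        "Content-Type": "application/json",
    },
    json=payload,
)
response.raise_for_status()
# The generated text lives in the first choice of the Chat Completions response.
print(response.json()["choices"][0]["message"]["content"])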
Responses API
Alternatively, given that zai-org/GLM-4.7-Flash is a reasoning model, the recommended API is the OpenAI Responses API rather than the OpenAI Chat Completions API described above.
Send Request
curl <AZUREML_ENDPOINT_URL>/v1/responses \
-X POST \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"model":"zai-org/GLM-4.7-Flash","input":"What is Deep Learning?","reasoning":{"effort":"medium"}}'
Supported Parameters
The following are the only mandatory parameters to send in the HTTP POST request to /v1/responses.
- model (string): Model ID used to generate the response. Since only a single model is deployed on this endpoint, you can either set it to zai-org/GLM-4.7-Flash or leave it empty.
- input (string or array): Text, image, or file inputs to the model, or a list of messages comprising the conversation so far, used to generate the response (see the multi-turn sketch after the example payload below). Depending on the model, different message types (modalities) are supported, such as text, images, and audio; in this case only text generation is supported, so image and audio inputs are disallowed.
For the full list of supported parameters, check the /openapi.json specification of the current Azure ML Endpoint.
Example Payload
{
  "model": "zai-org/GLM-4.7-Flash",
  "input": "What is Deep Learning?",
  "max_output_tokens": 1024,
  "temperature": 0.6,
  "reasoning": {
    "effort": "medium"
  }
}
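As mentioned under the supported parameters, input can also carry a list of messages for multi-turn conversations. The sketch below is an illustrative example assuming the same /v1/responses route and the requests library; the conversation content is made up for demonstration.

# Minimal sketch: multi-turn conversation passed as the input array.
import requests

payload = {
    "model": "zai-org/GLM-4.7-Flash",
    "input": [
        {"role": "user", "content": "What is Deep Learning?"},
        {"role": "assistant", "content": "Deep Learning is a branch of machine learning built on multi-layer neural networks."},
        {"role": "user", "content": "How does it differ from classical machine learning?"},
    ],
    "max_output_tokens": 1024,
    "temperature": 0.6,
    "reasoning": {"effort": "medium"},
}
response = requests.post(
    "<AZUREML_ENDPOINT_URL>/v1/responses",
    headers={
        "Authorization": "Bearer <AZUREML_TOKEN>",
        "Content-Type": "application/json",
    },
    json=payload,
)
response.raise_for_status()
# Print the raw Responses API payload; its output array holds reasoning and message items.
print(response.json())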
Model Specifications
License: MIT
Last Updated: January 2026
Provider: Hugging Face