Model Card for openai/gpt-oss-120b

Try gpt-oss · Guides · System card · OpenAI blog
Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of the open models:
- gpt-oss-120b — for production, general purpose, high reasoning use cases that fit into a single H100 GPU (117B parameters with 5.1B active parameters)
- gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters)
Highlights
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory.
Send Request
You can use cURL or any REST client to send a request to the AzureML endpoint with your AzureML token.

```bash
curl <AZUREML_ENDPOINT_URL> \
  -X POST \
  -d '{"model":"openai/gpt-oss-120b","input":"What is Deep Learning?"}' \
  -H "Authorization: Bearer <AZUREML_TOKEN>" \
  -H "Content-Type: application/json"
```
Note that in this case, since openai/gpt-oss-120b is a reasoning model, the Responses API is preferred over the Chat Completions API, which is why the route in Azure is set to /v1/responses instead of /v1/chat/completions.
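For convenience, the same request can also be sent from Python. Below is a minimal sketch using the requests library, assuming <AZUREML_ENDPOINT_URL> is the base URL of the endpoint without the /v1/responses route appended:

```python
import requests

# Placeholders as in the cURL example above; replace with your endpoint values
ENDPOINT_URL = "<AZUREML_ENDPOINT_URL>"  # assumed to be the base URL, without the route
TOKEN = "<AZUREML_TOKEN>"

# POST to the Responses API route exposed by the endpoint
response = requests.post(
    f"{ENDPOINT_URL}/v1/responses",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    json={"model": "openai/gpt-oss-120b", "input": "What is Deep Learning?"},
)
response.raise_for_status()
print(response.json())
```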
Supported Parameters
The following are the only mandatory parameters to send in the HTTP POST request to /v1/responses:
- model (string): Model ID used to generate the response. In this case, since only a single model is deployed within the endpoint, you can either set it to openai/gpt-oss-120b or leave it blank.
- input (string or array): Text, image, or file inputs to the model, or a list of messages comprising the conversation so far, used to generate the response. Depending on the model, different message types (modalities) are supported, such as text, images, and audio; in this case only text generation is supported, so image and audio inputs are disallowed.
The full list of supported parameters is available in the OpenAPI specification exposed at /openapi.json for the current Azure ML Endpoint. Alternatively, vLLM also exposes the OpenAI Chat Completions API, although the Responses API is preferred.
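To discover the remaining optional parameters programmatically, here is a short sketch that fetches that specification and lists the exposed routes (assuming the specification is served with the same bearer token, or unauthenticated):

```python
import requests

ENDPOINT_URL = "<AZUREML_ENDPOINT_URL>"  # same placeholders as above
TOKEN = "<AZUREML_TOKEN>"

# Fetch the OpenAPI specification exposed by the endpoint
spec = requests.get(
    f"{ENDPOINT_URL}/openapi.json",
    headers={"Authorization": f"Bearer {TOKEN}"},
).json()

# Print every route the endpoint exposes, e.g. /v1/responses and /v1/chat/completions
for path in spec["paths"]:
    print(path)
```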
Example payload

```json
{
  "model": "openai/gpt-oss-120b",
  "input": "What is Deep Learning?",
  "max_output_tokens": 1024,
  "temperature": 0.6,
  "reasoning": {
    "effort": "medium"
  }
}
```
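Since the endpoint follows the OpenAI Responses API, the official openai Python SDK can also be pointed at it. A minimal sketch, assuming the endpoint serves the API under /v1 and accepts the AzureML token as the API key:

```python
from openai import OpenAI

# Point the official SDK at the AzureML endpoint; both values are assumptions
# about your deployment, matching the placeholders used above
client = OpenAI(
    base_url="<AZUREML_ENDPOINT_URL>/v1",
    api_key="<AZUREML_TOKEN>",
)

# Same payload as the example above: medium reasoning effort, 1024-token cap
response = client.responses.create(
    model="openai/gpt-oss-120b",
    input="What is Deep Learning?",
    max_output_tokens=1024,
    temperature=0.6,
    reasoning={"effort": "medium"},
)

print(response.output_text)
```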
Model Specifications
- License: Apache-2.0
- Version: 1
- Last Updated: August 2025
- Publisher: HuggingFace