granite-4.1-30b-base
Version: 1
Hugging Face
Last updated April 2026
ibm-granite/granite-4.1-30b-base powered by vLLM
⚠️ Warning: ibm-granite/granite-4.1-30b-base is a base model: it is pre-trained only, not fine-tuned for chat, so it has no pre-defined chat template. As a result, it is likely not optimal for consumer-facing inference or downstream applications.

Completions API

⚠️ Warning: the OpenAI Completions API is a legacy API, although vLLM still supports it. It is the only way to get completions from base models, since they have no chat template.

Send Request

You can use cURL or any REST Client to send a request to the Azure ML endpoint with your Azure ML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{"model":"ibm-granite/granite-4.1-30b-base","prompt":"What is Deep Learning?"}'
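The same request can also be built and sent from Python. A minimal sketch using only the standard library is shown below; the `<AZUREML_TOKEN>` and endpoint URL placeholders are stand-ins exactly as in the cURL example, and the helper name `build_completion_request` is illustrative, not part of any SDK:

```python
import json

def build_completion_request(token, prompt,
                             model="ibm-granite/granite-4.1-30b-base",
                             **params):
    """Build the headers and JSON body for a POST to /v1/completions,
    mirroring the cURL example above."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "prompt": prompt, **params})
    return headers, body

headers, body = build_completion_request("<AZUREML_TOKEN>",
                                         "What is Deep Learning?")
# `body` can then be POSTed to <AZUREML_ENDPOINT_URL>/v1/completions,
# e.g. with urllib.request or the `requests` package.
```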

Supported Parameters

The following parameters are mandatory in the HTTP POST request to /v1/completions.
  • model (string): Model ID used to generate the response. Since only a single model is deployed within this endpoint, you can either set it to ibm-granite/granite-4.1-30b-base or leave it empty.
  • prompt (string or array): The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. Note that <|endoftext|> is the document separator that the model sees during training, so if a prompt is not specified the model will generate as if from the beginning of a new document.
The rest of the parameters are optional. Since this model is served by vLLM with an OpenAI-compatible Completions API for text generation, the I/O interfaces for both generation and streaming are the same as in the OpenAI Completions API. You can find the full specification of the allowed parameters in the OpenAI Completions API Specification, or alternatively in the /openapi.json endpoint of the current Azure ML Endpoint.
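Because the response follows the OpenAI Completions API shape, the generated text sits under `choices[i].text`. A minimal sketch of pulling it out of the parsed JSON response (the `sample` dict below is an illustrative response, not real endpoint output):

```python
def extract_completions(response_json):
    # Per the OpenAI Completions API, each generated completion is an
    # entry in `choices`, with the text under the `text` key.
    return [choice["text"] for choice in response_json.get("choices", [])]

sample = {"choices": [{"index": 0, "text": " Deep Learning is ..."}]}
print(extract_completions(sample))  # -> [' Deep Learning is ...']
```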

Example payload

{
  "model": "ibm-granite/granite-4.1-30b-base",
  "prompt": "What is Deep Learning?",
  "max_tokens": 256,
  "temperature": 0.6
}
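Streaming works the same way as in the OpenAI Completions API: adding `"stream": true` to the payload above makes the endpoint return server-sent events, one `data: {...}` line per chunk, terminated by `data: [DONE]`. A sketch of parsing such a stream (the chunk strings below are illustrative, not captured endpoint output):

```python
import json

def parse_stream_line(line):
    # Each SSE chunk arrives as a "data: {...}" line; the sentinel
    # "data: [DONE]" marks the end of the stream.
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["text"]

# Example chunks as they might arrive with "stream": true set:
chunks = [
    'data: {"choices": [{"index": 0, "text": "Deep"}]}',
    'data: {"choices": [{"index": 0, "text": " Learning"}]}',
    "data: [DONE]",
]
pieces = [t for t in (parse_stream_line(c) for c in chunks) if t is not None]
print("".join(pieces))  # -> Deep Learning
```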

Hugging Face on Foundry

This model is sourced from Hugging Face, which offers thousands of open models for easy deployment on Microsoft Foundry. This model is a Non-Microsoft Product that has not been tested or evaluated by Microsoft. Customers should ensure that the model is appropriate for their specific use, including by evaluating any legal or export-control considerations and conducting their own model risk and safety evaluations. You can learn about Foundry risk and safety evaluations here. You can learn about Hugging Face security measures and requirements for models offered in Foundry here.
Model Specifications
License: Apache-2.0
Last Updated: April 2026
Provider: Hugging Face