microsoft-godel-v1-1-large-seq2seq
Version: 8

Large-Scale Pre-Training for Goal-Directed Dialog (GODEL)

GODEL is a large-scale pre-trained model for goal-directed dialogs. It is parameterized with a Transformer-based encoder-decoder model and trained for response generation grounded in external text, which allows more effective fine-tuning on dialog tasks that require conditioning the response on information that is external to the current conversation (e.g., a retrieved document). The pre-trained model can be efficiently fine-tuned and adapted to accomplish a new dialog task with a handful of task-specific dialogs. The v1.1 model is trained on 551M multi-turn dialogs from Reddit discussion threads and 5M instruction- and knowledge-grounded dialogs.
Multi-turn generation examples from an interactive environment:
Chitchat example:
Instruction: given a dialog context, you need to response empathically.

User: Does money buy happiness?

Agent: It is a question. Money buys you a lot of things, but not enough to buy happiness.

User: What is the best way to buy happiness ?

Agent: Happiness is bought through your experience and not money.
Grounded response generation example:
Instruction: given a dialog context and related knowledge, you need to response safely based on the knowledge.

Knowledge: The best Stardew Valley mods PCGamesN_0 / About SMAPI

User: My favorite game is stardew valley. stardew valley is very fun.

Agent: I love Stardew Valley mods, like PCGamesN_0 / About SMAPI.
Please find information about preprocessing, training, and full details of GODEL on the project webpage. ArXiv paper: https://arxiv.org/abs/2206.11309

How to use

Now we are ready to try out how the model works as a chatting partner!
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("microsoft/GODEL-v1_1-large-seq2seq")
model = AutoModelForSeq2SeqLM.from_pretrained("microsoft/GODEL-v1_1-large-seq2seq")
def generate(instruction, knowledge, dialog):
    # Prefix any grounding text with the [KNOWLEDGE] marker expected by GODEL
    if knowledge != '':
        knowledge = '[KNOWLEDGE] ' + knowledge
    # Join the dialog turns with the EOS separator
    dialog = ' EOS '.join(dialog)
    # Assemble the full query: instruction, dialog context, then knowledge
    query = f"{instruction} [CONTEXT] {dialog} {knowledge}"
    input_ids = tokenizer(query, return_tensors="pt").input_ids
    outputs = model.generate(input_ids, max_length=128, min_length=8, top_p=0.9, do_sample=True)
    output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return output
# Instruction for a chitchat task
instruction = 'Instruction: given a dialog context, you need to response empathically.'
# Leave the knowledge empty
knowledge = ''
dialog = [
    'Does money buy happiness?',
    'It is a question. Money buys you a lot of things, but not enough to buy happiness.',
    'What is the best way to buy happiness ?'
]
response = generate(instruction, knowledge, dialog)
print(response)
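The same generate helper also covers grounded response generation: pass the grounding text as knowledge and it is prefixed with the [KNOWLEDGE] marker. A minimal sketch reusing the grounded Stardew Valley example from above:

# Instruction for a grounded response generation task
instruction = 'Instruction: given a dialog context and related knowledge, you need to response safely based on the knowledge.'
# Knowledge string from the grounded example above
knowledge = 'The best Stardew Valley mods PCGamesN_0 / About SMAPI'
dialog = [
    'My favorite game is stardew valley. stardew valley is very fun.'
]
response = generate(instruction, knowledge, dialog)
print(response)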

Citation

If you use this code and data in your research, please cite our arXiv paper:
@misc{peng2022godel,
  author = {Peng, Baolin and Galley, Michel and He, Pengcheng and Brockett, Chris and Liden, Lars and Nouri, Elnaz and Yu, Zhou and Dolan, Bill and Gao, Jianfeng},
  title = {GODEL: Large-Scale Pre-training for Goal-Directed Dialog},
  howpublished = {arXiv},
  year = {2022},
  month = {June},
  url = {https://www.microsoft.com/en-us/research/publication/godel-large-scale-pre-training-for-goal-directed-dialog/},
}

microsoft/GODEL-v1_1-large-seq2seq powered by Text Generation Inference (TGI)

Send Request

You can use cURL or any REST Client to send a request to the AzureML endpoint with your AzureML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: application/json"

Supported Parameters

  • inputs (string): Input prompt.
  • parameters (object):
    • best_of (integer): Generate best_of sequences and return the one with the highest token logprobs.
    • decoder_input_details (boolean): Whether to return decoder input token logprobs and ids.
    • details (boolean): Whether to return generation details.
    • do_sample (boolean): Activate logits sampling.
    • frequency_penalty (float): The parameter for frequency penalty. 1.0 means no penalty. Penalizes new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
    • grammar (object): One of the following (see the grammar payload sketch after the example payload below):
      • #1 (object):
        • type (enum): Possible values: json.
        • value (string): A string that represents a JSON Schema. JSON Schema is a declarative language for annotating JSON documents with types and descriptions.
      • #2 (object):
        • type (enum): Possible values: regex.
        • value (string): The regular expression.
      • #3 (object):
        • type (enum): Possible values: json_schema.
        • value (object):
          • name (string): Optional name identifier for the schema.
          • schema (object): The actual JSON schema definition.
    • max_new_tokens (integer): Maximum number of tokens to generate.
    • repetition_penalty (float): The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
    • return_full_text (boolean): Whether to prepend the prompt to the generated text.
    • seed (integer): Random sampling seed.
    • stop (string[]): Stop generating tokens if a member of stop is generated.
    • temperature (float): The value used to modulate the logits distribution.
    • top_k (integer): The number of highest probability vocabulary tokens to keep for top-k-filtering.
    • top_n_tokens (integer): The number of highest probability vocabulary tokens to keep for top-n-filtering.
    • top_p (float): Top-p value for nucleus sampling.
    • truncate (integer): Truncate inputs tokens to the given size.
    • typical_p (float): Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.
    • watermark (boolean): Watermarking with A Watermark for Large Language Models.
    • stream (boolean): Whether to stream the output tokens or not. Defaults to false.

Example payload

{
  "inputs": "What is Deep Learning?",
  "parameters": {
    "do_sample": true,
    "top_p": 0.95,
    "temperature": 0.2,
    "top_k": 50,
    "max_new_tokens": 256,
    "repetition_penalty": 1.03,
    "stop": ["\nUser:", "<|endoftext|>", "</s>"]
  }
}
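As a further sketch, the grammar parameter described above can constrain decoding; this hypothetical payload uses a regex grammar, and the regex value is only an illustration:

{
  "inputs": "What is the capital of France? Answer with a single word.",
  "parameters": {
    "max_new_tokens": 16,
    "grammar": {
      "type": "regex",
      "value": "[A-Z][a-z]+"
    }
  }
}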

OpenAI Chat Completion API compatibility

Additionally, Text Generation Inference (TGI) offers an OpenAI Chat Completion API compatible layer under the endpoint /v1/chat/completions; check the full specification in the OpenAI Chat Completion Create documentation.
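For example, here is a minimal Python sketch against this layer, assuming the same placeholder endpoint and token as in the cURL example (TGI deployments commonly accept "tgi" as the model name):

import requests

# Placeholders: replace with your own AzureML endpoint URL and token
AZUREML_ENDPOINT_URL = "<AZUREML_ENDPOINT_URL>"
AZUREML_TOKEN = "<AZUREML_TOKEN>"

response = requests.post(
    f"{AZUREML_ENDPOINT_URL}/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is Deep Learning?"}],
        "max_tokens": 128,
    },
    headers={"Authorization": f"Bearer {AZUREML_TOKEN}"},
)
print(response.json()["choices"][0]["message"]["content"])
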
Model Specifications
License: MIT
Last Updated: July 2025
Publisher: HuggingFace