
NextCoder-14B

GitHub    |    Paper
NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits (ICML'2025)

Introduction

NextCoder is the latest series of code-editing large language models, developed using the Qwen2.5-Coder Instruct variants as base models and trained with the novel Selective Knowledge Transfer (SeleKT) finetuning methodology introduced in the paper. The NextCoder family comes in 3 sizes, 7, 14, and 32 billion parameters, to meet the needs of different developers.
Following are the key improvements:
  • Significant improvements in code editing: NextCoder-32B performs on par with GPT-4o on complex benchmarks like Aider-Polyglot, with a 44% performance increase over its base model.
  • No loss of generalizability, thanks to our new finetuning method SeleKT.
  • Long-context support up to 32K tokens.
This repo contains the NextCoder 14B model, which has the following features:
  • Type: Causal Language Models
  • Training Stage: Post-training with SeleKT
  • Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Number of Parameters: 14.7B
  • Number of Parameters (Non-Embedding): 13.1B
  • Number of Layers: 48
  • Number of Attention Heads (GQA): 40 for Q and 8 for KV
For more details, please refer to our blog, GitHub, and Paper.

Requirements

The code of NextCoder is based on the Qwen2.5 base models, which are supported in the latest Hugging Face transformers; we advise you to use the latest version of transformers. With transformers<4.37.0, you will encounter the following error:
KeyError: 'qwen2'
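
As a quick sanity check (a minimal sketch, assuming the >=4.37.0 requirement stated above and that the packaging library, a dependency of transformers, is available), you can verify the installed version before loading the model:

# Verify that the installed transformers release supports Qwen2-based models.
# Upgrade with `pip install -U transformers` if the check fails.
from packaging import version
import transformers

if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old for NextCoder; "
        "please upgrade to >=4.37.0."
    )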

Quickstart

The following code snippet shows how to load the tokenizer and model and how to generate content using apply_chat_template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/NextCoder-14B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = """
Fix the following function that divides two numbers to handle all the edge cases:

def divide(a, b)
  returm a/b
"""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
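
To inspect the result, print(response). For interactive use, generation can also be streamed token by token with the transformers TextStreamer utility (a minimal sketch reusing the model, tokenizer, and model_inputs from the snippet above):

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    streamer=streamer,
)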

Evaluation and Performance

| Models | HUMANEVALFIX | CANITEDIT | AIDER | POLYGLOT |
| --- | --- | --- | --- | --- |
| QwenCoder-2.5-3B | 73.2 | 37.1 | 36.8 | - |
| QwenCoder-2.5-3B-LoRA | 64.6 | 36.2 | 35.8 | - |
| QwenCoder-2.5-3B-SFT | 76.2 | 32.4 | 30.1 | - |
| NextCoder-3B | 75.6 | 42.4 | 37.6 | - |
| QwenCoder-2.5-7B | 73.8 | 48.1 | 59.4 | - |
| QwenCoder-2.5-7B-LoRA | 70.7 | 44.3 | 40.6 | - |
| QwenCoder-2.5-7B-SFT | 70.1 | 36.7 | 48.9 | - |
| NextCoder-7B | 81.1 | 50.5 | 65.7 | - |
| QwenCoder-2.5-14B | 87.8 | 58.1 | 66.9 | 9.3 |
| QwenCoder-2.5-14B-LoRA | 78.0 | 50.9 | 66.2 | 5.3 |
| QwenCoder-2.5-14B-SFT | 79.9 | 42.4 | 36.8 | 3.1 |
| NextCoder-14B | 89.8 | 60.2 | 72.2 | 12.2 |
| QwenCoder-2.5-32B | 90.2 | 61.0 | 72.9 | 16.4 |
| QwenCoder-2.5-32B-LoRA | 82.3 | 52.4 | 60.2 | 6.7 |
| QwenCoder-2.5-32B-SFT | 81.7 | 49.5 | 66.9 | 8.4 |
| NextCoder-32B | 88.9 | 62.4 | 74.7 | 23.6 |

Comparison of base QwenCoder-2.5 models of different sizes and their SeleKT-enhanced versions across three code editing benchmarks. Detailed evaluation results are reported in the 📑 paper.

Responsible AI Use

The base models (from the QwenCoder-2.5 family) are susceptible to malicious prompts and may generate or execute harmful code. Our finetuning does not enhance or impede such behaviors. Users should use the models and their outputs responsibly and with caution. Model outputs should be subjected to additional analysis, including manual inspection and sandboxing, before execution.
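
As one illustration of the caution above (a minimal sketch only, not a substitute for a real sandbox such as a container or VM; run_untrusted.py and the timeout value are hypothetical), generated code can at least be executed in a separate process with a hard time limit:

import subprocess

# Run model-generated code in a separate Python process with a hard timeout.
# This only limits runaway execution; it is NOT real isolation for untrusted code.
try:
    result = subprocess.run(
        ["python", "run_untrusted.py"],  # hypothetical file holding the generated code
        capture_output=True,
        text=True,
        timeout=10,
    )
    print(result.stdout)
    print(result.stderr)
except subprocess.TimeoutExpired:
    print("Generated code exceeded the time limit and was terminated.")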

Citation

@inproceedings{aggarwal2025nextcoder,
author = {Aggarwal, Tushar and Singh, Swayam and Awasthi, Abhijeet and Kanade, Aditya and Natarajan, Nagarajan},
title = {NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits},
booktitle = {International Conference on Machine Learning},
year = {2025},
url = {https://www.microsoft.com/en-us/research/publication/nextcoder-robust-adaptation-of-code-lms-to-diverse-code-edits/},
}

microsoft/NextCoder-14B powered by Text Generation Inference (TGI)

Send Request

You can use cURL or any REST Client to send a request to the AzureML endpoint with your AzureML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: application/json"

Supported Parameters

  • inputs (string): Input prompt.
  • parameters (object):
    • best_of (integer): Generate best_of sequences and return the one with the highest token logprobs.
    • decoder_input_details (boolean): Whether to return decoder input token logprobs and ids.
    • details (boolean): Whether to return generation details.
    • do_sample (boolean): Activate logits sampling.
    • frequency_penalty (float): The parameter for frequency penalty. 1.0 means no penalty. Penalizes new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
    • grammar (object): One of the following
      • #1 (object):
        • type (enum): Possible values: json.
        • value (string): A string that represents a JSON Schema. JSON Schema is a declarative language that allows you to annotate JSON documents with types and descriptions.
      • #2 (object):
        • type (enum): Possible values: regex.
        • value (string): The regular expression.
      • #3 (object):
        • type (enum): Possible values: json_schema.
        • value (object):
          • name (string): Optional name identifier for the schema
          • schema (object): The actual JSON schema definition
    • max_new_tokens (integer): Maximum number of tokens to generate.
    • repetition_penalty (float): The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
    • return_full_text (boolean): Whether to prepend the prompt to the generated text.
    • seed (integer): Random sampling seed.
    • stop (string[]): Stop generating tokens if a member of stop is generated.
    • temperature (float): The value used to modulate the logits distribution.
    • top_k (integer): The number of highest probability vocabulary tokens to keep for top-k-filtering.
    • top_n_tokens (integer): The number of highest probability vocabulary tokens to keep for top-n-filtering.
    • top_p (float): Top-p value for nucleus sampling.
    • truncate (integer): Truncate inputs tokens to the given size.
    • typical_p (float): Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.
    • watermark (boolean): Watermarking with A Watermark for Large Language Models.
    • stream (boolean): Whether to stream the output tokens or not. Defaults to false.

Example payload

{
  "inputs": "What is Deep Learning?",
  "parameters": {
    "do_sample": true,
    "top_p": 0.95,
    "temperature": 0.2,
    "top_k": 50,
    "max_new_tokens": 256,
    "repetition_penalty": 1.03,
    "stop": ["\nUser:", "<|endoftext|>", "</s>"]
  }
}
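
A second payload, sketched from the grammar parameter description above (the prompt and schema are illustrative and this exact shape has not been validated against a live endpoint), constrains the output to a JSON schema:

{
  "inputs": "Extract the name and age of the person described: John is 30 years old.",
  "parameters": {
    "max_new_tokens": 128,
    "grammar": {
      "type": "json_schema",
      "value": {
        "name": "person",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"}
          },
          "required": ["name", "age"]
        }
      }
    }
  }
}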

OpenAI Chat Completion API compatibility

Additionally, Text Generation Inference (TGI) offers an OpenAI Chat Completion API-compatible layer under the endpoint /v1/chat/completions;
check the full specification in the OpenAI Chat Completion Create documentation.
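
For example, with the openai Python client (a sketch; the base_url convention and the placeholder model name "tgi" follow common TGI usage and are assumptions here, as are the endpoint and token placeholders):

from openai import OpenAI

# Point the OpenAI client at the TGI OpenAI-compatible route of the AzureML endpoint.
client = OpenAI(
    base_url="<AZUREML_ENDPOINT_URL>/v1",
    api_key="<AZUREML_TOKEN>",
)

completion = client.chat.completions.create(
    model="tgi",  # TGI typically accepts a placeholder model name
    messages=[{"role": "user", "content": "What is Deep Learning?"}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
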
Model Specifications

  • License: MIT
  • Last Updated: August 2025
  • Provider: HuggingFace