microsoft-xtremedistil-l6-h384-uncased
Version: 6
HuggingFace
Last updated: July 2025

XtremeDistilTransformers for Distilling Massive Neural Networks

XtremeDistilTransformers is a distilled, task-agnostic transformer model that uses task transfer to learn a small universal model that can be applied to arbitrary tasks and languages, as outlined in the paper XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation.

We combine task transfer with multi-task distillation techniques from the papers XtremeDistil: Multi-stage Distillation for Massive Multilingual Models and MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, with the accompanying GitHub code.

This l6-h384 checkpoint, with 6 layers, a hidden size of 384, and 12 attention heads, has 22 million parameters and gives a 5.3x speedup over BERT-base.

Other available checkpoints: xtremedistil-l6-h256-uncased and xtremedistil-l12-h384-uncased.

The following table shows the results on the GLUE dev set and SQuAD-v2.
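| Models | #Params (M) | Speedup | MNLI | QNLI | QQP | RTE | SST | MRPC | SQUAD2 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 109 | 1x | 84.5 | 91.7 | 91.3 | 68.6 | 93.2 | 87.3 | 76.8 | 84.8 |
| DistilBERT | 66 | 2x | 82.2 | 89.2 | 88.5 | 59.9 | 91.3 | 87.5 | 70.7 | 81.3 |
| TinyBERT | 66 | 2x | 83.5 | 90.5 | 90.6 | 72.2 | 91.6 | 88.4 | 73.1 | 84.3 |
| MiniLM | 66 | 2x | 84.0 | 91.0 | 91.0 | 71.5 | 92.0 | 88.4 | 76.4 | 84.9 |
| MiniLM | 22 | 5.3x | 82.8 | 90.3 | 90.6 | 68.9 | 91.3 | 86.6 | 72.9 | 83.3 |
| XtremeDistil-l6-h256 | 13 | 8.7x | 83.9 | 89.5 | 90.6 | 80.1 | 91.2 | 90.0 | 74.1 | 85.6 |
| XtremeDistil-l6-h384 | 22 | 5.3x | 85.4 | 90.3 | 91.0 | 80.9 | 92.3 | 90.0 | 76.6 | 86.6 |
| XtremeDistil-l12-h384 | 33 | 2.7x | 87.2 | 91.9 | 91.3 | 85.6 | 93.1 | 90.4 | 80.2 | 88.5 |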
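
The Avg column is not defined in the text, but the reported values are consistent with an unweighted mean of the seven task scores (MNLI through SQUAD2). A minimal sketch that checks this for three rows, with the scores copied from the table above; the helper name `mean_score` is chosen here for illustration:

```python
# Task scores copied from the results table, in column order:
# MNLI, QNLI, QQP, RTE, SST, MRPC, SQUAD2. The second item is the reported Avg.
ROWS = {
    "BERT": ([84.5, 91.7, 91.3, 68.6, 93.2, 87.3, 76.8], 84.8),
    "XtremeDistil-l6-h384": ([85.4, 90.3, 91.0, 80.9, 92.3, 90.0, 76.6], 86.6),
    "XtremeDistil-l12-h384": ([87.2, 91.9, 91.3, 85.6, 93.1, 90.4, 80.2], 88.5),
}


def mean_score(scores):
    """Unweighted mean of the task scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


for name, (scores, reported_avg) in ROWS.items():
    print(f"{name}: computed {mean_score(scores)}, reported {reported_avg}")
```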
Tested with tensorflow 2.3.1, transformers 4.1.1, and torch 1.6.0.

If you use this checkpoint in your work, please cite:
@misc{mukherjee2021xtremedistiltransformers,
      title={XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation}, 
      author={Subhabrata Mukherjee and Ahmed Hassan Awadallah and Jianfeng Gao},
      year={2021},
      eprint={2106.04563},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
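
The 22 million parameter figure quoted for this checkpoint can be roughly reproduced from its shape. The sketch below counts weights for a standard BERT-style encoder with 6 layers and a hidden size of 384, as stated in the description. The vocabulary size (30522, the usual BERT uncased vocabulary), the feed-forward size (4 times the hidden size), the 512 positions, and the pooler layer are assumptions, not values given in this card.

```python
def bert_param_count(layers, hidden, vocab_size=30522, intermediate=None,
                     max_positions=512, type_vocab=2):
    """Approximate parameter count of a BERT-style encoder.

    vocab_size, intermediate, max_positions and type_vocab are assumed
    defaults (standard BERT uncased values), not taken from the model card.
    """
    if intermediate is None:
        intermediate = 4 * hidden  # assumed: usual 4x feed-forward expansion
    layer_norm = 2 * hidden  # one scale and one bias vector
    # Token, position and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> intermediate -> hidden, each with a bias.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + layer_norm + feed_forward + layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler


total = bert_param_count(layers=6, hidden=384)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the count comes to about 22.7 million, in line with the 22 million quoted, and roughly half of it sits in the token embedding matrix. The number of attention heads does not change the count, since the heads split the same projection matrices.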

microsoft/xtremedistil-l6-h384-uncased powered by Hugging Face Inference Toolkit

Send Request

You can use cURL or any REST Client to send a request to the AzureML endpoint with your AzureML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{"inputs":"I like you. I love you"}'
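
The same request can be sent from Python. A minimal sketch using only the standard library; `build_request` and `send` are helper names chosen here, and the endpoint URL and token remain placeholders that you must supply:

```python
import json
import urllib.request


def build_request(endpoint_url, token, text, parameters=None):
    """Build the same POST request that the cURL command sends."""
    payload = {"inputs": text}
    if parameters:
        # Optional "parameters" object, e.g. {"top_k": 1}.
        payload["parameters"] = parameters
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (replace the placeholders before running):
# request = build_request("<AZUREML_ENDPOINT_URL>", "<AZUREML_TOKEN>", "I like you. I love you")
# print(send(request))
```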

Supported Parameters

  • inputs (string): The text to classify
  • parameters (object):
    • function_to_apply (enum): Possible values: sigmoid, softmax, none.
    • top_k (integer): When specified, limits the output to the top K most probable classes.
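
To make the two parameters concrete, the sketch below shows the usual meaning of these options in Hugging Face text-classification pipelines: `function_to_apply` turns raw logits into scores, and `top_k` keeps only the highest-scoring classes. It illustrates the semantics only, assuming the endpoint follows the pipeline behaviour; the function names, label names, and logits are made up for the example.

```python
import math


def apply_function(logits, function_to_apply="softmax"):
    """Convert raw logits to scores according to function_to_apply."""
    if function_to_apply == "softmax":
        # Subtract the maximum for numerical stability; scores sum to 1.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    if function_to_apply == "sigmoid":
        # Each class is scored independently in (0, 1).
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    if function_to_apply == "none":
        # Raw logits are returned unchanged.
        return list(logits)
    raise ValueError(f"unsupported function_to_apply: {function_to_apply!r}")


def top_k_classes(labels, scores, top_k=None):
    """Sort classes by descending score and keep the top_k best, if given."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    return [{"label": label, "score": score} for label, score in ranked]


# Hypothetical labels and logits, for illustration only.
labels = ["LABEL_0", "LABEL_1", "LABEL_2"]
logits = [2.0, 0.0, -1.0]
print(top_k_classes(labels, apply_function(logits, "softmax"), top_k=2))
```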
Check the full API specification in the Hugging Face Inference documentation.

Model Specifications

License: MIT
Last Updated: July 2025
Provider: HuggingFace