microsoft-xtremedistil-l6-h384-uncased
Version: 6
XtremeDistilTransformers for Distilling Massive Neural Networks
XtremeDistilTransformers is a distilled task-agnostic transformer model that leverages task transfer for learning a small universal model that can be applied to arbitrary tasks and languages as outlined in the paper XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation . We leverage task transfer combined with multi-task distillation techniques from the papers XtremeDistil: Multi-stage Distillation for Massive Multilingual Models and MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers with the following Github code . This l6-h384 checkpoint with 6 layers, 384 hidden size, 12 attention heads corresponds to 22 million parameters with 5.3x speedup over BERT-base. Other available checkpoints: xtremedistil-l6-h256-uncased and xtremedistil-l12-h384-uncased The following table shows the results on GLUE dev set and SQuAD-v2.| Models | #Params | Speedup | MNLI | QNLI | QQP | RTE | SST | MRPC | SQUAD2 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 109 | 1x | 84.5 | 91.7 | 91.3 | 68.6 | 93.2 | 87.3 | 76.8 | 84.8 |
| DistilBERT | 66 | 2x | 82.2 | 89.2 | 88.5 | 59.9 | 91.3 | 87.5 | 70.7 | 81.3 |
| TinyBERT | 66 | 2x | 83.5 | 90.5 | 90.6 | 72.2 | 91.6 | 88.4 | 73.1 | 84.3 |
| MiniLM | 66 | 2x | 84.0 | 91.0 | 91.0 | 71.5 | 92.0 | 88.4 | 76.4 | 84.9 |
| MiniLM | 22 | 5.3x | 82.8 | 90.3 | 90.6 | 68.9 | 91.3 | 86.6 | 72.9 | 83.3 |
| XtremeDistil-l6-h256 | 13 | 8.7x | 83.9 | 89.5 | 90.6 | 80.1 | 91.2 | 90.0 | 74.1 | 85.6 |
| XtremeDistil-l6-h384 | 22 | 5.3x | 85.4 | 90.3 | 91.0 | 80.9 | 92.3 | 90.0 | 76.6 | 86.6 |
| XtremeDistil-l12-h384 | 33 | 2.7x | 87.2 | 91.9 | 91.3 | 85.6 | 93.1 | 90.4 | 80.2 | 88.5 |
tensorflow 2.3.1, transformers 4.1.1, torch 1.6.0
If you use this checkpoint in your work, please cite:
@misc{mukherjee2021xtremedistiltransformers,
title={XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation},
author={Subhabrata Mukherjee and Ahmed Hassan Awadallah and Jianfeng Gao},
year={2021},
eprint={2106.04563},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
microsoft/xtremedistil-l6-h384-uncased powered by Hugging Face Inference Toolkit
Send Request
You can use cURL or any REST Client to send a request to the AzureML endpoint with your AzureML token.curl <AZUREML_ENDPOINT_URL> \
-X POST \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json" \
-d '{"inputs":"I like you. I love you"}'
Supported Parameters
- inputs (string): The text to classify
- parameters (object):
- function_to_apply (enum): Possible values: sigmoid, softmax, none.
- top_k (integer): When specified, limits the output to the top K most probable classes.
Model Specifications
LicenseMit
Last UpdatedJuly 2025
ProviderHuggingFace