microsoft-minilm-l12-h384-uncased
Version: 7
HuggingFace · Last updated July 2025

MiniLM: Small and Fast Pre-trained Models for Language Understanding and Generation

MiniLM is a distilled model from the paper "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers". Please find information about preprocessing, training, and full details of MiniLM in the original MiniLM repository. Please note: this checkpoint can be used as an in-place substitution for BERT, but it needs to be fine-tuned before use!
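As an illustration of the in-place substitution, the checkpoint can be loaded like any BERT checkpoint with the Hugging Face transformers library. This is a minimal sketch, not part of the model card, and it assumes transformers and PyTorch are installed; the classification head it adds is randomly initialized and must be fine-tuned before the outputs are meaningful.

# Minimal sketch (not from the model card): load the checkpoint as a
# drop-in replacement for BERT. The classification head is randomly
# initialized and must be fine-tuned on a downstream task first.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("I like you. I love you", return_tensors="pt")
outputs = model(**inputs)    # logits are meaningless until the model is fine-tuned
print(outputs.logits.shape)  # torch.Size([1, 2])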

English Pre-trained Models

We release the uncased 12-layer model with a hidden size of 384, distilled from an in-house pre-trained UniLM v2 model of BERT-Base size.
  • MiniLMv1-L12-H384-uncased: 12-layer, 384-hidden, 12-heads, 33M parameters, 2.7x faster than BERT-Base

Fine-tuning on NLU tasks

We present the dev results on SQuAD 2.0 and several GLUE benchmark tasks.
Model             #Param   SQuAD 2.0   MNLI-m   SST-2   QNLI   CoLA   RTE    MRPC   QQP
BERT-Base         109M     76.8        84.5     93.2    91.7   58.9   68.6   87.3   91.3
MiniLM-L12xH384   33M      81.7        85.7     93.0    91.5   58.5   73.3   89.5   91.3
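The sketch below shows, roughly, how such fine-tuning could be run with the Hugging Face transformers Trainer on SST-2; the hyperparameters, output path, and sequence length are illustrative assumptions, not the settings used to produce the numbers above.

# Illustrative sketch only: fine-tune MiniLM on SST-2 with the transformers
# Trainer. Hyperparameters are assumptions, not the settings behind the
# results reported in the table above.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("glue", "sst2")
dataset = dataset.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="minilm-sst2",          # assumed output path
    learning_rate=5e-5,                # assumed hyperparameters
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,               # enables dynamic padding via the default collator
)
trainer.train()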

Citation

If you find MiniLM useful in your research, please cite the following paper:
@misc{wang2020minilm,
    title={MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers},
    author={Wenhui Wang and Furu Wei and Li Dong and Hangbo Bao and Nan Yang and Ming Zhou},
    year={2020},
    eprint={2002.10957},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

microsoft/MiniLM-L12-H384-uncased powered by Hugging Face Inference Toolkit

Send Request

You can use cURL or any REST client to send a request to the AzureML endpoint with your AzureML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{"inputs":"I like you. I love you"}'

Supported Parameters

  • inputs (string): The text to classify
  • parameters (object):
    • function_to_apply (enum): Possible values: sigmoid, softmax, none.
    • top_k (integer): When specified, limits the output to the top K most probable classes.
Check the full API specification in the Hugging Face Inference documentation.
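For example, a request body that applies a softmax over the logits and returns only the two most probable classes could look like the sketch below; the values are illustrative, and the payload can be sent as the JSON body of the POST request shown earlier.

# Sketch of a request body using the optional parameters listed above.
payload = {
    "inputs": "I like you. I love you",
    "parameters": {
        "function_to_apply": "softmax",  # one of: sigmoid, softmax, none
        "top_k": 2,                      # keep only the 2 most probable classes
    },
}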
Model Specifications
License: MIT
Last Updated: July 2025
Provider: HuggingFace