distilbert-base-cased-distilled-squad
Version: 13
The DistilBERT model was proposed in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT" and the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
This model is a fine-tuned checkpoint of DistilBERT-base-cased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1.
Training Details
Training Data
The distilbert-base-cased model was trained using the same data as the distilbert-base-uncased model. The distilbert-base-uncased model card describes its training data as:
"DistilBERT pretrained on the same data as BERT, which is BookCorpus, a dataset consisting of 11,038 unpublished books and English Wikipedia (excluding lists, tables and headers)."
To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card.
Training Procedure
Preprocessing
See the distilbert-base-cased model card for further details.
Pretraining
See the distilbert-base-cased model card for further details.
Evaluation Results
As discussed in the model repository, this model reaches an F1 score of 87.1 on the SQuAD v1.1 dev set (for comparison, the bert-base-cased version of BERT reaches an F1 score of 88.7).
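For readers unfamiliar with the metric, the F1 score reported for SQuAD is a token-overlap score between the predicted answer span and the reference answers. The following is a minimal sketch of that computation, not the official evaluation script: the function names `normalize_answer`, `token_f1`, and `dataset_f1` are illustrative, and the handling of empty answers is an assumption made for this example.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall for one prediction/reference pair."""
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)
    # Assumption for this sketch: two empty answers count as a match.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def dataset_f1(predictions: list[str], references: list[list[str]]) -> float:
    """Average the best per-question F1 over all questions, scaled to 0-100."""
    scores = [
        max(token_f1(pred, ref) for ref in refs)
        for pred, refs in zip(predictions, references)
    ]
    return 100.0 * sum(scores) / len(scores)
```

Under this definition, a prediction of "John Smith" against the reference "John" has precision 0.5 and recall 1.0, so it scores about 0.67 rather than 0; partial overlap earns partial credit.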
Limitations and Biases
CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes. Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Model Evaluation samples
| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
|---|---|---|---|---|
| Question Answering | Extractive Q&A | SQuAD v2 | evaluate-model-question-answering.ipynb | evaluate-model-question-answering.yml |
Inference samples
| Inference type | Python sample (Notebook) |
|---|---|
| Real time | sdk-example.ipynb |
| Real time | question-answering-online-endpoint.ipynb |
Sample inputs and outputs
Sample input
{
  "input_data": {
    "question": "What's my name?",
    "context": "My name is John and I live in Seattle"
  }
}
Sample output
[
  "John"
]
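The sample input and output above can be produced and consumed with a few lines of standard-library Python. This is a sketch under stated assumptions: only the payload shape (`input_data` with `question` and `context`) and the list-of-strings response come from the samples; the helper names `build_request` and `first_answer` are hypothetical, and sending the request to a deployed endpoint (URL, authentication) is deliberately left out because those details are not given here.

```python
import json


def build_request(question: str, context: str) -> str:
    """Serialize a question/context pair into the sample input shape."""
    payload = {"input_data": {"question": question, "context": context}}
    return json.dumps(payload)


def first_answer(response_body: str) -> str:
    """Return the first answer from a response shaped like the sample output."""
    answers = json.loads(response_body)
    # Assumption: the response is a JSON list of answer strings, as in the sample.
    if not isinstance(answers, list) or not answers:
        raise ValueError("expected a non-empty JSON list of answers")
    return answers[0]


body = build_request("What's my name?", "My name is John and I live in Seattle")
print(body)

# Parsing the sample output shown above.
print(first_answer('["John"]'))
```

Building the payload with `json.dumps` rather than string formatting keeps quotes and apostrophes in the question (as in "What's my name?") correctly escaped.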
Model Specifications
License: Apache-2.0
Last Updated: April 2025
Provider
Languages: 1 Language