distilroberta-base
Version: 17
distilroberta-base is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT.
The code for the distillation process can be found here. This model is case-sensitive: it makes a difference between english and English. The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base).
On average, DistilRoBERTa is twice as fast as RoBERTa-base.
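For reference, here is a minimal usage sketch with the Hugging Face transformers library, using the fill-mask pipeline that matches the model's masked-language-modeling pre-training objective (the plain distilroberta-base model id on the Hub is assumed):

```python
from transformers import pipeline, AutoModel

# Fill-mask matches the masked-language-modeling pre-training objective.
unmasker = pipeline("fill-mask", model="distilroberta-base")
for prediction in unmasker("Hello I'm a <mask> model."):
    print(prediction["token_str"], round(prediction["score"], 3))

# Sanity-check the parameter count quoted above (~82M).
model = AutoModel.from_pretrained("distilroberta-base")
print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```

The same checkpoint can also be loaded with the task-specific Auto classes (e.g. AutoModelForSequenceClassification) for fine-tuning on downstream tasks.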
Training Details
DistilRoBERTa was pre-trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset, which amounts to roughly 4 times less training data than the teacher RoBERTa was trained on. See the roberta-base model card for further details on training.
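To make the pre-training objective concrete, the sketch below computes the masked-language-modeling loss the model was trained with, using the transformers library. The sample sentence and the masked position are illustrative only, not taken from the original training setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

text = "DistilRoBERTa was pre-trained on text from the web."
inputs = tokenizer(text, return_tensors="pt")
labels = inputs["input_ids"].clone()

# Mask one token by hand; during pre-training ~15% of tokens are masked at random.
mask_position = 4
inputs["input_ids"][0, mask_position] = tokenizer.mask_token_id
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # score only the masked slot

with torch.no_grad():
    outputs = model(**inputs, labels=labels)
print(f"MLM loss on the masked token: {outputs.loss.item():.2f}")
```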
Model Specifications
License: Apache-2.0
Last Updated: April 2025
Languages: 1 language (English)