distilroberta-base
Version: 17
distilroberta-base is a distilled version of the RoBERTa-base model. It follows the same training procedure as DistilBERT.
The code for the distillation process can be found here. This model is case-sensitive: it makes a difference between english and English. The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base).
On average, DistilRoBERTa is twice as fast as RoBERTa-base.
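For reference, here is a minimal usage sketch with the Hugging Face transformers library, using the fill-mask pipeline that matches the model's masked-language-modeling pre-training objective (the plain distilroberta-base model id on the Hub is assumed):

```python
from transformers import pipeline, AutoModel

# Fill-mask matches the masked-language-modeling pre-training objective.
unmasker = pipeline("fill-mask", model="distilroberta-base")
for prediction in unmasker("Hello I'm a <mask> model."):
    print(prediction["token_str"], round(prediction["score"], 3))

# Sanity-check the parameter count quoted above (~82M).
model = AutoModel.from_pretrained("distilroberta-base")
print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```

The same checkpoint can also be loaded with the task-specific Auto classes (e.g. AutoModelForSequenceClassification) for fine-tuning on downstream tasks.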
Training Details
DistilRoBERTa was pre-trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset, which amounts to roughly 4 times less training data than the teacher RoBERTa was trained on. See the roberta-base model card for further details on training.
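To make the pre-training objective concrete, the sketch below computes the masked-language-modeling loss the model was trained with, using the transformers library. The sample sentence and the masked position are illustrative only, not taken from the original training setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

text = "DistilRoBERTa was pre-trained on text from the web."
inputs = tokenizer(text, return_tensors="pt")
labels = inputs["input_ids"].clone()

# Mask one token by hand; during pre-training ~15% of tokens are masked at random.
mask_position = 4
inputs["input_ids"][0, mask_position] = tokenizer.mask_token_id
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # score only the masked slot

with torch.no_grad():
    outputs = model(**inputs, labels=labels)
print(f"MLM loss on the masked token: {outputs.loss.item():.2f}")
```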
Model Specifications
License: Apache-2.0
Last Updated: April 2025
Languages: 1 language (English)