RFdiffusion
Version: 1
Nvidia
Last updated: May 2025
RFdiffusion (RoseTTAFold Diffusion) is a generative model of protein backbones, used for protein scaffolding and protein binder design tasks. It generates entirely new protein backbones and designs proteins that can be tailored to bind to specific target molecules. For more information on the RFdiffusion model, see the GitHub repo.

Intended Use

Primary Use Cases

The RFdiffusion model is most suitable for:
  • Motif scaffolding
  • Unconditional protein generation
  • Symmetric unconditional generation (cyclic, dihedral and tetrahedral symmetries currently implemented, more coming!)
  • Symmetric motif scaffolding
  • Binder design
  • Design diversification ("partial diffusion", sampling around a design)

Responsible AI Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Training Data

Link: The Protein Data Bank
Data Collection Method by dataset: Hybrid: Automatic, Human
For the PDB dataset, scientists worldwide submit structural data determined by X-ray crystallography or cryo-electron microscopy (cryo-EM). This includes atomic coordinates, experimental data, and metadata about the biological macromolecules.
Labeling Method by dataset: Hybrid: Automatic, Human
For the PDB dataset, expert biocurators review the submitted data to ensure accuracy and completeness. This involves checking the plausibility of the data and annotating it with relevant biological and chemical information.
Properties (Quantity, Dataset Descriptions, Sensor(s)): The training dataset used for RFdiffusion, as detailed in the referenced paper, consists of protein structures sampled from the Protein Data Bank (PDB). To prepare these structures for training, a noising process is applied that simulates up to 200 steps of random modifications on each protein structure. Specifically, the Cα coordinates are perturbed with 3D Gaussian noise, and Brownian motion is applied to the residue orientations on the manifold of rotation matrices.
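The noising process described above can be illustrated with a minimal, standard-library-only Python sketch. It is not the RFdiffusion implementation: the function names, the per-step noise scales (coord_sigma, rot_sigma), and the use of a plain random walk are all assumptions made for illustration. Brownian motion on the rotation manifold is approximated here by composing many small random rotations; the actual noise schedule is defined in the referenced paper and may differ.

```python
import math
import random

MAX_STEPS = 200  # the model card states "up to 200 steps"


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def small_random_rotation(rng, sigma):
    """Rotation matrix for an axis-angle vector drawn from N(0, sigma^2 I)."""
    w = [rng.gauss(0.0, sigma) for _ in range(3)]
    theta = math.sqrt(sum(v * v for v in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (v / theta for v in w)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    # Rodrigues' rotation formula written out element by element.
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def noise_backbone(ca_coords, rotations, num_steps,
                   coord_sigma=0.1, rot_sigma=0.05, seed=0):
    """Apply num_steps of noise to C-alpha coordinates and residue frames.

    coord_sigma and rot_sigma are hypothetical per-step noise scales,
    not values taken from RFdiffusion.
    """
    if not 0 <= num_steps <= MAX_STEPS:
        raise ValueError("num_steps must be between 0 and 200")
    rng = random.Random(seed)
    coords = [list(p) for p in ca_coords]                 # copy the inputs
    rots = [[list(row) for row in r] for r in rotations]
    for _ in range(num_steps):
        # 3D Gaussian noise on every C-alpha position.
        for p in coords:
            for axis in range(3):
                p[axis] += rng.gauss(0.0, coord_sigma)
        # Small random rotation composed onto every residue orientation.
        rots = [matmul3(r, small_random_rotation(rng, rot_sigma)) for r in rots]
    return coords, rots
```

Calling noise_backbone with a list of [x, y, z] positions and a matching list of 3x3 rotation matrices returns noised copies and leaves the inputs unchanged. Because each update multiplies by a proper rotation, the orientations stay on the rotation manifold up to floating-point error.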

Evaluation Data

The evaluation strategy involved training the model on PDB structures (as described in Training Data) with added noise and then assessing its ability to denoise these structures, as well as evaluating its performance on design tasks with auxiliary conditioning information.
Data Collection Method by dataset: Automatic: random splits from PDB dataset.
Labeling Method by dataset: Automatic: random splits from PDB dataset.
The training, validation, and test splits were derived from protein assemblies in the PDB, which includes structures determined by X-ray crystallography or cryo-electron microscopy (cryo-EM).
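As a rough illustration of what a random split over PDB entries can look like, here is a short Python sketch. The split fractions, the seed, and the function name are assumptions; the model card does not state how the actual splits were sized or generated.

```python
import random


def random_split(entry_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Randomly partition entry IDs into train, validation, and test sets.

    val_fraction and test_fraction are hypothetical; the real proportions
    are not given in the model card.
    """
    ids = sorted(set(entry_ids))        # de-duplicate and fix the start order
    random.Random(seed).shuffle(ids)    # reproducible shuffle
    n_val = int(len(ids) * val_fraction)
    n_test = int(len(ids) * test_fraction)
    return {
        "validation": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }
```

Each ID lands in exactly one split, and reusing the same seed reproduces the same partition.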
RFdiffusion NIM is optimized to run best on the following compute:
GPU    Total GPU memory (GB)   Azure VM compute            #GPUs on VM   Link
A100   80                      Standard_NC24ads_A100_v4    1             link
A100   160                     Standard_NC48ads_A100_v4    2             link
A100   320                     Standard_NC96ads_A100_v4    4             link
A100   640                     STANDARD_ND96AMSR_A100_V4   8             link
H100   94                      STANDARD_NC40ADS_H100_V5    1             link
H100   188                     STANDARD_NC80ADIS_H100_V5   2             link
Model Specifications
License: Custom
Last Updated: May 2025
Publisher: Nvidia