RFdiffusion
Version: 1
RFdiffusion (RoseTTAFold Diffusion) is a generative model of protein backbones, used for protein scaffolding and protein binder design. It generates entirely new protein backbones and can design proteins tailored to bind specific target molecules. For more information on the RFdiffusion model, see the GitHub repo.
Intended Use
Primary Use Cases
The RFdiffusion model is most suitable for:
- Motif scaffolding
- Unconditional protein generation
- Symmetric unconditional generation (cyclic, dihedral, and tetrahedral symmetries currently implemented, more coming!)
- Symmetric motif scaffolding
- Binder design
- Design diversification ("partial diffusion", sampling around a design)
Responsible AI Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
Training Data
Link: The Protein Data Bank
Data Collection Method by dataset: Hybrid (Automatic, Human)
For the PDB dataset, scientists worldwide submit structural data determined by X-ray crystallography or cryo-electron microscopy (cryo-EM). This includes atomic coordinates, experimental data, and metadata about the biological macromolecules.
Labeling Method by dataset: Hybrid (Automatic, Human)
For the PDB dataset, expert biocurators review the submitted data to ensure accuracy and completeness. This involves checking the plausibility of the data and annotating it with relevant biological and chemical information.
Properties (Quantity, Dataset Descriptions, Sensor(s)): The training dataset used for RFdiffusion, as detailed in the referenced paper, consists of protein structures sampled from the Protein Data Bank (PDB). To prepare these structures for training, a noising process is applied that simulates up to 200 steps of random modifications to each structure. Specifically, the Cα coordinates are perturbed with 3D Gaussian noise, and the residue orientations undergo Brownian motion on the manifold of rotation matrices.
Evaluation Data
The evaluation strategy involved training the model on PDB structures with added noise (as described in Training Data), then assessing its ability to denoise these structures and evaluating its performance on design tasks with auxiliary conditioning information.
Data Collection Method by dataset: Automatic (random splits from the PDB dataset)
Labeling Method by dataset: Automatic (random splits from the PDB dataset)
The training, validation, and test splits were derived from protein assemblies in the PDB, which includes structures determined by X-ray crystallography or cryo-electron microscopy (cryo-EM).
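As an illustration of what a random split over assemblies could look like, here is a minimal sketch. The split fractions, the seed, and the function name `split_assemblies` are assumptions made for the example; the source does not state the actual proportions or procedure.

```python
import random


def split_assemblies(assembly_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Randomly partition assembly IDs into train/validation/test sets.

    val_frac and test_frac are hypothetical values, not taken from the source.
    """
    # Deduplicate and sort first so the result depends only on the seed.
    ids = sorted(set(assembly_ids))
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real PDB entries.
    example_ids = [f"assembly_{i:03d}" for i in range(20)]
    splits = split_assemblies(example_ids)
    print({name: len(members) for name, members in splits.items()})
```

Fixing the seed keeps the partition reproducible, so a structure assigned to the test set never leaks into training across runs.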
The RFdiffusion NIM is optimized to run on the following compute:
GPU | Total GPU memory (GB) | Azure VM compute | #GPUs on VM | Link |
---|---|---|---|---|
A100 | 80 | Standard_NC24ads_A100_v4 | 1 | link |
A100 | 160 | Standard_NC48ads_A100_v4 | 2 | link |
A100 | 320 | Standard_NC96ads_A100_v4 | 4 | link |
A100 | 640 | STANDARD_ND96AMSR_A100_V4 | 8 | link |
H100 | 94 | STANDARD_NC40ADS_H100_V5 | 1 | link |
H100 | 188 | STANDARD_NC80ADIS_H100_V5 | 2 | link |
Model Specifications
License: Custom
Last Updated: May 2025
Publisher: NVIDIA