AI Model Catalog | Microsoft Foundry Models

RFdiffusion

Version: 1

Nvidia • Last updated May 2025

RFdiffusion (RoseTTAFold Diffusion) is a generative model that creates novel protein structures for protein scaffolding and protein binder design tasks. It generates entirely new protein backbones and designs proteins that can be tailored to bind to specific target molecules. For more information on the RFdiffusion model, see the GitHub repo.

Intended Use

Primary Use Cases

The RFdiffusion model is most suitable for:

Motif scaffolding
Unconditional protein generation
Symmetric unconditional generation (cyclic, dihedral, and tetrahedral symmetries currently implemented, with more to come)
Symmetric motif scaffolding
Binder design
Design diversification ("partial diffusion", sampling around a design)

Responsible AI Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Training Data

Link: The Protein Data Bank
Data Collection Method by dataset: Hybrid: Automatic, Human
For the PDB dataset, scientists worldwide submit structural data determined by X-ray crystallography or cryo-electron microscopy (cryo-EM). This includes atomic coordinates, experimental data, and metadata about the biological macromolecules.
Labeling Method by dataset: Hybrid: Automatic, Human
For the PDB dataset, expert biocurators review the submitted data to ensure accuracy and completeness. This involves checking the plausibility of the data and annotating it with relevant biological and chemical information.
Properties (Quantity, Dataset Descriptions, Sensor(s)): The training dataset used for RFdiffusion, as detailed in the referenced paper, consists of protein structures sampled from the Protein Data Bank (PDB). To prepare these structures for training, a noising process is applied that simulates up to 200 steps of random modifications on each protein structure. Specifically, the Cα coordinates are perturbed with 3D Gaussian noise, and Brownian motion on the manifold of rotation matrices is applied to the residue orientations.
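The noising process described above can be illustrated with a minimal sketch. This is not the RFdiffusion implementation: the function names, the fixed per-step noise scales (coord_sigma, rot_sigma), and the constant noise schedule are all assumptions made for illustration. The sketch adds 3D Gaussian noise to each Cα coordinate and approximates Brownian motion on the rotation manifold by composing each residue orientation with a small random rotation at every step.

```python
import math
import random


def rotvec_to_matrix(v):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    angle = math.sqrt(sum(comp * comp for comp in v))
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (comp / angle for comp in v)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    t = 1.0 - cos_a
    return [
        [cos_a + x * x * t, x * y * t - z * sin_a, x * z * t + y * sin_a],
        [y * x * t + z * sin_a, cos_a + y * y * t, y * z * t - x * sin_a],
        [z * x * t - y * sin_a, z * y * t + x * sin_a, cos_a + z * z * t],
    ]


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def noise_structure(ca_coords, rotations, steps=200,
                    coord_sigma=0.1, rot_sigma=0.05, seed=0):
    """Apply `steps` rounds of noise to per-residue C-alpha coordinates and orientations.

    coord_sigma and rot_sigma are hypothetical constants, not values from the paper.
    The inputs are copied, so the caller's structure is left unchanged.
    """
    rng = random.Random(seed)
    coords = [list(p) for p in ca_coords]
    rots = [[row[:] for row in r] for r in rotations]
    for _ in range(steps):
        for i in range(len(coords)):
            # Translation: isotropic 3D Gaussian perturbation of the C-alpha position.
            coords[i] = [c + rng.gauss(0.0, coord_sigma) for c in coords[i]]
            # Rotation: small random rotation composed with the current orientation,
            # a discretized random walk that stays on the rotation manifold.
            step = rotvec_to_matrix([rng.gauss(0.0, rot_sigma) for _ in range(3)])
            rots[i] = matmul3(step, rots[i])
    return coords, rots
```

Because each rotation update is itself a rotation matrix, the noised orientations remain valid rotations at every step, which is the point of working on the manifold rather than adding noise to matrix entries directly.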

Evaluation Data

The evaluation strategy involved training the model on PDB structures with added noise (as described in Training Data) and then assessing its ability to denoise these structures, as well as evaluating its performance on design tasks with auxiliary conditioning information.
Data Collection Method by dataset: Automatic: random splits from PDB dataset.
Labeling Method by dataset: Automatic: random splits from PDB dataset.
The training, validation, and test splits were derived from protein assemblies in the PDB, which includes structures determined by X-ray crystallography or cryo-electron microscopy (cryo-EM).
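A random split of this kind can be sketched as follows. The split fractions, the seed, and the function name are assumptions for illustration only; the source does not state the proportions or the procedure actually used. The sketch shuffles a list of assembly identifiers once and slices it into three disjoint sets.

```python
import random


def random_split(assembly_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Randomly partition assembly identifiers into train/val/test sets.

    The fractions are hypothetical defaults, not values from the RFdiffusion paper.
    """
    ids = sorted(set(assembly_ids))       # deduplicate and fix an order before shuffling
    random.Random(seed).shuffle(ids)      # seeded shuffle keeps the split reproducible
    n_val = int(len(ids) * val_fraction)
    n_test = int(len(ids) * test_fraction)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }
```

Sorting before the seeded shuffle makes the result independent of the input order, so the same identifiers always land in the same split.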

RFdiffusion NIM is optimized to run best on the following compute:

GPU	Total GPU memory (GB)	Azure VM compute	#GPUs on VM	Link
A100	80	Standard_NC24ads_A100_v4	1	link
A100	160	Standard_NC48ads_A100_v4	2	link
A100	320	Standard_NC96ads_A100_v4	4	link
A100	640	STANDARD_ND96AMSR_A100_V4	8	link
H100	94	STANDARD_NC40ADS_H100_V5	1	link
H100	188	STANDARD_NC80ADIS_H100_V5	2	link
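As an illustration of how the table above might be used when planning a deployment, the following sketch picks the listed VM size with the least total GPU memory that still meets a requirement. The helper name and its parameters are hypothetical; only the table rows come from this page, and the memory requirement of any particular workload is not stated here.

```python
# Rows copied from the compute table: (GPU, total GPU memory in GB, Azure VM size, #GPUs).
SUPPORTED_COMPUTE = [
    ("A100", 80, "Standard_NC24ads_A100_v4", 1),
    ("A100", 160, "Standard_NC48ads_A100_v4", 2),
    ("A100", 320, "Standard_NC96ads_A100_v4", 4),
    ("A100", 640, "STANDARD_ND96AMSR_A100_V4", 8),
    ("H100", 94, "STANDARD_NC40ADS_H100_V5", 1),
    ("H100", 188, "STANDARD_NC80ADIS_H100_V5", 2),
]


def smallest_vm(min_gpu_memory_gb, gpu=None):
    """Return the listed VM size with the least total GPU memory that meets the need.

    Hypothetical helper: returns None when no listed size is large enough.
    Pass gpu="A100" or gpu="H100" to restrict the search to one GPU family.
    """
    candidates = [
        row for row in SUPPORTED_COMPUTE
        if row[1] >= min_gpu_memory_gb and (gpu is None or row[0] == gpu)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda row: row[1])[2]
```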

Model Specifications

License: Custom

Last Updated: May 2025

Provider: Nvidia

Quick Start