EvoDiff
Version: 1
Microsoft Research's EvoDiff is a diffusion modeling framework capable of generating high-fidelity, diverse, and novel proteins with the option of conditioning according to sequence constraints. Because it operates in the universal protein design space, EvoDiff can unconditionally sample diverse structurally-plausible proteins, generate intrinsically disordered regions, and scaffold structural motifs using only sequence information, challenging a paradigm in structure-based protein design.
We are thrilled to release EvoDiff-OADM 640M on Azure AI Foundry. For all other models in the EvoDiff suite, please see our GitHub repository. If you use the code in our repository, the results, or the model available on Azure AI Foundry, please cite our preprint.
Overview
We investigated two types of forward processes for diffusion over discrete data modalities to determine which would be most effective. In order-agnostic autoregressive diffusion (OADM), one amino acid is converted to a special mask token at each step in the forward process. After $T=L$ steps, where $L$ is the length of the sequence, the entire sequence is masked. We additionally designed discrete denoising diffusion probabilistic models (D3PM) for protein sequences. In EvoDiff-D3PM, the forward process corrupts sequences by sampling mutations according to a transition matrix, such that after $T$ steps the sequence is indistinguishable from a uniform sample over the amino acids. In the reverse process for both, a neural network model is trained to undo the previous corruption. The trained model can then generate new sequences starting from sequences of masked tokens or of uniformly sampled amino acids for EvoDiff-OADM or EvoDiff-D3PM, respectively.
We trained all EvoDiff sequence models on 42M sequences from UniRef50 using the dilated convolutional neural network architecture introduced in the CARP protein masked language model. We trained 38M-parameter and 640M-parameter versions for each forward corruption scheme and for left-to-right autoregressive (LRAR) decoding.
To explicitly leverage evolutionary information, we designed and trained EvoDiff MSA models using the MSA Transformer architecture on the OpenFold dataset. To do so, we subsampled MSAs to a maximum length of 512 residues per sequence and a maximum depth of 64 sequences, either by randomly sampling the sequences ("Random") or by greedily maximizing for sequence diversity ("Max"). Within each subsampling strategy, we then trained EvoDiff MSA models with the OADM and D3PM corruption schemes.
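To make the OADM forward process concrete, the toy sketch below masks one residue per step in a random order until the whole sequence is masked. It is an illustration only: the "#" mask symbol, the random masking order, and the example sequence are assumptions for the sketch, not the EvoDiff tokenizer or training code.

```python
import random

MASK = "#"  # placeholder mask symbol; the actual tokenizer's mask token may differ

def oadm_forward_corruption(sequence):
    """Illustrative OADM forward process: mask one residue per step in a
    random order, so after T = len(sequence) steps the sequence is fully masked."""
    tokens = list(sequence)
    order = random.sample(range(len(tokens)), len(tokens))
    trajectory = []
    for position in order:
        tokens[position] = MASK
        trajectory.append("".join(tokens))
    return trajectory

# Example: corrupt a short (hypothetical) sequence and show the first few steps
for step in oadm_forward_corruption("MKTAYIAKQR")[:3]:
    print(step)
```

The reverse process works in the opposite direction: starting from a fully masked sequence, the trained network fills in one residue at a time until no mask tokens remain.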
Generation on Azure AI Foundry
Start using EvoDiff on Azure AI Foundry with this Jupyter Notebook.
Intended Use
Primary Use Cases
Below are several use cases for EvoDiff. Currently, Azure AI Foundry supports unconditional or conditional design with EvoDiff-Seq. To use EvoDiff-MSA, please see our GitHub repository for more information.
- Unconditional generation with EvoDiff-Seq or EvoDiff-MSA (https://github.com/microsoft/evodiff/blob/main/README.md#unconditional-generation-with-evodiff-msa)
- Conditional sequence generation
- Evolution-guided protein generation with EvoDiff-MSA
- Generating intrinsically disordered regions with EvoDiff-Seq and EvoDiff-MSA
- Scaffolding functional motifs with EvoDiff-Seq and EvoDiff-MSA
Out-of-Scope Use Cases
This model is intended for use on protein sequences. It is not meant for natural language or other biological sequences, such as DNA sequences.
Bias, Risks, and Limitations
This model generates only protein sequences; it will not generate other biological sequences, such as DNA, or natural language. In other words, the model will perform best on data within its training distribution, which consists of protein sequences and multiple sequence alignments (MSAs). Based on a review of currently available information, EvoDiff is not expected to provide any notable uplift in expertise to users. It is also very unlikely to create any new or add to any known CBRN or advanced autonomy risks.
Training Data
We obtain sequences from the UniRef50 dataset, which contains approximately 42 million protein sequences. The multiple sequence alignments (MSAs) are from the OpenFold dataset, which contains 401,381 MSAs for 140,000 unique Protein Data Bank (PDB) chains and 16,000,000 UniClust30 clusters. The intrinsically disordered region (IDR) data was obtained from the Reverse Homology GitHub. For the scaffolding structural motif benchmark, we provide PDB and FASTA files used for conditionally generating sequences in the examples/scaffolding-pdbs folder. We also provide PDB files used for conditionally generating MSAs in the examples/scaffolding-msas folder.
Environmental Impact
- Hardware Type: 32GB NVIDIA V100 GPUs
- Hours used: 4,128 (14 days per sequence model, 10 days per MSA model)
- Cloud Provider: Azure
- Compute Region: East US
- Carbon Emitted: 485.21 kg
For full details, please refer to our preprint.
Testing Data
We provide all generated sequences on the EvoDiff Zenodo record. To download our unconditionally generated sequences from the unconditional_generations.csv file, run:
curl -O https://zenodo.org/record/8329165/files/unconditional_generations.csv?download=1
To extract all unconditionally generated sequences created using the EvoDiff-Seq oa_dm_640M model, run the following code:
import pandas as pd
df = pd.read_csv('unconditional_generations.csv', index_col = 0)
subset = df.loc[df['model'] == 'evodiff_oa_dm_640M']
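If you want to pass the extracted sequences on to other tools (folding, inverse folding, etc.), a minimal sketch like the one below writes them to a FASTA file. The 'sequence' column name and the output filename are assumptions about the CSV layout made for this sketch; verify the column names against the README before relying on them.

```python
import pandas as pd

# Reload the generated-sequence table and keep rows from the 640M OADM model
df = pd.read_csv('unconditional_generations.csv', index_col=0)
subset = df.loc[df['model'] == 'evodiff_oa_dm_640M']

# Assumption: sequences live in a column named 'sequence'; check df.columns
# if the CSV uses a different header.
with open('evodiff_oa_dm_640M_generations.fasta', 'w') as handle:
    for i, seq in enumerate(subset['sequence']):
        handle.write(f'>evodiff_oa_dm_640M_{i}\n{seq}\n')
```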
Please view our README.md for more information about the CSV files containing generated data.
Metrics
To analyze the quality of the generations, we look at the following metrics and tools; a minimal sketch of the amino acid KL divergence calculation follows this list.
- amino acid KL divergence (aa_reconstruction_parity_plot)
- secondary structure KL divergence (evodiff/analysis/calc_kl_ss.py)
- model perplexity for sequences (evodiff/analysis/sequence_perp.py)
- model perplexity for MSAs (evodiff/analysis/msa_perp.py)
- Fréchet inception distance (evodiff/analysis/calc_fid.py)
- Hamming distance (evodiff/analysis/calc_nearestseq_hamming.py)
- sc-RMSD score (analysis/rmsd_analysis.py)
- TM score
- OmegaFold
- ProteinMPNN
- ESM-IF1; see this Jupyter notebook for setup details.
- PGP
- DISOPRED3
- DR-BERT
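As promised above, here is a minimal sketch of an amino acid KL divergence calculation between a test set and a set of generated sequences. It illustrates the idea only and is not the implementation behind aa_reconstruction_parity_plot; the 20-letter canonical alphabet and the epsilon smoothing are assumptions of this sketch.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_distribution(sequences):
    """Frequency of each canonical amino acid across a list of sequences."""
    counts = np.zeros(len(AMINO_ACIDS))
    for seq in sequences:
        for i, aa in enumerate(AMINO_ACIDS):
            counts[i] += seq.count(aa)
    return counts / counts.sum()

def aa_kl_divergence(test_seqs, generated_seqs, eps=1e-10):
    """KL divergence between the test-set and generated amino acid distributions."""
    p = aa_distribution(test_seqs) + eps       # reference (test) distribution
    q = aa_distribution(generated_seqs) + eps  # generated distribution
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```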
EvoDiff-Seq Performance
The reconstruction KL (Recon KL) was calculated between the distribution of amino acids in the test set and in generated samples (n=1000). The perplexity was computed on 25k samples from the test set. The minimum Hamming distance to any train sequence of the same length (Hamming) is reported for each model as the mean ± standard deviation over the generated samples; a minimal sketch of this calculation appears after the table footnotes. The Fréchet ProtT5 distance (FPD) was calculated between the test set and generated samples. The secondary structure KL (SS KL) was calculated between the means of the predicted secondary structures of the test and generated samples.
Model | Parameters | Recon KL | Perplexity | Hamming | FPD | SS KL |
---|---|---|---|---|---|---|
Test | - | 9.92e-4 [1] | - | 0.0039 [2] | 0.101 | 1.37e-5 [1] |
EvoDiff-Seq (D3PM BLOSUM) | 38M | 1.77e-2 | 17.16 | 0.83 ± 0.05 | 1.42 | 3.30e-5 |
EvoDiff-Seq (D3PM Uniform) | 38M | 1.48e-3 | 18.82 | 0.83 ± 0.05 | 1.31 | 3.73e-5 |
EvoDiff-Seq (OADM) | 38M | 1.11e-3 | 14.61 | 0.83 ± 0.07 | 0.92 | 1.61e-4 |
EvoDiff-Seq (D3PM BLOSUM) | 640M | 3.73e-2 | 15.74 | 0.83 ± 0.05 | 1.53 | 4.96e-4 |
EvoDiff-Seq (D3PM Uniform) | 640M | 2.90e-3 | 18.47 | 0.83 ± 0.05 | 1.35 | 2.13e-4 |
EvoDiff-Seq (OADM) | 640M | 1.26e-3 | 13.05 | 0.83 ± 0.08 | 0.88 | 1.48e-4 |
LRAR | 38M | 7.90e-4 | 12.38 | 0.82 ± 0.06 | 0.86 | 1.61e-4 |
CARP | 38M | 5.71e-1 | 25.13 | 0.74 ± 0.07 | 6.30 | 2.72e-3 |
LRAR | 640M | 7.01e-4 | 10.41 | 0.83 ± 0.06 | 0.63 | 1.76e-5 |
CARP | 640M | 3.56e-1 | 31.77 | 0.84 ± 0.05 | 1.78 | 5.03e-3 |
ESM-1b [3] | 650M | 4.91e-1 | 53.49 | 0.83 ± 0.06 | 6.67 | 5.48e-4 |
ESM-2 [3] | 650M | 5.00e-1 | 68.39 | 0.84 ± 0.06 | 6.79 | 3.05e-3 |
FoldingDiff [4] | 14M | 5.49e-2 | - | - | 1.64 | 1.76e-3 |
RFdiffusion [5] | 60M | 7.19e-2 | - | - | 1.96 | 5.98e-3 |
Random | - | 1.65e-1 | 20 | 0.85 ± 0.04 | 3.16 | 1.90e-4 |
[1] Calculated between the test set and validation set.
[2] Reported value is the minimum Hamming distance between any two natural sequences of the same length in UniRef50.
[3] Due to model constraints, the maximum sequence length sampled was 1022.
[4] For the FoldingDiff baseline, 1000 structures generated by FoldingDiff were randomly selected, and the corresponding 1000 inferred sequences were inverse-folded using ESM-IF. These sequences are between 50 and 128 residues in length.
[5] For the RFdiffusion baseline, 1000 structures were generated matching the UniRef training set length distribution, and 1000 corresponding sequences were inverse-folded using ESM-IF.
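As a companion to the Hamming column above, here is a minimal sketch of computing the minimum normalized Hamming distance from a generated sequence to training sequences of the same length. It illustrates the metric only and is not the script in evodiff/analysis/calc_nearestseq_hamming.py; the fallback value of 1.0 when no same-length training sequence exists is an assumption of this sketch.

```python
def min_hamming_to_train(generated_seq, train_seqs):
    """Minimum normalized Hamming distance from one generated sequence
    to any training sequence of the same length (1.0 if no length match)."""
    length = len(generated_seq)
    best = 1.0
    for train_seq in train_seqs:
        if len(train_seq) != length:
            continue  # Hamming distance is only defined for equal-length sequences
        mismatches = sum(a != b for a, b in zip(generated_seq, train_seq))
        best = min(best, mismatches / length)
    return best
```

The reported "Hamming" values are then the mean ± standard deviation of this quantity over all generated samples for a given model.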
EvoDiff-MSA performance
The perplexity is calculated based on the ability of each model to reconstruct a subsampled MSA from the validation set; a minimal sketch of a masked-token perplexity calculation appears after the table. "Max Perplexity" and "Rand. Perplexity" indicate MaxHamming and Random subsampling, respectively, for construction of the validation MSA.
Model | Subsampling | Params | Max Perplexity | Rand. Perplexity |
---|---|---|---|---|
EvoDiff-MSA (D3PM BLOSUM) | Random | 100M | 11.35 | 8.31 |
EvoDiff-MSA (D3PM BLOSUM) | Max | 100M | 10.98 | 7.61 |
EvoDiff-MSA (D3PM Uniform) | Random | 100M | 10.14 | 6.77 |
EvoDiff-MSA (D3PM Uniform) | Max | 100M | 10.06 | 6.66 |
EvoDiff-MSA (OADM) | Random | 100M | 6.05 | 3.64 |
EvoDiff-MSA (OADM) | Max | 100M | 6.14 | 3.60 |
ESM-MSA-1b | Max | 100M | 11.20 | 5.89 |
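For intuition on the perplexity numbers above, the sketch below shows how a masked-token perplexity can be computed from a model's per-position log-probabilities. It assumes a generic `model(tokens)` callable that returns logits over the amino acid vocabulary and a known `mask_idx`; it is not the implementation in evodiff/analysis/msa_perp.py.

```python
import torch
import torch.nn.functional as F

def masked_perplexity(model, tokens, mask_idx, masked_positions):
    """Perplexity of a model's predictions at masked positions.

    tokens: LongTensor of token indices, shape (1, seq_len).
    masked_positions: bool tensor of the same shape marking positions to corrupt.
    Assumes model(tokens) returns logits of shape (1, seq_len, vocab_size).
    """
    corrupted = tokens.clone()
    corrupted[masked_positions] = mask_idx
    with torch.no_grad():
        logits = model(corrupted)
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability assigned to the true residue at every position
    true_lp = log_probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    # Average negative log-likelihood over masked positions, exponentiated
    nll = -true_lp[masked_positions].mean()
    return torch.exp(nll).item()
```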
EvoDiff-Seq structural plausibility metrics
Metrics are reported as the mean ± standard deviation over 1000 generated samples for each model.
Model | Params | ESM-IF scPerplexity | ProteinMPNN scPerplexity | OmegaFold pLDDT |
---|---|---|---|---|
Test | - | 8.04 ± 4.04 | 3.09 ± 0.63 | 68.25 ± 17.85 |
EvoDiff-Seq (D3PM BLOSUM) | 38M | 12.38 ± 2.06 | 3.80 ± 0.49 | 42.76 ± 14.55 |
EvoDiff-Seq (D3PM Uniform) | 38M | 12.03 ± 2.04 | 3.77 ± 0.50 | 42.37 ± 14.39 |
EvoDiff-Seq (OADM) | 38M | 11.61 ± 2.38 | 3.72 ± 0.50 | 43.78 ± 14.18 |
EvoDiff-Seq (D3PM BLOSUM) | 640M | 11.86 ± 2.21 | 3.73 ± 0.48 | 44.14 ± 13.80 |
EvoDiff-Seq (D3PM Uniform) | 640M | 12.29 ± 2.05 | 3.78 ± 0.49 | 41.65 ± 14.32 |
EvoDiff-Seq (OADM) | 640M | 11.53 ± 2.50 | 3.71 ± 0.52 | 44.46 ± 14.62 |
LRAR | 38M | 11.61 ± 2.38 | 3.64 ± 0.56 | 48.26 ± 14.87 |
CARP | 38M | 9.68 ± 2.56 | 3.66 ± 0.62 | 50.79 ± 12.06 |
LRAR | 640M | 10.99 ± 2.63 | 3.59 ± 0.54 | 48.71 ± 15.47 |
CARP | 640M | 14.13 ± 2.42 | 4.05 ± 0.52 | 41.56 ± 14.35 |
ESM-1b | 650M | 13.90 ± 2.44 | 3.47 ± 0.68 | 58.07 ± 15.64 |
ESM-2 | 650M | 14.02 ± 2.87 | 3.58 ± 0.69 | 50.70 ± 15.67 |
Random | - | 14.68 ± 1.97 | 3.96 ± 0.50 | 39.97 ± 14.05 |
EvoDiff-MSA homolog conditioned generation
Metrics are reported as the mean ± standard deviation over 250 generated samples for each model. The first subsampling method listed describes the subsampling procedure used to train the model, and the second describes the subsampling procedure used for generation.
Model | scPerplexity | pLDDT | Seq. similarity | TM score |
---|---|---|---|---|
Valid | 5.93 ± 3.19 | 73.99 ± 17.80 | 14.58 ± 21.64 [1] | - |
EvoDiff-MSA (OADM (Rand) - Rand MSA) | 9.41 ± 2.61 | 55.99 ± 14.75 | 6.13 ± 9.88 | 0.49 ± 0.23 |
EvoDiff-MSA (OADM (Max) - Max MSA) | 9.38 ± 2.57 | 57.08 ± 16.01 | 6.74 ± 11.00 | 0.50 ± 0.23 |
EvoDiff-MSA (OADM (Max) - Rand MSA) | 9.59 ± 2.69 | 54.95 ± 16.83 | 6.55 ± 10.49 | 0.46 ± 0.23 |
ESM-MSA-1b | 10.05 ± 2.92 | 51.64 ± 16.54 | 7.13 ± 11.60 | 0.40 ± 0.23 |
Potts | 10.34 ± 2.26 | 55.46 ± 13.82 | 12.01 ± 17.19 | 0.17 ± 0.10 |
[1] Sequence similarity is calculated between the original query sequence and all the sequences in the MSA.
Scaffolding performance of EvoDiff-Seq
Number of scaffolding successes out of 100 generations for RFdiffusion, EvoDiff-Seq, the LRAR baseline, the CARP baseline, and randomly sampled scaffolds (Random), for each of 17 scaffolding problems. The bottom row contains the total number of successful scaffolds generated per model.
PDB | RFdiffusion | EvoDiff-Seq | LRAR | CARP | Random |
---|---|---|---|---|---|
1BCF | 100 | 24 | 0 | 4 | 0 |
6E6R | 71 | 16 | 7 | 3 | 1 |
2KL8 | 88 | 0 | 1 | 1 | 0 |
6EXZ | 42 | 0 | 0 | 0 | 0 |
1YCR | 74 | 13 | 12 | 10 | 7 |
6VW1 | 69 | 1 | 0 | 0 | 0 |
4JHW | 0 | 0 | 0 | 0 | 0 |
5TPN | 61 | 0 | 0 | 0 | 0 |
4ZYP | 40 | 0 | 0 | 0 | 0 |
3IXT | 25 | 23 | 22 | 13 | 7 |
7MRX | 7 | 0 | 0 | 0 | 0 |
1PRW | 8 | 68 | 70 | 54 | 5 |
5IUS | 2 | 0 | 0 | 0 | 0 |
5YUI | 0 | 4 | 0 | 0 | 0 |
5WN9 | 0 | 0 | 0 | 0 | 2 |
1QJG | 0 | 0 | 0 | 0 | 0 |
5TRV | 22 | 0 | 0 | 0 | 0 |
Total | 610 | 149 | 112 | 85 | 22 |
Scaffolding performance of EvoDiff-MSA
Number of scaffolding successes out of 100 generations for RFdiffusion, EvoDiff-MSA (Max), EvoDiff-MSA (Random), and the ESM-MSA baseline, for each of 17 scaffolding problems. The bottom row contains the total number of successful scaffolds generated per model.
PDB | RFdiffusion | EvoDiff-MSA (Max) | EvoDiff-MSA (Random) | ESM-MSA |
---|---|---|---|---|
1BCF | 100 | 100 | 98 | 99 |
6E6R | 71 | 87 | 63 | 96 |
2KL8 | 88 | 11 | 31 | 42 |
6EXZ | 42 | 86 | 87 | 73 |
1YCR | 74 | 3 | 0 | 0 |
6VW1 | 69 | 4 | 3 | 4 |
4JHW | 0 | 0 | 0 | 0 |
5TPN | 61 | 0 | 0 | 0 |
4ZYP | 40 | 0 | 0 | 0 |
3IXT | 25 | 1 | 0 | 5 |
7MRX | 7 | 72 | 68 | 66 |
1PRW | 8 | 48 | 46 | 92 |
5IUS | 2 | 3 | 1 | 7 |
5YUI | 0 | 58 | 44 | 70 |
5WN9 | 0 | 0 | 0 | 0 |
1QJG | 0 | 34 | 22 | 38 |
5TRV | 22 | 15 | 12 | 12 |
Total | 610 | 522 | 475 | 604 |
Model Specifications
License: MIT
Last Updated: May 2025
Input Type: Text
Output Type: Text
Publisher: Microsoft
Languages: 1 Language