MatterGen
Version: 2
Key capabilities
About this model
MatterGen can generate novel, unique, and stable material candidates, both with and without property conditions. For property-guided generation, it can produce stable, unique, and novel (S.U.N.) structures with extreme property values, such as a bulk modulus of 400 GPa, for which the labeled reference set contains only two structures. MatterGen outperforms both classical and recent deep generative model baselines.
Key model capabilities
- Generate inorganic materials candidates without property condition
- Generate inorganic materials candidates with target chemical system
- Generate inorganic materials candidates with target space group number
- Generate inorganic materials candidates with target chemical system and formation energy above the convex hull (eV/atom)
- Generate inorganic materials candidates with target band gap (eV)
- Generate inorganic materials candidates with target magnetic density (Angstrom**-3)
- Generate inorganic materials candidates with target magnetic density (Angstrom**-3) and Herfindahl–Hirschman index (HHI)
- Generate inorganic materials candidates with target bulk modulus (GPa)
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
- Generate inorganic materials candidates without property condition
- Generate inorganic materials candidates with target chemical system
- Generate inorganic materials candidates with target space group number
- Generate inorganic materials candidates with target chemical system and formation energy above the convex hull (eV/atom)
- Generate inorganic materials candidates with target band gap (eV)
- Generate inorganic materials candidates with target magnetic density (Angstrom**-3)
- Generate inorganic materials candidates with target magnetic density (Angstrom**-3) and Herfindahl–Hirschman index (HHI)
- Generate inorganic materials candidates with target bulk modulus (GPa)
Out of scope use cases
- Generate materials with more than 20 atoms inside the unit cell
- Generate organic crystals or non-crystalline materials
- Generate crystals containing noble gas elements, radioactive elements, or elements with atomic number greater than 84; these elements were removed from the training data
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
Technical specs
MatterGen contains 46.8M parameters.
Training cut-off date
The provider has not supplied this information.
Training time
One training epoch of around 600K training samples takes around 6 minutes on 8 NVIDIA A100 GPUs.
Input formats
The provider has not supplied this information.
Output formats
Sampling 1,000 structures takes around two hours using a single NVIDIA V100 GPU.
Supported languages
The provider has not supplied this information.
Sample JSON response
{
  "input_data": {
    "properties_to_condition_on": {}
  }
}
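The sample above has an empty `properties_to_condition_on` object, which presumably corresponds to generation without a property condition. The following minimal sketch shows how a body of the same shape could be built for a conditioned request. It assumes the sample is the body sent to the model endpoint. The property key `dft_band_gap` and the helper `build_payload` are hypothetical names used for illustration only; this card does not list the keys the endpoint accepts.

```python
import json
from typing import Any, Dict, Optional


def build_payload(conditions: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a body with the same shape as the sample JSON above.

    `conditions` maps property names to target values. An empty or missing
    mapping reproduces the sample exactly.
    """
    body = {
        "input_data": {
            "properties_to_condition_on": dict(conditions or {}),
        }
    }
    return json.dumps(body, indent=2)


# Same content as the sample: no property condition.
unconditional = build_payload()

# Hypothetical key name (assumption): a target band gap of 2.0 eV.
band_gap_request = build_payload({"dft_band_gap": 2.0})

print(unconditional)
print(band_gap_request)
```

Keeping the builder separate from any network call makes it easy to inspect the body before sending it, whatever client is used.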
Model architecture
MatterGen is a diffusion model that jointly predicts a material's atomic fractional coordinates, elements, and unit cell lattice vectors.
Long context
The provider has not supplied this information.
Optimizing model performance
The provider has not supplied this information.
Additional assets
The provider has not supplied this information.
Training disclosure
Training, testing and validation
MatterGen was trained on crystalline materials from the following data sources:
- The Materials Project (MP) (https://next-gen.materialsproject.org/, v2022.10.28, Creative Commons Attribution 4.0 International License), an open-access resource containing DFT-relaxed crystal structures obtained from a variety of sources, but largely based upon experimentally known crystals.
- The Alexandria dataset (https://alexandria.icams.rub.de/, Creative Commons Attribution 4.0 International License), an open-access resource containing DFT-relaxed crystal structures from a variety of sources, including a large quantity of hypothetical crystal structures generated by ML methods or other algorithmic means.
To train MatterGen, we select only structures with up to 20 atoms and an energy above hull below 0.1 eV/atom. We further remove structures that contain noble gas elements, elements with atomic number higher than 84 (which includes most radioactive elements), or the radioactive elements "Tc" and "Pm".
Distribution
Distribution channels
The provider has not supplied this information.
More information
The provider has not supplied this information.
Responsible AI considerations
Safety techniques
The provider has not supplied this information.
Safety evaluations
The provider has not supplied this information.
Known limitations
- MatterGen was trained and evaluated only on structures with up to 20 atoms inside the unit cell; larger cells are currently not supported.
- The performance on property-guided generation heavily depends on the quality and quantity of the property labels used to train MatterGen. For extreme property values where there are few training structures with similar values, the performance may degrade.
- MatterGen's training data consists of materials whose energy is less than 0.1 eV/atom above the reference convex hull. Therefore, the fraction of generated materials on or below the convex hull is expected to be significantly lower than the fraction of materials within 0.1 eV/atom above the convex hull.
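The size and element limits can be checked before a generation target is chosen. The sketch below applies the filters stated under "Training, testing and validation" (at most 20 atoms per unit cell, no noble gases, no elements with atomic number above 84, no Tc or Pm) to a composition given as element counts. The function name `scope_problems` and the input format are assumptions made for illustration; they are not part of MatterGen.

```python
from typing import Dict, List

# Element symbols in order of atomic number, H (Z=1) through Po (Z=84).
# Any symbol outside this list has Z > 84 or is not a valid element symbol.
SYMBOLS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar "
    "K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr "
    "Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe "
    "Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu "
    "Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po"
).split()
ALLOWED_BY_Z = set(SYMBOLS)

# Rn (Z=86) is already excluded by the atomic-number cutoff.
NOBLE_GASES = {"He", "Ne", "Ar", "Kr", "Xe"}
EXCLUDED_RADIOACTIVE = {"Tc", "Pm"}
MAX_ATOMS = 20


def scope_problems(composition: Dict[str, int]) -> List[str]:
    """Return reasons why a unit-cell composition falls outside the
    training domain described in this card. An empty list means that
    none of the stated limits is violated."""
    problems = []
    n_atoms = sum(composition.values())
    if n_atoms > MAX_ATOMS:
        problems.append(f"{n_atoms} atoms exceeds the limit of {MAX_ATOMS}")
    for symbol in composition:
        if symbol not in ALLOWED_BY_Z:
            problems.append(f"{symbol}: atomic number above 84 or unknown symbol")
        elif symbol in NOBLE_GASES:
            problems.append(f"{symbol}: noble gas")
        elif symbol in EXCLUDED_RADIOACTIVE:
            problems.append(f"{symbol}: excluded radioactive element")
    return problems


print(scope_problems({"Na": 4, "Cl": 4}))
print(scope_problems({"U": 1, "O": 2}))
print(scope_problems({"Si": 24}))
```

The check covers only composition and cell size. It says nothing about stability or about whether a structure is crystalline and inorganic.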
Acceptable use
Acceptable use policy
The provider has not supplied this information.
Quality and performance evaluations
Source: Microsoft
MatterGen was evaluated on unconditional generation across the following metrics:
- The percentage of stable, novel, and unique (S.U.N.) structures among 1,024 generated samples.
  - Stable means a structure's energy is less than 0.1 eV/atom above the reference convex hull.
  - Novel means a structure does not match any structure in our reference dataset according to the disordered structure matcher presented in the paper.
  - Unique means that no other generated structure matches the given structure.
- The average root mean square distance (RMSD) between generated structures and their DFT-relaxed local energy minima, measured in Angstrom.
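The three S.U.N. criteria combine into a single rate once each generated structure has been labeled. The sketch below assumes those labels already exist: an energy above hull from DFT, a flag from the structure matcher against the reference dataset, and a group id shared by generated structures that match each other. The `Sample` record and the `sun_fraction` function are hypothetical names for illustration, not MatterGen's evaluation code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence

STABILITY_THRESHOLD = 0.1  # eV/atom above the reference convex hull


@dataclass
class Sample:
    e_above_hull: float      # eV/atom, after DFT relaxation
    matches_reference: bool  # True if the matcher finds it in the reference set
    match_group: int         # generated structures that match share an id


def sun_fraction(samples: Sequence[Sample]) -> float:
    """Fraction of samples that are stable, unique, and novel."""
    if not samples:
        return 0.0
    group_sizes = Counter(s.match_group for s in samples)
    n_sun = 0
    for s in samples:
        stable = s.e_above_hull < STABILITY_THRESHOLD
        novel = not s.matches_reference
        # Literal reading of the definition above: no other generated
        # structure matches this one, so its group has exactly one member.
        unique = group_sizes[s.match_group] == 1
        if stable and novel and unique:
            n_sun += 1
    return n_sun / len(samples)


samples = [
    Sample(0.02, False, 0),  # stable, novel, unique
    Sample(0.15, False, 1),  # not stable
    Sample(0.01, True, 2),   # not novel
    Sample(0.03, False, 3),  # duplicate pair: neither member is unique
    Sample(0.03, False, 3),
]
print(sun_fraction(samples))
```

Under this literal reading, both members of a duplicate pair are rejected. An alternative convention keeps one representative per group, which would give a higher rate for the same samples.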
Benchmarking methodology
Source: Microsoft
We relax structures from the above data sources with DFT and select only those whose energy above the combined convex hull is below 0.1 eV/atom. MatterGen is trained solely on primitive structures. We further select only structures with up to 20 atoms inside the unit cell. We use the Niggli reduction to preprocess the unit cell lattices, followed by the polar decomposition to ensure that the lattice matrices are symmetric. See the paper for more detailed information.
Public data summary
Source: Microsoft
The provider has not supplied this information.
Model Specifications
License: MIT
Last Updated: December 2025
Input Type: Text
Output Type: Text
Provider: Microsoft
Languages: 1 Language