MatterGen
Version: 2
Microsoft
Last updated December 2025
A generative model for inorganic materials design
Atomistic generation

Key capabilities

About this model

MatterGen can produce novel, unique, and stable material candidates both with and without property conditions. For property-guided generation, MatterGen can produce stable, unique, and novel (S.U.N.) structures with extreme property values such as a 400 GPa bulk modulus, for which there are only two such structures in the labeled reference set. MatterGen outperforms both classical and recent deep generative model baselines.

Key model capabilities

  • Generate inorganic materials candidates without property condition
  • Generate inorganic materials candidates with target chemical system
  • Generate inorganic materials candidates with target space group number
  • Generate inorganic materials candidates with target chemical system and formation energy above the convex hull (eV/atom)
  • Generate inorganic materials candidates with target band gap (eV)
  • Generate inorganic materials candidates with target magnetic density (Angstrom**-3)
  • Generate inorganic materials candidates with target magnetic density (Angstrom**-3) and Herfindahl–Hirschman index (HHI)
  • Generate inorganic materials candidates with target bulk modulus (GPa)

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

  • Generate inorganic materials candidates without property condition
  • Generate inorganic materials candidates with target chemical system
  • Generate inorganic materials candidates with target space group number
  • Generate inorganic materials candidates with target chemical system and formation energy above the convex hull (eV/atom)
  • Generate inorganic materials candidates with target band gap (eV)
  • Generate inorganic materials candidates with target magnetic density (Angstrom**-3)
  • Generate inorganic materials candidates with target magnetic density (Angstrom**-3) and Herfindahl–Hirschman index (HHI)
  • Generate inorganic materials candidates with target bulk modulus (GPa)

Out of scope use cases

  • Generate materials with more than 20 atoms inside the unit cell
  • Generate organic crystals or non-crystalline materials
  • Generate crystals containing noble gas elements, radioactive elements, or elements with atomic number greater than 84 (these elements were removed from the training data)

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

MatterGen contains 46.8M parameters

Training cut-off date

The provider has not supplied this information.

Training time

One training epoch of around 600K training samples takes around 6 minutes on 8 NVIDIA A100 GPUs

Input formats

The provider has not supplied this information.

Output formats

Sampling 1,000 structures takes around two hours using a single NVIDIA V100 GPU
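As a rough planning aid, the sampling figure quoted above can be turned into a per-structure cost. The sketch below assumes that sampling time scales linearly with the number of structures on the same hardware, which this page does not state; the function name is hypothetical.

```python
def estimated_sampling_hours(n_structures, reference_structures=1000, reference_hours=2.0):
    """Scale the quoted figure (1,000 structures in about two hours on one V100).

    Assumes linear scaling with the number of structures, which is an
    assumption made for illustration, not a documented guarantee.
    """
    return n_structures * reference_hours / reference_structures


# Per-structure cost implied by the quoted figure, in seconds.
seconds_per_structure = estimated_sampling_hours(1) * 3600
print(f"about {seconds_per_structure:.1f} s per structure")

# Estimated time for the 1,024 samples used in the unconditional evaluation.
print(f"about {estimated_sampling_hours(1024):.2f} h for 1,024 structures")
```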

Supported languages

The provider has not supplied this information.

Sample JSON response

{
  "input_data": {
    "properties_to_condition_on": {}
  }
}
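
The JSON above has an empty `properties_to_condition_on` object, which corresponds to generation without property conditions. A minimal Python sketch for building a body of the same shape follows. The helper name `build_payload` is hypothetical, and the property key `dft_bulk_modulus` and its value format are assumptions for illustration; this page does not list the accepted keys.

```python
import json


def build_payload(conditions=None):
    """Return a body in the shape shown above.

    `conditions` maps a property name to its target value. An empty or
    missing mapping corresponds to generation without property conditions.
    """
    return {"input_data": {"properties_to_condition_on": dict(conditions or {})}}


# No property conditions: reproduces the sample shown above.
print(json.dumps(build_payload(), indent=2))

# Hypothetical conditioned body; the key name and unit (GPa) are assumptions.
print(json.dumps(build_payload({"dft_bulk_modulus": 400}), indent=2))
```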

Model architecture

MatterGen is a diffusion model that jointly predicts a material's atomic fractional coordinates, elements, and unit cell lattice vectors.

Long context

The provider has not supplied this information.

Optimizing model performance

The provider has not supplied this information.

Additional assets

The provider has not supplied this information.

Training disclosure

Training, testing and validation

MatterGen was trained on crystalline materials from the following data sources:
  • The Materials Project (MP) (https://next-gen.materialsproject.org/, v2022.10.28, Creative Commons Attribution 4.0 International License), an open-access resource containing DFT-relaxed crystal structures obtained from a variety of sources, but largely based upon experimentally known crystals.
  • The Alexandria dataset (https://alexandria.icams.rub.de/, Creative Commons Attribution 4.0 International License), an open-access resource containing DFT-relaxed crystal structures from a variety of sources, including a large quantity of hypothetical crystal structures generated by ML methods or other algorithmic means.
To train MatterGen, we select only structures with up to 20 atoms and an energy above hull below 0.1 eV/atom. We further remove structures that contain noble gas elements, elements with atomic number higher than 84 (which includes most radioactive elements), or the radioactive elements "Tc" and "Pm".
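
The selection rules described above can be sketched as a simple predicate. This is an illustrative sketch, not the actual preprocessing code: the function name and the representation of a structure (a list with one element symbol per atom in the unit cell, plus an energy above hull in eV/atom) are assumptions.

```python
# Element symbols for atomic numbers 1 to 84, in order.
SYMBOLS_UP_TO_84 = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co "
    "Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb "
    "Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re "
    "Os Ir Pt Au Hg Tl Pb Bi Po"
).split()

# Noble gases with atomic number <= 84, plus the radioactive Tc and Pm.
EXCLUDED = {"He", "Ne", "Ar", "Kr", "Xe", "Tc", "Pm"}
ALLOWED = set(SYMBOLS_UP_TO_84) - EXCLUDED

MAX_ATOMS = 20             # atoms in the unit cell
MAX_ENERGY_ABOVE_HULL = 0.1  # eV/atom


def keep_for_training(symbols, energy_above_hull):
    """Return True if a structure passes the filters described in the text."""
    return (
        len(symbols) <= MAX_ATOMS
        and energy_above_hull < MAX_ENERGY_ABOVE_HULL
        and all(s in ALLOWED for s in symbols)
    )


print(keep_for_training(["Na", "Cl"], 0.0))      # passes all three filters
print(keep_for_training(["U", "O", "O"], 0.0))   # rejected: atomic number > 84
```

Any symbol outside `ALLOWED` is rejected, so elements with atomic number above 84 are excluded without listing them explicitly.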

Distribution

Distribution channels

The provider has not supplied this information.

More information

The provider has not supplied this information.

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

The provider has not supplied this information.

Known limitations

  • MatterGen was trained and evaluated only on structures with up to 20 atoms inside the unit cell; larger unit cells are currently not supported.
  • The performance on property-guided generation heavily depends on the quality and quantity of the property labels used to train MatterGen. For extreme property values where there are few training structures with similar values, the performance may degrade.
  • MatterGen's training data consists of materials within 0.1 eV/atom above the reference convex hull. Therefore, it is expected that the fraction of generated materials on or below the convex hull is significantly lower than the fraction of materials within 0.1 eV/atom above the convex hull.

Acceptable use

Acceptable use policy

The provider has not supplied this information.

Quality and performance evaluations

Source: Microsoft
MatterGen was evaluated on unconditional generation across the following metrics:
  • The percentage of stable, novel, and unique (S.U.N.) structures among 1,024 generated samples.
    • Stable means a structure's energy is less than 0.1 eV/atom above the reference convex hull.
    • Novel means a structure does not match any structure in our reference dataset with the disordered structure matcher presented in the paper.
    • Unique means that there is no other structure among the generated ones which matches a given structure.
  • The average root mean square distance (RMSD) of generated structures and their DFT-relaxed local energy minima, measured in Angstrom.
MatterGen achieves a 38.57% S.U.N. rate among generated structures, and the average RMSD of its samples is 0.021 Angstrom. For more details, see Section 2.2 of the MatterGen paper.

We also evaluate MatterGen on property-conditioned generation:
  • Conditioned on chemical system, MatterGen produces 83% S.U.N. structures on well-explored chemical systems, 65% on partially explored systems, and 49% on unexplored chemical systems. For more details, see Section 2.3 of the MatterGen paper.
  • Conditioned on a bulk modulus value of 400 GPa, MatterGen produces 106 S.U.N. structures with > 400 GPa bulk modulus given a budget of 180 DFT property calculations. For more details, see Section 2.4 of the MatterGen paper.
  • Conditioned on a magnetic density of > 0.2 Angstrom**-3, MatterGen produces 18 S.U.N. structures complying with the condition given a budget of 180 DFT property calculations. For more details, see Section 2.4 of the MatterGen paper.

For more details on the performance of MatterGen, see the paper.
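
To make the S.U.N. definition concrete, the sketch below computes a S.U.N. rate from precomputed per-structure data. It is an illustration under stated assumptions, not the evaluation code: the `fingerprint` string is a hypothetical stand-in for the equivalence class that a structure matcher would assign, and the energies are assumed to come from DFT. Uniqueness follows the definition above literally, so a generated structure that matches any other generated structure is not counted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    fingerprint: str          # hypothetical stand-in for a structure-matcher class
    energy_above_hull: float  # eV/atom, assumed to come from DFT


def sun_rate(candidates, reference_fingerprints, stability_threshold=0.1):
    """Fraction of candidates that are stable, unique, and novel."""
    if not candidates:
        return 0.0
    counts = Counter(c.fingerprint for c in candidates)
    n_sun = sum(
        1
        for c in candidates
        if c.energy_above_hull < stability_threshold       # stable
        and counts[c.fingerprint] == 1                      # unique
        and c.fingerprint not in reference_fingerprints     # novel
    )
    return n_sun / len(candidates)


generated = [
    Candidate("A", 0.05),  # stable, unique, novel
    Candidate("B", 0.05),  # matches another generated structure
    Candidate("B", 0.02),  # matches another generated structure
    Candidate("C", 0.20),  # not stable
    Candidate("D", 0.00),  # already in the reference set
]
print(sun_rate(generated, reference_fingerprints={"D"}))
```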

Benchmarking methodology

Source: Microsoft
We relax structures from the above data sources with DFT and select only those structures whose energy above the combined convex hull is below 0.1 eV/atom. MatterGen is trained solely on primitive structures. We further select only structures with up to 20 atoms inside the unit cell. We use the Niggli reduction to preprocess the unit cell lattices, followed by the polar decomposition to ensure the lattice matrices are symmetric. See the paper for more detailed information.
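
The polar decomposition step can be sketched with NumPy as follows. This is a minimal illustration, not the actual preprocessing code: it assumes the lattice is a 3x3 matrix whose rows are the lattice vectors, computes the decomposition through an SVD, and omits the Niggli reduction. The function name is hypothetical.

```python
import numpy as np


def symmetrize_lattice(lattice):
    """Split a lattice matrix into a symmetric factor and an orthogonal factor.

    With rows as lattice vectors, lattice = symmetric @ rotation, so the
    symmetric factor describes the same cell in a rotated frame.
    """
    w, s, vt = np.linalg.svd(lattice)
    symmetric = (w * s) @ w.T  # W diag(s) W^T, symmetric positive semidefinite
    rotation = w @ vt          # orthogonal factor
    return symmetric, rotation


lattice = np.array([
    [4.0, 0.0, 0.0],
    [1.0, 3.0, 0.0],
    [0.5, 0.5, 5.0],
])
symmetric, rotation = symmetrize_lattice(lattice)

# Cell lengths and angles are unchanged because the metric tensor is preserved.
print(np.allclose(symmetric @ symmetric.T, lattice @ lattice.T))
```

A symmetric 3x3 matrix has six independent entries, matching the six degrees of freedom of a unit cell (three lengths and three angles), so this representation removes the arbitrary orientation of the cell.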

Public data summary

Source: Microsoft
The provider has not supplied this information.
Model Specifications
License: MIT
Last Updated: December 2025
Input Type: Text
Output Type: Text
Provider: Microsoft
Languages: 1 Language