Skala
Version: 1
Publisher: Microsoft
Last updated: October 2025
Neural network-based XC functional for density functional theory
Skala is a neural network-based exchange-correlation functional for density functional theory (DFT), developed by Microsoft Research AI for Science. It leverages deep learning to predict exchange-correlation energies from electron density features, achieving chemical accuracy for atomization energies and strong performance on broad thermochemistry and kinetics benchmarks, all at a computational cost similar to semi-local DFT. Trained on a large, diverse dataset—including coupled cluster atomization energies and public benchmarks—Skala uses scalable message passing and local layers to learn both local and non-local effects. The model has about 276,000 parameters and matches the accuracy of leading hybrid functionals. Code and documentation are available at https://github.com/microsoft/skala.

Model sources

Quickstart

The recommended way to use Skala on Foundry is via the client library in the skala package, documented at https://github.com/microsoft/skala/blob/main/docs/foundry.rst
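As a rough illustration of the JSON-in/JSON-out interface (see Model Specifications below), the sketch below builds a request payload for a single-point SCF evaluation. The field names, task label, and schema are hypothetical assumptions, not the real client API; consult the linked Foundry documentation for the actual interface.

```python
# Hypothetical request-payload builder for a Skala Foundry endpoint.
# All field names below are illustrative assumptions, NOT the real schema;
# see https://github.com/microsoft/skala/blob/main/docs/foundry.rst.
import json

def build_scf_request(xyz: str, charge: int = 0, spin: int = 0) -> str:
    """Serialize a molecule and SCF settings as a JSON request body."""
    payload = {
        "molecule": {"xyz": xyz, "charge": charge, "spin": spin},
        "task": "scf_energy",      # assumed task name
        "functional": "skala",
        "basis": "def2-qzvp",      # basis family used in the Skala paper
    }
    return json.dumps(payload)

# Water molecule in XYZ format (coordinates in Angstrom).
body = build_scf_request("O 0 0 0\nH 0 0 0.96\nH 0.93 0 -0.24")
```

The actual client library wraps this exchange for you; the sketch only shows the shape of a JSON request that such an endpoint could accept.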

Uses

Direct intended uses

  1. The Skala functional is being shared with the research community to facilitate reproduction of evaluations with our model.
  2. Evaluating the energy difference of a reaction by computing the total energy of each compound in the reaction with a self-consistent-field (SCF) evaluation using the Skala exchange-correlation functional.
  3. Evaluating the total energy of a molecule with an SCF evaluation using the Skala exchange-correlation functional. Note that, as with all density functionals, energy differences are predicted much more reliably than the total energies of single molecules.
  4. The SCF evaluation we provide is implemented using Accelerated DFT and GauXC, which run the functional inference on the GPU.
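The reaction-energy workflow in item 2 amounts to summing per-compound SCF total energies with stoichiometric signs and converting to chemists' units. A minimal sketch (the energies below are placeholder values, not Skala outputs):

```python
# Combine per-compound SCF total energies (in Hartree) into a reaction
# energy in kcal/mol. The numbers are illustrative placeholders.
HARTREE_TO_KCAL_MOL = 627.509

def reaction_energy(energies, stoichiometry):
    """Sum E_total * coefficient (products +, reactants -), converted to kcal/mol."""
    de_hartree = sum(c * e for e, c in zip(energies, stoichiometry))
    return de_hartree * HARTREE_TO_KCAL_MOL

# Example: A + B -> C with placeholder total energies in Hartree.
e_a, e_b, e_c = -76.40, -1.17, -77.60
de = reaction_energy([e_a, e_b, e_c], [-1, -1, +1])
```

Because each total energy enters with its stoichiometric sign, systematic per-compound errors partially cancel, which is why energy differences are more reliable than absolute total energies.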

Risks and limitations

  1. The Skala inference server currently supports only single-instance endpoints.
  2. Interpretation of results requires expertise in quantum chemistry.
  3. The Skala functional is trained on atomization energies, conformers, proton affinities, ionization potentials, elementary reaction pathways, and non-covalent interactions, as well as a small amount of electron affinities and total energies of atoms. We have benchmarked performance on the public benchmark W4-17 for atomization energies, as well as the public benchmark set GMTKN55, which covers general main-group thermochemistry, kinetics, and noncovalent interactions, to provide an indication of how it generalizes outside the training set. We have also measured robustness on dipole moment predictions and geometry optimization.
  4. The Skala functional has been trained on data that contains the following elements of the periodic table: H-Ar, Br, Kr, I, Xe. We have tested it on data containing the elements H-Ca, Ge-Kr, Sn-I, Pb, Bi.
  5. When used with an A100 80 GB GPU, available memory currently limits molecules to approximately 100 atoms with the "fine" grid.
  6. Given points 3 to 5 above, we remind the user that this is not a production model. We therefore advise testing the functional further before applying it to your research. We welcome any feedback!

Training details

Training data

The training set spans the data categories listed under "Risks and limitations" above (atomization energies, conformers, proton affinities, ionization potentials, elementary reaction pathways, non-covalent interactions, electron affinities, and atomic total energies). For all training data, we created input densities and derived meta-GGA features from density matrices of converged SCF calculations with the B3LYP functional (def2-QZVP and ma-def2-QZVP basis sets), using a modified version of the PySCF software package.

Training procedure

Preprocessing

The training datapoints are preprocessed as follows.
  • For each molecule, the density and derived meta-GGA features are computed from the density matrix of a converged SCF calculation with the B3LYP functional in the def2-QZVP or ma-def2-QZVP basis set, using a modified version of the PySCF software package.
  • Density fitting was not applied in the SCF calculation.
  • The density features were evaluated on an atom-centered integration grid of level 2 or level 3.
  • The radial integration used the Treutler-Ahlrichs, Gauss-Chebyshev, Delley, or Mura-Knowles scheme, based on Bragg atomic radii with Treutler radii adjustment.
  • The angular grid points were pruned using the NWChem scheme.
  • No density-based cutoff was applied; all grid points were retained for training.
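For orientation, the standard meta-GGA inputs derived from a converged density are the density ρ, its gradient, and the kinetic-energy density τ at each grid point. A minimal NumPy sketch of assembling these per-point features (the array shapes and feature layout are illustrative assumptions, not the Skala featurization):

```python
# Assemble standard meta-GGA grid features from density quantities.
# Shapes and layout are illustrative assumptions; the actual Skala
# featurization lives in the modified PySCF pipeline described above.
import numpy as np

def metagga_features(rho, grad_rho, tau):
    """Stack (rho, sigma = |grad rho|^2, tau) for each grid point.

    rho:      (N,)   electron density
    grad_rho: (3, N) Cartesian density gradient
    tau:      (N,)   kinetic-energy density
    """
    sigma = np.einsum("xi,xi->i", grad_rho, grad_rho)  # |grad rho|^2 per point
    return np.stack([rho, sigma, tau], axis=0)         # shape (3, N)

# Tiny example with two grid points.
rho = np.array([0.3, 0.1])
grad = np.array([[0.1, 0.0], [0.0, 0.2], [0.0, 0.0]])
tau = np.array([0.05, 0.02])
feats = metagga_features(rho, grad, tau)
```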

Training hyperparameters

The training hyperparameter settings are detailed in the supplementary material of "Accurate and scalable exchange-correlation with deep learning" (Luise et al., 2025). This repository only includes the code to evaluate the provided checkpoints, not the training code.

Speeds, sizes, times

The training of our functional on the dataset detailed in the section "Training data" took approximately 36 hours for 500k training steps on an NC A100 v4-series VM with 4 NVIDIA A100 GPUs (80 GB memory each), 96 CPU cores, 880 GB RAM, and a 256 GB disk.
The model checkpoints have 276,001 trainable parameters.
Skala was evaluated using several public and internal quantum chemistry benchmarks to assess its accuracy and robustness. Key datasets include W4-17 for atomization energies, GMTKN55 for general thermochemistry, kinetics, and noncovalent interactions, as well as specialized datasets for geometry optimization and dipole moment prediction. These benchmarks provide a comprehensive view of Skala’s performance compared to established functionals.
Category                        Benchmark                     Skala   ωB97M-V   B3LYP   r2SCAN
Atomization energy              W4-17 (MAE, kcal/mol)         1.06    2.0       3.8     4.5
General main-group energetics   GMTKN55 (WTMAD-2, kcal/mol)   3.9     3.2       6.4     7.3
See the Skala paper for more results.
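The MAE and WTMAD-2 figures above are aggregate error metrics over benchmark reaction sets. As a minimal illustration, the mean absolute error over a set of predicted versus reference energies (the numbers are placeholders, not benchmark data):

```python
# Mean absolute error between predicted and reference energies (kcal/mol).
# The values below are placeholders, not W4-17 data.
import numpy as np

def mean_absolute_error(predicted, reference):
    """MAE in the same units as the inputs."""
    diff = np.asarray(predicted) - np.asarray(reference)
    return float(np.mean(np.abs(diff)))

pred = [100.2, 57.9, 230.1]   # hypothetical predicted atomization energies
ref  = [101.0, 58.5, 229.0]   # hypothetical reference values
mae = mean_absolute_error(pred, ref)   # (0.8 + 0.6 + 1.1) / 3
```

WTMAD-2 differs in that it weights each GMTKN55 subset's MAD by the subset's mean absolute reference energy before averaging, so subsets with small energy scales are not drowned out.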
Model Specifications

License: MIT
Last updated: October 2025
Input type: JSON
Output type: JSON
Publisher: Microsoft
Languages: 1 language