Skala
Version: 1
Skala is a neural network-based exchange-correlation functional for density functional theory (DFT), developed by Microsoft Research AI for Science. It leverages deep learning to predict exchange-correlation energies from electron density features, achieving chemical accuracy for atomization energies and strong performance on broad thermochemistry and kinetics benchmarks, all at a computational cost similar to semi-local DFT.
Trained on a large, diverse dataset, including coupled-cluster atomization energies and public benchmarks, Skala uses scalable message passing and local layers to learn both local and non-local effects. The model has about 276,000 parameters and matches the accuracy of leading hybrid functionals. Code and documentation are available at https://github.com/microsoft/skala.
Model sources
- Repository: https://github.com/microsoft/skala
- Paper: https://arxiv.org/abs/2506.14665
Quickstart
The recommended use of Skala on Foundry is via the client library in the skala package, documented at https://github.com/microsoft/skala/blob/main/docs/foundry.rst
Uses
Direct intended uses
- The Skala functional is being shared with the research community to facilitate reproduction of evaluations with our model.
- Evaluating the energy difference of a reaction by computing the total energy of all compounds in the reaction using a Self-Consistent-Field (SCF) evaluation with the Skala exchange-correlation functional (see the sketch after this list).
- Evaluating the total energy of a molecule using a Self-Consistent-Field (SCF) evaluation with the Skala exchange-correlation functional. Note that, like all density functionals, energy differences are predicted much more reliably than total energies of single molecules.
- The SCF evaluation we provide is implemented using Accelerated DFT and GauXC, which run the functional inference on GPU.
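As an illustration of the first use case, once a total energy has been obtained for each compound from an SCF run, the reaction energy is just a stoichiometry-weighted sum. The sketch below is illustrative only: the energy values are placeholders, not actual Skala outputs, and how you obtain them depends on your SCF driver.

```python
# Hypothetical example: reaction energy for H2 + 1/2 O2 -> H2O from total
# energies (in Hartree) obtained from separate SCF runs with the Skala
# functional. The numbers below are placeholders, not actual Skala outputs.

HARTREE_TO_KCAL_PER_MOL = 627.5095  # unit conversion factor

# Total energy of each compound, as returned by your SCF driver.
energies = {"H2": -1.17, "O2": -150.32, "H2O": -76.44}  # placeholder values

# Stoichiometric coefficients: negative for reactants, positive for products.
stoichiometry = {"H2": -1.0, "O2": -0.5, "H2O": 1.0}

delta_e_hartree = sum(coef * energies[mol] for mol, coef in stoichiometry.items())
delta_e_kcal = delta_e_hartree * HARTREE_TO_KCAL_PER_MOL
print(f"Reaction energy: {delta_e_kcal:.2f} kcal/mol")
```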
Risks and limitations
- The Skala inference server currently supports only single-instance endpoints.
- Interpretation of results requires expertise in quantum chemistry.
- The Skala functional is trained on atomization energies, conformers, proton affinities, ionization potentials, elementary reaction pathways, and non-covalent interactions, as well as a small amount of electron affinities and total energies of atoms. We have benchmarked its performance on the public benchmark W4-17 for atomization energies, as well as on the public benchmark set GMTKN55, which covers general main-group thermochemistry, kinetics, and noncovalent interactions, to provide an indication of how it generalizes outside of the training set. We have also measured robustness on dipole moment predictions and geometry optimization.
- The Skala functional has been trained on data that contains the following elements of the periodic table: H-Ar, Br, Kr, I, Xe. We have tested it on data containing the elements H-Ca, Ge-Kr, Sn-I, Pb, Bi.
- When run on an A100 with 80 GB, GPU memory currently limits the molecular size to approximately 100 atoms when used with the "fine" grid.
- Given points 3 to 5 above, we remind the user that this is not a production model. We therefore advise testing the functional further before applying it to your research. We welcome any feedback!
Training details
Training data
The following data is included in our training set:
- 99% of MSR-ACC/TAE (~78k reactions) containing atomization energies. This data was generated in collaboration with Prof. Amir Karton, University of New England, with the W1-F12 composite protocol based on CCSD(T) and is released as part of the Microsoft Research Accurate Chemistry Collection (MSR-ACC).
- Total energies, electron affinities, and ionization potentials (up to triple ionization) for atoms from H to Ar (excluding Li and Be because of basis set constraints). This data was produced in-house with CCSD(T) by extrapolating to the complete basis set (CBS) limit from quadruple zeta (QZ) and quintuple zeta (5Z) basis set calculations. The basis sets used for H and He were aug-cc-pV(Q+d)Z and aug-cc-pV(5+d)Z, while for the remaining elements B-Ar the basis sets used were aug-cc-pCVQZ and aug-cc-pCV5Z. All basis sets were obtained from the Basis Set Exchange (BSE). Extrapolation of the correlation energy was performed by fitting a simple Z⁻³ expression, while extrapolation of the Hartree-Fock energy was performed using the two-point extrapolation suggested in Comment on: "Estimating the Hartree–Fock limit from finite basis set calculations" [Jensen F (2005) Theor Chem Acc 113:267], Karton et al., Theor. Chem. Acc. 2005.
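For concreteness, assuming the Z⁻³ fit above is the conventional two-point inverse-cubic extrapolation with cardinal numbers 4 (QZ) and 5 (5Z), the closed form of the correlation-energy CBS estimate is:

```latex
% Two-point inverse-cubic extrapolation of the correlation energy,
% fitted to the QZ (Z=4) and 5Z (Z=5) results:
E_{\mathrm{corr}}(Z) = E_{\mathrm{corr}}^{\mathrm{CBS}} + A\,Z^{-3}
\quad\Longrightarrow\quad
E_{\mathrm{corr}}^{\mathrm{CBS}}
  = \frac{5^{3}\,E_{\mathrm{corr}}(5\mathrm{Z}) - 4^{3}\,E_{\mathrm{corr}}(\mathrm{QZ})}{5^{3} - 4^{3}}
```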
- Four datasets from the ATLAS collection of non-covalent interactions:
  - D442x10, dissociation curves for dispersion-bound van der Waals complexes
  - SH250x10, dissociation curves for sigma-hole-bound van der Waals complexes
  - R739x5, compressed van der Waals complexes
  - HB300SPXx10, dissociation curves for hydrogen-bonded van der Waals complexes
- W4-CC, containing atomization energies of carbon clusters provided in Atomization energies of the carbon clusters Cn (n = 2−10) revisited by means of W4 theory as well as density functional, Gn, and CBS methods, Karton et al., Mol. Phys. 2009.
Training procedure
Preprocessing
The training datapoints are preprocessed as follows (see the sketch after this list).
- For each molecule, the density and derived meta-GGA features are computed from the density matrix of converged SCF calculations with the B3LYP functional, using a def2-QZVP or ma-def2-QZVP basis set and a modified version of the PySCF software package.
- Density fitting was not applied for the SCF calculation.
- The density features were evaluated on an atom-centered integration grid of level 2 or level 3.
- The radial integration used the Treutler-Ahlrichs, Gauss-Chebyshev, Delley, or Mura-Knowles scheme, based on Bragg atomic radii with Treutler-based radii adjustment.
- The angular grid points were pruned using the NWChem scheme.
- No density-based cutoff was applied and all grid points were retained for training.
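A minimal sketch of this recipe using stock PySCF is shown below. Note this is not the actual preprocessing code, which used a modified PySCF; the molecule is an arbitrary example, and the exact layout of the returned feature array depends on the PySCF version.

```python
# Minimal sketch of the preprocessing recipe using stock PySCF
# (the actual pipeline used a modified PySCF; details may differ).
from pyscf import gto, dft

mol = gto.M(
    atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",  # example molecule
    basis="def2-qzvp",
)

mf = dft.RKS(mol)        # plain RKS, i.e. no density fitting
mf.xc = "b3lyp"
mf.grids.level = 3       # atom-centered integration grid, level 3
# PySCF's default angular pruning is the NWChem scheme.
mf.kernel()              # converge the SCF

dm = mf.make_rdm1()      # converged density matrix

# Evaluate AOs and their derivatives on the grid, then the meta-GGA-level
# density features: density, gradient, (Laplacian,) and kinetic-energy
# density tau. The exact row layout of `rho` depends on the PySCF version.
ao = dft.numint.eval_ao(mol, mf.grids.coords, deriv=2)
rho = dft.numint.eval_rho(mol, ao, dm, xctype="MGGA")
```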
Training hyperparameters
The training hyperparameter settings are detailed in the supplementary material of Accurate and scalable exchange-correlation with deep learning, Luise et al. 2025. This repository only includes the code to evaluate the provided checkpoints, not the training code.
Speeds, sizes, times
Training our functional on the dataset detailed in the section "Training data" took approximately 36 hours for 500k training steps on an NC A100 v4-series VM with 4 NVIDIA A100 GPUs (80 GB memory each), 96 CPU cores, 880 GB RAM, and a 256 GB disk. The model checkpoints have 276,001 trainable parameters.
Evaluation
Skala was evaluated on several public and internal quantum chemistry benchmarks to assess its accuracy and robustness. Key datasets include W4-17 for atomization energies, GMTKN55 for general main-group thermochemistry, kinetics, and noncovalent interactions, as well as specialized datasets for geometry optimization and dipole moment prediction. These benchmarks provide a comprehensive view of Skala's performance compared to established functionals.
See the Skala paper for more results.
| Category | Benchmark | Skala | ωB97M-V | B3LYP | r2SCAN |
|---|---|---|---|---|---|
| Atomization Energy | W4-17 (MAE, kcal/mol) | 1.06 | 2.0 | 3.8 | 4.5 |
| General main-group energetics | GMTKN55 (WTMAD-2, kcal/mol) | 3.9 | 3.2 | 6.4 | 7.3 |
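For reference, the error metrics in the table follow, to our reading, the standard definitions: MAE is the mean absolute error over a benchmark's reactions, and WTMAD-2 is GMTKN55's weighted total mean absolute deviation, which weights each subset's MAD by the inverse of its average absolute reaction energy (the 56.84 kcal/mol constant is the average over all of GMTKN55, per Goerigk et al. 2017):

```latex
% Mean absolute error over the N reactions of a benchmark:
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N}
    \left| E_i^{\mathrm{calc}} - E_i^{\mathrm{ref}} \right|

% WTMAD-2 over the 55 GMTKN55 subsets (N_j reactions each),
% as defined by Goerigk et al. (2017):
\mathrm{WTMAD\text{-}2} = \frac{1}{\sum_{j} N_j} \sum_{j=1}^{55}
    N_j \cdot \frac{56.84\ \mathrm{kcal/mol}}{\overline{|\Delta E|}_j}
    \cdot \mathrm{MAD}_j
```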
Model Specifications
- License: MIT
- Last Updated: October 2025
- Input Type: JSON
- Output Type: JSON
- Publisher: Microsoft
- Languages: 1 language