RetroChimera

Version: 2

Publisher: Microsoft
Last updated: October 2025

RetroChimera is a model that takes as input a product molecule to be synthesized (encoded as a SMILES string) and produces several candidate chemical reactions that could be used to make it. Each reaction is represented as a group of ingredients (reactant molecules), with each reactant likewise encoded as a SMILES string.

Developed by: Microsoft Research AI for Science
Model type: Chemical Reaction Prediction Model
Language(s): Python
License: MIT
Model Repository: microsoft/retrochimera
Paper: Chemist-aligned retrosynthesis by ensembling diverse inductive bias models

Quickstart

RetroChimera can be queried in either single-step or multi-step mode. In multi-step mode, the model is called repeatedly inside a synthesis search until a given time budget is exhausted. An example payload for the single-step workflow is:

{
  "input_data": {
    "inputs": ["Oc1ccc(OCc2ccccc2)c(Br)c1"],
    "workflow": "single_step",
    "num_results": 3
  }
}
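
The payload above can also be assembled programmatically. The following is a minimal sketch, not an official client: the endpoint URL and API key are placeholders, and the bearer-token header is an assumption about how the deployment is authenticated. Only the payload fields are taken from the example above.

```python
import json
import urllib.request


def build_single_step_payload(smiles, num_results=3):
    """Build the single-step payload shown above for one product SMILES."""
    return {
        "input_data": {
            "inputs": [smiles],
            "workflow": "single_step",
            "num_results": num_results,
        }
    }


def build_request(url, api_key, payload):
    """Wrap a payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the endpoint accepts a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = build_single_step_payload("Oc1ccc(OCc2ccccc2)c(Br)c1")
# Placeholder URL and key; replace with the values of your own deployment.
request = build_request("https://example.invalid/score", "<api-key>", payload)
# To send it: urllib.request.urlopen(request, timeout=90)
```

The request is only constructed here, so the sketch can be run without a live deployment.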

An example payload for the multi-step workflow is:

{
  "input_data": {
    "inputs": [
      "C=CC(=O)N1CCCCC(n2c(=O)c3ncccc3n(Cc3ccc(Oc4cccc(F)c4)cc3)c2=O)C1"
    ],
    "workflow": "multi_step",
    "time_limit_s": 60
  }
}
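
Because the Tips section notes that Azure times out calls longer than 90 seconds by default, it can help to check time_limit_s before sending a multi-step request. The sketch below builds the payload above with such a check; the function name and the margin_s safety margin are illustrative assumptions, not documented values.

```python
DEFAULT_AZURE_TIMEOUT_S = 90  # default call timeout mentioned under Tips


def build_multi_step_payload(smiles, time_limit_s=60, margin_s=15):
    """Build the multi-step payload, rejecting limits too close to the timeout.

    margin_s is a hypothetical safety margin chosen for illustration.
    """
    if time_limit_s <= 0:
        raise ValueError("time_limit_s must be positive")
    if time_limit_s > DEFAULT_AZURE_TIMEOUT_S - margin_s:
        raise ValueError(
            f"time_limit_s={time_limit_s} is too close to the "
            f"{DEFAULT_AZURE_TIMEOUT_S}s default timeout"
        )
    return {
        "input_data": {
            "inputs": [smiles],
            "workflow": "multi_step",
            "time_limit_s": time_limit_s,
        }
    }


multi_step_payload = build_multi_step_payload(
    "C=CC(=O)N1CCCCC(n2c(=O)c3ncccc3n(Cc3ccc(Oc4cccc(F)c4)cc3)c2=O)C1"
)
```

If the deployment timeout has been raised, the constant and margin would need to be adjusted accordingly.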

Tips

Both workflows are currently synchronous: a call blocks until the work is done. This takes a couple of seconds for single-step; for multi-step the duration varies and is controlled by the time_limit_s argument. Note that by default Azure times out calls that take longer than 90 seconds, so setting time_limit_s close to or above that value will not work. If you need longer searches, deploy from the CLI, where you may be able to increase the timeout to 180 seconds.

The deployment maintains a global cache of recent model calls. Repeated single-step calls with the same input will therefore be much faster, while repeated multi-step calls effectively continue the search roughly from where it left off. This lets you simulate longer search time limits by calling the endpoint repeatedly until a solution is found. With repeated multi-step calls, we recommend waiting 4 seconds after each call before making the next one, to avoid overwhelming the deployment.
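
The repeated-call pattern for multi-step search can be sketched as a small loop. This is an illustration under stated assumptions: call_endpoint and is_solved are hypothetical caller-supplied functions (the response format is not described here, so deciding whether a route was found is left to the caller), and max_calls is an arbitrary cap. Only the 4-second pause comes from the recommendation above.

```python
import time


def search_with_retries(call_endpoint, payload, is_solved,
                        max_calls=5, pause_s=4.0, sleep=time.sleep):
    """Repeat a multi-step call until is_solved accepts the response.

    call_endpoint(payload) sends the request and returns the parsed response;
    is_solved(response) reports whether it contains a usable route.
    Both are hypothetical callables supplied by the caller.
    Returns (last_response, calls_made).
    """
    response = None
    calls_made = 0
    for attempt in range(1, max_calls + 1):
        response = call_endpoint(payload)
        calls_made = attempt
        if is_solved(response):
            break
        if attempt < max_calls:
            # Recommended pause between consecutive multi-step calls.
            sleep(pause_s)
    return response, calls_made
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays; in normal use the default time.sleep applies.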

Responsible AI Documentation

RetroChimera is being shared with the research community to facilitate reproduction of our results and foster further research in this area. RetroChimera is intended to be used by domain experts who are independently capable of evaluating the quality of outputs before acting on them. We do not recommend using RetroChimera in commercial or real-world applications without further testing and development. It is being released for research purposes.

Direct intended uses

Single-step: Predict chemical reactions that could be used to synthesize a given drug-like small molecule.
Multi-step: Perform synthesis planning which calls the model repeatedly and assembles its predictions into a synthesis plan.

Out-of-scope uses

Predict chemical reactions that could be used to synthesize a non-drug-like small molecule (e.g. natural products), macromolecule (e.g. a protein) or material.

Risks and limitations

RetroChimera was trained via supervised learning, and so it is likely to reproduce any biases that exist in its training data, which was derived from publications and patents. As such, RetroChimera is likely to be biased towards reactions that were commonly reported in the past, even if other more efficient or greener alternatives exist. RetroChimera was developed for research and experimental purposes. Further testing and validation are needed before considering its application in commercial or real-world scenarios. Outputs generated by AI may include errors or fabrication of ungrounded content. Users are responsible for assessing the accuracy of generated content. All decisions leveraging outputs of the system should be made with human oversight and not be based solely on system outputs.

Recommendations

RetroChimera can propose many potential reactions. It is up to the user to filter them further based on feasibility, cost, and environmental considerations. In particular, we recommend further downstream processing of the output with reaction feasibility and forward reaction prediction models to achieve the best results and mitigate hallucinations.

Training data

The version of RetroChimera used here was trained on Pistachio, which is a proprietary database curated by NextMove, extracted from a variety of patents and publications.

Feedback

We welcome feedback from the community and are eager for collaboration. If you have suggestions, questions, or observe unexpected/problematic behavior in our technology, please contact us:

Krzysztof Maziarz, krmaziar@microsoft.com
Marwin Segler, marwinsegler@microsoft.com
Guoqing Liu, guoqingliu@microsoft.com

If the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations.

Model Specifications

License: MIT
Last updated: October 2025
Input type: Text
Output type: Text
Publisher: Microsoft
Languages: 1 language
