Computational chemistry now combines ab initio calculations, simulation, machine learning (ML), and optimization strategies to describe, solve, and predict chemical data and related phenomena. Applications include accelerated literature searches; the analysis and prediction of physical and quantum chemical properties, transition states, chemical structures, and chemical reactions; and the discovery of new catalysts and drug candidates. Reaction Predictor is an application for predicting chemical reactions and reaction pathways. It uses deep learning to predict and rank elementary reactions in three steps: it identifies electron sources and sinks, pairs those sources and sinks to propose elementary reactions, and ranks the proposed reactions by favorability. Reaction condition recommendation is an essential element of computer-assisted synthetic planning. Accurate suggestions of reaction conditions are required for experimental validation and can have a significant effect on the success or failure of an attempted transformation.
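The three-step pipeline described above (identify sources and sinks, pair them, rank the pairs) can be sketched in a few lines of Python. This is only an illustration of the control flow: the site labels, probabilities, threshold, and scores below are hypothetical stand-ins for the outputs of trained models, and none of the function names come from Reaction Predictor itself.

```python
from itertools import product

def identify_sites(site_probabilities, threshold=0.5):
    """Step 1: keep sites whose predicted probability passes a threshold."""
    return [site for site, p in site_probabilities.items() if p >= threshold]

def propose_reactions(sources, sinks):
    """Step 2: pair every retained source with every retained sink."""
    return [(src, snk) for src, snk in product(sources, sinks) if src != snk]

def rank_reactions(pairs, score):
    """Step 3: sort proposed pairs from most to least favorable."""
    return sorted(pairs, key=score, reverse=True)

# Hypothetical model outputs, for illustration only.
source_probs = {"amine_N_lone_pair": 0.95, "alkene_pi": 0.60, "C_H_sigma": 0.05}
sink_probs = {"carbonyl_C": 0.90, "alkyl_halide_C": 0.70, "aryl_C": 0.10}
toy_scores = {
    ("amine_N_lone_pair", "carbonyl_C"): 0.9,
    ("amine_N_lone_pair", "alkyl_halide_C"): 0.7,
    ("alkene_pi", "carbonyl_C"): 0.2,
    ("alkene_pi", "alkyl_halide_C"): 0.1,
}

sources = identify_sites(source_probs)
sinks = identify_sites(sink_probs)
ranked = rank_reactions(propose_reactions(sources, sinks), toy_scores.get)

for source, sink in ranked:
    print(f"{source} -> {sink}")
```

Filtering before pairing keeps the number of candidate reactions manageable, since the pairing step grows with the product of the number of sources and sinks.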

Generalizing to larger chemical problems at scale, rather than specializing, is now the guiding principle for transforming chemical tasks on multiple fronts. Systematic and cost-effective solutions have benefited from ML approaches, including those based on deep learning, in areas such as quantum chemistry, molecular screening, synthetic route design, catalysis, and drug discovery. Deep learning algorithms can combine raw input into layers of intermediate features, enabling bench-to-bytes designs with the potential to transform several chemical domains. This review describes the most exciting developments in the use of ML across a range of chemical scenarios. A fundamental problem of synthetic chemistry is the identification of unknown products observed via mass spectrometry. Reaction Predictor includes a pathway search feature that can help identify such products through multitarget mass search.
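The idea behind a multitarget mass search can be illustrated with a small sketch: given candidate products generated by a pathway search, keep those whose mass matches any observed target mass within a tolerance. The source does not describe how Reaction Predictor implements this, so the function name, the tolerance, and the candidate list below are assumptions made for illustration.

```python
def multitarget_mass_search(candidates, target_masses, tolerance=0.01):
    """Map each target mass to the candidates within `tolerance` (Da) of it."""
    return {
        target: [
            name for name, mass in candidates.items()
            if abs(mass - target) <= tolerance
        ]
        for target in target_masses
    }

# Hypothetical pathway products with approximate monoisotopic masses in Da.
candidates = {
    "ethanol": 46.0419,
    "acetic acid": 60.0211,
    "ethyl acetate": 88.0524,
}

# Hypothetical masses observed in a spectrum.
matches = multitarget_mass_search(candidates, [88.052, 60.021])
print(matches)
```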

In "Deep learning for chemical reaction prediction", the authors describe the ML design and methodology underpinning Reaction Predictor's ML-based predictions. To train their models, they carefully curated a data set of over 11,000 elementary reactions covering a broad range of advanced organic chemistry. Using this data for training, they demonstrate an 80% top-5 recovery rate on a separate, challenging benchmark set of reactions drawn from modern organic chemistry literature. Finally, they discuss an alternative approach to predicting electron sources and sinks using recurrent neural networks, specifically long short-term memory (LSTM) architectures, operating directly on SMILES strings.
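A top-5 recovery rate is the fraction of benchmark reactions whose true outcome appears among the five highest-ranked predictions. The sketch below shows the calculation on made-up data; the function name and the toy labels are illustrative and are not taken from the paper.

```python
def top_k_recovery(ranked_predictions, true_outcomes, k=5):
    """Fraction of cases whose true outcome is among the top-k predictions."""
    if not true_outcomes or len(ranked_predictions) != len(true_outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for predictions, truth in zip(ranked_predictions, true_outcomes)
        if truth in predictions[:k]
    )
    return hits / len(true_outcomes)

# Toy data: each inner list is ranked from most to least favorable.
ranked_predictions = [
    ["a", "b", "c"],
    ["x", "y", "z", "w", "v", "u"],  # true outcome is ranked sixth
    ["p", "q"],
    ["m", "n", "o"],
    ["r", "s", "t"],
]
true_outcomes = ["a", "u", "q", "o", "s"]

top5 = top_k_recovery(ranked_predictions, true_outcomes, k=5)
top1 = top_k_recovery(ranked_predictions, true_outcomes, k=1)
print(f"top-5: {top5:.0%}, top-1: {top1:.0%}")
```

In this toy set, four of the five true outcomes fall within the top five predictions, while only one is ranked first.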

Schematic of the LSTM architecture used for source/sink prediction. This approach operates directly on SMILES strings representing all reactant molecules, and is able to make source/sink predictions using context from the entire set of reactants.
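Before a recurrent network can read SMILES strings, the text has to be turned into a sequence of integer token indices. The sketch below covers only that input-preparation step, not the LSTM itself. The tokenization rule (bracket atoms, `Cl`, `Br`, and two-digit ring closures as single tokens, everything else per character) and the use of `.` to join all reactants into one sequence are assumptions; the source does not specify how its model tokenizes input.

```python
import re

# Assumed tokenization: bracket atoms, Cl/Br, and %NN ring closures stay whole.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles)

def build_vocab(token_lists):
    """Assign an integer index to each distinct token; 0 is kept for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode_reactants(reactant_smiles, vocab):
    """Join all reactants with '.' so one sequence carries the full context."""
    joined = ".".join(reactant_smiles)
    return [vocab[token] for token in tokenize(joined)]

# Illustrative reactants: acetyl chloride and ammonia.
reactants = ["CC(=O)Cl", "N"]
tokens = tokenize(".".join(reactants))
vocab = build_vocab([tokens])
token_ids = encode_reactants(reactants, vocab)

print(tokens)
print(token_ids)
```

A sequence model would then emit one source/sink prediction per token position, which is how context from the entire set of reactants can inform each prediction.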

The authors also tested Reaction Predictor's performance on a benchmark data set of challenging real-world reactions and demonstrated a high degree of accuracy. They compared these results with the performance of an early prototype system developed in their group. Finally, they demonstrated a promising LSTM-based approach to predicting reactive sites based solely on SMILES strings, which could be used in future work to complement and improve the existing multilayer perceptron (MLP)-based source/sink filters. The authors expect Reaction Predictor to continue to improve over time as new opportunities for refinement are identified and as more training data becomes available.

In "A graph-convolutional neural network model for the prediction of chemical reactivity", the authors trained a model on ∼10 million examples from Reaxys. The model is able to propose conditions where a close match to the recorded catalyst, solvent, and reagent is found within the top-10 predictions 69.6% of the time, with top-10 accuracies for individual species reaching 80−90%. Temperature is predicted within ±20 °C of the recorded temperature in 60−70% of test cases, with higher accuracy for cases with correct chemical context predictions. The utility of the model is illustrated through several examples spanning a range of common reaction classes. The authors also demonstrate that the model implicitly learns a continuous numerical embedding of solvent and reagent species that captures their functional similarity.
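The temperature figure above is a tolerance-based accuracy: a prediction counts as correct when it lies within ±20 °C of the recorded value. A minimal sketch of that metric follows; the function name and the temperatures are invented for illustration and do not reproduce the paper's data.

```python
def within_tolerance_rate(predicted, recorded, tolerance=20.0):
    """Fraction of predictions within `tolerance` degrees of the recorded value."""
    if not recorded or len(predicted) != len(recorded):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, recorded) if abs(p - r) <= tolerance)
    return hits / len(recorded)

# Made-up temperatures in degrees Celsius.
predicted_temps = [25.0, 80.0, 0.0, 110.0]
recorded_temps = [20.0, 100.0, 30.0, 100.0]

rate = within_tolerance_rate(predicted_temps, recorded_temps)
print(f"within ±20 °C: {rate:.0%}")
```

Here the third prediction misses by 30 °C, so three of the four cases count as correct.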

In "Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns", the authors focus on the models, algorithms, and methods proposed to facilitate research on compound design and synthesis, materials design, and the prediction of binding, molecular activity, and soft matter behavior. Pairing chemistry with ML through data-driven analyses, neural network predictions, and monitoring of chemical systems helps researchers understand the complexity of chemical data, streamline and design experiments, discover new molecular targets and materials, and plan or rethink forthcoming chemical challenges. Optimization underlies all of these tasks directly.

This review has sought to provide a sample of ML approaches that support the major research trends in chemistry, especially computational chemistry, with a focus on deep learning networks (DLNs). Such approaches have made it possible to solve chemical problems that cannot be described and explained via conventional methods. In the last few years, the application of ML to the optimization and prediction of molecular properties has become very popular, as more researchers have been trained in such methods and have acquired the technical skills to develop and use them.