Aligning Machine Learning with Chemists to Aid Decision Making in Organic Synthesis
Shim, Eunjae
2024
Abstract
Computational investigation of chemical reactions is indispensable for understanding the factors underlying successful outcomes. Quantum chemistry simulation is a particularly valuable tool that can show how a reaction progresses at an atomistic level, providing detailed descriptions of why a certain outcome is observed. Machine learning (ML) is another toolbox that takes a different approach to reactions, extracting how outcomes are statistically correlated with what enters the flask. While the two may seem different, chemists using either strategy share the goal of learning generalizable principles that will facilitate organic synthesis. To this end, this dissertation outlines the application and development of the two computational tools. The introductory chapter presents the scope of reactions this work focuses on (palladium-catalyzed cross-coupling reactions and novel modes of reacting amines and carboxylic acids), followed by an overview of both computational tools. Chapter 2 describes quantum chemical studies on two reactions between amines and carboxylic acids. The first part investigates a nickel-catalyzed C(sp3)–C(sp2) bond-forming reaction. Studying the role of the key additive phthalimide shows that it is likely to be involved later in the catalytic cycle than oxidative addition. The latter half explores a selective reduction of ester over amide discovered during the development of a deaminative etherification reaction. Of the two silyl cations involved in the reaction, one was shown to activate the carbonyls in the substrate while the other facilitated deoxygenation after hydride transfer. Insights from these studies will be useful for designing reactions involving these widely available building blocks. The two subsequent chapters look into improving ML's prediction of reaction conditions. Chapter 3 evaluates label ranking, which ranks candidate reaction conditions given the substrates, as an alternative to the current approach of predicting their yields.
Label ranking was shown to generalize better across various datasets, likely because its setup is simpler than that of regressors. Chapter 4 addresses an extended problem: finding effective reaction conditions for new substrate classes. Inspired by the successful approach of chemists developing reactions (leveraging relevant chemical knowledge to design initial experiments and planning subsequent experiments based on earlier results), an analogous ML strategy was devised. It combines transfer learning, which makes informed predictions on low-data problems with a model trained on a relevant dataset with more data, and active learning, which iteratively refines the model with new data. The resulting active transfer learning strategy identified working reaction conditions for challenging substrates in C–N coupling reactions more efficiently than either strategy alone. Its prospective use in improving another reaction between activated amines and carboxylic acids is then demonstrated. In Chapter 5, reactivity prediction for parallel library synthesis was explored. Emphasizing a reactivity-centric approach, a Reactivity Map that structures the reactivity landscape across substrates was first developed. After initial testing of a small number of substrates across this map, a semi-supervised learning model was employed to predict the outcomes of the remaining substrates. Compared to random forest classifiers, this method resulted in libraries with higher product ratios. The dissertation concludes with a discussion of where ML for reaction outcome prediction currently stands and a possible direction for combining it with quantum chemistry to achieve our goal of learning generalizable principles and making chemistry more predictable.
Subjects
Machine learning; organic synthesis; decision making
Types
Thesis