Date: 26 June, 2024

Dataset Title: Combined Experimental and Molecular Dynamics Approach Towards a Rational Design of the YfeX Biocatalyst for Enhanced Carbene Transferase Reactivity

Dataset Creators: Sosa Alfaro, Victor; Palomino, Hannah; Liu, Sophia Y.; Lemuh Njimoh, Cybele; Lehnert, Nicolai

Dataset Contact: Nicolai Lehnert lehnertn@umich.edu

Key Points:
-- In a previous study, we showed that wild-type YfeX, naturally a dye-decolorizing peroxidase, is an enzyme with great potential for the development of new carbene transferases, with an intrinsic reactivity of the wild-type protein that is on par with the best myoglobin mutants available to date (Chem. Eur. J. 2022, 28, e202201474).
-- In this work, we focus on the further optimization of YfeX using mutagenesis to explore the role of important active site amino acid side chains in catalysis, and how they affect the regio- and stereoselectivity of the carbene transfer reactions (Catalysis Science & Technology 2024, 14, DOI: 10.1039/D3CY01489D, open access publication).
-- Here, we provide the details of the molecular dynamics calculations conducted as part of this study.

Research Overview:
In the last decade, carbene transfer biocatalysis has evolved from a playground for chemists’ scientific curiosity into an area with vast potential for the development of new industrial processes in the chemical and pharmaceutical industries. Despite the growing significance of these “carbene transferases”, their development is, surprisingly, based on a very narrow range of proteins, mostly cytochrome P450s and myoglobin. This is in contrast to the large diversity of naturally occurring proteins. In this study, we focus on the further optimization of the dye-decolorizing peroxidase YfeX as a carbene transfer catalyst.
The active site amino acids that we targeted are R232, which sits right above the heme in the distal pocket (and plays a key role in the natural peroxidase activity of the enzyme), D143, and S234, as well as I230, which sits at the entrance of the active site pocket. MD simulations are then used to rationalize the somewhat surprising results that we observed experimentally for the corresponding variant proteins. Our results show that the I230A single variant identified here is one of the most active N-H insertion catalysts known, producing >90% yields in only 1 hour (typical reaction times in the literature are 8 to 24 hours). Despite the substitution of the bulky amino acid I230 with alanine, this variant actually has the smallest active site, which explains why it does not operate on larger, bulkier substrates. The MD simulations further show that the R232A variant has the largest active site, and in fact, this variant is able to operate on indole as a substrate. However, despite the propensity of YfeX to activate N-H bonds, we observe a strong preference for C-H bond activation with indole as the substrate, with the R232A variant catalyzing this reaction in 12% yield. Finally, our MD simulations also rationalize why YfeX, despite being a very active catalyst for N-H and Si-H insertion reactions, does not produce cyclopropanes in high yield.

Methodology:
Molecular dynamics simulations were performed using the GPU code (pmemd) of the AMBER 22 package. The YfeX crystal structure available from the PDB (PDB code: 5GT2) was used as the starting point, in its monomeric form. Initial structures for the YfeX variants R232A, S234A, D143A and I230A were generated using the PyMOL mutagenesis tool. Parameters for the variants were generated using the same procedure as that for WT.
The protonation states of the ionizable side chains were assessed with the PROPKA software, while the axial His215 that is coordinated to the Fe center of the heme was assigned a protonation state based on visual inspection of its local environment. The active center parameters were generated using Metal Center Parameter Builder (MCPB.py), as implemented in Amber18. Parameters for the iron-carbenoid and substrates were generated within the antechamber and MCPB.py modules in the AMBER22 package using the general AMBER force field (gaff). The bond and angle force constants were derived using the Seminario method, while point charge parameters for the electrostatic potentials were obtained using the ChgModB method. The remaining protein residues were described using the Amber FF14SB force field. All missing hydrogen atoms in the crystal structure were added with the LEaP module in Amber, and charges were neutralized using Na+ counter ions. The system was solvated with TIP3P water molecules in a rectangular box extending 10 Å from the protein surface. Several MD and QM/MM studies on both heme and non-heme Fe-containing enzymes have successfully used parameters generated via this procedure to study both the dynamics and the catalytic mechanisms of Fe-containing proteins.

A two-stage minimization of the geometries was first performed using MM to eliminate atomic clashes. The first stage minimized the positions of solvent molecules and ions, while positional restraints were imposed on the solute molecules by a harmonic potential of 500 kcal/(mol Å²). The second stage minimized all atoms without any restraints. Minimization was performed using the CPU version of sander in Amber by subjecting the system to 5000 steps of steepest descent, followed by 5000 steps of conjugate gradient energy minimization.
The minimized system was first slowly heated from 0 to 300 K over 50 ps in an NVT ensemble using a Langevin thermostat, with the solute molecules restrained by a harmonic potential of 50 kcal/(mol Å²). To achieve uniform density, the heated system was further simulated with a weak restraint on the solute molecules at a constant temperature of 300 K for 1 ns in an NPT ensemble. After that, the system was equilibrated for 3 ns in an NPT ensemble at a fixed temperature and pressure of 300 K and 1 bar, respectively, without any restraints on the solute molecules. After equilibration, unrestrained production trajectories were run in the NVT ensemble with periodic boundary conditions for 1000 ns (1 µs).

Unrestrained MD simulations with the iron-carbenoid intermediate were set up for the WT, I230A, and R232A variants by first selecting the most populated clusters (based on backbone clustering analysis using cpptraj) obtained from the holo MD simulations as starting points to build the iron-carbenoid bound structures. Each iron-carbenoid bound system was run for 1000 ns. For the unrestrained MD simulations of the WT enzyme with styrene, styrene was placed in the active site by visual inspection, followed by the standard pre-production processing steps described above, with one exception: a distance restraint was added between the center of mass of the styrene C-C double bond and the central C1 atom of the iron-carbenoid (3.0-3.2 Å), defined by adding a harmonic potential with k = 100 kcal/(mol Å²) to this coordinate during the equilibration.

Trajectories were processed and analyzed using the cpptraj module from AmberTools. All structures used for analysis are the central structures of the most populated clusters. For further details and references, see the corresponding paper in Catalysis Science & Technology.
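To make the styrene distance restraint described above more concrete, the following Python sketch evaluates a restraint penalty for a given distance between the styrene C-C double bond center of mass and the carbenoid C1 atom. The bounds (3.0-3.2 Å) and the force constant (k = 100 kcal/(mol Å²)) are taken from the description. The flat-bottom shape (no penalty inside the interval), the form E = k·Δ² without a factor of 1/2, and the function name `flat_bottom_penalty` are assumptions of this example; the exact functional form used in the simulations is not stated here.

```python
def flat_bottom_penalty(distance, lower=3.0, upper=3.2, k=100.0):
    """Restraint energy in kcal/mol for a distance given in Angstrom.

    Assumed form: zero inside [lower, upper], k * (deviation)**2 outside.
    """
    if distance < lower:
        return k * (lower - distance) ** 2
    if distance > upper:
        return k * (distance - upper) ** 2
    return 0.0


if __name__ == "__main__":
    # Scan a few distances to see where the restraint starts to act.
    for d in (2.8, 3.0, 3.1, 3.2, 3.4, 4.0):
        print(f"d = {d:.1f} A -> E = {flat_bottom_penalty(d):.2f} kcal/mol")
```

Under these assumptions, the restraint does not act while the substrate stays within the target window and grows quadratically once it drifts outside it.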
Instrument and/or Software specifications: AMBER 22 (https://ambermd.org/)

Files contained here:
The data are organized in folders, by variant: wild-type (WT_YfeX), D143A, I230A, R232A, and S234A. The data provided include coordinate and trajectory files for the complete molecular dynamics simulations performed on WT YfeX and the four variants. In addition, the coordinate, root mean square deviation (RMSD), root mean square fluctuation (RMSF), hydrogen-bonding (H-bond) and solvent accessible surface area (SASA) files used for analysis are also provided. Coordinate and trajectory files with "cond" in the name contain only the first of every 100 frames of the total 50,000 frames. Coordinate and trajectory files with "nowat" in the name had their water molecules removed after the simulations were complete.

File types:
molecular dynamics trajectories: .nc
molecular dynamics coordinates (Amber parameter/topology format): .prmtop
protein structure coordinate files: .pdb
RMSD files (ascii files): .dat
RMSF files (ascii files): .dat
H-bond analysis (ascii files): .dat
SASA files (ascii files): .dat

Ascii files can be opened with any plotting or data analysis program such as Excel, Origin, SigmaPlot, etc. The .pdb files can be opened with PyMOL (https://pymol.org/) and Chimera (https://www.cgl.ucsf.edu/chimera/). The trajectory and coordinate files can be opened with VMD (https://www.ks.uiuc.edu/Research/vmd/).

Related publication(s):
Sosa Alfaro, Victor; Palomino, Hannah; Liu, Sophia Y.; Lemuh Njimoh, Cybele; Lehnert, Nicolai, "Combined Experimental and Molecular Dynamics Approach Towards a Rational Design of the YfeX Biocatalyst for Enhanced Carbene Transferase Reactivity", Catalysis Science & Technology 2024, 14, DOI: 10.1039/D3CY01489D

Use and Access:
This data set is made available under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license: http://creativecommons.org/licenses/by-nc/4.0/

To Cite Data:
Sosa Alfaro, V.; Palomino, H.; Liu, S. Y.; Lemuh Njimoh, C.; Lehnert, N.
Combined Experimental and Molecular Dynamics Approach Towards a Rational Design of the YfeX Biocatalyst for Enhanced Carbene Transferase Reactivity [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/ear2-rd86
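
Example of reading the analysis files:
The ascii .dat files listed under "Files contained here" can also be read with a short script. The Python sketch below is a minimal example, assuming a cpptraj-style layout: an optional header line starting with "#", followed by whitespace-separated columns with the frame number first. It also converts frame numbers to simulation time, assuming that the 50,000 frames are evenly spaced over the 1000 ns production run (20 ps per frame), which is an inference from the numbers given above and is not stated explicitly. The sample values are made-up placeholders, and the function names are hypothetical.

```python
import io

TOTAL_FRAMES = 50_000   # total frames stated in the file description
PRODUCTION_NS = 1000.0  # production length stated in the methodology
COND_STRIDE = 100       # "cond" files keep the first of every 100 frames


def frame_to_ns(frame, total_frames=TOTAL_FRAMES, total_ns=PRODUCTION_NS):
    """Convert a 1-based frame number to time in ns (assumes even spacing)."""
    return frame * total_ns / total_frames


def read_dat(handle):
    """Parse (frame, value) pairs from a whitespace-separated ascii file."""
    rows = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        rows.append((int(float(fields[0])), float(fields[1])))
    return rows


# Made-up placeholder content standing in for one of the RMSD .dat files.
sample = io.StringIO("#Frame   RMSD\n   1   0.0000\n   2   0.8123\n")
rows = read_dat(sample)

if __name__ == "__main__":
    for frame, value in rows:
        print(f"{frame_to_ns(frame):8.3f} ns  {value:.4f}")
    print("frames in a 'cond' file:", TOTAL_FRAMES // COND_STRIDE)
```

To read a real file, replace the `io.StringIO` sample with `open(path)` for the .dat file of interest.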