Date: October 3, 2020

Dataset Title: AutoSiM software and sample single-molecule trace data accompanying "Automatic classification and segmentation of single-molecule fluorescence time traces with deep learning"

Dataset Creators: Jieming Li, Leyou Zhang, Alexander Johnson-Buck, Nils G. Walter

Dataset Contact: Nils G. Walter, nwalter@umich.edu

Funding: This work was supported by NSF grants 1741618 and 1609051 to Xiaoming Mao, NIH IMAT awards R21-CA204560 and R33-CA229023 to Nils G. Walter, and a Michigan Economic Development Corporation Mi-TRAC award to Alexander Johnson-Buck and Nils G. Walter.

Research Overview:
Traces from single-molecule fluorescence microscopy (SMFM) experiments exhibit photophysical artifacts that typically require screening by a human expert, which is time-consuming and introduces the potential for user-dependent expectation bias. Here, we used deep learning to develop a rapid, automatic SMFM trace selector, termed AutoSiM, that improves the sensitivity and specificity of an assay for a DNA point mutation based on single-molecule recognition through equilibrium Poisson sampling (SiMREPS). AutoSiM's improved performance stems from accepting both more true positives and fewer false positives than the conventional approach of hidden Markov modeling (HMM) followed by thresholding. As a second application, AutoSiM was used to screen single-molecule Förster resonance energy transfer (smFRET) data automatically and identify high-quality traces for further analysis; it achieved ~90% concordance with manual selection while requiring less processing time. AutoSiM can be readily adapted to new datasets with only modest transfer learning.

Methodology:
Data collection and analysis software are described in detail in the related publication [Li & Zhang et al. (2020) Nature Communications].
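The Research Overview benchmarks AutoSiM against the conventional SiMREPS analysis, in which traces are first idealized with an HMM and then accepted or rejected by thresholding. The minimal Python sketch below illustrates only the general shape of that baseline, under stated assumptions: it swaps the HMM for a fixed intensity threshold, assumes the thresholds act on kinetic features (the number of binding events and the median bound/unbound dwell times, in frames), and applies a hypothetical acceptance rule. The function names, the cutoffs `min_events` and `max_median_bound`, and the synthetic trace are illustrative and are not taken from AutoSiM or the related publication.

```python
from statistics import median


def idealize(trace, threshold):
    """Two-state idealization by a fixed intensity cutoff.

    ASSUMPTION: stands in for HMM idealization. Frames at or above
    `threshold` are labeled bound (1); all other frames unbound (0).
    """
    return [1 if value >= threshold else 0 for value in trace]


def dwell_times(states, state):
    """Lengths (in frames) of consecutive runs of `state`."""
    dwells, run = [], 0
    for s in states:
        if s == state:
            run += 1
        elif run:
            dwells.append(run)
            run = 0
    if run:  # close a run that lasts until the final frame
        dwells.append(run)
    return dwells


def kinetic_features(trace, threshold):
    """Binding-event count and median dwell times of the idealized trace."""
    states = idealize(trace, threshold)
    bound = dwell_times(states, 1)
    unbound = dwell_times(states, 0)
    return {
        "n_binding": len(bound),
        "median_bound": median(bound) if bound else 0,
        "median_unbound": median(unbound) if unbound else 0,
    }


def accept_trace(trace, threshold, min_events=5, max_median_bound=20):
    """HYPOTHETICAL rule: accept traces with many short binding events."""
    f = kinetic_features(trace, threshold)
    return f["n_binding"] >= min_events and 0 < f["median_bound"] <= max_median_bound


if __name__ == "__main__":
    # Synthetic trace (arbitrary units): 6 cycles of 4 unbound + 3 bound frames.
    trace = ([0.1] * 4 + [1.0] * 3) * 6
    print(kinetic_features(trace, threshold=0.5))
    print(accept_trace(trace, threshold=0.5))  # True
```

A real HMM idealization tolerates noise and brief photophysical blips far better than a single intensity cutoff; the threshold stand-in is used here only to keep the sketch dependency-free.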
Source data were acquired using one- or two-channel total internal reflection fluorescence (TIRF) video microscopy, and one- or two-channel intensity-versus-time traces were extracted from the TIRF movies as described in the original publications cited in the related publication. AutoSiM uses two types of deep neural networks to classify single-molecule fluorescence time traces automatically: a recurrent neural network (RNN) architecture called long short-term memory (LSTM), and a convolutional neural network (CNN).

For single-channel time traces from SiMREPS experiments, an LSTM network accepts or rejects each trace as evidence of a single analyte molecule based on the temporal binding pattern (kinetic fingerprint) of a fluorescent probe. This LSTM network is trained automatically using two training datasets: a negative control dataset containing only SMFM traces that reflect nonspecific or background binding of the probe, and a positive control dataset in which at least some SMFM traces reflect specific binding of the probe to the target analyte. Training is designed to yield a network that maximizes the number of true positives (accepted traces in the positive control dataset) while minimizing the number of false positives (accepted traces in the negative control dataset).

For two-channel time traces from smFRET experiments, an LSTM network or CNN is trained on smFRET traces that a human user has manually labeled as "accepted" or "rejected"; the trained network then accepts or rejects new smFRET traces for further analysis according to the criteria learned during training.
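Both the SiMREPS training objective and the comparison of smFRET trace selection against manual labels come down to counting accept/reject decisions. The Python sketch below assumes a classifier has already assigned each trace an acceptance score between 0 and 1; the scores, the cutoffs, and the helper names (`count_accepted`, `net_accepted`, `best_cutoff`, `concordance`) are hypothetical and are not AutoSiM's interface. It counts true and false positives as defined above, uses TP - FP as one simple stand-in for "maximize true positives while minimizing false positives" (not necessarily the criterion AutoSiM optimizes), and computes concordance as the fraction of traces on which automatic and manual labels agree.

```python
def count_accepted(scores, cutoff=0.5):
    """Number of traces whose acceptance score is at or above `cutoff`."""
    return sum(1 for s in scores if s >= cutoff)


def net_accepted(pos_scores, neg_scores, cutoff=0.5):
    """Return (TP, FP, TP - FP) at a given cutoff.

    TP: accepted traces in the positive control dataset.
    FP: accepted traces in the negative control dataset.
    TP - FP is an ASSUMED, simple way to combine the two objectives.
    """
    tp = count_accepted(pos_scores, cutoff)
    fp = count_accepted(neg_scores, cutoff)
    return tp, fp, tp - fp


def best_cutoff(pos_scores, neg_scores, candidates):
    """Candidate cutoff with the largest TP - FP (first one wins ties)."""
    return max(candidates, key=lambda c: net_accepted(pos_scores, neg_scores, c)[2])


def concordance(auto_labels, manual_labels):
    """Fraction of traces on which automatic and manual accept/reject agree."""
    if not auto_labels or len(auto_labels) != len(manual_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    agree = sum(a == m for a, m in zip(auto_labels, manual_labels))
    return agree / len(auto_labels)


if __name__ == "__main__":
    # Hypothetical classifier scores for two small control datasets.
    pos = [0.9, 0.8, 0.7, 0.2, 0.6]
    neg = [0.1, 0.3, 0.55, 0.05]
    print(net_accepted(pos, neg, 0.5))             # (4, 1, 3)
    print(best_cutoff(pos, neg, [0.5, 0.6, 0.7]))  # 0.6
    auto = [True, True, False, False, True]
    manual = [True, False, False, False, True]
    print(concordance(auto, manual))               # 0.8
```

A raw difference of counts depends on the size of each control dataset; normalizing TP and FP by the number of traces in each set is a reasonable alternative when the two sets differ greatly in size.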
Optionally, an LSTM network can also segment smFRET traces, i.e., retain only the portions of each trace that contain valid FRET measurements before photobleaching while discarding blinking events. This segmentation LSTM is likewise trained on manually segmented smFRET traces. This data deposit contains LSTM and CNN networks trained on 29,395 smFRET traces from four different molecular systems; these networks can be applied directly to new systems. Alternatively, accuracy on a new system can be increased by adapting the existing networks through transfer learning with a small subset of ~100 smFRET traces from that system. Detailed instructions for using AutoSiM to analyze one- and two-channel SMFM time traces are provided in the README files within the folders of this dataset.

Files contained here:
Folders containing AutoSiM software and sample datasets for deep-learning analysis of one- and two-channel single-molecule fluorescence time traces:

1. app_SiMREPS/: Software and sample data for analysis of single-channel single-molecule fluorescence time traces
 - README.html describing the use of AutoSiM for single-channel data
 - simreps_app.mlapp containing the source code for the main function of the software
 - sample_data/ containing sample data for using the software

2. app_smFRET/: Software and sample data for analysis of two-channel single-molecule FRET time traces
 - README.html describing the use of AutoSiM for two-channel data
 - smfret_app.mlapp containing the source code for the main function of the software
 - sample_data/ containing sample data for using the software

3. scripts_SiMREPS/: MATLAB scripts and sample data for analysis of single-channel single-molecule fluorescence time traces
 - README.html describing the use of the scripts for single-channel data
 - src/ containing the source code of the scripts
 - data/ containing data for using the scripts

4. scripts_smFRET/: MATLAB scripts and sample data for analysis of two-channel single-molecule FRET time traces
 - README.html describing the use of the scripts for two-channel data
 - src/ containing the source code of the scripts
 - data/ containing data for using the scripts

Related publication:
Li, J., Zhang, L., Johnson-Buck, A. & Walter, N. G. Automatic classification and segmentation of single-molecule fluorescence time traces with deep learning. Nat. Commun. (2020), in press.

Source publications for sample data:
One-channel data:
Hayward, S. L. et al. Ultraspecific and Amplification-Free Quantification of Mutant DNA by Single-Molecule Kinetic Fingerprinting. J. Am. Chem. Soc. 140, 11755–11762 (2018).

Two-channel data:
Fu, J. et al. Multi-enzyme complexes on DNA scaffolds capable of substrate channelling with an artificial swinging arm. Nat. Nanotechnol. 9, 531–536 (2014).
Li, J. et al. Exploring the speed limit of toehold exchange with a cartwheeling DNA acrobat. Nat. Nanotechnol. 13, 723–729 (2018).
Widom, J. R. et al. Ligand Modulates Cross-Coupling between Riboswitch Folding and Transcriptional Pausing. Mol. Cell 72, 541–552.e6 (2018).
Suddala, K. C., Wang, J., Hou, Q. & Walter, N. G. Mg2+ Shifts Ligand-Mediated Folding of a Riboswitch from Induced-Fit to Conformational Selection. J. Am. Chem. Soc. 137, 14075–14083 (2015).
Suddala, K. C. et al. Local-to-global signal transduction at the core of a Mn2+ sensing riboswitch. Nat. Commun. 10, 4304 (2019).

Use and Access: This software is made available under the BSD 3-Clause Clear license (https://spdx.org/licenses/BSD-3-Clause-Clear.html).
To Cite Data: Jieming Li, Leyou Zhang, Alexander Johnson-Buck, Nils G. Walter (2020). AutoSiM software and sample single-molecule trace data accompanying "Automatic classification and segmentation of single-molecule fluorescence time traces with deep learning". University of Michigan - Deep Blue. https://doi.org/10.7302/ck2m-qf69