Date: March 05, 2025

Dataset Title: Machine learning models for Si nanoparticle growth in nonthermal plasma

Dataset Contact: Angela Violi avioli@umich.edu

Dataset Creators:

Name: Matt Raymond
Email: mattrmd@umich.edu
Institution: University of Michigan Department of Electrical Engineering and Computer Science
ORCID: https://orcid.org/0000-0001-6824-8692

Name: Paolo Elvati
Email: elvati@umich.edu
Institution: University of Michigan Department of Mechanical Engineering
ORCID: https://orcid.org/0000-0002-6882-6023

Name: Jacob C. Saldinger
Email: jsald@umich.edu
Institution: University of Michigan Department of Chemical Engineering, Low Carbon Pathway Innovation at BP
ORCID: https://orcid.org/0000-0001-5005-614X

Name: Jonathan Lin
Email: jonlin@umich.edu
Institution: University of Michigan Department of Electrical Engineering and Computer Science
ORCID: https://orcid.org/0009-0004-6381-4068

Name: Xuetao Shi
Email: xuetao_shi@dfci.harvard.edu
Institution: University of Michigan Department of Mechanical Engineering, Dana-Farber Cancer Institute at Harvard
ORCID: https://orcid.org/0000-0001-6274-5495

Name: Angela Violi
Email: avioli@umich.edu
Institution: University of Michigan Departments of Electrical Engineering and Computer Science, Mechanical Engineering, and Chemical Engineering
ORCID: https://orcid.org/0000-0001-9517-668X

Funding: US Army Research Office MURI Grant No. W911NF-18-1-0240 and the NSF ECO-CBET No. F059554.

Key Points:
- We simulated the collisions of silane nanoparticles of various sizes (which we call "clusters" and "impactors") using molecular dynamics with a reactive force field.
- "Clusters" are defined as Si2H6, Si4, and Si29Hy (y=18,27,31,36), while "impactors" are defined as Si2Hy (y=1-6) in all possible hydrogen distributions (e.g., for Si2H4, we simulated both H2Si*-*SiH2 and HSi**-SiH3).
- The dependence of sticking probabilities on temperature, on the H coverage of both silane impactors and cluster surfaces, and on the size of the cluster is modeled using various machine learning models.
- We find that machine learning models accurately predict sticking probabilities when trained on as little as 15% of these simulations, significantly reducing the number of simulations that must be run.

Research Overview:
Nanoparticles (NPs) formed in nonthermal plasmas (NTPs) can have unique properties and applications. However, modeling their growth in these environments presents significant challenges due to the non-equilibrium nature of NTPs, which makes them computationally expensive to describe. In this work, we address the challenges associated with accelerating the estimation of parameters needed for these models. Specifically, we explore how different machine learning models can be tailored to improve prediction outcomes. We apply these methods to reactive classical molecular dynamics data, which capture the processes associated with colliding silane fragments in NTPs. These reactions exemplify processes where qualitative trends are clear, but their quantification is challenging, hard to generalize, and requires time-consuming simulations. Our results demonstrate that good prediction performance can be achieved when appropriate loss functions are implemented and correct invariances are imposed. While the diversity of molecules used in the training set is critical for accurate prediction, our findings indicate that only a fraction (15-25%) of the energy and temperature sampling is required to achieve high levels of accuracy. This suggests that a substantial reduction in computational effort is possible for similar systems.

Methodology:
The data are mechanism labels assigned to the molecular dynamics (MD) simulation trajectories at the end of the allocated simulation time.
The MD simulations were carried out using LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), and the atomic interactions were modeled using a classical all-atom reactive force field. Each label was assigned by computing the composition and the number of clusters in the system. Two atoms were assigned to the same cluster if their distance was less than twice their typical bond length, namely 0.44, 0.32, and 0.148 nm for Si/Si, Si/H, and H/H pairs, respectively. If the trajectory outcome is non-sticking (more than one cluster), the label is "-1"; if the trajectory outcome is physisorption (one cluster, but no chemical bond formed), the label is "0"; if the trajectory outcome is chemisorption (one cluster, chemical bonds formed), the label is "1", "2", "3", or "4", where the numerical value corresponds to the number of new bonds formed. This approach is the same as in [2].

Instrument and/or Software specifications: NA

Files contained here:
The 66 CSV files correspond to the 66 molecular dynamics simulations run for this work, out of the 78 possible cluster-impactor combinations described above. The 12 combinations that were previously simulated in [2] are not included. The title of each CSV file corresponds to the label of one of the MD trajectories, and the columns are as follows:
- Temperature: the system temperature. There are 5 temperatures (300 K, 400 K, 500 K, 600 K, and 900 K).
- Configuration: the molecular orientation configurations. There are 5 configurations, denoted "2, 4, 6, 8, 10", for each of the two impacting fragments; the two values are separated by an underscore "_". In total, there could be 5*5=25 configurations (although not all of them may be sampled).
- Velocity interval: the impact velocity interval, given as a percentage of the CDF (cumulative distribution function). Typically, there are 200 velocity intervals, but they may be sub-sampled down to 40 velocity intervals for certain trajectories.
- Label: the trajectory outcome, as described in the "Methodology" section of this document.

Below is an exhaustive list of all interactions evaluated in this work (choosing one from each column).

| Cluster | Impactor |
|---------|----------|
| Si2H6   | SiH      |
| Si4     | SiH2     |
| Si29H18 | SiH2-Si  |
| Si29H27 | SiH3     |
| Si29H31 | SiH3-Si  |
| Si29H36 | SiH3-SiH |
|         | SiH4     |
|         | SiH2-SiH |
|         | Si2H     |
|         | Si2H2    |
|         | Si2H4    |
|         | Si2H5    |
|         | Si2H6    |

Related publication(s):
[1] Raymond, M., Elvati, P., Saldinger, J. C., Lin, J., Shi, X., Violi, A. (2025). Machine learning models for Si nanoparticle growth in nonthermal plasma. Plasma Sources Sci. Technol. https://doi.org/10.1088/1361-6595/adbae1
[2] Shi, X., Elvati, P., Violi, A. (2021). On the growth of Si nanoparticles in non-thermal plasma: physisorption to chemisorption conversion. J. Phys. D. https://doi.org/10.7302/vd87-wm68

Use and Access:
This data set is made available under a Creative Commons Public Domain license (CC0 1.0).
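
Illustrative sketch of the labeling rule:
The cluster assignment and label scheme described in the "Methodology" section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used to produce the dataset. It reads the three listed distances (0.44, 0.32, and 0.148 nm) as the pairwise cutoffs, ignores periodic boundary conditions, and takes the number of new bonds as an input, because the README does not describe how new bonds were counted. The names `CUTOFF_NM`, `count_clusters`, and `outcome_label` are hypothetical.

```python
import math
from itertools import combinations

# Pairwise cutoff distances in nm, taken from the Methodology text.
# Keys are sets of the element symbols involved in the pair.
CUTOFF_NM = {
    frozenset(["Si"]): 0.44,       # Si/Si
    frozenset(["Si", "H"]): 0.32,  # Si/H
    frozenset(["H"]): 0.148,       # H/H
}


def count_clusters(atoms):
    """Count distance-based clusters with a simple union-find.

    `atoms` is a list of (element, (x, y, z)) tuples, coordinates in nm.
    Assumption: no periodic boundaries (plain Euclidean distance).
    """
    parent = list(range(len(atoms)))

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(atoms)), 2):
        (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
        cutoff = CUTOFF_NM[frozenset([elem_i, elem_j])]
        if math.dist(pos_i, pos_j) < cutoff:
            parent[find(i)] = find(j)  # merge the two clusters

    return len({find(i) for i in range(len(atoms))})


def outcome_label(n_clusters, n_new_bonds):
    """Map a final-frame analysis to the label values used in the CSV files."""
    if n_clusters > 1:
        return -1       # non-sticking
    return n_new_bonds  # 0 = physisorption, 1-4 = chemisorption


# Toy final frame: three atoms close together and one H far away.
atoms = [
    ("Si", (0.0, 0.0, 0.0)),
    ("Si", (0.235, 0.0, 0.0)),
    ("H", (0.0, 0.148, 0.0)),
    ("H", (2.0, 2.0, 2.0)),
]
n = count_clusters(atoms)
print(n, outcome_label(n, 0))  # prints: 2 -1
```

The pairwise loop scales quadratically with the number of atoms, which is acceptable for systems of the size listed in the table above; a neighbor list would be the usual replacement for larger systems.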