Work Description

Title: Machine learning models for Si nanoparticle growth in nonthermal plasma

Methodology
  • The data are mechanism labels for the molecular dynamics (MD) simulation trajectories, assigned at the end of the allocated simulation time. The MD simulations were carried out using LAMMPS, and the atomic interactions were modeled using a classical all-atom reactive force field. Labels were assigned by computing the composition and the number of clusters in the system. Two atoms were assigned to the same cluster if their distance was less than twice their typical bond length, namely 0.44, 0.32, and 0.148 nm for Si/Si, Si/H, and H/H pairs, respectively. If the trajectory outcome is non-sticking (more than one cluster), the label is "-1"; if it is physisorption (one cluster, but no chemical bond formed), the label is "0"; if it is chemisorption (one cluster, chemical bonds formed), the label is "1", "2", "3", or "4", where the numerical value corresponds to the number of new bonds formed.

  • This approach is the same as in Shi, X., Elvati, P., Violi, A. (2021). On the growth of Si nanoparticles in non-thermal plasma: physisorption to chemisorption conversion. J. Phys. D.  https://doi.org/10.1088/1361-6463/ac0b71
Description
  • Nanoparticles (NPs) formed in nonthermal plasmas (NTPs) can have unique properties and applications. However, modeling their growth in these environments presents significant challenges due to the non-equilibrium nature of NTPs, making them computationally expensive to describe. In this work, we address the challenges associated with accelerating the estimation of parameters needed for these models. Specifically, we explore how different machine learning models can be tailored to improve prediction outcomes. We apply these methods to reactive classical molecular dynamics data, which capture the processes associated with colliding silane fragments in NTPs. These reactions exemplify processes where qualitative trends are clear, but their quantification is challenging, hard to generalize, and requires time-consuming simulations. Our results demonstrate that good prediction performance can be achieved when appropriate loss functions are implemented and correct invariances are imposed. While the diversity of molecules used in the training set is critical for accurate prediction, our findings indicate that only a fraction (15-25%) of the energy and temperature sampling is required to achieve high levels of accuracy. This suggests a substantial reduction in computational effort is possible for similar systems.
Funding agency
  • National Science Foundation (NSF)
  • Other Funding Agency
Other Funding agency
  • US Army Research Office
ORSP grant number
  • W911NF-18-1-0240
Last modified
  • 04/08/2025
Published
  • 04/08/2025
DOI
  • https://doi.org/10.7302/5dt5-cy22
To Cite this Work:
Raymond, M., Elvati, P., Saldinger, J. C., Lin, J., Shi, X., Violi, A. (2025). Machine learning models for Si nanoparticle growth in nonthermal plasma [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/5dt5-cy22


Files (Count: 69; Size: 8.01 MB)

Date: March 05, 2025

Dataset Title: Machine learning models for Si nanoparticle growth in nonthermal plasma

Dataset Contact: Angela Violi [email protected]

Dataset Creators:
Name: Matt Raymond
Email: [email protected]
Institution: University of Michigan Department of Electrical Engineering and Computer Science
ORCID: https://orcid.org/0000-0001-6824-8692

Name: Paolo Elvati
Email: [email protected]
Institution: University of Michigan Department of Mechanical Engineering
ORCID: https://orcid.org/0000-0002-6882-6023

Name: Jacob C. Saldinger
Email: [email protected]
Institution: University of Michigan Department of Chemical Engineering, Low Carbon Pathway Innovation at BP
ORCID: https://orcid.org/0000-0001-5005-614X

Name: Jonathan Lin
Email: [email protected]
Institution: University of Michigan Department of Electrical Engineering and Computer Science
ORCID: https://orcid.org/0009-0004-6381-4068

Name: Xuetao Shi
Email: [email protected]
Institution: University of Michigan Department of Mechanical Engineering, Dana-Farber Cancer Institute at Harvard
ORCID: https://orcid.org/0000-0001-6274-5495

Name: Angela Violi
Email: [email protected]
Institution: University of Michigan Departments of Electrical Engineering and Computer Science, Mechanical Engineering, and Chemical Engineering
ORCID: https://orcid.org/0000-0001-9517-668X

Funding: US Army Research Office MURI Grant No. W911NF-18-1-0240 and NSF ECO-CBET No. F059554.

Key Points:
- We simulated the collisions of silane nanoparticles of various sizes (which we call "clusters" and "impactors") using molecular dynamics with a reactive force field.
- "Clusters" are defined as Si2H6, Si4, and Si29Hy (y=18,27,31,36), while "impactors" are defined as Si2Hy (y=1-6) in all possible hydrogen distributions (e.g., for Si2H4, we simulated both H2Si*-*SiH2 and HSi**-SiH3).
- The dependence of sticking probabilities on temperature, H coverage of both silane impactors and cluster surfaces, and cluster size is modeled using various machine learning models.
- We find that machine learning models accurately predict sticking probabilities when trained on as little as 15% of these simulations, significantly reducing the number of simulations that must be run.

Research Overview:
Nanoparticles (NPs) formed in nonthermal plasmas (NTPs) can have unique properties and applications. However, modeling their growth in these environments presents significant challenges due to the non-equilibrium nature of NTPs, making them computationally expensive to describe. In this work, we address the challenges associated with accelerating the estimation of parameters needed for these models. Specifically, we explore how different machine learning models can be tailored to improve prediction outcomes. We apply these methods to reactive classical molecular dynamics data, which capture the processes associated with colliding silane fragments in NTPs. These reactions exemplify processes where qualitative trends are clear, but their quantification is challenging, hard to generalize, and requires time-consuming simulations. Our results demonstrate that good prediction performance can be achieved when appropriate loss functions are implemented and correct invariances are imposed. While the diversity of molecules used in the training set is critical for accurate prediction, our findings indicate that only a fraction (15-25%) of the energy and temperature sampling is required to achieve high levels of accuracy. This suggests a substantial reduction in computational effort is possible for similar systems.

Methodology:
The data are mechanism labels for the molecular dynamics (MD) simulation trajectories, assigned at the end of the allocated simulation time. The MD simulations were carried out using LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), and the atomic interactions were modeled using a classical all-atom reactive force field. Labels were assigned by computing the composition and the number of clusters in the system.

Two atoms were assigned to the same cluster if their distance was less than twice their typical bond length, namely 0.44, 0.32 and 0.148 nm for Si/Si, Si/H, and H/H pairs, respectively.
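
The cluster assignment described above can be sketched as a connected-components search. This is a minimal illustration, not the code used to produce the data set: it assumes the three quoted values act directly as the pair distance thresholds, that coordinates are in nm, and that periodic boundaries can be ignored. The function name `count_clusters` and the example coordinates are hypothetical.

```python
from itertools import combinations
from math import dist

# Pair thresholds in nm, copied from the text above. Treating them as the
# cutoff distances themselves is an assumption of this sketch.
CUTOFF_NM = {
    frozenset(["Si"]): 0.44,       # Si/Si
    frozenset(["Si", "H"]): 0.32,  # Si/H
    frozenset(["H"]): 0.148,       # H/H
}


def count_clusters(elements, positions):
    """Count connected groups of atoms using a simple union-find."""
    parent = list(range(len(elements)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(elements)), 2):
        cutoff = CUTOFF_NM[frozenset([elements[i], elements[j]])]
        if dist(positions[i], positions[j]) < cutoff:
            parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(elements))})


# Hypothetical coordinates (nm): a bonded Si-Si-H chain plus one distant H atom.
elements = ["Si", "Si", "H", "H"]
positions = [(0.0, 0.0, 0.0), (0.235, 0.0, 0.0), (0.383, 0.0, 0.0), (2.0, 0.0, 0.0)]
n_clusters = count_clusters(elements, positions)
```

The all-pairs loop scales quadratically with the number of atoms, which is acceptable for systems of the size listed here; a neighbor list would be the usual choice for larger ones.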

If the trajectory outcome is non-sticking (more than one cluster), the label is "-1";
If the trajectory outcome is physisorption (one cluster, but no chemical bond formed), the label is "0";
If the trajectory outcome is chemisorption (one cluster, chemical bonds formed), the label is "1", "2", "3", or "4", where the numerical value corresponds to the number of new bonds formed.

This approach is the same as in [2].
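
The labeling rules above can be written as a small pair of helper functions. This is a sketch under stated assumptions: the names `assign_label` and `describe_label` are hypothetical, and the number of clusters and of new bonds is assumed to have been computed already.

```python
def assign_label(n_clusters, n_new_bonds):
    """Turn a trajectory's final state into the integer label described above."""
    if n_clusters > 1:
        return -1           # non-sticking: the fragments separated
    if n_new_bonds == 0:
        return 0            # physisorption: one cluster, no new chemical bond
    return n_new_bonds      # chemisorption: label equals the number of new bonds


def describe_label(label):
    """Map a label back to (outcome name, number of new bonds)."""
    if label == -1:
        return ("non-sticking", 0)
    if label == 0:
        return ("physisorption", 0)
    if 1 <= label <= 4:
        return ("chemisorption", label)
    raise ValueError(f"unexpected label: {label}")
```

For example, `describe_label(assign_label(1, 2))` yields `("chemisorption", 2)`.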

Instrument and/or Software specifications: NA

Files contained here:
The 66 CSV files correspond to the 66 molecular dynamics simulations run for this work, out of the 78 possible cluster-impactor combinations described above. We did not include the 12 simulations that were previously simulated in [2].
The title of each CSV file corresponds to the label of one of the MD trajectories, and the columns are as follows:
- Temperature: the system temperature. There are 5 temperatures (300K, 400K, 500K, 600K, and 900K).
- Configuration: the molecular orientation configurations. There are 5 configurations, denoted "2, 4, 6, 8, 10", for both impacting fragments, separated by an underscore "_". In total, there could be 5*5=25 configurations (although not all of them may be sampled).
- Velocity interval: the impact velocity interval percentages in terms of the CDF (cumulative distribution function). Typically, there are 200 velocity intervals, but they may be sub-sampled down to 40 velocity intervals for certain trajectories.
- Label: the trajectory outcome, as described in the "Methodology" section of this document.
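
As an illustration of how these columns might be used, the sketch below reads rows with the standard `csv` module and computes the fraction of sticking trajectories per temperature. Several things here are assumptions rather than facts about the files: the exact header names, the numeric formatting of the values, the inline sample rows, and the choice to count every label of 0 or higher (physisorption or chemisorption) as sticking.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; the real header names and value formats may differ.
SAMPLE = """Temperature,Configuration,Velocity interval,Label
300,2_4,0.5,1
300,2_6,1.0,-1
300,4_4,1.5,0
900,2_4,0.5,-1
900,10_8,1.0,2
"""


def sticking_fraction_by_temperature(handle):
    """Return {temperature: fraction of rows with label >= 0}."""
    counts = defaultdict(lambda: [0, 0])  # temperature -> [sticking, total]
    for row in csv.DictReader(handle):
        tally = counts[int(row["Temperature"])]
        tally[0] += int(row["Label"]) >= 0
        tally[1] += 1
    return {t: sticking / total for t, (sticking, total) in counts.items()}


def split_configuration(value):
    """Split a configuration such as '2_4' into the two orientation indices."""
    first, second = value.split("_")
    return int(first), int(second)


fractions = sticking_fraction_by_temperature(io.StringIO(SAMPLE))
```

To apply this to one of the downloaded files, pass an open file handle instead of the `io.StringIO` wrapper, after checking the actual header row.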

Below is an exhaustive list of all interactions evaluated in this work (choose one entry from each column).

| Cluster | Impactor |
|---------|----------|
| Si2H6 | SiH |
| Si4 | SiH2 |
| Si29H18 | SiH2-Si |
| Si29H27 | SiH3 |
| Si29H31 | SiH3-Si |
| Si29H36 | SiH3-SiH |
| | SiH4 |
| | SiH2-SiH |
| | Si2H |
| | Si2H2 |
| | Si2H4 |
| | Si2H5 |
| | Si2H6 |
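
The pairing implied by the table can be enumerated directly. The short sketch below copies the names from the two columns and confirms the count of 78 possible cluster-impactor combinations; it does not identify which 12 combinations were simulated in [2], since that is not stated here.

```python
from itertools import product

CLUSTERS = ["Si2H6", "Si4", "Si29H18", "Si29H27", "Si29H31", "Si29H36"]
IMPACTORS = [
    "SiH", "SiH2", "SiH2-Si", "SiH3", "SiH3-Si", "SiH3-SiH", "SiH4",
    "SiH2-SiH", "Si2H", "Si2H2", "Si2H4", "Si2H5", "Si2H6",
]

# One interaction per (cluster, impactor) pair: 6 * 13 = 78 combinations.
pairs = list(product(CLUSTERS, IMPACTORS))
n_possible = len(pairs)

# 12 of these were simulated previously, leaving the 66 CSV files in this data set.
n_in_this_work = n_possible - 12
```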

Related publication(s):
[1] Raymond, M., Elvati, P., Saldinger, J. C., Lin, J., Shi, X., Violi, A. (2025). Machine learning models for Si nanoparticle growth in nonthermal plasma. Plasma Sources Sci. Technol. https://doi.org/10.1088/1361-6595/adbae1
[2] Shi, X., Elvati, P., Violi, A. (2021). On the growth of Si nanoparticles in non-thermal plasma: physisorption to chemisorption conversion. J. Phys. D. https://doi.org/10.1088/1361-6463/ac0b71

Use and Access:
This data set is made available under a Creative Commons Public Domain license (CC0 1.0).
