Work Description

Title: Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation

Attribute Value
Methodology
  • The data set contains both particle trajectory data and code. The particle trajectories were simulated with the HOOMD-blue simulation engine using molecular dynamics (MD) and hard particle Monte Carlo (HPMC) methods, and the particle configurations (positions and orientations) were written in the GSD file format. We trained and tested our multilayer perceptron (MLP) classifiers with the PyTorch Python package. For data augmentation, particle neighbors were found with the freud analysis package, quaternion algebra was handled by rowan, and shape information was evaluated with coxeter. We recommend OVITO for visualizing the particle trajectory data. Please refer to our publication and the README.md for detailed usage of the data set.
Description
  • The trajectory data and code were generated for our work "Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation" (currently under peer review). The data set contains trajectory data in GSD file format for seven test systems, including cubic structures, two-dimensional and three-dimensional patchy particle systems, hexagonal bipyramids with two aspect ratios, and truncated shapes with two degrees of truncation. In addition, the Python code and Jupyter notebooks used to perform data augmentation, MLP classifier training, and MLP classifier testing are included.
Creator
Depositor
  • shihkual@umich.edu
Contact information
Discipline
ORSP grant number
  • DMR 2302470
Keyword
Date coverage
  • 2022 to 2023
Citations to related material
Resource type
Last modified
  • 02/23/2024
Published
  • 02/23/2024
Language
DOI
  • https://doi.org/10.7302/w13t-2177
License
To Cite this Work:
Lee, S. K., Tsai, S. T., Glotzer, S. C. (2024). Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/w13t-2177

Files (Count: 2; Size: 2.3 GB)

Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation

Shih-Kuang (Alex) Lee, Sun-Ting Tsai & Sharon C. Glotzer

Method

The data set contains both particle trajectory data and code. The particle trajectories were simulated with the HOOMD-blue simulation engine using molecular dynamics (MD) and hard particle Monte Carlo (HPMC) methods, and the particle configurations (positions and orientations) were written in the GSD file format. We trained and tested our multilayer perceptron (MLP) classifiers with the PyTorch Python package. For data augmentation, particle neighbors were found with the freud analysis package, quaternion algebra was handled by rowan, and shape information was evaluated with coxeter. We recommend OVITO for visualizing the particle trajectory data.
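The quaternion algebra mentioned above comes down, for a reference particle i with orientation q_i and a neighbor j, to two operations: expressing j's orientation in i's body frame, q_ij = conj(q_i) q_j, and rotating the bond vector r_ij = r_j - r_i into that same frame. The NumPy sketch below writes both out explicitly instead of calling rowan (which provides vectorized equivalents such as rowan.multiply, rowan.conjugate, and rowan.rotate). The helper names are hypothetical, and the sketch assumes unit quaternions stored scalar-first as (w, x, y, z), the convention used by rowan and by HOOMD-blue orientations in GSD files. It illustrates the algebra only and is not the code in data_prep.py.

```python
import numpy as np

# Hypothetical helper names (not from data_prep.py); rowan offers vectorized equivalents.
# Quaternions are unit quaternions stored scalar-first: (w, x, y, z).

def quat_multiply(p, q):
    """Hamilton product p * q."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])


def quat_conjugate(q):
    """Conjugate, which equals the inverse for a unit quaternion."""
    return np.array([q[0], -q[1], -q[2], -q[3]])


def quat_rotate(q, v):
    """Rotate the 3-vector v by q: vector part of q * (0, v) * conj(q)."""
    v_quat = np.concatenate(([0.0], v))
    return quat_multiply(quat_multiply(q, v_quat), quat_conjugate(q))[1:]


def relative_orientation(q_i, q_j):
    """q_ij = conj(q_i) * q_j: orientation of particle j in the body frame of i."""
    return quat_multiply(quat_conjugate(q_i), q_j)


def bond_in_body_frame(q_i, r_ij):
    """Bond vector r_ij = r_j - r_i expressed in the body frame of particle i."""
    return quat_rotate(quat_conjugate(q_i), r_ij)


# Usage: particle i is rotated by 90 degrees about z; a lab-frame bond along +x
# then points along -y in the particle's own frame.
q_i = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
body_bond = bond_in_body_frame(q_i, np.array([1.0, 0.0, 0.0]))  # approximately [0, -1, 0]
```

Working in the reference particle's body frame makes both descriptors independent of how that particle happens to be oriented in the lab frame.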

Relevant documentation:

References:

  • J. A. Anderson, J. Glaser, and S. C. Glotzer. HOOMD-blue: A Python package for high-performance molecular dynamics and hard particle Monte Carlo simulations. Computational Materials Science 173, 109363 (2020). doi:10.1016/j.commatsci.2019.109363
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035 (Curran Associates, Inc., 2019). https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf
  • V. Ramasubramani and S. C. Glotzer. rowan: A Python package for working with quaternions. Journal of Open Source Software 3(27), 787 (2018). doi:10.21105/joss.00787
  • V. Ramasubramani, B. D. Dice, E. S. Harper, M. P. Spellings, J. A. Anderson, and S. C. Glotzer. freud: A software suite for high throughput analysis of particle simulation data. Computer Physics Communications 254, 107275 (2020). doi:10.1016/j.cpc.2020.107275
  • A. Stukowski. Visualization and analysis of atomistic simulation data with OVITO – the Open Visualization Tool. Modelling and Simulation in Materials Science and Engineering 18, 015012 (2010). doi:10.1088/0965-0393/18/1/015012

Description

After unzipping mlp_crystal_classifier.zip, the resulting mlp_crystal_classifier directory is organized as follows:
  1) The utils directory stores the Python scripts that perform data augmentation, model training, and model testing. Its three scripts each group the functions for a different task.
  2) The data folder stores the training and testing particle trajectories, in GSD file format, for the seven systems of particle shapes.
  3) The cube, patchy_triangles, patchy_prism, hexagonal_bps, truncated_tetrahedrons, and truncated_octahedrons folders store the Jupyter notebooks that perform data augmentation, model training, and model testing for the seven systems of particle shapes. Below, we refer to them as experiment folders.

We explain the three groups of folders in detail below:

  1. The utils folder:

  • data_prep.py:

    Stores functions that prepare data by constructing local geometric fingerprints of the particles in selected snapshots of a given trajectory.

    • prepare_data: Prepare data without symmetry augmentation.
    • prepare_data_augmented: Prepare data with symmetry augmentation.
    • prepare_triangles_data: Prepare data without symmetry augmentation for the patchy triangles system, whose particles must be labeled by particle type.
    • prepare_triangles_data_augmented: Prepare data with symmetry augmentation for the patchy triangles system, whose particles must be labeled by particle type.
    • normalize_angleij: Normalize the angular part of the relative position, in spherical coordinates, based on the orientation of the input reference particle and the shape symmetry factors.
    • normalize_qij: Normalize the relative quaternion based on the orientation of the input reference particle and the shape symmetry factors.
    • appendSpherical_np: Transform positions from Cartesian to spherical coordinates.

  • evaluation.py:

    Stores functions for testing.

    • mlp_eval: Evaluate the classification performance of the MLP classifier trained on non-augmented data.
    • mlp_eval_augmented: Evaluate the classification performance of the MLP classifier trained on augmented data.
    • calc_rdf: Calculate the radial distribution function for a given trajectory file and frames using freud.

  • mlp.py:

    Creates the MLP classifier and loads the data sets. It also stores the functions for training and testing.

  2. The data folder:

Stores the trajectory files, in GSD file format, for the seven systems of particle shapes, including cube, patchy_triangles, patchy_prism, hexagonal_bps, truncated_tetrahedrons, and truncated_octahedrons.

  3. The experiment folders:

Each stores the Jupyter notebooks cookdata.ipynb, training.ipynb, and testing.ipynb, which demonstrate data preparation, training, and testing of the MLP classifier, respectively.
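The data_prep.py functions are described here only by name, so the following is an illustration of the general idea under stated assumptions, not a reproduction of prepare_data_augmented. The sketch (1) collects the bond vectors from one particle to its k nearest neighbors in a cubic periodic box, (2) converts them to spherical coordinates (the role the README assigns to appendSpherical_np), and (3) performs a simple symmetry augmentation by emitting one copy of the environment for each rotation of an n-fold axis along the body z direction. All function names are hypothetical; the bonds are assumed to be expressed already in the reference particle's body frame, and a brute-force search stands in for freud's neighbor queries.

```python
import numpy as np

# All names below are hypothetical; this is not the code in data_prep.py.

def nearest_neighbor_bonds(positions, box_length, index, k):
    """Bond vectors r_j - r_i from particle `index` to its k nearest neighbors.

    Assumes a cubic periodic box of side `box_length` and uses a brute-force
    minimum-image search (a stand-in for freud's neighbor queries).
    """
    bonds = positions - positions[index]
    bonds -= box_length * np.round(bonds / box_length)  # minimum image
    bonds = np.delete(bonds, index, axis=0)             # drop the self-bond
    order = np.argsort(np.linalg.norm(bonds, axis=1))
    return bonds[order[:k]]


def to_spherical(bonds):
    """(M, 3) Cartesian vectors -> (M, 3) rows of (r, theta, phi).

    theta: polar angle from +z in [0, pi]; phi: azimuth from +x in [0, 2*pi).
    """
    r = np.linalg.norm(bonds, axis=1)
    theta = np.arccos(np.clip(bonds[:, 2] / r, -1.0, 1.0))
    phi = np.mod(np.arctan2(bonds[:, 1], bonds[:, 0]), 2.0 * np.pi)
    return np.column_stack((r, theta, phi))


def augment_cn(spherical, n_fold):
    """One copy of the environment per rotation of an n_fold axis along body z.

    Rotating about z by 2*pi*m/n_fold shifts every phi by that angle and leaves
    r and theta unchanged. Returns an array of shape (n_fold, M, 3).
    """
    copies = []
    for m in range(n_fold):
        rotated = spherical.copy()
        rotated[:, 2] = np.mod(rotated[:, 2] + 2.0 * np.pi * m / n_fold, 2.0 * np.pi)
        copies.append(rotated)
    return np.stack(copies)


# Usage: simple cubic lattice, 4 x 4 x 4 sites with unit spacing.
grid = np.arange(4.0)
lattice = np.array(np.meshgrid(grid, grid, grid, indexing="ij")).reshape(3, -1).T
bonds = nearest_neighbor_bonds(lattice, box_length=4.0, index=0, k=6)
features = augment_cn(to_spherical(bonds), n_fold=4)  # shape (4, 6, 3)
```

Only a single cyclic axis is handled here. A shape with a richer rotation group, such as the cube with its 24 proper rotations, would need one copy per group element, typically applied to the body-frame bonds as quaternions.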
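The README does not state which metrics mlp_eval and mlp_eval_augmented report. A common way to summarize a multi-class classifier is overall accuracy together with a confusion matrix, and the NumPy sketch below computes both from raw network outputs (one row of class scores per particle). The function names are hypothetical and the metric choice is an assumption.

```python
import numpy as np

# Hypothetical helper names; the metrics actually reported by mlp_eval are not documented here.

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Counts with rows = true class and columns = predicted class."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix


def summarize(logits, true_labels):
    """Overall accuracy, per-class accuracy (recall), and confusion matrix.

    logits: (N, C) array of class scores, one row per particle.
    true_labels: (N,) integer array of reference labels in [0, C).
    """
    predicted = np.argmax(logits, axis=1)
    matrix = confusion_matrix(true_labels, predicted, logits.shape[1])
    overall = np.trace(matrix) / matrix.sum()
    # guard against classes that never occur in true_labels
    per_class = np.diag(matrix) / np.maximum(matrix.sum(axis=1), 1)
    return overall, per_class, matrix


# Usage: four particles, two structure classes.
logits = np.array([[2.0, 0.1], [0.3, 1.5], [1.2, 0.4], [0.2, 0.9]])
labels = np.array([0, 1, 1, 1])
overall, per_class, matrix = summarize(logits, labels)
# overall == 0.75, per_class == [1.0, 2/3], matrix == [[1, 0], [1, 2]]
```

Per-class accuracy is worth reporting alongside the overall figure because a classifier can score well overall while misclassifying a minority local environment.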
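calc_rdf relies on freud, whose freud.density.RDF class performs the calculation. To show what the radial distribution function g(r) measures, the sketch below computes it by brute force for one snapshot: it histograms all pair distances under the minimum-image convention and divides by the number of pairs an ideal gas at the same density would place in each spherical shell. The function name and the restriction to a cubic box are assumptions of this sketch; freud supports general triclinic boxes and uses neighbor lists, which scale far better than the O(N^2) pair enumeration here.

```python
import numpy as np

def radial_distribution(positions, box_length, r_max, n_bins):
    """Hypothetical brute-force g(r) for N points in a cubic periodic box."""
    n = len(positions)
    if r_max > box_length / 2:
        # beyond L/2 the minimum-image distance is no longer unique
        raise ValueError("r_max must not exceed half the box length")
    i, j = np.triu_indices(n, k=1)                # each unordered pair once
    d = positions[j] - positions[i]
    d -= box_length * np.round(d / box_length)    # minimum-image convention
    dist = np.linalg.norm(d, axis=1)
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / box_length ** 3
    # factor 2: each unordered pair contributes a neighbor to both particles
    g = 2.0 * counts / (n * density * shell_volumes)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, g


# Usage: uniformly random points (an ideal-gas snapshot) give g(r) close to 1.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 8.0, size=(500, 3))
r, g = radial_distribution(points, box_length=8.0, r_max=3.5, n_bins=35)
```

For a crystalline snapshot from the data folder, the same normalization would instead produce sharp peaks at the lattice neighbor distances.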

Citation: Lee, Shih-Kuang (Alex), Tsai, Sun-Ting, and Sharon Glotzer. "Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation." arXiv preprint arXiv:2312.11822 (2023).

Corresponding Author: Sharon Glotzer: email: sglotzerkjc@umich.edu

Depositing Author: Shih-Kuang (Alex) Lee: email: shihkual@umich.edu

Date Written: February 23rd, 2024
