Work Description

Title: Data For: Achieving Reproducibility and Replicability of Molecular Dynamics and Monte Carlo Simulations Using the Molecular Simulation Design Framework (MoSDeF)

Methodology
  • Molecular simulations are complex numerical computations that require extensive knowledge and understanding to set up. As such, replicating a simulation procedure can be difficult when the exact procedures must be translated across methodologies or software, or parsed from the text of a paper. The Molecular Simulation Design Framework (MoSDeF) simplifies these tasks by allowing common workflows to be instantiated and used for a variety of engines, and it documents the exact input files, force field, and run time parameters used for each engine. Specifically, HOOMD-blue, LAMMPS, GROMACS, Cassandra, MCCCS-MN, and GOMC are the simulation engines set up through MoSDeF. Across these engines, relative errors of <0.1% are discernible even for increasingly complex simulation procedures, which demonstrates the transferable nature of the package across different simulation workflows. Additionally, controlled experiments are run to evaluate the effects of small differences in methodology that may exist across the engines. We demonstrate the utility of using open-source, group-curated tools for simulations research by comparing the standard errors herein to the standard errors found in previous literature.
  • The data were generated with 6 simulation engines, with simulations conducted at 5 different universities.
    1. HOOMD-blue 4.0.0 -> University of Michigan
    2. LAMMPS 23Jun2022 -> Vanderbilt University
    3. GROMACS 3.0.5 -> Vanderbilt University
    4. Cassandra 1.2.5 -> Notre Dame University
    5. MCCCS-MN 2020 -> University of Minnesota
    6. GOMC v2.75a -> Wayne State University
Description
  • Data are collected in 5 separate workspaces: one for the main density data calculations across the space and 4 for the subproject simulations that were performed to validate and dive deeper into specific engine implementations. To reuse the simulation trajectories and calculated averages that were used to generate the figures, these workspace folders must be downloaded and placed in the correct location in the GitHub project structure, which can be found at https://github.com/mosdef-hub/reproducibility_study

  • Each compressed file contains the data for a single workspace.
Creator
Depositor
Contact information
Discipline
Funding agency
  • National Science Foundation (NSF)
ORSP grant number
  • U-M Award ID: AWD010419. Project/Grant F051565.
Keyword
Citations to related material
Resource type
Last modified
  • 05/29/2025
Published
  • 05/29/2025
Language
DOI
  • https://doi.org/10.7302/fdqw-jy80
License
To Cite this Work:
Craven, N. C., Singh, R., Quach, C. D., Gilmer, J. B., Crawford, B., Marin-Rimoldi, E., Smith, R., DeFever, R., Dyukov, M., Fothergill, J., Jones, C., Moore, T., Butler, B. L., Anderson, J. A., Iacovella, C., Jankowski, E., Maginn, E., Potoff, J., Glotzer, S. C., McCabe, C., Cummings, P. T., Siepmann, I. J. (2025). Data For: Achieving Reproducibility and Replicability of Molecular Dynamics and Monte Carlo Simulations Using the Molecular Simulation Design Framework (MoSDeF) [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/fdqw-jy80


Files (Count: 7; Size: 1000 GB)

Data For: Achieving Reproducibility and Replicability of Molecular Dynamics and Monte Carlo Simulations Using the Molecular Simulation Design Framework (MoSDeF)

This work was funded by the National Science Foundation, Grant/Awards OAC-1835067, OAC-1835560,
OAC-1835593, OAC-1835612, OAC-1835613, OAC-1835630, OAC-1835713, OAC-1835874

Dataset Contact: [email protected]

Dataset Creators (by institution):

Key Points:

  • We compare the simulation densities for 5 different molecules across 3 MD engines and 3 MC engines
  • We evaluate the effects of different long range correction methods for MD and MC engines
  • Other simulation implementation details are evaluated for their effects on density with high precision
  • We compare simulations set up via MoSDeF, which is transferable across engines, to data generated across engines without MoSDeF: Michael Schappals et al. Journal of Chemical Theory and Computation 2017 13 (9), 4270-4280 DOI: 10.1021/acs.jctc.7b00489.

Research Overview:

Molecular simulations are complex numerical computations that require extensive knowledge and
understanding to set up. As such, replicating a simulation procedure can be difficult when the
exact procedures must be translated across methodologies or software, or parsed from the text of a
paper. The Molecular Simulation Design Framework (MoSDeF) simplifies these tasks by allowing common
workflows to be instantiated and used for a variety of engines, and it documents the exact input
files, forcefield, and run time parameters used for each engine. Specifically, HOOMD-blue, LAMMPS,
GROMACS, Cassandra, MCCCS-MN, and GOMC are the simulation engines set up through MoSDeF. Across
these engines, relative errors of <0.1% are discernible even for increasingly complex simulation
procedures, and controlled experiments can be run to evaluate the effects of small differences in
methodology that exist across the engines. We demonstrate the utility of using these group-curated
tools for simulations research by comparing the standard errors herein to the standard errors found
in previous literature.
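
To make the <0.1% figure concrete, the following is a minimal sketch of one plausible way to express cross-engine agreement: each engine's percent deviation from the mean density over all engines. The exact definition used in the study is not stated here, so this reference choice is an assumption, and the engine labels and density values are invented for illustration only (they are not taken from the dataset).

```python
from statistics import mean


def relative_deviations(densities):
    """Return each engine's percent deviation from the cross-engine mean density.

    Assumption: the cross-engine mean is used as the reference value.
    """
    reference = mean(densities.values())
    return {
        engine: 100.0 * (rho - reference) / reference
        for engine, rho in densities.items()
    }


# Hypothetical densities in kg/m^3, invented for illustration (not from the dataset).
example = {"engine_a": 600.0, "engine_b": 600.3, "engine_c": 599.7}

deviations = relative_deviations(example)

# True when every engine agrees with the mean to better than 0.1%.
within_tolerance = all(abs(d) < 0.1 for d in deviations.values())
```

With real data, the dictionary would hold one averaged density per engine for a single molecule and state point.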

Methodology:

The data are generated through 6 simulation engines and conducted at 5 different universities.

  1. HOOMD-blue 4.0.0 -> University of Michigan
  2. LAMMPS 23Jun2022 -> Vanderbilt University
  3. GROMACS 3.0.5 -> Vanderbilt University
  4. Cassandra 1.2.5 -> Notre Dame University
  5. MCCCS-MN 2020 -> University of Minnesota
  6. GOMC v2.75a -> Wayne State University

Data are collected in 5 separate workspaces: one for the main density data calculations across the
space and 4 for the subproject simulations that were performed to validate and dive deeper into
specific engine implementations. To reuse the simulation trajectories and calculated averages that
were used to generate the figures, these workspace folders must be downloaded and placed in the
correct location in the code, which can be found in code.tar.zst in this archive and at
https://github.com/mosdef-hub/reproducibility_study

The following indicates where each directory in this archive should be placed relative to the GitHub
project structure.
workspace -> reproducibility_study/reproducibility_project/workspace
lrc_shift_subproject_workspace -> reproducibility_study/reproducibility_project/lrc_shift_subproject/workspace
mdmc_ethanol_subproject_workspace -> reproducibility_study/reproducibility_project/mdmc_ethanol_subproject/workspace
methane_systemsize_subproject_workspace -> reproducibility_study/reproducibility_project/methane_systemsize/workspace
water_spce_nist_workspace -> reproducibility_study/reproducibility_project/waterspce_nist_subproject/workspace
Note that the data in spe_subproject for the single point energies are found on GitHub and do not
need to be installed individually.
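
The directory mapping above can be applied with a short script. The following is a minimal sketch, assuming the archives have already been extracted into one download directory and the repository has been cloned; the function names and the choice of symlinks over copies are illustrative, not part of the project. The project directory is spelled reproducibility_project here, as in the workflow paths of this README; pass a different `project_dirname` if your checkout differs.

```python
from pathlib import Path

# Downloaded directory name -> location relative to the project directory,
# taken from the mapping listed above.
WORKSPACE_TARGETS = {
    "workspace": "workspace",
    "lrc_shift_subproject_workspace": "lrc_shift_subproject/workspace",
    "mdmc_ethanol_subproject_workspace": "mdmc_ethanol_subproject/workspace",
    "methane_systemsize_subproject_workspace": "methane_systemsize/workspace",
    "water_spce_nist_workspace": "waterspce_nist_subproject/workspace",
}


def plan_links(download_dir, repo_dir, project_dirname="reproducibility_project"):
    """Return (source, destination) path pairs without touching the filesystem."""
    download_dir = Path(download_dir)
    project_dir = Path(repo_dir) / project_dirname
    return [
        (download_dir / src, project_dir / dst)
        for src, dst in WORKSPACE_TARGETS.items()
    ]


def link_workspaces(download_dir, repo_dir, dry_run=True):
    """Symlink each downloaded workspace into the repository; report each action."""
    actions = []
    for src, dst in plan_links(download_dir, repo_dir):
        if not src.is_dir():
            actions.append(f"missing: {src}")
            continue
        if dst.exists() or dst.is_symlink():
            actions.append(f"skipped (exists): {dst}")
            continue
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.symlink_to(src.resolve(), target_is_directory=True)
        actions.append(f"link: {dst} -> {src}")
    return actions
```

Calling `link_workspaces(download_dir, repo_dir)` with the default `dry_run=True` only reports what would be linked, which is a cheap check before working with data of this size.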

This work used signac-flow to manage the project, which was broken into pieces and run on clusters
at each university. The core steps of this process are as follows.

  1. Initialize job statepoints in the signac project. All jobs are then run from the signac CLI by navigating to reproducibility_project/src/engines/ENGINE_NAME and running the local project.py file. For more information on interfacing with signac and signac-flow, see https://docs.signac.io/en/latest/flow-project.html#the-flowproject
  2. Generate molecular structures through scripts found in src/molecules/system_builder.py
  3. Submit simulation to cluster. This can be broken into multiple steps for some engines.
  4. Run data analysis with reproducibility_project/aggregate_data/analyze_data.py
  5. Generate plots with reproducibility_project/aggregate_summary/plotting.py
  6. Repeat 1-5 for each of the subprojects
    • reproducibility_project/lrc_shift_subproject
    • reproducibility_project/mdmc_ethanol_subproject
    • reproducibility_project/spe_subproject
    • reproducibility_project/methane_systemsize - There are multiple subsubprojects within this data, looking at number of particles and rcut changes.
    • reproducibility_project/waterspce_nist_subproject
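
As a rough illustration of running an engine's project.py from the steps above, the sketch below builds the working directory and command line for a signac-flow project script. It assumes the standard signac-flow subcommands `status`, `run`, and `submit`; the helper name `flow_command` and the engine directory name in the usage comment are hypothetical, so check the actual directory names under src/engines in the repository.

```python
from pathlib import Path

# Standard signac-flow command-line subcommands used in this sketch.
FLOW_OPERATIONS = ("status", "run", "submit")


def flow_command(repo_root, engine_name, operation="run"):
    """Return (working_directory, argv) for an engine's signac-flow project.py.

    engine_name must match a directory under reproducibility_project/src/engines;
    the names themselves are not listed in this README.
    """
    if operation not in FLOW_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    workdir = (
        Path(repo_root) / "reproducibility_project" / "src" / "engines" / engine_name
    )
    return workdir, ["python", "project.py", operation]


# Usage (hypothetical engine directory name "gomc"):
#   import subprocess
#   workdir, argv = flow_command("reproducibility_study", "gomc", "status")
#   subprocess.run(argv, cwd=workdir, check=True)
```

Checking `status` before `run` or `submit` shows which jobs are eligible for which operations without starting any simulations.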

Other required packages are listed in the file environment.yml, and engine specific installs in
engines.yml. To create and activate the environment, run the following commands with Miniconda or
Anaconda (and mamba) installed on your device.


git clone https://github.com/mosdef-hub/reproducibility_study.git
cd reproducibility_study
mamba env create -f environment.yml
conda activate mosdef-study38

