- This dataset contains the output of hard particle Monte Carlo simulations run with HPMC, a plugin to the simulation toolkit HOOMD-blue. Specifically, the data space includes simulation trajectories and the quantities logged during those simulations. It also includes atomistic simulation files generated with GROMACS, which are used to parametrize the coarse-grained models in HOOMD-blue. The data spaces are managed with the signac framework, which provides utilities for storing, accessing, and operating on the files without requiring a server-based database. The custom HOOMD-blue scripts used to run the simulations are included in the repository. The signac-flow package organizes and manages the workflows for both the HOOMD-blue and GROMACS simulations, and the Python files containing the full workflow logic are also included in the repository.
The goal of this work is to elucidate the stability of a complex, experimentally observed protein structure. We found that supercharged GFP molecules spontaneously assemble into a complex 16-mer structure that we term a protomer, and that under the right conditions an even larger assembly is observed. The protomer structure is very well defined, and we performed simulations to understand the mechanics underlying its behavior. In particular, we focused on the role of electrostatics in this system and on how varying the salt concentration alters the stability of the structure, with the ultimate goal of predicting the effects of various mutations on that stability.
This repository contains two separate but closely linked projects. The first, in the candidate_structures folder, contains the atomistic outputs used to generate coarse-grained configurations. The second, in the rigid_protein folder, contains the coarse-grained simulations themselves and pulls the atomistic coordinates from the candidate_structures folder. All data is managed by signac and lives in each project's workspace directory, which contains one subdirectory per parameter combination. The parameters associated with a given subdirectory are stored in the signac_statepoint.json file inside it.
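As a minimal sketch of how the stored parameters could be inspected without signac installed, the following reads every signac_statepoint.json file under a workspace directory using only the Python standard library. The function name `load_statepoints` and the example path are hypothetical and not part of the repository; signac's own API offers richer access to the same information.

```python
import json
from pathlib import Path


def load_statepoints(workspace):
    """Map each job directory name to the parameters in its state point file.

    `workspace` is the path to a signac workspace directory. This helper is
    illustrative only; it is not part of signac or of the repository.
    """
    statepoints = {}
    # Each job lives in its own subdirectory holding signac_statepoint.json.
    for sp_file in sorted(Path(workspace).glob("*/signac_statepoint.json")):
        with sp_file.open() as handle:
            statepoints[sp_file.parent.name] = json.load(handle)
    return statepoints


if __name__ == "__main__":
    # Hypothetical path; adjust to the local copy of the dataset.
    for job_dir, params in load_statepoints("rigid_protein/workspace").items():
        print(job_dir, params)
```

Subdirectories without a state point file are simply skipped, so the function can be pointed at a partially downloaded workspace.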
The atomistic data uses experimentally determined protein structures as a starting point; all of these are stored in the ConfigFiles folder. The primary output is the topology files generated from the PDBs by GROMACS; these topologies are then used to parametrize the Monte Carlo simulations. In some cases, atomistic simulations were also run, and their outputs are stored alongside the topology files.
In the rigid_protein folder, the ConfigFiles folder contains MSMS, the software used to generate polyhedral representations of proteins from the PDBs in the candidate_structures folder, along with all of the resulting polyhedral structures. The simulation trajectories are stored as General Simulation Data (GSD) files within each subdirectory of the workspace, together with a single .pos file that contains the shape definition of the (nonconvex) polyhedron used to represent a protein. Logged quantities, such as energies and MC move sizes, are stored in .log files.
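The logged quantities can be read back for analysis with a few lines of Python. The sketch below assumes that each .log file is plain text with a single header row of column names followed by whitespace-delimited numeric rows; this assumption should be checked against the actual files. The function name `read_log` is hypothetical and not part of the repository.

```python
def read_log(path):
    """Parse a whitespace-delimited log with one header row into columns.

    Returns a dict mapping each column name to a list of floats.
    Assumes the first non-empty line holds the column names (illustrative only).
    """
    columns = {}
    names = None
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if names is None:
                # First non-empty line: treat it as the header row.
                names = fields
                columns = {name: [] for name in names}
                continue
            if len(fields) != len(names):
                raise ValueError(f"unexpected column count in line: {line!r}")
            for name, value in zip(names, fields):
                columns[name].append(float(value))
    return columns
```

The result is a plain dictionary of lists, which can be passed directly to plotting or array libraries. The GSD trajectories themselves would typically be opened with the gsd Python package, which is not shown here.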
The logic for the simulations in the candidate_structures project is in the Python scripts project.py, operations.py, and scripts/init.py. The rigid_protein folder contains scripts with the same names, which hold most of the logic for running its simulations and managing its data space, as well as a notebooks directory with the Jupyter notebooks used for analysis.
- Anna J Simon, Vyas Ramasubramani, Jens Glaser, Arti Pothukuchy, Jillian Gerberich, Janelle Leggere, Barrett R Morrow, Jimmy Golihar, Cheulhee Jung, Sharon C Glotzer, David W Taylor, Andrew D Ellington, "Supercharging enables organized assembly of synthetic biomolecules," bioRxiv 323261; doi: https://doi.org/10.1101/323261