The goal of the work is to elucidate the stability of a complex experimentally observed structure of proteins. We found that supercharged GFP molecules spontaneously assemble into a complex 16-mer structure that we term a protomer, and that under the right conditions an even larger assembly is observed. The protomer structure is very well defined, and we performed simulations to try and understand the mechanics underlying its behavior. In particular, we focused on understanding the role of electrostatics in this system and how varying salt concentrations would alter the stability of the structure, with the ultimate goal of predicting the effects of various mutations on the stability of the structure.
There are two separate projects included in this repository, but the two are closely linked. One, the candidate_structures folder, contains the atomistic outputs used to generate coarse-grained configurations. The actual coarse-grained simulations are in the rigid_protein folder, which pulls the atomistic coordinates from the other folder. All data is managed by signac and lives in the workspace directories, which contain various folders corresponding to different parameter combinations. The parameters associated with a given folder are stored in the signac_statepoint.json files within each subdirectory.
The atomistic data uses experimentally determined protein structures as a starting point; all of these are stored in the ConfigFiles folder. The primary output is the topology files generated from the PDBs by GROMACS; these topologies are then used to parametrize the Monte Carlo simulations. In some cases, atomistic simulations were actually run as well, and the outputs are stored alongside the topology files.
In the rigid_protein folder, the ConfigFiles folder contains MSMS, the software used to generate polyhedral representations of proteins from the PDBs in the candidate_structures folder. All of the actual polyhedral structures are also stored in the ConfigFiles folder. The actual simulation trajectories are stored as general simulation data (GSD) files within each subdirectory of the workspace, along with a single .pos file that contains the shape definition of the (nonconvex) polyhedron used to represent a protein. The logged quantities, such as energies and MC move sizes, are stored in .log files.
The logic for the simulations in the candidate_structures project is in the Python scripts project.py, operations.py, and scripts/init.py. The rigid_protein folder also includes the notebooks directory, which contains Jupyter notebooks used to perform analyses, as well as the Python scripts used to actually perform the simulations and manage the data space. In particular, the project.py, operations.py and scripts/init.py scripts contain most of the logic associated with the simulations.