Work Description

Title: Simulation Data associated with the paper: Supercharging enables organized assembly of synthetic biomolecules
Status: Open Access, Deposited

License: http://creativecommons.org/licenses/by-nc/4.0/
Methodology
  • This dataset includes the output of hard particle Monte Carlo simulations conducted using the HPMC plugin to the simulation toolkit HOOMD-blue. Specifically, the data space includes simulation trajectories and logged output quantities associated with simulations. There are also some atomistic simulation files generated using GROMACS that are used to parametrize the coarse-grained models in HOOMD-blue. The data spaces are managed using the signac framework, which provides utilities for storing, accessing, and operating on the files without recourse to any server-based database. The custom HOOMD scripts used to run the simulations are included within the repository. The signac-flow package is used to organize and manage the workflows associated with both the HOOMD-blue and GROMACS simulations, and the Python files containing the full workflow logic are also included in the repositories.
Description
  • The goal of the work is to elucidate the stability of a complex, experimentally observed protein structure. We found that supercharged GFP molecules spontaneously assemble into a complex 16-mer structure that we term a protomer, and that under the right conditions an even larger assembly is observed. The protomer structure is very well defined, and we performed simulations to understand the mechanics underlying its behavior. In particular, we focused on the role of electrostatics in this system and on how varying salt concentrations alter the stability of the structure, with the ultimate goal of predicting the effects of various mutations on that stability.

    This repository contains two separate but closely linked projects. The candidate_structures folder contains the atomistic outputs used to generate coarse-grained configurations. The coarse-grained simulations themselves are in the rigid_protein folder, which pulls the atomistic coordinates from the other folder. All data is managed by signac and lives in the workspace directories, which contain folders corresponding to different parameter combinations. The parameters associated with a given folder are stored in the signac_statepoint.json file within that subdirectory.

    The atomistic data uses experimentally determined protein structures as a starting point; all of these are stored in the ConfigFiles folder. The primary output is the set of topology files generated from the PDBs by GROMACS; these topologies are then used to parametrize the Monte Carlo simulations. In some cases, atomistic simulations were also run, and their outputs are stored alongside the topology files.

    In the rigid_protein folder, the ConfigFiles folder contains MSMS, the software used to generate polyhedral representations of proteins from the PDBs in the candidate_structures folder. All of the resulting polyhedral structures are also stored in the ConfigFiles folder. The simulation trajectories are stored as general simulation data (GSD) files within each subdirectory of the workspace, along with a single .pos file that contains the shape definition of the (nonconvex) polyhedron used to represent a protein. The logged quantities, such as energies and MC move sizes, are stored in .log files.

    The logic for the simulations in the candidate_structures project is in the Python scripts project.py, operations.py, and scripts/init.py. The rigid_protein folder also includes the notebooks directory, which contains Jupyter notebooks used to perform analyses, as well as the Python scripts used to run the simulations and manage the data space. In particular, project.py, operations.py, and scripts/init.py contain most of the logic associated with the simulations.
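The workspace layout described above (one subdirectory per parameter combination, each holding a signac_statepoint.json file next to its .gsd, .pos, and .log outputs) can be browsed without installing any simulation software. The following is a minimal sketch using only the Python standard library; the function names and the location of the workspace directory are assumptions made for illustration, and signac itself offers a richer interface for the same task.

```python
import json
from pathlib import Path


def load_statepoints(workspace):
    """Map each job directory name to the parameters in its state point file."""
    statepoints = {}
    for sp_file in sorted(Path(workspace).glob("*/signac_statepoint.json")):
        with sp_file.open() as handle:
            statepoints[sp_file.parent.name] = json.load(handle)
    return statepoints


def list_outputs(job_dir, suffixes=(".gsd", ".pos", ".log")):
    """Return the names of trajectory, shape, and log files in one job directory."""
    return sorted(p.name for p in Path(job_dir).iterdir() if p.suffix in suffixes)
```

Calling `load_statepoints("workspace")` from inside an unpacked project folder (assuming the workspace directory sits there) returns a dictionary keyed by job directory name, which makes it straightforward to locate the folder for a particular parameter combination before opening its output files.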
Depositor
  • vramasub@umich.edu
Funding agency
  • Other Funding Agency
Citations to related material
  • Anna J Simon, Vyas Ramasubramani, Jens Glaser, Arti Pothukuchy, Jillian Gerberich, Janelle Leggere, Barrett R Morrow, Jimmy Golihar, Cheulhee Jung, Sharon C Glotzer, David W Taylor, Andrew D Ellington, "Supercharging enables organized assembly of synthetic biomolecules," bioRxiv 323261; doi: https://doi.org/10.1101/323261
Last modified
  • 11/05/2019
Published
  • 11/14/2018
DOI
  • https://doi.org/10.7302/nh6j-jk03
To Cite this Work:
Ramasubramani, V. (2018). Simulation Data associated with the paper: Supercharging enables organized assembly of synthetic biomolecules [Data set]. University of Michigan - Deep Blue. https://doi.org/10.7302/nh6j-jk03

Files (Count: 5; Size: 4.77 GB)

# About
This project contains all code and data associated with the Nature Chemistry paper "Supercharging enables organized assembly of synthetic biomolecules".
The files are split into multiple archives to simplify the archiving process.
This README describes all of the files and how to use them.
For more details on the contents beyond what is provided in the Deep Blue Data repository metadata, please see the paper itself.

Files

The data for this paper is split into two distinct but closely related projects that are detailed below:

  • candidate_stability: This project is associated with creating atomistic representations of the candidate protomer structures.
    • candidate_stability_workspace.tar.gz: This archive contains the outputs of atomistic simulations and the data associated with them. The data space was managed with signac, and all outputs were generated using GROMACS.
    • candidate_stability_project.tar.gz: This archive contains simulation code, analysis notebooks, and input files needed to generate the outputs contained in the candidate_stability_workspace.tar.gz archive. The code is written in Python using the signac framework to manage data and the workflow. All simulations were performed using GROMACS, which is driven through Python functions that call the appropriate command-line tools. This archive includes its own, more detailed README.md, along with a LICENSE for the code.
  • rigid_protein: This project is associated with coarse-grained Monte Carlo simulations of the protomers.
    • rigid_protein_workspace.tar.gz: This archive contains the outputs of coarse-grained simulations and the data associated with them. All outputs are generated by HOOMD-blue. Each data point is linked to a corresponding atomistic seed structure from the candidate_stability_workspace.tar.gz archive.
    • rigid_protein_project.tar.gz: This archive contains simulation code, analysis notebooks, and input files needed to generate the outputs contained in the rigid_protein_workspace.tar.gz archive. The code is written in Python using the signac framework to manage data and the workflow. All simulations were performed using HOOMD-blue. This archive includes its own, more detailed README.md, along with a LICENSE for the code.

How to use

To unpack these archives into a usable form, the two *_project.tar.gz archives should each be unpacked into separate folders.
Then, the corresponding *_workspace.tar.gz archives should be unpacked into the corresponding project folders.
Once unpacked, simulations can be run from either folder with its project.py script through the signac-flow interface: python project.py run.
For more information, please see the relevant package documentation for HOOMD-blue, GROMACS, and signac.
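The unpacking steps above can also be scripted. The following is a minimal sketch using Python's standard tarfile module; the function name, the target folder names, and the choice to extract both archives of a project into the same folder are assumptions. The internal layout of the archives is not described here, so the target paths may need adjusting if an archive carries its own top-level directory.

```python
import tarfile
from pathlib import Path

# Project names taken from the archive names listed in this README.
PROJECTS = ("candidate_stability", "rigid_protein")


def unpack_project(name, archive_dir=".", target_root="."):
    """Unpack <name>_project.tar.gz, then <name>_workspace.tar.gz, into one folder."""
    archive_dir = Path(archive_dir)
    target = Path(target_root) / name
    target.mkdir(parents=True, exist_ok=True)
    # The project archive goes first so the workspace lands inside the project folder.
    for kind in ("project", "workspace"):
        with tarfile.open(archive_dir / f"{name}_{kind}.tar.gz", "r:gz") as archive:
            archive.extractall(target)
    return target
```

Looping over `PROJECTS` and calling `unpack_project(name)` in the download directory yields one folder per project, from which `python project.py run` can then be invoked.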

Requirements

The following pieces of software are required for running the code.
Note that the version numbers shown indicate the last versions tested; earlier versions may work, but no guarantees are provided.

  • signac 0.9.2
  • signac-flow 0.6.1
  • HOOMD-blue 2.4.0
  • GSD 1.5.0
  • NumPy 1.14.1
  • Pandas 0.21.0
  • Intel TBB 2018 Update 5
  • Chimera 1.12 (production version, build 41623, 2017-10-24 23:35:37 UTC)
  • GROMACS 5.1.4
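
To see how a local environment compares with the tested versions listed above, a short check along the following lines may help. This is only a sketch: it covers the pure Python packages, the distribution names are assumed to match those used on PyPI, and the function names are hypothetical. HOOMD-blue, Intel TBB, Chimera, and GROMACS are typically installed by other means and are not checked here.

```python
from importlib import metadata

# Tested versions from the list above, keyed by assumed PyPI distribution names.
TESTED = {
    "signac": "0.9.2",
    "signac-flow": "0.6.1",
    "gsd": "1.5.0",
    "numpy": "1.14.1",
    "pandas": "0.21.0",
}


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(tested=TESTED):
    """Return {name: (tested_version, installed_version)} for each package."""
    return {name: (want, installed_version(name)) for name, want in tested.items()}
```

Printing the result of `compare_versions()` gives a quick side-by-side view; a mismatch does not necessarily mean the code will fail, since only the listed versions were tested.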
