Work Description
Title: Dataset for high-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases
(2025). Dataset for high-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/r2wk-3n35
Files (Count: 21; Size: 115 GB)
Title | Original Upload | Last Modified | File Size | Access |
---|---|---|---|---|
Key_to_FASTQ_files_v2.xlsx | 2025-01-21 | 2025-01-21 | 9.51 KB | Open Access |
Archive.zip | 2025-01-21 | 2025-01-21 | 115 GB | Open Access |
combine_r1_r2.pl | 2025-02-13 | 2025-02-23 | 36.8 KB | Open Access |
combine_r1_r2_cluster.sh | 2025-02-13 | 2025-02-23 | 1.28 KB | Open Access |
complete_blast_cluster.sh | 2025-02-13 | 2025-02-23 | 2.39 KB | Open Access |
screen_amplicon.pl | 2025-02-13 | 2025-02-23 | 21.8 KB | Open Access |
clean_convert_to_fasta.pl | 2025-02-13 | 2025-02-23 | 5.96 KB | Open Access |
process_proc.pl | 2025-02-13 | 2025-02-23 | 6.26 KB | Open Access |
concat_blast_result_files.sh | 2025-02-13 | 2025-02-23 | 2.76 KB | Open Access |
complete_blast_cluster.pl | 2025-02-13 | 2025-02-23 | 7.67 KB | Open Access |
start_blast_cluster.sh | 2025-02-13 | 2025-02-23 | 2.88 KB | Open Access |
screen_amplicon_cluster.sh | 2025-02-13 | 2025-02-23 | 1.21 KB | Open Access |
clean_convert_cluster.sh | 2025-02-13 | 2025-02-23 | 1.67 KB | Open Access |
_0_Pkgs_Libraries.R | 2025-02-13 | 2025-02-23 | 942 Bytes | Open Access |
_1_DESeq2.R | 2025-02-13 | 2025-02-23 | 8.35 KB | Open Access |
_2_Compare_DESeq2_results.R | 2025-02-13 | 2025-02-23 | 8.03 KB | Open Access |
_3_Compare_to_ConSurf_Scores.R | 2025-02-13 | 2025-02-23 | 9.09 KB | Open Access |
_4_DMSheatmaps.R | 2025-02-13 | 2025-02-23 | 5.06 KB | Open Access |
data_scores.txt | 2025-02-13 | 2025-02-23 | 11.1 KB | Open Access |
WTbg_0h_screen.txt | 2025-02-13 | 2025-02-23 | 132 KB | Open Access |
PAI-1_SPECICITY_DATASET_README_0...2.txt | 2025-02-14 | 2025-02-23 | 7.39 KB | Open Access |
This readme file was generated on 2025-02-13 by Laura M Haynes
GENERAL INFORMATION
Title of Dataset: High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases
Dataset Creators:
Principal Investigator Information
Name: David Ginsburg
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0002-6436-8942
Primary Author Information
Name: Laura M. Haynes
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0002-2237-659X
Author Information
Name: Matthew L. Holding
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0003-3477-3012
Name: Hannah L. DiGiovanni
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
Name: David Siemieniak
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
Date of data collection: Data was collected in 2021
Information about funding sources that supported the collection of the data: This research was funded by the National Institutes of Health and the University of Michigan Frankel Cardiovascular Center.
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: http://creativecommons.org/licenses/by-nc/4.0/
Links to publications that cite or use the data: Haynes LM, Holding ML, DiGiovanni H, Siemieniak D, Ginsburg D. High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases. bioRxiv [Preprint]. 2024 Sep 20:2024.09.16.612699. doi: 10.1101/2024.09.16.612699. PMID: 39345533; PMCID: PMC11429915.
DATA & FILE OVERVIEW
File List:
3392-LH-1_CACGATAT-AGATCTCG_S58_R1_001.fastq.gz
3392-LH-1_CACGATAT-AGATCTCG_S58_R2_001.fastq.gz
3392-LH-1_CACTCAAT-AGATCTCG_S59_R1_001.fastq.gz
3392-LH-1_CACTCAAT-AGATCTCG_S59_R2_001.fastq.gz
3392-LH-1_CAGGCGAT-AGATCTCG_S60_R1_001.fastq.gz
3392-LH-1_CAGGCGAT-AGATCTCG_S60_R2_001.fastq.gz
3392-LH-1_CATGGCAT-AGATCTCG_S61_R1_001.fastq.gz
3392-LH-1_CATGGCAT-AGATCTCG_S61_R2_001.fastq.gz
3392-LH-1_CATTTTAT-AGATCTCG_S62_R1_001.fastq.gz
3392-LH-1_CATTTTAT-AGATCTCG_S62_R2_001.fastq.gz
3392-LH-1_CCAACAAT-AGATCTCG_S63_R1_001.fastq.gz
3392-LH-1_CCAACAAT-AGATCTCG_S63_R2_001.fastq.gz
3392-LH-1_CGGAATAT-AGATCTCG_S64_R1_001.fastq.gz
3392-LH-1_CGGAATAT-AGATCTCG_S64_R2_001.fastq.gz
3392-LH-1_CTAGCTAT-AGATCTCG_S65_R1_001.fastq.gz
3392-LH-1_CTAGCTAT-AGATCTCG_S65_R2_001.fastq.gz
3392-LH-1_CTATACAT-AGATCTCG_S66_R1_001.fastq.gz
3392-LH-1_CTATACAT-AGATCTCG_S66_R2_001.fastq.gz
3936-LH-1_ACTGATAT-AGATCTCG_S80_R1_001.fastq.gz
3936-LH-1_ACTGATAT-AGATCTCG_S80_R2_001.fastq.gz
3936-LH-1_ATGAGCAT-AGATCTCG_S81_R1_001.fastq.gz
3936-LH-1_ATGAGCAT-AGATCTCG_S81_R2_001.fastq.gz
3936-LH-1_ATTCCTAT-AGATCTCG_S82_R1_001.fastq.gz
3936-LH-1_ATTCCTAT-AGATCTCG_S82_R2_001.fastq.gz
3936-LH-1_CAAAAGAT-AGATCTCG_S83_R1_001.fastq.gz
3936-LH-1_CAAAAGAT-AGATCTCG_S83_R2_001.fastq.gz
3936-LH-1_CAACTAAT-AGATCTCG_S84_R1_001.fastq.gz
3936-LH-1_CAACTAAT-AGATCTCG_S84_R2_001.fastq.gz
3936-LH-1_CACCGGAT-AGATCTCG_S85_R1_001.fastq.gz
3936-LH-1_CACCGGAT-AGATCTCG_S85_R2_001.fastq.gz
3936-LH-1_CACGATAT-AGATCTCG_S86_R1_001.fastq.gz
3936-LH-1_CACGATAT-AGATCTCG_S86_R2_001.fastq.gz
3936-LH-1_CACTCAAT-AGATCTCG_S87_R1_001.fastq.gz
3936-LH-1_CACTCAAT-AGATCTCG_S87_R2_001.fastq.gz
3936-LH-1_CAGGCGAT-AGATCTCG_S88_R1_001.fastq.gz
3936-LH-1_CAGGCGAT-AGATCTCG_S88_R2_001.fastq.gz
3936-LH-1_CATGGCAT-AGATCTCG_S89_R1_001.fastq.gz
3936-LH-1_CATGGCAT-AGATCTCG_S89_R2_001.fastq.gz
3936-LH-1_CATTTTAT-AGATCTCG_S90_R1_001.fastq.gz
3936-LH-1_CATTTTAT-AGATCTCG_S90_R2_001.fastq.gz
3936-LH-1_CCAACAAT-AGATCTCG_S91_R1_001.fastq.gz
3936-LH-1_CCAACAAT-AGATCTCG_S91_R2_001.fastq.gz
4641-LH-1_ACTGATAT-AGATCTCG_S321_R1_001.fastq.gz
4641-LH-1_ACTGATAT-AGATCTCG_S321_R2_001.fastq.gz
4641-LH-1_AGTCAAAT-AGATCTCG_S309_R1_001.fastq.gz
4641-LH-1_AGTCAAAT-AGATCTCG_S309_R2_001.fastq.gz
4641-LH-1_CGTACGAT-AGATCTCG_S318_R1_001.fastq.gz
4641-LH-1_CGTACGAT-AGATCTCG_S318_R2_001.fastq.gz
4641-LH-1_GAGTGGAT-AGATCTCG_S319_R1_001.fastq.gz
4641-LH-1_GAGTGGAT-AGATCTCG_S319_R2_001.fastq.gz
4641-LH-1_GATCAGAT-AGATCTCG_S306_R1_001.fastq.gz
4641-LH-1_GATCAGAT-AGATCTCG_S306_R2_001.fastq.gz
4641-LH-1_GGCTACAT-AGATCTCG_S308_R1_001.fastq.gz
4641-LH-1_GGCTACAT-AGATCTCG_S308_R2_001.fastq.gz
4641-LH-1_GGTAGCAT-AGATCTCG_S320_R1_001.fastq.gz
4641-LH-1_GGTAGCAT-AGATCTCG_S320_R2_001.fastq.gz
4641-LH-1_TAGCTTAT-AGATCTCG_S307_R1_001.fastq.gz
4641-LH-1_TAGCTTAT-AGATCTCG_S307_R2_001.fastq.gz
*A key to the data sets can be found in the accompanying file "Key_to_FASTQ_files_v2.xlsx".
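The FASTQ file names above appear to follow the Illumina pattern of sample name, sample number (S…), read (R1 or R2), and a 001 chunk suffix, with the index pair embedded in the sample name. The Python sketch below is not part of the deposited scripts; it only shows one way to group the files into R1/R2 pairs. The field names (`run`, `index1`, `index2`, `sample`) are assumptions, and the authoritative mapping of files to experiments remains Key_to_FASTQ_files_v2.xlsx.

```python
import re
from collections import defaultdict

# Assumed structure of the names listed above, e.g.
# 3392-LH-1_CACGATAT-AGATCTCG_S58_R1_001.fastq.gz
# run ID, index pair, sample number, read (1 or 2), chunk number
FASTQ_NAME = re.compile(
    r"^(?P<run>\d+-LH-\d+)_"
    r"(?P<index1>[ACGT]+)-(?P<index2>[ACGT]+)_"
    r"S(?P<sample>\d+)_"
    r"R(?P<read>[12])_"
    r"(?P<chunk>\d{3})\.fastq\.gz$"
)


def pair_fastq_files(names):
    """Group file names into {(run, sample_number): {"R1": name, "R2": name}}."""
    pairs = defaultdict(dict)
    for name in names:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognized FASTQ name: {name}")
        # Key on run + sample number: the same index pair recurs across runs
        key = (match["run"], int(match["sample"]))
        pairs[key]["R" + match["read"]] = name
    # Every sample should have exactly one R1 and one R2 file
    incomplete = [key for key, mates in pairs.items() if set(mates) != {"R1", "R2"}]
    if incomplete:
        raise ValueError(f"missing mate files for: {incomplete}")
    return dict(pairs)
```

Keying on the run ID as well as the sample number is a deliberate choice: the index pair alone is not unique in this list (CACGATAT-AGATCTCG, for example, appears in both the 3392 and 3936 runs).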
Script list with descriptions:
screen_amplicon.pl: compares consensus and reference sequences to call amino acid substitutions
process_proc.pl: subroutines for translating DNA sequences, assessing quality scores, and comparing paired-end reads for mismatches
complete_blast_cluster.pl: performs BLAST alignments and categorizes sequences by alignment quality
combine_r1_r2.pl: aligns R1 and R2 sequencing reads
clean_convert_to_fasta.pl: processes FASTQ files and converts them to FASTA format
start_blast_cluster.sh: generates and submits SLURM batch job scripts for BLAST nucleotide searches
screen_amplicon_cluster.sh: Bash script that executes screen_amplicon.pl
concat_blast_result_files.sh: concatenates BLAST result files for R1 and R2 sequencing reads
complete_blast_cluster.sh: Bash script that executes complete_blast_cluster.pl
combine_r1_r2_cluster.sh: Bash script that executes combine_r1_r2.pl
clean_convert_cluster.sh: Bash script that executes clean_convert_to_fasta.pl
_0_Pkgs_Libraries.R: packages and libraries required to execute the R scripts
_1_DESeq2.R: executes DESeq2 analysis of the counts per amino acid substitution determined from the associated FASTQ files
_2_Compare_DESeq2_results.R: determines significance thresholds for each data set and compares the data sets
_3_Compare_to_ConSurf_Scores.R: compares DESeq2 results to ConSurf evolutionary conservation scores at each amino acid position in PAI-1 (data_scores.txt)
_4_DMSheatmaps.R: generates heatmaps of the deep mutational scanning (DMS) data
*A permanent link to scripts can be found at: https://github.com/hayneslm/PAI-1_and_divergent_proteases
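The Perl steps listed above reconcile paired reads, translate them, and compare the result with a reference to call amino acid substitutions (process_proc.pl and screen_amplicon.pl). The following Python sketch illustrates that idea only and is not a port of the Perl code. It assumes both reads are already aligned to the same coordinates and in frame, marks any base on which the reads disagree as N instead of using quality scores, and labels each substitution as reference residue, position, variant residue (for example A2G). The `first_residue` offset is a hypothetical parameter for mapping onto whichever PAI-1 numbering is in use.

```python
# Standard genetic code, built from the conventional TCAG ordering
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def consensus(read1: str, read2: str) -> str:
    """Merge two aligned, equal-length reads; disagreeing positions become N."""
    if len(read1) != len(read2):
        raise ValueError("reads must be aligned to the same length")
    return "".join(a if a == b else "N" for a, b in zip(read1, read2))


def translate(dna: str) -> str:
    """Translate in frame 0; codons containing N or other non-ACGT bases become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def call_substitutions(query_dna: str, reference_dna: str, first_residue: int = 1):
    """List substitutions such as 'A2G'; ambiguous (X) codons are skipped."""
    query = translate(query_dna)
    reference = translate(reference_dna)
    calls = []
    for offset, (ref_aa, query_aa) in enumerate(zip(reference, query)):
        if query_aa != ref_aa and query_aa != "X":
            calls.append(f"{ref_aa}{first_residue + offset}{query_aa}")
    return calls
```

Marking disagreements as N is the most conservative rule; a quality-aware merge, as the process_proc.pl description suggests, would retain more information.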
Other files needed to execute scripts:
data_scores.txt: ConSurf evolutionary conservation scores
WTbg_0h_screen.txt: original screen of the WT PAI-1 library to determine functional variants (Huttinger, Z.M., Haynes, L.M., Yee, A. et al. Deep mutational scanning of the plasminogen activator inhibitor-1 functional landscape. Sci Rep 11, 18827 (2021). https://doi.org/10.1038/s41598-021-97871-7)
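_3_Compare_to_ConSurf_Scores.R relates the DESeq2 results to the per-position ConSurf scores in data_scores.txt. This README does not state how the comparison is made; one common choice is a rank correlation, sketched below in Python as Spearman's ρ (the Pearson correlation of tie-averaged ranks). The inputs are assumed to be dictionaries keyed by PAI-1 residue position, holding some per-position summary of the DMS data (for example a mean log2 fold change, a hypothetical choice) and the ConSurf score. Parsing data_scores.txt is not shown.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the rank positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_by_position(dms_scores, consurf_scores):
    """Spearman's rho over the residue positions present in both dictionaries."""
    shared = sorted(set(dms_scores) & set(consurf_scores))
    if len(shared) < 3:
        raise ValueError("need at least three shared positions")
    x = average_ranks([dms_scores[p] for p in shared])
    y = average_ranks([consurf_scores[p] for p in shared])
    return pearson(x, y)
```

Only positions present in both inputs are compared, so positions without DMS coverage or without a ConSurf score drop out rather than being imputed.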
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
This data set was collected using a phage display PAI-1 library that was screened for its ability to inhibit different serine proteases (uPA, TMPRSS2, factor XIIa). The variants were identified using Illumina high-throughput DNA sequencing. The raw FASTQ files are contained in this data set.
Methods for processing the data: The software needed to analyze these files is included in this dataset and is also available at https://github.com/hayneslm/PAI-1_and_divergent_proteases.
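_1_DESeq2.R models the counts per amino acid substitution with DESeq2. To illustrate a single step, the Python sketch below reproduces DESeq2's median-of-ratios size-factor estimate and then computes a simple pseudocount log2 fold change between selected and unselected samples. It does not implement DESeq2's dispersion estimation, fold-change shrinkage, or Wald tests, so it is no substitute for the R script. The count layout (one list of per-sample counts per variant) and the `pseudocount` value are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors from {variant: [count per sample, ...]}.

    Each count is divided by its variant's geometric mean across samples, and a
    sample's size factor is the median of those ratios (taken on the log scale).
    Variants with a zero count in any sample are skipped, because their geometric
    mean is zero.
    """
    rows = [row for row in counts.values() if all(c > 0 for c in row)]
    if not rows:
        raise ValueError("every variant has a zero count in some sample")
    n_samples = len(rows[0])
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_fold_changes(counts, selected, unselected, pseudocount=0.5):
    """Naive log2(selected / unselected) of size-factor-normalized mean counts."""
    factors = size_factors(counts)
    result = {}
    for variant, row in counts.items():
        normalized = [c / s for c, s in zip(row, factors)]
        sel = sum(normalized[j] for j in selected) / len(selected)
        unsel = sum(normalized[j] for j in unselected) / len(unselected)
        result[variant] = math.log2((sel + pseudocount) / (unsel + pseudocount))
    return result
```

In a phage display selection, a positive value would mean a variant is overrepresented after selection relative to the reference sample; the pseudocount keeps variants that are absent from one group finite.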
Instrument- or software-specific information needed to interpret the data: The code is written in Bash, Perl, and R.
People involved with sample collection, processing, analysis and/or submission: Laura M Haynes, Matthew L Holding, David Siemieniak