Work Description

Title: Dataset for high-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases
Open Access; Deposited

Methodology
  • This data set was collected using a phage display PAI-1 variant library that was screened for its ability to inhibit different serine proteases (uPA, TMPRSS2, and coagulation factor XIIa). The PAI-1 amino acid substitutions in the PAI-1 variants that were able to inhibit each target were identified by sequencing the PAI-1 coding DNA from the input and selected phage using the Illumina DNA sequencing platform. The data are FASTQ files. Bioinformatics scripts to analyze the data are also contained within the repository.
Description
Creator
Creator ORCID iD
Depositor
Depositor creator
  • true
Contact information
Discipline
Funding agency
  • National Institutes of Health (NIH)
  • Other Funding Agency
Other Funding agency
  • University of Michigan Frankel Cardiovascular Center
ORSP grant number
  • AWD026167
Keyword
Date coverage
  • 2021
Citations to related material
  • Haynes LM, Holding ML, DiGiovanni H, Siemieniak D, Ginsburg D. High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases. bioRxiv [Preprint]. 2024 Sep 20:2024.09.16.612699. doi: 10.1101/2024.09.16.612699. PMID: 39345533; PMCID: PMC11429915.
Resource type
Last modified
  • 02/24/2025
Published
  • 02/14/2025
Language
DOI
  • https://doi.org/10.7302/r2wk-3n35
License
To Cite this Work:
Haynes, L. M., Holding, M. L., DiGiovanni, H. L., Siemieniak, D., Ginsburg, D. (2025). Dataset for high-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/r2wk-3n35


Files (Count: 21; Size: 115 GB)

This readme file was generated on 2025-02-13 by Laura M Haynes

GENERAL INFORMATION

Title of Dataset: High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases

Dataset Creators:

Principal Investigator Information
Name: David Ginsburg
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0002-6436-8942

Primary Author Information
Name: Laura M. Haynes
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0002-2237-659X

Author Information
Name: Matthew L. Holding
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0003-3477-3012

Name: Hannah L. DiGiovanni
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]

Name: David Siemieniak
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]

Date of data collection: 2021

Information about funding sources that supported the collection of the data: This research was funded by the National Institutes of Health and the University of Michigan Frankel Cardiovascular Center.

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: http://creativecommons.org/licenses/by-nc/4.0/

Links to publications that cite or use the data: Haynes LM, Holding ML, DiGiovanni H, Siemieniak D, Ginsburg D. High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases. bioRxiv [Preprint]. 2024 Sep 20:2024.09.16.612699. doi: 10.1101/2024.09.16.612699. PMID: 39345533; PMCID: PMC11429915.

DATA & FILE OVERVIEW

File List:
3392-LH-1_CACGATAT-AGATCTCG_S58_R1_001.fastq.gz
3392-LH-1_CACGATAT-AGATCTCG_S58_R2_001.fastq.gz
3392-LH-1_CACTCAAT-AGATCTCG_S59_R1_001.fastq.gz
3392-LH-1_CACTCAAT-AGATCTCG_S59_R2_001.fastq.gz
3392-LH-1_CAGGCGAT-AGATCTCG_S60_R1_001.fastq.gz
3392-LH-1_CAGGCGAT-AGATCTCG_S60_R2_001.fastq.gz
3392-LH-1_CATGGCAT-AGATCTCG_S61_R1_001.fastq.gz
3392-LH-1_CATGGCAT-AGATCTCG_S61_R2_001.fastq.gz
3392-LH-1_CATTTTAT-AGATCTCG_S62_R1_001.fastq.gz
3392-LH-1_CATTTTAT-AGATCTCG_S62_R2_001.fastq.gz
3392-LH-1_CCAACAAT-AGATCTCG_S63_R1_001.fastq.gz
3392-LH-1_CCAACAAT-AGATCTCG_S63_R2_001.fastq.gz
3392-LH-1_CGGAATAT-AGATCTCG_S64_R1_001.fastq.gz
3392-LH-1_CGGAATAT-AGATCTCG_S64_R2_001.fastq.gz
3392-LH-1_CTAGCTAT-AGATCTCG_S65_R1_001.fastq.gz
3392-LH-1_CTAGCTAT-AGATCTCG_S65_R2_001.fastq.gz
3392-LH-1_CTATACAT-AGATCTCG_S66_R1_001.fastq.gz
3392-LH-1_CTATACAT-AGATCTCG_S66_R2_001.fastq.gz
3936-LH-1_ACTGATAT-AGATCTCG_S80_R1_001.fastq.gz
3936-LH-1_ACTGATAT-AGATCTCG_S80_R2_001.fastq.gz
3936-LH-1_ATGAGCAT-AGATCTCG_S81_R1_001.fastq.gz
3936-LH-1_ATGAGCAT-AGATCTCG_S81_R2_001.fastq.gz
3936-LH-1_ATTCCTAT-AGATCTCG_S82_R1_001.fastq.gz
3936-LH-1_ATTCCTAT-AGATCTCG_S82_R2_001.fastq.gz
3936-LH-1_CAAAAGAT-AGATCTCG_S83_R1_001.fastq.gz
3936-LH-1_CAAAAGAT-AGATCTCG_S83_R2_001.fastq.gz
3936-LH-1_CAACTAAT-AGATCTCG_S84_R1_001.fastq.gz
3936-LH-1_CAACTAAT-AGATCTCG_S84_R2_001.fastq.gz
3936-LH-1_CACCGGAT-AGATCTCG_S85_R1_001.fastq.gz
3936-LH-1_CACCGGAT-AGATCTCG_S85_R2_001.fastq.gz
3936-LH-1_CACGATAT-AGATCTCG_S86_R1_001.fastq.gz
3936-LH-1_CACGATAT-AGATCTCG_S86_R2_001.fastq.gz
3936-LH-1_CACTCAAT-AGATCTCG_S87_R1_001.fastq.gz
3936-LH-1_CACTCAAT-AGATCTCG_S87_R2_001.fastq.gz
3936-LH-1_CAGGCGAT-AGATCTCG_S88_R1_001.fastq.gz
3936-LH-1_CAGGCGAT-AGATCTCG_S88_R2_001.fastq.gz
3936-LH-1_CATGGCAT-AGATCTCG_S89_R1_001.fastq.gz
3936-LH-1_CATGGCAT-AGATCTCG_S89_R2_001.fastq.gz
3936-LH-1_CATTTTAT-AGATCTCG_S90_R1_001.fastq.gz
3936-LH-1_CATTTTAT-AGATCTCG_S90_R2_001.fastq.gz
3936-LH-1_CCAACAAT-AGATCTCG_S91_R1_001.fastq.gz
3936-LH-1_CCAACAAT-AGATCTCG_S91_R2_001.fastq.gz
4641-LH-1_ACTGATAT-AGATCTCG_S321_R1_001.fastq.gz
4641-LH-1_ACTGATAT-AGATCTCG_S321_R2_001.fastq.gz
4641-LH-1_AGTCAAAT-AGATCTCG_S309_R1_001.fastq.gz
4641-LH-1_AGTCAAAT-AGATCTCG_S309_R2_001.fastq.gz
4641-LH-1_CGTACGAT-AGATCTCG_S318_R1_001.fastq.gz
4641-LH-1_CGTACGAT-AGATCTCG_S318_R2_001.fastq.gz
4641-LH-1_GAGTGGAT-AGATCTCG_S319_R1_001.fastq.gz
4641-LH-1_GAGTGGAT-AGATCTCG_S319_R2_001.fastq.gz
4641-LH-1_GATCAGAT-AGATCTCG_S306_R1_001.fastq.gz
4641-LH-1_GATCAGAT-AGATCTCG_S306_R2_001.fastq.gz
4641-LH-1_GGCTACAT-AGATCTCG_S308_R1_001.fastq.gz
4641-LH-1_GGCTACAT-AGATCTCG_S308_R2_001.fastq.gz
4641-LH-1_GGTAGCAT-AGATCTCG_S320_R1_001.fastq.gz
4641-LH-1_GGTAGCAT-AGATCTCG_S320_R2_001.fastq.gz
4641-LH-1_TAGCTTAT-AGATCTCG_S307_R1_001.fastq.gz
4641-LH-1_TAGCTTAT-AGATCTCG_S307_R2_001.fastq.gz
*A key to the data sets can be found in the accompanying file "Key_to_FASTQ_files_v2.xlsx".
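
The file names above follow a regular pattern, so R1 and R2 files can be paired programmatically before consulting the key. The following Python sketch is illustrative only: the regular expression and the field labels (sample, index1, index2, sample_number, read) are assumptions inferred from the names in the file list, not definitions supplied by the dataset authors, and the mapping of each file to an experimental condition still comes from "Key_to_FASTQ_files_v2.xlsx".

```python
import re
from collections import defaultdict

# Pattern inferred from the file names in the File List; the group names are
# illustrative labels, not terms defined by the dataset authors.
FASTQ_NAME = re.compile(
    r"^(?P<sample>[^_]+)_(?P<index1>[ACGT]+)-(?P<index2>[ACGT]+)"
    r"_S(?P<sample_number>\d+)_R(?P<read>[12])_001\.fastq\.gz$"
)


def pair_reads(filenames):
    """Group file names so that each key maps to its R1 and R2 files."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        key = (match["sample"], match["index1"], int(match["sample_number"]))
        pairs[key]["R" + match["read"]] = name
    return dict(pairs)


# Four names copied from the File List above.
example = [
    "3392-LH-1_CACGATAT-AGATCTCG_S58_R1_001.fastq.gz",
    "3392-LH-1_CACGATAT-AGATCTCG_S58_R2_001.fastq.gz",
    "3936-LH-1_CACGATAT-AGATCTCG_S86_R1_001.fastq.gz",
    "3936-LH-1_CACGATAT-AGATCTCG_S86_R2_001.fastq.gz",
]
pairs = pair_reads(example)
for key, files in sorted(pairs.items()):
    print(key, files["R1"], files["R2"])
```

Note that the same index sequence (for example CACGATAT) appears under more than one sample prefix, which is why the sketch keys on the prefix and sample number as well as the index.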

Script list with descriptions:
screen_amplicon.pl: compare consensus and reference sequences to call amino acid substitutions
process_proc.pl: subroutines for translating DNA sequences, assessing quality scores, and comparing paired-end reads for mismatches
complete_blast_cluster.pl: handles BLAST alignment and categorizes sequences by alignment quality
combine_r1_r2.pl: aligns R1 and R2 sequencing reads
clean_convert_to_fasta.pl: processes FASTQ files
start_blast_cluster.sh: generates and submits SLURM batch job scripts for BLAST nucleotide searches
screen_amplicon_cluster.sh: Bash script that runs screen_amplicon.pl
concat_blast_result_files.sh: Concatenates BLAST results files for R1 and R2 sequencing reads
complete_blast_cluster.sh: Bash script that executes complete_blast_cluster.pl
combine_r1_r2_cluster.sh: Bash script that executes combine_r1_r2.pl
clean_convert_cluster.sh: Bash script that executes clean_convert_to_fasta.pl
_0_Pkgs%Libraries.R: packages and libraries necessary to execute R scripts
_1_DESeq2.R: Executes DESeq2 analysis of counts per amino acid substitution determined from associated FASTQ files
_2_Compare_DESeq2_results.R: Determines significance thresholds for the data sets and compares datasets
_3_Compare_to_ConSurf: Compares DESeq2 results to ConSurf evolutionary conservation scores at each amino acid position in PAI-1 (data_scores.txt)
_4_DMSheatmaps.R: Generates heatmaps of the DMS data
*A permanent link to scripts can be found at: https://github.com/hayneslm/PAI-1_and_divergent_proteases
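
To illustrate the general idea behind calling amino acid substitutions from a consensus and a reference sequence, here is a minimal Python sketch. It is not the authors' implementation (their scripts are written in Perl and are available at the link above); the function names, the toy sequences, and the assumption that both sequences are in-frame and of equal length are all hypothetical simplifications.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def call_substitutions(reference, consensus, first_residue=1):
    """Return substitutions such as 'A2V' for two equal-length, in-frame sequences.

    Hypothetical helper: insertions, deletions, and ambiguous bases are not handled.
    """
    if len(reference) != len(consensus):
        raise ValueError("sequences must have the same length")
    ref_aa = translate(reference)
    con_aa = translate(consensus)
    return [
        f"{ref}{position}{alt}"
        for position, (ref, alt) in enumerate(zip(ref_aa, con_aa), start=first_residue)
        if ref != alt
    ]


# Toy sequences, not taken from PAI-1: Met-Ala-Lys versus Met-Val-Lys.
substitutions = call_substitutions("ATGGCTAAA", "ATGGTTAAA")
print(substitutions)
```

The `first_residue` parameter is included because an amplicon usually covers only part of a protein, so residue numbering would need an offset supplied by the user.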

Other files needed to execute scripts:
data_scores.txt: ConSurf evolutionary conservation scores
WTbg_0h_screen: Original screen of the WT PAI-1 library to determine functional variants (Huttinger, Z.M., Haynes, L.M., Yee, A. et al. Deep mutational scanning of the plasminogen activator inhibitor-1 functional landscape. Sci Rep 11, 18827 (2021). https://doi.org/10.1038/s41598-021-97871-7)

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data:
This data set was collected using a phage display PAI-1 library that was screened for its ability to inhibit different serine proteases (uPA, TMPRSS2, factor XIIa). The variants were identified using Illumina high-throughput DNA sequencing. The raw FASTQ files are contained in this data set.
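
For readers who want to inspect the raw reads before running the full pipeline, the following Python sketch iterates over a gzip-compressed FASTQ file using only the standard library. It assumes four-line FASTQ records and Phred+33 quality encoding, which is typical of current Illumina output but is not stated in this README; the function names and the toy record are hypothetical.

```python
import gzip
import os
import tempfile


def read_fastq(path):
    """Yield (read_id, sequence, quality) from a gzipped four-line FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean quality score, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Self-contained demonstration with a toy record written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.fastq.gz")
    with gzip.open(toy_path, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n")
    records = list(read_fastq(toy_path))

for read_id, sequence, quality in records:
    print(read_id, sequence, mean_phred(quality))
```

To use it on the dataset, replace `toy_path` with the path to one of the downloaded `.fastq.gz` files; because the function is a generator, it does not load the whole file into memory.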

Methods for processing the data: The software needed to analyze these files is included in this dataset and is also available at https://github.com/hayneslm/PAI-1_and_divergent_proteases.

Instrument- or software-specific information needed to interpret the data: The code is written in Bash, Perl, and R.

People involved with sample collection, processing, analysis and/or submission: Laura M Haynes, Matthew L Holding, David Siemieniak

