Dataset for high-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases

Haynes, Laura M; Holding, Matthew L; DiGiovanni, Hannah L; Siemieniak, David; Ginsburg, David

Work Description

Attribute	Value
Methodology	This data set was collected using a phage display PAI-1 variant library that was screened for its ability to inhibit different serine proteases (uPA, TMPRSS2, and coagulation factor XIIa). The PAI-1 amino acid substitutions in the PAI-1 variants that were able to inhibit each target were identified by sequencing the PAI-1 coding DNA from the input and selected phage using the Illumina DNA sequencing platform. The data are FASTQ files. Bioinformatics scripts to analyze the data are also contained within the repository.
Description	This data set contains FASTQ files of PAI-1 sequence variants that differentially inhibit different target proteases. The software needed to analyze these files can be found at https://github.com/hayneslm/PAI-1_and_divergent_proteases.
Creator	Haynes, Laura M; Holding, Matthew L; DiGiovanni, Hannah L; Siemieniak, David; Ginsburg, David
Creator ORCID iD	https://orcid.org/0000-0002-2237-659X
Depositor	[email protected]
Depositor creator	true
Contact information	[email protected], [email protected], [email protected], [email protected]
Discipline	Science
Funding agency	National Institutes of Health (NIH); Other Funding Agency
Other Funding agency	University of Michigan Frankel Cardiovascular Center
ORSP grant number	AWD026167
Keyword	serpins, serine proteases, coevolution, deep mutational scanning, fibrinolysis, phage display, protein-protein interactions, sequence space, DNA sequencing
Date coverage	2021
Citations to related material	Haynes LM, Holding ML, DiGiovanni H, Siemieniak D, Ginsburg D. High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases. bioRxiv [Preprint]. 2024 Sep 20:2024.09.16.612699. doi: 10.1101/2024.09.16.612699. PMID: 39345533; PMCID: PMC11429915.
Resource type	Dataset
Last modified	02/24/2025
Published	02/14/2025
Language	English; Perl; Bash; R
DOI	https://doi.org/10.7302/r2wk-3n35
License	http://creativecommons.org/licenses/by-nc/4.0/

To Cite this Work:
Haynes, L. M., Holding, M. L., DiGiovanni, H. L., Siemieniak, D., Ginsburg, D. (2025). Dataset for high-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/r2wk-3n35

Files (Count: 21; Size: 115 GB)

Title	Original Upload	Last Modified	File Size	Access
Key_to_FASTQ_files_v2.xlsx	2025-01-21	2025-01-21	9.51 KB	Open Access
Archive.zip	2025-01-21	2025-01-21	115 GB	Open Access
combine_r1_r2.pl	2025-02-13	2025-02-23	36.8 KB	Open Access
combine_r1_r2_cluster.sh	2025-02-13	2025-02-23	1.28 KB	Open Access
complete_blast_cluster.sh	2025-02-13	2025-02-23	2.39 KB	Open Access
screen_amplicon.pl	2025-02-13	2025-02-23	21.8 KB	Open Access
clean_convert_to_fasta.pl	2025-02-13	2025-02-23	5.96 KB	Open Access
process_proc.pl	2025-02-13	2025-02-23	6.26 KB	Open Access
concat_blast_result_files.sh	2025-02-13	2025-02-23	2.76 KB	Open Access
complete_blast_cluster.pl	2025-02-13	2025-02-23	7.67 KB	Open Access
start_blast_cluster.sh	2025-02-13	2025-02-23	2.88 KB	Open Access
screen_amplicon_cluster.sh	2025-02-13	2025-02-23	1.21 KB	Open Access
clean_convert_cluster.sh	2025-02-13	2025-02-23	1.67 KB	Open Access
_0_Pkgs_Libraries.R	2025-02-13	2025-02-23	942 Bytes	Open Access
_1_DESeq2.R	2025-02-13	2025-02-23	8.35 KB	Open Access
_2_Compare_DESeq2_results.R	2025-02-13	2025-02-23	8.03 KB	Open Access
_3_Compare_to_ConSurf_Scores.R	2025-02-13	2025-02-23	9.09 KB	Open Access
_4_DMSheatmaps.R	2025-02-13	2025-02-23	5.06 KB	Open Access
data_scores.txt	2025-02-13	2025-02-23	11.1 KB	Open Access
WTbg_0h_screen.txt	2025-02-13	2025-02-23	132 KB	Open Access
PAI-1_SPECICITY_DATASET_README_0...2.txt	2025-02-14	2025-02-23	7.39 KB	Open Access

This readme file was generated on 2025-02-13 by Laura M Haynes

GENERAL INFORMATION

Title of Dataset: High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases

Dataset Creators:

Principal Investigator Information
Name: David Ginsburg
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0002-6436-8942

Primary Author Information
Name: Laura M. Haynes
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0002-2237-659X

Author Information
Name: Matthew L. Holding
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]
ORCID: 0000-0003-3477-3012

Name: Hannah L. DiGiovanni
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]

Name: David Siemieniak
Institution: University of Michigan
Address: Life Sciences Institute
Email: [email protected]

Date of data collection: 2021

Information about funding sources that supported the collection of the data: This research was funded by the National Institutes of Health and the University of Michigan Frankel Cardiovascular Center.

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: http://creativecommons.org/licenses/by-nc/4.0/

Links to publications that cite or use the data: Haynes LM, Holding ML, DiGiovanni H, Siemieniak D, Ginsburg D. High-throughput amino acid-level characterization of the interactions of plasminogen activator inhibitor-1 with variably divergent proteases. bioRxiv [Preprint]. 2024 Sep 20:2024.09.16.612699. doi: 10.1101/2024.09.16.612699. PMID: 39345533; PMCID: PMC11429915.

DATA & FILE OVERVIEW

File List:
3392-LH-1_CACGATAT-AGATCTCG_S58_R1_001.fastq.gz
3392-LH-1_CACGATAT-AGATCTCG_S58_R2_001.fastq.gz
3392-LH-1_CACTCAAT-AGATCTCG_S59_R1_001.fastq.gz
3392-LH-1_CACTCAAT-AGATCTCG_S59_R2_001.fastq.gz
3392-LH-1_CAGGCGAT-AGATCTCG_S60_R1_001.fastq.gz
3392-LH-1_CAGGCGAT-AGATCTCG_S60_R2_001.fastq.gz
3392-LH-1_CATGGCAT-AGATCTCG_S61_R1_001.fastq.gz
3392-LH-1_CATGGCAT-AGATCTCG_S61_R2_001.fastq.gz
3392-LH-1_CATTTTAT-AGATCTCG_S62_R1_001.fastq.gz
3392-LH-1_CATTTTAT-AGATCTCG_S62_R2_001.fastq.gz
3392-LH-1_CCAACAAT-AGATCTCG_S63_R1_001.fastq.gz
3392-LH-1_CCAACAAT-AGATCTCG_S63_R2_001.fastq.gz
3392-LH-1_CGGAATAT-AGATCTCG_S64_R1_001.fastq.gz
3392-LH-1_CGGAATAT-AGATCTCG_S64_R2_001.fastq.gz
3392-LH-1_CTAGCTAT-AGATCTCG_S65_R1_001.fastq.gz
3392-LH-1_CTAGCTAT-AGATCTCG_S65_R2_001.fastq.gz
3392-LH-1_CTATACAT-AGATCTCG_S66_R1_001.fastq.gz
3392-LH-1_CTATACAT-AGATCTCG_S66_R2_001.fastq.gz
3936-LH-1_ACTGATAT-AGATCTCG_S80_R1_001.fastq.gz
3936-LH-1_ACTGATAT-AGATCTCG_S80_R2_001.fastq.gz
3936-LH-1_ATGAGCAT-AGATCTCG_S81_R1_001.fastq.gz
3936-LH-1_ATGAGCAT-AGATCTCG_S81_R2_001.fastq.gz
3936-LH-1_ATTCCTAT-AGATCTCG_S82_R1_001.fastq.gz
3936-LH-1_ATTCCTAT-AGATCTCG_S82_R2_001.fastq.gz
3936-LH-1_CAAAAGAT-AGATCTCG_S83_R1_001.fastq.gz
3936-LH-1_CAAAAGAT-AGATCTCG_S83_R2_001.fastq.gz
3936-LH-1_CAACTAAT-AGATCTCG_S84_R1_001.fastq.gz
3936-LH-1_CAACTAAT-AGATCTCG_S84_R2_001.fastq.gz
3936-LH-1_CACCGGAT-AGATCTCG_S85_R1_001.fastq.gz
3936-LH-1_CACCGGAT-AGATCTCG_S85_R2_001.fastq.gz
3936-LH-1_CACGATAT-AGATCTCG_S86_R1_001.fastq.gz
3936-LH-1_CACGATAT-AGATCTCG_S86_R2_001.fastq.gz
3936-LH-1_CACTCAAT-AGATCTCG_S87_R1_001.fastq.gz
3936-LH-1_CACTCAAT-AGATCTCG_S87_R2_001.fastq.gz
3936-LH-1_CAGGCGAT-AGATCTCG_S88_R1_001.fastq.gz
3936-LH-1_CAGGCGAT-AGATCTCG_S88_R2_001.fastq.gz
3936-LH-1_CATGGCAT-AGATCTCG_S89_R1_001.fastq.gz
3936-LH-1_CATGGCAT-AGATCTCG_S89_R2_001.fastq.gz
3936-LH-1_CATTTTAT-AGATCTCG_S90_R1_001.fastq.gz
3936-LH-1_CATTTTAT-AGATCTCG_S90_R2_001.fastq.gz
3936-LH-1_CCAACAAT-AGATCTCG_S91_R1_001.fastq.gz
3936-LH-1_CCAACAAT-AGATCTCG_S91_R2_001.fastq.gz
4641-LH-1_ACTGATAT-AGATCTCG_S321_R1_001.fastq.gz
4641-LH-1_ACTGATAT-AGATCTCG_S321_R2_001.fastq.gz
4641-LH-1_AGTCAAAT-AGATCTCG_S309_R1_001.fastq.gz
4641-LH-1_AGTCAAAT-AGATCTCG_S309_R2_001.fastq.gz
4641-LH-1_CGTACGAT-AGATCTCG_S318_R1_001.fastq.gz
4641-LH-1_CGTACGAT-AGATCTCG_S318_R2_001.fastq.gz
4641-LH-1_GAGTGGAT-AGATCTCG_S319_R1_001.fastq.gz
4641-LH-1_GAGTGGAT-AGATCTCG_S319_R2_001.fastq.gz
4641-LH-1_GATCAGAT-AGATCTCG_S306_R1_001.fastq.gz
4641-LH-1_GATCAGAT-AGATCTCG_S306_R2_001.fastq.gz
4641-LH-1_GGCTACAT-AGATCTCG_S308_R1_001.fastq.gz
4641-LH-1_GGCTACAT-AGATCTCG_S308_R2_001.fastq.gz
4641-LH-1_GGTAGCAT-AGATCTCG_S320_R1_001.fastq.gz
4641-LH-1_GGTAGCAT-AGATCTCG_S320_R2_001.fastq.gz
4641-LH-1_TAGCTTAT-AGATCTCG_S307_R1_001.fastq.gz
4641-LH-1_TAGCTTAT-AGATCTCG_S307_R2_001.fastq.gz
*a key to the data sets can be found in the accompanying file: "Key_to_FASTQ_files_v2.xlsx"
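
The file names above appear to follow a regular pattern: a run identifier, a pair of index sequences, a sample number, and a read number (R1 or R2). As a minimal sketch, assuming that pattern holds for every file in the archive, the names can be parsed and the R1/R2 files of each sample paired in Python. The function names `parse_fastq_name` and `pair_reads` are hypothetical and are not part of the deposited scripts; the mapping from sample to experimental condition still has to come from Key_to_FASTQ_files_v2.xlsx.

```python
import re
from collections import defaultdict

# Pattern inferred from the file list above (an assumption, not a documented spec):
# <run>-LH-1_<index1>-<index2>_S<sample>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<run>\d+)-LH-1_(?P<index1>[ACGT]+)-(?P<index2>[ACGT]+)"
    r"_S(?P<sample>\d+)_R(?P<read>[12])_001\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split one FASTQ file name into its run, index, sample, and read fields."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected FASTQ file name: {name}")
    fields = match.groupdict()
    fields["sample"] = int(fields["sample"])
    fields["read"] = int(fields["read"])
    return fields


def pair_reads(names):
    """Group file names by (run, sample) and return {(run, sample): (R1, R2)}."""
    grouped = defaultdict(dict)
    for name in names:
        fields = parse_fastq_name(name)
        grouped[(fields["run"], fields["sample"])][fields["read"]] = name
    # Every sample is expected to have exactly one R1 and one R2 file.
    incomplete = [key for key, reads in grouped.items() if set(reads) != {1, 2}]
    if incomplete:
        raise ValueError(f"samples missing R1 or R2: {incomplete}")
    return {key: (reads[1], reads[2]) for key, reads in sorted(grouped.items())}
```

Calling `pair_reads` on the names in the file list returns one `(R1, R2)` tuple per sample, keyed by run identifier and sample number.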

Script list with descriptions:
screen_amplicon.pl: compare consensus and reference sequences to call amino acid substitutions
process_proc.pl: subroutines for translating DNA sequences, assessing quality scores, and comparing paired-end reads for mismatches
complete_blast_cluster.pl: performs BLAST alignment and categorizes sequences by alignment quality
combine_r1_r2.pl: aligns R1 and R2 sequencing reads
clean_convert_to_fasta.pl: cleans FASTQ files and converts them to FASTA format
start_blast_cluster.sh: generates and submits SLURM batch job scripts for BLAST nucleotide searches
screen_amplicon_cluster.sh: Bash script that runs screen_amplicon.pl
concat_blast_result_files.sh: Concatenates BLAST results files for R1 and R2 sequencing reads
complete_blast_cluster.sh: Bash script that executes complete_blast_cluster.pl
combine_r1_r2_cluster.sh: Bash script that executes combine_r1_r2.pl
clean_convert_cluster.sh: Bash script that executes clean_convert_to_fasta.pl
_0_Pkgs_Libraries.R: packages and libraries necessary to execute R scripts
_1_DESeq2.R: Executes DESeq2 analysis of counts per amino acid substitution determined from associated FASTQ files
_2_Compare_DESeq2_results.R: Determines significance thresholds for the data sets and compares datasets
_3_Compare_to_ConSurf_Scores.R: Compares DESeq2 results to ConSurf evolutionary conservation scores at each amino acid position in PAI-1 (data_scores.txt)
_4_DMSheatmaps.R: Generates heatmaps of the DMS data
*A permanent link to scripts can be found at: https://github.com/hayneslm/PAI-1_and_divergent_proteases
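
To illustrate the idea behind calling amino acid substitutions (the task described for screen_amplicon.pl), the following Python sketch translates a reference and a read codon by codon and reports positions where the amino acids differ. It is not the authors' implementation: it assumes both sequences are in frame, of equal length, and free of insertions or deletions. The function names and the toy sequences in the usage note are hypothetical and are not PAI-1 sequence.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)
    )


def call_substitutions(reference_dna, read_dna, first_residue=1):
    """List substitutions such as 'Q2P' between a reference and a read.

    Assumes the two sequences cover the same in-frame region with no indels.
    `first_residue` is the residue number of the first codon in the region.
    """
    ref_aa = translate(reference_dna)
    read_aa = translate(read_dna)
    if len(ref_aa) != len(read_aa):
        raise ValueError("reference and read differ in length")
    return [
        f"{ref}{first_residue + i}{obs}"
        for i, (ref, obs) in enumerate(zip(ref_aa, read_aa))
        if ref != obs
    ]
```

With the toy reference `ATGCAGATG` (M-Q-M), the read `ATGCCGATG` yields `["Q2P"]`, while the synonymous read `ATGCAAATG` yields an empty list.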

Other files needed to execute scripts:
data_scores.txt: ConSurf evolutionary conservation scores
WTbg_0h_screen.txt: Original screen of the WT PAI-1 library to determine functional variants (Huttinger, Z.M., Haynes, L.M., Yee, A. et al. Deep mutational scanning of the plasminogen activator inhibitor-1 functional landscape. Sci Rep 11, 18827 (2021). https://doi.org/10.1038/s41598-021-97871-7)

METHODOLOGICAL INFORMATION

Description of methods used for collection/generation of data:
This data set was collected using a phage display PAI-1 library that was screened for its ability to inhibit different serine proteases (uPA, TMPRSS2, factor XIIa). The variants were identified using Illumina high-throughput DNA sequencing. The raw FASTQ files are contained in this data set.
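
For readers who want to inspect the raw files before running the full pipeline, a gzipped FASTQ file can be read directly with the Python standard library. The sketch below assumes four-line FASTQ records (no wrapped sequence lines) and Phred+33 quality encoding, which is the usual encoding for recent Illumina output but is not stated in this README. The function names are hypothetical.

```python
import gzip


def read_fastq(path):
    """Yield (read_id, sequence, quality) tuples from a gzipped FASTQ file.

    Assumes each record occupies exactly four lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file (or a blank line) ends iteration
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality  # drop the leading '@'


def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def mean_quality(quality, offset=33):
    """Mean Phred score of one read; the quality string must be non-empty."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores)
```

Iterating over `read_fastq(path)` and filtering on `mean_quality` gives a quick look at read quality without unpacking the archive to uncompressed text.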

Methods for processing the data: The software needed to analyze these files is included in this dataset and is also available at https://github.com/hayneslm/PAI-1_and_divergent_proteases.
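
One processing step named in the script descriptions is comparing paired-end reads for mismatches. The Python sketch below shows one generic way to do this, not the logic of process_proc.pl: R2 is reverse-complemented and its start is compared with the end of R1 over an overlap of known length. The overlap length depends on the amplicon and read lengths, which this README does not give, so `overlap` is left as a caller-supplied parameter. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def overlap_mismatches(r1, r2, overlap):
    """Return 0-based positions within the overlap where R1 and R2 disagree.

    The last `overlap` bases of R1 are compared with the first `overlap`
    bases of the reverse complement of R2. Positions where either read
    has an N are skipped rather than counted as mismatches.
    """
    if overlap <= 0 or overlap > min(len(r1), len(r2)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    r1_tail = r1.upper()[-overlap:]
    r2_head = reverse_complement(r2)[:overlap]
    return [
        i
        for i, (a, b) in enumerate(zip(r1_tail, r2_head))
        if a != b and "N" not in (a, b)
    ]
```

For a toy 10-base fragment `ACGTTGCAAG` read as a 7-base R1 (`ACGTTGC`) and a 6-base R2 (`CTTGCA`), the overlap is 3 bases and `overlap_mismatches` returns an empty list; changing the last base of R2 to `T` returns `[0]`.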

Instrument- or software-specific information needed to interpret the data: Code is executed in the Bash, Perl, and R programming languages.

People involved with sample collection, processing, analysis and/or submission: Laura M Haynes, Matthew L Holding, David Siemieniak
