Work Description
Title: Large-scale transcriptome mining enables macrocyclic diversification and improved bioactivity of the stephanotic acid scaffold
(2025). Large-scale transcriptome mining enables macrocyclic diversification and improved bioactivity of the stephanotic acid scaffold [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/z14x-5m94
Files (Count: 15; Size: 254 GB)
Title | Original Upload | Last Modified | File Size | Access |
---|---|---|---|---|
README_-_updated20250328.txt | 2025-03-28 | 2025-04-06 | 6.14 KB | Open Access |
denovo-assembly-benchmarking.tar.gz | 2025-03-26 | 2025-03-26 | 5.59 GB | Open Access |
genome-guided-assembly-benchmark...ar.gz | 2025-03-26 | 2025-03-26 | 2.27 GB | Open Access |
plant-transcriptomes-1000.tar.gz | 2025-03-26 | 2025-03-26 | 30.1 GB | Open Access |
plant-transcriptomes-2000.tar.gz | 2025-03-26 | 2025-03-26 | 31.6 GB | Open Access |
plant-transcriptomes-3000.tar.gz | 2025-03-26 | 2025-03-26 | 30 GB | Open Access |
plant-transcriptomes-4000.tar.gz | 2025-03-26 | 2025-03-26 | 30.2 GB | Open Access |
plant-transcriptomes-5000.tar.gz | 2025-03-26 | 2025-03-26 | 29.8 GB | Open Access |
plant-transcriptomes-6000.tar.gz | 2025-03-26 | 2025-03-26 | 27.7 GB | Open Access |
plant-transcriptomes-7000.tar.gz | 2025-03-26 | 2025-03-26 | 27.4 GB | Open Access |
plant-transcriptomes-8000+.tar.gz | 2025-03-26 | 2025-03-26 | 19.9 GB | Open Access |
plant-transcriptomes-8000.tar.gz | 2025-03-26 | 2025-03-26 | 19.9 GB | Open Access |
Data_S3_-_Benchmarking_assembly.xlsx | 2025-03-27 | 2025-04-06 | 41.3 KB | Open Access |
Data_S5_-_Benchmarking_assembly_....xlsx | 2025-03-27 | 2025-04-06 | 51.3 KB | Open Access |
Data_S6_-_Transcriptome_table.xlsx | 2025-03-27 | 2025-04-06 | 415 KB | Open Access |
Date: March 25, 2025
Dataset Title: Large-scale plant transcriptome mining reveals macrocyclic diversification and improved lung cancer cell cytotoxicity of the stephanotic acid scaffold
Dataset Creators:
Xiaofeng Wang, Khadija Shafiq, Derrick Ousley, Desnor N. Chigumba, Dulciana Davis, Kali McDonough, Lisa S. Mydy, Jonathan Z. Sexton, Roland D. Kersten
Dataset Contact: Roland D. Kersten [email protected]
Funding: R35GM146934 (NIGMS), Herman Frasch Foundation, T32GM140223 (NIGMS), F32GM146395 (NIGMS), F31GM155959 (NIGMS), Rackham Merit Fellowship Program, PhRMA Foundation Predoctoral Fellowship, Rackham Predoctoral Fellowship
Key Points:
- We applied scaled de novo transcriptome assembly for the discovery of stephanotic acid-type burpitides and underlying cyclases.
- Transcriptomes of 7,579 plant species were assembled de novo with rnaSPAdes (v3.15.5) and searched for BURP domain transcripts encoding stephanotic acid core peptides (QLxxW) by tblastn (BLAST+ v2.16.0) on Sequenceserver (v3.1.0) and RepeatFinder.
- Candidate stephanotic acid-type burpitide cyclases from 37 species were predicted, and stephanotic acid-type burpitide cyclases with new second-ring crosslinks were identified.
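As a rough illustration of the QLxxW core-peptide pattern named in the key points, the shell sketch below flags protein sequences that contain Gln-Leu, any two residues, then Trp. It assumes the sequences are already translated and stored one per line. The helper names `find_qlxxw` and `count_qlxxw` and the sample sequences are hypothetical, and this plain pattern match is only a stand-in for the tblastn and RepeatFinder searches used in the study, not a reproduction of them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each input line (one protein sequence per line)
# that contains at least one QLxxW motif, where x is any amino-acid letter.
find_qlxxw() {
    grep -E 'QL[A-Z]{2}W'
}

# Hypothetical helper: count non-overlapping QLxxW occurrences in one sequence.
count_qlxxw() {
    printf '%s\n' "$1" | grep -oE 'QL[A-Z]{2}W' | wc -l | tr -d ' '
}

# Made-up sequences, for illustration only.
printf '%s\n' 'MKAQLLVWHGGQLPVWHT' 'MKTAYIAKQR' | find_qlxxw
count_qlxxw 'MKAQLLVWHGGQLPVWHT'
```

Counting repeats per sequence is useful here because burpitide precursors can carry several core peptides, but a pattern hit alone does not show that a transcript encodes a BURP domain.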
Research Overview:
Moroidins are burpitides, plant ribosomally synthesized and posttranslationally modified peptides (RiPPs) biosynthesized by copper-dependent peptide cyclases. The bicyclic structure of moroidins contains (1) a stephanotic acid scaffold with a Leu-Cβ-Trp-indole-C6 crosslink and (2) a C-terminal ring formed by a Trp-indole-C2-His-imidazole-N1 crosslink. Moroidin is cytotoxic to H1437 non-small cell lung adenocarcinoma cells in vitro, underscoring the potential of stephanotic acid-type burpitides as anticancer lead structures and the importance of exploring diversification strategies to discover analogs with enhanced bioactivity. We mined the transcriptomes of 7,579 plant species from 498 plant families to identify moroidin analogs with novel second-ring structures and the cyclases responsible for their biosynthesis. A search of >27,000 candidate burpitide cyclases revealed two stephanotic acid-type burpitides in plants with new second-ring crosslinks derived from posttranslational modification: Glechomanin from ground ivy (Glechoma hederacea), with a C-C crosslink between a C-terminal tryptophan-indole-C6 and the β-carbon of a valine, and Mercurialin from annual mercury (Mercurialis annua), with a C-O crosslink between a C-terminal tyrosine-phenol hydroxy and the β-carbon of a phenylalanine. Furthermore, our transcriptomics-guided burpitide genotyping enabled isolation of a moroidin analog from water chickweed (Stellaria aquatica), which exhibits nine-fold higher in vitro cytotoxicity than moroidin against H1437 lung adenocarcinoma cells. We demonstrate that plant transcriptome mining can expand the medicinal chemistry toolbox for chemical and biological exploration of burpitide lead structures.
Methodology:
Paired-end RNA-seq data were downloaded from the NCBI Sequence Read Archive (SRA) with fasterq-dump (parameter: --split-files) of the SRA Toolkit (v2.10.9) to the Great Lakes High Performance Computing (HPC) cluster at the University of Michigan, Ann Arbor. For large-scale assembly, the datasets were trimmed (TrimGalore v0.6.7) and assembled with rnaSPAdes (v3.15.5) on the Great Lakes HPC cluster as specified below. Assemblies that failed at 48 GB memory were reassembled at 180 GB memory, as noted in Data S6.
rnaSPAdes:
#SBATCH --partition=standard
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=7
#SBATCH --mem=48g
module load spades/3.15.5-jhe6qq2
spades.py --rna -1 ./SRA#_1_val_1.fq -2 ./SRA#_2_val_2.fq -o SRAaccession_file
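The download, trimming, and assembly steps described above can be sketched as a dry run that only prints the per-accession commands. This is a minimal sketch, not the project's actual script: the function names, the placeholder accession `SRR0000000`, and the output directory name are hypothetical, and the 48 GB then 180 GB retry rule is simply read off the methodology text.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the commands for one accession.
# The accession and the "_assembly" output name are illustrative placeholders.
print_pipeline() {
    local acc="$1"
    # 1. Download paired-end reads, one FASTQ file per mate.
    echo "fasterq-dump --split-files ${acc}"
    # 2. Trim the pair; TrimGalore writes *_1_val_1.fq and *_2_val_2.fq.
    echo "trim_galore --paired ${acc}_1.fastq ${acc}_2.fastq"
    # 3. Assemble the trimmed reads de novo in RNA mode.
    echo "spades.py --rna -1 ./${acc}_1_val_1.fq -2 ./${acc}_2_val_2.fq -o ${acc}_assembly"
}

# Memory request per attempt: 48 GB first, 180 GB for a retry after failure.
mem_for_attempt() {
    if [ "$1" -le 1 ]; then echo "48g"; else echo "180g"; fi
}

print_pipeline "SRR0000000"
echo "#SBATCH --mem=$(mem_for_attempt 2)"
```

Printing the commands first makes it easy to check file naming between steps before submitting thousands of jobs to the scheduler.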
Files contained here:
plant-transcriptomes-1000.tar.gz - plant transcriptome assemblies 1-1000
plant-transcriptomes-2000.tar.gz - plant transcriptome assemblies 1001-2000
plant-transcriptomes-3000.tar.gz - plant transcriptome assemblies 2001-3000
plant-transcriptomes-4000.tar.gz - plant transcriptome assemblies 3001-4000
plant-transcriptomes-5000.tar.gz - plant transcriptome assemblies 4001-5000
plant-transcriptomes-6000.tar.gz - plant transcriptome assemblies 5001-6000
plant-transcriptomes-7000.tar.gz - plant transcriptome assemblies 6001-7000
plant-transcriptomes-8000.tar.gz - plant transcriptome assemblies 7001-8000
plant-transcriptomes-8000+.tar.gz - plant transcriptome assemblies 8001-8893
genome-guided-assembly-benchmarking.tar.gz - genome-guided plant transcriptome assemblies
denovo-assembly-benchmarking.tar.gz - de novo plant transcriptome assemblies
Data_S3_-_Benchmarking_assembly.xlsx - Information about datasets from de novo RNA-seq assembly benchmarking
Data_S5_-_Benchmarking_assembly_-_de_novo_vs_genome-guided.xlsx - Information about datasets from comparison of de novo RNA-seq assembly and genome-guided RNA-seq assembly
Data_S6_-_Transcriptome_table.xlsx - Information about plant transcriptomes (1-8893)
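Because the assemblies are split across nine archives by index range, a small lookup can save downloading the wrong 30 GB file. The sketch below maps an assembly number (1-8893, as listed in Data S6) to its archive. The function name `archive_for_index` is hypothetical, and the archive names use the hyphenated form shown in the deposit's file listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an assembly index (1-8893) to the archive holding it.
archive_for_index() {
    local i="$1"
    if [ "$i" -lt 1 ] || [ "$i" -gt 8893 ]; then
        echo "index out of range: $i" >&2
        return 1
    fi
    if [ "$i" -gt 8000 ]; then
        # Assemblies 8001-8893 live in the final "8000+" archive.
        echo "plant-transcriptomes-8000+.tar.gz"
    else
        # Round the index up to the next multiple of 1000.
        echo "plant-transcriptomes-$(( (i + 999) / 1000 * 1000 )).tar.gz"
    fi
}

archive_for_index 4321
```

The returned name can then be passed to a download step and to `tar -xzf` once the archive is on disk.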
For RNA-seq benchmarking, 27 RNA-seq datasets were trimmed (TrimGalore v0.6.7) and assembled de novo with SPAdes (v3.15.5), MEGAHIT (v1.2.9), and Trinity (v2.15.1) (Data S3). Sixteen RNA-seq datasets were assembled in genome-guided mode with StringTie (v2.2.1) or Trinity (v2.15.1) after trimmed reads were aligned to the annotated reference genome (nucleotide FASTA and GTF/GFF3) with STAR (v2.7.11a) (Data S5).
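The genome-guided route described above (index, align, then assemble) can be sketched as another dry run. The commands use standard STAR, StringTie, and Trinity options, but the exact parameters used in the study are not given here, so everything below is an assumption: the function name, the `star_index` directory, the output names, the GTF-only annotation, and in particular the Trinity maximum intron length and memory values are placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) genome-guided assembly commands.
# All file names and numeric parameter values are illustrative assumptions.
print_genome_guided() {
    local sample="$1" genome="$2" gtf="$3"
    local bam="${sample}_Aligned.sortedByCoord.out.bam"
    # 1. Build a STAR index from the genome FASTA and its GTF annotation.
    echo "STAR --runMode genomeGenerate --genomeDir star_index --genomeFastaFiles ${genome} --sjdbGTFfile ${gtf}"
    # 2. Align trimmed paired-end reads; STAR appends the BAM name to the prefix.
    echo "STAR --genomeDir star_index --readFilesIn ${sample}_1_val_1.fq ${sample}_2_val_2.fq --outSAMtype BAM SortedByCoordinate --outFileNamePrefix ${sample}_"
    # 3a. Reference-guided transcript assembly with StringTie.
    echo "stringtie ${bam} -G ${gtf} -o ${sample}_stringtie.gtf"
    # 3b. Alternative: genome-guided Trinity on the same sorted BAM.
    echo "Trinity --genome_guided_bam ${bam} --genome_guided_max_intron 10000 --max_memory 48G --CPU 7"
}

print_genome_guided "SRR0000000" "genome.fa" "annotation.gtf"
```

A GFF3 annotation would need additional STAR options at the indexing step, which this sketch leaves out.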
All scripts for this project are deposited on GitHub: https://github.com/UM-KerstenLab4009/Moroidin-Transcriptomics
Related publication(s):
Wang, X., Shafiq, K., Ousley, D. et al. (2025) Large-scale plant transcriptome mining reveals macrocyclic diversification and improved lung cancer cell cytotoxicity of the stephanotic acid scaffold.
Forthcoming.
Use and Access:
This data set is made available under a Creative Commons Public Domain license (CC0 1.0).
To Cite Data:
Wang, X., Shafiq, K., Ousley, D., Chigumba, D.N., Davis, D., McDonough, K., Mydy, L.S., Sexton, J.Z. & Kersten, R.D. (2025) Large-scale plant transcriptome mining reveals macrocyclic diversification and improved lung cancer cell cytotoxicity of the stephanotic acid scaffold [Data set]. University of Michigan - Deep Blue Data. https://doi.org/10.7302/z14x-5m94