Work Description

Title: Large-scale transcriptome mining enables macrocyclic diversification and improved bioactivity of the stephanotic acid scaffold

Methodology
  • Paired-end RNA-seq data (Data S6) were downloaded from NCBI SRA via fasterq-dump (--split-files) of the SRA Toolkit (v2.10.9) to the Great Lakes High Performance Computing (HPC) Cluster at the University of Michigan, Ann Arbor. For benchmarking, the datasets (Data S3) were trimmed via TrimGalore (v0.6.7) with default settings (Phred cutoff: 20, default pair-cutoff: 20 bp) and assembled on the Great Lakes HPC cluster with Trinity (v2.15.1) [25], SPAdes (v3.15.5) [26,27], or MEGAHIT (v1.2.9) [32] with the parameters in the scripts specified in the Supporting Information. The working memory for all benchmarking datasets was 48 GB, except for the Trinity assembly of SRR7440026 (80 GB), which failed at 48 GB. For large-scale assembly, the datasets were assembled with SPAdes on the Great Lakes HPC cluster with the same parameters as for benchmarking (see Supporting Information). Assemblies that failed at 48 GB memory were reassembled at 180 GB memory, as noted in Data S6.
Description
  • Moroidins are plant ribosomally synthesized and posttranslationally modified peptides (RiPPs) called burpitides biosynthesized from copper-dependent peptide cyclases. The bicyclic structure of moroidins contains (1) a stephanotic acid scaffold with a Leu-Cβ-Trp-indole-C6-crosslink and (2) a C-terminal ring formed by a Trp-indole-C2-His-imidazole-N1-crosslink. Moroidin is cytotoxic to H1437 non-small cell lung adenocarcinoma cells in vitro, underscoring the potential of stephanotic acid-type burpitides as anticancer lead structures and the importance of exploring diversification strategies to discover analogs with enhanced bioactivity. We mined the transcriptomes of 7579 plant species from 498 plant families to identify moroidin analogs with novel second-ring structures and the cyclases responsible for their biosynthesis. A search of >27000 candidate burpitide cyclases revealed two stephanotic acid-type burpitides in plants with new second-ring crosslinks derived from posttranslational modification: Glechomanin from ground ivy (Glechoma hederacea) with a C-C-crosslink between a C-terminal tryptophan-indole-C6 and the β-carbon of a valine, and Mercurialin from annual mercury (Mercurialis annua) featuring a C-O-crosslink between a C-terminal tyrosine-phenol hydroxy and the β-carbon of a phenylalanine. Furthermore, our transcriptomics-guided burpitide genotyping enabled isolation of a moroidin analog from water chickweed (Stellaria aquatica), which exhibits a ten-fold higher in vitro cytotoxicity than moroidin and selective toxicity against H1437 lung adenocarcinoma cells. We demonstrate that plant transcriptome mining can expand the medicinal chemistry toolbox for chemical and biological exploration of burpitide lead structures.
Creator
Creator ORCID iD
Depositor
Contact information
Discipline
Funding agency
  • National Institutes of Health (NIH)
ORSP grant number
  • AWD021278
Keyword
Citations to related material
  • in preparation
Resource type
Last modified
  • 03/28/2025
Published
  • 03/28/2025
Language
DOI
  • https://doi.org/10.7302/z14x-5m94
License
To Cite this Work:
Kersten, R. D., Ousley, D., Wang, X., Chigumba, D. N., Davis, D., Shafiq, K., McDonough, K., Mydy, L. M., Sexton, J. Z. (2025). Large-scale transcriptome mining enables macrocyclic diversification and improved bioactivity of the stephanotic acid scaffold [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/z14x-5m94

Files (Count: 15; Size: 254 GB)

Date: March 25, 2025

Dataset Title: Large-scale plant transcriptome mining reveals macrocyclic diversification and improved lung cancer cell cytotoxicity of the stephanotic acid scaffold

Dataset Creators:
Xiaofeng Wang, Khadija Shafiq, Derrick Ousley, Desnor N. Chigumba, Dulciana Davis, Kali McDonough, Lisa S. Mydy, Jonathan Z. Sexton, Roland D. Kersten
Dataset Contact: Roland D. Kersten [email protected]

Funding: R35GM146934 (NIGMS), Herman Frasch Foundation, T32GM140223 (NIGMS), F32GM146395 (NIGMS), F31GM155959 (NIGMS), Rackham Merit Fellowship Program, PhRMA Foundation Predoctoral Fellowship, Rackham Predoctoral Fellowship

Key Points:
- We applied scaled de novo transcriptome assembly for the discovery of stephanotic acid-type burpitides and underlying cyclases.
- 7,579 plant species transcriptomes were assembled de novo with rnaSPAdes (v3.15.5) and searched for BURP domain transcripts encoding stephanotic acid core peptides (QLxxW) by tblastn (BLAST+ v2.16.0) on Sequenceserver (v3.1.0) and RepeatFinder.
- Candidate stephanotic-acid burpitide cyclases from 37 species were predicted and stephanotic acid-type burpitide cyclases with new second-ring-crosslinks were identified.
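
As a rough illustration of what a QLxxW core-peptide hit looks like at the sequence level, the sketch below counts QLxxW motifs (x = any residue) in an already translated protein sequence. This is not the tblastn/RepeatFinder workflow described above; the function name `count_qlxxw` and the example sequence in the usage note are hypothetical.

```shell
#!/bin/sh
# Count QLxxW motifs (x = any amino acid) in a one-line protein sequence.
# Simple regex illustration only; NOT the tblastn/RepeatFinder search itself.
count_qlxxw() {
  # grep -o prints each non-overlapping match on its own line;
  # "|| true" keeps the pipeline quiet when there is no match at all.
  printf '%s\n' "$1" | { grep -oE 'QL[A-Z]{2}W' || true; } | wc -l | tr -d ' '
}
```

For example, `count_qlxxw MKQLLVWAAQLPTWG` reports two motifs (QLLVW and QLPTW) in this made-up sequence. Because `grep -o` reports non-overlapping matches, overlapping repeats would be undercounted.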

Research Overview:
Moroidins are plant ribosomally synthesized and posttranslationally modified peptides (RiPPs) called burpitides biosynthesized from copper-dependent peptide cyclases. The bicyclic structure of moroidins contains (1) a stephanotic acid scaffold with a Leu-Cβ-Trp-indole-C6-crosslink and (2) a C-terminal ring formed by a Trp-indole-C2-His-imidazole-N1-crosslink. Moroidin is cytotoxic to H1437 non-small cell lung adenocarcinoma cells in vitro, underscoring the potential of stephanotic acid-type burpitides as anticancer lead structures and the importance of exploring diversification strategies to discover analogs with enhanced bioactivity. We mined the transcriptomes of 7,579 plant species from 498 plant families to identify moroidin analogs with novel second ring structures and the cyclases responsible for their biosynthesis. A search of >27,000 candidate burpitide cyclases revealed two stephanotic acid-type burpitides in plants with new second ring crosslinks derived from posttranslational modification: Glechomanin from ground ivy (Glechoma hederacea) with a C-C-crosslink between a C-terminal tryptophan-indole-C6 and the β-carbon of a valine, and Mercurialin from annual mercury (Mercurialis annua) featuring a C-O-crosslink between a C-terminal tyrosine-phenol hydroxy and the β-carbon of a phenylalanine. Furthermore, our transcriptomics-guided burpitide genotyping enabled isolation of a moroidin analog from water chickweed (Stellaria aquatica), which exhibits a nine-fold higher in vitro cytotoxicity than moroidin against H1437 lung adenocarcinoma cells. We demonstrate that plant transcriptome mining can expand the medicinal chemistry toolbox for chemical and biological exploration of burpitide lead structures.

Methodology:
Paired-end RNA-seq data were downloaded from the NCBI Sequence Read Archive (SRA) via fasterq-dump (parameter: --split-files) of the SRA Toolkit (v2.10.9) to the Great Lakes High Performance Computing (HPC) Cluster at the University of Michigan, Ann Arbor. For large-scale assembly, the datasets were trimmed (TrimGalore v0.6.7) and assembled with rnaSPAdes (v3.15.5) on the Great Lakes HPC cluster as specified below. Assemblies that failed at 48 GB memory were reassembled at 180 GB memory, as noted in Data S6.
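
The download and trimming steps could be sketched as follows. This is a minimal sketch, not the project's actual script (those are in the GitHub repository listed below). The helper names `run` and `fetch_and_trim` and the `DRY_RUN` switch are hypothetical, and `-q 20 --length 20` spells out Trim Galore's defaults on the assumption that the 20 bp cutoff refers to the minimum read length.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Download one paired-end run and trim it. $1 = SRA run accession.
fetch_and_trim() {
  # --split-files writes <acc>_1.fastq and <acc>_2.fastq for paired-end runs.
  run fasterq-dump --split-files "$1" || return 1
  # --paired trims both mates together and writes
  # <acc>_1_val_1.fq and <acc>_2_val_2.fq, the inputs used for assembly.
  run trim_galore --paired -q 20 --length 20 "${1}_1.fastq" "${1}_2.fastq"
}
```

Setting `DRY_RUN=1` before calling `fetch_and_trim SRR7440026` prints the two commands without contacting NCBI, which is useful for checking file names before submitting a job.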

rnaSPAdes:
#!/bin/bash
#SBATCH --partition=standard
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=7
#SBATCH --mem=48g
module load spades/3.15.5-jhe6qq2
spades.py --rna -1 ./SRA#_1_val_1.fq -2 ./SRA#_2_val_2.fq -o SRAaccession_file
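
The 48 GB to 180 GB fallback described in the methodology could be automated along these lines. This is a sketch under assumptions: `assemble_with_retry` and `run_rnaspades` are hypothetical names, the output directory name is illustrative, and the memory cap is passed through SPAdes' own `-m` option (limit in GB) rather than through the SLURM `--mem` request used in the job script, which would need to be raised to match.

```shell
#!/bin/sh
# Try an assembly command at 48 GB, then once more at 180 GB if it fails.
# $1 = name of an assembly function or command, $2 = run accession.
assemble_with_retry() {
  for mem in 48 180; do
    if "$1" "$2" "$mem"; then
      echo "$2 assembled at ${mem} GB"
      return 0
    fi
    echo "$2 failed at ${mem} GB" >&2
  done
  return 1
}

# Hypothetical wrapper around rnaSPAdes. $1 = accession, $2 = memory in GB.
run_rnaspades() {
  spades.py --rna -m "$2" -t 7 \
    -1 "./${1}_1_val_1.fq" -2 "./${1}_2_val_2.fq" \
    -o "${1}_rnaspades"
}
```

A call such as `assemble_with_retry run_rnaspades SRR7440026` prints which memory tier succeeded, so the result can be recorded in a table like Data S6.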

Files contained here:
plant_transcriptomes_1000.tar.gz - plant transcriptome assemblies 1-1000
plant_transcriptomes_2000.tar.gz - plant transcriptome assemblies 1001-2000
plant_transcriptomes_3000.tar.gz - plant transcriptome assemblies 2001-3000
plant_transcriptomes_4000.tar.gz - plant transcriptome assemblies 3001-4000
plant_transcriptomes_5000.tar.gz - plant transcriptome assemblies 4001-5000
plant_transcriptomes_6000.tar.gz - plant transcriptome assemblies 5001-6000
plant_transcriptomes_7000.tar.gz - plant transcriptome assemblies 6001-7000
plant_transcriptomes_8000.tar.gz - plant transcriptome assemblies 7001-8000
plant_transcriptomes_8000+.tar.gz - plant transcriptome assemblies 8001-8893
genome-guided-assembly-benchmarking.tar.gz - genome-guided plant transcriptome assemblies
denovo-assembly-benchmarking.tar.gz - de novo plant transcriptome assemblies
Data_S3_-_Benchmarking_assembly.xlsx - Information about datasets from de novo RNA-seq assembly benchmarking
Data_S5_-_Benchmarking_assembly_-_de_novo_vs_genome-guided.xlsx - Information about datasets from comparison of de novo RNA-seq assembly and genome-guided RNA-seq assembly
Data_S6_-_Transcriptome_table.xlsx - Information about plant transcriptomes (1-8893)
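
Since each archive is large, it can help to work out which one holds a given assembly before downloading. The sketch below maps an assembly number (1-8893, as numbered in the file list above) to its archive name; the function name `archive_for` is hypothetical, and nothing is assumed about the directory layout inside the archives.

```shell
#!/bin/sh
# Map an assembly number (1-8893) to the archive that contains it.
archive_for() {
  # Accept only plain positive integers without leading zeros.
  case $1 in
    ''|0*|*[!0-9]*)
      echo "assembly number must be an integer from 1 to 8893" >&2
      return 1 ;;
  esac
  if [ "$1" -gt 8893 ]; then
    echo "assembly number must be an integer from 1 to 8893" >&2
    return 1
  elif [ "$1" -gt 8000 ]; then
    echo "plant_transcriptomes_8000+.tar.gz"
  else
    # Round up to the next multiple of 1000 (e.g. 1001 -> 2000).
    echo "plant_transcriptomes_$(( ($1 + 999) / 1000 * 1000 )).tar.gz"
  fi
}
```

For instance, `tar -tzf "$(archive_for 4210)"` would list the contents of plant_transcriptomes_5000.tar.gz once that file has been downloaded.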

For RNA-seq benchmarking, 27 RNA-seq datasets were trimmed (TrimGalore v0.6.7) and assembled de novo via SPAdes (v3.15.5), MEGAHIT (v1.2.9), and Trinity (v2.15.1) (Data S3). Sixteen RNA-seq datasets were assembled in genome-guided mode via StringTie (v2.2.1) or Trinity (v2.15.1) after alignment of trimmed reads to the annotated reference genome (nucleotide FASTA and GTF/GFF3) with STAR (v2.7.11a) (Data S5).
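
The genome-guided route via STAR and StringTie could look roughly like the sketch below. The parameters actually used are in the project's scripts; here the index directory `star_index`, the output names, and the helpers `run` and `genome_guided` are hypothetical, and a GTF annotation is assumed (GFF3 input needs additional STAR options).

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = run accession, $2 = genome FASTA, $3 = annotation GTF
genome_guided() {
  # Build the STAR index from the reference genome and its annotation.
  run STAR --runMode genomeGenerate --genomeDir star_index \
    --genomeFastaFiles "$2" --sjdbGTFfile "$3" || return 1
  # Align the trimmed mates and write a coordinate-sorted BAM file.
  run STAR --genomeDir star_index \
    --readFilesIn "${1}_1_val_1.fq" "${1}_2_val_2.fq" \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix "${1}_" || return 1
  # Assemble transcripts from the alignments, guided by the annotation.
  run stringtie "${1}_Aligned.sortedByCoord.out.bam" -G "$3" -o "${1}_stringtie.gtf"
}
```

With `DRY_RUN=1`, calling `genome_guided <accession> genome.fa annotation.gtf` prints the three commands so the file names can be checked before any compute time is spent.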

All scripts for this project are deposited on GitHub: https://github.com/UM-KerstenLab4009/Moroidin-Transcriptomics

Related publication(s):
Wang, X., Shafiq, K., Ousley, D. et al. (2025) Large-scale plant transcriptome mining reveals macrocyclic diversification and improved lung cancer cell cytotoxicity of the stephanotic acid scaffold.
Forthcoming.

Use and Access:
This data set is made available under a Creative Commons Public Domain license (CC0 1.0).

To Cite Data:
Wang, X., Shafiq, K., Ousley, D., Chigumba, D.N., Davis, D., McDonough, K., Mydy, L.S., Sexton, J.Z. & Kersten, R.D. (2024) Large-scale plant transcriptome mining reveals macrocyclic diversification and improved lung cancer cell cytotoxicity of the stephanotic acid scaffold [Data set]. University of Michigan - Deep Blue.

