
Feature selection and classification over the network with missing node observations

dc.contributor.author: Jin, Zhuxuan
dc.contributor.author: Kang, Jian
dc.contributor.author: Yu, Tianwei
dc.date.accessioned: 2022-04-08T18:04:00Z
dc.date.available: 2023-04-08 14:03:58 [language: en]
dc.date.available: 2022-04-08T18:04:00Z
dc.date.issued: 2022-03-30
dc.identifier.citation: Jin, Zhuxuan; Kang, Jian; Yu, Tianwei (2022). "Feature selection and classification over the network with missing node observations." Statistics in Medicine 41(7): 1242-1262.
dc.identifier.issn: 0277-6715
dc.identifier.issn: 1097-0258
dc.identifier.uri: https://hdl.handle.net/2027.42/172014
dc.description.abstract: Jointly analyzing transcriptomic data and existing biological networks can yield more robust and informative feature selection results, as well as a better understanding of the underlying biological mechanisms. Selecting and classifying node features over genome‐scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have two critical drawbacks. First, they do not allow flexible modeling of different subtypes of selected nodes. Second, they ignore nodes with missing values, which is likely to bias estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model that incorporates the network structure is developed for the class indicators. For posterior computation, we use the Swendsen‐Wang algorithm to update the class indicators efficiently. BNC handles missing values naturally within the Bayesian modeling framework, which improves node classification accuracy and reduces bias in estimating gene effects. We demonstrate the advantages of our method through extensive simulation studies and an analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.
dc.publisher: John Wiley & Sons
dc.subject.other: gene networks
dc.subject.other: Bayesian nonparametrics
dc.subject.other: false discovery rate control
dc.subject.other: feature selection
dc.title: Feature selection and classification over the network with missing node observations
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbsecondlevel: Public Health
dc.subject.hlbsecondlevel: Medicine (General)
dc.subject.hlbtoplevel: Health Sciences
dc.subject.hlbtoplevel: Science
dc.subject.hlbtoplevel: Social Sciences
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/172014/1/sim9267_am.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/172014/2/sim9267.pdf
dc.identifier.doi: 10.1002/sim.9267
dc.identifier.source: Statistics in Medicine
dc.identifier.citedreference: Demidenko R, Razanauskas D, Daniunaite K, Lazutka JR, Jankevicius F, Jarmalaite S. Frequent down‐regulation of ABC transporter genes in prostate cancer. BMC Cancer. 2015;15:683.
dc.identifier.citedreference: Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007;23:257-258.
dc.identifier.citedreference: Taylor RC, Patel A, Panageas KS, Busam KJ, Brady MS. Tumor‐infiltrating lymphocytes predict sentinel lymph node positivity in patients with cutaneous melanoma. J Clin Oncol. 2007;25:869-875.
dc.identifier.citedreference: Lardone RD, Plaisier SB, Navarrete MS, et al. Cross‐platform comparison of independent datasets identifies an immune signature associated with improved survival in metastatic melanoma. Oncotarget. 2016;7:14415-14428.
dc.identifier.citedreference: Elsnerova K, Mohelnikova‐Duchonova B, Cerovska E, et al. Gene expression of membrane transporters: Importance for prognosis and progression of ovarian carcinoma. Oncol Rep. 2016;35:2159-2170.
dc.identifier.citedreference: Zhou W, Feng XL, Li H, et al. Functional evidence for a nasopharyngeal carcinoma‐related gene BCAT1 located at 12p12. Oncol Res. 2007;16:405-413.
dc.identifier.citedreference: Warnecke‐Eberz U, Metzger R, Hölscher AH, Drebber U, Bollschweiler E. Diagnostic marker signature for esophageal cancer from transcriptome analysis. Tumour Biol. 2016;37:6349-6358.
dc.identifier.citedreference: Guo X, Li HW, Fei F, et al. Genetic variations in SLC3A2/CD98 gene as prognosis predictors in non‐small cell lung cancer. Mol Carcinog. 2015;54(Suppl 1):E52-E60.
dc.identifier.citedreference: Estrach S, Lee SA, Boulter E, et al. CD98hc (SLC3A2) loss protects against ras‐driven tumorigenesis by modulating integrin‐mediated mechanotransduction. Cancer Res. 2014;74:6878-6889.
dc.identifier.citedreference: Bhat M, Skill N, Marcus V, et al. Decreased PCSK9 expression in human hepatocellular carcinoma. BMC Gastroenterol. 2015;15:176.
dc.identifier.citedreference: Huang JF, Li L, Lian JH, et al. Tumor‐induced hyperlipidemia contributes to tumor growth. Cell Rep. 2016;15:336-348.
dc.identifier.citedreference: Zlotnik A, Yoshie O. The chemokine superfamily revisited. Immunity. 2012;36:705-716.
dc.identifier.citedreference: Jacquelot N, Enot DP, Flament C, et al. Chemokine receptor patterns in lymphocytes mirror metastatic spreading in melanoma. J Clin Invest. 2016;126:921-937.
dc.identifier.citedreference: Zhang JF, Chen Y, Lin GS, et al. High IFIT1 expression predicts improved clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma. Hum Pathol. 2016;52:136-144.
dc.identifier.citedreference: Lloyd MC, Szekeres K, Brown JS, Blanck G. Class II transactivator expression in melanoma cells facilitates T‐cell engulfment. Anticancer Res. 2015;35:25-29.
dc.identifier.citedreference: Martinet L, Le Guellec S, Filleron T, et al. High endothelial venules (HEVs) in human melanoma lesions: major gateways for tumor‐infiltrating lymphocytes. Oncoimmunology. 2012;1:829-839.
dc.identifier.citedreference: Liu WT, Peng YH, Tobin DJ. A new 12‐gene diagnostic biomarker signature of melanoma revealed by integrated microarray analysis. PeerJ. 2013;1:e49.
dc.identifier.citedreference: Rentoft M, Lindell K, Tran P, et al. Heterozygous colon cancer‐associated mutations of SAMHD1 have functional significance. Proc Natl Acad Sci U S A. 2016;113:4723-4728.
dc.identifier.citedreference: Paul P, Rouas‐Freiss N, Khalil‐Daher I, et al. HLA‐G expression in melanoma: a way for tumor cells to escape from immunosurveillance. Proc Natl Acad Sci U S A. 1998;95:4510-4515.
dc.identifier.citedreference: Yan WH, Lin AF, Chang CC, Ferrone S. Induction of HLA‐G expression in a melanoma cell line OCM‐1A following the treatment with 5‐aza‐2’‐deoxycytidine. Cell Res. 2005;15:523-531.
dc.identifier.citedreference: Derré L, Corvaisier M, Charreau B, et al. Expression and release of HLA‐E by melanoma cells and melanocytes: potential impact on the response of cytotoxic effector cells. J Immunol. 2006;177:3100-3107.
dc.identifier.citedreference: Gerlini G, Tun‐Kyi A, Dudli C, Burg G, Pimpinelli N, Nestle FO. Metastatic melanoma secreted IL‐10 down‐regulates CD1 molecules on dendritic cells in metastatic tumor lesions. Am J Pathol. 2004;165:1853-1863.
dc.identifier.citedreference: Wennerberg E, Kremer V, Childs R, Lundqvist A. CXCL10‐induced migration of adoptively transferred human natural killer cells toward solid tumors causes regression of tumor growth in vivo. Cancer Immunol Immunother. 2015;64:225-235.
dc.identifier.citedreference: Therneau T. A package for survival analysis in S. Version 2.38; 2015.
dc.identifier.citedreference: Ni Y, Stingo F, Baladandayuthapani V. Bayesian graphical regression. J Am Stat Assoc. 2019;114:184-197.
dc.identifier.citedreference: Ni Y, Muller P, Wei L, Ji Y. Bayesian graphical models for computational network biology. BMC Bioinformatics. 2018;19:63.
dc.identifier.citedreference: Cun YP, Fröhlich H. Biomarker gene signature discovery integrating network knowledge. Biology. 2012;1:5-17.
dc.identifier.citedreference: Cun YP, Fröhlich H. Network and data integration for biomarker signature discovery via network smoothed T‐statistics. PLoS One. 2013;8:e73074.
dc.identifier.citedreference: Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002;23:70-86.
dc.identifier.citedreference: Do KA, Müller P, Tang F. A Bayesian mixture model for differential gene expression. J Royal Stat Soc Ser C (Appl Stat). 2005;54:627-644.
dc.identifier.citedreference: Apolloni J, Leguizamón G, Alba E. Two hybrid wrapper‐filter feature selection algorithms applied to high‐dimensional microarray experiments. Appl Soft Comput. 2016;38:922-932.
dc.identifier.citedreference: Kong Y, Yu T. A graph‐embedded deep feedforward network for disease outcome classification and feature selection using gene expression data. Bioinformatics. 2018;34(21):3727-3737. doi:10.1093/bioinformatics/bty429
dc.identifier.citedreference: Wei Z, Li HZ. A Markov random field model for network‐based analysis of genomic data. Bioinformatics. 2007;23:1537-1544.
dc.identifier.citedreference: Li CY, Li HZ. Network‐constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 2008;24:1175-1182.
dc.identifier.citedreference: Pan W, Xie BH, Shen XT. Incorporating predictor network in penalized regression with application to microarray data. Biometrics. 2010;66:474-484.
dc.identifier.citedreference: Wei P, Pan W. Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model. Bioinformatics. 2008;24(3):404-411. doi:10.1093/bioinformatics/btm612
dc.identifier.citedreference: Wei P, Pan W. Network‐based genomic discovery: application and comparison of Markov random‐field models. J Royal Stat Soc Ser C (Appl Stat). 2010;59:105-125. doi:10.1111/j.1467-9876.2009.00686.x
dc.identifier.citedreference: Li F, Zhang NR. Bayesian variable selection in structured high‐dimensional covariate spaces with applications in genomics. J Am Stat Assoc. 2010;105(491):1202-1214.
dc.identifier.citedreference: Stingo FC, Vannucci M. Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data. Bioinformatics. 2011;27:495-501.
dc.identifier.citedreference: Stingo FC, Chen YA, Tadesse MG, Vannucci M. Incorporating biological information into linear models: a Bayesian approach to the selection of pathways and genes. Ann Appl Stat. 2011;5(3):1978-2002.
dc.identifier.citedreference: Ročková V, George EI. EMVS: the EM approach to Bayesian variable selection. J Am Stat Assoc. 2014;109:828-846.
dc.identifier.citedreference: Sun H, Lin W, Feng R, Li H. Network‐regularized high‐dimensional Cox regression for analysis of genomic data. Stat Sin. 2014;24(3):1433-1459. doi:10.5705/ss.2012.317
dc.identifier.citedreference: Dona MSI, Prendergast LA, Mathivanan S, Keerthikumar S, Salim A. Powerful differential expression analysis incorporating network topology for next‐generation sequencing data. Bioinformatics. 2017;33(10):1505-1513. doi:10.1093/bioinformatics/btw833
dc.identifier.citedreference: Ren J, Du Y, Li S, Ma S, Jiang Y, Wu C. Robust network‐based regularization and variable selection for high‐dimensional genomic data in cancer prognosis. Genet Epidemiol. 2019;43(3):276-291. doi:10.1002/gepi.22194
dc.identifier.citedreference: Zhang W, Ota T, Shridhar V, Chien J, Wu B, Kuang R. Network‐based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment. PLoS Comput Biol. 2013;9(3):e1002975. doi:10.1371/journal.pcbi.1002975
dc.identifier.citedreference: Wei P, Pan W. Bayesian joint modeling of multiple gene networks and diverse genomic data to identify target genes of a transcription factor. Ann Appl Stat. 2012;6(1):334-355. doi:10.1214/11-AOAS502
dc.identifier.citedreference: Wu Z, Casciola‐Rosen L, Rosen A, Zeger SL. A Bayesian approach to restricted latent class models for scientifically structured clustering of multivariate binary outcomes. Biometrics. 2020. doi:10.1111/biom.13388
dc.identifier.citedreference: Zhou F, He K, Li Q, Chapkin RS, Ni Y. Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization. Biostatistics. 2021. doi:10.1093/biostatistics/kxab002
dc.identifier.citedreference: Lan Z, Zhao Y, Kang J, Yu T. Bayesian network feature finder (BANFF): an R package for gene network feature selection. Bioinformatics. 2016;32(23):3685-3687.
dc.identifier.citedreference: Zhao Y, Kang J, Yu TW. A Bayesian nonparametric mixture model for selecting genes and gene subnetworks. Ann Appl Stat. 2014;8:999.
dc.identifier.citedreference: Little RJA, Rubin DB. Statistical Analysis with Missing Data. Hoboken, NJ: John Wiley & Sons; 2014.
dc.identifier.citedreference: Efron B, Storey JD, Tibshirani R. Microarrays, empirical Bayes methods, and false discovery rates. Stanford University, Department of Statistics; 2001.
dc.identifier.citedreference: Liang F. A double Metropolis–Hastings sampler for spatial models with intractable normalizing constants. J Stat Comput Simul. 2010;80:1007-1022.
dc.identifier.citedreference: Neal RM. Markov chain sampling methods for Dirichlet process mixture models. J Comput Graph Stat. 2000;9(2):249-265.
dc.identifier.citedreference: Das J, Yu HY. HINT: high‐quality protein interactomes and their applications in understanding human disease. BMC Syst Biol. 2012;6:92.
dc.identifier.citedreference: Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E. 2004;70:066111.
dc.identifier.citedreference: Fraley C, Raftery AE. MCLUST: software for model‐based cluster analysis. J Classif. 1999;16:297-306.
dc.identifier.citedreference: Cancer Genome Atlas Network. Genomic classification of cutaneous melanoma. Cell. 2015;161:1681-1696.
dc.identifier.citedreference: Cerami E, Gao JJ, Dogrusoz U, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401-404.
dc.working.doi: NO [language: en]
dc.owningcollname: Interdisciplinary and Peer-Reviewed
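The abstract states that the class indicators are updated with the Swendsen‐Wang algorithm during posterior computation. The Python sketch below illustrates one generic Swendsen‐Wang sweep under stated assumptions; it is not the authors' BNC implementation. It assumes, for illustration only, a Potts‐type network prior p(z) ∝ exp(β Σ_{(i,j)∈E} 1[z_i = z_j]) over the class indicators z on the network edges E, a fixed interaction strength β, and a precomputed table of per-node log-likelihoods under each class. The function name `swendsen_wang_update` and its arguments are hypothetical. A node with a missing observation is assumed to contribute an all-zero log-likelihood row (its data integrated out), so its new class is driven by whichever cluster it is bonded into.

```python
import math
import random


def swendsen_wang_update(labels, edges, log_lik, beta, rng):
    """One Swendsen-Wang sweep (hypothetical sketch, not the BNC code).

    Assumes a Potts prior exp(beta * sum_{(i,j) in edges} [z_i == z_j]).
    labels:  current class indicator per node, values in 0..K-1
    edges:   list of (i, j) node pairs in the network
    log_lik: log_lik[i][k] = log-likelihood of node i's data under class k;
             use an all-zero row for a node whose observation is missing
    beta:    Potts interaction strength (>= 0)
    rng:     a random.Random instance
    """
    n = len(labels)
    K = len(log_lik[0])
    parent = list(range(n))  # union-find forest over nodes

    def find(i):
        # Return the root of i's cluster, halving paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: open a bond on each edge whose endpoints share a class,
    # independently with probability 1 - exp(-beta).
    p_bond = 1.0 - math.exp(-beta)
    for i, j in edges:
        if labels[i] == labels[j] and rng.random() < p_bond:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Group nodes into the clusters formed by the open bonds.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Step 2: relabel each cluster as a block, drawing class k with
    # probability proportional to the product of its members' likelihoods.
    new_labels = [0] * n
    for members in clusters.values():
        logp = [sum(log_lik[i][k] for i in members) for k in range(K)]
        m = max(logp)  # subtract the max for numerical stability
        weights = [math.exp(v - m) for v in logp]
        k_new = rng.choices(range(K), weights=weights)[0]
        for i in members:
            new_labels[i] = k_new
    return new_labels


if __name__ == "__main__":
    rng = random.Random(1)
    edges = [(0, 1), (1, 2), (2, 3)]
    # Node 1 is treated as missing: its log-likelihood row is all zeros.
    log_lik = [[0.0, 2.0], [0.0, 0.0], [0.0, 2.0], [1.5, 0.0]]
    z = [0, 0, 0, 0]
    for _ in range(5):
        z = swendsen_wang_update(z, edges, log_lik, beta=1.0, rng=rng)
    print(z)
```

Bonds only join neighbours that currently share a class, and each cluster is then relabelled as a whole. This block move is what lets a Swendsen‐Wang sampler change the class of a connected group of nodes in one step instead of flipping them one at a time.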

