Feature selection and classification over the network with missing node observations
dc.contributor.author | Jin, Zhuxuan | |
dc.contributor.author | Kang, Jian | |
dc.contributor.author | Yu, Tianwei | |
dc.date.accessioned | 2022-04-08T18:04:00Z | |
dc.date.available | 2023-04-08 14:03:58 | en |
dc.date.available | 2022-04-08T18:04:00Z | |
dc.date.issued | 2022-03-30 | |
dc.identifier.citation | Jin, Zhuxuan; Kang, Jian; Yu, Tianwei (2022). "Feature selection and classification over the network with missing node observations." Statistics in Medicine 41(7): 1242-1262. | |
dc.identifier.issn | 0277-6715 | |
dc.identifier.issn | 1097-0258 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/172014 | |
dc.description.abstract | Jointly analyzing transcriptomic data and existing biological networks can yield more robust and informative feature selection results, as well as a better understanding of the biological mechanisms. Selecting and classifying node features over genome‐scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have two critical drawbacks. First, they do not allow flexible modeling of different subtypes of selected nodes. Second, they ignore nodes with missing values, which is very likely to increase bias in estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model that incorporates the network structure is developed for the class indicators. For posterior computation, we use the Swendsen‐Wang algorithm to update class indicators efficiently. BNC naturally handles missing values within the Bayesian modeling framework, which improves node classification accuracy and reduces bias in estimating gene effects. We demonstrate the advantages of our methods via extensive simulation studies and an analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas. | |
dc.publisher | John Wiley & Sons | |
dc.subject.other | gene networks | |
dc.subject.other | Bayesian nonparametrics | |
dc.subject.other | false discovery rate control | |
dc.subject.other | feature selection | |
dc.title | Feature selection and classification over the network with missing node observations | |
dc.type | Article | |
dc.rights.robots | IndexNoFollow | |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | |
dc.subject.hlbsecondlevel | Public Health | |
dc.subject.hlbsecondlevel | Medicine (General) | |
dc.subject.hlbtoplevel | Health Sciences | |
dc.subject.hlbtoplevel | Science | |
dc.subject.hlbtoplevel | Social Sciences | |
dc.description.peerreviewed | Peer Reviewed | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/172014/1/sim9267_am.pdf | |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/172014/2/sim9267.pdf | |
dc.identifier.doi | 10.1002/sim.9267 | |
dc.identifier.source | Statistics in Medicine | |
dc.identifier.citedreference | Demidenko R, Razanauskas D, Daniunaite K, Lazutka JR, Jankevicius F, Jarmalaite S. Frequent down‐regulation of ABC transporter genes in prostate cancer. BMC Cancer. 2015; 15: 683. | |
dc.identifier.citedreference | Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007; 23: 257 ‐ 258. | |
dc.identifier.citedreference | Taylor RC, Patel A, Panageas KS, Busam KJ, Brady MS. Tumor‐infiltrating lymphocytes predict sentinel lymph node positivity in patients with cutaneous melanoma. J Clin Oncol Offic J Am Soc Clin Oncol. 2007; 25: 869 ‐ 875. | |
dc.identifier.citedreference | Lardone RD, Plaisier SB, Navarrete MS, et al. Cross‐platform comparison of independent datasets identifies an immune signature associated with improved survival in metastatic melanoma. Oncotarget. 2016; 7: 14415 ‐ 14428. | |
dc.identifier.citedreference | Elsnerova K, Mohelnikova‐Duchonova B, Cerovska E, et al. Gene expression of membrane transporters: Importance for prognosis and progression of ovarian carcinoma. Oncol Rep. 2016; 35: 2159 ‐ 2170. | |
dc.identifier.citedreference | Zhou W, Feng XL, Li H, et al. Functional evidence for a nasopharyngeal carcinoma‐related gene BCAT1 located at 12p12. Oncol Res. 2007; 16: 405 ‐ 413. | |
dc.identifier.citedreference | Warnecke‐Eberz U, Metzger R, Hölscher AH, Drebber U, Bollschweiler E. Diagnostic marker signature for esophageal cancer from transcriptome analysis. Tumour Biol J Int Soc Oncodevelop Biol Med. 2016; 37: 6349 ‐ 6358. | |
dc.identifier.citedreference | Guo X, Li HW, Fei F, et al. Genetic variations in SLC3A2/CD98 gene as prognosis predictors in non‐small cell lung cancer. Mol Carcinog. 2015; 54 ( Suppl 1 ): E52 ‐ E60. | |
dc.identifier.citedreference | Estrach S, Lee SA, Boulter E, et al. CD98hc (SLC3A2) loss protects against ras‐driven tumorigenesis by modulating integrin‐mediated mechanotransduction. Cancer Res. 2014; 74: 6878 ‐ 6889. | |
dc.identifier.citedreference | Bhat M, Skill N, Marcus V, et al. Decreased PCSK9 expression in human hepatocellular carcinoma. BMC Gastroenterol. 2015; 15: 176. | |
dc.identifier.citedreference | Huang JF, Li L, Lian JH, et al. Tumor‐induced hyperlipidemia contributes to tumor growth. Cell Rep. 2016; 15: 336 ‐ 348. | |
dc.identifier.citedreference | Zlotnik A, Yoshie O. The chemokine superfamily revisited. Immunity. 2012; 36: 705 ‐ 716. | |
dc.identifier.citedreference | Jacquelot N, Enot DP, Flament C, et al. Chemokine receptor patterns in lymphocytes mirror metastatic spreading in melanoma. J Clin Investigat. 2016; 126: 921 ‐ 937. | |
dc.identifier.citedreference | Zhang JF, Chen Y, Lin GS, et al. High IFIT1 expression predicts improved clinical outcome, and IFIT1 along with MGMT more accurately predicts prognosis in newly diagnosed glioblastoma. Human Pathol. 2016; 52: 136 ‐ 144. | |
dc.identifier.citedreference | Lloyd MC, Szekeres K, Brown JS, Blanck G. Class II transactivator expression in melanoma cells facilitates T‐cell engulfment. Anticancer Res. 2015; 35: 25 ‐ 29. | |
dc.identifier.citedreference | Martinet L, Le Guellec S, Filleron T, et al. High endothelial venules (HEVs) in human melanoma lesions: major gateways for tumor‐infiltrating lymphocytes. Oncoimmunology. 2012; 1: 829 ‐ 839. | |
dc.identifier.citedreference | Liu WT, Peng YH, Tobin DJ. A new 12‐gene diagnostic biomarker signature of melanoma revealed by integrated microarray analysis. PeerJ. 2013; 1: e49. | |
dc.identifier.citedreference | Rentoft M, Lindell K, Tran P, et al. Heterozygous colon cancer‐associated mutations of SAMHD1 have functional significance. Proc Nat Acad Sci U S A. 2016; 113: 4723 ‐ 4728. | |
dc.identifier.citedreference | Paul P, Rouas‐Freiss N, Khalil‐Daher I, et al. HLA‐G expression in melanoma: a way for tumor cells to escape from immunosurveillance. Proc Nat Acad Sci U S A. 1998; 95: 4510 ‐ 4515. | |
dc.identifier.citedreference | Yan WH, Lin AF, Chang CC, Ferrone S. Induction of HLA‐G expression in a melanoma cell line OCM‐1A following the treatment with 5‐aza‐2’‐deoxycytidine. Cell Res. 2005; 15: 523 ‐ 531. | |
dc.identifier.citedreference | Derré L, Corvaisier M, Charreau B, et al. Expression and release of HLA‐E by melanoma cells and melanocytes: potential impact on the response of cytotoxic effector cells. J Immunol. 2006; 177: 3100 ‐ 3107. | |
dc.identifier.citedreference | Gerlini G, Tun‐Kyi A, Dudli C, Burg G, Pimpinelli N, Nestle FO. Metastatic melanoma secreted IL‐10 down‐regulates CD1 molecules on dendritic cells in metastatic tumor lesions. Am J Pathol. 2004; 165: 1853 ‐ 1863. | |
dc.identifier.citedreference | Wennerberg E, Kremer V, Childs R, Lundqvist A. CXCL10‐induced migration of adoptively transferred human natural killer cells toward solid tumors causes regression of tumor growth in vivo. Cancer Immunol Immunother CII. 2015; 64: 225 ‐ 235. | |
dc.identifier.citedreference | Therneau T. A package for survival analysis in S. Version 2.38; 2015. | |
dc.identifier.citedreference | Ni Y, Stingo F, Baladandayuthapani V. Bayesian graphical regression. J Am Stat Assoc. 2019; 114: 184 ‐ 197. | |
dc.identifier.citedreference | Ni Y, Muller P, Wei L, Ji Y. Bayesian graphical models for computational network biology. BMC Bioinform. 2018; 19: 63. | |
dc.identifier.citedreference | Cun YP, Fröhlich H. Biomarker gene signature discovery integrating network knowledge. Biology. 2012; 1: 5 ‐ 17. | |
dc.identifier.citedreference | Cun YP, Fröhlich H. Network and data integration for biomarker signature discovery via network smoothed T‐statistics. PLoS One. 2013; 8: e73074. | |
dc.identifier.citedreference | Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002; 23: 70 ‐ 86. | |
dc.identifier.citedreference | Do KA, Müller P, Tang F. A Bayesian mixture model for differential gene expression. J Royal Stat Soc Ser C (Appl Stat). 2005; 54: 627 ‐ 644. | |
dc.identifier.citedreference | Apolloni J, Leguizamón G, Alba E. Two hybrid wrapper‐filter feature selection algorithms applied to high‐dimensional microarray experiments. Appl Soft Comput. 2016; 38: 922 ‐ 932. | |
dc.identifier.citedreference | Kong Y, Yu T. A graph‐embedded deep feedforward network for disease outcome classification and feature selection using gene expression data. Bioinformatics. 2018; 34 ( 21 ): 3727 ‐ 3737. doi: 10.1093/bioinformatics/bty429 | |
dc.identifier.citedreference | Wei Z, Li HZ. A Markov random field model for network‐based analysis of genomic data. Bioinformatics. 2007; 23: 1537 ‐ 1544. | |
dc.identifier.citedreference | Li CY, Li HZ. Network‐constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 2008; 24: 1175 ‐ 1182. | |
dc.identifier.citedreference | Pan W, Xie BH, Shen XT. Incorporating predictor network in penalized regression with application to microarray data. Biometrics. 2010; 66: 474 ‐ 484. | |
dc.identifier.citedreference | Wei P, Pan W. Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model. Bioinformatics. 2008; 24 ( 3 ): 404 ‐ 411. doi: 10.1093/bioinformatics/btm612 | |
dc.identifier.citedreference | Wei P, Pan W. Network‐based genomic discovery: application and comparison of Markov random‐field models. J Royal Stat Soc Ser C‐Appl Stat. 2010; 59: 105 ‐ 125. doi: 10.1111/j.1467-9876.2009.00686.x | |
dc.identifier.citedreference | Li F, Zhang NR. Bayesian variable selection in structured high‐dimensional covariate spaces with applications in genomics. J Am Stat Assoc. 2010; 105 ( 491 ): 1202 ‐ 1214. | |
dc.identifier.citedreference | Stingo FC, Vannucci M. Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data. Bioinformatics. 2011; 27: 495 ‐ 501. | |
dc.identifier.citedreference | Stingo FC, Chen YA, Tadesse MG, Vannucci M. Incorporating biological information into linear models: a Bayesian approach to the selection of pathways and genes. Ann Appl Stat. 2011; 5 ( 3 ): 1978 ‐ 2002. | |
dc.identifier.citedreference | Ročková V, George EI. EMVS: the EM approach to Bayesian variable selection. J Am Stat Assoc. 2014; 109: 828 ‐ 846. | |
dc.identifier.citedreference | Sun H, Lin W, Feng R, Li H. Network‐regularized high‐dimensional cox regression for analysis of genomic data. Stat Sin. 2014; 24 ( 3 ): 1433 ‐ 1459. doi: 10.5705/ss.2012.317 | |
dc.identifier.citedreference | Dona MSI, Prendergast LA, Mathivanan S, Keerthikumar S, Salim A. Powerful differential expression analysis incorporating network topology for next‐generation sequencing data. Bioinformatics. 2017; 33 ( 10 ): 1505 ‐ 1513. doi: 10.1093/bioinformatics/btw833 | |
dc.identifier.citedreference | Ren J, Du Y, Li S, Ma S, Jiang Y, Wu C. Robust network‐based regularization and variable selection for high‐dimensional genomic data in cancer prognosis. Genet Epidemiol. 2019; 43 ( 3 ): 276 ‐ 291. doi: 10.1002/gepi.22194 | |
dc.identifier.citedreference | Zhang W, Ota T, Shridhar V, Chien J, Wu B, Kuang R. Network‐based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment. PLoS Comput Biol. 2013; 9 ( 3 ): e1002975. doi: 10.1371/journal.pcbi.1002975 | |
dc.identifier.citedreference | Wei P, Pan W. Bayesian joint modeling of multiple gene networks and diverse genomic data to identify target genes of a transcription factor. Ann Appl Stat. 2012; 6 ( 1 ): 334 ‐ 355. doi: 10.1214/11-Aoas502 | |
dc.identifier.citedreference | Wu Z, Casciola‐Rosen L, Rosen A, Zeger SL. A Bayesian approach to restricted latent class models for scientifically structured clustering of multivariate binary outcomes. Biometrics. 2020. doi: 10.1111/biom.13388 | |
dc.identifier.citedreference | Zhou F, He K, Li Q, Chapkin RS, Ni Y. Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization. Biostatistics. 2021. doi: 10.1093/biostatistics/kxab002 | |
dc.identifier.citedreference | Lan Z, Zhao Y, Kang J, Yu T. Bayesian network feature finder (BANFF): an R package for gene network feature selection. Bioinformatics. 2016; 32 ( 23 ): 3685 ‐ 3687. | |
dc.identifier.citedreference | Zhao Y, Kang J, Yu TW. A Bayesian nonparametric mixture model for selecting genes and gene subnetworks. Ann Appl Stat. 2014; 8: 999. | |
dc.identifier.citedreference | Little RJA, Rubin DB. Statistical Analysis with Missing Data. Hoboken, NJ: John Wiley & Sons; 2014. | |
dc.identifier.citedreference | Efron B, Storey JD, Tibshirani R. Microarrays, empirical Bayes methods, and false discovery rates. 2001; Stanford University, Department of Statistics. | |
dc.identifier.citedreference | Liang F. A double Metropolis–Hastings sampler for spatial models with intractable normalizing constants. J Stat Comput Simul. 2010; 80: 1007 ‐ 1022. | |
dc.identifier.citedreference | Neal RM. Markov chain sampling methods for Dirichlet process mixture models. J Comput Graph Stat. 2000; 9: 249 ‐ 265. | |
dc.identifier.citedreference | Das J, Yu HY. HINT: high‐quality protein interactomes and their applications in understanding human disease. BMC Syst Biol. 2012; 6: 92. | |
dc.identifier.citedreference | Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E. 2004; 70: 066111. | |
dc.identifier.citedreference | Fraley C, Raftery AE. MCLUST: software for model‐based cluster analysis. J Classif. 1999; 16: 297 ‐ 306. | |
dc.identifier.citedreference | Cancer Genome Atlas Network. Genomic classification of cutaneous melanoma. Cell. 2015; 161: 1681 ‐ 1696. | |
dc.identifier.citedreference | Cerami E, Gao JJ, Dogrusoz U, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012; 2: 401 ‐ 404. | |
dc.working.doi | NO | en |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |