Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data

dc.contributor.author: Maity, Arnab Kumar
dc.contributor.author: Bhattacharya, Anirban
dc.contributor.author: Mallick, Bani K.
dc.contributor.author: Baladandayuthapani, Veerabhadran
dc.date.accessioned: 2020-03-17T18:33:47Z
dc.date.available: WITHHELD_13_MONTHS
dc.date.available: 2020-03-17T18:33:47Z
dc.date.issued: 2020-03
dc.identifier.citation: Maity, Arnab Kumar; Bhattacharya, Anirban; Mallick, Bani K.; Baladandayuthapani, Veerabhadran (2020). "Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data." Biometrics 76(1): 316-325.
dc.identifier.issn: 0006-341X
dc.identifier.issn: 1541-0420
dc.identifier.uri: https://hdl.handle.net/2027.42/154486
dc.description.abstract: Accurate prognostic prediction using molecular information is a challenging area of research and is essential for developing precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, such as the survival time of patients. There are considerable statistical and computational challenges due to the high dimension of the problem. Furthermore, data are available for different tumor types; hence data integration across tumors is desirable. Censored survival outcomes add a further level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models that accommodate all of these challenges. We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated “The Cancer Proteome Atlas” (TCPA), which contains reverse‐phase protein arrays–based high‐quality protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors through the correlated prior structure.
dc.publisher: Springer Science & Business Media
dc.publisher: Wiley Periodicals, Inc.
dc.subject.other: TCPA
dc.subject.other: AFT regression
dc.subject.other: borrowing strength
dc.subject.other: horseshoe
dc.subject.other: pan‐cancer model
dc.title: Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Mathematics
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/154486/1/biom13132_am.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/154486/2/biom13132.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/154486/3/biom13132-sup-0003-supmat.pdf
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/154486/4/biom13132-sup-0002-supplementary-v6-22Jul2019.pdf
dc.identifier.doi: 10.1111/biom.13132
dc.identifier.source: Biometrics
dc.owningcollname: Interdisciplinary and Peer-Reviewed