Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data
dc.contributor.author | Wang, Sijian | en_US |
dc.contributor.author | Zhu, Ji | en_US |
dc.date.accessioned | 2010-04-01T15:49:02Z | |
dc.date.available | 2010-04-01T15:49:02Z | |
dc.date.issued | 2008-06 | en_US |
dc.identifier.citation | Wang, Sijian; Zhu, Ji (2008). "Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data." Biometrics 64(2): 440-448. <http://hdl.handle.net/2027.42/66311> | en_US |
dc.identifier.issn | 0006-341X | en_US |
dc.identifier.issn | 1541-0420 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/66311 | |
dc.identifier.uri | http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&list_uids=17970821&dopt=citation | en_US |
dc.description.abstract | Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural “group.” Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach. | en_US |
dc.format.extent | 1244380 bytes | |
dc.format.extent | 3110 bytes | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | text/plain | |
dc.publisher | Blackwell Publishing Inc | en_US |
dc.rights | ©2008, International Biometric Society | en_US |
dc.subject.other | EM Algorithm | en_US |
dc.subject.other | High-dimension Low Sample Size | en_US |
dc.subject.other | Microarray | en_US |
dc.subject.other | Model-based Clustering | en_US |
dc.subject.other | Regularization | en_US |
dc.subject.other | Variable Selection | en_US |
dc.title | Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data | en_US |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | en_US |
dc.subject.hlbsecondlevel | Mathematics | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.contributor.affiliationum | Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A. | en_US |
dc.contributor.affiliationum | Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A. | en_US |
dc.identifier.pmid | 17970821 | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/66311/1/j.1541-0420.2007.00922.x.pdf | |
dc.identifier.doi | 10.1111/j.1541-0420.2007.00922.x | en_US |
dc.identifier.source | Biometrics | en_US |
dc.identifier.citedreference | Bickel, P. J. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, “naive Bayes,” and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010. | en_US |
dc.identifier.citedreference | Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37, 373–384. | en_US |
dc.identifier.citedreference | Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 39, 1–38. | en_US |
dc.identifier.citedreference | Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360. | en_US |
dc.identifier.citedreference | Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97, 611–631. | en_US |
dc.identifier.citedreference | Friedman, J. H. and Meulman, J. J. (2004). Clustering objects on subsets of attributes (with discussion). Journal of the Royal Statistical Society, Series B 66, 815–849. | en_US |
dc.identifier.citedreference | Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., and Bloomfield, C. D. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537. | en_US |
dc.identifier.citedreference | Hoff, P. D. (2006). Model-based subspace clustering. Bayesian Analysis 1, 321–344. | en_US |
dc.identifier.citedreference | Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C. R., Peterson, C., and Meltzer, P. S. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7, 673–679. | en_US |
dc.identifier.citedreference | Liu, J. S., Zhang, J. L., and Palumbo, M. J. (2003). Bayesian clustering with variable and transformation selection (with discussion). Bayesian Statistics 7, 249–275. | en_US |
dc.identifier.citedreference | Marron, J. and Todd, M. (2002). Distance weighted discrimination. Technical Report, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY. | en_US |
dc.identifier.citedreference | McLachlan, G. and Peel, D. (2002). Finite Mixture Models. New York: John Wiley & Sons. | en_US |
dc.identifier.citedreference | Meng, X. L. and Rubin, D. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80, 267–278. | en_US |
dc.identifier.citedreference | Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. Journal of Machine Learning Research 8, 1145–1164. | en_US |
dc.identifier.citedreference | Raftery, A. E. (2003). Discussion of “Bayesian clustering with variable and transformation selection” by Liu et al. Bayesian Statistics 7, 266–271. | en_US |
dc.identifier.citedreference | Raftery, A. E. and Dean, N. (2006). Variable selection for model-based clustering. Journal of the American Statistical Association 101, 168–178. | en_US |
dc.identifier.citedreference | Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464. | en_US |
dc.identifier.citedreference | Shen, X. and Ye, J. (2002). Adaptive model selection. Journal of the American Statistical Association 97, 210–221. | en_US |
dc.identifier.citedreference | Tadesse, M. G., Sha, N., and Vannucci, M. (2005). Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association 100, 602–617. | en_US |
dc.identifier.citedreference | Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58, 267–288. | en_US |
dc.identifier.citedreference | Yuan, M. and Lin, Y. (2007). On the nonnegative garrote estimator. Journal of the Royal Statistical Society, Series B 69, 143–161. | en_US |
dc.identifier.citedreference | Zhang, H. H. and Lu, W. (2007). Adaptive-LASSO for Cox's proportional hazard model. Biometrika 94, 691–703. | en_US |
dc.identifier.citedreference | Zhang, H. H., Liu, Y., Wu, Y., and Zhu, J. (2006). Variable selection for multicategory SVM via sup-norm regularization. Institute of Statistics Mimeo Series 2596, North Carolina State University, Raleigh, NC. | en_US |
dc.identifier.citedreference | Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research 7, 2541–2567. | en_US |
dc.identifier.citedreference | Zhao, P., Rocha, G., and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties. Technical Report 703, Department of Statistics, University of California at Berkeley. | en_US |
dc.identifier.citedreference | Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429. | en_US |
dc.identifier.citedreference | Zou, H. and Yuan, M. (2006). The F∞-norm support vector machine. Statistica Sinica, in press. | en_US |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |
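The abstract contrasts the classical L1-norm penalty with penalties that treat all parameters belonging to one variable as a natural group, so a variable is dropped only when its parameters across every cluster are shrunk to zero together. The sketch below is a rough illustration of that distinction, not the authors' estimator: the sup-norm grouping and the toy matrix of cluster means are assumptions chosen for demonstration.

```python
def l1_penalty(mu):
    # classical L1 penalty: shrinks each cluster-mean parameter
    # independently, so a variable can end up only partially removed
    return sum(abs(m) for row in mu for m in row)

def sup_norm_penalty(mu):
    # group-style (sup-norm) penalty: one term per variable (column),
    # so all of a variable's cluster means are driven to zero
    # together before the variable is declared noninformative
    n_clusters, n_vars = len(mu), len(mu[0])
    return sum(max(abs(mu[k][j]) for k in range(n_clusters))
               for j in range(n_vars))

# toy matrix of cluster means (2 clusters x 3 variables);
# variable 2 is noninformative (zero mean in every cluster)
mu = [[1.5, 0.0, 0.25],
      [-1.5, 0.0, -0.25]]
print(l1_penalty(mu))        # → 3.5
print(sup_norm_penalty(mu))  # → 1.75
```

Because the group penalty charges one term per column, a single tuning parameter zeroes out whole variables at once, which is the mechanism behind the abstract's claim that group penalties remove noninformative variables more effectively than the L1-norm approach.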