Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data
dc.contributor.author | Wang, Sijian | en_US |
dc.contributor.author | Zhu, Ji | en_US |
dc.date.accessioned | 2010-04-01T15:49:02Z | |
dc.date.available | 2010-04-01T15:49:02Z | |
dc.date.issued | 2008-06 | en_US |
dc.identifier.citation | Wang, Sijian; Zhu, Ji (2008). "Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data." Biometrics 64(2): 440-448. <http://hdl.handle.net/2027.42/66311> | en_US |
dc.identifier.issn | 0006-341X | en_US |
dc.identifier.issn | 1541-0420 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/66311 | |
dc.identifier.uri | http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&list_uids=17970821&dopt=citation | en_US |
dc.description.abstract | Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural “group.” Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach. | en_US |
dc.format.extent | 1244380 bytes | |
dc.format.extent | 3110 bytes | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | text/plain | |
dc.publisher | Blackwell Publishing Inc | en_US |
dc.rights | ©2008, International Biometric Society | en_US |
dc.subject.other | EM Algorithm | en_US |
dc.subject.other | High-dimension Low Sample Size | en_US |
dc.subject.other | Microarray | en_US |
dc.subject.other | Model-based Clustering | en_US |
dc.subject.other | Regularization | en_US |
dc.subject.other | Variable Selection | en_US |
dc.title | Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data | en_US |
dc.type | Article | en_US |
dc.rights.robots | IndexNoFollow | en_US |
dc.subject.hlbsecondlevel | Mathematics | en_US |
dc.subject.hlbtoplevel | Science | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.contributor.affiliationum | Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A. | en_US |
dc.contributor.affiliationum | Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A. | en_US |
dc.identifier.pmid | 17970821 | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/66311/1/j.1541-0420.2007.00922.x.pdf | |
dc.identifier.doi | 10.1111/j.1541-0420.2007.00922.x | en_US |
dc.identifier.source | Biometrics | en_US |
dc.identifier.citedreference | Bickel, P. J. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, “naive Bayes,” and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010. | en_US |
dc.identifier.citedreference | Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37, 373–384. | en_US |
dc.identifier.citedreference | Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 39, 1–38. | en_US |
dc.identifier.citedreference | Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360. | en_US |
dc.identifier.citedreference | Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97, 611–631. | en_US |
dc.identifier.citedreference | Friedman, J. H. and Meulman, J. J. (2004). Clustering objects on subsets of attributes (with discussion). Journal of the Royal Statistical Society, Series B 66, 815–849. | en_US |
dc.identifier.citedreference | Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., and Bloomfield, C. D. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537. | en_US |
dc.identifier.citedreference | Hoff, P. D. (2006). Model-based subspace clustering. Bayesian Analysis 1, 321–344. | en_US |
dc.identifier.citedreference | Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C. R., Peterson, C., and Meltzer, P. S. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7, 673–679. | en_US |
dc.identifier.citedreference | Liu, J. S., Zhang, J. L., and Palumbo, M. J. (2003). Bayesian clustering with variable and transformation selection (with discussion). Bayesian Statistics 7, 249–275. | en_US |
dc.identifier.citedreference | Marron, J. and Todd, M. (2002). Distance weighted discrimination. Technical Report, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY. | en_US |
dc.identifier.citedreference | McLachlan, G. and Peel, D. (2002). Finite Mixture Models. New York: John Wiley & Sons. | en_US |
dc.identifier.citedreference | Meng, X. L. and Rubin, D. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80, 267–278. | en_US |
dc.identifier.citedreference | Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. Journal of Machine Learning Research 8, 1145–1164. | en_US |
dc.identifier.citedreference | Raftery, A. E. (2003). Discussion of “Bayesian clustering with variable and transformation selection” by Liu et al. Bayesian Statistics 7, 266–271. | en_US |
dc.identifier.citedreference | Raftery, A. E. and Dean, N. (2006). Variable selection for model-based clustering. Journal of the American Statistical Association 101, 168–178. | en_US |
dc.identifier.citedreference | Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464. | en_US |
dc.identifier.citedreference | Shen, X. and Ye, J. (2002). Adaptive model selection. Journal of the American Statistical Association 97, 210–221. | en_US |
dc.identifier.citedreference | Tadesse, M. G., Sha, N., and Vannucci, M. (2005). Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association 100, 602–617. | en_US |
dc.identifier.citedreference | Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58, 267–288. | en_US |
dc.identifier.citedreference | Yuan, M. and Lin, Y. (2007). On the nonnegative garrote estimator. Journal of the Royal Statistical Society, Series B 69, 143–161. | en_US |
dc.identifier.citedreference | Zhang, H. H. and Lu, W. (2007). Adaptive-LASSO for Cox's proportional hazard model. Biometrika 94, 691–703. | en_US |
dc.identifier.citedreference | Zhang, H. H., Liu, Y., Wu, Y., and Zhu, J. (2006). Variable selection for multicategory SVM via sup-norm regularization. Institute of Statistics Mimeo Series 2596, North Carolina State University, Raleigh, NC. | en_US |
dc.identifier.citedreference | Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research 7, 2541–2567. | en_US |
dc.identifier.citedreference | Zhao, P., Rocha, G., and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties. Technical Report 703, Department of Statistics, University of California at Berkeley. | en_US |
dc.identifier.citedreference | Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429. | en_US |
dc.identifier.citedreference | Zou, H. and Yuan, M. (2006). The F∞-norm support vector machine. Statistica Sinica, in press. | en_US |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |
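The abstract contrasts the classical L1-norm penalty with penalties that treat all parameters belonging to one variable as a natural group, so a variable is dropped only when its parameters across every cluster are shrunk to zero together. The sketch below is a rough illustration of that distinction, not the authors' estimator: the sup-norm grouping and the toy matrix of cluster means are assumptions chosen for demonstration.

```python
def l1_penalty(mu):
    # classical L1 penalty: shrinks each cluster-mean parameter
    # independently, so a variable can end up only partially removed
    return sum(abs(m) for row in mu for m in row)

def sup_norm_penalty(mu):
    # group-style (sup-norm) penalty: one term per variable (column),
    # so all of a variable's cluster means are driven to zero
    # together before the variable is declared noninformative
    n_clusters, n_vars = len(mu), len(mu[0])
    return sum(max(abs(mu[k][j]) for k in range(n_clusters))
               for j in range(n_vars))

# toy matrix of cluster means (2 clusters x 3 variables);
# variable 2 is noninformative (zero mean in every cluster)
mu = [[1.5, 0.0, 0.25],
      [-1.5, 0.0, -0.25]]
print(l1_penalty(mu))        # → 3.5
print(sup_norm_penalty(mu))  # → 1.75
```

Because the group penalty charges one term per column, a single tuning parameter zeroes out whole variables at once, which is the mechanism behind the abstract's claim that group penalties remove noninformative variables more effectively than the L1-norm approach.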