
Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data

dc.contributor.author: Wang, Sijian
dc.contributor.author: Zhu, Ji
dc.date.accessioned: 2010-04-01T15:49:02Z
dc.date.available: 2010-04-01T15:49:02Z
dc.date.issued: 2008-06
dc.identifier.citation: Wang, Sijian; Zhu, Ji (2008). "Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data." Biometrics 64(2): 440-448. <http://hdl.handle.net/2027.42/66311>
dc.identifier.issn: 0006-341X
dc.identifier.issn: 1541-0420
dc.identifier.uri: https://hdl.handle.net/2027.42/66311
dc.identifier.uri: http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=retrieve&db=pubmed&list_uids=17970821&dopt=citation
dc.description.abstract: Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural "group." Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.
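The grouped-penalty idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): in a penalized M-step for a Gaussian mixture, the cluster means of one variable are shrunk together as a single group, so a noninformative variable has all of its cluster means driven to zero at once and is thereby deselected. The `group_soft_threshold` helper and the toy numbers below are hypothetical.

```python
import numpy as np

def group_soft_threshold(mu, lam):
    """Shrink the vector of cluster means for one variable as a group
    (group-lasso style): if the group norm falls below lam, every
    cluster mean for that variable is zeroed, i.e. the variable is
    declared noninformative."""
    norm = np.linalg.norm(mu)
    if norm <= lam:
        return np.zeros_like(mu)
    return (1.0 - lam / norm) * mu

# Toy example: 3 clusters, 4 variables (rows: clusters, columns: variables).
# Data are assumed standardized, so a variable whose means are all zero
# does not help separate the clusters.
means = np.array([
    [ 2.0,  0.1, -1.5,  0.05],
    [-2.0,  0.0,  1.5,  0.00],
    [ 0.0, -0.1,  0.0, -0.05],
])
lam = 0.5
shrunk = np.column_stack([
    group_soft_threshold(means[:, j], lam) for j in range(means.shape[1])
])
# Variables whose whole group of means survives shrinkage stay selected.
selected = [j for j in range(shrunk.shape[1]) if np.any(shrunk[:, j] != 0)]
print(selected)  # → [0, 2]
```

Contrast with the classical L1 penalty: penalizing each mean separately can zero some cluster means of a variable but not others, leaving the variable half-selected; the group treatment removes or keeps it as a unit.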
dc.format.extent: 1244380 bytes
dc.format.extent: 3110 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: text/plain
dc.publisher: Blackwell Publishing Inc
dc.rights: ©2008, International Biometric Society
dc.subject.other: EM Algorithm
dc.subject.other: High-dimension Low Sample Size
dc.subject.other: Microarray
dc.subject.other: Model-based Clustering
dc.subject.other: Regularization
dc.subject.other: Variable Selection
dc.title: Variable Selection for Model-Based High-Dimensional Clustering and Its Application to Microarray Data
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Mathematics
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.contributor.affiliationum: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
dc.contributor.affiliationum: Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
dc.identifier.pmid: 17970821
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/66311/1/j.1541-0420.2007.00922.x.pdf
dc.identifier.doi: 10.1111/j.1541-0420.2007.00922.x
dc.identifier.source: Biometrics
dc.identifier.citedreference: Bickel, P. J. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, "naive Bayes," and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010.
dc.identifier.citedreference: Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37, 373–384.
dc.identifier.citedreference: Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 39, 1–38.
dc.identifier.citedreference: Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.
dc.identifier.citedreference: Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97, 611–631.
dc.identifier.citedreference: Friedman, J. H. and Meulman, J. J. (2004). Clustering objects on subsets of attributes (with discussion). Journal of the Royal Statistical Society, Series B 66, 815–849.
dc.identifier.citedreference: Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., and Bloomfield, C. D. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537.
dc.identifier.citedreference: Hoff, P. D. (2006). Model-based subspace clustering. Bayesian Analysis 1, 321–344.
dc.identifier.citedreference: Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C. R., Peterson, C., and Meltzer, P. S. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7, 673–679.
dc.identifier.citedreference: Liu, J. S., Zhang, J. L., and Palumbo, M. J. (2003). Bayesian clustering with variable and transformation selection (with discussion). Bayesian Statistics 7, 249–275.
dc.identifier.citedreference: Marron, J. and Todd, M. (2002). Distance weighted discrimination. Technical Report, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY.
dc.identifier.citedreference: McLachlan, G. and Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons.
dc.identifier.citedreference: Meng, X. L. and Rubin, D. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80, 267–278.
dc.identifier.citedreference: Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. Journal of Machine Learning Research 8, 1145–1164.
dc.identifier.citedreference: Raftery, A. E. (2003). Discussion of "Bayesian clustering with variable and transformation selection" by Liu et al. Bayesian Statistics 7, 266–271.
dc.identifier.citedreference: Raftery, A. E. and Dean, N. (2006). Variable selection for model-based clustering. Journal of the American Statistical Association 101, 168–178.
dc.identifier.citedreference: Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464.
dc.identifier.citedreference: Shen, X. and Ye, J. (2002). Adaptive model selection. Journal of the American Statistical Association 97, 210–221.
dc.identifier.citedreference: Tadesse, M. G., Sha, N., and Vannucci, M. (2005). Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association 100, 602–617.
dc.identifier.citedreference: Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58, 267–288.
dc.identifier.citedreference: Yuan, M. and Lin, Y. (2007). On the nonnegative garrote estimator. Journal of the Royal Statistical Society, Series B 69, 143–161.
dc.identifier.citedreference: Zhang, H. H. and Lu, W. (2007). Adaptive Lasso for Cox's proportional hazards model. Biometrika 94, 691–703.
dc.identifier.citedreference: Zhang, H. H., Liu, Y., Wu, Y., and Zhu, J. (2006). Variable selection for multicategory SVM via sup-norm regularization. Institute of Statistics Mimeo Series 2596, North Carolina State University, Raleigh, NC.
dc.identifier.citedreference: Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research 7, 2541–2567.
dc.identifier.citedreference: Zhao, P., Rocha, G., and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties. Technical Report 703, Department of Statistics, University of California at Berkeley.
dc.identifier.citedreference: Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.
dc.identifier.citedreference: Zou, H. and Yuan, M. (2006). The F∞-norm support vector machine. Statistica Sinica, in press.
dc.owningcollname: Interdisciplinary and Peer-Reviewed

