Cluster non‐Gaussian functional data

Zhong, Qingzhi; Lin, Huazhen; Li, Yi


dc.contributor.author	Zhong, Qingzhi
dc.contributor.author	Lin, Huazhen
dc.contributor.author	Li, Yi
dc.date.accessioned	2021-10-05T15:05:33Z
dc.date.available	2022-10-05 11:05:30	en
dc.date.available	2021-10-05T15:05:33Z
dc.date.issued	2021-09
dc.identifier.citation	Zhong, Qingzhi; Lin, Huazhen; Li, Yi (2021). "Cluster non‐Gaussian functional data." Biometrics 77(3): 852-865.
dc.identifier.issn	0006-341X
dc.identifier.issn	1541-0420
dc.identifier.uri	https://hdl.handle.net/2027.42/170210
dc.description.abstract	Gaussian distributions have commonly been assumed when clustering functional data. When the normality assumption fails, biased results follow. A further challenge is that the number of clusters is often unknown a priori. This paper focuses on clustering non‐Gaussian functional data without prior knowledge of the number of clusters. We introduce a semiparametric mixed normal transformation model to accommodate non‐Gaussian functional data, and propose a penalized approach to simultaneously estimate the parameters, the transformation function, and the number of clusters. The estimators are shown to be consistent and asymptotically normal. The practical utility of the methods is confirmed via simulations as well as an application to an Alzheimer’s disease study. The proposed method yields substantially lower classification error than existing methods. Data used in preparation of this paper were obtained from the Alzheimer’s Disease Neuroimaging Initiative database.
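The abstract's core idea (transform non‐Gaussian data so that a Gaussian mixture fits, then let the data choose the number of clusters) can be illustrated with a deliberately simplified sketch. It is not the authors' method: the paper estimates the transformation nonparametrically for functional data and selects the number of clusters with a penalized EM algorithm. This sketch instead assumes a known log transformation, works with one hypothetical positive scalar summary per subject, fits a one‐dimensional Gaussian mixture by plain EM, and compares the Schwarz (1978) criterion over a small grid of cluster counts. The names fit_gmm_1d, bic and quantile_sample are illustrative only.

```python
import math
from statistics import NormalDist


def _normal_logpdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def _e_step(x, w, mu, var):
    """Responsibilities and mixture log-likelihood for the current parameters."""
    resp, total = [], 0.0
    for v in x:
        logs = [math.log(w[j]) + _normal_logpdf(v, mu[j], var[j]) for j in range(len(w))]
        m = max(logs)
        lse = m + math.log(sum(math.exp(l - m) for l in logs))  # log-sum-exp
        total += lse
        resp.append([math.exp(l - lse) for l in logs])
    return resp, total


def fit_gmm_1d(x, k, n_iter=500, tol=1e-8):
    """EM for a 1-D k-component Gaussian mixture; returns (weights, means, variances, loglik)."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")
    xs = sorted(x)
    # Initialise the means from k contiguous chunks of the sorted data.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    mu = [sum(c) / len(c) for c in chunks]
    m0 = sum(x) / n
    v0 = sum((v - m0) ** 2 for v in x) / n
    floor = max(1e-3 * v0, 1e-12)  # variance floor guards against degenerate components
    var = [max(v0 / k, floor)] * k
    w = [1.0 / k] * k
    resp, ll = _e_step(x, w, mu, var)
    for _ in range(n_iter):
        # M-step: weighted proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)
            w[j] = nj / n
            mu[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var[j] = max(sum(r[j] * (v - mu[j]) ** 2 for r, v in zip(resp, x)) / nj, floor)
        resp, new_ll = _e_step(x, w, mu, var)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return w, mu, var, ll


def bic(loglik, k, n):
    """Schwarz criterion; a 1-D k-component mixture has 3k - 1 free parameters."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


def quantile_sample(mu, sd, n):
    """Deterministic stand-in for a normal sample: evenly spaced normal quantiles."""
    nd = NormalDist(mu, sd)
    return [nd.inv_cdf((i + 0.5) / n) for i in range(n)]


# Hypothetical right-skewed, positive subject-level summaries from two groups.
raw = [math.exp(v) for v in quantile_sample(0.0, 0.5, 50) + quantile_sample(6.0, 0.5, 50)]
z = [math.log(v) for v in raw]  # assumed-known transformation back to a normal mixture
scores = {k: bic(fit_gmm_1d(z, k)[3], k, len(z)) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

With groups this well separated on the transformed scale, the two‐component fit should have a far smaller BIC than the one‐component fit. On real functional data neither the transformation nor the number of clusters is known, which is what the paper's penalized semiparametric approach is designed to estimate.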
dc.publisher	Wiley Periodicals, Inc.
dc.subject.other	functional principal component analysis
dc.subject.other	nonparametric transformation model
dc.subject.other	penalized EM algorithm
dc.subject.other	non‐Gaussian functional data
dc.subject.other	clustering analysis
dc.title	Cluster non‐Gaussian functional data
dc.type	Article
dc.rights.robots	IndexNoFollow
dc.subject.hlbsecondlevel	Mathematics
dc.subject.hlbtoplevel	Science
dc.description.peerreviewed	Peer Reviewed
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/170210/1/biom13349-sup-0001-SuppMat.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/170210/2/biom13349.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/170210/3/biom13349_am.pdf
dc.identifier.doi	10.1111/biom.13349
dc.identifier.source	Biometrics
dc.identifier.citedreference	Peng, J. and Müller, H.‐G. ( 2008 ) Distance‐based clustering of sparsely observed stochastic processes, with applications to online auctions. The Annals of Applied Statistics, 2, 1056 – 1077.
dc.identifier.citedreference	Jacques, J. and Preda, C. ( 2014 ) Functional data clustering: a survey. Advances in Data Analysis and Classification, 8, 231 – 255.
dc.identifier.citedreference	James, G.M., Hastie, T.J. and Sugar, C.A. ( 2000 ) Principal component models for sparse functional data. Biometrika, 87, 587 – 602.
dc.identifier.citedreference	James, G.M. and Sugar, C.A. ( 2003 ) Clustering for sparsely sampled functional data. Journal of the American Statistical Association, 98, 397 – 408.
dc.identifier.citedreference	Li, Y. and Hsing, T. ( 2010 ) Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. The Annals of Statistics, 38, 3321 – 3351.
dc.identifier.citedreference	Lin, Z., Müller, H.‐G. and Yao, F. ( 2018 ) Mixture inner product spaces and their application to functional data analysis. The Annals of Statistics, 46, 370 – 400.
dc.identifier.citedreference	Lin, H., Zhou, X.‐H. and Li, G. ( 2012 ) A direct semiparametric receiver operating characteristic curve regression with unknown link and baseline functions. Statistica Sinica, 22, 1427 – 1456.
dc.identifier.citedreference	Liu, X. and Yang, M.C. ( 2009 ) Simultaneous curve registration and clustering for functional data. Computational Statistics & Data Analysis, 53, 1361 – 1376.
dc.identifier.citedreference	Liu, J.S., Zhang, J.L., Palumbo, M.J. and Lawrence, C.E. ( 2003 ) Bayesian clustering with variable and transformation selections. Bayesian Statistics, 7, 249 – 275.
dc.identifier.citedreference	Ma, L., Hu, T. and Sun, J. ( 2015 ) Sieve maximum likelihood regression analysis of dependent current status data. Biometrika, 102, 731 – 738.
dc.identifier.citedreference	Mueller, S., Weiner, M., Thal, L., Petersen, R., Jack, C., Jagust, W., Trojanowski, J., Toga, A. and Beckett, L. ( 2005 ) The Alzheimer’s Disease Neuroimaging Initiative. Neuroimaging Clinics of North America, 15, 869 – 877.
dc.identifier.citedreference	Ramsay, J.O. and Silverman, B.W. ( 2005 ). Functional Data Analysis. Berlin: Springer.
dc.identifier.citedreference	Rivera‐García, D., García‐Escudero, L.A., Mayo‐Iscar, A. and Ortega, J. ( 2019 ) Robust clustering for functional data based on trimming and constraints. Advances in Data Analysis and Classification, 13, 201 – 225.
dc.identifier.citedreference	Schumaker, L. ( 2007 ). Spline Functions: Basic Theory. Cambridge: Cambridge University Press.
dc.identifier.citedreference	Schwarz, G. ( 1978 ) Estimating the dimension of a model. The Annals of Statistics, 6, 461 – 464.
dc.identifier.citedreference	Serban, N. and Jiang, H. ( 2012 ) Multilevel functional clustering analysis. Biometrics, 68, 805 – 814.
dc.identifier.citedreference	Stone, C.J. ( 1980 ) Optimal rates of convergence for nonparametric estimators. The Annals of Statistics, 8, 1348 – 1360.
dc.identifier.citedreference	Suyundykov, R., Puechmorel, S. and Ferré, L. ( 2010 ) Multivariate functional data clusterization by PCA in Sobolev space using wavelets. 42èmes Journées de Statistique.
dc.identifier.citedreference	Tarpey, T. and Kinateder, K.K. ( 2003 ) Clustering functional data. Journal of Classification, 20, 93 – 114.
dc.identifier.citedreference	Tokushige, S., Yadohisa, H. and Inada, K. ( 2007 ) Crisp and fuzzy k‐means clustering algorithms for multivariate functional data. Computational Statistics, 22, 1 – 16.
dc.identifier.citedreference	Wang, J.‐L., Chiou, J.‐M. and Müller, H.‐G. ( 2016 ) Functional data analysis. Annual Review of Statistics and its Application, 3, 257 – 295.
dc.identifier.citedreference	Wang, H., Li, R. and Tsai, C.‐L. ( 2007 ) Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553 – 568.
dc.identifier.citedreference	Yao, F., Müller, H.‐G. and Wang, J.‐L. ( 2005 ) Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100, 577 – 590.
dc.identifier.citedreference	Zhou, X.‐H., Lin, H. and Johnson, E. ( 2008 ) Non‐parametric heteroscedastic transformation regression models for skewed data with an application to health care costs. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 1029 – 1047.
dc.identifier.citedreference	Abraham, C., Cornillon, P.‐A., Matzner‐Løber, E. and Molinari, N. ( 2003 ) Unsupervised curve clustering using B‐splines. Scandinavian Journal of Statistics, 30, 581 – 595.
dc.identifier.citedreference	Bauer, D.J. and Curran, P.J. ( 2003 ) Distributional assumptions of growth mixture models: implications for overextraction of latent trajectory classes. Psychological Methods, 8, 338.
dc.identifier.citedreference	Biernacki, C., Celeux, G. and Govaert, G. ( 2000 ) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719 – 725.
dc.identifier.citedreference	Bouveyron, C., Côme, E. and Jacques, J. ( 2015 ) The discriminative functional mixture model for a comparative analysis of bike sharing systems. The Annals of Applied Statistics, 9, 1726 – 1760.
dc.identifier.citedreference	Bouveyron, C. and Jacques, J. ( 2011 ) Model‐based clustering of time series in group‐specific functional subspaces. Advances in Data Analysis and Classification, 5, 281 – 300.
dc.identifier.citedreference	Cai, T. and Yuan, M. ( 2010 ) Nonparametric covariance function estimation for functional and longitudinal data. University of Pennsylvania and Georgia Institute of Technology.
dc.identifier.citedreference	Chen, X., Hu, T. and Sun, J. ( 2017 ) Sieve maximum likelihood estimation for the proportional hazards model under informative censoring. Computational Statistics & Data Analysis, 112, 224 – 234.
dc.identifier.citedreference	Chen, K. and Tong, X. ( 2010 ) Varying coefficient transformation models with censored data. Biometrika, 97, 969 – 976.
dc.identifier.citedreference	Chiou, J.‐M. and Li, P.‐L. ( 2007 ) Functional clustering and identifying substructures of longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69, 679 – 699.
dc.identifier.citedreference	Delaigle, A., Hall, P. and Pham, T. ( 2019 ) Clustering functional data into groups by using projections. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81, 271 – 304.
dc.identifier.citedreference	Dempster, A.P., Laird, N.M. and Rubin, D.B. ( 1977 ) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39, 1 – 22.
dc.identifier.citedreference	Ferraty, F. and Vieu, P. ( 2006 ). Nonparametric Functional Data Analysis: Theory and Practice. New York, NY: Springer Science & Business Media.
dc.identifier.citedreference	Floriello, D. and Vitelli, V. ( 2017 ) Sparse clustering of functional data. Journal of Multivariate Analysis, 154, 1 – 18.
dc.identifier.citedreference	Frühwirth‐Schnatter, S. and Kaufmann, S. ( 2008 ) Model‐based clustering of multiple time series. Journal of Business & Economic Statistics, 26, 78 – 89.
dc.identifier.citedreference	Hall, P. and Horowitz, J.L. ( 2007 ) Methodology and convergence rates for functional linear regression. The Annals of Statistics, 35, 70 – 91.
dc.identifier.citedreference	Hall, P. and Hosseini‐Nasab, M. ( 2006 ) On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 109 – 126.
dc.identifier.citedreference	Hall, P., Müller, H.‐G. and Yao, F. ( 2008 ) Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 703 – 723.
dc.identifier.citedreference	Horowitz, J.L. ( 1996 ) Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica, 64, 103 – 137.
dc.identifier.citedreference	Huang, T., Peng, H. and Zhang, K. ( 2017 ) Model selection for Gaussian mixture models. Statistica Sinica, 27, 147 – 169.
dc.identifier.citedreference	Jacques, J. and Preda, C. ( 2013 ) Funclust: a curves clustering method using functional random variables density approximation. Neurocomputing, 112, 164 – 171.
dc.working.doi	NO	en
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

biom13349-sup-0001-SuppMat.pdf (451.2 KB, PDF)
biom13349.pdf (845.1 KB, PDF)
biom13349_am.pdf (2.192 MB, PDF)
