Cluster non‐Gaussian functional data

dc.contributor.author: Zhong, Qingzhi
dc.contributor.author: Lin, Huazhen
dc.contributor.author: Li, Yi
dc.date.accessioned: 2021-10-05T15:05:33Z
dc.date.available: 2022-10-05 11:05:30 (en)
dc.date.available: 2021-10-05T15:05:33Z
dc.date.issued: 2021-09
dc.identifier.citation: Zhong, Qingzhi; Lin, Huazhen; Li, Yi (2021). "Cluster non‐Gaussian functional data." Biometrics 77(3): 852-865.
dc.identifier.issn: 0006-341X
dc.identifier.issn: 1541-0420
dc.identifier.uri: https://hdl.handle.net/2027.42/170210
dc.description.abstract: Gaussian distributions have been commonly assumed when clustering functional data. When the normality condition fails, biased results will follow. Additional challenges arise because the number of clusters is often unknown a priori. This paper focuses on clustering non‐Gaussian functional data without prior information on the number of clusters. We introduce a semiparametric mixed normal transformation model to accommodate non‐Gaussian functional data, and propose a penalized approach to simultaneously estimate the parameters, the transformation function, and the number of clusters. The estimators are shown to be consistent and asymptotically normal. The practical utility of the methods is confirmed via simulations as well as an application to an Alzheimer’s disease study. The proposed method yields much lower classification error than the existing methods. Data used in preparation of this paper were obtained from the Alzheimer’s Disease Neuroimaging Initiative database.
dc.publisher: University of Pennsylvania and Georgia Institute of Technology
dc.publisher: Wiley Periodicals, Inc.
dc.subject.other: functional principal component analysis
dc.subject.other: nonparametric transformation model
dc.subject.other: penalized EM algorithm
dc.subject.other: non‐Gaussian functional data
dc.subject.other: clustering analysis
dc.title: Cluster non‐Gaussian functional data
dc.type: Article
dc.rights.robots: IndexNoFollow
dc.subject.hlbsecondlevel: Mathematics
dc.subject.hlbtoplevel: Science
dc.description.peerreviewed: Peer Reviewed
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170210/1/biom13349-sup-0001-SuppMat.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170210/2/biom13349.pdf
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/170210/3/biom13349_am.pdf
dc.identifier.doi: 10.1111/biom.13349
dc.identifier.source: Biometrics
dc.working.doi: NO (en)
dc.owningcollname: Interdisciplinary and Peer-Reviewed

