Poststratification fusion learning in longitudinal data analysis

Tang, Lu; Song, Peter X.-K.

dc.contributor.author	Tang, Lu
dc.contributor.author	Song, Peter X.-K.
dc.date.accessioned	2021-10-05T15:05:02Z
dc.date.available	2022-10-05 11:04:59	en
dc.date.available	2021-10-05T15:05:02Z
dc.date.issued	2021-09
dc.identifier.citation	Tang, Lu; Song, Peter X.-K. (2021). "Poststratification fusion learning in longitudinal data analysis." Biometrics 77(3): 914-928.
dc.identifier.issn	0006-341X
dc.identifier.issn	1541-0420
dc.identifier.uri	https://hdl.handle.net/2027.42/170201
dc.description.abstract	Stratification is a very commonly used approach in biomedical studies to handle sample heterogeneity arising from, for example, clinical units, patient subgroups, or missing data. A key rationale behind such an approach is to overcome potential sampling biases in statistical inference. Two issues with such a stratification-based strategy are (i) whether individual strata are sufficiently distinctive to warrant stratification, and (ii) whether sample size attrition resulting from the stratification may lead to loss of statistical power. To address these issues, we propose a penalized generalized estimating equations approach to reducing the complexity of parametric model structures due to excessive stratification. Specifically, we develop a data-driven fusion learning approach for longitudinal data that improves estimation efficiency by integrating information across similar strata, yet still allows necessary separation for stratum-specific conclusions. The proposed method is evaluated by simulation studies and applied to a motivating example from a psychiatric study to demonstrate its usefulness in real-world settings.
dc.publisher	CRC Press
dc.publisher	Wiley Periodicals, Inc.
dc.subject.other	GEE
dc.subject.other	pattern-mixture model
dc.subject.other	regularization
dc.subject.other	stratification
dc.subject.other	variable selection
dc.title	Poststratification fusion learning in longitudinal data analysis
dc.type	Article
dc.rights.robots	IndexNoFollow
dc.subject.hlbsecondlevel	Mathematics
dc.subject.hlbtoplevel	Science
dc.description.peerreviewed	Peer Reviewed
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/170201/1/biom13333.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/170201/2/biom13333-sup-0002-Appendices.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/170201/3/biom13333_am.pdf
dc.identifier.doi	10.1111/biom.13333
dc.identifier.source	Biometrics
dc.identifier.citedreference	Shen, X. and Huang, H.-C. (2010) Grouping pursuit through a regularization solution surface. Journal of the American Statistical Association, 105, 727-739.
dc.identifier.citedreference	Little, R.J. (1993) Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-134.
dc.identifier.citedreference	Johnson, B.A., Lin, D. and Zeng, D. (2008) Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association, 103, 672-680.
dc.identifier.citedreference	Jorgensen, B. (1997) The Theory of Dispersion Models. Boca Raton, FL: CRC Press.
dc.identifier.citedreference	Ke, Z.T., Fan, J. and Wu, Y. (2015) Homogeneity pursuit. Journal of the American Statistical Association, 110, 175-194.
dc.identifier.citedreference	Kroenke, K., Spitzer, R.L. and Williams, J.B. (2001) The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine, 16, 606-613.
dc.identifier.citedreference	Ma, S. and Huang, J. (2017) A concave pairwise fusion approach to subgroup analysis. Journal of the American Statistical Association, 112, 410-423.
dc.identifier.citedreference	Neuhaus, J.M., Kalbfleisch, J.D. and Hauck, W.W. (1991) A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review, 59, 25-35.
dc.identifier.citedreference	Ollier, E., Samson, A., Delavenne, X. and Viallon, V. (2016) A SAEM algorithm for fused lasso penalized nonlinear mixed effect models: application to group comparison in pharmacokinetics. Computational Statistics & Data Analysis, 95, 207-221.
dc.identifier.citedreference	Qu, A. and Song, P.X.-K. (2002) Testing ignorable missingness in estimating equation approaches for longitudinal data. Biometrika, 89, 841-850.
dc.identifier.citedreference	Qu, A., Yi, G., Song, P.X.-K. and Wang, P. (2011) Assessing the validity of weighted generalized estimating equations. Biometrika, 98, 215-224.
dc.identifier.citedreference	Sen, S., Kranzler, H.R., Krystal, J.H., Speller, H., Chan, G., Gelernter, J. and Guille, C. (2010) A prospective cohort study investigating factors associated with depression during medical internship. Archives of General Psychiatry, 67, 557-565.
dc.identifier.citedreference	Liang, K.-Y. and Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.
dc.identifier.citedreference	Song, P.X.-K. (2007) Correlated Data Analysis: Modeling, Analytics, and Applications. New York, NY: Springer.
dc.identifier.citedreference	Spitzer, R.L., Kroenke, K., Williams, J.B. and Löwe, B. (2006) A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine, 166, 1092-1097.
dc.identifier.citedreference	Tang, L. and Song, P.X. (2016) Fused lasso approach in regression coefficients clustering: learning parameter heterogeneity in data integration. The Journal of Machine Learning Research, 17, 3915-3937.
dc.identifier.citedreference	Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005) Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 91-108.
dc.identifier.citedreference	Tibshirani, R.J. and Taylor, J. (2011) The solution path of the generalized lasso. The Annals of Statistics, 39, 1335-1371.
dc.identifier.citedreference	Van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42, 1166-1202.
dc.identifier.citedreference	Wang, F., Wang, L. and Song, P.X.-K. (2016) Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, 72, 1184-1193.
dc.identifier.citedreference	Wang, L., Zhou, J. and Qu, A. (2012) Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics, 68, 353-360.
dc.identifier.citedreference	Zhang, C.-H. (2010) Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894-942.
dc.identifier.citedreference	Zhang, C.-H. and Zhang, S.S. (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 217-242.
dc.identifier.citedreference	Zou, H. (2006) The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429.
dc.identifier.citedreference	Bach, F., Jenatton, R., Mairal, J. and Obozinski, G. (2012) Structured sparsity through convex optimization. Statistical Science, 27, 450-468.
dc.identifier.citedreference	Bondell, H.D. and Reich, B.J. (2008) Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics, 64, 115-123.
dc.identifier.citedreference	Bondell, H.D. and Reich, B.J. (2009) Simultaneous factor selection and collapsing levels in ANOVA. Biometrics, 65, 169-177.
dc.identifier.citedreference	Chen, J. and Chen, Z. (2012) Extended BIC for small-n-large-p sparse GLM. Statistica Sinica, 22, 555-574.
dc.identifier.citedreference	Chen, H.Y. and Little, R. (1999) A test of missing completely at random for generalised estimating equations with missing data. Biometrika, 86, 1-13.
dc.identifier.citedreference	Dawson, J.D. (1994) Stratification of summary statistic tests according to missing data patterns. Statistics in Medicine, 13, 1853-1863.
dc.identifier.citedreference	Diggle, P.J. (1989) Testing for random dropouts in repeated measurement data. Biometrics, 45, 1255-1258.
dc.identifier.citedreference	Fan, J. and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
dc.identifier.citedreference	Fu, W.J. (2003) Penalized estimating equations. Biometrics, 59, 126-132.
dc.identifier.citedreference	Hao, B., Sun, W.W., Liu, Y. and Cheng, G. (2018) Simultaneous clustering and estimation of heterogeneous graphical models. Journal of Machine Learning Research, 18, 1-58.
dc.identifier.citedreference	Hunter, D.R. and Li, R. (2005) Variable selection using MM algorithms. The Annals of Statistics, 33, 1617.
dc.working.doi	NO	en
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

Name: biom13333.pdf
Size: 453.8KB
Format: PDF

Name: biom13333-sup-0002-Appendices.pdf
Size: 1.372MB
Format: PDF

Name: biom13333_am.pdf
Size: 580.4KB
Format: PDF
