It’s all relative: Regression analysis with compositional predictors

Li, Gen; Li, Yan; Chen, Kun

dc.contributor.author	Li, Gen
dc.contributor.author	Li, Yan
dc.contributor.author	Chen, Kun
dc.date.accessioned	2023-07-14T13:58:07Z
dc.date.available	2024-07-14 09:58:06	en
dc.date.available	2023-07-14T13:58:07Z
dc.date.issued	2023-06
dc.identifier.citation	Li, Gen; Li, Yan; Chen, Kun (2023). "It’s all relative: Regression analysis with compositional predictors." Biometrics 79(2): 1318-1329.
dc.identifier.issn	0006-341X
dc.identifier.issn	1541-0420
dc.identifier.uri	https://hdl.handle.net/2027.42/177287
dc.description.abstract	Compositional data reside in a simplex and measure fractions or proportions of parts to a whole. Most existing regression methods for such data rely on log-ratio transformations that are inadequate or inappropriate in modeling high-dimensional data with excessive zeros and hierarchical structures. Moreover, such models usually lack a straightforward interpretation due to the interrelation between parts of a composition. We develop a novel relative-shift regression framework that directly uses proportions as predictors. The new framework provides a paradigm shift for regression analysis with compositional predictors and offers a superior interpretation of how shifting concentration between parts affects the response. New equi-sparsity and tree-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression. A unified finite-sample prediction error bound is derived for the proposed regularized estimators. We demonstrate the efficacy of the proposed methods in extensive simulation studies and a real gut microbiome study. Guided by the taxonomy of the microbiome data, the framework identifies important taxa at different taxonomic levels associated with the neurodevelopment of preterm infants.
dc.publisher	Springer
dc.publisher	Wiley Periodicals, Inc.
dc.subject.other	relative shift
dc.subject.other	tree-guided regularization
dc.subject.other	microbiome
dc.subject.other	feature aggregation
dc.subject.other	equi-sparsity
dc.title	It’s all relative: Regression analysis with compositional predictors
dc.type	Article
dc.rights.robots	IndexNoFollow
dc.subject.hlbsecondlevel	Mathematics
dc.subject.hlbtoplevel	Science
dc.description.peerreviewed	Peer Reviewed
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/177287/1/biom13703.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/177287/2/biom13703_am.pdf
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/177287/3/biom13703-sup-0002-SuppMat.pdf
dc.identifier.doi	10.1111/biom.13703
dc.identifier.source	Biometrics
dc.identifier.citedreference	Shi, P., Zhou, Y. & Zhang, A. (2021) High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis. Biometrika, 109, 405–420.
dc.identifier.citedreference	Chen, X., Lin, Q., Kim, S., Carbonell, J.G. & Xing, E.P. (2012) Smoothing proximal gradient method for general structured sparse regression. The Annals of Applied Statistics, 6, 719–752.
dc.identifier.citedreference	Combettes, P.L. & Müller, C.L. (2021) Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications. Statistics in Biosciences, 13, 217–242.
dc.identifier.citedreference	Cong, X., Judge, M., Xu, W., Diallo, A., Janton, S., Brownell, E.A. et al. (2017) Influence of infant feeding type on gut microbiome development in hospitalized preterm infants. Nursing Research, 66, 123–133.
dc.identifier.citedreference	Garcia, T.P., Müller, S., Carroll, R.J. & Walzem, R.L. (2013) Identification of important regressor groups, subgroups and individuals via regularization methods: application to gut microbiome data. Bioinformatics, 30, 831–837.
dc.identifier.citedreference	Gloor, G.B., Wu, J.R., Pawlowsky-Glahn, V. & Egozcue, J.J. (2016) It’s all relative: analyzing microbiome data as compositions. Annals of Epidemiology, 26, 322–329.
dc.identifier.citedreference	Greenacre, M. (2020) Amalgamations are valid in compositional data analysis, can be used in agglomerative clustering, and their log-ratios have an inverse transformation. Applied Computing and Geosciences, 5, 100017.
dc.identifier.citedreference	Hastie, T., Tibshirani, R. & Wainwright, M. (2019) Statistical Learning with Sparsity: The Lasso and Generalizations. London: Chapman and Hall/CRC.
dc.identifier.citedreference	Kim, S., Sohn, K.-A. & Xing, E.P. (2009) A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics, 25, i204–i212.
dc.identifier.citedreference	Li, H. (2015) Microbiome, metagenomics, and high-dimensional compositional data analysis. Annual Review of Statistics and Its Application, 2, 73–94.
dc.identifier.citedreference	Lin, W., Shi, P., Feng, R. & Li, H. (2014) Variable selection in regression with compositional covariates. Biometrika, 101, 785–797.
dc.identifier.citedreference	Nesterov, Y. (2005) Smooth minimization of non-smooth functions. Mathematical Programming, 103, 127–152.
dc.identifier.citedreference	Palarea-Albaladejo, J. & Martin-Fernandez, J. (2013) Values below detection limit in compositional chemical data. Analytica Chimica Acta, 764, 32–43.
dc.identifier.citedreference	Randolph, T.W., Zhao, S., Copeland, W., Hullar, M. & Shojaie, A. (2018) Kernel-penalized regression for analysis of microbiome data. The Annals of Applied Statistics, 12, 540–566.
dc.identifier.citedreference	She, Y. (2010) Sparse regression with exact clustering. Electronic Journal of Statistics, 4, 1055–1096.
dc.identifier.citedreference	Shi, P., Zhang, A. & Li, H. (2016) Regression analysis for microbiome compositional data. The Annals of Applied Statistics, 10, 1019–1040.
dc.identifier.citedreference	Silverman, J.D., Washburne, A.D., Mukherjee, S. & David, L.A. (2017) A phylogenetic transform enhances analysis of compositional microbiota data. Elife, 6, e21887.
dc.identifier.citedreference	Sun, Z., Xu, W., Cong, X., Li, G. & Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant’s gut microbiome trajectories in early postnatal period to neurobehavioral outcome. The Annals of Applied Statistics, 14, 1535–1556.
dc.identifier.citedreference	Tsilimigras, M.C. & Fodor, A.A. (2016) Compositional data analysis of the microbiome: fundamentals, tools, and challenges. Annals of Epidemiology, 26, 330–335.
dc.identifier.citedreference	Wang, T. & Zhao, H. (2017) Structured subcomposition selection in regression and its application to microbiome data analysis. The Annals of Applied Statistics, 11, 771–791.
dc.identifier.citedreference	Xia, Y., Sun, J. & Chen, D.-G. (2018) Statistical analysis of microbiome data with R (Vol. 847). Singapore: Springer.
dc.identifier.citedreference	Xu, T., Demmer, R.T. & Li, G. (2021) Zero-inflated poisson factor model with application to microbiome read counts. Biometrics, 77, 91–101.
dc.identifier.citedreference	Yan, X. & Bien, J. (2021) Rare feature selection in high dimensions. Journal of the American Statistical Association, 116, 887–900.
dc.identifier.citedreference	Aitchison, J. (1982) The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B, 44, 139–160.
dc.identifier.citedreference	Aitchison, J. (1983) Principal component analysis of compositional data. Biometrika, 70, 57–65.
dc.identifier.citedreference	Aitchison, J. & Bacon-Shone, J. (1984) Log contrast models for experiments with mixtures. Biometrika, 71, 323–330.
dc.identifier.citedreference	Aitchison, J. & Egozcue, J.J. (2005) Compositional data analysis: where are we and where should we be heading? Mathematical Geology, 37, 829–850.
dc.identifier.citedreference	Beck, A. & Teboulle, M. (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2, 183–202.
dc.identifier.citedreference	Bien, J., Yan, X., Simpson, L. & Müller, C.L. (2020) Tree-aggregated predictive modeling of microbiome data. bioRxiv.
dc.identifier.citedreference	Bien, J., Yan, X., Simpson, L. & Müller, C.L. (2021) Tree-aggregated predictive modeling of microbiome data. Scientific Reports, 11(1), 1–13.
dc.identifier.citedreference	Bühlmann, P. & van de Geer, S. (2009) Statistics for High-Dimensional Data. Berlin: Springer.
dc.working.doi	NO	en
dc.owningcollname	Interdisciplinary and Peer-Reviewed

Files in this item

Name: biom13703.pdf
Size: 415.4KB
Format: PDF

Name: biom13703_am.pdf
Size: 530.0KB
Format: PDF

Name: biom13703-sup-0002-SuppMat.pdf
Size: 306.0KB
Format: PDF
