Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application
dc.contributor.author | Xu, Easton L | |
dc.contributor.author | Qian, Xiaoning | |
dc.contributor.author | Yu, Qilian | |
dc.contributor.author | Zhang, Han | |
dc.contributor.author | Cui, Shuguang | |
dc.date.accessioned | 2018-03-25T05:23:22Z | |
dc.date.available | 2018-03-25T05:23:22Z | |
dc.date.issued | 2018-03-21 | |
dc.identifier.citation | BMC Genomics. 2018 Mar 21;19(Suppl 4):170 | |
dc.identifier.uri | http://dx.doi.org/10.1186/s12864-018-4552-x | |
dc.identifier.uri | https://hdl.handle.net/2027.42/142801 | |
dc.description.abstract | Background: Genotype-phenotype association has been one of the long-standing problems in bioinformatics. Identifying both the marginal and epistatic effects among genetic markers, such as Single Nucleotide Polymorphisms (SNPs), has been extensively integrated in Genome-Wide Association Studies (GWAS) to help derive “causal” genetic risk factors and their interactions, which play critical roles in life and disease systems. Identifying “synergistic” interactions with respect to the outcome of interest can help accurate phenotypic prediction and understand the underlying mechanism of system behavior. Many statistical measures for estimating synergistic interactions have been proposed in the literature for such a purpose. However, except for empirical performance, there is still no theoretical analysis on the power and limitation of these synergistic interaction measures. Results: In this paper, it is shown that the existing information-theoretic multivariate synergy depends on a small subset of the interaction parameters in the model, sometimes on only one interaction parameter. In addition, an adjusted version of multivariate synergy is proposed as a new measure to estimate the interactive effects, with experiments conducted over both simulated data sets and a real-world GWAS data set to show the effectiveness. Conclusions: We provide rigorous theoretical analysis and empirical evidence on why the information-theoretic multivariate synergy helps with identifying genetic risk factors via synergistic interactions. We further establish the rigorous sample complexity analysis on detecting interactive effects, confirmed by both simulated and real-world data sets. | |
dc.title | Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application | |
dc.type | Article | en_US |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/142801/1/12864_2018_Article_4552.pdf | |
dc.language.rfc3066 | en | |
dc.rights.holder | The Author(s) | |
dc.date.updated | 2018-03-25T05:23:27Z | |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |