Statistical Learning for Latent Attribute Models

dc.contributor.author: Ma, Chenchen
dc.date.accessioned: 2022-09-06T16:27:08Z
dc.date.available: 2022-09-06T16:27:08Z
dc.date.issued: 2022
dc.date.submitted: 2022
dc.identifier.uri: https://hdl.handle.net/2027.42/174627
dc.description.abstract: Latent variable models are widely used in unsupervised learning to uncover the latent structures underlying observed data and have seen great success in representation learning across many applications and scientific disciplines. Latent attribute models, also known as cognitive diagnosis models or diagnostic classification models, are a special family of discrete latent variable models that have been widely applied for diagnostic purposes in modern psychological and biomedical research. Despite their wide use in various fields, the models' discrete nature and complex restricted structures pose many new challenges for efficient learning and statistical inference. Moreover, with the large-scale item and subject pools emerging in modern educational and psychological measurement, efficient algorithms for uncovering the latent structures of both items and subjects are needed. This dissertation studies four important problems that arise in this context. (I) The first part develops novel methodologies and efficient algorithms to learn the latent and hierarchical structures in latent attribute models. Researchers in many applications are interested in hierarchical structures among the latent attributes, such as prerequisite relationships among target skills in educational settings. However, in most cognitive diagnosis applications, the number of latent attributes, the attribute-attribute hierarchical structure, the item-attribute dependence structure, and the item-level diagnostic models must be fully or partially pre-specified, which can be subjective and prone to misspecification, as noted in many recent studies. This part considers the problem of jointly learning these latent quantities and hierarchical structures from observed data with minimal model assumptions. A penalized likelihood approach is proposed for joint learning, an Expectation-Maximization (EM) algorithm is developed for efficient computation, and statistical consistency theory is established under mild conditions. (II) The second part generalizes the methodology of part I to simultaneously infer the subgroup structures of both subjects and items. We consider model-based co-clustering algorithms that automatically select the numbers of clusters and uncover latent block structures. Specifically, building on latent block models, we propose a penalized co-clustering approach that learns the numbers of clusters and the inner block structures simultaneously. Efficient EM algorithms are developed, and comprehensive simulation studies demonstrate their superiority. (III) The third part concerns the important yet unaddressed problem of testing the latent hierarchical structures in latent attribute models. Testing the hierarchical structure is shown to be equivalent to testing the sparsity structure of the proportion parameter vector. However, due to the irregularity of the problem, the asymptotic distribution of the popular likelihood ratio test becomes nonstandard and tends to give unsatisfactory finite-sample performance under practical conditions. To tackle these challenges, we discuss conditions for testability, provide a statistical understanding of the failures, and propose a practical resampling-based procedure. (IV) The fourth part introduces a unified estimation framework that bridges the gap between parametric and nonparametric methods in cognitive diagnosis and clarifies their relationship. In particular, a number of parametric and nonparametric methods for estimating latent attribute models have been developed and applied in a wide range of contexts. However, a wide chasm exists in the literature between these two families of methods, and their relationship to each other is not well understood. Motivated by this divide, we propose a unified framework and provide both theoretical analysis and practical recommendations under various cognitive diagnosis settings.
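
To make the penalized-likelihood idea in part (I) concrete, the following Python sketch is offered purely as an illustration and is not the dissertation's algorithm. It fits a plain Bernoulli latent class model with an over-specified number of classes K and soft-thresholds the estimated mixing proportions in the M-step so that spurious classes shrink toward zero; the model choice, the penalty form, and all names (penalized_em, lam, K) are assumptions made only for this example.

# Illustrative sketch only (not the dissertation's method): a penalized EM for a
# Bernoulli latent class model. An over-specified number of classes K is fitted,
# and a simple shrinkage penalty on the mixing proportions drives spurious
# classes toward zero, mimicking penalized-likelihood selection of the latent
# structure. The penalty form and all names here are illustrative assumptions.
import numpy as np

def penalized_em(Y, K, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Y: (n, J) binary response matrix; K: upper bound on the number of classes."""
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    pi = np.full(K, 1.0 / K)                     # mixing proportions
    theta = rng.uniform(0.2, 0.8, size=(K, J))   # item-response probabilities
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities
        log_lik = Y @ np.log(theta.T + eps) + (1 - Y) @ np.log(1 - theta.T + eps)
        log_post = log_lik + np.log(pi + eps)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step with soft-thresholded class counts (shrinks small classes to zero)
        counts = post.sum(axis=0)
        pi = np.maximum(counts - lam, 0.0)
        if pi.sum() == 0:                        # degenerate case: unpenalized update
            pi = counts
        pi = pi / pi.sum()
        theta = (post.T @ Y + eps) / (counts[:, None] + 2 * eps)
    return pi, theta, post

# Example: simulate data from 3 true classes, fit with an over-specified K = 6;
# classes whose estimated proportion is (near) zero are effectively removed.
rng = np.random.default_rng(1)
true_theta = rng.uniform(0.1, 0.9, size=(3, 15))
z = rng.integers(0, 3, size=500)
Y = (rng.random((500, 15)) < true_theta[z]).astype(int)
pi_hat, theta_hat, _ = penalized_em(Y, K=6, lam=5.0)
print(np.round(pi_hat, 3))

If the shrinkage works as intended, the surplus classes receive (near) zero estimated proportions, which is the behaviour that penalty-based selection of latent structure relies on; the dissertation's actual approach additionally handles attribute hierarchies and item-attribute structures, which this toy example does not attempt.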
dc.language.iso: en_US
dc.subject: Latent Variable Models
dc.subject: Cognitive Diagnosis
dc.subject: Hierarchical Structures
dc.subject: Co-clustering
dc.subject: Hypothesis Testing
dc.subject: Nonparametric
dc.title: Statistical Learning for Latent Attribute Models
dc.type: Thesis
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Statistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Xu, Gongjun
dc.contributor.committeemember: Wu, Zhenke
dc.contributor.committeemember: Tan, Kean Ming
dc.contributor.committeemember: Zhu, Ji
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbsecondlevel: Education
dc.subject.hlbsecondlevel: Psychology
dc.subject.hlbtoplevel: Science
dc.subject.hlbtoplevel: Social Sciences
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/174627/1/chenchma_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/6358
dc.identifier.orcid: 0000-0003-2784-9920
dc.identifier.name-orcid: Ma, Chenchen; 0000-0003-2784-9920 (en_US)
dc.working.doi: 10.7302/6358 (en)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

