Geometric Inference in Bayesian Hierarchical Models with Applications to Topic Modeling
dc.contributor.author | Yurochkin, Mikhail | |
dc.date.accessioned | 2018-10-25T17:42:00Z | |
dc.date.available | NO_RESTRICTION | |
dc.date.available | 2018-10-25T17:42:00Z | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/146051 | |
dc.description.abstract | Unstructured data is available in abundance with the rapidly growing size of digital information. Labeling such data is expensive and impractical, making unsupervised learning an increasingly important field. Big data collections often have rich latent structure that the statistical modeler is challenged to uncover. Bayesian hierarchical modeling is a particularly suitable approach for complex latent patterns. The graphical model formalism has been prominent in developing various procedures for inference in Bayesian models; however, the corresponding computational limits often fall behind the demands of modern data sizes. In this thesis we develop new approaches for scalable approximate Bayesian inference. In particular, our approaches are driven by the analysis of latent geometric structures induced by the models. Our specific contributions include the following. We develop a full geometric recipe of the Latent Dirichlet Allocation topic model. Next, we study several approaches for exploiting the latent geometry to first arrive at a fast weighted clustering procedure augmented with geometric corrections for topic inference, and then a nonparametric approach based on the analysis of the concentration of mass and angular geometry of the topic simplex, a convex polytope constructed by taking the convex hull of vertices representing the latent topics. Estimates produced by our methods are shown to be statistically consistent under some conditions. Finally, we develop a series of models for temporal dynamics of the latent geometric structures where inference can be performed in an online and distributed fashion. All our algorithms are evaluated with extensive experiments on simulated and real datasets, culminating in a method several orders of magnitude faster than existing state-of-the-art topic modeling approaches, as demonstrated by experiments working with several million documents in a dozen minutes. | |
dc.language.iso | en_US | |
dc.subject | Bayesian inference and modeling | |
dc.subject | Latent Dirichlet Allocation | |
dc.subject | Bayesian nonparametrics | |
dc.subject | Geometric Bayesian inference | |
dc.subject | Topic Modeling | |
dc.subject | Scalable Bayesian inference | |
dc.title | Geometric Inference in Bayesian Hierarchical Models with Applications to Topic Modeling | |
dc.type | Thesis | en_US |
dc.description.thesisdegreename | PhD | en_US |
dc.description.thesisdegreediscipline | Statistics | |
dc.description.thesisdegreegrantor | University of Michigan, Horace H. Rackham School of Graduate Studies | |
dc.contributor.committeemember | Nguyen, Long | |
dc.contributor.committeemember | Mei, Qiaozhu | |
dc.contributor.committeemember | Sun, Yuekai | |
dc.contributor.committeemember | Tewari, Ambuj | |
dc.contributor.committeemember | Zhou, Shuheng | |
dc.subject.hlbsecondlevel | Statistics and Numeric Data | |
dc.subject.hlbtoplevel | Science | |
dc.description.bitstreamurl | https://deepblue.lib.umich.edu/bitstream/2027.42/146051/1/moonfolk_1.pdf | |
dc.identifier.orcid | 0000-0003-0153-6811 | |
dc.identifier.name-orcid | Yurochkin, Mikhail; 0000-0003-0153-6811 | en_US |
dc.owningcollname | Dissertations and Theses (Ph.D. and Master's) |