Geometric Inference in Bayesian Hierarchical Models with Applications to Topic Modeling

dc.contributor.author: Yurochkin, Mikhail
dc.date.accessioned: 2018-10-25T17:42:00Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2018-10-25T17:42:00Z
dc.date.issued: 2018
dc.date.submitted: 2018
dc.identifier.uri: https://hdl.handle.net/2027.42/146051
dc.description.abstract: Unstructured data is abundant and grows rapidly with the volume of digital information. Labeling such data is expensive and impractical, which makes unsupervised learning an increasingly important field. Large data collections often have rich latent structure that statistical modelers are challenged to uncover. Bayesian hierarchical modeling is a particularly suitable approach for capturing complex latent patterns. The graphical model formalism has been prominent in the development of inference procedures for Bayesian models; however, the computational capacity of these procedures often falls short of the demands of modern data sizes. In this thesis we develop new approaches for scalable approximate Bayesian inference. In particular, our approaches are driven by the analysis of the latent geometric structures induced by the models. Our specific contributions are as follows. We develop a full geometric recipe for the Latent Dirichlet Allocation (LDA) topic model. Next, we study several approaches to exploiting the latent geometry, arriving first at a fast weighted clustering procedure, augmented with geometric corrections, for topic inference, and then at a nonparametric approach based on the analysis of the concentration of mass and the angular geometry of the topic simplex, the convex polytope formed by taking the convex hull of the vertices that represent the latent topics. Estimates produced by our methods are shown to be statistically consistent under certain conditions. Finally, we develop a series of models for the temporal dynamics of latent geometric structures, in which inference can be performed in an online and distributed fashion. All our algorithms are evaluated with extensive experiments on simulated and real datasets, culminating in a method that is several orders of magnitude faster than existing state-of-the-art topic modeling approaches, as demonstrated by experiments that process several million documents in a dozen minutes.
dc.language.iso: en_US
dc.subject: Bayesian inference and modeling
dc.subject: Latent Dirichlet Allocation
dc.subject: Bayesian nonparametrics
dc.subject: Geometric Bayesian inference
dc.subject: Topic Modeling
dc.subject: Scalable Bayesian inference
dc.title: Geometric Inference in Bayesian Hierarchical Models with Applications to Topic Modeling
dc.type: Thesis (en_US)
dc.description.thesisdegreename: PhD (en_US)
dc.description.thesisdegreediscipline: Statistics
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Nguyen, Long
dc.contributor.committeemember: Mei, Qiaozhu
dc.contributor.committeemember: Sun, Yuekai
dc.contributor.committeemember: Tewari, Ambuj
dc.contributor.committeemember: Zhou, Shuheng
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/146051/1/moonfolk_1.pdf
dc.identifier.orcid: 0000-0003-0153-6811
dc.identifier.name-orcid: Yurochkin, Mikhail; 0000-0003-0153-6811 (en_US)
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)

