Geometric Inference in Bayesian Hierarchical Models with Applications to Topic Modeling

Yurochkin, Mikhail

dc.contributor.author	Yurochkin, Mikhail
dc.date.accessioned	2018-10-25T17:42:00Z
dc.date.available	NO_RESTRICTION
dc.date.available	2018-10-25T17:42:00Z
dc.date.issued	2018
dc.date.submitted	2018
dc.identifier.uri	https://hdl.handle.net/2027.42/146051
dc.description.abstract	Unstructured data is available in abundance with the rapidly growing size of digital information. Labeling such data is expensive and impractical, making unsupervised learning an increasingly important field. Big data collections often have rich latent structure that statistical modelers are challenged to uncover. Bayesian hierarchical modeling is a particularly suitable approach for complex latent patterns. The graphical model formalism has been prominent in developing various procedures for inference in Bayesian models; however, the computational capabilities of these procedures often fall short of the demands of modern data sizes. In this thesis we develop new approaches for scalable approximate Bayesian inference. In particular, our approaches are driven by the analysis of latent geometric structures induced by the models. Our specific contributions are as follows. We develop a full geometric recipe of the Latent Dirichlet Allocation topic model. Next, we study several approaches for exploiting the latent geometry, arriving first at a fast weighted clustering procedure augmented with geometric corrections for topic inference, and then at a nonparametric approach based on the analysis of the concentration of mass and the angular geometry of the topic simplex, a convex polytope constructed by taking the convex hull of vertices representing the latent topics. Estimates produced by our methods are shown to be statistically consistent under certain conditions. Finally, we develop a series of models for the temporal dynamics of the latent geometric structures, in which inference can be performed in an online and distributed fashion. All our algorithms are evaluated with extensive experiments on simulated and real datasets, culminating in a method several orders of magnitude faster than existing state-of-the-art topic modeling approaches, as demonstrated by experiments processing several million documents in a dozen minutes.
dc.language.iso	en_US
dc.subject	Bayesian inference and modeling
dc.subject	Latent Dirichlet Allocation
dc.subject	Bayesian nonparametrics
dc.subject	Geometric Bayesian inference
dc.subject	Topic Modeling
dc.subject	Scalable Bayesian inference
dc.title	Geometric Inference in Bayesian Hierarchical Models with Applications to Topic Modeling
dc.type	Thesis	en_US
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Statistics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Nguyen, Long
dc.contributor.committeemember	Mei, Qiaozhu
dc.contributor.committeemember	Sun, Yuekai
dc.contributor.committeemember	Tewari, Ambuj
dc.contributor.committeemember	Zhou, Shuheng
dc.subject.hlbsecondlevel	Statistics and Numeric Data
dc.subject.hlbtoplevel	Science
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/146051/1/moonfolk_1.pdf
dc.identifier.orcid	0000-0003-0153-6811
dc.identifier.name-orcid	Yurochkin, Mikhail; 0000-0003-0153-6811	en_US
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: moonfolk_1.pdf
Size: 1.793MB
Format: PDF