Large Data Approaches to Thresholding Problems

Lu, Zhiyuan

dc.contributor.author	Lu, Zhiyuan
dc.date.accessioned	2020-01-27T16:23:49Z
dc.date.available	NO_RESTRICTION
dc.date.available	2020-01-27T16:23:49Z
dc.date.issued	2019
dc.date.submitted	2019
dc.identifier.uri	https://hdl.handle.net/2027.42/153384
dc.description.abstract	Statistical models with discontinuities are widely used in practical fields such as statistical process control, gene data processing, and econometrics. The study of such models is usually concerned with locating these discontinuities, which is methodologically difficult because estimation requires solving nonstandard optimization problems. With the contemporary increase in computing power and memory, it becomes more relevant to view these problems in the context of very large datasets, a context which introduces further complications for estimation. In this thesis, we study two major topics in threshold estimation, with models, methodology, and results motivated by the need to handle big data. Our first topic is the change point problem, which involves detecting the locations where a change in distribution occurs within a data sequence. A variety of methods have been proposed and studied in this area, with novel approaches for the case where the number of change points is unknown and may be greater than 1, making exhaustive search methods infeasible. Our contribution to this problem is motivated by the principle that only the data points close to the change points are useful for their estimation, while other points are extraneous. From this observation we propose a zoom-in estimation method which efficiently subsamples the data for estimation without compromising accuracy. The resulting method runs in sublinear time, while existing methods all run in linear time or above. Furthermore, the nature of this new methodology allows us to characterize the asymptotic distribution even in the case where the number of change point parameters increases without bound, a type of result not replicated elsewhere in this field. The second topic regards the change plane model, which involves a real-valued signal over a multidimensional space with a discontinuity delineated by a hyperplane.
Practically, the change plane model is used to combine regression between a covariate and a response variable while performing unsupervised classification on the covariate. As change plane models in growing dimensions have not been studied in the literature, we confine ourselves to canonical models in this dissertation, as a first approach to these problems. In terms of details, we establish fundamental convergence and support selection properties (the latter for the high-dimensional case) and present some simulation results.
dc.language.iso	en_US
dc.subject	change point
dc.subject	adaptive sampling
dc.subject	computational time
dc.title	Large Data Approaches to Thresholding Problems
dc.type	Thesis
dc.description.thesisdegreename	PhD	en_US
dc.description.thesisdegreediscipline	Statistics
dc.description.thesisdegreegrantor	University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember	Banerjee, Moulinath
dc.contributor.committeemember	Michailidis, George
dc.contributor.committeemember	Cattaneo, Matias Damian
dc.contributor.committeemember	Ritov, Yaacov
dc.subject.hlbsecondlevel	Statistics and Numeric Data
dc.subject.hlbtoplevel	Science
dc.description.bitstreamurl	https://deepblue.lib.umich.edu/bitstream/2027.42/153384/1/jlnlu_1.pdf
dc.owningcollname	Dissertations and Theses (Ph.D. and Master's)

Files in this item

Name: jlnlu_1.pdf
Size: 1.796MB
Format: PDF
