Variable Weight Kernel Density Estimation

dc.contributor.author: Cruz Cortes, Efren
dc.date.accessioned: 2017-10-05T20:27:02Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2017-10-05T20:27:02Z
dc.date.issued: 2017
dc.date.submitted:
dc.identifier.uri: https://hdl.handle.net/2027.42/138533
dc.description.abstract: Nonparametric density estimation is a common and important task in many machine learning problems. It consists of estimating a density function from available observations without making parametric assumptions about the generating distribution. Kernel means are nonparametric estimators composed of the average of simple functions, called kernels, centered at each data point. This work studies relatives of these kernel means that share their structure but assign different weights to each kernel unit in order to attain certain desired characteristics. In particular, we present a sparse kernel mean estimator and a consistent kernel density estimator with a fixed bandwidth parameter. First, regarding kernel means, we study the kernel density estimator (KDE) and the kernel mean embedding. These are frequently used to represent probability distributions; unfortunately, they face scalability issues. A single point evaluation of the kernel density estimator, for example, requires computation time linear in the training sample size. To address this challenge, we present a method to efficiently construct a sparse approximation of a kernel mean. We do so by first establishing an incoherence-based bound on the approximation error. We then observe that, for any kernel with constant norm (which includes all translation-invariant kernels), the bound can be efficiently minimized by solving the k-center problem. The outcome is a linear-time construction of a sparse kernel mean, which also lends itself naturally to an automatic sparsity selection scheme. We demonstrate the computational gains of our method on several benchmark data sets, as well as on three applications involving kernel means: Euclidean embedding of distributions, class proportion estimation, and clustering with the mean-shift algorithm. Second, we address the bandwidth selection problem in kernel density estimation. Consistency of the KDE requires that the kernel bandwidth tend to zero as the sample size grows. In this work, we investigate whether consistency is still possible when the bandwidth is fixed, provided we consider a more general class of weighted KDEs. To answer this question in the affirmative, we introduce the fixed-bandwidth KDE (fbKDE), obtained by solving a quadratic program, which consistently estimates any continuous square-integrable density. Rates of convergence are also established for the fbKDE for radial kernels and the box kernel under appropriate smoothness assumptions. Furthermore, in a simulation study we demonstrate that the fbKDE compares favorably to the standard KDE and to the previously proposed variable-bandwidth KDE.
dc.language.iso: en_US
dc.subject: machine learning
dc.subject: density estimation
dc.subject: reproducing kernel Hilbert space
dc.subject: complexity
dc.subject: sparsity
dc.subject: consistency
dc.title: Variable Weight Kernel Density Estimation
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Electrical Engineering: Systems
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Scott, Clayton D
dc.contributor.committeemember: Nguyen, Long
dc.contributor.committeemember: Balzano, Laura Kathryn
dc.contributor.committeemember: Zhang, Jun
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbsecondlevel: Electrical Engineering
dc.subject.hlbsecondlevel: Statistics and Numeric Data
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: https://deepblue.lib.umich.edu/bitstream/2027.42/138533/1/encc_1.pdf
dc.identifier.orcid: 0000-0002-2062-6444
dc.identifier.name-orcid: Cruz Cortés, Efrén; 0000-0002-2062-6444
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
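
The abstract above describes two weighted relatives of the standard KDE: a sparse kernel mean whose centers come from a k-center solution, and a fixed-bandwidth weighted KDE whose weights solve a quadratic program. The Python sketch below is purely illustrative and is not the dissertation's algorithm: the greedy k-center heuristic, the coverage-based reweighting, and the particular quadratic criterion (an empirical L2 fit with weights constrained to sum to one) are assumptions chosen only to make the ideas concrete, and the incoherence-based bound, the automatic sparsity selection scheme, and the exact fbKDE program are not reproduced here.

import numpy as np

def gaussian_kernel(x, centers, h):
    """Normalized isotropic Gaussian kernel k_h(x - c) evaluated for each center c."""
    d = centers.shape[1]
    sq = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq / (2 * h**2)) / ((2 * np.pi * h**2) ** (d / 2))

def kde(x, data, h):
    """Standard KDE: uniform weight 1/n on a kernel centered at every data point."""
    return float(gaussian_kernel(x, data, h).mean())

def greedy_k_center(data, k, seed=0):
    """Greedy 2-approximation to the k-center problem: repeatedly take the point
    farthest from the centers chosen so far (a stand-in for the thesis's k-center step)."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(data)))]
    dmin = np.linalg.norm(data - data[idx[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dmin))
        idx.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(data - data[nxt], axis=1))
    return np.asarray(idx)

def sparse_kde(x, data, h, k):
    """Sparse kernel mean: keep only k kernel units, centered at the k-center solution,
    and reweight each unit by the fraction of data points it is nearest to (assumed weighting)."""
    centers = data[greedy_k_center(data, k)]
    assign = np.argmin(
        np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2), axis=1
    )
    w = np.bincount(assign, minlength=k) / len(data)
    return float(w @ gaussian_kernel(x, centers, h))

def weighted_kde_quadratic(data, h):
    """Illustrative fixed-bandwidth weighted KDE: choose per-point weights a by minimizing
    the empirical L2 criterion a'Ga - 2 b'a subject to sum(a) = 1, solved via its KKT
    linear system. This is an assumed stand-in for the fbKDE's quadratic program."""
    n, d = data.shape
    sq = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=2)
    # G_ij = integral of k_h(x - x_i) k_h(x - x_j) dx: a Gaussian with variance 2 h^2
    G = np.exp(-sq / (4 * h**2)) / ((4 * np.pi * h**2) ** (d / 2))
    # b_i approximates integral of k_h(x - x_i) f(x) dx by a crude in-sample average
    b = np.exp(-sq / (2 * h**2)).mean(axis=1) / ((2 * np.pi * h**2) ** (d / 2))
    kkt = np.block([[2 * G, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.concatenate([2 * b, [1.0]])
    a = np.linalg.solve(kkt, rhs)[:n]          # weights may be negative (signed weights)
    return lambda x: float(a @ gaussian_kernel(x, data, h))

As a usage sketch, sparse_kde(x, data, h=0.5, k=50) evaluates only 50 kernel units regardless of the sample size, while weighted_kde_quadratic(data, h=0.5) fits the (possibly signed) weights once up front and returns a density estimate with the bandwidth held fixed; both behaviors mirror, at a toy scale, the motivations stated in the abstract.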

