Neighborhood Graphs for Estimation of Density Functionals.
Kallur Palli Kumar, Sricharan
2012
Abstract
Functionals of densities play a fundamental role in statistics, signal processing, machine learning, information theory and related fields. This class of functionals includes divergence measures of densities, intrinsic dimension of the density support, and minimum volume sets. k-nearest neighbor (k-NN) graph based estimators are widely used for the estimation of these functionals. While several consistent k-NN estimators have been previously proposed for estimating these functionals, general results on rates of convergence of these estimators and confidence intervals on the estimated functional are not available. In this thesis, a new class of estimators based on bipartite k-NN graphs is proposed for estimating functionals of probability density functions. This class includes divergence estimators, intrinsic dimension estimators and minimum volume set estimators. For this class of estimators, large sample theory is used to characterize performance of the estimators. Specifically, large sample expressions for estimator bias and variance is derived and a central limit theorem for the distribution of the estimators is established. This theory is applied to accurately estimate functionals of interest by optimizing the mean squared error over free parameters, e.g. the number of neighbors k, and obtaining confidence intervals on the estimated functional by invoking the central limit theorem. Furthermore, this theory provides significant insight into the statistical behavior of these bipartite k-NN estimators, leading to the development of modified k-NN estimators with faster rates of convergence. In particular, a weighted ensemble of bipartite k-NN estimators for functional estimation is proposed, and it is shown using this theory, that the weighted ensemble estimator outperforms the state-of-the-art. Using this theory, the thesis develops performance-driven algorithms in several applications. First, the theory is applied to determine entropy with confidence to facilitate anomaly detection at desired false alarm rates in wireless sensor networks. Second, the theory is applied to determine complexity of high-dimensional data lying on a manifold, and subsequently applied to fusion and segmentation applications. Finally, the thesis introduces an efficient anomaly detection algorithm based on estimation of p-values of membership in training-sample minimum volume sets using bipartite k-NN graphs.Subjects
K Nearest Neighbor K-NN Graphs K-NN Estimators Ensemble Estimators Entropy Estimation Dimension Estimation
Types
Thesis
Metadata
Show full item recordCollections
Showing items related by title, author, creator and subject.
-
Song, Jeeseon; Mohr, Joseph J.; Barkhouse, Wayne A.; Warren, Michael S.; Dolag, Klaus; Rude, Cody (IOP Publishing, 2012)
-
Jacquez, John A.; Greif, Peter (Elsevier, 1985-12)
-
Zhong, Jiapeng; Morozov, Alexey; Rahman, Yousaf; Bernstein, Dennis (Peer Reviewed, 2013)
Remediation of Harmful Language
The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.
Accessibility
If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.