Show simple item record

Streaming Sketching: Mathematical Theory and Practical Algorithms

dc.contributor.authorWang, Dingyu
dc.date.accessioned2025-05-12T17:39:43Z
dc.date.available2025-05-12T17:39:43Z
dc.date.issued2025
dc.date.submitted2025
dc.identifier.urihttps://hdl.handle.net/2027.42/197252
dc.description.abstractA streaming sketch is a data structure that can approximately answer queries of a data stream, whose size is much smaller than the stream length or the domain size. This thesis studies a wide range of topics on streaming sketching including approximate counting, cardinality estimation, the tractability of queries, the universality of sketches, non-mergeable sketches, as well as sampling. My research goes in two directions. Mathematical theory: For this direction, the simplicity and generality of the theory is the primary goal, and the produced algorithms can be existential or more like mathematical prototypes. Notable progress includes the following. I explicitly connect Levy processes to streaming algorithms, revealing a deep connection between infinite divisibility and scale-invariance. This connection sheds new light on the important tractability problem and also pins down the hash pattern one needs to $G$-sample over incremental streams with exactly correct sampling probabilities. I propose a new universal sketching scheme by decomposing general streaming queries into easy-to-estimate “harmonic queries.” The decomposition can be obtained from the Levy-Khintchine representation theorem or the Fourier transform. I propose a formal measurement of the goodness of a cardinality sketch, which sets up a tension between the Fisher information number and Shannon's entropy. I also generalize cardinality sketches to work over any (finite) commutative monoid, using the harmonic decomposition technique together with the semigroup decomposition of commutative monoids. Practical algorithms: For this direction, the simplicity and practicality of the algorithm is the primary goal, and the associated analysis is usually complicated and computation-heavy. Small-scale experiments are often included to confirm the analysis. I analyze various classes of sketches and estimators for the cardinality query. Many recipes are constructed for practitioners to pick and construct cardinality sketches and estimators in different contexts with different performance emphases. I study the sequential behaviors of cardinality sketches and construct simple fraud detectors. I study two variants of the Morris counter, one tracks a multi-dimensional count vector and the other tracks a count that can be both incremented and decremented. Fortunately, some of the algorithms I construct turn out to possess simplicity in both theory and practice. Notable examples are the $F_{1/2}$-sampler, stable-HyperLogLog, and the $d$-dimensional approximate counter.
dc.language.isoen_US
dc.subjectstreaming and sketching
dc.subjectalgorithms and data structures
dc.titleStreaming Sketching: Mathematical Theory and Practical Algorithms
dc.typeThesis
dc.description.thesisdegreenamePhD
dc.description.thesisdegreedisciplineComputer Science & Engineering
dc.description.thesisdegreegrantorUniversity of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeememberPettie, Seth
dc.contributor.committeememberRudelson, Mark
dc.contributor.committeememberBansal, Nikhil
dc.contributor.committeememberDerezinski, Michal
dc.contributor.committeememberWoodruff, David
dc.subject.hlbsecondlevelComputer Science
dc.subject.hlbtoplevelEngineering
dc.contributor.affiliationumcampusAnn Arbor
dc.description.bitstreamurlhttp://deepblue.lib.umich.edu/bitstream/2027.42/197252/1/wangdy_1.pdf
dc.identifier.doihttps://dx.doi.org/10.7302/25678
dc.identifier.orcid0009-0000-5462-8715
dc.identifier.name-orcidWang, Dingyu; 0009-0000-5462-8715en_US
dc.working.doi10.7302/25678en
dc.owningcollnameDissertations and Theses (Ph.D. and Master's)


Files in this item

Show simple item record

Remediation of Harmful Language

The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to Contact Us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.