
One-dimensional and multi-dimensional substring selectivity estimation

dc.contributor.author: Kapitskaia, Olga [en_US]
dc.contributor.author: Srivastava, Divesh [en_US]
dc.contributor.author: Jagadish, H. V. [en_US]
dc.contributor.author: Ng, Raymond T. [en_US]
dc.date.accessioned: 2006-09-08T20:13:26Z
dc.date.available: 2006-09-08T20:13:26Z
dc.date.issued: 2000-12 [en_US]
dc.identifier.citation: Jagadish, H. V.; Kapitskaia, Olga; Ng, Raymond T.; Srivastava, Divesh (2000). "One-dimensional and multi-dimensional substring selectivity estimation." The VLDB Journal 9(3): 214-230. <http://hdl.handle.net/2027.42/42330> [en_US]
dc.identifier.issn: 1066-8888 [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/42330
dc.description.abstract: With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the multiple dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given PST. For the k-D problem, we first generalize PSTs to multiple dimensions and develop a space- and time-efficient probabilistic algorithm to construct k-D PSTs directly. We then show how to extend MO to multiple dimensions. Finally, we demonstrate, both analytically and experimentally, that MO is both practical and substantially superior to competing algorithms. [en_US]
dc.format.extent: 215655 bytes
dc.format.extent: 3115 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: text/plain
dc.language.iso: en_US
dc.publisher: Springer-Verlag; Springer-Verlag Berlin Heidelberg [en_US]
dc.subject.other: Key Words: String Selectivity – Maximal Overlap – Short Memory Property – Pruned Count-suffix Tree [en_US]
dc.subject.other: Legacy [en_US]
dc.title: One-dimensional and multi-dimensional substring selectivity estimation [en_US]
dc.type: Article [en_US]
dc.subject.hlbsecondlevel: Philosophy [en_US]
dc.subject.hlbsecondlevel: Computer Science [en_US]
dc.subject.hlbtoplevel: Humanities [en_US]
dc.subject.hlbtoplevel: Engineering [en_US]
dc.description.peerreviewed: Peer Reviewed [en_US]
dc.contributor.affiliationum: University of Michigan, Ann Arbor; E-mail: jag@umich.edu, US [en_US]
dc.contributor.affiliationother: Pôle Universitaire Léonard de Vinci; E-mail: Olga.Kapitskaia@devinci.fr, FR [en_US]
dc.contributor.affiliationother: AT&T Labs – Research, 180 Park Avenue, Bldg 103, Florham Park, NJ 07932, USA; E-mail: divesh@research.att.com, US [en_US]
dc.contributor.affiliationother: University of British Columbia; E-mail: rng@cs.ubc.ca, CA [en_US]
dc.contributor.affiliationumcampus: Ann Arbor [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/42330/1/778-9-3-214_00090214.pdf [en_US]
dc.identifier.doi: http://dx.doi.org/10.1007/s007780000029 [en_US]
dc.identifier.source: The VLDB Journal [en_US]
dc.owningcollname: Interdisciplinary and Peer-Reviewed
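
The abstract above only names the key ingredients: a pruned count-suffix tree (PST) as the summary structure and Maximal Overlap (MO) as the estimation idea. As a rough, self-contained illustration of those ideas, the Python sketch below builds a 1-D count-suffix tree, prunes low-count subtrees, and estimates a query's selectivity by chaining the counts of its maximal retained substrings through their overlaps. All names here (CountSuffixTree, mo_estimate, threshold, and the normalization by the number of suffixes) are assumptions made for exposition; this is a simplification in the spirit of MO, not the paper's MO, MOC, or MOLC implementation, for which see the full article at the DOI above.

```python
# Illustrative sketch (not the paper's code): a 1-D count-suffix tree with
# count-based pruning, plus a simplified Maximal-Overlap-style estimate.
# Class and function names are assumed here for exposition only.

class _Node:
    __slots__ = ("children", "count")
    def __init__(self):
        self.children = {}
        self.count = 0


class CountSuffixTree:
    """Counts occurrences of every substring of the inserted strings by
    inserting all of their suffixes, then prunes low-count subtrees."""

    def __init__(self):
        self.root = _Node()

    def insert(self, s):
        for i in range(len(s)):                # one pass per suffix s[i:]
            self.root.count += 1               # root count = #suffixes seen
            node = self.root
            for ch in s[i:]:
                node = node.children.setdefault(ch, _Node())
                node.count += 1

    def prune(self, threshold):
        """Drop every subtree whose occurrence count is below threshold."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.children = {c: n for c, n in node.children.items()
                             if n.count >= threshold}
            stack.extend(node.children.values())

    def lookup(self, q):
        """Occurrence count of substring q, or None if absent/pruned."""
        node = self.root
        for ch in q:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.count


def mo_estimate(tree, q):
    """Estimate Pr(q) from a pruned count-suffix tree in the spirit of
    Maximal Overlap: parse q into the maximal substrings retained by the
    tree and chain their probabilities via the overlapping parts."""
    n = len(q)
    total = float(tree.root.count)             # normalizer: #suffixes inserted
    if total == 0:
        return 0.0

    # Maximal parse: the longest retained match from each start position,
    # keeping only pieces not contained in an earlier piece.
    pieces = []
    for i in range(n):
        node, j = tree.root, i
        while j < n and q[j] in node.children:
            node = node.children[q[j]]
            j += 1
        if j > i and (not pieces or j > pieces[-1][1]):
            pieces.append((i, j))
    if not pieces:
        return 0.0

    a, b = pieces[0]
    prob = tree.lookup(q[a:b]) / total
    for (c, d) in pieces[1:]:
        piece_count = tree.lookup(q[c:d])
        if c < b:                              # consecutive pieces overlap
            overlap_count = tree.lookup(q[c:b]) or piece_count
            prob *= piece_count / float(overlap_count)
        else:                                  # gap: fall back to independence
            prob *= piece_count / total
        b = d
    return min(prob, 1.0)


if __name__ == "__main__":
    tree = CountSuffixTree()
    for name in ["jones", "johnson", "johansson", "jonas", "smith"]:
        tree.insert(name)
    tree.prune(threshold=2)
    print(mo_estimate(tree, "johns"))          # rough selectivity estimate
```

In this sketch, raising the pruning threshold shrinks the tree but leaves the estimator with shorter retained pieces to stitch together, which is the space/accuracy tension the pruned-tree approach has to manage.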

