One-dimensional and multi-dimensional substring selectivity estimation
dc.contributor.author | Kapitskaia, Olga | en_US |
dc.contributor.author | Srivastava, Divesh | en_US |
dc.contributor.author | Jagadish, H. V. | en_US |
dc.contributor.author | Ng, Raymond T. | en_US |
dc.date.accessioned | 2006-09-08T20:13:26Z | |
dc.date.available | 2006-09-08T20:13:26Z | |
dc.date.issued | 2000-12 | en_US |
dc.identifier.citation | Jagadish, H.V.; Kapitskaia, Olga; Ng, Raymond T.; Srivastava, Divesh; (2000). "One-dimensional and multi-dimensional substring selectivity estimation." The VLDB Journal 9(3): 214-230. <http://hdl.handle.net/2027.42/42330> | en_US |
dc.identifier.issn | 1066-8888 | en_US |
dc.identifier.uri | https://hdl.handle.net/2027.42/42330 | |
dc.description.abstract | With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the multiple dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given PST. For the k -D problem, we first generalize PSTs to multiple dimensions and develop a space- and time-efficient probabilistic algorithm to construct k -D PSTs directly. We then show how to extend MO to multiple dimensions. Finally, we demonstrate, both analytically and experimentally, that MO is both practical and substantially superior to competing algorithms. | en_US |
dc.format.extent | 215655 bytes | |
dc.format.extent | 3115 bytes | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | text/plain | |
dc.language.iso | en_US | |
dc.publisher | Springer-Verlag; Springer-Verlag Berlin Heidelberg | en_US |
dc.subject.other | Key Words: String Selectivity – Maximal Overlap – Short Memory Property – Pruned Count-suffix Tree | en_US |
dc.subject.other | Legacy | en_US |
dc.title | One-dimensional and multi-dimensional substring selectivity estimation | en_US |
dc.type | Article | en_US |
dc.subject.hlbsecondlevel | Philosophy | en_US |
dc.subject.hlbsecondlevel | Computer Science | en_US |
dc.subject.hlbtoplevel | Humanities | en_US |
dc.subject.hlbtoplevel | Engineering | en_US |
dc.description.peerreviewed | Peer Reviewed | en_US |
dc.contributor.affiliationum | University of Michigan, Ann Arbor; E-mail: jag@umich.edu, US | en_US |
dc.contributor.affiliationother | Pôle Universitaire Léonard de Vinci; E-mail: Olga.Kapitskaia@devinci.fr, FR | en_US |
dc.contributor.affiliationother | AT&T Labs – Research, 180 Park Avenue, Bldg 103, Florham Park, NJ 07932, USA; E-mail: divesh@research.att.com, US | en_US |
dc.contributor.affiliationother | University of British Columbia; E-mail: rng@cs.ubc.ca, CA | en_US |
dc.contributor.affiliationumcampus | Ann Arbor | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/42330/1/778-9-3-214_00090214.pdf | en_US |
dc.identifier.doi | http://dx.doi.org/10.1007/s007780000029 | en_US |
dc.identifier.source | The VLDB Journal | en_US |
dc.owningcollname | Interdisciplinary and Peer-Reviewed |
Files in this item
Remediation of Harmful Language
The University of Michigan Library aims to describe library materials in a way that respects the people and communities who create, use, and are represented in our collections. Report harmful or offensive language in catalog records, finding aids, or elsewhere in our collections anonymously through our metadata feedback form. More information at Remediation of Harmful Language.
Accessibility
If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.