
One-dimensional and multi-dimensional substring selectivity estimation

dc.contributor.author: Kapitskaia, Olga [en_US]
dc.contributor.author: Srivastava, Divesh [en_US]
dc.contributor.author: Jagadish, H. V. [en_US]
dc.contributor.author: Ng, Raymond T. [en_US]
dc.date.accessioned: 2006-09-08T20:13:26Z
dc.date.available: 2006-09-08T20:13:26Z
dc.date.issued: 2000-12 [en_US]
dc.identifier.citation: Jagadish, H. V.; Kapitskaia, Olga; Ng, Raymond T.; Srivastava, Divesh (2000). "One-dimensional and multi-dimensional substring selectivity estimation." The VLDB Journal 9(3): 214-230. <http://hdl.handle.net/2027.42/42330> [en_US]
dc.identifier.issn: 1066-8888 [en_US]
dc.identifier.uri: https://hdl.handle.net/2027.42/42330
dc.description.abstract: With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the multiple dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given PST. For the k-D problem, we first generalize PSTs to multiple dimensions and develop a space- and time-efficient probabilistic algorithm to construct k-D PSTs directly. We then show how to extend MO to multiple dimensions. Finally, we demonstrate, both analytically and experimentally, that MO is both practical and substantially superior to competing algorithms. [en_US]
dc.format.extent: 215655 bytes
dc.format.extent: 3115 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: text/plain
dc.language.iso: en_US
dc.publisher: Springer-Verlag; Springer-Verlag Berlin Heidelberg [en_US]
dc.subject.other: Key Words: String Selectivity – Maximal Overlap – Short Memory Property – Pruned Count-suffix Tree [en_US]
dc.subject.other: Legacy [en_US]
dc.title: One-dimensional and multi-dimensional substring selectivity estimation [en_US]
dc.type: Article [en_US]
dc.subject.hlbsecondlevel: Philosophy [en_US]
dc.subject.hlbsecondlevel: Computer Science [en_US]
dc.subject.hlbtoplevel: Humanities [en_US]
dc.subject.hlbtoplevel: Engineering [en_US]
dc.description.peerreviewed: Peer Reviewed [en_US]
dc.contributor.affiliationum: University of Michigan, Ann Arbor; E-mail: jag@umich.edu, US [en_US]
dc.contributor.affiliationother: Pôle Universitaire Léonard de Vinci; E-mail: Olga.Kapitskaia@devinci.fr, FR [en_US]
dc.contributor.affiliationother: AT&T Labs – Research, 180 Park Avenue, Bldg 103, Florham Park, NJ 07932, USA; E-mail: divesh@research.att.com, US [en_US]
dc.contributor.affiliationother: University of British Columbia; E-mail: rng@cs.ubc.ca, CA [en_US]
dc.contributor.affiliationumcampus: Ann Arbor [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/42330/1/778-9-3-214_00090214.pdf [en_US]
dc.identifier.doi: http://dx.doi.org/10.1007/s007780000029 [en_US]
dc.identifier.source: The VLDB Journal [en_US]
dc.owningcollname: Interdisciplinary and Peer-Reviewed
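
The abstract above only names the key ingredients: a pruned count-suffix tree (PST) as the summary structure and Maximal Overlap (MO) as the estimation idea. As a rough, self-contained illustration of those ideas, the Python sketch below builds a 1-D count-suffix tree, prunes low-count subtrees, and estimates a query's selectivity by chaining the counts of its maximal retained substrings through their overlaps. All names here (CountSuffixTree, mo_estimate, threshold, and the normalization by the number of suffixes) are assumptions made for exposition; this is a simplification in the spirit of MO, not the paper's MO, MOC, or MOLC implementation, for which see the full article at the DOI above.

```python
# Illustrative sketch (not the paper's code): a 1-D count-suffix tree with
# count-based pruning, plus a simplified Maximal-Overlap-style estimate.
# Class and function names are assumed here for exposition only.

class _Node:
    __slots__ = ("children", "count")
    def __init__(self):
        self.children = {}
        self.count = 0


class CountSuffixTree:
    """Counts occurrences of every substring of the inserted strings by
    inserting all of their suffixes, then prunes low-count subtrees."""

    def __init__(self):
        self.root = _Node()

    def insert(self, s):
        for i in range(len(s)):                # one pass per suffix s[i:]
            self.root.count += 1               # root count = #suffixes seen
            node = self.root
            for ch in s[i:]:
                node = node.children.setdefault(ch, _Node())
                node.count += 1

    def prune(self, threshold):
        """Drop every subtree whose occurrence count is below threshold."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.children = {c: n for c, n in node.children.items()
                             if n.count >= threshold}
            stack.extend(node.children.values())

    def lookup(self, q):
        """Occurrence count of substring q, or None if absent/pruned."""
        node = self.root
        for ch in q:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.count


def mo_estimate(tree, q):
    """Estimate Pr(q) from a pruned count-suffix tree in the spirit of
    Maximal Overlap: parse q into the maximal substrings retained by the
    tree and chain their probabilities via the overlapping parts."""
    n = len(q)
    total = float(tree.root.count)             # normalizer: #suffixes inserted
    if total == 0:
        return 0.0

    # Maximal parse: the longest retained match from each start position,
    # keeping only pieces not contained in an earlier piece.
    pieces = []
    for i in range(n):
        node, j = tree.root, i
        while j < n and q[j] in node.children:
            node = node.children[q[j]]
            j += 1
        if j > i and (not pieces or j > pieces[-1][1]):
            pieces.append((i, j))
    if not pieces:
        return 0.0

    a, b = pieces[0]
    prob = tree.lookup(q[a:b]) / total
    for (c, d) in pieces[1:]:
        piece_count = tree.lookup(q[c:d])
        if c < b:                              # consecutive pieces overlap
            overlap_count = tree.lookup(q[c:b]) or piece_count
            prob *= piece_count / float(overlap_count)
        else:                                  # gap: fall back to independence
            prob *= piece_count / total
        b = d
    return min(prob, 1.0)


if __name__ == "__main__":
    tree = CountSuffixTree()
    for name in ["jones", "johnson", "johansson", "jonas", "smith"]:
        tree.insert(name)
    tree.prune(threshold=2)
    print(mo_estimate(tree, "johns"))          # rough selectivity estimate
```

In this sketch, raising the pruning threshold shrinks the tree but leaves the estimator with shorter retained pieces to stitch together, which is the space/accuracy tension the pruned-tree approach has to manage.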

