Using disk based index and box queries for genome sequencing error correction

Gu, Y; Zhu, Q; Liu, X; Dong, Y; Brown, CT; Pramanik, S

dc.contributor.author	Gu, Y
dc.contributor.author	Zhu, Q
dc.contributor.author	Liu, X
dc.contributor.author	Dong, Y
dc.contributor.author	Brown, CT
dc.contributor.author	Pramanik, S
dc.date.accessioned	2024-09-27T01:26:20Z
dc.date.available	2024-09-27T01:26:20Z
dc.date.issued	2016-01-01
dc.identifier.uri	https://hdl.handle.net/2027.42/195089
dc.description.abstract	The vast increase in DNA sequencing capacity over the last decade has quickly turned biology into a data-intensive science. Nevertheless, current sequencers such as Illumina HiSeq have high random per-base error rates, which make sequencing error correction an indispensable requirement for many sequence analysis applications. Most existing error correction methods demand large amounts of expensive memory, which limits their scalability for handling large datasets. In this paper, we present a new disk based method, called DiskBQcor, for sequencing error correction. DiskBQcor stores the k-mers of genome sequencing data, along with their associated metadata, on inexpensive disk and utilizes a disk based index tree to efficiently process special box queries that retrieve relevant k-mers and their occurrence frequencies. It then applies a comprehensive voting mechanism and possibly an efficient binary encoding based assembly technique to verify and correct an erroneous base in a genome sequence under various conditions. Our experiments demonstrate that the proposed method is quite promising in error verification and correction for genome sequencing data on disk.
dc.title	Using disk based index and box queries for genome sequencing error correction
dc.type	Conference Paper
dc.description.bitstreamurl	http://deepblue.lib.umich.edu/bitstream/2027.42/195089/2/BiCoB2016.pdf
dc.identifier.doi	https://dx.doi.org/10.7302/24328
dc.identifier.source	Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016
dc.description.version	Published version
dc.date.updated	2024-09-27T01:26:17Z
dc.identifier.startpage	69
dc.identifier.endpage	76
dc.identifier.name-orcid	Gu, Y
dc.identifier.name-orcid	Zhu, Q
dc.identifier.name-orcid	Liu, X
dc.identifier.name-orcid	Dong, Y
dc.identifier.name-orcid	Brown, CT
dc.identifier.name-orcid	Pramanik, S
dc.working.doi	10.7302/24328	en
dc.owningcollname	Computer and Information Science, Department of (UM-Dearborn)

Files in this item

Name: BiCoB2016.pdf
Size: 3.656 MB
Format: PDF
