TY - GEN
T1 - An efficient overlap graph coarsening approach for modeling short reads
AU - Warnke, Julia
AU - Ali, Hesham H.
PY - 2012
Y1 - 2012
N2 - Next generation sequencing has quickly emerged as the most exciting yet challenging computational problem in Bioinformatics. Current sequencing technologies are capable of producing several hundreds of thousands to several millions of short sequence reads in a single run. However, current methods for managing, storing, and processing the produced reads remain for the most part simple and lack the complexity needed to model the produced reads efficiently and assemble them correctly. These reads are produced at a high coverage of the original target sequence such that many reads overlap. The overlap relationships are used to align and merge reads into contiguous sequences called contigs. In this paper, we present an overlap graph coarsening scheme for modeling reads and their overlap relationships. Our approach is different from previous read analysis and assembly methods that use a single graph to model read overlap relationships. Instead, we use a series of graphs with different granularities of information to represent the complex read overlap relationships. We present a new graph coarsening algorithm for clustering a simulated metagenomics dataset at various levels of granularity. We also use the proposed graph coarsening scheme along with graph traversal algorithms to find a labeling of the overlap graph that allows for the efficient organization of nodes within the graph data structure.
AB - Next generation sequencing has quickly emerged as the most exciting yet challenging computational problem in Bioinformatics. Current sequencing technologies are capable of producing several hundreds of thousands to several millions of short sequence reads in a single run. However, current methods for managing, storing, and processing the produced reads remain for the most part simple and lack the complexity needed to model the produced reads efficiently and assemble them correctly. These reads are produced at a high coverage of the original target sequence such that many reads overlap. The overlap relationships are used to align and merge reads into contiguous sequences called contigs. In this paper, we present an overlap graph coarsening scheme for modeling reads and their overlap relationships. Our approach is different from previous read analysis and assembly methods that use a single graph to model read overlap relationships. Instead, we use a series of graphs with different granularities of information to represent the complex read overlap relationships. We present a new graph coarsening algorithm for clustering a simulated metagenomics dataset at various levels of granularity. We also use the proposed graph coarsening scheme along with graph traversal algorithms to find a labeling of the overlap graph that allows for the efficient organization of nodes within the graph data structure.
KW - Clustering
KW - Graph theory
KW - Next generation sequencing
KW - metagenomics
UR - http://www.scopus.com/inward/record.url?scp=84875628663&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875628663&partnerID=8YFLogxK
U2 - 10.1109/BIBMW.2012.6470223
DO - 10.1109/BIBMW.2012.6470223
M3 - Conference contribution
AN - SCOPUS:84875628663
SN - 9781467327466
T3 - Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012
SP - 704
EP - 711
BT - Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012
T2 - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012
Y2 - 4 October 2012 through 7 October 2012
ER -