TY - GEN
T1 - On the integration of assembly and non-assembly approaches for comparing biological sequences
AU - Dam, Vi
AU - Ali, Hesham H.
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/15
Y1 - 2017/12/15
N2 - As Next Generation Sequencing (NGS) technologies continue to expand rapidly, NGS data, available in the form of short genomic reads, remain the primary source of biological data in many bioinformatics applications, and the need to assemble and manipulate these data persists. As a result, many assemblers have been developed to assemble NGS short reads into long genomic sequences or contigs ready for advanced analysis such as Genome-Wide Association Studies (GWAS). However, the lack of high levels of robustness and reproducibility continues to limit the impact of bioinformatics research, and many biomedical researchers remain skeptical of results obtained from bioinformatics applications. In this study, we compare the performance of various widely used assemblers on several NGS datasets associated with various organisms. We highlight the advantages and disadvantages of each assembler and explore the factors that impact the performance of each approach. In addition, we survey the assembly-free compression approaches recently developed to process NGS short reads and analyze their performance in comparing genomic sequences represented by sets of short reads. We use phylogeny trees obtained from simulated and real datasets to evaluate the accuracy of each assembly-free approach. We test the hypothesis that non-assembly approaches could potentially overcome the limitations and inaccuracies of assembly approaches in comparing sequences, especially for large read sizes. Moreover, we propose a hybrid approach that integrates assembly and non-assembly approaches for classifying genomic sequences. The proposed approach uses the results of partially assembling short reads as input for assembly-free methods to complete the NGS manipulation process. Preliminary results suggest that the hybrid approach shows promise for comparing genomic sequences.
AB - As Next Generation Sequencing (NGS) technologies continue to expand rapidly, NGS data, available in the form of short genomic reads, remain the primary source of biological data in many bioinformatics applications, and the need to assemble and manipulate these data persists. As a result, many assemblers have been developed to assemble NGS short reads into long genomic sequences or contigs ready for advanced analysis such as Genome-Wide Association Studies (GWAS). However, the lack of high levels of robustness and reproducibility continues to limit the impact of bioinformatics research, and many biomedical researchers remain skeptical of results obtained from bioinformatics applications. In this study, we compare the performance of various widely used assemblers on several NGS datasets associated with various organisms. We highlight the advantages and disadvantages of each assembler and explore the factors that impact the performance of each approach. In addition, we survey the assembly-free compression approaches recently developed to process NGS short reads and analyze their performance in comparing genomic sequences represented by sets of short reads. We use phylogeny trees obtained from simulated and real datasets to evaluate the accuracy of each assembly-free approach. We test the hypothesis that non-assembly approaches could potentially overcome the limitations and inaccuracies of assembly approaches in comparing sequences, especially for large read sizes. Moreover, we propose a hybrid approach that integrates assembly and non-assembly approaches for classifying genomic sequences. The proposed approach uses the results of partially assembling short reads as input for assembly-free methods to complete the NGS manipulation process. Preliminary results suggest that the hybrid approach shows promise for comparing genomic sequences.
UR - http://www.scopus.com/inward/record.url?scp=85045961520&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045961520&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2017.8218007
DO - 10.1109/BIBM.2017.8218007
M3 - Conference contribution
AN - SCOPUS:85045961520
T3 - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
SP - 2232
EP - 2234
BT - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
A2 - Yoo, Illhoi
A2 - Zheng, Jane Huiru
A2 - Gong, Yang
A2 - Hu, Xiaohua Tony
A2 - Shyu, Chi-Ren
A2 - Bromberg, Yana
A2 - Gao, Jean
A2 - Korkin, Dmitry
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Y2 - 13 November 2017 through 16 November 2017
ER -