Next generation sequence assembler mis-assembly of phage genomes with terminal redundancy

Julia Warnke-Sommer, Ishwor Thapa, Hesham Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Next generation sequencing (NGS) has become the platform for numerous biomedical applications. The study of viral genomes using NGS technologies has led to the characterization of viral species in numerous environments, including the human gut microbiome and plant hosts. Many viral genomes are circular or have terminally redundant ends. Circular or linear viral genomes with indeterminate starting and ending points pose a challenge for NGS assemblers, which may erroneously duplicate sections of these genomes. The length of an assembly, often characterized by the N50 length, is frequently used as an indication of an assembly's completeness and even quality. In this paper, we show that the longest contig produced by various assemblers is not always the best assembly for circular or terminally redundant phage genomes and may represent erroneously repeated genomic regions. Results demonstrate that assembly tools may even produce assembled genomes of different lengths for the same species, depending on which content is inaccurately repeated, leading to results that may confuse researchers or be used inaccurately. To overcome this problem, we introduce strategies for using coverage depth to identify inaccurately repeated content in circular or terminally redundant phage genomes. We conclude the paper by providing the results of assembling two bacteriophage genomes and a bacteriophage metagenomics dataset, highlighting the impact of using the proposed strategies.
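To make the abstract's central point concrete, the following minimal Python sketch shows how a contig that wraps around a circular genome scores a higher N50 than the correct one while containing duplicated terminal content. It is an illustration only and is not the coverage-depth strategy proposed in the paper: the function names `n50` and `terminal_overlap`, the `min_len` parameter, and the toy sequences are all assumptions, and the overlap check uses exact string matching rather than read coverage.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def terminal_overlap(contig, min_len=5):
    """Return the length of the longest proper prefix of `contig` that is also
    its suffix (exact match, at least `min_len` bases), or 0 if none exists."""
    for k in range(len(contig) - 1, min_len - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


# Hypothetical 20 bp "circular genome" and a contig in which the assembler
# ran past the arbitrary start point and repeated the first 8 bases.
genome = "ATGCGTACCTGAAGTCCGAT"
contig = genome + genome[:8]

# The mis-assembled contig is longer, so it scores a higher N50.
print(n50([len(genome)]), n50([len(contig)]))

# An exact prefix/suffix match flags the repeated terminal content.
k = terminal_overlap(contig)
trimmed = contig[:len(contig) - k] if k else contig
print(k, trimmed == genome)
```

Exact matching is used here only because it is easy to verify by hand; real reads contain errors, and a true terminal repeat cannot be told apart from an assembler artifact by sequence alone, which is where the depth-based evidence described in the abstract comes in.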

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015
Editors: Ing. Matthieu Schapranow, Jiayu Zhou, Xiaohua Tony Hu, Bin Ma, Sanguthevar Rajasekaran, Satoru Miyano, Illhoi Yoo, Brian Pierce, Amarda Shehu, Vijay K. Gombar, Brian Chen, Vinay Pai, Jun Huan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1102-1108
Number of pages: 7
ISBN (Electronic): 9781467367981
DOIs: https://doi.org/10.1109/BIBM.2015.7359836
State: Published - Dec 16 2015
Event: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015 - Washington, United States
Duration: Nov 9 2015 to Nov 12 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015

Other

Other: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015
Country: United States
City: Washington
Period: 11/9/15 to 11/12/15

Keywords

  • Assembly validation
  • Next generation sequencing
  • Viral genome assembly

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Health Informatics
  • Biomedical Engineering

Cite this

Warnke-Sommer, J., Thapa, I., & Ali, H. (2015). Next generation sequence assembler mis-assembly of phage genomes with terminal redundancy. In L. M. Schapranow, J. Zhou, X. T. Hu, B. Ma, S. Rajasekaran, S. Miyano, I. Yoo, B. Pierce, A. Shehu, V. K. Gombar, B. Chen, V. Pai, & J. Huan (Eds.), Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015 (pp. 1102-1108). [7359836] (Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM.2015.7359836