A meta-genome sequencing and assembly preprocessing algorithm inspired by restriction site base composition

Oliver Bonham-Carter, Hesham Ali, Dhundy Bastola

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Motivation: In meta-genome sequencing and assembly projects, contigs from different organisms are mixed together in a single pool, which makes assembling the individual genomes a complex and challenging problem. It is therefore desirable to sort the contigs by origin into separate bins before assembly. We propose a framework that uses the base compositions of bacterial restriction sites to generate sets of motifs that differentiate organismal groups, and hence the contigs derived from those groups. We introduce spectrum sets and show how to select them strategically for binning contigs from different organisms. We suggest that this framework can save time during a meta-genome sequencing and assembly project.

Results: Our method differentiates organisms and correctly associates contigs with the organism from which they were derived. In particular, by analyzing motif proportions, we show that two genera are fundamentally different. Using one of the four total spectrum sets, which together encompass all known restriction sites, we show that different sets differ in their ability to distinguish sequences. In addition, we show that selecting a spectrum set that is relevant to one organism but not the other greatly improves differentiation, even when the contigs are short (1000 bp).

Conclusions: Using ten trials of newly selected contigs to confirm our premise, our study provides a proof of concept for a novel and computationally effective preprocessing step for meta-genome sequencing and assembly tasks.
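The abstract does not spell out how spectrum sets or motif proportions are computed, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that a spectrum set is the set of all motifs sharing the base composition of a restriction site, that a motif proportion is the fraction of sliding windows matching a motif, and that a contig is binned with the reference whose profile is nearest in Euclidean distance. The function names `expand_composition`, `motif_proportions` and `nearest_bin` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def expand_composition(site):
    """Return every distinct motif with the same base composition as `site`.

    Assumption: this stands in for a "spectrum set"; the paper's exact
    definition is not given in the abstract.
    """
    return sorted({"".join(p) for p in permutations(site)})


def motif_proportions(sequence, motifs):
    """Return, for each motif, the fraction of sliding windows that match it."""
    k = len(motifs[0])  # all motifs from one composition share a length
    n_windows = len(sequence) - k + 1
    counts = Counter(sequence[i:i + k] for i in range(n_windows))
    return [counts[m] / max(n_windows, 1) for m in motifs]


def nearest_bin(contig, references, motifs):
    """Return the name of the reference whose motif profile is closest.

    Assumption: Euclidean distance between proportion vectors is used
    here purely for illustration.
    """
    profile = motif_proportions(contig, motifs)

    def distance(name):
        ref = motif_proportions(references[name], motifs)
        return sqrt(sum((a - b) ** 2 for a, b in zip(profile, ref)))

    return min(references, key=distance)


if __name__ == "__main__":
    # Toy sequences, not real genomes: one is rich in the EcoRI site GAATTC.
    motifs = expand_composition("GAATTC")
    references = {"genus_a": "GAATTC" * 20, "genus_b": "GGGCCC" * 20}
    print(nearest_bin("GAATTC" * 5, references, motifs))
```

In this toy setup the motif set is relevant to one reference and absent from the other, which mirrors the abstract's observation that a spectrum set relevant to only one organism separates contigs more sharply.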

Original language: English (US)
Title of host publication: Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012
Pages: 696-703
Number of pages: 8
State: Published - 2012
Event: 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012 - Philadelphia, PA, United States
Duration: Oct 4 2012 → Oct 7 2012

Publication series

Name: Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012

Conference

Conference: 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2012
Country/Territory: United States
City: Philadelphia, PA
Period: 10/4/12 → 10/7/12

Keywords

  • Base composition
  • Restriction sites
  • Spectrum sets
  • Palindromes

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
