Detecting bacterial genomes in a metagenomic sample using NGS reads

Camilo Valdes, Meghan Brennan, Bertrand Clarke, Jennifer Clarke

Research output: Contribution to journal (Article)

We use a nucleotide flipping technique on whole-genome next generation sequencing (NGS) data to test for the presence of various bacterial strains in a single metagenomic sample. Our technique is novel in that we induce artificial point mutations at the nucleotide level to define a test statistic for each genome on a given reference list. After finding a suitable nucleotide flipping rate, we use a variant of the Westfall-Young procedure to correct for multiple comparisons. When we align reads to reference genomes, we permit fractional reads, i.e., we weight the contribution of each read by one over the number of genomes to which it aligns. In a large-scale simulation, we characterize our method's performance on 'clean' data with respect to accuracy, genome lengths, and genome abundances. We then apply our technique to real data from the Human Microbiome Project (HMP) and compare our results, which are based on adjusted p-values, with the HMP findings, which are based on abundance as assessed by coverage. The results from the two methods overlap substantially; discrepancies can be explained by the inherent variability of the respective processing pipelines and data.
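
The fractional-read weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `fractional_counts`, the input layout (a mapping from read ID to the genomes the read aligns to), and the example read and genome names are all assumptions made for illustration. Each read contributes a weight of one over the number of distinct genomes it aligns to, so every aligned read adds a total weight of exactly one across the reference list.

```python
from collections import defaultdict


def fractional_counts(read_alignments):
    """Return per-genome fractional read counts.

    read_alignments: dict mapping a read ID to an iterable of genome IDs
    the read aligns to (hypothetical input layout, assumed for this sketch).
    """
    counts = defaultdict(float)
    for read_id, genomes in read_alignments.items():
        unique_genomes = set(genomes)  # count each genome once per read
        if not unique_genomes:
            continue  # an unaligned read contributes nothing
        weight = 1.0 / len(unique_genomes)
        for genome in unique_genomes:
            counts[genome] += weight
    return dict(counts)


# Hypothetical alignments: read2 hits two genomes, read3 hits four.
example = {
    "read1": ["genomeA"],
    "read2": ["genomeA", "genomeB"],
    "read3": ["genomeA", "genomeB", "genomeC", "genomeD"],
}
counts = fractional_counts(example)
```

In this toy input, genomeA receives 1 + 1/2 + 1/4 of a read, while genomeC and genomeD each receive 1/4. How the resulting counts enter the test statistic is specified in the article itself and is not reproduced here.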

Original language: English (US)
Pages (from-to): 477-494
Number of pages: 18
Journal: Statistics and its Interface
Issue number: 4
State: Published - 2015


  • Artificial point mutations
  • Human microbiome project
  • Metagenomics
  • Multiple comparisons
  • Next generation sequencing
  • Nucleotide flipping

ASJC Scopus subject areas

  • Statistics and Probability
  • Applied Mathematics
