Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations

Natasha Pavlovikj, Joao Carlos Gomes-Neto, Andrew K. Benson

Research output: Contribution to journalArticlepeer-review


Routine and systematic use of bacterial whole-genome sequencing (WGS) is enhancing the accuracy and resolution of epidemiological investigations carried out by Public Health laboratories and regulatory agencies. Large volumes of publicly available WGS data can be used to study pathogenic populations at a large scale. Recently, a freely available computational platform called ProkEvo was published to enable reproducible, automated, and scalable hierarchical-based population genomic analyses using bacterial WGS data. This implementation of ProkEvo demonstrated the importance of combining standard genotypic mapping of populations with mining of accessory genomic content for ecological inference. In particular, the work highlighted here used ProkEvo-derived outputs for population-scaled hierarchical analyses using the R programming language. The main objective was to provide a practical guide for microbiologists, ecologists, and epidemiologists by showing how to: i) use a phylogeny-guided mapping of hierarchical genotypes; ii) assess frequency distributions of genotypes as a proxy for ecological fitness; iii) determine kinship relationships and genetic diversity using specific genotypic classifications; and iv) map lineage differentiating accessory loci. To enhance reproducibility and portability, R markdown files were used to demonstrate the entire analytical approach. The example dataset contained genomic data from 2,365 isolates of the zoonotic foodborne pathogen Salmonella Newport. Phylogeny-anchored mapping of hierarchical genotypes (Serovar -> BAPS1 -> ST -> cgMLST) revealed the population genetic structure, highlighting sequence types (STs) as the keystone differentiating genotype. Across the three most dominant lineages, ST5 and ST118 shared a common ancestor more recently than with the highly clonal ST45 phylotype. ST-based differences were further highlighted by the distribution of accessory antimicrobial resistance (AMR) loci. Lastly, a phylogeny-anchored visualization was used to combine hierarchical genotypes and AMR content to reveal the kinship structure and lineage-specific genomic signatures. Combined, this analytical approach provides some guidelines for conducting heuristic bacterial population genomic analyses using pan-genomic information.

Original languageEnglish (US)
JournalJournal of visualized experiments : JoVE
Issue number178
StatePublished - Dec 7 2021
Externally publishedYes

ASJC Scopus subject areas

  • Neuroscience(all)
  • Chemical Engineering(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)


Dive into the research topics of 'Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations'. Together they form a unique fingerprint.

Cite this