Here we provide a pragmatic, high-level overview of the computational approaches and tools for the discovery of transcription factor binding sites. Unraveling transcription regulatory networks and their malfunctions such as cancer became feasible due to recent stellar progress in experimental techniques and computational analyses. While predictions of isolated sites still pose notorious challenges, cis-regulatory modules (clusters) of binding sites can now be identified with high accuracy. Further support comes from conserved DNA segments, co-regulation, transposable elements, nucleosomes, and three-dimensional chromosomal structures. We introduce computational tools for the analysis and interpretation of chromatin immunoprecipitation, next-generation sequencing, SELEX, and protein-binding microarray results. Because immunoprecipitation produces overly large DNA segments and well over half of the sequencing reads from constitute background noise, methods are presented for background correction, sequence read mapping, peak calling, false discovery rate estimation, and co-localization analyses. To discover short binding site motifs from extensive immunoprecipitation segments, we recommend algorithms and software based on expectation maximization and Gibbs sampling. Data integration using several databases further improves performance. Binding sites can be visualized in genomic and chromatin context using genome browsers. Binding site information, integrated with co-expression in large compendia of gene expression experiments, allows us to reveal complex transcriptional regulatory networks.
ASJC Scopus subject areas
- Molecular Biology