Genomic homogeneity is investigated for a broad base of DNA sequences in terms of dinucleotide relative abundance distances (abbreviated δ-distances) and of oligonucleotide compositional extremes. It is shown that δ-distances between different genomic sequences in the same species are low, only about 2 or 3 times the distance found in random DNA, and are generally smaller than the between-species δ-distances. Extremes in short oligonucleotides include underrepresentation of TpA and overrepresentation of GpC in most temperate bacteriophage sequences; underrepresentation of CTAG in most eubacterial genomes; underrepresentation of GATC in most bacteriophage; CpG suppression in vertebrates, in all animal mitochondrial genomes, and in many thermophilic bacterial sequences; and overrepresentation of GpG/CpC in all animal mitochondrial sets and chloroplast genomes. Interpretations center on DNA structures (dinucleotide stacking energies, DNA curvature and superhelicity, nucleosome organization), context-dependent mutational events, methylation effects, and processes of replication and repair.
|Original language||English (US)|
|Number of pages||5|
|Journal||Proceedings of the National Academy of Sciences of the United States of America|
|State||Published - Dec 20 1994|
ASJC Scopus subject areas