TY - JOUR
T1 - Alignment-free genetic sequence comparisons
T2 - A review of recent approaches by word analysis
AU - Bonham-Carter, Oliver
AU - Steele, Joe
AU - Bastola, Dhundy
N1 - Publisher Copyright: © The Author 2013. Published by Oxford University Press.
PY - 2013/8/2
Y1 - 2013/8/2
AB - Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression.
KW - Alignment-free
KW - Information theory
KW - Sequence-alignment
KW - Word-analysis
UR - http://www.scopus.com/inward/record.url?scp=84913590574&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84913590574&partnerID=8YFLogxK
U2 - 10.1093/bib/bbt052
DO - 10.1093/bib/bbt052
M3 - Article
C2 - 23904502
AN - SCOPUS:84913590574
VL - 15
SP - 890
EP - 905
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
SN - 1467-5463
IS - 6
ER -