TY - GEN
T1 - On gene prediction by cross-species comparative sequence analysis
AU - Chen, R.
AU - Ali, H.
PY - 2003
Y1 - 2003
AB - Sequencing of large fragments of genomic DNA makes it possible to compare genomic sequences to identify protein-coding regions. We conducted a comparative analysis of homologous genomic sequences from organisms at different evolutionary distances and determined the degree of conservation of noncoding regions between closely related organisms. In contrast, more distantly related organisms show much less intron similarity but also less conservation of exon structure. Based on this finding and on training data sets, we propose a model in which coding sequences are identified by comparing sequences from multiple species, both closely related and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and the results are compared with those of other popular gene prediction programs. Provided that sequences from other species at appropriate evolutionary distances are available, this approach could be applied to newly sequenced organisms for which no species-specific statistical models exist.
UR - http://www.scopus.com/inward/record.url?scp=84960388611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960388611&partnerID=8YFLogxK
U2 - 10.1109/CSB.2003.1227366
DO - 10.1109/CSB.2003.1227366
M3 - Conference contribution
AN - SCOPUS:84960388611
T3 - Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003
SP - 446
EP - 447
BT - Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International IEEE Computer Society Computational Systems Bioinformatics Conference, CSB 2003
Y2 - 11 August 2003 through 14 August 2003
ER -