TY - GEN
T1 - Identification of Mycobacterium species using curated custom databases
AU - Kuyper, Dan
AU - Ali, Hesham H.
AU - Mohamed, Amr M.
AU - Hinrichs, Steven H.
PY - 2004
Y1 - 2004
AB - Advances in molecular biology have resulted in the development of diagnostic tests for infectious diseases based on genetic profiles. While probe-based assays dominate the field today, sequence-based assays hold great promise for the future. However, the variability in quality of sequence information currently present in public databases limits the potential growth and use of sequence-based analysis. To address this problem, a standardized method for DNA sequence validation and building of custom databases was developed using Mycobacterium as a development model. With this model, a computational approach to identification of infectious diseases was developed and evaluated. The web-based application, termed BioDatabase, accomplished genetic sequence identification via the creation of curated databases containing a relatively small set of genetic data specific to a species or group. The process for creation of the custom database included multiple steps, beginning with identification of highly conserved start and end sequences and intervening sequence validation parameters. The process eliminated the need for multiple sequence alignment with GenBank sequences, whose information is valuable yet difficult to utilize properly due to its size and quality. The custom database approach maximized application performance with minimal impact on analysis response time, allowing investigation of optimal sequences for identification of all Mycobacterium to the species level. In comparison to the 16S and ITS genetic regions, a curated ITS-based approach proved most effective for identification of Mycobacterium isolates.
UR - http://www.scopus.com/inward/record.url?scp=12444297543&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=12444297543&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:12444297543
SN - 0769521320
SN - 9780769521329
T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)
SP - 2679
EP - 2685
BT - Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)
T2 - Proceedings - 18th International Parallel and Distributed Processing Symposium, IPDPS 2004 (Abstracts and CD-ROM)
Y2 - 26 April 2004 through 30 April 2004
ER -