TY - GEN
T1 - G protein-coupled receptor classification at the subfamily level with probabilistic suffix tree
AU - Yang, Jingyi
AU - Deogun, Jitender
PY - 2006
Y1 - 2006
N2 - Classifying G protein-coupled receptors (GPCRs) is of interest because of the important role GPCRs play in pharmaceutical research. GPCRs have diverse functions and are involved in many biological processes, which makes them ideal targets for novel medicines. Because of this diversity, GPCRs lack overall sequence homology, which makes their classification a very challenging task. Various methods have been applied to this task, such as hidden Markov models (HMMs), decision trees, and support vector machines (SVMs). However, their performance is not completely satisfactory. In this paper, we propose a new method to classify GPCRs into subfamilies. In the proposed method, a probabilistic suffix tree (PST) is used to construct a prediction model for each subfamily. To classify a GPCR protein, we calculate its similarity score against the PST prediction model of each subfamily using the multi-domain local prediction algorithm. The protein is then assigned to the subfamily that gives it the highest score. Our method uses only primary sequence information and is very efficient: model construction and prediction take very little time. Nevertheless, it achieves 98.07% and 97.35% overall accuracy on level I and level II subfamily classification, respectively, in a 2-fold cross-validation test. Given its high accuracy and efficiency, our method is a significant improvement over previously reported ones.
AB - Classifying G protein-coupled receptors (GPCRs) is of interest because of the important role GPCRs play in pharmaceutical research. GPCRs have diverse functions and are involved in many biological processes, which makes them ideal targets for novel medicines. Because of this diversity, GPCRs lack overall sequence homology, which makes their classification a very challenging task. Various methods have been applied to this task, such as hidden Markov models (HMMs), decision trees, and support vector machines (SVMs). However, their performance is not completely satisfactory. In this paper, we propose a new method to classify GPCRs into subfamilies. In the proposed method, a probabilistic suffix tree (PST) is used to construct a prediction model for each subfamily. To classify a GPCR protein, we calculate its similarity score against the PST prediction model of each subfamily using the multi-domain local prediction algorithm. The protein is then assigned to the subfamily that gives it the highest score. Our method uses only primary sequence information and is very efficient: model construction and prediction take very little time. Nevertheless, it achieves 98.07% and 97.35% overall accuracy on level I and level II subfamily classification, respectively, in a 2-fold cross-validation test. Given its high accuracy and efficiency, our method is a significant improvement over previously reported ones.
KW - GPCR protein classification
KW - Multi-domain local prediction
KW - Probabilistic suffix tree
UR - http://www.scopus.com/inward/record.url?scp=50249085915&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=50249085915&partnerID=8YFLogxK
U2 - 10.1109/CIBCB.2006.330976
DO - 10.1109/CIBCB.2006.330976
M3 - Conference contribution
AN - SCOPUS:50249085915
SN - 1424406234
SN - 9781424406234
T3 - Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06
SP - 490
EP - 497
BT - Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB'06
T2 - 3rd Computational Intelligence in Bioinformatics and Computational Biology Symposium, CIBCB
Y2 - 28 September 2006 through 29 September 2006
ER -