TY - JOUR
T1 - Joint learning of gene functions - A Bayesian network model approach
AU - Deng, Xutao
AU - Geng, Huimin
AU - Ali, Hesham H.
N1 - Funding Information:
Thanks to Michael Eisen, Paul Pavlidis and all those who deposit their experimental data in public databases and to those who maintain these databases. This work was supported by NIH grant number P20 RR16469 from the INBRE Program of the National Center for Research Resources.
PY - 2006/4
Y1 - 2006/4
N2 - In this paper, we develop a machine learning system for determining gene functions from heterogeneous data sources using a Weighted Naive Bayesian network (WNB). Knowledge of gene functions is crucial for understanding many fundamental biological mechanisms, such as regulatory pathways, the cell cycle, and disease. Our major goal is to accurately infer the functions of putative genes or Open Reading Frames (ORFs) from existing databases using computational methods. This task is intrinsically difficult because the underlying biological processes involve complex interactions among multiple entities. As a result, many functional links are missed when only one or two data sources are used for prediction. Our hypothesis is that integrating evidence from multiple complementary sources can significantly improve prediction accuracy. Our experimental results not only support this hypothesis but also provide guidelines for using the WNB system for data collection, training, and prediction. The combined training data sets contain information from gene annotations, gene expression, clustering outputs, keyword annotations, and sequence homology drawn from public databases. The current system is trained and tested on the genes of the budding yeast Saccharomyces cerevisiae. The WNB model can also be used to analyze the contribution of each information source to prediction performance through the weight training process. This contribution analysis could potentially lead to significant scientific discovery by facilitating the interpretation and understanding of the complex relationships between biological entities.
AB - In this paper, we develop a machine learning system for determining gene functions from heterogeneous data sources using a Weighted Naive Bayesian network (WNB). Knowledge of gene functions is crucial for understanding many fundamental biological mechanisms, such as regulatory pathways, the cell cycle, and disease. Our major goal is to accurately infer the functions of putative genes or Open Reading Frames (ORFs) from existing databases using computational methods. This task is intrinsically difficult because the underlying biological processes involve complex interactions among multiple entities. As a result, many functional links are missed when only one or two data sources are used for prediction. Our hypothesis is that integrating evidence from multiple complementary sources can significantly improve prediction accuracy. Our experimental results not only support this hypothesis but also provide guidelines for using the WNB system for data collection, training, and prediction. The combined training data sets contain information from gene annotations, gene expression, clustering outputs, keyword annotations, and sequence homology drawn from public databases. The current system is trained and tested on the genes of the budding yeast Saccharomyces cerevisiae. The WNB model can also be used to analyze the contribution of each information source to prediction performance through the weight training process. This contribution analysis could potentially lead to significant scientific discovery by facilitating the interpretation and understanding of the complex relationships between biological entities.
KW - Bayesian network
KW - Gene function prediction
KW - Machine learning
KW - Yeast
UR - http://www.scopus.com/inward/record.url?scp=33745726583&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745726583&partnerID=8YFLogxK
U2 - 10.1142/S0219720006001928
DO - 10.1142/S0219720006001928
M3 - Article
C2 - 16819781
AN - SCOPUS:33745726583
SN - 0219-7200
VL - 4
SP - 217
EP - 239
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
IS - 2
ER -