TY - JOUR
T1 - Cross-platform analysis of cancer biomarkers
T2 - A Bayesian network approach to incorporating mass spectrometry and microarray data
AU - Deng, Xutao
AU - Geng, Huimin
AU - Ali, Hesham H.
N1 - Funding Information:
This work was supported by NIH grant number P20 RR16469 from the INBRE program of the National Center for Research Resources.
PY - 2007
Y1 - 2007
AB - Many studies have reported inconsistent cancer biomarkers because of bioinformatics artifacts. In this paper we use multiple data sets from microarrays, mass spectrometry, protein sequences, and other biological knowledge to improve the reliability of cancer biomarkers. We present a novel Bayesian network (BN) model that integrates and cross-annotates multiple data sets related to prostate cancer. The main contribution of this study is a method designed to find cancer biomarkers whose presence is supported by multiple data sources and biological knowledge. Relevant biological knowledge is explicitly encoded into the model parameters, and the biomarker-finding problem is formulated as a Bayesian inference problem. Besides diagnostic accuracy, we introduce reliability as another quality measure of the biological relevance of biomarkers. Based on the proposed BN model, we develop an empirical scoring scheme and a simulation algorithm for inferring biomarkers. Fourteen genes/proteins, including prostate-specific antigen (PSA), are identified as reliable serum biomarkers that are insensitive to the model assumptions. The computational results show that our method finds biologically relevant biomarkers with the highest reliability while maintaining competitive predictive power. In addition, combining biological knowledge and data from multiple platforms greatly reduces the number of putative biomarkers, allowing more focused clinical studies.
UR - http://www.scopus.com/inward/record.url?scp=43049127535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43049127535&partnerID=8YFLogxK
U2 - 10.1177/117693510700300001
DO - 10.1177/117693510700300001
M3 - Article
C2 - 19455243
AN - SCOPUS:43049127535
SN - 1176-9351
VL - 3
SP - 183
EP - 202
JO - Cancer Informatics
JF - Cancer Informatics
ER -