TY - GEN
T1 - Deep learning-based MSMS spectra reduction in support of running multiple protein search engines on cloud
AU - Maabreh, Majdi
AU - Qolomany, Basheer
AU - Alsmadi, Izzat
AU - Gupta, Ajay
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/12/15
Y1 - 2017/12/15
AB - The diversity of the available protein search engines with respect to their matching algorithms, the low overlap ratios among their results, and the disparity in their coverage encourage the proteomics community to use ensemble solutions that combine different search engines. Advances in cloud computing technology and the availability of distributed processing clusters can also support this task. In this setting, however, data transfer and result combination could be the major bottleneck. The flood of billions of observed mass spectra, amounting to hundreds of gigabytes or potentially terabytes of data, could easily cause congestion, increase the risk of failure, degrade performance, add computational cost, and waste available resources. Therefore, in this study, we propose a deep learning model to mitigate the traffic over the cloud network and thus reduce the cost of cloud computing. The model, which relies on the top 50 intensities of each spectrum and their m/z values, removes any spectrum that is predicted not to pass the majority vote of the participating search engines. Our results using three search engines, namely pFind, Comet, and X!Tandem, and four different datasets are promising and encourage investment in deep learning to solve this type of big data problem.
KW - Cloud Computing
KW - Data Reduction
KW - Deep Learning
KW - Network Traffic
KW - Protein Search Engine
KW - Searching Space Reduction
UR - http://www.scopus.com/inward/record.url?scp=85046029372&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046029372&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2017.8217951
DO - 10.1109/BIBM.2017.8217951
M3 - Conference contribution
AN - SCOPUS:85046029372
T3 - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
SP - 1909
EP - 1914
BT - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
A2 - Yoo, Illhoi
A2 - Zheng, Jane Huiru
A2 - Gong, Yang
A2 - Hu, Xiaohua Tony
A2 - Shyu, Chi-Ren
A2 - Bromberg, Yana
A2 - Gao, Jean
A2 - Korkin, Dmitry
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Y2 - 13 November 2017 through 16 November 2017
ER -