TY - GEN
T1 - A comparison of a campus cluster and Open Science Grid platforms for protein-guided assembly using Pegasus Workflow Management System
AU - Pavlovikj, Natasha
AU - Begcy, Kevin
AU - Behera, Sairam
AU - Campbell, Malachy
AU - Walia, Harkamal
AU - Deogun, Jitender S.
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/11/27
Y1 - 2014/11/27
N2 - Scientific workflows are a useful tool for managing large and complex computational tasks. Due to their intensive resource requirements, scientific workflows are often executed on distributed platforms, including campus clusters, grids, and clouds. In this paper, we build a scientific workflow for blast2cap3, the protein-guided assembly approach, using the Pegasus Workflow Management System (Pegasus WMS). The modularity of blast2cap3 allows us to decompose the existing serial approach into multiple tasks, some of which can be run in parallel. This workflow is then deployed on two distributed execution platforms: Sandhills, the University of Nebraska campus cluster, and the Open Science Grid (OSG). We compare and evaluate the performance of the workflow on both platforms. We also investigate how the number of clusters of transcripts in the blast2cap3 workflow affects the total running time. Our experiments show that the Pegasus WMS implementation of blast2cap3 reduces the running time by more than 95% compared to the current serial implementation. Although OSG provides more computational resources than Sandhills, our experimental workflow runs have better running times on Sandhills. Moreover, selecting 300 clusters of transcripts gives the optimum performance with the resources allocated from Sandhills.
AB - Scientific workflows are a useful tool for managing large and complex computational tasks. Due to their intensive resource requirements, scientific workflows are often executed on distributed platforms, including campus clusters, grids, and clouds. In this paper, we build a scientific workflow for blast2cap3, the protein-guided assembly approach, using the Pegasus Workflow Management System (Pegasus WMS). The modularity of blast2cap3 allows us to decompose the existing serial approach into multiple tasks, some of which can be run in parallel. This workflow is then deployed on two distributed execution platforms: Sandhills, the University of Nebraska campus cluster, and the Open Science Grid (OSG). We compare and evaluate the performance of the workflow on both platforms. We also investigate how the number of clusters of transcripts in the blast2cap3 workflow affects the total running time. Our experiments show that the Pegasus WMS implementation of blast2cap3 reduces the running time by more than 95% compared to the current serial implementation. Although OSG provides more computational resources than Sandhills, our experimental workflow runs have better running times on Sandhills. Moreover, selecting 300 clusters of transcripts gives the optimum performance with the resources allocated from Sandhills.
KW - Blast2cap3
KW - Campus cluster
KW - Open science grid
KW - Pegasus workflow management system
KW - Protein-guided assembly
KW - Scientific workflow
KW - Transcriptome assembly
UR - http://www.scopus.com/inward/record.url?scp=84918823980&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84918823980&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2014.66
DO - 10.1109/IPDPSW.2014.66
M3 - Conference contribution
AN - SCOPUS:84918823980
T3 - Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
SP - 546
EP - 555
BT - Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
PB - IEEE Computer Society
T2 - 28th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
Y2 - 19 May 2014 through 23 May 2014
ER -