TY - GEN
T1 - Evaluating distributed platforms for protein-guided scientific workflow
AU - Pavlovikj, Natasha
AU - Begcy, Kevin
AU - Behera, Sairam
AU - Campbell, Malachy
AU - Walia, Harkamal
AU - Deogun, Jitender S.
PY - 2014
Y1 - 2014
AB - Complex and large-scale applications in different scientific disciplines are often represented as a set of independent tasks, known as workflows. Many scientific workflows have intensive resource requirements, so different distributed platforms, including campus clusters, grids, and clouds, are used for their efficient execution. In this paper, we examine the performance and the cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, a protein-guided assembly approach, on three different execution platforms: Sandhills, the University of Nebraska Campus Cluster; the academic grid Open Science Grid (OSG); and the commercial cloud Amazon EC2. Furthermore, the behavior of the blast2cap3 workflow was tested with different numbers of tasks. For each workflow and execution platform, we performed multiple runs to compare the total workflow running time, as well as the resource availability over time. Additionally, for the most interesting runs, the number of running versus idle jobs over time was analyzed for each platform. The experiments show that using the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, better performance and resource usage were achieved when Amazon EC2 was used as the execution platform. However, given the cost of Amazon EC2, academic distributed systems can be a good alternative and deliver excellent performance, especially when ample resources are available.
KW - Amazon EC2
KW - Blast2cap3
KW - Campus cluster
KW - Open science grid
KW - Pegasus workflow management system
KW - Scientific workflow
UR - http://www.scopus.com/inward/record.url?scp=84905482237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905482237&partnerID=8YFLogxK
U2 - 10.1145/2616498.2616551
DO - 10.1145/2616498.2616551
M3 - Conference contribution
AN - SCOPUS:84905482237
SN - 9781450328937
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the XSEDE 2014 Conference
PB - Association for Computing Machinery
T2 - 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014
Y2 - 13 July 2014 through 18 July 2014
ER -