TY - GEN
T1 - An energy-aware bioinformatics application for assembling short reads in high performance computing systems
AU - Warnke, Julia
AU - Pawaskar, Sachin
AU - Ali, Hesham
PY - 2012
Y1 - 2012
AB - Current biomedical technologies are producing massive amounts of data on an unprecedented scale. The increasing complexity and growth rate of biological data have made bioinformatics data processing and analysis a key and computationally intensive task. High performance computing (HPC) has been successfully applied to major bioinformatics applications to reduce the computational burden. However, a naïve approach to developing parallel bioinformatics applications may achieve a high degree of parallelism while unnecessarily expending computational resources and consuming high levels of energy. As the wealth of biological data and the associated computational burden continue to increase, there is a growing need for energy-efficient computational approaches in the bioinformatics domain. To address this issue, we have developed an energy-aware scheduling (EAS) model for running computationally intensive applications that takes both deadline requirements and energy factors into consideration. An example of a computationally demanding process that would benefit from our scheduling model is the assembly of short sequencing reads produced by next generation sequencing technologies. Next generation sequencing produces a very large number of short DNA reads from a biological sample. Multiple overlapping fragments must be aligned and merged into long stretches of contiguous sequence before any useful information can be gathered. The assembly problem is extremely difficult due to the complex nature of the underlying genome structure and the inherent biological error present in current sequencing technologies. We apply our EAS model to a newly proposed assembly algorithm called Merge and Traverse, giving us the ability to generate speedup profiles. Our EAS model was also able to dynamically adjust the number of nodes needed to meet given deadlines for different sets of reads.
KW - Energy aware scheduling
KW - genome assembly
KW - high performance computing
KW - next generation sequencing
UR - http://www.scopus.com/inward/record.url?scp=84867018924&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867018924&partnerID=8YFLogxK
U2 - 10.1109/HPCSim.2012.6266905
DO - 10.1109/HPCSim.2012.6266905
M3 - Conference contribution
AN - SCOPUS:84867018924
SN - 9781467323598
T3 - Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012
SP - 154
EP - 160
BT - Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012
T2 - 2012 10th Annual International Conference on High Performance Computing and Simulation, HPCS 2012
Y2 - 2 July 2012 through 6 July 2012
ER -