An energy-aware bioinformatics application for assembling short reads in high performance computing systems

Julia Warnke, Sachin Pawaskar, Hesham Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Current biomedical technologies are producing data on an unprecedented scale. The increasing complexity and growth rate of biological data have made bioinformatics data processing and analysis a key, computationally intensive task. High performance computing (HPC) has been successfully applied to major bioinformatics applications to reduce the computational burden. However, a naïve approach to developing parallel bioinformatics applications may achieve a high degree of parallelism while unnecessarily expending computational resources and consuming high levels of energy. As the wealth of biological data and the associated computational burden continue to increase, there is a growing need for energy-efficient computational approaches in the bioinformatics domain. To address this issue, we have developed an energy-aware scheduling (EAS) model for running computationally intensive applications that takes both deadline requirements and energy factors into consideration. An example of a computationally demanding process that would benefit from our scheduling model is the assembly of short sequencing reads produced by next generation sequencing technologies. Next generation sequencing produces a very large number of short DNA reads from a biological sample. Multiple overlapping fragments must be aligned and merged into long stretches of contiguous sequence before any useful information can be gathered. The assembly problem is extremely difficult due to the complex nature of the underlying genome structure and the inherent biological error present in current sequencing technologies. We apply our EAS model to a newly proposed assembly algorithm called Merge and Traverse, giving us the ability to generate speedup profiles. Our EAS model was also able to dynamically adjust the number of nodes needed to meet given deadlines for different sets of reads.
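
The abstract does not give the EAS model's formulation, so the following is only a minimal illustrative sketch of the general idea of choosing a node count from a speedup profile under a deadline. Everything in it is an assumption made for illustration: the function name `pick_node_count`, the sample profile `EXAMPLE_PROFILE`, the constant per-node power draw, and the simple energy estimate (nodes × power × runtime). It should not be read as the authors' method or as measured results.

```python
from typing import Dict, Optional, Tuple

# Hypothetical speedup profile: node count -> measured runtime in seconds.
# These numbers are invented for illustration, not taken from the paper.
EXAMPLE_PROFILE: Dict[int, float] = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0}


def pick_node_count(
    runtime_by_nodes: Dict[int, float],
    deadline_s: float,
    watts_per_node: float,
) -> Optional[Tuple[int, float]]:
    """Return (nodes, energy_joules) for the lowest-energy configuration
    whose runtime meets the deadline, or None if no configuration does.

    Assumes every node draws a constant `watts_per_node` for the whole run,
    so energy is estimated as nodes * watts_per_node * runtime.
    """
    best: Optional[Tuple[int, float]] = None
    for nodes, runtime_s in sorted(runtime_by_nodes.items()):
        if runtime_s > deadline_s:
            continue  # this configuration misses the deadline
        energy_j = nodes * watts_per_node * runtime_s
        if best is None or energy_j < best[1]:
            best = (nodes, energy_j)
    return best


if __name__ == "__main__":
    for deadline in (600.0, 300.0, 100.0):
        print(deadline, pick_node_count(EXAMPLE_PROFILE, deadline, 200.0))
```

In this toy profile, adding nodes shortens the runtime but raises the estimated energy, so a looser deadline lets the selection fall back to fewer nodes. A deadline that no configuration can meet yields `None`.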

Original language: English (US)
Title of host publication: Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012
Pages: 154-160
Number of pages: 7
State: Published - 2012
Event: 2012 10th Annual International Conference on High Performance Computing and Simulation, HPCS 2012 - Madrid, Spain
Duration: Jul 2 2012 to Jul 6 2012

Publication series

Name: Proceedings of the 2012 International Conference on High Performance Computing and Simulation, HPCS 2012

Conference

Conference: 2012 10th Annual International Conference on High Performance Computing and Simulation, HPCS 2012
Country/Territory: Spain
City: Madrid
Period: 7/2/12 to 7/6/12

Keywords

  • Energy aware scheduling
  • genome assembly
  • high performance computing
  • next generation sequencing

ASJC Scopus subject areas

  • Modeling and Simulation
