A case study of parallel I/O for biological sequence search on Linux clusters

Yifeng Zhu, Hong Jiang, Xiao Qin, David Swanson

Research output: Contribution to journal › Article

4 Scopus citations

Abstract

In this work, we investigate parallel I/O efficiencies in parallelised BLAST, the most popular tool for similarity search in biological databases, and implement two variations that incorporate the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O under the constraints of different numbers of commodity storage devices in a Linux cluster. We also evaluate two read performance-optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism, which is shown to give read performance comparable to that of PVFS when both systems have the same number of servers; and (2) skipping hot-spot nodes, which can reduce the performance penalty when I/O workloads are highly imbalanced. I/O resource contention between multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and the PVFS version by up to 10-fold and 21-fold, respectively, whereas the version based on CEFT-PVFS, which can skip hot-spot nodes, suffers only a two-fold degradation.
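The two read optimisations named in the abstract can be illustrated with a small scheduling sketch. This is not the CEFT-PVFS implementation. It assumes a hypothetical layout in which stripe i lives on primary server i mod n and has a copy on the mirror server with the same index, and it assumes each server reports a load value between 0 and 1. The function name `plan_reads` and the parameter `hot_threshold` are invented for illustration.

```python
from typing import List, Tuple


def plan_reads(
    num_stripes: int,
    primary_load: List[float],
    mirror_load: List[float],
    hot_threshold: float = 0.8,  # assumed cut-off for a "hot-spot" server
) -> List[Tuple[str, int]]:
    """Return (group, server_index) for each stripe of a mirrored, striped file."""
    n = len(primary_load)
    if n == 0 or len(mirror_load) != n:
        raise ValueError("both server groups must be non-empty and equal in size")

    loads = {"primary": primary_load, "mirror": mirror_load}
    plan = []
    for stripe in range(num_stripes):
        idx = stripe % n
        # Doubled parallelism: alternate stripe rounds between the two groups,
        # so all 2n servers serve part of the same read.
        group = "primary" if (stripe // n) % 2 == 0 else "mirror"
        other = "mirror" if group == "primary" else "primary"
        # Hot-spot skipping: if the chosen server is overloaded and its
        # mirror partner is not, read the stripe from the partner instead.
        if loads[group][idx] > hot_threshold and loads[other][idx] <= hot_threshold:
            group = other
        plan.append((group, idx))
    return plan


if __name__ == "__main__":
    # Primary server 0 is busy, so its stripes are redirected to mirror server 0.
    for stripe, target in enumerate(plan_reads(4, [0.95, 0.1], [0.1, 0.1])):
        print(stripe, target)
```

With balanced loads, the plan splits stripes evenly across both groups. When one server is hot, only the stripes it would have served move to its mirror partner, which is the behaviour the abstract credits for the smaller degradation under contention.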

Original language: English (US)
Pages (from-to): 214-222
Number of pages: 9
Journal: International Journal of High Performance Computing and Networking
Volume: 1
Issue number: 4
State: Published - 2004

Keywords

  • BLAST
  • CEFT-PVFS
  • PVFS
  • bioinformatics
  • cluster computing
  • parallel I/O
  • sequence comparison

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
