Abstract
In this work, we investigate parallel I/O efficiency in parallelised BLAST, the most popular tool for similarity searches in biological databases, and implement two variants that incorporate the PVFS and CEFT-PVFS parallel I/O facilities. Our goal is to study the performance gain from parallel I/O with different numbers of commodity storage devices in a Linux cluster. We also evaluate two read-performance optimisation techniques employed in CEFT-PVFS: (1) doubling the degree of parallelism, which is shown to give read performance comparable to that of PVFS when both systems have the same number of servers; and (2) skipping hot-spot nodes, which can reduce the performance penalty when I/O workloads are highly imbalanced. Contention for I/O resources among multiple applications running in the same cluster can degrade the performance of the original parallel BLAST and of the PVFS version by up to 10-fold and 21-fold, respectively, whereas the CEFT-PVFS version, which can skip hot-spot nodes, suffers only a two-fold degradation.
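The two CEFT-PVFS read optimisations can be illustrated with a small scheduling sketch. It assumes a RAID-10-style layout in which every stripe unit has one copy in a "primary" server group and one in a "mirror" server group; the function name `plan_reads`, the round-based alternation between groups, and the set of "hot" servers are hypothetical simplifications for illustration, not CEFT-PVFS's actual scheduling code.

```python
from typing import FrozenSet, List, Tuple

Server = Tuple[str, int]  # (group name, server index within the group)


def plan_reads(num_units: int, servers_per_group: int,
               hot: FrozenSet[Server] = frozenset()) -> List[Server]:
    """Choose which copy of each stripe unit to read (hypothetical model).

    Assumption: stripe unit u is stored on server u % servers_per_group
    in both the "primary" and the "mirror" group (RAID-10-style mirroring).
    """
    other = {"primary": "mirror", "mirror": "primary"}
    plan: List[Server] = []
    for unit in range(num_units):
        index = unit % servers_per_group
        stripe_round = unit // servers_per_group
        # Doubling the degree of parallelism: even stripe rounds are read
        # from the primary group and odd rounds from the mirror group, so
        # both groups serve reads instead of only one.
        group = "primary" if stripe_round % 2 == 0 else "mirror"
        # Skipping hot spots: if the chosen server is overloaded and its
        # partner is not, read the mirrored copy instead.
        if (group, index) in hot and (other[group], index) not in hot:
            group = other[group]
        plan.append((group, index))
    return plan


if __name__ == "__main__":
    print(plan_reads(4, 2))
    print(plan_reads(4, 2, frozenset({("primary", 1)})))
```

With two servers per group, four stripe units are spread over all four servers; marking `("primary", 1)` as hot moves that read to its mirror partner, which is the intuition behind the reduced penalty under imbalanced I/O loads.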
| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 214-222 |
| Number of pages | 9 |
| Journal | International Journal of High Performance Computing and Networking |
| Volume | 1 |
| Issue number | 4 |
| DOIs | |
| State | Published - 2004 |
Keywords
- BLAST
- CEFT-PVFS
- PVFS
- bioinformatics
- cluster computing
- parallel I/O
- sequence comparison
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Networks and Communications