HOG: Distributed Hadoop MapReduce on the grid

Chen He, Derek Weitzel, David Swanson, Ying Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

27 Scopus citations

Abstract

MapReduce is a powerful data processing platform for commercial and academic applications. In this paper, we build Hadoop On the Grid (HOG), a novel Hadoop MapReduce framework that runs on the Open Science Grid, which spans multiple institutions across the United States. Unlike previous MapReduce platforms, which run on dedicated environments such as clusters or clouds, HOG provides a free, elastic, and dynamic MapReduce environment on the opportunistic resources of the grid. In HOG, we improve Hadoop's fault tolerance for wide-area data analysis by mapping data centers across the U.S. to virtual racks and creating multi-institution failure domains. Our modifications to the Hadoop framework are transparent to existing Hadoop MapReduce applications. In the evaluation, we successfully scale HOG to 1100 nodes on the grid and also evaluate it with a simulated Facebook Hadoop MapReduce workload. We conclude that HOG's rapid scalability can provide performance comparable to that of a dedicated Hadoop cluster.
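
The abstract does not say how data centers are mapped to virtual racks. As an illustration only, the idea can be sketched with Hadoop's standard rack-awareness hook: a topology script (set through `net.topology.script.file.name`, or `topology.script.file.name` in older releases) that receives node names as arguments and prints one rack path per node. The sketch below assumes nodes are passed as fully qualified hostnames and that everything after the first label identifies the site; the function name `site_to_rack` and the example hostnames are hypothetical and are not taken from the paper.

```python
#!/usr/bin/env python3
"""Hypothetical topology script: one virtual rack per grid site."""
import sys

# Hadoop's conventional fallback rack for nodes that cannot be placed.
DEFAULT_RACK = "/default-rack"


def site_to_rack(hostname: str) -> str:
    """Map a worker hostname to a virtual rack named after its site domain.

    Assumption: the first label is the host, the rest is the site,
    e.g. worker01.site-a.example -> /site-a.example
    """
    labels = hostname.strip().lower().rstrip(".").split(".")
    # Bare hostnames and IPv4 addresses carry no site information.
    if len(labels) < 2 or all(label.isdigit() for label in labels):
        return DEFAULT_RACK
    return "/" + ".".join(labels[1:])


def main(argv):
    # Hadoop expects one rack path per argument, in the same order.
    for host in argv:
        print(site_to_rack(host))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a mapping like this, Hadoop's rack-aware replica placement would spread block replicas across sites, so losing every node at one institution would not lose all copies of a block. Whether HOG uses a script or a modification inside Hadoop itself is not stated in the abstract.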

Original language: English (US)
Title of host publication: Proceedings - 2012 SC Companion
Subtitle of host publication: High Performance Computing, Networking Storage and Analysis, SCC 2012
Pages: 1276-1283
Number of pages: 8
State: Published - 2012
Externally published: Yes
Event: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012 - Salt Lake City, UT, United States
Duration: Nov 10 2012 to Nov 16 2012

Publication series

Name: Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Conference

Conference: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Country/Territory: United States
City: Salt Lake City, UT
Period: 11/10/12 to 11/16/12

Keywords

  • Grid computing
  • MapReduce
  • Middleware

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
