Eirene: Improving Short Job Latency Performance with Coordinated Cold Data Migration and Scheduler-Aware Task Cloning

Wei Zhou, K. Preston White, Hongfeng Yu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In large-scale enterprise data centers for big data analytics, long batch jobs and short interactive jobs are usually mixed. Hybrid job schedulers, which consist of one centralized scheduler for long jobs and multiple distributed schedulers for short jobs, have become a promising alternative because they can significantly shorten the latencies of short jobs: distributed schedulers assign short tasks independently and in parallel, and a number of performance optimization techniques lower the chance of head-of-line blocking. However, short jobs still suffer long latencies under hybrid job schedulers due to workload fluctuation and the straggler task problem. In this paper, we propose Eirene to optimize the latency of short jobs via two schemes tightly coupled into the general architecture of hybrid job schedulers. Coordinated Cold Data Migration leverages the high task waiting time of short jobs during heavily loaded periods and migrates cold data from disks to local memory for the initial phase of reading input, so as to shorten task runtime and queueing time. Scheduler-Aware Task Cloning, on the other hand, exploits spare computing resources during lightly loaded periods and performs proactive task cloning for short jobs to mitigate the straggler problem. We implement a prototype of Eirene based on Eagle, a state-of-the-art hybrid job scheduler. Experimental results show that, under heavy loads, Eirene improves the 50th-percentile (P50), 75th-percentile (P75), and 90th-percentile (P90) latency of short jobs by up to 44.4%, 80.3%, and 84.1%, respectively, compared with Eagle under the Facebook trace with a cluster of 50,000 nodes.
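As a rough illustration of how the two schemes split by load level, the sketch below picks a scheme from cluster utilization, checks whether a migration could finish while a task is still queued, and models a cloned task as finishing when its fastest copy finishes. It is not the authors' implementation: the names `HEAVY_LOAD_UTILIZATION`, `ShortTask`, `choose_scheme`, `should_migrate`, and `cloned_runtime`, the 0.8 threshold, and the "migration must fit inside the waiting time" rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical threshold: the abstract does not say how load is classified.
HEAVY_LOAD_UTILIZATION = 0.8


@dataclass
class ShortTask:
    expected_wait_s: float  # estimated queueing time before the task starts
    migrate_cost_s: float   # estimated time to copy its cold input from disk to memory


def choose_scheme(cluster_utilization: float) -> str:
    """Pick which of the two schemes applies at the current load level."""
    if cluster_utilization >= HEAVY_LOAD_UTILIZATION:
        return "cold_data_migration"
    return "task_cloning"


def should_migrate(task: ShortTask) -> bool:
    """Assumed rule: migrate only if the copy can finish while the task is queued."""
    return task.migrate_cost_s <= task.expected_wait_s


def cloned_runtime(clone_runtimes_s: Sequence[float]) -> float:
    """With cloning, the task completes when its fastest copy completes."""
    return min(clone_runtimes_s)
```

For example, `cloned_runtime([4.0, 1.5, 9.0])` returns `1.5`: one straggling copy (9.0 s) no longer determines the task's completion time.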

Original language: English (US)
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 423-432
Number of pages: 10
ISBN (Electronic): 9781728108582
DOIs: https://doi.org/10.1109/BigData47090.2019.9006575
State: Published - Dec 2019
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9 2019 to Dec 12 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country: United States
City: Los Angeles
Period: 12/9/19 to 12/12/19

Keywords

  • Big Data
  • Job Scheduler
  • Resource Management

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Cite this

    Zhou, W., White, K. P., & Yu, H. (2019). Eirene: Improving Short Job Latency Performance with Coordinated Cold Data Migration and Scheduler-Aware Task Cloning. In C. Baru, J. Huan, L. Khan, X. T. Hu, R. Ak, Y. Tian, R. Barga, C. Zaniolo, K. Lee, & Y. F. Ye (Eds.), Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 (pp. 423-432). [9006575] (Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData47090.2019.9006575