TY - GEN
T1 - April
T2 - 2019 IEEE Conference on Computer Communications, INFOCOM 2019
AU - Nadig, Deepak
AU - Ramamurthy, Byrav
AU - Bockelman, Brian
AU - Swanson, David
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under Grant Numbers OAC-1541442 and CNS-1817105.
Funding Information:
This material is based upon work supported by the National Science Foundation under Grant Numbers OAC-1541442 and CNS-1817105. This work was completed using the Holland Computing Center of the University of Nebraska, which receives support from the Nebraska Research Initiative. The authors would like to thank Garhan Attebury, Holland Computing Center at UNL for his valuable support.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/4
Y1 - 2019/4
N2 - In this paper, we propose an application-aware intelligent load balancing system for high-throughput, distributed computing, and data-intensive science workflows. We leverage emerging deep learning techniques for time-series modeling to develop an application-aware predictive analytics system for accurately forecasting GridFTP connection loads. Our solution integrates with a major U.S. CMS Tier-2 site; we use a real dataset representing 670 million GridFTP transfer connections measured over 18 months to drive our predictive analytics solution. First, we perform extensive analysis on this dataset and use the connection loads as an example to study the temporal dependencies between various user-roles and workflow memberships. We use the analysis to motivate the design of a gated recurrent unit (GRU) based deep recurrent neural network (RNN) for modeling long-term temporal dependencies and predicting connection loads. We develop a novel application-aware, predictive and intelligent load balancer, APRIL, that effectively integrates application metadata and load forecast information to maximize server utilization. We conduct extensive experiments to evaluate the performance of our deep RNN predictive analytics system and compare it with other approaches such as ARIMA and multi-layer perceptron (MLP) predictors. The results show that our forecasting model, depending on the user-role, performs between 5.88%-92.6% better than the alternatives. We also demonstrate the effectiveness of APRIL by comparing it with the load balancing capabilities of an existing production Linux Virtual Server (LVS) cluster. Our approach improves server utilization, on an average, between 0.5 to 11 times, when compared with its LVS counterpart.
AB - In this paper, we propose an application-aware intelligent load balancing system for high-throughput, distributed computing, and data-intensive science workflows. We leverage emerging deep learning techniques for time-series modeling to develop an application-aware predictive analytics system for accurately forecasting GridFTP connection loads. Our solution integrates with a major U.S. CMS Tier-2 site; we use a real dataset representing 670 million GridFTP transfer connections measured over 18 months to drive our predictive analytics solution. First, we perform extensive analysis on this dataset and use the connection loads as an example to study the temporal dependencies between various user-roles and workflow memberships. We use the analysis to motivate the design of a gated recurrent unit (GRU) based deep recurrent neural network (RNN) for modeling long-term temporal dependencies and predicting connection loads. We develop a novel application-aware, predictive and intelligent load balancer, APRIL, that effectively integrates application metadata and load forecast information to maximize server utilization. We conduct extensive experiments to evaluate the performance of our deep RNN predictive analytics system and compare it with other approaches such as ARIMA and multi-layer perceptron (MLP) predictors. The results show that our forecasting model, depending on the user-role, performs between 5.88%-92.6% better than the alternatives. We also demonstrate the effectiveness of APRIL by comparing it with the load balancing capabilities of an existing production Linux Virtual Server (LVS) cluster. Our approach improves server utilization, on an average, between 0.5 to 11 times, when compared with its LVS counterpart.
UR - http://www.scopus.com/inward/record.url?scp=85068214579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068214579&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2019.8737537
DO - 10.1109/INFOCOM.2019.8737537
M3 - Conference contribution
AN - SCOPUS:85068214579
T3 - Proceedings - IEEE INFOCOM
SP - 1909
EP - 1917
BT - INFOCOM 2019 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 29 April 2019 through 2 May 2019
ER -