TY - GEN
T1 - Combining in-situ and in-transit processing to enable extreme-scale scientific analysis
AU - Bennett, Janine C.
AU - Abbasi, Hasan
AU - Bremer, Peer Timo
AU - Grout, Ray
AU - Gyulassy, Attila
AU - Jin, Tong
AU - Klasky, Scott
AU - Kolla, Hemanth
AU - Parashar, Manish
AU - Pascucci, Valerio
AU - Pebay, Philippe
AU - Thompson, David
AU - Yu, Hongfeng
AU - Zhang, Fan
AU - Chen, Jacqueline
N1 - Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2012
Y1 - 2012
AB - With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
UR - http://www.scopus.com/inward/record.url?scp=84877687430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877687430&partnerID=8YFLogxK
DO - 10.1109/SC.2012.31
M3 - Conference contribution
AN - SCOPUS:84877687430
SN - 9781467308069
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
T2 - 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
Y2 - 10 November 2012 through 16 November 2012
ER -