Thomas Lux, Layne T. Watson

Abstract

Exponential increases in complexity and scale make variability a growing threat to sustaining HPC performance at exascale. Performance variability in HPC I/O is common, acute, and formidable. We take the first step towards comprehensively studying linear and nonlinear approaches to modeling HPC I/O system variability in an effort to demonstrate that variability is often a predictable artifact of system design. Using over 8 months of data collection on 6 identical systems, we propose and validate a modeling and analysis approach (MOANA) that predicts HPC I/O variability for thousands of software and hardware configurations on highly parallel shared-memory systems. Our findings indicate that nonlinear approaches to I/O variability prediction are an order of magnitude more accurate than linear regression techniques. We demonstrate the use of MOANA to accurately predict the confidence intervals of unmeasured I/O system configurations for a given number of repeat runs, enabling users to quantitatively balance experiment duration with statistical confidence.

People

Thomas Lux
Layne T. Watson


Publication Details

Date of publication: January 31, 2019
Journal: IEEE Transactions on Parallel and Distributed Systems
Page number(s): 1843-1856
Volume: 30
Issue Number: 8
Publication note:

Kirk W. Cameron, Ali Anwar, Yue Cheng, Li Xu, Bo Li, Uday Ananth, Jon Bernard, Chandler Jearls, Thomas Lux, Yili Hong, Layne T. Watson, Ali Raza Butt: MOANA: Modeling and Analyzing I/O Variability in Parallel System Experimental Design. IEEE Trans. Parallel Distributed Syst. 30(8): 1843-1856 (2019)