Challenges of HPC Monitoring
Session: State of the Practice – HPC Monitoring/Syslog
Event Type: State of the Practice
Date: Thursday, November 17
Time: 1:30 – 2:30 p.m.
Session Chair: David Paul
Authors: William (Bill) E. Allcock, Randal Rheinheimer, Mike Lowe, Joshi Fullop, Evan Felix
Room: TCC LL4/LL5
Abstract:
HPC monitoring has no dominant solution comparable to what MPI is for parallel programming. Any HPC administrator would likely agree that this is an area with plenty of room for improvement, and the problem will only get worse as systems grow toward exascale. The panel members will discuss the biggest problems they face in HPC monitoring today; the good, the bad, and the ugly of what they have tried; and how this will change as systems grow larger. The panelists represent five institutions (ANL, IU, LANL, NCSA, and PNNL) and have over 25 years of combined experience on dozens of HPC clusters.
Chair/Author Details:
David Paul (Chair) – Lawrence Berkeley National Lab – NERSC Division
William (Bill) E. Allcock – Argonne National Laboratory
Randal Rheinheimer – Los Alamos National Laboratory
Mike Lowe – Indiana University
Joshi Fullop – National Center for Supercomputing Applications
Evan Felix – Pacific Northwest National Laboratory