State of the Practice: Challenges of HPC Monitoring


Session: State of the Practice – HPC Monitoring/Syslog

Event Type: State of the Practice

Date: Thursday, November 17

Time: 1:30 – 2:30 p.m.

Session Chair: David Paul

Authors: William (Bill) E. Allcock, Randal Rheinheimer, Mike Lowe, Joshi Fullop, Evan Felix

Room: TCC LL4/LL5

Abstract:

Unlike parallel programming, where MPI dominates, HPC monitoring has no dominant solution. Any HPC administrator would likely agree that this area has plenty of room for improvement, and that the problem will only get worse as systems grow toward exascale. The panel members will discuss the biggest problems they face in HPC monitoring today; the good, the bad, and the ugly of what they have tried; and how this will change as systems grow larger and larger. The panel members represent five institutions (ANL, IU, LANL, NCSA and PNNL) and bring over 25 years of combined experience across dozens of HPC clusters.

Chair/Author Details:

David Paul (Chair) – Lawrence Berkeley National Lab – NERSC Division

William (Bill) E. Allcock – Argonne National Laboratory

Randal Rheinheimer – Los Alamos National Laboratory

Mike Lowe – Indiana University

Joshi Fullop – National Center for Supercomputing Applications

Evan Felix – Pacific Northwest National Laboratory
