Benchmarking Machine Learning Ecosystem on HPC systems

System architectures, hardware and software stacks, and scientific workloads and simulation data are all evolving, so it is important to understand how they interact with one another. Benchmarking helps evaluate and reason about the performance gains of mapping a given workload to a given system. Machine learning (ML) is becoming a critical component for running applications faster, improving throughput, and extracting insights from simulation data, so benchmarking ML methods with scientific workloads at scale will be important as we progress towards exascale systems and beyond. We hope to learn more about the scalability of different machine learning methods and frameworks, to identify which metrics should be considered, and to understand what the various ongoing efforts could offer.

Towards this effort, we are organizing the first Birds of a Feather (BoF) session at Supercomputing 2019.

Schedule: November 21, 2019: 12:15pm-1:15pm

Room 708, Colorado Convention Center, Denver, CO

Presentation Slides are available at

Time           Speaker, affiliation
12:15-12:20    Introduction (Murali Emani, ANL/ALCF)
12:20-12:27    David Kanter (MLPerf)
12:27-12:34    Steve Farrell (LBNL/NERSC)
12:34-12:41    Sam Jackson (STFC, UK)
12:41-12:48    Sergey Serebryakov (HPE)
12:48-12:55    Natalia Vassileva (Cerebras)
12:55-1:15     Panel discussion with the speakers; moderator: Steve Farrell (LBNL/NERSC)


  • Murali Emani, Argonne National Laboratory/ALCF
  • Steve Farrell, Lawrence Berkeley National Laboratory/NERSC
  • Abid Malik, Brookhaven National Laboratory
  • Jacob Balma, Cray


In particular, we plan to discuss the following questions that cater to scientific workloads.

  • Why are standard HPC benchmarks needed for ML?
    • What capabilities are missing in current benchmark suites to address ML and HPC workloads?
    • How can benchmarks be used to characterize current systems and project future system performance? Representative benchmarks would be critical in designing future HPC systems that run ML workloads.
  • What are the challenges in creating benchmarks that would be useful?
    • Fast-moving field where representative workloads change with the state of the art
    • On-node compute characteristics vs off-node communication characteristics for various training schemes
    • Big datasets, I/O bottlenecks, reliability, MPI vs alternative communication backends
    • Complex workloads where model training/inference might be coupled to simulations, high-dimensional data, hyperparameter optimization, or reinforcement learning frameworks
    • Availability and access to scientific datasets
    • What metrics would help in comparing different systems and workloads?
  • How do we design benchmarks capable of characterizing HPC systems’ suitability for ML/DL workloads?
    • We probably need to enumerate the types of workloads that are emerging in practice
    • How do the needs of HPC facilities/labs differ from those of industry?
    • How can AI be integrated into HPC workflows?
  • How well does the current landscape of emerging benchmarks represent industry/science use-cases?
    • MLPerf, Deep500, BigDataBench, AI Matrix (Alibaba)

With contributions from diverse players, the proposed session will help channel efforts within the SC community and liaise with domain scientists, academia, HPC facilities, and vendors.

Outcome: The outcome of these discussions will be summarized in a technical report that is hosted online and made publicly available, laying the foundation for community-building efforts towards benchmarking. Interested participants will also be encouraged to help organize and curate scientific datasets in a public repository.