Demo Stations and Roundtables
DOE Booth Demo Station and Roundtable abstracts are shown below.
Philip Mucci, Jeanine Cook, Tushar Mohan, Sandia
Optimizing HPC Platforms, Applications and Resources with PerfMiner
Tues., Nov. 15, 10 a.m. – Demo Station 1
PerfMiner is an advanced site optimization system for HPC and enterprise-class data centers. Its goal is to provide actionable information on how to optimize a site’s overall productivity and return on investment in hardware, software and personnel. PerfMiner uses state-of-the-art, lightweight and pervasive performance data collection technology; automates the collection, aggregation and indexing of that data; mines it for key performance indicators; and presents those metrics through a web-based, business-intelligence-style dashboard. These indicators are combined and presented in an audience-specific, dynamic, drill-down hierarchy of performance, covering everything from site productivity down to individual application threads. The system’s visual cues facilitate easy interpretation and analysis and expose significant potential to improve decision-making across the business unit. While the data collected may be extensive, the metrics mined and presented are derived from Minimal Metrics’ extensive previous experience in HPC and enterprise application performance tuning, as well as the PI’s experience with an early prototype. The system is completely transparent: it has negligible overhead, works on applications written in any language, and does not require any modifications by the user.
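To make the drill-down idea concrete, here is a minimal, purely illustrative sketch of rolling per-job hardware-counter samples up into a coarse per-user indicator; the column names and the IPC-based metric are hypothetical stand-ins, not PerfMiner’s actual schema or code.

    # Illustrative only: aggregate per-job counter samples into a simple
    # key performance indicator, one level up the drill-down hierarchy.
    # Column names and the KPI definition are hypothetical, not PerfMiner's.
    import pandas as pd

    samples = pd.DataFrame({
        "job_id":       [101, 101, 102, 102],
        "user":         ["alice", "alice", "bob", "bob"],
        "cycles":       [4.0e12, 3.8e12, 9.1e12, 8.7e12],
        "instructions": [2.1e12, 2.0e12, 2.2e12, 2.1e12],
    })

    # Per-job IPC (instructions per cycle), then an aggregate per-user view.
    per_job = samples.groupby(["user", "job_id"]).sum()
    per_job["ipc"] = per_job["instructions"] / per_job["cycles"]
    per_user = per_job.groupby("user")["ipc"].mean()
    print(per_user)   # a coarse productivity indicator per user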
Christoph Junghans, Los Alamos
New Methods for Smoothed Particle Hydrodynamics Simulations of Binary Neutron Star Mergers
Tues., Nov. 15, 10 a.m. – Roundtable
With the recent groundbreaking discovery of gravitational waves from merging black holes, the first direct detection of neutron star mergers is only a matter of time. Observational signatures include gravitational waves and faint supernova-like transients powered by radioactive decay of freshly synthesized heavy elements. Due to the complexity of the problem, the only way to understand these observations is to confront them with the predictions obtained via simulation. We use smoothed particle hydrodynamics, which is well suited for such problems, and adapt the highly scalable 2HOT code to simulate these mergers. Furthermore, we augment 2HOT by incorporating tabulated equations of state to improve the physics accuracy. This new physics introduces overhead. Retaining performance while adding new physics provides a unique opportunity to exercise the principles of co-design and for a collaboration between domain and computer scientists. To maintain performance and scalability, we explore and optimize the nearest-neighbor search algorithm intrinsic to the code. We develop a custom k-nearest neighbor proxy application, which provides the platform upon which we experiment with different domain partitioning schemes and problem space representations. To understand and optimize load-balancing, we also investigate implementations of our proxy application in various task-based runtime systems such as Charm++ and STAPL.
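As background to the neighbor-search optimization, the sketch below shows the core k-nearest-neighbor query at the heart of an SPH neighbor search using a standard k-d tree; the particle counts and neighbor count are placeholders, and the proxy application’s alternative partitioning schemes and space representations are not represented here.

    # A minimal sketch of the k-nearest-neighbor query central to SPH,
    # using a k-d tree; sizes and the neighbor count are illustrative only.
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    particles = rng.random((10_000, 3))      # particle positions in a unit box
    tree = cKDTree(particles)

    k = 64                                   # typical SPH neighbor count (illustrative)
    dist, idx = tree.query(particles, k=k)   # k nearest neighbors of every particle

    # idx[i] lists the particles whose kernels overlap particle i; an SPH code
    # would evaluate smoothing-kernel sums over exactly these neighbor lists.
    print(idx.shape)                         # (10000, 64)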
Sameer Shende, University of Oregon
The TAU Performance System
Tues., Nov. 15, 11 a.m. – Demo Station 1
Wed., Nov. 16, 12:15 p.m. – Roundtable
The TAU Performance System delivers robust, integrated, portable, and open technology for performance analysis of parallel applications on large-scale, leadership-class HPC machines available today. Through advances in application-specific performance evaluation, scalable performance tools, multi-experiment performance data management, performance data mining, and programming environment integration, TAU helps application developers be more productive in achieving their development and optimization goals. In addition, the TAU project is making important advances in kernel-level performance monitoring to identify OS actions influencing delivered performance. TAU provides integrated instrumentation, measurement, and analysis capabilities in a cross-platform tool suite, plus additional tools for performance data management, data mining, and interoperation. The TAU project has developed strong interactions with the ASC/NNSA, SciDAC SUPER, DOE X-Stack, and FastOS projects. TAU has been ported to the leadership-class facilities at ANL, ORNL, LLNL, Sandia, and NERSC, including GPGPU Linux clusters, IBM, and Cray systems.
Eric Lingerfelt et al., Oak Ridge
BEAM: An HPC Pipeline for Nanoscale Materials Analysis and Neutron Data Modeling
Tues., Nov. 15, 12 p.m. – Demo Station 1
The Bellerophon Environment for Analysis of Materials (BEAM) enables scientists at ORNL’s Center for Nanophase Materials Sciences (CNMS) and Spallation Neutron Source (SNS) to leverage the integrated computational and analytical power of ORNL’s Compute And Data Environment for Science (CADES) and the Oak Ridge Leadership Computing Facility (OLCF) to perform near real-time scalable analysis and modeling. At the core of this computational workflow system is a web and data server located at CADES that enables multiple, concurrent users to securely upload and manage data, execute materials science analysis and modeling workflows, and interactively explore results through custom visualization services. BEAM’s long-term data management capabilities utilize CADES’ petabyte-scale file system and enable users to easily manipulate remote directories and uploaded data in their private data storage area as if they were browsing on a local workstation. In addition, the framework facilitates user workflow needs by enabling integration of advanced data analysis algorithms and authenticated, “push-button” execution of dynamically generated workflows employing these algorithms on Titan, Eos, and Rhea at OLCF, as well as compute clusters at CADES. We will demonstrate Band Excitation Analysis and Principal Component Analysis of SPM and STEM data (developed in collaboration with ORNL’s Institute for Functional Imaging of Materials) using a variety of HPC implementations including FORTRAN, R with pbdR, and Java with Apache Spark – all tightly bound with parallel HDF5. In addition, we will show an initial implementation of near real-time optimization of inelastic neutron scattering data utilizing Titan (developed in collaboration with the ACUMEN project and ORNL’s Center for Accelerating Materials Modeling).
W. Alan Scott, Sandia
ParaView and Big Data
Tues., Nov. 15, 12 p.m. – Demo Station 2
ParaView will be shown working with big data. This demo will include new features of the 5.2.0 release, such as anti-aliasing of remotely rendered data and enhanced backface removal.
John Harney, Oak Ridge National Laboratory
Deep Learning Challenges on Pre-exascale Systems
Tues., Nov. 15, 12:15 p.m. – Roundtable
Deep learning (DL), which is used to learn levels of representation and abstraction that make sense of large datasets, is the fastest-growing field in machine learning. Although conceived several years ago, deep learning has become more achievable with rapid advances in modern GPU technology and is increasingly being utilized by domain and computational scientists worldwide in areas such as computer vision, automatic speech recognition, and natural language processing. In the coming years, pre-exascale systems, such as Summit at the Oak Ridge Leadership Computing Facility (OLCF), will be deployed with 100-200 PF capabilities under newly designed, state-of-the-art, hybrid architectures. It will therefore be advantageous to understand how deep learning applications can properly leverage features of these architectures to address new DL problems at larger scales. This roundtable discussion will focus on proper integration strategies of DL frameworks and libraries into these pre-exascale systems.
Rollin Thomas, Berkeley Lab
Using Jupyter for Interactive Data Analytics on Cori at NERSC
Tues., Nov. 15, 1 p.m. – Demo Station 1
Jupyter is a popular interactive web application that allows users to author and share “notebooks” containing code, rich text, and visualizations. It is a paradigm-shifting technology that promotes literate programming, collaboration, and reproducibility. Jupyter is a key element of NERSC’s strategy to enable interactive data analytics at scale. We have deployed Jupyter on the Cray XC40 “Cori” system, exposing an alternative route of access to supercomputing in place of the command line. Using an interactive notebook, users can perform data analytics tasks over the web on Cori compute nodes through Jupyter. We will describe and demonstrate this new service in action, using examples from real-world scientific use cases.
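As a flavor of the notebook workflow, the sketch below shows one common pattern for fanning analysis work out from a notebook cell to parallel workers with ipyparallel; whether the Cori Jupyter service uses this particular backend is an assumption made only for illustration.

    # A hedged sketch of the kind of notebook cell such a service enables:
    # fanning a small analysis out to workers with ipyparallel. The actual
    # NERSC deployment details may differ; the point is the workflow.
    import ipyparallel as ipp

    rc = ipp.Client()          # connect to an already-running controller/engines
    view = rc[:]               # a view over all engines (e.g., on compute nodes)

    def mean_of_chunk(seed):
        import numpy as np
        return np.random.default_rng(seed).random(1_000_000).mean()

    results = view.map_sync(mean_of_chunk, range(64))
    print(sum(results) / len(results))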
Jim Brandt, Ann Gentile, Sandia; Tom Tucker, Open Grid Computing
Exciting Developments in Tools for Large Scale HPC System and Application Performance Understanding
Tues., Nov. 15, 1 p.m. – Demo Station 2
Wed., Nov. 16, 3 p.m. – Demo Station 2
Thurs., Nov. 17, 2 p.m. – Demo Station 2
Advanced analysis and visualization techniques have long been used to enhance understanding of the evolution and results of scientific simulations run on large scale HPC systems. Over the same time, the task of understanding the interaction of the HPC system components that must work together seamlessly in order for these simulations to be completed has been left to the intuition of system administrators and operations staff using a handful of rudimentary tools. We have reached a point in system size and complexity that renders this approach largely ineffectual for all but the coarsest level evaluations (e.g., identification of severe congestion or failure without generally being able to identify root cause). In these demonstrations we unveil new and exciting tools and capabilities being developed to rectify this problem. In particular, we will be demonstrating use of tools for lightweight data collection, scalable log categorization, scalable information storage and retrieval along with analysis and visualization techniques to efficiently turn the information gathered into understanding.
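As a purely illustrative example of the analysis step (not the collection tools themselves), the sketch below resamples per-node counter samples into a time series and flags nodes whose hypothetical congestion counter runs far above the system-wide median.

    # Illustrative only: turn collected per-node samples into a per-minute
    # time series and flag nodes far above the system median. The counter
    # name and threshold are hypothetical, not the demonstrated tools' output.
    import pandas as pd

    samples = pd.DataFrame({
        "time": pd.to_datetime(["2016-11-15 10:00"] * 3 + ["2016-11-15 10:01"] * 3),
        "node": ["nid00012", "nid00013", "nid00014"] * 2,
        "stalled_flits": [120, 9800, 150, 140, 10250, 160],  # hypothetical counter
    })

    per_minute = samples.pivot_table(index="time", columns="node",
                                     values="stalled_flits")
    median = per_minute.median(axis=1)
    congested = per_minute.gt(10 * median, axis=0)   # nodes >10x the median
    print(congested)                                 # flags nid00013 in both minutes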
Nick Cramer, Pacific Northwest
Explore Molecules in Virtual Reality
Tues., Nov. 15, 2 p.m. – Demo Station 1
Wed., Nov. 16, 10 a.m. – Demo Station 2
Thurs., Nov. 17, 11 a.m. – Demo Station 2
People will use virtual reality hardware to explore a molecule modeled on PNNL’s Cascade supercomputer.
Mariam Kiran, Eric Pouyoul, Anu Mercian, Brian Tierney, Inder Monga, Berkeley Lab/ESnet
InDI: Intent-based User-defined Service Deployment over Multi-Domain SDN applications
Tues., Nov. 15, 2 p.m. – Demo Station 2
Wed., Nov. 16, 11 a.m. – Demo Station 2
SDN and network function virtualization research offers multiple advantages, such as efficient network design, closer network control, and separate management of the control and data planes. Through this network-as-a-service model, users expect the network to deliver their applications with user-intended profiles and the performance they need. This requires an ‘intent engine’ to act as an intermediary that translates users’ high-level language into network-specific commands and deploys them across multiple SDN controllers. We present our work on the Intent Engine, which uses a knowledge library and parser to give users the ability to ‘talk’ to the network and receive user-specified performance.
Shane Canon, Doug Jacobsen, Berkeley Lab
Shifter – Containers for HPC
Tues., Nov. 15, 3 p.m. – Demo Station 1
Wed., Nov. 16, 10 a.m. – Roundtable
Container-based computing is rapidly changing the way software is developed, tested, and deployed. We will demo and discuss the design and implementation of Shifter, which enables running containers on HPC platforms and is used in production at NERSC. Shifter enables end users to execute containers using images constructed with various methods, including the popular Docker-based ecosystem. We will share some of the recent improvements to Shifter, including an improved image manager, integration with SLURM, integration with the Cori burst buffer, and user-controllable volume mounts. In addition, we will share lessons learned, performance results, and real-world use cases of Shifter in action. We also welcome discussion on the potential role of containers in scientific and technical computing, including how they complement the scientific process.
Alexei Klimentov, et al, Brookhaven
BigPanDA@Titan
Tues., Nov. 15, 10 a.m. – Roundtable
Tues., Nov. 15, 3 p.m. – Demo Station 2
Scientific priorities in High Energy and Nuclear Physics continue to serve as drivers of integrated computer and data infrastructure. The lack of scalable and extensible workload management capabilities across heterogeneous computing infrastructure, however, presents a barrier to scientific progress. BigPanDA brings important conceptual advances and novel capabilities to workload management. We propose to deploy and bring into production BigPanDA workflow management techniques on the Oak Ridge Leadership Computing Facility (OLCF) Titan supercomputer. This will significantly and positively impact scientific communities in High Energy and Nuclear Physics, and beyond, for current and future leadership computing facilities. The project translates the research artifacts and accomplishments of recent ASCR projects into OLCF operational advances and enhancements. The proposed solution will provide an important model for future exascale computing, increasing the coherence between the technology base used for high-performance, scalable modeling and simulation and that used for data-analytic computing. Our approach demonstrates the integration of non-traditional, data-intensive, high-throughput workloads and traditional compute-intensive workloads within leadership computing facilities, and yields important physics simulations and data analyses that would otherwise be impossible or far too slow for the rapidly increasing pace of data collection at the Large Hadron Collider (LHC).
Manuel Arenaz, et al, Appentra/Oak Ridge
Parallware Trainer: LLVM-based Software for Training and Guided Parallelization
Tues., Nov. 15, 4 p.m. – Demo Station 1
This demo will present the Parallware Trainer, a new desktop tool for effective OpenMP & OpenACC training based on the production-grade LLVM compiler infrastructure. Parallware is a new technology for static analysis of programs that overcomes the limitations of the classical dependence analysis at the foundation of current tools for extracting parallelism from scientific codes. Using a fast, extensible, hierarchical classification scheme to address dependence analysis, it discovers parallelism and annotates the source code with the most appropriate OpenMP & OpenACC directives. In this talk, we present the new Parallware Trainer tool, introduce the principles of the Parallware technology, and show how it can be used to help port applications to current and future HPC facilities. The Parallware technology is currently under development at Appentra Solutions, in collaboration with ORNL, NVIDIA and BSC.
Wenji Wu, et al, Fermilab
Real-time Scientific Data Streaming Using ADIOS+mdtmFTP
Tues., Nov. 15, 4 p.m. – Demo Station 2
Wed., Nov. 16, 10 a.m. – Demo Station 1
Wed., Nov. 16, 4 p.m. – Demo Station 2
Large science applications are typically carried out in a widely distributed, highly collaborative manner. Scientific workflows have emerged as a useful paradigm that enables scientists to express and manage dependencies in complex distributed scientific processes and to orchestrate large amounts of widely distributed data. Traditionally, scientific workflows handle data in a database model: data is first indexed and stored in storage systems, and then subsequently processed by queries. Big data has emerged as a driving force for scientific discoveries, and big data is not just about Volume, but also about Velocity and Variety. We argue that scientific workflows built on the traditional database model have difficulty supporting big data scientific applications, because that model typically involves a significant amount of storage I/O, which is inflexible, costly, and makes high performance difficult to achieve. ORNL and FNAL are working collaboratively to develop a stream processing model using ADIOS+mdtmFTP for next-generation scientific workflows that support big data scientific applications. ADIOS is an open-source I/O framework built by ORNL; mdtmFTP is a high-performance data transfer tool, developed by FNAL, that efficiently utilizes multicore hardware. In this stream processing model, ADIOS provides efficient and intelligent in-memory data processing, while mdtmFTP supports reliable and high-performance data transfer. In this demo, we demonstrate ADIOS+mdtmFTP’s stream processing capability in support of a DOE science application.
Chris Tracy, Berkeley Lab
400G Deployment over Next-Generation Optical Substrate by a National Research & Education Network
Tues., Nov. 15, 4:45 p.m. – Roundtable
In this roundtable, we will discuss ESnet’s operational experiences with the deployment of 400G services across new flexible photonic architectures, and how we are deploying and managing circuit demands beyond 100G on a previously deployed optical substrate. The discussion will include an overview of our proof-of-concept lab demonstrations, testbed deployments and field trials, as well as a production 400G deployment with a real-world application as a use case. Related topics include flexible transceivers, flexible-grid wavelength switches, directionless add/drop architectures, and what “gridless” and “colorless” services really mean at the end of the day.
Prabhat, Berkeley Lab
What Can Deep Learning do for Science?
Tues., Nov. 15, 4:45 p.m. – Roundtable
A review of NERSC’s experience in successfully applying deep learning to problems in astronomy, cosmology, climate, neuroscience, high-energy physics and genomics. HPC implementations of deep learning frameworks and open problems at the frontier of semi-supervised and unsupervised learning will be discussed.
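For readers new to the area, the sketch below shows the shape of a minimal supervised deep-learning model of the kind such science applications build on, using Keras; the architecture and the random stand-in data are placeholders, not NERSC’s production models.

    # A minimal Keras sketch of a supervised model for 2-D science images
    # (e.g., simulated sky or climate patches). Data and architecture are
    # placeholders for illustration only.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    x = np.random.rand(256, 64, 64, 1).astype("float32")   # fake image patches
    y = np.random.randint(0, 2, size=(256,))                # fake binary labels

    model = keras.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=2, batch_size=32)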
Patricia Crossno, Sandia
Slycat™ Ensemble Analysis and Visualization
Tues., Nov. 15, 5 p.m. – Demo Station 1
Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which are typically a set of simulation runs exploring a shared problem space. An ensemble can be thought of as a set of samples defined on a common set of variables, where each sample is a vector of values defining a point in high-dimensional space. Ensemble analysis looks at the shared behaviors and common features of the group in an effort to understand and describe the underlying problem domain. Additionally, ensemble analysis tries to quantify any differences found in members that deviate from the rest of the group.
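To illustrate the ensemble view described above (this is not Slycat™’s own code or interface), the sketch below treats each simulation run as one sample in a high-dimensional space and uses a PCA projection to expose a run that deviates from the rest of the group.

    # Illustrative only: each row is one simulation run, each column a shared
    # variable; a 2-D PCA projection exposes a run that deviates from the group.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    ensemble = rng.normal(size=(200, 50))   # 200 runs x 50 output variables
    ensemble[7] += 5.0                      # one run that behaves differently

    coords = PCA(n_components=2).fit_transform(ensemble)
    deviant = np.argmax(np.linalg.norm(coords - coords.mean(axis=0), axis=1))
    print("most deviant run:", deviant)     # expected to flag run 7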
Prabhat, Berkeley Lab
NERSC’s Big Data Strategy
Wed., Nov. 16, 10 a.m. – Roundtable
Elements of NERSC’s current Big Data Strategy will be discussed, covering hardware aspects (Cori Phase I, Burst Buffer, queues and policies) and software (spanning analytics, management, workflows, transfer and visualization).
Les Cottrell, et al, SLAC/Berkeley Lab/Zettar
High Performance File Transfer for Next Generation Science Experiments
Wed., Nov. 16, 11 a.m.
The next-generation Linac Coherent Light Source experiment (LCLS-II) at SLAC has an event rate 1000 times that of today’s LCLS. Much of the data analysis will be performed at the NERSC supercomputer center at Berkeley Lab. Data transfer rates to NERSC are expected to reach several hundred Gbits/sec soon after the project turns on in 2020 and to exceed a Tbit/sec by 2025. This demonstration will present an overview of LCLS-II, including the scientific insight it will provide and its need for data transfer and High Performance Computing (HPC) for analysis. This will be followed by a presentation of ESnet’s contributions to exploring the high-speed data transfer required for big science, and of NERSC’s exascale supercomputer futures and analysis tools using HPC for data analysis. We will conclude with a presentation of a reference cluster configuration for high-speed file transfers, plus real-time demonstrations of transfers from SLAC to SLAC over a 5000-mile link to Atlanta and back, as well as over a local 2x100 Gbps link, showing memory-to-memory, long-distance and local file transfer performance, the impact of lots of small files, and possibly encryption.
Joaquin Chung, Georgia Tech; Eun-Sung Jung, Rajkumar Kettimuthu, Nageswara Rao, Ian Foster, Argonne
Advance Reservation Access Control using Software-defined Networking and Tokens
Wed., Nov. 16, 12 p.m.
Advanced high-speed networks allow users to reserve dedicated-bandwidth connections through advance reservation systems. A common use case for advance reservation systems is data transfers in distributed science environments. In this scenario, a user wants exclusive access to his/her reservation; however, current advance network reservation methods cannot ensure exclusive access of a network reservation to the specific flow for which the user made the reservation. In this demo we present a network architecture that addresses this limitation and ensures that a reservation is used only by the intended flow. We achieve this by leveraging software-defined networking (SDN) and tokens. We use SDN to orchestrate and automate the reservation of networking resources from end to end and across multiple administrative domains, and tokens to create a strong binding between the user or application that requested the reservation and the flows provisioned by SDN.
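A conceptual sketch of the token binding follows, assuming an HMAC secret shared between the reservation system and the SDN controller; it illustrates the idea of binding a reservation to one specific flow rather than the authors’ actual implementation.

    # Conceptual sketch (not the authors' design): the reservation system issues
    # an HMAC over the flow's 5-tuple and reservation id, and the controller only
    # installs forwarding rules for flows that present a verifying token.
    import hmac, hashlib

    SECRET = b"shared-between-reservation-system-and-controller"   # placeholder

    def issue_token(reservation_id: str, five_tuple: str) -> str:
        msg = f"{reservation_id}|{five_tuple}".encode()
        return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

    def controller_admits(reservation_id: str, five_tuple: str, token: str) -> bool:
        return hmac.compare_digest(issue_token(reservation_id, five_tuple), token)

    t = issue_token("resv-42", "10.0.0.1:5000->10.0.0.2:5001/tcp")
    print(controller_admits("resv-42", "10.0.0.1:5000->10.0.0.2:5001/tcp", t))  # True
    print(controller_admits("resv-42", "10.0.0.9:4444->10.0.0.2:5001/tcp", t))  # False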
Sabra Sehrish, Jim Kowalkowski, Fermilab
Can Spark and HPC help Find Dark Matter?
Wed., Nov. 16, 1 p.m.
In this demo, we share our experience of bringing big data technologies and HPC resources to HEP analyses. Our goal is to demonstrate how well the two worlds benefit data- and compute-intensive statistical analyses in HEP. More specifically, we use Spark to implement a Dark Matter search use case from the CMS experiment at LHC/CERN, and evaluate its performance on NERSC systems (Cori/Edison). Spark is a fast and general-purpose cluster computing system; it allows for in-memory data processing and is an attractive approach for similar, repeated analysis tasks on the same data. The analysis-ready CMS data consists of rows that describe physics objects (particles) and the event in which they were seen. We use HDF5 as our input format and convert the data (currently in tabular ROOT datasets) to HDF5. HDF5 is a well-known format on HPC systems, and it also allows us to use non-big-data technologies to process these files. Our conversion follows a straightforward pattern: each ROOT tree/branch becomes an HDF5 group, and each ROOT leaf becomes an HDF5 dataset within the group. Each group represents a particle (tau, electron, etc.) within an event, and each dataset represents the corresponding properties (momentum, trajectory, etc.) of a particle. We implemented a custom Spark HDF5 reader to read the 1-D datasets of a group within the HDF5 files into a Spark DataFrame, and we keep one DataFrame per particle along with the necessary event information. The task is to find all the events that have a complex combination of properties across the particle types; these combinations identify specific signatures, i.e., Dark Matter particle interactions. We use Spark SQL joins to select and group data, and we can successfully run our analysis workflow implemented in Spark on NERSC.
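The sketch below illustrates the group-per-particle layout and the join-based selection described above; the authors’ custom Spark HDF5 reader is stood in for by reading small groups with h5py on the driver, and the file name, property values and selection cuts are hypothetical.

    # A simplified stand-in for the described workflow: build a tiny HDF5 file
    # with one group per particle type and one 1-D dataset per property, read
    # each group into a Spark DataFrame, and join on event id with toy cuts.
    import h5py
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dm-sketch").getOrCreate()

    with h5py.File("events.h5", "w") as f:
        f.create_group("electron")
        f["electron/event_id"] = [1, 2, 3, 4]
        f["electron/pt"]       = [45.0, 12.0, 38.0, 31.0]
        f.create_group("tau")
        f["tau/event_id"] = [1, 3, 4]
        f["tau/pt"]       = [25.0, 18.0, 22.0]

    def group_to_df(path, group):
        """Read every 1-D dataset in a group into a Spark DataFrame column."""
        with h5py.File(path, "r") as f:
            cols = {name: f[group][name][:].tolist() for name in f[group]}
        rows = [dict(zip(cols, vals)) for vals in zip(*cols.values())]
        return spark.createDataFrame(rows)

    electrons = group_to_df("events.h5", "electron")
    taus      = group_to_df("events.h5", "tau")

    # Toy selection standing in for the real multi-particle signature cuts.
    candidates = (electrons.filter("pt > 30")
                           .join(taus.filter("pt > 20"), on="event_id"))
    print(candidates.count())   # events 1 and 4 pass in this toy example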
David Fritz, Jon Crussell, John Floren, Vince Urias, Sandia
Automated Discovery in Emulytics
Wed., Nov. 16, 1 & 2 p.m. – Demo Station 2
This presentation showcases Sandia’s state-of-the-art network modeling and emulation capabilities. A key part of Sandia’s modeling methodology is the discovery and specification of the information system under study, and the ability to recreate that specification with the highest fidelity possible in order to extrapolate meaningful results. Sandia’s minimega platform (minimega.org) is an open-source tool for launching and managing large-scale virtual-machine-based experiments. The platform supports research in virtualization, SDN, cybersecurity and cloud computing. The demonstration includes automated network structure and behavior discovery, which generates bootable Emulytics models. To build the model, the toolset introspects, anonymizes and correlates data sources such as PCAP, netflow, active scans and router configurations. Beyond model building, the toolset can rapidly launch the model in a large-scale virtual machine testbed.
H. Carter Edwards, Christian Trott, Daniel Sunderland, Sandia
Kokkos: Performance Portability and Productivity for C++ Applications
Wed., Nov. 16, 2 p.m. & 3 p.m. – Demo Station 1
The Kokkos programming model and C++ library implementation enable developers to productively write application code that is performance portable across diverse manycore architectures. Kokkos has been adopted by HPC applications at Sandia National Laboratories, Los Alamos National Laboratory, Oak Ridge National Laboratory, and the University of Utah PSAAP II center, and is under consideration at the Army and Naval Research Labs. I will give an overview of Kokkos’ basic capabilities and API, how these are used to achieve performance portability, and a recent quantitative case study of application developer productivity using Kokkos.
Ramesh Balakrishnan, Tim Williams, Argonne
ALCF Theta Early Science Program: Science on Day One
Wed., Nov. 16, 4 p.m. – Demo Station 1
Jay Jay Billings, Oak Ridge
DOE OSTI’s New Software Forge and Product Lineup
Wed., Nov. 16, 4:45 p.m. – Roundtable
The Office of Scientific and Technical Information (OSTI), an office within the Department of Energy’s Office of Science, is reinventing its Energy Science and Technology Software Center to provide a modern, state-of-the-art software development forge and aggregation service that will link together the code, data, authors, documentation and publications of DOE-sponsored software. This new product, DOE Code, will shift away from the “single-release” and binary executable focus of the previous product and embrace both social coding and open source software to better address the needs of the community. This presentation will discuss DOE Code, its development, the stakeholders, and how the rest of the community can get involved to help on this unique project. We will present how DOE Code integrates with public forges such as GitHub and Bitbucket, how it handles software that is not public, how it aggregates and archives software, and how we have engaged the community in its development. We will also present a summary of the requirements on DOE Code from the HPC community that we gathered from members of the Exascale Computing Project and others in the national laboratories. The broader landscape of OSTI products and how DOE Code relates to them will also be presented.
Charles Ferenbaugh, Los Alamos
Code Modernization Experiences in LANL’s Eulerian Application Project
Thurs., Nov. 17, 12:15 p.m. – Roundtable
LANL’s Eulerian Applications Project is working on modernizing its code base to run at large scale on Trinity and other future platforms. A major part of this effort is what we call “packagization”: untangling the complicated dependency hierarchy and dividing the functionality into smaller, self-contained packages. Packagization is allowing us to refactor sections of the code in a localized way, and in particular it enables us to work on the optimizations needed for Trinity and other advanced architectures. This roundtable will provide a forum to discuss our ongoing modernization work with other code projects that are making, or considering making, similar efforts.