Workshop Proceedings

June 15th, 2015

The workshop will be held on July 13, 9:00am-4:00pm Central time, and on July 14, 9:00am-12:00pm. The presentations will be broadcast via BlueJeans; use meeting ID 273236650 to join.
Presenters, please have your presentations in PDF or PowerPoint format; a single laptop will be used for all presentations.


Monday, July 13

Time Author(s) Talk Title
Session 1 (9:00am – 10:00am), Chair: John Jenkins
9:00am-9:15am John Jenkins, ANL Welcome [pdf]
9:20am-9:35am Christopher Carothers, RPI  Overview of ROSS [pdf]
9:40am-10:00am Bilge Acun, UIUC TraceR: A Parallel Trace Replay Tool for HPC Network Simulations [pdf]
Break (15 min)
Session 2 (10:15am – 11:45am), Chair: Misbah Mubarak
10:15am-10:40am Ning Liu, IIT FatTreeSim: Modeling a Large-scale Fat-Tree Network for HPC Systems and Data Centers Using Parallel and Discrete Event Simulation [pdf]
10:45am-11:10am Xin Wang, IIT Bandwidth allocation in WANs
11:15am-11:40am Xu Yang, IIT Network Contention Aware HPC job scheduling with Workload Precognition [pdf]
Lunch (11:45am – 1:00pm)
Session 3 (1:00pm – 2:00pm), Chair: Justin LaPre
1:00pm-1:15pm  Eric Mikida, UIUC CharmROSS – Empowering PDES with an Adaptive Runtime System [pdf]
1:20pm-1:35pm Caitlin Ross, RPI Evaluating the use of proxies in MG-RAST [pdf]
1:40pm-1:55pm Yuki Kirii, Hiroki Ohtsuji, Kohei Hiraga, Osamu Tatebe, University of Tsukuba Simulation of PPMDS: A Distributed Metadata Management System [pdf]
Break (30 min)
Session 4 (2:30pm – 3:45pm), Chair: Phil Carns
2:30pm-2:55pm Shane Snyder, ANL Epidemic Fault Detection and Group Membership in HPC Storage Systems [pdf]
3:00pm-3:15pm Christopher Carothers, RPI Introduction to Optimistic Simulation using ROSS [pdf]
3:20pm-3:45pm  Justin LaPre / Elsa Gonsiorowski, RPI ROSS as a PDES Research Vehicle [pdf]

Tuesday, July 14

Time Author(s) Talk Title
Session 1 (9:00am – 9:45am), Chair: Shane Snyder
9:00am-9:25am Ioan Raicu, IIT Simulation Research Overview
9:30am-9:45am John Jenkins, ANL Advanced CODES/ROSS Usage and Strategies [pdf]
Break (15 min)
Session 2 (10:00am – 12:00pm)
10:00am-12:00pm All CODES / ROSS Hackathon
12:00pm Misbah Mubarak, ANL Closing Remarks


Talk Descriptions

TraceR: A Parallel Trace Replay Tool for HPC Network Simulations

TraceR is a trace replay tool built upon the ROSS-based CODES simulation framework. TraceR can be used for predicting network performance and understanding network behavior by simulating messaging on interconnection networks. It addresses two major shortcomings in current network simulators. First, it enables fast and scalable simulations of large-scale supercomputer networks. Second, it can simulate production HPC applications using BigSim’s emulation framework. We also compare TraceR with other network simulators such as SST and BigSim, and demonstrate its scalability using various case studies.
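TraceR's actual interfaces build on CODES and BigSim traces and are not reproduced here; as a rough illustration of the trace-replay idea, the sketch below replays a hypothetical list of (send time, source, destination, size) records through a single-link delay model. The trace contents and the latency and bandwidth constants are invented for illustration.

```python
import heapq

# Hypothetical trace records (send_time_s, src, dst, bytes), standing in
# for the BigSim application traces that TraceR actually consumes.
trace = [
    (0.0, 0, 1, 1024),
    (0.5, 1, 2, 4096),
    (1.0, 0, 2, 512),
]

LATENCY_S = 1.0e-6     # per-message latency, illustrative
BANDWIDTH_BPS = 1.0e9  # link bandwidth, illustrative

def replay(trace):
    """Replay sends in timestamp order; model each delivery time as
    send_time + latency + size / bandwidth over a single link."""
    events = list(trace)
    heapq.heapify(events)                  # process sends in time order
    deliveries = []
    while events:
        t_send, src, dst, size = heapq.heappop(events)
        deliveries.append((src, dst, t_send + LATENCY_S + size / BANDWIDTH_BPS))
    return deliveries

out = replay(trace)
```

A real network model, as in CODES, would additionally account for routing, link contention, and per-hop queueing rather than a fixed per-message delay.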

Epidemic Fault Detection and Group Membership in HPC Storage Systems

Fault response strategies are crucial to maintaining performance and availability in HPC storage systems, and the first responsibility of a successful fault response strategy is to detect failures and maintain an accurate view of group membership. This is a nontrivial problem given the unreliable nature of communication networks and other system components. As with many engineering problems, trade-offs must be made to account for the competing goals of fault detection efficiency and accuracy.

Today’s production HPC services typically rely on distributed consensus algorithms and heartbeat monitoring for group membership. In this work, we investigate epidemic protocols to determine whether they would be a viable alternative. Epidemic protocols have been proposed in previous work for use in peer-to-peer systems, but they have the potential to increase scalability and decrease fault response time for HPC systems as well. We focus our analysis on the Scalable Weakly-consistent Infection-style Process Group Membership (SWIM) protocol.

We begin by exploring how the semantics of this protocol differ from those of typical HPC group membership protocols, and we discuss how storage systems might need to adapt as a result. We use existing analytical models to choose appropriate SWIM parameters for an HPC use case. We then develop a new, high-resolution parallel discrete event simulation of the protocol to confirm existing analytical models and explore protocol behavior that cannot be readily observed with analytical models. Our preliminary results indicate that the SWIM protocol is a promising alternative for group membership in HPC storage systems, offering rapid convergence, tolerance to transient network failures, and minimal network load.
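The probing mechanics behind SWIM can be sketched in a few lines: a direct ping to the target, followed by up to k indirect pings relayed through other members before the target is marked suspected. This is a toy synchronous sketch with a dict standing in for the network; real SWIM runs asynchronously and disseminates suspicion via gossip piggybacked on its probe messages.

```python
import random

def swim_probe(target, members, alive, k=3, rng=random):
    """One SWIM-style probe round, sketched synchronously.

    `alive` maps member id -> whether that member responds to pings;
    it stands in for the (unreliable) network."""
    if alive[target]:
        return "alive"                          # direct ack received
    # Direct ping failed: ask k other members to probe the target for us.
    helpers = [m for m in members if m != target]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        if alive[helper] and alive[target]:     # helper relays an ack
            return "alive"
    return "suspected"                          # no ack, direct or indirect

members = list(range(8))
alive = {m: True for m in members}
alive[5] = False                                # member 5 has failed
```

The indirect probes are what give SWIM its tolerance to transient network failures: a single lost direct ping is not enough to generate a false suspicion.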

Evaluating the use of proxies in MG-RAST

The MG-RAST metagenomics service at Argonne National Laboratory provides a platform for storing metagenomic data and a computational pipeline for its analysis. As the cost of DNA sequencing has decreased, systems such as MG-RAST have experienced an exponential increase in data submissions. MG-RAST comprises the workflow management system AWE and the data management system Shock. AweSim is a coarse-grained simulation of the MG-RAST workflow, developed to evaluate data-aware scheduling in MG-RAST. In this work, we have extended the AweSim simulation with proxy servers that clients can communicate with, instead of directing all communication to the centralized Shock server. We expect proxy servers to improve performance by distributing data movement requests, leading to improved job response times. In this talk, we will discuss preliminary results of AweSim with various Shock proxy configurations, as well as future work.
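A back-of-envelope model suggests why proxies should help (this is not AweSim itself, whose workflow simulation is far more detailed): splitting a fixed request load across several servers divides the sequential service time. All numbers below are illustrative.

```python
import math

def completion_time(n_requests, service_time_s, n_servers):
    """Deterministic sketch: requests split evenly across servers,
    each server working through its share sequentially."""
    per_server = math.ceil(n_requests / n_servers)
    return per_server * service_time_s

central = completion_time(1000, 0.01, 1)  # all clients hit one Shock server
proxied = completion_time(1000, 0.01, 5)  # load spread over 5 hypothetical proxies
```

In practice the gain depends on how requests are routed to proxies and on cache hit rates, which is precisely what the simulated proxy configurations explore.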

FatTreeSim: Modeling a Large-scale Fat-Tree Network for HPC Systems and Data Centers Using Parallel and Discrete Event Simulation

Fat-tree topologies have been widely adopted as the communication network in data centers over the past decade. High-performance computing (HPC) system designers are now considering fat-trees as the interconnection network for next-generation supercomputers. For extreme-scale computing systems such as data centers and supercomputers, performance is highly dependent on the interconnection network. In this paper, we present FatTreeSim, a PDES-based toolkit consisting of a highly scalable fat-tree network model, with the goal of better understanding the design constraints of fat-tree networking architectures in data centers and HPC systems, as well as evaluating the applications running on top of the network. FatTreeSim is designed to model and simulate large-scale fat-tree networks of up to millions of nodes with protocol-level fidelity. We have conducted extensive experiments to validate and demonstrate the accuracy, scalability, and usability of FatTreeSim. On the Argonne Leadership Computing Facility’s Blue Gene/Q system, Mira, FatTreeSim achieves a peak event rate of 305 million events/s for a 524,288-node fat-tree model with a total of 567 billion committed events. The strong-scaling experiments use up to 32,768 cores and show near-linear scalability. Compared with a small-scale physical system in Emulab, FatTreeSim accurately models the latency of the same fat-tree network, with less than 10% error in most cases. Finally, we demonstrate FatTreeSim’s usability through a case study in which FatTreeSim serves as the network module of the YARNsim system; the error rates for all test cases are less than 13.7%.
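FatTreeSim's internal data structures are not shown here, but the standard k-ary fat-tree arithmetic underlying such models is easy to reproduce: with k-port switches, a three-level fat-tree has k pods and supports k³/4 hosts. Note that k = 128 yields the 524,288-node configuration from the Mira experiments above.

```python
def fat_tree_counts(k):
    """Host and switch counts for a standard three-level k-ary fat-tree
    built entirely from k-port switches."""
    assert k % 2 == 0, "port count must be even"
    edge = k * (k // 2)       # k pods, k/2 edge switches per pod
    agg = k * (k // 2)        # k pods, k/2 aggregation switches per pod
    core = (k // 2) ** 2
    hosts = k ** 3 // 4       # k/2 hosts attached to each edge switch
    return hosts, edge, agg, core
```

For example, `fat_tree_counts(4)` gives the textbook 16-host fat-tree with 8 edge, 8 aggregation, and 4 core switches.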

Bandwidth allocation in WANs

Today’s scientific applications increasingly involve large amounts of input/output data that must be moved among multiple computing facilities via wide-area networks (WANs). The bandwidth of WANs, however, is growing at a much slower rate and is thus becoming a bottleneck. Moreover, network bandwidth has not traditionally been treated as a limited resource, so coordinated allocation is lacking. Uncoordinated scheduling of competing data transfers over shared network links results in suboptimal system performance and poor user experiences. To address these problems, we propose a data transfer scheduler to coordinate and schedule data transfers between distributed computing facilities over WANs. Specifically, the scheduler prioritizes and allocates resources to data transfer requests based on user-centric utility functions in order to maximize overall user satisfaction. We conducted trace-based simulations and demonstrate that our data transfer scheduling algorithms can considerably improve data transfer performance as well as quantified user satisfaction compared with traditional first-come, first-served and shortest-job-first approaches.
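The utility-driven scheduling idea can be sketched with a greedy rule. The "utility weight per byte" priority and the request names below are illustrative stand-ins, not the paper's actual user-centric utility functions.

```python
def schedule_transfers(requests, bandwidth):
    """Greedy sketch: serve transfers in descending utility-weight-per-byte
    order on one shared WAN link, recording each request's finish time."""
    # requests: list of (name, size_bytes, utility_weight)
    order = sorted(requests, key=lambda r: r[2] / r[1], reverse=True)
    t, finish = 0.0, {}
    for name, size, weight in order:
        t += size / bandwidth     # link serves one transfer at a time
        finish[name] = t
    return finish

# Two hypothetical transfers on a 1 GB/s link: the smaller, higher-weight
# request is served first under this priority rule.
finish = schedule_transfers(
    [("climate", 100e9, 1.0), ("genome", 50e9, 2.0)], bandwidth=1e9)
```

A first-come, first-served scheduler would ignore the weights entirely, which is the kind of user-indifferent behavior the proposed scheduler is designed to avoid.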
