The modern DOE scientific computing portfolio consists of a rich ecosystem of simulation, data analytics, and learning applications, with many distinct data management and analysis needs. The objective of the Mochi project is to design methodologies and tools that allow for the rapid development of distributed data services in support of DOE science. An important aspect of Mochi is composition: common capabilities such as communication, data storage, concurrency management, and group membership are provided under Mochi along with building blocks such as BLOB and key-value stores. These building blocks are mixed together to provide specialized service implementations catering to specific platforms and science needs. Current Mochi research directions include unifying management of disparate data classes from scientific campaigns and applying learning and artificial intelligence to improve the adaptability of data services on heterogeneous DOE platforms.
The Mochi project is a collaboration between Argonne National Laboratory, Los Alamos National Laboratory, Carnegie Mellon University, and the HDF Group. However, Mochi is also bigger than just these partners: Mochi is an open ecosystem enabling the development of a variety of services both within the DOE and internationally.
Quarterly meeting and newsletter, October 2024
Please join us for the next Mochi quarterly meeting on Thursday, October 31, 2024, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Mochi updates and agenda items
- Events at SC24 (November 17-22, Atlanta GA): Look for the Mochi team at the following events at SC24:
- Workshop: 9th International Parallel Data Systems Workshop (PDSW 2024), 9am-5:30pm Sunday Nov. 17
- Many Mochi team members will be in attendance, and several workshop presentations will feature Mochi technology, including “Copper: Cooperative Caching Layer for Scalable Data Loading in Exascale Supercomputers” and multiple presentations on work that leverages the DAOS storage system.
- Panel: Advancing the State of the Art in Distributed Services for HPC, 10:30am-12pm Thursday Nov. 21
- Mochi team members Matthieu Dorier and Rob Ross will be participating as a panelist and moderator, respectively.
- Paper: CARP: Range Query-Optimized Indexing for Streaming Data, 11:30am-12pm Thursday Nov 21
- Ankush Jain will be presenting his research work on a Mochi service for dynamic data partitioning.
- Platform updates:
- Most HPE Slingshot-equipped systems (including Aurora@ALCF, Perlmutter@NERSC, and Frontier@OLCF) have now been updated to Libfabric version 1.20.1. We’ve updated our example installation recipes accordingly; see the platform-configurations repo for updates. You must use Mercury version 2.4.0rc3 or higher for compatibility with this update.
- The same repository now also includes an example of how to install Mochi in an Apptainer image so that it can still leverage the native host networking stack.
- We (and HPE) are aware of an issue in the Libfabric CXI provider that causes processes using Libfabric (and Mercury and Mochi) to consume 100% CPU at all times when waiting for network events. We expect this to be resolved in a future HPE software release.
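As a concrete (hypothetical) example, a Spack spec consistent with the version constraints above might look like the fragment below. Treat it as a sketch only; consult the platform-configurations repository for the authoritative per-machine recipes.

```
mochi-margo ^mercury@2.4.0rc3: ^libfabric@1.20.1
```

The open-ended `@2.4.0rc3:` range reflects the “2.4.0rc3 or higher” requirement noted above, while Libfabric is pinned to match the system-installed 1.20.1 release.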
- Software updates:
- Mercury version 2.4.0 was released in October 2024. It includes a variety of new features, bug fixes, and performance optimizations, and is compatible with the Libfabric 1.20.1 CXI provider on HPE platforms.
- Margo version 0.18.2 was released in October 2024. It includes a variety of minor bug fixes and enhancements. It also introduces a new polling mode with a configurable “spindown” parameter to control how long Margo waits before idling after servicing requests. The new mode is enabled by default and significantly speeds up bursty or sequential workloads.
- Mofka version 0.3.3 was released in October 2024. Mofka is a distributed event streaming service for HPC systems. Recent releases include a variety of performance optimizations and features, including support for access to Kafka via the Mofka API.
- Bedrock version 0.15.1 was released in October 2024. Bedrock provides bootstrapping and configuration management capabilities for Mochi services. Recent releases include significant refactoring and API revisions to help simplify service configurations.
Quarterly meeting and newsletter, July 2024
Please join us for the next Mochi quarterly meeting on Thursday, July 25, 2024, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Mochi updates and agenda items
- Upcoming publications:
- Ankush Jain, Chuck Cranor, Qing Zheng, Brad Settlemyer, George Amvrosiadis, Gary Grider, “CARP: A Streaming Partitioner for Range Queries”, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 24 (2024).
- Recent publications:
- Matthieu Dorier, Philip Carns, Robert Ross, Shane Snyder, Rob Latham, Amal Gueroudji, George Amvrosiadis, Chuck Cranor, Jerome Soumagne, “Extending the Mochi Methodology to Enable Dynamic HPC Data Services,” in Proceedings of the 5th Workshop on Extreme-Scale Storage and Analysis (ESSA 2024), May 2024.
- Software updates:
- mochi-margo 0.17.0 was released in May 2024. This is the core RPC management component that ties together Mercury RPCs and Argobots user-level threads. The 0.17.0 release features a new margo_monitor_dump() function that can be used to emit (and optionally reset) integrated Margo monitoring data at runtime rather than waiting for program termination.
- Mochi-flock 0.3.1 was released in July 2024. It is a new implementation of group membership functionality for Mochi services (replacing mochi-ssg). Recent releases have added a variety of new features as well as Python and C++ APIs.
- Mochi-bedrock 0.13.1 was released in July 2024. It is a bootstrapping and configuration management component for Mochi services. Recent releases have added mochi-flock support (enabled by default), added support for TOML configurations, and split the module API out into a separate component called mochi-bedrock-module-api.
- Mofka 0.1.1 was released in July 2024. Mofka is a new top-level Mochi service that implements a distributed event streaming model for HPC services. Mofka is still under active development but includes documentation and a stable API.
- Mercury 2.4.0rc3 was released in June 2024. Mercury is the underlying RPC and bulk RDMA transfer framework used for communication in Mochi. This preview release includes several new tuning parameters and a new API that enables users to wait on a file descriptor for new events. We plan to leverage this feature in Margo in the future to improve network polling efficiency.
- Featured topics:
- This quarter we will provide an overview / walkthrough of the Mochi-flock group membership component. Mochi-flock is available at https://github.com/mochi-hpc/mochi-flock.
Quarterly Meeting and Newsletter, April 2024
Please join us for the next Mochi quarterly meeting on Thursday, April 25, 2024, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Or call in (audio only)
+1 630-556-7958,,254649841#
Mochi updates and agenda items
- Upcoming publications:
- Matthieu Dorier, Philip Carns, Robert Ross, Shane Snyder, Rob Latham, Amal Gueroudji, George Amvrosiadis, Chuck Cranor, Jerome Soumagne, “Extending the Mochi Methodology to Enable Dynamic HPC Data Services,” to appear at the 5th Workshop on Extreme-Scale Storage and Analysis (ESSA 2024) in May, 2024.
- Recent publications and presentations:
- P. Carns, M. Dorier, R. Latham, R. B. Ross, S. Snyder and J. Soumagne, “Mochi: A Case Study in Translational Computer Science for High-Performance Computing Data Management,” in Computing in Science & Engineering, vol. 25, no. 4, pp. 35-41, July-Aug. 2023, doi: 10.1109/MCSE.2023.3326436.
- Philip Carns. “Harnessing Programmable Devices in the Data Path with Composable Services”. Short presentation at the Joint Laboratory for Extreme-Scale Computing workshop (JLESC16), April 17, 2024. PDF
- Philip Carns. “HPC Storage: Adapting to Change”. Keynote presentation at the 3rd Workshop on Re-envisioning Extreme-Scale I/O for Emerging Hybrid HPC Workloads (REX-IO), October 31, 2023. PDF
- Software updates:
- Mochi-abt-io 0.7.0 was released in February 2024. This package provides an Argobots-aware abstraction of common POSIX I/O functions to enable efficient, highly-concurrent I/O in Mochi services. The 0.7.0 release features support for the high-performance liburing Linux I/O interface. Documentation on how to use this feature is available on the Mochi readthedocs page.
- Mochi-flock 0.1.0 was released in April 2024. It is a new component meant to replace SSG for group membership in Mochi services. The initial release includes Bedrock integration and support for static groups.
- Mochi-margo 0.16.0 was released in April 2024. Margo is the core Mochi component that combines Mercury RPCs and Argobots lightweight threads into a coherent data service programming model. The 0.16.0 release features improved timer support with a particular emphasis on more robust handling of cancelled timers.
- Argobots 1.2 was released in March 2024. Argobots is the foundational user-level threading package used by Mochi. The 1.2 release features a large collection of bug fixes, performance enhancements, and new elasticity features. Prior to this release, Mochi relied on a stable 1.2 release candidate as the preferred version of Argobots, so most users should not notice any changes.
Quarterly Newsletter, January 2024
The Mochi quarterly meeting for Thursday, January 25, 2024 has been cancelled. Please reach out to us on the mailing list or Slack space if you have anything that you would like to discuss with the Mochi team. Otherwise we hope to see you at our next quarterly meeting on April 25.
Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Mochi updates
- Software updates
- Mofka version 0.0.2 was released on January 19, 2024. Mofka is a distributed event streaming service with semantics and data models similar to that of Kafka, but geared towards scientific computing on HPC platforms. It is still in an early alpha stage but we invite questions, feedback, and comments. We will cover Mofka in more detail in a future meeting.
- Mercury version 2.3.1 was released in October 2023. Mercury is the RPC framework used by Mochi for all communication and data transfer. This point release includes a collection of bug fixes and optimizations as well as support for more Slingshot VNI configurations.
- Platform updates
- The Mochi team is now using the ALCF Gitlab CI infrastructure to execute nightly performance regression tests on the Polaris system at the ALCF. We hope to expand this testing in the future. For now its primary objective is to monitor Slingshot network performance across not only Mochi updates but also HPE system software updates.
Quarterly meeting and newsletter, October 2023
Please join us for the next Mochi quarterly meeting on Thursday, October 26, 2023, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Or call in (audio only)
+1 630-556-7958,,254649841#
Mochi updates and agenda items
- Upcoming Publications and Presentations
- “Mochi: A Case Study in Translational Computer Science for High-Performance Computing Data Management” (under preparation for an upcoming issue of IEEE Computing in Science and Engineering)
- If you are attending IEEE Cluster 2023 in Santa Fe, please consider stopping by the REX-IO workshop on Tuesday October 31 for the keynote presentation “Anticipating and Adapting to Change in HPC Storage” by Phil Carns.
- New Mochi Microservices
- HPE Slingshot Status Update
- Communicating on a Slingshot network requires access to a Virtual Network Interface, or VNI, to authorize communication across processes. You may need to take additional steps to configure the VNI depending on your use case.
- Communicating across processes that were launched together (e.g. in the same srun or mpiexec invocation):
- Mercury and thus Mochi will use the same VNI allocated for use by MPI with no additional configuration needed.
- You may need to use a “--single-node-vni” argument to mpiexec or a “--network=single_node_vni” argument to srun, depending on your platform, to make sure that a VNI is allocated even if the launcher believes that all processes will be executing on the same node.
- Communicating across independently-launched processes within a job:
- On the Aurora or Sunspot systems at ANL, no additional configuration is needed.
- On HPE/SLURM based systems (i.e., Frontier and Perlmutter), additional configuration is needed because these systems utilize a unique VNI for each job step by default. You can instruct Mercury to instead use a job-level VNI by passing a special value of “0:0” in the “auth_key” field of the Mercury json configuration in Mochi. This feature is already available in mercury@master and will also be available in the next point release. In addition, you must enable the job-level VNI with the --network=job_vni option to the sbatch command or as a directive at the top of your job script.
- Communicating across jobs:
- We are still working with HPE on a general solution to enable communication across jobs.
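To make the job-level VNI setting described above concrete, a Mercury JSON fragment within a Mochi configuration might look like the sketch below. The “auth_key” value of “0:0” comes directly from the guidance above; the surrounding nesting is illustrative and the exact structure may vary by Margo/Bedrock version.

```json
{
    "mercury": {
        "auth_key": "0:0"
    }
}
```

Remember that this must be paired with the --network=job_vni sbatch option so that a job-level VNI actually exists for Mercury to use.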
- General platform updates:
- A recipe for building and running Mochi on the ANL Sunspot system can now be found in the Mochi platform-configurations repository.
Quarterly meeting and newsletter, July 2023
Please join us for the next Mochi quarterly meeting on Thursday, July 27, 2023, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Or call in (audio only)
+1 630-556-7958,,254649841#
Mochi updates and agenda items
- Recent presentations
- The Mochi team presented two seminars in June 2023 as part of the Mathematics and Computer Science Division’s CS seminar series. These seminars provide an overview of the state of Mochi and how it can be used in 2023. The first covers high-level motivation, concepts, and key technologies, while the second describes the Mochi methodology for composable data services and highlights success stories in domain-specific data services (HEPnOS) and elastic in situ visualization (Colza).
- Philip Carns. “Mochi Project Overview: the Democratization of Data Services in HPC”, CS seminar series, Argonne National Laboratory Mathematics and Computer Science division, June 13, 2023. ABSTRACT PDF VIDEO
- Matthieu Dorier. “Mochi in Practice: Data Services for High-Energy Physics and Elastic In Situ Visualization Workflows”, CS seminar series, Argonne National Laboratory Mathematics and Computer Science division, June 20, 2023. ABSTRACT PDF VIDEO
- Recent tutorials
- Matthieu Dorier, Philip Carns, and Marc-André Vef presented the following tutorial on May 21 at ISC High Performance 2023:
- Tutorial: Developing Custom HPC Data Services Using Mochi
- slides and exercises are available online
- The tutorial includes extensive hands-on exercises that can be done in either C or C++. They begin with simple RPC examples to illustrate concepts and then build up to using templates for complete microservices using Bedrock.
- The tutorial materials also include a docker image to make it easier to get started with a development environment.
- New features
- Yokan now provides four new families of functions: yk_fetch, yk_doc_fetch, yk_iter, and yk_doc_iter (each with variants to access multiple key/value pairs or documents at once). These functions are equivalent to yk_get, yk_doc_load, yk_list_keyvals, and yk_doc_list, respectively, but take a callback that is invoked on each key/value pair or document instead of taking a buffer into which the key/value pair or document is copied. These functions allow for fewer memory copies and simpler code (no need for the caller to manage their own buffer or call other functions to query the size of values/documents first). The _iter functions also provide automatic pipelining and batching.
- Mercury 2.3.0 is out now, including several notable performance enhancements for libfabric and CXI:
- new “multi-recv” optimization to improve RPC throughput
- avoid performance degradation in FI_SOURCE
- use WAIT_FD for graceful idling on Slingshot (CXI) transports
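To illustrate the difference between the buffer-based and callback-based access styles described for Yokan above, here is a generic, self-contained C mock-up of the pattern. Note that mock_get, mock_fetch, copy_cb, and the single-value “store” are hypothetical stand-ins for illustration only; they are not the Yokan API, so consult the Yokan documentation for the real yk_get/yk_fetch signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory store holding a single value, for illustration. */
static const char *stored_value = "hello mochi";

/* Buffer-based style (analogous to yk_get): the caller supplies a buffer
 * and must size it correctly, retrying if it turns out to be too small. */
int mock_get(const char *key, char *buf, size_t *bufsize)
{
    (void)key; /* single-value store ignores the key */
    size_t len = strlen(stored_value) + 1;
    if (*bufsize < len) {
        *bufsize = len; /* report required size; caller must retry */
        return -1;
    }
    memcpy(buf, stored_value, len);
    *bufsize = len;
    return 0;
}

/* Callback-based style (analogous to yk_fetch): the value is handed to a
 * user callback, so the caller never sizes or manages its own buffer. */
typedef void (*fetch_cb)(const char *value, size_t size, void *uargs);

int mock_fetch(const char *key, fetch_cb cb, void *uargs)
{
    (void)key;
    cb(stored_value, strlen(stored_value) + 1, uargs);
    return 0;
}

/* Example callback that simply copies the value into a user buffer. */
void copy_cb(const char *value, size_t size, void *uargs)
{
    memcpy(uargs, value, size);
}
```

The callback style eliminates the size-query/allocate/retry dance of the buffer style, which is what enables the fewer copies, simpler caller code, and automatic pipelining noted above.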
- HPE Slingshot status update
- Mercury support for Slingshot (CXI) is feature complete and performing well with Mercury 2.3.0, but there are some important usability issues to be aware of regarding Virtual Network Interfaces (VNIs). VNIs are a mandatory method for Slingshot network access control between compute nodes.
- The default job launcher on HPE systems will automatically provision a VNI for MPI. Mercury will inherit and use this same VNI without any additional action on your part.
- However, this default, launcher-provided VNI is not sufficient for communicating across MPI jobs or among manually-launched processes.
- We are in communication with HPE about this issue, but they are still working on a general solution. If you encounter problems, please alert your facility or vendor contacts and let us know about your experience!
Mochi CS seminars at Argonne National Laboratory
The Mochi team presented the following two seminars in June 2023 as part of the Mathematics and Computer Science Division’s CS seminar series:
- Philip Carns. “Mochi Project Overview: the Democratization of Data Services in HPC”, CS seminar series, Argonne National Laboratory Mathematics and Computer Science division, June 13, 2023. ABSTRACT PDF VIDEO
- Matthieu Dorier. “Mochi in Practice: Data Services for High-Energy Physics and Elastic In Situ Visualization Workflows”, CS seminar series, Argonne National Laboratory Mathematics and Computer Science division, June 20, 2023. ABSTRACT PDF VIDEO
The first is a high-level presentation of the motivation, concepts, and key technologies that make Mochi possible. The second describes the Mochi methodology for composable data services and highlights success stories in domain-specific data services (HEPnOS) and elastic in situ visualization (Colza).
Together, these presentations give a nice overview of the state of Mochi and how it can be used in 2023. See the links above for more detailed abstracts, slides, and video recordings of the presentations.
Quarterly meeting and newsletter, April 2023
Please join us for the next Mochi quarterly meeting on Thursday, April 27, 2023, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Mochi updates and agenda items
- Upcoming events:
- Paper: HEPnOS: a Specialized Data Service for High Energy Physics Analysis
- Ali, Calvez, Carns, Dorier, Ding, Kowalkowski, Latham, Norman, Paterno, Ross, Sehrish, Snyder, Soumagne
- May 15, 2023, St Petersburg, Florida, USA
- ESSA 2023 : 4th Workshop on Extreme-Scale Storage and Analysis (co-located with IPDPS 2023)
- Tutorial: Developing Custom HPC Data Services Using Mochi
- Matthieu Dorier, Philip Carns, and Marc-André Vef
- Sunday, May 21, 2023 2:00 PM to 6:00 PM, Hamburg Germany
- Part of the ISC High Performance 2023 tutorial program
- Panel: The Future of Open-Source Filesystems for HPC – Competition, Cooperation or Consolidation?
- 8 panelists from across industry, academia, and government, including Philip Carns
- Monday, May 22, 2023 5:20 PM to 6:35 PM, Hamburg Germany
- Part of the ISC High Performance 2023 program
- Recent events and publications:
- Bringing Elasticity to HPC Storage and Data Services
- Matthieu Dorier
- March 23, 2023
- presentation at the 15th Joint Laboratory on Extreme Scale Computing (JLESC workshop)
- Recent software development updates:
- New version of bedrock (0.6.0)
- https://github.com/mochi-hpc/mochi-bedrock/releases/tag/v0.6.0
- Adds extensive functionality for adding components (pools, xstreams, providers, etc.) to a running Bedrock daemon
- Adds client interfaces to interact with a group of daemons all at once
- Documentation for Bake on ReadTheDocs
- With a few example codes provided to show you how to get started
- https://mochi.readthedocs.io/en/latest/bake.html
- In progress: liburing support for mochi-abt-io
- https://github.com/mochi-hpc/mochi-abt-io/pull/20
- liburing is a modern kernel-assisted asynchronous I/O interface for Linux that is anticipated to be especially advantageous for low-latency storage devices
- see https://github.com/axboe/liburing for details
- Platform updates:
- A Spack recipe for the OLCF Frontier system is available now
Quarterly meeting and newsletter, January 2023
Please join us for the next Mochi quarterly meeting on Thursday, January 26, 2023, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Phone Conference ID: 254 649 841#
We plan to discuss the following topics at this meeting:
- Call for lightning presentations:
- Do you have something that you would like to present at the Mochi quarterly meeting? We would love to hear about interesting services you have built using Mochi technology, performance results, challenges and obstacles, or all of the above! It may be short notice for this meeting, but please let us know if you would like to share a presentation this week or request a slot for a future meeting.
- Elasticity support in Margo (Matthieu Dorier):
- Recent changes to Margo have introduced programmatic APIs for changing Margo configuration on the fly (in particular for adding, removing, or modifying Argobots execution streams). This is intended for use as a building block for elastic data services.
- The complete API can be found at https://github.com/mochi-hpc/mochi-margo/blob/main/include/margo-config.h
- Call for feedback on Mochi tutorial topics
- The Mochi team is planning to introduce new tutorial material this year (venues TBA).
- Upcoming tutorials will focus on hands-on exercises that use containers and Mochi service templates to get up and running quickly.
- What suggestions do you have for us on what points we should cover as we develop this new material?
- Examples of previous Mochi tutorials can be found at https://www.mcs.anl.gov/research/projects/mochi/tutorials/
Quarterly meeting and newsletter, October 2022
Please join us for the next Mochi quarterly meeting on Thursday, October 27, 2022, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.
Please suggest agenda items on the Mochi slack space or the [email protected] mailing list.
Phone Conference ID: 254 649 841#
We plan to discuss the following topics at this meeting:
- Updates to Poesie (version 0.2), a microservice for embedding Python and Lua scripting languages within Mochi services
- Updates to the Mochi onboarding process based on feedback gathered at previous quarterly meeting:
- New “Hello Mochi” guide and API documentation at https://mochi.readthedocs.io/en/latest/index.html
- Demo of margo-info, a new command-line utility included in Margo 0.10 to help diagnose network transport problems
- Summary of HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization, recently presented at IEEE CLUSTER 2022