Quarterly Newsletter, July 2022

In lieu of our usual software update this quarter, we would like to invite the community to join us for the July 28 quarterly meeting to discuss ideas for improving the process of bootstrapping, testing, and validating Mochi software environments. This process is currently more ad hoc than we would like; anything we can do to formalize and streamline it would be a big help in the long run.

If you have any suggestions or comments to share (on this topic or anything else Mochi-related) please connect to the following meeting on Thursday, July 28, at 10:00am CT:

Mochi Quarterly meeting

Click here to join the meeting

Or call in (audio only)

+1 630-556-7958,,254649841#  United States, Big Rock

Phone Conference ID: 254 649 841#

Please join our mailing list (see link on the right side of this web site) if you would like to suggest agenda items in advance.

Building Custom Data Services with Mochi BoF, 2022

Team Mochi, along with special guests Philip Davis of the University of Utah and Chris Kelly of Brookhaven National Laboratory, hosted a BoF session entitled “Building Custom Data Services with Mochi” at the 2022 ECP Community BoF Days. Thank you everyone for participating! The slides are now available online on the Mochi Tutorials page.

UPDATE: a full video of the event is also now available from the tutorials link provided above.

Quarterly Newsletter, April 2022

New tools

  • Mochi-json-vis
    • https://github.com/mochi-hpc/mochi-json-vis
    • A command-line tool that can be used to generate a visual representation of a Mochi Bedrock configuration.
    • This can be helpful to sanity check or better understand service configuration details such as the mapping of providers to execution streams.
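
The provider-to-execution-stream mapping mentioned above can also be checked by hand. The following is a minimal Python sketch and is not part of mochi-json-vis. The embedded configuration, its field names (modeled loosely on the general shape of a Bedrock/Margo JSON document), and the helper `map_providers_to_xstreams` are all assumptions made for illustration.

```python
import json

# Hypothetical configuration for illustration only; the field names are
# assumptions and may not match a real Bedrock configuration exactly.
CONFIG_TEXT = """
{
  "margo": {
    "argobots": {
      "pools": [{"name": "rpc_pool"}, {"name": "io_pool"}],
      "xstreams": [
        {"name": "es0", "scheduler": {"pools": ["rpc_pool"]}},
        {"name": "es1", "scheduler": {"pools": ["rpc_pool", "io_pool"]}}
      ]
    }
  },
  "providers": [
    {"name": "kv", "type": "yokan", "pool": "rpc_pool"},
    {"name": "docs", "type": "sonata", "pool": "io_pool"}
  ]
}
"""


def map_providers_to_xstreams(config):
    """Return {provider name: sorted execution streams that serve its pool}."""
    pool_to_xstreams = {}
    for es in config["margo"]["argobots"]["xstreams"]:
        for pool in es["scheduler"]["pools"]:
            pool_to_xstreams.setdefault(pool, []).append(es["name"])
    return {
        provider["name"]: sorted(pool_to_xstreams.get(provider["pool"], []))
        for provider in config.get("providers", [])
    }


mapping = map_providers_to_xstreams(json.loads(CONFIG_TEXT))
for provider, streams in mapping.items():
    # A provider whose pool is served by no execution stream would never run.
    print(f"{provider}: {', '.join(streams) or '(no execution stream)'}")
```

A provider that maps to an empty list is the kind of misconfiguration a visual check is meant to catch.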

Software updates

  • Mochi-thallium 0.10.1 (C++ bindings to Mochi)
    • Adds support for timer_callback
    • Adds logger class and logging functionality
    • Adds access to margo’s underlying configuration, pools, and xstreams
  • Mochi-bedrock 0.4.1 (service configuration framework)
    • Ability to initialize the server with a JX9 script instead of a JSON configuration

Publications

  • Matthieu Dorier, Zhe Wang, Utkarsh Ayachit, Shane Snyder, Robert Ross, Manish Parashar. “Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations.” In Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022) (TO APPEAR)
  • Bradley Settlemyer, George Amvrosiadis, Philip Carns, and Robert Ross. “It’s time to talk about HPC storage: Perspectives on the past and future.” Computing in Science & Engineering, 23(6):63–68, 2021. https://ieeexplore.ieee.org/document/9658238

Upcoming events

Building Custom Data Services with Mochi (public BoF)

May 12th, 11:00 AM eastern time

We will provide general updates on the Mochi project, highlight key capabilities related to service composition and key/value stores, and share work from guest speakers about the Mochi messaging layer and successful Mochi use cases:

  • Mercury: platform updates and optimizations for RPC and RDMA communication (Jerome Soumagne, The HDF Group)
  • Chimbuko: scalable application performance analysis and provenance (Chris Kelly, Brookhaven National Laboratory)
  • DataSpaces: extreme-scale data management framework (Philip Davis, University of Utah)

To register, follow this link, expand the Mochi BoF description, and click “Register” — this should provide you with a Zoom link to attend the BoF: https://www.exascaleproject.org/event/ecp-community-bof-days-2022/

Mochi BoF at the ECP Community BoF Days, May 12, 2022

We would like to invite everyone to attend a virtual Mochi BoF session as part of the ECP Community BoF Days:

Building Custom Data Services with Mochi
May 12th, 11:00 AM eastern time

We will provide general updates on the Mochi project, highlight key capabilities related to service composition and key/value stores, and share work from guest speakers about the Mochi messaging layer and successful Mochi use cases:

  • Mercury: platform updates and optimizations for RPC and RDMA communication (Jerome Soumagne, The HDF Group)
  • Chimbuko: scalable application performance analysis and provenance (Chris Kelly, Brookhaven National Laboratory)
  • DataSpaces: extreme-scale data management framework (Philip Davis, University of Utah)

To register, follow this link, expand the Mochi BoF description, and click “Register” — this should provide you with a Zoom link to attend the BoF: https://www.exascaleproject.org/event/ecp-community-bof-days-2022/

Thanks!
–Mochi team

Quarterly Newsletter, January 2022

New microservices

  • Mochi-quintain
    • https://github.com/mochi-hpc/mochi-quintain
    • Includes a provider that can be embedded in other services, via Mochi-bedrock or other means, to provide synthetic workload testing capability (i.e., “self-test”)
    • Includes an MPI benchmark that can be used to issue parameterized RPCs to the quintain provider from a large number of concurrent clients to measure response times from a heavily loaded server
    • Some preliminary plotting tools to help understand response time distributions and tail latency

Figure: Example distribution of response times for a Quintain provider under load.
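
To make the idea of tail latency concrete, here is a minimal Python sketch of how a set of response times might be summarized. It is not taken from the Quintain plotting tools; the sample values, the nearest-rank percentile method, and the function names are assumptions chosen for illustration.

```python
import statistics


def nearest_rank(ordered, p):
    """Nearest-rank percentile of an ascending list; p is an integer in 1..100."""
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def summarize(samples_ms):
    """Summarize response times (milliseconds) with mean, median, and tail percentiles."""
    ordered = sorted(samples_ms)
    return {
        "mean": statistics.mean(ordered),
        "p50": nearest_rank(ordered, 50),
        "p95": nearest_rank(ordered, 95),
        "p99": nearest_rank(ordered, 99),
        "max": ordered[-1],
    }


# Made-up measurements: most RPCs are fast, a few are very slow.
samples = [1.0] * 90 + [2.0] * 8 + [50.0] * 2
summary = summarize(samples)
for name, value in summary.items():
    print(f"{name}: {value:.2f} ms")
```

In this made-up data set the median stays low while the 99th percentile is dominated by the two slow responses, which is why a full distribution is more informative than an average for a heavily loaded server.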

Software updates

Platform support

Publications (updated)

Srinivasan Ramesh, Robert B Ross, Matthieu Dorier, Allen D Malony, Philip Carns, and Kevin Huck. SYMBIOMON: A High Performance, Composable Monitoring Service. In 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC). IEEE, 2021.

Upcoming events

  • ECP annual meeting
    • https://www.ecpannualmeeting.com/
    • Tentatively scheduled for May
    • We will host some BoF and/or tutorial content
      • All material will be made publicly available after the event
    • What topics would you like to see covered?

Quarterly Newsletter, October 2021

Project News:

New Microservices:

  • We are happy to introduce a new key/value microservice, called Yokan, to the Mochi framework. You can find more details in the Yokan documentation and Yokan GitHub repository. Yokan aims to provide state-of-the-art key/value storage capabilities on top of Margo, following the best practices of the Mochi methodology. It provides many backends, including BerkeleyDB, GDBM, LevelDB, LMDB, RocksDB, TKRZW, Unqlite, and a number of in-memory key/value stores. It was designed to be highly configurable and highly flexible, making it easy to configure databases using JSON, and to provide your own database implementation if the ones we offer don’t satisfy you. Yokan also provides C++ and Python APIs in addition to the usual C API.
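
The combination of JSON-driven configuration and user-supplied backends described above follows a common plug-in pattern. The Python sketch below illustrates that general pattern only. It does not use or reproduce the Yokan API; the class names, the `register_backend` decorator, the `open_database` helper, and the JSON fields `type` and `config` are all hypothetical.

```python
import json
from abc import ABC, abstractmethod

# Registry of available backends, keyed by the name used in the JSON document.
BACKENDS = {}


def register_backend(name):
    """Class decorator that makes a backend selectable by name (hypothetical)."""
    def decorator(cls):
        BACKENDS[name] = cls
        return cls
    return decorator


class KeyValueBackend(ABC):
    """Interface that every backend in this sketch must implement."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> bytes: ...


@register_backend("map")
class InMemoryBackend(KeyValueBackend):
    """Simple in-memory store backed by a Python dict."""

    def __init__(self, **options):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


def open_database(config_text):
    """Instantiate the backend named in a JSON configuration string."""
    config = json.loads(config_text)
    backend_cls = BACKENDS[config["type"]]
    return backend_cls(**config.get("config", {}))


db = open_database('{"type": "map", "config": {}}')
db.put(b"mochi", b"yokan")
print(db.get(b"mochi"))
```

Adding a new backend in this sketch means writing one class and registering it under a new name; the code that reads the configuration does not change.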

Software updates:

  • Libfabric 1.13.2 has resolved multiple outstanding bugs that impacted Mochi, particularly with the RXM provider which is used on TCP and Verbs networks. Please try it out and report if you have any problems.
  • Mercury version 2.1.0rc2 is now available. This is very close to the final 2.1.0 release of Mercury and is the default version supported in the mochi-spack-packages repository. It includes a UCX network driver, improvements to the shared memory transport, new threading options, and miscellaneous bug fixes.
  • Margo version 0.9.6 has also been released; it includes support for the upcoming Mercury 2.1.0 and performance enhancements that take advantage of upcoming features in Argobots 1.2.

Platform support:

  • Please remember to refer to the Mochi platform configurations repository for suggested configurations for various platforms. We have recently updated several example Spack environment files. Feel free to contribute more!

Contribution policy:

  • The Mochi Contributor License Agreement (CLA) has been updated to streamline the process of contributing source code to the project. We have also installed a GitHub action that will automatically prompt you to digitally agree to the CLA terms when you open your first pull request. Let us know if you have any questions.

New/Upcoming Publications:

  • Srinivasan Ramesh, Robert Ross, Matthieu Dorier, Allen Malony, Philip Carns and Kevin Huck. SYMBIOMON: A High Performance, Composable Monitoring Service. TO APPEAR in the 28th IEEE International Conference on High Performance Computing, Data, & Analytics (HiPC 2021)

Mochi selected as a 2021 R&D 100 finalist

UPDATE: Mochi was announced as an R&D 100 winner in the Software/Services category on October 21, 2021! We will post more details soon.

The Mochi project, a collaboration between Argonne National Laboratory, Carnegie Mellon University, Los Alamos National Laboratory, and The HDF Group, has been selected as a finalist for the 2021 R&D 100 Awards. The R&D 100 Awards have served as the most prestigious innovation awards program for the past 58 years; their mission is to identify and honor the top 100 new technologies of the year. Winners will be announced in November 2021.

Quarterly Newsletter, July 2021

Publication news:

  • Pierre Matri and Robert Ross. “Neon: Low-Latency Streaming Pipelines for HPC”, to appear in IEEE Cloud 2021, Sept 5-10 2021.
    • Introduces a new Mochi service for stream processing
  • Stay tuned for more Mochi-related publications at SC21 in November. More details will be posted once the SC21 technical program is announced.

Recent development updates:

  • A proof-of-concept of UCX support in Mercury is available in the master-ucx version of Mercury in the Mochi Spack repository
    • Please contact us if you are interested in this capability; it is under active development and should be considered experimental at this time.
  • The git origin/main branch of Margo includes new safety checks to ensure compatible Argobots runtime parameters if Argobots is initialized outside of Margo. This will be available in an upcoming release after coordinating updates to other Mochi packages.
  • Both Mochi and Margo have new Contributor License Agreement (CLA) documents available online as of July 2021 with more relaxed language than the previous version. We will soon streamline these even further with online electronic forms that will be activated within the GitHub contribution process.

Debugging tips:

  • We have encountered several bug reports on Libfabric 1.13.0 in the last few days, especially with the RXM provider. Debugging is in progress, but in the meantime you may want to consider reverting to an earlier release if you encounter communication problems.
  • Recent libfabric releases also include a new PSM3 provider. PSM3 is not directly supported by Mercury / Mochi, but enabling it in libfabric may interfere with the performance of the traditional PSM2 provider. The libfabric package in the Mochi spack repository disables PSM3 by default for now to avoid this problem.

Quarterly Newsletter, April 2021

New presentation materials:

GitHub migration complete:

New software releases:

  • Argobots 1.1
    • Underlying user-level threading package for Mochi
    • includes performance improvements, broader platform support, and new profiling and debugging capabilities (more on that later)
  • Mercury 2.0.1rc3
    • Underlying RPC communication package for Mochi
    • improved logging and several performance optimizations
    • final 2.0.1 release coming soon
  • Mochi-sdskv 0.1.12
    • Key/Value store microservice
    • Bedrock support
    • various packaging (cmake, pkgconfig, and dependency) improvements
  • Bedrock 0.2.1
    • Flexible service composition tool
    • various packaging (cmake, pkgconfig) improvements
  • Sonata 0.6.2
    • Document store microservice
    • various packaging (cmake) improvements

Performance regressions from previous quarterly newsletter resolved:

  • Power9 CPU mutex locking performance regression is resolved in Argobots 1.1
  • OmniPath network performance regression is resolved in Mercury 2.0.1rc3

New debugging/profiling/maintenance features:

  • Margo is now using munit for unit testing
    • Available in origin/main (or mochi-margo@main in Spack)
    • Coverage is limited for now but will be expanded over time
    • We will also be leveraging this framework in additional components over time
  • Recent Argobots updates include multiple (optional) stack guard methods
    • See Argobots documentation or Spack package variants. Notable options:
      • “mprotect”: real-time detection of stack overruns (with some performance overhead; just use this for debugging)
      • “canary”: deferred stack overrun detection (lighter weight, but a stack overflow will not be reported until shutdown)
  • margo_state_dump() function
    • Available in origin/main (or mochi-margo@main in Spack)
    • function that can be called at any time to dump point-in-time state to a text file or stdout for debugging purposes
    • includes Margo JSON configuration, Argobots configuration, current Argobots ES layout, Argobots performance profile, in-flight RPC counts, stack dump for blocked user-level threads, etc. See https://github.com/mochi-hpc/mochi-margo/blob/main/doc/debugging.md for details.