Non-blocking collective I/O
The MPI standard defines non-blocking communication. It also defines non-blocking (independent) I/O. When it comes to collective I/O, the choices are blocking I/O or the little-developed and little-used “split collectives”.
The HDF Group pushed to add true non-blocking collective I/O to the MPI standard. MPI-3.1 finally incorporates this feature. The use cases are motivated by things the HDF5 library would like to do in a portable manner at scale:
- Modifying metadata of a dataset: Each process has a cache of metadata, so updates done collectively (thus ensuring everyone’s cache is consistent between memory and file). When evicting members from this cache, HDF5 could issue a non-blocking collective I/O request for these typically tiny elements, then go do other work.
- Backgrounding Data operations: HDF5 knows a bit about the structure of data on disk due to its file format. It also knows a bit about the data a user will want to operate on. A sufficiently clever HDF5 library could issue non-blocking collective I/O to either read-ahead in anticipation of what a user will need, or to maintain a write-back cache.
Sangmin Seo implemented the non-blocking collective I/O routines for ROMIO. Implementers might find it interesting that he used the extended generalized requests we added to MPICH way back in 2007.
This feature is available in mpich-master and in the last few pre-releases. If you try out non-blocking collective I/O, let us know how it worked.