
HDF5-1.10.0 and more scalable metadata

June 9th, 2016

While not strictly ROMIO news, the new HDF5 release comes with a nice optimization that benefits all MPI-IO implementations.
In order to know where the objects, datasets, and other information in an HDF5 file are located on disk, a process needs to read the HDF5 metadata. This metadata is scattered across the file, so to find out where everything lives a process has to issue many tiny read requests. For a long time, every HDF5 process had to issue these reads itself: there was no way for one process to examine the file and then tell the other processes about the file layout. When HDF5 programs ran on tens or hundreds of MPI processes, this read overhead was not so bad. As process counts got larger and larger, as on Blue Gene for example, these reads started taking up a huge amount of time.
The HDF Group has implemented collective metadata in HDF5-1.10.0. With collective metadata, only one process reads the metadata and then broadcasts it to the other processes. This optimization has worked quite well for Parallel-NetCDF, and we’re glad to see it in HDF5. Hopefully other I/O libraries will learn this lesson and adopt similar scalable approaches.
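
In 1.10.0 the collective behavior is opt-in, switched on through the file access property list. Here is a minimal sketch of what that might look like; the file name is just a placeholder and error checking is omitted for brevity:

#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* file access property list set up for MPI-IO */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* new in 1.10.0: one rank reads the metadata and broadcasts it
     * to the others; metadata writes take a collective path too */
    H5Pset_all_coll_metadata_ops(fapl, 1);
    H5Pset_coll_metadata_write(fapl, 1);

    hid_t file = H5Fopen("example.h5", H5F_ACC_RDONLY, fapl);
    /* ... open and read datasets as usual ... */
    H5Fclose(file);
    H5Pclose(fapl);

    MPI_Finalize();
    return 0;
}

Build it against a parallel HDF5 installation (the h5pcc wrapper should do the right thing) and run it under your usual MPI launcher.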
If you do any reading of HDF5 datasets in parallel, go upgrade to HDF5-1.10.0.
