Deferred Open
When I came to Argonne in 2002, my second project was to implement “deferred open”, where ROMIO would skip opening the file on some processes if certain hints were given. We never got around to writing a paper about this optimization, though. There’s a brief mention in the ROMIO users guide, but it wouldn’t hurt to have a bit more documentation about this feature.
First, some background. ROMIO has an optimization for collective I/O called “two-phase collective buffering”. ROMIO selects a subset of processes as “I/O aggregators”. When writing, the aggregators collect data from all the other processes and are the ones that actually write it to the file. When reading, the aggregators read the data in some file-system-friendly way, then scatter it out to the other MPI processes. Observe that in two-phase, the non-aggregator processes never touch the file. We use this observation to implement a deferred open strategy for non-aggregators.
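To make the “non-aggregators never touch the file” observation concrete, here is a toy, single-process Python model of the write side of two-phase. It is a sketch under simplifying assumptions, not ROMIO’s implementation. The function name `two_phase_write`, the even spacing of aggregators across ranks, and the equal-sized contiguous file domains are all hypothetical. It also ignores real MPI communication and buffer-size limits.

```python
def two_phase_write(requests, nprocs, naggr, file_size):
    """Toy model of two-phase collective buffering on the write side.

    requests maps rank -> list of (offset, data) pieces that rank wants written.
    Returns (file_contents, set_of_ranks_that_touched_the_file).
    """
    # Hypothetical aggregator choice: spread naggr aggregators evenly over ranks.
    aggregators = [i * nprocs // naggr for i in range(naggr)]
    # Split the file into contiguous "file domains", one per aggregator.
    domain_size = -(-file_size // naggr)  # ceiling division

    def owner(offset):
        return aggregators[offset // domain_size]

    # Phase 1 (exchange): each rank ships every byte to the aggregator whose
    # file domain contains it. Modeled as per-aggregator {offset: byte} buffers.
    buffers = {a: {} for a in aggregators}
    for pieces in requests.values():
        for offset, data in pieces:
            for i, byte in enumerate(data):
                buffers[owner(offset + i)][offset + i] = byte

    # Phase 2 (I/O): only aggregators that received data write to the "file".
    contents = bytearray(file_size)
    touched = set()
    for agg, buf in buffers.items():
        for offset, byte in buf.items():
            contents[offset] = byte
            touched.add(agg)
    return bytes(contents), touched
```

With four ranks each writing four contiguous bytes and two aggregators, the assembled file is `b"AAAABBBBCCCCDDDD"`. Only ranks 0 and 2 ever touch it, and ranks 1 and 3 just hand their data off.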
To enable deferred open, two hint conditions must be true:
- romio_cb_write and romio_cb_read must not be “disable”. That’s the default on every file system everywhere, though, so it’s rare to find this condition not met.
- romio_no_indep_rw must be “true”. With this hint, the user has told ROMIO “I will not do any independent I/O”. ROMIO will then attempt to avoid opening the file on any non-aggregator processes.
- Optional: the cb_config_list and cb_nodes hints can be given to further control which nodes act as aggregators.
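The hint conditions above can be summarized as a small predicate. This is a hedged sketch. A plain Python dict stands in for the MPI_Info object; in a real program these keys are set with MPI_Info_set on the info object passed to MPI_File_open. The function name `deferred_open_possible` is hypothetical. The defaults assumed for unset keys (“automatic” for the collective-buffering hints, “false” for romio_no_indep_rw) are also assumptions, not values quoted from ROMIO’s source.

```python
def deferred_open_possible(hints):
    """Return True if the hints allow ROMIO to try a deferred open.

    `hints` is a plain dict standing in for an MPI_Info object (hypothetical).
    """
    # Assumed defaults for keys the user did not set.
    cb_read = hints.get("romio_cb_read", "automatic")
    cb_write = hints.get("romio_cb_write", "automatic")
    no_indep = hints.get("romio_no_indep_rw", "false")
    # Condition 1: collective buffering is not disabled for reads or writes.
    # Condition 2: the user promised not to do any independent I/O.
    # cb_config_list / cb_nodes only shape which processes aggregate,
    # so they do not affect eligibility in this model.
    return cb_read != "disable" and cb_write != "disable" and no_indep == "true"
```

Setting only `romio_no_indep_rw` to `"true"` is enough in this model. Adding `romio_cb_write: "disable"` turns the optimization off again.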
The “deferred” part comes from the fact that MPI Info tunables are hints, not contracts. The user might lie to ROMIO, setting “romio_no_indep_rw” to “true” and then going right ahead with a bunch of independent I/O operations. In that case, ROMIO will open the file just before the independent I/O operation happens; we say the open has been deferred.
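That fallback can be modeled as a lazily opened file handle. This is a single-process sketch, not ROMIO’s code. The `DeferredFile` class, its `is_aggregator` and `no_indep_rw` flags, and the `write_at` method are hypothetical names, and a local file opened with `os.open` stands in for the MPI-IO file. Aggregators open immediately, and so does every process if the hint was not given. A non-aggregator opens only when an independent operation actually arrives.

```python
import os


class DeferredFile:
    """Toy per-process file handle that models deferred open (hypothetical)."""

    def __init__(self, path, is_aggregator, no_indep_rw):
        self.path = path
        self.fd = None
        # Aggregators always need the file. Without the "no independent I/O"
        # promise, every process must open it up front as well.
        if is_aggregator or not no_indep_rw:
            self._open()

    @property
    def is_open(self):
        return self.fd is not None

    def _open(self):
        # Open (creating if needed) only once; later calls are no-ops.
        if self.fd is None:
            self.fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)

    def write_at(self, offset, data):
        """Independent write: the hint turned out to be a lie, so open now."""
        self._open()
        os.lseek(self.fd, offset, os.SEEK_SET)
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(self.fd, view)
            view = view[written:]

    def close(self):
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
```

The model leaves out collective operations. On a non-aggregator those only hand data to an aggregator, so they never need the handle, and that is exactly why the open can be skipped.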