Collective I/O to overlapping regions
It is an error for multiple MPI processes to write to the same or overlapping regions of a file. ROMIO will let you get away with it, but if your processes are writing different data, I can't tell you what will end up in the file.
What about reads, though?
ROMIO's two-phase collective buffering algorithm handles overlapping read requests the way you would hope: I/O aggregators read from the file and send the data to the right MPI process. N processes reading a config file, for example, will result in one read followed by a network exchange.
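For concreteness, here is a minimal sketch of that pattern: every rank issues the same collective read with MPI_File_read_at_all. The file name and region size are made up for illustration, and whether ROMIO actually keeps collective buffering for a fully overlapping pattern like this is exactly what the rest of this post is about.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    char *buf;
    const int nbytes = 1 << 20;     /* hypothetical size of the shared region */

    MPI_Init(&argc, &argv);
    buf = malloc(nbytes);

    MPI_File_open(MPI_COMM_WORLD, "config.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* every rank requests the same bytes [0, nbytes): with collective
     * buffering, the aggregators read the region from the file system
     * once and ship it to the other ranks over the network */
    MPI_File_read_at_all(fh, 0, buf, nbytes, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```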
As an aside, ROMIO's two-phase algorithm is general, so it is not as good as a "read and broadcast". If you, the application/library writer, know that every process is going to read the file, this is one spot (maybe the only spot) where I'd encourage you to read the file independently from one process and broadcast it to everyone else.
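Here is a rough sketch of that read-and-broadcast pattern, assuming a plain POSIX read on rank 0; the path and buffer size are placeholders and error handling is omitted.

```c
#include <mpi.h>
#include <stdio.h>

void read_and_bcast(const char *path, char *buf, int nbytes, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    if (rank == 0) {
        /* one process does an ordinary (independent) read ... */
        FILE *f = fopen(path, "rb");
        size_t got = fread(buf, 1, (size_t)nbytes, f);
        (void)got;                  /* error handling omitted for brevity */
        fclose(f);
    }
    /* ... and everyone else gets the data over the network */
    MPI_Bcast(buf, nbytes, MPI_BYTE, 0, comm);
}
```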
I bet you are excited to go try this out on some code. Maybe you will have every process read the same elements out of an array. Did you get the performance you expected? Probably not. ROMIO tries to be clever. If the requests from processes are interleaved, ROMIO will carry on with two-phase collective I/O. If the requests are not interleaved, ROMIO will fall back to independent I/O on the assumption that the expense of the two-phase algorithm will not be worth it.
You can see the check here: https://github.com/pmodels/mpich/blob/master/src/mpi/romio/adio/common/ad_read_coll.c#L149
In 2018, two-phase is almost always a good idea — even if the requests are well-formed, collective buffering will map request sizes to underlying file system quirks, reduce the overall client count thanks to I/O aggregators, and probably place those aggregators strategically.
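If you are curious what ROMIO settled on for a given file, you can dump the hints it is actually using with MPI_File_get_info; the aggregator count shows up under the reserved "cb_nodes" key, and the values you see will vary by platform.

```c
#include <mpi.h>
#include <stdio.h>

void dump_hints(MPI_File fh)
{
    MPI_Info info;
    int nkeys, flag;
    char key[MPI_MAX_INFO_KEY + 1], value[MPI_MAX_INFO_VAL + 1];

    MPI_File_get_info(fh, &info);       /* hints ROMIO is actually using */
    MPI_Info_get_nkeys(info, &nkeys);
    for (int i = 0; i < nkeys; i++) {
        MPI_Info_get_nthkey(info, i, key);
        MPI_Info_get(info, key, MPI_MAX_INFO_VAL, value, &flag);
        printf("%s = %s\n", key, value);
    }
    MPI_Info_free(&info);
}
```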
You can force ROMIO to always use collective buffering by setting the hint "romio_cb_read" to "enable". On Blue Gene systems, that is the default setting already. On other platforms, the default is "automatic", which triggers the check mentioned above.
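Setting the hint is a one-liner at open time. A minimal sketch (the file name below is a placeholder):

```c
#include <mpi.h>

MPI_File open_with_cb_read(const char *path, MPI_Comm comm)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Info_create(&info);
    /* skip the interleave check and always run two-phase on reads */
    MPI_Info_set(info, "romio_cb_read", "enable");

    MPI_File_open(comm, path, MPI_MODE_RDONLY, info, &fh);
    MPI_Info_free(&info);
    return fh;
}
```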