Archive for the ‘Uncategorized’ Category


bglockless

August 5th, 2013

Update: in MPICH-3.1.1 we finally scrapped bglockless (see this writeup on 3.1.1 and Blue Gene enhancements), but it is still part of the system software on any BG /L, BG /P, or BG /Q machine. The following writeup is perhaps of historical interest, but it will be a while (maybe never) before MPICH-3.1.1 is the default MPI on Blue Gene /Q.

The IBM BGP MPI-IO implementation is designed for the “lowest common denominator”: NFS. So they’re performing some very conservative locking in their ADIO file system driver to try to get correct MPI-IO semantics out of what might be an NFS volume underneath. It’s possible, though, to select an alternate driver that gives better performance in most cases, and terrible, terrible performance in one specific case.
The MPI routine MPI_File_open takes a string “filename” argument. Normally, ROMIO stats the file system to figure out what kind of file system that file lives on, and then selects a “file system driver” (one of the ADIO modules) that may contain file-system-specific optimizations.
If you prefix the file name with a driver name, such as “ufs:” for traditional Unix files, “pvfs2:”, or even “gridftp:”, that prefix overrides whatever detection ROMIO would otherwise run, and the corresponding ADIO driver is selected.
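The prefix rule can be modeled in a few lines of C. This is a minimal sketch, not ROMIO's source: select_driver() and its prefix table are hypothetical, the table holds the prefixes named above plus “bglockless” (assumed here to be the prefix that selects the driver described next), and an unrecognized prefix simply falls back to the detected driver, which real ROMIO may handle differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical prefix table: the prefixes named in the text, plus
 * "bglockless" (assumed to be the prefix for the lockless driver). */
static const char *known_prefixes[] = { "ufs", "pvfs2", "gridftp", "bglockless" };

/* Return the driver to use for `filename`. A recognized "prefix:" wins
 * over detection and *path_out is set just past the colon; otherwise the
 * caller's detected driver is returned and the name is left untouched. */
const char *select_driver(const char *filename, const char *detected,
                          const char **path_out)
{
    assert(filename != NULL && detected != NULL && path_out != NULL);
    *path_out = filename;
    const char *colon = strchr(filename, ':');
    if (colon == NULL)
        return detected;                 /* no prefix: use detection */
    size_t len = (size_t)(colon - filename);
    size_t n = sizeof known_prefixes / sizeof known_prefixes[0];
    for (size_t i = 0; i < n; i++) {
        if (strlen(known_prefixes[i]) == len &&
            strncmp(filename, known_prefixes[i], len) == 0) {
            *path_out = colon + 1;       /* strip "prefix:" from the name */
            return known_prefixes[i];
        }
    }
    return detected;                     /* unknown prefix: this sketch falls back */
}
```

In a real program you just pass the prefixed name to MPI_File_open, for example "ufs:/scratch/out.dat"; ROMIO handles the prefix itself.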
For Blue Gene /L (L, I tell you!), I wrote a ROMIO driver that made no explicit fcntl() lock calls. Those lock calls are normally not a big deal, but PVFS v2 did not support fcntl() locks. I called this driver ‘bglockless’.
Our friends at IBM, in a conservative effort to ensure correctness for all possible file systems, wrapped every I/O operation in an fcntl() lock. 90% of these locks were unnecessary and served only to slow down I/O.
So the half-day “driver with no locks” project I wrote for PVFS takes on a second life as the “make I/O go fast” driver.
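Here is roughly what “wrapping every I/O operation in an fcntl() lock” means, next to the lockless alternative. This is a minimal POSIX sketch of the pattern, not IBM's driver code; set_range_lock(), locked_pwrite() and lockless_pwrite() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Take (F_WRLCK) or release (F_UNLCK) a POSIX byte-range lock covering
 * [offset, offset + len). F_SETLKW waits until the lock can be granted. */
static int set_range_lock(int fd, short type, off_t offset, off_t len)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Conservative pattern: every write is bracketed by a lock and an unlock. */
ssize_t locked_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    assert(fd >= 0 && count > 0);   /* l_len == 0 would mean "to end of file" */
    if (set_range_lock(fd, F_WRLCK, offset, (off_t)count) == -1)
        return -1;
    ssize_t n = pwrite(fd, buf, count, offset);
    set_range_lock(fd, F_UNLCK, offset, (off_t)count);
    return n;
}

/* Lockless pattern, in the spirit of bglockless: just write. */
ssize_t lockless_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    assert(fd >= 0);
    return pwrite(fd, buf, count, offset);
}
```

Each locked write costs two extra fcntl() system calls, and on NFS each lock request goes to the server, which is where the overhead comes from.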
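Now here’s the catch, and why we can’t just make “bglockless” the default: certain I/O workloads, if locks are not available, must be carried out in an extremely inefficient manner. The main one is strided independent writes to a file. ROMIO normally handles those with data sieving: it reads a large contiguous region, modifies it in memory, and writes the whole region back. That read-modify-write is only safe while holding a lock on the region, since another process could be updating the gaps in between. Without locks, each contiguous piece has to be written with its own call. In addition, certain rarely used functionality, such as shared file pointers and ordered-mode operations, is not implemented when locks are disabled.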
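A rough cost model makes the catch concrete. Assume, for illustration only (this is not ROMIO's code), a strided write of count blocks of blocklen bytes placed stride bytes apart, and a data sieving buffer of bufsize bytes.

```c
#include <assert.h>

/* Without locks: one write call per contiguous block. */
unsigned long naive_write_calls(unsigned long count)
{
    return count;
}

/* With locks, data sieving covers the whole extent in buffer-sized chunks;
 * each chunk is one locked read-modify-write cycle (a read plus a write). */
unsigned long sieving_cycles(unsigned long count, unsigned long blocklen,
                             unsigned long stride, unsigned long bufsize)
{
    assert(count > 0 && bufsize > 0 && blocklen <= stride);
    unsigned long extent = (count - 1) * stride + blocklen;  /* first to last byte */
    return (extent + bufsize - 1) / bufsize;                 /* ceiling division */
}
```

For 1000 blocks of 8 bytes spaced 64 bytes apart, this model gives 1000 separate writes without locks, versus 4 locked read-modify-write cycles with a 16 KiB sieving buffer.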
For Blue Gene /P and /Q, one can set the environment variable BGLOCKLESSMPIO_F_TYPE to 0x47504653 (the GPFS file system magic number). ROMIO will then pretend GPFS is like PVFS and not issue any fcntl() lock commands.
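The magic number is not arbitrary: 0x47504653 is the four ASCII bytes “GPFS”. The sketch below decodes it and shows one way a driver might read the override. lockless_ftype_override() and magic_to_chars() are hypothetical helpers, and how ROMIO actually parses the variable is an assumption here (strtoul() with base 0 is used so that the 0x prefix is accepted).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

#define GPFS_MAGIC 0x47504653UL   /* the four ASCII bytes 'G' 'P' 'F' 'S' */

/* Hypothetical: read BGLOCKLESSMPIO_F_TYPE; returns 0 if it is unset. */
unsigned long lockless_ftype_override(void)
{
    const char *s = getenv("BGLOCKLESSMPIO_F_TYPE");
    return s ? strtoul(s, NULL, 0) : 0UL;   /* base 0 accepts "0x..." */
}

/* Spell out a 32-bit magic number, most significant byte first. */
void magic_to_chars(unsigned long magic, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((magic >> (8 * (3 - i))) & 0xffUL);
    out[4] = '\0';
}
```

In practice you only set BGLOCKLESSMPIO_F_TYPE=0x47504653 in the job's environment; how environment variables reach the compute nodes depends on your job launcher.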


system hints: hints via config file

September 26th, 2008

In ROMIO, setting hints looks like this:

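MPI_Info info;
MPI_Info_create(&info);                          /* the info object must be created first */
MPI_Info_set(info, "cb_buffer_size", "8388608"); /* 8 MiB collective buffer */
/* ...pass info to MPI_File_open(), then release it with MPI_Info_free(&info) */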

Setting these hints in the program makes sense in many cases, for example when you know something specific about the workload and wish to guide ROMIO’s optimizations a bit. But what if you want to explore the impact of hints on your program? There are a few options:

  • Modify your program to read an environment variable and use its value for your hint.
  • Take the hint value as a command-line parameter.
  • Repeatedly edit and recompile your program.

While these are good practice, each approach requires additional work. Each also assumes access to the source code, which is common but not guaranteed.
Additionally, we have noticed that very few users set hints on their own. They will gladly do so if we suggest it, but ideally every application on a system would run with the best hints for that system. Sometimes you can count on the system’s vendor to set good defaults, but in our experience vendor defaults are exceedingly conservative.
We added a new feature to ROMIO called “system hints”. You can now populate a config file with the same key-value pairs you would pass to MPI_Info_set, and ROMIO will add those hints to your program.
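The first option might look like the following. hint_from_env() and the variable name MY_CB_BUFFER_SIZE are hypothetical, and treating an empty value as unset is a design choice of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Return the environment variable's value if set and non-empty,
 * otherwise the compiled-in fallback. */
const char *hint_from_env(const char *env_name, const char *fallback)
{
    assert(env_name != NULL);
    const char *v = getenv(env_name);
    return (v != NULL && v[0] != '\0') ? v : fallback;
}
```

The result would then go to MPI_Info_set, e.g. MPI_Info_set(info, "cb_buffer_size", hint_from_env("MY_CB_BUFFER_SIZE", "8388608")).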
Here’s an example of what that file might look like:

$ cat romio_hints
romio_cb_read enable
romio_cb_write enable
cb_config_list *:2

By default, ROMIO looks for /etc/romio-hints, but you can set the environment variable ROMIO_HINTS to point to a different file (for example, one in your application’s working directory).
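Here is a sketch of reading such a file: locate it, split each “key value” line, and hand every pair to a callback that, in a real program, would call MPI_Info_set. The helper names are hypothetical and this is not ROMIO's parser; in particular, it ignores empty ROMIO_HINTS values, has no comment syntax, and does not handle lines longer than its 1024-byte buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Location rule from the text: ROMIO_HINTS if set, else /etc/romio-hints. */
const char *hints_file_path(void)
{
    const char *env = getenv("ROMIO_HINTS");
    return (env != NULL && env[0] != '\0') ? env : "/etc/romio-hints";
}

/* Split one "key value" line in place. Returns 1 on success,
 * 0 for a blank or malformed line. */
int parse_hint_line(char *line, char **key, char **value)
{
    assert(line != NULL && key != NULL && value != NULL);
    char *p = line;
    while (isspace((unsigned char)*p)) p++;          /* skip leading blanks */
    if (*p == '\0') return 0;                        /* blank line */
    *key = p;
    while (*p && !isspace((unsigned char)*p)) p++;   /* end of key */
    if (*p == '\0') return 0;                        /* key without value */
    *p++ = '\0';
    while (isspace((unsigned char)*p)) p++;          /* skip separator */
    if (*p == '\0') return 0;
    *value = p;
    char *end = p + strlen(p);                       /* trim trailing blanks */
    while (end > p && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return 1;
}

/* Example callback: print the pair (a real one would call MPI_Info_set). */
void print_hint(const char *key, const char *value, void *ctx)
{
    (void)ctx;
    printf("%s = %s\n", key, value);
}

/* Apply every hint in the file; returns the count, or -1 if unreadable. */
int load_hints(const char *path,
               void (*apply)(const char *, const char *, void *), void *ctx)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char line[1024];
    int n = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        char *key, *value;
        if (parse_hint_line(line, &key, &value)) {
            apply(key, value, ctx);
            n++;
        }
    }
    fclose(f);
    return n;
}
```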


citing ROMIO

February 1st, 2002

To cite ROMIO as a whole, use “A Case for Using MPI’s Derived Datatypes to Improve I/O”:

author = {Rajeev Thakur and William Gropp and Ewing Lusk},
title = {A Case for Using {MPI's} Derived Datatypes to Improve {I/O}},
booktitle = {Proceedings of SC98: High Performance Networking and Computing},
year = {1998},
month = {November},
publisher = {ACM Press},
earlier = {thakur:mpi-tr},
URL = {},
keywords = {MPI, parallel I/O, pario-bib}

To cite specific optimizations such as data sieving or collective buffering, use “Optimizing Noncontiguous Accesses in MPI-IO”:

author = {Rajeev Thakur and William Gropp and Ewing Lusk},
title = {Optimizing Noncontiguous Accesses in {MPI-IO}},
journal = {Parallel Computing},
year = {2002},
month = {January},
volume = {28},
number = {1},
pages = {83--105},
URL = {},
keywords = {parallel I/O, parallel I/O, MPI-IO, collective I/O, data sieving,