Understanding I/O Performance at Leadership Scale
Getting the best performance out of ROMIO can take a good bit of work and a great deal of experience. Before one can properly tune the various performance optimizations (data sieving, two-phase collective buffering, I/O aggregator selection and placement, etc.), one needs to understand the entire storage stack. Such a study is time- and labor-intensive, but sometimes the resulting paper can get you an SC publication:
I/O Performance Challenges at Leadership Scale (a link to the PDF)
http://dl.acm.org/citation.cfm?id=1654100 (a link to the citation)
We spent a million CPU hours and benchmarked all the links between disks and compute nodes. While Intrepid will only be with us for a few years (UPDATE: Intrepid was decommissioned at the end of 2013), the approach in this paper should be applied to all new leadership-class machines.