Skip to content

ROMIO

  • ROMIO: A High-Performance, Portable MPI-IO Implementation

bglockless

August 5, 2013 by Latham, Robert J.

Update: in MPICH-3.1.1 we finally scrapped bglockless, (see this writeup on 3.1.1 and Blue Gene enhancements)  but it’s still part of the system software on any BG /L BG /P or BG /Q machines.  The following writeup is perhaps of historical interest, but it will be a while (maybe never) before mpich-3.1.1 is the default MPI on Bue Gene /Q.

The IBM BGP MPI-IO implementation is designed to the “lowest common denominator”: NFS. So they’re performing some very conservative locking in their ADIO file system driver in order to try to get correct MPI-IO semantics out of what might be an NFS volume underneath.  It’s possible, though, to select an alternate driver that gives better performance in most cases — and terrible, terrible performance in one specific case.
The MPI routine MPI_File_open takes a string “filename” argument. Normally, ROMIO does a stat of the file system to figure out what kind of file system that file lives on, and then selects a “file system driver” (one of the ADIO modules) that might contain file system specific optimizations.
If you provide a prefix, like “ufs:” for traditional unix files, or “pvfs2:” or even “gridftp:”, then that prefix overrides whatever magic detection routines ROMIO would run, and the corresponding “ADIO driver” will be selected.
For Blue Gene /L (L, I tell you!) I wrote a ROMIO driver that made no explicit fcntl() lock calls.  Those lock calls are normally not a big deal, but PVFS v2 did not support fcntl() locks.   I called this driver ‘bglockless’.
our friends at IBM, in a conservative effort to ensure correctness for all possible file systems, wrapped every I/O operation in an fcntl() lock.  90% of these locks were unnecessary and served only to slow down I/O.
so, the half-day “driver with no locks” project I wrote for PVFS takes on a second life as the “make I/O go fast” driver.
Now here’s the catch, and why we can’t just make “bglockless” the default: certain I/O workloads, if locks are not available, must be carried out in a extremely inefficient manner.  Specifically, strided independent writes to  a file.   Certain rarely used functionality, like shared file pointers and ordered mode operations, are not implemented when locks are disabled.
For Blue Gene /P and /Q, one can set the environment variable BGLOCKLESSMPIO_F_TYPE to 0x47504653 (the GPFS file system magic number). ROMIO will then pretend GPFS is like PVFS and not issue any fcntl() lock commands.

Post navigation

Previous Post:

Large transfers in ROMIO

Next Post:

New ROMIO optimizations for Blue Gene /Q

One comment

  1. Pingback: ROMIO » New ROMIO optimizations for Blue Gene /Q

Comments are closed.

Recent Posts

  • ROMIO and MPICH-4.3.0
  • ROMIO and “large counts”
  • Hintdump: a small utility for poking at MPI implementations.
  • Quobyte file system
  • ROMIO at SC 2019

Recent Comments

  • ROMIO » New ROMIO optimizations for Blue Gene /Q on bglockless
  • bglockless | ROMIO on New ROMIO optimizations for Blue Gene /Q

Archives

  • February 2025
  • May 2024
  • April 2023
  • October 2020
  • November 2019
  • February 2019
  • December 2018
  • November 2018
  • September 2018
  • November 2017
  • September 2017
  • March 2017
  • August 2016
  • June 2016
  • January 2016
  • December 2015
  • November 2015
  • June 2015
  • May 2015
  • February 2015
  • October 2014
  • August 2014
  • July 2014
  • June 2014
  • August 2013
  • July 2013
  • February 2012
  • September 2010
  • November 2009
  • November 2008
  • September 2008
  • February 2006
  • August 2003
  • February 2002

Categories

  • development
  • features
  • gpfs
  • intel-mpi
  • lustre
  • presentations
  • publications
  • releases
  • tuning
  • Uncategorized

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org
© 2025 ROMIO | WordPress Theme by Superbthemes