Lustre tuning
Getting the best performance out of Lustre can be a bit of a challenge. Here are some things to check if you tried out ROMIO on a Lustre file system and did not see the performance you were expecting.
The zeroth step in tuning Lustre is “consult your site-specific documentation”. Your admins will have information about how many Lustre servers are deployed, how much performance you should expect, and any site-specific utilities they have provided to make your life easier. Here are some of the more popular sites:
First, are you using the Lustre file system driver? Nowadays, you would have to go out of your way not to. One can read the romio_filesystem_type hint to confirm.
Next, what is the stripe count? Lustre typically defaults to a stripe count of 1, which means all reads and writes will go to just one server (OST in Lustre parlance). Most systems have tens of OSTs, so the default stripe size is really going to kill performance!
The ‘lfs’ utility can be used to get and set lustre file information.
$ lfs getstripe /path/to/directory lmm_stripe_count: 1 lmm_stripe_size: 1048576 lmm_pattern: 1 lmm_layout_gen: 0 lmm_stripe_offset: 2 obdidx objid objid group 2 14114525 0xd75edd 0x280000400
This directory has a stripe_count of 1. That means any files created in this directory will also have a stripe count of one. This directory would be good for hosting small config files, but large HPC input decks or checkpoint files will not see good performance.
When reading a file, there’s no way to adjust the stripe count. When the file is created, the striping is locked in place. You would have to create a directory with a large stripe count and copy the files into this new directory.
$ lfs setstripe -c 60 /my/new/directory
Now any new file created in “my/new/directory” will have stripe count of 60
If you care creating a new file, you can set the stripe size in ROMIO with the “striping_factor” hint:
MPI_Info_create(&info); MPI_Info_set(info, "striping_factor", "32"); MPI_File_open(MPI_COMM_WORLD, "foo.chkpt", MPI_MODE_CREATE, info, &fh);