IO corruption on SDSC oasis file systems

Issue #2073 closed
Roland Haas created an issue

I am experiencing file corruption in ASCII output produced by the Cactus code on Comet's (and, as far as I remember, Gordon's) scratch file systems. This manifests as lines of output being mashed together in the output file.

I have added strace calls to my job script to capture all arguments to the OS's write() function and re-created the write() calls based on this. Those write calls, when replayed on a login node, do produce a correct (no mashed lines) file.

All output to the file in question was from rank 0 only even though the code used MPI and ran on two MPI ranks.

The same code and number of MPI ranks produces a correct output file when run on the $HOME file system.

Thus it seems to me that there may be an issue with the file system. I can try to reduce the test case to a more minimal example (right now it is a full simulation, although it runs for less than one minute).

You can find the job script (for account, SLURM options, etc.) here:

/oasis/scratch/comet/rhaas/temp_project/simulations/OSTREAM_2_12/output-0000/SIMFACTORY/SubmitScript

the script that launches the MPI executable here:

/oasis/scratch/comet/rhaas/temp_project/simulations/OSTREAM_2_12/output-0000/SIMFACTORY/RunScript

the strace output here:

/home/rhaas/strace/strace.1882[67].log

and the awk script to recreate the write calls is:

gawk -vFS='"' '/write.*\/grid-coordinates.xy.asc/{print "printf \""$2"\""}' ~/strace.18826.log >recreate.sh
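For readers who prefer not to go through a generated shell script, the same replay idea can be sketched in Python. This is only an illustration, not the procedure used above: it assumes the strace log annotates file descriptors with their paths (as the gawk pattern suggests, e.g. via strace -y) and that strings were captured untruncated (a large enough strace -s value). The names `WRITE_RE`, `decode_strace_string` and `replay_writes` are hypothetical.

```python
import re

# Assumed strace line shape (fd annotated with its path, as with strace -y):
#   18826 write(5</path/to/grid-coordinates.xy.asc>, "1 4 3\n", 6) = 6
WRITE_RE = re.compile(
    r'write\(\d+<[^>]*/grid-coordinates\.xy\.asc>, "((?:[^"\\]|\\.)*)"'
)


def decode_strace_string(escaped):
    """Turn strace's C-style escapes (\\n, \\t, \\", octal) back into bytes."""
    return escaped.encode("latin-1").decode("unicode_escape").encode("latin-1")


def replay_writes(log_lines):
    """Concatenate the payloads of all write() calls to the target file."""
    chunks = []
    for line in log_lines:
        match = WRITE_RE.search(line)
        if match:
            chunks.append(decode_strace_string(match.group(1)))
    return b"".join(chunks)
```

Writing the returned bytes to a file on a login node and comparing it with the file on /oasis/scratch would then show whether the data handed to write() was already mashed. Unlike splitting on the quote character, the regular expression also tolerates escaped quotes inside the payload, but it does not detect strings that strace truncated.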

A corrupted line is, for example, line 161 of

/oasis/scratch/comet/rhaas/temp_project/simulations/OSTREAM_2_12/output-0000/TEST/sim/CarpetIOASCII/newsep/grid-coordinates.xy.asc

which reads

1 4 3 4 1 0.1666666666660.505076272276105285714 etc

but should read

1 4 3 4 1 0.166666666666667 -0.0714285714285714
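To locate such lines without reading the file by eye, a rough check along the following lines may help. It is a heuristic sketch only: it flags any data line containing a token that does not parse as a number, which catches the example above (two decimal points in one token) but would miss a mash that happens to produce a valid number. The function name `find_mashed_lines` and the assumption that header and comment lines start with `#` are mine.

```python
def find_mashed_lines(lines):
    """Return 1-based numbers of lines holding a token that is not a number."""
    suspects = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no data.
        if not stripped or stripped.startswith("#"):
            continue
        for token in stripped.split():
            try:
                float(token)
            except ValueError:
                suspects.append(lineno)
                break
    return suspects
```

Comparing the number of columns per line against the expected count would be a natural second check, since mashed lines usually also change the column count.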

I can avoid the file corruption by flushing the output file after each line.
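The workaround amounts to handing every line to the operating system as soon as it is written instead of letting the I/O library collect lines in a buffer. The actual change was made in the Cactus output code; the following Python analogue is only a sketch of the pattern, and `write_lines_flushed` is a hypothetical name.

```python
def write_lines_flushed(path, lines):
    """Write each line and flush right away, so it reaches the OS immediately."""
    with open(path, "w") as out:
        for line in lines:
            out.write(line + "\n")
            # Flush the user-space buffer after every line instead of
            # waiting for the buffer to fill up.
            out.flush()
```

This trades throughput for many small writes, which is usually acceptable for small ASCII diagnostics but would be costly for large output.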

I am wondering whether anything is known about this, or whether there is a workaround that does not boil down to first writing all data to a file system local to the compute node and copying it to /oasis/scratch after the job has finished. (How much local space would be available? I would also have to do this for, e.g., checkpoint files and 3D HDF5 output.)

Keyword: Comet
Keyword: Gordon
Keyword: SDSC

Comments (5)

  1. Roland Haas reporter
    • removed comment

    SDSC's support team said (02/27/2017 22:30 in XSEDE ticket 63522):

    For the ASCII part I already have a small reproducer from one of the code developers and will work with the Lustre folks to narrow down the cause. It's not something obvious (like a broken disk or some network problem) and also doesn't happen with other kinds of IO.

  2. Roland Haas reporter
    • removed comment

    Update from SDSC: the issue has been identified and a new version of Lustre fixes the problem. However, deploying the new Lustre client caused problems because it is incompatible with the Lustre server version in use. So this is being worked on but is not yet fixed.

  3. Roland Haas reporter
    • removed comment

    The fix does indeed resolve the issues for Cactus. I will close this ticket once we have confirmation that the fix has been applied cluster-wide.

  4. Roland Haas reporter
    • changed status to resolved
    • removed comment

    According to the SDSC support team (ticket #63522, message from Wed, 13 Dec 2017 20:39:59 -0600), the change has now been pushed to all nodes. Closing this ticket.
