simfactory does not abort --testsuite submission process if rsync fails

Issue #1340 closed
Roland Haas created an issue

When setting up testsuite runs, simfactory uses rsync to copy the test suite data into the simulation folder. If this rsync fails (e.g. because a user specified incorrect rsyncopts in defs.local.ini), the submission process does not abort and instead submits an empty test-suite run.

rhaas@kraken-gsi2:~/ET_trunk> sim create-submit 2p6t --procs 12 --num-threads 6 --walltime 4:0:0 --testsuite --allocation TG-ASC120003
Skeleton Created
Job directory: "/lustre/scratch/rhaas/simulations/2p6t"
Option --testsuite given
Executable: "/nics/c/home/rhaas/ET_trunk/exe/cactus_sim"
Option list: "/lustre/scratch/rhaas/simulations/2p6t/SIMFACTORY/cfg/OptionList"
Submit script: "/lustre/scratch/rhaas/simulations/2p6t/SIMFACTORY/run/SubmitScript"
Run script: "/lustre/scratch/rhaas/simulations/2p6t/SIMFACTORY/run/RunScript"
Assigned restart id: 0
Copying testsuite data
rsync: --times=no: option does not take an argument
rsync error: syntax or usage error (code 1) at main.c(1435) [client=3.0.9]
Executing submit command: /opt/torque/2.5.7/bin/qsub /lustre/scratch/rhaas/simulations/2p6t/output-0000/SIMFACTORY/SubmitScript
Submit finished, job id is 3236567.nid00016
rhaas@kraken-gsi2:~/ET_trunk> qdel 3236567.nid00016

My rsyncopts were:

rsyncopts       = --times=no --checksum --include 'configs/*/ThornList' --exclude 'configs/*/*'

which are bad for two reasons: 1.) kraken's rsync does not know --times=no (likely wants --notimes or so); 2.) --exclude 'configs/*/*' excludes cctk_MPI.h, which is used by the test suite infrastructure to detect the presence of MPI.

Note that some of these options are obviously obsolete now that simfactory defaults to --times=no --checksum anyway.

Still, I think simfactory should always check the exit status of any command it calls.
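
A minimal sketch of what such a check could look like, written against Python's standard subprocess module rather than simfactory's own helpers. The names CommandFailed, run_checked and copy_testsuite_and_submit are hypothetical and only illustrate the idea: a non-zero exit status from the copy step raises an exception, so the submit step is never reached.

```python
import subprocess
import sys


class CommandFailed(RuntimeError):
    """Raised when an external command exits with a non-zero status."""


def run_checked(argv, description):
    """Run argv and raise CommandFailed unless it exits with status 0."""
    status = subprocess.call(argv)
    if status != 0:
        raise CommandFailed(
            "%s failed with exit status %d" % (description, status))


def copy_testsuite_and_submit(copy_argv, submit_argv):
    """Copy the test data first; submit only if the copy succeeded.

    Both argument lists are placeholders for whatever commands the
    caller builds (e.g. an rsync invocation and a qsub invocation).
    """
    run_checked(copy_argv, "Copying testsuite data")
    # Not reached if the copy raised CommandFailed.
    run_checked(submit_argv, "Submit command")
```

With a check like this, a rejected rsync option such as the one in the log above would stop the process before the submit command is executed, instead of queueing an empty run.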

Comments (5)

  1. Ian Hinder

    I just wanted to clarify (as I was confused initially) that Roland is saying that these rsyncopts are bad, not that the simfactory rsync options for tests are bad. SimFactory explicitly includes the required cctki_MPI.h file. The bug is that simfactory uses simlib.ExecuteCommand without checking its return status. For some reason, return codes are being used to indicate errors, even though the language has exceptions for this purpose.

    The attached (untested) patch should solve the problem, assuming I didn't make any typos! Does it work?

  2. Ian Hinder

    Why are the "rsyncopts" being used for copying test data? I wouldn't have thought they were relevant there. rsyncopts is used for "sim sync", which is a very different type of operation.

  3. Roland Haas reporter
    • marked as

    Bumping to major, as this prevents me from running the tests on bluewaters. rsync 3.0.9 does not seem to like --times=no and aborts with:

    rsync: --times=no: option does not take an argument
    Command returned exit status 1
    Error: Rsync of test data for simulation failed
    Aborting Simfactory.
    

    So this ticket is actually fixed now, but I still cannot run the tests.
