Simfactory (python version) not able to submit jobs on bluedrop.

Issue #295 closed
Peter Diener created an issue

With the most recent version of the Python version of simfactory, job submission fails on bluedrop. The job's standard error file contains:

Traceback (most recent call last):
  File "/home/diener/Cactus/simfactory/bin/../lib/sim.py", line 141, in <module>
    main()
  File "/home/diener/Cactus/simfactory/bin/../lib/sim.py", line 138, in main
    CommandDispatch()
  File "/home/diener/Cactus/simfactory/bin/../lib/sim.py", line 105, in CommandDispatch
    module.main()
  File "/data/diener/Experimental/Cactus/simfactory/lib/sim-manage.py", line 336, in main
    CommandDispatch()
  File "/data/diener/Experimental/Cactus/simfactory/lib/sim-manage.py", line 315, in CommandDispatch
    exec("command_%s()" % command)
  File "<string>", line 1, in <module>
  File "/data/diener/Experimental/Cactus/simfactory/lib/sim-manage.py", line 166, in command_run
    restart.submitRun(simulationName, restart_id)
  File "/data/diener/Experimental/Cactus/simfactory/lib/simrestart.py", line 824, in submitRun
    self.run()
  File "/data/diener/Experimental/Cactus/simfactory/lib/simrestart.py", line 869, in run
    assert(self.IsActive())
AssertionError


Comments (5)

  1. anonymous
    • changed status to resolved

    Between job submission by simfactory and the first query of the job status (less than 1 second later), llq was not yet reporting the job as existing, so simfactory was cleaning the job up before run() executed. I introduced a 30-second delay (rate-limiting cleanup) between submission of the simulation and the first attempt at cleanup. This seems to have solved the issue.
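The race described above can be illustrated with a small, self-contained Python sketch. The ticket does not show simfactory's actual code, so `FakeQueue`, `naive_should_clean_up` and `guarded_should_clean_up` are hypothetical names: `FakeQueue` stands in for a batch queue queried with llq that only lists a job some seconds after submission, and the 30-second grace period mirrors the delay mentioned in the comment.

```python
from dataclasses import dataclass, field
from typing import Dict

GRACE_SECONDS = 30.0  # the delay mentioned in the comment above


@dataclass
class FakeQueue:
    """Hypothetical stand-in for a batch queue queried with llq: a submitted
    job only becomes visible `lag` seconds after submission."""
    lag: float
    submitted: Dict[str, float] = field(default_factory=dict)  # job id -> submit time

    def job_exists(self, job_id: str, now: float) -> bool:
        t = self.submitted.get(job_id)
        return t is not None and now - t >= self.lag


def naive_should_clean_up(queue: FakeQueue, job_id: str, now: float) -> bool:
    # Treats "not listed by the queue" as "finished": wrong right after submission.
    return not queue.job_exists(job_id, now)


def guarded_should_clean_up(queue: FakeQueue, job_id: str, submitted_at: float,
                            now: float, grace: float = GRACE_SECONDS) -> bool:
    # Refuse to clean up until the grace period after submission has passed.
    if now - submitted_at < grace:
        return False
    return not queue.job_exists(job_id, now)
```

With a 2-second reporting lag, the naive check decides to clean up a job queried 0.5 seconds after submission, which corresponds to the premature cleanup described above; the guarded check waits out the grace period before trusting the queue's answer.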

  2. Erik Schnetter

    That seems dangerous. If the simulation is checked for some other reason, then the simulation will still be cleaned up.

    Could you add a check into the cleanup routine that tests the age of the simulation, and does not mark it as "finished" if it is less than a minute old? This check would prevent cleanup more reliably, e.g. also if one runs simfactory multiple times, or if the rate limiting mechanism changes at some point.

    Can you also add a comment to the code explaining why this is necessary?

  3. anonymous

    I used the rate-limit mechanism I built into simfactory a few revisions ago to accomplish this. When the rate-limit timestamp exists, simfactory will under no circumstances attempt to clean up the simulation until the timestamp reaches a certain age. The only change I had to make was in makeActive(): when it is called for a restart (i.e. the restart gets flagged as active), I now create the timestamp in the simulation. This ensures that any change to the active state prevents cleanup for at least 30 seconds.
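A minimal sketch of a timestamp-based rate limit of this kind, assuming the timestamp is stored as the modification time of a file in the simulation directory. The file names and the functions `touch_rate_limit`, `cleanup_allowed` and `make_active` are hypothetical; they only mirror the roles described in the comment (`makeActive()` refreshing the timestamp, cleanup refusing to run until the timestamp is old enough).

```python
import os
import tempfile
import time
from typing import Optional

# Hypothetical file names; simfactory's real layout is not shown in the ticket.
RATE_LIMIT_FILE = "cleanup-rate-limit"
ACTIVE_FLAG_FILE = "active"


def touch_rate_limit(sim_dir: str) -> None:
    """Create the rate-limit file, or refresh its mtime to the current time."""
    path = os.path.join(sim_dir, RATE_LIMIT_FILE)
    with open(path, "a"):
        pass
    os.utime(path, None)  # set access and modification times to now


def cleanup_allowed(sim_dir: str, min_age: float = 30.0,
                    now: Optional[float] = None) -> bool:
    """Return False while the rate-limit timestamp is younger than min_age seconds."""
    path = os.path.join(sim_dir, RATE_LIMIT_FILE)
    if not os.path.exists(path):
        return True  # no timestamp: nothing holds cleanup back
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) >= min_age


def make_active(sim_dir: str) -> None:
    """Hypothetical analogue of makeActive(): flag the restart as active and
    refresh the timestamp so cleanup is suppressed for at least min_age seconds."""
    with open(os.path.join(sim_dir, ACTIVE_FLAG_FILE), "a"):
        pass
    touch_rate_limit(sim_dir)


if __name__ == "__main__":
    sim = tempfile.mkdtemp()
    make_active(sim)
    print(cleanup_allowed(sim))                        # False: timestamp is fresh
    print(cleanup_allowed(sim, now=time.time() + 31))  # True: 31 s have passed
```

Using the file's modification time keeps the state on disk, so a second simfactory invocation sees the same timestamp as the one that set it.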

    I understand what you're saying, though, and I'll separate the simulation submission process from the cleanup rate-limiting mechanism. I'll commit that change shortly.

  4. anonymous

    OK, a separate simulation timestamp is now created upon submission. CleanupRestarts also looks for this timestamp and, if it is less than 60 seconds old, does not attempt to clean up the simulation.
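The final arrangement, a submission timestamp kept separately from the rate-limit timestamp and checked inside the cleanup pass, might look roughly like the following. This is a hedged sketch rather than simfactory's code: the file name `submitted-at`, `record_submission`, `submission_age`, `cleanup_restarts` and the `job_in_queue` callback are all hypothetical, and only the 60-second threshold comes from the comment.

```python
import os
import tempfile
import time
from typing import Callable, Iterable, List, Optional

# Hypothetical file name, deliberately separate from any rate-limit timestamp.
SUBMIT_STAMP_FILE = "submitted-at"
MIN_AGE_SECONDS = 60.0  # threshold taken from the comment above


def record_submission(sim_dir: str, now: Optional[float] = None) -> None:
    """Store the submission time (seconds since the epoch) in the simulation."""
    stamp = time.time() if now is None else now
    with open(os.path.join(sim_dir, SUBMIT_STAMP_FILE), "w") as f:
        f.write(repr(stamp))


def submission_age(sim_dir: str, now: float) -> Optional[float]:
    """Seconds since submission, or None if no submission stamp exists."""
    try:
        with open(os.path.join(sim_dir, SUBMIT_STAMP_FILE)) as f:
            return now - float(f.read())
    except FileNotFoundError:
        return None


def cleanup_restarts(sim_dirs: Iterable[str],
                     job_in_queue: Callable[[str], bool],
                     now: Optional[float] = None) -> List[str]:
    """Return the simulations this pass would mark as finished."""
    current = time.time() if now is None else now
    finished = []
    for sim_dir in sim_dirs:
        age = submission_age(sim_dir, current)
        if age is not None and age < MIN_AGE_SECONDS:
            continue  # too young: the scheduler may not list the job yet
        if not job_in_queue(sim_dir):
            finished.append(sim_dir)
    return finished


if __name__ == "__main__":
    sim = tempfile.mkdtemp()
    record_submission(sim)
    # Just submitted and not yet visible in the queue, but still not cleaned up.
    print(cleanup_restarts([sim], job_in_queue=lambda d: False))  # prints []
```

Because the age check lives in the cleanup routine itself, it protects a freshly submitted simulation however often cleanup is invoked (for example when simfactory is run several times in a row), independently of any change to the rate-limiting mechanism.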
