Newest revision of simfactory 2.0 submits three jobs instead of one

Issue #420 closed
anonymous created an issue

I am submitting a simulation using simfactory with the command:

[snip]
simfactory/bin/sim submit test-whisky-openmp --configuration test-whisky-openmp --parfile=whisky-openmp-test-ali.par --verbose --walltime=48:00:00 --procs=16 --ppn=4 --num-threads=4 --machine=damiana --queue=intel.q
[snip]

sim then prints the following messages:

[snip]
Info: Simfactory command: simfactory/bin/../lib/sim.py "submit" "test-whisky-openmp" "--configuration" "test-whisky-openmp" "--parfile=whisky-openmp-test-ali.par" "--verbose" "--walltime=48:00:00" "--procs=16" "--ppn=4" "--num-threads=4" "--machine=damiana" "--queue=intel.q"
Info: Version 1331M The Simulation Factory: Manage Cactus simulations
Info: defs: /home/alibeck/programme/Cactus-Luca/Cactus/simfactory/etc/defs.ini
Info: defs.local: /home/alibeck/programme/Cactus-Luca/Cactus/simfactory/etc/defs.local.ini 
Info: Cactus Directory: /home/alibeck/programme/Cactus-Luca/Cactus 
Info: simenv.COMMAND: submit 
Info: Executing command: submit 
Info: Assigned restart_id of: 0002 
Info: Found the following restart_ids: [0, 1] 
Info: Maximum restart id determined to be: 0001 Assigned restart id: 2 
Info: Simulation is inactive: submitting 
Info: Job allocation information: 
Info: System: nodes=170 cores/node=4 threads/process=4 
Info: Requested: nodes=4 cores=16 cores/node=4 
Info: Run: processes=4 threads=16 threads/process=4 
Info: Distribution: processes/node=1 threads/node=4 
Info: Ratio: threads/core=1.000 cores/thread=1.000 
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY 
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY/SubmitScript
Submit finished, job id is 259460 
Info: Restart 2 is active 
Info: Assigned restart_id of: 0003 
Info: Found the following restart_ids: [0, 1, 2, 2] 
Info: Maximum restart id determined to be: 0002 Assigned restart id: 3 
Info: Simulation is active: presubmitting 
Info: Job allocation information: 
Info: System: nodes=170 cores/node=4 threads/process=4 
Info: Requested: nodes=4 cores=16 cores/node=4 
Info: Run: processes=4 threads=16 threads/process=4 
Info: Distribution: processes/node=1 threads/node=4 
Info: Ratio: threads/core=1.000 cores/thread=1.000 
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY 
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY/SubmitScript
Submit finished, job id is 259461 
Info: Restart 2 is active 
Info: Assigned restart_id of: 0004 
Info: Found the following restart_ids: [0, 1, 2, 2, 3] 
Info: Maximum restart id determined to be: 0003 Assigned restart id: 4 
Info: Simulation is active: presubmitting 
Info: Job allocation information: 
Info: System: nodes=170 cores/node=4 threads/process=4 
Info: Requested: nodes=4 cores=16 cores/node=4 
Info: Run: processes=4 threads=16 threads/process=4 
Info: Distribution: processes/node=1 threads/node=4 
Info: Ratio: threads/core=1.000 cores/thread=1.000 
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY 
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY/SubmitScript
Submit finished, job id is 259462
[snip]

As a result, three jobs are queued:

[snip]
qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
259460 0.00000 test-whisk alibeck qw 04/29/2011 10:26:55 16 
259461 0.00000 test-whisk alibeck hqw 04/29/2011 10:26:56 16 
259462 0.00000 test-whisk alibeck hqw 04/29/2011 10:26:56 16 
[snip]

What is going wrong here?

Comments (6)

  1. Erik Schnetter
    • removed comment

    It seems that this is presubmission. The wall time you requested was longer than the wall time limit, so SimFactory broke up your simulation into three pieces that will execute sequentially. You can see this from the three lines:

    Info: Simulation is inactive: submitting
    Info: Simulation is active: presubmitting
    Info: Simulation is active: presubmitting

    Notice that your initial submit command did not create a new simulation; instead, it restarted an existing simulation that already had two restarts.

    I do not know how well-tested presubmission is on Damiana.
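
The splitting described in this comment can be illustrated with a small sketch. This is not SimFactory's actual code: the function names are made up for illustration, and the 16-hour limit is a hypothetical value, since the thread does not state the real wall time limit of the queue. It only shows how a 48-hour request could turn into three sequential jobs.

```python
def parse_walltime(text):
    """Convert an 'HH:MM:SS' wall time string into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def count_restarts(requested, limit):
    """Number of sequential jobs needed to cover the requested wall time.

    Hypothetical helper: rounds up, so a request that exceeds the limit
    by even one second needs one more job.
    """
    requested_s = parse_walltime(requested)
    limit_s = parse_walltime(limit)
    # Integer ceiling division without going through floats.
    return -(-requested_s // limit_s)


# "48:00:00" is the value passed to --walltime in the report above.
# "16:00:00" is an ASSUMED limit, chosen only for illustration.
print(count_restarts("48:00:00", "16:00:00"))
```

Under this simple model, any limit from 16 hours up to just under 24 hours would yield three pieces for a 48-hour request. In the qstat listing, the first job is in state qw (queued, waiting) while the other two are in state hqw (on hold), which would be consistent with them waiting for the first job to finish.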
