Opened 7 years ago

Last modified 7 years ago

#508 reopened defect

list-simulations displays job ID when it doesn't make any sense

Reported by: Ian Hinder
Owned by: Erik Schnetter
Priority: minor
Milestone:
Component: SimFactory
Version:
Keywords:
Cc:

Description

The list-simulations command displays a restart number and job ID even for simulations for which there is no job in the queue. The attached patch omits these in that case.

Attachments (1)

0002-list-simulations-Only-display-restart-number-and-job.patch (1.1 KB) - added by Ian Hinder 7 years ago.

Change History (8)

comment:1 Changed 7 years ago by Ian Hinder

Status: new → review

comment:2 Changed 7 years ago by Erik Schnetter

Each active simulation has one restart that is active; this restart id should be output when the simulation is active.

If a restart is active, the job id should also be output, even if the job is no longer in the queue. For example, several job queuing systems keep job ids around after a job has finished (the job is still listed by "qstat", but in a "D" state), create debug/output files containing the job id, or send emails with the job id in the subject.

Knowing the job id lets people interact with the queuing system without simfactory; people are used to this, so it's nice to know the job id.

comment:3 Changed 7 years ago by Frank Löffler

Ian: can we close this ticket?

comment:4 Changed 7 years ago by Ian Hinder

I disagree with what Erik said in comment 2. SimFactory is supposed to create an abstraction over these details. Users who want to break the abstraction can easily go and delve around in the simulation output directory or use qstat to find the job id. Similarly for the restart id.

As a user, what I want to see from list-simulations is:

  • A list of the simulations on the machine
  • Whether a simulation is active or inactive
  • Whether the simulation is running or not
  • How much longer it will run for in walltime (the total walltime of all queued restarts) [this is not currently available, and requires logging in to the machine and running qstat]

I don't care about the details of individual restarts or job ids. Perhaps those could be output with a --all-details option or something.
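The summary described above could be sketched roughly as follows. This is only an illustration, not SimFactory's actual code: the `Restart` and `Simulation` classes, their fields, and the `all_details` flag are all hypothetical names invented here, and the remaining walltime is simply the sum of the walltimes requested by restarts that are still queued.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Restart:
    """Hypothetical record of one restart of a simulation."""
    restart_id: int
    job_id: Optional[str]
    queued: bool            # True while the job is still in the queue
    walltime_hours: float   # walltime requested for this restart


@dataclass
class Simulation:
    """Hypothetical record of one simulation."""
    name: str
    active: bool
    running: bool
    restarts: List[Restart] = field(default_factory=list)


def remaining_walltime(sim: Simulation) -> float:
    """Total walltime (hours) of all restarts that are still queued."""
    return sum(r.walltime_hours for r in sim.restarts if r.queued)


def format_simulation(sim: Simulation, all_details: bool = False) -> str:
    """One summary line per simulation; restart and job ids only on request."""
    state = "ACTIVE" if sim.active else "INACTIVE"
    if sim.running:
        state += ", RUNNING"
    text = f"{sim.name} [{state}, walltime left: {remaining_walltime(sim):.1f} h]"
    if all_details:
        # Assumed equivalent of the suggested --all-details option.
        for r in sim.restarts:
            text += f"\n    restart {r.restart_id:04d}, job id {r.job_id or '-'}"
    return text
```

With this layout the default output stays at the level of the abstraction, and the restart and job ids remain available to users who ask for them.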

Erik: if I haven't convinced you, just close the ticket.

comment:5 Changed 7 years ago by Erik Schnetter

You have.

One problem that I often encounter is that there are queued and held jobs left in the queue. These presumably come from faulty submit scripts or faulty handling of presubmission, but this is a real problem -- submit scripts will always be a bit dodgy, in particular if machines change, or if someone prepares a submit script for a new machine. Simfactory should tell the user about such jobs; maybe Simfactory should even check whether these jobs "look like" a Simfactory job, and if so, warn about these.

Basically, if we don't output the job id any more, then Simfactory needs to be able to do the most important tasks people currently do via qstat.

comment:6 Changed 7 years ago by Ian Hinder

Good idea! Maybe when simfactory runs qstat, it should do this check (whether there are jobs that look like simfactory jobs) and report as a warning that there are "orphaned" jobs that simfactory is not managing. There could then be a command to "clean up" any orphaned simfactory jobs.
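A minimal sketch of such an orphan check, under stated assumptions: the queue listing has already been parsed into `QueueJob` records (the output format of qstat differs between queuing systems, so parsing is left out), SimFactory knows the set of job ids it manages, and SimFactory jobs can be recognised by a job-name pattern. The pattern `sim-<name>-<restart>` and all function names below are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Pattern, Set


class QueueJob(NamedTuple):
    """Hypothetical, already-parsed entry from the queue listing."""
    job_id: str
    name: str
    state: str


# Assumed naming convention for SimFactory jobs; purely illustrative.
SIMFACTORY_NAME = re.compile(r"^sim-.+-\d{4}$")


def find_orphaned_jobs(queue: Iterable[QueueJob],
                       managed_ids: Set[str],
                       pattern: Pattern[str] = SIMFACTORY_NAME) -> List[QueueJob]:
    """Jobs whose name looks like a SimFactory job but which are not managed."""
    return [job for job in queue
            if pattern.match(job.name) and job.job_id not in managed_ids]


def orphan_warnings(orphans: Iterable[QueueJob]) -> List[str]:
    """One warning line per orphaned job."""
    return [f"Warning: job {job.job_id} ({job.name}, state {job.state}) "
            f"looks like a SimFactory job but is not managed by SimFactory"
            for job in orphans]
```

A "clean up" command could then act on the list returned by `find_orphaned_jobs`, ideally only after asking the user for confirmation, since name matching alone can produce false positives.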

comment:7 Changed 7 years ago by Frank Löffler

Status: review → reopened

I assume this now goes beyond the proposed patch, so I am removing the 'review' status.
