
Opened 6 years ago

Last modified 6 years ago

#1150 new enhancement

Simfactory: it doesn't honor --procs when running testsuites

Reported by: bmundim
Owned by: Erik Schnetter
Priority: major
Component: SimFactory

Description

I have entered the following command:

sim create-submit trestles2_8 --testsuite --procs 2 --num-threads 8

but trestles2_8/output-0000/SIMFACTORY/RunScript only sets -np 1:

${MPICHDIR}/bin/mpirun_rsh -tv -np 1 -hostfile ${MPI_NODEFILE} /oasis/scratch/trestles/bcmundim/temp_project/simulations/trestles2_8/SIMFACTORY/exe/cactus_sim -L 3 ${TESTSUITE_PARFILE}

This may be related to bug #1075.
Thanks!


Change History (12)

comment:1 Changed 6 years ago by Roland Haas

--procs is processors (cores) not MPI processes. The names have become confusing. Simfactory should likely abort in this case since you asked for more threads than cores.

This *is* actually explained in the docs, but the explanation is very confusing (and I think simfactory is not consistent in its use of procs for processes and processors).

comment:2 Changed 6 years ago by Ian Hinder

The documentation Roland is referring to is at http://simfactory.org/info/documentation/userguide/processterminology.html. I don't remember coming across a situation where it was found to be inconsistent with the simfactory code. On the other hand, the machine database entries in submit script and run scripts often don't use these variables and terms correctly. I agree that the terminology is confusing. I have opened #1151.

In your case, since you have 8 threads, you need --procs 16 to run with 2 processes.

Note that this is nothing to do with testsuites; you should get the same behaviour if you run a normal job. If this is not the case, then it is indeed a bug.
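The arithmetic behind "--procs 16 gives 2 processes" can be sketched as follows. This is only an illustration of the relationship described in this comment, not simfactory's actual code: the function name `mpi_processes` is hypothetical, and rounding up to at least one process is an assumption that happens to match the `-np 1` seen in the report.

```python
def mpi_processes(procs: int, num_threads: int) -> int:
    """Return the number of MPI processes for a total thread count.

    procs       -- total number of threads (what --procs means in this thread)
    num_threads -- threads per MPI process (--num-threads)

    Assumption: the quotient is rounded up, so at least one process
    is always launched even when procs < num_threads.
    """
    if procs < 1 or num_threads < 1:
        raise ValueError("procs and num_threads must be positive")
    return -(-procs // num_threads)  # ceiling division on integers


# The command from the report: --procs 2 --num-threads 8
print(mpi_processes(2, 8))   # 1
# The suggested invocation: --procs 16 --num-threads 8
print(mpi_processes(16, 8))  # 2
```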

comment:3 Changed 6 years ago by bmundim

OK, shouldn't --procs actually correspond to MPI processes/jobs/tasks? That is what I thought immediately, so I assume it is more intuitive...

comment:4 Changed 6 years ago by bmundim

Also, procs is described in http://simfactory.org/info/documentation/userguide/processterminology.html as the number of threads, which is even more confusing. We end up with the following possible mappings:

--procs --> total number of threads
--procs --> number of threads per MPI task (probably not since --num-threads is described as such)
--procs --> number of processes, meaning MPI tasks
--procs --> number of processors, meaning CPUs
--procs --> number of cores, for those thinking of a core as a computational unit (less probable though)

comment:5 Changed 6 years ago by Roland Haas

I certainly was confused about procs as well, in particular since many places in the code use Procs to refer to MPI processes (e.g. CCTK_NProcs). This might be historical and based on the assumption that there are as many processes as processors (do we mean socket or core by that as well?), but it is what you are used to when writing MPI code.

So my preference would be to rename --procs to --cores (if this is what the queuing system wants) and output a warning message if --procs is used. Since I think of procs as MPI processes, my personal favourite would of course be to have --procs mean MPI processes, but that seems impossible to change now since all users of simfactory are used to the old meaning.

We might also want to have a look at how simfactory handles undersubscribing a node, e.g. running only 2 MPI processes with 4 threads each on a node that has, say, 16 cores (since one needs lots of memory). I think this is currently possible, but I would double-check that the naming convention there is not "unfortunate". :-)

comment:6 Changed 6 years ago by bmundim

> Since I think of procs as MPI processes, my personal favourite would of course be to have --procs mean MPI processes, but that seems impossible to change now since all users of simfactory are used to the old meaning.

My feeling is that everyone thinks it is confusing, but no one wants to change it, since people either got used to it or do not have a better alternative in mind. I think we should propose a better alternative and change it, by mutual agreement of course. New users will appreciate more intuitive command options.

comment:7 Changed 6 years ago by Erik Schnetter

People rarely want to specify the number of MPI processes. They typically want to specify the number of cores that they request from the queueing system, since this is what they pay for. Given that the number of OpenMP threads is also fixed (either at one, or at a small integer number depending on the system hardware), the actual number of MPI processes then needs to be calculated automatically. This is guaranteed to always lead to a "sensible" answer.

The converse, e.g. specifying MPI processes and threads, can easily lead to underused nodes. Specifying cores and MPI processes can easily lead to strange (non-optimal) numbers of OpenMP threads.

Under- and over-subscribing should work fine in Simfactory, apart from particular submit scripts that may not be able to handle this.

comment:8 Changed 6 years ago by bmundim

> People rarely want to specify the number of MPI processes. They typically want to specify the number of cores that they request from the queueing system, since this is what they pay for.

Wouldn't it be more natural for them to submit their jobs through simfactory with a --cores option?

> Given that the number of OpenMP threads is also fixed (either at one, or at a small integer number depending on the system hardware), then actual number of MPI processes then needs to be calculated automatically. This is guaranteed to always lead to a "sensible" answer.

I agree. Most of the time users won't care about the number of MPI processes and/or the number of threads. That's what the machine database in simfactory is there for, and, as far as I understand, you wouldn't need to specify the number of threads per process you want, only the total number of cores or nodes (another interesting option would be --nodes or --num-nodes). However, when we want to experiment with other configurations, the options become unintuitive. Also, when I suggested --tasks or --mpi-tasks or --num-mpitasks, I was referring to tasks per node. Again, that would be important only if the user wants to experiment (or to run testsuites).

> The converse, e.g. specifying MPI processes and threads, can easily lead to underused nodes.

That's correct if we are specifying the total number of MPI processes. That wouldn't be a problem if we specified the number of MPI processes per node, in the same fashion as we specify the number of threads per MPI process.

> Specifying cores and MPI processes can easily lead to strange (non-optimal) numbers of OpenMP threads.

I agree. We either specify one or the other. Not both at the same time.

> Under- and over-subscribing should work fine in Simfactory, apart from particular submit scripts that may not be able to handle this.

OK, so it is only a matter of making the option names more intuitive for the work they are supposed to do.

comment:9 Changed 6 years ago by Ian Hinder

--procs is ambiguous, so we should deprecate it. It really means "total number of threads which will be launched", but from the user's point of view, it also usually corresponds to "cores". I would like to be able to specify "--cores" (i.e. the amount of hardware I want to use), on the understanding that I will get that many cores, and that there will be one thread per core (the sensible choice). We should keep --procs for compatibility to mean exactly what it means now. We can introduce "--processes" which will determine the number of MPI processes; this is probably only ever used when testing, and when running test suites.
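A minimal sketch of how such a deprecation could look on the command line, assuming a Python `argparse` front end. The function `parse_args`, the program name `sim-sketch`, and the exact warning text are hypothetical and are not taken from simfactory's option parser; `--cores` and `--processes` are the option names proposed in this comment and did not exist at the time.

```python
import argparse
import warnings


def parse_args(argv):
    """Parse the proposed options; keep --procs as a deprecated alias."""
    parser = argparse.ArgumentParser(prog="sim-sketch")
    parser.add_argument("--cores", type=int,
                        help="number of cores to request (one thread per core)")
    parser.add_argument("--procs", type=int,
                        help="deprecated: total number of threads")
    parser.add_argument("--processes", type=int,
                        help="number of MPI processes (mainly for testing)")
    args = parser.parse_args(argv)

    if args.procs is not None:
        if args.cores is not None:
            parser.error("use either --cores or the deprecated --procs, not both")
        warnings.warn("--procs is deprecated; use --cores instead",
                      DeprecationWarning)
        # Assumption: with one thread per core, the old thread count
        # equals the number of cores requested.
        args.cores = args.procs
    return args


args = parse_args(["--cores", "16", "--processes", "2"])
print(args.cores, args.processes)
```

Keeping `--procs` as an alias rather than removing it matches the compatibility requirement stated above; existing scripts keep working and only see a warning.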

comment:10 Changed 6 years ago by Erik Schnetter

We should do this, together with updating documentation and examples, but should do it after the release. It wouldn't make sense to deprecate "procs" without having thoroughly tested "cores".

comment:11 Changed 6 years ago by Frank Löffler

Type: changed from defect to enhancement

I agree about --cores and the default of one thread per core.

As for options to deviate from that, we should provide what users will most likely want as a complement to --cores above. I would think that this is most likely:

--num-processes: number of mpi processes _per node_
--num-threads : number of threads _per mpi process_

Obviously specifying only one of the latter two should set the other such that a whole node is filled as much as possible (not oversubscribing). Specifying both would be used for over- or undersubscribing nodes.

In some sense --cores would be used for how much you request from the queuing system and the other two would be used to change the default layout of how these cores will be used (mpi/openmp).
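The proposed interplay of the three options can be sketched like this. It is an illustration of the proposal in this comment only, not existing simfactory behaviour: `resolve_layout`, `cores_per_node`, and `default_threads` are hypothetical names, and the per-node core count is assumed to come from something like the machine database.

```python
def resolve_layout(cores, cores_per_node, num_processes=None,
                   num_threads=None, default_threads=1):
    """Return (nodes, processes_per_node, threads_per_process).

    cores           -- total cores requested from the queueing system
    cores_per_node  -- hardware cores per node (assumed known per machine)
    num_processes   -- MPI processes per node, or None
    num_threads     -- threads per MPI process, or None
    default_threads -- assumed machine default when neither is given
    """
    if num_processes is None and num_threads is None:
        num_threads = default_threads
    if num_processes is None:
        # Fill the node as far as possible without oversubscribing.
        num_processes = max(1, cores_per_node // num_threads)
    elif num_threads is None:
        num_threads = max(1, cores_per_node // num_processes)
    # If both were given, they are used as-is (over- or undersubscription).
    nodes = -(-cores // cores_per_node)  # ceiling division
    return nodes, num_processes, num_threads


# Only threads given: processes are chosen to fill a 16-core node.
print(resolve_layout(32, 16, num_threads=8))                   # (2, 2, 8)
# Both given: 2 processes x 4 threads undersubscribes a 16-core node.
print(resolve_layout(16, 16, num_processes=2, num_threads=4))  # (1, 2, 4)
```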

comment:12 Changed 6 years ago by bmundim

I like Frank's complementary idea, and surely we should implement this after the release.
