Change default for --num-threads

Issue #530 closed
Erik Schnetter created an issue

Change the default for --num-threads to be what the MDB specifies for the particular machine, instead of using a single thread by default.

Comments (13)

  1. Frank Löffler
    • changed status to open
    • assigned issue to

    Ian volunteered to put this in (using a global variable use_multiple_threads_by_default, defaulting to "no" for the moment)

  2. Erik Schnetter reporter

    My apologies for missing the discussion. I dislike the idea of introducing a global variable for this.

    If people want to use only a single thread, they can (a) use the --num-threads option, or (b) build without OpenMP, or (c) change the settings in their defs.local.ini file. A global variable complicates things for everybody, and is really just a sign that we do not dare to make a choice ourselves. However, this is just what Simfactory is supposed to do -- make choices so that people don't have to care. Since we think that OpenMP works fine, and since the majority of our code (McLachlan, GRHydro) has been parallelised, we should simply make the switch. Or was there somebody in the discussion who mentioned an actual problem that would arise?

  3. Frank Löffler

    Erik, you didn't miss the discussion, you only forgot about it because, as I just realized, it happened almost two years ago, and you actually suggested the name of the variable. (Look for mails from Oct 19 2009 on the simfactory mailing list, specifically message id B7FFFB04-8D83-42B8-A115-754688D27F07@cct.lsu.edu.) Of course, two years is plenty of time to change one's mind.

    So, your suggestion is to set num-threads to whatever the machine database sets by default, and let users choose to override this with one of the three mechanisms. I personally agree with that, but I can see that this might be inconvenient for some users. At the same time, the current situation is even more inconvenient for other users (having to remember how many threads should be used on a given machine instead of just '1' to disable OpenMP). This is why I agree with you.

    Should we bring this again to the users list, or should we just move ahead?

  4. Ian Hinder
    • I don't like forcing an entire configuration to either have or not have OpenMP. I might have some OpenMP and some non-OpenMP thorns. I reject this solution.
    • I don't like the user having to change a setting in their configuration - the user might be an undergraduate who doesn't even know what OpenMP is. SimFactory is supposed to hide these details. I reject this solution.
    • The optimal number of OpenMP threads is a property of the machine, and is set in the machine database, which is good.
    • Whether or not multiple threads should be used is a property of a thorn, and should be set in the configuration.ccl of the thorn. The key could be single-threaded = yes/no, defaulting to no, as Cactus is a modern infrastructure and OpenMP is expected. We can make this change coinciding with an advertisement during a release.
    • I propose that SimFactory does not set OMP_NUM_THREADS, but instead sets OPTIMAL_OMP_NUM_THREADS. The flesh then sets OMP_NUM_THREADS to either OPTIMAL_OMP_NUM_THREADS if no thorns are single-threaded, or to 1 if there are single-threaded thorns. Another option would be to just use OMP_NUM_THREADS, but this might cause confusion. The user should still be able to override OPTIMAL_OMP_NUM_THREADS using the '--num-threads' option on the simfactory command line.
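
    A minimal sketch of the rule proposed in the last bullet, written in Python purely for illustration (the flesh itself is not written in Python). The function name and the mapping of thorn names to a 'single-threaded' flag are hypothetical; only OMP_NUM_THREADS, OPTIMAL_OMP_NUM_THREADS and the 'single-threaded' key come from the proposal.

```python
def flesh_omp_num_threads(optimal_threads, single_threaded_by_thorn):
    """Pick the value the flesh would assign to OMP_NUM_THREADS.

    optimal_threads: the proposed OPTIMAL_OMP_NUM_THREADS, as set by SimFactory.
    single_threaded_by_thorn: hypothetical mapping from thorn name to the
        proposed 'single-threaded' key from configuration.ccl (default: False).
    """
    if any(single_threaded_by_thorn.values()):
        # At least one active thorn asked for single-threaded execution.
        return 1
    return optimal_threads


# Hypothetical thorn names, for illustration only.
print(flesh_omp_num_threads(8, {"ThornA": False, "ThornB": False}))  # 8
print(flesh_omp_num_threads(8, {"ThornA": False, "ThornC": True}))   # 1
```

    Note that under this rule a single flagged thorn switches the whole simulation to one thread.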

    Thoughts?

  5. Frank Löffler

    The usual mechanism using OMP_NUM_THREADS works quite well, and I would not force users to use something else instead. That would only create confusion.

    Our problem here is that in some cases OMP_NUM_THREADS as set by something like simfactory should be ignored, depending on which thorns are active in a simulation. Simfactory does not really know that, Cactus does. The flesh would also know which thorns are flagged 'single-thread' by some mechanism. So, the flesh could contain code which sets the number of OpenMP threads used to something other than OMP_NUM_THREADS, depending on some conditions. I don't like introducing another environment variable and using it for this. Instead I propose:

    - The flesh gets a new accumulator parameter 'single-threaded' (name to be discussed) which can be added to by thorns to force single-threaded execution. No user would have to know or could set this parameter.
    - The flesh gets another new parameter 'ignore-single-threaded' (name to be discussed), but this one intended for interaction with the user to be able to ignore the accumulator parameter, for testing primarily.
    - The flesh would then do nothing if 'single-threaded' is 0 (the default), which means no thorns requested to be executed only single-threaded.
    - The flesh would set the number of threads to 1 (using the openmp C api) if 'single-threaded' is something else than 0, unless 'ignore-single-threaded' is set to 'yes' in which case it would also not do anything.

    This would ensure that a mere user wouldn't have to do anything to Cactus, the parameter file or simfactory, assuming single-threaded thorns use the accumulator parameter. It's also easy to do that for developers of such thorns. The only action a user would have to take if this is not what is intended is to set one parameter in the parameter file, which should not occur often and which is reasonably easy.
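
    The decision logic of this proposal can be sketched as follows. This is an illustration in Python, not flesh code; the function name is hypothetical, and the two arguments stand in for the proposed 'single-threaded' and 'ignore-single-threaded' parameters, whose names are still open.

```python
def forced_thread_count(single_threaded, ignore_single_threaded=False):
    """Return 1 if the flesh should force one thread, or None to leave OpenMP alone.

    single_threaded: accumulated value; 0 means no thorn requested one thread.
    ignore_single_threaded: user-facing override, mainly for testing.
    """
    if single_threaded != 0 and not ignore_single_threaded:
        return 1
    return None


# Each thorn adds to the accumulator; here one of three thorns requests one thread.
single_threaded = sum([0, 1, 0])
print(forced_thread_count(single_threaded))                               # 1
print(forced_thread_count(single_threaded, ignore_single_threaded=True))  # None
print(forced_thread_count(0))                                             # None
```

    Returning None models "the flesh does nothing", so the thread count stays at whatever OMP_NUM_THREADS requested.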

    Frank

  6. Roland Haas

    Replying to [comment:3 eschnett]:

    My apologies for missing the discussion. I dislike the idea of introducing a global variable for this.

    If people want to use only a single thread, they can (a) use the --num-threads option, or (b) build without OpenMP, or (c) change the settings in their defs.local.ini file. A global variable complicates things for everybody, and is really just a sign that we do not dare to make a choice ourselves. However, this is just what Simfactory is supposed to do -- make choices so that people don't have to care. Since we think that OpenMP works fine, and since the majority of our code (McLachlan, GRHydro) has been parallelised, we should simply make the switch. Or was there somebody in the discussion who mentioned an actual problem that would arise?

    I usually turn off OpenMP for my hydro runs, since at least for the runs that I do (48 cores, lonestar, ideal gas eos) I find that GRHydro with OpenMP is about a factor of two slower (!) than without. So even if GRHydro might be OpenMP aware and actually contain code to use it, I do not run with it since it is not beneficial for me. This might have changed by now, but was still the case with the development version of GRHydro about 2 months or so ago (and I don't think much has changed in GRHydro wrt OpenMP since then). So it seems to me there are valid reasons to run without OpenMP even if none of the thorns is single-threaded.

    As far as fiddling with OMP_NUM_THREADS goes: OpenMP offers a function omp_set_num_threads (https://computing.llnl.gov/tutorials/openMP/#OMP_SET_NUM_THREADS) to change the number of threads used in parallel sections. It could be called by the flesh early on (taking env("OMP_NUM_THREADS") into account). This seems better than fooling with the environment variables, which should be user-settable (I think).
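
    A rough sketch of the "take the environment into account, then override in code" idea, in Python for illustration only. In the flesh the final value would be passed to the C function omp_set_num_threads; the helper names and the fallback of 1 for an unset or unparsable variable are assumptions.

```python
import os


def requested_threads(environ=os.environ, default=1):
    """Read the user's thread request from OMP_NUM_THREADS.

    OMP_NUM_THREADS may hold a comma-separated list (one entry per nesting
    level); only the first entry is used here. 'default' is an assumed
    fallback for an unset or unparsable value.
    """
    first = environ.get("OMP_NUM_THREADS", "").split(",")[0].strip()
    try:
        count = int(first)
    except ValueError:
        return default
    return count if count > 0 else default


def effective_threads(environ, force_single_thread):
    """Thread count to use: 1 if forced, otherwise the user's request."""
    return 1 if force_single_thread else requested_threads(environ)


print(effective_threads({"OMP_NUM_THREADS": "8"}, force_single_thread=False))  # 8
print(effective_threads({"OMP_NUM_THREADS": "8"}, force_single_thread=True))   # 1
```

    The environment variable itself is never modified, so it stays under the user's control.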

    Also, it seems to me that the decision whether or not to use OpenMP has to be made by the time qsub runs, since it affects the number of MPI processes requested; the flesh alone therefore cannot handle it completely.
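
    To make the dependency concrete: for a fixed number of cores, the number of MPI processes follows from the thread count, so it must be known at submission time. The sketch below assumes the simple relation processes = cores / threads per process; the function name is hypothetical, and the 48-core figure is taken from the runs mentioned above.

```python
def mpi_process_count(total_cores, num_threads):
    """Number of MPI processes when each process runs num_threads OpenMP threads."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if total_cores % num_threads != 0:
        raise ValueError("total_cores must be a multiple of num_threads")
    return total_cores // num_threads


print(mpi_process_count(48, 1))  # 48 processes, OpenMP effectively off
print(mpi_process_count(48, 6))  # 8 processes with 6 threads each
```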

  7. Frank Löffler

    Right, I forgot about that (again). But that leaves us in an ugly situation:

    - Simfactory would have to know about the number of threads because the number of mpi processes depends on it
    - Only Cactus would know which thorns are active in a given simulation, and could deduce from that if thorns 'request' only one thread.

    I see only two possible ways out:

    a) Simfactory parses the parameter file for the active thorns, and gets from somewhere (the build directory most likely) the information whether thorns request one-thread-only.
    b) We replace the run-time dependency of the single-thread-request with a compile-time dependency and say that a given configuration requests single-thread execution depending on the thorns which are compiled in. This is against the modular Cactus approach, but it would mean that simfactory wouldn't have to parse the parameter file and could rely on a single entry in the configuration.

    I would prefer solution a) if feasible. Doesn't simfactory already parse parameter files for active thorns, and disable them if the machine specifies this?
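
    A rough sketch of what option a) would involve, assuming thorns are activated with lines of the form ActiveThorns = "..." in the parameter file. The function names and the set of single-thread thorns are hypothetical, thorn names are compared case-insensitively as a simplifying assumption, and real parameter files may need more careful parsing than this regular expression provides.

```python
import re

# Matches ActiveThorns = "Thorn1 Thorn2 ..." at the start of a line; the quoted
# list may span several lines. Commented-out lines (starting with '#') are skipped.
ACTIVE_THORNS = re.compile(
    r'^\s*ActiveThorns\s*=\s*"([^"]*)"', re.IGNORECASE | re.MULTILINE
)


def active_thorns(parfile_text):
    """Collect the lower-cased names of all thorns activated in a parameter file."""
    thorns = set()
    for match in ACTIVE_THORNS.finditer(parfile_text):
        thorns.update(name.lower() for name in match.group(1).split())
    return thorns


def needs_single_thread(parfile_text, single_thread_thorns):
    """True if any active thorn is in the (hypothetical) single-thread list."""
    wanted = {name.lower() for name in single_thread_thorns}
    return bool(active_thorns(parfile_text) & wanted)


example = 'ActiveThorns = "Carpet CarpetLib"\n# ActiveThorns = "Ignored"\n'
print(needs_single_thread(example, {"SomeSerialThorn"}))  # False
```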

  8. Erik Schnetter reporter

    I am afraid that we are here over-designing something that is not really a problem at all. Do we know even of a single case where someone would be severely affected by changing the default?

    We don't go through these hoops when it comes to MPI processes. There is no "this thorn doesn't do MPI" flag. If a thorn can't handle MPI, it would check this at startup, and abort with an error. We could do the same with OpenMP, except that OpenMP is "backward compatible" in the sense that it doesn't break anything, it would just run a bit slower than naively expected.

    Having a single thorn disable OpenMP for a whole simulation is overkill. The thorn will still work fine with OpenMP. Whether this is acceptable or not depends on the user's settings and the user's expectations. Is there a thorn which should disable OpenMP today? Which would it be? AHFinderDirect comes to my mind... When it comes to performance problems, then e.g. CarpetIOASCII is a much bigger sinner with its missing MPI parallelisation (it serialises output).

    I think we are just being afraid of taking this step. We are imagining problems, trying to design a perfect solution, and in the process complicate things for everybody and avoid progress for everybody. Let's just do it, and if it doesn't work out, let's listen to the complaints and address them. This is the development version, after all.

  9. Erik Schnetter reporter

    Simfactory doesn't parse parameter files, it parses thorn lists to add/remove thorns not supported on a machine.

    Is it really necessary to have this mechanism? While some people want to run GRHydro with a single thread, others work on improving its performance and want to run it with multiple threads. So we can't flag this thorn as single-threaded, and the mechanism wouldn't address this problem. In GRHydro, someone recently replaced the workshare constructs with explicit loops, addressing a shortcoming in the Intel compiler.

    I am really afraid that we are over-designing here. We are trying to anticipate problems, and are designing a non-trivial solution before we know what the problem is. I say we change the default number of threads, correct the (possibly wrong) MDB entries as we go along, and wait for feedback. We announce this change in a positive way and enjoy progress.

    Since this discussion is about a default setting, Simfactory's choice reflects the state of the art, namely that most machines these days require OpenMP to achieve good performance. This has changed from a few years ago, and hence changing the default makes sense. I outlined several ways in which users can choose their own settings overriding the default, and (as we all know) adding --num-threads to a submit command isn't that big of a deal -- we've been doing it for a long time ourselves.
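
    The override behaviour described here can be sketched as a simple lookup chain. The precedence shown (command line first, then defs.local.ini, then the MDB entry) and the function name are assumptions made for illustration; the final fallback of 1 corresponds to the old single-thread default.

```python
def resolve_num_threads(cli_value=None, local_defs_value=None, mdb_value=None):
    """Return the first thread count that is set, in assumed order of precedence."""
    for value in (cli_value, local_defs_value, mdb_value):
        if value is not None:
            return int(value)
    return 1  # nothing configured anywhere: fall back to a single thread


print(resolve_num_threads(mdb_value=6))                 # 6, the machine default
print(resolve_num_threads(cli_value=1, mdb_value=6))    # 1, user opts out of OpenMP
```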

  10. Roland Haas

    To clarify: I am fine with having the default for --num-threads change (as long as I can change it again in my own defs.local.ini file or via --num-threads at submission time). I will use whatever method is fastest to run the simulations [which for large runs involves OpenMP] :-)
