make attempt to change non-steerable parameters at recovery fatal error

Issue #1397 closed
Roland Haas created an issue

currently Cactus ignores, almost silently (there is a L2 warning "Non-steerable parameter 'ADMBase::initial_lapse' is not set from the parameter file but recovered from the checkpoint file"), any attempt to steer non-steerable parameters during checkpoint recovery. I wonder if it would be better to abort or at least make this a Lev1 warning. The comments in cckt_Warnlevel.h say:

#define CCTK_WARN_ABORT    0    /* abort the Cactus run */
#define CCTK_WARN_ALERT    1    /* the results of this run will be wrong, */
                                /* and this will surprise the user, */
                                /* but we can still continue the run */
#define CCTK_WARN_COMPLAIN 2    /* the user should know about this, */
                                /* but the problem is not terribly */
                                /* surprising */
#define CCTK_WARN_PICKY    3    /* this is for small problems that can */
                                /* probably be ignored, but that careful */
                                /* people may want to know about */
#define CCTK_WARN_DEBUG    4    /* these messages are probably useful */
                                /* only for debugging purposes */

since having a parameter being ignored is indeed surprising for most users :-) .

Keyword:

Comments (10)

  1. Erik Schnetter
    • removed comment

    Yes, it could be warnlevel 1.

    Isn't there a command line option that influences this behaviour?

  2. Frank Löffler
    • removed comment

    There is a parameter specifying what should be printed. If I remember correctly, this is 1 by default. Given the value descriptions, 2 would be a better option for the default. Also, 2 seems to be the right value for this specific warning. Results are not necessarily wrong when this happens, the user should know about it, and it is not terribly surprising that steering a non-steerable parameter won't work. On the other hand, I can see that results might in fact be wrong, and the warning might be overlooked. In that case it doesn't matter which level>0 it actually is. If we decide that this should not run at all, then we should make it a level 0 warning, an error. I didn't hit this very often (at least that I know of), so I don't have a good feeling whether 0 or 2 would be better.

    Regardless of this, I believe a lot of simfactory runscripts actually use '-L 3' by default, so whether this is 'almost silently' depends on how you run Cactus.

  3. Erik Schnetter
    • removed comment

    Actually, if I set a parameter in a parameter file and Cactus ignores this, then this is surprising to a user. No one makes intentional errors, so we have to assume the user didn't know that this parameter is not steerable.

    Roland: How often does this occur? I would like to try making this an error instead of a warning, since this is a parameter that cannot bet set. However, if we then find that this breaks too many things, we would need to revert that.

  4. Roland Haas reporter
    • removed comment

    Personally I would prefer this to be an error, since it was a user input that we cannot honor so I would treat it as a parameter file error.

    As for frequency: I don't have hard numbers. I remember that this has happened to at least three (incl. myself) of us here at Caltech in the last say month (or so), mostly because I remember having suggested/had suggested to me to change param.ccl, recompile and rerun. So not very very often. It is however fairly common for SN runs to recover from a checkpoint with changed options (eg when transitioning from pre-bounce to bounce), or to try and change the grid structure, tracking options etc. after a checkpoint recovery (if eg something failed).

  5. Frank Löffler
    • removed comment

    Please change it to be an error in trunk - we'll see what will happen. I don't expect much, apart from the positive effects we want to achieve. I don't see a reason why trying to steer a parameter that isn't steerable shouldn't be an error. Continuing might actually hurt much more. A run might burn time, and a user might wait until it finished before realizing that a change she intended to make wasn't applied. That's worse than an aborted run for a wrong parameter file. Of course, we did want to have an option in Cactus that could check parameter files before actually running them, right? That might not work on all systems, but on the ones where it does (most I expect), simfactory (or the user) could run this before submission...

  6. Roland Haas reporter
    • removed comment

    This turns out to be fairly ugly when done "right" in that Cactus should list all parameters that are attempted to be steered but which are non-steerable. Cactus uses modes (http://en.wikipedia.org/wiki/The_Humane_Interface) to change the behaviour of CCTK_ParameterSet, and IOUtil (which provides the utility functions for checkpoint/recovery) does not return an error code to its caller and itself has not means of asking Cactus what mode it is in (both its caller and CCTK_ParameterSet know which mode they are in).

    Attached please find patches to flesh and IOUtil that abort when IOUtil_SetAllParameters tries to change a non-steerable parameter. Not really my favorite way of implementing this.

  7. Erik Schnetter
    • changed status to open
    • removed comment

    This patch looks fine.

    To convert a double to an int, use lrint. Other combinations such as (int) or (int)round() may lead to compiler warnings.

  8. Roland Haas reporter
    • changed status to resolved
    • removed comment

    I had not actually intended to propose all of flesh.patch (only flesh.2.patch). The changes in flesh.patch avoid compiler warnings (which is why they first assign the result of round() to a double then cast the double variable to an int). Using lrint I can avoid the temporary variable and still avoid compiler warnings. Thank you for the tip. I applied the patches as rev 5030 and rev 5031 of the flesh and rev 303 of IOUtil.

    Closing the ticket for now. Please re-open if there are problems with this change.

  9. Log in to comment