The test system should treat a nonzero exit code from Cactus as a failure

Issue #1690 resolved
Ian Hinder created an issue

The test system seems to ignore a nonzero exit code from Cactus. It displays:

Cactus exited with error code 1
Please check the logfile...

No files created in test directory

Success: 0 files identical

The summary at the end also counts this as a passing test. In this case there were no test reference files and no output files, because the test by design produces no data; it simply aborts if the check fails.


Comments (14)

  1. Erik Schnetter
    • removed comment

    It is difficult to obtain the exit code of Cactus. Instead, we should reject tests that produce no output, i.e. treat them as always failing.

  2. Frank Löffler
    • removed comment

    Ticket #1689 is already open about how to determine whether a Cactus run was successful. We should probably treat this ticket as dependent on #1689, since the solution to #1689 might provide something the test system could then use to determine whether a run was indeed successful. This "something" need not be an exit code.

  3. Ian Hinder reporter
    • changed status to open
    • marked as
    • removed comment

    Why is it difficult to obtain the exit code of Cactus? Is this because mpirun only returns the exit code of the root process, or because some mpirun implementations are buggy? I think if a nonzero exit code is returned, something has definitely gone wrong, though the converse may not be true.

    Setting priority to major: a test that has no output files will appear to pass even if Cactus hit a fatal error.

  4. Frank Löffler
    • removed comment

    With most MPI implementations, at least, it should be possible to get a nonzero exit code from a dying simulation. It might not be the code of the process that actually triggered the crash, but it would likely be nonzero, and mpirun is unlikely to return nonzero for a successful run. Detecting this would be good.

    Still, I see no harm in additionally generating a "sentinel" file that is created only after Cactus successfully reaches TERMINATION. We could (and, if this works, should) then test for both.

  5. Ian Hinder reporter
    • removed comment

    I would like Cactus to provide a little more information about its termination than just creating a file when it reaches "Done.". For example, I would like to know the reason for termination. Was it due to TerminationTrigger running out of walltime, in which case the simulation is not complete? Or was it due to reaching cctk_final_time, in which case it is? Alternatively, if termination is due to CCTK_Error, it would be good to get the error message in an easily parsed file, so that it can be displayed in tables of simulation statuses.

    Probably we want to write a termination file, as you suggest, but we would need to define the format of its content, and we should brainstorm what else we might want in this file. Does Cactus already "know" the reason for a termination, or do we need to extend the flesh for this? I think this is outside the scope of this ticket, so I have created another one (#1720).

    I would still like to detect a nonzero exit code, as described in this ticket; for example, an error that occurs after the termination file is written could cause one.

  6. Roland Haas
    • removed comment

    This is a bit more complex. I just looked into RunTestUtils.pl, which contains these lines:

      $retcode = &RunCactus($output,$test,$cmd);
      chdir $config_data->{"CCTK_DIR"};
    
      # Deal with the error code
      if($retcode != 0)
      {
        print "Cactus exited with error code $retcode\n";
        print "Please check the logfile $testdata->{\"$thorn $test TESTRUNDIR\"}$sep$test.log\n\n";
        $testdata->{"$thorn FAILED"} .= "$parfile ";
        $testdata->{"NFAILED"}++;
      }
    

    and indeed I do see the output:

    Cactus exited with error code 137
    Please check the logfile [...]/TEST/sim/TestArrays/arrays0.log

    It turns out, however, that although these fields in $testdata are set, they seem to have always been ignored (including in the commit "cbe7f3d1 - (HEAD) Fixed bug where par files which core dumped would pass (16 years ago)", which I have tried).

  7. Roland Haas
    • changed status to open

    The pushed change does not handle the case where a failing test has no output files (e.g. Carpet's 64k2.par test).
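The thread proposes three signals of failure: a nonzero exit code (comments 3, 4 and 6), the absence of output files (comment 1), and a missing "sentinel" file written at TERMINATION (comment 4). Below is a minimal Perl sketch of combining them, assuming the exit status comes from Perl's system(), whose raw status word in $? holds the exit code in the high byte and the signal number and core-dump flag in the low byte. The helper names decode_status and classify_test, and the sentinel_found flag, are hypothetical: they are not part of RunTestUtils.pl, and Cactus does not currently write a sentinel file. Whatever computes the verdict, the summary must actually consult it; comment 6 suggests the existing failure fields are set but never read.

```perl
use strict;
use warnings;

# Decode the raw status word that Perl's system() leaves in $?
# (layout as documented for system() in perlfunc).
sub decode_status {
    my ($status) = @_;
    # -1 means the command could not be started at all.
    return { exit_code => -1, signal => 0, core_dumped => 0 }
        if $status == -1;
    return {
        exit_code   => $status >> 8,
        signal      => $status & 127,
        core_dumped => ($status & 128) ? 1 : 0,
    };
}

# Hypothetical pass/fail decision for one test run. A run fails if
# Cactus (or mpirun) reported a nonzero exit code or was killed by a
# signal, or if it produced no output files (the rule proposed in
# comment 1). The optional sentinel check assumes the "sentinel" file
# proposed in comment 4, which Cactus does not currently write.
sub classify_test {
    my (%run) = @_;
    my $st = decode_status($run{raw_status});
    my @reasons;
    push @reasons, "Cactus exited with error code $st->{exit_code}"
        if $st->{exit_code} != 0;
    push @reasons, "Cactus was killed by signal $st->{signal}"
        if $st->{signal} != 0;
    push @reasons, "no output files were created"
        if $run{n_output_files} == 0;
    push @reasons, "termination sentinel file is missing"
        if exists $run{sentinel_found} && !$run{sentinel_found};
    return @reasons ? ("FAIL", @reasons) : ("PASS");
}

# Usage: status word as $? would hold it after the child calls exit(1).
my ($verdict, @why) = classify_test(raw_status => 1 << 8, n_output_files => 0);
print "$verdict: ", join("; ", @why), "\n";
# prints "FAIL: Cactus exited with error code 1; no output files were created"
```

The ticket does not show how RunCactus computes $retcode. Shells conventionally report a child killed by signal N as exit status 128 + N, so the 137 in comment 6 is consistent with a SIGKILL (signal 9) somewhere in the mpirun/shell chain. Treating "no output" as failure also fails tests that deliberately produce no data, as in the original report; that is the trade-off comment 1 accepts.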
