The test system should treat a nonzero exit code from Cactus as a failure

Issue #1690 resolved

Ian Hinder created an issue 2014-11-05

The test system seems to ignore the fact that Cactus exits with a nonzero exit code. It displays:

Cactus exited with error code 1
Please check the logfile...

No files created in test directory

Success: 0 files identical

In the summary at the end, it treats this as a passing test. In this case, there were no test reference files and no output files, because the test (by design) does not produce any data; it just aborts if the test fails.

Comments (14)

Erik Schnetter
- removed comment
It is difficult to obtain the exit code of Cactus. Instead, we should reject tests without output, e.g. by always treating them as failing.
- 2014-11-05T08:16:40+00:00
Frank Löffler
- removed comment
Ticket ~~#1689~~ is already open about the issue of how to determine whether a Cactus run was successful or not. We should probably treat this bug here as dependent on ~~#1689~~, as the solution to ~~#1689~~ might implement something that could then be used by the test system to determine whether a run was indeed successful or not. This "something" might not need to be an exit code.
- 2014-11-05T09:32:08+00:00
Ian Hinder reporter
- changed status to open
- marked as
- removed comment
Why is it difficult to obtain the exit code of Cactus? Is this because mpirun only returns the exit code of the root process, or because some mpirun implementations are buggy? I think if a nonzero exit code is returned, something has definitely gone wrong, though the converse may not be true.

Setting priority to major because a test will appear to have passed even if there was a fatal error when running Cactus, if that test does not have any output files.
- 2014-12-09T16:19:26+00:00
Frank Löffler
- removed comment
It should be possible, at least with most MPI implementations, to get a non-zero exit code from a dying simulation. It might not come from the process that actually triggered the crash, but it would likely be non-zero, and mpirun is unlikely to return non-zero for a succeeding run. Detecting this would be good.

Still, in addition, I don't see the harm in generating a "sentinel" file that is only created after Cactus successfully reaches TERMINATION. We could (and, if this works, should) then test for both.
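A minimal sketch of testing for both conditions, written in Python rather than the test system's Perl. The sentinel file name `cactus.done` and the helper `run_succeeded` are hypothetical; Cactus does not currently write such a file.

```python
import os
import subprocess
import sys
import tempfile


def run_succeeded(cmd, run_dir, sentinel="cactus.done"):
    """Return True only if cmd exits with 0 AND left the sentinel file behind.

    `sentinel` is a hypothetical file name used for illustration only.
    """
    result = subprocess.run(cmd, cwd=run_dir)
    exited_cleanly = result.returncode == 0
    reached_end = os.path.exists(os.path.join(run_dir, sentinel))
    return exited_cleanly and reached_end


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        # Stand-in for a run that aborts before reaching termination.
        crash = [sys.executable, "-c", "import sys; sys.exit(1)"]
        print(run_succeeded(crash, run_dir))
```

Requiring both checks means that a run which writes the sentinel and then fails, or which exits with 0 without ever reaching termination, is still reported as unsuccessful.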
- 2014-12-09T21:58:50+00:00
Ian Hinder reporter
- removed comment
I would like Cactus to provide a little more information about its termination than just creating a file when it reaches "Done." For example, I would like to know the reason for termination. Was it due to TerminationTrigger running out of walltime, in which case the simulation is not complete? Or was it due to reaching cctk_final_time, in which case it is? Alternatively, if termination is from CCTK_Error, it would be good to get the error message in a file which is easy to parse, so that it can be displayed in tables of simulation statuses.

Probably we want to write a termination file, as you suggest, but we would need to define the format for the content. We should brainstorm on what other things we might want in this file. Does Cactus already "know" the reason for a termination, or do we need to extend the flesh for this? I think this is outside the scope of this ticket, so I have created another one (#1720).

I would still like to detect a nonzero exit code, as described in the current ticket. For example, an error which occurred after the termination file was written might cause this.
- 2014-12-10T02:33:23+00:00
Roland Haas
- removed comment
This was independently rediscovered in ~~#2113~~.
- 2018-02-02T16:08:20+00:00
Roland Haas
- removed comment
I just found a test that likely has been failing for 14 years and was not reported as such. See https://trac.einsteintoolkit.org/ticket/2186
- 2018-08-08T17:22:53+00:00
Roland Haas
- removed comment
This is a bit more complex. I just looked into RunTestUtils.pl which contains these lines:
```
  $retcode = &RunCactus($output,$test,$cmd);
  chdir $config_data->{"CCTK_DIR"};

  # Deal with the error code
  if($retcode != 0)
  {
    print "Cactus exited with error code $retcode\n";
    print "Please check the logfile $testdata->{\"$thorn $test TESTRUNDIR\"}$sep$test.log\n\n";
    $testdata->{"$thorn FAILED"} .= "$parfile ";
    $testdata->{"NFAILED"}++;
  }
```
and indeed I do see the output

Cactus exited with error code 137
Please check the logfile [...]/TEST/sim/TestArrays/arrays0.log

It turns out that these fields in $testdata are set, but it seems they have always been ignored (including in the commit "cbe7f3d1 - (HEAD) Fixed bug where par files which core dumped would pass (16 years ago)", which I have tried).
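To illustrate the intended behaviour, here is a minimal Python sketch (not the RunTestUtils.pl code) in which the exit code is stored with each test result and the summary actually consults it. The names `RunResult` and `summarise`, and the second test `wave`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    name: str
    exit_code: int
    files_compared: int = 0
    files_differing: int = 0

    @property
    def passed(self):
        # A nonzero exit code fails the test even if no files were compared.
        return self.exit_code == 0 and self.files_differing == 0


def summarise(results):
    """Split test names into (passed, failed) using the recorded exit code."""
    passed = [r.name for r in results if r.passed]
    failed = [r.name for r in results if not r.passed]
    return passed, failed


results = [
    RunResult("arrays0", exit_code=137),                # died, no output files
    RunResult("wave", exit_code=0, files_compared=3),   # hypothetical passing test
]
passed, failed = summarise(results)
print("passed:", passed)
print("failed:", failed)
```

Because the pass/fail decision is derived from the stored exit code rather than from the file comparison alone, a test with zero files to compare can no longer be counted as a success after a crash.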
- 2018-08-08T18:05:15+00:00
Roland Haas
- changed status to open
- removed comment
Pull request is here: https://bitbucket.org/cactuscode/cactus/pull-requests/52/cactus-fixed-bug-where-par-files-which/diff
- 2018-08-08T18:11:09+00:00
Roland Haas
Unless there are objections, I will commit after 2019-05-14.
- 2019-04-30T15:30:11+00:00
Roland Haas
- changed status to resolved
- edited description
Applied as git hash cbbed86e "Cactus: Fixed bug where par files which core dumped would pass" of cactus
- 2019-05-15T12:46:58+00:00
Roland Haas
- changed status to closed
- 2019-05-15T12:47:18+00:00
Roland Haas
- changed status to open
The pushed change fails to handle the case where a failing test has no output file (e.g. Carpet's 64k2.par test).
- 2019-06-30T18:17:36+00:00
Roland Haas
- changed status to resolved
Fixed in git hash 36434cf4 "Cactus: explicitly keep track of exit code of tests" of cactus
- 2019-06-30T18:39:14+00:00

Assignee: –

Type: bug

Priority: minor

Status: resolved

Component: Cactus

Milestone: –

Version: development version

Votes: 0

Watchers: 0