Opened 4 years ago

Last modified 3 months ago

#1690 review defect

The test system should treat a nonzero exit code from Cactus as a failure

Reported by: Ian Hinder Owned by:
Priority: major Milestone:
Component: Cactus Version: development version
Keywords: Cc:

Description

The test system seems to ignore the fact that Cactus exits with a nonzero exit code. It displays

Cactus exited with error code 1
Please check the logfile...

No files created in test directory

Success: 0 files identical

In the summary at the end, it then treats this as a passing test. In this case there were no test reference files and no output files, because the test (by design) does not produce any data; it simply aborts if the test fails.
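The behaviour requested here can be sketched as a small decision function. This is only an illustration in Python, not the test system's actual Perl code; the `RunResult` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one test run; all field names here are hypothetical."""
    exit_code: int        # exit code reported for the Cactus run
    files_compared: int   # number of output files compared to references
    files_differing: int  # number of compared files that differ


def classify(run: RunResult) -> str:
    """Return "pass" or "fail" for a single test run."""
    # A nonzero exit code is a failure regardless of the file comparison,
    # which covers tests that (by design) write no output files.
    if run.exit_code != 0:
        return "fail"
    if run.files_differing > 0:
        return "fail"
    return "pass"


# The case from the description: exit code 1, no files, "0 files identical".
print(classify(RunResult(exit_code=1, files_compared=0, files_differing=0)))
# A clean run of the same output-free test.
print(classify(RunResult(exit_code=0, files_compared=0, files_differing=0)))
```

With this ordering, "0 files identical" can no longer mask a nonzero exit code.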

Change History (9)

comment:1 Changed 4 years ago by Erik Schnetter

It is difficult to obtain the exit code of Cactus. Instead, we should reject tests without output, e.g. by always treating them as failing.

comment:2 Changed 4 years ago by Frank Löffler

Ticket #1689 is already open about the issue of how to determine whether a Cactus run was successful or not. We should probably treat this bug here as dependent on #1689, as the solution to #1689 might implement something that could then be used by the test system to determine whether a run was indeed successful or not. This "something" might not need to be an exit code.

comment:3 Changed 4 years ago by Ian Hinder

Priority: minor → major
Status: new → confirmed

Why is it difficult to obtain the exit code of Cactus? Is this because mpirun only returns the exit code of the root process, or because some mpirun implementations are buggy? I think if a nonzero exit code is returned, something has definitely gone wrong, though the converse may not be true.

Setting priority to major because a test will appear to have passed even if there was a fatal error when running Cactus, if that test does not have any output files.

comment:4 Changed 4 years ago by Frank Löffler

It should be possible, at least with most MPI implementations, to get a non-zero exit code from a dying simulation. It might not come from the process that actually triggered the crash, but it would likely be non-zero, and mpirun is unlikely to return non-zero for a succeeding run. Detecting this would be good.

Still, in addition, I don't see the harm in generating a "sentinel" file that is only created after Cactus successfully reaches TERMINATION. We could (and, if this works, should) then test for both.
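Testing for both conditions could look roughly like the following. This is a Python sketch under stated assumptions, not existing Cactus or test-system code: the sentinel file name `cactus.done` is made up, and Cactus does not currently write such a file.

```python
import os
import tempfile


def run_succeeded(exit_code: int, run_dir: str, sentinel: str = "cactus.done") -> bool:
    """True only if the exit code is zero AND the sentinel file exists.

    The sentinel name "cactus.done" is hypothetical.
    """
    return exit_code == 0 and os.path.isfile(os.path.join(run_dir, sentinel))


with tempfile.TemporaryDirectory() as run_dir:
    # No sentinel yet: the run never reached TERMINATION.
    print(run_succeeded(0, run_dir))
    # Simulate Cactus writing the sentinel at TERMINATION.
    open(os.path.join(run_dir, "cactus.done"), "w").close()
    print(run_succeeded(0, run_dir))
    # Sentinel present but nonzero exit code: still a failure.
    print(run_succeeded(1, run_dir))
```

Requiring both checks means that a crash before TERMINATION and an error after it are each detected.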

comment:5 Changed 4 years ago by Ian Hinder

I would like Cactus to provide a little more information about its termination than just creating a file when it reaches "Done.". For example, I would like to know the reason for termination. Was it due to TerminationTrigger running out of walltime, in which case the simulation is not complete? Or was it due to reaching cctk_final_time, in which case it is? Alternatively, if termination is from CCTK_Error, it would be good to get the error message in a file which is easy to parse, so that it can be displayed in tables of simulation statuses.

Probably we want to write a termination file, as you suggest, but we would need to define the format for the content. We should brainstorm on what other things we might want in this file. Does Cactus already "know" the reason for a termination, or do we need to extend the flesh for this? I think this is outside the scope of this ticket, so I have created another one (#1720).

I would still like to detect a nonzero exit code, as described in the current ticket. For example, an error which occurred after the termination file was written might cause this.
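Purely to illustrate what "easy to parse" could mean, here is a Python sketch that reads a hypothetical JSON termination record. No format has been defined; the keys `reason` and `message` and the reason strings are invented for this example.

```python
import json

# Hypothetical reason strings; no such list is defined by Cactus.
COMPLETE_REASONS = {"final_time_reached"}


def summarize_termination(text: str) -> dict:
    """Parse a hypothetical JSON termination record into a status row."""
    record = json.loads(text)
    reason = record.get("reason", "unknown")
    return {
        "reason": reason,
        # Only some reasons mean the simulation actually finished.
        "complete": reason in COMPLETE_REASONS,
        "message": record.get("message", ""),
    }


# Walltime ran out: terminated cleanly, but the simulation is not complete.
print(summarize_termination('{"reason": "walltime_exhausted"}'))
# Final time reached: the simulation is complete.
print(summarize_termination('{"reason": "final_time_reached"}'))
# Termination via an error, with the message preserved for status tables.
print(summarize_termination('{"reason": "error", "message": "example error text"}'))
```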

comment:6 Changed 10 months ago by Roland Haas

This was independently rediscovered in #2113.

comment:7 Changed 3 months ago by Roland Haas

I just found a test that likely has been failing for 14 years and was not reported as such. See https://trac.einsteintoolkit.org/ticket/2186

comment:8 Changed 3 months ago by Roland Haas

This is a bit more complex. I just looked into RunTestUtils.pl which contains these lines:

  $retcode = &RunCactus($output,$test,$cmd);
  chdir $config_data->{"CCTK_DIR"};

  # Deal with the error code
  if($retcode != 0)
  {
    print "Cactus exited with error code $retcode\n";
    print "Please check the logfile $testdata->{\"$thorn $test TESTRUNDIR\"}$sep$test.log\n\n";
    $testdata->{"$thorn FAILED"} .= "$parfile ";
    $testdata->{"NFAILED"}++;
  }

and indeed I do see the output

Cactus exited with error code 137
Please check the logfile [...]/TEST/sim/TestArrays/arrays0.log

It turns out that these fields in $testdata are set, but it seems they have always been ignored (including in the commit "cbe7f3d1 - (HEAD) Fixed bug where par files which core dumped would pass (16 years ago)", which I have tried).
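What honouring those fields in the summary might look like is sketched below in Python. This is not the code in RunTestUtils.pl: the dictionary mirrors the "<thorn> FAILED" and "NFAILED" keys from the quoted Perl, while the function names and the sample entry are illustrative.

```python
def failed_tests(testdata: dict, thorns: list) -> list:
    """Collect (thorn, parfile) pairs recorded under "<thorn> FAILED"."""
    failed = []
    for thorn in thorns:
        # The quoted Perl appends "$parfile " to this key, so the value is
        # a whitespace-separated list of parameter file names.
        for parfile in testdata.get(f"{thorn} FAILED", "").split():
            failed.append((thorn, parfile))
    return failed


def overall_status(testdata: dict, thorns: list) -> str:
    """Report FAILED if any failure was recorded, else PASSED."""
    if testdata.get("NFAILED", 0) > 0 or failed_tests(testdata, thorns):
        return "FAILED"
    return "PASSED"


# Illustrative data: one recorded failure for one thorn.
testdata = {"TestArrays FAILED": "arrays0 ", "NFAILED": 1}
print(failed_tests(testdata, ["TestArrays"]))
print(overall_status(testdata, ["TestArrays"]))
```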
