Carpet segfaults in Shutdown

Issue #1473 closed
Frank Löffler created an issue

For high numbers of threads, Carpet segfaults in Shutdown for the testsuite CarpetWaveToyNewRecover_test_1proc. I tried this with gcc and intel on a Debian system. The machine I run this on (spine, with the default simfactory configuration) has 2 processors with 8 cores and hyperthreading enabled. When I run this testsuite with 1 MPI process but certain numbers of threads (e.g. 9, 11, 16), I see this segfault. Which numbers trigger the bug seems to depend on the compiler, but is consistent across tries.

The backtrace I see is, for example:

1. CarpetLib::signal_handler(int)   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN9CarpetLib14signal_handlerEi+0xeb) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/backtrace.cc:542]
2. /lib/x86_64-linux-gnu/libc.so.6(+0x324f0) [??:0]
3. /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [??:0]
4. /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [??:0]
5. /lib/x86_64-linux-gnu/libc.so.6(+0x6d52b) [??:0]
6. /lib/x86_64-linux-gnu/libc.so.6(+0x76d76) [??:0]
7. /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c) [??:0]
8. mem<double>::~mem()   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN3memIdED1Ev+0x80) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/mem.cc:185]
9. data<double>::free()   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN4dataIdE4freeEv+0x4b) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/data.cc:549]
a. data<double>::~data()   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN4dataIdED1Ev+0x27) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/data.cc:487]
b. data<double>::~data()   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN4dataIdED0Ev+0x9) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/data.cc:487]
c. ggf::recompose_free(int)   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN3ggf14recompose_freeEi+0x1ae) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/ggf.cc:257]
d. ggf::~ggf()   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN3ggfD1Ev+0xde) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/ggf.cc:70]
e. gf<double>::~gf()   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN2gfIdED0Ev+0x9) [/home/knarf/ET_2013_11/configs/sim/build/CarpetLib/gf.cc:30]
f. Carpet::Shutdown(tFleshConfig*)   [/home/knarf/ET_2013_11/exe/cactus_sim(_ZN6Carpet8ShutdownEP12tFleshConfig+0x3ad) [/home/knarf/ET_2013_11/configs/sim/build/Carpet/Shutdown.cc:102]
10. /home/knarf/ET_2013_11/exe/cactus_sim(main+0x49) [/home/knarf/ET_2013_11/configs/sim/build/Cactus/main/flesh.cc:92]
11. /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [??:0]
12. /home/knarf/ET_2013_11/exe/cactus_sim() [??:0]

Marking as minor because this happens in Shutdown; however, if something actually checked for this, it might break workflows.


Comments (15)

  1. Roland Haas

    I just tried this on bethe (9 cores, ht, intel13). Using 9 OpenMP threads and 1 process I get an assertion failure from within hwloc:

        OpenMP thread 0: PU set L#{0-15} P#{0-15}
        OpenMP thread 1: PU set L#{0-15} P#{0-15}
        OpenMP thread 2: PU set L#{0-15} P#{0-15}
        OpenMP thread 3: PU set L#{0-15} P#{0-15}
        OpenMP thread 4: PU set L#{0-15} P#{0-15}
        OpenMP thread 5: PU set L#{0-15} P#{0-15}
        OpenMP thread 6: PU set L#{0-15} P#{0-15}
        OpenMP thread 7: PU set L#{0-15} P#{0-15}
        OpenMP thread 8: PU set L#{0-15} P#{0-15}
    INFO (hwloc): Setting thread CPU bindings:
    cactus_sim: /home/rhaas/ET_trunk/arrangements/ExternalLibraries/hwloc/src/system_topology.cc:525: void <unnamed>::set_bindings(hwloc_topology *, const <unnamed>::mpi_host_mapping_t &): Assertion `core_num < num_cores' failed.
    

    To run, do:

    export CCTK_TESTSUITE_RUN_COMMAND="/usr/local/apps/openmpi-1.6-intel-2013.2.146/bin/mpirun -np \$nprocs $NICE \$exe \$parfile"
    OMP_NUM_THREADS=9 make sim-testsuite
    

    then select 1 process and the CarpetWaveToyNewRecover_test_1proc test of CarpetIOHDF5. The test passes with 8, 4, and also 16 threads; it fails with 9 and 11 threads.

  2. Roland Haas

    I can now reproduce this with a single thread and CarpetWaveToyNewRecover_test_1proc.par when running without mpirun, without MALLOC_DEBUG_, and not inside valgrind (but inside gdb). The original error seems to be memory corruption (a double free) inside Carpet/CarpetLib/src/mem.cc:178, which can be traced back to Carpet/Carpet/src/Shutdown.cc:102 (the delete f in Shutdown).

    Full backtrace:

    -----------------------------------------------
    Iteration      Time |              WAVETOY::phi
                        |      minimum      maximum
    -----------------------------------------------
          124     0.816 |    0.6820495    0.8796267
          125     0.822 |    0.6820495    0.8796267
          126     0.829 |    0.6820495    0.8796267
          127     0.836 |    0.6820495    0.8796267
          128     0.842 |    0.6820495    0.8796267
    *** Error in `/mnt/data/rhaas/postdoc/gr/Zelmani/exe/cactus_bns_all': free(): invalid next size (fast): 0x0000000009e85dd0 ***
    *** Error in `/mnt/data/rhaas/postdoc/gr/Zelmani/exe/cactus_bns_all': double free or corruption (out): 0x0000000009e85e20 ***
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff485c6a5 in malloc_consolidate (av=av@entry=0x7ffff4b87620 <main_arena>) at malloc.c:4106
    4106    malloc.c: No such file or directory.
    (gdb) bt
    #0  0x00007ffff485c6a5 in malloc_consolidate (av=av@entry=0x7ffff4b87620 <main_arena>) at malloc.c:4106
    #1  0x00007ffff485d291 in _int_free (av=0x7ffff4b87620 <main_arena>, p=0xa17ce00, have_lock=0)
        at malloc.c:3998
    #2  0x00000000057699fe in mem<double>::~mem (this=0xa167e10, __in_chrg=<optimized out>)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/mem.cc:178
    #3  0x00000000057a15dc in data<double>::free (this=0xa13ab90)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/data.cc:549
    #4  0x00000000057a0e99 in data<double>::~data (this=0xa13ab90, __in_chrg=<optimized out>)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/data.cc:486
    #5  0x00000000057a0ed4 in data<double>::~data (this=0xa13ab90, __in_chrg=<optimized out>)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/data.cc:487
    #6  0x0000000005759e61 in ggf::recompose_free (this=0x9e8d620, rl=1)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/ggf.cc:257
    #7  0x0000000005758967 in ggf::~ggf (this=0x9e8d620, __in_chrg=<optimized out>)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/ggf.cc:71
    #8  0x0000000005757c15 in gf<double>::~gf (this=0x9e8d620, __in_chrg=<optimized out>)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/gf.cc:30
    #9  0x0000000005757c44 in gf<double>::~gf (this=0x9e8d620, __in_chrg=<optimized out>)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/CarpetLib/src/gf.cc:30
    #10 0x00000000056b298d in Carpet::Shutdown (fc=0x7fffffffd4d0)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/Carpet/Carpet/src/Shutdown.cc:102
    #11 0x0000000000ff059e in main (argc=2, argv=0x7fffffffd5d8)
        at /mnt/data/rhaas/postdoc/gr/Zelmani/src/main/flesh.cc:88
    
  3. Roland Haas

    OK, this is non-trivial.

    The segfault is actually due to the way Cactus and Carpet interact when initially creating the group Carpet::timing_levels, which is defined as

    CCTK_REAL timing_levels TYPE=array DIM=1 SIZE=max_refinement_levels DISTRIB=constant TAGS='checkpoint="no"'
    {
      time_level time_level_count
    } "Per-level timing information"
    

    Notice that its SIZE depends on the Carpet parameter max_refinement_levels. This is valid and often used.

    The problem occurs when we recover from a checkpoint using a very minimal recovery parfile (such as the one the test uses) that does not set Carpet::max_refinement_levels. In that case the group is created so early (basically when Carpet is activated, inside configs/bns_all/bindings/Variables/Carpet.c) that max_refinement_levels has not yet been recovered from the checkpoint (the thorns are only just being activated), so it still has its default value of 1.

    Carpet then contains a loop

        for (int rl=0; rl<max_refinement_levels; ++rl) {
          time_level      [rl] = 0.0;
          time_level_count[rl] = 0.0;
        }
    

    in Carpet/src/Timing.cc, line 202. This loop writes past the end of the allocated data for time_level: only 1 entry was allocated because max_refinement_levels was 1 when the group was created, but the parameter is then recovered to its checkpointed value of 4. Since this is just a short array we do not get an immediate segfault; instead we overwrite malloc's control data, which leads to the double free() error messages and eventually a segfault when the array's data is freed during SHUTDOWN.
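
    To make the mechanism concrete, here is a minimal standalone sketch (plain C, not Carpet code; the variable names are made up for illustration): the buffer is sized with the parameter's default value while the loop uses the recovered value, so the writes run past the end and the failure only surfaces when the buffer is freed.

        /* Buffer sized with the default parameter value, loop bound taken from the
         * recovered value: the out-of-bounds writes corrupt malloc's bookkeeping,
         * and the failure is only reported when the buffer is freed. */
        #include <stdlib.h>

        int main(void) {
          const int size_at_group_creation = 1; /* max_refinement_levels before recovery */
          const int recovered_size = 4;         /* max_refinement_levels after recovery  */

          double *time_level = malloc(size_at_group_creation * sizeof *time_level);

          for (int rl = 0; rl < recovered_size; ++rl)
            time_level[rl] = 0.0;               /* out of bounds for rl >= 1 */

          free(time_level);                     /* heap corruption typically reported here */
          return 0;
        }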

    So: this is actually dangerous and also quite hard to fix, since it basically requires Carpet to restore the grid structure not just of grid functions but also of grid arrays.

    A simple fix for the test is to add max_refinement_levels = 4 to the recovery parfile.
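
    For illustration, the workaround amounts to one extra line in the recovery parfile (a sketch; 4 is the value the checkpoint in this test was written with):

        Carpet::max_refinement_levels = 4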

    Thankfully I doubt that this bug is commonly triggered in the wild since we usually use the same parfile for checkpointing and recovery.

    I was actually hoping that this would give me some insight into a nasty checkpoint recovery bug on bluewaters, but from the looks of it, this seems unlikely ;-).

  4. Erik Schnetter

    In other words:

    There is a grid array that has a size that depends on a parameter. This parameter is not set when recovering, and should thus retain its original value. However, when the grid array is set up after recovery, the parameter has not yet been recovered, leading to an inconsistent grid structure. Is this a correct explanation?

    If this is so, then the inconsistency is between recovering parameters and setting up the grid structure.

    How does PUGH handle this? Does it store and retrieve the sizes of grid arrays, or does it always define them based on parameter values? How do we want to handle this? In principle, the ability to change the size of grid arrays upon recovering can be important.

  5. Roland Haas

    I just ran a quick test with PUGH with a parfile like this:

    ActiveThorns = "PUGH IOUtil IOHDF5 Debug IOASCII PUGHSlab"
    
    Cactus::cctk_itlast = 1
    
    IO::checkpoint_id = yes
    IOHDF5::checkpoint = yes
    
    IO::checkpoint_dir = "cppugh"
    IO::out_dir = $parfile
    
    debug::gasize = 12
    
    IOASCII::out1d_every = 1
    IOASCII::out1d_vars = "Debug::array"
    

    where the Debug thorn provides a 1d array whose size is controlled by gasize. I then comment out

    debug::gasize = 12
    

    for recovery (the default is 3). PUGH behaves the same way Carpet does: after recovery the grid array has size 3, while the parameter gasize is recovered as 12.

    If we want to allow the possibility of changing an array size during recovery, then we need to make sure that CP_RECOVER_PARAMETERS happens before the groups are created (which is very early, I think). In particular this means that the recovery code (and any thorns it depends on) may not assume that their grid functions exist (or at least that they are fully created) until after CP_RECOVER_PARAMETERS.

    Carpet cannot just resize the grid arrays after recovery since it no longer knows which sizes are affected by parameter changes.

    The alternative is what Erik mentioned: checkpoint the size of all arrays (not just those with checkpoint=yes) and re-instantiate them with that size. In that case Carpet can resize them (since the information is part of GroupDynamicData, which the driver controls), but their size can then no longer be changed at recovery time.
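
    As a rough sketch (hypothetical code, not part of Carpet or the IO thorns), the size the driver currently uses for a grid array can be queried through the flesh's GroupDynamicData interface; these extents are what would have to be written to and restored from the checkpoint. The group name Debug::array is the one from the PUGH test parfile above, and print_grid_array_size is a made-up helper name.

        /* Hypothetical sketch: query the extents the driver currently provides for a
         * grid array via the GroupDynamicData interface. */
        #include "cctk.h"

        static void print_grid_array_size(const cGH *cctkGH) {
          const int gi = CCTK_GroupIndex("Debug::array");
          cGroupDynamicData dyndata;
          if (gi >= 0 && CCTK_GroupDynamicData(cctkGH, gi, &dyndata) >= 0) {
            for (int d = 0; d < dyndata.dim; ++d)
              CCTK_VInfo(CCTK_THORNSTRING, "Debug::array extent in dimension %d: %d",
                         d, dyndata.gsh[d]);
          }
        }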

  6. Erik Schnetter

    In this case, I would leave PUGH's and Carpet's behaviour as is, and declare that either the parameter file used for recovery or the code that traverses a grid array without looking at its size is in error.

  7. Roland Haas

    I agree that PUGH's and Carpet's current behaviour is perfectly well defined. Since these errors are very hard to track down (for example, it seems that Philipp's recovery issue on bluewaters is due to this problem after all), I would also suggest that CarpetIOHDF5/IOHDF5 warn with level 1 if the array size in the checkpoint does not agree with the current one.

    The typical case where this fails (a similar thing can happen, I think, with vectors of SCALARs) is something like this:

    # interface.ccl: a grid array whose size is set by a parameter
    CCTK_REAL myscalar TYPE=ARRAY DIM=1 SIZE=numberofthingummies DISTRIB=constant "some number of things"

    /* C code: the loop bound is the parameter, not the actual size of the array */
    for (int i = 0; i < numberofthingummies; i++) { do_something(); }

    which is very common (e.g. Carpet does it, the PITTNullCode does it, ZelmaniLeak does it). Right now CarpetIOHDF5 will abort if the array grows (I think), since it can then only partially be read from the file, while shrinking the array (which is what happened in the testsuite) is not caught and can cause memory corruption.
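
    The suggested check could look roughly like the following sketch (hypothetical code, not the current CarpetIOHDF5/IOHDF5 implementation; check_recovered_size and size_in_checkpoint are made-up names, the latter standing for whatever extent the recovery code read from the file):

        /* Hypothetical sketch of the suggested consistency check: compare the extent
         * recorded in the checkpoint with the extent the driver provides now, and
         * warn at level 1 on a mismatch. */
        #include <stdlib.h>
        #include "cctk.h"

        static void check_recovered_size(const cGH *cctkGH, int group_index, int dir,
                                         int size_in_checkpoint) {
          cGroupDynamicData dyndata;
          if (CCTK_GroupDynamicData(cctkGH, group_index, &dyndata) >= 0 &&
              dyndata.gsh[dir] != size_in_checkpoint) {
            char *groupname = CCTK_GroupName(group_index);
            CCTK_VWarn(1, __LINE__, __FILE__, CCTK_THORNSTRING,
                       "group %s has size %d in the checkpoint but size %d now; "
                       "recovered data may be truncated or may corrupt memory",
                       groupname, size_in_checkpoint, dyndata.gsh[dir]);
            free(groupname);
          }
        }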

  8. Ian Hinder

    My initial opinion is that, absent any explicit steering in the parameter file, Carpet should recover the simulation exactly as it was checkpointed. We shouldn't need to re-specify parameters so that variables have the same size. But I now realise that I don't quite know how this currently works. If I recover with a parameter file which doesn't set a steerable parameter, is it interpreted as "steered to the default value", or is it recovered as "how it was when checkpointed"?

  9. Erik Schnetter

    I am confused.

    What is the size of the grid array that is recovered? Is it too small, since the parameters are recovered too late? If so, this should be an error that needs to be corrected.

    In the meantime, there should be an error (not just a warning) if Carpet's recovery detects an inconsistency.

  10. Roland Haas
    • changed status to open

    Actually, one can get the drivers to recover the parameters early. CCTK_RECOVER_PARAMETERS is special anyway and is scheduled very early (even before the thorns' startup routines are called). Attached please find a patch to the flesh that moves parameter recovery to just after the parameters from the parfile are processed. This ensures that anything afterwards sees the parameters from the checkpoint (for example the cache setup routine, which looks at some Cactus parameters, and, importantly, CCTKi_BindingsVariablesInitialise).

    I believe this is too invasive to apply before the release, though. I'd suggest documenting this bug and applying the patch after the release.
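
    The intended startup ordering could be sketched roughly as follows (a hypothetical, self-contained illustration, not the actual flesh code or the attached patch; apart from CCTKi_BindingsVariablesInitialise, which corresponds to the group-creation step, all names are placeholders):

        /* Hypothetical sketch of the startup ordering the patch aims for: recover
         * checkpointed parameters right after the parameter file is processed, so
         * that group creation already sees the recovered values. */
        #include <stdio.h>

        static void process_parameter_file(void)   { puts("parameters from the parfile"); }
        static void recover_parameters(void)       { puts("parameters from the checkpoint"); }
        static void create_groups(void)            { puts("groups sized with recovered parameters"); }
        static void run_startup_and_schedule(void) { puts("startup routines, schedule traversal"); }

        int main(void) {
          process_parameter_file();
          recover_parameters();        /* moved here by the patch */
          create_groups();             /* CCTKi_BindingsVariablesInitialise in the real flesh */
          run_startup_and_schedule();
          return 0;
        }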
