CarpetLib::barriers fails with multipatch

Issue #970 new
Roland Haas created an issue

This happens during the initial storage allocation, where the barriers in dh::add and gdata::gdata do not match up across processes. The underlying reason seems to be that Carpet/StorageCrease has a loop of (schematically) the following form around line 93 of Storage.cc:

{{ for (m = 0; m < maps; ++m) {
     new gf<T> (...);  // calls dh::add
     arrdata.AT(group).AT(m).data.AT(var)->set_timelevels (...);  // eventually calls gdata::gdata
   } }}

This causes a barrier error when one process owns a component on map 0 but another only owns a component on map 1, since in this case the first one will encounter the barriers as:

dh::add (map 0) gdata::gdata (component on map 0) dh::add (map 1)

while the other process sees:

dh::add (map 0) gdata::gdata (component on map 0) dh::add (map 1)

The actual error is then (where there are some extra printf() lines that I added):

INFO (Carpet): [tl=0] Starting initialisation
INFO (Carpet): [tl=0] GroupStorageIncrease
INFO (Carpet): [tl=0] ADMBASE::SHIFT_STATE: increase to 1
dh::add added varindex 0: shift_state
CHECKPOINT: processor 16, file /work/00945/rhaas/Zelmani/arrangements/Carpet/CarpetLib/src/dh.cc, line 2176
Adding varindex 0: shift_state
INFO (Carpet): [tl=0] ADMBASE::DTLAPSE_STATE: increase to 1
dh::add added varindex 1: dtlapse_state
CHECKPOINT: processor 16, file /work/00945/rhaas/Zelmani/arrangements/Carpet/CarpetLib/src/dh.cc, line 2176
Adding varindex 1: dtlapse_state
INFO (Carpet): [tl=0] ADMBASE::DTSHIFT_STATE: increase to 1
dh::add added varindex 2: dtshift_state
CHECKPOINT: processor 16, file /work/00945/rhaas/Zelmani/arrangements/Carpet/CarpetLib/src/dh.cc, line 2176
Adding varindex 2: dtshift_state
INFO (Carpet): [tl=0] ADMBASE::LAPSE: increase to 1
dh::add added varindex 15: alp
CHECKPOINT: processor 16, file /work/00945/rhaas/Zelmani/arrangements/Carpet/CarpetLib/src/dh.cc, line 2176
dh::add added varindex 15: alp
CHECKPOINT: processor 16, file /work/00945/rhaas/Zelmani/arrangements/Carpet/CarpetLib/src/dh.cc, line 2176
WARNING level 0 in thorn CarpetLib processor 16 host c305-212.ls4.tacc.utexas.edu
  (line 251 of /work/00945/rhaas/Zelmani/arrangements/Carpet/CarpetLib/src/dist.cc):
  -> Wrong id for Barrier "CarpetLib::dist::checkpoint": expected 506880075d, found 783988953d

This looks like something that is rather hard to fix in general for little benefit (i.e. it affects only debugging runs with multipatch). Should this even be reported (if only so that there is official notice that this is known behaviour)? Should the fix be just a warning if Carpet encounters this situation?
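A warning of the kind suggested here would need some test for the problematic layout. The following is only a sketch under a simplified model, not Carpet code: it assumes each process reaches one gdata::gdata barrier per locally owned component, so the barrier order can diverge whenever two processes own different numbers of components on some map. The function name and the `components` table are hypothetical; a real check would first have to gather this information across processes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Carpet.
// components[p][m] is the number of components process p owns on map m.
// Under the simplified model (one barrier per owned component), the
// barrier order is the same on all processes only if every process has
// the same per-map component counts as process 0.
bool barrier_order_can_diverge(
    const std::vector<std::vector<int>>& components) {
  for (std::size_t p = 1; p < components.size(); ++p) {
    if (components[p] != components[0]) {
      return true;  // process p would see a different barrier sequence
    }
  }
  return false;  // all processes agree (or there are fewer than two)
}
```

With the layout described above (process 0 owns a component only on map 0, process 1 only on map 1) the table is `{{1, 0}, {0, 1}}` and the check reports a possible divergence.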

Comments (2)

  1. Roland Haas reporter
    • removed comment

    Replying to [ticket:970 rhaas]:

    This happens during the initial storage allocation where there are mismatching barriers in dh::add and > this causes a barrier error when one process owns a component on map 0 but another only owns a component on map 1, since in this case the first one will encounter the barriers as:

    dh::add (map 0) gdata::gdata (component on map 0) dh::add (map 1)

    while the other process sees:

    dh::add (map 0) gdata::gdata (component on map 0) dh::add (map 1)

    This is wrong (if both processes really saw the same sequence, the barriers would actually match). The second process instead sees:

    dh::add (map 0) dh::add (map 1) gdata::gdata (component on map 1)
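The corrected sequences can be reproduced with a small stand-alone model. This is a sketch, not Carpet code: it assumes one dh::add barrier per map and one gdata::gdata barrier per map on which the process owns a component, as in the schematic loop from the issue description. The function names and the `owns_map` flags are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the schematic allocation loop.
// owns_map[m] says whether this process owns a component on map m.
// Returns the barriers in the order this process would reach them.
std::vector<std::string> barrier_sequence(const std::vector<bool>& owns_map) {
  std::vector<std::string> seq;
  for (std::size_t m = 0; m < owns_map.size(); ++m) {
    // Every process reaches the dh::add barrier for every map.
    seq.push_back("dh::add (map " + std::to_string(m) + ")");
    // Only an owning process reaches the gdata::gdata barrier.
    if (owns_map[m]) {
      seq.push_back("gdata::gdata (component on map " + std::to_string(m) + ")");
    }
  }
  return seq;
}

// Index of the first position where the two sequences disagree,
// or -1 if they are identical.
int first_mismatch(const std::vector<std::string>& a,
                   const std::vector<std::string>& b) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) return static_cast<int>(i);
  }
  return a.size() == b.size() ? -1 : static_cast<int>(n);
}
```

For a process owning a component only on map 0 and another owning one only on map 1, the two sequences agree on the first barrier and differ at the second, which is where a barrier id check would be expected to fail.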

  2. Erik Schnetter
    • removed comment

    The code and the parameter enabling these barriers should be annotated, so that people know this is the expected behaviour.
