CarpetLib::check_communication_schedule fails in 2D output routines

Issue #920 new
Roland Haas created an issue

The attached parameter file fails with an assert() in CarpetLib:

CarpetLib/src/commstate.cc:118: void comm_state::step(): Assertion `recvcount.at(proc * dist::c_ndatatypes() + type) == (typebufs.at(type).in_use ? int(typebufs.at(type).procbufs.at(proc).recvbufsize) : 0)' failed

The failure occurs only with more than 2 processors, for both HDF5 2D and ASCII 2D output. Erik understands the issue; this ticket serves as a reminder to eventually fix it (one solution is apparently to teach commstate which processes take part in a given communication).

No actual data-producing code is wrong; only the test is.


Comments (2)

  1. Erik Schnetter

    To clarify: This test checks whether all MPI processes send and receive consistent amounts of data. It uses MPI_Alltoall to exchange this information, which is otherwise implicitly obtained from the communication schedule and not explicitly checked.

    However, CarpetIOASCII and CarpetIOHDF5 use an optimised communication strategy where only pairs of processes communicate; in particular, process N sends data to process 0 for output. In this case, the MPI_Alltoall obviously fails.

    The solution is to tell the commstate class which processes are going to be involved.
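The mismatch described above can be illustrated with a small serial sketch. It does not use MPI or any CarpetLib code: the count exchange is modelled as a plain table lookup, and all names (`Counts`, `counts_consistent`, `make_example`) as well as the numbers are hypothetical. The assumption is a 3-process run in which only process 2 sends to process 0 in the current step, while process 1 holds counts that belong to a different exchange.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[p][q] is a count held by process p about its peer q.
using Counts = std::vector<std::vector<int>>;

// Serial stand-in for the count exchange (no MPI involved): returns true if,
// for every pair (p, q) of the listed processes, what p expects to receive
// from q equals what q plans to send to p.
bool counts_consistent(const Counts& sendcount, const Counts& recvcount,
                       const std::vector<std::size_t>& procs) {
  for (std::size_t p : procs)
    for (std::size_t q : procs)
      if (recvcount.at(p).at(q) != sendcount.at(q).at(p))
        return false;
  return true;
}

struct Example {
  Counts send, recv;
};

// Hypothetical output step on 3 processes (numbers are made up):
// process 2 sends 5 items to process 0, and process 0 expects exactly those.
// Process 1 is not part of this exchange; its planned 3 items for process 0
// belong to a different exchange that process 0 is not expecting here.
Example make_example() {
  Example e;
  e.send = {{0, 0, 0}, {3, 0, 0}, {5, 0, 0}};
  e.recv = {{0, 0, 5}, {0, 0, 0}, {0, 0, 0}};
  return e;
}
```

Under these assumptions, `counts_consistent(e.send, e.recv, {0, 1, 2})` reports a mismatch, while restricting the check to the two processes that actually take part, `{0, 2}`, succeeds. This is only one reading of what telling commstate about the involved processes might amount to.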

  2. Roland Haas reporter

    Probably for the same reason, the parameter file also hangs in OutputGH if carpetlib::barrier_between_stages = "yes" is set. The processes hang in commstate.cc, in the barrier 404924393 in comm_state::step and in the MPI_Waitall in state_empty_recv_buffers (no line numbers, since I put in so many printf statements that I am sure to miscount).
