MPI error from Carpet with QuasiLocalMeasures: "MPI_SUM is not defined for non-intrinsic datatypes"

Issue #495 closed
Ian Hinder created an issue

In the development (Mercurial) version of Carpet, but not the stable (Git) version, QuasiLocalMeasures fails with the following error message:

...
INFO (QuasiLocalMeasures): Landau-Lifshitz angular momentum x: 0.00000
INFO (QuasiLocalMeasures): Landau-Lifshitz angular momentum y: 0.00000
INFO (QuasiLocalMeasures): Landau-Lifshitz angular momentum z: 0.00000
[sl-18:26396] An error occurred in MPI_Reduce: the reduction operation MPI_SUM is not defined for non-intrinsic datatypes
[sl-18:26396] on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[sl-18:26396] MPI_ERR_OP: invalid reduce operation
[sl-18:26396] MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

This affects the QuasiLocalMeasures test suite, but was also reported and discussed on the ET mailing list:

http://lists.einsteintoolkit.org/pipermail/users/2011-May/001107.html

Additional debugging was suggested to try to locate the reason for the error.

(Reporting against Carpet, even though it might be a problem in QuasiLocalMeasures, because it works with the stable Carpet and this is a possible blocker for promoting the development Carpet to stable.)

Comments (12)

  1. Erik Schnetter

    This is most likely not a problem in QLM, since that thorn only uses intrinsic MPI datatypes. I conjecture that this is a problem in CarpetReduce, most likely uncovered by a recent version of an MPI library that is more strict about checking standard conformance.

    I am still waiting for more information. Please send a stack backtrace or similar that helps identify which routine uses which MPI datatype when this error occurs.

  2. Erik Schnetter

    The only non-intrinsic datatypes that Carpet uses are those defined for complex numbers. Is there a reduction operation called for complex numbers? If so, from where -- is this output, or is this in the thorn itself?

  3. Eloisa Bentivegna

    It is output, not QLM. What is the fix: changing CarpetReduce to handle complex numbers in a standard-compliant form, or making sure QLM uses intrinsic datatypes only?

  4. Erik Schnetter

    The work-around is to disable CarpetScalar output for complex quantities.

    The solution is to add support for Carpet's complex MPI datatypes to CarpetReduce. MPI does not offer C datatypes for complex numbers. Maybe we can instead use Fortran datatypes -- they should be the same. If so, the code in Carpet which creates new MPI datatypes for complex numbers should be modified to use the existing Fortran complex number MPI datatypes instead.
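As an illustration only (not code from Carpet or from this thread), here is a minimal C++ sketch of a different way to keep a complex sum on intrinsic datatypes: view each complex value as two doubles and sum those. The helper name `sum_as_doubles` and the `per_rank` buffers are hypothetical; the outer loop stands in for what an `MPI_Reduce` with `MPI_SUM` on `MPI_DOUBLE` and a doubled element count would do across processes. It assumes the complex values are stored as `std::complex<double>`, whose layout as a (real, imaginary) pair of doubles is guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: sums the buffers contributed by each "rank",
// treating every complex number as two consecutive doubles.
// In a real MPI program the outer loop would be replaced by a single
// reduction over 2*n doubles (assumption: MPI_DOUBLE with MPI_SUM).
std::vector<cplx> sum_as_doubles(const std::vector<std::vector<cplx>>& per_rank) {
    if (per_rank.empty()) return {};
    const std::size_t n = per_rank[0].size();
    std::vector<cplx> result(n);  // value-initialised to (0, 0)

    // std::complex<double> may be accessed as an array of two doubles.
    double* out = reinterpret_cast<double*>(result.data());
    for (const auto& buf : per_rank) {
        assert(buf.size() == n);  // all ranks must contribute n values
        const double* in = reinterpret_cast<const double*>(buf.data());
        for (std::size_t i = 0; i < 2 * n; ++i) {
            out[i] += in[i];  // real and imaginary parts add independently
        }
    }
    return result;
}
```

This works for sums because complex addition is component-wise; it would not carry over to a product reduction, where real and imaginary parts mix.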

  5. Eloisa Bentivegna

    A related question: what has changed in CarpetReduce between the git and mercurial versions? The error appears when switching from the former to the latter, with the exact same Open MPI implementation (1.5.4).

  6. Erik Schnetter

    It is possible that CarpetReduce changed -- I don't recall.

    Can we, for the release, state that one should not reduce complex numbers for output?

  7. Ian Hinder reporter

    I'm attaching a list of all the commits in the Mercurial version which touched Carpet/CarpetReduce. You can generate this list using

    hg log -r 5418b354a3ab:tip Carpet/CarpetReduce
