Configure options for ExternalLibraries/MPI should be easier to understand

Issue #1667 open
Ian Hinder created an issue

I am trying to compile the ET on Datura, and MPI configuration is failing. The error message is:

Running configuration script for thorn MPI:
Found mpi compiler wrapper at /cluster/openmpi/SL6/1.7.2/intel14/bin/mpic++!
MPI could not be configured.

CST error 1:
  -> Configuration script for thorn MPI returned exit code 5
     (no error message)

Finished running configuration script for thorn MPI.

The configuration script should report any errors to the user.

Keyword: MPI

Comments (23)

  1. Ian Hinder reporter

    In fact, the problem is likely that the Cactus configuration script mechanism by default swallows any unexpected errors. Unless you explicitly wrap the output in begin/end error markers, the error message is lost. This is poor design, as it means you have to anticipate every possible error and write code to capture and format it. The Perl configuration script for MPI could perhaps install some sort of error-handling function that does this, but that would be over-engineering the configure script. Cactus should handle this.

  2. Erik Schnetter

    Cactus swallows most output because, most of the time, people don't want to see the output. Try running with VERBOSE=yes.

  3. Ian Hinder reporter

    Cactus should capture the output for each script, and if the script fails, the output and error messages should be available, and probably printed. If Cactus is incapable of displaying stderr when there is an error, and suppressing it when there isn't, then it could at least tell the user how to proceed. e.g., instead of printing "(no error message)" it could say:

    Running configuration script for thorn MPI:
    Found mpi compiler wrapper at /cluster/openmpi/SL6/1.7.2/intel14/bin/mpic++!
    MPI could not be configured.
    
    CST error 1:
      -> Configuration script for thorn MPI returned exit code 5
         Error message suppressed due to VERBOSE=no.  Additional diagnostic output
         may be available if you recompile with VERBOSE=yes.
    
    Finished running configuration script for thorn MPI.
    

    But I think the better solution would be to capture stdout and stderr to a file when running the script, and to display them if the script fails without returning a begin/end error message.
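
The capture-and-replay idea proposed above could look roughly like the following. This is a minimal Python sketch for illustration only, not Cactus's actual mechanism: the function name `run_config_script` and the wording of the printed messages are hypothetical, and the sketch does not check for begin/end error markers in the output.

```python
import subprocess
import sys
import tempfile


def run_config_script(argv, thorn):
    """Run a configuration script and keep everything it prints.

    Returns (exit_code, captured_output). The captured output is shown
    to the user only when the script exits with a non-zero code.
    """
    with tempfile.TemporaryFile(mode="w+") as log:
        # Route stderr into stdout so both streams land in one file, in order.
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
        log.seek(0)
        output = log.read()

    if result.returncode != 0:
        print(f"Configuration script for thorn {thorn} "
              f"returned exit code {result.returncode}")
        print("Captured output:")
        print(output if output.strip() else "(script produced no output)")
    return result.returncode, output


if __name__ == "__main__":
    # Hypothetical failing "script": writes to stderr and exits with code 5.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('mpicc not usable\\n'); sys.exit(5)"]
    run_config_script(failing, "MPI")
```

With this shape, a successful script stays quiet, while a failing one shows its full output without the user having to rerun with VERBOSE=yes.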

  4. anonymous

    I am trying to compile the development version of the ET on Loewe now, and I am seeing the following MPI error message:

    Running configuration script for thorn MPI:
    ERROR: MPI could not be configured: neither automatic nor manual configuration succeeded
    
    CST error 1:
      -> Configuration script for thorn MPI returned exit code 5
         Error message: 'MPI could not be configured: neither automatic nor manual configuration succeeded'
    
    Finished running configuration script for thorn MPI.
    

    It is still cryptic to me since it doesn't tell me why it failed. Note that I set 'export VERBOSE=yes' before trying to run it and that I do see lots of output for all other external library scripts, but that is the only output I see from the MPI thorn.

    How do I turn on more verbose output for the MPI thorn? Has there been a fundamental change in how MPI_DIR and its include and lib dirs are set? How should I change that in the Loewe and Supermuc machine options?

    Thanks, Bruno

  5. Frank Löffler

    What does your MPI configuration look like? Which MPI_ variables do you set, and to what?

    If I read the code correctly, you likely set MPI_DIR to some directory, but no other variables (MPI_*_DIRS or MPI_LIBS). Do you see one of the two messages

    Found mpi compiler wrapper at ...
    

    or

    No mpi compiler wrapper found beneath MPI_DIR (MPI_DIR=$ENV{MPI_DIR})
    

    or possibly

    MPI_DIR is set to a directory that does not exist (MPI_DIR=$ENV{MPI_DIR}); continuing anyway
    

    (although I am aware that you would have reported it if so - just to make sure).

  6. Erik Schnetter

    This error message means that (a) thorn MPI was not able to find a usable MPI version on its own, and (b) you did not specify sufficient information in the option list for manual configuration.

    The old MPI thorn did not output any more information in this case either. But it probably would have continued, leading to build errors later.

    I see these options for Loewe:

    MPI_DIR      = NO_BUILD
    MPI_INC_DIRS = /cm/shared/apps/slurm/current/include /cm/shared/apps/mvapich2/intel-14.0.3/2.0/include /cm/shared/apps/mvapich2/intel-14.0.3/2.0/include
    MPI_LIB_DIRS = /cm/shared/apps/mvapich2/intel-14.0.3/2.0/lib 
    MPI_LIBS     = mpich opa mpl
    

    Since MPI_DIR is not specified (it is set to NO_BUILD), this means you do not want to provide a manual configuration. A manual configuration, at the very least, needs to point to a directory where MPI is installed. Since you also say NO_BUILD, thorn MPI will not build MPI. That leaves auto-configuration -- but that apparently didn't work either, probably because no MPI modules are loaded.

    I suggest changing MPI_DIR to /cm/shared/apps/mvapich2/intel-14.0.3/2.0. The setting for MPI_LIB_DIRS can then be omitted.

    We could output something like my explanation above, but note that this does not depend on anything that thorn MPI did or found; this is just an explanation of how MPI_DIR works. Maybe this would be a good idea, since people find it confusing.

  7. Bruno Mundim

    Hi Erik,

    thanks for the explanation. There is still an issue with machines (I have Supermuc in mind) whose MPI installations do not follow a standard directory layout. I recall a conversation with Ian where he convinced me that MPI_INC_DIRS and MPI_LIB_DIRS should be set independently of MPI_DIR, and that MPI_DIR should be set to NO_BUILD to prevent the thorn from building MPI. Your suggestion of setting MPI_DIR to /cm/shared/apps/mvapich2/intel-14.0.3/2.0 might work on Loewe, but will probably fail on Supermuc, which has a non-standard MPI installation. In any case, let me see first whether it solves the problem on Loewe.

    Thanks, Bruno.

  8. Erik Schnetter

    No, MPI_DIR should be set to the directory where MPI is installed. You can later override this with the MPI_*_DIRS options. Setting it to NO_BUILD means that you do not want to use a pre-installed MPI, which is the wrong option for you.

  9. Bruno Mundim

    Ok, but what I am trying to say is that MPI_DIR might not be sufficient to determine where the include and lib files are. For example, the default MPI stack on Supermuc sets MPI_DIR to /opt/ibmhpc/pecurrent/mpich2, MPI_INC_DIR to -I/opt/ibmhpc/pecurrent/mpich2/intel/include64 and MPI_LIB_DIR to -L/opt/ibmhpc/pecurrent/mpich2/intel/lib64. If the MPI_INC_DIR and MPI_LIB_DIR set in the option list override the ones the script derives from MPI_DIR, then we should be fine. Otherwise it will cause problems. I am working on updating the Supermuc configuration too. Let's see how it behaves.

    Thanks, Bruno.

  10. Ian Hinder reporter

    Bruno: I just ran into this on Hydra, which is similar to supermuc, and setting MPI_DIR to some directory solves the problem. It is never used, so you can set it to anything.

    Erik: NO_BUILD means "don't use a preinstalled version". I would say this is misnamed. There might not be a single directory which corresponds to the MPI installation directory, and both MPI_LIB_DIRS and MPI_INC_DIRS need to be set. In that case, there is no meaningful content to put in MPI_DIR, as it will never be used. You seem to be defending the current system and suggesting that people should not have found it confusing. I find it very confusing.

  11. Erik Schnetter

    These are the five cases that we need to cover:

    1. use an installed library at a specified location (set MPI_DIR to point to the library)
    2. always build it (set MPI_DIR to BUILD)
    3. do nothing, e.g. for Cray (set MPI_DIR to NONE)
    4. search for an installed library, build it if not found (MPI_DIR is empty, i.e. this is the default)
    5. search for an installed library, fail if not found (MPI_DIR is NO_BUILD)

    I think this covers all interesting cases. The case NONE could also be handled by setting MPI_DIR and MPI_LIBS to a "fantasy" directory and "fantasy" library, but that's slightly inelegant.

    Note that the user's option settings are ignored in all cases except 1.

    I notice we're overloading the meaning of MPI_DIR. We could instead use a setting for MPI, which would be less confusing.

    We could also rename NO_BUILD to SEARCH_AND_IGNORE_USER_SETTINGS_AND_NEVER_BUILD. That option is a bit of an outlier, because we know in this case that we need to use a system library (so it's presumably a strange system), but we also expect Cactus to find this system library.

    We should probably abort with an error if user options are set and are ignored.
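
The five cases listed above, together with the suggestion to abort when user options are set but ignored, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the thorn's actual configure script: the mode names returned by `classify_mpi_dir` and the helper `check_options` are hypothetical, and the list of "manual" options is limited to the three named in this thread.

```python
# Options that only take effect in case 1 (installed library at a given location).
MANUAL_OPTIONS = ("MPI_INC_DIRS", "MPI_LIB_DIRS", "MPI_LIBS")


def classify_mpi_dir(value):
    """Map an MPI_DIR setting to one of the five cases (hypothetical names)."""
    value = (value or "").strip()
    if value == "":
        return "search-then-build"   # case 4: the default
    if value == "BUILD":
        return "build"               # case 2: always build
    if value == "NONE":
        return "none"                # case 3: do nothing, e.g. for Cray
    if value == "NO_BUILD":
        return "search-only"         # case 5: search, fail if not found
    return "manual"                  # case 1: MPI_DIR names a directory


def check_options(options):
    """Return the selected case, or raise if options are set that it ignores."""
    mode = classify_mpi_dir(options.get("MPI_DIR"))
    ignored = [name for name in MANUAL_OPTIONS
               if (options.get(name) or "").strip()]
    if mode != "manual" and ignored:
        raise ValueError(
            f"MPI_DIR selects '{mode}', so these options would be ignored: "
            + ", ".join(ignored))
    return mode


if __name__ == "__main__":
    # Shaped like the Loewe option list quoted earlier in the thread.
    loewe = {"MPI_DIR": "NO_BUILD", "MPI_LIBS": "mpich opa mpl"}
    try:
        check_options(loewe)
    except ValueError as err:
        print(f"ERROR: {err}")
```

Under this check, an option list that combines MPI_DIR = NO_BUILD with explicit MPI_LIBS fails immediately with an explanation, instead of falling through to a failed search.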

  12. Frank Löffler

    I agree that the way things are now can be a bit confusing. MPI_DIR, however, is used as a directory: it directs the search to a specific location if needed or requested, provided none of the other MPI_ variables are set. It is not looked at (but still needs to be set) only when one of the others is set. In that case you are expected to set everything manually.

  13. Erik Schnetter

    Ian, you seem to be wanting a new system. You are welcome to re-design the existing system. Please keep it backward compatible, and ensure it works for all the cases listed above. Also, to reduce confusion, please convert the existing external libraries so that there isn't a different mechanism for each of these. Thank you. No, I'm not defending the current system -- I'm merely explaining it.

  14. Steven R. Brandt

    Sorry. I had intended that if you set NO_BUILD and then explicitly specified the directories, the configuration would work. I'll put up a patch later today.

  15. Roland Haas

    Shorter names for cases 1-5 that are less confusing may be:

    1. MANUAL
    2. BUILD
    3. NATIVE (this is what the old MPI interface used)
    4. AUTOPROBE (this is the same name that IOUtils uses to probe for possibly existing checkpoints), or AUTO
    5. SEARCH

    This avoids the confusing name NO_BUILD, which is currently used in places where NATIVE is meant, e.g. on the Cray machines (if I am not mistaken).

  16. Erik Schnetter

    Steve -- please don't change the behaviour. Many of the well-tested option lists depend on it. Also, how can the current case 5 then be specified?

  17. Erik Schnetter

    Roland: The difference between autoprobe and search is not clear; neither indicates whether Cactus would proceed to build MPI.

  18. Roland Haas

    So would "AUTOMATIC" be clearer in implying that it may build? SEARCH does not imply building to my ears.

    Addendum: may make sense to have the SEARCH method accept an MPI_DIR to give it a starting point to search, so that one can set MPI_DIR=some-mpi-root and it will find both libmpi.so and libmpich.so etc. So just setting MPI_DIR would SEARCH for an MPI installation in there, while setting MPI_DIR and MPI_LIBS would still search for the include files but fix the libraries to whatever they are. So in a blending from SEARCH to MANUAL, SEARCH would be

    MPI_DIR = SEARCH

    and MANUAL would be

    MPI_DIR = MANUAL
    MPI_COMPILE_FLAGS = -I/usr/lib/mpich/include -DMPICH_SKIP_CXX_SEEK
    MPI_LINK_FLAGS = -lmpich -lpthreads -L/usr/lib/mpich/lib64

    with MPI_DIR = /usr/lib/mpich

    being somewhere in between. This is my personal wishlist for how I would like the system to behave.

  19. Erik Schnetter

    Roland: In this case, please add a sixth case to my list above that can then be the default. Then let's find good names for all of these. Then people can set MPI to any of these cases (or do nothing to get the default), and thorn MPI will then act accordingly. User options that are set, but are always ignored for the particular choice would be errors.
