Opened 4 years ago

Last modified 4 years ago

#1667 confirmed defect

Configure options for ExternalLibraries/MPI should be easier to understand

Reported by: Ian Hinder
Owned by:
Priority: minor
Milestone:
Component: Cactus
Version: development version
Keywords: MPI
Cc:

Description

I am trying to compile the ET on Datura, and MPI configuration is failing. The error message is:

Running configuration script for thorn MPI:
Found mpi compiler wrapper at /cluster/openmpi/SL6/1.7.2/intel14/bin/mpic++!
MPI could not be configured.

CST error 1:
  -> Configuration script for thorn MPI returned exit code 5
     (no error message)

Finished running configuration script for thorn MPI.

The configuration script should report any errors to the user.

Attachments (1)

msgs.diff (2.1 KB) - added by Steven R. Brandt 4 years ago.
Add verbose messages and remove stray comment

Change History (24)

comment:1 Changed 4 years ago by Ian Hinder

In fact, the problem is likely that the Cactus configuration script mechanism by default swallows any unexpected errors. Unless you explicitly wrap code with begin/end error etc, the error message is lost. This is poor design, as it means you have to proactively anticipate any possible error and write code to capture it and format it. The Perl configuration script for MPI could maybe install some sort of error handling function which did this? But this is over-engineering the solution in the configure script. Cactus should handle this.

comment:2 Changed 4 years ago by Erik Schnetter

Cactus swallows most output because, most of the time, people don't want to see the output. Try running with VERBOSE=yes.

comment:3 Changed 4 years ago by Ian Hinder

Cactus should capture the output for each script, and if the script fails, the output and error messages should be available, and probably printed. If Cactus is incapable of displaying stderr when there is an error, and suppressing it when there isn't, then it could at least tell the user how to proceed. e.g., instead of printing "(no error message)" it could say:

Running configuration script for thorn MPI:
Found mpi compiler wrapper at /cluster/openmpi/SL6/1.7.2/intel14/bin/mpic++!
MPI could not be configured.

CST error 1:
  -> Configuration script for thorn MPI returned exit code 5
     Error message suppressed due to VERBOSE=no.  Additional diagnostic output
     may be available if you recompile with VERBOSE=yes.

Finished running configuration script for thorn MPI.

But I think the better solution would be to capture stdout and stderr to a file when running the script, and to display them if the script fails and doesn't return a begin/end error message.
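The capture-and-replay idea above could look roughly like the following. This is only a sketch in Python, not the Cactus implementation (the real mechanism is written in Perl). The function name `run_config_script` and the marker string `BEGIN ERROR` are assumptions made for illustration.

```python
import subprocess
import sys
import tempfile

# Assumption: a script that formats its own error prints this marker line.
ERROR_MARKER = "BEGIN ERROR"


def run_config_script(cmd, thorn):
    """Run cmd and return (exit_code, report); report is "" on success."""
    # Send stdout and stderr to one temporary file so their order is kept.
    with tempfile.TemporaryFile(mode="w+") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        log.seek(0)
        output = log.read()

    if result.returncode == 0:
        return 0, ""  # success: the captured output is discarded

    report = (
        f"Configuration script for thorn {thorn} "
        f"returned exit code {result.returncode}\n"
    )
    if ERROR_MARKER not in output:
        # The script did not format an error itself, so show everything.
        report += "Captured output:\n" + output
    return result.returncode, report


if __name__ == "__main__":
    # Stand-in for a failing configuration script.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('mpi.h not found\\n'); sys.exit(5)"]
    code, report = run_config_script(failing, "MPI")
    print(report, end="")
```

With this approach nothing extra is printed on success, so the quiet default is preserved, while a failing script can no longer end in "(no error message)".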

Changed 4 years ago by Steven R. Brandt

Attachment: msgs.diff added

Add verbose messages and remove stray comment

comment:4 Changed 4 years ago by anonymous

I am trying to compile the development version of the ET on Loewe now,
and I am seeing the following MPI error message:

Running configuration script for thorn MPI:
ERROR: MPI could not be configured: neither automatic nor manual configuration succeeded

CST error 1:
  -> Configuration script for thorn MPI returned exit code 5
     Error message: 'MPI could not be configured: neither automatic nor manual configuration succeeded'

Finished running configuration script for thorn MPI.

It is still cryptic to me since it doesn't tell me why
it failed. Note that I set 'export VERBOSE=yes' before
trying to run it and that I do see lots of output for all
other external library scripts, but that is the only
output I see from the MPI thorn.

How do I turn on more verbose output for the MPI thorn?
Is there a fundamental change on setting MPI_DIR and its
include and lib dirs? How should I change that in the loewe
and supermuc machine options?

Thanks,
Bruno

comment:5 Changed 4 years ago by Frank Löffler

What does your MPI configuration look like? Which MPI_ variables do you set, and to what?

If I read the code correctly, you likely set MPI_DIR to some directory, but no other
variables (MPI_*_DIRS or MPI_LIBS). Do you see one of the two messages

Found mpi compiler wrapper at ...

or

No mpi compiler wrapper found beneath MPI_DIR (MPI_DIR=$ENV{MPI_DIR})

or possibly

MPI_DIR is set to a directory that does not exist (MPI_DIR=$ENV{MPI_DIR}); continuing anyway

(although I am aware that you would have reported it if so - just to make sure).

comment:6 Changed 4 years ago by Erik Schnetter

This error message means that (a) thorn MPI was not able to find a usable MPI version on its own, and (b) you did not specify sufficient information in the option list for manual configuration.

The old MPI thorn did not output any more information in this case either. But it probably would have continued, leading to build errors later.

I see these options for Loewe:

MPI_DIR      = NO_BUILD
MPI_INC_DIRS = /cm/shared/apps/slurm/current/include /cm/shared/apps/mvapich2/intel-14.0.3/2.0/include /cm/shared/apps/mvapich2/intel-14.0.3/2.0/include
MPI_LIB_DIRS = /cm/shared/apps/mvapich2/intel-14.0.3/2.0/lib 
MPI_LIBS     = mpich opa mpl 

Since MPI_DIR is not specified (it is set to NO_BUILD), this means you do not want to provide a manual configuration. A manual configuration, at the very least, needs to point to a directory where MPI is installed. Since you also say NO_BUILD, thorn MPI will not build MPI. That leaves auto-configuration -- but that apparently didn't work either, probably because no MPI modules are loaded.

I suggest changing MPI_DIR to /cm/shared/apps/mvapich2/intel-14.0.3/2.0. The setting for MPI_LIB_DIRS can then be omitted.

We could output something like my explanation above, but note that this does not depend on anything that thorn MPI did or found; this is just an explanation of how MPI_DIR works. Maybe this would be a good idea, since people find it confusing.

comment:7 Changed 4 years ago by bmundim

Hi Erik,

thanks for the explanation. There is still an issue
with machines (I have in mind supermuc) with no standard
MPI directory installations. I recall a conversation with
Ian where he convinced me that MPI_INC_DIRS and MPI_LIB_DIRS
should be set independently from MPI_DIR, and MPI_DIR set
to NO_BUILD to prevent the thorn from building it. Your suggestion
of setting MPI_DIR to /cm/shared/apps/mvapich2/intel-14.0.3/2.0
might work on Loewe, but will probably fail on Supermuc,
which has a non-standard MPI installation. In any case,
let me see if it does solve for Loewe first.

Thanks,
Bruno.

comment:8 Changed 4 years ago by Erik Schnetter

No, MPI_DIR should be set to the directory where MPI is installed. You can later override this with the MPI_*_DIRS options. Setting it to NO_BUILD means that you do not want to use a pre-installed MPI, which is the wrong option for you.

comment:9 Changed 4 years ago by bmundim

Ok, but what I am trying to say is that the MPI_DIR
might not be sufficient to determine where the inc and lib
files are. For example the default mpi stack on supermuc sets
the MPI_DIR to /opt/ibmhpc/pecurrent/mpich2, MPI_INC_DIR
to -I/opt/ibmhpc/pecurrent/mpich2/intel/include64 and MPI_LIB_DIR
to -L/opt/ibmhpc/pecurrent/mpich2/intel/lib64. If MPI_INC_DIR
and MPI_LIB_DIR set in the optionlist do overwrite the ones
set by the script when MPI_DIR is set, then we should be fine.
Otherwise it will cause problems. I am working on updating supermuc configuration too. Let's see how it will behave.

Thanks,
Bruno.

comment:10 Changed 4 years ago by Ian Hinder

Bruno: I just ran into this on Hydra, which is similar to supermuc, and setting MPI_DIR to some directory solves the problem. It is never used, so you can set it to anything.

Erik: NO_BUILD means "don't use a preinstalled version". I would say this is misnamed. There might not be a single directory which corresponds to the MPI installation directory, and both MPI_LIB_DIRS and MPI_INC_DIRS need to be set. In that case, there is no meaningful content to put in MPI_DIR, as it will never be used. You seem to be defending the current system and suggesting that people should not have found it confusing. I find it very confusing.

comment:11 Changed 4 years ago by Erik Schnetter

These are the five cases that we need to cover:

  1. use an installed library at a specified location (set MPI_DIR to point to the library)
  2. always build it (set MPI_DIR to BUILD)
  3. do nothing, e.g. for Cray (set MPI_DIR to NONE)
  4. search for an installed library, build it if not found (MPI_DIR is empty, i.e. this is the default)
  5. search for an installed library, fail if not found (MPI_DIR is NO_BUILD)

I think this covers all interesting cases. The case NONE could also be handled by setting MPI_DIR and MPI_LIBS to a "fantasy" directory and "fantasy" library, but that's slightly inelegant.

Note that the user's option settings are ignored in all cases except 1.

I notice we're overloading the meaning of MPI_DIR. We could instead use a setting for MPI, which would be less confusing.

We could also rename NO_BUILD to SEARCH_AND_IGNORE_USER_SETTINGS_AND_NEVER_BUILD. That option is a bit of an outlier, because we know in this case that we need to use a system library (so it's presumably a strange system), but we also expect Cactus to find this system library.

We should probably abort with an error if user options are set and are ignored.
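As a rough illustration of the five cases and the proposed abort, here is a sketch in Python. It follows the description in this comment only. The function name `resolve_mpi_mode`, the mode labels, and the choice of which options count as "user options" are hypothetical, and the abort is the behaviour suggested above, not what thorn MPI currently does.

```python
# Hypothetical: the options treated as "user settings" in this sketch.
USER_OPTIONS = ("MPI_INC_DIRS", "MPI_LIB_DIRS", "MPI_LIBS")


def resolve_mpi_mode(options):
    """Map an option list (a dict of strings) to one of the five cases."""
    mpi_dir = options.get("MPI_DIR", "").strip()

    if mpi_dir == "BUILD":
        mode = "always build"                 # case 2
    elif mpi_dir == "NONE":
        mode = "do nothing"                   # case 3
    elif mpi_dir == "":
        mode = "search, build if not found"   # case 4 (default)
    elif mpi_dir == "NO_BUILD":
        mode = "search, fail if not found"    # case 5
    else:
        # Case 1: MPI_DIR names a directory; user options are honoured.
        return "use installed library"

    # In cases 2-5 the user's settings would be ignored, so abort instead.
    ignored = [name for name in USER_OPTIONS if options.get(name, "").strip()]
    if ignored:
        raise ValueError(
            f"MPI_DIR={mpi_dir or '(empty)'} selects '{mode}', "
            f"but these options are set and would be ignored: {', '.join(ignored)}"
        )
    return mode
```

Under this sketch, an option list that sets MPI_DIR = NO_BUILD together with MPI_LIBS is rejected with a message naming the ignored options, rather than failing later with "MPI could not be configured".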

comment:12 Changed 4 years ago by Frank Löffler

I agree that the way things are now can be a bit confusing. MPI_DIR, however, is used as directory, to direct the search to a specific directory if needed/requested - in case none of the other MPI_ variables are set. It is only not looked at (but needs to be set), if one of the others is set. In this case you are expected to set everything manually.

comment:13 Changed 4 years ago by Erik Schnetter

Ian, you seem to be wanting a new system. You are welcome to re-design the existing system. Please keep it backward compatible, and ensure it works for all the cases listed above. Also, to reduce confusion, please convert the existing external libraries so that there isn't a different mechanism for each of these. Thank you. No, I'm not defending the current system -- I'm merely explaining it.

comment:14 Changed 4 years ago by Steven R. Brandt

Sorry. I had intended that if you set NO_BUILD and then explicitly specified the directories that the configuration should have worked. I'll put up a patch later today.

comment:15 Changed 4 years ago by Roland Haas

Shorter names for cases 1-5 that are less confusing may be:

  1. MANUAL
  2. BUILD
  3. NATIVE (this is what the old MPI interface used)
  4. AUTOPROBE (this is the same name that IOUtils uses to probe for possibly existing checkpoints), or AUTO
  5. SEARCH

This avoids the confusing name NO_BUILD, which is currently used in places where NATIVE is meant, e.g. on the Cray machines (if I am not mistaken).

comment:16 Changed 4 years ago by Erik Schnetter

Steve -- please don't change the behaviour. Many of the well-tested option lists depend on it. Also, how can the current case 5 then be specified?

comment:17 Changed 4 years ago by Erik Schnetter

Roland: The difference between autoprobe and search is not clear; neither indicates whether Cactus would proceed to build MPI.

comment:18 Changed 4 years ago by Roland Haas

So would "AUTOMATIC" be clearer in implying that it may build? SEARCH does not imply building to my ears.

Addendum: may make sense to have the SEARCH method accept an MPI_DIR to give it a starting point to search, so that one can set MPI_DIR=some-mpi-root and it will find both libmpi.so and libmpich.so etc. So just setting MPI_DIR would SEARCH for an MPI installation in there, while setting MPI_DIR and MPI_LIBS would still search for the include files but fix the libraries to whatever they are. So in a blending from SEARCH to MANUAL, SEARCH would be

MPI_DIR = SEARCH

and MANUAL would be

MPI_DIR = MANUAL
MPI_COMPILE_FLAGS = -I/usr/lib/mpich/include -DMPICH_SKIP_CXX_SEEK
MPI_LINK_FLAGS = -lmpich -lpthreads -L/usr/lib/mpich/lib64

with

MPI_DIR = /usr/lib/mpich

being somewhere in between. This is kind of my personal wishlist for how I wish the system would behave.

comment:19 Changed 4 years ago by Erik Schnetter

Roland: In this case, please add a sixth case to my list above that can then be the default. Then let's find good names for all of these. Then people can set MPI to any of these cases (or do nothing to get the default), and thorn MPI will then act accordingly. User options that are set, but are always ignored for the particular choice would be errors.

comment:20 Changed 4 years ago by Ian Hinder

I wrote up some ideas two years ago (https://docs.einsteintoolkit.org/et-docs/Improving_the_treatment_of_external_libraries). Maybe we could make changes to that document?

comment:21 Changed 4 years ago by Ian Hinder

And the corresponding ticket is #1175.

comment:22 Changed 4 years ago by Ian Hinder

Status: changed from new to confirmed

comment:23 Changed 4 years ago by Erik Schnetter

Summary: changed from "ExternalLibraries/MPI should report error messages" to "Configure options for ExternalLibraries/MPI should be easier to understand"
