Opened 2 months ago

Last modified 8 weeks ago

#2158 review defect

Without MPI installed, Cactus doesn't build

Reported by: Steven R. Brandt
Owned by:
Priority: unset
Milestone:
Component: Other
Version: development version
Keywords:
Cc:

Description

I attempted to test our automatic building of Cactus on a system without MPI installed. In principle, thorn MPI should build and the system should work. Instead, I get this:

MPI: Building...
Making all in config
Making all in contrib
Making all in opal
Making all in include
Making all in asm
  CC       asm.lo
ln -s "../../opal/asm/generated/atomic-amd64-linux.s" atomic-asm.S
  CPPAS    atomic-asm.lo
  CCLD     libasm.la
../../libtool: line 6000: cd: NO_BUILD/lib: No such file or directory
libtool: link: cannot determine absolute directory name of `NO_BUILD/lib'
Makefile:1584: recipe for target 'libasm.la' failed
make[6]: *** [libasm.la] Error 1
Makefile:2153: recipe for target 'all-recursive' failed
make[5]: *** [all-recursive] Error 1
Makefile:1702: recipe for target 'all-recursive' failed
make[4]: *** [all-recursive] Error 1
Died at /home/etuser/Cactus/arrangements/ExternalLibraries/MPI/src/build.pl line 74.
/home/etuser/Cactus/arrangements/ExternalLibraries/MPI/src/make.code.deps:9: recipe for target '/home/etuser/Cactus/configs/sim/scratch/done/MPI' failed
make[3]: *** [/home/etuser/Cactus/configs/sim/scratch/done/MPI] Error 17
/home/etuser/Cactus/lib/make/make.thornlib:112: recipe for target 'make.checked' failed
make[2]: *** [make.checked] Error 2
/home/etuser/Cactus/lib/make/make.configuration:181: recipe for target '/home/etuser/Cactus/configs/sim/lib/libthorn_MPI.a' failed
make[1]: *** [/home/etuser/Cactus/configs/sim/lib/libthorn_MPI.a] Error 2
Makefile:256: recipe for target 'sim' failed
make: *** [sim] Error 2
The command '/bin/sh -c ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th' returned a non-zero code: 1

The test system is built with Docker from the following Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y libfftw3-dev libssl-dev libhdf5-dev subversion gcc curl libjpeg-turbo?-dev git make pkg-config g++ libpapi-dev patch libgsl-dev libhwloc-dev python liblapack-dev numactl gfortran
RUN adduser etuser
USER etuser
WORKDIR /home/etuser
ENV USER etuser
RUN curl -kLO https://raw.githubusercontent.com/gridaphobe/CRL/master/GetComponents
RUN chmod a+x GetComponents
RUN ./GetComponents --parallel https://bitbucket.org/einsteintoolkit/manifest/raw/master/einsteintoolkit.th
RUN echo testme > .hostname
WORKDIR /home/etuser/Cactus
RUN ./simfactory/bin/sim setup-silent
RUN ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th

Change History (7)

comment:1 Changed 2 months ago by Roland Haas

Unless this ticket is meant to serve only as a reminder for yourself, could you include the full build log obtained from:

VERBOSE=yes ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th 2>&1 | tee make.log

as well as whichever option list and machine.ini file ended up being used, please?

Otherwise this seems like poking in the dark for all of us. I would be particularly curious where the NO_BUILD is coming from (it does not, e.g., appear in generic.cfg).

comment:2 Changed 2 months ago by Steven R. Brandt

Roland, this uses generic.cfg. I didn't think I needed the full log since the included Dockerfile completely reproduces the problem. I can regenerate it, though.

comment:3 Changed 2 months ago by Roland Haas

Having instructions to reproduce is definitely a plus. Yet I think having log files would lessen the burden for those who would like to help fix it, since they can already guess what could be happening. Docker is quite a hurdle for me, for example, since I have to look up every single command for it on the internet.

Using generic.cfg and getting errors about NO_BUILD is very strange indeed.

comment:4 Changed 2 months ago by Steven R. Brandt

Status: new → review

It turns out the problem is fairly simple. The configuration of HWLOC was broken. This fixes it:

===================================================================
--- arrangements/ExternalLibraries/MPI/src/build.pl     (revision 87)
+++ arrangements/ExternalLibraries/MPI/src/build.pl     (working copy)
@@ -62,7 +62,7 @@
 print "MPI: Configuring...\n";
 chdir(${NAME});
 my $hwloc_opts = '';
-if ($ENV{HWLOC_DIR} ne '') {
+if ($ENV{HWLOC_DIR} ne '' and $ENV{HWLOC_DIR} ne 'NO_BUILD') {
     $hwloc_opts = "--with-hwloc='$ENV{HWLOC_DIR}'";
 }
 # Cannot have a memory manager with a static library on some systems

comment:5 Changed 2 months ago by Roland Haas

I see. This seems somewhat ugly, since we have to look for "magic" directory names. It seems workable, but the same check would then be required everywhere else we refer to an XXX_DIR variable.
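To illustrate what repeating that check in other thorns might look like, here is a minimal shell sketch. The helper name `dir_is_set` is hypothetical and is not part of Cactus; it only mirrors the two conditions in the patch above (non-empty and not equal to NO_BUILD).

```shell
# Hypothetical helper, not part of Cactus: succeed only if an XXX_DIR value
# is neither empty nor the magic value NO_BUILD.
dir_is_set() {
    case "${1:-}" in
        ''|NO_BUILD) return 1 ;;
    esac
    return 0
}

# Usage sketch: only pass --with-hwloc when HWLOC_DIR names a real prefix.
hwloc_opts=''
if dir_is_set "${HWLOC_DIR:-}"; then
    hwloc_opts="--with-hwloc=${HWLOC_DIR}"
fi
```

Centralizing the test in one function would keep the magic value in a single place if the convention changes later.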

I would also try to improve hwloc's logic for setting HWLOC_DIR. I.e.,

HWLOC_DIR="$(echo ${HWLOC_INC_DIRS} NO_BUILD | sed 's!/[^/]* *!!')"

seems a bit strange to me. I kind of understand what it wants to do, namely take HWLOC_INC_DIRS and remove the last part of the path (presumably "include") from the first directory found, or use NO_BUILD if HWLOC_INC_DIRS is empty. Would it make sense to try to return only a single word in HWLOC_DIR? Right now it may be "/home/sw/ NO_BUILD".
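A minimal sketch of the behaviour described here (first include directory with its last path component removed, or NO_BUILD when the list is empty) that always yields a single word. The function name `prefix_from_inc_dirs` is hypothetical, and this is not the hwloc thorn's actual code.

```shell
# Hypothetical helper, not the hwloc thorn's code.
# $1 is a space-separated list of include directories (may be empty).
prefix_from_inc_dirs() {
    # Intentional word splitting: turn the list into positional parameters.
    set -- $1
    if [ $# -eq 0 ]; then
        # No include directory known: report the magic value.
        echo NO_BUILD
    else
        # Strip the last path component (presumably "include") from the first entry.
        dirname "$1"
    fi
}

# Usage sketch:
HWLOC_DIR="$(prefix_from_inc_dirs "${HWLOC_INC_DIRS:-}")"
```

Under these assumptions, an input of "/home/sw/include /opt/include" gives "/home/sw", and an empty input gives "NO_BUILD".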

I would try for the output of

pkg-config hwloc --variable=prefix

as well. This is not foolproof as the variable does not have to exist.
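Because the variable may be missing, a caller would need a fallback. The sketch below assumes NO_BUILD is the desired fallback value; the function name `pkg_prefix_or_no_build` is hypothetical.

```shell
# Hypothetical helper: ask pkg-config for a module's "prefix" variable and
# fall back to NO_BUILD if pkg-config is missing, the module is unknown,
# or the variable is not defined.
pkg_prefix_or_no_build() {
    prefix=''
    if command -v pkg-config >/dev/null 2>&1; then
        prefix="$(pkg-config --variable=prefix "$1" 2>/dev/null || true)"
    fi
    echo "${prefix:-NO_BUILD}"
}

# Usage sketch:
HWLOC_DIR="$(pkg_prefix_or_no_build hwloc)"
```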

comment:6 Changed 8 weeks ago by Steven R. Brandt

My recollection is that we decided some time ago that these packages would use the special string NO_BUILDDIR rather than the empty string to signify that the build dir was not set. I believe the hwloc thorn is doing the correct thing when it sets that option.

comment:7 Changed 8 weeks ago by Roland Haas

Looking at what HWLOC is doing, it seems that it sets HWLOC_DIR to "NO_BUILD" if it can find all the "-l" options from pkg-config but cannot extract a directory from the CFLAGS options that pkg-config reports. That is not the same as a variable not having been set (by a user); it is an inability of hwloc to determine an installation prefix directory for itself. Admittedly, given that Ubuntu now uses /usr/lib/x86_64-linux-gnu/ instead of the traditional /usr/lib, a prefix (namely "/usr") is less useful, since one can no longer expect that, given a prefix, the include directory is prefix/include and the lib directory is prefix/lib (other than by using the compatibility directories like /usr/lib/x86_64-linux-gnu/hdf5/serial/ that are sometimes provided).

I am not sure what the currently agreed-upon convention is. Can one expect that HWLOC_DIR/lib and HWLOC_DIR/include exist and contain the libs and header files, or is all one can rely on that HWLOC_DIR is some directory associated with HWLOC (e.g. HWLOC_DIR/bin contains utilities)?

Having HWLOC_DIR set to NO_BUILD does indeed allow other ExternalLibraries that require HWLOC to take steps in case HWLOC is in a system location (which is why there is no -I in CFLAGS), and at least to abort the build if they, for whatever reason, do need a prefix directory that is valid.

Such a convention should ideally be documented in the minutes or wiki, if possible, to make sure that all thorns use the same magic value. HWLOC uses NO_BUILD right now (so a different magic value from the NO_BUILDDIR recalled in comment:6). The same logic (and magic value) should then be implemented in all other ExternalLibraries.
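As a sketch of such a guard, a dependent library's build script could test whether the reported prefix has the traditional layout before relying on it. The function name `has_traditional_layout` is hypothetical, and the layout it checks (prefix/include and prefix/lib) is exactly the assumption questioned above, not an agreed convention.

```shell
# Hypothetical check, not part of Cactus: does the given prefix exist with
# the traditional include/ and lib/ subdirectories?
has_traditional_layout() {
    case "${1:-}" in
        ''|NO_BUILD) return 1 ;;   # no usable prefix was determined
    esac
    [ -d "$1/include" ] && [ -d "$1/lib" ]
}

# Usage sketch: a thorn that truly needs a prefix could stop early, e.g.
#   has_traditional_layout "${HWLOC_DIR:-}" || { echo "no valid HWLOC prefix" >&2; exit 1; }
```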
