Without MPI installed, Cactus doesn't build

Issue #2158 closed
Steven R. Brandt created an issue

I tested our automatic build of Cactus on a system without MPI installed. In principle, the MPI thorn should then build MPI itself and the system should work. Instead, I get this:

MPI: Building...
Making all in config
Making all in contrib
Making all in opal
Making all in include
Making all in asm
  CC       asm.lo
ln -s "../../opal/asm/generated/atomic-amd64-linux.s" atomic-asm.S
  CPPAS    atomic-asm.lo
  CCLD     libasm.la
../../libtool: line 6000: cd: NO_BUILD/lib: No such file or directory
libtool: link: cannot determine absolute directory name of `NO_BUILD/lib'
Makefile:1584: recipe for target 'libasm.la' failed
make[6]: *** [libasm.la] Error 1
Makefile:2153: recipe for target 'all-recursive' failed
make[5]: *** [all-recursive] Error 1
Makefile:1702: recipe for target 'all-recursive' failed
make[4]: *** [all-recursive] Error 1
Died at /home/etuser/Cactus/arrangements/ExternalLibraries/MPI/src/build.pl line 74.
/home/etuser/Cactus/arrangements/ExternalLibraries/MPI/src/make.code.deps:9: recipe for target '/home/etuser/Cactus/configs/sim/scratch/done/MPI' failed
make[3]: *** [/home/etuser/Cactus/configs/sim/scratch/done/MPI] Error 17
/home/etuser/Cactus/lib/make/make.thornlib:112: recipe for target 'make.checked' failed
make[2]: *** [make.checked] Error 2
/home/etuser/Cactus/lib/make/make.configuration:181: recipe for target '/home/etuser/Cactus/configs/sim/lib/libthorn_MPI.a' failed
make[1]: *** [/home/etuser/Cactus/configs/sim/lib/libthorn_MPI.a] Error 2
Makefile:256: recipe for target 'sim' failed
make: *** [sim] Error 2
The command '/bin/sh -c ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th' returned a non-zero code: 1

The test system is built with the following Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y libfftw3-dev libssl-dev libhdf5-dev subversion gcc curl libjpeg-turbo?-dev git make pkg-config g++ libpapi-dev patch libgsl-dev libhwloc-dev python liblapack-dev numactl gfortran
RUN adduser etuser
USER etuser
WORKDIR /home/etuser
ENV USER etuser
RUN curl -kLO https://raw.githubusercontent.com/gridaphobe/CRL/master/GetComponents
RUN chmod a+x GetComponents
RUN ./GetComponents --parallel https://bitbucket.org/einsteintoolkit/manifest/raw/master/einsteintoolkit.th
RUN echo testme > .hostname
WORKDIR /home/etuser/Cactus
RUN ./simfactory/bin/sim setup-silent
RUN ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th

Keyword: None

Comments (10)

  1. Roland Haas

    Unless this ticket is only meant to serve as a reminder for yourself, could you include the full build log obtained from:

    VERBOSE=yes ./simfactory/bin/sim build -j8 --thornlist ../einsteintoolkit.th 2>&1 | tee make.log
    

    as well as whichever option list and machine.ini ended up being used, please?

    Otherwise this seems like poking in the dark for all of us. I would be particularly curious where NO_BUILD is coming from (it does not, e.g., appear in generic.cfg).

  2. Steven R. Brandt reporter

    Roland, this uses generic.cfg. I didn't think I needed the full log since the included Dockerfile completely reproduces the problem. I can regenerate it, though.

  3. Roland Haas
    • removed comment

    Having instructions to reproduce is definitely a plus. Still, log files would, I think, be less of a burden for those who would like to help fix it, since from them one can often already guess what is happening. Docker is quite a hurdle for me, for example, since I have to look up every single command for it on the internet.

    Using generic.cfg and getting errors about NO_BUILD is very strange indeed.

  4. Steven R. Brandt reporter
    • changed status to open

    It turns out the problem is fairly simple. The configuration of HWLOC was broken. This fixes it:

    ===================================================================
    --- arrangements/ExternalLibraries/MPI/src/build.pl     (revision 87)
    +++ arrangements/ExternalLibraries/MPI/src/build.pl     (working copy)
    @@ -62,7 +62,7 @@
     print "MPI: Configuring...\n";
     chdir(${NAME});
     my $hwloc_opts = '';
    -if ($ENV{HWLOC_DIR} ne '') {
    +if ($ENV{HWLOC_DIR} ne '' and $ENV{HWLOC_DIR} ne 'NO_BUILD') {
         $hwloc_opts = "--with-hwloc='$ENV{HWLOC_DIR}'";
     }
     # Cannot have a memory manager with a static library on some systems
    
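    For readers less at home in Perl, the guard added by the patch above can be sketched in POSIX shell. This is an illustration only, not code from the MPI thorn, and the function name hwloc_configure_opts is hypothetical. An unset or empty HWLOC_DIR and the NO_BUILD placeholder are all treated as "no usable prefix", in which case no --with-hwloc option is emitted.

```sh
#!/bin/sh
# Hypothetical shell rendering of the guard added to build.pl: an unset or
# empty HWLOC_DIR, or the NO_BUILD placeholder, means "no usable prefix".
hwloc_configure_opts() {
    case "${HWLOC_DIR:-}" in
        ''|NO_BUILD)
            # No prefix: emit nothing, so configure falls back to its
            # default hwloc handling.
            ;;
        *)
            printf '%s\n' "--with-hwloc=$HWLOC_DIR"
            ;;
    esac
}
```

    Matching on both values in one case pattern keeps the "magic" string in a single place, which matters if, as discussed below, the same check has to be repeated in other ExternalLibraries thorns.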
  5. Roland Haas

    I see. This seems somewhat ugly, since we have to look for "magic" directory names. It is workable, but the same check would then be required everywhere else we refer to an XXX_DIR variable.

    I would also try to improve the logic hwloc uses to set HWLOC_DIR. For example,

    HWLOC_DIR="$(echo ${HWLOC_INC_DIRS} NO_BUILD | sed 's!/[^/]* *!!')"
    

    seems a bit strange to me. I kind of understand what it wants to do, namely use HWLOC_INC_DIRS then remove the last part of the path (presumably "include") from the first one found, or use NO_BUILD if HWLOC_INC_DIRS is empty. Would it make sense to try and return only a single word in HWLOC_INC_DIRS? Right now it may be "/home/sw/ NO_BUILD".
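    A minimal sketch of the single-word idea, assuming HWLOC_INC_DIRS is a whitespace-separated list of include directories and that the last component of the first entry (presumably "include") is the part to drop. The helper name hwloc_prefix_from_inc_dirs is hypothetical; this is not the hwloc thorn's actual code.

```sh
#!/bin/sh
# Hypothetical helper, not the hwloc thorn's code: turn a whitespace-separated
# list of include directories into at most one prefix directory.
hwloc_prefix_from_inc_dirs() {
    set -f            # no globbing while splitting the list into words
    set -- $1         # unquoted on purpose: split on whitespace
    set +f
    if [ $# -eq 0 ]; then
        # Nothing to derive a prefix from: use the placeholder.
        printf '%s\n' NO_BUILD
    else
        # Keep only the first entry and drop its last path component.
        dirname "$1"
    fi
}
```

    Called as HWLOC_DIR="$(hwloc_prefix_from_inc_dirs "${HWLOC_INC_DIRS}")", the result is either one directory or NO_BUILD, never a mixture of both as described above.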

    I would try for the output of

    pkg-config hwloc --variable=prefix
    

    as well. This is not foolproof, as the variable does not have to exist.
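    A sketch of that pkg-config query with a fallback, assuming the NO_BUILD placeholder is kept for the "unknown prefix" case. The function name is hypothetical. Any failure (pkg-config missing, hwloc.pc not found) or an empty answer is treated as "no prefix known", which covers the case where the .pc file does not define the variable.

```sh
#!/bin/sh
# Sketch: ask pkg-config for hwloc's installation prefix. Failure or an empty
# answer (the .pc file need not define "prefix") yields the NO_BUILD placeholder.
hwloc_prefix_from_pkgconfig() {
    hwloc_pc_prefix=$(pkg-config --variable=prefix hwloc 2>/dev/null) || hwloc_pc_prefix=''
    if [ -n "$hwloc_pc_prefix" ]; then
        printf '%s\n' "$hwloc_pc_prefix"
    else
        printf '%s\n' NO_BUILD
    fi
}
```

    This could be tried first, with the include-directory heuristic as a second fallback before giving up with NO_BUILD.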

  6. Steven R. Brandt reporter

    My recollection is that we decided some time ago that these packages would use the special string NO_BUILDDIR rather than the empty string to signify that the build dir was not set. I believe the hwloc thorn is doing the correct thing when it sets that option.

  7. Roland Haas

    Looking at what HWLOC is doing, it seems to set HWLOC_DIR to "NO_BUILD" if it can find all the "-l" options from pkg-config but cannot extract a directory from the CFLAGS that pkg-config reports. That is not the same as a variable not having been set (by a user); it is an inability of hwloc to determine an installation prefix directory for itself. Admittedly, given that Ubuntu now uses /usr/lib/x86_64-linux-gnu/ instead of the traditional /usr/lib, a prefix (namely "/usr") is less useful, since one can no longer expect that, given a prefix, the include directory is prefix/include and the lib directory is prefix/lib (other than by using compatibility directories like /usr/lib/x86_64-linux-gnu/hdf5/serial/ that are sometimes provided).

    I am not sure what the currently agreed-upon convention is. Can one expect that HWLOC_DIR/lib and HWLOC_DIR/include exist and contain the libraries and header files, or is all one can rely on that HWLOC_DIR is some directory associated with HWLOC (e.g., HWLOC_DIR/bin contains utilities)?
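    To make the stricter reading of the convention concrete, a check of whether a candidate HWLOC_DIR really has the traditional layout might look like this. It is a hypothetical helper: it looks for include/hwloc.h and a libhwloc.* file directly in lib/, so it rejects a multiarch prefix such as /usr on recent Ubuntu, where the library lives in /usr/lib/x86_64-linux-gnu/.

```sh
#!/bin/sh
# Hypothetical check: does DIR follow the prefix/include + prefix/lib layout
# for hwloc? Returns 0 if so, 1 otherwise (including for placeholders).
hwloc_prefix_is_traditional() {
    dir=$1
    case "$dir" in ''|NO_BUILD) return 1 ;; esac
    [ -f "$dir/include/hwloc.h" ] || return 1
    for lib in "$dir"/lib/libhwloc.*; do
        # If the glob matched nothing, $lib is the literal pattern and -e fails.
        [ -e "$lib" ] && return 0
    done
    return 1
}
```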

    Having HWLOC_DIR set to NO_BUILD does indeed allow other ExternalLibraries that require HWLOC to take steps in case HWLOC is in a system location (which is why there is no -I in CFLAGS), and at least to abort the build if they, for whatever reason, do need a valid prefix directory.
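    On the consuming side, a dependent library that genuinely needs a prefix could abort early along these lines. The function name require_hwloc_prefix is hypothetical; the check mirrors the empty-or-NO_BUILD test from the build.pl patch.

```sh
#!/bin/sh
# Hypothetical check for a dependent library that cannot build without a real
# hwloc prefix: print the prefix, or report an error and return non-zero.
require_hwloc_prefix() {
    case "${HWLOC_DIR:-}" in
        ''|NO_BUILD)
            printf 'error: an hwloc installation prefix is required, but HWLOC_DIR is "%s"\n' \
                "${HWLOC_DIR:-}" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$HWLOC_DIR"
}
```

    A build script would use it as hwloc_prefix=$(require_hwloc_prefix) || exit 1, so the failure is reported up front instead of surfacing later as a libtool error like the one in the original log.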

    Such a convention should ideally be documented in the minutes or on the wiki to make sure that all thorns use the same magic value; HWLOC uses NO_BUILD right now (so a different magic value). The same logic (and magic value) should then be implemented in all other ExternalLibraries.
