make utils and simfactory ignore HWLOC=BUILD and try to copy hwloc executables from /usr/bin

Issue #1923 resolved
V M created an issue

Hello!

While installing the Einstein Toolkit on a new machine, we came across this seemingly wrong behaviour:

With HWLOC=BUILD set in the option list, the ET builds successfully with the bundled hwloc.

However, when building the utilities, the build fails while trying to copy hwloc-ls from /usr/bin to /exe/sim.

The error occurs because hwloc-ls is a broken symlink on this system. The odd part, however, is that the executables are copied from /usr/bin at all, even though we built hwloc from the bundled version.

The same behaviour can also be seen here, where some hwloc executables are copied from /usr/bin, while others are copied from the hwloc bundle:

http://lists.einsteintoolkit.org/pipermail/test/2014-January/000047.html

Copying hwloc-assembler from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/hwloc-assembler to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-assembler-remote from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/hwloc-assembler-remote to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-bind from /usr/bin/hwloc-bind to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-calc from /usr/bin/hwloc-calc to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-distances from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/hwloc-distances to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-distrib from /usr/bin/hwloc-distrib to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-info from /usr/bin/hwloc-info to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-ls from /usr/bin/hwloc-ls to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying hwloc-ps from /usr/bin/hwloc-ps to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying lstopo from /usr/bin/lstopo to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim
Copying lstopo-no-graphics from https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/configs/sim/scratch/external/hwloc/bin/lstopo-no-graphics to https://build.barrywardell.net/job/EinsteinToolkitReleased/ws/exe/sim

Is this intended? I would expect that specifying HWLOC_DIR=BUILD makes it use only the hwloc executables from the bundled hwloc, right?

I have attached the option list as well as the config-info and a tarball of the config-data folder for the build.


Comments (11)

  1. Frank Löffler

    This mix is certainly not wanted. Some observations:

    For reference: the files copied from /usr/bin are also present in the built version. So the question remains why some files are taken from /usr/bin and others from the built version. Could it be that they are taken from /usr/bin by default, and the built version is used only for those not present there? To answer that, it would help to know whether, e.g., /usr/bin/hwloc-assembler exists.

    In the thorn itself I don't see why, e.g., hwloc-assembler and hwloc-ls should be treated any differently. Both are listed as standard hwloc binaries.

    Looking at https://build.barrywardell.net/job/EinsteinToolkitReleased/101/consoleFull, I find "-L/usr/lib" in GENERAL_LIBRARIES, as well as -Wl,-rpath,/usr/lib. I wouldn't expect either there, but I am also not sure whether this is related. Both LAPACK and BLAS set their directory to /usr/lib, but their configure.sh scripts should strip it (and from the output it looks like that happens).

    GENERAL_LIBRARIES also looks odd because the paths for PETSc and hwloc appear multiple times (though this could be due to dependencies).

    The option list adds LIBS=-L/usr/lib64, which does not look right, but since this is lib64 and not lib, it might also be unrelated.

    Also, /usr/lib appears in INC_DIRS_F, which is strange, and it seems to come from MPI (the only differences between INC_DIRS and INC_DIRS_F are MPI-related). Indeed, the output from the PETSc build shows MPI_LIB_DIRS=/usr/lib at that point. That shouldn't happen.

    Looking at MPI's configure script, it does seem to strip $ENV{MPI_LIB_DIRS}. However, I don't know why it contains this line:

    print "HWLOC_DIR       = $ENV{HWLOC_DIR}\n";
    

    What does MPI have to do with HWLOC? But then, this isn't HWLOC_LIB_DIRS, and shouldn't cause the problem mentioned in this ticket.

    One problem, I believe, is in MPI's detect.pl:

    sub strip_lib_dirs {
        my $dirlist = shift;
        my @dirs = split / /, $dirlist;
        map { s{//}{/}g } @dirs;
        @dirs = grep { !m{^/(usr/(local/)?)?lib(64?)/?$} } @dirs;
        return join ' ', @dirs;
    }
    

    The regex does not match /usr/lib, although it should: the group (64?) requires a literal 6 after "lib", so it matches lib6 and lib64 but not plain lib. Thus, MPI adds /usr/lib to the -L options.
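A minimal Python sketch illustrates the bug. It is a rough port of strip_lib_dirs for illustration only (Python's re module treats this particular pattern the same way Perl does). The corrected pattern, which makes the whole 64 suffix optional with (64)?, is a suggestion and not necessarily what ended up in detect.pl; /opt/mpi/lib is a made-up non-system path.

```python
import re

# Pattern as quoted from MPI's detect.pl: "(64?)" is a "6" followed by an
# optional "4", so at least a literal "6" is required after "lib".
ORIGINAL = re.compile(r"^/(usr/(local/)?)?lib(64?)/?$")

# Suggested correction (an assumption, not necessarily the committed fix):
# make the whole "64" suffix optional.
CORRECTED = re.compile(r"^/(usr/(local/)?)?lib(64)?/?$")


def strip_lib_dirs(dirlist: str, pattern: re.Pattern) -> str:
    """Rough port of strip_lib_dirs: collapse '//' and drop system lib dirs."""
    dirs = [d.replace("//", "/") for d in dirlist.split(" ")]
    return " ".join(d for d in dirs if not pattern.match(d))


if __name__ == "__main__":
    dirs = "/usr/lib /usr/lib64 /usr/local/lib /lib /opt/mpi/lib"
    print(strip_lib_dirs(dirs, ORIGINAL))   # /usr/lib /usr/local/lib /lib /opt/mpi/lib
    print(strip_lib_dirs(dirs, CORRECTED))  # /opt/mpi/lib
```

With the original pattern, /usr/lib, /usr/local/lib and /lib survive and end up as -L options; only /usr/lib64 is stripped.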

    Now, whether and/or why this might trigger the hwloc-util problem I don't know. At this point I am probably at least as confused as you are reading this. Too many things look iffy:

    • MPI sets HWLOC_DIR
    • MPI doesn't strip paths properly
    • The option list adds /usr/lib64 (shouldn't be necessary, but who knows)
  2. Frank Löffler

    https://build.barrywardell.net/job/EinsteinToolkitReleased/101/consoleFull also mentions: "hwloc selected, but HWLOC_DIR not set.". HWLOC_DIR=build should not trigger this output. However, build 101 uses ubuntu.cfg, which doesn't define HWLOC_DIR, while build 25 (the one referenced above) probably did not use it (Jenkins switched option lists). On the other hand, even without HWLOC_DIR defined, Jenkins built the bundled version in build 101 because it didn't find a system installation. Why it didn't find the system version (which apparently is installed) is another matter.

  3. Frank Löffler

    Replying to [comment:1 knarf]:

    Looking at MPI's configure script, it does seem to strip $ENV{MPI_LIB_DIRS}. However, I don't know why it does contain this line:

    print "HWLOC_DIR = $ENV{HWLOC_DIR}\n";

    What does MPI have to do with HWLOC? But then, this isn't HWLOC_LIB_DIRS, and shouldn't cause the problem mentioned in this ticket.

    The reason seems to be that MPI's build script uses HWLOC_DIR to point the MPI build at that hwloc installation, so that MPI is built with hwloc support.

  4. Roland Haas

    Adding what I apparently forgot to post: this happens because the rules currently used by ExternalLibraries look like this:

    $(UTIL_DIR)/%: $(MPI_DIR)/bin/%
        @echo "Copying $* from $< to $(UTIL_DIR)"
        -$(MKDIR) $(MKDIRFLAGS) $(UTIL_DIR) 2> /dev/null
        cp $< $@
    

    It is a pattern rule that says: if you need to build $(UTIL_DIR)/%, you can do so provided you have $(MPI_DIR)/bin/%. Now, if $(MPI_DIR)/bin contains a file hwloc-info, make will happily run this recipe instead of, say, the one that uses $(HWLOC_DIR)/bin/%.

    The fix (above) is to make the rules more specific, i.e. to restrict each rule to the executables that its own library provides.
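To make the rule selection concrete, here is a simplified Python model, not make itself. It relies on documented GNU make behaviour: among applicable pattern rules make prefers the shortest stem and, on a tie, the first rule in the makefile, skipping rules whose prerequisites neither exist nor can be made; a static pattern rule (the targets: target-pattern: prereq-pattern form) is an explicit rule and applies only to its listed targets. Every copy rule here yields the same stem (the executable name), so makefile order decides. Assumptions: MPI_DIR is /usr (a system MPI), MPI's rule is read before hwloc's, the hwloc path is shortened, which files exist is made up, and the MPI utility list ("mpirun") is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class CopyRule:
    """Models '$(UTIL_DIR)/%: <src_dir>/%'. If 'targets' is given, the rule is
    restricted to those names, like the $(patsubst ...) static pattern rule."""
    name: str
    src_dir: str
    targets: Optional[FrozenSet[str]] = None


def pick_rule(util: str, rules: Iterable[CopyRule], existing: Set[str]) -> Optional[CopyRule]:
    """Simplified model of which rule make uses to build $(UTIL_DIR)/<util>."""
    rules = list(rules)
    # A restricted (static pattern) rule is an explicit rule for its targets,
    # so it is used whenever it lists the target.
    for rule in rules:
        if rule.targets is not None and util in rule.targets:
            return rule
    # Generic pattern rules all yield the same stem (the file name), so makefile
    # order decides: the first rule whose prerequisite exists is chosen.
    for rule in rules:
        if rule.targets is None and f"{rule.src_dir}/{util}" in existing:
            return rule
    return None


# Assumptions for illustration: a system MPI (MPI_DIR=/usr) whose rule is read
# before hwloc's, and a shortened path for the bundled hwloc build.
MPI_DIR = "/usr"
HWLOC_DIR = "configs/sim/scratch/external/hwloc"

# Assumed files: hwloc-info exists in both places, hwloc-assembler only in the build.
existing = {
    f"{MPI_DIR}/bin/hwloc-info",
    f"{HWLOC_DIR}/bin/hwloc-info",
    f"{HWLOC_DIR}/bin/hwloc-assembler",
}

buggy = [CopyRule("MPI", f"{MPI_DIR}/bin"), CopyRule("hwloc", f"{HWLOC_DIR}/bin")]
fixed = [
    CopyRule("MPI", f"{MPI_DIR}/bin", frozenset({"mpirun"})),  # hypothetical MPI_UTILS
    CopyRule("hwloc", f"{HWLOC_DIR}/bin", frozenset({"hwloc-info", "hwloc-assembler"})),
]

for util in ("hwloc-info", "hwloc-assembler"):
    print(f"{util}: buggy -> {pick_rule(util, buggy, existing).name}, "
          f"fixed -> {pick_rule(util, fixed, existing).name}")
# hwloc-info: buggy -> MPI, fixed -> hwloc
# hwloc-assembler: buggy -> hwloc, fixed -> hwloc
```

In this model the generic rules produce the same kind of mix seen in the report: hwloc-info exists in /usr/bin and is taken from there, while hwloc-assembler (assumed absent from /usr/bin) falls through to the hwloc rule. With restricted rules, both come from the bundled hwloc.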

  5. Roland Haas

    Unless there are objections, after 2019-11-20 I will push the changes in the patches above to hwloc, MPI, and the other ExternalLibraries that copy executables.

  6. Roland Haas

    I applied the fix to all ExternalLibraries that use "Copying" in their make.configuration.deps files. This should fix the issue.

    However, the fix is fragile. A single ExternalLibrary that uses the incorrect pattern:

    $(UTIL_DIR)/%: $(PAPI_DIR)/bin/%
    

    instead of

    $(patsubst %,$(UTIL_DIR)/%,$(PAPI_UTILS)): $(UTIL_DIR)/%: $(PAPI_DIR)/bin/%
    

    will bring back the bug.
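Since a single unrestricted rule reintroduces the bug, a small check could guard against regressions. The following Python sketch is hypothetical and not part of the Einstein Toolkit: it scans make.configuration.deps files for rules whose target is the bare pattern $(UTIL_DIR)/% and reports them. It is a simple line-based heuristic and will miss rules written differently, e.g. split across lines or using another variable name.

```python
import re
import sys
from pathlib import Path
from typing import List, Tuple

# Matches a rule whose target is the bare pattern "$(UTIL_DIR)/%", i.e. a generic
# pattern rule that is not restricted by a $(patsubst ...) target list.
UNRESTRICTED = re.compile(r"^ *\$\(UTIL_DIR\)/%\s*:")


def find_unrestricted_rules(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs for unrestricted UTIL_DIR copy rules."""
    return [
        (lineno, line.rstrip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if UNRESTRICTED.match(line)
    ]


def main(root: str) -> int:
    """Scan make.configuration.deps files below root; return 1 if any rule is unrestricted."""
    found = 0
    for path in sorted(Path(root).rglob("make.configuration.deps")):
        for lineno, line in find_unrestricted_rules(path.read_text()):
            print(f"{path}:{lineno}: {line}")
            found += 1
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run with a directory argument such as arrangements/ExternalLibraries (script name and location up to you), it prints file:line for each offending rule and exits with status 1 if any were found, which makes it usable as a CI step.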
