hwloc: lnuma & lltdl *really* required?

Issue #1717 open
Zach Etienne created an issue

I downloaded the ET devel version (ca. Nov 26) on my (ubuntu) laptop and compiled it, using gcc.

I compiled to the linker stage, and the linker complained:

    ld: cannot find -lnuma
    ld: cannot find -lltdl

I found references to these libraries in: configs/[buildname]/bindings/Configuration/Capabilities/make.HWLOC.defn

After removing these references, the code compiled and seemed (on the surface) to run okay. Are these libraries really necessary?

I ask because every time I need to install ET on a new machine, it would be more convenient if the step "apt-get install libnuma-dev libltdl-dev" were left out, particularly since reliable Internet access may not exist at that time.


Comments (42)

  1. Erik Schnetter
    • removed comment

    The content of this file is auto-generated. Cactus's hwloc configuration script queries the hwloc installation what libraries are required to link against hwloc, and apparently this is the answer.

    How did you configure hwloc -- are you sure that the version installed on your system is used, and that Cactus does not build hwloc itself? If you can't tell, you can post the screen output of the configuration stage, which would allow us to decide.

    Can you post the output of "pkg-config hwloc --static --libs" and/or "pkg-config hwloc --libs" on your system?

  2. Zach Etienne reporter
    • removed comment

    Looks like I installed the following packages on my Ubuntu 14.04 system prior to compiling ET: libhwloc-dev:amd64 libhwloc-plugins libhwloc5:amd64

    $ pkg-config hwloc --static --libs
    -lhwloc -lm -lnuma -lltdl -lpthread -ldl

    $ pkg-config hwloc --libs
    -lhwloc

    After uninstalling the libhwloc-dev, libnuma-dev, and libltdl-dev packages, I recompiled ET from scratch, and there were no linker problems. So perhaps there is a bug in the ET build system when hwloc-dev is installed?
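
    Zach's pkg-config output above can be reproduced on any machine to diagnose this case; a minimal sketch (assuming a POSIX shell with pkg-config and, optionally, the hwloc .pc file installed):

```shell
#!/bin/sh
# Compare the dynamic and static link lines that pkg-config reports for hwloc.
# If the static line names libraries (e.g. -lnuma -lltdl) whose -dev packages
# are not installed, a static link against the system hwloc will fail.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists hwloc; then
    echo "dynamic: $(pkg-config hwloc --libs)"
    echo "static:  $(pkg-config hwloc --static --libs)"
else
    echo "pkg-config or hwloc.pc not available on this system"
fi
```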

  3. Erik Schnetter
    • removed comment

    I do not think this is a bug in the ET build system, as the ET gets this information from hwloc. I would suspect that the hwloc information is wrong. It may be that hwloc's package maintainer didn't realize that libhwloc-dev needs to depend on libnuma-dev and libltdl-dev.

    If you don't install libhwloc-dev, then the ET will build hwloc from scratch instead of using the system version. This will always work.

  4. anonymous
    • removed comment

    Yes, you seem to be correct about the case in which libhwloc-dev is installed.

    I think the problem lies with configure.sh, as it requests data from

    "pkg-config hwloc --static --libs".

    Given the output from pkg-config commands above, is it possible that changing configure.sh to call

    "pkg-config hwloc --libs"

    instead (i.e., without "--static") would fix the problem?

  5. Erik Schnetter
    • removed comment

    This may or may not circumvent the problem; I suspect it would. At some point we decided to use static libraries as much as possible when building Cactus, since this reduces dependencies on things that can change after the executable has been built. I don't think this is a priority any more. You could try this; note that hwloc's configuration script already omits --static as a fallback if pkg-config does not support this option.

    In general, if there is a machine where something is broken, we use Simfactory's option list to circumvent the problem. In this case, we would set "hwloc=BUILD" in the options, probably accompanied by a comment explaining why.
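
    For reference, such an option-list entry might look like the following sketch (the variable name HWLOC_DIR and the comment wording are illustrative, following the Cactus convention of setting a library's DIR variable to BUILD):

```
# Ubuntu's libhwloc-dev lists -lnuma -lltdl in its static link line but does
# not depend on libnuma-dev/libltdl-dev; build hwloc ourselves instead.
HWLOC_DIR = BUILD
```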

  6. Frank Löffler
    • removed comment

    Given that the manpage for pkg-config already mentions:

           --static
                  Output  libraries  suitable  for  static linking.  That means including any private
                  libraries in the output.  This relies on proper tagging in the .pc  files,  else  a
                  too large number of libraries will ordinarily be output.
    

    my first guess would be that hwloc itself is at fault: nothing we could fix.

  7. Zach Etienne reporter
    • removed comment

    Hi Erik,

    I like the idea of a workaround within ET/hwloc, e.g., by disabling --static within hwloc/configure.sh unless a static build is explicitly requested. Speaking of which, is there a configuration option to compile ET statically? If not, the default options should at least be consistently static or dynamic, and I would argue that the default should be a regular, non-static compilation (i.e., without the --static option within hwloc/configure.sh).

  8. Zach Etienne reporter
    • removed comment

    Erik,

    Yes, I just verified that the compile does work if the --static is omitted, and fails if --static is included.

  9. Erik Schnetter
    • removed comment

    As a general rule, we don't want to have work-arounds for particular systems in Cactus since this is very fragile. Over time, these work-arounds accumulate, and in the end no one remembers why some things are done in a particular and very complex way. Sometimes we find comments about systems that no one even knows any more. For example, do you know why the C++ compiler on OSF systems requires the option "-noimplicit_include"? Does anybody even know what OSF is, without resorting to Google? (Hint: It is a version of Unix.) Apparently this option was introduced in 1999... I'm quite sure that the respective logic can be deleted and no one will ever notice, but no one has time to deal with these kinds of clean-ups because, sometimes, these work-arounds have subtle side-effects that break things when they are removed.

    As I mentioned before, it is very easy (a one-line addition) to update the option list for your machine to avoid this problem. Also, since your system seems broken, you'd have to explain why you cannot correct your install, and why you think that modifying Cactus instead is a better idea.

    However, you raise the question of whether Cactus should be linked statically or dynamically by default. These days, we tend to like static linking because (a) disk space usage is not really a concern, and (b) this means that executables are more independent once created. If you have a dynamically linked executable and then uninstall a certain library, it may break. This is an issue on supercomputers, where production runs may take weeks or months, and where someone may have installed a library into his/her home directory that others are then using. Once an executable is broken this way, it is very difficult (if not impossible) to repair it. Static linking avoids this issue.

    As a bonus, static linking also uncovers errors (duplicate symbols) that may go undetected with dynamic linking.

  10. Zach Etienne reporter
    • removed comment

    I am agnostic about whether static or dynamic linking should be chosen as default, though I anticipate more headaches if static were the default (you brought up the standard reasons).

    Further, I would argue that we should make one choice the default and stick with it consistently, rather than something between static and dynamic, as the mixture creates confusion (this case, for example).

  11. Frank Löffler
    • removed comment

    I agree about the problem with workarounds. However:

    Replying to [comment:12 eschnett]:

    As I mentioned before, it is very easy (a one-line addition) to update the option list for your machine to avoid this problem. Also, since your system seems broken, you'd have to explain why you cannot correct your install, and why you think that modifying Cactus instead is a better idea.

    This seems to be a generic Ubuntu installation. It is not an isolated machine. New users are not unlikely to hit the same issue, unless they know to choose the (then fixed) option list. Also, if this is a problem with the .pc files in hwloc, then this is likely a problem even outside of Ubuntu.

    However, you raise the question of whether Cactus should be linked statically or dynamically by default. These days, we tend to like static linking because (a) disk space usage is not really a concern, and (b) this means that executables are more independent once created.

    This might be true on a supercomputer. I certainly like dynamic libraries more when I develop, i.e., most of the time on my laptop/workstation. So, I would answer this with "it depends". I would think both should work.

  12. Erik Schnetter
    • removed comment

    Frank -- what do you suggest concretely?

    If you think that a standard Ubuntu may be broken, then we should update our standard Ubuntu option list. We can either make it build hwloc ourselves, or at least add the missing packages to the comments at the top of this list.

    There is always a dilemma between using a pre-existing library and building things on our own. I usually prefer to build my own, since this is more likely to work. Others prefer using existing libraries. In this case, we probably need to test existing libraries more thoroughly before we use them.

  13. Frank Löffler
    • removed comment

    Replying to [comment:15 eschnett]:

    Frank -- what do you suggest concretely?

    If I read the following correctly, this could be a rather interesting problem: http://www.open-mpi.org/community/lists/hwloc-devel/2013/05/3743.php

    The problem seems to be that when used as a dynamic library, hwloc does not depend on libltdl as a "usual dynamic library", but can load it later by itself, if found (dynamically, but ldd does not see that). If built statically, of course, ltdl would need to be linked in for hwloc to be usable. So, if I read this correctly, the hwloc-dev package has the option of depending on the other two libraries at build time, but does not have to (it builds without these libraries, but it can use them in a dynamic setup). Does anybody here agree with my interpretation?

    That leaves the user with an installed hwloc that uses both numa and ltdl if present (because it was compiled with support for them), and the library correctly reports the link dependencies for both dynamic and static linking. However, the -dev package does not have a dependency on the other two library packages because they are optional, at least for the dynamic libraries that are typically used on the system. So, the problem is: should the hwloc-dev package depend on the numa/ltdl libraries? It should for static linking, and should not for dynamic linking. Since the hwloc package only provides the dynamic version, and since by far the majority of users would link dynamically, their decision not to add the dependency is correct in my eyes.

    Now the question would be: what do we do in this situation? We could test for these libraries to be present when we build hwloc and link statically, and give a better error message.

  14. Erik Schnetter
    • removed comment

    No, the hwloc package manager's decision is not correct. libhwloc-dev provides a library libhwloc.a that cannot be used without libltdl.a. Thus, it either needs to depend on the libltdl-dev package, or needs to provide a library libhwloc.a that does not have this dependency.
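
    This claim can be checked directly on an affected system; a sketch (the archive path is an assumption about Ubuntu's multiarch layout):

```shell
#!/bin/sh
# List the undefined lt_dl* symbols in the static libhwloc.a.  Any hits must
# be resolved at link time by libltdl.a, hence the missing package dependency.
A=/usr/lib/x86_64-linux-gnu/libhwloc.a
if [ -f "$A" ] && command -v nm >/dev/null 2>&1; then
    nm -u "$A" | grep lt_dl || echo "no undefined ltdl symbols in $A"
else
    echo "libhwloc.a (or nm) not available on this system"
fi
```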

    If you want to provide a work-around, then I suggest doing this in the system-specific file "ubuntu.cfg" of Simfactory. Of course, you can also check in hwloc's configure script whether we are using the system hwloc library, whether it depends on libltdl, whether that library is installed, etc., and refuse to use the system library if these checks fail. We usually don't go to these lengths when looking for existing libraries, though. If you really want to go this route, then I would suggest using autoconf or cmake for this, which provide exactly this kind of functionality.

  15. Frank Löffler
    • removed comment

    Replying to [comment:17 eschnett]:

    No, the hwloc package manager's decision is not correct. libhwloc-dev provides a library libhwloc.a that cannot be used without libltdl.a. Thus, it either needs to depend on the libltdl-dev package, or needs to provide a library libhwloc.a that does not have this dependency.

    Yes, I missed the .a file in the -dev package. I assumed this to be in the hwloc package, if at all present. In this case, it would indeed need to be reported to the Ubuntu package maintainers. The respective Debian package has the missing dependencies, but not yet in the released version - which means Debian is likely affected too, in the current release.

    If you want to provide a work-around, then I suggest to do this in the system-specific file "ubuntu.cfg" of Simfactory.

    I don't think that the current option lists for these specify a static build. We should probably give the correct flag (--static or not) to pkg-config, and in this case that should work, shouldn't it?

  16. Erik Schnetter
    • removed comment

    When you link against a library, then you need to decide whether to link statically or dynamically. That is independent of whether other libraries are linked statically or dynamically, or whether the final executable is a dynamic library. Here, we choose (as in many other external libraries) static libraries. I explained the reasons for this above.

  17. Ian Hinder
    • removed comment

    Should this apply also to debian, where Frank says (comment:18) the problem also exists?

  18. Frank Löffler
    • removed comment

    Replying to [comment:21 hinder]:

    Should this apply also to debian, where Frank says (comment:18) the problem also exists?

    Only if it really turns out to be a problem there. The package dependencies look similar, but I can build the current version (at least the last time I tried). Maybe the static library has different dependencies there.

  19. Roland Haas
    • removed comment

    There is a similar problem with the numa library, static linking, and a self-built hwloc. Currently detect.sh only claims hwloc as a required library (HWLOC_LIBS=hwloc) when building hwloc; however, hwloc's configure script will find libnuma and use it, which makes numa a required library in that case only. A workaround right now is to add numa to LIBS, but this has to be done by the user, since whether numa is required depends on both the software installed on the machine and hwloc's configure script.

    Right now there seems to be no good way to fix this, since detect.sh sets HWLOC_LIBS before configure runs so cannot know what configure will do. As far as I can tell, the only way to fix this is to run the ExternalLibraries configure script in detect.sh but then defer building to build.sh.

  20. anonymous
    • removed comment

    Can pkg-config be run after configure, but before building (and installing) the library? If not, we would also need to have another way of finding the libraries than pkg-config.

  21. Frank Löffler
    • removed comment

    The currently proposed short-term (pre-release), dirty solution (thanks, Steve) would be: within detect.sh, emulate the library's 'configure' by calling the compiler with a minimal file and, e.g., -lnuma; if that succeeds, we assume the compiler found libnuma and could use it, and hwloc's 'configure' will find it too. This is obviously not the best solution; it is a hack, and a dirty one. It assumes things that can, and probably will, go wrong on some machines. However, it should work on most machines, and it would not be very invasive, especially so shortly before a release.

    Any opinions on this before we try it?
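
    Such a probe could look roughly like this (a sketch only; the function name have_lib and the use of ${CC:-cc} are illustrative, not what detect.sh actually does):

```shell
#!/bin/sh
# Probe whether the compiler can link a minimal program against a given
# library.  If this succeeds, hwloc's own configure would most likely find
# and use the library too, so it would have to be added to HWLOC_LIBS.
have_lib() {
    lib=$1
    tmp=$(mktemp -d) || return 1
    echo 'int main(void) { return 0; }' > "$tmp/conftest.c"
    ${CC:-cc} "$tmp/conftest.c" -l"$lib" -o "$tmp/conftest" 2>/dev/null
    rc=$?
    rm -rf "$tmp"
    return $rc
}

if have_lib numa; then
    echo "libnuma found; hwloc will probably link against it"
fi
```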

  22. Erik Schnetter
    • removed comment

    I would not make changes just before a release. I would provide a stop-gap solution that enables people to build the ET. Since there seems to be a work-around via "apt-get", I would go with this.

  23. Frank Löffler
    • removed comment

    There are two problems mixed in this ticket. The first is the one involving system libraries, and I agree that at least for the release specifying a list of packages is a viable workaround.

    The other problem, however, revolves around a self(Cactus)-built hwloc. The underlying problem is similar, but here we cannot point people to an option list, because this can happen on any machine.

  24. Frank Löffler
    • removed comment

    Replying to [comment:31 eschnett]:

    In this case people can set HWLOC_EXTRA_LIBS in their option list.

    Interesting possibility. I wouldn't expect that to be necessary if Cactus builds hwloc itself.

  25. Frank Löffler

    Given the proximity of a release, and that this is not a regression (it was also the case in older releases), I propose to leave it as is for now.

    After the release I propose to open up the possibility for the build scripts (build.sh) to overwrite/set Cactus build variables, specifically HWLOC_LIBS in this case, but also others if need be (as detect.sh can). The "only" issue I see with this is that it would need to be done in a way that keeps parallel builds working.

  26. Frank Löffler
    • removed comment

    Also, note that there is another ticket about hwloc: #1753. That's not strictly the same issue, but whoever fixes one should know about the other.

  27. Roland Haas

    I would suggest removing --static, as Zach pointed out. Most clusters these days actually do use dynamic linking. Also, at least as far as I am concerned, the automatic detection in ExternalLibraries is for workstations. For clusters we create individual option lists anyway, where we can add the extra libraries required to link statically.

  28. Roland Haas

    I will remove --static from hwloc’s build options (or find out if one can build both static and dynamic) after 2020-08-18.

  29. Roland Haas
    • changed status to open

    This has unintended side effects on clusters where we want static linking.

    On Blue Waters, when both the static and shared libraries are built, the linker, even though it is called via the CC wrapper, which defaults to static linking, will prefer the .so file over the .a file and throw an error.

  30. Roland Haas

    Hmm, I could not come up with a “nice” solution that will build shared on workstations but let me link statically on Crays (Blue Waters):

    ./configure --prefix=${HWLOC_DIR} ${bgq} ${handle_pci} --disable-cairo --disable-libxml2 --disable-cuda --disable-nvml --disable-opencl --with-x=no --disable-gl --enable-shared=no --enable-static=yes
    
    echo "hwloc: Building static library..."
    pushd hwloc
    ${MAKE}
    popd
    
    echo "hwloc: Building static utilities..."
    pushd utils
    HAVE_UTILS=
    if ${MAKE} ; then
      ${MAKE} install
      HAVE_UTILS="static"
    fi
    popd
    
    echo "hwloc: Cleaning up..."
    ${MAKE} clean
    
    echo "hwloc: Configuring shared library..."
    ./configure --prefix=${HWLOC_DIR} ${bgq} ${handle_pci} --disable-cairo --disable-libxml2 --disable-cuda --disable-nvml --disable-opencl --with-x=no --disable-gl --enable-shared=yes --enable-static=yes
    
    echo "hwloc: Building shared library..."
    pushd hwloc
    ${MAKE}
    popd
    
    echo "hwloc: Building shared utilities..."
    pushd utils
    if ${MAKE} ; then
      ${MAKE} install
      HAVE_UTILS="shared"
    fi
    popd
    
    if [ -n "$HAVE_UTILS" ] ; then
      echo "hwloc: Successfully built ${HAVE_UTILS} utilities..."
    else
      exit 1
    fi
    
    echo "hwloc: Installing library ..."
    pushd hwloc
    ${MAKE} install
    popd
    pushd include
    ${MAKE} install
    popd
    
    echo "hwloc: Cleaning up..."
    rm -rf ${BUILD_DIR}
    

    Which works like this:

    • configure and build hwloc library statically so that no .so files are present
    • build utilities and install them
    • clean everything
    • configure and build hwloc library statically and dynamically
    • attempt to build utils once more which will build them against the shared lib if possible, install if build succeeds
    • install library (shared and static)
    • install include files

    Downsides:

    • configures twice
    • builds the full library twice (once with -fPIC and once without)
    • static utilities will be linked against a hwloc.a that is not actually the one installed (they use the one built without -fPIC)
    • general ugliness in the output, e.g., a failure to build the dynamic utils shows up in the output
