Switch to OpenBLAS

Issue #1674 wontfix
Erik Schnetter created an issue

OpenBLAS is a BLAS and LAPACK library that is significantly more efficient than the standard ("reference") BLAS. I suggest we switch the ET thorn list to ExternalLibraries/OpenBLAS instead of ExternalLibraries/BLAS and ExternalLibraries/LAPACK.

Comments (26)

  1. Frank Löffler

    BLAS is often provided by Atlas (and is picked up by the current BLAS thorn), and I don't recall Atlas being particularly inefficient. Are you talking about the real "reference" BLAS or about Atlas in your comparison (same for LAPACK)?

  2. Erik Schnetter reporter

    At the moment, Cactus builds the reference BLAS if it doesn't find a system BLAS. Instead, it should build OpenBLAS. Building Atlas is more complicated since Atlas auto-tunes, which also takes a long time. OpenBLAS does not need this, and is apparently also faster than Atlas. (I don't know what this means in practice.)

  3. Frank Löffler

    Since it would be confusing to have both thorn BLAS and thorn OpenBLAS in the ET, do you suggest to move to OpenBLAS for the toolkit (which would be fine with me, assuming OpenBLAS works at least as well)? And, do we really want to maintain both thorns?

  4. Erik Schnetter reporter

    OpenBLAS also provides LAPACK. It contains a copy of the reference LAPACK, probably with optimizations to some important routines.

    My suggestion is to replace both BLAS and LAPACK by OpenBLAS, both as thorns and in the ET thorn list. BLAS and LAPACK are then not needed any more.

  5. Frank Löffler

    The thorn OpenBLAS currently does not seem to be able to detect any other installed BLAS/LAPACK installation like the BLAS/LAPACK thorns do. Without this it is not yet a replacement for the BLAS/LAPACK thorns.

  6. Erik Schnetter reporter

    It is dangerous to auto-detect existing BLAS libraries, because these are often just the reference BLAS. It is better to build OpenBLAS in this case, since building it is actually quite fast. The point of using OpenBLAS is to guarantee that we have an efficient BLAS implementation at hand, e.g. for people using PETSc.

    We could auto-detect existing BLAS libraries if we also auto-detect what kind it is (e.g. Atlas, OpenBLAS, reference BLAS, MKL, ...), but that is difficult, and we haven't done this before, so this shouldn't be a prerequisite.
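
As a rough illustration of what "auto-detect what kind it is" could look like, here is a minimal sketch that guesses the flavour of a BLAS library from its exported symbols. The function names `classify_blas` and `exported_symbols` are hypothetical, and the marker symbols (`openblas_get_config`, names starting with `mkl_`/`MKL_`, names starting with `ATL_`) are assumptions about typical builds, not something any Cactus thorn is known to check. It also assumes a GNU-style `nm` is available.

```python
import subprocess


def classify_blas(symbols):
    """Guess the BLAS flavour from an iterable of exported symbol names.

    The marker symbols are heuristics (assumptions), not guarantees.
    Returns "OpenBLAS", "MKL", "Atlas", "unknown" (some BLAS, possibly the
    reference one), or None if no BLAS routine is visible at all.
    """
    names = set(symbols)
    if "openblas_get_config" in names:
        return "OpenBLAS"
    if any(n.lower().startswith("mkl_") for n in names):
        return "MKL"
    if any(n.startswith("ATL_") for n in names):
        return "Atlas"
    if "dgemm_" in names:
        # A Fortran-style dgemm without any vendor marker: could be the
        # reference BLAS, but we cannot tell for sure.
        return "unknown"
    return None


def exported_symbols(library_path):
    """List dynamic symbols defined by a shared library, using GNU nm."""
    out = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each line looks like "<address> <type> <name>"; drop version suffixes.
    return [line.split()[-1].split("@")[0]
            for line in out.splitlines() if line.split()]
```

A caller would use something like `classify_blas(exported_symbols(path))`. The "unknown" result is exactly the difficult case discussed here: a wrapper library may hide its vendor markers in a dependent library, so a missing marker does not prove that the library is slow.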

  7. Frank Löffler

    The current state with BLAS and LAPACK is that a user does not have to do anything to get these to pick up MKL/Atlas, two well-optimized libraries. For production machines the simfactory entries would ensure that this happens anyway (one would hope), but it also happens in typical workstation environments. I don't think that we should change this.

    Also, this is not what we do for any other external library. The task of the ExternalLibraries thorns has so far always been to check for the existence of a viable system version first, and to use this if at all possible. Of course, the point for discussion here is 'viable'. If we understand this as "works", then performance is none of the thorn's concern. If performance is important, we have to detect poor performance somehow. Ignoring an installed BLAS/LAPACK library is not what I would expect. ExternalLibraries/MPI also checks for MPI versions other than OpenMPI, and does not care whether the version it finds has better or worse performance than a built OpenMPI would have.

  8. Erik Schnetter reporter

    You cannot compare performance differences between MPI versions with the performance of the reference BLAS. We are talking about a factor of ten here. Compare it rather to a utility that chooses not to install an Infiniband driver since a Gigabit Ethernet driver is already present on the system.

    If you think that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS, then we disagree very much on what this thorn should be doing.

  9. Frank Löffler

    Replying to eschnett:

    "If you think that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS, then we disagree very much on what this thorn should be doing."

    I don't think "that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS", in this we agree. But that means that if OpenBLAS rejects a working installed BLAS installation, it should do so on account of that being slow. We can either determine this somehow, and then reject a slow installation, or we cannot distinguish between a slow and a fast existing installation, in which case I believe we should choose the existing version.

  10. Frank Löffler

    On an unrelated note: if the thorn is supposed to be used to point to any BLAS/LAPACK installation, shouldn't the variables for doing so be BLAS_DIR / LAPACK_DIR? Of course, these would then differ from the thorn name.

  11. Erik Schnetter reporter

    Please make a suggestion for how "somehow detect" should work, and what should happen if this detection fails to give a result.

    As I said, I think we disagree on what OpenBLAS should be doing. My main goal is to ensure that there is an efficient BLAS available. Yours seems to be to trust the system library, if there is one. I don't trust people to install something reasonable on their laptops/workstations.

    We can discuss thorn, variable, and requirement names later.

  12. Roland Haas

    At the risk of being called out for my lack of wanting a fast LAPACK (this is related to my general unhappiness with building everything from scratch, since I have many compiled Cactus trees and many configurations and often compile new ones): I actually do "think that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS", at least in its probe-what-is-there mode. OpenBLAS may well want to output a warning, but otherwise I'd stick to: "the user installed the system LAPACK, so I use it". If they want to compile OpenBLAS, then they can either uninstall the system one or ask for XXXX_DIR=BUILD. This is based on the notion that the OpenBLAS thorn provides the capabilities LAPACK and BLAS and is not only a wrapper around OpenBLAS's configure script.
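
The policy proposed in this comment can be sketched as a small decision function. This is only an illustration of the described behaviour (an explicit directory wins, BUILD forces a build, probing uses whatever is installed and warns if it may be slow). The name `resolve_blas`, its parameters, and the return values are hypothetical; the real ExternalLibraries thorns implement their logic in configure scripts, which this sketch does not reproduce.

```python
import warnings


def resolve_blas(blas_dir, system_lib, system_kind="unknown"):
    """Decide what to do about BLAS; returns (action, location).

    blas_dir    -- value of the user's XXXX_DIR-style setting, or None/""
    system_lib  -- path of a BLAS found by probing, or None
    system_kind -- hypothetical classification of the probed library
    """
    if blas_dir == "BUILD":
        # The user explicitly asked for a self-built library.
        return ("build", None)
    if blas_dir:
        # An explicit location is always trusted as given.
        return ("use", blas_dir)
    if system_lib is not None:
        # Probe-what-is-there mode: use the installed library, but say so
        # when we cannot vouch for its speed.
        if system_kind in ("reference", "unknown"):
            warnings.warn(
                f"Using system BLAS at {system_lib}; it may be slow. "
                "Request BUILD to compile OpenBLAS instead."
            )
        return ("use", system_lib)
    # Nothing installed and nothing requested: build as a last resort.
    return ("build", None)
```

For example, `resolve_blas("", "/usr/lib/libblas.so")` would return `("use", "/usr/lib/libblas.so")` and emit a warning, while `resolve_blas("BUILD", "/usr/lib/libblas.so")` would return `("build", None)`.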

  13. Erik Schnetter reporter

    Which machines are these?

    The behaviour we are discussing will not affect any machines where we use a special option list, i.e. none of the typical HPC systems will be affected by this choice. For generic Ubuntu systems, we can add Atlas or OpenBLAS to the list of packages that need to be installed before building Cactus.

    Where do you keep the "many Cactus trees"?

    Or are you describing builds that do not use Simfactory?

  14. Roland Haas

    I'll try to answer each question:

    These are my workstation, my Linux laptop, my OSX laptop, and three different Linux installations in virtual machines. My understanding is that the issue of compiling/using OpenBLAS only comes up on these types of machines (ie. personal workstations and laptops) at all, yes? For the XSEDE/SciNet/PRACE/whatnot clusters in simfactory we already provide paths to optimized LAPACK/BLAS versions. For private clusters that someone sets up, there would seem to be enough complexity already that choosing the proper LAPACK/BLAS library should be a minor point, eg. I usually find it much harder to get infiniband to work properly.

    Maybe I misunderstood. I was under the impression that the suggestion was to have OpenBLAS ignore any installed LAPACK/BLAS and compile its own code, since it is hard to determine if the system-installed LAPACK is slow. This would only affect its probe-for-installed-versions mode, not the mode where we specify a non-special CCC_DIR. The current instructions for first-time users on Ubuntu (https://docs.einsteintoolkit.org/et-docs/Simplified_Tutorial_for_New_Users) already suggest installing libatlas-base-dev, and the ubuntu option list in simfactory does the same in its comment headers. A vanilla Ubuntu install will also work, since it does not install any LAPACK/BLAS by default.

    I keep at least three different Cactus trees on my workstation at all times: ET_master, ET_release, Zelmani, each with different sources for the thorns. I have ~40 Cactus trees on my workstation, most of them historical only but maybe an additional to the three listed above that I actually use. I have multiple configurations in Zelmani: one that is mostly vanilla, one for MHD, one for the inversion-symmetry-preserving setup, one that just compiles external libraries for use by other toolkit, several that are -debug variants of the above, and several that contain the state of the code as used by other group members. On top of that I have maybe 5-10 Cactus trees on various machines that are builds of Formaline tarballs from when I wanted/needed to exactly reproduce the behaviour of one of my or a group member's runs; those probably don't count, since there I really would want to compile everything from scratch to make sure I use the very same code as the last time around, and the machines are often clusters in simfactory.

    The trees on my workstation usually don't use simfactory to build but use a variant of debian.cfg; the OSX laptop uses osx-yosemite-homebrew-gcc.cfg to compile. No workstation/laptop of mine starts runs using simfactory, since they are testing machines only and I find using a script of my own to call mpirun (and log output) more convenient.

    My builds usually use simfactory option lists and ET/Zelmani thornlists, but do not use simfactory to build or run. I have to admit I am not sure how using simfactory would change this, unless we'd want to use ENABLE/DISABLE thorn statements in the localhost.ini files that sim setup creates when a private workstation is set up.

    This may not be the most efficient setup and differ considerably from what others are using. I am also not suggesting this to be the best option, but it is the one that I ended up using.

    My main concerns are that somebody who wants to "just try" the ET should not have to spend a long time compiling it, and (just for my own, though I can naturally change options etc to achieve this in any way) I like software to have simple, predictable behaviour that is the same among similar software (ie all ExternalLibraries should behave the same).

  15. Erik Schnetter reporter

    We all want simple, and we all want fast. This is a trade-off: simple or fast. You want simple, since you don't need a fast lapack.

  16. Frank Löffler

    Simple isn't so much the problem. Building OpenBLAS is (hopefully) also simple. It requires build-time and disk space, and it is not clear whether it actually gives the user a benefit. In order to know that we would need to test the alternative, and we argued that this is too much hassle.

    The question is more like: What is more important to you: an uncertain performance benefit (OpenBlas vs. others), or a certain build-time and -space benefit (of using installed libraries)? That is hard to answer given the (un)available information, and also depends on the use case. I rarely need a fast Lapack, and even if I do, the installed Atlas has been good enough so far. I don't know if OpenBlas would be faster in my particular case. I never had (and still don't have) a reason to try.

  17. Roland Haas

    OpenBLAS is more aggressive in using processor-optimized code and fails to compile if it cannot identify the host processor (see #1962). We should provide some fallback for this, since Cactus most likely knows enough about the current host machine to enable compilation for a "generic" CPU of that type.
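
One possible shape for such a fallback, as a minimal sketch: try the normal build first, and if it fails, retry with OpenBLAS's `TARGET=GENERIC` make setting. The helper names `run_make` and `build_openblas` are hypothetical, and the sketch assumes that any failure of the first attempt is due to CPU detection, which a real script would need to verify (it might also need a `make clean` between attempts).

```python
import subprocess


def run_make(args):
    """Run make with the given arguments and return its exit code."""
    return subprocess.run(["make", *args]).returncode


def build_openblas(src_dir, run=run_make):
    """Build OpenBLAS, falling back to a generic CPU target.

    Returns "native" or "generic" so the caller can report prominently
    which variant was built; raises if both attempts fail.
    """
    if run(["-C", src_dir]) == 0:
        return "native"
    # Assumption: the failure came from CPU auto-detection, so retry
    # without processor-specific kernels.
    if run(["-C", src_dir, "TARGET=GENERIC"]) == 0:
        return "generic"
    raise RuntimeError("OpenBLAS build failed even with TARGET=GENERIC")
```

Returning the variant rather than silently succeeding leaves it to the caller to decide whether a "generic" build should count as a warning or an error.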

  18. Frank Löffler

    The question is: should Cactus succeed (even if verbose) in that case? If OpenBLAS is selected, shouldn't that mean that a failure to compile it should produce an error? Output while compiling can easily be overlooked, especially if in the end compiling succeeds.

  19. Frank Löffler
    • changed status to open

    Until we have a solution that would be workable, I'll reset the ticket state to 'open'.

  20. Roland Haas

    This ticket has been around for a while. I still think that, since OpenBLAS is x86 specific and even on x86 machines fails to compile if the processor is too new (or old, I guess), it should not be the default.

    Instead, clusters where we could indeed compile BLAS and LAPACK libs from scratch can use simfactory’s enabled-thorns and disable-thorns options to enable OpenBLAS. If we do not compile, which is the usual case on clusters, then there is no difference between using the BLAS/LAPACK ExternalLibraries thorns and OpenBLAS.

    That leaves the case of user workstations and unknown clusters. On a user workstation, with automated setup, speed should not matter, so I would strongly advocate using the “compiles everywhere” BLAS and LAPACK even if they are slow, in particular since the ET as a whole does not depend on those for speed.

    On clusters, speed does matter, but setting up on a cluster requires some expertise anyway, and we should be able to expect that those doing so will be able to set the correct LAPACK_DIR and BLAS_DIR variables to point to the system-installed good BLAS / LAPACK library. If there are indeed HPC clusters where those libraries are not provided, then those clusters may be best avoided.
