Opened 4 years ago

Last modified 14 months ago

#1674 reopened enhancement

Switch to OpenBLAS

Reported by: Erik Schnetter
Owned by:
Priority: major
Milestone:
Component: Other
Version: development version
Keywords:
Cc:

Description

OpenBLAS is a BLAS and LAPACK library that is significantly more efficient than the standard ("reference") BLAS. I suggest we switch the ET thorn list to ExternalLibraries/OpenBLAS instead of ExternalLibraries/BLAS and ExternalLibraries/LAPACK.

Attachments (0)

Change History (22)

comment:1 Changed 4 years ago by Erik Schnetter

Status: changed from new to review

comment:2 Changed 4 years ago by Frank Löffler

BLAS is often provided by Atlas (and is picked up by the current BLAS thorn), and I don't recall Atlas being particularly inefficient. Are you talking about the real "reference" BLAS, or Atlas, in your comparison (same for LAPACK)?

comment:3 Changed 4 years ago by Erik Schnetter

At the moment, Cactus builds the reference BLAS if it doesn't find a system BLAS. Instead, it should build OpenBLAS. Building Atlas is more complicated since Atlas auto-tunes, which also takes a long time. OpenBLAS does not need this, and is apparently also faster than Atlas. (I don't know what this means in practice.)

comment:4 Changed 4 years ago by Frank Löffler

Since it would be confusing to have both thorn BLAS and thorn OpenBLAS in the ET, do you suggest to move to OpenBLAS for the toolkit (which would be fine with me, assuming OpenBLAS works at least as well)?
And, do we really want to maintain both thorns?

comment:5 Changed 4 years ago by Erik Schnetter

OpenBLAS also provides LAPACK. It contains a copy of the reference LAPACK, probably with optimizations to some important routines.

My suggestion is to replace both BLAS and LAPACK by OpenBLAS, both as thorns and in the ET thorn list. BLAS and LAPACK are then not needed any more.

comment:6 Changed 4 years ago by Frank Löffler

The thorn OpenBLAS currently does not seem to be able to detect any other installed BLAS/LAPACK installation like the BLAS/LAPACK thorns do. Without this it is not yet a replacement for the BLAS/LAPACK thorns.

comment:7 Changed 4 years ago by Erik Schnetter

It is dangerous to auto-detect existing BLAS libraries, because these are often just the reference BLAS. It is better to build OpenBLAS in this case, since building it is actually quite fast. The point of using OpenBLAS is to guarantee that we have an efficient BLAS implementation at hand, e.g. for people using PETSc.

We could auto-detect existing BLAS libraries if we also auto-detect what kind it is (e.g. Atlas, OpenBLAS, reference BLAS, MKL, ...), but that is difficult, and we haven't done this before, so this shouldn't be a prerequisite.
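A minimal sketch of what such a "what kind is it" check could look like, assuming each implementation can be recognised by a vendor-specific exported symbol. The marker list (`openblas_get_config`, `MKL_Get_Version`, `ATL_buildinfo`) and the helper names `classify_blas` and `probe_system_blas` are illustrative assumptions, not part of any existing thorn, and a library without any marker can only be reported as unknown.

```python
import ctypes
import ctypes.util

# Marker symbols assumed to identify each implementation (illustrative, not exhaustive).
MARKERS = [
    ("OpenBLAS", "openblas_get_config"),
    ("MKL", "MKL_Get_Version"),
    ("ATLAS", "ATL_buildinfo"),
]


def classify_blas(has_symbol):
    """Return an implementation name based on which symbols are present.

    has_symbol is a callable taking a symbol name and returning True or False.
    """
    for name, symbol in MARKERS:
        if has_symbol(symbol):
            return name
    # A library that exports dgemm_ but none of the markers cannot be
    # identified; in practice this is often the reference BLAS.
    if has_symbol("dgemm_"):
        return "unknown (possibly reference BLAS)"
    return "not a BLAS library"


def probe_system_blas(libname="blas"):
    """Classify the system library found under libname, or return None."""
    path = ctypes.util.find_library(libname)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute lookup on a CDLL fails with AttributeError for missing symbols,
    # so hasattr() works as a symbol-presence test.
    return classify_blas(lambda sym: hasattr(lib, sym))
```

The "unknown" branch is where the difficulty described above shows up: a symbol check can confirm a known fast library, but it cannot prove that an unrecognised one is slow.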

comment:8 Changed 4 years ago by Frank Löffler

The current state with BLAS and LAPACK is that a user does not have to do anything to get these thorns to pick up MKL/Atlas, two pretty well optimized libraries. For production machines the simfactory entries would make sure this happens anyway (one would hope), but it also happens in typical workstation environments. I don't think that we should change this.

Also, this is not what we do for any other external library. The task of the ExternalLibraries thorns was so far always to check for the existence of a viable system version first, and use this if at all possible. Of course, the point for discussion here is 'viable'. If we understand this as "works", then performance is none of the thorn's concerns. If performance is important, we have to detect poor performance somehow. Ignoring an installed BLAS/LAPACK library is not what I would expect. ExternalLibraries/MPI also checks for MPI versions other than OpenMPI, and does not care whether the found version has better or worse performance than a built OpenMPI would have.

comment:9 Changed 4 years ago by Erik Schnetter

You cannot compare performance differences between MPI versions with the performance of the reference BLAS. We are talking about a factor of ten here. Compare it rather to a utility that chooses not to install an Infiniband driver since a Gigabit Ethernet driver is already present on the system.

If you think that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS, then we disagree very much on what this thorn should be doing.

comment:10 in reply to:  9 Changed 4 years ago by Frank Löffler

Replying to eschnett:

If you think that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS, then we disagree very much on what this thorn should be doing.

I don't think "that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS"; in this we agree. But that means that if OpenBLAS rejects a working installed BLAS, it should do so on account of that installation being slow. Either we can determine this somehow, and then reject a slow installation, or we cannot distinguish between a slow and a fast existing installation, in which case I believe we should choose the existing version.

comment:11 Changed 4 years ago by Frank Löffler

On an unrelated note: if the thorn is supposed to be used to point to any BLAS/LAPACK installation, shouldn't the variable to do so be BLAS_DIR / LAPACK_DIR? Of course this would then differ from the thorn name.

comment:12 Changed 4 years ago by Erik Schnetter

Please make a suggestion for how "somehow detect" should work, and what should happen if this detection fails to give a result.

As I said, I think we disagree on what OpenBLAS should be doing. My main goal is to ensure that there is an efficient BLAS available. Yours seems to be to trust the system library, if there is one. I don't trust people to install something reasonable on their laptops/workstations.

We can discuss thorn, variable, and requirement names later.

comment:13 Changed 4 years ago by Erik Schnetter

*ping*

comment:14 Changed 4 years ago by Roland Haas

At the risk of being called out for my lack of wanting a fast LAPACK (this is related to my general unhappiness with building everything from scratch, since I have many compiled Cactus trees and many configurations and often compile new ones): I actually do "think that thorn OpenBLAS should use an installed, slow library instead of building an efficient OpenBLAS". At least in its probe-what-is-there mode. OpenBLAS may well want to output a warning, but otherwise I'd stick to: "the user installed the system LAPACK, so I use it". If they want to compile OpenBLAS then they can either uninstall the system one or ask for XXXX_DIR=BUILD. This is based on the notion that the OpenBLAS thorn provides the capabilities LAPACK and BLAS and is not only a wrapper around OpenBLAS's configure script.
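The probe-what-is-there behaviour described here can be sketched as a small decision function. This is only an illustration under stated assumptions: `choose_blas`, its arguments and its return tuple are hypothetical names, and `dir_setting` stands in for whatever XXXX_DIR variable the thorn ends up using.

```python
def choose_blas(dir_setting, system_lib):
    """Decide whether to use an existing BLAS/LAPACK or build OpenBLAS.

    dir_setting: value of the (hypothetical) XXXX_DIR option: None when unset,
                 "BUILD" to force a build, or a path to an installation.
    system_lib:  path of a BLAS/LAPACK found by probing the system, or None.

    Returns a tuple (action, location, warning).
    """
    if dir_setting == "BUILD":
        # The user explicitly asked for OpenBLAS to be compiled.
        return ("build", None, None)
    if dir_setting:
        # The user pointed at a specific installation; trust it as given.
        return ("use", dir_setting, None)
    if system_lib:
        # Probe mode: use what is installed, but say that it may be slow.
        warning = "using system BLAS/LAPACK; it may be slower than OpenBLAS"
        return ("use", system_lib, warning)
    # Nothing installed and nothing requested: build OpenBLAS.
    return ("build", None, None)
```

With this ordering an explicit setting always wins, and the warning is the only place where the performance concern enters in probe mode.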

comment:15 Changed 4 years ago by Erik Schnetter

Which machines are these?

The behaviour we are discussing will not affect any machines where we use a special option list, i.e. none of the typical HPC systems will be affected by this choice. For generic Ubuntu systems, we can add Atlas or OpenBLAS to the list of packages that need to be installed before building Cactus.

Where do you keep the "many Cactus trees"?

Or are you describing builds that do not use Simfactory?

comment:16 Changed 4 years ago by Roland Haas

I'll try to answer each question:

These are my workstation, my Linux laptop, my OSX laptop, and three different Linux installations in virtual machines. My understanding is that the issue of compiling/using OpenBLAS only comes up on these types of machines (i.e. personal workstations and laptops) at all, yes? For the XSEDE/SciNet/PRACE/whatnot clusters in simfactory we already provide paths to optimized LAPACK/BLAS versions. For private clusters that someone sets up, there would seem to be enough complexity already that choosing the proper LAPACK/BLAS library should be a minor point, e.g. I usually find it much harder to get Infiniband to work properly.

Maybe I misunderstood. I was under the impression that the suggestion was to have OpenBLAS ignore any installed LAPACK/BLAS and compile its own code, since it is hard to determine if the system-installed LAPACK is slow. This would only affect its probe-for-installed-versions mode, not the mode where we specify a non-special CCC_DIR. The current instructions for first-time users on Ubuntu (https://docs.einsteintoolkit.org/et-docs/Simplified_Tutorial_for_New_Users) already suggest installing libatlas-base-dev, and the Ubuntu option list in simfactory does the same in its comment headers. A vanilla Ubuntu install will also work since it does not install any LAPACK/BLAS by default.

I keep at least three different Cactus trees on my workstation at all times: ET_master, ET_release, Zelmani, each with different sources for the thorns. I have ~40 Cactus trees on my workstation, most of them historical only but maybe an additional to the three listed above that I actually use. I have multiple configurations in Zelmani: one that is mostly vanilla, one for MHD, one for the inversion-symmetry-preserving setup, one that just compiles external libraries for use by other toolkits, several that are -debug variants of the above, and several that contain the state of the code as used by other group members. On top of that I have maybe 5-10 Cactus trees on various machines that are builds of Formaline tarballs from when I wanted/needed to exactly reproduce the behaviour of one of my own or a group member's runs; those probably don't count since they really would actually *want* to compile everything from scratch to make sure I use the very same code as the last time around, and the machines are often clusters in simfactory.

The trees on my workstation usually don't use simfactory to build but use a variant of debian.cfg; the OSX laptop uses osx-yosemite-homebrew-gcc.cfg to compile. No workstation/laptop of mine starts runs using simfactory, since they are testing machines only and I find using a script of my own to call mpirun (and log output) more convenient.

My builds usually use simfactory option lists and ET/Zelmani thornlists, but do not use simfactory to build or run. I have to admit I am not sure how using simfactory would change this, unless we'd want to use ENABLE/DISABLE thorn statements in the localhost.ini files that sim setup creates when a private workstation is set up.

This may not be the most efficient setup and may differ considerably from what others are using. I am also not suggesting this to be the best option, but it is the one that I ended up using.

My main concerns are that somebody who wants to "just try" the ET should not have to spend a long time compiling it, and (just for myself, though I can naturally change options etc. to achieve this in any way) I like software to have simple, predictable behaviour that is the same among similar software (i.e. all ExternalLibraries should behave the same).

comment:17 Changed 4 years ago by Erik Schnetter

We all want simple, and we all want fast. This is a trade-off: simple or fast. You want simple, since you don't need a fast lapack.

comment:18 Changed 4 years ago by Roland Haas

Guilty as charged (on both counts).

comment:19 Changed 4 years ago by Frank Löffler

Simple isn't so much the problem. Building OpenBLAS is (hopefully) also simple. It requires build-time and disk space, and it is not clear whether it actually gives the user a benefit. In order to know that we would need to test the alternative, and we argued that this is too much hassle.

The question is more like: What is more important to you: an uncertain performance benefit (OpenBlas vs. others), or a certain build-time and -space benefit (of using installed libraries)? That is hard to answer given the (un)available information, and also depends on the use case. I rarely need a fast Lapack, and even if I do, the installed Atlas has been good enough so far. I don't know if OpenBlas would be faster in my particular case. I never had (and still don't have) a reason to try.

comment:20 Changed 22 months ago by Roland Haas

OpenBLAS is more aggressive in using processor-optimized code and fails to compile if it cannot identify the host processor (see #1962). We should provide some fallback for this, since Cactus most likely knows enough about the current host machine to enable compilation for a "generic" CPU of that type.
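One possible shape for such a fallback, as a sketch only: try the default build first and retry with an explicit generic target if it fails. The `build_openblas` function and the `run_make` callable are hypothetical; `TARGET=GENERIC` is assumed to be an acceptable OpenBLAS make setting for the host architecture.

```python
def build_openblas(run_make, fallback_target="GENERIC"):
    """Build OpenBLAS, retrying with a generic CPU target on failure.

    run_make: callable taking a list of extra make arguments and returning
              True on success, False on failure (hypothetical wrapper around
              the actual make invocation).

    Returns the list of extra make arguments that led to a successful build.
    Raises RuntimeError if the fallback build fails as well.
    """
    if run_make([]):
        # OpenBLAS identified the host processor on its own.
        return []
    fallback_args = ["TARGET=" + fallback_target]
    # Make the degraded build visible rather than silently accepting it.
    print("warning: OpenBLAS default build failed, retrying with",
          fallback_args[0])
    if run_make(fallback_args):
        return fallback_args
    raise RuntimeError("OpenBLAS failed to build, even with " + fallback_args[0])
```

Whether the retry should count as success or be reported as an error is a policy decision; the sketch only separates the two attempts so that either choice can be made in one place.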

comment:21 Changed 22 months ago by Frank Löffler

The question is: should Cactus succeed (even if verbose) in that case? If OpenBLAS is selected, shouldn't that mean that a failure to compile it should produce an error? Output while compiling can easily be overlooked, especially if in the end compiling succeeds.

comment:22 Changed 14 months ago by Frank Löffler

Status: changed from review to reopened

Until we have a solution that would be workable, I'll reset the ticket state to 'open'.
