Support KNL's AVX512 instruction set

Issue #2068 closed
Erik Schnetter created an issue

The main point of this pull request is to support KNL's AVX512 instruction set. It also contains a few other changes on which the AVX512 support depends.

See [https://bitbucket.org/cactuscode/cactusutils/pull-requests/13/support-knls-avx512-instruction-set/diff].

Comments (15)

  1. Roland Haas

    This is hard to review. Could you provide a pull request that separates out the result of clang-format (which doesn't need review) from the other changes (i.e. new code, CCTK_VWarn -> CCTK_VWARN changes)?

  2. Erik Schnetter reporter

    I factored it into individual topical commits. If you look at the commits, you will see thematically related changes. Reformatting is restricted to a single commit.

  3. Roland Haas
    • changed status to open

    OK, thank you. Let me see if I can figure out how to look at the commits individually (and comment on them). I tried to have a look but could not work out which branch to look at. :-)

  4. Roland Haas

    Alright, this worked without problems; I am not sure what I had tried to do before. The only downside is that the comments do not show up in the pull request diff but only in the individual commits, so here are links to them:

    Looks fine in general. The usual disclaimers apply: these comments are terse and simply what came to mind while reading the code. They are not intended to be well worded or to be final requests.

    Two more comments:

    • Does the final version of the commits still pass through clang-format unchanged?
    • I don't know what was done in thorn Vectors (compared to vecmathlib) before: are approximate answers used, or was there a promise that (as much as possible) code that simply uses vectorization to do the same thing to multiple grid points returns the same result as code that does the same thing to multiple grid points using a loop (ignoring fma and the like)?
  5. Erik Schnetter reporter
    • Before committing, I will re-run clang-format to ensure the formatting is right.

    • Answers are not approximated, except for functions such as sin and cos, where the IEEE standard does not require all bits to be accurate; in these cases, I follow the OpenCL standard with respect to accuracy (how many bits can be wrong, usually at most 4 out of 53). inf and nan might also be handled differently. However, basic arithmetic and square roots in particular are correct in all digits. fma might or might not round between multiplying and adding. So, in the absence of trigonometric functions, the answer is "yes, the results will be the same".
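
    The accuracy statement above ("at most 4 bits wrong") can be made concrete by measuring how many representable doubles lie between a computed result and a reference value. The following is only a minimal sketch of such a check, not code from thorn Vectors or vecmathlib; the names ordered_bits, ulp_distance and within_ulps are hypothetical, and the sketch assumes 64-bit IEEE doubles and finite inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a double onto an integer scale on which
// neighbouring representable values differ by exactly 1.
// Assumes double is a 64-bit IEEE binary64 value.
static std::int64_t ordered_bits(double x) {
  std::int64_t i;
  std::memcpy(&i, &x, sizeof i);
  // Negative doubles are stored as sign plus magnitude; remap them so that
  // the integer ordering matches the floating-point ordering (-0.0 maps to 0).
  return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of representable doubles between a and b (0 if they are equal).
static std::uint64_t ulp_distance(double a, double b) {
  assert(std::isfinite(a) && std::isfinite(b));
  const std::int64_t oa = ordered_bits(a);
  const std::int64_t ob = ordered_bits(b);
  // Subtract in unsigned arithmetic to avoid signed overflow.
  return oa >= ob ? std::uint64_t(oa) - std::uint64_t(ob)
                  : std::uint64_t(ob) - std::uint64_t(oa);
}

// True if result differs from reference by at most max_ulps steps.
static bool within_ulps(double result, double reference,
                        std::uint64_t max_ulps) {
  return ulp_distance(result, reference) <= max_ulps;
}
```

    Under these assumptions, a vectorized sin or cos would be compared against a scalar reference with within_ulps(result, reference, 4), while basic arithmetic and square roots would be expected to give ulp_distance(result, reference) == 0.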

  6. Ian Hinder
    • changed status to open

    This seems to have caused test failures on Jenkins. Specifically,

     CarpetProlongateTest.test_o11/2procs
     CarpetProlongateTest.test_o7/2procs
     CarpetProlongateTest.test_o9/2procs
    

    all now fail. For o11, the error is

    cactus_sim: /home/jenkins/workspace/EinsteinToolkit/arrangements/Carpet/CarpetLib/src/prolongate_3d_rf2.cc:258: T CarpetLib::interp1(const T*, size_t) [with T = double; int ORDER = 11; int di = 1; size_t = long unsigned int]: Assertion `i == (ptrdiff_t(coeffs::imax) - ptrdiff_t(coeffs::ncoeffs % VP::size()))' failed.

    Backtrace from rank 0 pid 30341:
    1. CarpetLib::signal_handler(int)   [/home/jenkins/workspace/EinsteinToolkit/../simulations/EinsteinToolkit_eec4338ddb95f8ab8c59ed7cd91635b8c4ff0f23_2/SIMFACTORY/exe/cactus_sim(_ZN9CarpetLib14signal_handlerEi+0xda) [0x23f031a]]
    2. /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fd54f6cf4b0]
    3. /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7fd54f6cf428]
    4. /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7fd54f6d102a]
    5. /lib/x86_64-linux-gnu/libc.so.6(+0x2dbd7) [0x7fd54f6c7bd7]
    6. /lib/x86_64-linux-gnu/libc.so.6(+0x2dc82) [0x7fd54f6c7c82]
    7. /home/jenkins/workspace/EinsteinToolkit/../simulations/EinsteinToolkit_eec4338ddb95f8ab8c59ed7cd91635b8c4ff0f23_2/SIMFACTORY/exe/cactus_sim() [0xc88a70]
    8. void CarpetLib::prolongate_3d_rf2<double, 11>(double const*, vect<int, 3> const&, vect<int, 3> const&, double*, vect<int, 3> const&, vect<int, 3> const&, bbox<int, 3> const&, bbox<int, 3> const&, bbox<int, 3> const&, bbox<int, 3> const&, void*)   [/home/jenkins/workspace/EinsteinToolkit/../simulations/EinsteinToolkit_eec4338ddb95f8ab8c59ed7cd91635b8c4ff0f23_2/SIMFACTORY/exe/cactus_sim(_ZN9CarpetLib17prolongate_3d_rf2IdLi11EEEvPKT_RK4vectIiLi3EES7_PS1_S7_S7_RK4bboxIiLi3EESC_SC_SC_Pv+0x1527) [0x2463de7]]
    9. /home/jenkins/workspace/EinsteinToolkit/../simulations/EinsteinToolkit_eec4338ddb95f8ab8c59ed7cd91635b8c4ff0f23_2/SIMFACTORY/exe/cactus_sim() [0x2465eb6]
    a. /usr/lib/x86_64-linux-gnu/libgomp.so.1(+0xf43e) [0x7fd54fc8943e]
    b. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fd5522446ba]
    c. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd54f7a13dd]
    
    The hexadecimal addresses in this backtrace can also be interpreted
    with a debugger (e.g. gdb), or with the 'addr2line' (or 'gaddr2line')
    command line tool: 'addr2line -e cactus_sim <address>'.
    

    The other orders are similar.
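
    The failed assertion compares a loop index with imax minus ncoeffs % VP::size(). One possible reading, and it is only an assumption since the CarpetLib source is not quoted here, is that it checks where a loop over the interpolation coefficients in full vector-width chunks stops before a scalar remainder is handled. The sketch below illustrates that invariant in isolation; chunked_sum, VectorSize and stop are hypothetical names, and the inner lane loop merely stands in for a real vector operation.

```cpp
#include <cassert>
#include <cstddef>

// Sum coeffs[imin..imax) in chunks of VectorSize elements, then finish the
// remainder with a scalar loop. VectorSize plays the role that VP::size()
// appears to play in the assertion (an assumption, not CarpetLib code).
template <std::ptrdiff_t VectorSize>
double chunked_sum(const double* coeffs, std::ptrdiff_t imin,
                   std::ptrdiff_t imax, std::ptrdiff_t* stop) {
  const std::ptrdiff_t ncoeffs = imax - imin;
  double sum = 0.0;
  std::ptrdiff_t i = imin;

  // Full chunks: each iteration consumes exactly VectorSize elements.
  for (; i + VectorSize <= imax; i += VectorSize) {
    for (std::ptrdiff_t lane = 0; lane < VectorSize; ++lane) {
      sum += coeffs[i + lane];
    }
  }

  // Invariant after the chunked loop: exactly ncoeffs % VectorSize
  // elements are left for the scalar tail.
  assert(i == imax - ncoeffs % VectorSize);
  *stop = i;

  // Scalar tail for the elements that do not fill a whole chunk.
  for (; i < imax; ++i) {
    sum += coeffs[i];
  }
  return sum;
}
```

    In this sketch the invariant holds for any vector width, because the chunked loop always starts at imin and advances by whole chunks. If the real code starts or steps differently for some widths, an assertion of this shape would fail, which is the kind of mismatch the error message reports.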
