thorn Vectors does not provide a sum() reduction

Issue #2015 closed
Roland Haas created an issue

It would be very useful for some applications (eg interpolation) if thorn Vectors provided a sum() function similar to the sum() member of vecmathlib

https://bitbucket.org/eschnett/vecmathlib/src/477f9c85cf111e59a9042f230bb536d0765265fa/vec_avx_double4.h?at=master&fileviewer=file-view-default#vec_avx_double4.h-537

real_t sum() const {
  // return (*this)[0] + (*this)[1] + (*this)[2] + (*this)[3];
  // __m256d x = _mm256_hadd_pd(v, v);
  // __m128d xlo = _mm256_extractf128_pd(x, 0);
  // __m128d xhi = _mm256_extractf128_pd(x, 1);
  realvec_t x = *this;
  x = _mm256_hadd_pd(x.v, x.v);
  return x[0] + x[2];
}

Most likely one can just copy and paste the code from vecmathlib. vecmathlib's license is not LGPL but permissive enough for inclusion and we can also ask Erik if one can use it and change the license of the affected code lines to LGPL in Cactus, the license is here:

https://bitbucket.org/eschnett/vecmathlib/src/477f9c85cf111e59a9042f230bb536d0765265fa/LICENCE?at=master&fileviewer=file-view-default

Copyright (c) 2012, 2013 Erik Schnetter <eschnetter@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Keyword: Vectors

Comments (6)

  1. Erik Schnetter
    • removed comment

    A sum SIMD reduction is quite expensive, much more so (ten times?) than a vectorized add. We can certainly copy the code over (I wrote it, so I can re-licence it), or we can switch Cactus to Vecmathlib in the long term.

    There might be another way to speed up an interpolation by preferring certain directions. I don't know whether this is already implemented -- did you think about a particular thorn?

  2. Roland Haas reporter
    • removed comment

    Thank you. I was looking at CarpetInterp2's lagrange interpolation. See https://bitbucket.org/eschnett/carpet/branch/rhaas/fasterp though this has not yet been speed tested compared to the previous implementation.

    The sum() at the end is done via a loop over the elt function. I did a quick check to see what an omp simd reduction(+: var) would do and found that at least on my laptop it does a similar loop and no horizontal sum at all.

  3. Erik Schnetter
    • removed comment

    I would implement sum via a sequence of shuffle and vector-add instructions, where the last add can be a scalar add:

    // assuming vector size is 4, input is s4
    t4 = __builtin_shuffle(s4, {1,0,3,2});
    s2 = s4 + t4;
    t2 = __builtin_shuffle(s4, {2,3,0,1});
    s1 = s2 + t2;
    
  4. Roland Haas reporter
    • removed comment

    You mean in vectors.h or in CarpetInterp2? In the latter I am limited to either functions from vectors.h or standards compliant code and __builtin_shuffle sounds like it will be compiler dependent. I will give things a try though.

  5. Log in to comment