Improve OpenMP parallelisation of SummationByParts

Issue #1023 closed
Erik Schnetter created an issue

The Intel compiler does not handle OpenMP workshare constructs well. The attached patch replaces them with explicit loops, which execute faster. This makes a measurable difference on Hopper with 24 OpenMP threads.

This only modifies one operator; other operators could be treated in the same way.
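
For readers without the patch at hand, the kind of rewrite described above can be sketched as follows. This is a hypothetical, minimal illustration and is not taken from the attached patch: the program name, the array names (`u`, `du_ws`, `du_loop`), the grid size, the spacing `h`, and the simple centred-difference stencil are all assumptions chosen for brevity, and the real operators are more involved. It applies the same stencil once with array syntax inside a `workshare` construct and once as an explicit loop nest under `parallel do`, then checks that both give the same result.

```fortran
! Hypothetical illustration; not taken from the attached patch.
program workshare_vs_loop
  implicit none
  integer, parameter :: n = 64                ! assumed grid size
  double precision, parameter :: h = 0.1d0    ! assumed grid spacing
  double precision :: u(n,n), du_ws(n,n), du_loop(n,n)
  integer :: i, j

  ! Fill the input with a simple test function, u = x**2 with x = i*h
  do j = 1, n
    do i = 1, n
      u(i,j) = (i*h)**2
    end do
  end do
  du_ws = 0.0d0
  du_loop = 0.0d0

  ! Variant 1: array syntax inside a workshare construct
  !$omp parallel workshare
  du_ws(2:n-1,:) = (u(3:n,:) - u(1:n-2,:)) / (2.0d0*h)
  !$omp end parallel workshare

  ! Variant 2: the same stencil written as an explicit loop nest,
  ! parallelised over the outer index
  !$omp parallel do private(i)
  do j = 1, n
    do i = 2, n-1
      du_loop(i,j) = (u(i+1,j) - u(i-1,j)) / (2.0d0*h)
    end do
  end do
  !$omp end parallel do

  ! Both variants should agree on every grid point
  if (maxval(abs(du_ws - du_loop)) > 1.0d-12) error stop 'variants differ'
  print *, 'max difference:', maxval(abs(du_ws - du_loop))
end program workshare_vs_loop
```

Compiled with OpenMP enabled (for example `-fopenmp` with gfortran or `-qopenmp` with the Intel compiler), the two variants can be timed separately to see whether the explicit loop is faster with a given compiler and thread count.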

Comments (4)

  1. Frank Löffler
    • changed status to open
    • removed comment

    The patch looks OK. I didn't check all the indices carefully (the patch is long) and didn't run the testsuites. Assuming the tests show no difference between the two versions when using multiple threads, I think it is OK to commit this. I'll leave testing to Erik. :)
