Compiling PittNullCode is slow

Issue #778 closed
Erik Schnetter created an issue

Compiling PittNullCode is very slow on some systems (e.g. with gcc). I believe this is because files such as NullConstr_R00.F90 contain many whole-array operations that the compiler has to analyse.

I suggest rewriting these routines using e.g. forall or do loops. If we want to keep the elegant, index-free notation, then I suggest adding an elemental subroutine for the actual calculations and calling it with whole arrays.
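
A minimal sketch of the elemental-subroutine variant, to illustrate the idea only. The module name null_sketch, the routine point_kernel, and the formula inside it are hypothetical placeholders and are not taken from NullConstr_R00.F90. The point-wise calculation is written once on scalars, and the caller still passes whole arrays:

```fortran
module null_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical point-wise kernel. The formula is a placeholder,
  ! not the actual NullConstr expression.
  elemental subroutine point_kernel(a, b, res)
    real(dp), intent(in)  :: a, b
    real(dp), intent(out) :: res
    real(dp) :: s2, s3          ! scalar temporaries
    s2  = a*a + b
    s3  = s2*b - a
    res = s2 + 2.0_dp*s3
  end subroutine point_kernel
end module null_sketch

program elemental_demo
  use null_sketch
  implicit none
  integer, parameter :: n = 4
  real(dp) :: a(n,n), b(n,n), res(n,n), ref(n,n)
  integer :: i, j

  ! Fill the inputs with simple test data.
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i, dp)
      b(i,j) = real(j, dp)
    end do
  end do

  ! Reference: the same formula written as whole-array operations.
  ref = (a*a + b) + 2.0_dp*((a*a + b)*b - a)

  ! Index-free call: the elemental routine is applied to every element.
  call point_kernel(a, b, res)

  if (maxval(abs(res - ref)) > 1.0e-12_dp) error stop "results differ"
  print *, "max difference:", maxval(abs(res - ref))
end program elemental_demo
```

The call site stays index-free, while the routine body contains only scalar arithmetic. Whether this actually shortens compile times for a given compiler would need to be measured.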

Keyword: NullConstr

Comments (7)

  1. Roland Haas
    • changed status to open

    The attached patch replaces the whole-array operations with two nested do loops. The changes were generated via search and replace. The patched code passes the test in SphericalHarmonicRecon. It reduces the compilation time with intel 12 from many minutes to <1 minute.

  2. Erik Schnetter reporter

    I think this is a good approach. Could we also get a "good to go" from one of the thorn's maintainers?

  3. Roland Haas

    Bela Szilagyi had a look at the proposed patch and pointed out that there is no longer any need for the temporaries s2-s10 and the e*'s to be arrays. Otherwise the patch was fine. The attached patch array2.patch replaces the old array.patch and implements this. The results agree identically with the array version (one has to actually add them to the regression test by hand).

    Applied as rev 11 of NullConstr.
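
The shape of the change described above (nested do loops, with the temporaries demoted from arrays to scalars) can be sketched as follows. This is an illustration under assumptions, not an excerpt from array2.patch: the grid size, the input data, and the formula are hypothetical, and only the temporary names s2 and s3 echo the naming mentioned in the comment.

```fortran
program loop_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: a(n,n), b(n,n), res(n,n), ref(n,n)
  real(dp) :: s2a(n,n), s3a(n,n)   ! array temporaries (old style)
  real(dp) :: s2, s3               ! scalar temporaries (new style)
  integer :: i, j

  ! Fill the inputs with simple test data.
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i, dp)
      b(i,j) = real(j, dp)
    end do
  end do

  ! Old style: whole-array operations with array-valued temporaries.
  s2a = a*a + b
  s3a = s2a*b - a
  ref = s2a + 2.0_dp*s3a

  ! New style: two nested do loops; each temporary is a plain scalar
  ! that is recomputed at every grid point.
  do j = 1, n
    do i = 1, n
      s2 = a(i,j)*a(i,j) + b(i,j)
      s3 = s2*b(i,j) - a(i,j)
      res(i,j) = s2 + 2.0_dp*s3
    end do
  end do

  if (maxval(abs(res - ref)) > 1.0e-12_dp) error stop "results differ"
  print *, "max difference:", maxval(abs(res - ref))
end program loop_sketch
```

The inner loop runs over the first index so that memory is traversed in Fortran's column-major order.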

  4. Ian Hinder
    • changed status to open

    NullConstr_R00 still takes a very long time to compile with ifort (IFORT) 14.0.0 20130728. It takes 20 minutes on the Datura head node.

  5. Roland Haas
    • changed status to resolved

    Datura no longer exists. Compiling the file with gcc 7.2 on my workstation takes 1m (57s to compile and link the whole thing using 1 cpu). On BW, which is usually among the slowest machines to build on, compiling just the one file with intel 16.0.3 takes 1m8.411s.

    The file that is currently slowest to compile is in QuasiLocalMeasures (also F90 code).
