Add Lean_public and Proca thorns to the ET

Issue #2101 closed
Miguel Zilhão created an issue

As discussed during the European ET meeting in Mallorca and during the 2018-01-22 ET telecon, we have a couple of Cactus thorns that we'd like to make available to the general ET community. These include an updated evolution code that we used to evolve Proca fields (https://arxiv.org/abs/1505.00797), the corresponding analysis and initial data thorns, as well as a general metric evolution thorn (based on Uli Sperhake's Lean). We have been cleaning up the codes, and they are now available in the following (public) Bitbucket repositories:

https://bitbucket.org/canuda/lean_public
https://bitbucket.org/canuda/proca

Included are some testsuites as well as some basic README files. We are currently working on a paper that would also serve as documentation. We would be very grateful if these could be considered for inclusion in the 2018_08 release.

Sincerely,

Helvi Witek
Miguel Zilhão

Keyword: Lean_public

Comments (33)

  1. anonymous

    I am reviewing the "LeanBSSNMoL" code for submission to the Einstein Toolkit. Executive summary: The code is beautifully written, but the thorn lacks test cases, which are necessary.

    Downloading and building the code worked fine.

    The code is written in Fortran 90. It has a clear structure and is very readable. It consists of essentially one large loop per function, which is efficient and compiles quickly, as opposed to Fortran 90's array syntax, which unfortunately leads to severe slow-downs with some compilers.

    The code has the usual idiosyncrasies where it expects that "CCTK_REAL" is the same as "double precision", and "CCTK_INT" is the same as "integer". These assumptions usually hold in practice, and the code is fine as is, although the main point of introducing "CCTK_REAL" and "CCTK_INT" was to allow these types to be distinct.

    I am reading the code, and I have the following remarks:

    The file "zero_rhs.F90" (and maybe others) does not include "cctk_Arguments.h". How does it even compile? The missing include statement should be added. I wonder whether there is a bug in the Einstein Toolkit in that including "cctk.h" already includes "cctk_Arguments.h"?

    I have, for obvious reasons, not thoroughly checked the implementations of finite differences and BSSN equations in the files "bssn_constraints.F90" and "calc_bssn_rhs.F90". I assume that the authors checked their code and am willing to trust them. Personally, I would have defined macros for the lengthy finite differencing expressions.

    I ran all test cases successfully. There are no test cases, so this is not a strong statement. Test cases need to be added for a few simple cases, e.g. for some of the Apples with Apples examples (nonlinear gauge wave, Gowdy spacetime) as well as a single, preferably rotating black hole.

    I have run the example parameter file on 1 node with 40 cores of the Niagara system of Compute Canada, using 10 MPI processes with 4 OpenMP threads each. This seems to work nicely, although progress was so slow that after a few hours only 16 iterations had completed, and horizons were found only once. A quick glance shows that the output looks correct.

    For future reference, example parameter files should contain comments near the top describing approximately how much memory and CPU time they use. This makes it much simpler to run them, and prevents people from accidentally running them on their laptop if there is no chance of success. Put differently, the driver should have a notion of the amount of memory available on a node (or laptop), and should refuse to allocate significantly more than that. Simfactory already has the necessary information.
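    For instance, such a header comment might look like this (the numbers below are purely illustrative, not measured for this parameter file):

    # Resource estimate (illustrative values only):
    #   memory : ~24 GByte total across all processes
    #   runtime: ~100 core-hours to reach t = 100 M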

    Additional remarks, as suggestions for future improvements to the authors:

    The file "evolve_utils.F90" uses "omp parallel do collapse(3)". It might be faster to use only "collapse(2)" here, and instead use "omp simd" for the innermost loop. The same applies to all OpenMP loops.
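    The suggested restructuring, sketched here in C for concreteness (the thorn itself is Fortran 90; the array sizes and names below are illustrative only):

    ```c
    #include <stdio.h>

    #define NX 8
    #define NY 8
    #define NZ 8

    int main(void) {
        static double rhs[NZ][NY][NX];

        /* Collapse only the two outer loops across threads, and let the
           innermost unit-stride loop be vectorized with omp simd. */
        #pragma omp parallel for collapse(2)
        for (int k = 0; k < NZ; k++) {
            for (int j = 0; j < NY; j++) {
                #pragma omp simd
                for (int i = 0; i < NX; i++) {
                    rhs[k][j][i] = (double)(i + j + k);
                }
            }
        }

        printf("%g\n", rhs[NZ-1][NY-1][NX-1]); /* prints 21 */
        return 0;
    }
    ```

    If OpenMP is not enabled, the pragmas are simply ignored and the loop nest runs serially with identical results.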

    Some functions have very generic names, such as "remove_trA" or "reset_detmetric". This risks name collisions throughout the executable. I suggest to add a prefix to these functions, even if only a short one such as "lbm_". Alternatively, these functions can be put into a module with a unique name such as "LeanBSSN_evolve_utils", which would have the added benefit of checking the function's signatures. (Scheduled functions cannot be put into modules.)

    Other, minor remarks:

    The thorn list entry contains the comment "Lean public thorns". This gives the impression as if there was a proper set of Lean thorns, and a few of them were made public. I'd prefer to call them simply "Lean thorns", as these thorns need to be interesting and useful by themselves, and should see some development by themselves. I object to the notion that Einstein Toolkit thorns are merely mirror images of thorns that truly live elsewhere, hidden, and which from time to time receive a data dump as upgrade. Instead, the thorns in the Einstein Toolkit should be the main development versions, and where private copies exist, they should contribute via pull request to the main development versions.

    The shift condition mentioned in the README of LeanBSSNMoL should receive a proper pointer to a particular set of equations (e.g. (S2)?).

    I see comments such as "$Header:$" in some files. These are now-unused left-overs from CVS, and can be removed.

    Calls to "CCTK_WARN(0, ...)" can be replaced by "CCTK_ERROR(...)", which might be more readable.

  2. Peter Diener

    Here is my review of the "Proca" suite of thorns that are proposed for inclusion in the Einstein Toolkit. Proca consists of ProcaBase, Proca_simpleID, TwoPunctures_KerrProca, ProcaEvolve and NPScalars_Proca. The code is nicely written, but mainly lacks tests and documentation.

    Questions, comments and recommendations regarding Proca overall:

    There are two tests in ProcaEvolve, one of which uses Lean and the other ML_BSSN, while otherwise appearing to be the same. Whereas the Ai and Ei variables agree quite well, there seem to be significant differences between Aphi and Zeta. Is this expected? If so, why?

    In ProcaEvolve there is another parameter file in the test directory (ML_BSSN_Aphi_mu0.4_c100.1_r06_S0.0_4th.par) that does not have any associated output. If this was meant to be a test, output should be provided. Otherwise it should be moved to the example parameter file directory.

    NPScalars_Proca, Proca_simpleID and TwoPunctures_KerrProca have no tests of their own. Proca_simpleID is tested by the tests in ProcaEvolve, but NPScalars_Proca and TwoPunctures_KerrProca should definitely be covered by tests. You may also consider making a standalone test for Proca_simpleID.

    None of the thorns in the Proca arrangement have any documentation of their own. As there is a paper describing the formulation, it is okay to refer to the paper, but there should be some description of the variables and parameters used in the code and how they relate to the physics and equations.

    It would be nice if all the parameter files for the runs in the paper could be made available as examples of how to use the thorn. Currently there are only two example parameter files available and it's not immediately clear if they have been used for the paper. There could be a comment at the top of the parameter files indicating what section and/or figure in the paper they were used for.

    Questions, comments and recommendations specifically to ProcaBase:

    In param.ccl the comment on the value of the parameter mu says: "any non-negative number". However, the range allows for zero and in fact the default value is zero. Either the comment should be changed or the range and default value should be changed.

    Are the integer powers for the fall-off rates of the outgoing boundary conditions for the Proca variables not known? Or can they be different for different physical systems? If they are known constants, should they be parameters at all? Also, could the power be different for different components of the E and A vectors? If not, there's no need to make these vector parameters.
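    For context, such outgoing boundary conditions are typically derived from an assumed fall-off ansatz with an integer power $n$ (a standard radiative form; the thorn's exact implementation may differ):

    ```latex
    f(t,r) \approx f_0 + \frac{u(r - v t)}{r^n}
    \quad\Longrightarrow\quad
    \partial_t f + v\,\partial_r f + \frac{n\,v}{r}\,\bigl(f - f_0\bigr) = 0 ,
    ```

    which is why the power $n$ enters the boundary condition as a per-variable constant.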

    I propose to give the file (src/Basegrid.F90) with the symmetry registering routine a more descriptive name.

    It would be nice to have a proper (but short) thorn documentation to relate the variables with the paper.

    Questions, comments and recommendations specifically to Proca_simpleID:

    The Simple Initial Data solution in the paper is a spherically symmetric solution. The parameters for Proca_simpleID, on the other hand, seem to allow for a superposition of 2 such solutions separated by a non-zero distance. As the parameter for the separation cannot be zero (imposed by the parameter range), the initial data thorn will thus never produce initial data that is spherically symmetric around the origin of the Cactus grid. Would it make sense for the range of par_b to include zero in order to allow for that? This is something worth explaining in the (missing) documentation.

    Questions, comments and recommendations specifically to TwoPunctures_KerrProca:

    It looks like the code implements initial data with a Y_11 angular dependence in addition to Y_00 and Y_10 mentioned in the paper. This should be in the documentation.

    There's a comment in Equations.c that some of the equations were copied from Mathematica. Could that Mathematica notebook be made available in the thorn as well?

    In the future it might be worth considering some way of merging this back into the original TwoPunctures, so we don't have multiple thorns implementing the same spectral solver infrastructure.

    There should be tests for all the different types of angular profiles for the initial data.

    The discussion in the README should be converted into a proper thorn documentation.

    Questions, comments and recommendations specifically to ProcaEvolve:

    Need a thorn documentation to relate variables to the paper and to explain the use of parameters.

    I don't know how much of a concern performance is. If only a tiny amount of time is spent in Proca compared to BSSN (or other things), this is obviously not important. However, if the issues in the code that prevent the compiler from optimizing make the code slow enough that the time spent in Proca is comparable to the time spent in BSSN, it might be worth it to fundamentally rewrite the code. One of the problems is the many nested loops over tensor indices. These prevent the compiler from vectorizing most of the RHS code (I checked this by looking at compiler vectorization output from the Intel compiler). As the loop lengths are just 3, the vectorizer deems that vectorization would be inefficient and hence doesn't do it. On modern CPUs with vector lengths of 4 or 8 (in double precision), efficiently using the vector units is really important for good performance. However, this is not an issue (if there really is a problem with performance) that should prevent inclusion in the ET, but just something to think about for later.

  3. Miguel Zilhão reporter

    many thanks for the comments and review! below we try to address your points.

    I am reading the code, and I have the following remarks:

    The file "zero_rhs.F90" (and maybe others) does not include "cctk_Arguments.h". How does it even compile? The missing include statement should be added. I wonder whether there is a bug in the Einstein Toolkit in that including "cctk.h" already includes "cctk_Arguments.h"?

    indeed this was missing; it has been added.

    I have, for obvious reasons, not thoroughly checked the implementations of finite differences and BSSN equations in the file "bssn_constraints.F90" and "calc_bssn_rhs.F90". I assume that the authors checked their code and am willing to trust them. Personally, I would have defined macros for the lengthy finite differencing expressions.

    using macros would have been preferred here, yes. the reason for us not doing so is partly historical, as the code we built upon used the present implementation (and had already extensively been tested). so we decided to keep this.

    I ran all test cases successfully. There are no test cases, so this is not a strong statement. Test cases need to be added for a few simple cases, e.g. for some of the Apples with Apples examples (nonlinear gauge wave, Gowdy spacetime) as well as a single, preferably rotating black hole.

    we had only one test in place, which is triggered through the ProcaEvolve thorn. we have now added "standalone" tests, as well as one test for the NPScalars thorn.

    For future reference, example parameter files should contain comments near the top describing approximately how much memory and CPU time they use. This makes it much simpler to run them, and prevents people from accidentally running them on their laptop if there is no chance of success. Put differently, the driver should have a notion of the amount of memory available on a node (or laptop), and should refuse to allocate significantly more than that. Simfactory already has the necessary information.

    we've added the information with the memory consumption for our sample parameter file just before the grid specification.

    Additional remarks, as suggestions for future improvements to the authors:

    The file "evolve_utils.F90" uses "omp parallel do collapse(3)". It might be faster to use only "collapse(2)" here, and instead use "omp simd" for the innermost loop. The same applies to all OpenMP loops.

    we have now added the simd directives to the loops, thanks.

    Some functions have very generic names, such as "remove_trA" or "reset_detmetric". This risks name collisions throughout the executable. I suggest to add a prefix to these functions, even if only a short one such as "lbm_". Alternatively, these functions can be put into a module with a unique name such as "LeanBSSN_evolve_utils", which would have the added benefit of checking the function's signatures. (Scheduled functions cannot be put into modules.)

    since these functions are scheduled, we have opted to add the prefix "LeanBSSN_".

    Other, minor remarks:

    The thorn list entry contains the comment "Lean public thorns". This gives the impression as if there was a proper set of Lean thorns, and a few of them were made public. I'd prefer to call them simply "Lean thorns", as these thorns need to be interesting and useful by themselves, and should see some development by themselves. I object to the notion that Einstein Toolkit thorns are merely mirror images of thorns that truly live elsewhere, hidden, and which from time to time receive a data dump as upgrade. Instead, the thorns in the Einstein Toolkit should be the main development versions, and where private copies exist, they should contribute via pull request to the main development versions.

    we have removed this thornlist, actually, since it contained the rest of the (out-of-date) ET thorns. we have added the comment "Lean thorns" to the thornlist with just the Lean thorns.

    The shift condition mentioned in the README of LeanBSSNMoL should receive a proper pointer to a particular set of equations (e.g. (S2)?).

    this has been added.

    I see comments such as "$Header:$" in some files. These are now-unused left-overs from CVS, and can be removed.

    these have been removed.

    Calls to "CCTK_WARN(0, ...)" can be replaced by "CCTK_ERROR(...)", which might be more readable.

    we have done this replacement throughout.

  4. Miguel Zilhão reporter

    Replying to [comment:4 Peter Diener]:

    many thanks for your input and comments! below we try to address your points.

    Questions, comments and recommendations regarding Proca overall:

    There are two tests in ProcaEvolve, one of which uses Lean and the other ML_BSSN, while otherwise appearing to be the same. Whereas the Ai and Ei variables agree quite well, there seem to be significant differences between Aphi and Zeta. Is this expected? If so, why?

    on my machine Aphi gives very similar values for the evolutions with Lean and ML_BSSN; Zeta does appear a bit different, but this quantity is the constraint damping variable and has no physical meaning, so this may just be due to the different ways the gauge is handled in Lean and ML_BSSN...

    In ProcaEvolve there is another parameter file in the test directory (ML_BSSN_Aphi_mu0.4_c100.1_r06_S0.0_4th.par) that does not have any associated output. If this was meant to be a test, output should be provided. Otherwise it should be moved to the example parameter file directory.

    it has been moved to the par folder, thanks.

    NPScalars_Proca, Proca_simpleID and TwoPunctures_KerrProca have no tests of their own. Proca_simpleID is tested by the tests in ProcaEvolve, but NPScalars_Proca and TwoPunctures_KerrProca should definitely be covered by tests. You may also consider making a standalone test for Proca_simpleID.

    added respective tests.

    None of the thorns in the Proca arrangement have any documentation of their own. As there is a paper describing the formulation, it is okay to refer to the paper, but there should be some description of the variables and parameters used in the code and how they relate to the physics and equations.

    we've added some more description to the ProcaBase interface.ccl file, matching the variables with the notation used in the first paper. once we have the new paper out this can also be updated to reflect the notation therein, if needed. we've also added a simple documentation.tex to ProcaEvolve and TwoPunctures_KerrProca.

    It would be nice if all the parameter files for the runs in the paper could be made available as examples of how to use the thorn. Currently there are only two example parameter files available and it's not immediately clear if they have been used for the paper. There could be a comment at the top of the parameter files indicating what section and/or figure in the paper they were used for.

    we've added this comment in one of the parfiles. the other two were used for the runs in the upcoming paper; we can add the information as well once the paper is published.

    Questions, comments and recommendations specifically to ProcaBase:

    In param.ccl the comment on the value of the parameter mu says: "any non-negative number". However, the range allows for zero and in fact the default value is zero. Either the comment should be changed or the range and default value should be changed.

    zero is indeed allowed; this is what we meant by "non-negative". does it not imply that zero is permissible?

    Are the integer powers for the fall-off rates of the outgoing boundary conditions for the Proca variables not known? Or can they be different for different physical systems? If they are known constants, should they be parameters at all? Also, could the power be different for different components of the E and A vectors? If not, there's no need to make these vector parameters.

    i think there have not been any rigorous studies of the boundary conditions for these systems... the default values we are setting are what we expect makes the most sense from a naive analysis, but in principle a user may wish to experiment with other values, so we decided to allow for that option.

    I propose to give the file (src/Basegrid.F90) with the symmetry registering routine a more descriptive name.

    it has been renamed to Proca_Symmetries.F90

    It would be nice to have a proper (but short) thorn documentation to relate the variables with the paper.

    would the description introduced to the interface.ccl under ProcaBase suffice? if not, we can certainly add more information if something is not clear...

    Questions, comments and recommendations specifically to Proca_simpleID:

    The Simple Initial Data solution in the paper is a spherically symmetric solution. The parameters for Proca_simpleID, on the other hand, seem to allow for a superposition of 2 such solutions separated by a non-zero distance. As the parameter for the separation cannot be zero (imposed by the parameter range), the initial data thorn will thus never produce initial data that is spherically symmetric around the origin of the Cactus grid. Would it make sense for the range of par_b to include zero in order to allow for that? This is something worth explaining in the (missing) documentation.

    this initial data thorn actually allows for more configurations than the Simple Initial Data one of the paper. by allowing a superposition of 2 solutions, we can also construct the initial data for the head-on collision of two charged black holes with equal charge-to-mass ratios (as in https://arxiv.org/abs/1205.1063).

    however, due to how the initial data is constructed, the parameter par_b cannot be exactly zero, and we therefore have excluded that option in the parameter range. choosing a very small value for it (like the default choice) typically works very well at producing spherically symmetric data.

    we have added this information to a README file in this thorn.

    Questions, comments and recommendations specifically to TwoPunctures_KerrProca:

    It looks like the code implements initial data with a Y_11 angular dependence in addition to Y_00 and Y_10 mentioned in the paper. This should be in the documentation.

    this is correct. the initial data with the Y_11 mode is something that will be featured in the upcoming paper. we've added this note in the documentation.

    There's a comment in Equations.c that some of the equations were copied from Mathematica. Could that mathematica notebook be made available in the thorn as well?

    we have no problem with it being made available, we just need to clean up the notebook a bit.

    In the future it might be worth considering some way of merging this back into the original TwoPunctures, so we don't have multiple thorns implementing the same spectral solver infrastructure.

    we considered this, but since the equations to be solved changed considerably, we thought it'd be cleaner to implement this in a new thorn. we're certainly open to the possibility of merging this with the original TwoPunctures if deemed possible and useful, though.

    There should be tests for all the different types of angular profiles for the initial data.

    the issue here is that we find that these initial data need a significant number of spectral collocation points, otherwise the algorithm just doesn't converge. this makes some configurations very time-consuming, needing on the order of 10-15 minutes at least to run. this makes them unfeasible for quick testing... we've added one configuration which runs moderately fast (a couple of minutes); we hope this is ok.

    The discussion in the README should be converted into a proper thorn documentation.

    we have updated the README and added a documentation.tex file with the notes.

    Questions, comments and recommendations specifically to ProcaEvolve:

    Need a thorn documentation to relate variables to the paper and to explain the use of parameters.

    we've added a basic documentation.tex mapping the rhs evolution variables to their definition in the paper.

    I don't know how much of a concern performance is. If only a tiny amount of time is spent in Proca compared to BSSN (or other things), this is obviously not important. However, if the issues in the code that prevent the compiler from optimizing make the code slow enough that the time spent in Proca is comparable to the time spent in BSSN, it might be worth it to fundamentally rewrite the code. One of the problems is the many nested loops over tensor indices. These prevent the compiler from vectorizing most of the RHS code (I checked this by looking at compiler vectorization output from the Intel compiler). As the loop lengths are just 3, the vectorizer deems that vectorization would be inefficient and hence doesn't do it. On modern CPUs with vector lengths of 4 or 8 (in double precision), efficiently using the vector units is really important for good performance. However, this is not an issue (if there really is a problem with performance) that should prevent inclusion in the ET, but just something to think about for later.

    we did not run very thorough checks, but from our limited analysis it seemed that the BSSN metric evolution code still dominates the runtime, so we didn't worry too much about optimizing the Proca evolution. also, one of the motivations for making this code available was for pedagogical purposes, so we thought that having a readable and easy-to-understand code was preferable. in any case, we are certainly open to writing it in a more efficient manner if there are things being done in an obviously non-optimal way.

  5. Peter Diener

    Most of the questions/concerns I had with the Proca arrangement have been addressed satisfactorily. However, one issue has appeared since I last looked at the code and I have a suggestion to try regarding one issue that was only partially addressed.

    The first issue is that many of the new tests that have been added fail (at least when using the Intel compilers, both versions 17 and 18). This is not due to any issues with the Proca thorns, but rather with LeanBSSNMoL. I tracked it down to commit d5df3a514a79e70e8302a1a37669de0ac06fbf92 (LeanBSSNMoL: add !$OMP SIMD in do loops). Reverting this commit makes all the Proca tests pass (as well as the LeanBSSNMoL tests).

    The second issue is related to the request for tests for all the different angular profiles for the initial data produced by TwoPunctures_KerrProca. One new test has been added (thank you), but it is argued that other tests would be too time consuming, since a large number of collocation points are needed in order to achieve convergence. My question is whether it would be possible to reduce the number of collocation points and, at the same time, relax the convergence criterion in order to produce a reasonable run time for a test. This would not be something that would be physically realistic, but it would still test the equations (presumably the tolerance for the tests would have to be appropriately adjusted). This might not be possible, but it would be nice if it could be.

  6. Roland Haas

    I looked at some of the tests, and since they output 1d/2d/3d data they need to either fix the number of MPI ranks they are runnable on (in test.ccl NPROCS=2), or they need to use carpetioascii::compact_format = yes and ioascii::out3d_ghosts = no and make sure that slicing happens in only the z direction (this is usually the case). This may let the tests pass with both 1 and 2 MPI ranks. See https://docs.einsteintoolkit.org/et-docs/Adding_a_test_case for an explanation of how to create a test case.
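    In Cactus parameter-file syntax, the settings in question look roughly like this (assuming the rest of the test parfile is already in place):

    # make ASCII test output independent of the number of MPI ranks
    CarpetIOASCII::compact_format = yes   # drop per-rank header comments
    IOASCII::out3D_ghosts         = no    # do not write ghost zones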

  7. helvi witek

    thanks! we actually already use those parameters in all our tests (carpetioascii::compact_format = yes and ioascii::out3d_ghosts = no). We also just added the number of processors in test.ccl

    make sure that slicing happens in only the z direction

    the test failing due to output indeed has octant symmetry. Would you prefer us to take it out?

  8. Roland Haas

    Even with octant symmetry Carpet is likely to split the grid along the z direction (all other things being equal it prefers z), so the test does not have to be removed because of that. I must have looked at an older version of the files then, as they seem not to have been using compact_format (i.e. they had the full set of comments in the header). The test that fails with extra lines of output is likely caused by something in Cactus/Carpet, so it not passing is not going to be a blocker for Lean. Keeping it in (and failing for now), however, will serve as a reminder to keep looking into this.

  9. helvi witek

    @ Roland: ok, we will keep it in

    @ all: to track down the issues with simd I started re-inserting the relevant commands routine by routine, starting with the simplest one.

    One such simple toy case is the routine enforcing the algebraic constraint trace(A)=0. I explicitly expanded out the loop calculating the trace of A. Nonetheless, the tests fail if I use simd (i.e. gridfunctions differ in the 5th/6th digit). The testsuite passes w/o simd. I don't see an obvious mistake, and it would be great if one of you could cross-check. The routine is in our development branch: https://bitbucket.org/canuda/lean_public/src/development/LeanBSSNMoL/src/evolve_utils.F90

    in routine "subroutine LeanBSSN_remove_trA" starting line 9

    I compiled on stampede2 using the configuration script "stampede2-skx.cfg" in simfactory.

    Ultimately, we would have used it for the calculation of the BSSN rhs and constraints. However, given these issues, Miguel and I decided to leave them out for the time being.
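    For reference, the algebraic constraint enforced by this routine is the standard BSSN trace removal (notation here is generic and may not match the thorn's variable names exactly):

    ```latex
    \tilde A_{ij} \;\longrightarrow\;
    \tilde A_{ij} - \tfrac{1}{3}\,\tilde\gamma_{ij}\,
    \bigl(\tilde\gamma^{kl}\tilde A_{kl}\bigr) ,
    \qquad \text{so that } \tilde\gamma^{ij}\tilde A_{ij} = 0 .
    ```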

  10. Roland Haas

    @HelviWitek I will try to have a look. Note that you need to let Bitbucket complete my user name, which will ensure I get an extra notification, I think :-).

  11. Roland Haas

    I am getting link failures due to multiply defined symbols in TwoPunctures and TwoPunctures_KerrProca:

    /home/rhaas/ET_Proca/configs/sim/lib/libthorn_TwoPunctures_KerrProca.a(TP_utilities.c.o): In function `norm2':
    /home/rhaas/ET_Proca/arrangements/Proca/TwoPunctures_KerrProca/src/TP_utilities.c:507: multiple definition of `norm2'
    /home/rhaas/ET_Proca/configs/sim/lib/libthorn_TwoPunctures.a(TP_utilities.c.o):/home/rhaas/ET_Proca/arrangements/EinsteinInitialData/TwoPunctures/src/TP_utilities.c:507: first defined here
    

    This is actually a bug in both TP and TP_Kerr, both of which should prefix globally visible symbols with their thorn name (which also should use different names).

    TP is being "clever" and has this bit in TP_utilities.h:

    #define nrerror TP_nrerror
    #define ivector TP_ivector
    #define dvector TP_dvector
    #define imatrix TP_imatrix
    #define dmatrix TP_dmatrix
    #define d3tensor TP_d3tensor
    #define free_ivector TP_free_ivector
    #define free_dvector TP_free_dvector
    #define free_imatrix TP_free_imatrix
    #define free_dmatrix TP_free_dmatrix
    #define free_d3tensor TP_free_d3tensor
    

    so likely TP_Kerr needs to make up a new prefix here, as well as for the other source files.
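    The same pattern, compressed into a self-contained sketch (the `TPKP_` prefix and the `norm2` body below are illustrative, not the thorn's actual choices):

    ```c
    #include <stdio.h>

    /* Hypothetical TP_utilities.h for TwoPunctures_KerrProca: give every
       globally visible symbol a thorn-unique prefix so it cannot collide
       with the identically named function compiled into TwoPunctures. */
    #define nrerror TPKP_nrerror
    #define norm2   TPKP_norm2

    /* Because of the #define above, this actually defines TPKP_norm2. */
    double norm2(const double v[3]) {
        return v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
    }

    int main(void) {
        const double v[3] = {1.0, 2.0, 2.0};
        printf("%g\n", norm2(v)); /* prints 9 */
        return 0;
    }
    ```

    Call sites keep using the short name; only the linker-visible symbol changes, so both thorns can be linked into one executable.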

  12. Peter Diener

    @helvi witek I took a look at the simd issue starting with your modified version of evolve_utils.F90. I was able to make the tests pass by changing your openmp simd directive to:

    !$OMP SIMD PRIVATE(i, hh, aa, hu, tra, dethh)

    It appears that the SIMD directive does not inherit the PRIVATE information from the enclosing PARALLEL DO directive, so the compiler was not generating vector versions of these temporary variables but was instead using shared scalar versions.

    After this I went back to the original version with the short loops over tensor indices inside, but was not able to get a simd version of those to work correctly. I thought it might be possible to use the COLLAPSE clause to handle the inner loops, but that does not allow any code between the outer and the 2 inner loops.

    To fix all the simd issues, it therefore appears that what you need to do is to get rid of all the inner loops and then declare all the temporary variables PRIVATE.

  13. Roland Haas

    That sounds a bit like a compiler bug to me. It may be possible to write a short reproducer by nesting a omp simd in an omp parallel and testing what the compiler does to variables declared private in the omp parallel. We may have similar omp simd constructs in other code (though likely not so much and likely not in Fortran).

  14. Peter Diener

    I'm not sure that this is a compiler bug. For example the j and k variables that are declared private in the omp parallel directive change roles and can be considered to be shared in the omp simd directive.

  15. Roland Haas

    That may be the case. The OpenMP docs do document a private clause for OMP SIMD (though no shared, as far as I can tell). I don't quite understand what private means in the context of an OMP SIMD, though. The pragma only tells the compiler to use SIMD instructions for the next loop but does not, e.g., spin off new threads.

  16. Erik Schnetter

    You can view SIMD parallelization as a pre-scheduled way of multi-threading, where each SIMD vector lane corresponds to a thread.

  17. Erik Schnetter

    My main objection to adding these thorns to the Einstein Toolkit was the lack of test cases. As there are now test cases, the Lean thorns should be added iff the test cases succeed.

  18. Roland Haas

    There are test cases for Lean now: https://einsteintoolkit.org/testsuite_results/index.php. All tests for LeanBSSNMoL pass everywhere. However, the thorn NPScalars has a failing test (teukolsky) on wheeler, stampede2-skx, shelob, golub and comet, meaning it fails on most clusters and passes on most desktop systems. It could be an Intel vs. gcc compiler issue; the test also passes on BW (GNU compiler, but with --ffast-math).

  19. Ian Hinder

    If it's a 3D Teukolsky wave, then it might be quite sensitive to roundoff at the origin, depending on the type of regularisation used.

  20. Miguel Zilhão reporter

    what would be the best way to handle this? should we just remove this test? the thing is that i don't have access to a machine with intel compilers to test...

  21. Roland Haas

    The test parfile teukolsky.par is the same as WeylScal4's teukolsky.par (2 lines are changed to output the NPScalars quantities rather than WeylScal4's). However, WeylScal4 uses test.ccl to reduce the accuracy requirement of the test to 5e-10:

    TEST teukolsky
    {
      ABSTOL 5e-10
    }
    

    and possibly the NPScalars test would pass with that reduced setting. The tolerance was relaxed in 233494b4 "WeylScal4: Set the absolute tolerance to 5e-10" (9 years ago) by @peterdiener with the comment:

    Since the teukolsky test seem to be very sensitive to roundoff error the previous tolerance seemed to be too tough. Adding noise at the level of 1e-14 could lead to absolute differences of order 1e-10 and relative differences at the level of 1e-8.
