Update tests which rely on BSSN_MoL to use ML_BSSN instead

Issue #824 closed
Ian Hinder created an issue

BSSN_MoL is not part of the toolkit. Many thorns which are part of the toolkit have tests which rely on BSSN_MoL. These tests currently do not run. In order to test these thorns, the tests should be updated to use ML_BSSN instead of BSSN_MoL.

Correctness-testing the new results (which will be different due to differences in the implementation of BSSN) is problematic. I propose that if the tests run with ML_BSSN and do not generate NaNs or poison, we should just commit the new test data. These are only regression tests anyway - there is no formal correctness-testing framework.

Keyword: testsuites

Comments (31)

  1. Roland Haas

    Of the tests whose only missing thorn is BSSN_MoL:

    • checkpoint in AHFinderDirect, recover in AHFinderDirect were removed since checkpointML exists
    • checkpoint2 in AHFinderDirect, recover2 in AHFinderDirect were removed since they use LegoExcision which is deprecated and will go away
    • CarpetEvolutionMask_test and CarpetEvolutionMask_off_test in CarpetEvolutionMask have an attached patch that uses ML_BSSN, but it requires source-code changes since the tests were using parameters of Carpet that no longer exist. Values changed considerably, since the old test used the conformal metric. The test is also huge (about 25 MB).
    • test_BSSN_MoL_Carpet and test_BSSN_MoL_Carpet_keep in CarpetRegrid will not be fixed, since CarpetRegrid2 exists and should be used instead of CarpetRegrid.
    • GRHydro_test_tov_ppm, GRHydro_test_tov_ppm_disable_internal_excision, and GRHydro_test_tov_ppm_no_trp in GRHydro will be removed, since corresponding ML tests exist.
    • RotatingSymmetry180 and RotatingSymmetry90 were replaced by tests using McLachlan and NoExcision instead of BSSN_MoL and LegoExcision. All affected tests need to be regenerated.
    • schw-0050, test_cartoon_2, and test_cartoon_3 in Cartoon2D were replaced by tests using ML_BSSN.
    • test_ah and test_ob in Dissipation were replaced by tests using ML_BSSN; all test data needs to be regenerated.
    • test_legoexcision in LegoExcision will be deprecated along with LegoExcision
    • KS-tilted in Exact has been replaced by a test using ML_BSSN and NoExcision instead of BSSN_MoL and LegoExcision
    • de_Sitter in Exact has been replaced by a test using metric_evolution = "exact"; it already used Exact for the gauge evolution

    Many of the tests that had not already been converted used metric_type = "static conformal", so all their test data needs to be regenerated.

  2. Roland Haas

    The test data was created with gcc, and Damiana uses Intel. Would you mind trying what happens if you compile with -fp-model precise, please? The tests might be quite sensitive to round-off, since some of them have singularities on the grid.
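    For reference, a sketch of how this flag would be passed in a Cactus option list; the exact variable names and optimization level below are illustrative assumptions, not taken from any actual machine configuration:

```
# Hypothetical option-list fragment for the Intel compilers:
# -fp-model precise disables value-unsafe floating-point optimizations.
CFLAGS   = -O2 -fp-model precise
CXXFLAGS = -O2 -fp-model precise
F90FLAGS = -O2 -fp-model precise
```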

  3. Ian Hinder reporter

    With the standard optionlist, I get:

    Tests failed:

    • schw-0050 (from Cartoon2D)
    • test_cartoon_3 (from Cartoon2D)
    • KS-tilted (from Exact)
    • Kerr (from RotatingSymmetry180)
    • Kerr-rotating-180 (from RotatingSymmetry180)
    • Kerr-rotating-90 (from RotatingSymmetry90)
    • Kerr-rotating-90-staggered (from RotatingSymmetry90)
    • regression_test (from SphericalHarmonicRecon)

    with "-fp-model precise" I get:

    Tests failed:

    • test_cartoon_3 (from Cartoon2D)
    • KS-tilted (from Exact)
    • Schwarzschild_EF (from Exact)
    • Kerr (from RotatingSymmetry180)
    • Kerr-rotating-180 (from RotatingSymmetry180)
    • Kerr-rotating-90 (from RotatingSymmetry90)
    • Kerr-rotating-90-staggered (from RotatingSymmetry90)
    • regression_test (from SphericalHarmonicRecon)

    So it makes schw-0050 pass, and Schwarzschild_EF, which has always passed before, fail! Exact is very sensitive to round-off differences, so I'm not going to spend time on that. Looking at test_cartoon_3, the differences before were:

    grr_3D_diagonal.xg: substantial differences; significant differences on 1 (out of 14) lines; maximum absolute difference in column 2 is 1.36424205265939e-12; maximum relative difference in column 2 is 5.71414569021309e-16 (insignificant differences on 2 lines)
    grr_maximum.xg: substantial differences; significant differences on 1 (out of 2) lines; maximum absolute difference in column 2 is 1.81898940354586e-12; maximum relative difference in column 2 is 1.41519696318623e-16
    grr_minimum.xg: differences below tolerance on 2 lines
    grr_norm1.xg: differences below tolerance on 2 lines
    grr_norm2.xg: substantial differences; significant differences on 1 (out of 2) lines; maximum absolute difference in column 2 is 1.02318153949454e-12; maximum relative difference in column 2 is 1.78066858765207e-15 (insignificant differences on 1 lines)
    grr_x_[2][3].xg: substantial differences; significant differences on 3 (out of 42) lines; maximum absolute difference in column 2 is 3.63797880709171e-12; maximum relative difference in column 2 is 2.83039392637247e-16 (insignificant differences on 1 lines)
    grr_y_[2][3].xg: substantial differences; significant differences on 1 (out of 14) lines; maximum absolute difference in column 2 is 1.81898940354586e-12; maximum relative difference in column 2 is 3.37736718006668e-15 (insignificant differences on 3 lines)
    grr_z_[2][2].xg: substantial differences; significant differences on 2 (out of 72) lines; maximum absolute difference in column 2 is 3.63797880709171e-12; maximum relative difference in column 2 is 3.89684464464154e-14 (insignificant differences on 1 lines)
    gxx_3D_diagonal.xg: differences below tolerance on 2 lines
    gxx_minimum.xg: differences below tolerance on 1 lines
    gxx_norm1.xg: differences below tolerance on 2 lines
    gxx_norm2.xg: substantial differences; significant differences on 2 (out of 2) lines; maximum absolute difference in column 2 is 1.13686837721616e-12; maximum relative difference in column 2 is 1.97852014937682e-15
    gxx_x_[2][3].xg: substantial differences; significant differences on 2 (out of 42) lines; maximum absolute difference in column 2 is 1.81898940354586e-12; maximum relative difference in column 2 is 5.22440810962943e-15 (insignificant differences on 1 lines)
    gxx_y_[2][3].xg: differences below tolerance on 4 lines
    gxy_3D_diagonal.xg: differences below tolerance on 1 lines
    gxy_maximum.xg: differences below tolerance on 1 lines
    gxy_minimum.xg: differences below tolerance on 1 lines
    gxy_y_[2][3].xg: differences below tolerance on 4 lines
    gyy_3D_diagonal.xg: differences below tolerance on 3 lines
    gyy_minimum.xg: differences below tolerance on 1 lines
    gyy_norm1.xg: differences below tolerance on 2 lines
    gyy_norm2.xg: substantial differences; significant differences on 1 (out of 2) lines; maximum absolute difference in column 2 is 1.13686837721616e-12; maximum relative difference in column 2 is 1.97851928863973e-15 (insignificant differences on 1 lines)
    gyy_x_[2][3].xg: substantial differences; significant differences on 3 (out of 42) lines; maximum absolute difference in column 2 is 1.81898940354586e-12; maximum relative difference in column 2 is 5.2244263577245e-15
    gyy_y_[2][3].xg: differences below tolerance on 2 lines
    gzz_3D_diagonal.xg: substantial differences; significant differences on 1 (out of 14) lines; maximum absolute difference in column 2 is 1.36424205265939e-12; maximum relative difference in column 2 is 5.71410366136695e-16 (insignificant differences on 2 lines)
    gzz_norm1.xg: differences below tolerance on 2 lines
    gzz_norm2.xg: substantial differences; significant differences on 1 (out of 2) lines; maximum absolute difference in column 2 is 1.13686837721616e-12; maximum relative difference in column 2 is 1.97852013904379e-15
    gzz_x_[2][3].xg: substantial differences; significant differences on 2 (out of 42) lines; maximum absolute difference in column 2 is 1.81898940354586e-12; maximum relative difference in column 2 is 1.41519740783568e-16
    gzz_y_[2][3].xg: differences below tolerance on 2 lines
    ham_z_[2][2].xg: differences below tolerance on 1 lines

    and after they are:

    grr_norm1.xg: differences below tolerance on 2 lines
    grr_norm2.xg: substantial differences; significant differences on 1 (out of 2) lines; maximum absolute difference in column 2 is 1.13686837721616e-12; maximum relative difference in column 2 is 1.97852065294675e-15 (insignificant differences on 1 lines)
    gxx_norm1.xg: differences below tolerance on 2 lines
    gxx_norm2.xg: substantial differences; significant differences on 2 (out of 2) lines; maximum absolute difference in column 2 is 1.13686837721616e-12; maximum relative difference in column 2 is 1.97852014937682e-15
    gyy_norm1.xg: differences below tolerance on 2 lines
    gyy_norm2.xg: substantial differences; significant differences on 1 (out of 2) lines; maximum absolute difference in column 2 is 1.13686837721616e-12; maximum relative difference in column 2 is 1.97851928863973e-15 (insignificant differences on 1 lines)
    gzz_norm1.xg: differences below tolerance on 2 lines
    gzz_norm2.xg: substantial differences; significant differences on 1 (out of 2) lines; maximum absolute difference in column 2 is 1.13686837721616e-12; maximum relative difference in column 2 is 1.97852013904379e-15

    i.e. the only quantities which are now significantly different are the norms. Note that only 14 digits are output. The relative differences are of the order of 1e-15, but the absolute differences in the norms are of the order of 1e-12 (the default tolerance for the tests is 1e-12). If we accept an absolute tolerance of 1e-12 for the variables, shouldn't we multiply this by the volume of the domain for the norms? In any case, I think that differences of the order of 1e-9 would be perfectly fine, and the tolerances could just be increased.
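    To illustrate the tolerance arithmetic, here is a minimal sketch of the kind of per-value check a test harness applies. This is a hypothetical reimplementation for illustration only, not the actual Cactus testsuite code; the function name and the either-tolerance acceptance rule are assumptions:

```python
def within_tolerance(old, new, abstol=1e-12, reltol=0.0):
    """Accept a difference if it is within the absolute OR the relative tolerance."""
    diff = abs(old - new)
    scale = max(abs(old), abs(new))
    return diff <= abstol or diff <= reltol * scale

# A norm of order 1e3 that agrees to ~12 significant digits fails the
# absolute-only default (RELTOL=0), but passes once a relative tolerance
# is allowed:
print(within_tolerance(1000.0, 1000.0 + 1e-9))                # False
print(within_tolerance(1000.0, 1000.0 + 1e-9, reltol=1e-11))  # True
```

This is why an absolute tolerance calibrated for order-unity grid values breaks down for norms, whose magnitude grows with the domain volume.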

  4. Roland Haas

    Let me first try whether I can make it more robust by shuffling points away from the origin (the same as was done for Schwarzschild_EF, which admittedly also fails).

  5. Roland Haas

    Ian: I changed the affected tests to use more robust methods (larger exact_eps, second-order interpolation, continuously joining the excised region to the outer region) and regenerated the test data. With this, the tests pass on zwicky (Intel 11.1, without precise) using 3 MPI processes and 5 threads. Would you mind giving them a try on Datura, please? I'd like to avoid a half dozen commits of the type "really make checkpoints work with Intel". You can find all patches at http://www.tapir.caltech.edu/~rhaas/824/ (I cannot attach them to the ticket since they are too large). I did not touch the sphericalrecon test (in the PITTNullCode arrangement) since I had not modified it.

  6. Ian Hinder reporter

    If this were just a branch in a unified repository, it would be fine, as I could check it out with one command, but it's really too much work for me to manually apply this many patches. Since you have tested it on Intel 11, I assume they will work, and the easiest thing is if you just commit them and the automatic tests will pick them up.

    PS: that page wouldn't open for me.

  7. Roland Haas

    OK, I applied the patches. The page works from my laptop at home. Very strange; maybe some trans-Atlantic timeout issues again.

  8. Ian Hinder reporter
  9. Roland Haas

    Thank you Ian. I had forgotten about the Dissipation tests. I updated them. Essentially I had to update the absolute tolerance since the grid values for gxx are about 100 (compared to order unity for the old static conformal test). I also cleaned up the domain description, excised the inner region and removed the sum reduction which is known to cause problems.

    Dissipation tests run on only a single processor (my guess without looking at the actual code is that they use an elliptic solver which might not work with more than one component per level).

    SphericalHarmonicRecon should be unrelated to this ticket, yes.

  10. Roland Haas

    The regression_test seems to fail because AEILocalInterp cannot handle input variable type 111 (CCTK_VARIABLE_COMPLEX). The log for this test, in both the run claimed to be working (see [http://git.barrywardell.net/EinsteinToolkitTestResults.git/blob/a79d99f5d8c5063b2b511f9e2b41aea50697f2c6:/test_1/SphericalHarmonicRecon/regression_test.diffs regression_test.diffs], which says 0 failures even though there are "substantial differences") and the failing run, is filled with:

    WARNING[L1,P0] (AEILocalInterp): CCTK_InterpLocalUniform(): input datatype 111 not supported! (0-origin) input #in=0

    The test passes for me on zwicky (2 processes with 6 threads each) without these warnings and with no error in the log.

  11. Erik Schnetter

    CCTK_COMPLEX should be supported by AEILocalInterp. This may be a problem with autoconf on this system.

  12. Roland Haas
    • changed status to open

    Now the CarpetEvolutionMask tests have started to fail for me on zwicky (with both 1 and 2 processes, 6 threads per process). Looking at the data, they *should* fail, since they put a grid point at the Schwarzschild singularity. I have regenerated the data (in http://www.tapir.caltech.edu/~rhaas/824/fudge_singularity.patch; the file is too large to attach) but am surprised it ever worked on Damiana. Waiting for approval, and for Ian's report on which commits happened since the test last passed, before committing.

  13. Erik Schnetter

    I don't think that epsilon needs to be as large as in this patch; setting it to 1e-4 or so should still work.

  14. Roland Haas

    epsilon shows up as 1/pow(epsilon,4) in the metric, so 1e-4 corresponds to grid values of order 1e16, which is too large for me to feel comfortable with. With 0.5 I still have values of order 16 on the grid (going down to 1.5 or so at the edge of the grid, so a dynamic range of about 2 orders of magnitude). This was the reason for the large value of epsilon. It is also a hedge against Intel non-precise floating-point arithmetic, which required rather large values of epsilon in previous tests. I might get away with 1e-1 (which is half a step size and would thus correspond more or less to staggering the grid), but based on the experience I had getting the Exact and RotatingSymmetry tests to pass robustly on both Intel and gcc, 1e-4 will not work.
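    The scaling argument above is easy to check numerically. This is a throwaway sketch; metric_scale is a hypothetical helper, not a function in the toolkit:

```python
def metric_scale(eps):
    """Order of magnitude of the largest metric values on the grid,
    using the 1/epsilon**4 scaling quoted in the comment above."""
    return 1.0 / eps**4

print(metric_scale(1e-4))  # of order 1e16: uncomfortably large dynamic range
print(metric_scale(1e-1))  # of order 1e4: a possible compromise
print(metric_scale(0.5))   # 16.0: the value actually chosen
```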

  15. Roland Haas

    Ian says the test does not fail on datura, so I am holding off until I understand the difference.

  16. Erik Schnetter

    I often see values of order 1e10 on the grid, and am happy with them. I am surprised that you are trying to reduce the singularity down to 16.

    There are other methods of smoothing the singularity that have no effect further away from the singularity (using e.g. a spline instead of adding a term to r). For a test case this doesn't matter. I am worried that this will be a rather weak test if such a large epsilon is added, in particular if we don't need to do that in actual production runs.

  17. Roland Haas

    I did a bit more digging into what could cause the observed behaviour (test failures between different machines/compilers/options unless epsilon is huge) and just realised that the default tolerances (from RunTestUtils.pl and the "Print tolerance table" output) seem to be ABSTOL=1e-12, RELTOL=0. This means we are in trouble when using the default tolerances unless the range of values is around unity. That nicely matches my observation that the tests tend to fail for odd reasons unless the range of all values in the files is about unity.

    This then seems to be the major reason for the problem, and I should be able to avoid it by adding a RELTOL = 1e-13 (or so) to test.ccl. It also seems to be the issue behind the recent testsuite failures. It would seem to me that the default tolerances should be changed to have a non-zero RELTOL value. Would that make sense?
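    The proposed fix would look something like this in the thorn's test.ccl (a sketch; I am assuming the standard ABSTOL/RELTOL keywords of the Cactus test system, and the comment syntax is illustrative):

```
# test.ccl: override the default tolerances for this thorn's tests
ABSTOL 1e-12   # absolute tolerance (the current default)
RELTOL 1e-13   # non-zero relative tolerance, so values far from unity
               # are compared by significant digits rather than absolutely
```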

    Regarding the singularity: the problem I had smoothing it out was that none of the initial-data thorns provides a nice method for doing so. I used NoExcision in a number of tests to cut out a region. Unfortunately, without Carpet I can only use the "old" method and then have to fudge Minkowski_scale to avoid discontinuities at the edge of the excised region (and correspondingly large values of the derivatives).

    As far as having data close to production data is concerned, I am not sure I agree. Our resolution in regression tests is so low that all the physics is wrong anyway, so I would much rather construct a test that robustly exercises the code than something close to production data. I don't mind if the answers are all systematically wrong as long as they are still sensitive to changes in the code. In this sense any initial data (any epsilon) is fine with me, since the code that is executed does not depend on the data. This might be very different for hydro, where the code path taken depends on the data on the grid.

  18. Erik Schnetter

    Yes, a non-zero RELTOL makes sense.

    I believe I added code to Exact's Kerr-Schild method to offer a nicer way of smoothing things out, where you provide both a radius and an epsilon, such that the solution remains unaffected outside that radius. I just checked -- this does not yet seem to be available in EinsteinExact.

    Go ahead, use your large value of epsilon.
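    A sketch of the distinction discussed above, with hypothetical functions (neither is the actual Exact/EinsteinExact code, and the cubic blend is just one way to do a localized regularization): a global epsilon shifts the radial coordinate everywhere, while a localized blend leaves the solution untouched outside a chosen radius r0.

```python
def r_global(r, eps):
    """Global smoothing: shift r everywhere, altering the solution at all radii."""
    return r + eps

def r_local(r, eps, r0):
    """Localized smoothing: a cubic blend that equals eps at r=0 (with zero
    slope) and matches r with slope 1 at r=r0, so the exterior is unaffected."""
    if r >= r0:
        return r
    c = (2.0 * r0 - 3.0 * eps) / r0**2
    d = (2.0 * eps - r0) / r0**3
    return eps + c * r**2 + d * r**3

# Outside r0 the localized version is exact; the global version is not:
print(r_local(2.0, 0.2, 1.0))   # 2.0
print(r_global(2.0, 0.2))       # 2.2
print(r_local(0.0, 0.2, 1.0))   # 0.2 (regularized at the origin)
```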

  19. Roland Haas
    • changed status to resolved

    Not needed anymore: the problem only showed up because I had not chosen a sensible relative tolerance. With reltol = 1e-12 everything works fine. I am closing the ticket.
