PITTNullCode results depend on number of processors

Issue #1795 closed
Roland Haas created an issue

As part of trying to add SphericalHarmonicReconGen to the ET I noticed that PITTNullCode produces different results depending on the number of cores used. This already happens with the "regression_test" test of SphericalHarmonicRecon that is already in the ET. The issue is currently hidden because the test uses the very loose tolerances of

ABSTOL 1.5e-5
RELTOL 1.5e-4

Changing them to the standard Cactus accuracies and regenerating the test data shows that a test generated with 2 processes fails when run with a single MPI process.

Test data is in the branch rhaas/procdependency.

This needs to be fixed, or the arrangement must be marked as broken, unless this is the intended behaviour, in which case it needs to be documented prominently in the documentation.

Keyword: PITTNullCode

Comments (14)

  1. Yosef Zlochower

    I think the issue is that the interpolation is not independent of the domain decomposition. The spacing between points in a given coordinate system is given by (local_max - local_min)/num_cells. I can set up a testsuite that does not evolve the data, but only tests the metric on the inner worldtube. This won't test the main null evolution, nor the Bondi news calculation.
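
    To illustrate the mechanism, here is a minimal Python sketch (not PITTNullCode's actual code, just the effect described above): when a middle process's local bounds are themselves rounded values, the locally computed spacing can differ from the globally computed one at roundoff level.

      # Minimal sketch (not PITTNullCode source): spacing computed per
      # process as (local_max - local_min)/num_cells need not be bitwise
      # identical to the globally computed spacing when the local bounds
      # are not exactly representable, e.g. for a 3-way split of [0, 1].
      global_min, global_max, global_cells = 0.0, 1.0, 30
      dx_global = (global_max - global_min) / global_cells

      nprocs = 3
      for p in range(nprocs):
          local_min = global_min + p * (global_max - global_min) / nprocs
          local_max = global_min + (p + 1) * (global_max - global_min) / nprocs
          dx_local = (local_max - local_min) / (global_cells // nprocs)
          print(p, dx_local - dx_global)  # can be nonzero at roundoff level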

  2. Roland Haas reporter

    Christian Reisswig suggested changing the NullInterp options to the default ones by commenting them out, like so:

    #Interpolation--------------------------
    #NullInterp::interpolation_order = 4
    #NullInterp::stereo_patch_type = "circle"
    #NullInterp::deriv_accuracy = 4
    

    which indeed reduces the differences between the 2-process and the 1-process test runs, but does not make them go away.

  3. Yosef Zlochower

    Setting the ghost_size to 4, which is the correct value, makes the results independent of the domain decomposition to the level of roundoff. I can submit a new, more thorough testsuite.
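
    Concretely, the fix amounts to parfile settings along these lines (a sketch only: the NullGrid parameter names are taken from the PARAMCHECK patch in comment 7 below, and it is an assumption that both the driver ghost zones and the angular ghost points are meant):

      driver::ghost_size           = 4
      NullGrid::N_ang_ghost_pts    = 4
      NullGrid::N_ang_stencil_size = 4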

  4. Roland Haas reporter

    Thank you for the updated test; it certainly seems to be a move in the right direction. I still get differences, though, between runs with one and with two processes (one has to change test.ccl to remove the lenient ABSTOL and RELTOL settings); they are of order 1e-11 in relative error.

      Test SphericalHarmonicRecon: regression_test
        "PITTNullCode/SphericalHarmonicRecon/test/regression_test.par"
    
      Issuing mpirun -np 1 /data/rhaas/postdoc/gr/ET_trunk/exe/cactus_sim /data/rhaas/postdoc/gr/ET_trunk/arrangements/PITTNullCode/SphericalHarmonicRecon/test/regression_test.par
    
       Jn_l[0]_2D.asc: substantial differences
          significant differences on 259 (out of 6627) lines
          maximum absolute difference in column 3 is 1.17932885679295e-11
          maximum absolute difference in column 4 is 7.48709427789152e-12
          maximum relative difference in column 3 is 6.80798181095467e-11
          maximum relative difference in column 4 is 3.29043498091389e-11
          (insignificant differences on 1702 lines)
       NewsB[0]_2D.asc: differences below tolerance on 256 lines
       jcn[5]_2D.asc: differences below tolerance on 405 lines
    

    In principle I would have expected multiple processes to give exactly the same answer, though if there really are differences in the coordinates then I could believe that roundoff-level differences can grow. The test only does 2 steps, though (cctk_final_time is 0.1 and the timestep is 0.05), so 4 orders of magnitude of growth (from 1e-15 roundoff to 1e-11) seem large.

    Would it be possible to output the coordinates as an hdf5 file (using regular HDF5 output) to check if they actually differ?

    Finally, if 4 ghost zones really are required, then this should be enforced, either by changing the allowed ranges in param.ccl (if 4 are always required) or through a PARAMCHECK routine (in case the required number of ghost zones depends on, say, the finite difference order).

  5. Yosef Zlochower

    The remaining 'issue' is that the interpolator is sensitive to the domain decomposition. This is due to the fact (I think) that the grid spacing is calculated locally via something like dspace = (upper - lower)/nzones. I'll attach a parfile where dspace is a power of 2, which should give near-identical answers (to within the default tolerance) on any number of processors.
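
    The reasoning behind the power-of-2 choice, as a small Python sketch (assuming the spacing really is computed locally as above): if dspace and the processor boundaries are exactly representable in binary, every decomposition reproduces the same spacing bit for bit.

      # Sketch: with a power-of-two spacing on [0, 1], the local bounds
      # and the division are all exact, so any (power-of-two) number of
      # processes computes bitwise the same dspace.
      dspace, global_cells = 2.0**-5, 32
      for nprocs in (1, 2, 4):
          width = 1.0 / nprocs            # exact: nprocs is a power of two
          cells = global_cells // nprocs
          for p in range(nprocs):
              assert ((p + 1) * width - p * width) / cells == dspace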

  6. Roland Haas reporter

    I used the parfile and there are still differences. To see whether the coordinates or the angular spacing were to blame, I added HDF5 output for "NullGrid::StCrd" and "NullGrid::NullGrParArrR" and used the attached python script to recombine the 2-process data into a single chunk. There is no difference in either the coordinates or the angular grid spacing.
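
    The attached script is not reproduced in the ticket; a minimal sketch of that kind of recombination (assuming CarpetIOHDF5 one-file-per-process output, whose datasets carry an iorigin attribute giving the chunk's integer offset in Fortran order) might look like this:

      import numpy as np
      import h5py

      def recombine(filenames, varname, iteration):
          """Reassemble per-process CarpetIOHDF5 chunks into one array."""
          chunks = []
          for fn in filenames:
              with h5py.File(fn, "r") as f:
                  for name, dset in f.items():
                      if varname in name and " it=%d " % iteration in name:
                          # iorigin is stored in Fortran (x,y,z) order, the
                          # dataset itself in C order, hence the reversal
                          origin = np.asarray(dset.attrs["iorigin"])[::-1]
                          chunks.append((origin, dset[()]))
          shape = np.max([o + np.array(d.shape) for o, d in chunks], axis=0)
          full = np.full(shape, np.nan)
          for origin, data in chunks:
              sel = tuple(slice(o, o + n) for o, n in zip(origin, data.shape))
              full[sel] = data
          return full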

    Are there other coordinates to consider? E.g. the complex ones (though if the real ones agree I am not sure why the complex ones should not).

  7. Yosef Zlochower

    Here's a patch to include a check for the correct number of ghost zones:

      diff --git a/NullInterp/src/NullInterp_ParamCheck.F90 b/NullInterp/src/NullInterp_ParamCheck.F90
      index deee049..84bbbf4 100644
      --- a/NullInterp/src/NullInterp_ParamCheck.F90
      +++ b/NullInterp/src/NullInterp_ParamCheck.F90
      @@ -21,6 +21,15 @@ subroutine NullInterp_ParamCheck(CCTK_ARGUMENTS)
           call CCTK_PARAMWARN("Must set NullGrid::N_ang_ghost_pts >= NullGrid::N_an
         end if
      
      +  if (deriv_accuracy.eq.4 .and. N_ang_ghost_pts.lt.4) then
      +    call CCTK_PARAMWARN("If deriv_accuracy==4, then N_ang_ghost_pts >=4 is re
      +  end if
      +
      +  if (deriv_accuracy.eq.4 .and. N_ang_stencil_size.lt.4) then
      +    call CCTK_PARAMWARN("If deriv_accuracy==4, then N_stencil_size >=4 is req
      +  end if
      +
      +  ! derivative_accuracy = deriv_accuracy
      
       ! if (deriv_accuracy.eq.4) then
  8. Ian Hinder

    AEILocalInterp (is that the one being used?) by default adjusts the interpolation stencil if there are insufficient ghost zones, and the interpolation point is next to an interprocessor boundary. To avoid this, you should use

      boundary_off_centering_tolerance={0.0 0.0 0.0 0.0 0.0 0.0}
      boundary_extrapolation_tolerance={0.0 0.0 0.0 0.0 0.0 0.0}
    

    in the interpolator parameter string.

    See https://bitbucket.org/einsteintoolkit/einsteinanalysis/src/69f49fffa270/Multipole/param.ccl.
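
    In Multipole these options are embedded in the interpolator options string; a hypothetical parfile line in that style (whether and how NullInterp exposes its interpolator parameter string would need checking) would be:

      Multipole::interpolator_pars = "order=4 boundary_off_centering_tolerance={0.0 0.0 0.0 0.0 0.0 0.0} boundary_extrapolation_tolerance={0.0 0.0 0.0 0.0 0.0 0.0}"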

    See also https://trac.einsteintoolkit.org/ticket/181, in which it was decided to make AEILocalInterp abort if this condition was detected, rather than changing the interpolation operator, but as far as I know this was not implemented.
