out-of-bounds write access checking in Cactus

Issue #1430 closed
Roland Haas created an issue

I possibly very useful (and simple to implement) debugging help in Cactus would be if the Cactus driver provided some means to detect array accesses out of the array/grid-function bounds. In general that is hard to do (in C, for Fortran there are compiler switches) however a possible useful partial solution might already be to put canary values before and after the user-visible data of grid functions/grid arrays. The flesh/driver could then check after each scheduled routine if any of the canary values were modified and if so output a warning.

Schematically the layout in memory would be

Canary1 data Canary2

and CCTK_VarDataPtr would return a pointer to "data" only. After a scheduled function all we check Canary1 and Canary2 and output an error if they are corrupted. Similarly the IncreaseGroupStorage/DecreaseGroupStorage routines could set/check the canary values.

This would prevent these errors triggering failures at some later unrelated call to malloc or free. glibc's malloc function provides some of this if _MALLOC_DEBUG is set, though I am not sure how well that actually works in practice in particular since eg OpenMP provides its own malloc function.

I don't have an implementation of this right now and am mostly fishing for comments.

Keyword:

Comments (10)

  1. Erik Schnetter
    • removed comment

    This would be straightforward to implement in PUGH and Carpet. In Carpet, the canaries would be initialized when the memory is allocated, and would be checked regularly (e.g. during poison checking), not just before deallocating.

    I doubt that OpenMP provides its own malloc function. I expect all memory allocation (including operator new from C++) to go through libc's malloc.

  2. Roland Haas reporter
    • removed comment

    I have a proof-of-concept implementation that implements a canary based electric fence around CarpetLib's mem objects. These are used for grid functions, scalars and arrays. The current code will check after each scheduled routine if any (write) access out-of-bounds happened in any of the variables accessible through CCTK_VarDataPtr. This is mostly ok though one can construct situations (involving Carpet's ENTER_XXX_MODE functions etc) where eg a GLOBAL routine accesses grid functions which is currently not caught.

    The functionality is activated by setting CarpetLibs::electric_fence = yes. Right now the implementation passes the test suite and there is a (designed to fail) test in CarpetExtra/CarpetEFenceTest.

    Most likely the interface will change in the future so that the check_fence function becomes a member of ggf (which is CarpetLib's representation of a grid function) rather than the data objects so that the user can just call ggf->check_fence(warning_callback) passing in a callback routine rather than iterating over all timelevels/maps/whatnot of each variable.

    The code is in a branch on bitbucket: https://bitbucket.org/rhaas80/carpet/commits/branch/fence You can click on the "Compare" button near the upper right to see the differences to (my) current Carpet master branch. Note that this branch will be rebased often.

    Comments welcome.

  3. Erik Schnetter
    • removed comment

    The line

    storage_ = (T*)align_up(size_t(storage_base_ + electric_fence), alignment);
    

    is wrong; electric_fence should be canary/2 instead.

  4. Roland Haas reporter
    • removed comment

    Right you are. Fixed now. Was a leftover from when the code only supported a fence_width of 1.

  5. Roland Haas reporter
    • removed comment

    Well I was not really aiming for an actual review since I plan to make further changes. It's good to see that the concept and current implementation is approved though. Thank you.

  6. Log in to comment