testsuites in HDF5

Issue #808 open
Jian Tao created an issue

Instead of constructing tools to compare Carpet ASCII output from multiple processes, how about building tools to diff HDF5 files?

Compared to ASCII, HDF5 testsuites may have several advantages: they are small, fast, portable, and independent of the number of processes.

Comments (13)

  1. Frank Löffler

    independent of number of process

    This is only true when using parallel HDF5. Otherwise you are comparing one set of HDF5 files to another set of HDF5 files, which is not to say that this could not be done, of course.

  2. Roland Haas

    Also, Cactus's HDF5 file format exposes the number of processors used through the domain decomposition, which shows up in the number and names of the datasets in the files. This is true even for 1D/2D HDF5 output, where only a single output file is created. I don't really buy the size/speed argument for the current set of 1D output in tests, but I can certainly see it as desirable once we compare 3D data (which we probably should). Also see the discussion in #566 about related issues with regard to being able to use the version control system's diff command.

    What we would also like to have are tools that can unchunk output files and compare data while ignoring the processor decomposition (for both ASCII and HDF5, should we use HDF5 output).

    For ASCII output, the current options to not output ghost, buffer, and symmetry zones _almost_ do the trick. The only problem is that comment and empty lines are not completely ignored; their presence has to match in the files being compared (or at least that used to be the case).
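    To illustrate how the decomposition leaks into dataset names, here is a minimal Python sketch that groups HDF5 dataset names by everything except the component index, so that datasets from runs on different numbers of processes can be matched up before comparing their contents. It assumes dataset names of the form `THORN::var it=0 tl=0 rl=0 c=3`, with a trailing `c=N` identifying the component and omitted when there is only one; that naming is an assumption here, and `group_by_variable` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed naming: "<thorn>::<var> it=<n> tl=<n> rl=<n> c=<n>", where the
# " c=<n>" part identifies the decomposition-dependent component.
COMPONENT_RE = re.compile(r"\s+c=(\d+)")


def group_by_variable(dataset_names):
    """Map each decomposition-independent key to its sorted component indices."""
    groups = defaultdict(list)
    for name in dataset_names:
        match = COMPONENT_RE.search(name)
        component = int(match.group(1)) if match else 0  # no c= means one component
        key = COMPONENT_RE.sub("", name)  # strip the component part of the name
        groups[key].append(component)
    return {key: sorted(comps) for key, comps in groups.items()}


one_proc = ["ADMBASE::gxx it=0 tl=0 rl=0"]
four_procs = [f"ADMBASE::gxx it=0 tl=0 rl=0 c={c}" for c in range(4)]

# Both runs produce the same keys; only the component lists differ.
print(group_by_variable(one_proc))    # {'ADMBASE::gxx it=0 tl=0 rl=0': [0]}
print(group_by_variable(four_procs))  # {'ADMBASE::gxx it=0 tl=0 rl=0': [0, 1, 2, 3]}
```

    Once the keys match, the per-component data for each key still has to be merged before it can be compared value by value.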

  3. Jian Tao reporter

    I was thinking about using HDF5 tools to extract the data we want to compare, instead of comparing two binary files directly. There could be more options in this direction.

  4. Frank Löffler

    I didn't mean to suggest using hdf5_recombiner to obtain two files that could be compared directly. That would probably not work. However, it could be used as the starting point for a tool that compares two sets of files.

  5. Ian Hinder

    This ticket has raised several issues:

    1. The original summary contains an error: Carpet HDF5 output is not independent of the number of processes.

    2. Having an output format for test suites which is independent of the number of processes would be extremely useful (it makes testing correctness on different numbers of processes trivial). There might be ways to set the parameters so that such an ASCII file is created. Roland: I believe that I fixed the issue with the existence of comment/blank lines (https://trac.einsteintoolkit.org/browser/Cactus%20flesh/trunk/lib/sbin/RunTestUtils.pl?rev=4773).

    3. It might be possible for Carpet to merge together the different components in its "sliced output" routines by super-region. If this was implemented, presumably it would apply to both HDF5 and ASCII "sliced" output. This would make the output independent of the number of processes. This might also make Carpet output (at least unigrid) usable with generic tools without needing to write plugins.

    4. Using HDF5 instead of ASCII for test output has only one advantage that I can see: the data can be transparently compressed so that it takes up less space. It has many disadvantages, though (note that I am usually a rabid supporter of HDF5 over ASCII). For example, version control systems and their GUIs do not natively understand HDF5 files, so you cannot see what changed from one version of a file to the next. For these tests, this sort of comparison is essential, and to encourage people to do it, it must be very easy. Having to run an external tool to see the differences adds a lot of unnecessary work.

    5. Modifying the Cactus test mechanism to understand HDF5 files is something I have thought about for a while, before coming to the conclusion in (4). There is already h5diff (http://www.hdfgroup.org/HDF5/doc/RM/Tools.html), which is a standard part of HDF5. Hooking into this to show differences would probably be possible. You can specify relative and absolute tolerances on the command line, as well as options for NaN detection.

    6. Should we be using 3D output for tests? The resulting size, especially with Carpet's 3D output format, makes HDF5 the only reasonable option. I have in the past experimented with this, and found that the resulting files, even with full HDF5 compression, were too large. If we had comprehensive coverage with 3D output, we would be forced to store them in a separate repository.
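    The merging suggested in point 3 can be pictured in one dimension. The Python sketch below is a minimal illustration, not a Carpet routine: it assumes each component arrives as a starting global grid index plus its values, with ghost zones already stripped, and that overlapping points must agree exactly. `merge_components` is a hypothetical name.

```python
def merge_components(components):
    """Merge (start_index, values) pairs into one contiguous list.

    Overlapping points must agree exactly; gaps in coverage raise an error.
    """
    merged = {}
    for start, values in components:
        for offset, value in enumerate(values):
            index = start + offset
            if index in merged and merged[index] != value:
                raise ValueError(f"components disagree at index {index}")
            merged[index] = value
    if not merged:
        return []
    lo, hi = min(merged), max(merged)
    missing = [i for i in range(lo, hi + 1) if i not in merged]
    if missing:
        raise ValueError(f"no data for indices {missing}")
    return [merged[i] for i in range(lo, hi + 1)]


# The same 1D grid function stored as one component and as three.
serial = [(0, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])]
parallel = [(4, [4.0, 5.0]), (0, [0.0, 1.0, 2.0]), (2, [2.0, 3.0])]
print(merge_components(serial) == merge_components(parallel))  # True
```

    After merging, the result no longer depends on how many components (and hence processes) produced it, which is the property point 2 asks for.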

  6. Erik Schnetter

    There are several reasons why Carpet's HDF5 output is not independent of the number of processes. I agree completely that this is inconvenient in many cases; however, this was a conscious decision we made, drawing on the experience we had with PUGH.

    Couldn't we add the complex logic to the comparison? We currently just run diff; instead, we could run

    grep -Ev '^ *(#.*)?$' | sort | uniq

    and then diff. Carpet's ASCII format is intentionally defined such that the order in which the lines are presented does not matter, since each output line is self-sufficient. The line breaks and comments are also purely for gnuplot's sake and are not necessary for a correct interpretation.
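    The same normalization could live inside the test harness instead of a shell pipeline. Here is a minimal Python sketch, assuming both output files fit in memory; `normalized_lines` and `same_output` are hypothetical names. It drops blank and comment lines using the same pattern as the grep filter, then compares the sorted, de-duplicated remaining lines, mirroring the grep/sort/uniq pipeline followed by diff.

```python
import re

# Same pattern as the grep filter: optional leading spaces, then either
# nothing or a comment starting with '#'.
SKIP_RE = re.compile(r" *(#.*)?")


def normalized_lines(text):
    """Return the sorted, de-duplicated data lines of an ASCII output file."""
    kept = {line for line in text.splitlines() if not SKIP_RE.fullmatch(line)}
    return sorted(kept)


def same_output(text_a, text_b):
    """True if both files contain the same data lines, in any order."""
    return normalized_lines(text_a) == normalized_lines(text_b)


run_1 = "# iteration 0\n0 0 0 1.0\n0 0 1 2.0\n\n"
run_2 = "# iteration 0\n0 0 1 2.0\n\n# second component\n0 0 0 1.0\n"
print(same_output(run_1, run_2))  # True
```

    This only removes differences in line order, comments, and blank lines; any column whose value depends on the decomposition would still make the lines differ.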

  7. Roland Haas

    It would be very nice to adapt Cactus' test mechanism to also support HDF5 output, e.g., to test the HDF5 output thorns.

  8. Yosef Zlochower

    The output of the SphericalHarmonicDecomp thorn (CCE) is in HDF5 (the output should, in principle, be the same regardless of the number of CPUs). To test this code robustly, it would be best to test the HDF5 files themselves. Having the option of passing the output file through a filter (here it could be h5dump) before comparing would be quite useful. A simple h5diff -d <TOL> would also work for this thorn.
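    A tolerance-based comparison of the kind h5diff -d <TOL> performs can be sketched in plain Python on values already extracted from the files (for example, numbers parsed from h5dump output). This is not h5diff's implementation: the combination of absolute and relative tolerances and the choice to treat two NaNs at the same position as equal are assumptions made for this sketch, and `differing_indices` is a hypothetical name.

```python
import math


def differing_indices(expected, actual, abs_tol=0.0, rel_tol=0.0):
    """Return the indices where the two sequences differ beyond tolerance.

    A point passes if |expected - actual| <= abs_tol or
    |expected - actual| <= rel_tol * |expected|. Two NaNs at the same
    index count as equal; a NaN on only one side counts as a difference.
    """
    if len(expected) != len(actual):
        raise ValueError("sequences differ in length")
    bad = []
    for i, (a, b) in enumerate(zip(expected, actual)):
        if math.isnan(a) or math.isnan(b):
            if not (math.isnan(a) and math.isnan(b)):
                bad.append(i)
            continue
        diff = abs(a - b)
        if diff > abs_tol and diff > rel_tol * abs(a):
            bad.append(i)
    return bad


reference = [1.0, 2.0, float("nan"), 4.0]
new_run = [1.0 + 1e-12, 2.0, float("nan"), 4.1]
print(differing_indices(reference, new_run, abs_tol=1e-10))  # [3]
```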
