
Opened 6 years ago

Last modified 5 years ago

#808 assigned enhancement

testsuites in HDF5

Reported by: Jian Tao Owned by:
Priority: minor Milestone:
Component: Cactus Version: development version
Keywords: Cc:

Description

Instead of constructing tools to compare Carpet ASCII output from multiple processes, how about building tools to diff HDF5 files?

Compared to ASCII, HDF5 test suites may have several advantages: they are small, fast, portable, and independent of the number of processes.

Attachments (0)

Change History (13)

comment:1 Changed 6 years ago by Frank Löffler

independent of number of process

This is only true when using parallel HDF5, otherwise you compare a bunch of hdf5 files to another bunch of hdf5 files - not saying that this could not be done of course.

comment:2 Changed 6 years ago by Roland Haas

Also, Cactus's HDF5 file format exposes the number of processors used through the domain decomposition, which appears in the number and names of datasets in the files. This is true even for 1d/2d HDF5 output where only a single output file is created. I don't really buy the size/speed argument for the current set of 1d output in tests, but can certainly see it as desirable once we compare 3d data (which we probably should). Also see the discussion in #566 about related issues with respect to being able to use the version control system's diff command.

What we would also like to have are tools that can unchunk output files and compare data ignoring the processor decomposition (for both ASCII and HDF5, should we use HDF5 output).

For ASCII output, the current options to not output ghost/buffer/symmetry zones _almost_ do the trick. The only problem was that comment and empty lines are not completely ignored but their presence has to match in the files to be compared (or at least that used to be the case).

comment:3 Changed 6 years ago by Frank Löffler

Carpet provides hdf5_recombiner, which can do part of that job.

comment:4 Changed 6 years ago by Jian Tao

I was thinking about using hdf5 tools to extract the data we want to compare, instead of comparing two binary files directly. There could be more options in this direction.

comment:5 Changed 6 years ago by Frank Löffler

I didn't intend to suggest using hdf5_recombiner to obtain two files which could be directly compared. That would probably not work. However, it could be used as a starting point for a tool which could compare two sets of files.

comment:6 Changed 6 years ago by Ian Hinder

This ticket has raised several issues:

  1. The original summary contains an error: Carpet HDF5 output is not independent of the number of processes.
  2. Having an output format for test suites which is independent of the number of processes would be extremely useful (it makes testing correctness on different numbers of processes trivial). There might be ways to set the parameters so that such an ASCII file is created. Roland: I believe that I fixed the issue with the existence of comment/blank lines (https://trac.einsteintoolkit.org/browser/Cactus%20flesh/trunk/lib/sbin/RunTestUtils.pl?rev=4773).
  3. It might be possible for Carpet to merge together the different components in its "sliced output" routines by super-region. If this was implemented, presumably it would apply to both HDF5 and ASCII "sliced" output. This would make the output independent of the number of processes. This might also make Carpet output (at least unigrid) usable with generic tools without needing to write plugins.
  4. Using HDF5 instead of ASCII for test output has only one advantage that I can see, which is that the data can be transparently compressed so that it takes up less space, but it has many disadvantages (note that I am usually a rabid supporter of HDF5 as opposed to ASCII). For example, version control systems and their GUIs do not natively understand HDF5 files, which means that you cannot tell from one version of a file to the next what has changed. For these tests, this sort of comparison is essential, and to encourage people to do it, it must be very easy. Having to run an external tool to see the differences adds a lot of unnecessary work.
  5. Modifying the Cactus test mechanism to understand HDF5 files is something I have thought about for a while, before coming to the conclusion in (4). There is already h5diff (http://www.hdfgroup.org/HDF5/doc/RM/Tools.html), which is a standard part of HDF5. Hooking into this to show differences would probably be possible. You can specify relative and absolute tolerances on the command line, as well as options for NaN detection.
  6. Should we be using 3D output for tests? The resulting size, especially with Carpet's 3D output format, makes HDF5 the only reasonable option. I have in the past experimented with this, and found that the resulting files, even with full HDF5 compression, were too large. If we had comprehensive coverage with 3D output, we would be forced to store them in a separate repository.
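To make the h5diff idea concrete, here is a minimal sketch of how a test driver might hook into it. The function names are hypothetical and this is not how the Cactus test mechanism works today. It assumes h5diff is on the PATH, that `-d` sets an absolute tolerance and `-p` a relative one, and that h5diff exits with 0 for "no differences", 1 for "differences found" and anything else for an error. The sketch passes only one tolerance at a time.

```python
import subprocess


def build_h5diff_command(file_a, file_b, abs_tol=None, rel_tol=None):
    """Build an h5diff argument list (hypothetical helper, not part of Cactus).

    At most one of abs_tol (h5diff -d) and rel_tol (h5diff -p) may be given.
    """
    if abs_tol is not None and rel_tol is not None:
        raise ValueError("give abs_tol or rel_tol, not both")
    cmd = ["h5diff"]
    if abs_tol is not None:
        cmd += ["-d", repr(float(abs_tol))]
    if rel_tol is not None:
        cmd += ["-p", repr(float(rel_tol))]
    return cmd + [file_a, file_b]


def files_agree(file_a, file_b, abs_tol=None, rel_tol=None):
    """Run h5diff and return True if it reports no differences.

    Assumes exit status 0 = identical within tolerance, 1 = differences.
    Any other status is treated as an error.
    """
    cmd = build_h5diff_command(file_a, file_b, abs_tol, rel_tol)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode not in (0, 1):
        raise RuntimeError("h5diff failed: " + result.stderr)
    return result.returncode == 0
```

The report h5diff prints on standard output could be shown to the user when a test fails, which would go some way towards the "what has changed" question in (4).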

comment:7 Changed 6 years ago by Erik Schnetter

There are several reasons why Carpet's HDF5 output is not independent of the number of processes. I agree completely that this is inconvenient in many cases; however, this was a conscious decision we made, drawing on the experience we had with PUGH.

Couldn't we add the complex logic into the comparison? We currently just run diff; instead, we could

grep -Ev '^ *(#.*)?$' | sort | uniq

and then diff. Carpet's ASCII format is intentionally defined such that the order in which the lines are presented does not matter, since each output line is self-sufficient. The line breaks and comments are also purely for gnuplot's sake and are not necessary for a correct interpretation.
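The same normalisation can be sketched in a few lines of Python, for anyone who would rather not shell out. This is only an illustration under the assumptions stated above (every data line is self-sufficient, and comment and blank lines carry no information); the function names are made up for this example.

```python
import re

# Blank lines and '#' comment lines, i.e. the lines the grep above removes.
_SKIP = re.compile(r"^ *(#.*)?$")


def normalize(text):
    """Drop comment/blank lines, then sort and deduplicate what is left."""
    kept = [line for line in text.splitlines() if not _SKIP.match(line)]
    return sorted(set(kept))


def same_data(text_a, text_b):
    """True if both outputs contain the same set of data lines."""
    return normalize(text_a) == normalize(text_b)
```

For example, two outputs that list the same points in a different order, with different comments and with one point duplicated, compare equal:

```python
a = "# iteration 0\n0 0 1.0\n\n1 0 2.0\n"
b = "1 0 2.0\n# written by another process\n0 0 1.0\n0 0 1.0\n"
print(same_data(a, b))
```

Note that this is an exact string comparison; it does not apply any numerical tolerance.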

comment:8 Changed 6 years ago by Roland Haas

It would be very nice to adapt Cactus' test mechanism to also support HDF5 output, e.g. to test the HDF5 output thorns.

comment:9 Changed 6 years ago by yosef@…

The output of the SphericalHarmonicDecomp thorn (CCE) is in HDF5 (the output should, in principle, be the same regardless of the number of CPUs). To have a robust test of this code, it would be best to test the HDF5 files themselves. Having the option of passing the output file through a filter (here it could be h5dump) before comparing would be quite useful. A simple h5diff -d <TOL> would also work for this thorn.
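A rough sketch of what such a filter hook could look like, purely as an illustration: the filter is any function from raw output to text (in practice it might wrap a call to h5dump), and the filtered text is then compared token by token with a tolerance. All names here are hypothetical, and the comment-stripping filter in the usage example is just a stand-in.

```python
import math


def numbers_match(text_a, text_b, abs_tol=0.0, rel_tol=0.0):
    """Compare whitespace-separated tokens; numeric ones within tolerance.

    Non-numeric tokens must match exactly. NaN never matches anything,
    including another NaN, because math.isclose treats it that way.
    """
    tokens_a, tokens_b = text_a.split(), text_b.split()
    if len(tokens_a) != len(tokens_b):
        return False
    for tok_a, tok_b in zip(tokens_a, tokens_b):
        try:
            val_a, val_b = float(tok_a), float(tok_b)
        except ValueError:
            # At least one token is not a number: fall back to string equality.
            if tok_a != tok_b:
                return False
            continue
        if not math.isclose(val_a, val_b, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


def compare_filtered(raw_a, raw_b, filter_fn, abs_tol=0.0, rel_tol=0.0):
    """Pass both outputs through filter_fn, then compare with tolerance."""
    return numbers_match(filter_fn(raw_a), filter_fn(raw_b),
                         abs_tol=abs_tol, rel_tol=rel_tol)
```

Usage with a stand-in filter that drops comment lines:

```python
def strip_comments(text):
    return "\n".join(l for l in text.splitlines() if not l.startswith("#"))

compare_filtered("# run A\n1.0 2.0\n", "# run B\n1.0 2.0000001\n",
                 strip_comments, abs_tol=1e-6)
```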

comment:10 Changed 6 years ago by Frank Löffler

Cc: joshuaharris01@… added
Owner: set to jhar131
Priority: optional → minor
Status: new → assigned

comment:11 Changed 6 years ago by Frank Löffler

Cc: joshuaharris01@… removed
Owner: jhar131 deleted

Won't happen soon.

comment:12 Changed 6 years ago by Frank Löffler

Milestone: Cactus_4.1.0 → Cactus_4.2.0

comment:13 Changed 5 years ago by Frank Löffler

Milestone: Cactus_4.2.0
