write index files for checkpoints

Issue #1031 closed
Roland Haas created an issue

This patch writes index files for checkpoints if output_index is set. This is (only) useful if the checkpoints will be read in by a different number of processes than wrote them. In that case, parsing the small index files is faster than parsing the heavy data files.

The two patches provide read and write functionality. The attached C code creates index files from existing HDF5 files (not necessarily checkpoints). It is a modified copy of hdf5_extract from the HDF5 thorn.

Keyword: CarpetIOHDF5

Comments (12)

  1. Roland Haas reporter

    I did not commit the index-creator stand-alone code. It is functional, but a hack. If there is interest, or if I have nothing else to do (not terribly likely), I'll clean it up and/or write a version based on hdf5_merge rather than hdf5_extract, which would make usage simpler. They are fairly trivial modifications of the original tools. Note that the index files created by the tool are not quite identical to Ian's from CarpetIOHDF5, since I write sparse files with the full dataset extents intact, whereas CarpetIOHDF5 sets the extents to 1 in each dimension.

  2. Ian Hinder

    Sparse files sounds like a better solution. If you have spare time, could you modify CarpetIOHDF5 to also do that? We should also check the VisIt plugin to make sure it works with those.

  3. Roland Haas reporter
    • changed status to open

    One gets sparse files by creating the datasets in the index file with the same extents as in the "heavy" data files, but not writing any data into them. The one downside of the sparse files is that the current hdf5_merge etc. are not smart enough to copy a sparse dataset as a sparse dataset; instead they would write a "heavy" file with the datasets all filled with zeroes. Attached please find a patch that does this.
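    The technique can be sketched in a few lines of the HDF5 C API (the file name `index.h5` and dataset name `grid_function` here are made up for illustration): the dataset is created with the full extents of the heavy data, but H5Dwrite is never called, so by default HDF5 allocates no raw-data storage and the index file stays tiny.

    ```c
    #include <assert.h>
    #include <sys/stat.h>
    #include <hdf5.h>

    int main(void)
    {
      /* full extents of the "heavy" dataset: 100^3 doubles would be 8 MB */
      hsize_t dims[3] = {100, 100, 100};

      hid_t file = H5Fcreate("index.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      assert(file >= 0);

      hid_t space = H5Screate_simple(3, dims, NULL);
      hid_t dset  = H5Dcreate2(file, "grid_function", H5T_NATIVE_DOUBLE, space,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
      assert(dset >= 0);
      /* deliberately no H5Dwrite(): raw-data storage is never allocated */

      H5Dclose(dset);
      H5Sclose(space);
      H5Fclose(file);

      /* the index file holds only metadata, far less than 8 MB of data */
      struct stat st;
      assert(stat("index.h5", &st) == 0);
      assert(st.st_size < 1024 * 1024);
      return 0;
    }
    ```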

  4. Roland Haas reporter

    Well, actually, the previous patch did not remove all possible extraneous code :-). One can also get rid of the separate index_dataspace. Having sparse datasets is much simpler than what Ian originally did. I stumbled across them when I did not implement index files quite the way Ian did, but simply skipped the H5Dwrite() call in hdf5_create_index.c. Sparse datasets seem to be created if one creates a dataset of a certain size but then does not actually write data to it (or leaves holes). So creating the dataset with the nominal Cactus grid function size, but never writing to it, makes the file sparse, with all data appearing to have a value of zero when read via H5Dread().

    To summarize: The patch removes the separate shape arrays and separate dataspaces for the datasets in index files and creates the index file datasets with the same dataspace as the "heavy" file datasets. It retains the h5shape attribute that was originally introduced for index files even though it is now redundant (since one could call H5Sget_simple_extent_dims on the datasets in the index files).

  5. Erik Schnetter
    • changed status to open

    (One can set the default value for a dataset to make it different from 0.)
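    A sketch of setting a non-zero default (the file name `fill.h5` and the value -1.0 are arbitrary here): the fill value goes into the dataset creation property list via H5Pset_fill_value, and reading the never-written dataset then yields that value instead of zero.

    ```c
    #include <assert.h>
    #include <hdf5.h>

    int main(void)
    {
      hsize_t dims[1] = {8};
      double  fill    = -1.0;

      hid_t file  = H5Fcreate("fill.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      hid_t space = H5Screate_simple(1, dims, NULL);

      /* dataset creation property list carrying the non-zero default value */
      hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
      H5Pset_fill_value(dcpl, H5T_NATIVE_DOUBLE, &fill);

      hid_t dset = H5Dcreate2(file, "d", H5T_NATIVE_DOUBLE, space,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);

      /* no H5Dwrite: reading the unallocated dataset returns the fill value */
      double buf[8];
      assert(H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                     H5P_DEFAULT, buf) >= 0);
      for (int i = 0; i < 8; ++i)
        assert(buf[i] == -1.0);

      H5Pclose(dcpl); H5Dclose(dset); H5Sclose(space); H5Fclose(file);
      return 0;
    }
    ```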

    Please apply, with an appropriate comment, e.g. the text in comment 9 above.
