Speed up reading metadata of HDF5 files in VisIt

Issue #1483 closed
Roland Haas created an issue

The attached patch tries to extract as much information about a dataset as possible from the dataset name. Since accessing attributes is slow but the dataset name comes for free, this speeds up opening HDF5 files in VisIt drastically (I saw a factor of 6 when I measured it quite a while ago).

This is certainly not the nicest way of parsing the string. One could instead look for "name=value" constructs sequentially in the string and act based on the name, rather than just matching against all possible options.
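The sequential "name=value" approach mentioned above could look roughly like the sketch below. It is not the attached patch (which is C++ inside VisIt); it only illustrates the idea in Python. The name layout `<thorn>::<variable> it=<n> tl=<n> rl=<n> c=<n>` and the function name `parse_dataset_name` are assumptions made for the example.

```python
import re

# Assumption: dataset names look like "<thorn>::<variable> it=0 tl=0 rl=2 c=5",
# i.e. a variable name followed by space-separated integer key=value pairs.
_PAIR = re.compile(r"(\w+)=(-?\d+)")


def parse_dataset_name(name):
    """Split a dataset name into (variable, {key: int_value}).

    Hypothetical helper: it collects every key=value pair it finds,
    instead of matching the string against each known key in turn.
    """
    variable, _, rest = name.partition(" ")
    fields = {key: int(value) for key, value in _PAIR.findall(rest)}
    return variable, fields


variable, fields = parse_dataset_name("ADMBASE::alp it=0 tl=0 rl=2 c=5")
```

Because the parser acts on whatever keys are present, a new key in the name would show up in `fields` without any change to the parsing code.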

Keyword: CarpetHDF5

Comments (10)

  1. Ian Hinder

    Is this beneficial even when using index files? We do a similar thing in SimulationTools. Take a look at https://bitbucket.org/simulationtools/simulationtools/src/8e0d40844bc9831bb3ba80e9fd453371c0574327/CarpetIOHDF5.m?at=master around line 332 "Low-level interface for determining what's available in a file". Essentially we read in all the dataset names and parse them, and build up a lookup table which is then used to find datasets by the key/value pairs from the name. Note that (as explained in the comment in the file) we use the terminology "attribute" here to refer to the X=Y parts of the dataset name, not the HDF5 attribute (this should probably be changed). The HDF5 attributes are read using the Annotations function, but this is only called when reading the data (to get the origin and delta) or something (like cctk_time) that is only available in HDF5 attributes.
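As a rough illustration of the lookup-table idea described in this comment, the sketch below indexes dataset names by the key/value pairs found in them. It is not the SimulationTools code (which is Mathematica); the name layout and the helpers `build_index` and `find` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: names look like "<thorn>::<variable> it=0 tl=0 rl=0 c=0".
_PAIR = re.compile(r"(\w+)=(-?\d+)")


def build_index(names):
    """Map each (key, value) pair to the set of dataset names containing it."""
    index = defaultdict(set)
    for name in names:
        variable, _, rest = name.partition(" ")
        index[("var", variable)].add(name)
        for key, value in _PAIR.findall(rest):
            index[(key, int(value))].add(name)
    return index


def find(index, **criteria):
    """Return the dataset names that match every given key=value criterion."""
    matches = [index.get(item, set()) for item in criteria.items()]
    return set.intersection(*matches) if matches else set()


names = [
    "ADMBASE::alp it=0 tl=0 rl=0 c=0",
    "ADMBASE::alp it=0 tl=0 rl=1 c=0",
    "ADMBASE::alp it=8 tl=0 rl=0 c=0",
]
index = build_index(names)
coarse_initial = find(index, var="ADMBASE::alp", it=0, rl=0)
```

Only the names are touched while building the table, so no HDF5 attribute has to be read until a dataset is actually selected.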

  2. Roland Haas reporter

    I set up a test using a file with very many (~400k) datasets (9.2GB, which does not quite fit into main memory). The times below are for iterating over the file and reading the metadata (measured with gettimeofday):

    With patch and without index files: 26s
    With patch and with index files: 18s
    Without patch and without index files: 27s
    Without patch and with index files: 19s

    Doing the same with a large 3D dataset (~16GB), I find:

    With patch and without index files: 12s
    With patch and with index files: 10s
    Without patch and without index files: 12s
    Without patch and with index files: 10s

    So it seems that the patch has no measurable impact on speed (index files, however, do help). This would mean that there is no need (anymore?) to try to parse information out of the dataset name, and one can just query the attributes.

  3. Roland Haas reporter

    Not as far as I can tell. I usually don't use them (since I forget to turn them on).

  4. Ian Hinder

    I think this ticket can be closed, as the enhancement, once benchmarked, showed no speed improvement. Roland, do you agree? Or do you want to do more testing?

  5. Roland Haas reporter

    Ian: I'll likely just apply it, since it cannot hurt and the same logic already exists for the other attributes. I have more CarpetHDF5 changes anyway, so I can bunch them all up and try to get them into VisIt in one go. I am still confused that string parsing was apparently found to be a good thing in the Mathematica reader, while the Carpet HDF5 reader sees so little benefit. Possibly the issue is that I used postprocessed files, which might be "nicer" than what the simulation would produce directly.
