Opened 6 years ago

Closed 6 years ago

#1053 closed defect (fixed)

CarpetIOHDF5: it doesn't honor the out3D_every parameter

Reported by: bmundim
Owned by: Erik Schnetter
Priority: major
Milestone:
Component: Carpet
Version:
Keywords: CarpetIOHDF5 out3D_every
Cc:

Description

Hi,

I have noticed a strange behaviour of CarpetIOHDF5 when running a bbh
simulation with the Lovelace release: it doesn't seem to follow the
out3D_every parameter. I have attached a modified version of
CarpetWaveToyCheckpointTest.par that reproduces this error. I tried to mimic
the bbh simulation setup I am using in terms of the number of refinement
levels (which also helps to slow down the simulation a bit) and the output I
was interested in. The experiment goes as follows:

1) Create two directories, 000 and 001, and copy the attached par file into
each. Symlink a Cactus executable for the ET into each directory. Both the
development and Lovelace releases show this problem. You should have something like this:

[bruno@frozenstar wavetoytest]$ ls 0*
000:
cactus_einstein@  CarpetWaveToyCheckpointTest.par

001:
cactus_einstein@  CarpetWaveToyCheckpointTest.par

2) Run the executable on both directories. First on 000 and then on 001.

./cactus_einstein CarpetWaveToyCheckpointTest.par

3) Analyze the results. Everything looks fine in the first directory, 000:

[bruno@frozenstar CarpetWaveToyCheckpointTest]$ h5ls wavetoy::scalarevolve.xyz.h5 
Parameters\ and\ Global\ Attributes Group
WAVETOY::phi\ it=0\ tl=0\ rl=0 Dataset {43, 43, 43}
WAVETOY::phi\ it=0\ tl=0\ rl=1 Dataset {41, 41, 41}
WAVETOY::phi\ it=0\ tl=0\ rl=2 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=3 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=4 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=5 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=6 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=8 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=1 Dataset {41, 41, 41}
WAVETOY::phi\ it=128\ tl=0\ rl=2 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=3 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=4 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=5 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=6 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=8 Dataset {33, 33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=0 Dataset {43, 43, 43}
WAVETOY::phi\ it=256\ tl=0\ rl=1 Dataset {41, 41, 41}
WAVETOY::phi\ it=256\ tl=0\ rl=2 Dataset {33, 33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=3 Dataset {33, 33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=4 Dataset {33, 33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=5 Dataset {33, 33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=6 Dataset {33, 33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=8 Dataset {33, 33, 33}

You can see above that all refinement levels were correctly output every 128
iterations, as set in the par file. That no longer happens after
recovery, i.e. in the second directory, 001:

[bruno@frozenstar CarpetWaveToyCheckpointTest]$ h5ls wavetoy::scalarevolve.xyz.h5 
Parameters\ and\ Global\ Attributes Group
WAVETOY::phi\ it=262\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=262\ tl=0\ rl=8 Dataset {33, 33, 33}
WAVETOY::phi\ it=390\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=390\ tl=0\ rl=8 Dataset {33, 33, 33}

Neither 262 nor 390 is a multiple of 128, so something is going really wrong
here. Note, however, that the corresponding 2D slice output is correct:

[bruno@frozenstar CarpetWaveToyCheckpointTest]$ h5ls wavetoy::scalarevolve.xy.h5 
Parameters\ and\ Global\ Attributes Group
WAVETOY::phi\ it=384\ tl=0\ rl=1 Dataset {41, 41}
WAVETOY::phi\ it=384\ tl=0\ rl=2 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=3 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=4 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=5 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=6 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=7 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=8 Dataset {33, 33}

384 is a multiple of 128, and as you can see all active refinement levels for
that iteration were written out.

Any idea on how to fix this?

Thanks!

Attachments (1)

CarpetWaveToyCheckpointTest.par (3.2 KB) - added by bmundim 6 years ago.


Change History (30)

Changed 6 years ago by bmundim

comment:1 Changed 6 years ago by Frank Löffler

What is the time of your checkpoint?

What I notice is that 390-262=128, which is what you specified as IOHDF5::out3D_every = 128. If you happen to restart at an iteration which isn't a multiple of 128, you will continue to get output every 128 timesteps, counted from the time of restart, not from 0. That is, unless you set out_criterion to "divisor", which is probably what you want but isn't the default.

comment:2 Changed 6 years ago by Erik Schnetter

The definition of "out_every" is slightly different than people may expect. Originally, it meant "output whenever the iteration number is an integer multiple of this number". This then led to large complaints as this particular iteration number may not exist on any grid, which results in no output at all. (For example, out_every=100 leads to very rare output if e.g. the finest level takes steps in multiples of 32.)

To remedy this, Carpet now keeps track of when the last output occurred, and outputs anew if at least out_every iterations passed since then. In the above case, output would occur every 128 iterations, since 128 is the smallest multiple of 32 that is larger than or equal to 100. This is safe, but of course, people now don't like this because output may occur at unpredictable times...

This is where the "divisor" output enters the game.
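
For illustration, here is a minimal C++ sketch of the two criteria described above (hypothetical function and variable names, not Carpet's actual implementation):

#include <string>

// Sketch only: decide whether to produce output at this iteration.
bool want_output(const std::string &out_criterion, int iteration,
                 int out_every, int &last_output_iteration)
{
  if (out_every <= 0)
    return false;                        // output disabled

  if (out_criterion == "divisor") {
    // Output only when the iteration number is an exact multiple.
    // If such an iteration never exists on any grid, nothing is written.
    return iteration % out_every == 0;
  }

  // Default criterion: output once at least out_every iterations have
  // passed since the last output, whatever the absolute iteration number.
  if (iteration >= last_output_iteration + out_every) {
    last_output_iteration = iteration;
    return true;
  }
  return false;
}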

comment:3 Changed 6 years ago by Ian Hinder

To remedy this, Carpet now keeps track of when the last output occurred, and outputs anew if at least out_every iterations passed since then.

Do you mean the last output on a given refinement level? I think the original behaviour (if I'm understanding correctly) is a lot simpler and easier to understand. I don't expect to get output from refinement levels that don't exist when I ask for the output.

comment:4 Changed 6 years ago by Erik Schnetter

No, the last output iteration is stored globally, and at each iteration, output occurs for each refinement level that is active at this time. Otherwise, you would receive too much coarse grid output, and coarse grid output would occur at times when finer grids are not output. The behaviour has nothing to do with refinement levels.

If you specify a reasonable out_every, the behaviour is what you expect. If you specify an unreasonable out_every, Carpet will round up to the next reasonable value. The divisor method will instead disable output. Both choices are easy to understand.

comment:5 Changed 6 years ago by bmundim

I see. I understand the problem better now, but I have to say that I find the stated behavior of
out_every extremely misleading. Take for example the definition of out_every in IOUtil:

INT out_every "How often to do output by default" STEERABLE = ALWAYS
{
   1:* :: "Every so many iterations"
  -1:0 :: "Disable output"
} -1

That is what I took for granted: in the example above, output was supposed to be dumped every 128
iterations starting from 0. If we are using Carpet then we need to understand a bit of AMR and how
Carpet deals with the time index. Once we understand that, it is easy to choose the number of iterations
such that every other coarse (or half-coarse, as I have chosen in this example) time step is dumped. The
way it seems to be implemented is very misleading, unfortunately.
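
(As a concrete illustration, assuming Carpet's usual factor-of-2 time refinement: with 9 refinement levels rl=0..8, the iteration counter counts finest-level steps, so level rl takes a step every 2^(8-rl) iterations. The coarsest level therefore steps every 256 iterations, and out3D_every = 128 asks for every half coarse step, consistent with the h5ls listing above, where rl=0 appears only at it=0 and it=256 while rl=1 appears every 128 iterations.)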

To remedy this, Carpet now keeps track of when the last output occurred, and outputs anew if at least out_every iterations passed since then. In the above case, output would occur every 128 iterations, since 128 is the smallest multiple of 32 that is larger than or equal to 100. This is safe, but of course, people now don't like this because output may occur at unpredictable times...

I don't think Carpet is correctly recovering from the checkpoint file when the last 3D output occurred
(unless it is considering the checkpoint iteration as the last 3D output, which doesn't look right to
me). In the example above, Carpet should know that the last 3D output was at it=256, before the
checkpoint iteration. However, Carpet doesn't seem to recover that information from the checkpoint file
correctly. It seems to consider it=262 as the last output instead, and then starts the new series of
output from that iteration number. That doesn't seem correct to me at all. Also, the 2D slices do work
correctly, as posted above, i.e. they do pick it=256 as the last output iteration number.

In any case, adding the following to the par file avoids this problem:

IOHDF5::out3D_criterion = "divisor"

thanks!

comment:6 Changed 6 years ago by Erik Schnetter

I am happy to change the behaviour again, if only because it simplifies the implementation significantly.

Of course, people specifying "out_every = 1000" will then basically never receive any output. It can be quite frustrating if one is learning AMR, runs a simulation for a few days, and receives no output and no explanation whatsoever.

The variables storing the last output iteration are checkpointed and recovered. I just noticed that these may accidentally be re-initialised after recovery; moving CarpetIOASCIIInit from basegrid to initial may help.

comment:7 Changed 6 years ago by Ian Hinder

Then that user learns a valuable lesson about looking at their data as the simulation progresses! Seriously though, what about having Carpet detect that no output will ever appear and give a fatal error? There might be the complication that the number of refinement levels is not fixed.

comment:8 Changed 6 years ago by Frank Löffler

1:* :: "Every so many iterations"

It does what it says it does: it outputs every so many iterations. It doesn't say "starting from 0". Having said that, I agree that "from 0" is probably what most people want. Checkpointing shouldn't change the times at which regular output is written, at least not by default.

Would it be easy to produce a warning when someone requests output at a given iteration that doesn't actually exist? I would consider that to be an error in the parameter file.

comment:9 Changed 6 years ago by bmundim

Erik: The variables storing the last output iteration are checkpointed and recovered. I just see that these may accidentally be re-initialised after recovery -- moving CarpetIOASCIIInit from basegrid to initial may help.

I think this is the source of the problem. If it were working correctly I would never have seen
this "error", since I would never have set something like "out_every = 1000", so Carpet would never
have needed to adjust its output to a non-empty one...

Cheers!

comment:10 Changed 6 years ago by Frank Löffler

Should this be backported to Lovelace as well? I would vote yes if this really was the problem.

comment:11 Changed 6 years ago by Erik Schnetter

I propose the following patch to Carpet to correctly remember output iterations after recovering:

diff -r f0dc71726af0 Carpet/CarpetIOASCII/schedule.ccl
--- a/Carpet/CarpetIOASCII/schedule.ccl	Wed Aug 01 11:41:28 2012 -0700
+++ b/Carpet/CarpetIOASCII/schedule.ccl	Tue Aug 21 12:20:55 2012 -0400
@@ -8,7 +8,7 @@
   OPTIONS: global
 } "Startup routine"
 
-SCHEDULE CarpetIOASCIIInit AT basegrid
+SCHEDULE CarpetIOASCIIInit AT initial
 {
   LANG: C
   OPTIONS: global
diff -r f0dc71726af0 Carpet/CarpetIOBasic/schedule.ccl
--- a/Carpet/CarpetIOBasic/schedule.ccl	Wed Aug 01 11:41:28 2012 -0700
+++ b/Carpet/CarpetIOBasic/schedule.ccl	Tue Aug 21 12:20:55 2012 -0400
@@ -8,7 +8,7 @@
   OPTIONS: global
 } "Startup routine"
 
-schedule CarpetIOBasicInit at BASEGRID
+schedule CarpetIOBasicInit at INITIAL
 {
   LANG: C
   OPTIONS: global
diff -r f0dc71726af0 Carpet/CarpetIOHDF5/schedule.ccl
--- a/Carpet/CarpetIOHDF5/schedule.ccl	Wed Aug 01 11:41:28 2012 -0700
+++ b/Carpet/CarpetIOHDF5/schedule.ccl	Tue Aug 21 12:20:55 2012 -0400
@@ -8,7 +8,7 @@
   LANG: C
 } "Startup routine"
 
-schedule CarpetIOHDF5_Init at BASEGRID
+schedule CarpetIOHDF5_Init at INITIAL
 {
   LANG: C
   OPTIONS: global
diff -r f0dc71726af0 Carpet/CarpetIOScalar/schedule.ccl
--- a/Carpet/CarpetIOScalar/schedule.ccl	Wed Aug 01 11:41:28 2012 -0700
+++ b/Carpet/CarpetIOScalar/schedule.ccl	Tue Aug 21 12:20:55 2012 -0400
@@ -8,7 +8,7 @@
   OPTIONS: global
 } "Startup routine"
 
-schedule CarpetIOScalarInit at BASEGRID
+schedule CarpetIOScalarInit at INITIAL
 {
   LANG: C
   OPTIONS: global

comment:12 Changed 6 years ago by Erik Schnetter

Seriously though, what about if Carpet detects that no output will ever appear, and gives a fatal error? There might be the complication that the number of refinement levels is not fixed.


Would it be easy to produce a warning in case someone specifies to have output at a given iteration but that actually doesn't exist? I would consider that to be an error in the parameter file.

The number of refinement levels can change during evolution. But for each given number of refinement levels, every "out_every" value will eventually produce output; it may just take a very long time.

comment:13 Changed 6 years ago by bmundim

The number of refinement levels can change during evolution. At for each given number of refinement levels, each "out_every" value will eventually output something, it may just take a very long time.

Yes, but if the user is aware of the time indexing she may choose to output at multiples of the coarsest
level's step. Chances are that we always have at least two or three coarse levels, even if others disappear
due to other criteria.

Regarding the proposed patch: I tested it for CarpetIOHDF5 and, surprisingly, it didn't solve the problem.
The 3D output still restarts from the last checkpoint iteration, it=262 in this example. I tried to
find out what the values of last_output_iteration_slice[d] were in the checkpoint file, but I didn't
succeed:

h5ls checkpoint.chkpt.it_228.h5

...
CARPETIOHDF5::last_output_iteration_slice[0]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_iteration_slice[1]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_iteration_slice[2]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_time_slice[0]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_time_slice[1]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_time_slice[2]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::next_output_iteration\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::next_output_time\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration_slice[0]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration_slice[1]\ it=228\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration_slice[2]\ it=228\ tl=0 Dataset {1}
...

and if I try to dump the last_output_iteration_slice I get the following error:

 h5dump -d "CARPETIOHDF5::last_output_iteration_slice[2] it=228 tl=0" checkpoint.chkpt.it_228.h5
HDF5 "checkpoint.chkpt.it_228.h5" {
DATASET "CARPETIOHDF5::last_output_iteration_slice" {
   h5dump error: unable to open dataset "CARPETIOHDF5::last_output_iteration_slice"
}
}

So I can't actually see the value of CARPETIOHDF5::last_output_iteration_slice[2] in the
checkpoint file, but could it by any chance have been set to the checkpoint's *this_iteration
value? That seems to be the case. Also be aware that the checkpoint iteration number in this
example changed from 262 to 228 because of the new test I ran, but the issue is still open for a solution.

comment:14 Changed 6 years ago by Erik Schnetter

Can you try "h5dump" without the "-d" option? Maybe some of the special characters (colons, spaces, brackets) are not understood correctly. Or maybe using single quotes instead of double quotes will help.

comment:15 Changed 6 years ago by bmundim

Neither suggestion worked, unfortunately.

comment:16 Changed 6 years ago by Erik Schnetter

Can you send me the checkpoint file in some way?

comment:17 Changed 6 years ago by bmundim

My checkpoint tarball file is in the following link:

http://ccrg.rit.edu/~mundim/checkpoint.chkpt.it_246.h5.tar.gz

thanks,
Bruno.

comment:18 Changed 6 years ago by Erik Schnetter

CARPETIOHDF5::last_output_iteration_slice[2] is 128 in the checkpoint file. A plain "h5dump checkpoint.chkpt.it_246.h5" outputs the content of the variable just fine for me.

With my patch from above (changing the scheduling of the *_Init routines), this variable should retain this value. With out_every=128, the next output should occur at iteration 256.

I am very confused about the iteration numbers. You mention 224, the checkpoint file has 246, and you say the next output iteration is 262? 262 would seem to be correct, since 262>=128+128, assuming that you have out_every=128. Can you repeat the test and list all the details (iteration numbers, out_every settings, etc.)?

comment:19 Changed 6 years ago by bmundim

The simulation stops based on runtime, which is set to 0.1 min, i.e. 6.0 s. Whenever I rerun the test I end up
with a slightly different iteration number for when the checkpoint happens. I suspect that competition for resources on my machine leads to these small differences for such a short run.

In any case I reran the test. I got the following:

000/CarpetWaveToyCheckpointTest:

h5ls wavetoy::scalarevolve.xyz.h5
Parameters\ and\ Global\ Attributes Group
WAVETOY::phi\ it=0\ tl=0\ rl=0 Dataset {43, 43, 43}
WAVETOY::phi\ it=0\ tl=0\ rl=1 Dataset {41, 41, 41}
WAVETOY::phi\ it=0\ tl=0\ rl=2 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=3 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=4 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=5 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=6 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=0\ tl=0\ rl=8 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=1 Dataset {41, 41, 41}
WAVETOY::phi\ it=128\ tl=0\ rl=2 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=3 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=4 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=5 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=6 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=128\ tl=0\ rl=8 Dataset {33, 33, 33}

The first simulation dumped its last 3D output at it=128, as requested in the par file.

CheckPoints:
ls
checkpoint.chkpt.it_252.h5  checkpoint.chkpt.it_506.h5

h5ls checkpoint.chkpt.it_252.h5

...
CARPETIOHDF5::last_output_iteration_slice[0]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_iteration_slice[1]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_iteration_slice[2]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_time_slice[0]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_time_slice[1]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::last_output_time_slice[2]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::next_output_iteration\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::next_output_time\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration_slice[0]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration_slice[1]\ it=252\ tl=0 Dataset {1}
CARPETIOHDF5::this_iteration_slice[2]\ it=252\ tl=0 Dataset {1}
...

and

h5dump checkpoint.chkpt.it_252.h5 | grep -A 80 'CARPETIOHDF5::this_iteration_slice'
   DATASET "CARPETIOHDF5::this_iteration_slice[0] it=252 tl=0" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 0
      }
      ATTRIBUTE "carpet_mglevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "group_timelevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffset" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffsetdenom" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 1
         }
      }
      ATTRIBUTE "iorigin" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "level" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "name" {
         DATATYPE  H5T_STRING {
               STRSIZE 37;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SCALAR
         DATA {
         (0): "CARPETIOHDF5::this_iteration_slice[0]"
         }
      }
      ATTRIBUTE "time" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 0.125
         }
      }
      ATTRIBUTE "timestep" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 252
         }
      }
   }
   DATASET "CARPETIOHDF5::this_iteration_slice[1] it=252 tl=0" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 0
      }
      ATTRIBUTE "carpet_mglevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "group_timelevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffset" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffsetdenom" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 1
         }
      }
      ATTRIBUTE "iorigin" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "level" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "name" {
         DATATYPE  H5T_STRING {
               STRSIZE 37;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SCALAR
         DATA {
         (0): "CARPETIOHDF5::this_iteration_slice[1]"
         }
      }
      ATTRIBUTE "time" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 0.125
         }
      }
      ATTRIBUTE "timestep" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 252
         }
      }
   }
   DATASET "CARPETIOHDF5::this_iteration_slice[2] it=252 tl=0" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 128
      }
      ATTRIBUTE "carpet_mglevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "group_timelevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffset" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffsetdenom" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 1
         }
      }
      ATTRIBUTE "iorigin" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "level" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "name" {
         DATATYPE  H5T_STRING {
               STRSIZE 37;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SCALAR
         DATA {
         (0): "CARPETIOHDF5::this_iteration_slice[2]"
         }
      }
      ATTRIBUTE "time" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 0.125
         }
      }
      ATTRIBUTE "timestep" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 252
         }
      }
   }

From this checkpoint data I can see that this_iteration_slice is set to 0, 0 and 128 for 1D, 2D and 3D,
respectively. The last_output_iteration_slice entries seem to have the same values as this_iteration_slice:

h5dump checkpoint.chkpt.it_252.h5 | grep -A 80 'CARPETIOHDF5::last_output_iteration_slice'
   DATASET "CARPETIOHDF5::last_output_iteration_slice[0] it=252 tl=0" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 0
      }
      ATTRIBUTE "carpet_mglevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "group_timelevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffset" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffsetdenom" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 1
         }
      }
      ATTRIBUTE "iorigin" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "level" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "name" {
         DATATYPE  H5T_STRING {
               STRSIZE 44;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SCALAR
         DATA {
         (0): "CARPETIOHDF5::last_output_iteration_slice[0]"
         }
      }
      ATTRIBUTE "time" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 0.125
         }
      }
      ATTRIBUTE "timestep" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 252
         }
      }
   }
   DATASET "CARPETIOHDF5::last_output_iteration_slice[1] it=252 tl=0" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 0
      }
      ATTRIBUTE "carpet_mglevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "group_timelevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffset" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffsetdenom" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 1
         }
      }
      ATTRIBUTE "iorigin" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "level" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "name" {
         DATATYPE  H5T_STRING {
               STRSIZE 44;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SCALAR
         DATA {
         (0): "CARPETIOHDF5::last_output_iteration_slice[1]"
         }
      }
      ATTRIBUTE "time" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 0.125
         }
      }
      ATTRIBUTE "timestep" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 252
         }
      }
   }
   DATASET "CARPETIOHDF5::last_output_iteration_slice[2] it=252 tl=0" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 128
      }
      ATTRIBUTE "carpet_mglevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "group_timelevel" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffset" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "ioffsetdenom" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 1
         }
      }
      ATTRIBUTE "iorigin" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "level" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
      ATTRIBUTE "name" {
         DATATYPE  H5T_STRING {
               STRSIZE 44;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SCALAR
         DATA {
         (0): "CARPETIOHDF5::last_output_iteration_slice[2]"
         }
      }
      ATTRIBUTE "time" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 0.125
         }
      }
      ATTRIBUTE "timestep" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 252
         }
      }
   }

which is odd. In any case, at least the 3D slice seems to correctly indicate its last output
(while the 2D and 1D ones don't). Now, moving to the output in directory 001, we have:

001/CarpetWaveToyCheckpointTest:

h5ls wavetoy::scalarevolve.xyz.h5
Parameters\ and\ Global\ Attributes Group
WAVETOY::phi\ it=252\ tl=0\ rl=6 Dataset {33, 33, 33}
WAVETOY::phi\ it=252\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=252\ tl=0\ rl=8 Dataset {33, 33, 33}
WAVETOY::phi\ it=380\ tl=0\ rl=6 Dataset {33, 33, 33}
WAVETOY::phi\ it=380\ tl=0\ rl=7 Dataset {33, 33, 33}
WAVETOY::phi\ it=380\ tl=0\ rl=8 Dataset {33, 33, 33}

h5ls wavetoy::scalarevolve.xy.h5
Parameters\ and\ Global\ Attributes Group
WAVETOY::phi\ it=256\ tl=0\ rl=0 Dataset {43, 43}
WAVETOY::phi\ it=256\ tl=0\ rl=1 Dataset {41, 41}
WAVETOY::phi\ it=256\ tl=0\ rl=2 Dataset {33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=3 Dataset {33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=4 Dataset {33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=5 Dataset {33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=6 Dataset {33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=7 Dataset {33, 33}
WAVETOY::phi\ it=256\ tl=0\ rl=8 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=1 Dataset {41, 41}
WAVETOY::phi\ it=384\ tl=0\ rl=2 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=3 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=4 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=5 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=6 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=7 Dataset {33, 33}
WAVETOY::phi\ it=384\ tl=0\ rl=8 Dataset {33, 33}

That is: the file containing 2D slices correctly starts at it=256, while the one with
3D slices starts at the checkpoint iteration, it=252, which is
clearly wrong. I hope it is clearer now.

Thanks a lot!

comment:20 Changed 6 years ago by Erik Schnetter

Rookie mistake. The grid scalars were declared [3], and then accessed via [outdim] where outdim=3. That doesn't work, and C++ provides extra rope.
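
A minimal sketch of that failure mode, with names modelled on this ticket rather than taken from the actual Carpet source:

// Declared with 3 elements, so the only valid indices are 0, 1 and 2:
int last_output_iteration_slice[3];

void remember_last_output(int outdim, int iteration)
{
  // If outdim can take the value 3 (full 3D output), this write lands one
  // element past the end of the array.  Raw C++ arrays are not bounds
  // checked, so the write silently corrupts whatever lives next in memory
  // (undefined behaviour) instead of failing loudly.
  last_output_iteration_slice[outdim] = iteration;
}

// The committed fix ("Increase array size for last_output_iteration_slice")
// enlarges the array so that the index used for 3D output is in range.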

comment:21 Changed 6 years ago by Erik Schnetter

I committed a correction for this error, and also the patch I suggested above.

comment:22 Changed 6 years ago by Ian Hinder

Would this have been caught by compiling with bounds checking, or does the declaration effectively happen at run-time in a way which cannot be caught?

comment:23 Changed 6 years ago by Erik Schnetter

Unfortunately, C++ does not offer bounds checking.

I don't recall at the moment what the corresponding Fortran declaration looks like; I assume the array size is known and declared to Fortran.
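
As an aside (a hypothetical alternative, not something Carpet does): a bounds-checked container such as std::array accessed via .at() would turn the stray write into a catchable exception, and an AddressSanitizer build (-fsanitize=address) would also flag the raw-array write at run time:

#include <array>
#include <iostream>
#include <stdexcept>

int main()
{
  // Same size as in the sketch above, but with a checked accessor.
  std::array<int, 3> last_output_iteration_slice = {0, 0, 0};

  try {
    last_output_iteration_slice.at(3) = 128;   // index 3 is out of range
  } catch (const std::out_of_range &e) {
    std::cerr << "caught: " << e.what() << '\n';
  }
  return 0;
}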

comment:24 Changed 6 years ago by bmundim

Fantastic, problem solved! Would you mind backporting the last two Carpet commits to Lovelace?

Thanks a lot!

comment:25 Changed 6 years ago by bmundim

Just for the record: after contacting the HDF Group regarding h5dump and dataset names with brackets
in them, I was told that the option --no-compact-subset, available in release 1.8.9, prevents h5dump
from subsetting, which is essentially what the brackets force it to do. So in one of the examples above we could
use the option as follows:

h5dump --no-compact-subset -d 'CARPETIOHDF5::last_output_iteration_slice[2] it=228 tl=0' checkpoint.chkpt.it_228.h5

comment:26 Changed 6 years ago by Roland Haas

Porting back is fine with me.

comment:27 Changed 6 years ago by Roland Haas

Any objections to porting this back into Lovelace? If not I will do so sometime after Thursday.

comment:28 Changed 6 years ago by Frank Löffler

No objections. Please also send a short message to the users mailing list after the commit, so that users are aware of the update.

comment:29 Changed 6 years ago by Roland Haas

Resolution: fixed
Status: new → closed

Applied as Carpet hashes 32a217e2f8cc "CarpetIO*: Do not overwrite last_output_iteration after recovery"
and 82e1c4b08aa7 "CarpetIOHDF5: Increase array size for last_output_iteration_slice" of Carpet/Lovelace.
