Reduce number of output files per directory

Roland Haas

removed comment

Could you outline the types of hierarchy that are considered? Something like:

hydrobase--rho/
hydrobase--rho.file_0.h5
hydrobase--rho.file_1.h5
hydrobase--eps/
hydrobase--eps.file_0.h5
hydrobase--eps.file_1.h5
TimerReport/
TimerReport.00000.txt
TimerReport.00001.txt

or

Proc0/
hydrobase--rho.file.0.h5
hydrobase--eps.file_0.h5
TimerReport.00000.txt
Proc1/
hydrobase--rho.file.1.h5
hydrobase--eps.file_1.h5
TimerReport.00001.txt

or something completely different?

2015-02-09T11:26:05+00:00

Erik Schnetter reporter

removed comment

My idea was to group things my process id, since this reduces I/O traffic: each directory would only be accessed from a subset of all processes. Whether a process writes one or multiple files into a directory is likely a secondary effect due to caching.

Having one directory per process may not be enough, since running on 10k processes then still creates 10k sub-directories in one directory. I was thinking of a hierarchical approach:

proc00nnnn/proc0000nn/files...
proc00nnnn/proc0001nn/files...
...
proc01nnnn/proc0000nn/files...

where proc00nnnn is used by the first 10k processes, proc0000nn is used by the first 100 processes, etc.

2015-02-09T16:42:37+00:00

Roland Haas

removed comment

As long as this grouping is off by default and can be turned on via a parameter: fine by me. How would we handle thorns that themselves look at IO::out_dir? Would we actually have

out_dir/proc00nnnn/proc0000nn/out_dir/files...

and have IOUtil create the directories and chdir() into out_dir/proc00nnnn/proc0000nn at the beginning of the run? This would require no (I think) change to current code that uses out_dir but leads to an awkward directory layout. An alternative would be to steer out_dir to a different value on each process. Finally one could try and use piraha's advanced parameter setting options to make out_dir assume different values on different processors, something like:

IO::out_dir = "proc"+str(int($MyProc / 10000))+"/proc"+str($MyProc)

which would require the predefined variable MyProc in piraha (same as $parfile) and would not (as is) provide zero padding for the process numbers. This is similar to steering the parameter.

Finally there are options in IOUtil to define output groups which are not designed for exactly this purpose but which currently can be used to have multiple processes write into a single common hdf5 file (but do nothing for ASCII output eg in TimerReport). They may be slow though since they involve a serialization of output inside each output group.

2015-02-10T03:33:42+00:00

Frank Löffler

removed comment

Different out_dirs for different processes might have the problem that code might assume it needs to, e.g. create a file or directory only on the root process to be present on all (after some sync time). On the other hand: such code is then likely to only write from the root process anyway (but not necessarily).

2015-02-10T11:46:26+00:00

Erik Schnetter reporter

removed comment

This is now https://bitbucket.org/cactuscode/cactusbase/pull-request/2/ioutil-new-parameter/diff.

This is an incompatible change, since IOUtil_AssembleFilename now takes an additional parameter describing the total number of I/O processes. There are branches for CactusPUGHIO and Carpet to take this into account.

Recovery is not tested yet.

2015-02-12T16:36:55+00:00

Erik Schnetter reporter

changed status to open
removed comment

... And recovery is working now. Added a test case.

2015-02-12T21:49:46+00:00

Frank Löffler

removed comment

Just looking at the diff:

Why is the case '1' excluded fr om the range for processes_per_directory?
lines 234 and 235 contain commented code. Is there a reason these should stay?
The patch uses strcpy/strcat; it should use strncpy/strncat instead. The assert about the length before isn't enough; asserts can be no-ops, and they wouldn't silence warnings as well.

Style: at first I was put off by the use of a space between a function name and the opening parenthesis, which is allowed, but (at least to me) unusual for C/C++ code. I later realized that at least in these files this usage seems to be common. Well - better to stick to one style than to have none.

(I usually follow: Control statements should have one space between the control keyword and opening parenthesis, to distinguish them from function calls.)

2015-02-12T22:47:25+00:00

Roland Haas

removed comment

I have two suggestions: * instead of snprintf, strcat, strcpy and Util_snprintf it would be easier to use Util_asprintf which allocates memory for the target string internally * to allow thorns to compile with both old and new IOUtils version, if we want to keep the extra nioprocs argument, we should follow other place in IOUtils #define a constant IOUTIL_FILENAME_HAS_NIOPROCS such that user code can test for the new API * do we actually need the extra argument? The function already takes a cGH and IOUtil's GH extension contains an nioprocs field already. I am also not sure about how file_nioprocs is handled. If it is the same as the GH extensions nioprocs then the assert() in line 288 is incorrect since it assumes that the first niprocs processes do IO, which is incorrect, see the logic in IOUtils SetupGH for how the ioprocs are spread out. If file_niprocs is not the same as the GH extensions nioprocs then a different variable name may avoid confusion.

Is it known if this deals with the majority of the output files? There are also per-process files written by CarpetLib, Carpet (maybe?) and TimerReport.

2015-02-13T01:47:19+00:00

Erik Schnetter reporter

removed comment

If you don't like the Cactus formatting style -- please update it. I don't like it either.

I would go as far as suggesting that we just run clang-format on the source tree, and accept the output. Bikeshedding is easily possible via a file .clang-format, where one can set hundreds of options.

Personally, I've stopped worrying about source code formatting. I'm running clang-format after saving a file, accepting the default options, and the result is just fine. Makes editing actually much faster -- no more worrying about indenting, line lengths, spaces, or line breaks.

2015-02-13T08:14:52+00:00

Erik Schnetter reporter

removed comment

Replying to [comment:7 knarf]:

Just looking at the diff:

Why is the case '1' excluded fr om the range for processes_per_directory?

This is a special case, since ilog(0) is not defined. Put differently: We need to create at least 0 subdirectories, numbers smaller than 0 don't make sense.

lines 234 and 235 contain commented code. Is there a reason these should stay?

Leftovers from pre-C99 code -- thanks.

The patch uses strcpy/strcat; it should use strncpy/strncat instead. The assert about the length before isn't enough; asserts can be no-ops, and they wouldn't silence warnings as well.

That's a common misconception. The semantics of strncpy and strncat are not what people think. The length argument of strncat isn't the available buffer size, and strncpy does not always append a NUL character.

Style: at first I was put off by the use of a space between a function name and the opening parenthesis, which is allowed, but (at least to me) unusual for C/C++ code. I later realized that at least in these files this usage seems to be common. Well - better to stick to one style than to have none.

(I usually follow: Control statements should have one space between the control keyword and opening parenthesis, to distinguish them from function calls.)

2015-02-13T08:20:31+00:00

Erik Schnetter reporter

removed comment

Replying to [comment:8 rhaas]:

I have two suggestions: * instead of snprintf, strcat, strcpy and Util_snprintf it would be easier to use Util_asprintf which allocates memory for the target string internally

Ok.

to allow thorns to compile with both old and new IOUtils version, if we want to keep the extra nioprocs argument, we should follow other place in IOUtils #define a constant IOUTIL_FILENAME_HAS_NIOPROCS such that user code can test for the new API

I think that's too much trouble. This routine is called from two places (PUGHIO and Carpet), and we can just update them.

do we actually need the extra argument? The function already takes a cGH and IOUtil's GH extension contains an nioprocs field already. I am also not sure about how file_nioprocs is handled. If it is the same as the GH extensions nioprocs then the assert() in line 288 is incorrect since it assumes that the first niprocs processes do IO, which is incorrect, see the logic in IOUtils SetupGH for how the ioprocs are spread out. If file_niprocs is not the same as the GH extensions nioprocs then a different variable name may avoid confusion.

I'll check about the GH extension, and about the meaning of file_ioproc.

Is it known if this deals with the majority of the output files? There are also per-process files written by CarpetLib, Carpet (maybe?) and TimerReport.

This captures checkpoint files only. Other output does not call this function to generate the file name. To make this work, I would rather change Carpet to call this function, instead of duplicating the logic there. It may be that there's some functionality missing.

2015-02-13T08:25:07+00:00

Frank Löffler

removed comment

This is a special case, since ilog(0) is not defined.

We could define it. The "Number of processes that can access the same directory" being 1 doesn't seem to be unreasonable to me.

The patch uses strcpy/strcat; it should use strncpy/strncat instead. The assert about the length before isn't enough; asserts can be no-ops, and they wouldn't silence warnings as well.

That's a common misconception. The semantics of strncpy and strncat are not what people think. The length argument of strncat isn't the available buffer size, and strncpy does not always append a NUL character.

I do understand that a wrong usage of both strncat and strncpy can lead to problems much like the usage of strcat and strcpy. I don't understand why that should prevent us from using them correctly. Right now, with asserts possibly doing nothing, both strcat and strcpy could write into memory they shouldn't touch.

Style: I really didn't intend to complain about the style, and even if so, certainly not to you. I am sorry if it sounded like that. On that topic: I didn't find this particular topic in the Coding Style guide of Cactus (maintguide), and spaces between function names and the opening parenthesis are not consistently used (or not) even within the flesh. We could add this to the style guide and probably should, even if that alone wouldn't improve the source magically. And yes, I agree: that style guide needs updating. But that would be another ticket.

2015-02-13T08:48:08+00:00

Erik Schnetter reporter

removed comment

Replying to [comment:12 knarf]:

This is a special case, since ilog(0) is not defined.

We could define it. The "Number of processes that can access the same directory" being 1 doesn't seem to be unreasonable to me.

It's "log of the number of processes minus one", and "log of zero" is undefined. Running on one process is truly a special case.

The fact that you cannot have just one file per directory is different. The subdirectories form a tree; if each tree branch has just 1 child, then the tree would be infinite. That's a limit on the use parameter processes_per_directory.

The patch uses strcpy/strcat; it should use strncpy/strncat instead. The assert about the length before isn't enough; asserts can be no-ops, and they wouldn't silence warnings as well.

That's a common misconception. The semantics of strncpy and strncat are not what people think. The length argument of strncat isn't the available buffer size, and strncpy does not always append a NUL character.

I do understand that a wrong usage of both strncat and strncpy can lead to problems much like the usage of strcat and strcpy. I don't understand why that should prevent us from using them correctly. Right now, with asserts possibly doing nothing, both strcat and strcpy could write into memory they shouldn't touch.

I refuse to do that. I'll convert the file to C++ instead.

Style: I really didn't intend to complain about the style, and even if so, certainly not to you. I am sorry if it sounded like that. On that topic: I didn't find this particular topic in the Coding Style guide of Cactus (maintguide), and spaces between function names and the opening parenthesis are not consistently used (or not) even within the flesh. We could add this to the style guide and probably should, even if that alone wouldn't improve the source magically. And yes, I agree: that style guide needs updating. But that would be another ticket.

Sure. I'm actually with you on this -- I don't like the style, and I'm serious about changing the style with an automated tool.

2015-02-13T09:47:25+00:00

Frank Löffler

removed comment

Replying to [comment:13 eschnett]:

Replying to [comment:12 knarf]:

This is a special case, since ilog(0) is not defined.

We could define it. The "Number of processes that can access the same directory" being 1 doesn't seem to be unreasonable to me.

It's "log of the number of processes minus one", and "log of zero" is undefined. Running on one process is truly a special case.

I meant 'we can (re)define what ilog(0) is, different from the definition for values >0. If I read the comment for the parameter correctly, restricting the number of processes that can access the same directory to 1 isn't something so special. Each process would write into it's own directory. That is different from running on one process.

The fact that you cannot have just one file per directory is different. The subdirectories form a tree; if each tree branch has just 1 child, then the tree would be infinite. That's a limit on the use parameter processes_per_directory.

I am afraid, but I think I don't understand the meaning of the parameter. I assumed: For a value of '1' I would expect each process to write into a different directory. For a value of '2' I would expect 2 processes to share one directory, and so on. A value of '0' would be special, as all processes could write to the same directory in that case. Isn't that how I should interpret the parameter?

I do understand that a wrong usage of both strncat and strncpy can lead to problems much like the usage of strcat and strcpy. I don't understand why that should prevent us from using them correctly. Right now, with asserts possibly doing nothing, both strcat and strcpy could write into memory they shouldn't touch.

I refuse to do that. I'll convert the file to C++ instead.

That would be fine too, of course. string handling in C is painful even if done correctly.

Sure. I'm actually with you on this -- I don't like the style, and I'm serious about changing the style with an automated tool.

I don't have enough experience with tools like this to have an informed opinion. I didn't quite like what a quick try using clang-format did to my source, but that is more my lack of configuration than anything else.

2015-02-13T10:22:40+00:00

Erik Schnetter reporter

removed comment

Replying to [comment:14 knarf]:

Replying to [comment:13 eschnett]:

Replying to [comment:12 knarf]:

This is a special case, since ilog(0) is not defined.

We could define it. The "Number of processes that can access the same directory" being 1 doesn't seem to be unreasonable to me.

It's "log of the number of processes minus one", and "log of zero" is undefined. Running on one process is truly a special case.

I meant 'we can (re)define what ilog(0) is, different from the definition for values >0. If I read the comment for the parameter correctly, restricting the number of processes that can access the same directory to 1 isn't something so special. Each process would write into it's own directory. That is different from running on one process.

The fact that you cannot have just one file per directory is different. The subdirectories form a tree; if each tree branch has just 1 child, then the tree would be infinite. That's a limit on the use parameter processes_per_directory.

I am afraid, but I think I don't understand the meaning of the parameter. I assumed: For a value of '1' I would expect each process to write into a different directory. For a value of '2' I would expect 2 processes to share one directory, and so on. A value of '0' would be special, as all processes could write to the same directory in that case. Isn't that how I should interpret the parameter?

We do not only create one level of subdirectories -- we create a tree of subdirectories, ensuring that each directory hold no more than N entries. That's why we need the log, and not just divide nprocs by the max number of files.

You cannot have a tree with just one entry per branch. That would not be a tree. It's not possible to have just 1 entry per directory.

If you let each process write to its own directory, then you need nprocs directories, instead of the nprocs files that you would normally have. You don't gain anything from this, the directories are still holding too many entries, and Lustre will still crash.

I do understand that a wrong usage of both strncat and strncpy can lead to problems much like the usage of strcat and strcpy. I don't understand why that should prevent us from using them correctly. Right now, with asserts possibly doing nothing, both strcat and strcpy could write into memory they shouldn't touch.

I refuse to do that. I'll convert the file to C++ instead.

That would be fine too, of course. string handling in C is painful even if done correctly.

Sure. I'm actually with you on this -- I don't like the style, and I'm serious about changing the style with an automated tool.

I don't have enough experience with tools like this to have an informed opinion. I didn't quite like what a quick try using clang-format did to my source, but that is more my lack of configuration than anything else.

I assume you didn't like it because you have a very firm opinion on how source code "should" be formatted. The problem is that, in a collaboration, everybody has his or her own firm opinion. What clang-format does is a reasnoable compromise, decided essentially by the LLVM developers who are expert software engineers. I stopped caring about the details of where braces go and how lines are broken, as long as I can read the code, and as long as the style is consistent within a group of source files. clang-format does something very reasonable, although it looks different from how I would format the code. But then, the way you write code looks different from what I would format it. So it doesn't matter. Trust clang-format, let it do its job, give it the benefit of the doubt for a few days, and then enjoy your newfound freedom from worrying about formatting.

I stopped even hitting "enter" when I make a small change to code, let alone worry about spaces, or indentation. I just write the essential characters, and a few seconds later the code is nicely formatted.

2015-02-13T10:44:55+00:00

Ian Hinder

removed comment

A change which makes formatting changes wholesale to the code can cause problems with diffing and merging across that change. While I agree it would be nice to have a consistent coding style, and for a new source file we could enforce it with an automated tool, for existing files it might cause real problems for what is mostly a cosmetic benefit. If the existing code is readable, and unlikely to confuse, I would leave it alone.

2015-02-13T11:01:13+00:00

Frank Löffler

removed comment

Replying to [comment:15 eschnett]:

We do not only create one level of subdirectories -- we create a tree of subdirectories, ensuring that each directory hold no more than N entries.

I see. Then I suggest to change "Number of processes that can access the same directory" to something like "Maximum number of subdirectories per level", or more bloated "Maximum number of subdirectories per directory tree level".

2015-02-13T12:48:48+00:00

Roland Haas

removed comment

I think I would like to get back to this. Though I would try and attempt a different solution that is more likely to capture all output files.

As far as I can tell almost all thorns producing output base the output path on IOUtil::out_dir (the exception being the "real" output thorns in Carpet and PUGH which have their own out_dir options that can optionally be used).

So I would try and make IOUtil set out_dir to the directory in the hierarchy. Unfortunately the parameter is currently steerable on recovery only (I suspect since IOUtil will only attempt to create the directory at simulation startup).

So a way to make this work is to add a special variable or so to piraha that expands to the proper directory hierarchy, so that eg:

IOUtil::out_dir = "$parfile/$MPI_hierarchy"

is expanded to:

IOUtil::out_dir = "parfile-name/1/4/"

on MPI rank 140 to 149 and similar for the others, having groups of 10 in each intermediate directory. If we want more files per output directory we'd use some other factor to bundle up the runs. Eg the equivalent of this perl code:

$size = 120000;
$rank = 78923;
$lumpsize = 100;
@path = ();
# put $lumpsize ranks in the same output directory
$size = int($size / $lumpsize);
$rank = int($rank / $lumpsize); 
while($size) {
  $idx = $rank % $lumpsize;
  $rank = int($rank / $lumpsize);
  $size = int($size / $lumpsize);
  push path,sprintf("%02d",$idx);
}
print join("/", reverse @path),"\n";

We can also decide to not put multiple ranks in the same directory by remove the two lines before the loop.

2016-09-25T10:51:18+00:00

Roland Haas

removed comment

This variant has these downsides (that I can think of right away):

3rd party tools that parse the parfiles need to know how to handle this
it puts output that is produced only on rank 0 (ie global output) into one of the nested directories
it may not work at all depending on how parameters are checkpointed
it is an abuse of the parfile parser for things it was not designed to handle

2016-09-25T11:05:36+00:00

Erik Schnetter reporter

removed comment

In CarpetSimulationIO (my new output thorn), I introduce a parameter that selects the maximum number of output files per directory. Files are then automatically split over sub-directories as necessary. The root process also creates a single file containing the relevant metadata as well as HDF5 external links to the datasets in the files in the subdirectories.

We should have a ET developers' day for SimulationIO. That's probably more accessible than FunHPC as well, thus there might be more interest.

2016-09-30T09:11:43+00:00

Frank Löffler

removed comment

I like both ideas (automatic split using HDF5 links, and the work-day). I am unsure about ASCII files, but given that we usually don't produce as many of them (or shouldn't if we do), I assume they are not really an issue anyway.

2016-09-30T10:05:45+00:00

Roland Haas

removed comment

The extra argument of file_nioproc seems gone, so could you use cGH->nioprocs for this? I added a minor comment on using strcpy/malloc vs strdup (in a C++ file stdup is somewhat safer since there is no issue of accidentally implicitly declaring strdup as returning int due to missing _BSD defines).

This will affect checkpoint files and out_vars 3d HDF5 output but not eg out3d_vars 3d HDF5 output I think (though I may be wrong and the latter may also use AssembleFilename).

This is a step in the right direction but not quite enough I believe since we have other output that is produced with one-file-per-process (all of which is optional eg TimerReport, carept-timing-statistics etc) and that faces the same issues. So it should be applied ideally along with code that makes the other places that produce per-process output use this function as well.

2017-01-16T10:37:37+00:00

Roland Haas

changed status to open
removed comment

2017-01-16T10:37:50+00:00

Comments (23)