Rename Carpet output files

Issue #1338 closed
Erik Schnetter created an issue

Windows does not allow colons in file names. Output files produced by Carpet should therefore not use colons.

Comments (37)

  1. Frank Löffler

    This would probably break a lot of post-processing tools. They should be quick to adapt, but there might be a lot of them out there. This would surely not please part of the community. Maybe this could be made a parameter, defaulting to the current behavior on non-Windows platforms?

  2. Erik Schnetter reporter

    Yes, this would certainly be a parameter, with a default value to be decided conservatively and by the community.

  3. Barry Wardell

    Colons in filenames are also a very bad idea on Mac OS X. It can handle them to a certain extent, but weird things often happen, with the colon often being automatically translated to and from a slash '/'. This is a carryover from Mac OS's pre-UNIX days, when the colon was used as a path separator but '/' was allowed in a filename. For example, if you use ls from a terminal you will see the correct filename, e.g. 'carpet::timing..asc', but if you look at the same file in Finder it will appear as 'carpet//timing..asc'. If you try to rename the file in Finder, it gives an error if you put in any colons. You can put in slashes, but they are automatically converted to colons (and shown as slashes in the GUI).

  4. Frank Löffler

    Macs have a much wider audience in our community than Windows, and changing the default for them too would create confusion about what the expected file names of Carpet look like. From what I read, the only problem on MacOS is the Finder, which shows colons as slashes but treats them as colons 'under the hood'; is this correct? I guess Mac users have to deal with that bug even outside of Carpet (or better, complain to Apple).

    Does the output work on Macs (I guess so, a lot of people use it)? Do scripts that use colons work? If the display in the Finder is the only problem, is this really such a big problem? Did anyone actually complain or file a bug report with Apple?

  5. Erik Schnetter reporter

    (1) No one suggested changing the default. Please don't confuse "make things work with Windows" with what is discussed above. (2) If you don't like how colons and slashes are handled by Apple, consider filing a bug report yourself. These things are not there out of neglect -- there's a reason for it, Apple didn't introduce colon-slash changing code by accident. (3) I still think that there should be a way to run Carpet on Windows. If you think that should not be possible, state your argument.

  6. Eloisa Bentivegna

    Just to add to the discussion: I've also had problems with filenames containing colons in Samba-mounted filesystems. The AEI cluster filesystems can now be mounted on external machines (http://supercomputers.aei.mpg.de/high-performance-computing/samba-lustre-file-system-export-1/samba-lustre-file-system-export), but filenames that contain colons are mangled beyond recognition, making the service unusable for looking at Cactus data. The option to have an alternative would be welcome in this respect too.

  7. Frank Löffler

    Please don't get angry. I didn't intend to confuse or flame. I already agreed that something needs to be done on Windows, and at the time changing the colon to another character seemed like a good idea, mostly because there are not that many Windows users anyway, which would limit the confusion that different file names would cause for users. I actually did suggest changing the default, but only on platforms where the 'usual' default doesn't work (only Windows at the time), meaning a platform-dependent default.

    Now, if Macs seem to have problems with colons too, this becomes an entirely different issue, again mainly because, unlike Windows, Macs enjoy broader usage among Carpet users. So, if we tackle the problem for Windows users, we might think about Macs at the same time too. Having a different default on Macs and other Unixes doesn't sound like a good idea. If the current default on Macs is really "a very bad idea", then we should probably really think about changing it for all users, which would solve the Windows problem at the same time.

    This leaves us with two options:

    a) Only change the default on Windows to something non-colon
    b) Change the filenames for all users

    Both have advantages and disadvantages. Whether to choose a or b mainly depends on how much of a problem colons are on MacOS, and I am not the right person to be asked that question. So, if you use a Mac and read this: please speak up.

    Speaking about colons and problems: using another character would make life in unix also a bit easier, because colons need to be escaped in a bash-like shell as well. That's not a big problem, especially with auto-complete, but maybe a small incentive.

    If another character is chosen (and at least for Windows that seems likely), it should follow the quite restricted Microsoft rules (e.g., http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx).
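To make the constraint concrete, here is a minimal sketch of a check against the reserved characters listed in Microsoft's naming rules. The function names are made up for illustration, and the check is deliberately partial: it ignores reserved device names such as CON or NUL and path-length limits.

```python
# Characters that Windows rejects in file names (Microsoft naming rules).
WINDOWS_RESERVED = set('<>:"/\\|?*')


def windows_unsafe_chars(filename):
    """Return the sorted list of characters in `filename` that Windows rejects."""
    # Control characters (code points below 32) are rejected as well.
    return sorted({c for c in filename if c in WINDOWS_RESERVED or ord(c) < 32})


def is_windows_safe(filename):
    """True if the name has no reserved characters and no trailing dot or space."""
    return not windows_unsafe_chars(filename) and not filename.endswith((" ", "."))
```

With this check, a name such as 'hydrobase::w_lorentz.maximum.asc' is flagged only because of the colon, so any of the usual delimiters would pass.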

  8. Barry Wardell

    Replying to [comment:4 knarf]:

    Macs have a much wider audience in our community than Windows, and changing the default for them too would create confusion about what the expected file names of Carpet look like. From what I read, the only problem on MacOS is the Finder, which shows colons as slashes but treats them as colons 'under the hood'; is this correct? I guess Mac users have to deal with that bug even outside of Carpet (or better, complain to Apple).

    'Only' Finder means a lot more than you might think. The implementation is at quite a low level so it also implies every open, save, etc. dialog box and probably most other GUIs. As a general rule, anything that uses Apple's APIs uses slashes and anything UNIX uses colons. Yes, it's a bit of a mess but I don't think this is a bug. It is a deliberate choice in how to handle the transition from older versions of Mac OS in the most backwards compatible way possible.

    Does the output work on Macs (I guess so, a lot of people use it)? Do scripts that use colons work? If the display in the Finder is the only problem, is this really such a big problem? Did anyone actually complain or file a bug report with Apple?

    Yes, it does work in the sense that it doesn't impact me. But I'd imagine it could be very confusing to new users to see the same file with different names depending on the context. For example, in Mathematica you need to use colons with the Import command, but the file browser shows slashes.

    Replying to [comment:7 knarf]:

    If the current default on Macs is really "a very bad idea", then we should probably really think about changing it - for all users, which would solve the Windows problem at the same time.

    I was really just trying to point out that there are other contexts beyond the "it doesn't work with Windows" case where one might want to avoid colons in filenames.

    Both have advantages and disadvantages. Whether to choose a or b mainly depends on how much of a problem colons are on MacOS, and I am not the right person to be asked that question. So, if you use a Mac and read this: please speak up.

    As a general rule I try to avoid colons in filenames as it has caused me headaches in the past. In this particular case it doesn't really bother me since my analysis tools are not affected by the issue. The Windows and Samba problems are much more important reasons.

  9. Roland Haas

    Beyond Windows and OSX there are also a number of Unix tools that don't like colons (or other "unusual" characters) in file names. VisIt is somewhat prominent in that it will interpret a file name with a colon as a remote file, and one has to explicitly prefix such files with "localhost:/<PATH-TO-FILE>/" to make it happy. Getting this to work with Python scripts and the conn_cmfe operator can be a bit of a headache.
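The workaround described above could be wrapped in a small helper along these lines. This is only a sketch: the function name is hypothetical, the "localhost:" prefix is taken from the description in this comment rather than from VisIt's documentation, and the example path is made up.

```python
import os


def visit_local_path(path):
    """Prefix an absolute local path with 'localhost:' if its file name has a colon."""
    absolute = os.path.abspath(path)
    if ":" in os.path.basename(absolute):
        # Assumption from the comment above: the prefix makes VisIt treat
        # the name as a local file instead of a host:path pair.
        return "localhost:" + absolute
    return absolute
```

A script would pass the result to whatever call opens the database, so that file names without colons are left untouched.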

    Finally (unrelated), using colons in HDF5 dataset names is a bad idea as well, since this is not quite in line with the HDF5 docs (http://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html : "A dataset name is a sequence of alphanumeric ASCII characters"), and e.g. xdmf is very unhappy about the colons.

    So offering a means to replace offending characters by e.g. "_" or "-" or "" would be helpful.

  10. Frank Löffler

    Given that we've now collected enough arguments for a change from colon to something else, I suggest writing one conclusive proposal to do just that, and sending it to all users for comments. What is missing for that, besides the summary, is a proposal for a new naming scheme.

    Assuming we only substitute characters (but we wouldn't be limited to that if there is need), the question is what to substitute the colons with. Common delimiters in filenames, at least as I see it, include the set '-.+' (excluding the single quotes, and not including spaces for obvious reasons). I am not aware that any of these are disallowed or problematic anywhere. Using one of '-.' has the disadvantage that they might be used either in implementation or variable names, or are already used as delimiters for other things. Using one of them twice in a row might make it a little clearer, but not necessarily (e.g. for '..', which some Carpet output already produces). This in theory leaves '+', but that is kind of unusual; at least I don't see it used that often. In the end it might help to look at examples. Pick your favorite:

    hydrobase__w_lorentz.maximum.asc
    hydrobase--w_lorentz.maximum.asc
    hydrobase++w_lorentz.maximum.asc
    hydrobase..w_lorentz.maximum.asc
    hydrobase_w_lorentz.maximum.asc
    hydrobase-w_lorentz.maximum.asc
    hydrobase+w_lorentz.maximum.asc
    hydrobase.w_lorentz.maximum.asc
    

    Please don't hesitate to come up with even different examples.
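As a rough illustration, the candidate names above can be generated from a current-style name by replacing the '::' delimiter. The starting name 'hydrobase::w_lorentz.maximum.asc' and the helper name are assumptions made for this sketch.

```python
def rename_with_separator(filename, separator):
    """Replace the first '::' (implementation/group delimiter) with `separator`."""
    return filename.replace("::", separator, 1)


# The eight separators shown in the example list, in the same order.
SEPARATORS = ("__", "--", "++", "..", "_", "-", "+", ".")

candidates = [
    rename_with_separator("hydrobase::w_lorentz.maximum.asc", sep)
    for sep in SEPARATORS
]
```

The same helper could be used to rename existing output in bulk once a separator is chosen.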

  11. Barry Wardell

    For nothing more than aesthetic reasons, my vote would be for either '.' or '-'.

  12. Roland Haas

    Using "." can be confusing for tools that need to split off a "variable" name from the filename and have to rely on the variable name being the first "."-separated piece. "-" (or really "--") used to be what VisIt does. I would not like "+" since it is unusual (see GNU arch and its odd filenames http://web.archive.org/web/20070808210711/www.gnuarch.org/gnuarchwiki/FunkyFileNames), "" might be nice but then again it makes it no longer possible to separate group name from variable name since "" is valid in Cactus variable names.

  13. Wolfgang Kastaun

    Why not simply create a folder for each thorn and place the files inside, named just after the variable name? For example: hydrobase/w_lorentz.h5 hydrobase/press.maximum.asc
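The mapping this proposal implies is simple to state. A minimal sketch follows; the helper name is hypothetical, and it assumes the current names use '::' between the thorn or implementation prefix and the rest of the name.

```python
import posixpath


def per_thorn_path(filename):
    """Map 'thorn::rest' to 'thorn/rest'; names without '::' are returned unchanged."""
    prefix, sep, rest = filename.partition("::")
    return posixpath.join(prefix, rest) if sep else filename
```

Names without a prefix would stay in the top-level output directory under this scheme.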

  14. Frank Löffler

    Wolfgang's option also sounds reasonable to me, although it would be an even more drastic change. We could think about providing this as an option if someone cares enough to implement it.

    The only reply on the mailing lists was from Erik himself, preferring ".". Roland's argument against that is that it might be confusing for tools that split the "variable name" part from the filename, but then I would think that these tools could easily be changed to expect one (1) dot within that variable name as well. I personally don't really care whether it is "--", ".." or ".". So, if nobody objects within the next two weeks, I suggest changing it to ".".

  15. Roland Haas

    The problem with teaching tools that one dot might be in the variable name is that they then have to know all possible dotted parts. E.g., we generate all of these file names, depending on how many processes are outputting, whether index files are used, and what direction output is used:

    • grid.x.x.file_0.idx.h5
    • grid.x.x.idx.h5
    • grid.x.x.h5
    • grid.x.h5

    with all of x, y, z, xy, xz, yz, xyz for the second "x". Things get further complicated by the fact that the old-style 3d hdf5 output does not put any indicator into the file name for what slice type it outputs, so grid.x.h5 could be either grid::x with full 3d output or the x-axis slice of a variable grid.
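The ambiguity can be shown in a few lines. This is a sketch only: the naive parser and its name are made up, and the file names are the illustrative ones from this comment.

```python
def split_first_dot(filename):
    """Naive parser: treat everything before the first '.' as the variable name."""
    name, _, rest = filename.partition(".")
    return name, rest


# With '.' as the separator, 3d output for grid::x and the x-axis slice of a
# variable named 'grid' would end up with the same file name.
group_file = "grid::x.h5".replace("::", ".")
slice_file = "grid" + ".x.h5"

# With '-' as the separator, the two names stay distinct.
dash_group_file = "grid::x.h5".replace("::", "-")
```

With '-', the first dot-separated piece still carries the whole implementation and name prefix, so a tool does not need a table of every possible dotted suffix.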

  16. Ian Hinder

    Remember also that for multipatch data, the map is specified with an additional "." separator. Whatever we decide, we must document it very clearly so that it is possible to modify analysis tools to do the right thing in every situation. At the moment, the double-colon works on the vast majority of systems that are in active use with Carpet data. I agree it is not ideal, and that it should be possible to access and analyse Carpet data on Windows and using Windows file sharing, but unless the change is done carefully, this is going to cause immediate breakage and pain to existing tools and users, for a benefit which is likely small. So, if someone still wants to propose that the filename scheme is changed, please come up with a concrete proposal which addresses:

    1. ASCII
    2. HDF5
    3. Multipatch
    4. All currently-valid Cactus group and variable names
    5. One-file-per-group and one file per variable

    I had a look in the CarpetIOHDF5 source code, and couldn't find the function that constructs the filename; is this done at a higher level in Cactus or Carpet and then passed in to OutputVarAs?

    From looking at my simulations directory, it looks like the colon is only used when outputting one file per group, and in that case the double colon is used to separate an implementation name prefix from the group name. Is that correct?

    Roland: I don't understand your "grid" example. When would filenames like this be generated? "grid" is the implementation, and the group is called "coordinates", so the current name would be "grid::coordinates.x.file_0.idx.h5", wouldn't it? Is there another filename scheme where the variable name is used with the implementation name?

    Frank: Is it really true that you can have '-' or '.' in implementation and variable names? Don't variable names have to be valid C and Fortran identifiers? Maybe not implementation names. We need to be able to distinguish between <variable> and <implementation>::<group>. Since variables surely cannot have '-', it seems that <implementation>-<group> should be unambiguous.

    This issue would be less important if Cactus had a mechanism for writing a metadata file describing what had been output. This could then be read by analysis tools, and they wouldn't have to guess from filenames. I will open a ticket for this (I thought there was one already, but I can't find it). This would not solve the problem completely, as any time the output is ambiguous, it would also be possible for there to be a collision between two different output filenames, but it would take the guesswork out of writing analysis tools.
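To give an idea of what such a metadata file might contain, here is a purely hypothetical sketch. No such mechanism or schema exists in Cactus according to this ticket; the field names, the JSON format and the example file name are all invented for illustration.

```python
import json


def describe_output(filename, implementation, group, reduction=None):
    """Build one hypothetical metadata record for a single output file."""
    return {
        "file": filename,
        "implementation": implementation,
        "group": group,
        "reduction": reduction,
    }


records = [
    describe_output(
        "hydrobase-w_lorentz.maximum.asc", "hydrobase", "w_lorentz", "maximum"
    )
]
manifest = json.dumps(records, indent=2)
```

An analysis tool would then look up the implementation and group in the manifest instead of guessing them from the file name.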

  17. Frank Löffler

    I don't know of any implementation name containing '-' or '.', and I am not sure whether this would be allowed. I didn't grep for group names. Quite sure both characters are not allowed in variable names.

    Ian, do I read your reply correctly when I assume you propose '-' as delimiter?

    While a metadata mechanism might be nice, and advanced tools could make use of it, a descriptive file name would still be of value. I don't want to teach every single tool I use the format of a metadata file.

  18. Roland Haas

    Roland: I don't understand your "grid" example. When would filenames like this be generated? "grid" is the implementation, and the group is called "coordinates", so the current name would be "grid::coordinates.x.file_0.idx.h5", wouldn't it? Is there another filename scheme where the variable name is used with the implementation name?

    I had not thought about the naming mechanism properly and had not realized that the implementation name is only prepended when "one_file_per_group" is used, and that otherwise only the variable name appears. So my example would be "grid.coords.x.h5".

  19. Barry Wardell

    What is the latest status on this? It sounds like the options are:

    1. Use '.' as a separator.
    2. Use '-' as a separator.
    3. Use '..' as a separator.
    4. Use '--' as a separator.
    5. Put output files into per-implementation subdirectories.

    Was there any consensus on which of these to go for? I think my own preference was for either '-' or '.'.

    The other question was how the change should be made, either:

    1. As a new parameter, defaulting to the old behaviour.
    2. As a new parameter, defaulting to the new behaviour.
    3. As a change in behaviour.
    4. As one of the above 3 along with a metadata file describing the simulation.

    It seemed like most people preferred 1 (or 4), is that right?

  20. Frank Löffler

    We should do something about this, so I now (again) suggest "." as separator, replacing the "::". The default should change too, but the old behaviour should be allowed using a parameter.

  21. Roland Haas

    Same comment as in comment:12: I would rather use "-" than "." instead of ":", since it allows the file name to be split into individual parts without having to know all possible "dotted" pieces.

  22. Frank Löffler

    Just for the record (and because I will not be present for the next call): "._-" would all be fine with me, both in single and double versions. Excluding "." because of some problems we are at:

    1. hydrobase_w_lorentz.maximum.asc
    2. hydrobase-w_lorentz.maximum.asc
    3. hydrobase__w_lorentz.maximum.asc
    4. hydrobase--w_lorentz.maximum.asc
    

    Of these, number 3 "looks" best to me, but I can see problems with potential underscores in variable names (although two in a row would be very unusual). So in the end my list of preference would be 2, 4, 3, 1. But really, I would be happy with any of these. And I think we should change the default.

  23. Frank Löffler

    Doing that change now would mean only two weeks of testing. I usually would say that this would be an ideal candidate for just after a release, unless there is a compelling reason. Is there?

  24. Roland Haas

    When we change it (before or after the release) I believe that the new parameter for the group separator might be best added to thorn IOUtil so that the separator is consistent among all output files.

  25. Erik Schnetter reporter

    Do you object to adding this parameter to IOUtil already before the release?

  26. Frank Löffler

    Replying to [comment:27 rhaas]:

    When we change it (before or after the release) I believe that the new parameter for the group separator might be best added to thorn IOUtil so that the separator is consistent among all output files.

    I agree. So, unless there are objections, there will be a new parameter in IOUtil, selecting the separator between implementation/thorn name and variable names. What's left to do is to decide:

    • What that parameter should be called
    • When to add it (I think that could be done already now, before the release.)
    • What the default will be (I think that the old value would be best before the release, but the default should change after the release; please let us know if you disagree.)
    • The new default value (I believe most have spoken out for "-")

    Of course, the parameter would need to be used..., anything else?

  27. Erik Schnetter reporter

    The parameter is currently called out_group_separator. This fits with the naming scheme of similar parameters in IOUtil. This parameter is already supported by all Carpet I/O thorns.

  28. Frank Löffler

    One of four points done then. If someone other than me agrees that this new parameter could go into IOUtil even before the release, and that the default could be made conservative (until after the release), these two could be "done" too. And if we don't hear any objection to "-" we can do that too, but there is still some time for this.

  29. Roland Haas
    • changed status to open

    This has already been ok'ed by Frank and Roland. Please note that the consensus in this ticket was to use a single dash "-" as the separator and not the double underscore "__" that the release notes claim.
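For post-processing tools that need to read both old and new output, one possible adaptation is to accept either delimiter. The sketch below rests on the observation made earlier in this ticket that no known implementation name contains '-' or '.'; the function name and the regular expression are illustrative, not part of any existing tool.

```python
import re

# Accept the old '::' delimiter and the new '-' delimiter between the
# implementation name and the rest of the file name.
NAME_RE = re.compile(r"^(?P<impl>[A-Za-z0-9_]+)(?:::|-)(?P<rest>.+)$")


def split_implementation(filename):
    """Return (implementation, rest), or (None, filename) if there is no prefix."""
    match = NAME_RE.match(filename)
    if match is None:
        return None, filename
    return match.group("impl"), match.group("rest")
```

Because the separator is configurable through out_group_separator, a tool written this way would still need updating if a run uses some other value.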
