piraha_everywhere Zelmani stress test

Issue #2003 closed
Roland Haas created an issue

Hello all,

Zelmani contains a number of ccl files that make the perl based piraha parser abort with errors but that parsed fine with the old parrser. Some ccl files are clearly incorrect while others are correct and the parser should accept them. The major such issue is that the parser requires that that there is a private/public/protected setting at the beginning of a param.ccl which is not required by the docs since parameters are supposed to default to private.

I attach all ccl files both in their original version as well as version that can be parsed without errors (hopefully).

Keyword: piraha

Comments (44)

  1. Steven R. Brandt
    • removed comment

    I'm a little confused about what the issue is. The public/private/protected isn't required by the Piraha grammar either, and (for me) all the param.ccl files in your fixed file parse without error.

    There are, however, a lot of errors in the interface.ccl files. :)

  2. Roland Haas reporter
    • removed comment

    yes the fixed ones are all passing since I added private at the beginning of each one. Please try the "original.tar.gz" ones for errors.

    The issue is that piraha fails to parse PNSHelper/param.ccl in the original.tar.gz archive unless one adds a "private" at the beginning. One gets

    File: /home/rhaas/postdoc/gr/cactus/Zelmani/arrangements/Zelmani/PNSHelper/param.ccl; par key error:(PNSHELPER PRIVATE variables) new=(solve_after_recovery
    verbose
    update_GR_every
    update_GR_every_start
    extrapolation_order
    update_GR_switch_time
    nrad
    nrad_outer
    ntheta
    nphi
    rad_max
    rad_max_outer
    symm
    collect
    interpolator
    interpolator_options
    coordinate_system) old=() at /home/rhaas/postdoc/gr/cactus/Zelmani/lib/sbin/parameter_parser.pl line 800.
            main::parse_param_ccl("PNSHelper", Group=HASH(0x55a3a81f6a08), "BOOLEAN solve_after_recovery \"solve for constraints after rec"..., "{", "} \"no\"", "BOOLEAN verbose \"Do you want me to talk?\" STEERABLE=ALWAYS", "{", "} \"no\"", ...) called at /home/rhaas/postdoc/gr/cactus/Zelmani/lib/sbin/parameter_parser.pl line 62
            main::create_parameter_database("HDF5", "/home/rhaas/postdoc/gr/cactus/Zelmani/arrangemernalLi"..., "ZelmaniWizard", "/home/rhaas/postdoc/gr/cactus/Zelmani/arrangements/Zelmani/Ze"...,olate2", "/home/rhaas/postdoc/gr/cactus/Zelmani/arrangements/Llama/Inte"..., "CarpetEvolutionMasme/rhaas/postdoc/gr/cactus/Zelmani/arrangements/Carpet/Car"..., ...) called at /home/rhaas/postdctus/Zelmani/lib/sbin/CST line 140
    

    whatever that actual error inside of piraha is (maybe the message lets you guess), having to add a private to fix it does not seem the correct thing to do.

    I also noted that spelling "LANG" as "LABG" in a schedule.ccl gives me

    CST error 1:
      -> Unrecognised statement in schedule block (decloop) in schedule.ccl for thorn SyncG2Test
    ""
    

    The second part of the issue is that currently piraha's error messages are not yet as good as they could be I think.

    Generally I think the error messages should always list the line number in the ccl file in which the error occurred and not (at least eventually) output as much internals of the parser that are not very useful to an "end user".

  3. Steven R. Brandt
    • removed comment

    I'm confused. I don't see this error. Do you have the latest version of flesh/master?

  4. Roland Haas reporter
    • removed comment

    Replying to [comment:5 sbrandt]:

    I'm confused. I don't see this error. Do you have the latest version of flesh/master? i would have been reasonably sure it was the master version of the flesh. However I cannot reproduce this anymore (using git hash 5c69e001c38671149b1f24a83e4a080ed9075d06 of the flesh). It also turns out the error message is part of the original parser which should only ever run after piraha finished its parsing.

    So, looks like I must have used an incorrect version when encountering (and reporting) the LABG error.

  5. Roland Haas reporter
    • removed comment

    Some error messages seem to contain line numbers (from schedule.ccl?) while others don't seem to (interface.ccl?). Ideally the error message should contain on a single line preceded by a unique pattern the line number, the column number and a short error message so that one can use eg. vim/emacs's error mode to jump to the next error message (just as one can do for compiler errors). There is of course support in those for things like --8<-- ^ | error is here --8<-- though this harder to get to work.

  6. Steven R. Brandt
    • removed comment

    I have a fix in the branch "fix_error_messages" in the flesh that makes reporting of syntax errors, etc. more uniform. Please let me know what you think.

  7. Roland Haas reporter
    • removed comment

    Replying to [comment:8 sbrandt]:

    I have a fix in the branch "fix_error_messages" in the flesh that makes reporting of syntax errors, etc. more uniform. Please let me know what you think.

    Doesn't CST_fatal already exist in CSTutils.pl and thus one should require CSTutils then use it from there rather than re-implement or copy it (with a new author tag to boot :-) )?

    Otherwise, yes this goes into the direction I had in mind. thank you. There is still instances of confess present (eg in line 218 of lib/sbin/ScheduleParser.pl).

    Note that some of the parfiles that I had attached contain constructs that should parse but hat didn't at that time (eg one fails since apparently there is a tab after the word "Fortran" in the LANG tag used).

  8. Steven R. Brandt
    • removed comment

    CST_fatal is new with me, AFAIK. The existing facilities didn't abort immediately, which led to some strange additional error messages from the system that compares the results of the current parser to the old one. All it does is call CST_PrintErrors() and die.

  9. Roland Haas reporter
    • removed comment

    Replying to [comment:10 sbrandt]:

    CST_fatal is new with me, AFAIK. The existing facilities didn't abort immediately, which led to some strange additional error messages from the system that compares the results of the current parser to the old one. All it does is call CST_PrintErrors() and die.

    Oh true, sorry. I only looked at the author information in CSTUtils.pl and not at git-blame for the CST_fatal routine in that file. On top of that I apparently got confused when looking at the patch not realizing in which file things are defined. Ideally it should not abort on the very first syntax error but instead report all (or at least some number) of syntax errors before aborting (same as eg a compiler reports all errors in a source file and not only the first one, same as the runtime parameter file parser does).

    This is extra work and not totally trivial (in particular given that one has to recover from a faulty line of input) but I think is needed for a good error reporting.

  10. Erik Schnetter
    • removed comment

    There is a global variable "CST_errors" that one could check to find out whether to compare old and new parsers.

  11. Steven R. Brandt
    • removed comment

    OK, the fix_error_messages branch has been updated. I removed CST_fatal, and you can now get syntax errors from multiple ccl files (but not more than 1 for a single ccl file). The CCT_error() function was somewhat broken. It had a $full_warnings variable that was never set (so it was always zero) and therefore CST_error() not report line numbers/files.

  12. Roland Haas reporter
    • removed comment

    I just attached a tarball with the set of ccl files that should compile but don't (with flesh hash 5dd70e1e).

    Note that we describe the CCL syntax in the user guide: http://einsteintoolkit.org/usersguide/UsersGuidech12.html#x17-184000D2 which documents eg the use of a backslash for line continuation.

    Reading this I notice that at least the online version still mentions steps for real valued parameters and I think those were removed:

    REAL

    The range specification is the same as with integers, except that here, no step implies a continuum of values. Note that numeric constants should be expressed as in C (e.g. 1e-10). Note also that one cannot use the Cactus types such as CCTK_REAL4 to specify the precision of the parameter; parameters always have the default precision.

  13. Steven R. Brandt
    • removed comment

    The only file which seems to have this problem is CarpetRhoRegrid/param.ccl. At issue is the interpretation of the backslash continuation. I do not do a continuation mid-token, but accept it if embedded in whitespace. So, this is okay:

    CCTK_REAL rho_max_list[99] "List of densities at which to refine" \
    STEERABLE=ALWAYS
    {
     0.0e0:   :: "Anything larger 0"
    } 0.0e0
    

    But not this (what was present)

    CCTK_REAL rho_max_list[99] "List of densities at which to refine" STEERABLE=ALW\
    AYS
    {
     0.0e0:   :: "Anything larger 0"
    } 0.0e0
    

    Processing the \ mid token is a non-trivial thing to support. Do we really want that?

  14. Steven R. Brandt
    • removed comment

    For reference, here's what it says in the User Guide.

    CCL files are (mostly) case independent, and may contain comments introduced by the hash ‘#’ character, which indicates that the rest of the line is a comment. If the last non-blank character of a line in a CCL file is a backslash ‘\’, the following line is treated as a continuation of the current line.

    I believe I wrote this paragraph. I never imagined line continuations slicing up c-identifiers or literals.

  15. Roland Haas reporter
    • removed comment

    Yup, I also would not have expected to slice up identifiers. However the old parser supports it and for example Fortran does let you do exactly that as well with its line continuation so there is precedent. I also checked the docs (before posting the files) to check if eg the wording was that of C which essentially says "a backslash followed by a newline is replaced by a space". The wording would also make

    Mary had a little \<space><tab><newline>
    lamb
    

    a continuation (tab is a "space").

    That will teach us to make assumptions on what people may reasonably do I guess :-)

  16. Roland Haas reporter
    • removed comment

    Steve, I get parse failures for more files. Eg if I fix CarpetRhoRegrid/param.ccl by manually joining the lines I get:

    make sim-rebuld
    ...
    File: /home/rhaas/postdoc/gr/cactus/ET_trunk/arrangements/ZelmaniFixed/PNSHelper/param.ccl; par key error:(PNSHELPER PRIVATE variables) new=(solve_after_recovery
    verbose
    update_GR_every
    update_GR_every_start
    extrapolation_order
    update_GR_switch_time
    nrad
    nrad_outer
    ntheta
    nphi
    rad_max
    rad_max_outer
    symm
    collect
    interpolator
    interpolator_options
    coordinate_system) old=() at /home/rhaas/postdoc/gr/cactus/ET_trunk/lib/sbin/parameter_parser.pl line 801.
            main::parse_param_ccl("PNSHelper", Group=HASH(0x558fd7df9ba0), "BOOLEAN solve_after_recovery \"solve for constraints after rec"..., "{", "} \"no\"", "BOOLEAN verbose \"Do you want me to talk?\" STEERABLE=ALWAYS", "{", "} \"no\"", ...) called at /home/rhaas/postdoc/gr/cactus/ET_trunk/lib/sbin/parameter_parser.pl line 63
            main::create_parameter_database("TestComplex", "/home/rhaas/postdoc/gr/cactus/ET_trunk/arrangements/CactusTes"..., "RNSPerturb", "/home/rhaas/postdoc/gr/cactus/ET_trunk/arrangements/ZelmaniFi"..., "NewRad", "/home/rhaas/postdoc/gr/cactus/ET_trunk/arrangements/EinsteinE"..., "RegridSyncTest", "/home/rhaas/postdoc/gr/cactus/ET_trunk/arrangements/Carpet/Re"..., ...) called at /home/rhaas/postdoc/gr/cactus/ET_trunk/lib/sbin/CST line 142
    /home/rhaas/postdoc/gr/cactus/ET_trunk/lib/make/make.configuration:214: recipe for target '/home/rhaas/postdoc/gr/cactus/ET_trunk/configs/sim/config-data/make.thornlist' failed
    make[2]: *** [/home/rhaas/postdoc/gr/cactus/ET_trunk/configs/sim/config-data/make.thornlist] Error 25
    Makefile:256: recipe for target 'sim' failed
    make[1]: *** [sim] Error 2
    Makefile:657: recipe for target 'sim-rebuild' failed
    make: *** [sim-rebuild] Error 2
    

    I forgot to explicitly mention this: while the tarballs don't contain the full thorn source code, they contain enough so that one can actually run make sim-rebuild to parse the ccl files. I am not asking for a blind debugging, I am providing a working (or rather failing) example.

  17. Steven R. Brandt
    • removed comment

    I would vote to change the wording in the docs to make it illegal to slice up identifiers with line continuations. If we had hundreds of par files that worked in this way, I'd consider a backwards compatibility story. In this case, I think, it makes no sense.

    Actually, line continuations are now completely obsolete. You can just put things on the next line without the continuation and Piraha will figure it out.

  18. Steven R. Brandt
    • removed comment

    OK, this second error isn't a parsing error (which was all I checked with the tarball). That type of error means that the par file was interpreted differently. I'll check on that.

  19. Steven R. Brandt
    • removed comment

    Actually, it might be nice if you could tar up the entire arrangements directory. Can you do that?

  20. Roland Haas reporter
    • removed comment

    You mean all the ccl files of all the thorns in the thorn list (and the thorn list, sorry I forgot that one)? I have just done so in the attached file ccltest.tar.gz .

    Speaking of line continuation: the wording in the docs would also make things like

    CCTK_REAL mygridfun "a grid \
    function with a \
    very long and cumbersone \
    description"
    

    possible. That would still require backslashes, even with Piraha, would it?

  21. Steven R. Brandt
    • removed comment

    OK, a few small issues are found and fixed in the zelmani_stress branch of the flesh. Roland, please check.

  22. Roland Haas reporter
    • removed comment

    Much better. Still chokes on the line continuation, which I think should be allowed since it was in the past and given that the documentation says line continuation works on any line and that '\' is not said to be escapable should be very simple to implement (just check if the end of a just read in line is "\\\n" then read another line and join then repeat the procedure.

  23. Steven R. Brandt
    • removed comment

    The use of \ was a hack in the first place, used because the line+regex based parser couldn't figure out what was going on if an expression continued on another line. The Piraha parser does not have this limitation. In fact, you could delete \ from all files that currently use them and Piraha would have no trouble. (Although, since the old parser still runs in addition to Piraha, you can't do that right now.)

    I think the \'s are horrible to look at and make code harder to read.

    It makes more sense to me to change the documentation. Maybe we should discuss on the next phone call?

  24. Roland Haas reporter
    • removed comment

    Sorry, I won't make the call, (still in Germany and in a conflicting meeting).

    I fully agree that '\' is no longer required and could be removed from all ccl files. I am trying to point out, that we we need to support old ccl files that use it to remain backwards compatible. I usually try and argue strongly in favour of backwards compatibility even at the price of making the code more complex.

    Is it really so hard to modify the line reader that is used to at that point join lines if a line ending in '\' is found?

  25. Steven R. Brandt
    • removed comment

    I think there are enough \'s that they should be supported, but not in the middle of a token. It probably wouldn't be terribly hard to support, but are there enough old files that do this to make it worthwhile?

  26. Roland Haas reporter
    • removed comment

    Strictly speaking as long as it was documented the way it is right now in the previous release then the interpretation of having it in the middle of a token is valid and we must support it even if we don't know of a single file that uses it. I do not think we should assume that because we do not use a feature that it is not used at all since there are many groups that use Cactus but are not eg part of the Einstein Toolkit or may not even use it for numerical relativity so are not aware of any discussion on the ET users list or its trac system.

  27. Steven R. Brandt
    • removed comment

    What would it take to change this behavior to the one you request? Piraha is not currently a two pass parser. Every token has its own pattern, e.g. name=[a-zA-Z0-9_]+ or something equivalent. Since it would be impractical to change every token to allow \\n\s in the middle, the only way to support this feature would be to implement a pass prior to the Piraha parsing which would do nothing except remove the \\n\s patterns--except it wouldn't be quite that simple because simply substituting would throw the line numbers off. Perhaps it could insert a number of blank lines after the next \n to compensate. Maybe it wouldn't be that hard to do what I just described or that costly to the CPU, but... well, when I see that token broken in the middle with a \ for no reason it makes me cringe. I feel like I'm Detective Monk and you asked me to eat a sandwich that just fell on the floor. :)

    Again, I wrote the text and didn't imagine it would be interpreted that way. I don't see why the fix can't be made to the documentation rather than the code.

  28. Ian Hinder
    • removed comment

    Roland: I agree that backward compatibility is important, but nothing should be absolute. If it's a lot of work, for an extremely minor breakage, then it's not worth it, especially if it has only been documented for one release. Put it this way: would you rather that Steve spent time implementing this, or working on the automatic sync statements?

    Was CarpetRhoRegrid/param.ccl written with that funny backslash because it said it was possible in the documentation? I can't imagine anyone writing that line deliberately; it looks more like some auto-formatting by an editor.

  29. Roland Haas reporter
    • removed comment

    I think we cannot document it away since the documentation was present in at least one release and the breaking up in the middle of a token was used in at least one real life code out there, namely Zelmani.

    The pre-piraha docs said (git hash e8f2db41, 3 years ago):

    '''General Syntax of CCL Files''' Cactus Configuration Language (CCL) files are simple text files used to define configuration information for a thorn. CCL files are case independent, and may contain comments introduced by the hash ‘#’ character, which indicates that the rest of the line is a comment. If the last non-blank character of a line in a CCL file is a backslash ‘\’, the following line is treated as a continuation of the current line.

    So it was included in at least one release.

    I think that, as documented and mimicking how the old parser worked, this can be made to work through a simple modification of the parse_src subroutine:

    open(my $fd, "test.ccl") or die "cannot open $src";
    my $src_contents = "";
    my $line = "";
    while(<$fd>) {
      if(s/\\\s*\n//) {
        $line = "\n" . $line . $_;
      } else {
        $src_contents .= $line . $_;
        $line = "";
      }
    }
    close($fd);
    

    which adds '\n' at the front to keep line numbers correct.

    A slightly more complex regex pattern can be used to allow for escaping of '\'.

    open(my $fd, "test.ccl") or die "cannot open $src";
    my $src_contents = "";
    my $line = "";
    while(<$fd>) {
      if(m/(^|[^\\])([\\]{2})*\\\s*\n/) {
        s/\\\s*\n//;
        $line = "\n" . $line . $_;
      } else {
        $src_contents .= $line . $_;
        $line = "";
      }
    }
    

    which looks for odd numbers of '\' at the end of a line.

    I admit it is not beautiful since some information about the ccl file content is lost.

  30. Roland Haas reporter
    • removed comment

    Replying to [comment:30 hinder]:

    Roland: I agree that backward compatibility is important, but nothing should be absolute. If it's a lot of work, for an extremely minor breakage, then it's not worth it, especially if it has only been documented for one release. Put it this way: would you rather that Steve spent time implementing this, or working on the automatic sync statements? I disagree. I also think it is not much work as demonstrated by the code fragments above. We had a similar situation when we required that all strings in ccl files are surrounded by '"' quotes, which was fine for all ET parfiles (and all the ones I had written myself) but broke every single parfile used by Christian Reisswig and Christian Ott. They were ok to change their parfiles but it serves as an example to not claim something is "minor" because "I would never do this".

    Was CarpetRhoRegrid/param.ccl written with that funny backslash because it said it was possible in the documentation? I can't imagine anyone writing that line deliberately; it looks more like some auto-formatting by an editor.

    This must have been done intentionally. I would not know of an editor (emacs in this case most likely) that would add "\" at the end of a line. Auto-formatting would most likely just wrap the word to the next line.

  31. Erik Schnetter
    • removed comment

    Let's fix this one file to make it look better. That's a good idea anyway. That gives us more time to investigate whether people want to put backslashes in the middle of identifiers, and if they so, we can update the parser to handle this correctly.

  32. Frank Löffler
    • removed comment

    Most of the time you will find me arguing for backwards compatibility, for reasons Roland already mentioned. But there is a limit. Line-continuation within a token might have been allowed by the documentation, and might be working in the old parser. But I find it hard to imagine that someone actually wants to use it. It shouldn't be used. So, my opinion is that we should start disallowing that.

    Another issue is how we do this. Usually, we say we announce this in the release notes for the release after that one. The intention here is to give people time to adapt. In this specific case, however, it would mean either keeping the piraha changes out until then (and then still waiting for the double-parsing to be removed later I guess), or to change it to make two passes, just for that quite questionable case. If we were to do this, there should also be a parse-time warning about this, but frankly most likely this would be useless because nobody will be able to read it the few milliseconds it will be on the screen before it scrolls away. It would follow standard procedure, but it would be useless.

    This, this time, I suggest to make that change (disallow line continuation within tokens), and make it already for the next release.

    The one exception I see is a string (like a description). Here I could see someone wanting to break up a long string without introducing an actual newline in the string. But in this specific case, as Steve assured me, this would not be hard to handle, and wouldn't require two passes.

  33. Roland Haas reporter
    • removed comment

    Fine with me. I agree with Ian that I'd much rather see the auto-sync working (though admittedly I'd have also been fine to shelve piraha_everywhere and focus on autosync if we'd have found serious errors).

    Note that there is an implementation of line continuation that does not require two passes and that works exactly as the one in the current parser that I propose above, so we could remain backwards compatible at no additional cost. That is, if one is willing to let combining lines be part of the routine that reads lines rather than part of the parser.

  34. Frank Löffler
    • removed comment

    Replying to [comment:35 rhaas]:

    Note that there is an implementation of line continuation that does not require two passes and that works exactly as the one in the current parser that I propose above, so we could remain backwards compatible at no additional cost. That is, if one is willing to let combining lines be part of the routine that reads lines rather than part of the parser.

    The problem with that is that the parser wouldn't get the lines numbers of all lines following the line continuation correct - and the line-combiner, not being a parser, wouldn't know where to insert the necessary empty lines. Why do you need the line numbers? For nicer error messages.

  35. Roland Haas reporter
    • removed comment

    I had intended the line numbers to be correct. Each time a line is joined a newline is added to the front of the joined line. So the line number reported will be the last line of the joined lines. Please give it a try if you like.

  36. Frank Löffler
    • removed comment

    I guess this is getting it little bit too technical, but how would you handle something like

    "silliness\
     follows \
    here" to\
    theextremeand "nothing \
    gets better\
     from here\
    on" so\
    why "am\
     I doing \
    this?"
    

    By always adding an empty line in front, any error within this would register as happening on the last line, despite this being three strings and two tokens. One could argue that someone who actually would write this in all seriousness does not deserve better.

    It would still mean two stages while parsing, but the first would be quite simple (replace "\(\r\n?|\n)" with "", and in those instances add a newline at the beginning of the current line, and concatenate with the following line - if I understood you correctly?)

  37. Roland Haas reporter
    • removed comment

    Correct. It would give those error messages at the end. Not much one can do about this if one does not want to introduce a tokenization stage that would record for each token which line and column it started. That would be great but likely quite a bit of work I think. If you have a look at the code you can see that there already are two stages one is "slurp file into buffer" and the second one is "parse buffer".

    It is getting very technical though and rather than trying to convince everyone of my idea, I think it may be simpler to just leave things as the are right now ie. only allow line continuation at the end of pieces recognized by the grammar.

  38. Roland Haas reporter
    • removed comment

    Could the zelmani_stress branch code that has param.ccl file start off with PRIVATE scope for parameters (as documented and used by thorns out in the wild), please be prepared for review and inclusion? That seems to be an important fix to include.

  39. Log in to comment