Replace internal expression parser with Piraha

Issue #1518 closed
Roland Haas created an issue

the parameter parser allows things like:

thorn::param1 = 011
thorn::param2 = 012.34

in parameter files. For floating point values this is a bit unexpected but otherwise mostly harmless. For integers the situation is a bit more complex since in C a leading zero is used to indicate a octal number. And (worse) while the parameter file parser converts the string "011" to the number 11 the Cactus call CCTK_ParameterSet will convert it to 9. The difference is ultimately the difference between calling atof (Parser) and strtol (CCTK_ParameterSet).

To avoid confusion it would likely be good to change CCTK_ParameterSet to behave the way the Parameter parser does. This is a change in behaviour compared to the pre-Piraha parser, however I suspect the number of users that actually used octal (or hexedecimal) notation in their parameter files is small.

The change is to change inval = strtol (value, &endptr, 0); to inval = strtol (value, &endptr, 10); in line 2209 of src/main/Parameters.c and similar in line 2270.

Keyword: postrelease

Comments (28)

  1. Frank Löffler
    • assigned issue to
    • removed comment

    Steve? In principle I agree with Roland, but I could also see you wanting to add octal and hexadecimal support to the parser - just to have it more 'complete'. Which is it for you? :)

  2. Steven R. Brandt
    • changed status to resolved
    • removed comment

    It turns out that this was fixed in commit 5063, when I switched from using strtol() to operator>>(). I chose the latter because it does more error checking. Interestingly, it also has the effect of reverting Cactus back to the old behavior with regard to leading zeros.

  3. Roland Haas reporter
    • changed status to open
    • removed comment

    This still persists in rev 5066 (just checked). Attached please find a thorn that demonstrates the issue.

    $ cat arrangements/Local/OctalParameters/par/octal.par
    ActiveThorns = OctalParameters
    OctalParameters::intparam = 010
    
    $ exe/cactus_null arrangements/Local/OctalParameters/par/octal.par
    INFO (OctalParameters): intparam at line 10 is: 10
    INFO (OctalParameters): realparam at line 12 is: 17.4
    INFO (OctalParameters): intparam at line 21 is: 10
    INFO (OctalParameters): realparam at line 23 is: 10
    INFO (OctalParameters): intparam at line 32 is: 8
    INFO (OctalParameters): realparam at line 34 is: 10
    INFO (OctalParameters): intparam at line 43 is: 16
    INFO (OctalParameters): realparam at line 45 is: 16
    
  4. Steven R. Brandt
    • removed comment

    After digging in to this deeper, it seems the real problem is that Piraha integration was incomplete. Util_ExpressionEvaluate() is still used when expressions are set internally.

    On line 10, you see Piraha's conversion for "010" which is 10.

    On line 32, you see the old method's conversion for "010" which is 8.

    I think what I should do is replace Util_ExpressionEvaluate.

    Just to be sure, however, the value we want from "010" is 10, right?

  5. Frank Löffler
    • removed comment

    If I understood it correctly, the current (old) Cactus behavior is to evaluate 010 as octal number. I don't mind one way or another for octal numbers. I cannot think of a reason to use them, other than maybe some unix rights mask, but then we could have some other solution for that.

    Hexadecimal numbers might be another matter, these are more commonly used in general.

  6. Roland Haas reporter
    • removed comment

    Replying to [comment:5 sbrandt]:

    After digging in to this deeper, it seems the real problem is that Piraha integration was incomplete. Util_ExpressionEvaluate() is still used when expressions are set internally. We have actually list of leftover TODO items for Piraha (meeting minutes: http://lists.einsteintoolkit.org/pipermail/users/2013-February/002854.html) for which we unfortunately never created a ticket so they got lost it seems.

    Piranha parameter parser: * currently parses only parameter files, plan to extent to all ccl files * should use namespace and adhere to Cactus naming and commenting conventions: either use C++ namespace cctk (and cctki?) and/or prefix all externally visible functions by either cctk or cctki depending on whether they should be user callable or not [NOTE: these are lowercase but they should have been uppercase CCTK (RH)] * remove and replace existing math evaluator/parser by piranha one to have uniform math support

  7. Steven R. Brandt
    • removed comment

    So I think 2nd point is already handled. All of Piraha is wrapped in the C++ namespace "cctki_piraha". This ticket is essentially about the 3rd point. I think the 1st point should be its own ticket.

  8. Steven R. Brandt
    • removed comment

    OK, this is long overdue, but I replaced the expression evaluator with Piraha.

    One new feature, which I can take out if no one likes it, is that you can set temporaries in an evaluation. Thus, if you do this:

      uExpressionValue result;
    
      cctk_PirahaEval("$a=4+4");
      result = cctk_PirahaEval("$x=3+$a*2");
    
      if(result.type == ival) {
        printf("result=%d\n",result.value.ival);
      } else if(result.type == rval) {
        printf("result=%g\n",result.value.rval);
      }
    

    You'll get "result=19." See the patch.

  9. Ian Hinder
    • removed comment

    What "namespace" are the temporaries set in? Can codes create their own space to avoid conflicts with existing temporaries? Out of caution, I would comment out or otherwise disable the setting of temporaries using the flesh expression parser in case it causes problems. It could then be enabled if people need the feature and it is worth the risk of confusion.

    This patch introduces a new type "sval", I assume for string valued expressions. Existing code will not check exhaustively for the type of the expression. I doubt this is a problem in practice, as the expression parser code is rarely used outside the flesh I think.

    There is a switch statement at the top of the patch which does not have a default case. Please add a fatal error message here.

    Do I understand correctly that you are replacing the Util_Expression* flesh API with a different API? Doesn't this break backward compatibility? Should there be a wrapper introduced? Then the changes to Groups.c and Parameters.c would not be needed.

    I support replacing the expression parser with Piraha, but I don't think we should change/eliminate the Util_* expression API. Did I misunderstand what the patch was doing?

  10. Steven R. Brandt
    • removed comment

    Sorry for not responding to this sooner:

    (1) The ability to set variables (e.g. symbols preceded by $) can easily be removed. (2) I added sval for completeness. Someone might want to evaluate a string par. (3) I can add the default check. (4) The API still looks basically the same, only the implementation changed. It passes the test suite last I checked. The goal was to remove subtle behavioral differences between expression evaluation and parameter parsing.

  11. Ian Hinder
    • removed comment

    As far as I can see in the patch, the function Util_ExpressionParse has been removed. Is this correct? I consider this a change to the API, because code which calls this function will have to be changed. I had assumed that this function was documented and an official part of the flesh API, rather than an internal function, since it has a Util_ prefix rather than CCTKi_. However, I now see that it is not documented (http://einsteintoolkit.org/documentation/ReferenceManual/ReferenceManualpa2.html#x5-219000B). I suspect that it is considered by most Cactus developers to be part of the flesh API, and should have been documented. Since it is callable from user code, I don't think it should be removed. Instead, the API should be maintained, and should call the new code underneath.

  12. Steven R. Brandt
    • removed comment

    I said "basically the same." Since nothing in the ET test suite called it, and (as you pointed out) the docs don't mention it I figured it wasn't used, and open to some modification. The above function (cctk_PirahaEval), does essentially the same thing, but returns a uExpressionValue rather than a uExpression. I imagined a value would be more useful since the uExpression contains internal things like tokens and token counts. Of course, I could always rename cctk_PirahaEval as Util_ExpressionParse, but since the return value changed I thought it might make sense to change the name.

    Do you have a thorn and test case that uses this function?

  13. Ian Hinder
    • removed comment

    No, I don't. The ET tests are not exhaustive, and while they are useful for detecting some problems, they should not be used as evidence that something should not be supported. Cactus is a framework; the Einstein Toolkit is just one user of that framework. Many people use the Cactus flesh with thorns which are not in the Einstein Toolkit. Just because something is not used in the toolkit, it doesn't mean it can be changed or removed. If we want to remove the Util_Expression API, we should discuss this first, weighing up the pros and cons, evaluating how likely it is that people will be affected, announcing the change well in advance (e.g. at least one release), and getting a consensus from the Cactus developers that this is a good idea. It's possible that nobody uses it; it is, after all, a rather obscure feature. But the Cactus flesh has a large number of users, and a decision like this should not be taken lightly. Would you like to propose that this flesh API is changed, and give arguments for why this is good and/or OK? My own preference would be to implement the old functionality (down to the return types and semantics) in terms of the new, or to provide a new API in addition to the old one, deprecating the old one, if wrapping the new API with the old is not considered feasible.

  14. Roland Haas reporter
    • removed comment

    This a regression, yes? Should it be noted in the release notes as such?

  15. Steven R. Brandt
    • removed comment

    I'm not sure what you mean by a regression. AFAIK, there is no test that identifies the problems discussed in this ticket. Probably something should be added.

  16. Ian Hinder
    • removed comment

    http://en.wikipedia.org/wiki/Software_regression:

    A software regression is a software bug which makes a feature stop functioning as intended after a certain event

    In this case, it used to work, and now it doesn't. We have "regression tests" to catch regressions, but the word "regression" is not limited to whether the code is tested. If we know that something has broken in this release that worked before, then yes, it needs to be mentioned in the release notes. We should put the word "regression" into the ticket keywords when a ticket is associated with a regression so that we can search for them when writing the release notes.

  17. Roland Haas reporter
    • removed comment

    Uh, actually I think I should have remained silent. The possibility of encountering this low, since the original code was the one that would have allowed octal numbers ie 011octal = 9decimal. I would expect that not many users were using octal numbers in there parfiles.

  18. Steven R. Brandt
    • removed comment

    I confess that once I realized there were two distinct parsers in the system, that became the focus of the ticket for me. I actually forgot all about the octal conversion and whether it was desirable. Adding the octal conversion would be easy enough to do.

    My comment about the regression was simply that it doesn't break an existing software regression test, and so it seemed odd to me to call it a regression. I was wondering if I was missing something.

  19. Roland Haas reporter
    • removed comment

    I can only review code style and some general stuff since I do not actually understand what the Piraha parser does.

    Comments:

    • line 2312 ff:

      2312 int n=0; 2313 while(temp[n] != 0) 2314 n++; is just a strlen. So strlen should be used. Since the intend seems to be to try and avoid calling strlen() after each iteration, the most convenient way may be:

      for (unsigned int p = 0, n = strlen(temp); p < n; p++) the way eg Carpet caches the end() iterator for C++ style iterations. * there's a number of commented out lines that should not be committed (eg line 2845 of src/main/Groups.c, line 2223 of src/main/Parameters.c) * src/piraha/Makefile seems to hard-code compiler options for -O2 and -g. This should not be. * the patch redefine Util_StrCmpi. I don't understand why this is done. If actually required, I think it should be done in a separate commit. We may not even need Util_StrCmpi anymore since POSIX.1-2001 and we could officially add POSIX.1-2001 to Cactus' requirements.

  20. Erik Schnetter
    • removed comment

    While we are discussing style questions -- strlen returns size_t, not unsigned int.

    Yes, we should replace Util_StrCmpi by strcasecmp. In fact, we should make the former be just a wrapper around the latter.

  21. Roland Haas reporter
    • changed status to open
    • removed comment

    I should have added that with the exceptions of the points raised above, the patch is fine to apply. There may be more of the "unsigned int" issues around, since the type is already in the files (originating from sometime around 2001, where I suspect size_t was indeed the same as unsigned int).

  22. Log in to comment