[haiku-development] Re: DeskCalc Improvements (was need strtold() function)

  • From: Stephan Assmus <superstippi@xxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Fri, 22 Jan 2010 21:20:19 +0100

On 2010-01-22 at 20:10:28 [+0100], John Scipione <jscipione@xxxxxxxxx> 
wrote:
> Thanks for the tip Ingo. The author did not follow the adage, never use 
> chars for tokens, always use pointers (ie strings). I've been bitten by 
> that in the past when I've written lexers/parsers. To fix this bug I 
> think I am going to have to rewrite NextToken to look at each token as a 
> string (which most of the time will be of length 1) and not as a single 
> char, then I am going to have to check for the strings "E+" (or "e+") and 
> "E-" (or "e-") and treat them as a single token each. This means that 
> I'll have to change the definition of the Token struct to include a 
> pointer to *fCurrentPos instead of simply using fCurrentChar. I can 
> already tell that this is going to be difficult so I'll probably need to 
> ask for help again before I am done with it.

But tokens already point at strings. How else would it parse numbers with 
more than one digit?

In Tokenizer::NextToken(), it is already looking for 'e' and 'E' while 
parsing a number. Wouldn't it be enough to simply check for + and - there, 
only in the case that E has already been encountered?

The current code that finds the end of a number:

        const char* begin = fCurrentChar;
        while (*fCurrentChar != 0) {
                if (!isdigit(*fCurrentChar)) {
                        if (!(*fCurrentChar == '.' || *fCurrentChar == ','
                                || *fCurrentChar == 'e' || *fCurrentChar == 'E'))
                                break;
                }
                if (*fCurrentChar == ',')
                        temp << '.';
                else
                        temp << *fCurrentChar;
                fCurrentChar++;
        }

Becomes something like this (untested):

        const char* begin = fCurrentChar;
        bool expectE = true;
        bool expectPlusOrMinus = false;
        while (*fCurrentChar != 0) {
                bool isExponentChar = false;
                if (!isdigit(*fCurrentChar)) {
                        if (*fCurrentChar == 'e' || *fCurrentChar == 'E') {
                                if (!expectE)
                                        break;
                                expectE = false;
                                isExponentChar = true;
                        } else if (*fCurrentChar == '+' || *fCurrentChar == '-') {
                                if (!expectPlusOrMinus)
                                        break;
                        } else if (!(*fCurrentChar == '.' || *fCurrentChar == ','))
                                break;
                }
                // A sign is only valid directly after e/E, so the flag is
                // reset by every other character, including digits.
                expectPlusOrMinus = isExponentChar;
                if (*fCurrentChar == ',')
                        temp << '.';
                else
                        temp << *fCurrentChar;
                fCurrentChar++;
        }

So while we are already parsing a number, we expect +/- only directly after 
encountering e/E. And we expect to encounter e/E only once (unlike before). 
The actual parsing is still handled by the sscanf() invocation and any 
syntax error in the string (like double ,/.) should cause it to throw an 
error.
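
If it helps to try the idea outside of DeskCalc, here is a rough standalone 
sketch of the same scanning rule. ScanNumber() and ParseNumber() are names I 
made up for illustration, they are not in the Tokenizer, and the "%lf%n" 
check is only my assumption of how a leftover like the second '.' in 
"1.2.3" could be detected, not necessarily what the existing sscanf() call 
does.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of DeskCalc: copies the number starting at
// `input` into `out` (with ',' normalized to '.') and returns a pointer to
// the first character that does not belong to the number.
const char*
ScanNumber(const char* input, std::string& out)
{
	bool expectE = true;
	bool expectPlusOrMinus = false;
	const char* current = input;
	while (*current != 0) {
		bool isExponentChar = false;
		if (!isdigit((unsigned char)*current)) {
			if (*current == 'e' || *current == 'E') {
				// Only one exponent marker per number.
				if (!expectE)
					break;
				expectE = false;
				isExponentChar = true;
			} else if (*current == '+' || *current == '-') {
				// A sign belongs to the number only right after e/E.
				if (!expectPlusOrMinus)
					break;
			} else if (!(*current == '.' || *current == ','))
				break;
		}
		expectPlusOrMinus = isExponentChar;
		out += *current == ',' ? '.' : *current;
		current++;
	}
	return current;
}

// Hypothetical strict conversion: succeeds only if sscanf() consumed the
// whole scanned text, so trailing leftovers are reported as an error.
bool
ParseNumber(const std::string& text, double& value)
{
	int consumed = 0;
	if (sscanf(text.c_str(), "%lf%n", &value, &consumed) != 1)
		return false;
	return (size_t)consumed == text.length();
}
```

With this, scanning "1e+5+3" should stop in front of the second '+', so the 
number token is "1e+5" and the rest is left for the operator handling, while 
"1+2" still yields just "1". The scanner itself stays permissive about 
"1.2.3" and leaves the rejection to the conversion step.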


Best regards,
-Stephan
