On 2010-01-22 at 20:10:28 [+0100], John Scipione <jscipione@xxxxxxxxx> wrote:
> Thanks for the tip Ingo. The author did not follow the adage, never use
> chars for tokens, always use pointers (ie strings). I've been bitten by
> that in the past when I've written lexers/parsers. To fix this bug I
> think I am going to have to rewrite NextToken to look at each token as a
> string (which most of the time will be of length 1) and not as a single
> char, then I am going to have to check for the strings "E+" (or "e+") and
> "E-" (or "e-") and treat them as a single token each. This means that
> I'll have to change the definition of the Token struct to include a
> pointer to *fCurrentPos instead of simply using fCurrentChar. I can
> already tell that this is going to be difficult so I'll probably need to
> ask for help again before I am done with it.

But tokens already point at strings. How else would it parse numbers with
more than one digit? Tokenizer::NextToken() is already looking for 'e' and
'E' while it is parsing a number. Wouldn't it be enough to simply check for
+ and - there, and only directly after an E has been encountered?

The current code finds the end of a number like this:

    const char* begin = fCurrentChar;
    while (*fCurrentChar != 0) {
        if (!isdigit(*fCurrentChar)) {
            if (!(*fCurrentChar == '.' || *fCurrentChar == ','
                || *fCurrentChar == 'e' || *fCurrentChar == 'E'))
                break;
        }
        if (*fCurrentChar == ',')
            temp << '.';
        else
            temp << *fCurrentChar;
        fCurrentChar++;
    }

It becomes something like this (untested):

    const char* begin = fCurrentChar;
    bool expectE = true;
        // e/E is accepted only once per number
    bool expectPlusOrMinus = false;
        // +/- is accepted only directly after e/E
    while (*fCurrentChar != 0) {
        if (isdigit(*fCurrentChar)) {
            expectPlusOrMinus = false;
        } else if (*fCurrentChar == 'e' || *fCurrentChar == 'E') {
            if (!expectE)
                break;
            expectE = false;
            expectPlusOrMinus = true;
        } else if (*fCurrentChar == '+' || *fCurrentChar == '-') {
            if (!expectPlusOrMinus)
                break;
            expectPlusOrMinus = false;
        } else if (*fCurrentChar == '.' || *fCurrentChar == ',') {
            expectPlusOrMinus = false;
        } else
            break;

        if (*fCurrentChar == ',')
            temp << '.';
        else
            temp << *fCurrentChar;
        fCurrentChar++;
    }

So while we are already parsing a number, we expect +/- only directly after
encountering e/E. And we expect to encounter e/E only once (unlike before).
The actual parsing is still handled by the sscanf() invocation, and any
syntax error in the string (like a doubled ,/.) should cause it to throw an
error.

Best regards,
-Stephan
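
For anyone who wants to try the scanning rule outside the tokenizer, it can be sketched as a stand-alone function. This is only an illustration under assumptions: ScanNumber() and Demo() are hypothetical names, not part of the real Tokenizer class, and a std::string stands in for the `temp` stream used above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not the real Tokenizer code): scans a number starting
// at pos, advances pos past it, and returns the scanned text with ','
// normalised to '.'.
std::string ScanNumber(const char*& pos)
{
    std::string temp;
    bool expectE = true;             // e/E is accepted only once
    bool expectPlusOrMinus = false;  // +/- is accepted only right after e/E

    while (*pos != 0) {
        char c = *pos;
        if (isdigit(static_cast<unsigned char>(c))) {
            expectPlusOrMinus = false;
        } else if (c == 'e' || c == 'E') {
            if (!expectE)
                break;
            expectE = false;
            expectPlusOrMinus = true;
        } else if (c == '+' || c == '-') {
            if (!expectPlusOrMinus)
                break;
            expectPlusOrMinus = false;
        } else if (c == '.' || c == ',') {
            expectPlusOrMinus = false;
        } else
            break;

        temp += (c == ',') ? '.' : c;
        pos++;
    }
    return temp;
}

// Small usage example: the sign belongs to the number only after e/E.
void Demo()
{
    const char* a = "1.5e+3*2";
    assert(ScanNumber(a) == "1.5e+3");  // '*' ends the number

    const char* b = "1e5+3";
    assert(ScanNumber(b) == "1e5");     // this '+' is an operator
}
```

The function only finds where the number ends. Whether the scanned text is a valid number is still left to the later conversion step, as in the original code.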