[dokuwiki] Re: about the Dokuwiki Lexer/Parser/Handler improvements....

  • From: Christopher Smith <chris@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Fri, 10 Apr 2009 11:45:48 +0100


On 9 Apr 2009, at 21:52, Mike Reinstein wrote:


I am interested in achieving 2 goals in the short term:
* improve the lexing/parsing performance

Sounds good :)


* prototype a parser/lexer based on javascript

Towards those goals, here is what I'm considering:
* take the code from Doku_LexerParallelRegex, the Doku_Lexer->addPattern* functions, and all the Doku_Parser related classes and move them into a Doku_Generate_Tokenizer class that would contain the tokenizer logic (doesn't need invocation on each request)
* modify Doku_Lexer to accept the data structure generated by Doku_Generate_Tokenizer as input

I am thinking that a common data interchange format could be used to store this lexer data. JSON seems like a good choice because it's built into PHP and can serialize/deserialize those data structures pretty fast. JSON is also nice in that it would let me import the same lexing data into JavaScript very easily; all that would then be required is to port the remainder of the lexer, including the stack and the traversal. With the static parts of the lexer/parser moved into Doku_Generate_Tokenizer, that's only a few hundred lines instead of a few thousand.
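To give a sense of what consuming that pre-generated data in JavaScript might look like, here is a minimal sketch. The JSON shape (modes mapping to {pattern, handler} pairs) is entirely hypothetical -- the real Doku_Generate_Tokenizer output would define its own structure -- but it shows the idea of building one alternation regex per mode, the way Doku_LexerParallelRegex does, and walking the text with it:

```javascript
// Hypothetical pre-generated lexer data, as it might be exported
// from PHP via json_encode(). Patterns are JSON-escaped regex sources.
const lexerData = JSON.parse(`{
  "modes": {
    "base": [
      {"pattern": "\\\\*\\\\*", "handler": "strong"},
      {"pattern": "//", "handler": "emphasis"}
    ]
  }
}`);

function tokenize(text, mode, data) {
  const tokens = [];
  // One parallel regex per mode: each pattern becomes a capture group,
  // so we can tell which alternative matched.
  const parts = data.modes[mode].map(p => "(" + p.pattern + ")");
  const re = new RegExp(parts.join("|"), "g");
  let last = 0, m;
  while ((m = re.exec(text)) !== null) {
    if (m.index > last) {
      tokens.push({type: "unmatched", text: text.slice(last, m.index)});
    }
    // The index of the defined capture group tells us the handler.
    const idx = m.slice(1).findIndex(g => g !== undefined);
    tokens.push({type: data.modes[mode][idx].handler, text: m[0]});
    last = re.lastIndex;
  }
  if (last < text.length) {
    tokens.push({type: "unmatched", text: text.slice(last)});
  }
  return tokens;
}
```

For example, tokenize("a**b**c", "base", lexerData) yields unmatched/strong/unmatched/strong/unmatched tokens. The stack-based mode switching of the real lexer is omitted here; this only sketches the data-driven part.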

Have you considered using the regex itself rather than the information used to generate it? At least in the first instance that would seem to be the shorter project, as all you would need to handle is regex compatibility and applying the regex to the wiki content.
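To illustrate: much of a compiled PCRE pattern can be dropped straight into JavaScript. The pattern below is made up for illustration, not DokuWiki's actual compiled lexer regex, and constructs like possessive quantifiers or \G anchors would not carry over and would need rewriting during export:

```javascript
// A PCRE-style pattern as it might be exported from PHP.
// Simple alternations like this are valid JS regex source as-is.
const pcrePattern = "\\*\\*|__|//";
const jsRegex = new RegExp(pcrePattern, "g");

const matches = "some **bold** and //italic// text".match(jsRegex);
// matches: ["**", "**", "//", "//"]
```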


Again, these are just ideas, but I'd love to get feedback from you folks on whether you see this as feasible, and to offer the possibility of contributing this back to the DokuWiki community if it's desired. Didn't want to start coding it and then find out that a super awesome refactoring of the lexer/parser/renderer was underway. :)

Thoughts? Comments/suggestions/feedback/concerns highly appreciated!


I'm not aware of anyone working on the Lexer/Parser/Renderer; it's not the easiest piece of code to get to grips with and it probably doesn't get the developer love it deserves. Andi is the person most likely to know for sure.

One thing to give some thought to is syntax highlighting. GeSHi isn't part of the lexer/parser; it's handled entirely in the rendering phase. I guess you would need some ajax to retrieve the highlighted code snippet.
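Something along these lines is what I have in mind. The endpoint and its parameters are purely hypothetical -- DokuWiki has no such highlight endpoint today, so one would have to be added alongside a JS parser:

```javascript
// Fetch server-side highlighted code over AJAX. The URL and parameter
// names below are hypothetical placeholders, not an existing DokuWiki API.
async function fetchHighlighted(code, language) {
  const resp = await fetch("/lib/exe/highlight.php", {
    method: "POST",
    headers: {"Content-Type": "application/x-www-form-urlencoded"},
    body: new URLSearchParams({code, lang: language})
  });
  if (!resp.ok) {
    throw new Error("highlight request failed: " + resp.status);
  }
  return resp.text(); // HTML produced by GeSHi on the server
}
```

The JS parser would emit a placeholder for each code block and swap in the returned HTML when the request completes.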

- Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
