[dokuwiki] Re: about the Dokuwiki Lexer/Parser/Handler improvements....

  • From: Mike Reinstein <web_fella@xxxxxxxxxxx>
  • To: <dokuwiki@xxxxxxxxxxxxx>
  • Date: Fri, 10 Apr 2009 13:10:37 -0400

> Have you considered, using the regex rather than the information used  
> to generate the regex?  At least in the first instance that would seem  
> to be shorter project as all you need to handle is regex compatibility  
> and the application of the regex to the wiki content.

I think we are saying the same thing. The Doku_Lexer->modes variable is an 
associative array of regexes, where the key is the current parse mode. 
That data is part of the Lexer, but it doesn't need to be rebuilt each time a 
page is rendered. Like you said, if the data structure containing the regexes 
is available as a module, it could be consumed by other modules and I wouldn't 
have to port the regex-building logic in several places, making it a shorter 
project.
I feel that regex compatibility might be the biggest risk to this concept, 
as I'm not sure how closely JavaScript's regex engine matches PHP's PCRE 
behavior.
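
To make the idea and the compatibility risk concrete, here is a rough sketch
in TypeScript. It assumes the server could export the mode-keyed regexes as
plain pattern strings; the ModeTable shape, the mode names and the patterns
are made up for illustration and are not the real contents of the lexer. The
client tries to compile each pattern and reports the ones the JavaScript
engine rejects (the "(?>...)" atomic group below is PCRE-only syntax).

```typescript
// Hypothetical export format: parse mode name -> regex sources as strings.
type ModeTable = Record<string, string[]>;

// Illustrative data only; not the actual DokuWiki lexer patterns.
const exported: ModeTable = {
  base: ["\\*\\*", "//", "(?>__)"], // "(?>...)" is a PCRE atomic group
  strong: ["\\*\\*"],
};

interface RejectedPattern {
  mode: string;
  source: string;
  reason: string;
}

interface CompiledModes {
  compiled: Record<string, RegExp[]>;
  rejected: RejectedPattern[];
}

// Compile every pattern once; collect the ones JavaScript cannot parse
// instead of failing on the first incompatible regex.
function compileModes(table: ModeTable): CompiledModes {
  const result: CompiledModes = { compiled: {}, rejected: [] };
  for (const mode of Object.keys(table)) {
    const sources = table[mode] || [];
    const list: RegExp[] = [];
    for (const source of sources) {
      try {
        list.push(new RegExp(source));
      } catch (err) {
        result.rejected.push({ mode, source, reason: String(err) });
      }
    }
    result.compiled[mode] = list;
  }
  return result;
}

const report = compileModes(exported);
console.log(report.rejected.map((r) => r.mode + ": " + r.source));
```

A check like this only catches patterns that fail to compile. Patterns that
compile in both engines but match differently would still need to be compared
against real wiki content.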


> I'm not aware of anyone working on the Lexer/Parser/Renderer, its not  
> the most easy piece of code to get to grips with and it probably  
> doesn't get the developer love it deserves. 

That's another reason why I am thinking about this change: besides making the 
parsing/tokenizing a bit faster, it would also make the components less tightly 
coupled.
Right now, the Lexer, Renderer and Parser are tightly coupled, with a lot of 
dependencies. Moving the lexer data structure into another module would give a 
cleaner separation of logic.


> Andi is the person most likely to know for sure.

I started asking Andi before I joined this list; he's actually the one who 
pointed me here. :) He's not aware of any major undertakings at the moment.


> One thing to give some thought to is syntax highlighting.  GeSHi isn't  
> part of the lexer/parser.  Its handled entirely in the rendering  
> phase.  I guess you would need some ajax to retrieve the highlighted  
> code snippet.

Hmm, that's a really good point. The syntax highlighting hadn't even occurred 
to me. I think the javascript tokenizer/renderer wouldn't absolutely need to 
support syntax highlighting to still be useful. The main reason why I'm 
pursuing it is to make previewing less demanding. Right now in most wikis 
(especially in big corporations) one of the big problems is that they are often 
hosted someplace far away. I work for Nokia in Boston, and we have a wiki 
hosted in Australia, which is accessed by people in Vancouver, Espoo, Seattle, 
Boston, New York, and other places. People hate it. It's not the wiki itself, 
just the fact that the page load latency sucks. What I want to do is build a 
local renderer so that previewing/editing pages is a zero-latency process. 
That's going to open up new markets.
So while I don't expect that javascript is quite ready to become a full-fledged 
replacement for the server-side renderer, it offers an opportunity to be "good 
enough" to save the server lots of processing power and bandwidth, and to make 
the editing process more enjoyable.
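
As a very rough illustration of what "good enough" could look like, here is a
TypeScript sketch of a client-side preview. The function names are made up,
and it handles only bold, italics and plain <code> blocks (no language
argument), which is a tiny subset of the real syntax. Code blocks are shown
escaped but unhighlighted, on the assumption that the server-side renderer
would still do the GeSHi pass when the page is saved.

```typescript
// Escape HTML so raw wiki text cannot inject markup into the preview.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Inline markup covered by this sketch: **bold** and //italic// only.
function renderInline(text: string): string {
  return escapeHtml(text)
    .replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>")
    .replace(/\/\/(.+?)\/\//g, "<em>$1</em>");
}

// Render a preview entirely in the browser, with no round trip to the server.
function renderPreview(wikiText: string): string {
  // Splitting on a regex with a capture group keeps the captured code
  // bodies in the result, at the odd indexes.
  const parts = wikiText.split(/<code>([\s\S]*?)<\/code>/);
  return parts
    .map((part, i) =>
      i % 2 === 1
        ? "<pre>" + escapeHtml(part) + "</pre>" // plain, no highlighting
        : renderInline(part)
    )
    .join("");
}

console.log(renderPreview("a <code>x<y</code> **b** //c//"));
```

A naive italics rule like this one would also mangle the "//" in URLs, which
is exactly the kind of case the real mode-based lexer exists to handle.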


Again, would love more feedback/follow up/concerns/issues/etc.

-Mike
