Hi Chris,
Been playing around with the lexer design again recently (in fact, a port to JavaScript), and I think there may be a smart way to solve this general problem at its root: add a new method to the lexer API, addReentryPattern. Here's a JavaScript unit test that illustrates the idea (hopefully close enough to the PHP lexer to make sense):
    function testReentryPattern() {
        var l = new lexer('start');
        l.addReentryPattern("\n", 'start');

        var testTokens = [
            ["start", "",    lexer.ENTER,     0],
            ["start", "aaa", lexer.UNMATCHED, 0],
            ["start", "\n",  lexer.EXIT,      3],
            ["start", "\n",  lexer.ENTER,     3],
            ["start", "bbb", lexer.UNMATCHED, 4],
            ["start", "\n",  lexer.EXIT,      7],
            ["start", "\n",  lexer.ENTER,     7],
            ["start", "ccc", lexer.UNMATCHED, 8],
            ["start", "",    lexer.EXIT,      11]
        ];

        assertTokensEqual(l.parse("aaa\nbbb\nccc"), testTokens, 'Reentry Pattern');
    }
What's happening is that, on encountering the _single_ re-entry pattern, the lexer emits _two_ tokens, first an EXIT and then an ENTER, both at the same offset. So it's basically a toggle: the current mode is closed and immediately reopened.
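A minimal sketch of how that toggle could be produced, written in JavaScript to match the test. This is not the actual port: ReentryLexer and the string constants ENTER/EXIT/UNMATCHED (standing in for lexer.ENTER etc.) are hypothetical, patterns are plain strings rather than regexes, and there is only one mode, with no mode stack or other pattern types.

```javascript
// Hypothetical token types; the test above refers to lexer.ENTER etc.
const ENTER = 'ENTER';
const EXIT = 'EXIT';
const UNMATCHED = 'UNMATCHED';

// Minimal single-mode lexer that only knows about re-entry patterns.
// Patterns are literal strings (not regexes) to keep the sketch short.
class ReentryLexer {
  constructor(startMode) {
    this.startMode = startMode;
    this.reentryPatterns = []; // entries of the form { pattern, mode }
  }

  addReentryPattern(pattern, mode) {
    // An empty pattern would match everywhere and never advance.
    if (pattern === '') throw new Error('re-entry pattern must not be empty');
    this.reentryPatterns.push({ pattern, mode });
  }

  // Returns tokens as [mode, text, type, offset], the shape used in the test.
  parse(text) {
    const mode = this.startMode;
    const patterns = this.reentryPatterns.filter((p) => p.mode === mode);
    const tokens = [[mode, '', ENTER, 0]];
    let runStart = 0; // start offset of the pending unmatched text
    let i = 0;
    while (i < text.length) {
      const hit = patterns.find((p) => text.startsWith(p.pattern, i));
      if (!hit) {
        i += 1;
        continue;
      }
      if (i > runStart) tokens.push([mode, text.slice(runStart, i), UNMATCHED, runStart]);
      // The single re-entry match becomes two tokens at the same offset:
      // EXIT the current mode, then ENTER it again.
      tokens.push([mode, hit.pattern, EXIT, i]);
      tokens.push([mode, hit.pattern, ENTER, i]);
      i += hit.pattern.length;
      runStart = i;
    }
    if (text.length > runStart) tokens.push([mode, text.slice(runStart), UNMATCHED, runStart]);
    tokens.push([mode, '', EXIT, text.length]);
    return tokens;
  }
}
```

With addReentryPattern("\n", 'start'), parsing "aaa\nbbb\nccc" yields the same nine tokens as testTokens, offsets included.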
It's fairly easy to add this to the lexer, although DokuWiki's handler may be a problem, as it tries to be smart about line breaks.
Anyway - hope that's useful.
Harry
Hi,
In order to fix a problem with the line break plugin[1], I need to ensure a space occurs between text and line endings in the pre-parsed wiki data. I can do this using the new IO_WIKIPAGE_READ event[2], but that means modifying the data whenever it's read, including for purposes other than parsing for display.
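A minimal sketch of that preprocessing step, in JavaScript like the rest of the thread. It assumes "a space between text and line endings" means one space inserted before any "\n" that directly follows a non-whitespace character, and that line endings are already "\n"; padLineEndings is a hypothetical name.

```javascript
// Hypothetical helper: insert a space between text and a following "\n".
function padLineEndings(text) {
  // Only pad when the character before "\n" is not whitespace, so blank
  // lines ("\n\n") and lines that already end in a space are left alone.
  return text.replace(/([^\s])\n/g, '$1 \n');
}
```

For example, padLineEndings("aaa\n\nbbb") returns "aaa \n\nbbb": only the line with text before its newline is padded.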
I think there is probably enough separation between the two uses to warrant a dedicated event for preprocessing raw wiki data, along these lines[3]; it would also catch any inline uses of the parser.
  event:       PARSER_TEXT_PARSE (or perhaps PARSER_TEXT_GETINSTRUCTIONS)
  data:        raw wiki text
  action:      parse the text and generate the instructions list
  preventable: ? probably
  signalled:   p_get_instructions
  result:      instruction list
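To make the proposed event concrete, here is a rough JavaScript sketch of the BEFORE/AFTER flow: BEFORE handlers see the raw wiki text as data and may rewrite it or prevent the default action, the default action parses, and the instruction list becomes the result. Every name here is hypothetical and none of it is DokuWiki's real (PHP) event API: ParserEvent, handlers and registerHook are invented, getInstructions stands in for p_get_instructions, and toyParse is a placeholder that emits one made-up 'cdata' instruction per line.

```javascript
// Hypothetical PARSER_TEXT_PARSE-style event object.
class ParserEvent {
  constructor(name, data) {
    this.name = name;
    this.data = data;              // raw wiki text; BEFORE handlers may rewrite it
    this.result = null;            // instruction list, set by the default action
    this.defaultPrevented = false; // "preventable: ? probably"
  }

  preventDefault() {
    this.defaultPrevented = true;
  }
}

// Registered handlers, keyed by when they run relative to the default action.
const handlers = { BEFORE: [], AFTER: [] };

function registerHook(advise, fn) {
  handlers[advise].push(fn);
}

// Placeholder parser: one hypothetical 'cdata' instruction per line of text.
function toyParse(text) {
  return text.split('\n').map((line) => ['cdata', line]);
}

// Stand-in for p_get_instructions: signals the event around the parse.
function getInstructions(text) {
  const evt = new ParserEvent('PARSER_TEXT_PARSE', text);
  for (const fn of handlers.BEFORE) fn(evt);
  if (!evt.defaultPrevented) {
    evt.result = toyParse(evt.data);
  }
  for (const fn of handlers.AFTER) fn(evt);
  return evt.result;
}
```

A BEFORE handler that pads line endings would only affect text on its way into the parser, not every IO_WIKIPAGE_READ; one that calls preventDefault() skips toyParse and leaves result as whatever the handler set.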
Any comments?
Cheers,
Chris
[1] To avoid messing up double newlines, the plugin can't grab a newline when it's the first character being processed, which also happens when DokuWiki syntax occurs immediately before a line break.
[2] Great addition, Ben.
[3] The recent <IF***> discussion could also make use of access to the data immediately before it is parsed, to strip out any content that shouldn't be included.
-- DokuWiki mailing list - more info at http://wiki.splitbrain.org/wiki:mailinglist