[dokuwiki] Re: Plugin: ShowChanges since last login

From: Michael Hamann <michael@xxxxxxxxxxxxxxxx>
To: dokuwiki@xxxxxxxxxxxxx
Date: Sun, 5 Apr 2009 21:34:21 +0200
Hi,

On Sun, Apr 05, 2009 at 05:58:37PM +0200, Robert Rackl wrote:
[...]
> My problem is now: Should I compare raw wikitext or HTML?
>
> 1) Compare HTML
> I can quite simply get the old and new versions of the page as xHTML (the
> new version I already get for free in my RENDERER_CONTENT_POSTPROCESS
> action plugin, and the old version I can get via
> Plugin->render($oldRawWikiText)). Now I feed these into the
> DifferenceEngine and as a result I get a set of 'edits'. But now I
> cannot simply surround the text of these edits with <span
> class="changed">foobar</span> tags in the xHTML. The original "foobar"
> part in the HTML page might contain unbalanced HTML tags.

I tried to find a solution for comparing xHTML some time ago. There is a
Java implementation that claims to work, but I haven't tried it because
Java wasn't an option for me. There is also a Ruby solution that claims
to work, but I found an example that broke it within minutes. In other
words, I couldn't find any solution that handles complex xHTML
structures and runs on "normal" webspace. (Think of an unordered list
that is replaced by an ordered one while one of its items is changed.)
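
The list example can be made concrete with a small Python sketch. It is
only an illustration under simple assumptions, not one of the tools
mentioned in this mail: the helper names (html_tokens, is_balanced,
changed_runs) are made up, the tokenizer is a naive regex, and
self-closing tags are not handled. It diffs the two pages token by token
and checks which changed runs could be wrapped in a <span> without
breaking the markup.

```python
import difflib
import re

def html_tokens(html):
    # Naive split into tags and text runs (assumes no '<' inside text or attributes).
    return re.findall(r"<[^>]+>|[^<]+", html)

def is_balanced(tokens):
    # True if every opening tag in the run is closed inside the same run.
    stack = []
    for tok in tokens:
        m = re.match(r"<(/?)(\w+)", tok)
        if not m:
            continue  # plain text
        if m.group(1):  # closing tag
            if not stack or stack.pop() != m.group(2):
                return False
        else:
            stack.append(m.group(2))
    return not stack

def changed_runs(old, new):
    # Token runs of the new page that the diff reports as inserted or replaced.
    a, b = html_tokens(old), html_tokens(new)
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [b[j1:j2] for op, i1, i2, j1, j2 in sm.get_opcodes()
            if op != "equal" and j2 > j1]

old = "<ul><li>one</li><li>two</li></ul>"
new = "<ol><li>one</li><li>three</li></ol>"
for run in changed_runs(old, new):
    print(run, "wrappable" if is_balanced(run) else "NOT wrappable")
```

Here only the changed item text can be wrapped safely. The opening and
closing list tags show up as separate, unbalanced edits, which is the
case that character- or token-based HTML diffs stumble over.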

> 2) Compare raw wiki text
> So the other way round. This is what the DifferenceEngine normally does 
> anyway. But now I get a set of edits in the raw wiki text. How do I match 
> these edits to paragraphs in my rendered xHTML page?

This is actually the way I've implemented it. I used Text/Diff from
PEAR, which matches on word level and inserts ins/del tags. I then
adjusted the ins/del tags a bit so that, for example, they always come
after list markers and never span paragraphs. That doesn't work in all
cases, but since the markup in my case is relatively limited, it works
quite well.
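
As a rough illustration of the word-level approach, here is a minimal
Python sketch. It uses Python's difflib as a stand-in for the PEAR
package, so it is not the implementation described above; the function
names tokenize and word_diff are made up, and the sketch does no HTML
escaping and none of the later tag adjustments for list markers or
paragraphs.

```python
import difflib
import re

def tokenize(text):
    # Keep words and whitespace runs as separate tokens so the text can be rebuilt exactly.
    return re.findall(r"\s+|\S+", text)

def word_diff(old, new):
    # Return the merged text with removed words in <del> and added words in <ins>.
    a, b = tokenize(old), tokenize(new)
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:
            out.append("<del>" + "".join(a[i1:i2]) + "</del>")
        if j2 > j1:
            out.append("<ins>" + "".join(b[j1:j2]) + "</ins>")
    return "".join(out)

print(word_diff("the quick fox", "the slow fox"))
```

A change that crosses a list marker or a paragraph break would still
produce ins/del tags in awkward places, which is why a post-processing
step like the one described above is needed.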

> My Ideas:
> - write my own DifferenceEngine that's more clever? :-)

Depends on your skills and time. ;)

> - compare HTML and wrap only those changed lines with <span> tags that do
> not contain unbalanced HTML tags

That doesn't sound easy, but I might be wrong.

> - compare raw wikitext. "somehow" pass the flag: "This  
> (rawwikitext-)part has changed since last login" on to the Parser. Then 
> the Parser creates new instructions for the Renderer, e.g. 'p_open' with 
> parameter "changed". Of course this would not be a plugin anymore. It 
> would require code changes in the Parser and Renderer.

I guess that would require a lot of changes, but it might work.

> - my favorite idea: do not compare HTML as characters, but compare the  
> DOM-tree

It's really difficult. The Ruby example does that and, as I've already
said, without success. The problem is that you need really complex
rules. That means your code needs to know all the rules xHTML has, which
tags may be nested and which not and so on. And what do you do with
changes to attributes? And you might not be able to detect really
complex structural changes unless you do a lot of matching...
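
To show why a naive tree comparison falls short, here is a minimal
Python sketch. It is an assumption-laden illustration, not the Ruby or
Java tool: diff_trees is a made-up helper that compares two
well-formed documents position by position, ignores tail text, and
returns path-like labels (not real XPath) for the nodes that differ.

```python
import xml.etree.ElementTree as ET

def diff_trees(old, new, path=None):
    # Positional comparison: returns labels of nodes in `new` that differ from `old`.
    path = path or "/" + new.tag
    if old.tag != new.tag or old.attrib != new.attrib:
        return [path]  # whole subtree is reported, no attempt to look inside
    changes = []
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append(path + "/text()")
    if len(old) != len(new):
        changes.append(path + "/*")  # child count differs, no matching attempted
        return changes
    for i, (o, n) in enumerate(zip(old, new), start=1):
        changes += diff_trees(o, n, f"{path}/{n.tag}[{i}]")
    return changes

old = ET.fromstring("<div><ul><li>one</li><li>two</li></ul></div>")
text_only = ET.fromstring("<div><ul><li>one</li><li>three</li></ul></div>")
list_type = ET.fromstring("<div><ol><li>one</li><li>three</li></ol></div>")

print(diff_trees(old, text_only))
print(diff_trees(old, list_type))
```

With only the item text changed, the sketch pinpoints the second list
item. Once the list type changes as well, it can only report the whole
list as different, and the item-level change is lost. Recovering it
would take exactly the kind of extra matching rules described above.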

You can find the solutions I've found at
http://www.diigo.com/user/michitux/diff. The first 2 links are the tools
I've mentioned; the next 2 are other approaches to the problem.

Greetings
Michael Hamann
-- 
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist