Hi,

On Sun, Apr 05, 2009 at 05:58:37PM +0200, Robert Rackl wrote:
[...]
> My problem is now: Should I compare raw wikitext or HTML?
>
> 1) Compare HTML
> I can quite simply get the old and new version of the page in xHTML (the
> new version I already get for free in my RENDERER_CONTENT_POSTPROCESS
> action plugin. And the old version I can get via
> Plugin->render($oldRawWikiText) ). Now I feed this into the
> DifferenceEngine and as a result I get a set of 'edits'. But now I
> cannot simply surround the text of these edits with <span
> class="changed">foobar</span> tags in the xHTML. The original "foobar"
> part in the HTML page might contain unbalanced HTML tags.

I tried to find a solution for comparing xHTML some time ago. There is a
Java implementation that claims to work, but I haven't tried it, as Java
wasn't an option for me. There is also a Ruby solution that claims to
work, but I was able to find an example that broke it within minutes. In
other words: I couldn't find any solution that works with complex xHTML
structures and runs on "normal" webspace. (Think of an unordered list
that is replaced by an ordered one, with one item changed as well.)

> 2) Compare raw wiki text
> So the other way round. This is what the DifferenceEngine normally does
> anyway. But now I get a set of edits in the raw wiki text. How do I match
> these edits to paragraphs in my rendered xHTML page?

This is actually the way I've implemented it. I used Text/Diff from
PEAR, which matches on word level and inserts ins/del tags. Then I
adjusted the ins/del tags a bit so that, for example, they always come
after the markers for lists and never span paragraphs. That doesn't work
in all cases, but as the markup in my case is relatively limited, it
works quite well.

> My Ideas:
> - write my own DifferenceEngine that's more clever? :-)

Depends on your skills and time. ;)

> - compare HTML: and wrap only changed lines with <span> tags that do not
> contain unbalanced html tags

That doesn't sound easy, but I might be wrong.

> - compare raw wikitext. "somehow" pass the flag: "This
> (rawwikitext-)part has changed since last login" on to the Parser. Then
> the Parser creates new instructions for the Renderer, e.g. 'p_open' with
> parameter "changed". Of course this would not be a plugin anymore. It
> would require code changes in the Parser and Renderer.

I guess that would require a lot of changes, but it might work.

> - my favorite idea: do not compare HTML as characters, but compare the
> DOM-tree

That's really difficult. The Ruby example does exactly that and, as I've
already said, without success. The problem is that you need really
complex rules: your code needs to know all the rules xHTML has, which
tags may be nested and which may not, and so on. And what do you do with
changes to attributes? You also might not be able to detect really
complex structural changes unless you do a lot of matching...

You can find the solutions I've found at
http://www.diigo.com/user/michitux/diff. The first two links are the
tools I've mentioned, the next two are other approaches to the problem...

Greetings
Michael Hamann
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
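
P.S. For illustration only, here is a minimal sketch of the word-level
approach described above: diff the raw text word by word, wrap changes
in ins/del tags, and keep those tags behind the list marker. It is not
the code from this thread. It assumes Python's difflib instead of PEAR's
Text/Diff, it assumes list items start with "  * " or "  - ", and the
names word_diff, split_marker and diff_line are made up for the example.

```python
import difflib
import html
import re

# Assumption: a list item starts with optional indentation, "*" or "-", then spaces.
LIST_MARKER = re.compile(r"^(\s*[*-]\s+)")


def word_diff(old: str, new: str) -> str:
    """Compare two strings word by word and mark changes with <del>/<ins>."""
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            out.append(added)
            continue
        # "replace" has both parts, "delete" and "insert" only one of them.
        if removed:
            out.append(f"<del>{removed}</del>")
        if added:
            out.append(f"<ins>{added}</ins>")
    return " ".join(out)


def split_marker(line: str) -> tuple[str, str]:
    """Split a line into (list marker, rest); the marker is "" for non-list lines."""
    match = LIST_MARKER.match(line)
    if match:
        return match.group(1), line[match.end():]
    return "", line


def diff_line(old_line: str, new_line: str) -> str:
    """Diff one line, keeping the new list marker outside the ins/del tags."""
    _, old_body = split_marker(old_line)
    marker, new_body = split_marker(new_line)
    return marker + word_diff(old_body, new_body)


if __name__ == "__main__":
    print(word_diff("the quick brown fox", "the slow brown fox"))
    print(diff_line("  * first item", "  - first entry"))
```

Calling diff_line once per line (or per paragraph) means a tag can never
cross a paragraph boundary, which is the second adjustment mentioned
above. A changed list type (unordered to ordered) is simply taken from
the new line here and not marked as a change; whether that is acceptable
depends on how limited your markup is.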