[dokuwiki] Solving sitemaps for farms

  • From: Michael Hamann <michael@xxxxxxxxxxxxxxxx>
  • To: dokuwiki <dokuwiki@xxxxxxxxxxxxx>
  • Date: Sat, 13 Mar 2010 13:21:01 +0100

Hi everybody,

for quite a while now (since May 2009) there has been a feature request for
more flexible sitemap generation in the bug tracker, see
http://bugs.dokuwiki.org/index.php?do=details&task_id=1693.

The topic has also come up on this list before, but the discussion ended without
a decision or an accepted concrete plan. As farm support has been around for
some time now and sitemaps are, from my point of view, the only really broken
feature in farms, I think it would be a good idea to fix them.

What makes sitemaps a bit special is that they need to be served from the
directory they describe, i.e. the DokuWiki root directory (a sitemap may only
list URLs at or below its own location). In farm setups this is not DOKU_INC,
and such a directory might not even exist, as the only directories that are
necessary are conf and data, which may be outside the document root. That's
why we can't just put the sitemap file into the right directory: there might be
no such directory.

In other words: sitemaps need to be served by a PHP script. As cluttering the
DokuWiki root directory with another script doesn't seem like a good idea (at
least IMHO), I think doku.php should serve the sitemap.
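To make "served by a script" a bit more concrete, here is a minimal sketch,
written in Python purely for illustration (the real thing would be PHP inside
doku.php). It assumes the indexer stores the sitemap as a hypothetical file
sitemap.xml.gz inside the data directory of the wiki that received the request;
the function name serve_sitemap and the file name are assumptions, not existing
DokuWiki code. The script simply returns that file, or a 404 if no sitemap has
been generated yet.

```python
import os
from typing import Dict, Tuple

# Hypothetical name of the generated sitemap inside a wiki's data directory.
SITEMAP_NAME = "sitemap.xml.gz"


def serve_sitemap(data_dir: str) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a sitemap request.

    data_dir is the data directory of the wiki that received the request;
    in a farm every wiki has its own, possibly outside the document root.
    """
    path = os.path.join(data_dir, SITEMAP_NAME)
    if not os.path.isfile(path):
        # The indexer has not generated a sitemap for this wiki yet.
        return 404, {"Content-Type": "text/plain"}, b"No sitemap available"
    with open(path, "rb") as fh:
        body = fh.read()
    headers = {
        # The stored file is gzip-compressed XML, sent as-is.
        "Content-Type": "application/x-gzip",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body
```

Because the lookup only depends on the data directory, the same code works for
every wiki in a farm, whether or not a matching root directory exists on disk.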

A couple of questions remain. First of all: should the sitemap code remain in
the core, or should it rather be moved into a plugin? As I don't think sitemaps
are that essential, and as there might be wishes for e.g. custom ranks for each
page, I think it might be a good idea to move it into a plugin so that it is
easily replaceable. Another option would be to provide an event that allows
plugins to modify the sitemap data.
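The event variant could look roughly like the following sketch, again in Python
for illustration only. SitemapEvent, SitemapEntry and render are hypothetical
names, not DokuWiki's actual event API; the idea is just that the core builds
the list of entries, lets registered handlers modify them (e.g. to set custom
ranks via the sitemaps.org priority field), and only then writes the XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SitemapEntry:
    """One <url> element of the sitemap."""
    loc: str
    lastmod: str
    priority: float = 0.5  # sitemaps.org default priority


# A handler receives the full list of entries and may modify it in place.
Handler = Callable[[List[SitemapEntry]], None]


class SitemapEvent:
    """Hypothetical hook that lets plugins adjust the sitemap data."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def register(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def trigger(self, entries: List[SitemapEntry]) -> List[SitemapEntry]:
        # Run handlers in registration order, then hand the result on.
        for handler in self._handlers:
            handler(entries)
        return entries


def render(entries: List[SitemapEntry]) -> str:
    """Serialize the entries as a sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry.loc
        ET.SubElement(url, "lastmod").text = entry.lastmod
        ET.SubElement(url, "priority").text = f"{entry.priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")
```

A plugin wanting custom ranks would then register a handler that, for example,
raises the priority of start pages, without having to replace the generator.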

There might also be options to e.g. generate sitemaps for specific user
permissions or for namespaces, but I don't think that's a common need, and it
would be the only reason to make sitemap generation dynamic (instead of an
indexer task). So I think sitemap generation could and should remain an indexer
task.

Another important question is how to provide backwards compatibility and what
to do with the old file. Users who already use .htaccess for URL rewriting could
simply add a rewrite rule, and search engines should then probably be pinged
with that rewritten URL. Deleting the old sitemap file and replacing it with a
directory of the same name that contains an index.php file would probably work,
too: the web server then sends a redirect, e.g. to sitemap.xml/, and index.php
could in turn send a permanent redirect to the new location of the sitemap, so
everyone who knows the old sitemap URL will also find the new one (given they
follow two redirects). There probably needs to be a config setting for the URL
that is passed as parameter when pinging search engines. Making the list of
search engines to ping configurable would probably be a good idea, too.
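A minimal Python sketch of those two settings and of the redirect, assuming a
hypothetical sitemap_url setting (the public URL to announce, e.g. the
rewritten one) and a hypothetical ping_endpoints list of search engine ping
URLs that take the sitemap URL as a query parameter. The example.org endpoints
and the do=sitemap action below are placeholders, not real services or an
existing DokuWiki action.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote


def build_ping_urls(sitemap_url: str, ping_endpoints: List[str]) -> List[str]:
    """Append the URL-encoded sitemap URL to each configured ping endpoint."""
    # safe="" also encodes "/", ":", "?" and "=" so the whole URL fits
    # into a single query parameter value.
    encoded = quote(sitemap_url, safe="")
    return [endpoint + encoded for endpoint in ping_endpoints]


def legacy_redirect(new_location: str) -> Tuple[int, Dict[str, str]]:
    """Response the old sitemap location would send: a permanent redirect."""
    return 301, {"Location": new_location}


# Hypothetical configuration values.
SITEMAP_URL = "http://wiki.example.org/doku.php?do=sitemap"
PING_ENDPOINTS = [
    "http://search-a.example.org/ping?sitemap=",
    "http://search-b.example.org/submit?url=",
]
```

Keeping the endpoints as plain configuration means a new search engine can be
added without touching the code, and farm wikis can each announce their own URL.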

All in all I think the new version will, unfortunately, need a lot more
configuration options, which is another reason why I think it would be better to
move that stuff into a plugin.

I could also think of a completely different fix: additionally write the
sitemap somewhere into the data directory (which needs to be done anyway) and
provide a doku.php action that serves it, but leave the old file and URL in
place. Farm owners (who normally have a custom server setup or URL rewriting
anyway) would then need to provide a rewrite rule that serves the correct
sitemap file, while everybody else won't need to care about the change.
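On the indexer side, this alternative might look like the following Python
sketch. write_sitemaps is a hypothetical name, and the file name sitemap.xml.gz
in both the data directory and the old location is an assumption. The sitemap
is always written to the data directory (the copy the doku.php action would
serve), and the old file is only kept up to date if the wiki root directory
actually exists.

```python
import gzip
import os
from typing import List, Optional


def write_sitemaps(xml: bytes, data_dir: str, legacy_dir: Optional[str]) -> List[str]:
    """Write the sitemap to the data directory and, if possible, the old location.

    Returns the list of paths that were written.
    """
    data = gzip.compress(xml)
    os.makedirs(data_dir, exist_ok=True)
    targets = [os.path.join(data_dir, "sitemap.xml.gz")]
    # Only keep the old file if the wiki root actually exists on disk;
    # in farm setups it may not, and farm owners use a rewrite rule instead.
    if legacy_dir is not None and os.path.isdir(legacy_dir):
        targets.append(os.path.join(legacy_dir, "sitemap.xml.gz"))
    for path in targets:
        with open(path, "wb") as fh:
            fh.write(data)
    return targets
```

For a single wiki nothing visible changes: the old URL keeps working, and the
data-directory copy simply exists alongside it.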

The whole change might also be done by a plugin that duplicates the core
functionality and would only be needed by those who want more options for their
sitemaps. The core code might also be restructured so that the plugin could
reuse it instead of duplicating the functionality.

If any of you have different ideas, opinions, etc. on how sitemaps should be
made ready for farms, please tell me!

I'd be happy to receive your feedback on this issue, as I haven't really
decided what to do yet. At the moment I think that the last option described
might be the best, as everything else might either confuse users who shouldn't
need to care about much, or seem a bit messy.

Of course I will also provide an implementation of whichever option we choose
(unless somebody else wants to provide one).

Greetings
Michael
-- 
DokuWiki mailing list - more info at
http://www.dokuwiki.org/mailinglist
