[dokuwiki] Re: google crawls my export=raw?

  • From: Ben Coburn <btcoburn@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Mon, 3 Apr 2006 19:20:13 -0700


On Apr 3, 2006, at 1:17 PM, Andreas Gohr wrote:

> On Tue, 28 Mar 2006 16:08:05 +0200
> peter pilsl <pilsl@xxxxxxxxxxxx> wrote:
>
>> Is there a way to prevent Google from indexing my ?do=export_raw pages?
>>
>> Example: if you search at Google for "goldfisch malmoe" you'll find
>>
>> http://www.goldfisch.at/goldfisch/projekte?do=export_raw
>
> Hmm, no I don't, but this may be caused by Google redirecting me to
> google.de.
>
>> listed as the second result. I tried to find something to put in my
>> robots.txt, but since no wildcards are allowed in robots.txt, I didn't
>> find anything suitable.
>
> I guess you could disallow those for the Google bot by using mod_rewrite.
>
> The problem is that we can't set any meta headers because the mimetype is
> text/plain. Does anyone know of any HTTP headers we could use as an
> alternative?

I don't know if this is what you were thinking of, Andi, but how about something like this:


If mod_rewrite mode is enabled, set the alternate URLs to the form:
  _export_xhtml/wiki:dokuwiki
  _export_raw/wiki:dokuwiki
in place of
  /wiki:dokuwiki?do=export_xhtml
  /wiki:dokuwiki?do=export_raw

Then add a rewrite rule like this:
RewriteRule  ^_export_([^/]+)/(.*)  doku.php?do=export_$1&id=$2  [QSA,L]
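In DokuWiki's standard .htaccess the new rule would need to sit before the generic catch-all page rule, since both rules use [L]. A rough sketch (the surrounding rules are from memory; adjust RewriteBase for your installation):

```apache
RewriteEngine On
# New rule: map robot-friendly export URLs back to doku.php.
RewriteRule  ^_export_([^/]+)/(.*)  doku.php?do=export_$1&id=$2  [QSA,L]
# Existing generic rule: everything else is treated as a page name.
RewriteRule  ^(.*)$                 doku.php?id=$1               [QSA,L]
```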

The robots.txt file can then be used to advise robots not to crawl specific export types. After all, some export types (such as PDF) may be good for robots. For example:
User-agent: *
Disallow: /_export_raw/
Disallow: /_export_xhtml/
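Just to make the mapping explicit, here is a small Python sketch (a hypothetical helper, not part of DokuWiki) that performs the same rewrite the rule above does:

```python
import re

# Mirrors the RewriteRule:
#   ^_export_([^/]+)/(.*)  ->  doku.php?do=export_$1&id=$2
EXPORT_RE = re.compile(r"^_export_([^/]+)/(.*)$")

def rewrite_export_url(path):
    """Map a robot-friendly export path to the real doku.php query string."""
    m = EXPORT_RE.match(path)
    if not m:
        return None  # not an export URL; leave it alone
    export_type, page_id = m.groups()
    return "doku.php?do=export_%s&id=%s" % (export_type, page_id)
```

For example, `rewrite_export_url("_export_raw/wiki:dokuwiki")` yields `doku.php?do=export_raw&id=wiki:dokuwiki`, which is exactly the URL form the robots.txt entries above keep crawlers away from.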


Regards, Ben Coburn


------------------- silicodon.net -------------------

--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
