[dokuwiki] Re: google crawls my export=raw?

  • From: Ben Coburn <btcoburn@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Mon, 3 Apr 2006 19:20:13 -0700


On Apr 3, 2006, at 1:17 PM, Andreas Gohr wrote:

> On Tue, 28 Mar 2006 16:08:05 +0200
> peter pilsl <pilsl@xxxxxxxxxxxx> wrote:
>
>> Is there a way to prevent Google from indexing my ?do=export_raw pages?
>>
>> Example: if you search at Google for "goldfisch malmoe" you'll find
>>
>> http://www.goldfisch.at/goldfisch/projekte?do=export_raw
>
> Hmm, no I don't, but this may be caused by Google redirecting me to
> google.de.
>
>> listed as the second result. I tried to find something to put in my
>> robots.txt, but since no wildcards are allowed in robots.txt, I didn't
>> find anything suitable.
>
> I guess you could disallow those for the Google bot by using mod_rewrite.
>
> The problem is that we can't set any meta headers because the mimetype is
> text/plain. Does anyone know of any HTTP headers we could use as an
> alternative?

I don't know if this is what you were thinking of, Andi, but how about something like this:


If mod_rewrite mode is enabled, set the alternate URLs to the form:
  _export_xhtml/wiki:dokuwiki
  _export_raw/wiki:dokuwiki
in place of
  /wiki:dokuwiki?do=export_xhtml
  /wiki:dokuwiki?do=export_raw

Then add a rewrite rule like this:
RewriteRule  ^_export_([^/]+)/(.*)  doku.php?do=export_$1&id=$2  [QSA,L]
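In DokuWiki's standard .htaccess the new rule would need to sit before the generic catch-all page rule, since both rules use [L]. A rough sketch (the surrounding rules are from memory; adjust RewriteBase for your installation):

```apache
RewriteEngine On
# New rule: map robot-friendly export URLs back to doku.php.
RewriteRule  ^_export_([^/]+)/(.*)  doku.php?do=export_$1&id=$2  [QSA,L]
# Existing generic rule: everything else is treated as a page name.
RewriteRule  ^(.*)$                 doku.php?id=$1               [QSA,L]
```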

The robots.txt file can then be used to advise robots not to crawl specific export types. After all, some export types (such as PDF) may be good for robots. For example:
User-agent: *
Disallow: /_export_raw/
Disallow: /_export_xhtml/
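Just to make the mapping explicit, here is a small Python sketch (a hypothetical helper, not part of DokuWiki) that performs the same rewrite the rule above does:

```python
import re

# Mirrors the RewriteRule:
#   ^_export_([^/]+)/(.*)  ->  doku.php?do=export_$1&id=$2
EXPORT_RE = re.compile(r"^_export_([^/]+)/(.*)$")

def rewrite_export_url(path):
    """Map a robot-friendly export path to the real doku.php query string."""
    m = EXPORT_RE.match(path)
    if not m:
        return None  # not an export URL; leave it alone
    export_type, page_id = m.groups()
    return "doku.php?do=export_%s&id=%s" % (export_type, page_id)
```

For example, `rewrite_export_url("_export_raw/wiki:dokuwiki")` yields `doku.php?do=export_raw&id=wiki:dokuwiki`, which is exactly the URL form the robots.txt entries above keep crawlers away from.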


Regards, Ben Coburn


------------------- silicodon.net -------------------

--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
