[dokuwiki] 304 Not Modified Response and Search Engine Bots

  • From: Peter Yu <web@xxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Thu, 12 Aug 2010 13:32:35 -0400

Hi everyone,

I have been looking at my web access logs, and DokuWiki does not seem to return 304 Not Modified responses to search engine bots.

There are two issues:

1) Wiki pages do not send a Last-Modified header and do not support If-Modified-Since.
2) 304 support for media files does not always work. I have some PDF documents that Google downloads over and over again.

For #1, would it be possible to add If-Modified-Since support for pages?
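To make request #1 concrete, here is a minimal Python sketch of the conditional-GET logic a page could use. It is not DokuWiki code (DokuWiki is PHP); the helper name `conditional_page_response` and the idea of passing in the page's modification time as Unix seconds are assumptions for illustration. It always sends a Last-Modified header and answers 304 when the client's If-Modified-Since is at or after the page's modification time.

```python
from email.utils import formatdate, parsedate_to_datetime

def conditional_page_response(page_mtime, if_modified_since=None):
    """Hypothetical helper: return (status, headers) for a page.

    page_mtime is the page's modification time in Unix seconds (assumed input).
    """
    # Always advertise Last-Modified so clients can make conditional requests later
    last_modified = formatdate(page_mtime, usegmt=True)
    headers = {"Last-Modified": last_modified}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the full page
        # HTTP dates have one-second resolution, so compare whole seconds
        if since is not None and int(page_mtime) <= since:
            return 304, headers
    return 200, headers
```

With a page last modified at Sat, 24 Jul 2010 02:56:44 GMT, an If-Modified-Since of that date or any later date yields 304; an earlier date, a missing header, or an unparseable header yields 200.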

For #2, I did a bit of investigation and found the function http_conditionalRequest in inc/httputils.php. It looks like the 304 is only issued if the If-Modified-Since date sent by the user agent is exactly equal to the $last_modified date.

I confirmed the behaviour using wget. The file's Last-Modified header is "Sat, 24 Jul 2010 02:56:44 GMT". Issuing the following gets me a 304:

wget --header='If-Modified-Since: Sat, 24 Jul 2010 02:56:44 GMT' -S http://mysite/myfile.pdf

But if I change the If-Modified-Since date to one after the Last-Modified date, I get a 200 and a full download.

I don't know whether search engine bots like Google send back, as If-Modified-Since, the Last-Modified date they received on their previous visit, or the date of that visit itself. If it is the latter, they will never get a 304.

If I change this line in http_conditionalRequest:

if ($if_modified_since && $if_modified_since != $last_modified)

to:

if ($if_modified_since && strtotime($if_modified_since)
        < strtotime($last_modified))

then the 304 gets sent properly to wget. I have not tried it on a live site because I don't know if it is correct.
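To compare the two checks side by side, here is a hypothetical Python model. The function names are made up, and it models only the If-Modified-Since comparison quoted above, not anything else http_conditionalRequest may do. `exact_match_304` mirrors the current `!=` test; `ordered_304` mirrors the proposed strtotime comparison. Each returns True when a 304 would be sent.

```python
from email.utils import parsedate_to_datetime

def _http_date_to_ts(value):
    """Parse an HTTP date string; return None if it cannot be parsed."""
    try:
        return parsedate_to_datetime(value).timestamp()
    except (TypeError, ValueError):
        return None

def exact_match_304(if_modified_since, last_modified):
    """Current rule as described: 304 only when the two strings are identical."""
    return bool(if_modified_since) and if_modified_since == last_modified

def ordered_304(if_modified_since, last_modified):
    """Proposed rule: 304 unless the file changed after the client's date."""
    if not if_modified_since:
        return False
    since = _http_date_to_ts(if_modified_since)
    modified = _http_date_to_ts(last_modified)
    if since is None or modified is None:
        return False  # unparseable date: fall back to a full 200 response
    # Same as the PHP: send the full file only if since < modified
    return since >= modified
```

Under the exact-match rule, a bot that sends its own visit date (for example "Sun, 01 Aug 2010 10:00:00 GMT") never gets a 304; under the ordered rule it does. One thing the sketch leaves out: RFC 2616 section 14.25 treats an If-Modified-Since date later than the server's current time as invalid, so a production check might also guard against that.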

I have searched the mailing list and the bug tracker for mentions of 304 but didn't find anything related to this, so I wanted to ask here. Thanks very much.
--
DokuWiki mailing list - more info at
http://www.dokuwiki.org/mailinglist
