[dokuwiki] Re: Performance and caching

  • From: "Joe Lapp" <joe.lapp@xxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Wed, 07 Sep 2005 17:18:39 -0500 (CDT)

Harry, thanks for naming specific issues that would need to be addressed.

My thoughts on link color: Perhaps we can live with link color that might be 
wrong for the 2 or 3 minutes needed for a page to go stale.  If the page has 
been newly created, clicking on a red link will still get you there.  If 
logged-in users get fresh pages, their links will always be right.  Besides, 
you can always force-invalidate page caches, should the backlink system be 
suitable for this.  At the very least, you can force a refresh on the red-link 
referring page.

The last point is an important one to remember too.  If DokuWiki knows which 
pages are made obsolete by an event, it can always invalidate the cache files 
for those pages, forcing them to refresh on next load.
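
To make that concrete, here's a rough sketch of the kind of invalidation I 
mean (the function name, the backlink lookup, and the cache layout are just my 
assumptions, not DokuWiki's actual internals):

  <?php
  // Rough sketch only: names and cache layout are invented for illustration.
  // Given a page that was just created or deleted, drop the cached XHTML of
  // every page known to link to it, so their red/green links get recomputed
  // on the next request.
  function invalidate_xhtml_cache(array $pageIds, $cacheDir)
  {
      // $pageIds: pages whose cached XHTML is now suspect (e.g. backlinks of
      // the page that was just created or deleted).
      foreach ($pageIds as $id) {
          $cacheFile = $cacheDir . '/' . md5($id) . '.xhtml';
          if (is_file($cacheFile)) {
              unlink($cacheFile);   // next hit falls through to the renderer
          }
      }
  }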

I'm not fond of asking the client to issue more requests, as a separate way to 
validate links.  Servicing a new connection is one of the more resource-costly 
tasks we can ask of a server.

It really isn't an issue to live with pages that are stale by minutes.  The 
only users who are going to know the difference are those who just created or 
deleted a page -- and possibly only if they're not logged in.  They'll need to 
be aware of the staleness interval.  But you can even hide that a bit by 
freshening the HTTP referrer ("referer") page and any backlinks DokuWiki is 
aware of.

~joe

----- Start Original Message -----
From: Harry Fuecks <hfuecks@xxxxxxxxx>
To: dokuwiki@xxxxxxxxxxxxx
Subject: [dokuwiki] Re: Performance and caching

> Have to confess I'm in Joe's camp, reading this and the comments on 
> http://wiki.splitbrain.org/wiki:discussion:performance. Not that I really 
> have a high traffic site to worry about - it's just a perfectionist thing. 
> Injecting a little competition, I think Dokuwiki could top Mediawiki as a 
> "high load" wiki engine.
> 
> Two particular thoughts.
> 
> As Andi points out, the XHTML has more dependencies than the instructions, so 
> it is harder to cache. But can those dependencies be removed? Obviously it can 
> be done by losing the functionality, but what about shifting stuff client side 
> and keeping the functionality? The trace, for example, is currently handled 
> server-side with PHP sessions. What about moving all that client side, using 
> cookies and some DOM tweaking with Javascript? Also the red/green links - 
> assuming they are internal links, it might be possible to check these using 
> XMLHttpRequest and HTTP HEAD requests (or a GET request with ?exists at the 
> end of the URL) - the content isn't returned, just the correct HTTP status 
> code. Doing all that would allow whole-page caching, at least for the 
> documents themselves (the most commonly requested) - things like backlinks 
> might get harder depending on how they're done.
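> 
> Very roughly, on the server end that could be something like this (a sketch 
> only -- the ?exists parameter and the wiring are made up here, and I'm 
> assuming wikiFN() is what maps a page ID to its file):
> 
>   if (isset($_GET['exists'])) {
>       $id   = isset($_GET['id']) ? $_GET['id'] : '';
>       $file = wikiFN($id);    // assumed: page ID -> file path
>       header(file_exists($file) ? 'HTTP/1.0 200 OK' : 'HTTP/1.0 404 Not Found');
>       exit;                   // no body sent -- the status code is the answer
>   }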
> 
> The other point is using HTTP headers and client-side caching. There's a lot 
> that can be done here to drastically reduce load, although it essentially 
> depends on having a whole-page cache. This is basically the whole REST (
> http://en.wikipedia.org/wiki/REST) buzz people have been going on about - 
> use what HTTP already offers.
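> 
> For instance (sketch only; the cache path is a placeholder), a conditional 
> GET against a whole-page cache could be as little as:
> 
>   $cacheFile = '/path/to/cache/some_page.xhtml';            // placeholder
>   $mtime = filemtime($cacheFile);
>   header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
>   if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
>       strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
>       header('HTTP/1.0 304 Not Modified');    // client re-uses its local copy
>       exit;
>   }
>   readfile($cacheFile);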
> 
> On 9/7/05, Joe Lapp <joe.lapp@xxxxxxxxx> wrote:
> > 
> > Thanks for the explanation, Andi. That makes sense. But I still disagree 
> > on the issue of server capacity.
> > 
> > I was thrown by the "$renderer->info['cache'] = false;" feature that seems 
> > to turn page caching on or off for the entire page. I figured that if 
> > you're refreshing at the page level but caching at the syntactic element 
> > level, then caching would be far from optimal. I didn't snoop around the 
> > code; is there a separate $info[] array for each syntactic element? Is this 
> > what you're saying?
> > 
> > However, I do want to clarify one thing. Every time you reduce resource 
> > usage you necessarily improve the **peak load** capacity. For example, 
> > assembling the page requires resources (CPU time, heap, disk access) that 
> > can be avoided by not assembling the page. In this case, I expect peak load 
> > capacity to go up significantly, because right now you're doing work when 
> > the alternative is to do almost nothing (read on).
> > 
> > I suspect you're right that the **time** required to load a page, on 
> > average, would not change much by going to page-level caching, provided 
> > that the server is not running near capacity. There would be little change 
> > to off-peak performance. The real issue is: at what point does the server 
> > approach capacity, where connections start to become unavailable or clients 
> > sporadically start timing out?
> > 
> > So there's the question of what would have to happen in order to do 
> > page-level caching and whether this would use significantly fewer 
> > resources. I agree that a cached dynamic page will not contain real-time 
> > results -- it will be stale by some period of time, usually just minutes. 
> > If you want real-time behavior equivalent to what you offer now (if I 
> > understand you properly), you'd have to turn caching off for that page. But 
> > if a page is static, or if a page can live with being stale by 2 or 3 
> > minutes, you can make a page retrieval use little more in resources than 
> > would be required for the web server to retrieve the page directly.
> > 
> > Again, the issue is server resources, not round-trip time. With page-level 
> > caching you would only load the PHP source files needed to decide whether a 
> > page can be retrieved from cache or must be refreshed. Loading only a few 
> > PHP files itself significantly reduces server resource usage and improves 
> > peak capacity. If the user is logged in and the page needs to contain 
> > user-specific info, then the page would not be drawn from cache but 
> > instead rebuilt via the existing cached-instructions mechanism -- providing 
> > some caching benefit even there. (Likewise for other no-cache signals.)
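> > 
> > As a mock-up of that "do almost nothing" path (the file names, the cookie 
> > check and the 3-minute window below are placeholders of mine, not how 
> > DokuWiki works today):
> > 
> >   $id        = isset($_GET['id']) ? $_GET['id'] : 'start';
> >   $cacheFile = '/var/www/dokuwiki/data/cache/pages/' . md5($id) . '.html';
> >   $maxAge    = 180;                         // tolerate ~3 minutes of staleness
> >   $loggedIn  = isset($_COOKIE['DokuWiki']); // crude stand-in for no-cache signals
> > 
> >   if (!$loggedIn && is_file($cacheFile)
> >       && (time() - filemtime($cacheFile)) < $maxAge) {
> >       readfile($cacheFile);                 // serve the whole page as-is
> >       exit;                                 // only a couple of PHP files loaded
> >   }
> >   // otherwise fall through to the normal render path and refresh the cache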
> > 
> > Unless I'm missing something, page-level caching should greatly improve 
> > the load capacity of DokuWiki servers whose clients are primarily read-only 
> > anonymous visitors. I suspect that most installations fit this profile, so 
> > page-level caching would help them tremendously.
> > 
> > I think it's the difference between allowing DokuWiki to be used on 
> > professional, high-volume sites versus remaining largely a cool hobbyist 
> > tool.
> > 
> > And I do think DokuWiki is a cool hobbyist tool. I'm just a perfectionist 
> > with big aspirations.
> > 
> > Best,
> > ~joe
> > 
> > P.S. I think the best way to deal with this is to mock up an 
> > imperfect-but-close page-level caching scheme, and then to compare peak 
> > load benchmarks (not page-gen times) before and after.
> > 
> > ----- Start Original Message -----
> > From: Andreas Gohr <andi@xxxxxxxxxxxxxx>
> > To: dokuwiki@xxxxxxxxxxxxx
> > Subject: [dokuwiki] Performance and caching (was: Roadmap for next 
> > release)
> > 
> > > On Mon, 05 Sep 2005 17:24:29 -0500 (CDT)
> > > "Joe Lapp" <joe.lapp@xxxxxxxxx> wrote:
> > >
> > > > Have you seen my gibbery mumbo jumbo in the performance discussion?
> > > > http://wiki.splitbrain.org/wiki:discussion:performance
> > >
> > > Yes, I read it but haven't had the time to answer. I'm not sure if
> > > page-level caching would increase performance very much.
> > >
> > > You've read about the two-stage caching and wondered what it is for. Let
> > > me explain how the current caching works. Rendering a wiki page consists
> > > of two parts: parsing (creating instructions) and rendering (creating
> > > XHTML). Both tasks are time-consuming. But the data the two tasks create
> > > differ greatly in how long they remain valid.
> > >
> > > Instructions depend only on a single page (their source), and as long as
> > > the source isn't changed the instructions do not change. This means the
> > > instruction cache expires only when its source is changed, which happens
> > > relatively seldom.
> > >
> > > The output (XHTML in our current case) depends on its instructions
> > > (from step 1) but on other pages as well. For example, a link
> > > becomes red or green depending on its target. So we need to expire all
> > > XHTML cache files if a page is added or removed. We also need to add a
> > > cache timeout for things that need to be updated periodically (e.g. RSS
> > > feed inclusions). You see the XHTML cache isn't very durable.
> > >
> > > You now understand why there are two stages in caching. If a final cache
> > > is available it is quickest to use it, but this XHTML cache may already
> > > be out of date. In that case we can still rely on the instructions cache
> > > to save half of the full rendering time.
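> > >
> > > In pseudo-PHP the two stages look roughly like this (the function names
> > > are simplified stand-ins, not the real code):
> > >
> > >   function render_page($id) {
> > >       if ($xhtml = xhtml_cache_get($id)) {        // stage 2: cheapest path
> > >           return $xhtml;
> > >       }
> > >       $ins = instruction_cache_get($id);          // stage 1: still skips parsing
> > >       if ($ins === false) {
> > >           $ins = parse(rawWiki($id));             // full cost, only after an edit
> > >           instruction_cache_put($id, $ins);
> > >       }
> > >       $xhtml = render_xhtml($ins);                // link colors etc. recomputed
> > >       xhtml_cache_put($id, $xhtml);
> > >       return $xhtml;
> > >   }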
> > >
> > > Okay, now back to your proposal of adding another "whole page" cache. It
> > > wouldn't save any time on rendering the content itself, as this is
> > > already covered by the mechanisms mentioned above. It would only save
> > > some time on all the things that happen around the content. But to make
> > > sure the user never gets a wrong page we would have to do a lot of
> > > checks and exceptions. We would need to do the checks already done for
> > > the current XHTML cache, we would need to check the authentication,
> > > breadcrumbs shouldn't be cached, we would need to provide a way to
> > > exclude stuff from the cache for templates, and so on...
> > >
> > > I hope I have made it somewhat clear why I don't think a global page
> > > cache would improve much. However, I _do_ think there is lots of room for
> > > improvement performance-wise. Many things could probably be done more
> > > efficiently, and I would be happy about any patches that speed up certain
> > > functions.
> > >
> > > Andi
> > >
> > > --
> > > http://www.splitbrain.org
> > > --
> > > DokuWiki mailing list - more info at
> > > http://wiki.splitbrain.org/wiki:mailinglist
> > >
> > 
> > ----- End Original Message -----
> > --
> > DokuWiki mailing list - more info at
> > http://wiki.splitbrain.org/wiki:mailinglist
> >

----- End Original Message -----
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
