[haiku-appserver] Re: accelerating app_server

  • From: Stephan Assmus <superstippi@xxxxxx>
  • To: haiku-appserver@xxxxxxxxxxxxx
  • Date: Mon, 23 Jul 2007 15:49:41 +0200

Axel Dörfler wrote (2007-07-23, 02:38:41 [+0200]):
> Stephan Assmus <superstippi@xxxxxx> wrote:
> > One thing I noticed in my performance comparisons is that our
> > client->server communication seems to take too much time. In some
> > cases it takes us more than double the time just to figure out that
> > we don't need to do anything (disregarding drawing commands outside
> > of the current clipping region). Our drawing implementation itself
> > is absolutely fast enough, and so is the clipping. But the
> > communication overhead is quite large. I have looked at our
> > LinkSender implementation, but it looks fine to me. Our
> > BLooper::check_lock() also seems to take too much time. I don't know
> > why, it looks fast. (check_lock() is called in every drawing
> > function.)
> >
> > I have a test where I draw 100 individual points using StrokeLine()
> > and measure the time between two Sync()s. Running the program on
> > ZETA produces these results:
> >
> > drawing outside clipping region: 93 µsecs
> > with actual drawing: 213 µsecs
> > increase: 120 µsecs
> >
> >
> > running in the app_server test environment:
> >
> > drawing outside clipping region: 205 µsecs
> > with actual drawing: 382 µsecs
> > increase: 177 µsecs
> >
> > ... the increase is just 57 µsecs more for the test environment, and
> > that is for drawing into a bitmap and making sure a BView is
> > invalidated eventually for every single dot. So the actual drawing
> > is not the problem.
> 
> Have you tried comparing the two when running in a BDirectWindow?
> Anyway, it's nice to compare them this way; at least missing Haiku
> kernel optimizations won't matter then :-)

The current Accelerant-based test environment drops me into the kernel 
debugger on Dano, with app_server calling a function named 
commit_suicide()... I have looked into it, but making just a 
BDirectWindow version seemed like a bit too much work right now. Maybe 
I'll have another look later.
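
For reference, the timing test quoted above boils down to something like 
this (a stripped-down sketch only - the real test also makes sure a 
BView is invalidated for every dot, the app/window setup is left out, 
and "view" and kPointCount are just placeholders):

#include <OS.h>
#include <View.h>
#include <Window.h>
#include <stdio.h>

// sketch: "view" is assumed to be a valid BView attached to a window,
// with the clipping region set up elsewhere
static const int32 kPointCount = 100;

void
time_stroke_lines(BView* view)
{
	view->Window()->Lock();

	view->Sync();
		// make sure all previous commands have been processed

	bigtime_t start = system_time();

	for (int32 i = 0; i < kPointCount; i++) {
		// a one pixel "line" per point
		BPoint point(10.0 + i, 10.0);
		view->StrokeLine(point, point);
	}

	view->Sync();
		// wait until the server has processed everything

	bigtime_t elapsed = system_time() - start;
	printf("%ld points: %lld µsecs\n", (long)kPointCount,
		(long long)elapsed);

	view->Window()->Unlock();
}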

> > On the client side, we are looking at these numbers:
> >
> > Dano: 15 µsecs
> > test environment: 45 µsecs
> >
> > ... to fire off the 100 StrokeLine commands to the server. 20 µsecs
> > of our number are just the check_lock() implementation (using
> > find_thread(NULL)).
> 
> It looks like the BeOS BLooper::check_lock() implementation uses the
> fCachedStack member - just like the MultiLocker implementation does.
> AFAICT this shouldn't really result in a speedup on x86 machines,
> though, only on PPC...
> Have you tried calling their version vs. our version directly in a
> loop a few hundred thousand times?

Not yet.
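
(For the record, such a comparison would boil down to something like the 
loop below. It only times the find_thread(NULL) call itself against 
reading a cached value - the second loop is just a stand-in for whatever 
syscall-free check Dano really does.)

#include <OS.h>
#include <stdio.h>

int
main()
{
	const int32 kRuns = 1000000;

	// time find_thread(NULL), which our check_lock() calls every time
	bigtime_t start = system_time();
	for (int32 i = 0; i < kRuns; i++)
		find_thread(NULL);
	bigtime_t findThreadTime = system_time() - start;

	// time reading a cached thread id instead (volatile so the loop
	// is not optimized away)
	volatile thread_id cached = find_thread(NULL);
	volatile thread_id current;
	start = system_time();
	for (int32 i = 0; i < kRuns; i++)
		current = cached;
	bigtime_t cachedTime = system_time() - start;

	printf("find_thread(NULL): %lld µsecs\n", (long long)findThreadTime);
	printf("cached id:         %lld µsecs\n", (long long)cachedTime);
	return 0;
}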

> > The rest of the additional delay seems to be just our communication 
> > overhead.
> 
> There, it would be interesting to see how much the client writes to the 
> server over the port; maybe it actually uses shared memory for the 
> communication.
> 
> > So the question is, I guess, does anybody have any ideas on how to
> > cut down on those times?
> 
> It's definitely helpful to pin down the performance hogs more
> specifically, i.e. which function exactly takes longer than it should,
> and why. In the case above, if find_thread(NULL) actually takes more
> time than whatever Dano does here, we should use that hack, too :-)

BLooper::check_lock() is definitely a problem (and it pretty much only 
calls find_thread(NULL)), but I am wondering - which version is used in 
the test environment, ours or Dano's?
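
For context, the fCachedStack trick Axel mentions works roughly like 
this (my own sketch of the MultiLocker-style check, not the actual 
BLooper code - the names are made up): the locking thread's stack range 
is cached once, and later calls only check whether the address of a 
local variable falls into that range, instead of asking the kernel for 
the current thread id every time.

#include <OS.h>

// hypothetical illustration of a stack-range based ownership check;
// not the real BLooper/MultiLocker code
class StackBasedOwnershipCheck {
public:
	StackBasedOwnershipCheck()
		: fCachedStackBase(NULL), fCachedStackEnd(NULL) {}

	// called once when the lock is acquired
	void RememberLockingThread()
	{
		thread_info info;
		if (get_thread_info(find_thread(NULL), &info) == B_OK) {
			fCachedStackBase = (uint8*)info.stack_base;
			fCachedStackEnd = (uint8*)info.stack_end;
		}
	}

	// called on every check_lock()-like call; no syscall needed
	bool IsLockingThread() const
	{
		// the address of a local variable lies on the calling
		// thread's stack
		uint8* pointerOnStack = (uint8*)&pointerOnStack;
		return pointerOnStack >= fCachedStackBase
			&& pointerOnStack < fCachedStackEnd;
	}

private:
	uint8*	fCachedStackBase;
	uint8*	fCachedStackEnd;
};

Whether skipping the syscall actually buys anything on x86 is exactly 
what a direct loop comparison like the one above should show.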

Best regards,
-Stephan
