Stephan Assmus <superstippi@xxxxxx> wrote:
> One thing I noticed in my performance comparisons is that our
> client->server communication seems to take too much time. In some cases
> it takes us more than double the amount of time to figure out that we
> don't need to do anything (disregarding drawing commands outside of the
> current clipping region). Our drawing implementation itself is
> absolutely fast enough, as is the clipping. But the communication
> overhead is quite large. I have looked at our LinkSender
> implementation, but it looks fine to me. Our BLooper::check_lock() also
> seems to take too much time. I don't know why; it looks fast.
> (check_lock() is called in every drawing function.)
>
> I have a test where I draw 100 individual points using StrokeLine()
> and measure the time between two Sync()s. Running the program on ZETA
> produces these results:
>
> drawing outside clipping region: 93 µsecs
> with actual drawing: 213 µsecs
> increase: 120 µsecs
>
> Running in the app_server test environment:
>
> drawing outside clipping region: 205 µsecs
> with actual drawing: 382 µsecs
> increase: 177 µsecs
>
> ... the increase is just 57 µsecs more for the test environment, and
> that is for drawing into a bitmap and making sure a BView is
> invalidated eventually for every single dot. So the actual drawing is
> not the problem.

Have you tried to compare the two when running in a BDirectWindow?
Anyway, it's nice to compare them this way; at least missing Haiku
kernel optimizations won't matter this way :-)

> On the client side, we are looking at these numbers:
> Dano: 15 µsecs
> test environment: 45 µsecs
>
> ... to fire off the 100 StrokeLine() commands to the server. 20 µsecs
> of our number are just the check_lock() implementation (using
> find_thread(NULL)).

It looks like the BeOS BLooper::check_lock() implementation uses the
fCachedStack member - just like what the MultiLocker implementation
does.
AFAICT this shouldn't really result in a speedup on x86 machines,
though, only on PPC... Have you tried calling their version vs. our
version directly in a loop a few 100000 times?

> The rest of the additional delay seems to be just our communication
> overhead.

There, it would be interesting to see how much the client writes to the
server over the port; maybe it actually uses shared memory for the
communication.

> So the question is, I guess, does anybody have any ideas on how to
> cut down on those times?

It's definitely helpful to pin down the performance hogs more
specifically, i.e. which function exactly takes longer than it should,
and why. In the case above, if find_thread(NULL) actually takes more
time than whatever Dano does here, we should use that hack, too :-)

Bye,
   Axel.
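For that "loop a few 100000 times" comparison, a small harness along these lines would do. Since neither find_thread(NULL) nor Dano's cached-stack check is available outside BeOS, the two timed functions below are portable stand-ins (`bench`, `current_thread_hash`, and `cached_version` are made-up names); only the measurement pattern and the ratio between the two results matter:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Average cost of one call to `fn`, in nanoseconds, over `iterations` runs.
template <typename Fn>
double bench(Fn fn, int iterations = 100000) {
    volatile long sink = 0;  // keeps the loop from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++)
        sink += fn();
    auto end = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(end - start).count()
        / iterations;
}

// Stand-in for the "expensive" version: asks the runtime for the thread id.
static long current_thread_hash() {
    return (long)std::hash<std::thread::id>()(std::this_thread::get_id());
}

// Stand-in for the "cached" version: just reads a previously stored value.
static long cached_id = 42;
static long cached_version() { return cached_id; }
```

Running `bench(current_thread_hash)` against `bench(cached_version)` on the target machine would show directly whether caching is worth the hack there.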