[haiku-appserver] Re: drawing thread

From: "Rudolf" <drivers.be-hold@xxxxxxxxxxxx>
To: haiku-appserver@xxxxxxxxxxxxx
Date: Thu, 21 Oct 2004 14:34:30 +0200 CEST

Adi,

> > Does Accelerantdriver already have the engine stuff in now?
>       AFAIK, no. You should give Gabe some examples of how to use the 2D 
> engine.
Hmm, so I need to think about it, as I never exactly needed to think 
about that before.

>       We have no other choice if we want d.b. At least I don't see it...
>       That means we should use MTRR to the best.
Indeed.

> > So this kind of doublebuffering would be fast (bursts, FW) and 
> > sounds 
> > like a good plan (double-buffering 'source' bitmaps are in main 
> > mem).
> > If locking is done: but you could of course just benchmark both 
> > options 
> > for yourself and see if I am correct ;-)
> > 
> > If you _don't_ use doublebuffering, then suddenly you can't 
> > 'guarantee' 
> > every thread will do serialized access optimally: now the app's 
> > behaviour will determine this. Video will of course be 
> > 'serialized' in 
> > the app, but if some figure is to be drawn onscreen (a transparent 
> > box 
> > or whatever), then this can't be done in a serial fashion (because 
> > the 
> > content of the box is not touched).
> 
> There are 2 cases:
> 1) we have no transparent layer in app_server
>       * without d.b. it is guaranteed drawing is done right, because of 
> the 
> non-overlapping visible region each layer has.
> 2) we have transparent layers in app_server.
>       * it is no problem for views inside a window. They are always 
> rendered 
> from back to front.
>       * _it is_ a problem with transparent windows as they are 2 threads 
> drawing independently. Here, there is no other solution than to force 
> double buffering for the transparent window or not to allow 
> transparent 
> windows when global double buffering is disabled.

I'm not sure I follow. I didn't think about app_server internals. I was 
merely thinking of some app that for some reason directly writes in the 
buffer in cardRAM. You cannot foretell what an app is going to do. Who 
_knows_ what it wants... Maybe it wants to write something to the 
highest address pixel, and then the lowest, and then the highest - 1, 
etc. This is not serial and never will be accelerated by the bus. If 
it were done via doublebuffering however, it would: because the source 
buffer can be copied to the destbuffer in a serial fashion, while the 
app may happily poke around in the sourcebuffer residing in main mem.
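
As a rough illustration of that difference (a sketch in Python, not app_server code): the snippet below counts how many contiguous runs a sequence of pixel writes breaks into. It assumes that one contiguous run stands in for one burst on the bus, which is a simplification. The function names and the 8-pixel buffer are made up for the example.

```python
def count_runs(addresses):
    """Count maximal runs of consecutive ascending addresses.

    Assumption: one run stands in for one burst transfer on the bus.
    """
    runs = 0
    previous = None
    for address in addresses:
        if previous is None or address != previous + 1:
            runs += 1
        previous = address
    return runs


def ping_pong_order(size):
    """Highest pixel, then lowest, then highest - 1, and so on."""
    order = []
    low, high = 0, size - 1
    while low <= high:
        order.append(high)
        if low != high:
            order.append(low)
        low += 1
        high -= 1
    return order


def flush(source):
    """Copy the main-memory buffer to the card front to back.

    Returns the destination contents and the order of the writes.
    """
    destination = list(source)
    return destination, list(range(len(source)))


size = 8
direct_writes = ping_pong_order(size)   # app pokes straight into card RAM
source = [0] * size
for address in direct_writes:           # same pokes, but into main memory
    source[address] = 1
destination, flush_writes = flush(source)

print(count_runs(direct_writes))  # one run per pixel written
print(count_runs(flush_writes))   # a single run for the whole copy
```

The app's access order is the same in both cases; only the writes that actually cross the bus change shape.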

> > It's a good idea to have (at least) 32bit word alignment in place 
> > for 
> > the source bitmaps in main mem BTW.
>       Yup, Christian Packmann also said that in regard to MMX/SSE 
> acceleration.
Sounds very valid :-)
I am saying this for other reasons however: I know I stated for 
instance that MTRR works with bytes, but come to think of it, I think 
it was 32bit words. Probably there are more places in the hardware as 
well that prefer these kinds of alignments.
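
One common way to get there is to pad every bitmap row up to a multiple of the alignment. A minimal sketch in Python, assuming row padding is the chosen approach (the thread only asks for alignment, it does not say how) and that the start of the buffer is aligned as well; the function name is made up:

```python
def aligned_row_bytes(width_pixels, bytes_per_pixel, alignment=4):
    """Round the row length in bytes up to a multiple of `alignment`.

    alignment=4 gives 32bit word alignment; 16 would suit SSE-style loads.
    """
    raw = width_pixels * bytes_per_pixel
    return (raw + alignment - 1) // alignment * alignment


print(aligned_row_bytes(101, 1))      # 101 bytes padded to 104
print(aligned_row_bytes(100, 4))      # 400 bytes, already aligned
print(aligned_row_bytes(101, 1, 16))  # 101 bytes padded to 112
```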

>       I'm talking about GeForce 3/4/5/6 pixel shaders. You know... the 
> biggest revolution in 3D graphics in years! Programs that can be 
> executed on the GPU - Drawing/Filling ovals, shapes, etc.
OK, sounds nice.
One problem exists however: these programs are a kind of machine code 
that is card architecture specific. The instructions are among the 
best kept secrets in the graphics industry, I am assuming (AFAIK). 
This means that _this_ is likely going to be a part of the engine we 
won't be able to use. At all. If I set up 3D acc, it will be a mixture 
of real 3D acc and MESA. I think I said this before at some point...

AFAIK the input program delivered by directX or openGL apps is compiled 
or interpreted somehow, and translated to the internal code the engine 
knows before being executed. (Let's call it microcode for now.)

Then one more thing: you say the shaders can execute programs. True, no 
doubt. But they have to be stored somewhere: in the graphics memory that 
(maybe) doesn't get swapped out on low mem like other parts containing 
bitmaps and other 3D scene stuff. At least, that would be logical to me 
ATM. (So _not_ inside some internal extra buffer, though I might be 
mistaken).

> > This is a step I am going to investigate at some point, hopefully 
> > in 
> > the not (too) distant future. The main problem here is to get cache 
> > coherency working OK I expect. There's a second step also (GART and 
> > aperture), but that only improves speed if you are going to fetch 
> > multiple bitmaps 'simultaneously' (so that's one reason why 
> > normally 
> > this stuff is only used for 3D acc: the second reason is stability).
> 
>       I'll give you one too:
>       We'll need to copy regions from main memory, that means lots of 
> rectangles which can be interpreted as: many bitmaps 
> 'simultaneously'. 
> :-) ;-)
Hehe. IF that is true, then those GART/aperture AGP transfers speed up 
the process more.
However, I don't see it right now. You can't foretell what part of the 
screen you have to update next: as you can't foretell where a window 
hiding some other stuff will be dragged next (direction might change 
for instance). The way I see it is that you will be in fact fetching 
different pieces of different buffers at different times.

> > If you want to use 2D acc later on, you should consider real AGP 
> > transfers 
> > and so acc itself (by letting engine fetch from main mem) as being 
> > an 
> > option.
> 
>       Of course. That is 99% of what we'll have in R1, except that, 
> instead 
> of instructing the CPU to copy in vidMem, we'd instruct the graphic 
> engine to fetch from main mem.
>       Am I wrong?

I'm afraid I can't answer because I am not following you. I think you 
misread my sentence?
Let's rewrite it a bit:
Question: how are we going to transfer a bitmap to the graphics card 
RAM?
1. Does the driver report it can do acceleration on main mem? Yes: use 
it (in the way you state: engine fetching from main mem). No:
2. Let's write the bitmap, preferably in a serial fashion, to the 
graphics card. But where?
-------- 1. Does the driver report it can do local offscreen 
acceleration? Yes: use triple buffering if you want. No:
-------- 2. Write directly onscreen.

And another subquestion might apply:
-> Do we have AGP FW? Yes: use it. No: shut off the bus-transfer-intensive 
extra system drooling feature.

Anyway: or something to that effect.
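
The questions above can be sketched as a small decision function. This is Python for illustration only: the two capability flags and the AGP FW flag are hypothetical inputs, not real accelerant hooks, and the path names are made up.

```python
from enum import Enum


class TransferPath(Enum):
    ENGINE_FETCH = "engine fetches the bitmap from main mem"
    CPU_TO_OFFSCREEN = "CPU writes serially to offscreen cardRAM, engine blits onscreen"
    CPU_TO_ONSCREEN = "CPU writes serially, directly onscreen"


def choose_transfer_path(accel_on_main_mem, accel_offscreen):
    """Pick how a bitmap gets into graphics card RAM.

    Both arguments are hypothetical driver capability flags.
    """
    if accel_on_main_mem:
        return TransferPath.ENGINE_FETCH
    if accel_offscreen:
        # This is the case where triple buffering becomes an option.
        return TransferPath.CPU_TO_OFFSCREEN
    return TransferPath.CPU_TO_ONSCREEN


def bus_heavy_extras_enabled(has_agp_fw):
    """The subquestion: keep bus-intensive extras only with AGP FW."""
    return has_agp_fw


print(choose_transfer_path(False, True).value)
print(bus_heavy_extras_enabled(False))
```

The order of the checks mirrors the order of the questions: acceleration on main mem wins whenever the driver reports it, regardless of the offscreen capability.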


>       With pixel shaders at hand we won't be needing main memory anymore.
How do you figure?

> I'm SURE this is the way to go.
So I'm afraid you are a bit too much in dreamland... Although I 
certainly would love to see it :-)

>       Fetching from main mem is the best solution until we reach that 
> stage.
Of course. For AGP that is the fastest solution by far. But still, keep 
in mind we will probably be working on PCIe in the near future. I for 
one still have to read up on that subject to see if an AGP-like setup 
still exists, or if this 'feature' has been totally abandoned for 
something better.

PCIe is said to be PCI compatible: not AGP compatible. Although nVidia 
_for now_ still uses a local AGP bridge on their cards. This is just a 
temporary marketing/financial workaround however. If PCIe is going to 
be a success, they will probably drop that immediately.

> > (Quake2 on my laptop runs at 3.9fps already with MESA6.1, so in 
> > openGL 
> > mode (compared to those 92FPS in internal software rendering mode I 
> > once talked about))
> 
>       You need good 3D drivers. I'm sure Quake2 will run at 600-700FPS 
> with 
> the same settings.
Of course. But the point was, that if you keep the 'load' relatively 
low, you could live with MESA only on systems that lack hardware acc.
I am certainly hoping that any desktop system will not need to use 3D 
engines as heavily as those nice games...

On the other hand, maybe your 3D stuff should be that 'drooling 
feature' that simply can be shut off if a system does not support 
hardware acc. But still: keep in mind that we will probably never have 
full featured acc: it will always be a mix between some hardware acc'd 
functions and software based stuff.

>       I've thought about that also, a little. One simple solution is to 
> have 
> app_server use only 1/3rd of video memory, but that is not a viable 
> solution. :-P
Indeed. Here you can think of multiple strategies as well I guess. If 
needed at all: who knows.

> > Hey, I still miss something else that's interesting as well I 
> > think. I 
> > understand what you mean by double/triple buffering, and I 
> > understand 
> > why you want to do that. But still, I am wondering about another of 
> > BeOS's shortcomings: updating the screen in such a way that you 
> > won't see distortions if you drag a window (for instance). I mean: 
> > tearing. Some people talk about double buffering in this context: 
> > having a copy of the entire screen in cardRAM that is switched 
> > between 
> > during retraces.
> 
>       You mean: page flipping. :-) ;-) B-)
Yeah, yeah: you could call it that. Only normally it's used for apps I 
guess, but I meant it for system-internal use only.

> > This of course requires both buffers to be updated 
> > with everything, and you can talk and think a lot about strategies 
> > to 
> > get this done with minimal overhead.
> 
>       I'm listening... :-)
This is a subject I haven't thought about much. Two versions could be:
1. Just draw everything to both buffers, but delay writing to the one 
onscreen until it's offscreen.
2. Draw only to the offscreen buffer, and draw just the differences 
applied there to the onscreen buffer once it goes offscreen.
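
Version 2 could look roughly like the sketch below (Python, illustration only). It assumes the "differences" are kept as a simple list of (start, values) spans and that all drawing stays within the buffer; the class and method names are made up, and a real implementation would track rectangles rather than 1D spans.

```python
class FlippedScreen:
    """Two full-screen buffers; only the offscreen one is drawn to."""

    def __init__(self, size):
        self.buffers = [[0] * size, [0] * size]
        self.front = 0       # index of the buffer being scanned out
        self.pending = []    # changes the front buffer has not seen yet

    def draw(self, start, values):
        """Draw into the offscreen buffer only and remember the change."""
        back = self.buffers[1 - self.front]
        back[start:start + len(values)] = values
        self.pending.append((start, list(values)))

    def flip(self):
        """Swap buffers at retrace, then bring the old front up to date."""
        self.front = 1 - self.front
        stale = self.buffers[1 - self.front]  # was onscreen, now offscreen
        for start, values in self.pending:
            stale[start:start + len(values)] = values
        self.pending.clear()


screen = FlippedScreen(6)
screen.draw(1, [5, 5])
screen.flip()
print(screen.buffers[0] == screen.buffers[1])  # both buffers match after the flip
```

The cost of this variant is that every change is drawn twice, but the second pass only ever touches a buffer that is not being displayed.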

> > But the goal it serves would be a perfect, undistorted screen 
> > output at 
> > all times, which would be nice to have as well at some point.. 
> > (just 
> > updating the single screenbuffer during retrace is much too 
> > expensive I 
> > guess, as this leaves very little time to update the buffer: the 
> > retrace is a relatively short piece of time compared to 'full-time 
> > acc', as is used now.)
> 
>       Wait a second, you are not talking about page flipping?
Not as you saw it, no. Talk to Apple users: it seems this system I am 
talking about is at least used there.

>       A while back you were talking about Allegro game library. Isn't 
> page 
> flipping the technique used by it for double buffering. Isn't the 
> page 
> flipping process ordered by the gaming library, and done on next 
> retrace?
Yep. 

> Then, why are you talking about _every_ retrace?
Because now we have a system 'feature'. OTOH: if nothing changes 
onscreen, no flip is needed of course. You could think multiple ways 
here as well (as usual :)
1. just flip every retrace
2. only flip if something changed.

Whatever.. Maybe if you start thinking about this stuff a reason for 
one of them comes to mind ;-)
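
To compare the two options, a tiny sketch in Python that counts flips over a series of retraces. The activity pattern is invented, and the function name is made up; it says nothing about what a flip actually costs on real hardware.

```python
def count_flips(changed_per_retrace, only_when_changed):
    """Count page flips over a series of retraces.

    changed_per_retrace: one boolean per retrace, True if something was
    drawn since the previous retrace.
    """
    flips = 0
    for changed in changed_per_retrace:
        if changed or not only_when_changed:
            flips += 1
    return flips


# Hypothetical activity: something is drawn before retraces 0 and 3 only.
activity = [True, False, False, True, False]
print(count_flips(activity, only_when_changed=False))  # option 1: flips on all 5 retraces
print(count_flips(activity, only_when_changed=True))   # option 2: flips on 2 retraces
```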


Bye!

Rudolf.