[haiku-appserver] Re: drawing thread

  • From: Adi Oanca <adioanca@xxxxxxxxxxxxx>
  • To: haiku-appserver@xxxxxxxxxxxxx
  • Date: Thu, 21 Oct 2004 14:08:19 +0300

Hi Rudolf,

Rudolf wrote:
> Hi Adi,
> 
> 
>>      We use ViewDriver because our instruction clipping code (when in an 
>>update you are allowed to draw only in a specified region) is not in 
>>place and we use BView::ConstrainClippingRegion(&reg) for that.
> 
> Nice!  I didn't even know that such an option existed.. I guess it's 
> because I never really wrote an app yet :-/

        :-)))))))
        I only wrote applications that had a few windows which had _only_ 
buttons. For BeOS, that is. :-P
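
        For reference, a minimal sketch of the ConstrainClippingRegion() 
trick mentioned above; the helper and the way it is called are made up 
for illustration, only the BView/BRegion calls are the real API:

    #include <Region.h>
    #include <View.h>

    // Hypothetical helper: draw into 'view', but only inside 'allowed'.
    void
    DrawOnlyInside(BView *view, const BRect &allowed)
    {
        BRegion clip;
        clip.Include(allowed);
        view->ConstrainClippingRegion(&clip);  // drawing is now cut to 'clip'

        view->FillRect(view->Bounds());        // only the clipped part is touched

        view->ConstrainClippingRegion(NULL);   // restore the default clipping
    }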

>>      When Gabe finishes the clipping code (read: it's done in 
>>DisplayDriver) we'll be able to use the full acceleration an 
>>accelerant can give us.
> 
> So, you can use DD and AccelerantDriver.

        DD in conjunction with AccelerantDriver, yes.

> Does AccelerantDriver already have the engine stuff in now?

        AFAIK, no. You should give Gabe some examples of how to use the 2D 
engine.

> Or will it do software execution only at the beginning? (just curious)

        Again, AFAIK, accelerant's 2D hooks will be used where possible.

>>>The sentence also implies that both the source and destination of 
>>>the drawing actions are residing in the graphics card's RAM, or acc 
>>>would not be possible (ATM at least).
>>
>>      HW cursor is possible. The rest, no. It's no problem, we'll use 
>>MMX, SSE. That is until you provide us with drivers that can use 
>>pixel shaders. :-))))
> 
> OK, so you are indeed confirming here you know you won't be using the 
> ACC engine in the driver. You are using MMX and SSE which in my book is 
> software drawing, so no acceleration from the cards engine. It's indeed 
> the best you can do for now, so sounds OK to me :)

        Glad. :-)
        We have no other choice if we want d.b. (double buffering). At 
least I don't see one...
        That means we should make the best use of MTRRs.

>>>OK, here's my 'warning':
>>>While you are correct in saying you can draw 'parallel' in those 
>>>regions inside the buffer, there may be important performance 
>>>penalties if you don't serialize the access after all. The 'burst 
>>>mode' of writing across the PCI/AGP bus depends on serialized access 
>>>(at least beyond those say 32-byte blocks; within these blocks it 
>>>doesn't matter). Burst mode (fully automatically generated by the 
>>>system's hardware) works in PCI mode, and in AGP mode. In AGP mode 
>>>bursts are the ones being accelerated with the 'fastwrites' feature.
>>
>>      How do we serialize access?
> 
> 
> Locking, semaphores. Only one thread draws in the graphicsRAM at a 
> time.

        OK. Perfect! That would be the Poller Thread, if DW agrees.
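
        Just so we mean the same thing, a rough sketch of that 
serialization with a plain BeOS semaphore; gFrameBufferSem and 
DrawToFrameBuffer() are names I made up for illustration:

    #include <OS.h>

    static sem_id gFrameBufferSem = create_sem(1, "frame buffer lock");

    // Placeholder for the code that actually touches graphics RAM.
    static void DrawToFrameBuffer() { /* hypothetical drawing code */ }

    void
    LockedDraw()
    {
        if (acquire_sem(gFrameBufferSem) != B_OK)
            return;

        // Only one thread at a time writes to graphics RAM, so the writes
        // stay (mostly) sequential and can be merged into PCI/AGP bursts.
        DrawToFrameBuffer();

        release_sem(gFrameBufferSem);
    }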

> This ensures that you have the best chance that its memory is 
> accessed in a more or less serial way. Of course, in the end it depends 
> on what every thread is doing. If you are updating some piece of 
> background window because it just became foreground (some other window 
> moved away), you will be doing it serially as well: every line of 
> screen is being updated (filled) from left to right (or vice versa). 
> You only make jumps in memory if you reach the end of such an updated 
> part of a line: the jump will be "bytes_per_row". After this, serial 
> access happens again.

        This scenario is perfect for the d.b. solution I've proposed.
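
        For 32-bit modes, the access pattern you describe is basically 
this (illustrative names, assuming a linear frame buffer):

    #include <stdint.h>

    // Fill an exposed rectangle, one scanline at a time, left to right.
    void
    FillExposedRect(uint8_t *frameBuffer, uint32_t bytesPerRow,
        int left, int top, int right, int bottom, uint32_t color)
    {
        for (int y = top; y <= bottom; y++) {
            uint32_t *pixel = (uint32_t *)(frameBuffer + y * bytesPerRow) + left;
            for (int x = left; x <= right; x++)
                *pixel++ = color;   // strictly serial writes within a line
            // the only jump is 'bytesPerRow' to the start of the next line
        }
    }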

> So this kind of doublebuffering would be fast (bursts, FW) and sounds 
> like a good plan (double-buffering 'source' bitmaps are in main mem).
> If locking is done: but you could of course just benchmark both options 
> for yourself and see if I am correct ;-)
> 
> If you _don't_ use doublebuffering, then suddenly you can't 'guarantee' 
> every thread will do serialized access optimally: now the app's 
> behaviour will determine this. Video will of course be 'serialized' in 
> the app, but if some figure is to be drawn onscreen (a transparent box 
> or whatever), then this can't be done in a serial fashion (because the 
> content of the box is not touched).

There are 2 cases:
1) We have no transparent layers in app_server:
        * Without d.b. it is guaranteed drawing is done right, because of the 
non-overlapping visible region each layer has.
2) We have transparent layers in app_server:
        * It is no problem for views inside a window; they are always rendered 
from back to front.
        * _It is_ a problem with transparent windows, as there are 2 threads 
drawing independently. Here there is no other solution than to force 
double buffering for the transparent window, or to disallow transparent 
windows when global double buffering is disabled (see the sketch below).
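
        A rough sketch of that rule, assuming a hypothetical check at 
transparent-window creation time (all names invented):

    // Returns whether the transparent window may be created, and whether
    // it must get its own off-screen buffer.
    bool
    AllowTransparentWindow(bool globalDoubleBuffering, bool *forceDoubleBuffering)
    {
        if (globalDoubleBuffering) {
            *forceDoubleBuffering = false; // already safe: everything is buffered
            return true;
        }

        // Without global d.b. two window threads would draw into the frame
        // buffer independently, so either give this window its own buffer...
        *forceDoubleBuffering = true;
        return true;
        // ...or simply refuse transparent windows ('return false') instead.
    }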

>>      If we're doing a blit (copy line by line) from main mem to on-screen 
>>video mem, won't the HW-generated burst step into action?
> 
> Yep. OK, you could call this HW acceleration also. In my book however, 
> I would not call it that way. I guess I still have to read your 
> app_server guys' book of defines yet.. :-/ (or vice versa of course ;-)
> 
> I wouldn't even know what to call it per se, but something like 'bus 
> acceleration' as a name for it would be much better here..

        OK. :-)
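
        To be explicit about what I mean by that copy: a rough sketch, 
line by line with memcpy(); all names are illustrative:

    #include <stdint.h>
    #include <string.h>

    // Copy a rectangle from a bitmap in main memory into the on-screen
    // frame buffer, one line at a time.
    void
    CopyBitmapToScreen(const uint8_t *src, uint32_t srcBytesPerRow,
        uint8_t *frameBuffer, uint32_t fbBytesPerRow,
        int width, int height, int destX, int destY, int bytesPerPixel)
    {
        for (int y = 0; y < height; y++) {
            memcpy(frameBuffer + (destY + y) * fbBytesPerRow
                    + destX * bytesPerPixel,
                src + y * srcBytesPerRow,
                width * bytesPerPixel);   // one long sequential write per line
        }
    }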

> It's a good idea to have (at least) 32bit word alignment in place for 
> the source bitmaps in main mem BTW.

        Yup, Christian Packmann also said that in regard to MMX/SSE 
acceleration.
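
        Something like this, I assume (the helper name is made up; an 
alignment of 16 would also cover SSE):

    #include <stdint.h>

    // Round a bitmap's bytes-per-row up to a multiple of 'alignment'
    // (must be a power of two: 4 for plain 32-bit words, 16 for SSE).
    static inline uint32_t
    AlignBytesPerRow(uint32_t width, uint32_t bytesPerPixel, uint32_t alignment)
    {
        uint32_t bytes = width * bytesPerPixel;
        return (bytes + alignment - 1) & ~(alignment - 1);
    }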

>>      I don't think we'll be using videoMem that way, and I'm speaking 
>>about 
>>R2 here. R3, who knows, if we can use those pixel shader units... :-) 
> 
> 
> But, we have to talk about defines again. What do you mean by pixel 
> shader units? They have nothing to do with it (I can tell you that 
> without knowing exactly what they are... :)

        I'm talking about GeForce 3/4/5/6 pixel shaders. You know... the 
biggest revolution in 3D graphics in years! Programs that can be 
executed on the GPU - Drawing/Filling ovals, shapes, etc.

> The only thing we need to get 2D acc from main mem is me instructing 
> the engine to fetch the source from main mem instead of from local 
> graphicsmem (on Matrox one single flag! (plus addressing of course)). The 
> engine instructions themselves are identical (in theory) to what they 
> are now.

        That would be cool.

> This is a step I am going to investigate at some point, hopefully in 
> the not (too) distant future. The main problem here is to get cache 
> coherency working OK I expect. There's a second step also (GART and 
> aperture), but that only improves speed if you are going to fetch 
> multiple bitmaps 'simultaneously' (so that's one reason why normally 
> this stuff is only used for 3D acc; the second reason is stability).

        I'll give you one more:
        We'll need to copy regions from main memory; that means lots of 
rectangles, which can be interpreted as many bitmaps 'simultaneously'. 
:-) ;-)
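
        In code that would be roughly the following; CopyRectToScreen() 
is just a placeholder for whatever does the actual transfer:

    #include <Region.h>

    // Placeholder for the per-rectangle transfer (CPU copy today, engine
    // fetch from main mem later).
    static void CopyRectToScreen(const BRect &rect) { /* hypothetical */ }

    void
    CopyRegionToScreen(BRegion &region)
    {
        // A region is just a list of rectangles; each one becomes its own
        // small transfer.
        for (int32 i = 0; i < region.CountRects(); i++)
            CopyRectToScreen(region.RectAt(i));
    }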

> If 
> you want to use 2D acc later on, you should consider real AGP transfers 
> and so acc itself (by letting engine fetch from main mem) as being an 
> option.

        Of course. That is 99% of what we'll have in R1, except that, instead 
of instructing the CPU to copy into vidMem, we'd instruct the graphics 
engine to fetch from main mem.
        Am I wrong?

> If you mean using real 3D for the desktop already (by mentioning pixel 
> shader units), then of course, AGP transfers are much less an option or 
> speed would get unworkable low probably. Although I guess it would 
> still be a good plan to let it be useable without 3D acc and acc engine 
> fetches from main mem (software mode via MESA without 3D acc drivers).

        With pixel shaders at hand we won't be needing main memory anymore. I'm 
SURE this is the way to go.
        Fetching from main mem is the best solution until we reach that stage.

> (Quake2 on my laptop runs at 3.9fps already with MESA6.1, so in openGL 
> mode (compared to those 92FPS in internal software rendering mode I 
> once talked about))

        You need good 3D drivers. I'm sure Quake2 will run at 600-700FPS with 
the same settings.

>>I think a better solution 
>>is to do triple buffering. Have an off-screen surface in mainMem into 
>>which a window will draw. When drawing is done, blit this into an 
>>_off-screen surface in video memory_, set a flag that it's OK to use 
>>that surface, and when a portion of that window is needed we use the 
>>2D(/3D) engine to blit it on-screen.
>>      What do you think?
> 
> Agreed. Although I did not mention this setup, it did cross my mind :)
> 
> Of course, in the end there may be more to consider. Like, you are 
> running 3D apps in a window at the same time. Card memory you use for 
> triple-buffering the desktop is no longer available for the 3D app, 
> lowering its speed (by it needing to fetch stuff from main mem more 
> frequently: assuming the bus stays a bottleneck.)

        I've thought about that also, a little. One simple solution is to have 
app_server use only 1/3rd of video memory, but that is not a viable 
solution. :-P
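
        To spell out the triple-buffering scheme quoted above, a rough 
sketch; every type and function here is invented for illustration:

    #include <Rect.h>

    struct Surface {
        void    *bits;              // pixel data
        bool    validInVideoMem;    // is the video-memory copy up to date?
    };

    // Placeholders for the actual transfers.
    static void CopyToVideoMem(const Surface &from, Surface &to) { /* CPU copy */ }
    static void EngineBlitOnScreen(const Surface &from, const BRect &part) { /* acc blit */ }

    // 1) The window has finished drawing into its main-memory surface.
    void
    WindowFinishedDrawing(const Surface &mainMemSurface, Surface &videoMemSurface)
    {
        CopyToVideoMem(mainMemSurface, videoMemSurface);   // bursts over the bus
        videoMemSurface.validInVideoMem = true;            // 2) mark it usable
    }

    // 3) Some part of the window becomes visible: blit it with the engine.
    void
    ExposePartOfWindow(const Surface &videoMemSurface, const BRect &part)
    {
        if (videoMemSurface.validInVideoMem)
            EngineBlitOnScreen(videoMemSurface, part);
    }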

> BTW: Talking about defines again: I find it interesting to see you talk 
> about blits when you actually mean copying. In my book the word 'blit' 
> is reserved for acc engine mem copying (so acceleration). I mean, I 
> don't want 'my book' to be necessarily right, but this _is_ a potential 
> problem: we could misunderstand each other easily by not having these 
> terms defined clearly...

        :-) OK.

> ======
> 
> Hey, I still miss something else that's interesting as well I think. I 
> understand what you mean by double/triple buffering, and I understand 
> why you want to do that. But still, I am wondering about another of 
> BeOS's shortcomings: updating the screen in such a way that you 
> won't see distortions if you drag a window (for instance). I mean: 
> tearing. Some people talk about double buffering in this context: 
> having a copy of the entire screen in cardRAM that is switched between 
> during retraces.

        You mean: page flipping. :-) ;-) B-)
        Yes, it crossed my mind many times. Told you, I've done some 
programming under DirectX.
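
        Roughly what I mean by page flipping, with WaitForRetrace() and 
SetDisplayStartAddress() standing in for whatever the driver/accelerant 
actually exposes:

    #include <stdint.h>

    struct Page {
        uint32_t startAddress;  // offset of this full-screen buffer in card RAM
    };

    // Placeholders for driver functionality.
    static void WaitForRetrace() { /* hypothetical: block until vertical retrace */ }
    static void SetDisplayStartAddress(uint32_t address) { /* hypothetical CRTC reprogram */ }

    // Show the freshly drawn back buffer and recycle the old front buffer.
    void
    FlipPages(Page &front, Page &back)
    {
        WaitForRetrace();                           // flip during retrace: no tearing
        SetDisplayStartAddress(back.startAddress);  // scan out the back buffer

        Page tmp = front;                           // swap the roles
        front = back;
        back = tmp;
    }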

> This of course requires both buffers to be updated 
> with everything, and you can talk and think a lot about strategies to 
> get this done with minimal overhead.

        I'm listening... :-)

> But the goal it serves would be a perfect, undistorted screen output at 
> all times, which would be nice to have as well at some point.. (just 
> updating the single screenbuffer during retrace is much too expensive I 
> guess, as this leaves very little time to update the buffer: the 
> retrace is a relatively short piece of time compared to 'full-time 
> acc', as is used now.)

        Wait a second, you are not talking about page flipping?
        A while back you were talking about the Allegro game library. Isn't 
page flipping the technique it uses for double buffering? Isn't the page 
flipping process ordered by the gaming library, and done on the next 
retrace? Then why are you talking about _every_ retrace?


Bye,
Adi.
