[haiku-appserver] Re: 2D engine

  • From: "Rudolf" <drivers.be-hold@xxxxxxxxxxxx>
  • To: haiku-appserver@xxxxxxxxxxxxx
  • Date: Mon, 01 Nov 2004 11:07:01 +0100 CET


Cool article: I never saw that! (suddenly more pieces of the puzzle 
fit together :)

Answering 'out-of-order':

> "Much More Parallel
> Some people with old S3 and Cirrus Logic video cards experienced hard 
> system freezes with earlier 
> versions of the R4 beta. These freezes prove that the incredibly buff 
> new R4 graphics driver 
> architecture (designed and implemented by our own Trey "Ball-Buster" 
> Boudreau) is working correctly 
> -- too well, in the case of these cards. Prior to R4, the locking 
> done when a view was drawing was 
> coarse-grained; no two threads could draw to the frame buffer at the 
> same time.
> In R4, any number of threads can draw to the frame buffer 
> simultaneously.  

Yep. Correct, about R4 and later (of course :).
But they kept in mind (and I love that about them!) that the new 
architecture should also be compatible with those older 'non-compatible' 
cards: to decide whether framebuffer access must be serialized, we have 
a special flag used in the accelerants:
B_PARALLEL_ACCESS

If this flag is set, you may access the framebuffer in parallel. If not, 
access has to be serialized. (ProposeMode within the accelerant is 
required to clear or set the flag to what it needs/wants; the R5/Dano 
app_server nicely adheres to it.)

Also note my comment in the accelerant:
        /* BTW: B_PARALLEL_ACCESS in combination with a hardcursor enables
         * BDirectWindow windowed modes. */
Which is only logical now :)
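
A minimal sketch in C of how a drawing client might honour the flag. 
Only the name B_PARALLEL_ACCESS comes from the text above; the sketch_ 
names, the cut-down mode struct and the flag value are stand-ins for 
illustration, so take the real definitions from Accelerant.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for B_PARALLEL_ACCESS; the value is a placeholder, real code
 * should use the definition from Accelerant.h. */
#define SKETCH_PARALLEL_ACCESS (1u << 3)

/* Hypothetical, cut-down mode description: only the flags field matters. */
typedef struct {
    uint32_t flags;
} sketch_mode;

/* True when several threads may touch the framebuffer at the same time. */
bool sketch_parallel_access_allowed(const sketch_mode *mode)
{
    assert(mode != NULL);
    return (mode->flags & SKETCH_PARALLEL_ACCESS) != 0;
}

/* True when the caller has to take one global framebuffer lock first,
 * i.e. the coarse-grained pre-R4 behaviour. */
bool sketch_needs_framebuffer_lock(const sketch_mode *mode)
{
    return !sketch_parallel_access_allowed(mode);
}
```

The point of the design is that the drawing code asks the mode, not the 
card type: the accelerant decides in ProposeMode, the client just obeys.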

> The only resource locked
> for exclusive access is the acceleration engine, and that is locked 
> only for the time required to 
> feed the rendering commands through the FIFO; 

Indeed. Be as quick as possible. Release the engine ASAP after issuing 
a command. Do not wait for it to finish commands if not absolutely 
needed. (the app_server 'hang' mentioned by Gabe is a nice example :)

So please understand that issuing a command to the engine _does 
not mean_ it is executed immediately. The command is placed in (one of) 
the engine's FIFO(s): it goes to the rear of the queue and waits there 
to be executed. The engine serves the 'front' of this queue by 
executing the command found there.

> synchronizing with the engine is intelligent and is  done only when 
> absolutely necessary.
(and the question Adi asked:)
>       What is/means engine synchronization?

Engine synchronisation is the synchronisation between:
1. the engine drawing, and
2. the app_server (or app) drawing directly in the framebuffer.

As I said, issuing a command does not mean that the command is 
executed immediately. There may be other commands issued before this 
one that are also still waiting to be executed.

But sometimes you will need to know when a certain command you issued 
has actually been executed completely. For instance, if you want to 
move a window, and after that draw something unaccelerated inside it at 
the new location. I don't know. You'll have to tell me.
What you _should_ do is _avoid_ having to sync to the engine too often. 
If you need to sync, please see what other stuff you can do first, so 
you are not actually waiting yet. Only after you did all you could do, 
sync to the engine. You'll need to see for yourself what you want or 
need to do with this.

OK, now let's talk about _how_ you synchronize with the engine. 
There are two ways:

1. The 'dumb' way: wait until the engine is completely idle. Hook: 
WAIT_ENGINE_IDLE(). This hook can be called before or _after_ you 
release the engine. If you want to force the engine to become idle, you 
should probably _not_ release the engine first, as other threads could 
then keep issuing new commands, so the engine never becomes fully 
idle. OTOH you have to realize that not releasing means other threads 
will (probably) be waiting until you do.

2. The intelligent way: wait until the engine has completed your (list 
of) commands, while it may still be busy executing other commands that 
were issued after the ones you are waiting for.
You can easily issue more commands before waiting for a certain 
earlier command to be completed. This means the engine does _not_ 
become completely idle when your command in question is finished, as 
the command after it starts executing. Working this way can 
potentially speed up the process as a whole (or it would not have been 
invented I guess :).

Using this method requires you to give the accelerant a token to which 
it can sync. You do that by creating an empty token, and then giving a 
pointer to it to the accelerant while releasing the engine with:
status_t RELEASE_ENGINE(engine_token *et, sync_token *st)
or alternatively you can get a token without releasing the engine just 
yet by calling:
status_t GET_SYNC_TOKEN(engine_token *et, sync_token *st)

The accelerant will check if you gave a token pointer, and if you did, 
it will fill in the token for you. This token is a kind of time-stamp 
that the accelerant knows how to interpret. It lets you wait until your 
specific (list of) command(s) has been executed, while the engine 
carries on executing commands issued later.

So, you don't do anything with that token yourself, you just give it 
back to the accelerant if you want to synchronize. You do that when you 
re-acquire the engine with:
status_t ACQUIRE_ENGINE(uint32 capabilities, uint32 max_wait, 
sync_token *st, engine_token **et)
Note that you cannot use this if you did NOT get the token before! 
(i.e. you must have acquired the engine earlier in order to get a 
token you can now pass along)

Alternatively, you can sync to the engine _without_ having acquired the 
engine at this time by calling:
status_t SYNC_TO_TOKEN(sync_token *st)
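
The sequence described above (acquire, issue, release with a token, do 
other work, sync, then draw unaccelerated) can be sketched as follows. 
This is only an illustration of the ordering: the three stub hooks 
follow the signatures quoted above, but every sk_ name, the cut-down 
token structs and the logging are hypothetical, and nothing here 
touches real hardware. Passing NULL as the sync_token on the first 
acquire is assumed to mean "nothing to wait for".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-ins for the real types; all sk_ names are hypothetical. */
typedef int32_t sk_status;
typedef struct { uint64_t counter; } sk_sync_token;
typedef struct { uint32_t engine_id; } sk_engine_token;

char sk_trace[64];                 /* records the order of the steps */
static sk_engine_token sk_engine;  /* the one fake engine */

static void sk_log(const char *step)
{
    assert(strlen(sk_trace) + strlen(step) + 2 <= sizeof(sk_trace));
    strcat(sk_trace, step);
    strcat(sk_trace, " ");
}

/* Stub hooks: same parameter lists as quoted above, but they only log. */
sk_status sk_acquire_engine(uint32_t capabilities, uint32_t max_wait,
                            sk_sync_token *st, sk_engine_token **et)
{
    (void)capabilities; (void)max_wait; (void)st;
    *et = &sk_engine;
    sk_log("acquire");
    return 0;
}

sk_status sk_release_engine(sk_engine_token *et, sk_sync_token *st)
{
    (void)et;
    if (st != NULL)
        st->counter = 1;  /* a real accelerant stores its own time-stamp */
    sk_log("release");
    return 0;
}

sk_status sk_sync_to_token(sk_sync_token *st)
{
    (void)st;
    sk_log("sync");
    return 0;
}

/* Move a window with the engine, then draw into it with the CPU. */
sk_status sk_move_then_draw(void)
{
    sk_engine_token *et = NULL;
    sk_sync_token st = { 0 };      /* empty token, filled in on release */
    sk_status err;

    err = sk_acquire_engine(0, 0, NULL, &et);
    if (err != 0)
        return err;
    sk_log("blit");                /* issue the accelerated window move */

    err = sk_release_engine(et, &st);  /* release ASAP, keep the token */
    if (err != 0)
        return err;
    sk_log("work");                /* anything that does not need the blit */

    err = sk_sync_to_token(&st);   /* only now wait for our own commands */
    if (err != 0)
        return err;
    sk_log("draw");                /* unaccelerated drawing is safe now */
    return 0;
}
```

After a call to sk_move_then_draw(), sk_trace holds the steps in the 
order they ran, with the sync placed as late as possible.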


OK, I hope the setup is clear now. If you have a look at my drivers, 
you will see that using the sync_token stuff won't make any 
difference there, as I just call an internal function that lets you 
wait until the engine is totally idle.
Why do I do that? Well, the answer is both simple and painful: lack of 

The ATI driver probably has this stuff in place though, as Thomas has 
the kind of setup that supports this function, which I cannot do. ATI 
was so far the best company for getting the most detailed info about 
(some of) their cards.

So, how could this potentially work internally in the driver:
There are two ways that I know of:
1. The accelerant sets up a FIFO in memory (circular buffer). You have 
a begin and an end pointer into this buffer that indicate where the 
front and tail of the waiting commands are. If someone requests a 
sync_token, the accelerant places the tail pointer in it, indicating 
where the last command you want to wait for was dumped. If you sync to 
the token, the accelerant waits for the front pointer to go beyond that 
(now) old tail pointer, indicating your (last) command(s) has been 
executed.

Technical detail: the engine has to be told where the FIFO is, and how 
big it is. The engine will need to give me access to the front and back 
pointers it keeps. This requires me to have the info needed to know 
how to set this up, which I currently don't have.

2. The accelerant simply issues an extra acceleration engine command. 
This command does not draw anything onscreen, however, but instead sets 
or clears some variable somewhere (card RAM, internal registers?) which 
belongs to the token given to the user. If syncing has to be done, the 
accelerant simply waits until this variable is modified, which in 
effect tells it that your (last) command(s) has been executed.

Technical info: this command should be some special command I guess: I 
haven't yet thought too much about it. As a workaround I could possibly 
set up, for instance, a rect_invert() command that inverts a variable I 
just place in 'reserved' offscreen memory. As long as no apps mess up 
this memory by writing outside their allocated areas, this setup 
should work.
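
Both schemes can be tried out in a small software simulation. The 
sketch below is not driver code: the engine is just a struct, all sk_ 
names and the FIFO size are made up, and running counters stand in for 
the front and tail pointers (a simplification that keeps the 
wrap-around arithmetic short). For scheme 2 it assumes each marker 
carries a non-zero value larger than the previous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_FIFO_SLOTS 8u  /* hypothetical size, power of two */

typedef struct {
    uint32_t slots[SK_FIFO_SLOTS]; /* 0 = ordinary command, else marker value */
    uint32_t front;   /* number of commands the engine has executed */
    uint32_t tail;    /* number of commands issued so far */
    uint32_t marker;  /* scheme 2: variable written by marker commands */
} sk_engine;

typedef struct { uint32_t stamp; } sk_token;

/* Queue one command at the tail; fails when the FIFO is full. */
bool sk_issue(sk_engine *e, uint32_t cmd)
{
    if (e->tail - e->front == SK_FIFO_SLOTS)
        return false;
    e->slots[e->tail % SK_FIFO_SLOTS] = cmd;
    e->tail++;
    return true;
}

/* Simulate the engine executing the command at the front, if any. */
void sk_execute_one(sk_engine *e)
{
    uint32_t cmd;

    if (e->front == e->tail)
        return;
    cmd = e->slots[e->front % SK_FIFO_SLOTS];
    if (cmd != 0)
        e->marker = cmd;  /* marker command: no pixels, only this variable */
    e->front++;
}

bool sk_engine_idle(const sk_engine *e)
{
    return e->front == e->tail;
}

/* Scheme 1: the token remembers where the tail is right now. */
sk_token sk_token_from_tail(const sk_engine *e)
{
    sk_token t = { e->tail };
    return t;
}

/* Scheme 1: done once the front has reached the remembered tail. */
bool sk_tail_token_done(const sk_engine *e, sk_token t)
{
    return (int32_t)(e->front - t.stamp) >= 0;
}

/* Scheme 2: queue an extra marker command and remember its value. */
bool sk_token_from_marker(sk_engine *e, uint32_t value, sk_token *out)
{
    assert(value != 0);
    if (!sk_issue(e, value))
        return false;
    out->stamp = value;
    return true;
}

/* Scheme 2: done once the engine has written our value (or a later one). */
bool sk_marker_token_done(const sk_engine *e, sk_token t)
{
    return (int32_t)(e->marker - t.stamp) >= 0;
}
```

In both cases a token can report "done" while sk_engine_idle() is still 
false, which is exactly the gain over waiting for a fully idle engine.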

For now, I won't bother however. I am first going to have a decent look 
at for instance the nvidia opensource 3D driver in utahGLX to see if 
(and if so, how) the sync_to_token scheme has been set up in there. I 
can imagine that this setup becomes important once hardware 3D 
acceleration is set up, and not before.


That's it!

Hope it helps.


