[haiku-appserver] [Fwd: Re: Video Cards]

  • From: Adi Oanca <adioanca@xxxxxxxxx>
  • To: haiku-appserver@xxxxxxxxxxxxx
  • Date: Tue, 28 Sep 2004 22:21:07 +0300

DW, Gabe, this is a must see.

-------- Original Message --------
Subject: Re: Video Cards
Date: Mon, 27 Sep 2004 10:05:41 GMT
From: Rudolf Cornelissen <rudolf.cornelissen@xxxxxxxxxxxx>
To: Adi Oanca <adioanca@xxxxxxxxx>

Good morning Adi!!

How are you doing? You still a happy camper?
I am... :)

>       If you don't have time to show us the best way to use those
> graphics hooks, we'll do it the way we think is good. Though that may
> not be optimal, as it seems you know something we don't.
>       Please share with us.
OK, I'll dive into it then, since you insist. My knowledge may be
deep, but only in the small area of graphics drivers, so this will
cost me some effort. Also, please don't laugh at my possibly clumsy
coding.. I'm no C(++) expert.

->What I would love is if you have something for me that I can use to
test with a real graphicsdriver (some app_server (test) version?), that
draws a few rectangles and/or lets me move them by mouse: I could then
implement the acc code in my own clumsy way to see if I can get the
engine going. After that you could take a look at my changes and learn
from it what you need?

Of course, I can also just write some fake code that should work..

Maybe you can refresh my memory a bit in what you are requesting?

> > Indeed. But do it with 'normal' transfers as it is now, or we lose
> > compatibility with many current systems, I fear. For instance, those
> > ATI components seem to suffer from trouble. BTW: even PCIe cards use
> > AGP internally, at least those from nVidia. For now (count on at
> > least a year before that's changed).
>       About that...
>       I imagine it's more costly to do this (contiguous memory):
> for (i = 0; i < bitmap->fSize; i++) {
>       bitmap->data[i].red = 65;
>       bitmap->data[i].blue = 65;
>       bitmap->data[i].green = 65;
> }
> where bitmap->data maps into video memory,
> than to do this:
>       memcpy(videoMem, mainMemBitmap) ?
> What I'm asking is: isn't writing byte by byte into video memory more
> costly than writing a bulk of data at once? For example, how much
> faster is it to write 4096 bytes at once than to write 4 bytes 1024
> times?

Thanks to the MTRR registers in CPUs these days, writes to the
framebuffer are combined into bursts: that is, if I understand
correctly, the hardware tries to gather a block of 32 bytes of data
that is addressed to a contiguous place in the card RAM. So if you do
this:
bitmap[0] = data;
bitmap[31] = data;
bitmap[15] = data;
bitmap[11] = data;
bitmap[1] = data;
bitmap[2] = data;

The system will collect them all and do a single burstwrite into
graphicsRAM. That is of course, if the RAM was mapped successfully with
the B_MTR_WC flag.

If you do a single write beyond the 32-byte block, for example:
bitmap[0] = data;
bitmap[31] = data;
bitmap[15] = data;
bitmap[59] = data; //<------
bitmap[11] = data;
bitmap[1] = data;
bitmap[2] = data;

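This would result in three writes: burst 1 covers the writes from [0]
up to [15] (so [0], [31] and [15]), then [59] goes out on its own, and
then the rest ([11], [1] and [2]) forms a third burst.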
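
To make the counting above concrete, here is a minimal sketch in C of
that simplified model. It is only an illustration of the description in
this mail, not documented CPU behaviour: it assumes that consecutive
writes landing in the same aligned block merge into one burst, and that
a write to another block flushes the pending burst. The name
count_bursts and its parameters are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified write-combining model (an assumption for illustration):
 * walk the byte offsets in program order and start a new burst every
 * time a write falls into a different aligned block than the one
 * currently being collected.
 */
size_t count_bursts(const size_t *offsets, size_t count, size_t block_size)
{
    assert(block_size > 0);
    if (count == 0)
        return 0;

    size_t bursts = 1;
    size_t current_block = offsets[0] / block_size;

    for (size_t i = 1; i < count; i++) {
        size_t block = offsets[i] / block_size;
        if (block != current_block) {
            /* Leaving the current block flushes it as one burst. */
            bursts++;
            current_block = block;
        }
    }
    return bursts;
}
```

With a block size of 32, the offsets 0, 31, 15, 11, 1, 2 give one burst
under this model, and 0, 31, 15, 59, 11, 1, 2 give three. Real hardware
may have several combining buffers and other flush conditions, so treat
the numbers as a way to reason about access order only.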

So, if you are going to fill in the bitmap randomly, the transfer will
be much faster with the memcpy (which works sequentially, so each burst
will be the maximum block size, and so fastest) than writing directly
into the bitmap mapped in the graphics card RAM.

Note that these burst writes are the ones using the AGP FW (fast
write) transfers! Setting up such a transfer for that single [59] byte
will not increase speed compared to a normal PCI write, I think...

So, setting up the bitmap in main memory and then memcpy-ing sounds
like a good general plan ;-)
(Overlay bitmaps are a different story, as these are filled in
sequentially anyway: if you know this happens, directly filling the
graphics card RAM is faster, I expect, as it's just fewer operations.)
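
The general plan could be sketched like this in C. This is only a
rough illustration, not app_server code: the back_buffer type and its
functions are hypothetical names, and the "framebuffer" is just a
pointer the caller passes in (in a real driver it would be the mapped
card RAM).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical back buffer living in ordinary main memory. */
typedef struct {
    uint8_t *pixels;  /* drawing target, cheap to write in any order */
    size_t   size;    /* size in bytes */
} back_buffer;

/* Allocate a zeroed back buffer; returns 0 on success, -1 on failure. */
int back_buffer_init(back_buffer *bb, size_t size)
{
    bb->pixels = calloc(1, size);
    bb->size = bb->pixels ? size : 0;
    return bb->pixels ? 0 : -1;
}

/* Random-order drawing only touches main memory. */
void back_buffer_put(back_buffer *bb, size_t offset, uint8_t value)
{
    assert(offset < bb->size);
    bb->pixels[offset] = value;
}

/* One sequential copy, so the card RAM only sees in-order writes. */
void back_buffer_flush(const back_buffer *bb, uint8_t *framebuffer)
{
    memcpy(framebuffer, bb->pixels, bb->size);
}

void back_buffer_free(back_buffer *bb)
{
    free(bb->pixels);
    bb->pixels = NULL;
    bb->size = 0;
}
```

The idea is that all the scattered writes happen in back_buffer_put(),
and the only thing that ever touches the card is the sequential memcpy
in back_buffer_flush().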

> Guess you know what I want: double buffering in mainMemory.
Yep: thanks for explicitly pointing that out anyway(!!): I could just
as easily not have seen the point, due to my still limited knowledge :)


OK, Adi, now that we are on this MTRR subject, I want to issue a
statement/warning to you. This should not be forgotten, although I
imagine it's not yet relevant for us at this time.

I recently acquired a new laptop with an nVidia FX5200Go graphics card
and a Pentium M 1.6 GHz (so a new model). I ran my timedemos again with
and without the AGP busmanager, and I was stunned. This laptop should
be slower than my desktop system, which has a higher FSB (533 compared
to 400) and CPU clock (2140 compared to 1600) and uses the identical
graphics card. Yet the FPS is dramatically higher on the laptop than on
the desktop system.
On the desktop with AGP up, I have something like 72FPS, while on the
laptop I got 92FPS!
I can only imagine one reason for this: the MTRR stuff in this CPU
works much faster. I can imagine the block size being larger than those
32bytes for example, but maybe other things changed as well: who knows.

OK, this is just side-info. My real point is this:
->If the graphics kernel driver uses the B_MTR_WC flag when mapping the
framebuffer, the app_server loads a module called mtrr V1 (tested on
DANO, checked in the syslog). If I do not specify this flag, the
app_server never loads this module. This makes sense: this stuff is
(almost) especially there for graphics, since your eyes won't notice
the small out-of-order drawing of pixels anyway ;-)

Here's the funny thing: my laptop will NOT reboot if the MTRR module is
loaded by the app_server. (MTRR itself works, as the FPS goes from
30 to 92 without/with MTRR in use.)

So, the CPU and/or system BIOS of the laptop must be concluding that
the CPU is in an incorrect operating state, so a successful reboot
can't be done. The laptop then powers down as a failsafe precaution.

OK: this means that upon a system shutdown (reboot), we MUST make sure
that the app_server unloads things as it's supposed to, at least the
MTRR module (after the graphics card driver is unloaded, so after the
last message is written to the screen, I guess). The MTRR module should
correctly re-initialize the CPU to work without MTRR being active.
This should fix the trouble on these new Intel Pentium M CPUs, and I
guess we might see other trouble later on as well (with other new CPUs
as they become available, for instance).
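
The ordering I mean could be sketched like this in C. Everything here
is hypothetical: gfx_state and the three functions are made-up names
that only model the order of the steps, not the real app_server, driver
or mtrr module interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for what is currently active. */
typedef struct {
    bool driver_loaded;  /* graphics card driver still in use */
    bool mtrr_loaded;    /* MTRR module loaded by the app_server */
    bool wc_active;      /* CPU still set up for write combining */
} gfx_state;

/* Step 1: after the last message has been written to the screen. */
void unload_graphics_driver(gfx_state *s)
{
    s->driver_loaded = false;
}

/* Step 2: put the CPU back to working without MTRR, then unload. */
void unload_mtrr_module(gfx_state *s)
{
    /* The driver must be gone before its mapping settings are undone. */
    assert(!s->driver_loaded);
    s->wc_active = false;
    s->mtrr_loaded = false;
}

/* Shutdown/reboot path: always the driver first, then the MTRR module. */
void shutdown_graphics(gfx_state *s)
{
    if (s->driver_loaded)
        unload_graphics_driver(s);
    if (s->mtrr_loaded)
        unload_mtrr_module(s);
}
```

The assert in unload_mtrr_module() is just there to make the required
order explicit: if someone tries to unload MTRR while the driver is
still up, the sketch complains.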


That's it. Good hunting!

