[haiku-appserver] [Fwd: Re: Video Cards]

  • From: Adi Oanca <adioanca@xxxxxxxxxxxxx>
  • To: haiku-appserver@xxxxxxxxxxxxx
  • Date: Wed, 20 Oct 2004 23:58:54 +0300

        I was browsing through some of my old mails...

        Gabe, DW, I think this is _good_ info:

-------- Original Message --------
Subject: Re: Video Cards
Date: Mon, 27 Sep 2004 10:05:41 GMT
From: Rudolf Cornelissen <rudolf.cornelissen@xxxxxxxxxxxx>
To: Adi Oanca <adioanca@xxxxxxxxx>


>       About that...
>       I imagine it's more costly to do this (contiguous memory):
> for (i = 0; i < bitmap->fSize; i++) {
>       bitmap->data[i].red = 65;
>       bitmap->data[i].blue = 65;
>       bitmap->data[i].green = 65;
> }
> where bitmap->data maps into video memory
> 
> than doing this:
>       memcpy(videoMem, mainMemBitmap, size) ?
> 
> What I'm asking is: isn't writing byte by byte into video memory
> more costly than writing a bulk of data at once? For example, how
> much faster is it to write 4096 bytes at once than to write 4 bytes
> 1024 times?

Thanks to the MTRRs in today's CPUs, writes to the framebuffer are
combined into bursts: that is, if I understand correctly, the hardware
tries to gather a block of 32 bytes of data that is addressed to a
contiguous region of card RAM. So if you do this:
bitmap[0] = data;
bitmap[31] = data;
bitmap[15] = data;
bitmap[11] = data;
bitmap[1] = data;
bitmap[2] = data;

The system will collect them all and do a single burst write into
graphics RAM. That is, of course, only if the RAM was mapped
successfully with the B_MTR_WC flag.

If you do a single write beyond the 32-byte block, for example
this:
bitmap[0] = data;
bitmap[31] = data;
bitmap[15] = data;
bitmap[59] = data; //<------
bitmap[11] = data;
bitmap[1] = data;
bitmap[2] = data;

This would result in three writes: burst 1 covers [0], [31] and [15],
then [59] goes out on its own, and then the rest follows as a third
burst.
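
The two sequences above can be checked against a small model. This is
only a sketch of the behaviour as described in this mail: it assumes a
single 32-byte gathering block and that any write outside the current
block ends the burst. The name count_bursts and the BLOCK_SIZE constant
are made up for the illustration, and real CPUs may use several
buffers and other block sizes.

```c
#include <stddef.h>

/* Simplified model of the behaviour described in this mail: consecutive
 * writes are gathered as long as they land in the same aligned block;
 * a write to a different block ends the current burst and starts a new
 * one. The block size of 32 bytes is taken from the mail and is an
 * assumption, not a measured value. */
#define BLOCK_SIZE 32

/* Returns the number of bursts needed for `count` single-byte writes
 * issued in order at the given byte offsets. */
size_t count_bursts(const size_t *offsets, size_t count)
{
    size_t bursts = 0;
    size_t current_block = 0;

    for (size_t i = 0; i < count; i++) {
        size_t block = offsets[i] / BLOCK_SIZE;
        /* The first write, or a write to another block, opens a burst. */
        if (i == 0 || block != current_block) {
            bursts++;
            current_block = block;
        }
    }
    return bursts;
}
```

Under this model the first sequence (0, 31, 15, 11, 1, 2) needs one
burst and the second (0, 31, 15, 59, 11, 1, 2) needs three. A
sequential fill of 4096 bytes needs 4096 / 32 = 128 bursts, each of
them a full block.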

So, if you are going to fill in the bitmap randomly, the transfer will
be much faster with memcpy (which works sequentially, so each burst is
the maximum block size and therefore fastest) than with writes directly
into the bitmap mapped in graphics card RAM.

Note that these burst writes are the ones using AGP FW (Fast Write)
transfers! Setting up such a transfer for that single [59] byte will
not increase speed compared to a normal PCI write, I think...

So, setting up the bitmap in main memory and then memcpy-ing it sounds
like a good general plan ;-)
(Overlay bitmaps are a different story, as these are filled in
sequentially anyway: if you know this happens, filling the graphics
card RAM directly is faster, I expect, as it's just fewer operations.)
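
A minimal sketch of that plan in C, for illustration only. The pixel
layout and the names fill_gray and blit are hypothetical, and video_mem
stands in for a framebuffer that the driver has already mapped (ideally
with B_MTR_WC); here it can be any ordinary buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit pixel layout, used only for this illustration. */
typedef struct {
    uint8_t blue;
    uint8_t green;
    uint8_t red;
    uint8_t alpha;
} pixel;

/* Build the image in ordinary main memory, where the order and
 * granularity of the writes do not matter for the graphics bus. */
void fill_gray(pixel *bitmap, size_t count, uint8_t level)
{
    for (size_t i = 0; i < count; i++) {
        bitmap[i].red = level;
        bitmap[i].green = level;
        bitmap[i].blue = level;
        bitmap[i].alpha = 255;
    }
}

/* Copy the finished bitmap to its destination in one sequential pass.
 * `video_mem` is a stand-in for the mapped framebuffer. */
void blit(void *video_mem, const pixel *bitmap, size_t count)
{
    memcpy(video_mem, bitmap, count * sizeof(pixel));
}
```

The point of the split is that all the scattered per-channel writes hit
main memory, and the only traffic to the card is the sequential copy
done by blit.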

====

OK, Adi, now that we are on this MTRR subject, I want to issue a
statement/warning to you. This should not be forgotten, although I
imagine it's not yet relevant for us at this time.

I recently acquired a new laptop with an nVidia FX5200Go graphics card
and a Pentium M 1.6 GHz (so a new model). I ran my timedemos again with
and without the AGP busmanager, and I was stunned. This laptop should
be slower than my desktop system, which has a higher FSB (533 compared
to 400) and CPU clock (2140 compared to 1600) and uses the identical
graphics card, yet the FPS is dramatically higher on the laptop than on
the desktop system.
On the desktop with AGP up I get something like 72 FPS, while on the
laptop I got 92 FPS!
I can only imagine one reason for this: the MTRR stuff in this CPU
works much faster. I can imagine the block size being larger than those
32bytes for example, but maybe other things changed as well: who knows.


OK, this is just side-info. My real point is this:
->If the graphics kernel driver uses the B_MTR_WC flag when mapping the
framebuffer, the app_server loads a module called mtrr V1 (tested on
DANO, checked the syslog). If I do not specify this flag, the
app_server never loads this module. This stuff is (almost) specifically
there for graphics (since your eyes won't notice the small out-of-order
drawing of pixels anyway ;-), so this makes sense.

Here's the funny thing: my laptop will NOT reboot if the MTRR module is
loaded by the app_server. (MTRR itself works, as the FPS goes from 30
without MTRR to 92 with it.)

So, the CPU and/or system BIOS of the laptop must be concluding that
the CPU is in an incorrect operating state, so a successful reboot
can't be done. The laptop then powers down as a failsafe precaution.

OK: this means that upon a system shutdown (reboot), we MUST make sure
that the app_server unloads stuff as it's supposed to, at least the
MTRR module (after the graphics card driver is unloaded, so after the
last message written to screen, I guess). The MTRR module should
correctly re-initialize the CPU to work without MTRR being active.
This should fix the trouble on these new Intel Pentium M CPUs, and I
guess we might see other trouble later on as well (with other new CPUs
as they become available, for instance).

====

That's it. Good hunting!

Rudolf.


