I was browsing through some of my old mails... Gabe, DW, I think this is _good_ info:

-------- Original Message --------
Subject: Re: Video Cards
Date: Mon, 27 Sep 2004 10:05:41 GMT
From: Rudolf Cornelissen <rudolf.cornelissen@xxxxxxxxxxxx>
To: Adi Oanca <adioanca@xxxxxxxxx>

> About that...
> I imagine it's more costly to do this (contiguous memory):
>     for (i=0; i<bitmap->fSize; i++) {
>         bitmap->data[i].red = 65;
>         bitmap->data[i].blue = 65;
>         bitmap->data[i].green = 65;
>     }
> where bitmap->data maps into video memory
>
> than doing this:
>     memcpy(videoMem, mainMemBitmap)
> ?
>
> What I'm asking is: isn't writing byte by byte into video memory more
> costly than writing a bulk of data at once? For example, how much faster
> is it to write 4096 bytes at once than to write 4 bytes 1024 times?

Thanks to the MTRR registers in today's CPUs, writes to the framebuffer are combined into bursts: that is, if I understand correctly, the hardware tries to gather a block of 32 bytes of data that is addressed to a contiguous place in the card RAM. So if you do this:

bitmap = data;
bitmap = data;
bitmap = data;
bitmap = data;
bitmap = data;
bitmap = data;

the system will collect them all and do a single burst write into graphics RAM. That is, of course, only if the RAM was mapped successfully with the B_MTR_WC flag.

If you do a single write beyond the 32-byte block, for example this:

bitmap = data;
bitmap = data;
bitmap = data;
bitmap = data; //<------
bitmap = data;
bitmap = data;
bitmap = data;

this would result in three writes: burst 1 is  - , then , and then the rest.

So, if you are going to fill in the bitmap randomly, the transfer will be much faster with the memcpy (which works sequentially, so each burst will be the maximum block size, and so fastest) than writing directly into the bitmap mapped in the graphics card RAM.

Note that these burst writes are the ones using the AGP FW transfers! Setting up such a transfer for that single byte will not increase speed compared to a normal PCI write, I think...
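To make the burst counting above concrete, here is a minimal sketch in C. It is only a toy model under loud assumptions: a single write-combining buffer that covers one aligned 32-byte block and is flushed as one burst whenever a write lands in a different block. Real CPUs are more complicated than this. The function name count_bursts and the byte offsets used in the note below are hypothetical, chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (an assumption, not a description of any real CPU):
 * one write-combining buffer holds writes for a single aligned block
 * of block_size bytes. A write to a different block flushes the buffer
 * as one burst and starts collecting a new one.
 *
 * offsets:    byte offsets of the writes, in program order
 * n:          number of writes
 * block_size: size of the combining block in bytes (32 in the mail above)
 *
 * Returns the number of bursts this model would send to the card.
 */
static size_t count_bursts(const size_t *offsets, size_t n, size_t block_size)
{
    size_t bursts = 0;
    size_t current_block = 0;
    int have_block = 0;

    assert(block_size > 0);

    for (size_t i = 0; i < n; i++) {
        size_t block = offsets[i] / block_size;

        if (!have_block || block != current_block) {
            /* Different block: the old buffer goes out, a new burst begins. */
            bursts++;
            current_block = block;
            have_block = 1;
        }
    }
    return bursts;
}
```

In this model, six writes at offsets 0, 4, 8, 12, 16, 20 with a 32-byte block count as one burst. Putting one stray write at offset 64 after the third write (0, 4, 8, 64, 12, 16, 20) gives three bursts, which matches the "three writes" case described above.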
So, setting up the bitmap in main memory and then memcpy-ing it sounds like a good general plan ;-) (Overlay bitmaps are a different story, as these are filled in sequentially anyway: if you know this happens, directly filling the graphics card RAM is faster, I expect, as it's just fewer operations.)

====

OK, Adi, now that we are on this MTRR subject, I want to issue a statement/warning to you. This should not be forgotten, although I imagine it's not yet relevant for us at this time.

I recently acquired a new laptop, with an nVidia FX5200Go graphics card and a Pentium M 1.6 GHz (so a new model). I did my timedemos again with and without the AGP busmanager, and I was stunned. This laptop should be slower than my desktop system, which has a higher FSB (533 compared to 400) and CPU clock (2140 compared to 1600) while using the identical graphics card. Yet the FPS is dramatically higher on the laptop than on the desktop system. On the desktop with AGP up, I get something like 72 FPS, while on the laptop I get 92 FPS!

I can only imagine one reason for this: the MTRR stuff in this CPU works much faster. I can imagine the block size being larger than those 32 bytes, for example, but maybe other things changed as well: who knows.

OK, this is just side info. My real point is this:

-> If the graphics kernel driver uses the B_MTR_WC flag when mapping the framebuffer, the app_server loads a module called mtrr V1 (tested on DANO, checked the syslog). If I do not specify this flag, the app_server never loads this module: this stuff is (almost) especially there for graphics (since your eyes won't notice the small out-of-order drawing of pixels anyway ;-), so this makes sense.

Here's the funny thing: my laptop will NOT reboot if the MTRR module is loaded by the app_server. (MTRR works in itself, as the FPS goes from 30 without MTRR to 92 with it.) So the CPU and/or system BIOS of the laptop must be concluding that the CPU is in an incorrect operating state, so a successful reboot can't be done.
The laptop then powers down as a failsafe precaution.

OK: this means that upon a system shutdown (reboot), we MUST make sure that the app_server unloads stuff as it's supposed to, at least the MTRR module (after the graphics card driver is unloaded, so after the last message written to screen, I guess). The MTRR module should correctly re-initialize the CPU to work without MTRR being active. This should fix the trouble on these new Intel Pentium M CPUs, and I guess we might see other trouble later on as well (with other new CPUs as they become available, for instance).

====

That's it. Good hunting!

Rudolf.