Hi Stephan :-)

I really hope you're not _that_ surprised! Unless I am very much mistaken, I already warned (in general) about 8/32/64 bit accesses.. Also, I informed some(?) people about a _very_ nice benchmarking app out there, that you really need to have now I guess (not Stephan's benchmarker for writes only on bebits, but a not-yet released low-level one that tests both directions).

Here are some results from it on some of my systems. Note that I tested with and without MTRR support(!).

A few more things:
- If you want the app, I'll mail it. The author considers himself 'lost' to our community, and I still lack clearance to publish it. :-/
- PLEASE optimize for 64 bit reads AND writes. If no MTRR is available, write speed WILL be affected as well.

Thanks :-)

Rudolf.

BTW: attachment in HTML included from some systems (reports from author about mainly my reported results to him)
BTW2: and don't forget about PCI-FW via the AGP busmanager! Have a look at those results...

=========
<snipping from mails to author>

laptop: Packard Bell EasyNote 4012C+.
CPU Celeron 400MHz, Neomagic Magicgraph PCI video chipset NM2160, mainboard chipset Intel 82440MX

$ AGPBandwidth -d1 -r2 --allsizes --delay
AGPBandwidth 0.6
Screenmode: 1024x768x16, framebuffer address: 0x20c00000
Delaying 4 seconds...
Write 64-bit: 63.84 MB/s (210.00 MB in 3.29 s)
Write 32-bit: 63.66 MB/s (210.00 MB in 3.30 s)
Write 8-bit: 63.66 MB/s (210.00 MB in 3.30 s)
Read 64-bit: 5.24 MB/s (22.50 MB in 4.29 s)
Read 32-bit: 5.05 MB/s (22.50 MB in 4.46 s)
Read 8-bit: 1.19 MB/s (7.50 MB in 6.32 s)
$ AGPBandwidth -d1 -r2 --allsizes --delay
AGPBandwidth 0.6
Screenmode: 1024x768x16, framebuffer address: 0x20c00000
Delaying 4 seconds...
Write 64-bit: 63.87 MB/s (210.00 MB in 3.29 s)
Write 32-bit: 63.75 MB/s (210.00 MB in 3.29 s)
Write 8-bit: 63.67 MB/s (210.00 MB in 3.30 s)
Read 64-bit: 5.24 MB/s (22.50 MB in 4.29 s)
Read 32-bit: 5.06 MB/s (22.50 MB in 4.45 s)
Read 8-bit: 1.18 MB/s (7.50 MB in 6.33 s)
$

---------
disabled MTRR-WC:
---------

$ AGPBandwidth -d1 -r2 --allsizes --delay
AGPBandwidth 0.6
Screenmode: 1024x768x16, framebuffer address: 0x20c00000
Delaying 4 seconds...
Write 64-bit: 57.36 MB/s (180.00 MB in 3.14 s)
Write 32-bit: 29.44 MB/s (90.00 MB in 3.06 s)
Write 8-bit: 7.30 MB/s (30.00 MB in 4.11 s)
Read 64-bit: 5.04 MB/s (22.50 MB in 4.46 s)
Read 32-bit: 4.46 MB/s (15.00 MB in 3.36 s)
Read 8-bit: 1.11 MB/s (7.50 MB in 6.73 s)
$ AGPBandwidth -d1 -r2 --allsizes --delay
AGPBandwidth 0.6
Screenmode: 1024x768x16, framebuffer address: 0x20c00000
Delaying 4 seconds...
Write 64-bit: 57.82 MB/s (180.00 MB in 3.11 s)
Write 32-bit: 29.61 MB/s (90.00 MB in 3.04 s)
Write 8-bit: 7.37 MB/s (30.00 MB in 4.07 s)
Read 64-bit: 5.06 MB/s (22.50 MB in 4.44 s)
Read 32-bit: 4.48 MB/s (15.00 MB in 3.35 s)
Read 8-bit: 1.11 MB/s (7.50 MB in 6.73 s)
$

-----------------------------
+mtrr-wc, plus driver speedup fix (testable thanks to you!!)
(I'll sleep on including this in CVS as it seems like this speedup goes at the cost of CPU time..)
------------------------------

$ AGPBandwidth -d1 -r2 --allsizes --delay
AGPBandwidth 0.6
Screenmode: 1024x768x16, framebuffer address: 0x20c00000
Delaying 4 seconds...
Write 64-bit: 79.80 MB/s (240.00 MB in 3.01 s)
Write 32-bit: 79.78 MB/s (240.00 MB in 3.01 s)
Write 8-bit: 79.75 MB/s (240.00 MB in 3.01 s)
Read 64-bit: 5.33 MB/s (22.50 MB in 4.22 s)
Read 32-bit: 4.83 MB/s (15.00 MB in 3.11 s)
Read 8-bit: 1.15 MB/s (7.50 MB in 6.50 s)
$ AGPBandwidth -d1 -r2 --allsizes --delay
AGPBandwidth 0.6
Screenmode: 1024x768x16, framebuffer address: 0x20c00000
Delaying 4 seconds...
Write 64-bit: 79.89 MB/s (240.00 MB in 3.00 s)
Write 32-bit: 79.81 MB/s (240.00 MB in 3.01 s)
Write 8-bit: 79.70 MB/s (240.00 MB in 3.01 s)
Read 64-bit: 5.33 MB/s (22.50 MB in 4.22 s)
Read 32-bit: 4.82 MB/s (15.00 MB in 3.11 s)
Read 8-bit: 1.15 MB/s (7.50 MB in 6.51 s)

==================

> Hi all,
>
> The good news is that I've found a way to accelerate the alpha blending
> inside the drawing modes that Painter uses by a factor of 4.6. For writing
> to graphics memory, the access pattern doesn't seem to matter much. There
> is virtually no difference if you write 8 bits, 32 bits or 64 bits at once.
> But when you need to read from the frame buffer, the difference is quite
> noticeable: It is 3.6 times faster to read 32 bits into a temporary
> variable, alpha blend into that, and write it back. 4.6 times faster to do
> the same, but with 64 bits. I always thought that this stuff mattered for
> just writing to graphics mem as well, but it seems that this is not the
> case.
>
> I've also noticed an awesome possibility for speed improvement in my bitmap
> rendering code for clipped bitmaps. This should speed up WonderBrush on
> Haiku quite a bit. Maybe this applies to more stuff as well. If I
> understand AGG correctly, it will clip stuff you draw at a very late stage,
> mostly at the time it tries to write a generated scanline to the frame
> buffer. When you manually apply a bit of clipping before that, you can
> possibly skip the generation of much of the scanline as well. For bitmaps,
> this is especially easy to accomplish. And even more effective, since AGG
> would generate a color scanline, since the "fill" is not a solid color.
>
> Best regards,
> -Stephan
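
For illustration only, here is a minimal C++ sketch of the pattern described in the quoted mail: read a 64-bit word from the frame buffer once, alpha blend into that temporary, and write it back once. The names blend_channel, blend_pixel and blend_span_64 are hypothetical and are not taken from Painter or AGG. The 32 bits per pixel layout, the untouched top byte and the solid-color source are assumptions made for the example. Whether memcpy really becomes a single 64-bit load or store depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Blend one 8-bit channel: (src * alpha + dst * (255 - alpha)) / 255, rounded.
static inline uint8_t blend_channel(uint8_t dst, uint8_t src, uint8_t alpha)
{
    return static_cast<uint8_t>((src * alpha + dst * (255 - alpha) + 127) / 255);
}

// Blend the three low bytes of a pixel held in a local variable.
// Assumption: the top byte of the 32-bit value is not a color channel
// and is copied unchanged from the destination.
static inline uint32_t blend_pixel(uint32_t dst, uint32_t src, uint8_t alpha)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint8_t d = static_cast<uint8_t>((dst >> shift) & 0xff);
        uint8_t s = static_cast<uint8_t>((src >> shift) & 0xff);
        out |= static_cast<uint32_t>(blend_channel(d, s, alpha)) << shift;
    }
    return out | (dst & 0xff000000u);
}

// Blend a solid color over `pixels` 32-bit pixels starting at `row`.
// Two pixels are fetched with one 64-bit read and stored with one
// 64-bit write, so the (slow) frame buffer is read as rarely as possible.
void blend_span_64(uint8_t* row, size_t pixels, uint32_t color, uint8_t alpha)
{
    assert(row != nullptr || pixels == 0);
    size_t i = 0;
    for (; i + 2 <= pixels; i += 2) {
        uint64_t pair;
        std::memcpy(&pair, row + i * 4, sizeof pair);      // single read
        uint32_t lo = static_cast<uint32_t>(pair);
        uint32_t hi = static_cast<uint32_t>(pair >> 32);
        lo = blend_pixel(lo, color, alpha);                // work in temporaries
        hi = blend_pixel(hi, color, alpha);
        pair = (static_cast<uint64_t>(hi) << 32) | lo;
        std::memcpy(row + i * 4, &pair, sizeof pair);      // single write
    }
    if (i < pixels) {                                      // odd trailing pixel
        uint32_t px;
        std::memcpy(&px, row + i * 4, sizeof px);
        px = blend_pixel(px, color, alpha);
        std::memcpy(row + i * 4, &px, sizeof px);
    }
}
```

A caller would pass a pointer to the start of a scanline in the frame buffer together with the span length in pixels. Since both halves of each 64-bit word receive the same treatment and go back to the positions they came from, the sketch does not depend on byte order for a solid source color.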