Hi Adi,

> We use ViewDriver because our instruction clipping code (when in an
> update you are allowed to draw only in a specified region) is not in
> place and we use BView::ConstrainClippingRegion(&reg) for that.

Nice! I didn't even know that such an option existed... I guess it's
because I never really wrote an app yet :-/

> When Gabe finishes the clipping code (read DisplayDriver, it's done)
> we'll be able to use the full acceleration an accelerant can give us.

So, you can use DD and AccelerantDriver. Does AccelerantDriver already
have the engine stuff in now? Or will it do software execution only at
the beginning? (just curious)

> > -->I am assuming here that these buffers reside on the
> > graphicscard.
>
> In R1, no.

OK.

> > The sentence also implies that both the source and destination of
> > the drawing actions are residing on the graphicscard's RAM, or acc
> > would not be possible (ATM at least).
>
> HW cursor is possible. The rest, no. It's no problem, we'll use MMX,
> SSE. That is until you provide us with drivers that can use pixel
> shaders. :-))))

OK, so you are indeed confirming here that you know you won't be using
the acc engine in the driver. You are using MMX and SSE, which in my
book is software drawing, so no acceleration from the card's engine.
It's indeed the best you can do for now, so it sounds OK to me :)

> > OK, here's my 'warning':
> > While you are correct in saying you can draw 'parallel' in those
> > regions inside the buffer, there may be important performance
> > penalties if you don't serialize the access after all. The 'burst
> > mode' of writing across the PCI/AGP bus depends on serialized
> > access (at least beyond those say 32-byte blocks; within these
> > blocks it doesn't matter). Burst mode (fully automatically
> > generated by the system's hardware) works in PCI mode and in AGP
> > mode. In AGP mode, bursts are the ones being accelerated with the
> > 'fastwrites' feature.
>
> How do we serialize access?
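BTW, for anyone reading along who hasn't seen clipping in action: a minimal software sketch of the idea behind ConstrainClippingRegion() (not the actual app_server code, and a single rect instead of a full BRegion) just intersects the primitive with the clip before touching pixels:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a clipping region: one rect.
struct Rect { int left, top, right, bottom; };

// Fill 'rect' in a 32-bit framebuffer, but only the part inside 'clip'.
// This mimics what constraining the clipping region + a fill achieves.
void fill_clipped(std::vector<uint32_t>& fb, int fb_width,
                  Rect rect, Rect clip, uint32_t color)
{
    int l = std::max(rect.left,   clip.left);
    int t = std::max(rect.top,    clip.top);
    int r = std::min(rect.right,  clip.right);
    int b = std::min(rect.bottom, clip.bottom);
    for (int y = t; y <= b; ++y)          // rows stay serial, left to right
        for (int x = l; x <= r; ++x)
            fb[y * fb_width + x] = color;
}
```

The real region code of course handles arbitrary sets of rects, but the principle is the same.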
Locking, semaphores. Only one thread draws in the graphics RAM at a
time. This ensures that you have the best chance that its memory is
accessed in a more or less serial way. Of course, in the end it depends
on what every thread is doing. If you are updating some piece of a
background window because it just became foreground (some other window
moved away), you will be doing that serially as well: every line of the
screen is being updated (filled) from left to right (or vice versa).
You only make jumps in memory if you reach the end of such an updated
part of a line: the jump will be "bytes_per_row". After this, serial
access happens again. So this kind of double buffering would be fast
(bursts, FW) and sounds like a good plan (the double-buffering 'source'
bitmaps are in main mem), provided locking is done. But you could of
course just benchmark both options for yourself and see if I am correct
;-)

If you _don't_ use double buffering, then suddenly you can't
'guarantee' that every thread will do serialized access optimally: now
the app's behaviour will determine this. Video will of course be
'serialized' in the app, but if some figure is to be drawn on-screen (a
transparent box or whatever), then this can't be done in a serial
fashion (because the content of the box is not touched).

> If we're doing a blit (copy line by line) from main mem to on-screen
> video mem, won't the HW-generated burst step into action?

Yep. OK, you could call this HW acceleration as well. In my book,
however, I would not call it that. I guess I still have to read you
app_server guys' book of definitions... :-/ (or vice versa of course
;-) I wouldn't even know what to call it per se, but something like
'bus acceleration' as a name for it would fit much better here.

Anyway: indeed, burst and FW will work optimally as stated above 8-)
It's a good idea to have (at least) 32-bit word alignment in place for
the source bitmaps in main mem, BTW.
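To make the "lock, then copy line by line" idea concrete, here's a sketch (my names, not app_server code; a plain mutex stands in for whatever lock/semaphore you end up using): one thread at a time copies rows into the simulated video memory, so within a row the addresses are strictly sequential, and the only jump is the stride at the end of each row:

```cpp
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical global lock standing in for the serialization discussed
// above: only one thread writes into "video memory" at a time, so the
// bus sees mostly sequential addresses and can burst.
static std::mutex g_fb_lock;

// Copy a rectangle line by line from a main-memory bitmap into the
// (simulated) on-screen buffer. Strides are in pixels; with 32-bit
// pixels every memcpy is naturally 32-bit word aligned.
void blit_rows(const uint32_t* src, int src_stride,
               uint32_t* dst, int dst_stride,
               int width, int height)
{
    std::lock_guard<std::mutex> lock(g_fb_lock);
    for (int y = 0; y < height; ++y)
        std::memcpy(dst + y * dst_stride,     // jump = "bytes_per_row"
                    src + y * src_stride,
                    width * sizeof(uint32_t)); // serial within the row
}
```

Whether the lock is worth its contention cost is exactly the thing to benchmark, as said above.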
> > If indeed both the source and destination of the drawing action
> > reside in graphics RAM, this performance issue only exists if you
> > use non-accelerated drawing actions, because the CPU would need to
> > read the data across the PCI/AGP bus and then write it back, while
> > the acceleration engine keeps it local on the graphicscard's RAM.
>
> I don't think we'll be using videoMem that way, and I'm speaking
> about R2 here. R3, who knows, if we can use those pixel shader
> units... :-)

Good. But we have to talk about definitions again. What do you mean by
pixel shader units? They have nothing to do with it (I can tell you
that without knowing exactly what they are... :) The only thing we need
to get 2D acc from main mem is me instructing the engine to fetch the
source from main mem instead of from local graphics mem (on Matrox one
single flag! (plus addressing of course)). The engine instructions
themselves are identical (in theory) to what they are now. This is a
step I am going to investigate at some point, hopefully in the not
(too) distant future. The main problem here is getting cache coherency
working OK, I expect. There's a second step as well (GART and
aperture), but that only improves speed if you are going to fetch
multiple bitmaps 'simultaneously' (so that's one reason why normally
this stuff is only used for 3D acc; the second reason is stability).

If you want to use 2D acc later on, you should consider real AGP
transfers, and so acc itself (by letting the engine fetch from main
mem), as being an option. You should take into account that only FW is
used, and in some cases even only standard PCI. (Of course we'll have
to see what PCIe brings us as well...) If you mean using real 3D for
the desktop already (by mentioning pixel shader units), then of course
AGP transfers are much less of an option, or speed would probably get
unworkably low.
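To illustrate the "one single flag" point: here's a made-up engine command record (not real Matrox registers, all names are mine) showing that a main-mem-sourced blit would be the same instruction as a local one, apart from one flag plus the addressing:

```cpp
#include <cstdint>

// Illustration only: a hypothetical 2D engine blit command. The point
// from the text above: fetching the source from main mem instead of
// local graphics mem changes one flag (and the address), nothing else.
struct BlitCommand {
    uint32_t src_addr;        // card-local offset, or bus address
    uint32_t dst_addr;
    uint16_t width, height;
    bool     src_in_main_mem; // the "one single flag"
};

BlitCommand make_blit(uint32_t src, uint32_t dst,
                      uint16_t w, uint16_t h, bool from_main_mem)
{
    return BlitCommand{src, dst, w, h, from_main_mem};
}
```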
Although I guess it would still be a good plan to let it be usable
without 3D acc and acc engine fetches from main mem (software mode via
MESA without 3D acc drivers). (Quake2 on my laptop already runs at
3.9 fps with MESA 6.1 in OpenGL mode, compared to those 92 fps in
internal software rendering mode I once talked about.)

> After the talks we've had, I thought good and wanted to ask your
> opinion on this:
> Do you think it's good to draw (with the CPU) in vidMem off-screen
> surfaces (I mean double buffering in videoMem)? I say no because of
> the PCI bus; writing _and_ reading is expensive.

I say no too, because of just this reason.

> I think a better solution is to do triple buffering. Have an
> off-screen surface in mainMem into which a window will draw. When
> drawing is done, blit this into an _off-screen surface in video
> memory_, validate a flag that it's OK to use that surface, and when a
> portion of that window is needed we use the 2D(/3D) engine to blit
> on-screen.
> What do you think?

Agreed. Although I did not mention this setup, it did cross my mind :)
Of course, in the end there may be more to consider. Like, you are
running 3D apps in a window at the same time. Card memory you use for
triple buffering the desktop is no longer available for the 3D app,
lowering its speed (by it needing to fetch stuff from main mem more
frequently: assuming the bus stays a bottleneck).

BTW: talking about definitions again: I find it interesting to see you
talk about blits when you actually mean copying. In my book the word
'blit' is reserved for acc engine mem copying (so acceleration). I
mean, I don't want 'my book' to necessarily be right, but this _is_ a
potential problem: we could misunderstand each other easily by not
having these terms defined clearly...

======

Hey, I still miss something else that's interesting as well, I think. I
understand what you mean by double/triple buffering, and I understand
why you want to do that.
But still, I am wondering about another of BeOS's shortcomings:
updating the screen in such a way that you won't see distortions if you
drag a window (for instance). I mean: tearing. Some people talk about
double buffering in this context: having a copy of the entire screen in
card RAM that is switched to during retraces. This of course requires
both buffers to be updated with everything, and you can talk and think
a lot about strategies to get this done with minimal overhead. But the
goal it serves would be perfect, undistorted screen output at all
times, which would be nice to have as well at some point... (Just
updating the single screen buffer during retrace is much too expensive,
I guess, as this leaves very little time to update the buffer: the
retrace is a relatively very short piece of time compared to the
'full-time acc' that is used now.)

Greetings!

Rudolf.