[haiku-development] Re: A tale of two accelerant API's

  • From: looncraz <looncraz@xxxxxxxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Tue, 12 Feb 2013 13:39:05 -0800

On 2/12/2013 06:33, Axel Dörfler wrote:

That reminds me: the app_server is already completely agnostic about what the frame buffer looks like.
It doesn't care whether a workspace is in a single frame buffer or not.
Every Screen object can come with its own framebuffer, and that might be powered by the same graphics driver or not. In any case, the graphics driver is the only one that decides whether or not a multi-head situation shares the same framebuffer.
From the POV of the app_server, every screen has its own framebuffer.

Bye,
   Axel.



What about the case of differing resolutions for two monitors on a dual-head card? I don't see how that can be handled transparently to whatever is rendering into it - short of scaling, or simply over-provisioning (virtually, by hidden clipping logic I haven't seen anywhere in the code, or otherwise) so that rendering still occurs as if the frame buffer were one neat, tidy rectangle.

Currently the CompositeEngine isn't set up to handle multiple Screen objects or Desktops - just one. There is no multi-head support in Radeon HD yet to test against, so I am testing the code outside of app_server in a fake environment (which also makes it much easier to test the interaction of just my changes). I'm thinking each screen may get its own CompositeEngine but share all other resources. That means five threads on a quad-core system by default, though this will be configurable so as to limit the total number of threads spawned.

The overall setup is quite simple, actually, but it is designed around the idea that everything is 2D and we don't have the luxury of wonderful hardware acceleration - because we don't, and won't for years. It is flexible, though: WindowBuffers can easily represent VRAM / 3D textures, and the rest is adjustable easily enough.

Other than the obvious redirection of client paints into a buffer, it is necessary to properly determine when a client draw needs to change what is on screen - that is, when it is visible. I've added this logic into the window itself: just before an UpdateSession is completed, a RenderTask object (or two) is pulled from the RenderTaskPool, and all windows affected by the draw are added, along with their respective affected regions, sorted by z-order. Each RenderTask represents a single client draw, but compositing is currently designed to split rendering down two paths: one for alpha-mode rendering, another for fast copy-mode rendering. Sometimes two RenderTasks are "paired" together, one "owning" the other, because they represent parts of the same client draw. As such, windows will need to declare an alpha region in order to gain transparency. Decorators are the opposite: they will need to declare a copy-mode region to improve performance; otherwise they are considered 100% alpha-mode.
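To make the pairing idea concrete, here is a minimal sketch of splitting one client draw by the window's declared alpha region. None of these names or types are from the actual patch (real code would use BRegion arithmetic rather than single rectangles, and pairing is reduced to a flag):

```cpp
#include <algorithm>
#include <vector>

enum class RenderMode { kCopy, kAlpha };

struct IntRect {
    int left, top, right, bottom;
    bool IsValid() const { return right > left && bottom > top; }
};

static IntRect Intersect(const IntRect& a, const IntRect& b)
{
    return { std::max(a.left, b.left), std::max(a.top, b.top),
             std::min(a.right, b.right), std::min(a.bottom, b.bottom) };
}

struct RenderTask {
    RenderMode mode;
    IntRect region;
    bool paired = false;  // set when this task shares a client draw with another
};

// Split one client draw: the part of the dirty rect inside the window's
// declared alpha region goes down the alpha path, the rest down the fast
// copy path (real code would subtract the alpha region from the copy task).
static std::vector<RenderTask> SplitClientDraw(const IntRect& dirty,
    const IntRect& declaredAlpha)
{
    std::vector<RenderTask> tasks;
    tasks.push_back({ RenderMode::kCopy, dirty });
    IntRect alphaPart = Intersect(dirty, declaredAlpha);
    if (alphaPart.IsValid()) {
        tasks.push_back({ RenderMode::kAlpha, alphaPart });
        tasks[0].paired = tasks[1].paired = true;
    }
    return tasks;
}
```

A window with no alpha region intersecting the draw produces a single, unpaired copy-mode task - the decorator case in reverse.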

When a RenderTask is Finalize()'d, it calls into the CompositeEngine, where it is sorted by various properties (window priority, pixel area, drawing mode, foreground/background window, time since last update (minimum frame rate), etc.) - this is where prioritization comes into play. At that point the client draw is complete and the window thread can go on its merry way. This process is much faster than it sounds, and since it occurs in the window threads, multiple RenderTasks can be created at once. Exact prioritization isn't considered vital: all new RenderTasks are inserted after any RenderTasks skipped from the last frame unless they are deemed more important by a certain factor, as would be the case for MediaPlayer or games that don't use BDirectWindow. (There are some changes I want to make here - basically to let BDirectWindow clients give the app_server an opportunity to overlay its draws, so software cursors don't get obscured and overlays don't get messed up... at least not as badly as they do today. Yes, that requires client cooperation - and lower performance in that window.)
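The insertion rule can be sketched like this - queue new tasks behind anything skipped last frame unless they outrank it by the importance factor. The names and the factor's value are illustrative, not from the real code:

```cpp
#include <list>

struct QueuedTask {
    float priority;
    bool skippedLastFrame;
};

// the hypothetical "certain factor" a newcomer must beat to jump the queue
static const float kPreemptFactor = 2.0f;

static void EnqueueTask(std::list<QueuedTask>& queue, const QueuedTask& task)
{
    auto it = queue.begin();
    // walk past leftover skipped tasks that the newcomer doesn't outrank enough
    while (it != queue.end() && it->skippedLastFrame
            && task.priority < it->priority * kPreemptFactor)
        ++it;
    queue.insert(it, task);
}
```

An ordinary redraw lands behind last frame's leftovers; a high-priority media window jumps ahead of them.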

The next time the CompositeEngine frame control thread (it needs a nice name... Rembrandt?) wants to render a frame (which it tries to do every 1/60th of a second or so), it tests whether there are any pending RenderTasks and then switches contexts so new RenderTasks can be added while the current frame is being rendered. A group of threads, one per core by default, uses specialized DrawingEngines attached to the frame buffer for the screen in question - this is where hardware acceleration comes in. The only action these threads perform is calling DrawingEngine::DrawWindowBuffer(WindowBuffer* buffer, IntRect source, IntRect destination): a buffer-to-buffer rendering, either a raw copy or with alpha calculations applied. The drawing-mode switch is so fast I didn't worry about avoiding it, so the DrawingEngine is set to render in alpha mode or copy mode depending on the needs of the RenderTask its owning thread is servicing. Each thread recycles the RenderTask object once it is serviced, releases its read lock on the frame control MultiLocker, and then starts all over.
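A sketch of one composer thread's service loop, with std::shared_mutex standing in for the frame control MultiLocker and DrawWindowBuffer reduced to a plain pixel copy (the alpha path is omitted). All of these definitions are illustrative, not the real API:

```cpp
#include <cstdint>
#include <queue>
#include <shared_mutex>
#include <vector>

struct WindowBuffer {
    std::vector<uint32_t> pixels;
};

enum class Mode { kCopy, kAlpha };

struct RenderTask {
    WindowBuffer* source;
    WindowBuffer* target;
    Mode mode;
};

static void DrawWindowBuffer(const RenderTask& task)
{
    if (task.mode == Mode::kCopy)
        task.target->pixels = task.source->pixels;  // raw copy path
    // the alpha path would blend source over target per pixel instead
}

static void ComposerLoop(std::queue<RenderTask>& tasks,
    std::shared_mutex& frameLock)
{
    for (;;) {
        // the read lock is what the frame control thread revokes at flip time
        std::shared_lock<std::shared_mutex> readLock(frameLock);
        if (tasks.empty())
            return;  // real code would block waiting for work instead
        RenderTask task = tasks.front();
        tasks.pop();
        DrawWindowBuffer(task);
        // the task object would be recycled into the RenderTaskPool here
    }
}
```

Because each iteration reacquires the read lock, a frame flip only has to wait for in-flight tasks to finish, never for the whole queue to drain.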

Next, the CompositeEngine frame control thread (Rembrandt?) comes back to life some time before the actual on-screen image needs updating and stops the threads from working on any new RenderTasks by acquiring a write lock on the frame control MultiLocker. This is when the mouse and certain effects are handled. The back buffer becomes the front buffer and the front buffer the back buffer, the modified areas of the front buffer are copied back into the back buffer, any post-processing occurs, and the time to the next frame is calculated. Then it all begins again.

I currently have no code for the single-buffer case; in fact, I'm working on the premise that every single window is double-buffered, along with the frame buffer. In a test run with the previously released code, app_server memory usage went to 200MB with this setup, versus 112MB with single-buffering. And the performance difference would be noticeable on lower-end systems - where RAM is cheaper than CPU.

If the buffers can exist in VRAM, though, then we see a rather low memory requirement... and it is entirely feasible to hold one buffer in video RAM and the other in system RAM. (mmap, FTW!).

BTW, DrawingEngine::DrawWindowBuffer calls into the accelerated DrawBitmap_NoScale32, IIRC... so acceleration should come "for free."

--The loon

PS: there is plenty I left out... such as how system-wide effects are to be handled (no provisions as of yet - LOW priority)
