*Tim Roberts* over at the osr.com forum has been very kind in answering my audio driver questions over the last few months. Recently, he referred me to this mailing list where "/all the cool audio driver kids, including several very helpful members of the Microsoft audio team, hang out/".
My goal is to play networked voice data (in our app) to a virtual render endpoint, and have it 'loop-back' to a virtual capture endpoint that 3rd-party chat apps like Skype, Zoom, etc. can consume.
My audio driver prototype is based on SYSVAD/WaveRT. Besides being newer and recommended in the docs, WaveRT handles more of the copying of data to and from the DMA buffers. My hope was that I could write less code and have a more stable driver.
What's needed is just a straight-through audio pipe, so I tried using a single, common buffer for the render and capture streams, keeping their respective PlayPositions in sync. For this to have a hope of working, both streams would have to use an identical format (e.g. channels, frames/sec). Alas, when I tested, the two streams were opened with significantly different PCM formats (different channel counts and frame rates). I'm still testing to see if I can limit/coerce both endpoints to the same format: a single channel at some arbitrary 'good enough' frame rate, but this approach is starting to look uncomfortably fragile.
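To make the shared-buffer idea concrete, here is a minimal user-mode sketch of what I mean. It is not SYSVAD code: `PcmFormat` and `LoopbackPipe` are hypothetical names I made up for illustration, the format struct is a stand-in for the real WAVEFORMATEX, and there is no locking, which a real driver would certainly need.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical PCM format descriptor (a stand-in, not the real WAVEFORMATEX).
struct PcmFormat {
    uint32_t channels;
    uint32_t framesPerSec;
    uint32_t bitsPerSample;

    bool operator==(const PcmFormat& o) const {
        return channels == o.channels &&
               framesPerSec == o.framesPerSec &&
               bitsPerSample == o.bitsPerSample;
    }
};

// Hypothetical straight-through pipe: the render stream writes, the capture
// stream reads, and both must use exactly the same format.
// Assumption: single-threaded use; a driver would guard this with a lock.
class LoopbackPipe {
public:
    LoopbackPipe(const PcmFormat& fmt, size_t bytes)
        : fmt_(fmt), buf_(bytes), readPos_(0), writePos_(0), filled_(0) {}

    // A stream may only attach if its format matches the pipe's format.
    bool Accepts(const PcmFormat& f) const { return f == fmt_; }

    // Render side: copy bytes in; anything that does not fit is dropped.
    size_t Write(const uint8_t* src, size_t n) {
        const size_t space = buf_.size() - filled_;
        if (n > space) n = space;
        for (size_t i = 0; i < n; ++i) {
            buf_[writePos_] = src[i];
            writePos_ = (writePos_ + 1) % buf_.size();
        }
        filled_ += n;
        return n;
    }

    // Capture side: copy bytes out; pad with silence (zeros) on underrun.
    size_t Read(uint8_t* dst, size_t n) {
        const size_t avail = n < filled_ ? n : filled_;
        for (size_t i = 0; i < avail; ++i) {
            dst[i] = buf_[readPos_];
            readPos_ = (readPos_ + 1) % buf_.size();
        }
        std::memset(dst + avail, 0, n - avail);
        filled_ -= avail;
        return avail;
    }

private:
    PcmFormat fmt_;
    std::vector<uint8_t> buf_;
    size_t readPos_;
    size_t writePos_;
    size_t filled_;
};
```

The whole scheme hinges on `Accepts` returning true for both streams. As soon as the two endpoints negotiate different formats, the bytes coming out of `Read` no longer mean what the capture side thinks they mean.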
My second prototype is based upon MSVAD/WaveCyclic. With WaveCyclic, I implement the DMA buffer copying logic, so I can do whatever data conversion is required. I'm aware of a few other 'virtual audio driver' projects on GitHub with similar goals, all based upon WaveCyclic.
I wanted to see how they dealt with the data format conversion issue, but when I look at their CopyTo and CopyFrom implementations, *I don't see them dealing with conversion at all*. There doesn't appear to be any code (e.g. in DataRangeIntersection) that limits streams to a specific number of channels or frame rate.
Can anyone shed light on this?
I'm sure my expectations must be off somehow, but it seems to me that some kind of conversion *must* be required when copying data between streams of sufficiently different format...
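For reference, this is roughly the kind of conversion I expected to find in those CopyTo/CopyFrom routines. It is only a sketch under my own assumptions: 16-bit interleaved PCM, averaging channels for the downmix, and plain linear interpolation with no anti-aliasing filter for the rate change. `DownmixToMono` and `ResampleLinear` are hypothetical helper names, not anything taken from MSVAD or those projects.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Downmix interleaved 16-bit frames to mono by averaging the channels.
std::vector<int16_t> DownmixToMono(const std::vector<int16_t>& interleaved,
                                   uint32_t channels) {
    std::vector<int16_t> mono;
    if (channels == 0) return mono;
    const size_t frames = interleaved.size() / channels;
    mono.reserve(frames);
    for (size_t f = 0; f < frames; ++f) {
        int32_t sum = 0;
        for (uint32_t c = 0; c < channels; ++c) {
            sum += interleaved[f * channels + c];
        }
        mono.push_back(static_cast<int16_t>(sum / static_cast<int32_t>(channels)));
    }
    return mono;
}

// Resample mono 16-bit PCM by linear interpolation between neighbours.
// Assumption: quality is "good enough" for voice; there is no low-pass
// filter, so downsampling will alias.
std::vector<int16_t> ResampleLinear(const std::vector<int16_t>& in,
                                    uint32_t srcRate, uint32_t dstRate) {
    std::vector<int16_t> out;
    if (in.empty() || srcRate == 0 || dstRate == 0) return out;
    const size_t outFrames = static_cast<size_t>(
        (static_cast<uint64_t>(in.size()) * dstRate) / srcRate);
    out.reserve(outFrames);
    for (size_t i = 0; i < outFrames; ++i) {
        // Source position = i * srcRate / dstRate, split into an integer
        // index and a fractional remainder (in units of 1/dstRate).
        const uint64_t num = static_cast<uint64_t>(i) * srcRate;
        const size_t idx = static_cast<size_t>(num / dstRate);
        const int64_t rem = static_cast<int64_t>(num % dstRate);
        const int32_t a = in[idx];
        const int32_t b = in[idx + 1 < in.size() ? idx + 1 : idx];
        const int64_t v = a + (static_cast<int64_t>(b - a) * rem) /
                                  static_cast<int64_t>(dstRate);
        out.push_back(static_cast<int16_t>(v));
    }
    return out;
}
```

So a stereo 48 kHz render stream feeding a mono 16 kHz capture stream would, in my mental model, go through `DownmixToMono(samples, 2)` and then `ResampleLinear(mono, 48000, 16000)` somewhere between the two buffers. I can't find the equivalent of either step in the drivers I looked at.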