TL;DR: this sounds like a bug in either the Microsoft HD Audio class driver,
the Realtek driver, the HD Audio controller driver, or the hardware.
Making an RTAudio client is challenging because there are several flavors of
RTAudio:
1. Packet streaming -
   KSPROPERTY_RTAUDIO_GETREADPACKET/KSPROPERTY_RTAUDIO_SETWRITEPACKET
2. RT streaming
   * Event-driven vs. timer-driven -
     KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION or KSPROPERTY_RTAUDIO_BUFFER
   * Position register or position query -
     KSPROPERTY_RTAUDIO_POSITIONREGISTER or KSPROPERTY_AUDIO_POSITION
You can count on a given piece of hardware to implement at least one of these
successfully (assuming it works with the Windows audio engine) but you cannot
pick one of these and expect it to work on all hardware.
In particular a given piece of hardware might implement, say, two of these: one
of them well, and one poorly, and Windows just happens to pick the one that
worked well. (You can sometimes flush out the precise problem for the one that
worked poorly by running HLK tests on the hardware.)
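Since no single flavor works everywhere, a client ends up probing and falling back. A minimal sketch of that selection step follows. The enum, the bitmask, and pick_flavor are all hypothetical names for illustration; the preference order is supplied by the caller, because nothing here states which order Windows itself uses, and the real probing (sending the KS property requests) is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the streaming flavors listed above. */
typedef enum {
    FLAVOR_NONE     = 0,
    FLAVOR_PACKET   = 1 << 0, /* GETREADPACKET / SETWRITEPACKET */
    FLAVOR_RT_EVENT = 1 << 1, /* BUFFER_WITH_NOTIFICATION */
    FLAVOR_RT_TIMER = 1 << 2  /* BUFFER, polled on a timer */
} rt_flavor;

/* Return the first flavor in the caller's preference list that the
   hardware reported as supported, or FLAVOR_NONE if none matched. */
rt_flavor pick_flavor(unsigned supported_mask,
                      const rt_flavor *preference, size_t count)
{
    assert(preference != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (supported_mask & (unsigned)preference[i]) {
            return preference[i];
        }
    }
    return FLAVOR_NONE;
}
```

A flavor being reported as supported says nothing about whether it is implemented well, so a real client may still need to fall further down the list after a failure at streaming time.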
To wax poetic a little on the reason for the overabundance of mechanisms for
querying position:
Always two, there are. No more, no less. -- Yoda
There are always, conceptually, two positions Windows cares about:
1. The “presentation position”. This is the position of the sample coming
out of the speaker or going into the microphone. This is so we can do things
like display the YouTube cursor in the right position, synchronize audio and
video, or synchronize microphone and speaker reference streams for acoustic
echo cancelation. This position is expected to advance smoothly in real time;
Windows doesn’t actually need realtime updates, but it DOES need to know to a
good degree of precision the wall clock time that the last reading was taken.
For example, “at 11:35:32.439 PM (that is, 300 milliseconds ago) time code
00:45:21.884 was coming out of the speaker.”
2. The “buffer position”. This is about handoff of data between Windows and
the layer below. Actually this isn’t necessarily a position at all – the idea
is entirely to make sure that all data is written, and read, precisely once (we
should never have a case where data is overwritten before being read, nor a
case where data is read without having been written first.) In contrast to the
presentation position, this will advance as a step function, since audio is
handed off in fixed-size packets. It is not particularly important that this
advance uniformly over the short term, so long as jitter buffers are not
overcome, and so long as short-term variances do not become long-term skew.
To illustrate the difference between the two positions, suppose an audio render
pipeline has significant processing delay after the handoff between Windows and
the driver. At a particular moment in time, the audio at time code 00:04:23.743
might be being handed off from Windows to the driver, but the audio coming out
of the speaker is at time code 00:04:23.693 (fifty milliseconds behind.) This
delay might be due to, say, dynamic range correction in the hardware. Or for a
capture pipeline, the audio at time code 00:32:42.987 might be entering the
microphone, but the audio being handed to Windows is still at time code
00:32:42.967 (twenty milliseconds behind.)
In general, streaming code (IAudioCaptureClient/IAudioRenderClient) cares about
the buffer position exclusively. Higher layers care about the presentation
position but this query comes through IAudioClock.
It is important to note that the difference between the presentation position
and buffer position can, conceptually, be quite large – much larger than the
time between audio packet handoffs during streaming.
Now, details.
-- WaveRT Packet Streaming --
KSPROPERTY_RTAUDIO_PRESENTATION_POSITION is the presentation position. It is
not used when streaming, but it is necessary to power IAudioClock.
KSPROPERTY_RTAUDIO_SETWRITEPACKET and KSPROPERTY_RTAUDIO_GETREADPACKET are the
streaming properties. These replace the need for a buffer position.
-- WaveRT Position Register --
KSPROPERTY_RTAUDIO_POSITIONREGISTER is the buffer position. The presentation
position is approximated by taking the buffer position and subtracting (for
render) or adding (for capture) KSPROPERTY_RTAUDIO_HWLATENCY.
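As a rough sketch of that approximation, assuming the buffer position and the hardware latency have both already been converted to milliseconds of stream time (the function name and the units are my assumptions, not what the properties return directly):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Approximate the presentation position from the buffer position.
   Render: the speaker lags the handoff, so subtract the latency.
   Capture: the microphone leads the handoff, so add the latency. */
int64_t approx_presentation_ms(int64_t buffer_position_ms,
                               int64_t hw_latency_ms,
                               bool is_capture)
{
    assert(hw_latency_ms >= 0);
    return is_capture ? buffer_position_ms + hw_latency_ms
                      : buffer_position_ms - hw_latency_ms;
}
```

For the render example earlier, a buffer position of 00:04:23.743 (263,743 ms) with 50 milliseconds of hardware latency gives 263,693 ms, that is, 00:04:23.693 at the speaker.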
-- Classic Position --
KSPROPERTY_AUDIO_POSITION and KSPROPERTY_AUDIO_POSITIONEX give a PlayOffset
and a WriteOffset. PlayOffset is the presentation position (for render this is
called the “play position” and for capture this is called the “record
position”), and WriteOffset is the buffer position (for render this is called
the “write position” and for capture this is called the “read position”.)
From: Eugene Muzychenko<mailto:reg.wad@xxxxxxxxxxxxxx>
Sent: Thursday, December 7, 2017 6:32 PM
Subject: [wdmaudiodev] Re: KSPROPERTY_AUDIO_POSITION and position register
value relationship
Hello Matthew,
For the “position register” value are you looking at
KSPROPERTY_RTAUDIO_POSITIONREGISTER, or
KSPROPERTY_RTAUDIO_PRESENTATION_POSITION, or both?