[wdmaudiodev] Re: MFX APO implementation for SPEECH mode (Cortana)

From: Martin Ordell Sørensen <odl@xxxxxxxxxxx>
To: "wdmaudiodev@xxxxxxxxxxxxx" <wdmaudiodev@xxxxxxxxxxxxx>
Date: Wed, 26 Sep 2018 12:53:22 +0000

Hi,

Thanks for the clarification - and no, the audio driver does not have such a
pin as far as I can see. So does that mean we need to make MFX work with the 48
kHz format that the system requests when we are working with the RAW pin on the
audio driver?

Best regards, Martin

From: wdmaudiodev-bounce@xxxxxxxxxxxxx <wdmaudiodev-bounce@xxxxxxxxxxxxx> On
Behalf Of Matthew van Eerde
Sent: 26 September 2018 14:23
To: wdmaudiodev@xxxxxxxxxxxxx
Subject: [wdmaudiodev] Re: MFX APO implementation for SPEECH mode (Cortana)

That documentation is specific to keyword spotter Kernel Streaming pins. Does
the audio driver below you have such a pin?

________________________________
From: wdmaudiodev-bounce@xxxxxxxxxxxxx <wdmaudiodev-bounce@xxxxxxxxxxxxx> on
behalf of Martin Ordell Sørensen <odl@xxxxxxxxxxx>
Sent: Tuesday, September 25, 2018 3:03:58 AM
To: wdmaudiodev@xxxxxxxxxxxxx
Subject: [wdmaudiodev] MFX APO implementation for SPEECH mode (Cortana)

Hi, I am hoping that someone can help shed some light on how an APO for SPEECH
mode microphone processing should be implemented.

I started from this link:
https://docs.microsoft.com/en-us/windows-hardware/drivers/audio/voice-activation

It states that AEC, AGC and NS should be implemented as an MFX APO. This MFX
APO may do format conversion, but what does that mean exactly? Is it allowed to
do 48 kHz -> 16 kHz or does 'format conversion' simply mean non-FLOAT -> FLOAT?
It states that the APO must support 16 kHz mono FLOAT output but not what kind
of input formats it needs to support. Normally, I believe an APO cannot do
different input/output sample rates but is this somehow allowed in this case?
Also, in case of a device with 2 microphones, can this MFX APO do 2 ch -> 1 ch?

I have tried configuring an MFX APO to accept only 16 kHz -> 16 kHz mono FLOAT,
but AudioDG does not seem to accept it. When I run the OEM Verification Tool to
start a SPEECH mode recording, I can see that it tries to initialize the APO
with 48 kHz 2 ch -> 48 kHz 2 ch in SPEECH mode. After I return S_FALSE from
IsInputFormatSupported() with a 16 kHz format proposal, I see no further calls
to IsInputFormatSupported() and the APO is never inserted, so it looks to me
like this is not an accepted format?
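
To make the negotiation step concrete, here is a minimal Python model of the
behaviour described above. It is only a sketch: the Format and Result types
and the function body are hypothetical stand-ins, not the real COM interface,
and the FLOAT sample type of the 48 kHz request is an assumption. It shows only
the APO side (request in, S_FALSE plus a 16 kHz proposal out) and makes no
claim about what AudioDG does with the proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Result(Enum):
    """Names of the outcomes documented for IsInputFormatSupported()."""
    S_OK = "supported as requested"
    S_FALSE = "not supported, alternative proposed"
    APOERR_FORMAT_NOT_SUPPORTED = "not supported, no proposal"


@dataclass(frozen=True)
class Format:
    """Hypothetical stand-in for an audio media type."""
    sample_rate_hz: int
    channels: int
    sample_type: str  # e.g. "float"


# The only format this model APO accepts: 16 kHz mono FLOAT.
ONLY_SUPPORTED = Format(16000, 1, "float")


def is_input_format_supported(requested: Format) -> Tuple[Result, Optional[Format]]:
    """Model of the APO side of the negotiation (not the real signature)."""
    if requested == ONLY_SUPPORTED:
        return Result.S_OK, requested
    # Anything else is rejected with a 16 kHz mono FLOAT proposal.
    return Result.S_FALSE, ONLY_SUPPORTED


# The request seen from the OEM Verification Tool run: 48 kHz, 2 channels.
# The sample type is assumed to be FLOAT for the purpose of this sketch.
requested = Format(48000, 2, "float")
result, proposal = is_input_format_supported(requested)
print(result.name, proposal)
```

In this model the APO's answer is well-formed, so the open question is whether
the caller is willing to act on a proposal that changes the sample rate.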

For performance reasons, it would be optimal to run the processing at 16 kHz,
since this is also what Cortana receives in the end. I can make the system load
an APO and process audio with 48 kHz -> 48 kHz in SPEECH mode, but then the
system inserts an SRC afterwards, which is what I would like to avoid. It also
doesn't correspond to what the documentation linked above says.
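
As a rough illustration of the performance argument, the snippet below counts
the samples an APO would have to touch per processing period in each case. The
10 ms period and the helper name are assumptions made for the example. The real
cost depends on the AEC/AGC/NS algorithms, so this compares sample counts only.

```python
def samples_per_period(sample_rate_hz: int, channels: int, period_ms: int = 10) -> int:
    """Samples in one processing period (the 10 ms default is an assumption)."""
    frames = sample_rate_hz * period_ms // 1000
    return frames * channels


at_48k_stereo = samples_per_period(48000, 2)  # format the system requests
at_16k_mono = samples_per_period(16000, 1)    # format Cortana ends up with

print(at_48k_stereo, at_16k_mono, at_48k_stereo / at_16k_mono)
# prints: 960 160 6.0
```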

I am testing on an 1803 build.

Best regards, Martin
