[edk2-devel] VirtIO Sound Driver (GSoC 2021)

Marvin Häuser mhaeuser at posteo.de
Fri Apr 16 13:22:14 UTC 2021


Good day,

Sorry for the nitpicking.

- Protocols always need a "Revision" field as their first member. This 
allows a protocol's capabilities to be expanded in later revisions 
without introducing a new, distinct protocol (see the sketch after 
this list).
- Consider the name EFI_SIMPLE_AUDIO_OUTPUT(!)_PROTOCOL, so as not to 
cause confusion if input is ever added. Input, in my opinion, should be 
a separate protocol, as there is no reason why the two would 
necessarily be coupled topology-wise (think of a USB microphone, which 
will never have any sort of output).
- To make code safety a bit easier, try to use "CONST" for "IN" 
(non-OUT) pointers, so that CONST can be propagated where possible 
(also shown in the sketch below).
- Please do *not* make the events caller-owned. We have seen it 
multiple times already on production firmware that events are left 
dangling and may be polled/signaled after ExitBS(). The caller should 
maybe be able to decide on some policy (i.e. abort, or block on 
ExitBS() until the playback has finished), as cut-off audio may be 
awkward; but the callee definitely should implement "event safety" 
itself. Maybe avoid exposing events directly at all and provide nice 
abstractions the caller cannot misuse.
- I don't think audio should be required at all; the required subset 
should firstly consider minimalism and security. Accessibility will not 
be a concern for some IoT device, where the audio code would simply eat 
space and introduce a larger surface for bugs.
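
To make the first and third points concrete, a minimal sketch (all 
names here are purely illustrative, not a naming proposal):

#define EFI_SIMPLE_AUDIO_OUTPUT_PROTOCOL_REVISION  0x00010000

typedef struct _EFI_SIMPLE_AUDIO_OUTPUT_PROTOCOL
  EFI_SIMPLE_AUDIO_OUTPUT_PROTOCOL;

typedef
EFI_STATUS
(EFIAPI *EFI_SIMPLE_AUDIO_OUTPUT_START) (
  IN EFI_SIMPLE_AUDIO_OUTPUT_PROTOCOL  *This,
  //
  // CONST input pointer; the callee must not modify the buffer, and
  // the qualifier can be propagated through the implementation.
  //
  IN CONST INT16                       *Samples,
  IN UINTN                             NumberOfSamples
  );

struct _EFI_SIMPLE_AUDIO_OUTPUT_PROTOCOL {
  //
  // Revision as the first member, so that capabilities can be added
  // in later revisions without introducing a new, distinct protocol.
  //
  UINT64                         Revision;
  EFI_SIMPLE_AUDIO_OUTPUT_START  Start;
};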

Best regards,
Marvin

On 16.04.21 01:42, Ethin Probst wrote:
> Hi Andrew,
>
> What would that protocol interface look like if we utilized your idea?
> With mine (though I need to add channel mapping as well), your
> workflow for playing a stereo sound from left to right would probably
> be something like this (a rough code sketch follows the steps):
> 1) Encode the sound using a standard tool into a 16-bit PCM Wave file.
> 2) Place the Wave file in the Firmware Volume using a given GUID as
> the name. As simple as editing the platform FDF file.
> 3) Write some BDS code:
>    a) Look up the Wave file by GUID and read it into memory.
>    b) Decode the audio file (audio devices will not do this decoding
> for you; you have to do that yourself).
>    c) Call EFI_AUDIO_PROTOCOL.LoadBuffer(), passing in the sample rate
> of your audio, EFI_AUDIO_PROTOCOL_SAMPLE_FORMAT_S16 for signed 16-bit
> PCM audio, the channel mapping, the number of samples, and the samples
> themselves.
>    d) Call EFI_BOOT_SERVICES.CreateEvent()/EFI_BOOT_SERVICES.CreateEventEx()
> to create a playback event for the driver to signal.
>    e) Call EFI_AUDIO_PROTOCOL.StartPlayback(), passing in the event you
> just created.
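>
> In (sketched) code, treating EFI_AUDIO_PROTOCOL and its member names
> from my proposal as illustrative, and gBootSoundFileGuid as a made-up
> file GUID (error handling mostly elided):
>
> EFI_STATUS
> PlayBootSound (
>   VOID
>   )
> {
>   EFI_STATUS          Status;
>   VOID                *Wave;
>   UINTN               WaveSize;
>   INT16               *Samples;
>   UINTN               NumSamples;
>   UINT32              ChannelMap;  // in whatever form the protocol defines
>   EFI_AUDIO_PROTOCOL  *Audio;
>   EFI_EVENT           Done;
>   UINTN               Index;
>
>   // a) Read the Wave file out of any firmware volume (DxeServicesLib).
>   Status = GetSectionFromAnyFv (&gBootSoundFileGuid, EFI_SECTION_RAW,
>              0, &Wave, &WaveSize);
>   if (EFI_ERROR (Status)) {
>     return Status;
>   }
>
>   // b) Decode the Wave container into Samples/NumSamples here
>   //    (decoder omitted for brevity).
>
>   // c) Hand the raw samples and all of their parameters to the driver.
>   Status = gBS->LocateProtocol (&gEfiAudioProtocolGuid, NULL,
>                   (VOID **)&Audio);
>   Status = Audio->LoadBuffer (Audio, 44100,
>              EFI_AUDIO_PROTOCOL_SAMPLE_FORMAT_S16, ChannelMap,
>              NumSamples, Samples);
>
>   // d) Plain event we can wait on; no notification function needed.
>   Status = gBS->CreateEvent (0, 0, NULL, NULL, &Done);
>
>   // e) Start playback and block until the driver signals completion.
>   Status = Audio->StartPlayback (Audio, Done);
>   return gBS->WaitForEvent (1, &Done, &Index);
> }
>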
> The reason that LoadBuffer() takes so many parameters is that the
> device does not know the audio that you're passing in. If I'm given an
> array of 16-bit audio samples, it's impossible to know the parameters
> (sample rate, sample format, channel mapping, etc.) from that alone.
> Using your idea, though, my protocol could be greatly simplified. On
> the other hand, forcing a particular channel mapping, sample rate, and
> sample format on everyone would complicate application code. From an
> application point of view, one would, with that type of protocol, need
> to do the following:
> 1) Load an audio file in any audio file format from any storage mechanism.
> 2) Decode the audio file format to extract the samples and audio metadata.
> 3) Resample the (now decoded) audio samples and convert (quantize) the
> audio samples into signed 16-bit PCM audio.
> 4) Forward the samples to the EFI audio protocol.
> There is another option. (I'm happy we're discussing this now -- we
> can hammer out all the details now which will make a lot of things
> easier.) Since I'll most likely end up splitting the device-specific
> interfaces into different audio protocols, we could make a simple audio
> protocol that makes various assumptions about the audio samples you're
> giving it (e.g. sample rate, format, ...). This would just allow
> audio output and input in signed 16-bit PCM audio, as you've
> suggested, and would be a simple and easy to use interface. Something
> like:
> typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
>    EFI_SIMPLE_AUDIO_PROTOCOL_RESET Reset;
>    EFI_SIMPLE_AUDIO_PROTOCOL_START Start;
>    EFI_SIMPLE_AUDIO_PROTOCOL_STOP Stop;
> } EFI_SIMPLE_AUDIO_PROTOCOL;
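> For illustration, the three members might have signatures as small as
> this (a sketch, not a final proposal):
>
> typedef
> EFI_STATUS
> (EFIAPI *EFI_SIMPLE_AUDIO_PROTOCOL_RESET) (
>    IN EFI_SIMPLE_AUDIO_PROTOCOL  *This
>    );
>
> typedef
> EFI_STATUS
> (EFIAPI *EFI_SIMPLE_AUDIO_PROTOCOL_START) (
>    IN EFI_SIMPLE_AUDIO_PROTOCOL  *This,
>    IN CONST INT16                *Samples,   // signed 16-bit PCM, mono, 44100 Hz
>    IN UINTN                      NumberOfSamples
>    );
>
> typedef
> EFI_STATUS
> (EFIAPI *EFI_SIMPLE_AUDIO_PROTOCOL_STOP) (
>    IN EFI_SIMPLE_AUDIO_PROTOCOL  *This
>    );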
> This way, users and driver developers have a simple audio protocol
> they can implement if they like. It would assume signed 16-bit PCM
> audio and mono channel mappings at 44100 Hz. Then, we can have an
> advanced protocol for each device type (HDA, USB, VirtIO, ...) that
> exposes all the knobs for sample formats, sample rates, that kind of
> thing. Obviously, like the majority of (if not all) UEFI protocols, these
> advanced protocols would be optional. I think, however, that the
> simple audio protocol should be a required protocol in all UEFI
> implementations. But that might not be possible. So would this simpler
> interface work as a starting point?
>
> On 4/15/21, Andrew Fish <afish at apple.com> wrote:
>>
>>> On Apr 15, 2021, at 1:11 PM, Ethin Probst <harlydavidsen at gmail.com>
>>> wrote:
>>>
>>>> Is there any necessity for audio input and output to be implemented
>>>> within the same protocol?  Unlike a network device (which is
>>>> intrinsically bidirectional), it seems natural to conceptually separate
>>>> audio input from audio output.
>>> Nope, there isn't a necessity to make them in one, they can be
>>> separated into two.
>>>
>>>> The code controlling volume/mute may not have any access to the sample
>>>> buffer.  The most natural implementation would seem to allow for a
>>>> platform to notice volume up/down keypresses and use those to control the
>>>> overall system volume, without any knowledge of which samples (if any)
>>>> are currently being played by other code in the system.
>>> You're assuming, then, that the audio device you're implementing
>>> volume/muting for has volume control and muting functionality within
>>> itself.
>> Not really. We are assuming that audio hardware has a better understanding
>> of how that system implements volume than some generic EFI code that is, by
>> definition, platform agnostic.
>>
>>> This may not be the case, and so we'd need to
>>> effectively simulate it within the driver, which isn't too hard to do.
>>> As an example, the VirtIO sound device does not have a request type for
>>> muting or for volume control (this would, most likely, belong in the
>>> VIRTIO_SND_R_PCM_SET_PARAMS request; see sec. 5.14.6.4.3). Therefore,
>>> the driver would either have to simulate the request or return
>>> EFI_UNSUPPORTED in this instance.
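>>>
>>> Simulating volume in the driver is essentially just scaling every
>>> sample before it reaches the device; a sketch (Volume in percent):
>>>
>>>   STATIC
>>>   VOID
>>>   ApplyVolume (
>>>     IN OUT INT16   *Samples,
>>>     IN     UINTN   NumberOfSamples,
>>>     IN     UINT32  Volume    // 0 = mute, 100 = full scale
>>>     )
>>>   {
>>>     UINTN  Index;
>>>
>>>     for (Index = 0; Index < NumberOfSamples; ++Index) {
>>>       //
>>>       // Widen to 32 bits so the multiplication cannot overflow INT16.
>>>       //
>>>       Samples[Index] = (INT16)((INT32)Samples[Index] * (INT32)Volume / 100);
>>>     }
>>>   }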
>>>
>> So this is an example of the above, since the audio hardware knows it is
>> routing the sound output into another subsystem, and that subsystem
>> controls the volume. So the VirtIo Sound Driver knows best how to abstract
>> volume/mute for this platform.
>>
>>>> Consider also the point of view of the developer implementing a driver
>>>> for some other piece of audio hardware that happens not to support
>>>> precisely the same sample rates etc as VirtIO.  It would be extremely
>>>> ugly to force all future hardware to pretend to have the same
>>>> capabilities as VirtIO just because the API was initially designed with
>>>> VirtIO in mind.
>>> Precisely, but the brilliance of VirtIO
>> The brilliance of VirtIO is that it just needs to implement a generic device
>> driver for a given operating system. In most cases these operating systems
>> have sound subsystems that manage sound and want fine granularity of
>> control over what is going on. So the drivers are implemented to maximize
>> flexibility, since the OS has lots of generic code that deals with sound, and
>> even user-configurable knobs to control audio. In our case that extra layer
>> does not exist in EFI, and the end-user code just wants to tell the driver to
>> do some simple things.
>>
>> Maybe it is easier to think about with an example. Let's say I want to play a
>> cool sound on every boot. What would be the workflow to make that happen?
>> 1) Encode the sound using a standard tool into a 16-bit PCM Wave file.
>> 2) Place the Wave file in the Firmware Volume using a given GUID as the
>> name. As simple as editing the platform FDF file.
>> 3) Write some BDS code (sketched below):
>>    a) Look up the Wave file by GUID and read it into memory.
>>    b) Point the EFI Sound Protocol at the buffer with the Wave file.
>>    c) Tell the EFI Sound Protocol to play the sound.
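>>
>> In code, with GetSectionFromAnyFv() from DxeServicesLib, and with the
>> Sound protocol and its Load()/Play() members as made-up names for this
>> sketch, the whole of step 3 can stay about this small, because the
>> sample rate and format travel inside the Wave file itself:
>>
>>   Status = GetSectionFromAnyFv (&gBootSoundFileGuid, EFI_SECTION_RAW,
>>              0, &Wave, &WaveSize);
>>   Status = Sound->Load (Sound, Wave, WaveSize);
>>   Status = Sound->Play (Sound);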
>>
>> If you start adding in a lot of parameters, that workflow starts getting
>> really complicated really quickly.
>>
>>> is that the sample rate,
>>> sample format, &c., do not have to all be supported by a VirtIO
>>> device. Notice, also, how in my protocol proposal I noted that the
>>> sample rates, at least, were "recommended," not "required." Should a
>>> device not happen to support a sample rate or sample format, all it
>>> needs to do is return EFI_INVALID_PARAMETER. Section 5.14.6.2.1
>>> (VIRTIO_SND_R_JACK_GET_CONFIG) describes how a jack tells you which
>>> sample rates, channel mappings, &c. it supports.
>>>
>>> I do understand how just using a predefined sample rate and sample
>>> format might be a good idea, and if that's the best way, then that's
>>> what we'll do. The protocol can always be revised at a later time if
>>> necessary. I apologize if my stance seems obstinate.
>>>
>> I think if we add the version into the protocol and make sure we have a
>> separate load and play operation, we could add a member to set the extra
>> parameters if needed. There might also be some platform-specific generic
>> tunables that might be interesting for a future member function.
>>
>> Thanks,
>>
>> Andrew Fish
>>
>>> Also, thank you, Laszlo, for your advice -- I hadn't considered that a
>>> network driver would be another good way of figuring out how async
>>> works in UEFI.
>>>
>>> On 4/15/21, Andrew Fish <afish at apple.com> wrote:
>>>>
>>>>> On Apr 15, 2021, at 5:07 AM, Michael Brown <mcb30 at ipxe.org> wrote:
>>>>>
>>>>> On 15/04/2021 06:28, Ethin Probst wrote:
>>>>>> - I hoped to add recording in case we want to add accessibility
>>>>>> aids like speech recognition in future (that was one of the to-do
>>>>>> tasks on the EDK2 tasks list)
>>>>> Is there any necessity for audio input and output to be implemented
>>>>> within
>>>>> the same protocol?  Unlike a network device (which is intrinsically
>>>>> bidirectional), it seems natural to conceptually separate audio input
>>>>> from
>>>>> audio output.
>>>>>
>>>>>> - Muting and volume control could easily be added by just replacing
>>>>>> the sample buffer with silence and by multiplying all the samples by a
>>>>>> percentage.
>>>>> The code controlling volume/mute may not have any access to the sample
>>>>> buffer.  The most natural implementation would seem to allow for a
>>>>> platform to notice volume up/down keypresses and use those to control
>>>>> the
>>>>> overall system volume, without any knowledge of which samples (if any)
>>>>> are
>>>>> currently being played by other code in the system.
>>>>>
>>>> I’ve also thought of adding an NVRAM variable that would let setup, the
>>>> UEFI Shell, or even the OS set the current volume and mute state. This
>>>> "how it would be consumed" concept is why I proposed mute and volume
>>>> being separate APIs. A volume up/down API in addition to a fixed
>>>> percentage might be overkill, but it does allow a non-linear mapping to
>>>> the volume up/down keys. You would be surprised how picky audiophiles
>>>> can be, and it seems they like to file Bugzillas.
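>>>>
>>>> In (sketched) code, with the variable name and GUID made up for
>>>> illustration:
>>>>
>>>>   UINT8  Volume;
>>>>
>>>>   Volume = 75;  // percent, consumed by the audio driver on the next boot
>>>>   Status = gRT->SetVariable (
>>>>              L"AudioVolume",
>>>>              &gHypotheticalAudioVarGuid,
>>>>              EFI_VARIABLE_NON_VOLATILE |
>>>>                EFI_VARIABLE_BOOTSERVICE_ACCESS |
>>>>                EFI_VARIABLE_RUNTIME_ACCESS,   // the OS can set it too
>>>>              sizeof (Volume),
>>>>              &Volume
>>>>              );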
>>>>
>>>>>> - Finally, the reason I used enumerations for specifying parameters
>>>>>> like sample rate and stuff was that I was looking at this protocol
>>>>>> from a general UEFI applications point of view. VirtIO supports all of
>>>>>> the sample configurations listed in my gist, and it seems reasonable
>>>>>> to allow the application to control those parameters instead of
>>>>>> forcing a particular parameter configuration onto the developer.
>>>>> Consider also the point of view of the developer implementing a driver
>>>>> for
>>>>> some other piece of audio hardware that happens not to support
>>>>> precisely
>>>>> the same sample rates etc as VirtIO.  It would be extremely ugly to
>>>>> force
>>>>> all future hardware to pretend to have the same capabilities as VirtIO
>>>>> just because the API was initially designed with VirtIO in mind.
>>>>>
>>>>> As a developer on the other side of the API, writing code to play sound
>>>>> files on an arbitrary unknown platform, I would prefer to simply
>>>>> consume
>>>>> as simple as possible an abstraction of an audio output protocol and
>>>>> not
>>>>> have to care about what hardware is actually implementing it.
>>>>>
>>>> It may make sense to have an API to load the buffer/stream and other
>>>> APIs to play or pause. This could allow an optional API to configure
>>>> how the stream is played back. If we add a version to the Protocol,
>>>> that would at least future-proof us.
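>>>>
>>>> For example (all member names here are illustrative):
>>>>
>>>>   struct _EFI_SOUND_PROTOCOL {
>>>>     UINT64               Version;    // future-proofing, first member
>>>>     EFI_SOUND_LOAD       Load;       // load/validate a PCM 16 Wave buffer
>>>>     EFI_SOUND_PLAY       Play;
>>>>     EFI_SOUND_PAUSE      Pause;
>>>>     EFI_SOUND_CONFIGURE  Configure;  // optional knobs; can grow per Version
>>>>   };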
>>>>
>>>> We did get feedback that it is very common to speed up the audio playback
>>>> rate for accessibility. I’m not sure if that is practical with a simple
>>>> PCM 16 Wave file with the firmware audio implementation. I guess that is
>>>> something we could investigate.
>>>>
>>>> In terms of maybe adding text-to-speech, there is an open-source project
>>>> that we could conceptually port to EFI. It would likely be a binary that
>>>> would have to live on the EFI System Partition due to its size. I was
>>>> thinking that words per minute could be part of that API, and that it
>>>> would produce a PCM 16 Wave file that the audio protocol we are
>>>> discussing could play.
>>>>
>>>>> Both of these argue in favour of defining a very simple API that
>>>>> expresses
>>>>> only a common baseline capability that is plausibly implementable for
>>>>> every piece of audio hardware ever made.
>>>>>
>>>>> Coupled with the relatively minimalistic requirements for boot-time
>>>>> audio,
>>>>> I'd probably suggest supporting only a single format for audio data,
>>>>> with
>>>>> a fixed sample rate (and possibly only mono output).
>>>>>
>>>> In my world the folks that work for Jony asked for a stereo boot bong to
>>>> transition from left to right :). "This is not the CODEC you are looking
>>>> for" was our response…. I also did not mention that some languages are
>>>> right-to-left, as the only thing worse than one complex thing is two
>>>> complex things to implement.
>>>>
>>>>> As always: perfection is achieved, not when there is nothing more to
>>>>> add,
>>>>> but when there is nothing left to take away.  :)
>>>>>
>>>> "Simplicity is the ultimate sophistication”
>>>>
>>>> Thanks,
>>>>
>>>> Andrew Fish
>>>>
>>>>> Thanks,
>>>>>
>>>>> Michael
>>>>>
>>>>
>>>
>>> --
>>> Signed,
>>> Ethin D. Probst
>>
>


