[edk2-devel] VirtIO Sound Driver (GSoC 2021)

Marvin Häuser mhaeuser at posteo.de
Sun Apr 18 19:11:10 UTC 2021


On 18.04.21 17:22, Andrew Fish via groups.io wrote:
>
>
>> On Apr 18, 2021, at 1:55 AM, Ethin Probst <harlydavidsen at gmail.com> wrote:
>>
>>> I think it would be best to sketch use-cases for audio and design 
>>> the solutions closely to the requirements. Why do we need to know 
>>> when audio finished? What will happen when we queue audio twice? 
>>> There are many layers (UX, interface, implementation details) of 
>>> questions to answer when coming up with a pleasant and stable design.
>
> We are not using EFI to listen to music in the background. Any audio 
> being played is part of a UI element and there might be 
> synchronization requirements.

Maybe I communicated that badly: I'm not asking because I don't know 
what audio is used for. I am saying that ideally there is a written-down 
list of usage requirements before the protocol is designed, because that 
is what the design targets. The details should follow the needs.

>
> For example, when playing a boot bong on boot you want that to be 
> asynchronous, as you don’t want to delay boot to play the sound, but 
> you may want to choose to gate some UI elements on the boot bong 
> completing. If you are building a menu UI that is accessible you may 
> need to synchronize playback with UI update, but you may not want to 
> make the slow sound playback blocking as you can get other UI work 
> done in parallel.
>
> The overhead for a caller making an async call is not much [1], but 
> not having the capability could really restrict the API for its 
> intended use. I’d also point out we picked the same pattern as the 
> async BlockIO and there is something to be said for having consistency 
> in the UEFI Spec and having similar APIs work in similar ways.

I'm not saying there should be *no* async playback, I am saying it may 
be worth considering implementing it differently from caller-owned 
events. I'm not concerned with overhead, I'm concerned with points of 
failure (e.g. leaks).
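
A driver-owned completion model of that kind can be mocked, outside 
UEFI, roughly as follows. All names here are illustrative only (not 
from any proposed header); the point is that the caller holds no 
resource that could leak:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch of a driver-owned completion model: instead of
 * the caller creating and owning an EFI_EVENT, the driver tracks
 * playback state internally and exposes a polling query. Nothing can
 * leak on the caller side, because the caller holds no resource.
 */
typedef struct {
  bool         Playing;
  unsigned int SamplesQueued;
} AUDIO_STREAM;

/* Begin playback of a queued buffer; replaces any prior buffer. */
static void StreamStart (AUDIO_STREAM *Stream, unsigned int Samples) {
  Stream->SamplesQueued = Samples;
  Stream->Playing       = (Samples != 0);
}

/* Called from the driver's own (internal) periodic tick with the
   number of samples the device consumed since the last tick. */
static void StreamTick (AUDIO_STREAM *Stream, unsigned int Consumed) {
  if (Consumed >= Stream->SamplesQueued) {
    Stream->SamplesQueued = 0;
    Stream->Playing       = false;
  } else {
    Stream->SamplesQueued -= Consumed;
  }
}

/* Caller-facing query; a WaitForCompletion() would just poll this. */
static bool StreamIsPlaying (const AUDIO_STREAM *Stream) {
  return Stream->Playing;
}
```

The caller-facing surface is then IsPlaying()/WaitForCompletion() only, 
which is the shape Ethin floated further down in the thread.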

I very briefly discussed some things with Ethin and it seems like the 
default EDK II timer interval of 10 ms may be problematic, but I am not 
sure. Just leaving it here as something to keep in mind.

Best regards,
Marvin

>
> [1] Overhead for making an asynchronous call.
> AUDIO_TOKEN AudioToken;
> gBS->CreateEvent (EVT_NOTIFY_SIGNAL, TPL_CALLBACK, AudioDoneCallback,
>                   &AudioToken, &AudioToken.Event);
>
> Thanks,
>
> Andrew Fish
>
>> I would be happy to discuss this with you on the UEFI talkbox. I'm
>> draeand on there.
>> As for your questions:
>>
>> 1. The only reason I recommend using an event to signal audio
>> completion is because I do not want this protocol to be blocking at
>> all. (So, perhaps removing the token entirely is a good idea.) The
>> VirtIO audio device says nothing about synchronization, but I imagine
>> it's asynchronous because every audio specification I've seen out there
>> is asynchronous. Similarly, every audio API in existence -- at least,
>> every low-level OS-specific one -- is asynchronous/non-blocking.
>> (Usually, audio processing is handled on a separate thread.) However,
>> UEFI has no concept of threads or processes. Though we could use the
>> MP PI package to spin up a separate processor, that would fail on
>> uniprocessor, unicore systems. Audio processing needs a priority high
>> enough that it is served first, while not being so high that it blocks
>> everything else. This is primarily because of the way an audio
>> subsystem is designed and the way an audio device functions: the audio
>> subsystem needs to know, immediately, when the audio buffer has run
>> out of samples and needs more, and it needs to react immediately to
>> refill the buffer if required, especially when streaming large amounts
>> of audio (e.g.: music). Similarly, the audio subsystem needs the
>> ability to react as soon as is viable when playback is requested,
>> because any significant delay will be noticeable to the end-user. In
>> more complex systems like FMOD or OpenAL, the audio processing thread
>> also needs a high priority to ensure that audio effects, positioning
>> information, dithering, etc., can be configured immediately because
>> the user will notice if any glitches or delays occur. The UEFI audio
>> protocols obviously will be nowhere near as complex, or as advanced,
>> because no one will need audio effects in a preboot environment.
>> Granted, it's possible to make small audio effects, for example delays,
>> even if the protocol doesn't have functions to do that, but if an
>> end-user wants to go absolutely crazy with the audio samples and mix
>> in a really nice-sounding reverb or audio filter before sending the
>> samples to the audio engine, well, that's what they want to do and
>> that's out of our hands as driver/protocol developers. But I digress.
>> UEFI only has four TPLs, and so what we hopefully want is an engine
>> that is able to manage sample buffering and transmission, but also
>> doesn't block the application that's using the protocol. For some
>> things, blocking might be acceptable, but for speech synthesis or the
>> playing of startup sounds, this would not be an acceptable result and
>> would make the protocol pretty much worthless in the majority of
>> scenarios. So that's why I had an event to signal audio completion --
>> it was (perhaps) a cheap hack around the cooperatively-scheduled task
>> architecture of UEFI. (At least, I think it's cooperative multitasking,
>> correct me if I'm wrong.)
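
The refill requirement described above can be sketched outside UEFI as 
a periodic callback keeping a buffer topped up. Everything below (names, 
capacities, the low-water mark) is illustrative; sample payloads are 
elided and only the fill-level policy is modeled:

```c
#include <assert.h>

/* Illustrative model of the buffering policy only: a device-side drain
 * and a periodic refill callback keep the fill level above a low-water
 * mark so playback never underruns. Sizes are arbitrary. */
#define BUF_CAPACITY  256u
#define LOW_WATER      64u

typedef struct {
  unsigned Fill;        /* samples currently queued to the device */
  unsigned Underruns;   /* times the device ran dry               */
} STREAM_BUF;

/* Device side: consumes Want samples per hardware period. */
static void BufDrain (STREAM_BUF *B, unsigned Want) {
  if (Want > B->Fill) {
    B->Underruns++;
    B->Fill = 0;
  } else {
    B->Fill -= Want;
  }
}

/* Refill callback (in UEFI this would run from a timer event at a
   raised TPL): top up only when below the low-water mark. */
static void BufRefillTick (STREAM_BUF *B) {
  if (B->Fill < LOW_WATER) {
    B->Fill = BUF_CAPACITY;
  }
}
```

The tension Ethin describes is exactly the choice of tick period and 
TPL for BufRefillTick: too slow or too low-priority and Underruns grows; 
too high-priority and everything else stalls.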
>> 2. The VirtIO specification does not specify what occurs in the event
>> that a request is received to play a stream that's already being
>> played. However, it does provide enough information for extrapolation.
>> Every request that's sent to a VirtIO sound device must come with two
>> things: a stream ID and a buffer of samples. The sample data must
>> immediately follow the request. Therefore, for VirtIO in particular,
>> the device will simply stop playing the old set of samples and play
>> the new set instead. This goes along with what I've seen in other
>> specifications like the HDA one: unless the device in question
>> supports more than one stream, it is impossible to play two sounds on
>> a single stream simultaneously, and an HDA controller (for example) is
>> not going to perform any mixing; mixing is done purely in software.
>> Similarly, if a device does support multiple streams, it is
>> unspecified whether the device will play two or more streams
>> simultaneously or whether it will pause/abort the playback of one
>> while it plays another. Therefore, I believe (though cannot confirm)
>> that OSes like Windows simply use a single stream, even if the device
>> supports multiple streams, and just make applications believe
>> that unlimited streams are possible.
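
The software mixing mentioned above is, at its core, add-and-saturate 
per sample. A hedged sketch (not from any specification or driver):

```c
#include <assert.h>
#include <stdint.h>

/* Mix two signed 16-bit PCM streams in software by summing with
 * saturation; hardware such as an HDA controller will not mix
 * streams for you. */
static int16_t MixSample (int16_t A, int16_t B) {
  int32_t Sum = (int32_t)A + (int32_t)B;  /* widen to avoid overflow */
  if (Sum > INT16_MAX) Sum = INT16_MAX;
  if (Sum < INT16_MIN) Sum = INT16_MIN;
  return (int16_t)Sum;
}

static void MixBuffers (const int16_t *A, const int16_t *B,
                        int16_t *Out, unsigned Count) {
  for (unsigned I = 0; I < Count; I++) {
    Out[I] = MixSample (A[I], B[I]);
  }
}
```

This is how a single hardware stream can be made to look like 
"unlimited" streams to applications.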
>>
>> I apologize for this really long-winded email, and I hope no one 
>> minds. :-)
>>
>> On 4/17/21, Marvin Häuser <mhaeuser at posteo.de> wrote:
>>> On 17.04.21 19:31, Andrew Fish via groups.io wrote:
>>>>
>>>>
>>>>> On Apr 17, 2021, at 9:51 AM, Marvin Häuser <mhaeuser at posteo.de> wrote:
>>>>>
>>>>> On 16.04.21 19:45, Ethin Probst wrote:
>>>>>> Yes, three APIs (maybe like this) would work well:
>>>>>> - Start, Stop: begin playback of a stream
>>>>>> - SetVolume, GetVolume, Mute, Unmute: control volume of output and
>>>>>> enable muting
>>>>>> - CreateStream, ReleaseStream, SetStreamSampleRate: Control sample
>>>>>> rate of stream (but not sample format since Signed 16-bit PCM is
>>>>>> enough)
>>>>>> Marvin, how do you suggest we make the events then? We need some way
>>>>>> of notifying the caller that the stream has concluded. We could make
>>>>>> the driver create the event and pass it back to the caller as an
>>>>>> event, but you'd still have dangling pointers (this is C, after all).
>>>>>> We could just make a IsPlaying() function and WaitForCompletion()
>>>>>> function and allow the driver to do the event handling -- would that
>>>>>> work?
>>>>>
>>>>> I do not know enough about the possible use-cases to tell. Aside from
>>>>> the two functions you already mentioned, you could also take in an
>>>>> (optional) notification function.
>>>>> Which use-cases require determining playback end? If it's
>>>>> too much effort, just use EFI_EVENT I guess, just the less code can
>>>>> mess it up, the better.
>>>>>
>>>>
>>>> In UEFI, EFI_EVENT works much better. There is a gBS->WaitForEvent()
>>>> function that lets a caller wait on an event. That is basically what
>>>> the UEFI Shell is doing at the Shell prompt. A GUI in UEFI/C is
>>>> basically an event loop.
>>>>
>>>> Fun fact: I ended up adding gIdleLoopEventGuid to the MdeModulePkg so
>>>> the DXE Core could signal gIdleLoopEventGuid if you are sitting in
>>>> gBS->WaitForEvent() and no event is signaled. Basically in EFI nothing
>>>> is going to happen until the next timer tick so the gIdleLoopEventGuid
>>>> lets you idle the CPU until the next timer tick. I was forced to do
>>>> this as the 1st MacBook Air had a bad habit of thermal tripping when
>>>> sitting at the UEFI Shell prompt. After all another name for a loop in
>>>> C code running on bare metal is a power virus.
>>>
>>> Mac EFI is one of the best implementations we know of, frankly. I'm
>>> traumatised by Aptio 4 and the like, where (some issues are OEM-specific I
>>> think) you can have timer events signalling after ExitBS, there is event
>>> clutter on IO polling to the point where everything lags no matter what
>>> you do, and even in "smooth" scenarios there may be nothing worth the
>>> description "granularity" (events scheduled to run every 10 ms may run
>>> every 50 ms). Events are the last resort for us, if there really is no
>>> other way. My first GUI implementation worked without events at all for
>>> this reason, but as our workarounds got better, we did start using them
>>> for keyboard and mouse polling.
>>>
>>> Timers do not apply here, but what does apply is resource management.
>>> Using EFI_EVENT directly means (to the outside) the introduction of a
>>> new resource to maintain, for each caller separately. On the other side,
>>> there is no resource to misuse or leak if none is exposed. Yet, if
>>> you argue for APIs like WaitForEvent, something has to signal the
>>> event. In a simple environment this means some timer event is running
>>> and may signal the event the main code waits for, which is where the
>>> above concerns actually do apply. :) Again, the recommendation assumes
>>> the use-cases are simple
>>> enough to easily avoid them.
>>>
>>> I think it would be best to sketch use-cases for audio and design the
>>> solutions closely to the requirements. Why do we need to know when audio
>>> finished? What will happen when we queue audio twice? There are many
>>> layers (UX, interface, implementation details) of questions to answer
>>> when coming up with a pleasant and stable design.
>>>
>>> Best regards,
>>> Marvin
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Andrew Fish.
>>>>
>>>>> If I remember correctly you mentioned the UEFI Talkbox before, if
>>>>> that is more convenient for you, I'm there as mhaeuser.
>>>>>
>>>>> Best regards,
>>>>> Marvin
>>>>>
>>>>>>
>>>>>> On 4/16/21, Andrew Fish <afish at apple.com> wrote:
>>>>>>>
>>>>>>>> On Apr 16, 2021, at 4:34 AM, Leif Lindholm <leif at nuviainc.com> wrote:
>>>>>>>>
>>>>>>>> Hi Ethin,
>>>>>>>>
>>>>>>>> I think we also want to have a SetMode function, even if we 
>>>>>>>> don't get
>>>>>>>> around to implement proper support for it as part of GSoC 
>>>>>>>> (although I
>>>>>>>> expect at least for virtio, that should be pretty straightforward).
>>>>>>>>
>>>>>>> Leif,
>>>>>>>
>>>>>>> I’m thinking if we have an API to load the buffer and a 2nd API to
>>>>>>> play the
>>>>>>> buffer, an optional 3rd API could configure the streams.
>>>>>>>
>>>>>>>> It's quite likely that speech for UI would be stored as 8kHz (or
>>>>>>>> 20kHz) in some systems, whereas the example for playing a tune in
>>>>>>>> GRUB
>>>>>>>> would more likely be a 44.1 kHz mp3/wav/ogg/flac.
>>>>>>>>
>>>>>>>> For the GSoC project, I think it would be quite reasonable to
>>>>>>>> pre-generate pure PCM streams for testing rather than decoding
>>>>>>>> anything on the fly.
>>>>>>>>
>>>>>>>> Porting/writing decoders is really a separate task from 
>>>>>>>> enabling the
>>>>>>>> output. I would much rather see USB *and* HDA support able to play
>>>>>>>> pcm
>>>>>>>> streams before worrying about decoding.
>>>>>>>>
>>>>>>> I agree it might turn out it is easier to have the text-to-speech
>>>>>>> code just
>>>>>>> encode a PCM directly.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andrew Fish
>>>>>>>
>>>>>>>> /
>>>>>>>>    Leif
>>>>>>>>
>>>>>>>> On Fri, Apr 16, 2021 at 00:33:06 -0500, Ethin Probst wrote:
>>>>>>>>> Thanks for that explanation (I missed Mike's message). Earlier I
>>>>>>>>> sent
>>>>>>>>> a summary of those things that we can agree on: mainly, that 
>>>>>>>>> we have
>>>>>>>>> mute, volume control, a load buffer, (maybe) an unload buffer, 
>>>>>>>>> and a
>>>>>>>>> start/stop stream function. Now that I fully understand the
>>>>>>>>> ramifications of this I don't mind settling for a specific format
>>>>>>>>> and
>>>>>>>>> sample rate, and signed 16-bit PCM audio is, I think, the most
>>>>>>>>> widely
>>>>>>>>> used one out there, besides 64-bit floating point samples, which
>>>>>>>>> I've
>>>>>>>>> only seen used in DAWs, and that's something we don't need.
>>>>>>>>> Are you sure you want the firmware itself to handle the 
>>>>>>>>> decoding of
>>>>>>>>> WAV audio? I can make a library class for that, but I'll 
>>>>>>>>> definitely
>>>>>>>>> need help with the security aspect.
>>>>>>>>>
>>>>>>>>> On 4/16/21, Andrew Fish via groups.io
>>>>>>>>> <afish=apple.com at groups.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Apr 15, 2021, at 5:59 PM, Michael Brown <mcb30 at ipxe.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 16/04/2021 00:42, Ethin Probst wrote:
>>>>>>>>>>>> Forcing a particular channel mapping, sample rate and sample
>>>>>>>>>>>> format
>>>>>>>>>>>> on
>>>>>>>>>>>> everyone would complicate application code. From an
>>>>>>>>>>>> application point
>>>>>>>>>>>> of view, one would, with that type of protocol, need to do the
>>>>>>>>>>>> following:
>>>>>>>>>>>> 1) Load an audio file in any audio file format from any storage
>>>>>>>>>>>> mechanism.
>>>>>>>>>>>> 2) Decode the audio file format to extract the samples and 
>>>>>>>>>>>> audio
>>>>>>>>>>>> metadata.
>>>>>>>>>>>> 3) Resample the (now decoded) audio samples and convert
>>>>>>>>>>>> (quantize)
>>>>>>>>>>>> the
>>>>>>>>>>>> audio samples into signed 16-bit PCM audio.
>>>>>>>>>>>> 4) forward the samples onto the EFI audio protocol.
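
Step 3 of the list above (quantizing to signed 16-bit PCM) is roughly 
the following, assuming float samples normalized to [-1.0, 1.0]; the 
function name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Quantize a normalized float sample in [-1.0, 1.0] to signed 16-bit
 * PCM, clamping out-of-range input. */
static int16_t QuantizeS16 (float Sample) {
  if (Sample >= 1.0f)  return INT16_MAX;
  if (Sample <= -1.0f) return INT16_MIN;
  return (int16_t)(Sample * 32767.0f);
}
```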
>>>>>>>>>>> You have made an incorrect assumption that there exists a
>>>>>>>>>>> requirement
>>>>>>>>>>> to
>>>>>>>>>>> be able to play audio files in arbitrary formats.  This
>>>>>>>>>>> requirement
>>>>>>>>>>> does
>>>>>>>>>>> not exist.
>>>>>>>>>>>
>>>>>>>>>>> With a protocol-mandated fixed baseline set of audio parameters
>>>>>>>>>>> (sample
>>>>>>>>>>> rate etc), what would happen in practice is that the audio
>>>>>>>>>>> files would
>>>>>>>>>>> be
>>>>>>>>>>> encoded in that format at *build* time, using tools entirely
>>>>>>>>>>> external
>>>>>>>>>>> to
>>>>>>>>>>> UEFI.  The application code is then trivially simple: it 
>>>>>>>>>>> just does
>>>>>>>>>>> "load
>>>>>>>>>>> blob, pass blob to audio protocol".
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ethin,
>>>>>>>>>>
>>>>>>>>>> Given the goal is an industry standard we value interoperability
>>>>>>>>>> more
>>>>>>>>>> than
>>>>>>>>>> flexibility.
>>>>>>>>>>
>>>>>>>>>> How about another use case. Lets say the Linux OS loader (Grub)
>>>>>>>>>> wants
>>>>>>>>>> to
>>>>>>>>>> have an accessible UI, so it decides to store sound files on 
>>>>>>>>>> the EFI
>>>>>>>>>> System
>>>>>>>>>> Partition and use our new fancy UEFI Audio Protocol to add audio
>>>>>>>>>> to the
>>>>>>>>>> OS
>>>>>>>>>> loader GUI. So that version of Grub needs to work on 1,000s of
>>>>>>>>>> different
>>>>>>>>>> PCs
>>>>>>>>>> and a wide range of UEFI Audio driver implementations. It is 
>>>>>>>>>> a much
>>>>>>>>>> easier
>>>>>>>>>> world if Wave PCM 16-bit just works every place. You could add a
>>>>>>>>>> lot of
>>>>>>>>>> complexity and try to encode the audio on the fly, maybe even in
>>>>>>>>>> Linux
>>>>>>>>>> proper but that falls down if you are booting from read only
>>>>>>>>>> media like
>>>>>>>>>> a
>>>>>>>>>> DVD or backup tape (yes people still do that in server land).
>>>>>>>>>>
>>>>>>>>>> The other problem with flexibility is you just made the test 
>>>>>>>>>> matrix
>>>>>>>>>> very
>>>>>>>>>> large for every driver that needs to get implemented. For
>>>>>>>>>> something as
>>>>>>>>>> complex as Intel HDA how you hook up the hardware and what
>>>>>>>>>> CODECs you
>>>>>>>>>> use
>>>>>>>>>> may impact the quality of the playback for a given board. Your
>>>>>>>>>> EFI is
>>>>>>>>>> likely
>>>>>>>>>> going to pick a single encoding that will get tested all the
>>>>>>>>>> time if
>>>>>>>>>> your
>>>>>>>>>> system has audio, but all 50 other things you support not so
>>>>>>>>>> much. So
>>>>>>>>>> that
>>>>>>>>>> will require testing, and someone with audiophile ears (or 
>>>>>>>>>> an AI
>>>>>>>>>> program)
>>>>>>>>>> to test all the combinations. I’m not kidding I get BZs on the
>>>>>>>>>> quality
>>>>>>>>>> of
>>>>>>>>>> the boot bong on our systems.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
>>>>>>>>>>>>  EFI_SIMPLE_AUDIO_PROTOCOL_RESET Reset;
>>>>>>>>>>>>  EFI_SIMPLE_AUDIO_PROTOCOL_START Start;
>>>>>>>>>>>>  EFI_SIMPLE_AUDIO_PROTOCOL_STOP Stop;
>>>>>>>>>>>> } EFI_SIMPLE_AUDIO_PROTOCOL;
>>>>>>>>>>> This is now starting to look like something that belongs in
>>>>>>>>>>> boot-time
>>>>>>>>>>> firmware.  :)
>>>>>>>>>>>
>>>>>>>>>> I think that got a little too simple. I’d go back and look at the
>>>>>>>>>> example I
>>>>>>>>>> posted to the thread, but add an API to load the buffer and then
>>>>>>>>>> play the
>>>>>>>>>> buffer (that way we can add an API in the future to twiddle knobs).
>>>>>>>>>> That
>>>>>>>>>> API
>>>>>>>>>> also implements the async EFI interface. Trust me, the 1st thing
>>>>>>>>>> that is
>>>>>>>>>> going to happen when we add audio is someone is going to
>>>>>>>>>> complain that in
>>>>>>>>>> xyz
>>>>>>>>>> state we should mute audio, or we should honor audio volume and
>>>>>>>>>> mute
>>>>>>>>>> settings from setup, or from values set in the OS. Or someone
>>>>>>>>>> is going
>>>>>>>>>> to
>>>>>>>>>> want the volume keys on the keyboard to work in EFI.
>>>>>>>>>>
>>>>>>>>>> Also, if you need to pick apart the Wave PCM 16-bit file to feed
>>>>>>>>>> it into
>>>>>>>>>> the
>>>>>>>>>> audio hardware that probably means we should have a library that
>>>>>>>>>> does
>>>>>>>>>> that
>>>>>>>>>> work, so other Audio drivers can share that code. Also having a
>>>>>>>>>> library
>>>>>>>>>> makes it easier to write a unit test. We need to be security
>>>>>>>>>> conscious
>>>>>>>>>> as we
>>>>>>>>>> need to treat the audio file as attacker-controlled data.
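
A library along these lines would start by validating the RIFF/WAVE 
header against the actual buffer length, trusting no field before its 
bytes are proven in-bounds. A minimal sketch (the real chunk walk needs 
more care; names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t ReadLe32 (const uint8_t *P) {
  return (uint32_t)P[0] | ((uint32_t)P[1] << 8) |
         ((uint32_t)P[2] << 16) | ((uint32_t)P[3] << 24);
}

/* Bounds-checked validation of a RIFF/WAVE header, treating the file
 * as attacker-controlled data. */
static bool WavHeaderValid (const uint8_t *Buf, size_t Len) {
  if (Len < 12) {
    return false;                      /* too short for a RIFF header */
  }
  if (memcmp (Buf, "RIFF", 4) != 0 || memcmp (Buf + 8, "WAVE", 4) != 0) {
    return false;
  }
  /* RiffSize counts bytes after the first 8; it must not claim more
     data than the buffer actually holds. */
  uint32_t RiffSize = ReadLe32 (Buf + 4);
  if (RiffSize < 4 || (uint64_t)RiffSize + 8 > Len) {
    return false;
  }
  return true;
}
```

Keeping this in a standalone library, as suggested, also makes exactly 
this kind of unit test easy to write.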
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Andrew Fish
>>>>>>>>>>
>>>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Signed,
>>>>>>>>> Ethin D. Probst
>> --
>> Signed,
>> Ethin D. Probst



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#74233): https://edk2.groups.io/g/devel/message/74233
Mute This Topic: https://groups.io/mt/81710286/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-
