[libvirt] [PATCH v1 1/3] adding unplug_timeout QEMU conf

Daniel Henrique Barboza danielhb413 at gmail.com
Fri Aug 30 13:32:05 UTC 2019



On 8/30/19 8:15 AM, Christophe de Dinechin wrote:
> Daniel P. Berrangé writes:
>
>>> The reason for this timeout is that we originally promised something
>>> that we cannot deliver - a synchronous device detach API, while the
>>> operation itself is asynchronous. I'm not a fan of exposing it and
>>> making it configurable.
>> I'm especially *not* a fan because the commit messages says this is
>> a problem on certain architectures. Since we know what those arches
>> are, we should use a larger timeout for those arches out of the box.
>> Requiring admin to set a config param to fix the architectures is
>> super unpleasant out of the box experiance.
> True, but also notice that 5 seconds is also already close to the
> attention span time limit for users [1]. So increasing it to 10s might
> bring people to believe things are stuck, unless you provide some sort
> of feedback that this is normal.
>
> https://www.nngroup.com/articles/response-times-3-important-limits/

Interesting link, thanks.

About the user feedback due to long response delay: we're already
breaking this with the setvcpus command, at least with PowerPC
guests and a lot of vcpus being unplugged. Here's an example in
which I am able to complete the command without kicking the
timeout error (guest is idle, vcpu unplug is fast in this case):

--- guest booted with 1 vcpu - added 39 extra vcpus. Operation takes
a second, it's fast  ---

[danielhb at kop5 libvirt]$ sudo ./run tools/virsh setvcpus vcpus-test 40 
--live

[danielhb at kop5 libvirt]$
[danielhb at kop5 libvirt]$

--- removing them back ----

[danielhb at kop5 libvirt]$ time sudo ./run tools/virsh setvcpus vcpus-test 
1 --live


real    0m21.695s
user    0m0.119s
sys    0m0.000s
[danielhb at kop5 libvirt]$

This happens in PowerPC because the timeout is being considered not for the
whole operation, but per device. Since I'm unplugging 39 devices and the
5 seconds timeout is refreshed for every operation, in theory the user
can wait close to 39*5  seconds with the terminal frozen.

Now, if we are to adhere to such UX standards (IMO, we should), I propose
the following:

- short term: increase PowerPC timeout to 10 seconds per device. Following
the UX guideline above, this is the limit we can go without warning the
user about the delay;

- short term: for PowerPC guests, tune 'setvcpus' message to warn the user
that the operation can take some time to complete;

These 2 are simples changes and I can get it done for the next release
without too much trouble.

- mid/long term: I can look into the PowerPC guest implementation, see 
if there
are device_del events being fired up in QMP and implement a better UX with
more information about how the process is going. Something like "vcpu 1 out
of 30 unplugged", "vcpu 2 out of 30 unplugged", or a progress bar, or
whatever makes more sense to give the user a feeling of operation ongoing.


Note that I'm suggesting PowerPC only changes due to what Daniel said
earlier - we can't impact other users due to something that, at first 
glance,
only PowerPC does different. I have a hunch that we should do for all
archs, but I can't defend this claim without testing this in x86 at least.
These short-term changes are easy to make it across the board though,so it's
just a matter of removing "if PowerPC" in these changes.


What do you think?



Thanks,



DHB




>> Regards,
>> Daniel
>> --
>> |:https://berrange.com       -o-https://www.flickr.com/photos/dberrange  :|
>> |:https://libvirt.org          -o-https://fstop138.berrange.com  :|
>> |:https://entangle-photo.org     -o-https://www.instagram.com/dberrange  :|
> --
> Cheers,
> Christophe de Dinechin (IRC c3d)




More information about the libvir-list mailing list