[libvirt] [RFC] kvm: x86: export vCPU halted state to sysfs

Mon Feb 5 16:10:18 UTC 2018

On 05.02.2018 16:37, Luiz Capitulino wrote:
> On Mon, 5 Feb 2018 13:47:27 +0000
> Daniel P. Berrangé <berrange at redhat.com> wrote:
> 
>> On Mon, Feb 05, 2018 at 02:43:15PM +0100, Viktor Mihajlovski wrote:
>>> On 02.02.2018 21:41, Eduardo Habkost wrote:  
>>>> On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote:  
>>>>> On Fri, 2 Feb 2018 18:09:12 -0200
>>>>> Eduardo Habkost <ehabkost at redhat.com> wrote:  
>>>> [...]  
>>>>>> Your plan above covers what will happen when using newer QEMU
>>>>>> versions, but libvirt still needs to work sanely if running QEMU
>>>>>> 2.11.  My suggestion is that libvirt do not run query-cpus to ask
>>>>>> for the "halted" field on any architecture except s390.  
>>>>>
>>>>> My current plan is to ask libvirt to completely remove query-cpus
>>>>> usage, independent of the arch and use the new command instead.  
>>>>
>>>> This would be a regression for people running QEMU 2.11 on s390.
>>>>
>>>> (But maybe it would be an acceptable regression?  Viktor, what do
>>>> you think?  Are there production releases of management systems
>>>> that already rely on vcpu.<n>.halted?)
>>>>   
>>> Unfortunately, there's code out there looking at vcpu.<n>.halted. I've
>>> informed the product team about the issue.
>>>
>>> If we drop/deprecate vcpu.<n>.halted from the domain statistics, this
>>> should be done for all arches, if there's a replacement mechanism (i.e.
>>> new VCPU states). As a stop-gap measure we can make the call
>>> arch-dependent until the new stuff is in place.  
>>
>> Yes, I think libvirt should just restrict this 'halted' feature reporting
>> to s390 only, since the other archs have different semantics for this
>> item, and the s390 semantics are the ones we want.
> 
> From this whole discussion, there's only one thing that I still don't
> understand (in a very honest way): what makes s390 halted semantics
> different?
One problem is that using the halted property to indicate that the CPU
has assumed the architected disabled wait state may not have been the
wisest decision (my fault). If the CPU enters disabled wait, it will
stay inactive until it is explicitly restarted which is different on x86.
> 
> By quickly looking at the code, it seems to be very like the x86 one
> when in kernel irqchip is not used: if a guest vCPU executes HLT, the
> vCPU exits to userspace and qemu will put the vCPU thread to sleep.
> This is the semantics I'd expect for HLT, and maybe for all archs.
> 
> What makes x86 different, is when the in kernel irqchip is used (which
> should be the default with libvirt). In this case, the vCPU thread avoids
> exiting to user-space. So, qemu doesn't know the vCPU halted.
> 
> That's only one of the reasons why query-cpus forces vCPUs to user-space.
> But there are other reasons, and that's why even on s390 query-cpus
> will also force vCPUs to user-space, which means s390 has the same perf
> issue but maybe this hasn't been detected yet.
> 
> For the immediate term, I still think we should have a query-cpus
> replacement that doesn't cause vCPUs to go to userspace. I'll work on this
> this week.
FWIW: I'm currently exploring an extension to query-cpus to report
s390-specific information, which would allow us to ditch halted in the
long run.
Further, I'm considering a new QAPI event along the lines of "CPU info
has changed" allowing QEMU to announce low-frequency changes of CPU
state (as is the case for s390) and finally wire up a handler in libvirt
to update a tbd. property (!= halted).
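
To illustrate the event-driven idea, here is a rough sketch of what a
consumer-side handler might look like. Nothing of this exists yet: the
event name CPU_INFO_CHANGED, its "cpu-index" and "cpu-state" payload
fields, and the cpu_state mapping are hypothetical placeholders. Only
the outer {"event": ..., "data": ...} envelope follows the usual QMP
event layout.

```python
import json


class VcpuStateCache:
    """Keeps the last announced state per vCPU, updated by events only."""

    def __init__(self):
        # Maps vCPU index -> last state announced by the (hypothetical) event.
        self.cpu_state = {}

    def handle_event(self, raw):
        """Process one QMP-style event line; return True if it was consumed."""
        msg = json.loads(raw)
        # Hypothetical event name, not an existing QEMU event.
        if msg.get("event") != "CPU_INFO_CHANGED":
            return False
        data = msg.get("data", {})
        # Hypothetical payload fields.
        self.cpu_state[data["cpu-index"]] = data["cpu-state"]
        return True
```

The point of the sketch is that a reader of cpu_state never has to
issue a query that kicks the vCPUs out to userspace; the cache only
changes when QEMU announces a change.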
> 
> However, IMHO, what we really want is to add an API to the guest agent
> to export the CPU online bit from the guest userspace sysfs. This will
> give the ultimate semantics and move us away from this halted mess.
> 

-- 
Regards,
 Viktor Mihajlovski