[libvirt] [RFC] kvm: x86: export vCPU halted state to sysfs

Eduardo Habkost ehabkost at redhat.com
Thu Feb 1 20:26:49 UTC 2018


On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> 2018-02-01 12:54-0500, Luiz Capitulino:
> > 
> > Libvirt needs to know when a vCPU is halted. To get this information,
> 
> I don't see why upper level management should care about that, a single
> bit about halted state that can be incorrect at the time it is processed
> seems of very limited use.

I don't see why, either.

I'm CCing libvir-list and the people involved in the code that
added halt state to libvirt domain statistics.

> 
> (A much more sensible data point would be the fraction of time when VCPU
>  was running or runnable, which is roughly what you get by sampling the
>  halted state.)
> 
> A halted vCPU might even be halted in guest mode, so KVM doesn't know
> about that state (unless you force a VM exit), which would complicate
> the collection a bit more ... but really, what is the data being used
> for?
> 
> A user might care about the state, for obscure reasons, but that isn't a
> performance problem.
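
To illustrate the sampling remark above: if the halted bit is polled
periodically, the share of samples in which the vCPU was not halted
approximates the fraction of time it was running or runnable. This is
only a sketch; the function name is hypothetical, and how the samples
are obtained is deliberately left open.

```python
from typing import Iterable


def running_fraction(halted_samples: Iterable[bool]) -> float:
    """Estimate the share of samples in which the vCPU was not halted.

    Each element is one poll of the halted bit (True means halted).
    The name and interface are assumptions made for this illustration.
    """
    samples = list(halted_samples)
    if not samples:
        raise ValueError("need at least one sample")
    # True counts as 1, so sum() is the number of halted samples.
    return 1.0 - sum(samples) / len(samples)
```

A single sample carries almost no information, which is the objection
raised above; the estimate only becomes meaningful over many polls.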
> 
> > libvirt has started using the query-cpus command from QEMU. However,
> > if the in-kernel irqchip is in use, query-cpus will force all vCPUs
> > to user-space since they have to issue the KVM_GET_MP_STATE ioctl.
> 
> Libvirt knows if KVM exits to userspace on halts, so it can just query
> QEMU in that case and in the other case, there is a very dirty
> "solution" that works on all architectures right now:
> 
>   grep kvm_vcpu_block /proc/$vcpu_task/stack
> 
> If you get something, the vcpu is halted in KVM.

Nice.
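
For reference, a minimal Python sketch of the same check as the grep
above. The function names are hypothetical, reading /proc/<tid>/stack
typically requires root, and the result is only a snapshot that may be
stale by the time it is used.

```python
from pathlib import Path

HALT_SYMBOL = "kvm_vcpu_block"  # symbol named in the thread


def stack_shows_halt(stack_text: str, symbol: str = HALT_SYMBOL) -> bool:
    """Return True if any frame in a /proc/<tid>/stack dump names `symbol`.

    Substring match per line, equivalent to the grep in the thread.
    """
    return any(symbol in line for line in stack_text.splitlines())


def vcpu_is_halted(vcpu_tid: int) -> bool:
    """Read the kernel stack of one vCPU thread and look for the symbol.

    `vcpu_tid` is the thread ID of the vCPU task; finding it is out of
    scope here. Raises OSError if the file cannot be read.
    """
    return stack_shows_halt(Path(f"/proc/{vcpu_tid}/stack").read_text())
```

Keeping the text check separate from the file read makes the matching
logic usable on a captured stack dump without root access.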


> 
> > This has catastrophic implications for low-latency workloads like
> > KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> > an OpenStack service called ceilometer that causes libvirt to
> > issue query-cpus every few minutes.
> 
> I'd expect that people running these workloads can set up the system. :(
> 
> I bet that ceilometer just mindlessly collects everything, so we should
> be able to configure libvirt to collect only some stats.  Either libvirt
> or upper layer would decide what is too expensive for its usefulness.

Yes.  Including expensive-to-collect halt state in
VIR_DOMAIN_STATS_VCPU is a serious performance regression in
libvirt.

> 
> > The solution proposed in this patch is to export the vCPU
> > halted state in the already existing vcpu directory in sysfs.
> > This way, libvirt can read the vCPU halted state from sysfs and avoid
> > using the query-cpus command. This solution seems to be sufficient
> > for libvirt needs, but it has the following cons:
> > 
> >  * vcpu information in sysfs lives in a debug directory, so
> >    libvirt would be basing its API on debug info
> 
> (It pains me to say there probably already are tools that depend on
>  kvm/debug.)
> 
> It's slightly better than the stack hack, but needs more code in kernel
> and the interface is in a gray compatibility zone, so I'd like to know
> why userspace does that in the first place.
> 
> >  * Currently, only x86 supports the vcpu dir in sysfs, so
> >    we'd have to expand this to other archs (should be doable)
> > 
> > If we agree that this solution is feasible, I'll work on extending
> > the vcpu debug information to other archs for my next posting.
> > 
> > Signed-off-by: Luiz Capitulino <lcapitulino at redhat.com>
> > ---
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > @@ -6273,6 +6273,7 @@ void kvm_arch_exit(void)
> >  
> >  int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> > +	kvm_vcpu_set_halted(vcpu);
> 
> There is no point in caring about !lapic_in_kernel().  I'd move the logic
> into vcpu_block() to be shared among all architectures.
> 
> >  	++vcpu->stat.halt_exits;
> >  	if (lapic_in_kernel(vcpu)) {
> >  		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;

-- 
Eduardo



