[libvirt] XenStore fix

Daniel P. Berrange berrange at redhat.com
Tue Jul 28 08:53:46 UTC 2009


On Tue, Jul 28, 2009 at 10:15:10AM +0200, Daniel Veillard wrote:
> On Tue, Jul 28, 2009 at 08:30:12AM +0200, Jonas Eriksson wrote:
> > Hi,
> > 
> > I have been examining a bug where libvirtd (and virsh) does not show
> > all virtual machines on a xen host. This proved to be because of this
> > program flow:
> > 1. virConnectNumOfDomains -> .. -> xenUnifiedNumOfDomains
> >    -> xenHypervisorNumOfDomains => 3
> > 2. virConnectListDomains(max=3) -> .. -> xenUnifiedListDomains(max=3)
> >    -> xenStoreNumOfDomains(max=3) => { 0, 2, 7 }
> 
>   Actually the problem I see is that ListDomains should really go
> through the Hypervisor API i.e. xenHypervisorListDomains(), which
> is *way* faster and guaranteed to be accurate. We should try the
> hypervisor first, IMHO, the function code was modified end of last year
> to avoid Xend not properly cleaning up:

Going to the Hypervisor first will just re-introduce the bug
shown in this mail you quote:

> http://www.mail-archive.com/libvir-list@redhat.com/msg09855.html

i.e., the hypervisor will report that a domain exists, while XenD will
claim it doesn't, and thus virsh will print out lots of errors:

   libvir: Xen Daemon error : GET operation failed: xend_get: error from xen 

>   The problem is that we have put xenstore driver call first, while
> it's clearly slower and has a higher chance of getting things wrong than
> the hypervisor itself (if the HV gets it wrong I guess there is no cure :-)

The problem here is that there are 2 definitions of 'right'.

 - The hypervisor reports all guest domains
 - XenD reports all guest domains that it knows about

We have to ask XenD for the guest configuration later, so if we
get the list of domain IDs from the HV, it is inevitable that we
will get errors from XenD for some domains. If we are to avoid
errors then we need to get a list of domain IDs that matches
XenD's view. We can't ask XenD directly because that is insanely
slow, so XenStore is the next best option. However, XenStore seems
to list some domains that have already gone away.

I think part of the problem is that this cannot be solved with a simple
prioritization of HV, XenStored, XenD.  We need to implement something
that combines information from xenstore & HV.

 - Get list of domain IDs from XenStore.
 - Remove any domain IDs from this list that don't exist in the hypervisor.

And crucially, the 'numOfDomains' API has to follow the same logic as the
'listDomains' API.  Currently numOfDomains goes to the HV, while listDomains
goes to XenStore, which is a nasty mismatch.

This should ensure we don't get any domain IDs from the HV that XenD has
stopped reporting, and it should also ensure we don't report stale domain 
IDs from xenstore.
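
As a rough illustration of the combined approach, here is a minimal
sketch in Python rather than libvirt's C. The function names and the
sample ID lists are made up for the example and are not the actual
driver code; it only shows the intersection logic and that the count
is derived from the same list.

```python
def list_domains(xenstore_ids, hypervisor_ids, max_ids=None):
    """Return the XenStore domain IDs that the hypervisor also reports.

    Hypothetical helper: both inputs are plain lists of integer domain IDs.
    """
    live = set(hypervisor_ids)
    # Start from XenStore's view and drop IDs the hypervisor no longer has.
    merged = [dom_id for dom_id in xenstore_ids if dom_id in live]
    return merged if max_ids is None else merged[:max_ids]


def num_of_domains(xenstore_ids, hypervisor_ids):
    """Count domains using exactly the same logic as list_domains()."""
    return len(list_domains(xenstore_ids, hypervisor_ids))


# Made-up sample data: 7 is a stale XenStore entry, 11 is a domain
# that only the hypervisor still reports.
xenstore_ids = [0, 2, 7, 9]
hypervisor_ids = [0, 2, 9, 11]

print(list_domains(xenstore_ids, hypervisor_ids))
print(num_of_domains(xenstore_ids, hypervisor_ids))
```

Because the count is computed from the filtered list, a caller that
sizes its buffer from num_of_domains() and then calls list_domains()
sees a consistent set of IDs.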

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



