[libvirt] XenStore fix

Daniel Veillard veillard at redhat.com
Tue Jul 28 10:28:08 UTC 2009


On Tue, Jul 28, 2009 at 09:53:46AM +0100, Daniel P. Berrange wrote:
> On Tue, Jul 28, 2009 at 10:15:10AM +0200, Daniel Veillard wrote:
> > On Tue, Jul 28, 2009 at 08:30:12AM +0200, Jonas Eriksson wrote:
> > > Hi,
> > > 
> > > I have been examining a bug where libvirtd (and virsh) does not show
> > > all virtual machines on a xen host. This proved to be because of this
> > > program flow:
> > > 1. virConnectNumOfDomains -> .. -> xenUnifiedNumOfDomains
> > >    -> xenHypervisorNumOfDomains => 3
> > > 2. virConnectListDomains(max=3) -> .. -> xenUnifiedListDomains(max=3)
> > >    -> xenStoreNumOfDomains(max=3) => { 0, 2, 7 }
> > 
> >   Actually the problem I see is that ListDomains should really go
> > through the Hypervisor API, i.e. xenHypervisorListDomains(), which
> > is *way* faster and guaranteed to be accurate. We should try the
> > hypervisor first, IMHO. The function code was modified at the end of
> > last year to avoid Xend not properly cleaning up:
> 
> Going to the Hypervisor first will just re-introduce the bug
> shown in this mail you quote:
> 
> > http://www.mail-archive.com/libvir-list@redhat.com/msg09855.html
> 
> i.e., the Hypervisor will report that a domain exists, while XenD will
> claim it doesn't, and thus virsh will print out lots of errors
> 
>    libvir: Xen Daemon error : GET operation failed: xend_get: error from xen 
> 
> >   The problem is that we have put the xenstore driver call first, while
> > it's clearly slower and has a higher chance of getting things wrong than
> > the hypervisor itself (if the HV gets it wrong I guess there is no cure :-)
> 
> The problem here is that there are 2 definitions of 'right'.
> 
>  - The hypervisor reports all guest domains

 As far as I know it's the hypervisor which allocates resources

>  - XenD reports all guest domains that it knows about

  and xend/xenstore layers are just there for the management.
IMHO if the hypervisor allocates resources to a domain, that should
be reported, even if xend or xenstore is confused.

> We have to ask XenD for the guest configuration later, so if we
> get the list of domain IDs from the HV, it is inevitable that we
> will get errors from XenD for some domains. If we are to avoid

  Well then that's IMHO indicative of Xend problems. Rogue domains
hidden from Xend or xenstore for some reason should be reported even
if incomplete status is given.

> I think part of the problem is that this cannot be solved with a simple
> prioritization of HV, XenStored, XenD.  We need to implement something
> that combines information from xenstore & HV.
> 
>  - Get list of domain IDs from XenStore.
>  - Remove any domain IDs from this list that don't exist in the HyperVisor

  Considering the various possible problems in the XenStore or XenD
userland, if someone finds a way to hide his domain from them, it would
get zero accounting as a result. If you consider hosting companies and
similar use cases, I really would prefer some kind of error popping out
when running domains aren't listed in the management layer, rather than
silently ignoring them.
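
  To make that preference concrete, here is a minimal sketch in plain C
over integer arrays. It is not libvirt code: id_known(), find_unmanaged()
and warn_unmanaged() are hypothetical names, and the real drivers would
fill the two ID arrays from the hypervisor and from xenstore. It only
shows the idea of flagging IDs the hypervisor reports but the management
layer does not, instead of dropping them.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not libvirt code: 1 if id occurs in ids[0..n). */
static int id_known(int id, const int *ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i] == id)
            return 1;
    return 0;
}

/* Collect hypervisor IDs that the management layer does not know about.
 * Writes at most max IDs to out and returns the number written. */
static size_t find_unmanaged(const int *hv, size_t nhv,
                             const int *store, size_t nstore,
                             int *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < nhv && n < max; i++)
        if (!id_known(hv[i], store, nstore))
            out[n++] = hv[i];
    return n;
}

/* Report each unmanaged ID on stderr instead of silently ignoring it. */
static void warn_unmanaged(const int *ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(stderr, "warning: domain %d runs in the hypervisor but is "
                        "unknown to the management layer\n", ids[i]);
}
```

  A caller would pass the result of find_unmanaged() to warn_unmanaged()
and still include those IDs in the listing, possibly with incomplete
status.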

> And crucially, 'numOfDomains' API has to follow same logic as the
> 'listDomains' API.  Currently numOfDomains goes to HV, while listDomains
> goes to XenStore, which is a nasty mismatch

  Having the same logic on both sides makes sense.
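
  For illustration, the scheme quoted above (take the IDs from XenStore,
drop those the hypervisor does not report, and count through the same
filter) could look roughly like the sketch below. This is plain C over
integer arrays, not libvirt code: id_in_set(), merge_domain_ids() and
count_domain_ids() are hypothetical names, and the real drivers would
supply the two ID arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not libvirt code: 1 if id occurs in ids[0..n). */
static int id_in_set(int id, const int *ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i] == id)
            return 1;
    return 0;
}

/* List: keep only the XenStore IDs that the hypervisor also reports.
 * Writes at most max IDs to out and returns the number written. */
static size_t merge_domain_ids(const int *store, size_t nstore,
                               const int *hv, size_t nhv,
                               int *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (size_t i = 0; i < nstore && n < max; i++)
        if (id_in_set(store[i], hv, nhv))
            out[n++] = store[i];
    return n;
}

/* Count: apply the same filter, so the count and the list agree
 * as long as both see the same snapshot of the two sources. */
static size_t count_domain_ids(const int *store, size_t nstore,
                               const int *hv, size_t nhv)
{
    size_t n = 0;

    for (size_t i = 0; i < nstore; i++)
        if (id_in_set(store[i], hv, nhv))
            n++;
    return n;
}
```

  With XenStore IDs { 0, 2, 7 } and hypervisor IDs { 0, 7, 9 }, both
functions agree on the two domains 0 and 7. Note that domain 9 is dropped
silently here, which is exactly the case I would rather see reported.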

> This should ensure we don't get any domain IDs from the HV that XenD has
> stopped reporting, and it should also ensure we don't report stale domain 
> IDs from xenstore.

  The real question is why xend should stop reporting about domains that
are still present at the hypervisor level. Except for state transitions,
that smells fishy to me!

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel at veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/



