[libvirt] [PATCH v3 1/3] libxl: implement NUMA capabilities reporting

Daniel P. Berrange berrange at redhat.com
Tue Jul 16 10:18:10 UTC 2013


On Tue, Jul 16, 2013 at 11:10:25AM +0100, Dario Faggioli wrote:
> On mar, 2013-07-16 at 10:41 +0100, Daniel P. Berrange wrote:
> > On Sat, Jul 13, 2013 at 02:27:03AM +0200, Dario Faggioli wrote:
> > > @@ -788,9 +903,40 @@ libxlMakeCapabilities(libxl_ctx *ctx)
> > >          return NULL;
> > >      }
> > >  
> > > -    return libxlMakeCapabilitiesInternal(virArchFromHost(),
> > > +    caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
> > >                                           &phy_info,
> > >                                           ver_info->capabilities);
> > > +
> > > +    /* Check if caps is valid. If it is, it must remain so till the end! */
> > > +    if (caps == NULL)
> > > +        goto out;
> > > +
> > > +    /* Let's try to fetch NUMA info now (not critical in case we fail) */
> > > +    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
> > > +    if (numa_info == NULL)
> > > +        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
> > 
> > Under what scenario can libxl_get_numainfo() return NULL? Unless this
> > is a valid, expected scenario, we should treat this as an error.
> > 
> There are indeed a couple of possible reasons. Actually, I saw that the
> qemu driver does pretty much the same, i.e., if retrieving NUMA
> information fails, it gives up on that, but does not make things
> explode, and I really think it is something that makes sense.

The reason the QEMU driver does that is that libnuma will return an
error if the host machine does not expose NUMA info in its BIOS. This
is an expected, valid scenario, and libnuma provides no way to
distinguish it from other errors, so we have to ignore the error.

> The actual possible failure reasons are: (1) it cannot prepare the
> parameters for the hypercall, or (2) the hypercall fails. It is true
> that, in both cases, something really serious might have happened, but
> there is no way to tell it from here. Thus, I honestly think that trying
> to carry on is sound... If it is really the case that some critical
> component died, we'll find out soon enough.

The only scenario in which it is acceptable to ignore the failure
is if the physical hardware does not support NUMA. The question is
whether the Xen API lets you distinguish that scenario from other
types of errors.
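
To make the policy concrete, here is a minimal sketch, assuming the
caller can somehow classify the outcome of the NUMA query into three
cases. The enum and the function name are hypothetical and are not part
of libxl or libvirt; whether libxl can actually report the "unsupported"
case separately is exactly the open question above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of a NUMA topology query; not a libxl type. */
typedef enum {
    NUMA_QUERY_OK,           /* topology was retrieved */
    NUMA_QUERY_UNSUPPORTED,  /* host has no NUMA: expected, may be ignored */
    NUMA_QUERY_FAILED        /* any other failure: must be reported */
} numaQueryStatus;

/*
 * Decide whether building the capabilities may continue.
 * Returns 0 to continue (and sets *haveNuma), -1 to abort with an error.
 */
static int
numaQueryPolicy(numaQueryStatus status, bool *haveNuma)
{
    *haveNuma = false;

    switch (status) {
    case NUMA_QUERY_OK:
        *haveNuma = true;
        return 0;
    case NUMA_QUERY_UNSUPPORTED:
        /* Valid scenario: carry on without NUMA data. */
        return 0;
    case NUMA_QUERY_FAILED:
    default:
        /* Anything else is a real error and must not be hidden. */
        return -1;
    }
}
```

With this shape, only the middle case is silently tolerated. If the API
cannot tell the second and third cases apart, the driver is in the same
position as the QEMU driver with libnuma.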


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



