[libvirt] libvirt mdev migration, mdevctl integration

Daniel P. Berrangé berrange at redhat.com
Mon Nov 25 17:47:26 UTC 2019


On Mon, Nov 25, 2019 at 06:14:33PM +0100, Cornelia Huck wrote:
> On Mon, 18 Nov 2019 19:00:25 +0000
> Daniel P. Berrangé <berrange at redhat.com> wrote:
> 
> > On Mon, Nov 18, 2019 at 10:06:34AM -0700, Alex Williamson wrote:
> > > Hey folks,
> > > 
> > > We had some discussions at KVM Forum around mdev live migration and
> > > what that might mean for libvirt handling of mdev devices and
> > > potential libvirt/mdevctl[1] flows.  I believe the current situation is
> > > that libvirt knows nothing about an mdev beyond the UUID in the XML.
> > > It expects the mdev to exist on the system prior to starting the VM.
> > > The intention is for mdevctl to step in here by providing persistence
> > > for mdev devices such that these pre-defined mdevs are potentially not
> > > just ephemeral, for example, we can tag specific mdevs for automatic
> > > startup on each boot.
> > > 
> > > It seems the next step in this journey is to figure out if libvirt can
> > > interact with mdevctl to "manage" a device.  I believe we've avoided
> > > defining managed='yes' behavior for mdev hostdevs up to this point
> > > because creating an mdev device involves policy decisions.  For
> > > example, which parent device hosts the mdev, are there optimal NUMA
> > > considerations, are there performance versus power considerations, what
> > > is the nature of the mdev, etc.  mdevctl doesn't necessarily want to
> > > make placement decisions either, but it does understand how to create
> > > and remove an mdev, what its type is, how to associate it with a fixed
> > > parent, apply attributes, etc.  So would it be reasonable that for a
> > > managed='yes' mdev hostdev device, libvirt might attempt to use mdevctl
> > > to start an mdev by UUID and stop it when the VM is shut down?  This
> > > assumes the mdev referenced by the UUID is already defined and known to
> > > mdevctl.  I'd expect semantics much like managed='yes' around vfio-pci
> > > binding, e.g. start/stop if it doesn't exist, leave it alone if it
> > > already exists.
> > > 
> > > If that much seems reasonable, and someone is willing to invest some
> > > development time to support it, what are then the next steps to enable
> > > migration?  
> > 
> > The first step is to deal with our virNodeDevice APIs.
> > 
> > Currently we have
> > 
> >  - List devices            ( virConnectListAllNodeDevices )
> >  - Create transient device ( virNodeDeviceCreateXML )
> >  - Delete transient device ( virNodeDeviceDestroy )
> > 
> > The create/delete APIs only deal with NPIV HBAs right now, so we need
> > to extend them to deal with mdevs as a first step.
> 
> I assume the listing function already deals with all device types
> supported by libvirt?

Yes, that's correct.


> > > So assuming we now have a VM with a managed='yes' mdev hostdev device,
> > > what do we need to do to reproduce that device at the migration target?
> > > mdevctl can dump a device in a json format, where libvirt could use
> > > this to define and start an equivalent device on the migration target
> > > (potentially this json is extended by mdevctl to include the migration
> > > compatibility vendor string).  Part of our discussion at the Forum was
> > > around the extent to which libvirt would want to consider this json
> > > opaque.  For instance, while libvirt doesn't currently support localhost
> > > migration, libvirt might want to use an alternate UUID for the mdev
> > > device on the migration target so as not to introduce additional
> > > barriers to such migrations.  Potentially mdevctl could accept the json
> > > from the source system as a template and allow parameters such as UUID
> > > to be overwritten by commandline options. This might allow libvirt to
> > > consider the json as opaque.  
> > 
> > We definitely cannot expose the JSON anywhere in libvirt public API.
> > The JSON is a tool specific format, and one of libvirt's core jobs is
> > to define a format that isolates apps from the specific tool's impl,
> > so that we can swap out backend impls without impacting apps.
> > 
> > > 
> > > An issue here though is that the json will also include the parent
> > > device, which we obviously cannot assume is the same (particularly the
> > > bus address) on the migration target.  We can allow commandline
> > > overrides for the parent just as we do above for the UUID when defining
> > > the mdev device from json, but it's an open issue who is going to be
> > > smart enough (perhaps dumb enough) to claim this responsibility.  It
> > > would be interesting to understand how libvirt handles other host
> > > specific information during migration, for instance if node or processor
> > > affinities are part of the VM XML, how is that translated to the
> > > target?  I could imagine that we could introduce a simple "first
> > > available" placement in mdevctl, but maybe there should minimally be a
> > > node allocation preference with optional enforcement (similar to
> > > numactl), or maybe something above libvirt needs to take this
> > > responsibility to prepare the target before we get ourselves into
> > > trouble.  
> > 
> > I don't think we need to solve placement in libvirt.
> > 
> > The guest XML will just reference the mdev via a UUID that
> > was used with virNodeDeviceDefineXML. 
> > 
> > The virNodeDeviceDefineXML call where the mdev is first defined
> > will set the details of the mdev creation for this specific host.
> > The XML used with virNodeDeviceDefineXML can be different on the
> > source + target hosts. As long as the UUID is the same in both
> > hosts, the VM will associate with it correctly.
> 
> I wonder how to sync up with different placements, but maybe I'm just
> missing something.
> 
> Looking at this from the vfio-ccw angle, we can easily have the same
> device (as identified by the device number) on different subchannels
> (parents). To find out the device number, you need to look at the child
> ccw device of the subchannel while it is *not* bound to vfio-ccw, but
> to the normal I/O subchannel driver instead. Or ask your admin for the
> system definition...

This just means that whoever/whatever is invoking "virNodeDeviceDefineXML"
or "mdevctl create" will pass different parameters on each host. When
migrating a guest, the mgmt app can indicate which device should be used
for the guest on each host. This is a similar issue to migrating a guest
which uses an ethNNN device that has a different name on each host, or
a /dev/sdNNN that's different on each host, etc.
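
To make the per-host split concrete, here is a minimal sketch in Python
(standard library only) of what the mgmt app might prepare: one node device
definition per host, each with its own parent, plus a single guest hostdev
reference that carries only the UUID. The element names follow the existing
node device and hostdev XML conventions, but the exact schema that the
proposed virNodeDeviceDefineXML would accept is an assumption, and the UUID,
parent names and mdev type id below are hypothetical values chosen for
illustration.

```python
import xml.etree.ElementTree as ET


def mdev_nodedev_xml(uuid: str, parent: str, type_id: str) -> str:
    """Build a host-specific node device description for one mdev.

    Assumed schema: <device><parent/><capability type='mdev'>...</capability></device>
    """
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    cap = ET.SubElement(device, "capability", type="mdev")
    ET.SubElement(cap, "type", id=type_id)
    ET.SubElement(cap, "uuid").text = uuid
    return ET.tostring(device, encoding="unicode")


def mdev_hostdev_xml(uuid: str, model: str = "vfio-pci") -> str:
    """Build the guest-side hostdev element, which only names the UUID."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="mdev", model=model)
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "address", uuid=uuid)
    return ET.tostring(hostdev, encoding="unicode")


# Hypothetical values: the UUID is shared, the parent differs per host.
MDEV_UUID = "c60cc178-c7a6-4c75-abe5-071fcfb3a8f5"
MDEV_TYPE = "i915-GVTg_V5_4"

source_def = mdev_nodedev_xml(MDEV_UUID, "pci_0000_00_02_0", MDEV_TYPE)
target_def = mdev_nodedev_xml(MDEV_UUID, "pci_0000_3b_00_0", MDEV_TYPE)

# The guest XML is identical on both hosts; only the UUID ties it to the mdev.
guest_ref = mdev_hostdev_xml(MDEV_UUID)

print(source_def)
print(target_def)
print(guest_ref)
```

The two node device documents differ only in <parent>, so placement stays a
decision of whoever defines the device on each host, while the guest XML can
be migrated unchanged.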

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



