[libvirt] Matching the type of mediated devices in the migration

Wang, Zhi A zhi.a.wang at intel.com
Fri Aug 3 12:07:58 UTC 2018


Hi:

Thanks for unfolding your idea. The picture is clearer to me now. I didn't realize that you also want to support cross-hardware migration. After thinking about it for a while, I suspect cross-hardware migration might not be popular in the vGPU case, but it could be quite popular in other mdev cases.

Let me continue my summary:

The mdev type name already includes a parent driver name, a group name, a physical device version, and a configuration type, for example i915-GVTg_V5_4. The driver name and the group name can already distinguish the vendor and the product between different mdevs, e.g. between Intel and Nvidia, or between vGPU and vOther.
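
To make the naming concrete, here is a minimal sketch that splits a type name laid out like the i915-GVTg_V5_4 example into its four parts. The "driver-group_version_config" layout is only inferred from that one example, and parse_mdev_type and MdevTypeName are hypothetical names; other vendor drivers are free to name their types differently.

```python
from typing import NamedTuple


class MdevTypeName(NamedTuple):
    driver: str      # parent driver name, e.g. "i915"
    group: str       # group name, e.g. "GVTg"
    hw_version: str  # physical device version, e.g. "V5"
    config: str      # configuration type, e.g. "4"


def parse_mdev_type(name: str) -> MdevTypeName:
    """Split a name shaped like "i915-GVTg_V5_4" (assumed layout)."""
    driver, sep, rest = name.partition("-")
    parts = rest.split("_")
    # Reject anything that does not match the assumed layout.
    if not sep or len(parts) != 3 or not all([driver, *parts]):
        raise ValueError(f"unexpected mdev type name: {name!r}")
    return MdevTypeName(driver, *parts)


parsed = parse_mdev_type("i915-GVTg_V5_4")
```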

Within an mdev type, each device provides a collection of the device state (data stream) versions it supports, in preferred order, since a newer version of the device state might contain more information, which might help performance.

Let's say there is a new device N and an old device O, and both support mdev_type M.

For example:
Device N is newer and supports the versions of device state: [ 6.3 6.2 6.1 ] in mdev type M
Device O is older and supports the versions of device state: [ 5.3 5.2 5.1 ] in mdev type M

- Version scheme of device state in the backward compatibility case: migrate from a VM with device O to a VM with device N; the mdev type is M.

Device N: [ 6.3 6.2 6.1 5.3 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: 5.3
The new device directly supports mdev_type M with the version preferred by device O. Good, this is the best situation.

Device N: [ 6.3 6.2 6.1 5.2 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: 5.2
The new device supports mdev_type M, but not the preferred version. After the migration, the vendor driver might have to disable some features which are not covered by the 5.2 device state, but this depends entirely on the vendor driver. If the user wishes to achieve the best experience, they should update the vendor driver on device N to one that supports the version preferred by device O.

Device N: [ 6.3 6.2 6.1 ] in M
Device O: [ 5.3 5.2 5.1 ] in M
Version used in migration: None
No version matches, so the migration would fail. The user should update the vendor driver on device N and device O.

- Version scheme of device state in the forward compatibility case: migrate from a VM with device N to a VM with device O; the mdev type is M.

Device N: [ 6.3 6.2 6.1 ] in M
Device O: [ 5.3 5.2 5.1 ] in M, but the user updates the vendor driver on device O. Now device O can support [ 5.3 5.2 5.1 6.1 ] (as an old device, device O still prefers version 5.3)
Version used in migration: 6.1
Since the newer device state is going to be migrated to an old device, the vendor driver on the old device might have to handle the new version of the device state specially. This depends on the vendor driver.
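
The cases above can be sketched as a small selection routine. This is only an illustration, not an existing QEMU or libvirt interface: pick_state_version is a hypothetical name, and walking the source device's preference order is an assumption (it happens to reproduce all four results listed above).

```python
from typing import Optional, Sequence


def pick_state_version(source: Sequence[str],
                       target: Sequence[str]) -> Optional[str]:
    """Return the first version in the source's preference order
    that the target also lists, or None if nothing matches."""
    supported = set(target)
    for version in source:
        if version in supported:
            return version
    return None


DEVICE_O = ["5.3", "5.2", "5.1"]

# Backward compatibility: source is device O, target is device N.
best = pick_state_version(DEVICE_O, ["6.3", "6.2", "6.1", "5.3"])
fallback = pick_state_version(DEVICE_O, ["6.3", "6.2", "6.1", "5.2"])
no_match = pick_state_version(DEVICE_O, ["6.3", "6.2", "6.1"])

# Forward compatibility: source is device N, target is the updated device O.
forward = pick_state_version(["6.3", "6.2", "6.1"],
                             ["5.3", "5.2", "5.1", "6.1"])
```

A result of None corresponds to the "Version used in migration: None" case, where the migration would be refused before any device state is read.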

- QEMU has to figure out and choose the version of the device state before reading the device state from the region. (Perhaps we can put the selection option in the control part of the region as well.)
- Libvirt will check whether any version in the collections of device O and device N matches before migration.
- Each mdev_type has its own collection of versions. (A device can support different versions in different types.)
- The collection had better not be a range; it should be a collection of version strings. (The vendor driver might drop some versions during an upgrade since they are not ideal.)
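
To illustrate the last three points, here is a rough sketch of per-type collections and of why a range can give the wrong answer once a version has been dropped. The dictionary layout, the helper names, and the dropped version 6.2 are all made up for the example.

```python
from typing import Dict, List, Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    """Turn "6.2" into (6, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, low: str, high: str) -> bool:
    """Range-style check: anything between low and high is accepted."""
    return as_tuple(low) <= as_tuple(version) <= as_tuple(high)


def has_common_version(src: Dict[str, List[str]],
                       dst: Dict[str, List[str]],
                       mdev_type: str) -> bool:
    """Pre-migration check: do both sides list any common version
    for this mdev type?"""
    return bool(set(src.get(mdev_type, [])) & set(dst.get(mdev_type, [])))


# Hypothetical collections, keyed by mdev type. Assume the vendor driver
# on device N dropped 6.2 during an upgrade.
device_n = {"M": ["6.3", "6.1", "5.3"]}
device_o = {"M": ["5.3", "5.2", "5.1"]}

range_says_ok = in_range("6.2", "5.3", "6.3")  # True, although 6.2 was dropped
list_says_ok = "6.2" in device_n["M"]          # False, which is correct
can_migrate = has_common_version(device_o, device_n, "M")
```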

That's the picture so far in my mind.

Thanks,
Zhi.

-----Original Message-----
From: Alex Williamson [mailto:alex.williamson at redhat.com] 
Sent: Wednesday, August 1, 2018 8:19 PM
To: Wang, Zhi A <zhi.a.wang at intel.com>
Cc: libvir-list at redhat.com; kwankhede at nvidia.com
Subject: Re: Matching the type of mediated devices in the migration

On Wed, 1 Aug 2018 10:22:39 +0000
"Wang, Zhi A" <zhi.a.wang at intel.com> wrote:

> Hi:
> 
> Let me summarize the understanding so far I got from the discussions since I am new to this discussion.
> 
> The mdev_type would be a generic stuff since we don't want userspace application to be confused. The example of mdev_type is:

I don't think 'generic' is the right term here.  An mdev_type is a specific thing with a defined interface, we just don't define what that interface is.
 
> There are several pre-defined mdev_types with different configurations, let's say MDEV_TYPE A/B/C. HW 1.0 might only support MDEV_TYPE A, and HW 2.0 might support both MDEV_TYPE A and B, but due to HW differences, we cannot migrate MDEV_TYPE A on HW 1.0 to MDEV_TYPE A on HW 2.0 even though they have the same MDEV_TYPE. So we need a device version either in the existing MDEV_TYPE or in a new sysfs entry.

This is correct, if a foo_type_a is exposed by the same vendor driver on different hardware, then the vendor driver is guaranteeing those mdev devices are software compatible to the user.  Whether the vendor driver is willing or able to support migration across the underlying hardware is a separate question.  Migration compatibility and user compatibility are separate features.

> Libvirt would have to check MDEV_TYPE match between source machine and destination machine, then the device version. If any of them is different, then it fails the migration.

Device version of what?  The hardware?  The mdev?  If the device version represents a different software interface, then the mdev type should be different.  If the device version represents a migration interface compatibility then we should define it as such.

> If my above understanding is correct, for VFIO part, we could define the device version as string or a magic number. For example, the vendor mdev driver could pass the vendor/device id and a version to VFIO and VFIO could expose them in the UUID sysfs no matter through a new sysfs entry or through existing MDEV_TYPE.

As above, why are we trying to infer migration compatibility from a device version?  What does a device version imply?  What if a vendor driver wants to support cross version migration?

> I prefer to expose it in the mdev_supported_types, since the libvirt node device list could extract the device version when it enumerates the host PCI devices or other devices which support mdev. We can also put it into the UUID sysfs, but the user might have to first log on to the target machine and then check the UUID and the device version by themselves, based on the current libvirt code. I suppose all the host device management would be in the node device in libvirt, which provides remote management of the host devices.
> 
> For the format of a device version, an example would be:
> 
> Vendor ID(16bit)Device ID(16bit)Class ID(16bit)Version(16bit)

This is no different from the mdev type, these are user visible attributes of the device which should not change without also changing the type.  Why do these necessarily convey that the migration stream is also compatible?

> For string version of the device version, I guess we have to define the max string length, which is hard to say yet. Also, a magic number is easier to be put into the state data header during the migration.

I don't think we've accomplished anything with this "device version".
If anything, I think we're looking for a sysfs representation of a migration stream version where userspace would match the vendor, type, and migration stream version to determine compatibility.  For vendor drivers that want to provide backwards compatibility, perhaps an optional minimum migration stream version would be provided, which would therefore imply that the format of the version can be parsed into a monotonically increasing value so that userspace can compare a stream produced by a source to a range supported by a target.  Thanks,

Alex



