[libvirt] [RFC v1 0/6] Live Migration with ephemeral host NIC devices

Laine Stump laine at redhat.com
Wed May 13 13:57:44 UTC 2015


On 05/13/2015 04:28 AM, Peter Krempa wrote:
> On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
>> * Peter Krempa (pkrempa at redhat.com) wrote:
>>> On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
>>>> my main goal is to add support for migration with host NIC
>>>> passthrough devices while keeping network connectivity.
>>>>
>>>> this patch series is based on Shradha's patches at
>>>> https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html
>>>> which add migration support for host passthrough devices.
>>>>
>>>>  1) unplug the ephemeral devices before migration
>>>>
>>>>  2) do native migration
>>>>
>>>>  3) when migration finished, hotplug the ephemeral devices
>>>
>>> IMHO this algorithm is something that an upper-layer management app
>>> should do. The device unplug operation is complex and it might not
>>> succeed, which would make the current migration thread hang, or fail
>>> in an intermediate state that is not recoverable.
>>
>> However, you wouldn't want each of the upper-layer management apps
>> implementing its own hacks for this; something, somewhere, needs to
>> standardise what the guest sees.
> 
> The guest will still see a PCI device unplug request and will have to
> respond to it; it will then be paused, and after resume a new PCI
> device will appear. This is standardised. The non-standardised part
> (which can't really be standardised) is how bonding or other
> guest-dependent setup is handled, but that is up to the guest OS to
> handle.
> 
> From libvirt's perspective this only amounts to triggering the device
> unplug and plugging the devices back in. And there are a lot of issues
> here:
> 
> 1) the destination of the migration might not have the desired devices
> 
>     This will cause a lot of problems, as we will not be able to
>     guarantee that the devices reappear on the destination, and if we
>     wanted to check, we'd need a new migration protocol AFAIK.
> 
> 2) The guest OS might refuse to detach the PCI device (it might be
> stuck before the PCI code is loaded)
> 
>     In that case the migration will be stuck forever, and abort
>     attempts will leave the domain state basically undefined,
>     depending on the phase in which it failed.
> 
> Since we can't guarantee that the unplug of the PCI host devices will
> be atomic, or that it will succeed at all, we basically can't
> guarantee what state the VM will end up in after a (possibly failed)
> migration. To recover from such a state, there are too many options a
> user might want, and they would be hard to implement in a way that is
> flexible enough.
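
For reference, the proposed unplug/migrate/replug sequence corresponds
roughly to the following libvirt-python flow - a minimal sketch of what
either libvirt or a management app would be doing, where the hostdev
XML, domain name, and URIs are all made-up placeholders:

    import libvirt

    # Hypothetical hostdev XML for the passthrough NIC; the PCI
    # address is a placeholder and would differ per host.
    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x10' function='0x0'/>
      </source>
    </hostdev>
    """

    src = libvirt.open('qemu:///system')
    dst = libvirt.open('qemu+ssh://dest.example.com/system')
    dom = src.lookupByName('guest')

    # 1) request unplug of the ephemeral device. This only *asks* the
    #    guest; completion must be confirmed by waiting for the
    #    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event (or timing out) -
    #    exactly the non-atomic step Peter describes above.
    dom.detachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    # 2) native migration; returns the domain object on the destination
    newdom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

    # 3) hotplug an equivalent device on the destination
    newdom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)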


In the past I've been on the side of having libvirt automatically do the
device detach and reattach (but definitely on the side of the guest
agent and libvirt keeping their hands off of network configuration in
the guest), on the thinking that 1) libvirt is well situated to do it,
and 2) this would eliminate duplicate code in the upper-level management
apps.

However, Peter's points above made me consider the failure cases more
closely, in particular this one:

* the destination claims to have the resources required (right type of
PCI device, enough RAM), so migration is started.

* device detached on source, guest memory migrated to destination.

* guest started on destination - no problems. (At this point, since the
guest has been restarted, it's not really possible for libvirt to fail
the migration in a recoverable manner - unless you want to implement
some sort of "unmigration" so that the guest state on the source is
updated with whatever execution occurred on the destination, and I
don't think *anyone* wants to go there.)

* libvirt finds the device still available on the destination and
attempts to attach it, but (for some odd reason) the attach fails.

Now libvirt can't tell the application that the migration has succeeded,
because it didn't (unless the device was marked as "optional"), but it
also can't fail the migration except to say "this is such a monumental
failure that your guest has simply died".

If, on the other hand, the detach and re-attach are implemented in a
higher layer (ovirt/openstack), that layer will at least have the guest
in a state it can deal with - it won't be pretty, but it could, for
example, migrate the guest to another host (maybe back to the source)
and re-attach the device there.
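
Continuing the hypothetical libvirt-python sketch from earlier, the
recovery policy would then live in the layer that can actually decide
what to do (the fallback URI is just as made up as the rest):

    try:
        newdom.attachDeviceFlags(HOSTDEV_XML,
                                 libvirt.VIR_DOMAIN_AFFECT_LIVE)
    except libvirt.libvirtError:
        # The guest is still running, just without its NIC. The
        # management app can apply whatever policy it wants: retry,
        # alert the admin, or move the guest to a host where a
        # suitable device is available and attach it there.
        fallback = libvirt.open('qemu+ssh://source.example.com/system')
        newdom = newdom.migrate(fallback, libvirt.VIR_MIGRATE_LIVE,
                                None, None, 0)
        newdom.attachDeviceFlags(HOSTDEV_XML,
                                 libvirt.VIR_DOMAIN_AFFECT_LIVE)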

So this one message from Peter has nicely pointed out the error in my
thinking, and I now agree that auto-detach/reattach shouldn't be
implemented in libvirt - it would work nicely in an error-free world,
but would crumble in the face of some errors. (I just wish I had
considered the particular failure mode above a year or two ago, so I
could have been more discouraging in my emails then :-)



