[libvirt] [RFC v1 0/6] Live Migration with ephemeral host NIC devices

Wed May 13 15:12:18 UTC 2015

On 05/13/2015 10:42 AM, Dr. David Alan Gilbert wrote:
> * Laine Stump (laine at redhat.com) wrote:
>> On 05/13/2015 04:28 AM, Peter Krempa wrote:
>>> On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
>>>> * Peter Krempa (pkrempa at redhat.com) wrote:
>>>>> On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
>>>>>> my main goal is to add support for migration with host NIC
>>>>>> passthrough devices and keep the network connectivity.
>>>>>>
>>>>>> this patch series is based on Shradha's patches at
>>>>>> https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html
>>>>>> which add migration support for host passthrough devices.
>>>>>>
>>>>>>  1) unplug the ephemeral devices before migration
>>>>>>
>>>>>>  2) do native migration
>>>>>>
>>>>>>  3) when migration finished, hotplug the ephemeral devices
>>>>>
>>>>> IMHO this algorithm is something that an upper layer management app
>>>>> should do. The device unplug operation is complex and it might not
>>>>> succeed which will make the current migration thread hang or fail in an
>>>>> intermediate state that will not be recoverable.
>>>>
>>>> However you wouldn't want each of the upper layer management apps implementing
>>>> their own hacks for this; so something somewhere needs to standardise
>>>> what the guest sees.
>>>
>>> The guest will still see a PCI device unplug request and will have to
>>> respond to it, then will be paused and after resume a new PCI device
>>> will appear. This is standardised. The nonstandardised part (which can't
>>> really be standardised) is how the bonding or other guest-dependent
>>> stuff will be handled, but that is up to the guest OS to handle.
>>>
>>> From libvirt's perspective this is only something that will trigger the
>>> device unplug and plug the devices back. And there are a lot of issues
>>> here:
>>>
>>> 1) the destination of the migration might not have the desired devices
>>>
>>>     This will trigger a lot of problems as we will not be able to guarantee
>>>     that the devices reappear on the destination and if we'd wanted to check
>>>     we'd need a new migration protocol AFAIK.
>>>
>>> 2) The guest OS might refuse to detach the PCI device (it might be stuck
>>> before PCI code is loaded)
>>>
>>>     In that case the migration will be stuck forever and abort attempts
>>>     will make the domain state basically undefined depending on the
>>>     phase where it failed.
>>>
>>> Since we can't guarantee that the unplug of the PCI host devices will be
>>> atomic or that it will succeed we basically can't guarantee in any way
>>> in which state the VM will end up later after (a possibly failed)
>>> migration. To recover such state there are too many options that could be
>>> desired by the user that would be hard to implement in a way that would
>>> be flexible enough.
>>
>>
>> In the past I've been on the side of having libvirt automatically do the
>> device detach and reattach (but definitely on the side of the guest
>> agent and libvirt keeping their hands off of network configuration in
>> the guest), with the thinking that 1) libvirt is in a well situated spot
>> to do it, and 2) this would eliminate duplicate code in the upper level
>> management.
>>
>> However, Peter's points above made me consider the failure cases more
>> closely, in particular this one:
>>
>> * the destination claims to have the resources required (right type of
>> PCI device, enough RAM), so migration is started.
>>
>> * device detached on source, guest memory migrated to destination,
>>
>> * guest started - no problems. (At this point, since the guest has been
>> restarted, it's not really possible for libvirt to fail the migration in
>> a recoverable manner (unless you want to implement some sort of
>> "unmigration" so that the guest state on the source is updated with
>> whatever execution occurred on the destination, and I don't think
>> *anyone* wants to go there))
>>
>> * libvirt finds the device still available and attempts to attach it but
>> (for some odd reason) fails.
>>
>> Now libvirt can't tell the application that the migration has succeeded,
>> because it didn't (unless the device was marked as "optional"), but it
>> also can't fail the migration except to say "this is such a monumental
>> failure that your guest has simply died".
>>
>> If, on the other hand, the detach and re-attach are implemented in a
>> higher layer (ovirt/openstack), they will at least have the guest in a
>> state they can deal with - it won't be pretty, but they could for
>> example migrate the guest to another host (maybe back to the source) and
>> re-attach there.
>>
>> So this one message from Peter has nicely pointed out the error in my
>> thinking, and I now agree that auto-detach/reattach shouldn't be
>> implemented in libvirt - it would work nicely in an error free world,
>> but would crumble in the face of some errors. (I just wish I had
>> considered the particular failure mode above a year or two ago, so I
>> could have been more discouraging in my emails then :-)
> 
> 
> It's a shame to limit the utility of this by dealing with an error case
> that's not a fatal error.  Does libvirt not have a way of dealing with
> non-fatal errors?

But is it non-fatal? Dan's point is that it isn't up to libvirt to decide.
In the case of attached USB devices, there is an attribute called
startupPolicy which can be set to "mandatory", "requisite" or
"optional". The first would cause a failure of the migration if the
device wasn't present on the destination of the migration, while the other two
would result in the device simply not being present on the destination.
But USB works differently from PCI - I don't think it even detaches the
device from the guest - so it doesn't have the same problems as a PCI
device.
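
As a toy illustration of the startupPolicy behaviour described above (this
is not libvirt code; the function name and the returned strings are made up
for the example, and it only models the migration case as I've stated it):

```python
def migrate_outcome(startup_policy, present_on_dest):
    """Toy model of startupPolicy at migration time; not libvirt's code."""
    if startup_policy not in ("mandatory", "requisite", "optional"):
        raise ValueError("unknown startupPolicy: %r" % (startup_policy,))
    if present_on_dest:
        # Device exists on the destination: nothing to decide.
        return "migrate-with-device"
    if startup_policy == "mandatory":
        # Only "mandatory" turns a missing device into a failed migration.
        return "fail-migration"
    # "requisite" and "optional": the guest arrives without the device.
    return "migrate-without-device"
```

The point is that the user has stated the policy up front, so libvirt is not
the one deciding whether a missing device is fatal.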

Although libvirt can reserve the device on the destination before the
migration starts, once the guest CPUs have been restarted, there is
currently "no going back". The only options would be 1) fail the
migration and kill the guest on the destination (is there even a state
for this?) or 2) implement new code to stop the CPUs and migrate the new
memory state back to the source, restart the CPUs on the source, and
report the migration as failed (not implemented, and wouldn't be very
pretty).

We *could* just unilaterally decide that all PCI assigned devices are
"optional" on the destination, and report the migration as a success
(just without the device being attached), but that is getting into the
territory of "libvirt making policy decisions" as discussed by Dan.
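
To make the alternative concrete, here is a rough sketch of how the
detach/migrate/reattach sequence might look when it lives in the management
layer instead. Everything here is hypothetical: the four arguments are
callables supplied by the application, none of them are libvirt APIs, and
the return strings are invented for the example.

```python
def migrate_with_passthrough(detach, migrate, attach_dest, on_attach_failure):
    """Hypothetical management-layer flow; not an existing libvirt API."""
    # 1) Unplug the device on the source. If the guest refuses, this
    #    raises before any migration has started.
    detach()
    # 2) Ordinary migration, now that no host device is assigned.
    #    A failure here propagates with the guest still on the source.
    migrate()
    # 3) Hotplug an equivalent device on the destination.
    try:
        attach_dest()
    except Exception as err:
        # The guest CPUs are already running on the destination, so
        # there is no going back. The application, not libvirt,
        # chooses what happens next.
        return on_attach_failure(err)
    return "migrated-with-device"
```

An application that considers the device optional could pass an
on_attach_failure that just logs and carries on, while another could
migrate the guest somewhere else and retry the attach there.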