[libvirt] Question about migration confirm phase

Jim Fehlig jfehlig at suse.com
Mon Oct 14 19:43:28 UTC 2019


On 10/14/19 2:18 AM, Jiri Denemark wrote:
> On Fri, Oct 11, 2019 at 23:18:29 +0000, Jim Fehlig wrote:
>> I've been investigating a lockd lock ordering bug in a migration error handling
>> path in the libxl driver. In the perform phase, the src calls
>> virDomainLockProcessPause to release the lock before sending the VM to dst. In
>> this case the send fails for other reasons and an attempt is made to reacquire
>> the lock with virDomainLockProcessResume. But that fails since the dst has not
>> finished cleaning up the failed VM and releasing the lock it acquired when
>> starting to receive the VM. My immediate reaction was "why not reacquire the
>> lock in the confirm phase", but then I saw my older comment a few lines later in
>> the perform phase code
>>
>>           /*
>>            * Confirm phase will not be executed if perform fails. End the
>>            * job started in begin phase.
>>            */
>>
>> Is that just a bug in the implementation, or is it intended to skip the confirm
>> phase if perform fails?
> 
> It's intended. The Perform phase runs on the source hosts so why should
> we call Confirm to let the source know about the failure?

To do any cleanup of the failed migration after the dst has done it's cleanup in 
the finish phase?

> But of course,
> the source has to cleanup after the failed migration similarly to what
> Confirm would do.

I've made slight changes to the lock ordering and it looks promising after 
initial tests. I'll post a patch after further testing. Thanks!

Regards,
Jim




More information about the libvir-list mailing list