[libvirt] Question about migration confirm phase

Jiri Denemark jdenemar at redhat.com
Mon Oct 14 08:18:12 UTC 2019


On Fri, Oct 11, 2019 at 23:18:29 +0000, Jim Fehlig wrote:
> I've been investigating a lockd lock ordering bug in a migration error handling 
> path in the libxl driver. In the perform phase, the src calls 
> virDomainLockProcessPause to release the lock before sending the VM to dst. In 
> this case the send fails for other reasons and an attempt is made to reacquire 
> the lock with virDomainLockProcessResume. But that fails since the dst has not 
> finished cleaning up the failed VM and releasing the lock it acquired when 
> starting to receive the VM. My immediate reaction was "why not reacquire the 
> lock in the confirm phase", but then I saw my older comment a few lines later in 
> the perform phase code
> 
>          /*
>           * Confirm phase will not be executed if perform fails. End the
>           * job started in begin phase.
>           */
> 
> Is that just a bug in the implementation, or is it intended to skip the confirm 
> phase if perform fails?

It's intended. The Perform phase runs on the source host, so why should
we call Confirm to let the source know about the failure? But of course,
the source has to clean up after the failed migration, similarly to what
Confirm would do.
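
As a rough illustration only, here is a minimal sketch of that idea: the
failure path of Perform and a cancelled Confirm share the same source-side
recovery. Everything in it (struct src_state, src_recover, src_perform,
src_confirm) is hypothetical and is not the actual libxl or qemu driver
code. The boolean flags merely stand in for calls such as
virDomainLockProcessPause and virDomainLockProcessResume, and the sketch
does not model the destination still holding the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical source-side state; not libvirt's real data structures. */
struct src_state {
    bool lock_held;   /* source currently holds the lock manager lock */
    bool vm_running;  /* guest is running on the source */
    bool job_active;  /* migration job started in the Begin phase */
};

/* Hypothetical helper: recovery shared by a failed Perform and a
 * cancelled Confirm. */
static void src_recover(struct src_state *s)
{
    assert(s != NULL);
    s->lock_held = true;    /* stand-in for virDomainLockProcessResume */
    s->vm_running = true;   /* guest keeps running on the source */
    s->job_active = false;  /* end the job started in the Begin phase */
}

/* Perform phase on the source. send_ok simulates whether the transfer
 * to the destination succeeded. Returns 0 on success, -1 on failure. */
static int src_perform(struct src_state *s, bool send_ok)
{
    assert(s != NULL);
    s->lock_held = false;   /* stand-in for virDomainLockProcessPause */
    s->vm_running = false;

    if (!send_ok) {
        /* Confirm will not be executed, so recover right here. */
        src_recover(s);
        return -1;
    }
    return 0;
}

/* Confirm phase on the source; only reached when Perform succeeded. */
static void src_confirm(struct src_state *s, bool cancelled)
{
    assert(s != NULL);
    if (cancelled)
        src_recover(s);         /* same recovery as a failed Perform */
    else
        s->job_active = false;  /* guest now lives on the destination */
}
```

In this sketch a failed src_perform() leaves the source holding the lock
with the guest running and no active job, without src_confirm() ever being
called.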

Jirka



