[libvirt] [PATCH] pass correct uri to source host when we do p2p migration

Tue Jan 11 11:00:40 UTC 2011

On Tue, Jan 11, 2011 at 09:37:46AM +0800, Wen Congyang wrote:
> At 01/10/2011 07:02 PM, Daniel P. Berrange Write:
> > On Mon, Jan 10, 2011 at 11:20:31AM +0800, Wen Congyang wrote:
> >> When we migrate a guest from remote host to localhost by p2p, the libvirtd
> >> on remote host will be deadlock. This patch fixes a bug and we can avoid
> >> the deadlock with this patch.
> >>
> >> The steps to reproduce this bug:
> >> # virsh -c qemu+ssh://<remotehost IP>/system migrate --p2p <domain name> qemu+ssh:///system
> >>
> >> We connect dest host(qemu+ssh:///system) on source host, and the uri we pass
> >> to source host is qemu+ssh:///system. And then we connect a wrong dest host.
> > 
> > IMHO, we should be reporting an error in this scenario, because the
> > user is specifying a combination of parameters that don't make
> > sense.
> 
> I don't think so. If we don't use --p2p, the migration can finish.
> If we reporting an error here, the behavior of this command will
> be strange(success without --p2p, but fail with --p2p).

If the user tries a slight variation on your example, they will
hit exactly the same problem:

  virsh -c qemu+ssh://<remotehost IP>/system migrate --p2p <domain name> qemu+ssh://localhost/system

In both cases the URI erroneously points to the machine
running virsh. The URI needs to contain the public address of
the machine to migrate to, as known to the source machine.

The fact that it works with traditional migration but fails
with peer2peer migration is actually expected, because these
have very different architectures & are not intended to
behave equivalently for a given URI. The URI parameter for
these two modes of migration has different semantics:

 * normal migration: the URI is an address of the target host
   as seen from the client machine.

 * peer2peer migration: the URI is an address of the target
   host as seen from the source machine.
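
To make the difference concrete, here is a minimal Python sketch
of which machine ends up being the target in each mode. It is an
illustration only, not libvirt code: effective_target() and the
host names "client" and "remote" are made up for this example,
and it assumes that a URI with no host part, or with a loopback
name, refers to whichever machine opens the connection.

```python
from urllib.parse import urlparse

# Names that always mean "the machine opening the connection".
LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def effective_target(dest_uri, client_host, source_host, p2p):
    """Return the host a destination URI refers to (hypothetical helper).

    Normal migration: the client running virsh opens the URI.
    Peer2peer migration: the source host's libvirtd opens the URI.
    """
    resolver = source_host if p2p else client_host
    host = urlparse(dest_uri).hostname  # None when the URI has no host part
    if host is None or host in LOOPBACK:
        return resolver
    return host


# Same URI, different meaning depending on the mode:
print(effective_target("qemu+ssh:///system", "client", "remote", p2p=False))  # client
print(effective_target("qemu+ssh:///system", "client", "remote", p2p=True))   # remote
```

With --p2p the target comes out as the source host itself, which
is the "localhost migration" case.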

So in normal migration, your example requests migration between
a <remote host> and the local machine running virsh. In the
peer2peer migration your example requests a "localhost migration"
on the <remote host>.  While some hypervisors will support
localhost migration, we don't support that with QEMU currently
and so QEMU should refuse to attempt that (rather than deadlock).

Adding code to libvirt.c that rewrites the URI to include the
full hostname of the virsh client in effect redefines the
semantics of the peer2peer migration URI argument so that it
follows the semantics of the normal migration URI argument. This
is not what we want, because it would prevent use of "localhost
migration" for hypervisors which do support this concept.

Also, nothing here actually fixes the deadlock, which could
likely still be triggered just by giving two URIs that explicitly
point to the same machine, which is again a request for a
"localhost" migration:

  virsh -c qemu+ssh://<remotehost IP>/system migrate --p2p <domain name> qemu+ssh://<remotehost IP>/system
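
A rough sketch of the kind of check this implies, in Python for
brevity rather than as actual libvirt code. Everything named here
(check_p2p_destination(), LocalhostMigrationError, the
supports_localhost flag, the 192.0.2.x addresses) is hypothetical,
and comparing host strings is a simplifying assumption: a real
check would need something more robust, since one machine can be
reached under several names.

```python
from urllib.parse import urlparse

LOOPBACK = {"localhost", "127.0.0.1", "::1"}


class LocalhostMigrationError(Exception):
    """Hypothetical error: source and destination are the same host."""


def check_p2p_destination(source_host, dest_uri, supports_localhost=False):
    """Refuse a peer2peer migration that would target the source itself.

    The URI is interpreted from the source host's point of view, so a
    missing host part or a loopback name means the source host.
    Returns the destination host when the migration may proceed.
    """
    host = urlparse(dest_uri).hostname  # None when the URI has no host part
    same_host = (
        host is None
        or host in LOOPBACK
        or host == source_host.lower()  # urlparse lowercases the hostname
    )
    if same_host and not supports_localhost:
        raise LocalhostMigrationError(
            "destination %r resolves to the source host %r"
            % (dest_uri, source_host)
        )
    return source_host if same_host else host


# All three URIs discussed in this thread are refused for source 192.0.2.10:
for uri in ("qemu+ssh:///system",
            "qemu+ssh://localhost/system",
            "qemu+ssh://192.0.2.10/system"):
    try:
        check_p2p_destination("192.0.2.10", uri)
    except LocalhostMigrationError as err:
        print("refused:", err)
```

The supports_localhost flag stands in for hypervisors that do
support localhost migration, so the check does not rule that
case out.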

Regards,
Daniel