[libvirt] [RFC] Should libvirt be a proxy for migration data

Thu Aug 18 10:00:25 UTC 2011

On Tue, Aug 16, 2011 at 15:38:56 -0700, Daniel P. Berrange wrote:
> On Tue, Aug 16, 2011 at 11:53:04PM +0200, Jiri Denemark wrote:
> > Hi all,
> > 
> > Currently when we start a non-tunneled migration, data go straight from
> > source qemu to destination qemu. This is nice in that there is no additional
> > overhead but it also has several disadvantages. If the communication between
> > source and destination qemu breaks, we only get an unexpected error message
> > from qemu with no clue about what happened. Another issue is that if qemu cannot
> > send migration data, we cannot cancel the migration because migrate_cancel
> > blocks until all buffers with migration data queued up for transmission are
> > written into the socket.
> > 
> > That said, I think we should act as a proxy between source and destination
> > qemu so that we can detect and report normal errors (such as connection reset
> > by peer) and cancel migration at any time. Since we have virNetSocket and we
> > already use that for connecting to destination qemu, we should use it for
> > proxying migration data as well. This approach also has some disadvantages,
> > e.g., a single libvirt thread instead of several qemu processes will now send
> > migration data from all domains that are being migrated. However, I feel like
> > the gain is bigger than the downside. And we already do the same for tunneled
> > migration anyway.
> > 
> > Any objections?
> 
> Adding libvirt into the mix introduces extra data copies on both the
> source and destination libvirt.

I was actually thinking about doing so only on the source. We don't need to do
anything with the data, so the source libvirtd can send it directly to the
destination qemu.
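
To make the source-only proxy idea more concrete, here is a minimal sketch of
such a relay loop. It is written in Python purely for illustration and is not
libvirt code: the function name relay, its parameters, and the status strings
are all hypothetical, and a real implementation would presumably sit on top of
virNetSocket. It only shows the two properties discussed above: socket errors
are caught and reported by the proxy, and a cancel request is noticed even when
the destination stops accepting data.

```python
import select
import socket
import threading


def relay(src, dst, cancel, chunk_size=65536, poll_interval=0.1):
    """Forward bytes from src to dst until EOF, an error, or cancellation.

    Hypothetical helper for illustration only. Returns a tuple
    (status, bytes_forwarded), where status is "completed", "cancelled",
    or "error: <reason>".
    """
    forwarded = 0
    pending = b""
    # Never block in send(), so a stalled destination cannot prevent
    # the loop from noticing a cancel request.
    dst.setblocking(False)
    try:
        while not cancel.is_set():
            if pending:
                # Wait (with a timeout) until dst accepts more data.
                _, writable, _ = select.select([], [dst], [], poll_interval)
                if writable:
                    try:
                        sent = dst.send(pending)
                    except BlockingIOError:
                        continue
                    pending = pending[sent:]
                    forwarded += sent
                continue
            # Wait (with a timeout) until src has data or reaches EOF.
            readable, _, _ = select.select([src], [], [], poll_interval)
            if not readable:
                continue
            data = src.recv(chunk_size)
            if not data:
                # EOF from the source side: everything was forwarded.
                return "completed", forwarded
            pending = data
        return "cancelled", forwarded
    except OSError as exc:
        # E.g. "Connection reset by peer" or "Broken pipe"; the proxy can
        # report the real reason instead of a generic failure.
        return "error: %s" % exc, forwarded
```

In this sketch, cancel is a threading.Event that another thread may set at any
time; the loop checks it at least once per poll_interval, so cancellation does
not depend on the destination draining queued data. The extra copy through the
proxy's buffer is exactly the overhead being debated in this thread.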

> This is non-trivial extra overhead when you are migrating guests that have
> multiple GB of RAM and have very heavy workloads.  These already push the
> boundaries of what is possible to do with QEMU having a direct TCP
> connection. So I don't think we should go down the route of making libvirt
> do copies itself by default.

Yes, it adds overhead. And maybe the overhead is not worth the benefits we
would gain.

And there might be one more reason for making the source libvirtd a proxy
between source qemu and destination qemu. If libvirtd dies or is stopped at the
right time, the migration may finish and leave the domain stopped until libvirtd
is started again. However, this could also be solved by having qemu cancel a
migration when the monitor connection that started it is closed.

Jirka