[libvirt] [PATCH 4/6] Added domainMigrateStartPostCopy in qemu driver

Daniel P. Berrange berrange at redhat.com
Thu Sep 25 12:20:07 UTC 2014


On Thu, Sep 25, 2014 at 02:12:24PM +0200, Jiri Denemark wrote:
> On Thu, Sep 25, 2014 at 12:00:41 +0200, Cristian KLEIN wrote:
> > On 2014-09-24 15:06, Jiri Denemark wrote:
> > > This mostly looks good in isolation, but I don't think it is going to
> > > work. When post-copy is started, QEMU on the destination host will be
> > > resumed (I'm not sure whether that happens automatically or we have to
> > > do it), which means we need to jump out of the Perform state, call
> > > Finish, and, once it returns, keep waiting in the Confirm state for the
> > > post-copy migration to finish and kill the domain at the end. The steps
> > > we need may well be a bit different, since I'm not familiar with all
> > > the details of post-copy migration, but I believe we need to do
> > > something. Just running a single QEMU command is not enough to start
> > > post-copy in libvirt.
> > 
> > I'm not sure I follow. I tested the patch and it worked well: a VM that
> > could not be migrated with pre-copy was successfully migrated using
> > post-copy. Under the migration protocol, once we start post-copy on
> > the source QEMU, the following happens:
> >
> > - the source QEMU suspends the VM and transfers the CPU state;
> > - the destination QEMU resumes the VM.
> 
> Hmm, that's a bit unfortunate. I think we will need a way to tell QEMU
> not to resume the CPU automatically. The process should flow as follows:
> 
> - libvirt sends migrate-start-postcopy command to QEMU
> - QEMU suspends the VM and transfers CPU state
> - QEMU tells us we can resume the destination
> - libvirt tells the destination QEMU to resume the VM
> - libvirt waits until migration is done
> - libvirt kills the source QEMU
> 
> To minimize downtime, we could perhaps tell the destination QEMU to
> resume the VM while the source is still transferring CPU state, if QEMU
> allows that.
> 
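For illustration only, the ordering above can be captured as a small step tracker in C. This is a sketch, not libvirt code: the postcopy_step enum and the postcopy_flow_* helpers are hypothetical names, and the tracker assumes QEMU can report when the destination may be resumed (the third step in the list), which is the missing piece discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names mirroring the flow above; not a libvirt API. */
typedef enum {
    PC_STEP_START_POSTCOPY,  /* libvirt sends migrate-start-postcopy to QEMU */
    PC_STEP_SOURCE_PAUSED,   /* QEMU suspends the VM and transfers CPU state */
    PC_STEP_DEST_READY,      /* QEMU reports the destination may be resumed (assumed event) */
    PC_STEP_DEST_RESUMED,    /* libvirt resumes the destination QEMU */
    PC_STEP_MIGRATION_DONE,  /* libvirt sees the migration complete */
    PC_STEP_SOURCE_KILLED,   /* libvirt kills the source QEMU */
    PC_STEP_COUNT            /* sentinel: all steps done */
} postcopy_step;

typedef struct {
    postcopy_step next;      /* the only step accepted next */
} postcopy_flow;

static void postcopy_flow_init(postcopy_flow *flow)
{
    assert(flow != NULL);
    flow->next = PC_STEP_START_POSTCOPY;
}

/* Records `step`; returns false and leaves the flow unchanged if `step`
 * is not the next expected one (e.g. resuming the destination before
 * QEMU says it is ready, or killing the source before migration ends). */
static bool postcopy_flow_advance(postcopy_flow *flow, postcopy_step step)
{
    if (flow->next == PC_STEP_COUNT || step != flow->next)
        return false;
    flow->next = (postcopy_step)(flow->next + 1);
    return true;
}

/* True once the source QEMU has been killed, i.e. every step has happened. */
static bool postcopy_flow_done(const postcopy_flow *flow)
{
    return flow->next == PC_STEP_COUNT;
}
```

A strict linear order is the simplest reading of the list. If QEMU allowed the destination to be resumed while CPU state is still in flight, the DEST_READY/DEST_RESUMED ordering would have to be relaxed.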
> > Could you tell me why you think it's necessary to jump out of the Perform
> > state? What does libvirt do in Finish that the destination VM needs in
> > order to function properly?
> 
> The problem is that Finish does more than just resume the VM on the
> destination. Before resuming the VM, libvirt needs to transfer locks on
> resources from the source to the destination, enable networking for the
> destination QEMU, etc. Without all this, the VM won't really be able to
> work on the destination. Moreover, if something fails while the VM is
> already running on the destination, the code in the Perform phase would
> just abort the migration and resume the VM on the source, which is wrong.
> We need to kill both ends, since neither of them has the complete state
> needed to continue running the VM.
> 
> BTW, it works in simple cases, when no lock daemon is in use, only basic
> Linux bridge support is used, etc., which is why it works just fine for
> you. But we need to account for all the non-simple cases too.

Yes, having this work correctly with virtlockd and sanlock is really
mandatory before the code can be included.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



