
Re: [libvirt] [PATCH 4/6] Added domainMigrateStartPostCopy in qemu driver

* Jiri Denemark (jdenemar redhat com) wrote:
> On Thu, Sep 25, 2014 at 12:00:41 +0200, Cristian KLEIN wrote:
> > On 2014-09-24 15:06, Jiri Denemark wrote:
> > > This mostly looks good in isolation but I think this is not going to
> > > work. When post-copy is started, QEMU on the destination host will be
> > > resumed (I'm not sure if that happens automatically or we have to do
> > > it), which basically means we need to jump out of the Perform state and
> > > call Finish and once it returns, we should keep waiting for the
> > > post-copy migration to finish in Confirm state and kill the domain at
> > > the end. It's certainly possible the steps we need to do are a bit
> > > different since I'm not familiar with all the details about post-copy
> > > migration, but I believe we need to do something. And just running a
> > > single QEMU command is not enough to start post-copy in libvirt.
> > 
> > I'm not sure I follow. I tested the patch and it worked well: a VM that 
> > was "unmigratable" with pre-copy was successfully migrated through 
> > post-copy. Through the migration protocol, once we start post-copy on 
> > the source qemu, the following will happen:
> > 
> > - source QEMU suspends the VM and transfers CPU state;
> > - destination QEMU resumes the VM.
> Hmm, that's a bit unfortunate. I think we will need a way to tell QEMU
> not to resume the CPU automatically. The process should flow as follows:

> - libvirt sends migrate-start-postcopy command to QEMU
> - QEMU suspends the VM and transfers CPU state
> - QEMU tells us we can resume the destination
> - libvirt tells the destination QEMU to resume the VM
> - libvirt waits until migration is done
> - libvirt kills the source QEMU
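The ordering proposed above can be modelled as a short sequence with two gates: the destination may only be resumed after the CPU state has arrived, and the source may only be killed once the destination is running. This is a minimal Python sketch, not libvirt code; `PostcopyMigration` and all of its methods are hypothetical names used only to illustrate the ordering.

```python
from dataclasses import dataclass, field

# Hypothetical model of the libvirt-driven post-copy sequence described above.
# None of these names are real libvirt or QEMU APIs.
@dataclass
class PostcopyMigration:
    log: list = field(default_factory=list)
    source_running: bool = True
    dest_running: bool = False
    cpu_state_transferred: bool = False
    done: bool = False

    def start_postcopy(self):
        # Steps 1-2: libvirt sends migrate-start-postcopy; the source QEMU
        # suspends the VM and transfers the CPU state.
        self.log.append("migrate-start-postcopy")
        self.source_running = False
        self.cpu_state_transferred = True
        self.log.append("source paused, CPU state sent")

    def resume_destination(self):
        # Steps 3-4: libvirt resumes the destination only after QEMU says
        # the CPU state has been transferred.
        if not self.cpu_state_transferred:
            raise RuntimeError("destination resumed before CPU state arrived")
        self.dest_running = True
        self.log.append("destination resumed")

    def wait_and_cleanup(self):
        # Steps 5-6: wait for the remaining pages, then kill the source.
        if not self.dest_running:
            raise RuntimeError("waiting for completion with a paused destination")
        self.done = True
        self.log.append("migration completed, source killed")

m = PostcopyMigration()
m.start_postcopy()
m.resume_destination()
m.wait_and_cleanup()
print(m.log)
```

Calling `resume_destination()` on a fresh object raises `RuntimeError`, which is the ordering violation the thread is worried about.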

The destination QEMU should behave the same way as it does for precopy; i.e. if you
run QEMU with -S, it should pause rather than start the CPUs.
If it doesn't, that's a bug I can fix (I did test it a while ago, and I think
I'm using approximately the same code as precopy to do it).
The only difference with postcopy is that this point is reached long before
the migration has finished.
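The behaviour described above can be illustrated with a toy model of the destination at the switchover point: with -S the CPUs stay paused in both modes, and the only difference is how much memory is still outstanding when that point is reached. This is a hypothetical Python sketch; `DestinationQemu`, `switchover` and `pages_outstanding` are illustrative names, not QEMU internals, and `cont` merely mirrors the name of the QMP command that resumes a paused guest.

```python
# Hypothetical model of the destination QEMU at the switchover point.
# `autostart` stands for running QEMU without -S; names are illustrative only.
class DestinationQemu:
    def __init__(self, started_with_S: bool):
        self.autostart = not started_with_S
        self.cpus_running = False
        self.pages_outstanding = None

    def switchover(self, pages_outstanding: int):
        # Precopy reaches this point with (almost) nothing left to copy;
        # postcopy reaches it while most pages still live on the source.
        self.pages_outstanding = pages_outstanding
        if self.autostart:
            self.cpus_running = True
        # With -S the CPUs stay paused until management resumes them.

    def cont(self):
        # Explicit resume issued by management software such as libvirt.
        self.cpus_running = True

precopy = DestinationQemu(started_with_S=True)
precopy.switchover(pages_outstanding=0)

postcopy = DestinationQemu(started_with_S=True)
postcopy.switchover(pages_outstanding=10000)

print(precopy.cpus_running, postcopy.cpus_running)  # False False
postcopy.cont()
print(postcopy.cpus_running)  # True
```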


> Perhaps, we could tell the destination QEMU to resume the VM while the
> source is transferring CPU state if that's allowed by QEMU to minimize
> downtime.
> > Could you tell me why you think it's necessary to jump out of the Perform 
> > state? What does libvirt do in Finish that the destination VM needs in 
> > order to function properly?
> The problem is Finish does more than just resuming the VM on the
> destination. Before resuming the VM, libvirt needs to transfer locks on
> resources from the source to the destination, it needs to enable
> networking for the destination QEMU, etc. Without all this, the VM won't
> be able to really work on the destination. Not to mention that if
> something fails while the VM is already resumed on the destination, the
> code in Perform phase would just abort the migration and resume the VM
> on the source, which is wrong. We need to kill both ends, since neither of
> them has the complete state needed to continue running the VM.
> BTW, it's going to work in simple cases, when there's no lock daemon in
> use, only basic Linux bridge support is used, etc., which is why it
> works just fine for you. But we need to account for all the non-simple
> cases too.
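The failure policy Jiri describes can be sketched as follows: Finish runs several setup steps (lock transfer, network setup, and so on) before the destination may run, and the right reaction to a failure depends on whether the destination has already been resumed. This is a hypothetical Python sketch under that assumption only; `finish`, `on_failure`, `transfer_locks` and `enable_networking` are illustrative names, not libvirt functions.

```python
# Hypothetical sketch of the Finish-phase failure policy; not libvirt code.
def on_failure(dest_resumed: bool) -> str:
    # Before the destination runs, the source still holds the complete state,
    # so the migration can be aborted and the source resumed. Once the
    # destination has run in post-copy, state is split between the hosts.
    return "kill both" if dest_resumed else "resume source"

def finish(steps, dest_resumed: bool) -> str:
    # Run each setup step in order; stop at the first failure.
    for name, step in steps:
        try:
            step()
        except Exception:
            return f"{on_failure(dest_resumed)} (failed: {name})"
    return "ok"

def transfer_locks():
    pass  # hypothetical: hand resource locks over to the destination

def enable_networking():
    raise OSError("bridge setup failed")  # simulated failure

steps = [("transfer locks", transfer_locks), ("enable networking", enable_networking)]
# QEMU auto-resumed the destination, as with the current patch:
print(finish(steps, dest_resumed=True))   # kill both (failed: enable networking)
# The destination was kept paused (-S) until Finish completed:
print(finish(steps, dest_resumed=False))  # resume source (failed: enable networking)
```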
> Jirka
Dr. David Alan Gilbert / dgilbert redhat com / Manchester, UK
