[libvirt] [PATCH v2 1/8] Added public API to enable post-copy migration

Cristian Klein cristian.klein at cs.umu.se
Thu Nov 6 07:18:04 UTC 2014

On 01 Oct 2014, at 12:07 , Jiri Denemark <jdenemar at redhat.com> wrote:

> On Wed, Oct 01, 2014 at 10:45:33 +0200, Cristian KLEIN wrote:
>> On 2014-09-30 17:16, Daniel P. Berrange wrote:
>>> On Tue, Sep 30, 2014 at 05:11:03PM +0200, Jiri Denemark wrote:
>>>> On Tue, Sep 30, 2014 at 16:39:22 +0200, Cristian Klein wrote:
>>>>> Signed-off-by: Cristian Klein <cristian.klein at cs.umu.se>
>>>>> ---
>>>>>  include/libvirt/libvirt.h.in | 1 +
>>>>>  src/libvirt.c                | 7 +++++++
>>>>>  2 files changed, 8 insertions(+)
>>>>> diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
>>>>> index 5217ab3..82f3aeb 100644
>>>>> --- a/include/libvirt/libvirt.h.in
>>>>> +++ b/include/libvirt/libvirt.h.in
>>>>> @@ -1225,6 +1225,7 @@ typedef enum {
>>>>>      VIR_MIGRATE_ABORT_ON_ERROR    = (1 << 12), /* abort migration on I/O errors happened during migration */
>>>>>      VIR_MIGRATE_AUTO_CONVERGE     = (1 << 13), /* force convergence */
>>>>>      VIR_MIGRATE_RDMA_PIN_ALL      = (1 << 14), /* RDMA memory pinning */
>>>>> +    VIR_MIGRATE_POSTCOPY          = (1 << 15), /* enable (but don't start) post-copy */
>>>>>  } virDomainMigrateFlags;
>>>> I still think we should add an extra flag to start post copy
>>>> immediately. To address your concerns about it, I don't think it's
>>>> implementing a policy in libvirt. It's for apps that want to make sure
>>>> migration converges without having to spawn another thread and monitor
>>>> the progress or wait for a timeout. It's a bit similar to migrating a
>>>> paused domain vs. migrating a running domain and pausing it when it
>>>> doesn't seem to converge.
>>> Your point about spawning another thread makes me wonder if we should
>>> actually look at adding a 'VIR_MIGRATE_ASYNC' method (that would require
>>> P2P migration of course). If this flag were set, virDomainMigrateXXX would
>>> only block for long enough to start the migration and then return.
>>> Callers can use the job info API to monitor progress & success/failure.
>>> Then we wouldn't have to keep adding flags like you suggest - apps can
>>> just easily call the appropriate API right away with no threads needed
>> This would make a lot of sense. The user would call:
>> """
>> virDomainMigrateStartPostCopy(...)
>> """
>> Would this be seen as more cumbersome than having a dedicated 
> The ASYNC flag Daniel suggested makes sense, so I guess you can just
> ignore my request for a special flag. Although, I don't think the ASYNC
> stuff needs to be done within this series, let's just focus on the
> post-copy stuff.

Hi Jirka,

I talked to the QEMU post-copy guys (Andrea and Dave in CC). Starting post-copy immediately is a poor choice for performance: the VM starts running on the destination hypervisor before its read-only or kernel memory has arrived. Those pages then have to be pulled on demand, which causes a lot of overhead and interruptions in the VM’s execution.

Instead, it is better to first do one pass of pre-copy and only then trigger post-copy. In fact, in an experiment I ran with a video streaming VM, starting post-copy after the first pass of pre-copy (instead of starting it immediately) reduced downtime from 3.5 seconds to under 1 second.
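
To make the "one pass of pre-copy first" policy concrete, here is a minimal sketch of the check a management application could run while polling migration progress. The struct and function names are hypothetical and are not part of libvirt. It also assumes that "bytes processed >= total bytes" is an acceptable approximation of "first pass finished", which may not hold exactly, e.g. when zero pages are compressed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of migration progress; NOT a libvirt type.
 * An application would fill it from whatever job statistics it polls. */
struct migration_progress {
    uint64_t mem_total;     /* bytes of guest memory to migrate */
    uint64_t mem_processed; /* bytes transferred so far (re-sent pages count again) */
};

/* Assumption: during the first pre-copy pass every page is sent exactly
 * once, so the pass is over once the processed byte count reaches the
 * total. Returns false while no total is known yet. */
bool first_precopy_pass_done(const struct migration_progress *p)
{
    return p->mem_total > 0 && p->mem_processed >= p->mem_total;
}
```

A caller would poll progress periodically and, the first time this returns true, call virDomainMigrateStartPostCopy(...) from its monitoring thread.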

Given all of the above, I propose the following post-copy API in libvirt:

virDomainMigrateStartPostCopy(...) // from a different thread

This is for those who just need the post-copy mechanism and want to implement a policy themselves.


This is for those who want to use post-copy without caring about any low-level details, offering a good enough policy for most cases.

What do you think? Would you accept patches that implement this API?

