[libvirt] [PATCH v2 1/8] Added public API to enable post-copy migration
Cristian Klein
cristian.klein at cs.umu.se
Thu Nov 6 07:18:04 UTC 2014
On 01 Oct 2014, at 12:07 , Jiri Denemark <jdenemar at redhat.com> wrote:
> On Wed, Oct 01, 2014 at 10:45:33 +0200, Cristian KLEIN wrote:
>> On 2014-09-30 17:16, Daniel P. Berrange wrote:
>>> On Tue, Sep 30, 2014 at 05:11:03PM +0200, Jiri Denemark wrote:
>>>> On Tue, Sep 30, 2014 at 16:39:22 +0200, Cristian Klein wrote:
>>>>> Signed-off-by: Cristian Klein <cristian.klein at cs.umu.se>
>>>>> ---
>>>>> include/libvirt/libvirt.h.in | 1 +
>>>>> src/libvirt.c | 7 +++++++
>>>>> 2 files changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
>>>>> index 5217ab3..82f3aeb 100644
>>>>> --- a/include/libvirt/libvirt.h.in
>>>>> +++ b/include/libvirt/libvirt.h.in
>>>>> @@ -1225,6 +1225,7 @@ typedef enum {
>>>>> VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */
>>>>> VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */
>>>>> VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */
>>>>> + VIR_MIGRATE_POSTCOPY = (1 << 15), /* enable (but don't start) post-copy */
>>>>> } virDomainMigrateFlags;
>>>>
>>>> I still think we should add an extra flag to start post copy
>>>> immediately. To address your concerns about it, I don't think it's
>>>> implementing a policy in libvirt. It's for apps that want to make sure
>>>> migration converges without having to spawn another thread and monitor
>>>> the progress or wait for a timeout. It's a bit similar to migrating a
>>>> paused domain vs. migrating a running domain and pausing it when it
>>>> doesn't seem to converge.
>>>
>>> Your point about spawning another thread makes me wonder if we should
>>> actually look at adding a 'VIR_MIGRATE_ASYNC' method (that would require
>>> P2P migration of course). If this flag were set, virDomainMigrateXXX would
>>> only block for long enough to start the migration and then return.
>>>
>>> Callers can use the job info API to monitor progress & success/failure.
>>>
>>> Then we wouldn't have to keep adding flags like you suggest - apps can
>>> just easily call the appropriate API right away with no threads needed
>>
>> This would make a lot of sense. The user would call:
>>
>> """
>> virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY | VIR_MIGRATE_ASYNC)
>> virDomainMigrateStartPostCopy(...)
>> """
>>
>> Would this be seen as more cumbersome than having a dedicated
>> VIR_MIGRATE_POSTCOPY_AUTOSTART?
>
> The ASYNC flag Daniel suggested makes sense, so I guess you can just
> ignore my request for a special flag. Although, I don't think the ASYNC
> stuff needs to be done within this series, let's just focus on the
> post-copy stuff.

Hi Jirka,

I talked to the qemu post-copy guys (Andrea and Dave in CC). Starting post-copy immediately is a poor choice for performance: the VM starts running on the destination hypervisor before its read-only and kernel memory has arrived. Those pages then have to be pulled on demand, which causes a lot of overhead and interruptions in the VM’s execution.

Instead, it is better to first do one pass of pre-copy and only then trigger post-copy. In fact, I ran an experiment with a video streaming VM: starting post-copy after the first pass of pre-copy (instead of starting it immediately) reduced downtime from 3.5 seconds to under 1 second.

Given all of the above, I propose the following post-copy API in libvirt:

    virDomainMigrateXXX(..., VIR_MIGRATE_ENABLE_POSTCOPY)
    virDomainMigrateStartPostCopy(...) // from a different thread

This is for those who only need the post-copy mechanism and want to implement the policy themselves.

    virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)

This is for those who want to use post-copy without caring about any low-level details; it offers a policy that is good enough for most cases.

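
As a rough illustration of the "post-copy after pre-copy" policy, the check below decides when the switch to post-copy should happen. It is only a sketch under assumptions: `struct migr_progress` and `should_start_postcopy()` are hypothetical names, the two counters are meant to mirror `memTotal` and `memProcessed` from `virDomainJobInfo`, and treating "processed >= total" as "the first pre-copy pass is done" is an approximation rather than something qemu guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of migration progress. In practice the counters
 * would be filled from the job info API (memTotal / memProcessed). */
struct migr_progress {
    unsigned long long mem_total;     /* guest memory to migrate, in bytes */
    unsigned long long mem_processed; /* bytes transferred so far */
    bool postcopy_started;            /* set once post-copy was triggered */
};

/* Return true when roughly one full pre-copy pass has been sent and
 * post-copy has not been triggered yet. ASSUMPTION: "processed >= total"
 * is used as a stand-in for "first pass complete". */
bool should_start_postcopy(const struct migr_progress *p)
{
    assert(p != NULL);

    if (p->postcopy_started)
        return false;   /* never trigger post-copy twice */
    if (p->mem_total == 0)
        return false;   /* no progress data available yet */
    return p->mem_processed >= p->mem_total;
}
```

A caller implementing the policy itself would poll the job info from a second thread, feed the counters into this check, and call virDomainMigrateStartPostCopy(...) the first time it returns true.
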
What do you think? Would you accept patches that implement this API?
Cristian