[libvirt RFC] add API for parallel Saves (not for committing)

Claudio Fontana cfontana at suse.de
Thu Apr 21 18:06:40 UTC 2022


On 4/21/22 7:08 PM, Daniel P. Berrangé wrote:
> On Thu, Apr 14, 2022 at 09:54:16AM +0200, Claudio Fontana wrote:
>> RFC, starting point for discussion.
>>
>> Sketch API changes to allow parallel Saves, and open up
>> an implementation for QEMU to leverage multifd migration to files,
>> with optional multifd compression.
>>
>> This makes it possible to improve save times for huge VMs.
>>
>> The idea is to issue commands like:
>>
>> virsh save domain /path/savevm --parallel --parallel-connections 2
>>
>> and have libvirt start a multifd migration to:
>>
>> /path/savevm   : main migration connection
>> /path/savevm.1 : multifd channel 1
>> /path/savevm.2 : multifd channel 2
> 
> At a conceptual level the idea would be to still have a single file,
> but have threads writing to different regions of it. I don't think
> that's possible with multifd though, as it doesn't partition RAM
> up between threads, it just hands out pages on demand. So if one
> thread happens to be quicker it'll send more RAM than another
> thread. Also we're basically capturing the migration RAM, and the
> multifd channels have control info, in addition to the RAM pages.
> 
> That makes me wonder actually, are the multifd streams unidirectional
> or bidirectional ?  Our saving to a file logic, relies on the streams
> being unidirectional.


Unidirectional. In the meantime I completed an actual libvirt prototype that works (only did the save part, not the restore yet).
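
For reference, the file layout from the RFC (the main migration connection at the given path, plus one ".N" file per multifd channel) could be computed along these lines. This is only a sketch: parallel_save_path is a hypothetical helper name used for illustration, and I am not claiming the prototype does it exactly this way.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of libvirt): fill `out` with the file
 * name used for `channel`. Channel 0 is the main migration connection
 * and uses the base path unchanged; channels 1..N get a ".N" suffix.
 * Returns 0 on success, -1 if the name does not fit in `out`. */
int parallel_save_path(char *out, size_t outlen,
                       const char *base, unsigned int channel)
{
    int n;

    assert(out != NULL && base != NULL);

    if (channel == 0)
        n = snprintf(out, outlen, "%s", base);
    else
        n = snprintf(out, outlen, "%s.%u", base, channel);

    /* snprintf reports the length it wanted; >= outlen means truncation */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With --parallel-connections 2 and a base of /path/savevm, channels 0, 1 and 2 map to the three file names listed in the RFC.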


> 
> You've got me thinking, however, whether we can take QEMU out of
> the loop entirely for saving RAM.
> 
> IIUC with the 'x-ignore-shared' migration capability QEMU will skip
> saving of RAM regions entirely (well, technically any region marked
> as 'shared', which I guess can cover more things).

Heh I have no idea about this.

> 
> If the QEMU process is configured with a file backed shared
> memory, or memfd, I wonder if we can take advantage of this.
> eg
> 
>   1. pause the VM
>   2. write the libvirt header to save.img
>   3. sendfile(qemus-memfd, save.img-fd)  to copy the entire
>      RAM after header

I don't understand this point very much... if the RAM is already backed by a file, why are we sending it again?

>   4. QMP migrate with x-ignore-shared to copy device
>      state after RAM
> 
> Probably can do the same on restore too.
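
To make sure I read the proposal right, the "write header, then sendfile the RAM after it" part might look roughly like the sketch below. The assumptions are mine: save_header_then_ram and write_all are hypothetical names, the VM is taken to be already paused, and ram_fd stands in for a file descriptor on QEMU's memfd or file-backed RAM. It is Linux-only and has not been tried against QEMU.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write the header to out_fd, then append ram_len
 * bytes read from offset 0 of ram_fd using sendfile(), so the RAM
 * contents are copied in the kernel rather than through a userspace
 * buffer. Returns 0 on success, -1 on error. */
int save_header_then_ram(int out_fd, const void *hdr, size_t hdr_len,
                         int ram_fd, off_t ram_len)
{
    off_t off = 0;

    if (write_all(out_fd, hdr, hdr_len) < 0)
        return -1;

    /* sendfile() may transfer less than requested, so loop until done;
     * it advances `off` itself and leaves ram_fd's file offset alone. */
    while (off < ram_len) {
        ssize_t n = sendfile(out_fd, ram_fd, &off, (size_t)(ram_len - off));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* source ended before ram_len bytes were copied */
    }
    return 0;
}
```

The device state from the QMP migrate with x-ignore-shared would then be appended after the RAM; that part is not shown here.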


Do I understand correctly that you suggest constantly writing the RAM out to a file at runtime?
Given the compute-heavy nature of the workload, I'd think this would slow things down.

We need to evict the memory to disk rarely, but when that happens it should be as fast as possible.

The advantage of the multifd idea was that we have CPUs sitting there doing nothing that were reserved for running the VM,
so we may as well use them to reduce the size of the problem substantially by compressing each stream separately.

> 
> Now, this would only work for a 'save' and 'restore', not
> for snapshots, as it would rely on the VCPUs being paused
> to stop RAM being modified.
> 
>>
>> Signed-off-by: Claudio Fontana <cfontana at suse.de>
>> ---
>>  include/libvirt/libvirt-domain.h | 5 +++++
>>  src/driver-hypervisor.h          | 7 +++++++
>>  src/libvirt_public.syms          | 5 +++++
>>  src/qemu/qemu_driver.c           | 1 +
>>  tools/virsh-domain.c             | 8 ++++++++
>>  5 files changed, 26 insertions(+)
>>
>> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
>> index 2d5718301e..a7b9c4132d 100644
>> --- a/include/libvirt/libvirt-domain.h
>> +++ b/include/libvirt/libvirt-domain.h
>> @@ -1270,6 +1270,7 @@ typedef enum {
>>      VIR_DOMAIN_SAVE_RUNNING      = 1 << 1, /* Favor running over paused */
>>      VIR_DOMAIN_SAVE_PAUSED       = 1 << 2, /* Favor paused over running */
>>      VIR_DOMAIN_SAVE_RESET_NVRAM  = 1 << 3, /* Re-initialize NVRAM from template */
>> +    VIR_DOMAIN_SAVE_PARALLEL     = 1 << 4, /* Parallel Save/Restore to multiple files */
>>  } virDomainSaveRestoreFlags;
>>  
>>  int                     virDomainSave           (virDomainPtr domain,
>> @@ -1278,6 +1279,10 @@ int                     virDomainSaveFlags      (virDomainPtr domain,
>>                                                   const char *to,
>>                                                   const char *dxml,
>>                                                   unsigned int flags);
>> +int                     virDomainSaveParametersFlags (virDomainPtr domain,
>> +                                                      virTypedParameterPtr params,
>> +                                                      int nparams,
>> +                                                      unsigned int flags);
>>  int                     virDomainRestore        (virConnectPtr conn,
>>                                                   const char *from);
>>  int                     virDomainRestoreFlags   (virConnectPtr conn,
>> diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h
>> index 4423eb0885..a4e1d21e76 100644
>> --- a/src/driver-hypervisor.h
>> +++ b/src/driver-hypervisor.h
>> @@ -240,6 +240,12 @@ typedef int
>>                           const char *dxml,
>>                           unsigned int flags);
>>  
>> +typedef int
>> +(*virDrvDomainSaveParametersFlags)(virDomainPtr domain,
>> +                                   virTypedParameterPtr params,
>> +                                   int nparams,
>> +                                   unsigned int flags);
>> +
>>  typedef int
>>  (*virDrvDomainRestore)(virConnectPtr conn,
>>                         const char *from);
>> @@ -1489,6 +1495,7 @@ struct _virHypervisorDriver {
>>      virDrvDomainGetControlInfo domainGetControlInfo;
>>      virDrvDomainSave domainSave;
>>      virDrvDomainSaveFlags domainSaveFlags;
>> +    virDrvDomainSaveParametersFlags domainSaveParametersFlags;
>>      virDrvDomainRestore domainRestore;
>>      virDrvDomainRestoreFlags domainRestoreFlags;
>>      virDrvDomainSaveImageGetXMLDesc domainSaveImageGetXMLDesc;
>> diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms
>> index f93692c427..eb3a7afb75 100644
>> --- a/src/libvirt_public.syms
>> +++ b/src/libvirt_public.syms
>> @@ -916,4 +916,9 @@ LIBVIRT_8.0.0 {
>>          virDomainSetLaunchSecurityState;
>>  } LIBVIRT_7.8.0;
>>  
>> +LIBVIRT_8.3.0 {
>> +    global:
>> +        virDomainSaveParametersFlags;
>> +} LIBVIRT_8.0.0;
>> +
>>  # .... define new API here using predicted next version number ....
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 77012eb527..249105356c 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -20826,6 +20826,7 @@ static virHypervisorDriver qemuHypervisorDriver = {
>>      .domainGetControlInfo = qemuDomainGetControlInfo, /* 0.9.3 */
>>      .domainSave = qemuDomainSave, /* 0.2.0 */
>>      .domainSaveFlags = qemuDomainSaveFlags, /* 0.9.4 */
>> +    .domainSaveParametersFlags = qemuDomainSaveParametersFlags, /* 8.3.0 */
>>      .domainRestore = qemuDomainRestore, /* 0.2.0 */
>>      .domainRestoreFlags = qemuDomainRestoreFlags, /* 0.9.4 */
>>      .domainSaveImageGetXMLDesc = qemuDomainSaveImageGetXMLDesc, /* 0.9.4 */
>> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
>> index d5fd8be7c3..ccded6d265 100644
>> --- a/tools/virsh-domain.c
>> +++ b/tools/virsh-domain.c
>> @@ -4164,6 +4164,14 @@ static const vshCmdOptDef opts_save[] = {
>>       .type = VSH_OT_BOOL,
>>       .help = N_("avoid file system cache when saving")
>>      },
>> +    {.name = "parallel",
>> +     .type = VSH_OT_BOOL,
>> +     .help = N_("enable parallel save to files")
>> +    },
>> +    {.name = "parallel-connections",
>> +     .type = VSH_OT_INT,
>> +     .help = N_("number of connections/files for parallel save")
>> +    },
>>      {.name = "xml",
>>       .type = VSH_OT_STRING,
>>       .completer = virshCompletePathLocalExisting,
>> -- 
>> 2.34.1
>>
> 
> With regards,
> Daniel
> 


