[libvirt RFCv8 00/27] multifd save restore prototype

Daniel P. Berrangé berrange at redhat.com
Wed May 11 12:02:32 UTC 2022


On Wed, May 11, 2022 at 01:52:05PM +0200, Claudio Fontana wrote:
> On 5/11/22 11:51 AM, Daniel P. Berrangé wrote:
> > On Wed, May 11, 2022 at 09:26:10AM +0200, Claudio Fontana wrote:
> >> Hi Daniel,
> >>
> >> thanks for looking at this,
> >>
> >> On 5/10/22 8:38 PM, Daniel P. Berrangé wrote:
> >>> On Sat, May 07, 2022 at 03:42:53PM +0200, Claudio Fontana wrote:
> >>>> This is v8 of the multifd save prototype, which fixes a few bugs,
> >>>> adds a few more code splits, and records the number of channels
> >>>> as well as the compression algorithm, so the restore command is
> >>>> more user-friendly.
> >>>>
> >>>> It is now possible to just say:
> >>>>
> >>>> virsh save mydomain /mnt/saves/mysave --parallel
> >>>>
> >>>> virsh restore /mnt/saves/mysave --parallel
> >>>>
> >>>> and things work with the default of 2 channels, no compression.
> >>>>
> >>>> It is of course also possible to say:
> >>>>
> >>>> virsh save mydomain /mnt/saves/mysave --parallel
> >>>>       --parallel-connections 16 --parallel-compression zstd
> >>>>
> >>>> virsh restore /mnt/saves/mysave --parallel
> >>>>
> >>>> and things also work fine, due to channels and compression
> >>>> being stored in the main save file.
> >>>
> >>> For the sake of people following along, the above commands will
> >>> result in the creation of multiple files:
> >>>
> >>>   /mnt/saves/mysave
> >>>   /mnt/saves/mysave.0
> >>
> >> just a minor correction: there is no .0
> > 
> > Heh, off-by-1
> > 
> >>
> >>>   /mnt/saves/mysave.1
> >>>   ....
> >>>   /mnt/saves/mysave.n
> >>>
> >>> Where 'n' is the number of threads used.
> >>>
> >>> Overall I'm not very happy with the approach of doing any of this
> >>> on the libvirt side.
> >>
> >>
> >> Ok I understand your concern.
> >>
> >>>
> >>> Backing up, we know that QEMU can directly save to disk faster than
> >>> libvirt can. We mitigated a lot of that overhead with previous patches
> >>> to increase the pipe buffer size, but some still remains due to the
> >>> extra copies inherent in handing this off to libvirt.
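
For context: on Linux, enlarging a pipe's buffer is done with the
F_SETPIPE_SZ fcntl. A minimal sketch of the mechanism, using an
illustrative 1 MiB size rather than whatever value the actual libvirt
patches chose:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int fds[2];

      if (pipe(fds) < 0) {
          perror("pipe");
          return 1;
      }

      /* Ask the kernel for a larger pipe buffer: bigger chunks mean
       * fewer read/write round trips per byte copied. */
      if (fcntl(fds[1], F_SETPIPE_SZ, 1024 * 1024) < 0)
          perror("fcntl(F_SETPIPE_SZ)");

      printf("pipe buffer is now %d bytes\n", fcntl(fds[1], F_GETPIPE_SZ));

      close(fds[0]);
      close(fds[1]);
      return 0;
  }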
> >>
> >> Right;
> >> still the performance we get is insufficient for the use case we are trying to address,
> >> even without libvirt in the picture.
> >>
> >> Instead, with parallel save + compression we can make the numbers add up.
> >> For parallel save using multifd, the overhead of libvirt is negligible.
> >>
> >>>
> >>> Using multifd on the libvirt side, IIUC, gets us better performance
> >>> than QEMU can manage if doing a non-multifd write to file directly,
> >>> but we still have the extra copies in there due to the hand-off
> >>> to libvirt. If QEMU were directly capable of writing to
> >>> disk with multifd, it should beat us again.
> >>
> >> Hmm, I am thinking about this point, and at first glance I don't
> >> think this is 100% accurate;
> >>
> >> if we do a parallel save like in this series with multifd,
> >> the overhead of libvirt is almost non-existent in my view
> >> compared with doing it with qemu only, skipping libvirt;
> >> it is limited to the one iohelper for the main channel
> >> (which is the smallest of the transfers),
> >> and maybe this could be removed as well.
> > 
> > Libvirt adds overhead due to the multiple data copies in
> > the save process. Using multifd doesn't get rid of this
> > overhead, it merely distributes it across many
> > CPUs. The overall wallclock time is reduced, but in aggregate
> > the CPUs still have the same amount of total work to do
> > copying data around.
> > 
> > I don't recall the scale of the libvirt overhead that remains
> > after the pipe buffer optimizations, but whatever is left is
> > still taking up host CPU time that could be used for other guests.
> > 
> > It also just occurred to me that currently our save/restore
> > approach is bypassing all resource limits applied to the
> > guest, e.g. block I/O rate limits, CPU affinity controls,
> > etc, because most of the work is done in the iohelper.
> > If we had this done in QEMU, then the save/restore process
> > would be confined by the existing CPU affinity / I/O limits
> > applied to the guest. This means we would not negatively
> > impact other co-hosted guests to the same extent.
> > 
> >> This is because even without libvirt in the picture, we
> >> are still migrating to a socket, and something needs to
> >> transfer data from that socket to a file. At that point
> >> I think both libvirt and a custom-made script are in the
> >> same position.
> > 
> > If QEMU had explicit support for a "file" backend, there
> > would be no socket involved at all. QEMU would be copying
> > guest RAM directly to a file with no intermediate steps.
> > If QEMU mmap'd the save state file, then saving of the
> > guest RAM could even possibly reduce to a mere 'memcpy()'.
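
For illustration only, a minimal sketch of that idea, with a hypothetical
save_ram_to_file() helper and file path; real QEMU support would of course
go through its RAMBlock and migration layers rather than a bare pointer:

  #include <fcntl.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  static int save_ram_to_file(const char *path, const void *ram, size_t len)
  {
      int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
      if (fd < 0)
          return -1;

      /* Grow the file to the size of the RAM region before mapping it. */
      if (ftruncate(fd, len) < 0) {
          close(fd);
          return -1;
      }

      void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (dst == MAP_FAILED) {
          close(fd);
          return -1;
      }

      /* With the file mapped, saving reduces to a plain memcpy(). */
      memcpy(dst, ram, len);
      msync(dst, len, MS_SYNC);

      munmap(dst, len);
      close(fd);
      return 0;
  }

  int main(void)
  {
      static char ram[4096] = "pretend this is guest RAM";
      /* "/tmp/mysave.ram" is a hypothetical path for the sketch. */
      return save_ram_to_file("/tmp/mysave.ram", ram, sizeof(ram)) ? 1 : 0;
  }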
> 
> Agreed, but still, to align with your requirement to have only one file,
> libvirt would need to add some padding between the libvirt header and the
> start of the QEMU VM data, so that the QEMU VM data begins at a block-friendly offset in the file.

That's trivial, as we already add padding in this place.
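
The alignment itself is simple arithmetic; a minimal sketch, with an
illustrative 4 KiB block size and a hypothetical header length rather
than the values libvirt actually uses:

  #include <stdio.h>

  #define BLOCK_SIZE 4096UL   /* illustrative block size */

  static unsigned long vm_data_offset(unsigned long header_len)
  {
      /* Round the end of the header up to the next block boundary. */
      return (header_len + BLOCK_SIZE - 1) & ~(BLOCK_SIZE - 1);
  }

  int main(void)
  {
      unsigned long header_len = 4200;   /* hypothetical header size */
      unsigned long offset = vm_data_offset(header_len);

      printf("header ends at %lu, QEMU VM data starts at %lu (padding %lu)\n",
             header_len, offset, offset - header_len);
      return 0;
  }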

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

