[libvirt RFCv8 00/27] multifd save restore prototype

Daniel P. Berrangé berrange at redhat.com
Wed May 11 09:42:26 UTC 2022


On Wed, May 11, 2022 at 10:27:30AM +0200, Christophe Marie Francois Dupont de Dinechin wrote:
> 
> 
> > On 10 May 2022, at 20:38, Daniel P. Berrangé <berrange at redhat.com> wrote:
> > 
> > On Sat, May 07, 2022 at 03:42:53PM +0200, Claudio Fontana wrote:
> >> This is v8 of the multifd save prototype, which fixes a few bugs,
> >> adds a few more code splits, and records the number of channels
> >> as well as the compression algorithm, so the restore command is
> >> more user-friendly.
> >> 
> >> It is now possible to just say:
> >> 
> >> virsh save mydomain /mnt/saves/mysave --parallel
> >> 
> >> virsh restore /mnt/saves/mysave --parallel
> >> 
> >> and things work with the default of 2 channels, no compression.
> >> 
> >> It is also possible to say of course:
> >> 
> >> virsh save mydomain /mnt/saves/mysave --parallel
> >>      --parallel-connections 16 --parallel-compression zstd
> >> 
> >> virsh restore /mnt/saves/mysave --parallel
> >> 
> >> and things also work fine, due to channels and compression
> >> being stored in the main save file.
> > 
> > For the sake of people following along, the above commands will
> > result in creation of multiple files
> > 
> >  /mnt/saves/mysave
> >  /mnt/saves/mysave.0
> >  /mnt/saves/mysave.1
> >  ....
> >  /mnt/saves/mysave.n
> > 
> > Where 'n' is the number of threads used.
> > 
> > Overall I'm not very happy with the approach of doing any of this
> > on the libvirt side.
> > 
> > Backing up, we know that QEMU can directly save to disk faster than
> > libvirt can. We mitigated a lot of that overhead with previous patches
> > to increase the pipe buffer size, but some still remains due to the
> > extra copies inherent in handing this off to libvirt.
> > 
> > Using multifd on the libvirt side, IIUC, gets us better performance
> > than QEMU can manage if doing non-multifd write to file directly,
> > but we still have the extra copies in there due to the hand off
> > to libvirt. If QEMU were to be directly capable of writing to
> > disk with multifd, it should beat us again.
> > 
> > As a result of how we integrate with QEMU multifd, we're taking the
> > approach of saving the state across multiple files, because it is
> > easier than trying to get multiple threads writing to the same file.
> > It could be solved by using file range locking on the save file.
> > eg a thread can reserve say 500 MB of space, fill it up, and then
> > reserve another 500 MB, etc, etc. It is a bit tedious though and
> > won't align nicely. eg a 1 GB huge page, would be 1 GB + a few
> > bytes of QEMU RAM save state header.
> 
> First, I do not understand why you would write things that are
> not page-aligned to start with? (As an aside, I don’t know
> how any dirty tracking would work if you do not keep things
> page-aligned).
> 
> Could uffd_register_memory accept a memory range that is
> not aligned? If so, when? Should that be specified in the
> interface?
> 
> Second, instead of creating multiple files, why not write blocks
> at a location determined by a variable that you increment using
> atomic operations each time you need a new block? If you want to
> keep the blocks page-aligned in the file as well (which might help
> if you want to mmap the file at some point), then you need to
> build a map of the blocks that you tack at the end of the file.
> 
> There may be good reasons not to do it that way, of course, but
> I am not familiar enough with the problem to know them.

This is all because QEMU is not actually writing to a file. From
QEMU's POV it just thinks it is migrating to another QEMU via
a pipe. So the questions of page alignment and file position are irrelevant
to QEMU's needs - it just has a stream.
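
As a rough illustration of that point (a sketch only, not QEMU or
libvirt code): a pipe has no file position at all, so any attempt to
seek on it fails with ESPIPE, whereas a regular file can be seeked.
The helper names is_seekable and demo are made up for this example.

```python
import errno
import os
import tempfile


def is_seekable(fd: int) -> bool:
    """Return True if fd has a file position, False if it is a pure stream."""
    try:
        # Asking for the current offset is harmless on a regular file.
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as exc:
        # Pipes, FIFOs and sockets report ESPIPE ("Illegal seek").
        if exc.errno == errno.ESPIPE:
            return False
        raise


def demo() -> tuple:
    """Compare the write end of a pipe with a regular temporary file."""
    read_fd, write_fd = os.pipe()
    try:
        pipe_seekable = is_seekable(write_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    with tempfile.TemporaryFile() as tmp:
        file_seekable = is_seekable(tmp.fileno())
    return pipe_seekable, file_seekable
```

With only a stream to write to, the sender has nowhere to place data
at a chosen, aligned offset.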

Libvirt is just capturing this raw migration stream and writing
its contents out to a file. The contents of the stream are completely
opaque to libvirt, and we don't want to be unpacking this stream to
do anything more clever. It is better to invest in making QEMU
know that it is writing to a file directly.
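
To make the "opaque stream" idea concrete, here is a minimal sketch of
what capturing a stream into a file amounts to. It is an assumption-laden
illustration, not libvirt's actual I/O helper: the function names, the
chunk size and the sample payload are all invented for the example. The
copy loop never looks inside the bytes it moves.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # arbitrary buffer size chosen for this sketch


def capture_stream(src_fd: int, dst_fd: int) -> int:
    """Copy bytes from src_fd to dst_fd until EOF without interpreting them."""
    total = 0
    while True:
        chunk = os.read(src_fd, CHUNK_SIZE)
        if not chunk:  # empty read means the writer closed its end
            break
        view = memoryview(chunk)
        while len(view) > 0:  # os.write may write fewer bytes than asked
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(chunk)
    return total


def demo(payload: bytes = b"opaque migration data") -> bytes:
    """Push a payload through a pipe, capture it to a file, read it back."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, payload)  # small enough to fit in the pipe buffer
    os.close(write_fd)           # signal EOF to the reader
    with tempfile.TemporaryFile() as tmp:
        copied = capture_stream(read_fd, tmp.fileno())
        os.close(read_fd)
        assert copied == len(payload)
        tmp.seek(0)
        return tmp.read()
```

Every byte crosses the pipe and is copied again into the file, which is
the extra hand-off that a direct write from QEMU would avoid.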


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

