[libvirt RFCv11 00/33] multifd save restore prototype

Claudio Fontana cfontana at suse.de
Mon Oct 23 14:06:53 UTC 2023


On 10/11/23 17:29, Daniel P. Berrangé wrote:
> On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
>>
>> On 10/11/23 16:05, Daniel P. Berrangé wrote:
>>>
>>> Instead of using 'getfd' though we have to use 'add-fd'.
>>>
>>> Anyway, this lets us do FD passing as normal, while also
>>> letting us specify the offset.
>>>
>>>  {"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
>>>  {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}


Hi Daniel,

the "add-fd" is the part that I don't understand at all.

Should we actually pass an fd there, like with "getfd", already opened on the savevm file?
Something in pseudocode like:

virsh qemu-monitor-command --pass-fds 10 --cmd='{"execute": "add-fd", "arguments": {"fdset-id":10}}' ?

Should we use "opaque" instead of "fdset-id" if we actually want to set it to "migrate"?
And how do we reference it later?

virsh qemu-monitor-command --cmd='{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'

?

"opaque" does not seem to get me a reachable /dev/fdset/migrate though.

I can currently trigger the migration to the URI file:/mnt/nvme/savevm, so that part seems to work fine;
it's the file:/dev/fdset part that I am still unable to glue together.

Thanks for any ideas,

Claudio
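
For reference, a minimal sketch of how the two QMP commands might fit together. It assumes that "fdset-id" is an integer (as in QEMU's QAPI schema for add-fd), that "opaque" is only a free-form label, and that the set is then referenced by number as /dev/fdset/<id>. The id 1, the label text and the helper names are made up for illustration, and the "monitor" here is just a local socketpair that shows the fd travelling alongside the JSON via SCM_RIGHTS; no QEMU is involved.

```python
import json
import os
import socket
import tempfile

FDSET_ID = 1  # hypothetical numeric fd set id, chosen by the caller


def build_add_fd(fdset_id, opaque):
    # "opaque" is just a label; the set is later addressed by its number.
    return {"execute": "add-fd",
            "arguments": {"fdset-id": fdset_id, "opaque": opaque}}


def build_migrate(fdset_id, offset):
    # Same arguments as in the thread, with the fd set referenced by number.
    uri = f"file:/dev/fdset/{fdset_id},offset={offset}"
    return {"execute": "migrate",
            "arguments": {"detach": True, "blk": False, "inc": False,
                          "uri": uri}}


def demo_pass_fd():
    """Send add-fd plus an open fd over a socketpair standing in for QMP."""
    monitor_side, client_side = socket.socketpair(socket.AF_UNIX,
                                                  socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as savefile:
            payload = json.dumps(build_add_fd(FDSET_ID, "savevm file")).encode()
            # The fd rides along with the command as SCM_RIGHTS ancillary data.
            socket.send_fds(client_side, [payload], [savefile.fileno()])
            data, fds, _flags, _addr = socket.recv_fds(monitor_side, 4096, 1)
            received = json.loads(data)
            # The received fd refers to the same open file as the original.
            os.pwrite(fds[0], b"x", 0)
            same_file = os.pread(savefile.fileno(), 1, 0) == b"x"
            os.close(fds[0])
            return received, same_file
    finally:
        monitor_side.close()
        client_side.close()


if __name__ == "__main__":
    print(demo_pass_fd())
    print(json.dumps(build_migrate(FDSET_ID, 124456)))
```

With these assumptions, the migrate URI would read file:/dev/fdset/1,offset=124456 rather than file:/dev/fdset/migrate.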


>>>
>>>> Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about,
>>>> but who should own it, libvirt or QEMU?
>>>
>>> How about both :-)
>>
>> I need to familiarize a bit with this, there are pieces I am missing. Can you correct here?
>>
>> OPTION 1)
>>
>> libvirt opens the file and has the FD, writes the header, marks the offset,
>> then we dup the FD in libvirt for the benefit of QEMU, optionally set the flags of the dup to "O_DIRECT" (the usual case) depending on --bypass-cache,
>> pass the duped FD to QEMU,
>> QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter),
>> then libvirt closes the duped fd
>> libvirt rewrites the header using the original fd (needed to update the metadata),
>> libvirt closes the original fd
>>
>>
>> OPTION 2)
>>
>> libvirt opens the file and has the FD, writes the header, marks the offset,
>> then we pass the FD to QEMU,
>> QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter,
>> QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter),
>> QEMU closes the duped FD,
>> libvirt rewrites the header using the original fd (needed to update the metadata),
>> libvirt closes the original fd
>>
>>
>> I don't remember whether the QEMU changes for the file offsets optimization are already "block friendly", i.e. whether they operate correctly whatever the state of O_DIRECT or ~O_DIRECT.
>> I think so. They were designed with O_DIRECT in mind.
> 
> The 'file' protocol as it exists currently is not O_DIRECT
> capable. It is not writing aligned buffers to aligned offsets
> in the file. It is still running the regular old migration
> stream format over the file, not taking advantage of it being
> random access.
> 
> What's needed is the followup "fixed ram" format adaptation.
> Use of that format should imply O_DIRECT, so in fact we
> don't need an explicit 'bypass_cache' parameter in QAPI,
> just a way to ask for the 'fixed ram' format.
> 
>> So I would tend to see OPTION 1) as more attractive as QEMU does not need to care about another parameter, whatever has been chosen in libvirt in terms of bypass cache is handled in libvirt.
> 
> The 'fixed ram' format will only take care of I/O for the
> main RAM blocks which are nicely aligned and can be written
> to aligned file offsets. The general device vmstate I/O
> probably can't be assumed to be aligned. While we could
> futz around with QEMUFile so that it bounce buffers vmstate
> to an aligned region and flushes it in page sized chunks
> that's probably too much of a pain.
> 
> IOW, actually I think what QEMU would likely want to
> do is
> 
>  1. qemu_open  -> get a FD *without* O_DIRECT set
>  2. write some vmstate stuff
>  3. turn on O_DIRECT
>  4. write RAM in fixed locations
>  5. turn off O_DIRECT
>  6. write remaining vmstate
> 
> With regards,
> Daniel
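
The six steps quoted above could be sketched roughly as follows. This is only an illustration in Python, not QEMU code: the function names, the 4096-byte alignment and the file layout are assumptions, and the sketch falls back to buffered I/O when the filesystem refuses O_DIRECT.

```python
import fcntl
import mmap
import os

BLOCK = 4096  # assumed alignment for O_DIRECT I/O


def set_o_direct(fd, enable):
    """Toggle O_DIRECT on an open fd; return False if that is not possible."""
    o_direct = getattr(os, "O_DIRECT", 0)
    if not o_direct:
        return False
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    flags = (flags | o_direct) if enable else (flags & ~o_direct)
    try:
        fcntl.fcntl(fd, fcntl.F_SETFL, flags)
    except OSError:
        return False  # e.g. a filesystem without direct I/O support
    return True


def save_image(path, head, ram_pages, tail):
    """Write unaligned vmstate, aligned RAM pages, then unaligned vmstate."""
    # Step 1: open *without* O_DIRECT.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Step 2: unaligned vmstate goes through the page cache.
        os.pwrite(fd, head, 0)
        # RAM starts at the next BLOCK-aligned offset after the vmstate.
        ram_start = -(-len(head) // BLOCK) * BLOCK
        # Step 3: turn on O_DIRECT.
        direct = set_o_direct(fd, True)
        # Step 4: write RAM from a page-aligned buffer to fixed offsets.
        buf = mmap.mmap(-1, BLOCK)  # anonymous mappings are page aligned
        for i, page in enumerate(ram_pages):  # each page is at most BLOCK bytes
            buf[:] = page.ljust(BLOCK, b"\0")
            os.pwrite(fd, buf, ram_start + i * BLOCK)
        buf.close()
        # Step 5: turn off O_DIRECT.
        set_o_direct(fd, False)
        # Step 6: remaining vmstate, again unaligned.
        tail_off = ram_start + len(ram_pages) * BLOCK
        os.pwrite(fd, tail, tail_off)
        return ram_start, tail_off, direct
    finally:
        os.close(fd)
```

One detail worth keeping in mind when comparing this with the dup-based options: O_DIRECT is a file status flag, and descriptors created with dup() share file status flags, so toggling it on a duplicate also affects the original descriptor.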


