[libvirt] [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd

Anthony Liguori aliguori at us.ibm.com
Mon Jul 9 20:00:17 UTC 2012


On 05/17/2012 09:14 AM, Eric Blake wrote:
> On 05/17/2012 07:42 AM, Stefan Hajnoczi wrote:
>
>>>>
>>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>>> without changing -drive.  It also supports snapshot_blkdev and other commands
>>> By the way, how will it support them?
>>
>> The problem with snapshot_blkdev is that closing a file and opening a
>> new file cannot be done by the QEMU process when an SELinux policy is in
>> place to prevent opening files.
>
> snapshot_blkdev can take an fd:name instead of a /path/to/file for the
> file to open, in which case libvirt can pass in the named fd _prior_ to
> the snapshot_blkdev using the 'getfd' monitor command.
>
>>
>> The -open-hook-fd approach works even when the QEMU process is not
>> allowed to open files since file descriptor passing over a UNIX domain
>> socket is used to open files on behalf of QEMU.
>
> The -open-hook-fd approach would indeed allow snapshot_blkdev to ask
> for the fd after the fact, but it's much more painful.  Consider a case
> with a two-disk snapshot:
>
> with the fd:name approach, the sequence is:
>
> libvirt calls getfd:name1 over normal monitor
> qemu responds
> libvirt calls getfd:name2 over normal monitor
> qemu responds
> libvirt calls transaction around blockdev-snapshot-sync over normal
> monitor, using fd:name1 and fd:name2
> qemu responds
>
> but with -open-hook-fd, the approach would be:
>
> libvirt calls transaction
> qemu calls open(file1) over hook
> libvirt responds
> qemu calls open(file2) over hook
> libvirt responds
> qemu responds to the original transaction
>
> The 'transaction' operation is thus blocked by the time it takes to do
> two intermediate opens over a second channel, which kind of defeats the
> purpose of making the transaction take effect with minimal guest
> downtime.
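
(For reference, the fd:name sequence quoted above maps onto QMP roughly as
follows.  The device names, fd names and image format here are made up for
illustration, and it assumes snapshot-file accepts the fd: prefix as
described:

  {"execute": "getfd", "arguments": {"fdname": "name1"}}
  {"return": {}}
  {"execute": "getfd", "arguments": {"fdname": "name2"}}
  {"return": {}}
  {"execute": "transaction", "arguments": {"actions": [
      {"type": "blockdev-snapshot-sync",
       "data": {"device": "virtio0", "snapshot-file": "fd:name1",
                "format": "qcow2"}},
      {"type": "blockdev-snapshot-sync",
       "data": {"device": "virtio1", "snapshot-file": "fd:name2",
                "format": "qcow2"}}]}}
  {"return": {}}

Each getfd carries the actual descriptor as SCM_RIGHTS ancillary data on the
monitor's UNIX socket, so by the time the transaction runs, both fds are
already in hand.)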

How are you defining "guest downtime"?

It's important to note that code running in QEMU does not equate to guest-visible 
downtime unless QEMU does an explicit vm_stop(), which is not happening here.

Instead, a VCPU may become blocked *if* it attempts to acquire qemu_mutex while 
QEMU is holding it.

If your concern is qemu_mutex being held while waiting for libvirt, it would be 
fairly easy to implement a qemu_open_async() that dropped back to the main loop 
and then invoked a callback when the open completes.

It would be pretty trivial to convert qmp_transaction to use such a command.
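
A rough sketch of what such a helper could look like, using a plain worker
thread and a notification pipe rather than QEMU's actual main-loop/bottom-half
machinery; the function and structure names are purely illustrative, not an
existing QEMU API:

/* Hypothetical qemu_open_async() sketch -- not actual QEMU code.  The
 * blocking open() (or an RPC to libvirt asking for an fd) runs in a
 * worker thread; the main loop is notified through a pipe it already
 * polls and invokes the callback, so qemu_mutex is never held across
 * the open itself. */
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef void (*open_cb)(int fd, void *opaque);

struct open_req {
    char *filename;
    int flags;
    int result;           /* fd (or -1) filled in by the worker */
    open_cb cb;
    void *opaque;
    int notify_fd;        /* write end of a pipe watched by the main loop */
};

static void *open_worker(void *arg)
{
    struct open_req *req = arg;

    req->result = open(req->filename, req->flags);

    /* Hand the request back to the main loop; it reads the pointer from
     * the pipe, calls req->cb(req->result, req->opaque) and frees req. */
    (void)write(req->notify_fd, &req, sizeof(req));
    return NULL;
}

int qemu_open_async(const char *filename, int flags,
                    open_cb cb, void *opaque, int notify_fd)
{
    struct open_req *req = calloc(1, sizeof(*req));
    pthread_t tid;

    if (!req) {
        return -1;
    }
    req->filename = strdup(filename);
    req->flags = flags;
    req->cb = cb;
    req->opaque = opaque;
    req->notify_fd = notify_fd;

    if (pthread_create(&tid, NULL, open_worker, req) != 0) {
        free(req->filename);
        free(req);
        return -1;
    }
    pthread_detach(tid);
    return 0;             /* completion is reported via cb */
}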

But this is all speculative.  There's no reason to believe that an RPC would 
have a noticeable guest-visible latency unless you assume there's lock contention. 
  I would strongly suspect that the bdrv_flush() is going to be a much greater 
source of lock contention than the RPC would be.  An RPC is only bound by 
scheduler latency, whereas synchronous disk I/O is bound by a spinning platter.

> And libvirt code becomes a lot trickier to deal with the fact
> that two channels are in use, and that the channel that issued the
> 'transaction' command must block while the other channel for handling
> hooks must be responsive.

All libvirt needs to do is listen on a socket and delegate access according to a 
whitelist.  Whatever is providing fds needs no knowledge of anything other than 
what the guest is allowed to access, which shouldn't depend on the currently 
executing command.
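
A minimal sketch of the provider side, assuming a simple path-request protocol
and an in-memory whitelist; the protocol and policy handling are made-up
simplifications, only the SCM_RIGHTS mechanics are the point:

/* Receive a pathname request, check it against a whitelist, open the file,
 * and pass the descriptor back over the UNIX socket with SCM_RIGHTS.
 * This is an illustration, not libvirt's actual implementation. */
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static int send_fd(int sock, int fd)
{
    struct msghdr msg = {0};
    struct iovec iov;
    struct cmsghdr *cmsg;
    char buf[CMSG_SPACE(sizeof(int))];
    char byte = 0;

    iov.iov_base = &byte;          /* at least one byte of real data */
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = buf;
    msg.msg_controllen = sizeof(buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Handle one request: the whitelist check is the only policy the helper
 * needs; it does not have to know which monitor command triggered the open. */
static int handle_request(int sock, const char *path,
                          const char *const *whitelist, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(path, whitelist[i]) == 0) {
            int fd = open(path, O_RDWR);
            if (fd < 0) {
                return -1;
            }
            int ret = send_fd(sock, fd);
            close(fd);             /* our copy; QEMU now owns the passed fd */
            return ret;
        }
    }
    return -1;                     /* not on the whitelist: refuse */
}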

Regards,

Anthony Liguori

> I'm really disliking the hook-fd approach, when a better solution is to
> make use of 'getfd' in advance of any operation that will need to open
> new fds.
>



