[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

Corey Bryant coreyb at linux.vnet.ibm.com
Thu Jul 5 16:35:26 UTC 2012



On 07/05/2012 10:51 AM, Kevin Wolf wrote:
> Am 05.07.2012 16:22, schrieb Corey Bryant:
>>>> For some examples:
>>>>
>>>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>>>> use by monitor, as member of fdset1
>>>> 2. client crashes, so all tracked fds are visited; fd=4 had not yet been
>>>> passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is
>>>> now 0 so qemu closes it
>>>
>>> Doesn't the refcount belong to an fdset rather than a single fd? As long
>>> as the fdset has still a reference, it's possible to reach the other fds
>>> in the set through a reopen. For example:
>>>
>>> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member
>>> of fdset1, in use by monitor, refcount 1
>>> 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a
>>> member of fdset, in use by monitor, refcount is still 1
>>> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
>>> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2
>>> 4. client crashes, so all tracked fdsets are visited; fdset1 gets its
>>> refcount decreased to 1, and both member fds 4 and 5 stay open.
>>>
>>> If you had refcounted a single fd, fd 5 would be closed now and qemu
>>> couldn't switch to O_RDONLY any more.
>>>
>>
>> If the O_RDONLY is for a future command, it would make more sense to add
>> that fd before that future command.  Or are you passing it at step 2
>> because it is needed for the probe re-open issue?
>
> Come on, requiring realistic examples isn't fair. ;-)

Heh :)

>
> Swap steps 2 and 3 in the example, then step 3 is just the preparation
> for a command that uses the new fd. The problem stays the same. Or a
> live commit operation could be in flight so that qemu must be able to
> switch on completion without libvirt interaction.

Good point.

I thought it would be useful to run through the examples again with each 
fdset having a refcount, and each fd having an in-use flag.  One 
difference though is that refcount is only incremented/decremented when 
qemu_open/qemu_close are called.  I think this works and covers Kevin's 
concerns.  Actually it might be a bit simpler too.

1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with 
refcount of 0; fd=4's in-use flag is turned on
2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
so qemu_open() increments the refcount of fdset1 to 1
3. client crashes, so all tracked fds are visited; fd=4 had not yet been
passed to 'remove-fd', so it's in-use flag is on;  in-use flag is turned 
off and fd=4 is left open because it is still in use by the block device 
(refcount is 1)
4. client re-establishes QMP connection, so all tracked fds are visited, 
and in-use flags are turned back on; 'query-fds' lets client learn about 
fd=4 still being open as part of fdset1

1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with 
refcount of 0; fd=4's in-use flag is turned on
2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
so qemu_open() increments the refcount of fdset1 to 1
3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
longer in-use by the monitor, and is left open because it is in use by 
the block device (refcount is 1)
4. client crashes, so all tracked fds are visited; fd=4 is not in-use 
but refcount is 1 so it is not closed

1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with 
refcount of 0; fd=4's in-use flag is turned on
2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
but the command fails for some other reason, so the refcount is still 0
at the end of the command (although it may have been temporarily
incremented then decremented during the command)
3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or
QMP connection is closed), so qemu marks fd=4 as no longer in-use by the 
monitor; refcount is 0 so all fds in fdset1 are closed

1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with 
refcount of 0; fd=4's in-use flag is turned on
2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with 
refcount still 0; fd=5's in-use flag is turned on
3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not 
yet been passed to 'remove-fd', so their in-use flags are on; in-use 
flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in 
fdset1

1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with 
refcount of 0; fd=4's in-use flag is turned on
2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with 
refcount still 0; fd=5's in-use flag is turned on
3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
longer in-use by the monitor, and fd=4 is closed since the refcount is 
0; fd=5 remains open

1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 
with refcount of 0; fd=4's in-use flag is turned on
2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 
with refcount still 0; fd=5's in-use flag is turned on
3. client calls 'device-add' with /dev/fdset/1 as the backing filename 
and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1
4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
longer in-use by the monitor, but fd=4 remains open because refcount of 
fdset1 is 1
5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no
longer in-use by the monitor, and fd=5 remains open because refcount of 
fdset1 is 1
6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both 
fds are closed since refcount is 0 and their in-use flags are off

1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 
with refcount of 0; fd=4's in-use flag is turned on
2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 
with refcount still 0; fd=5's in-use flag is turned on
3. client calls 'device-add' with /dev/fdset/1 as the backing filename 
and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1
4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have 
not yet been passed to 'remove-fd', so their in-use flags are on; 
in-use flags are turned off and both fds are left open because the set 
is still in use by the block device (refcount is 1)
5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both 
fds are closed since refcount is 0 and their in-use flags are off

>
> There are enough reasons that an fd in an fdset exists and is not
> opened, but I can't think of one why it should be dropped.
>
>>>>>> In use by whom? If it's still in use in qemu (as in "in-use flag would
>>>>>> be set") and we have a refcount of zero, then that's a bug.
>>>>>>
>>>>>
>>>>> In use by qemu.  I don't think it's a bug.  I think there are situations
>>>>> where refcount gets to zero but qemu is still using the fd.
>>>>
>>>> I think the refcount being non-zero _is_ what defines an fd as being in
>>>> use by qemu (including use by the monitor).  Any place you have to close
>>>> an fd before reopening it is dangerous; the safe way is always to open
>>>> with the new permissions before closing the old permissions.
>>>
>>> There is one case I'm aware of where we need to be careful: Before
>>> opening an image, qemu may probe the format. In this case, the image
>>> gets opened twice, and the first close comes before the second open. I'm
>>> not entirely sure how hard it would be to get rid of that behaviour.
>>>
>>> If we can't get rid of it, we have a small window that the refcount
>>> doesn't really cover, and if we weren't careful it would become racy.
>>> This is why I mentioned earlier that maybe we need to defer the refcount
>>> decrease a bit. However, I can't see how the in-use flag would make a
>>> difference there. If the refcount can't cover it, the in-use flag can't
>>> either.
>>
>> Yeah this is a problem.  Could we introduce another flag to cover this?
>
> Adding more refcounts or flags is not a problem, but it doesn't solve it
> either. The hard question is when to set that flag.
>
> I believe it may be easier to just change the block layer so that it
> opens files only once during bdrv_open().
>
> Kevin
>

-- 
Regards,
Corey





More information about the libvir-list mailing list