[libvirt] [PATCH 3/3] virCommand: use procfs to learn opened FDs

Eric Blake eblake at redhat.com
Sat Jul 13 15:33:24 UTC 2019


[adding libc-help in cc]

On 7/13/19 2:41 AM, Michal Prívozník wrote:
> On 7/12/19 6:55 PM, Eric Blake wrote:
>> On 7/3/19 2:19 AM, Michal Privoznik wrote:
>>> When spawning a child process, between fork() and exec() we close
>>> all file descriptors and keep only those the caller wants us to
>>> pass onto the child. The problem is how we do that. Currently, we
>>> get the limit of opened files and then iterate through each one
>>> of them and either close() it or make it survive exec(). This
>>> approach is suboptimal (although, not that much in default
>>> configurations where the limit is pretty low - 1024). We have
>>> /proc where we can learn what FDs we hold open and thus we can
>>> selectively close only those.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>> ---
>>>  src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 70 insertions(+), 8 deletions(-)
>>
>>> +# ifdef __linux__
>>> +/* On Linux, we can utilize procfs and read the table of opened
>>> + * FDs and selectively close only those FDs we don't want to pass
>>> + * onto child process (well, the one we will exec soon since this
>>> + * is called from the child). */
>>> +static int
>>> +virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED,
>>> +                               virBitmapPtr fds)
>>> +{
>>> +    DIR *dp = NULL;
>>> +    struct dirent *entry;
>>> +    const char *dirName = "/proc/self/fd";
>>> +    int rc;
>>> +    int ret = -1;
>>> +
>>> +    if (virDirOpen(&dp, dirName) < 0)
>>> +        return -1;
>>
>> Unfortunately, POSIX says that opendir()/readdir() are not async-safe
>> (the list of async-safe functions is
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04;
>> some implementations of opendir() call malloc(), and malloc() is
>> definitely not async-safe - if you ever fork() in one thread while
>> another thread is locked inside a malloc, your child process is
>> deadlocked if it has to malloc because the forked process no longer has
>> the thread to release the malloc lock).  It also says readdir() is not
>> threadafe
>> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09)
>>
>> Therefore, opendir() (hidden in virDirOpen) and readdir() (hidden in
>> virDirRead) are generally unsafe to be used between fork() of a
>> multi-threaded app and the subsequent exec(). But maybe you are safe
>> because this code is only compiled on Linux? Are we absolutely sure this
>> can't deadlock, in spite of violating POSIX constraints on async-safety?
>>
>> http://austingroupbugs.net/view.php?id=696 points out some interesting
>> problems from the POSIX side - readdir_r() is absolutely broken, and
>> readdir() being marked as thread-unsafe may be too strict.  But then
>> again, http://austingroupbugs.net/view.php?id=75 states
>>    "The application shall not modify the structure to which the
>>      return value of readdir() points, nor any storage areas pointed to
>>      by pointers within the structure. The returned pointer, and
>>      pointers within the structure, might be invalidated or the
>>      structure or the storage areas might be overwritten by a
>>      subsequent call to readdir() on the same directory stream.
>>      They shall not be affected by a call to readdir() on a different
>>      directory stream."
>> where it may be the intent that if opendir() doesn't take any locks or
>> other async-unsafe actions, using opendir() after fork() may be safe
>> after all (readdir on a DIR* that was opened in the parent is not, but
>> readdir() on a fresh DIR* opened in the child might be safe, even if not
>> currently strictly portable).
> 
> D'oh. POSIX and its limitations. I could allocate the bitmap in the
> parent, sure. But avoiding opendir() is not possible (that's the whole
> point of this patch). While testing this - on Linux - I did not run into
> any deadlock, but maybe I was just lucky. On the other hand, this is
> going to run only on Linux, on anything else we'll still iterate over
> all FDs.
> 
> I find it rather surprising (in a disturbing way) that there is no
> better solution to this problem than iterating over all FDs and closing
> them. One by one.

Does anyone know if glibc guarantees that opendir/readdir in between
multi-threaded fork() and exec() is safe, even though POSIX does not
guarantee that safety in general?  I know that one approach to avoid
having to close all fds is religiously using O_CLOEXEC everywhere (so
that the race window of having an fd that would leak is not possible),
but that's also an expensive audit, compared to just ensuring that
things are closed after fork().  Are there any other ideas out there
that we should be aware of (and no, pthread_atfork is not a sane idea)?
(various BSD systems have the closefrom() syscall which is more
efficient than lots of close() calls; and Linux may be adding something
similar https://lwn.net/Articles/789023/), Is there any saner way to
close all unneeded fds that were not properly marked O_CLOEXEC by an
application linking against a multithreaded lib that must perform fork/exec?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20190713/7252914b/attachment-0001.sig>


More information about the libvir-list mailing list