[dm-devel] [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in

Martin Wilck <mwilck@suse.com>
Mon Jul 13 09:21:19 UTC 2020


Hi Lixiaokeng,

On Mon, 2020-07-13 at 10:15 +0800, lixiaokeng wrote:
> Hi
> 
> Now the number of fds pointing to /dev/mapper/control in the
> multipathd process increases when an iSCSI device logs in. The
> reason is that the wait_dmevents thread and the uevqloop thread call
> _open_and_assign_control_fd() concurrently.
> If a lock is added to _open_and_assign_control_fd() in
> lvm2/libdm/libdm-iface.c, the problem is solved easily, but Zdenek
> Kabelac said that libdm is not a pthread-aware/safe library, so the
> lock cannot be added to libdm. If the lock is added on the
> multipath-tools side instead, there are a lot of places where
> dm_task_run() is called and where the lock would have to be added,
> which may degrade multipath-tools' performance. I don't have any
> other good idea about this problem; do you have one? There is also
> another question: multipathd is a multi-threaded process and libdm
> is not a pthread-aware/safe library, so why does multipathd use
> libdm without any protection? Thanks.
> 
> The detailed call chains when the fd leak occurs are as follows:
> 1. wait_dmevents thread
> wait_dmevents
> ->dmevent_loop
>     ->dm_get_events (->dm_is_mpath)
>         ->dm_task_run
>             ->_open_control
>                 ->_open_and_assign_control_fd
> 
> 2. uevqloop thread
> uevqloop
> ->uevent_dispatch
>     ->service_uevq
>         ->ev_add_path
>             ->__setup_multipath
>                 ->dm_get_info
>                     ->dm_task_run
>                         ->_open_control
>                             ->_open_and_assign_control_fd
> 
> Lixiaokeng

Thanks for the report and the excellent analysis. The general problem
may perhaps not be so bad currently, as multipathd uses the "vecs
lock" to protect its own data structures, and most libdm calls are
made under this lock. dmevent_loop() is one example where the vecs
lock is not taken.
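
To make the race concrete, here is a minimal, self-contained sketch of
the problem. This is *not* the real libdm code: the names merely mimic
_control_fd and _open_and_assign_control_fd(), the body is reduced to
the part relevant to the race, and /dev/null stands in for
/dev/mapper/control. Two threads can both see the fd as unset, both
open the control node, and the second assignment silently leaks the
first fd.

/* fd-leak-race.c -- hypothetical, simplified illustration; not libdm code.
 * Build with: cc -pthread fd-leak-race.c -o fd-leak-race
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int _control_fd = -1;

/* Reduced stand-in for libdm's _open_and_assign_control_fd(). */
static int _open_and_assign_control_fd(const char *control)
{
	/*
	 * Two threads can both execute the open() before either result is
	 * visible to the other; the second assignment then overwrites the
	 * first fd, which is never closed -> one leaked fd per race.
	 */
	if ((_control_fd = open(control, O_RDWR)) < 0) {
		perror("open");
		return 0;
	}
	return 1;
}

static void *worker(void *arg)
{
	_open_and_assign_control_fd(arg);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	/* /dev/null stands in for /dev/mapper/control in this sketch. */
	char control[] = "/dev/null";

	pthread_create(&t1, NULL, worker, control);
	pthread_create(&t2, NULL, worker, control);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* Only the fd stored last can ever be closed; the other is leaked. */
	printf("_control_fd = %d\n", _control_fd);
	return 0;
}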

I'm attaching a tentative patch for the particular race you reported,
which resorts to simply taking the vecs lock in dmevent_loop(). Please
review/test. This is tentative; I haven't tested it myself beyond
making sure that it passes the unit test.
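
For the archives, the gist of the patch is roughly the sketch below.
This is illustrative only; the real hunks are in the attachment.
waiter, lock(), cleanup_lock(), lock_cleanup_pop() and dm_get_events()
are the names already used by the dmevents and locking code in
multipath-tools, but the exact placement in dmevent_loop() differs.

static int dmevent_loop(void)
{
	int r;

	/*
	 * Serialize the libdm calls made from the dmevents waiter thread
	 * against the rest of multipathd by taking the vecs lock, just as
	 * the uevent/checker code paths already do.
	 */
	pthread_cleanup_push(cleanup_lock, &waiter->vecs->lock);
	lock(&waiter->vecs->lock);
	pthread_testcancel();

	/*
	 * dm_get_events() -> dm_task_run() -> _open_control() now runs
	 * under the same lock as ev_add_path() and friends.
	 */
	r = dm_get_events();

	lock_cleanup_pop(waiter->vecs->lock);
	return r;
}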

To be sure we aren't missing anything, we'd need to assess whether
there are other libdm calls which are not made under the vecs lock.
Short-term, I have no time for a complete assessment. My guess would
be that there are a few, but not many, and that most of them are not
prone to races.

In the long run, we need to think about the general problem of libdm
not being thread-safe. So far we've had very few reports like this
(actually, I'm aware of none), so perhaps the vecs lock protects us
quite well already. OTOH, we've discussed repeatedly that the locking
in multipathd is too monolithic and should be more fine-grained; as
soon as we drop the monolithic lock, this might hit us hard. If we
introduce an additional lock for libdm, we'll have to think about
potential deadlock situations (probably easy, as the new lock would
just wrap libdm calls and would thus, by definition, be at the bottom
of any lock hierarchy in libmultipath).
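
If we go that route, the wrapper itself would be trivial, along the
lines of the sketch below. libmp_dm_task_run() is a made-up name here;
dm_task_run() is the real libdm entry point, and the mutex is a pure
leaf lock.

#include <pthread.h>
#include <libdevmapper.h>

static pthread_mutex_t libdm_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: every dm_task_run() call in multipath-tools
 * would be replaced by a call to this function. */
int libmp_dm_task_run(struct dm_task *dmt)
{
	int r;

	/*
	 * Leaf lock: nothing else is acquired while it is held, so it sits
	 * at the bottom of any lock hierarchy and cannot deadlock against
	 * the vecs lock.
	 */
	pthread_mutex_lock(&libdm_mutex);
	r = dm_task_run(dmt);
	pthread_mutex_unlock(&libdm_mutex);
	return r;
}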

@Zdenek, do we have to protect every libdm call, or is it sufficient
to protect only dm_task_run(), as lixiaokeng suggested?

@Ben, please take a look as well, as you're the main author of the
dmevents code and know libdm better than me.

Regards,
Martin


> 
> On 2020/7/9 16:54, Zdenek Kabelac wrote:
> > On 09. 07. 20 at 9:02, lixiaokeng wrote:
> > > When one iSCSI device logs in and logs out several times, the
> > > number of fds pointing to '/dev/mapper/control' increases in
> > > /proc/<multipathd-pid>/fd, as follows:
> > > [root@localhost fd]# ll | grep control
> > > diff --git a/libdm/ioctl/libdm-iface.c b/libdm/ioctl/libdm-iface.c
> > > index 7ad549c..7168369 100644
> > > --- a/libdm/ioctl/libdm-iface.c
> > > +++ b/libdm/ioctl/libdm-iface.c
> > > @@ -23,6 +23,7 @@
> > >   #include <sys/ioctl.h>
> > >   #include <sys/utsname.h>
> > >   #include <limits.h>
> > > +#include <pthread.h>
> > > 
> > >   #ifdef __linux__
> > >   #  include "libdm/misc/kdev_t.h"
> > > @@ -81,6 +82,7 @@ static dm_bitset_t _dm_bitset = NULL;
> > >   static uint32_t _dm_device_major = 0;
> > > 
> > >   static int _control_fd = -1;
> > > +static pthread_mutex_t _control_fd_mutex = PTHREAD_MUTEX_INITIALIZER;
> > >   static int _hold_control_fd_open = 0;
> > >   static int _version_checked = 0;
> > >   static int _version_ok = 1;
> > > @@ -404,10 +406,19 @@ static void _close_control_fd(void)
> > >   #ifdef DM_IOCTLS
> > >   static int _open_and_assign_control_fd(const char *control)
> > >   {
> > > +    pthread_mutex_lock(&_control_fd_mutex);
> > > +
> > > +    if (_control_fd != -1) {
> > 
> > 
> > Hi
> > 
> > libdm is not a pthread-aware/safe library.
> > 
> > So the fix needs to happen on the libdm user side, to prevent racing
> > calls of this function.
> > 
> > 
> > Zdenek
> > 
> > 
> > .

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-libmultipath-dmevent_loop-hold-vecs-lock.patch
Type: text/x-patch
Size: 7160 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20200713/a74feec6/attachment.bin>

