[dm-devel] multipathd(8) and multipath(8) memory problems

Mon Apr 4 19:30:51 UTC 2005

Several multipath-related issues with linux-2.6.11-rc3-udm2 and
multipath-tools-0.4.4-pre5.  Still pursuing the EMC CLARiion NDU
use case using 16 dd write processes, each one writing 1GB to
a single FC CLARiion LU with 4 paths.

Sometimes multipathd dies.  I've noticed this for about 1 week.

I've also noticed that sometimes my host is left in a state where
the dd write threads are stuck due to dirty page congestion
since several logical units are being left in an all-paths-down state
because there is no multipath running to restore the kernel state
for currently usable paths.

In this state, the machine also showed
the log_pthread, checkerloop, waiterloop, and 5 or 6 waitevent
multipathd pthreads blocked on the same kernel futex.  Difficult
to debug, but I think this must be the logq_lock (or the logev_lock)
since I don't see any other lock which is taken by each of the
threads mentioned above.  Not sure what's happening here --
I would have expected the entire multipathd process to have died
if one of the pthreads took a SIGSEGV.  Any ideas?
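
One way to test the stuck-lock theory would be to take the suspect
locks with a timeout and log when the wait expires.  The following is
only a sketch: lock_or_report() is a hypothetical helper, not anything
in multipath-tools, and the lock name and timeout passed to it are
placeholders.  It relies only on the POSIX pthread_mutex_timedlock()
call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical debugging helper (not part of multipath-tools).
 * Tries to take mutex m, giving up after timeout_sec seconds.
 * Returns 0 with the lock held, ETIMEDOUT if the lock was still
 * held by someone else at the deadline, or another errno value.
 */
static int lock_or_report(pthread_mutex_t *m, const char *name,
                          int timeout_sec)
{
    struct timespec deadline;
    int rc;

    assert(m != NULL);

    /* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME time. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec += timeout_sec;

    rc = pthread_mutex_timedlock(m, &deadline);
    if (rc == ETIMEDOUT)
        fprintf(stderr, "%s still held after %d s\n", name, timeout_sec);
    return rc;
}
```

If every blocked thread reported a timeout on the same lock name, that
would at least identify which lock is involved, though not which
thread failed to release it.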

With full debugging enabled on a multipath executable on a
machine in the state described above, a manually invoked
multipath was dying with a SIGSEGV.  The last debug message
indicated a failure to allocate the multipath priority group vector
in group_by_prio.  A subsequent dereference of the mpp->pg field,
which happens in numerous other places in the code, could then cause
the SIGSEGV.
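
The pattern I suspect can be reduced to the sketch below.  This is not
the multipath-tools code: struct mp_sketch, struct pg_vector_sketch,
group_paths_sketch(), pg_count_sketch() and failing_calloc() are
made-up names, and the allocator is passed in as a parameter only so
that an allocation failure can be simulated.  The point is that the
grouping step reports the failure and every later user of the pg
field checks it before dereferencing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real multipath-tools structures. */
struct pg_vector_sketch {
    int allocated;   /* number of priority groups stored */
    void **slot;     /* group pointers (unused in this sketch) */
};

struct mp_sketch {
    struct pg_vector_sketch *pg;   /* NULL until grouping succeeds */
};

typedef void *(*alloc_fn)(size_t nmemb, size_t size);

/* Simulates the out-of-memory condition: always fails. */
static void *failing_calloc(size_t nmemb, size_t size)
{
    (void)nmemb;
    (void)size;
    return NULL;
}

/* Returns 0 on success, 1 if the group vector could not be allocated. */
static int group_paths_sketch(struct mp_sketch *mpp, alloc_fn alloc)
{
    assert(mpp != NULL && alloc != NULL);

    mpp->pg = alloc(1, sizeof(*mpp->pg));
    if (mpp->pg == NULL)
        return 1;    /* caller must not touch mpp->pg after this */
    return 0;
}

/* Later users guard the dereference instead of assuming pg is valid. */
static int pg_count_sketch(const struct mp_sketch *mpp)
{
    if (mpp == NULL || mpp->pg == NULL)
        return -1;
    return mpp->pg->allocated;
}
```

Calling group_paths_sketch(&mpp, failing_calloc) returns 1 and leaves
mpp.pg NULL, after which pg_count_sketch(&mpp) returns -1 rather than
faulting.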

I surmise that both multipathd and the multipath invoked
from multipathd are occasionally dying in this test scenario due to
similar memory allocation failures, later taking a SIGSEGV
when dereferencing pointers to memory that was never allocated.