[dm-devel] dm: use noio when sending kobject event

Fri Jul 10 18:22:12 UTC 2020

On Wed, Jul 08 2020 at  3:37pm -0400,
Gabriel Krisman Bertazi <krisman at collabora.com> wrote:

> Mike Snitzer <snitzer at redhat.com> writes:
> 
> > On Wed, Jul 08 2020 at  2:26pm -0400,
> > Gabriel Krisman Bertazi <krisman at collabora.com> wrote:
> >
> >> If I understand it correctly, considering the deadlock you shared, this
> >> doesn't solve the entire issue. For instance, kobject_uevent_env on the
> >> GFP_NOIO thread waits on uevent_sock_mutex, and another thread with
> >> GFP_IO holding the mutex might have triggered the shrinker from inside
> >> kobject_uevent_net_broadcast.  I believe 7e7cd796f277 ("scsi: iscsi: Fix
> >> deadlock on recovery path during GFP_IO reclaim") solved the one you
> >> shared and other similar cases for iSCSI in a different way.
> >
> > I staged a different fix, from Mikulas, for 5.9 that is meant to address
> > the original report, please see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.9&id=e5bfe9baf23dca211f4b794b651e871032c427ec
> >
> > I'd appreciate it if you could try this commit to see if it fixes the
> > original issue you reported.
> 
> I reverted 7e7cd796f277 and cherry-picked e5bfe9baf23dc on my tree.
> After a few iterations, I could see the conditions that formerly
> triggered the deadlock happening, but this patch successfully allowed
> the reclaim to succeed and the iscsi recovery thread to run.
> 
> My reproducer is a bit artificial, as I wrote it based only on the
> problem description provided by Google.  They were hitting this in
> production and might have a better final word on the fix, though I know
> they don't have a simple way to reproduce it.

Nice job putting together a reproducer that even begins to model
the issue Google's production setup teased out.

Thanks for testing, I've added your Tested-by to the commit.

Mike