[Cluster-devel] [PATCH] gfs2: Prevent writeback in gfs2_file_write_iter

Fri Mar 15 20:58:12 UTC 2019

Hi Ross,

On Thu, 14 Mar 2019 at 12:18, Ross Lagerwall <ross.lagerwall at citrix.com> wrote:
> On 3/13/19 5:13 PM, Andreas Gruenbacher wrote:
> > Hi Edwin,
> >
> > On Wed, 6 Mar 2019 at 12:08, Edwin Török <edvin.torok at citrix.com>
> > wrote:
> >> Hello,
> >>
> >> I've been trying to debug a GFS2 deadlock that we see in our lab
> >> quite frequently with a 4.19 kernel. With 4.4 and older kernels we
> >> were not able to reproduce this.
> >> See below for lockdep dumps and stacktraces.
> >
> > thanks for the thorough bug report.  Does the below fix work for
> > you?
> >
> Hi Andreas,
>
> I've tested the patch and it doesn't fix the issue. As far as I can see,
> current->backing_dev_info is not used by any of the code called from
> balance_dirty_pages_ratelimited() so I don't see how it could work.

Yes, I see now.

> I found a way of consistently reproducing the issue almost immediately
> (tested with the latest master commit):
>
> # cat a.py
> import os
>
> fd = os.open("f", os.O_CREAT|os.O_TRUNC|os.O_WRONLY)
>
> for i in range(1000):
>      os.mkdir("xxx" + str(i), 0777)
>
> buf = 'x' * 4096
>
> while True:
>      count = os.write(fd, buf)
>      if count <= 0:
>          break
>
> # cat b.py
> import os
> while True:
>    os.mkdir("x", 0777)
>    os.rmdir("x")
>
> # echo 8192 > /proc/sys/vm/dirty_bytes
> # cd /gfs2mnt
> # (mkdir tmp1; cd tmp1; python2 ~/a.py) &
> # (mkdir tmp2; cd tmp2; python2 ~/a.py) &
> # (mkdir tmp3; cd tmp3; python2 ~/b.py) &
>
> This should deadlock almost immediately. One of the processes will be
> waiting in balance_dirty_pages() and holding sd_log_flush_lock and
> several others will be waiting for sd_log_flush_lock.

This doesn't work for me: the python processes don't even start properly
when dirty_bytes is set so low.

> I came up with the following patch which seems to resolve the issue by
> failing to write the inode if it can't take the lock, but it seems
> like a dirty workaround rather than a proper fix:
>
> [...]

Looking at ext4_dirty_inode, it seems that we should just be able to
bail out of gfs2_write_inode and return 0 when PF_MEMALLOC is set in
current->flags.

Also, we should probably add the current->flags checks from
xfs_do_writepage to gfs2_writepage_common.
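
To spell out what those flag tests are meant to catch, here is a minimal
userspace sketch, not kernel code. The bit values are stand-ins and the
helper names writepage_should_redirty and write_inode_should_skip are
made up for illustration; the real flags live in include/linux/sched.h.
As I read it, the first writepage test matches a task that is in memory
reclaim but is not kswapd (i.e. direct reclaim), and the second matches
a task that has marked itself as not allowed to recurse into the
filesystem.

```c
#include <stdbool.h>

/*
 * Stand-in bit values for illustration only. The kernel defines the
 * real PF_* flags in include/linux/sched.h.
 */
#define PF_MEMALLOC      0x1u	/* task is allocating for reclaim */
#define PF_KSWAPD        0x2u	/* task is kswapd */
#define PF_MEMALLOC_NOFS 0x4u	/* task must not recurse into the fs */

/*
 * Hypothetical helper mirroring the two checks proposed for
 * gfs2_writepage_common: redirty the page instead of writing it.
 */
bool writepage_should_redirty(unsigned int flags)
{
	/* PF_MEMALLOC set and PF_KSWAPD clear: reclaim, but not kswapd. */
	if ((flags & (PF_MEMALLOC | PF_KSWAPD)) == PF_MEMALLOC)
		return true;
	/* Caller asked for no filesystem recursion. */
	if (flags & PF_MEMALLOC_NOFS)
		return true;
	return false;
}

/*
 * Hypothetical helper mirroring the check proposed for
 * gfs2_write_inode: skip the write whenever PF_MEMALLOC is set,
 * regardless of PF_KSWAPD.
 */
bool write_inode_should_skip(unsigned int flags)
{
	return (flags & PF_MEMALLOC) != 0;
}
```

Note the asymmetry: with PF_MEMALLOC|PF_KSWAPD set, the writepage
helper lets the write proceed while the write_inode helper still skips.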

So what do you get with the below patch?

Thanks,
Andreas

---
 fs/gfs2/aops.c  | 7 +++++++
 fs/gfs2/super.c | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f..694ff91 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -102,6 +102,13 @@ static int gfs2_writepage_common(struct page *page,
 	pgoff_t end_index = i_size >> PAGE_SHIFT;
 	unsigned offset;
 
+	/* (see xfs_do_writepage) */
+	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
+			 PF_MEMALLOC))
+		goto redirty;
+	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
+		goto redirty;
+
 	if (gfs2_assert_withdraw(sdp, gfs2_glock_is_held_excl(ip->i_gl)))
 		goto out;
 	if (current->journal_info)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ca71163..540535c 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -756,6 +756,10 @@ static int gfs2_write_inode(struct inode *inode, struct writeback_control *wbc)
 	int ret = 0;
 	bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));
 
+	/* (see ext4_dirty_inode) */
+	if (current->flags & PF_MEMALLOC)
+		return 0;
+
 	if (flush_all)
 		gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
 			       GFS2_LOG_HEAD_FLUSH_NORMAL |
-- 
1.8.3.1