[Cluster-devel] [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks

Andreas Gruenbacher agruenba at redhat.com
Thu Aug 19 21:39:56 UTC 2021

On Thu, Aug 19, 2021 at 10:14 PM Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> On Thu, Aug 19, 2021 at 12:41 PM Andreas Gruenbacher
> <agruenba at redhat.com> wrote:
> >
> > Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
> > the pages array? That would match fault_in_iov_iter_writeable, which
> > is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
> > vmas.
> I don't understand what you mean.. GUP already skips VM_IO (and
> VM_PFNMAP) pages. It just returns EFAULT.
> We could make it return another error. We already have DAX and
> Of course, I think some code ends up always just returning "number of
> pages looked up" and might return 0 for "no pages" rather than the
> error for the first page.
> So we may end up having interfaces that then lose that explanation
> error code, but I didn't check.
> But we couldn't make it just say "skip them and try later addresses",
> if that is what you meant. THAT makes no sense - that would just make
> GUP look up some other address than what was asked for.

get_user_pages has a start and a nr_pages argument, which specifies an
address range from start to start + nr_pages * PAGE_SIZE. If pages !=
NULL, it adds a pointer to that array for each PAGE_SIZE subpage. I
was thinking of skipping over VM_IO vmas in that process, so when the
range starts in a mappable vma, runs into a VM_IO vma, and ends in a
mappable vma, the pages in the pages array would be discontiguous;
they would only cover the mappable vmas. But that would make it
difficult to make sense of what's in the pages array. So scratch that

> > > I also do still think that even regardless of that, we want to just
> > > add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> > > and then you can use the regular get_user_pages().
> > >
> > > That at least gives us the full _normal_ page handling stuff.
> >
> > And it does fix the generic/208 failure.
> Good. So I think the approach is usable, even if we might have corner
> cases left.
> So I think the remaining issue is exactly things like VM_IO and
> VM_PFNMAP. Do the fstests have test-cases for things like this? It
> _is_ quite specialized, it might be a good idea to have that.
> Of course, doing direct-IO from special memory regions with zerocopy
> might be something special people actually want to do. But I think
> we've had that VM_IO flag testing there basically forever, so I don't
> think it has ever worked (for some definition of "ever").

The v6 patch queue should handle those cases acceptably well for now,
but I don't think we have tests covering that at all.


More information about the Cluster-devel mailing list