[Cluster-devel] [PATCH] gfs2: Fix mmap + page fault deadlocks

Andreas Grünbacher andreas.gruenbacher at gmail.com
Fri Jul 2 01:14:30 UTC 2021


On Fri, Jul 2, 2021 at 2:48 AM Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> On Thu, Jul 1, 2021 at 5:30 PM Linus Torvalds <torvalds at linux-foundation.org> wrote:
> > Of course, if you ask for more data than the file has, that's another
> > thing, but who really does that with direct-IO? And if they do, why
> > should we care about their silly behavior?
>
> Now, if the issue is that people do IO for bigger areas than you have
> memory for, then I think that's a chunking issue. I don't think the
> ITER_IOVEC necessarily needs to be converted to an ITER_BVEC in one
> single go. That would indeed be painful if somebody tries to do some
> huge direct-IO when they just don't have the memory for it.
>
> But the fact is, direct-IO has been an incredible pain-point for
> decades, because it's (a) unusual, (b) buggy and (c) has very little
> overall design and common code.
>
> The three issues are likely intimately tied together.
>
> The iomap code at least has tried to make for much more common code,
> but I really think some direct-IO people should seriously reconsider
> how they are doing things when there are fundamental deadlocks in the
> design.
>
> And I do think that a ITER_IOVEC -> ITER_BVEC conversion function
> might be a really really good idea to solve this problem.

I've tried to explain above how keeping the user-space pages
referenced will just lead to different kinds of deadlocks. That is the
problem with this approach.

> There's even
> a very natural chunking algorithm: try to do as much as possible with
> get_user_pages_fast() - so that if you already *have* the memory, you
> can do the full IO (or at least a big part of it).
>
> And if get_user_pages_fast() only gives you a small area - or nothing
> at all - you chunk it down aggressively, and realize that "oh, doing
> direct-IO when user space is paged out might not be the most optimal
> case".
>
>               Linus
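
For concreteness, the kind of pin-and-convert chunking loop being
proposed might look roughly like the sketch below. This is purely
illustrative, not a patch: CHUNK_PAGES, the issue() callback and the
function name are invented for the example; only
get_user_pages_fast(), iov_iter_bvec() and put_page() are real kernel
interfaces.

/*
 * Purely illustrative sketch: pin one chunk of the user buffer with
 * get_user_pages_fast() and do the I/O through an ITER_BVEC instead
 * of the user mapping.
 */
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/bvec.h>
#include <linux/uio.h>

#define CHUNK_PAGES 32			/* arbitrary per-round limit */

static ssize_t do_one_bvec_chunk(unsigned long uaddr, size_t len,
				 bool user_is_dest,
				 ssize_t (*issue)(struct iov_iter *iter))
{
	struct page *pages[CHUNK_PAGES];
	struct bio_vec bvecs[CHUNK_PAGES];
	unsigned long offset = uaddr & ~PAGE_MASK;
	int want = min_t(int, CHUNK_PAGES,
			 DIV_ROUND_UP(offset + len, PAGE_SIZE));
	struct iov_iter bvec_iter;
	size_t covered = 0;
	ssize_t ret;
	int got, i;

	/*
	 * Pin whatever is already resident; this may return fewer pages
	 * than asked for, or none at all if everything is paged out.
	 */
	got = get_user_pages_fast(uaddr, want,
				  user_is_dest ? FOLL_WRITE : 0, pages);
	if (got <= 0)
		return got ? got : -EFAULT;	/* caller faults in and retries */

	for (i = 0; i < got; i++) {
		size_t this = min_t(size_t, PAGE_SIZE - offset, len - covered);

		bvecs[i].bv_page = pages[i];
		bvecs[i].bv_offset = offset;
		bvecs[i].bv_len = this;
		covered += this;
		offset = 0;
	}

	/*
	 * The I/O now runs against the pinned pages rather than the user
	 * mapping, so no page faults can occur while filesystem locks are
	 * held.  Direction READ means the bvecs are the I/O destination.
	 */
	iov_iter_bvec(&bvec_iter, user_is_dest ? READ : WRITE,
		      bvecs, got, covered);
	ret = issue(&bvec_iter);

	for (i = 0; i < got; i++)
		put_page(pages[i]);

	return ret;	/* caller advances by ret and loops for the rest */
}

However the loop is written, the user pages stay referenced across the
actual I/O, and that is exactly where the other deadlocks I mentioned
above come from.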

Thanks,
Andreas



