[Virtio-fs] xfstest generic/503 hangs

Max Reitz mreitz at redhat.com
Mon Mar 23 18:18:57 UTC 2020


Hi,

I have this bug report here:
https://bugzilla.redhat.com/show_bug.cgi?id=1813885

And I’m afraid I’m not really making progress on debugging it, so I was
wondering whether any of you might have some insights.

The problem is that the generic/503 xfstest hangs on virtio-fs.  Now, I
don’t know how the reporter got that test to run in the first place,
because for me, it requires fcollapse and fzero, which as far as I can
tell are currently not supported for virtio-fs.

So I first had to disable those requirements, and then let the helper
program (src/t_mmap_collision.c) not test those operations.

Then, the test hangs.  What I could find out so far is that the hang
occurs in src/t_mmap_collision.c’s truncate_down_fn() (run through
run_test(&truncate_down_fn)), namely in one of the pread()s.  I can also
see that some of the earlier pread()s fail with EFAULT.

A bit more context: t_mmap_collision.c opens a test file twice (I think
the idea is that you open it once on an FS with DAX, and once without,
but AFAIU it should work either way).  For the relevant test, it mmap()s
the DAX FD, truncates it, then fallocates it to increase the size again.
Then it reads from the non-DAX FD.

It does all of that in two threads simultaneously for a second.

The EFAULT seems to come from the guest kernel.  I don’t see virtiofsd
returning an error anywhere.  I don’t know where it comes from exactly,
only that when I replace all occurrences of “EFAULT” by e.g. “EBADSLT”
in mm/, the test crashes instead of hanging, so I take that to mean that
the error comes from something in mm/ (which I suppose isn’t too
unexpected).

The test passes if the test function is run in a single thread instead
of two, or if you use a separate TEST_DEV and SCRATCH_DEV – but in the
latter case, you really have two separate files, so the test becomes
rather moot (AFAIU).

The fact that truncate_down_fn() uses fallocate() seems irrelevant.
When you replace it by ftruncate() (i.e. the dax_fd is just first
truncated to 0, and then truncated back to @file_size), the test fails
in the same way.  So maybe there is some interaction between the
ftruncate() and a concurrent pread()?  But where does the EFAULT come from?

Does anyone have any spontaneous ideas? :/


In any case, thanks already for reading this,

Max


(I suppose my plan now is that instead of debugging the kernel further,
I should come up with a simpler reproducer, to see whether the problem
is really just a concurrent ftruncate() + pread() on two FDs that point
to the same file.)
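
For illustration, such a reduced reproducer could look roughly like the
sketch below.  It is not taken from xfstests and I have not confirmed
that it triggers the hang; it only mirrors the pattern described above
(two threads, each truncating one FD to 0 and back while reading through
a second FD on the same file).  The names repro_run(), repro_worker()
and REPRO_FILE_SIZE, as well as the file size itself, are made up for
this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Assumption: arbitrary size for illustration, not taken from the xfstest. */
#define REPRO_FILE_SIZE (64 * 1024)

struct repro_ctx {
    int trunc_fd;          /* FD that gets truncated and re-extended */
    int read_fd;           /* second FD on the same file, used for pread() */
    int seconds;           /* how long each thread keeps looping */
    long efaults;          /* pread() calls that failed with EFAULT */
    long other_errors;     /* any other failed call */
    pthread_mutex_t lock;  /* protects the two counters */
};

static double now_sec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void *repro_worker(void *arg)
{
    struct repro_ctx *ctx = arg;
    long efaults = 0, other = 0;
    double end = now_sec() + ctx->seconds;
    char *buf = malloc(REPRO_FILE_SIZE);

    if (!buf)
        other++;

    while (buf && now_sec() < end) {
        /* Shrink to 0, then grow back to the full size. */
        if (ftruncate(ctx->trunc_fd, 0) < 0)
            other++;
        if (ftruncate(ctx->trunc_fd, REPRO_FILE_SIZE) < 0)
            other++;
        /* Short reads are fine here; only real errors are counted. */
        if (pread(ctx->read_fd, buf, REPRO_FILE_SIZE, 0) < 0) {
            if (errno == EFAULT)
                efaults++;
            else
                other++;
        }
    }
    free(buf);

    pthread_mutex_lock(&ctx->lock);
    ctx->efaults += efaults;
    ctx->other_errors += other;
    pthread_mutex_unlock(&ctx->lock);
    return NULL;
}

/* Returns 0 if both threads ran to completion, -1 on setup failure. */
int repro_run(const char *path, int seconds, long *efaults, long *other_errors)
{
    struct repro_ctx ctx = { .seconds = seconds };
    pthread_t threads[2];
    int started = 0, i, rc = -1;

    pthread_mutex_init(&ctx.lock, NULL);
    /* Two independent FDs that point to the same file. */
    ctx.trunc_fd = open(path, O_RDWR | O_CREAT, 0644);
    ctx.read_fd = open(path, O_RDONLY);
    if (ctx.trunc_fd < 0 || ctx.read_fd < 0)
        goto out;
    if (ftruncate(ctx.trunc_fd, REPRO_FILE_SIZE) < 0)
        goto out;

    for (; started < 2; started++) {
        if (pthread_create(&threads[started], NULL, repro_worker, &ctx) != 0)
            break;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    if (started == 2)
        rc = 0;
out:
    *efaults = ctx.efaults;
    *other_errors = ctx.other_errors;
    if (ctx.trunc_fd >= 0)
        close(ctx.trunc_fd);
    if (ctx.read_fd >= 0)
        close(ctx.read_fd);
    pthread_mutex_destroy(&ctx.lock);
    return rc;
}
```

Calling repro_run() with a path on the virtio-fs mount and a duration of
one second would correspond to what the test does.  If the problem really
is just ftruncate() racing with pread(), I would expect either a non-zero
EFAULT count or a call that never returns; on a local file system both
counters should stay at 0.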
