fsck failing to notice that the block device was pulled out from under it?

Theodore Ts'o tytso at mit.edu
Tue May 26 11:43:44 UTC 2015

On Tue, May 26, 2015 at 08:39:24AM +0200, Tomas Pospisek wrote:
> One more question if I may. You have in principle already answered that
> question, however I want to be sure about it. Who is it that is writing
> this to the kernel log:
>     May 25 12:39:51 hier kernel: [79872.773327] Buffer I/O error on
> device dm-0, logical block 68681774
>     May 25 12:39:51 hier kernel: [79872.773328] lost page write due to
> I/O error on dm-0
> is it the layers *below* the ext4 module that are reporting this?

These messages are coming from fs/buffer.c.  What component was
calling the buffer cache is not evident from your log excerpt.  Note
that "the ext4 module" is different from fsck, which was what you were
asking about earlier.  If the file system was *mounted*, then it was
probably the file system layer (whether it was mounted using ext3,
ext4, vfat, etc.)

Assuming modern kernels, reads and writes to the block device (for
example from user space programs such as e2fsck) don't end up going
through the buffer cache.

The bottom line is that if your USB interface is flaky (whether it is
caused by a problem in your connector, the USB cable, the USB
controller in the hard drive, the host USB etc.) there's not a whole
lot that upper layers can do.  What should happen though is that when
a USB device disconnects and reconnects, it shows up as a new block
device.  So it should not be automatically reconnected to the LUKS
device unless something like gnome-disk is "helpfully" doing this.
And even if it is doing that, if the dm-crypt device can't have its
key established it *should* have simply refused reads and writes, and
not done something silly like passing the reads and writes through
even though it couldn't do the encryption/decryption.  Finally, if
e2fsck gets an I/O error reading or writing from the block device, it
will report it to the user and ask whether or not it should continue.

My suggestion is to debug this by breaking it down.  Try using an
unencrypted file system on a USB stick, and see what happens when you
yank it out while e2fsck is running.  The USB stick should start
reporting errors, and then e2fsck will report them and ask whether you
want to continue or not.  First try it without any GNOME crap running,
then try it with GNOME running.

Then see what happens when you run something straightforward such as
"dd if=/dev/dm-0 of=/dev/null", and then try yanking out the USB
stick, first unencrypted, and then with LUKS running, and then
with LUKS running and with GNOME trying to "help".
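
If dd's output is not detailed enough, the same read test can be
sketched in a few lines of Python.  This is only an illustration, not
part of e2fsprogs or any existing tool: the helper name
read_until_error and the 1 MiB block size are made up for the example,
and the device path is whatever you pass on the command line.  It
reads the device sequentially, discards the data, and reports how far
it got and which errno (if any) the kernel handed back.

```python
import errno
import os
import sys


def read_until_error(path, block_size=1024 * 1024):
    """Read `path` sequentially and discard the data (like dd to /dev/null).

    Returns (bytes_read, error), where error is None on a clean end of
    device or the OSError raised by the failing read.
    Hypothetical helper, written for this example only.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                chunk = os.read(fd, block_size)
            except OSError as exc:
                # Assumption: a device that disappears mid-read shows up
                # here as an error from read(2), e.g. EIO.
                return total, exc
            if not chunk:
                return total, None  # reached the end without errors
            total += len(chunk)
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    count, err = read_until_error(sys.argv[1])
    if err is None:
        print(f"read {count} bytes, no I/O error")
    else:
        name = errno.errorcode.get(err.errno, "unknown")
        print(f"read {count} bytes, then failed with {name}: {err}")
```

Run it as root against the unencrypted stick first, then against the
dm-crypt device, and compare where (and whether) the error shows up at
each layer.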

Each of the layers in the storage stack is independent, so you should
be able to isolate each layer and test it in isolation.


						- Ted

More information about the Ext3-users mailing list