fsck failing to notice that the block device was pulled out from under it?

Tomas Pospisek tpo2 at sourcepole.ch
Mon May 25 21:59:44 UTC 2015


Hello,

tl;dr: it seems like fsck fails to notice when the block device
disappears from under it.

I have the following setup:

* external USB disk
* a partition with LUKS in it
* ext4 filesystem inside the LUKS block device

While doing backups to it I noticed that after some time backups would
fail with an error (failed to write, ).

In the following log I'm attaching the disk, enabling LUKS and mounting
the disk and then starting the backup, which will fail after a while:

May 25 11:39:18 hier kernel: [76241.727367] usb 4-1.1: new high-speed
USB device number 6 using ehci-pci
May 25 11:39:19 hier kernel: [76241.881892] usb 4-1.1: New USB device
found, idVendor=0bc2, idProduct=3320
May 25 11:39:19 hier kernel: [76241.881902] usb 4-1.1: New USB device
strings: Mfr=2, Product=3, SerialNumber=1
May 25 11:39:19 hier kernel: [76241.881907] usb 4-1.1: Product:
Expansion Desk
May 25 11:39:19 hier kernel: [76241.881911] usb 4-1.1: Manufacturer: Seagate
May 25 11:39:19 hier kernel: [76241.881915] usb 4-1.1: SerialNumber:
NA4JDN4N
May 25 11:39:19 hier kernel: [76241.882652] usb-storage 4-1.1:1.0: USB
Mass Storage device detected
May 25 11:39:19 hier kernel: [76241.883007] scsi7 : usb-storage 4-1.1:1.0
May 25 11:39:20 hier kernel: [76242.907527] usb 4-1.1: USB disconnect,
device number 6
May 25 11:39:30 hier kernel: [76252.987833] usb 4-1.1: new high-speed
USB device number 7 using ehci-pci
May 25 11:39:30 hier kernel: [76253.142659] usb 4-1.1: New USB device
found, idVendor=0bc2, idProduct=3320
May 25 11:39:30 hier kernel: [76253.142669] usb 4-1.1: New USB device
strings: Mfr=2, Product=3, SerialNumber=1
May 25 11:39:30 hier kernel: [76253.142674] usb 4-1.1: Product:
Expansion Desk
May 25 11:39:30 hier kernel: [76253.142678] usb 4-1.1: Manufacturer: Seagate
May 25 11:39:30 hier kernel: [76253.142681] usb 4-1.1: SerialNumber:
NA4JDN4N
May 25 11:39:30 hier kernel: [76253.143465] usb-storage 4-1.1:1.0: USB
Mass Storage device detected
May 25 11:39:30 hier kernel: [76253.144285] scsi8 : usb-storage 4-1.1:1.0
May 25 11:39:31 hier kernel: [76254.144816] scsi 8:0:0:0: Direct-Access
    Seagate  Expansion Desk   070B PQ: 0 ANSI: 6
May 25 11:39:31 hier kernel: [76254.145528] sd 8:0:0:0: Attached scsi
generic sg1 type 0
May 25 11:39:31 hier kernel: [76254.146193] sd 8:0:0:0: [sdb] 732566645
4096-byte logical blocks: (3.00 TB/2.72 TiB)
May 25 11:39:31 hier kernel: [76254.147465] sd 8:0:0:0: [sdb] Write
Protect is off
May 25 11:39:31 hier kernel: [76254.147477] sd 8:0:0:0: [sdb] Mode
Sense: 43 00 00 00
May 25 11:39:31 hier kernel: [76254.148657] sd 8:0:0:0: [sdb] Write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 25 11:39:31 hier kernel: [76254.149907] sd 8:0:0:0: [sdb] 732566645
4096-byte logical blocks: (3.00 TB/2.72 TiB)
May 25 11:39:31 hier kernel: [76254.179482]  sdb: sdb1 sdb2
May 25 11:39:31 hier kernel: [76254.181592] sd 8:0:0:0: [sdb] 732566645
4096-byte logical blocks: (3.00 TB/2.72 TiB)
May 25 11:39:31 hier kernel: [76254.184058] sd 8:0:0:0: [sdb] Attached
SCSI disk
May 25 11:39:40 hier kernel: [76263.066639] device-mapper: uevent:
version 1.0.3
May 25 11:39:40 hier kernel: [76263.066841] device-mapper: ioctl:
4.27.0-ioctl (2013-10-30) initialised: dm-devel at redhat.com
May 25 11:39:40 hier kernel: [76263.706456] NET: Registered protocol
family 38
May 25 11:39:41 hier kernel: [76263.976836] sha256_ssse3: Using AVX
optimized SHA-256 implementation
May 25 11:39:45 hier kernel: [76268.130879] EXT4-fs (dm-0): mounted
filesystem with ordered data mode. Opts: (null)
May 25 12:39:32 hier kernel: [79854.051465] usb 4-1.1: USB disconnect,
device number 7
May 25 12:39:32 hier kernel: [79854.052460] sd 8:0:0:0: [sdb]
Synchronizing SCSI cache
May 25 12:39:32 hier kernel: [79854.052634] sd 8:0:0:0: [sdb]
May 25 12:39:32 hier kernel: [79854.052636] Result:
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 25 12:39:32 hier kernel: [79854.250864] usb 4-1.1: new high-speed
USB device number 8 using ehci-pci
May 25 12:39:32 hier kernel: [79854.405885] usb 4-1.1: New USB device
found, idVendor=0bc2, idProduct=3320
May 25 12:39:32 hier kernel: [79854.405895] usb 4-1.1: New USB device
strings: Mfr=2, Product=3, SerialNumber=1
May 25 12:39:32 hier kernel: [79854.405900] usb 4-1.1: Product:
Expansion Desk
May 25 12:39:32 hier kernel: [79854.405904] usb 4-1.1: Manufacturer: Seagate
May 25 12:39:32 hier kernel: [79854.405908] usb 4-1.1: SerialNumber:
NA4JDN4N
May 25 12:39:32 hier kernel: [79854.406710] usb-storage 4-1.1:1.0: USB
Mass Storage device detected
May 25 12:39:32 hier kernel: [79854.407068] scsi9 : usb-storage 4-1.1:1.0
May 25 12:39:32 hier kernel: [79854.655950] Buffer I/O error on device
dm-0, logical block 464117312
May 25 12:39:32 hier kernel: [79854.655959] Buffer I/O error on device
dm-0, logical block 464117312
May 25 12:39:33 hier kernel: [79855.407979] scsi 9:0:0:0: Direct-Access
    Seagate  Expansion Desk   070B PQ: 0 ANSI: 6
May 25 12:39:33 hier kernel: [79855.408847] sd 9:0:0:0: Attached scsi
generic sg1 type 0
May 25 12:39:33 hier kernel: [79855.410055] sd 9:0:0:0: [sdc] 732566645
4096-byte logical blocks: (3.00 TB/2.72 TiB)
May 25 12:39:33 hier kernel: [79855.411191] sd 9:0:0:0: [sdc] Write
Protect is off
May 25 12:39:33 hier kernel: [79855.411202] sd 9:0:0:0: [sdc] Mode
Sense: 43 00 00 00
May 25 12:39:33 hier kernel: [79855.412402] sd 9:0:0:0: [sdc] Write
cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 25 12:39:33 hier kernel: [79855.413245] sd 9:0:0:0: [sdc] 732566645
4096-byte logical blocks: (3.00 TB/2.72 TiB)
May 25 12:39:37 hier kernel: [79858.940027]  sdc: sdc1 sdc2
May 25 12:39:37 hier kernel: [79858.942029] sd 9:0:0:0: [sdc] 732566645
4096-byte logical blocks: (3.00 TB/2.72 TiB)
May 25 12:39:37 hier kernel: [79858.944241] sd 9:0:0:0: [sdc] Attached
SCSI disk
May 25 12:39:51 hier kernel: [79872.773254] Buffer I/O error on device
dm-0, logical block 17
May 25 12:39:51 hier kernel: [79872.773259] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773295] Buffer I/O error on device
dm-0, logical block 68681744
May 25 12:39:51 hier kernel: [79872.773297] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773327] Buffer I/O error on device
dm-0, logical block 68681774
May 25 12:39:51 hier kernel: [79872.773328] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773359] Buffer I/O error on device
dm-0, logical block 68681983
May 25 12:39:51 hier kernel: [79872.773360] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773390] Buffer I/O error on device
dm-0, logical block 68690183
May 25 12:39:51 hier kernel: [79872.773391] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773421] Buffer I/O error on device
dm-0, logical block 71303343
May 25 12:39:51 hier kernel: [79872.773422] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773452] Buffer I/O error on device
dm-0, logical block 72876154
May 25 12:39:51 hier kernel: [79872.773453] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773483] Buffer I/O error on device
dm-0, logical block 77070492
May 25 12:39:51 hier kernel: [79872.773485] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773514] Buffer I/O error on device
dm-0, logical block 98566225
May 25 12:39:51 hier kernel: [79872.773515] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773558] Aborting journal on device
dm-0-8.
May 25 12:39:51 hier kernel: [79872.773588] Buffer I/O error on device
dm-0, logical block 231768064
May 25 12:39:51 hier kernel: [79872.773590] lost page write due to I/O
error on dm-0
May 25 12:39:51 hier kernel: [79872.773600] JBD2: Error -5 detected when
updating journal superblock for dm-0-8.
May 25 12:39:51 hier kernel: [79872.773817] EXT4-fs warning (device
dm-0): ext4_end_bio:317: I/O error -19 writing to inode 17173338 (offset
2490368 size 131072 starting block 148009568)
May 25 12:39:51 hier kernel: [79872.773821] Buffer I/O error on device
dm-0, logical block 148009568
May 25 12:39:51 hier kernel: [79872.773827] Buffer I/O error on device
dm-0, logical block 148009569
May 25 12:39:51 hier kernel: [79872.773829] Buffer I/O error on device
dm-0, logical block 148009570
May 25 12:39:51 hier kernel: [79872.773830] Buffer I/O error on device
dm-0, logical block 148009571
May 25 12:39:51 hier kernel: [79872.773832] Buffer I/O error on device
dm-0, logical block 148009572
May 25 12:39:51 hier kernel: [79872.773834] Buffer I/O error on device
dm-0, logical block 148009573
May 25 12:39:51 hier kernel: [79872.773836] Buffer I/O error on device
dm-0, logical block 148009574
May 25 12:39:51 hier kernel: [79872.773838] Buffer I/O error on device
dm-0, logical block 148009575
May 25 12:39:51 hier kernel: [79872.773840] Buffer I/O error on device
dm-0, logical block 148009576
May 25 12:39:51 hier kernel: [79872.773841] Buffer I/O error on device
dm-0, logical block 148009577
May 25 12:39:51 hier kernel: [79872.773898] EXT4-fs warning (device
dm-0): ext4_end_bio:317: I/O error -19 writing to inode 17173338 (offset
2490368 size 131072 starting block 148009598)
May 25 12:39:51 hier kernel: [79872.773948] EXT4-fs warning (device
dm-0): ext4_end_bio:317: I/O error -19 writing to inode 40770054 (offset
3145728 size 8192 starting block 147922176)
May 25 12:39:51 hier kernel: [79872.773975] EXT4-fs warning (device
dm-0): ext4_end_bio:317: I/O error -19 writing to inode 40770076 (offset
3887104 size 4096 starting block 5281717)
May 25 12:39:51 hier kernel: [79873.230984] EXT4-fs error (device dm-0):
ext4_put_super:790: Couldn't clean up the journal
May 25 12:39:51 hier kernel: [79873.230990] EXT4-fs (dm-0): Remounting
filesystem read-only
May 25 12:39:51 hier kernel: [79873.230992] EXT4-fs (dm-0): previous I/O
error to superblock detected
May 25 12:44:07 hier kernel: [80128.859704] EXT4-fs (dm-0): recovery
complete
May 25 12:44:07 hier kernel: [80128.878030] EXT4-fs (dm-0): mounted
filesystem with ordered data mode. Opts: (null)

Which layer is saying "Buffer I/O error on device dm-0, logical block 0"
? Is it the LUKS module?

After one more round of trying to do a backup and failing I decided to
fsck the partition (the partition was unlocked via LUKS then):

    # fsck.ext4 -c -c -k
/dev/mapper/luks-446c3a20-eb34-45f6-9ff9-d4a5b17fdedd

That gave me after a while of running without error reports a never
ending series of fsck wanting me to confirm its actions. Stuff like:

    die Block-Bitmap (24117248) von Gruppe 736 ist ungültig.
Zurücksetzen<j>? ja

Since I mount that disk with `gnome-disk` I noticed there, that the lock
on the LUKS encrypted parition had in the meantime become *closed*
again, without me doing anything.

Nevertheless fsck was happily continuing with its disk check.

So I think there are a few parts broken in this chain of layers. The one
that I can put a finger on is that fsck should notice or should be
notified when the block device under it ceases to exist, as is the case
when the LUKS device becomes locked again.

I'm not sure why fsck doesn't notice. Doesn't it get the right
information from the LUKS block device?

The end result of this is, that my backups are lost. The disk can still
be read, but LUKS is no more able to decipher it.

I'm not sure how to go forward from here. I know that I had a series of
problems with external USB drives allready:

* I've had to have my USB port re-soldered (one year ago) because
contact with attached devices had become unreliable - they would
disapear and reappear in the kernel log. After having been resoldered,
the port was reliable.

* ca. 7 months ago I had to throw away an external USB drive because
again it would be disappearing in `gnome-disk` after a few minutes of
doings backups onto it (same setup as today)

The current drive and the current re-soldered port were working reliably
since then though.

So I'm not sure which parts of the chain are broken now. However I think
that:

1. the information of "the device is gone" or "has problems" should be
logged by every layer from the bottom up to the top

However I am only able to see the USB layer do that:

  "usb 4-1.1: USB disconnect, device number 7"

I can't see any log message from the usb-storage that would let me know
that the device has disappeared.

Neither can I see LUKS telling me that it is relocking the device (which
`gnome-disk` is telling me it did).

Nor can I see ext4 telling me that the block device under it is gone.

2. the information of "the device is gone" should be passed from the
first layer that has that problem up through all following layers.

Why doesn't this apparently happen?

What's the way forward to fix this?

*t




More information about the Ext3-users mailing list