[dm-devel] How do you force-close a dm device after a disk failure?

Mon Sep 14 09:45:52 UTC 2015

> The whole dm table, with all dependencies, needs to be known.

$ dmsetup table
backup: 0 11720531968 crypt aes-xts-plain64
  0000000000000000000000000000000000000000000000000000000000000000 0
  9:10 4096

$ dmsetup status
backup: 0 11720531968 crypt

$ dmsetup ls --tree
backup (253:0)
 └─ (9:10)

$ dmsetup info -f
Name:              backup
State:             ACTIVE (DEFERRED REMOVE)
Read Ahead:        4096
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: CRYPT-LUKS1-d0b3d38e421545908537dc50f59fb217-backup

All I'm using it for is to encrypt an mdadm-style RAID array composed
of two external disks, connected temporarily via USB to do a full
system backup with rsync.

> > I'm not sure how to do this, could you please elaborate?  I thought
> > "dmsetup remove --force" would do this but as that doesn't work
> 
> Really, the state of the whole table needs to be known.
> 
> >> Also note - dmsetup remove supports --deferred removal (see man
> >> page).
> >
> > Oh, I didn't notice that.  It doesn't seem to have much of an effect
> > though:
> 
> Sure, it will not fix your problem - it's like a lazy umount...

So replacing the table with the 'error' target won't release the
underlying device, even though that device is not used by the new
target?
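For comparison, the table swap in question could be done by hand roughly as below. This is only a sketch under assumptions: the 11720531968-sector length is taken from the `dmsetup table` output above, the helper names `run` and `error_out_dm_device` are made up for illustration, and it assumes the suspend itself does not block forever on I/O to the vanished disks. By default it only prints the commands; set DRY_RUN=0 (as root) to actually execute them.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and only execute it when
# DRY_RUN=0 is set (real execution needs root and the live device).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# Hypothetical helper: error_out_dm_device NAME SECTORS
# Swap NAME's live table for a single "error" target of the same length.
error_out_dm_device() {
    name=$1
    sectors=$2
    # Suspend without trying to freeze a filesystem on top (--nolockfs)
    # and without flushing queued I/O (--noflush), since the disks are gone.
    run dmsetup suspend --nolockfs --noflush "$name" || return
    # Load the replacement table (one target that fails all I/O) into
    # the inactive slot.
    run dmsetup load "$name" --table "0 $sectors error" || return
    # Resume makes the new table live, replacing the old crypt table.
    run dmsetup resume "$name"
}

error_out_dm_device backup 11720531968
```

In principle, once the resume succeeds, the old crypt table and its reference to 9:10 (/dev/md10) are released; whether that actually happens in this situation is the question being asked.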

> What is not clear to me is: what is your expectation here?
> Obviously your system is far more broken, so placing an 'error' target
> for your backup device will not fix it.
> 
> You should probably also attach a portion of 'dmesg'; it will surely
> show what is going wrong with your system.

What happened was that, in the middle of the backup, there was a USB
interruption and the disks dropped out, so the writes started failing.
The kernel logs were full of write errors to various sector numbers.  I
think you would get the same result if you set things up with a USB
stick and then unplugged it during a data transfer.

The devices are connected like this:

  dm device "backup"
   |
   +-- mdadm device /dev/md10
        |
        +-- USB/SATA disk A (/dev/sdd)
        |
        +-- USB/SATA disk B (/dev/sde)

The problem is that I can't just reconnect the disks and rerun the
backup.  mdadm refuses to stop the RAID array because it is in use by
the dm device, and it thinks the array is active even though the disks
are unplugged and sitting in a drawer.  If I reconnect the disks, they
appear as different devices (sdf and sdg), but I still can't start the
"new" array from them, because mdadm tells me the disks are already
part of an active array.

So the only way I can have another go at running this backup is to
close down /dev/md10, and it seems the only way I can do that is to
tell dm to release that device.  It doesn't matter if the dm device
"backup" is unusable, I will just create "backup2" to use for the
second attempt.
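If /dev/md10 can be released (for instance via the 'error' table replacement discussed above), the second attempt described here might look roughly like this sketch. The member names sdf and sdg come from this message; reassembling under the same /dev/md10 name, opening the container as backup2 with cryptsetup (the UUID above indicates LUKS1), and the helper names `run` and `retry_backup_setup` are assumptions for illustration. The commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print each command; execute only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# Hypothetical helper: retry_backup_setup MD_DEV NEW_NAME DISK...
retry_backup_setup() {
    md=$1
    new_name=$2
    shift 2
    # Stop the stale array; this only succeeds once dm no longer holds it.
    run mdadm --stop "$md" || return
    # Reassemble the array from the reconnected member disks.
    run mdadm --assemble "$md" "$@" || return
    # Open the LUKS container under a fresh dm name for the second attempt.
    run cryptsetup luksOpen "$md" "$new_name"
}

retry_backup_setup /dev/md10 backup2 /dev/sdf /dev/sdg
```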

But until I can figure out how to get dm to release the underlying
device, I'm stuck!

> i.e. you cannot expect 'remove --force' to work when your machine
> starts to show kernel errors.

There were no kernel crashes, just errors related to USB transfers.  I
would assume this is not much different from how a real failed disk
might behave, so I figure it is a situation that should come up
relatively often!

Thanks again,
Adam.