[dm-devel] How do you force-close a dm device after a disk failure?

Mon Sep 14 10:04:25 UTC 2015

On 14.9.2015 at 11:45, Adam Nielsen wrote:
>> The whole dm table, with all its dependencies, needs to be known.
>
> $ dmsetup table
> backup: 0 11720531968 crypt aes-xts-plain64
>    0000000000000000000000000000000000000000000000000000000000000000 0
>    9:10 4096
>
> $ dmsetup status
> backup: 0 11720531968 crypt
>
> $ dmsetup ls --tree
> backup (253:0)
>   └─ (9:10)
>
> $ dmsetup info -f
> Name:              backup
> State:             ACTIVE (DEFERRED REMOVE)
> Read Ahead:        4096
> Tables present:    LIVE
> Open count:        1
> Event number:      0
> Major, minor:      253, 0
> Number of targets: 1
> UUID: CRYPT-LUKS1-d0b3d38e421545908537dc50f59fb217-backup
>
> All I'm using it for is to encrypt an mdadm-style RAID array composed
> of two external disks, connected temporarily via USB to do a full
> system backup with rsync.
>
>>> I'm not sure how to do this, could you please elaborate?  I thought
>>> "dmsetup remove --force" would do this but as that doesn't work
>>
>> Really, the state of the whole table needs to be known.
>>
>>>> Also note: dmsetup remove supports --deferred removal (see the man
>>>> page).
>>>
>>> Oh I didn't notice that.  It doesn't seem to have much of an effect
>>> though:
>>
>> Sure, it will not fix your problem. It's like a lazy umount...
>
> So replacing the table with the 'error' target won't release the
> underlying device, even though that device is not used by the new
> target?
>
>> What is not clear to me is: what is your expectation here?
>> Obviously your system is far more broken, so placing an 'error' target
>> on your backup device will not fix it.
>>
>> You should probably also attach the relevant portion of 'dmesg'; it
>> will surely show what is going wrong with your system.
>
> What happened was in the middle of the backup, there was some USB
> interruption and the disks dropped out, so the writes started failing.
> The kernel logs were full of write errors to various sector numbers.  I
> think you would have the same result if you set things up with a USB
> stick and then unplugged it during a data transfer.
>
> The devices are connected like this:
>
>    dm device "backup"
>     |
>     +-- mdadm device /dev/md10
>          |
>          +-- USB/SATA disk A (/dev/sdd)
>          |
>          +-- USB/SATA disk B (/dev/sde)
>
> The problem is that I can't just reconnect the disks and rerun the
> backup.  mdadm refuses to stop the RAID array as it is in use by
> the dm device, and it thinks the array is active despite the disks being
> unplugged and in a drawer.  If I reconnect the disks they appear as
> different devices (sdf and sdg) but I still can't start the "new" array
> from these new disk devices, as it tells me the disks are already part
> of an active array.
>
> So the only way I can have another go at running this backup is to
> close down /dev/md10, and it seems the only way I can do that is to
> tell dm to release that device.  It doesn't matter if the dm device
> "backup" is unusable, I will just create "backup2" to use for the
> second attempt.
>
> But until I can figure out how to get dm to release the underlying
> device, I'm stuck!
>
>> i.e. you cannot expect 'remove --force' to work when your machine
>> starts to show kernel errors.
>
> There were no kernel crashes, just errors related to USB transfers.  I
> would assume this is not much different to how a real failed disk might
> behave, so I figure it is a situation that should be encountered
> relatively often!
>

dmsetup reload backup --table "0 11720531968 error"
dmsetup suspend --noflush backup
dmsetup resume backup
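
For reference, the same three commands can be wrapped in a small script.
This is only a sketch under assumptions: the function names error_table
and release_backing_device are made up for illustration, the sector
count has to match the length shown by 'dmsetup table' (11720531968
here), and it must run as root. Whether /dev/md10 is really released
afterwards still has to be checked on the affected system.

```shell
#!/bin/sh
# Sketch: swap a dm device's live table for the 'error' target so the
# new table no longer references the underlying device.

# Build a one-line 'error' table covering <sectors> 512-byte sectors.
error_table() {
    printf '0 %s error\n' "$1"
}

# Hypothetical helper: $1 = dm device name, $2 = length in sectors
# (take it from the second field of the existing 'dmsetup table' line).
release_backing_device() {
    name=$1
    sectors=$2
    # Load the replacement table into the inactive slot.
    dmsetup reload "$name" --table "$(error_table "$sectors")" || return 1
    # Suspend without flushing, since outstanding I/O cannot complete
    # on the missing disks.
    dmsetup suspend --noflush "$name" || return 1
    # Resume makes the loaded 'error' table the live one.
    dmsetup resume "$name" || return 1
    # Print the live table so the swap can be checked by eye.
    dmsetup table "$name"
}
```

Called as 'release_backing_device backup 11720531968'. If the swap
worked, 'dmsetup ls --tree' should no longer list (9:10) under backup,
and 'mdadm --stop /dev/md10' can be retried.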

Is this working for you?

Zdenek