[dm-devel] How do you force-close a dm device after a disk failure?

Zdenek Kabelac zkabelac at redhat.com
Mon Sep 21 17:50:57 UTC 2015


Dne 21.9.2015 v 13:39 Lars Ellenberg napsal(a):
> On Sat, Sep 19, 2015 at 07:47:52PM +1000, Adam Nielsen wrote:
>>> Was this the 'ONLY' dmsetup in your listing (i.e. you reproduced case
>>> again)?
>>
>> This was the original instance of the problem.  Today I have rebooted
>> and reproduced the problem on a fresh kernel.
>>
>>> I mean - your existing reported situation was already hopeless and
>>> needed reboot - as if  flushing suspend holds some mutexes - no other
>>> suspend call can fix it ->  you usually have just  1 chance to fix it
>>> in right way, if you go wrong way reboot is unavoidable.
>>
>> That sounds like a very unforgiving buggy kernel, if you only have one
>> chance to fix the problem ;-)
>>
>> Here is my attempt on the fresh kernel.  I received some write errors
>> in dmesg, so tried to umount the dm device to confirm I had reproduced
>> the problem, and when umount failed to exit I tried this:
>>
>>    $ dmsetup reload backup --table "0 11720531968 error"
>>    $ dmsetup suspend --noflush --nolockfs backup
>
> You need to *resume* to activate the new table.
>
>> These two worked fine now.  "dmsetup suspend" was locking up before,
>> this time it worked.
>>
>>    $ umount /mnt/backup
>>    umount: /mnt/backup: not mounted
>>
>> The dm instance is no longer mounted.
>>
>>    $ mdadm --manage --stop /dev/md10
>>    mdadm: Cannot get exclusive access to /dev/md10:Perhaps a running
>>      process, mounted filesystem or active volume group?
>
> Also, as mentioned before, why don't you
> mdadm /dev/md10 --fail /dev/sdd --remove /dev/sdd
> mdadm /dev/md10 --fail /dev/sde --remove /dev/sde
> (for whatever sdX members it currently has;
> or maybe combine in one command line, if that is supposed to work)
>
> Should kick out the disks from the MD,
> should make md10 fail all pending (and new) requests,
> should even get the stuck dm suspend going again
> (the implicit "flush" one, not the --noflush one,
> as that did not get stuck anyways).
>
>> I can't restart the underlying RAID array though, as the dm instance is
>> still holding onto the devices.
>>
>>    $ dmsetup remove --force backup
>>    device-mapper: remove ioctl on backup failed: Device or resource busy
>>    Command failed
>
> You need to *resume* the new (error) table.
> Or the previous table is only suspended, but still holds references.
>


There is a condition which may prevent replacement dm table.

If the 'dm' target has in-progress bio operation and the underlying device is 
not responding (acking bio completed),  you can't suspend such targeted with 
bio-in-progress.

It's not trivial to improve this.

So if you happen to 'deadlock' in this state - there is currently no other 
help then rebooting machine if you want to get rid of such 'frozen' device.

On the other hand - from what was said -  'dropping' USB disk out of system 
should not be causing such state.

So probably more details from logs need to be know for knowing more about this.


Zdenek





More information about the dm-devel mailing list