[dm-devel] Can not remove device. No files open, no processes attached. Forced to reboot server.

Roger Heflin rogerheflin at gmail.com
Mon Feb 7 20:14:27 UTC 2022


On Mon, Feb 7, 2022 at 3:35 AM Aidan Walton <aidan.walton at gmail.com> wrote:
>
> Hi,
> I've been chasing a problem now for a few weeks. I have a flaky SATA
> controller that fails unpredictably and upon doing so all drives
> attached are disconnected by the kernel. I have 2 discs on this
> controller which are the components of a RAID1 array. mdraid fails the
> disc (in its strange way), stating that one device is removed and the
> other is active, even though both devices have in fact failed.
> Apparently this is the default mdraid approach. Regardless, the
> device-mapper device that supports an LVM logical volume on top of
> this RAID array remains active. The logical volume is no longer
> listed by lvdisplay, but dmsetup -c info shows:
> Name                                Maj Min Stat Open Targ Event  UUID
> storage.mx.vg2-shared_sun_NAS.lv1   253   2 L--w    1    1      0 LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
> storage.mx.vg1-shared_sun_users.lv1 253   1 L--w    1    1      0 LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
> ubuntu_server--vg-ubuntu_server--lv 253   0 L--w    1    1      0 LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO
>
> The device in question is 'storage.mx.vg2-shared_sun_NAS.lv1'
>
> As can be seen, it is displayed as open (Open count of 1),
>
> however lsof /dev/mapper/storage.mx.vg2-shared_sun_NAS.lv1
> <blank>
>
> fuser -m /dev/storage.mx.vg1/shared_sun_users.lv1
> <blank>
>
> dmsetup status storage.mx.vg2-shared_sun_NAS.lv1
> 0 976502784 error
>
> dmsetup remove storage.mx.vg2-shared_sun_NAS.lv1
> device-mapper: remove ioctl on storage.mx.vg2-shared_sun_NAS.lv1
> failed: Device or resource busy
>
> dmsetup wipe_table storage.mx.vg2-shared_sun_NAS.lv1
> device-mapper: resume ioctl on storage.mx.vg2-shared_sun_NAS.lv1
> failed: Invalid argument
>
>
> and so on. Nothing appears to be attached to this device, but it
> refuses to be removed. As a consequence I cannot disable the mdraid
> array and cannot recover the controller, which would otherwise be
> possible by resetting the PCI slot.
>
> Currently the only way I have to recover from this problem is to
> reboot the server.
>
> Please see
> https://marc.info/?l=linux-raid&m=164159457011525&w=2
> for the discussion of the same problem on the linux-raid mailing list.
> No progress so far; help appreciated.
> Aidan
>


Was the filesystem mounted when this happened, and if so, how did you
get it unmounted?  If the filesystem is still mounted and has any dirty
cache, that cache cannot be flushed while the device is missing, and
the filesystem cannot be unmounted until it is.

In-kernel opens (mounts, NFS exports on the fs, and probably other
direct in-kernel users of the LV) do not show up in lsof, so one of
those is likely still holding the device open.
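
A rough sketch of how one might look for such in-kernel users, keyed on
the Maj/Min pair that dmsetup reports (253:2 for the LV in question).
The function names are made up for illustration; the sketch assumes the
usual Linux layout, where field 3 of /proc/self/mountinfo is the
major:minor of the mounted device and /sys/dev/block/MAJ:MIN/holders
lists block devices stacked on top of it. It only sees mounts in the
current mount namespace and does not cover NFS exports.

```shell
#!/bin/sh
# Hypothetical helpers, not part of dmsetup or LVM.

# Print the mount points whose backing device is MAJ:MIN.
# $1 = "MAJ:MIN", $2 = mountinfo file (default: /proc/self/mountinfo)
mounts_for_devno() {
    devno=$1
    mountinfo=${2:-/proc/self/mountinfo}
    # mountinfo field 3 is major:minor, field 5 is the mount point.
    awk -v d="$devno" '$3 == d { print $5 }' "$mountinfo"
}

# Print block devices that sit on top of MAJ:MIN (other dm or md devices).
# $1 = "MAJ:MIN", $2 = sysfs root (default: /sys)
holders_for_devno() {
    devno=$1
    sysfs=${2:-/sys}
    dir="$sysfs/dev/block/$devno/holders"
    if [ -d "$dir" ]; then
        ls "$dir"
    fi
}
```

For the device above that would be "mounts_for_devno 253:2" and
"holders_for_devno 253:2". If a mount point is printed, the filesystem
is still mounted somewhere, which alone would account for the open
count of 1. If the server exports the filesystem over NFS, "exportfs -v"
is worth checking as well.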



