[linux-lvm] Uncache an LV when a cache PV is gone, bug?

Zdenek Kabelac zdenek.kabelac at gmail.com
Fri Aug 21 07:21:39 UTC 2015


On 20.8.2015 at 18:09, Dragan Milivojević wrote:
> Hi all
>
> I'm testing a recovery scenario for a NAS server which uses an SSD as
> a PV for the LVM cache (dm-cache).
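> A cached LV like this is typically assembled roughly as follows (device
> paths and sizes below are placeholders, not my actual setup; only the
> VG/LV/pool names match the output further down):
>
>   pvcreate /dev/sdb /dev/sdc       # /dev/sdb = slow HDD, /dev/sdc = SSD (assumed)
>   vgcreate total_storage /dev/sdb /dev/sdc
>   lvcreate -n test -L 1T total_storage /dev/sdb
>   lvcreate --type cache-pool -n cache_pool00 -L 100G total_storage /dev/sdc
>   lvconvert --type cache --cachepool total_storage/cache_pool00 total_storage/test
>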
> When I remove the SSD and try to uncache the LV I get this:
>
>   [root@storage ~]# lvconvert -v --force --uncache /dev/total_storage/test
>    WARNING: Device for PV yJvPgB-aPlc-wFG2-DL9U-MOKI-2F93-XlzHyf not found or rejected by a filter.
>      There are 1 physical volumes missing.
>    Cannot change VG total_storage while PVs are missing.
>    Consider vgreduce --removemissing.
>      There are 1 physical volumes missing.
>
> [root@storage ~]# vgreduce -v --force --removemissing total_storage
>      Finding volume group "total_storage"
>    WARNING: Device for PV yJvPgB-aPlc-wFG2-DL9U-MOKI-2F93-XlzHyf not found or rejected by a filter.
>      There are 1 physical volumes missing.
>      There are 1 physical volumes missing.
>      Trying to open VG total_storage for recovery...
>    WARNING: Device for PV yJvPgB-aPlc-wFG2-DL9U-MOKI-2F93-XlzHyf not found or rejected by a filter.
>      There are 1 physical volumes missing.
>      There are 1 physical volumes missing.
>      Archiving volume group "total_storage" metadata (seqno 9).
>    Removing partial LV test.
>      activation/volume_list configuration setting not defined: Checking only host tags for total_storage/test
>      Executing: /usr/sbin/modprobe dm-cache
>      Creating total_storage-cache_pool00_cdata-missing_0_0
>      Loading total_storage-cache_pool00_cdata-missing_0_0 table (253:3)
>      Resuming total_storage-cache_pool00_cdata-missing_0_0 (253:3)
>      Creating total_storage-cache_pool00_cdata
>      Loading total_storage-cache_pool00_cdata table (253:4)
>      Resuming total_storage-cache_pool00_cdata (253:4)
>      Creating total_storage-cache_pool00_cmeta-missing_0_0
>      Loading total_storage-cache_pool00_cmeta-missing_0_0 table (253:5)
>      Resuming total_storage-cache_pool00_cmeta-missing_0_0 (253:5)
>      Creating total_storage-cache_pool00_cmeta
>      Loading total_storage-cache_pool00_cmeta table (253:6)
>      Resuming total_storage-cache_pool00_cmeta (253:6)
>      Creating total_storage-test_corig
>      Loading total_storage-test_corig table (253:7)
>      Resuming total_storage-test_corig (253:7)
>      Executing: /usr/sbin/cache_check -q /dev/mapper/total_storage-cache_pool00_cmeta
>
> vgreduce gets stuck at the last step: /usr/sbin/cache_check
>
> If I run cache_check manually I get this:
>
> [root@storage ~]# /usr/sbin/cache_check /dev/mapper/total_storage-cache_pool00_cmeta
> examining superblock
>    superblock is corrupt
>      incomplete io for block 0, e.res = 18446744073709551611, e.res2 = 0, offset = 0, nbytes = 4096
>
> and it waits indefinitely.
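>
> (Note: 18446744073709551611 is 2^64 - 5, i.e. -5 stored in an unsigned
> 64-bit field, which is -EIO on Linux, consistent with I/O failing on the
> placeholder device that stands in for the missing PV.)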
>
> If I replace /usr/sbin/cache_check with a shell script that returns 0 or 1,
> vgreduce just errors out. It seems that there is no way to uncache the LV
> without replacing the missing PV (which could pose a problem in production
> use). The origin LV (test_corig) is fine: I can mount it and use it, and
> there are no file-system issues etc.
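>
> (Incidentally, the cache_check bypass described above can also be
> expressed through configuration instead of swapping the binary; an
> untested sketch, using the cache_check_executable setting from lvm.conf(5):
>
>   vgreduce --force --removemissing total_storage \
>       --config 'global { cache_check_executable = "" }'
>
> and the origin can be inspected read-only in the meantime with something
> like "mount -o ro /dev/mapper/total_storage-test_corig /mnt", /mnt being
> an arbitrary mount point.)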
>
> Is this an intended behaviour or a bug?

This scenario is not handled yet.

Feel free to open a BZ at bugzilla.redhat.com.


Zdenek
