[dm-devel] new multipath device mistakenly replaced another PV in existing volume group
Neutron Sharc
neutronsharc at gmail.com
Tue Apr 25 18:38:08 UTC 2017
Benjamin: thanks for replying. We have found the root cause to be a
stale read buffer at the backend.
After zeroing out the read buffer on a read miss, the problem is gone.
-Shawn
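
A minimal sketch of this class of bug and the fix, for readers of the archive. The message above only says that the backend returned a stale read buffer and that zeroing the buffer on a read miss fixed it. Everything else here is an assumption for illustration: the `Backend` class, the reused `buf`, the `zero_on_miss` switch, and the block size are hypothetical and are not taken from the actual target code.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


class Backend:
    """Hypothetical block store: only blocks that were written exist."""

    def __init__(self):
        self.blocks = {}                   # block number -> bytes
        self.buf = bytearray(BLOCK_SIZE)   # read buffer reused across requests

    def write(self, blkno, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[blkno] = bytes(data)

    def read(self, blkno, zero_on_miss=True):
        data = self.blocks.get(blkno)
        if data is not None:
            self.buf[:] = data                 # hit: copy stored block
        elif zero_on_miss:
            self.buf[:] = bytes(BLOCK_SIZE)    # miss: return zeros (the fix)
        # miss without zeroing: buf still holds the previous request's data
        return bytes(self.buf)


backend = Backend()
backend.write(0, b"LABELONE".ljust(BLOCK_SIZE, b"\0"))  # stand-in for a PV label

backend.read(0)                                # normal read of an existing PV
stale = backend.read(7, zero_on_miss=False)    # never-written block, buggy path
fixed = backend.read(7)                        # never-written block, fixed path
```

In the buggy path, a read of a never-written block hands back whatever the previous request left in the buffer. That would be consistent with a fresh LUN appearing to carry another PV's label and being reported as a duplicate PV.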
On Tue, Apr 25, 2017 at 9:39 AM, Benjamin Marzinski <bmarzins at redhat.com> wrote:
> On Wed, Apr 19, 2017 at 02:58:47PM -0700, Neutron Sharc wrote:
>> I'm seeing a strange problem (iscsi LUNs + dm-multipath + lvm) that I
>> will explain by walking through an example.
>>
>> I have an iscsi target machine that exposes many iscsi LUNs. The iscsi
>> initiator logs in to 4 iscsi LUNs (vol1_[0-3]), creates a multipath
>> device for each LUN (/dev/mapper/vol1_[0-3]), and combines the 4
>> multipath devices into a volume group and LV (vol1/vol1_lv).
>>
>> Then I log in another 4 iscsi LUNs (vol3_[0-3]), create a multipath
>> device for each new LUN (/dev/mapper/vol3_[0-3]). Now there is a
>> strange thing:
>> some new multipath devices (/dev/mapper/vol3_0, /dev/mapper/vol3_2 in
>> this example) replaced existing PVs in vol1. As a result, these new
>> multipath devices have open-count > 0, so I cannot pvcreate on them:
>>
>> $ sudo dmsetup ls --tree
>> vol1-vol1_lv (252:4)
>>  ├─vol3_0 (252:9)   <== fresh multipath device, should NOT be in vol1
>>  │  └─ (65:128)
>>  ├─vol3_2 (252:10)  <== fresh multipath device, should NOT be in vol1
>>  │  └─ (65:144)
>>  ├─vol1_1 (252:1)
>>  │  └─ (65:48)
>>  └─vol1_0 (252:0)
>>     └─ (65:32)
>> vol1_3 (252:3)      <== was in vol1-vol1_lv, but knocked out
>>  └─ (65:16)
>> vol1_2 (252:2)      <== was in vol1-vol1_lv, but knocked out
>>  └─ (65:64)
>> vol3_3 (252:11)
>>  └─ (65:160)
>> vol3_1 (252:12)
>>  └─ (65:176)
>>
>>
>> Please note that all of these vol3_[0-3] devices are fresh, without any
>> LVM metadata on them, as shown by pvscan:
>>
>> $ sudo pvscan --cache /dev/mapper/vol3_0
>> Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>> Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>> Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>>
>>
>> $ sudo pvcreate /dev/mapper/vol3_0 <== this multipath device was
>> mistakenly included into vol1
>>
>> Found duplicate PV 9F6vU9NVBfEq1w3e04T5UreO6fDVPJNy: using
>> /dev/mapper/vol1_3 not /dev/mapper/vol3_0
>> Using duplicate PV /dev/mapper/vol1_3 without holders, replacing
>> /dev/mapper/vol3_0
>> Can't open /dev/mapper/vol3_0 exclusively. Mounted filesystem?
>>
>>
>> ========== Configs I used:
>> BTW, I have enabled lvmetad; my lvm.conf has this:
>> filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
>> global_filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
>>
>> My /etc/multipath.conf is:
>> defaults {
>>     user_friendly_names yes
>>     path_grouping_policy failover
>>     polling_interval 10
>>     path_selector "round-robin 0"
>>     find_multipaths yes
>>     features "1 queue_if_no_path"
>> }
>> blacklist {
>>     devnode "^sda[1-9]"
>> }
>> multipaths {
>>     multipath {
>>         wwid 360000000758757de9fb289cbde12abab
>>         alias vol1_0
>>     }
>>     # more devices
>> }
>>
>> The iscsi initiator is on CentOS 6.5, with these package versions:
>> lvm2-2.02.143-12.el6.x86_64
>> device-mapper-multipath-0.4.9-100.el6.x86_64
>>
>> The iscsi target is tgtd, running on a separate Ubuntu machine.
>>
>>
>>
>> Comments are appreciated.
>
> Have you tried this with multipathing removed? Does the same thing still
> happen? multipath shouldn't be changing your LVs. The reassign_maps code
> (which could do that) isn't in RHEL6. Does this work correctly with
> lvmetad turned off?
>
> -Ben
>
>>
>>
>> -Shawn
>>
>> --
>> dm-devel mailing list
>> dm-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel