[dm-devel] new multipath device mistakenly replaced another PV in existing volume group

Neutron Sharc neutronsharc at gmail.com
Tue Apr 25 18:38:08 UTC 2017


Benjamin: thanks for replying. We have found the root cause: a stale
read buffer at the backend. After zeroing out the read buffer on a read
miss, the problem is gone.
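
To make the failure mode concrete, here is a minimal sketch of the idea. It
is not the real backend code: the Backend class, its methods, and the
zero_on_miss flag are hypothetical names chosen for illustration, and the
model assumes one read buffer reused across requests. If a read of a
never-written block leaves that buffer untouched, the initiator gets back
whatever the previous read returned, for example the LVM label of another
LUN, which would explain the duplicate PV that LVM reported.

```python
BLOCK_SIZE = 512


class Backend:
    """Hypothetical block backend that reuses one read buffer for all requests."""

    def __init__(self, zero_on_miss):
        self.zero_on_miss = zero_on_miss
        self.blocks = {}                        # (lun, lba) -> block contents
        self.read_buf = bytearray(BLOCK_SIZE)   # shared, reused read buffer

    def write(self, lun, lba, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        # Pad short writes with zeros so every stored block is full-sized.
        self.blocks[(lun, lba)] = data.ljust(BLOCK_SIZE, b"\0")

    def read(self, lun, lba):
        data = self.blocks.get((lun, lba))
        if data is not None:
            self.read_buf[:] = data
        elif self.zero_on_miss:
            # The fix: a block that was never written must read back as zeros.
            self.read_buf[:] = bytes(BLOCK_SIZE)
        # Without the fix, a miss leaves the previous request's data in place.
        return bytes(self.read_buf)


def demo(zero_on_miss):
    backend = Backend(zero_on_miss)
    # Stand-in for the PV label that LVM keeps near the start of vol1_3.
    backend.write("vol1_3", 1, b"LABELONE")
    backend.read("vol1_3", 1)          # LVM scans the existing PV
    return backend.read("vol3_0", 1)   # then scans the fresh, unwritten LUN


print(demo(zero_on_miss=False)[:8])   # stale label leaks into vol3_0
print(demo(zero_on_miss=True)[:8])    # fresh LUN reads back as zeros
```

With zero_on_miss disabled, the fresh LUN appears to carry the label of
vol1_3; with it enabled, the same read returns an all-zero block.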


-Shawn

On Tue, Apr 25, 2017 at 9:39 AM, Benjamin Marzinski <bmarzins at redhat.com> wrote:
> On Wed, Apr 19, 2017 at 02:58:47PM -0700, Neutron Sharc wrote:
>> I'm seeing a strange problem (iscsi LUNs + dm-multipath + lvm), which I
>> will explain by walking through an example.
>>
>> I have an iscsi target machine that exposes many iscsi LUNs. The iscsi
>> initiator logs in to 4 iscsi LUNs (vol1_[0-3]), creates a multipath
>> device for each LUN (/dev/mapper/vol1_[0-3]), and combines the 4
>> multipath devices into a volume group and LV (vol1/vol1_lv).
>>
>> Then I log in to another 4 iscsi LUNs (vol3_[0-3]) and create a multipath
>> device for each new LUN (/dev/mapper/vol3_[0-3]).  Now a strange thing
>> happens:
>> some new multipath devices (/dev/mapper/vol3_0, /dev/mapper/vol3_2 in
>> this example) replaced existing PVs in vol1.  As a result, these new
>> multipath devices have open-count > 0, so I cannot pvcreate on them:
>>
>> $ sudo dmsetup ls --tree
>> vol1-vol1_lv (252:4)
>>  ├─vol3_0 (252:9)   <==  fresh multipath device, should NOT be in vol1
>>  │  └─ (65:128)
>>  ├─vol3_2 (252:10)  <==  fresh multipath device, should NOT be in vol1
>>  │  └─ (65:144)
>>  ├─vol1_1 (252:1)
>>  │  └─ (65:48)
>>  └─vol1_0 (252:0)
>>     └─ (65:32)
>> vol1_3 (252:3)  <== was in vol1-vol1_lv, but knocked out
>>  └─ (65:16)
>> vol1_2 (252:2)  <== was in vol1-vol1_lv, but knocked out
>>  └─ (65:64)
>> vol3_3 (252:11)
>>  └─ (65:160)
>> vol3_1 (252:12)
>>  └─ (65:176)
>>
>>
>> Please note that all these vol3_[0-3] are fresh, without any LVM
>> metadata on them, as shown by pvscan:
>>
>> $ sudo pvscan --cache /dev/mapper/vol3_0
>>   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>>   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>>   Incorrect metadata area header checksum on /dev/mapper/vol3_0 at offset 4096
>>
>>
>> $ sudo pvcreate /dev/mapper/vol3_0   <== this multipath device was mistakenly included in vol1
>>
>>   Found duplicate PV 9F6vU9NVBfEq1w3e04T5UreO6fDVPJNy: using /dev/mapper/vol1_3 not /dev/mapper/vol3_0
>>   Using duplicate PV /dev/mapper/vol1_3 without holders, replacing /dev/mapper/vol3_0
>>   Can't open /dev/mapper/vol3_0 exclusively.  Mounted filesystem?
>>
>>
>> ========== Configs I used:
>> BTW, I have enabled lvmetad, and my lvm.conf has this:
>> filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
>> global_filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
>>
>> My /etc/multipath.conf is:
>> defaults {
>>   user_friendly_names     yes
>>   path_grouping_policy    failover
>>   polling_interval        10
>>   path_selector           "round-robin 0"
>>   find_multipaths         yes
>>   features       "1 queue_if_no_path"
>> }
>> blacklist {
>>   devnode  "^sda[1-9]"
>> }
>> multipaths {
>>   multipath {
>>     wwid 360000000758757de9fb289cbde12abab
>>     alias vol1_0
>>   }
>>   # more devices
>> }
>>
>> The iscsi initiator runs CentOS 6.5, with these package versions:
>> lvm2-2.02.143-12.el6.x86_64
>> device-mapper-multipath-0.4.9-100.el6.x86_64
>>
>> The iscsi target is tgtd, running on a separate Ubuntu machine.
>>
>>
>>
>> Comments are appreciated.
>
> Have you tried this with multipathing removed? Does the same thing still
> happen? multipath shouldn't be changing your LVs. The reassign_maps code
> (which could do that) isn't in RHEL6. Does this work correctly with
> lvmetad turned off?
>
> -Ben
>
>>
>>
>> -Shawn
>>
>> --
>> dm-devel mailing list
>> dm-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel



