[linux-lvm] Lvm hangs on San fail
jose nuno neto
jose.neto at liber4e.com
Mon Apr 19 09:21:57 UTC 2010
Good morning,
In the meantime we upgraded RHEL to 5.5, and the multipath output now looks
more accurate, showing only 1 path per HBA. We have a 2-datacenter setup with
4 fabrics between them, 2 fabrics for each datacenter.
mpath-dc2-a (360060e8004f240000000f24000000502) dm-12 HITACHI,OPEN-V -SU
[size=26G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:1:0 sdg 8:96 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:1:0 sdo 8:224 [active][ready]
I'll repeat the tests and look at the remote port state you mention.
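
A minimal sketch of how the remote-port state and the two timers named in the quoted reply below could be collected during such a test. It assumes the FC transport class exposes one rport-H:B-R directory per remote port under /sys/class/fc_remote_ports, each with port_state, dev_loss_tmo and fast_io_fail_tmo files. The function names are made up for illustration and are not part of any tool.

```python
from pathlib import Path

# Per-port attributes of interest (assumed to be plain text files in sysfs).
ATTRS = ("port_state", "dev_loss_tmo", "fast_io_fail_tmo")


def read_rports(base="/sys/class/fc_remote_ports"):
    """Return {rport name: {attribute: value or None}} for each port under base."""
    result = {}
    for rport in sorted(Path(base).glob("rport-*")):
        values = {}
        for attr in ATTRS:
            try:
                values[attr] = (rport / attr).read_text().strip()
            except OSError:
                values[attr] = None  # attribute missing or unreadable
        result[rport.name] = values
    return result


def blocked_rports(rports):
    """Names of remote ports whose port_state currently reads 'Blocked'."""
    return [name for name, v in rports.items() if v["port_state"] == "Blocked"]
```

Running read_rports() once before and once after the port is taken down, and passing the result to blocked_rports(), would show which remote ports stay Blocked and which timer values they carry.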
I'm using group_by_node_name because with 8 links the previous layout was a
mess. It spreads some load between the paths, but not across all of them.
Anyway, that was with the "strange" paths; I'll see how it goes now.
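
To compare how paths are spread over the HBAs before and after the upgrade, the multipath listing can be summarised per path group and per SCSI host. The following is only a sketch: it assumes the older output layout shown in this message ("\_ round-robin 0 [prio=N][state]" for a group, "\_ H:B:T:L sdX maj:min [..][..]" for a path), and the names SAMPLE, paths_by_group and paths_per_hba are invented for the example.

```python
import re
from collections import Counter

# Sample copied from the listing above; a real run would feed in saved command output.
SAMPLE = r"""
mpath-dc2-a (360060e8004f240000000f24000000502) dm-12 HITACHI,OPEN-V -SU
[size=26G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:1:0 sdg 8:96 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:1:0 sdo 8:224 [active][ready]
"""

GROUP_RE = re.compile(r"\\_\s+\S+\s+\d+\s+\[prio=(-?\d+)\]\[(\w+)\]")
PATH_RE = re.compile(r"\\_\s+(\d+):\d+:\d+:\d+\s+(\S+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]")


def paths_by_group(text):
    """Parse the listing into a list of groups, each with its state and paths."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        m = PATH_RE.match(line)
        if m and groups:
            # (SCSI host number, device name, dm state, checker state)
            groups[-1]["paths"].append((int(m.group(1)), m.group(2), m.group(3), m.group(4)))
            continue
        m = GROUP_RE.match(line)
        if m:
            groups.append({"prio": int(m.group(1)), "state": m.group(2), "paths": []})
    return groups


def paths_per_hba(groups):
    """Count paths per SCSI host number across all groups."""
    return dict(Counter(p[0] for g in groups for p in g["paths"]))
```

For the sample above, paths_per_hba(paths_by_group(SAMPLE)) gives one path on host 3 and one on host 5, matching the "1 path per HBA" observation.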
Thanks
Jose
> Hi Jose,
>
> You have a total of 8 paths per LUN: 4 are marked active through HBA host5
> and the remaining 4 are marked enabled on HBA3 (you're on 2 different
> fabrics, right?). This may be due to the fact that you use the policy
> group_by_node_name. I don't know whether this mode actually load
> balances across the 2 HBAs.
>
>
> When you pull the cable (is this the test that you're doing and that's
> failing?) you say it times out forever.
> As you're using the policy group_by_node_name, which corresponds to the
> fc_transport target node name, you should look at the state of the target
> ports bound to the HBA you disconnected (is that the test you're doing?).
> They may stay in state Blocked forever under
> /sys/class/fc_remote_ports/rport-H:B-R (where H is your HBA number)
> because dev_loss_tmo or fast_io_fail_tmo is too high (both timers are
> located under /sys/class/fc_remote_ports/rport....).
>
> I have almost the same setup with almost the same storage (OPEN-V) from
> a pair of HP XP arrays (OEM'ized Hitachi arrays), and things are set up to
> use a maximum of 4 paths per LUN (2 per fabric); some storage experts tend
> to say that is already too much. As multipath policy I use multibus to
> distribute across the 2 fabrics.
>
> Hope all this will help
>
> you say this happens when you pull the fiber cable from the server
>
> On Fri, 2010-04-16 at 08:55 +0000, jose nuno neto wrote:
>> Hi
>>
>>
>> > Can you show us a pvdisplay or verbose vgdisplay ?
>> >
>>
>> Here goes the vgdisplay -v of one of the vgs with mirrors
>>
>> ###########################################################
>>
>> --- Volume group ---
>> VG Name vg_ora_jura
>> System ID
>> Format lvm2
>> Metadata Areas 3
>> Metadata Sequence No 705
>> VG Access read/write
>> VG Status resizable
>> MAX LV 0
>> Cur LV 4
>> Open LV 4
>> Max PV 0
>> Cur PV 3
>> Act PV 3
>> VG Size 52.79 GB
>> PE Size 4.00 MB
>> Total PE 13515
>> Alloc PE / Size 12292 / 48.02 GB
>> Free PE / Size 1223 / 4.78 GB
>> VG UUID nttQ3x-4ecP-Q6ms-jt2u-UIs4-texj-Q9Nxdt
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_arch
>> VG Name vg_ora_jura
>> LV UUID 8oUfYn-2TrP-yS6K-pcS2-cgI4-tcv1-33dSdX
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 5.00 GB
>> Current LE 1280
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:28
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_export
>> VG Name vg_ora_jura
>> LV UUID NLfQT6-36TS-DRHq-PJRf-9UDv-L8mz-HjPea2
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 5.00 GB
>> Current LE 1280
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:32
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_data
>> VG Name vg_ora_jura
>> LV UUID VtSBIL-XvCw-23xK-NVAH-DvYn-P2sE-OkZJro
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 12.00 GB
>> Current LE 3072
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:40
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_redo
>> VG Name vg_ora_jura
>> LV UUID KRHKBG-71Qv-YBsA-oJDt-igzP-EYaI-gPwcBX
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 2.00 GB
>> Current LE 512
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:48
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mimage_0
>> VG Name vg_ora_jura
>> LV UUID lQCOAt-aoK3-HBp1-xrQW-eh7L-6t94-CyAg5c
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 5.00 GB
>> Current LE 1280
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:26
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mimage_1
>> VG Name vg_ora_jura
>> LV UUID snrnPc-8FxY-ekAk-ooNe-sBws-tuI0-cTFfj3
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 5.00 GB
>> Current LE 1280
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:27
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_arch_mlog
>> VG Name vg_ora_jura
>> LV UUID ouqaCQ-Deex-iArv-xLe9-jg8b-5cLf-3SChQ1
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 4.00 MB
>> Current LE 1
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:25
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_data_mlog
>> VG Name vg_ora_jura
>> LV UUID TmE2S0-r8ST-v624-RxUn-Qppw-2l8p-jM9EC9
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 4.00 MB
>> Current LE 1
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:37
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_data_mimage_0
>> VG Name vg_ora_jura
>> LV UUID 8hR0bP-g9mR-OSXS-KdUM-ouZ6-KVdS-sfz51c
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 12.00 GB
>> Current LE 3072
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:38
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_data_mimage_1
>> VG Name vg_ora_jura
>> LV UUID fzdzrD-7p6d-XFkA-UHyr-CPad-F2nV-6QIU9p
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 12.00 GB
>> Current LE 3072
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:39
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_export_mlog
>> VG Name vg_ora_jura
>> LV UUID 29yLY8-N3Lv-46pN-1jze-50A2-wlhu-quuoMa
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 4.00 MB
>> Current LE 1
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:29
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_0
>> VG Name vg_ora_jura
>> LV UUID 1uMTsf-wPaQ-ItTy-rpma-m2La-TGZl-C4KIU4
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 5.00 GB
>> Current LE 1280
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:30
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_export_mimage_1
>> VG Name vg_ora_jura
>> LV UUID cm8Kn7-knL3-mUPL-XFvU-geMm-Wxff-32x2va
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 5.00 GB
>> Current LE 1280
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:31
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mlog
>> VG Name vg_ora_jura
>> LV UUID 811tNy-eaC5-zfZQ-1QVf-cbYP-1MIM-v6waJF
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 4.00 MB
>> Current LE 1
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:45
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_0
>> VG Name vg_ora_jura
>> LV UUID aUZAer-f5rl-1f2X-9jgY-f8CJ-jdwe-F5Pmao
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 2.00 GB
>> Current LE 512
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:46
>>
>> --- Logical volume ---
>> LV Name /dev/vg_ora_jura/lv_ora_jura_redo_mimage_1
>> VG Name vg_ora_jura
>> LV UUID gAEJym-sSbq-rC4P-AjpI-OibV-k3yI-lDx1I6
>> LV Write Access read/write
>> LV Status available
>> # open 1
>> LV Size 2.00 GB
>> Current LE 512
>> Segments 1
>> Allocation inherit
>> Read ahead sectors auto
>> - currently set to 256
>> Block device 253:47
>>
>> --- Physical volumes ---
>> PV Name /dev/mapper/mpath-dc1-b
>> PV UUID hgjXU1-2qjo-RsmS-1XJI-d0kZ-oc4A-ZKCza8
>> PV Status allocatable
>> Total PE / Free PE 6749 / 605
>>
>> PV Name /dev/mapper/mpath-dc2-b
>> PV UUID hcANwN-aeJT-PIAq-bPsf-9d3e-ylkS-GDjAGR
>> PV Status allocatable
>> Total PE / Free PE 6749 / 605
>>
>> PV Name /dev/mapper/mpath-dc2-mlog1p1
>> PV UUID 4l9Qvo-SaAV-Ojlk-D1YB-Tkud-Yjg0-e5RkgJ
>> PV Status allocatable
>> Total PE / Free PE 17 / 13
>>
>>
>>
>> > On 4/15/10, jose nuno neto <jose.neto at liber4e.com> wrote:
>> >> Hello,
>> >>
>> >> I spent more time on this, and it seems that since LVM can't write to
>> >> any PV on the volumes it has lost, it cannot record the failure of the
>> >> devices and update the metadata on the other PVs. So it hangs forever.
>> >>
>> >> Is this right?
>> >>
>> >>> Good morning,
>> >>>
>> >>> This is what I have in multipath.conf:
>> >>>
>> >>> blacklist {
>> >>>     wwid SSun_VOL0_266DCF4A
>> >>>     wwid SSun_VOL0_5875CF4A
>> >>>     devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
>> >>>     devnode "^hd[a-z]"
>> >>> }
>> >>> defaults {
>> >>>     user_friendly_names yes
>> >>> }
>> >>> devices {
>> >>>     device {
>> >>>         vendor "HITACHI"
>> >>>         product "OPEN-V"
>> >>>         path_grouping_policy group_by_node_name
>> >>>         failback immediate
>> >>>         no_path_retry fail
>> >>>     }
>> >>>     device {
>> >>>         vendor "IET"
>> >>>         product "VIRTUAL-DISK"
>> >>>         path_checker tur
>> >>>         path_grouping_policy failover
>> >>>         failback immediate
>> >>>         no_path_retry fail
>> >>>     }
>> >>> }
>> >>>
>> >>> As an example, this is one LUN. It shows [features=0], so I'd say it
>> >>> should fail right away.
>> >>>
>> >>> mpath-dc2-a (360060e8004f240000000f24000000502) dm-15 HITACHI,OPEN-V -SU
>> >>> [size=26G][features=0][hwhandler=0][rw]
>> >>> \_ round-robin 0 [prio=4][active]
>> >>>  \_ 5:0:1:0 sdu 65:64 [active][ready]
>> >>>  \_ 5:0:1:16384 sdac 65:192 [active][ready]
>> >>>  \_ 5:0:1:32768 sdas 66:192 [active][ready]
>> >>>  \_ 5:0:1:49152 sdba 67:64 [active][ready]
>> >>> \_ round-robin 0 [prio=4][enabled]
>> >>>  \_ 3:0:1:0 sdaw 67:0 [active][ready]
>> >>>  \_ 3:0:1:16384 sdbe 67:128 [active][ready]
>> >>>  \_ 3:0:1:32768 sdbi 67:192 [active][ready]
>> >>>  \_ 3:0:1:49152 sdbm 68:0 [active][ready]
>> >>>
>> >>> I think they fail, since I see these messages from LVM:
>> >>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in vg_syb_roger-lv_syb_roger_admin
>> >>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices in vg_syb_roger-lv_syb_roger_admin
>> >>>
>> >>> But for some reason LVM can't remove them. Is there any option I should
>> >>> set in lvm.conf?
>> >>>
>> >>> BestRegards
>> >>> Jose
>> >>>> Post your multipath.conf file. You may be queuing forever?
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, 2010-04-14 at 15:03 +0000, jose nuno neto wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I'm on RHEL 5.4 with
>> >>>>> lvm2-2.02.46-8.el5_4.1
>> >>>>> 2.6.18-164.2.1.el5
>> >>>>>
>> >>>>> I have a multipathed SAN connection on which I'm building LVs.
>> >>>>> It's a cluster system, and I want the LVs to switch over on failure.
>> >>>>>
>> >>>>> If I simulate a failure through the OS via
>> >>>>> /sys/bus/scsi/devices/$DEVICE/delete
>> >>>>> the LV fails and the service switches to the other node.
>> >>>>>
>> >>>>> But if I do a "real" port-down on the SAN switch, multipath reports
>> >>>>> the paths down, but LVM commands hang forever and nothing gets switched.
>> >>>>>
>> >>>>> In the logs I see multipath failing paths, and LVM reporting
>> >>>>> "Failed to remove faulty devices".
>> >>>>>
>> >>>>> Any ideas how I should "fix" it?
>> >>>>>
>> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Log device, 253:53, has failed.
>> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Device failure in vg_ora_scapa-lv_ora_scapa_redo
>> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Another thread is handling an event. Waiting...
>> >>>>>
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active paths: 0
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active paths: 0
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active paths: 0
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active paths: 0
>> >>>>>
>> >>>>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in vg_syb_roger-lv_syb_roger_admin
>> >>>>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices in vg_syb_roger-lv_syb_roger_admin
>> >>>>>
>> >>>>> Much Thanks
>> >>>>> Jose
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> linux-lvm mailing list
>> >>>>> linux-lvm at redhat.com
>> >>>>> https://www.redhat.com/mailman/listinfo/linux-lvm
>> >>>>> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> > --
>> > Sent from my mobile device
>> >
>> > Regards,
>> > Eugene Vilensky
>> > evilensky at gmail.com
>> >
>> >
>>
>
>
>