[linux-lvm] LVM hangs on SAN fail

jose nuno neto jose.neto at liber4e.com
Mon Apr 19 09:21:57 UTC 2010


GoodMornings

In the meantime we upgraded RHEL to 5.5, and multipath now looks more
accurate, showing only 1 path per HBA. We have a 2-datacenter setup with
4 fabrics between them, 2 fabrics for each datacenter.

mpath-dc2-a (360060e8004f240000000f24000000502) dm-12 HITACHI,OPEN-V      -SU
[size=26G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 3:0:1:0 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:1:0 sdo 8:224 [active][ready]

I'll repeat the tests and look at the state you're describing.
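
Concretely, this is what I plan to check once I pull the cable again (the
rport names below are only placeholders; the real ones depend on the HBA and
target):

  # state of the remote ports behind the disconnected HBA (should show Blocked)
  grep . /sys/class/fc_remote_ports/rport-*/port_state

  # the timers that decide how long I/O is held back on a blocked rport
  cat /sys/class/fc_remote_ports/rport-3:0-0/dev_loss_tmo
  cat /sys/class/fc_remote_ports/rport-3:0-0/fast_io_fail_tmo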

I'm using group_by_node_name because before, with 8 links, it was a mess. It
spreads some load between the paths, but not across all of them. Anyway, that
explains the "strange" paths; I'll see how it goes now.
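
If group_by_node_name turns out to be the problem, I may try the multibus
policy you use; a rough sketch of what I'd change in the HITACHI device
stanza (untested on my side, otherwise the same as my current config):

       device {
                vendor                          "HITACHI"
                product                         "OPEN-V"
                path_grouping_policy            multibus
                failback                        immediate
                no_path_retry                   fail
       }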

Thanks
Jose

> Hi Jose,
>
> You have a total of 8 paths per LUN: 4 are marked active through HBA host5
> and the remaining 4 are marked enabled on HBA host3 (you're on 2 different
> fabrics, right?). This may be due to the fact that you use the policy
> group_by_node_name. I don't know whether this mode actually load balances
> across the 2 HBAs.
>
>
> When you pull the cable (this is the test you're doing and that is
> failing?), you say it times out forever.
> As you're using the policy group_by_node_name, which corresponds to the
> fc_transport target node name, you should look at the state of the target
> ports bound to the HBA you disconnected (is that the test you're doing?).
> Do they stay Blocked in /sys/class/fc_remote_ports/rport-H:B-R (where H is
> your HBA number) forever? That may be due to dev_loss_tmo or
> fast_io_fail_tmo being too high (both timers are located under
> /sys/class/fc_remote_ports/rport...).
>
> I have almost the same setup with almost the same storage (OPEN-V) from
> a pair of HP XPs (OEM'ed Hitachi arrays), and things are set up to use a
> maximum of 4 paths per LUN (2 per fabric); some storage experts tend to say
> that is already too much. As the multipath policy I use multibus to
> distribute the load across the 2 fabrics.
>
> Hope all this will help
>
> You say this happens when you pull the fiber cable from the server:
>
> On Fri, 2010-04-16 at 08:55 +0000, jose nuno neto wrote:
>> Hi
>>
>>
>> > Can you show us a pvdisplay or verbose vgdisplay ?
>> >
>>
>> Here is the vgdisplay -v output of one of the VGs with mirrors:
>>
>> ###########################################################
>>
>> --- Volume group ---
>>   VG Name               vg_ora_jura
>>   System ID
>>   Format                lvm2
>>   Metadata Areas        3
>>   Metadata Sequence No  705
>>   VG Access             read/write
>>   VG Status             resizable
>>   MAX LV                0
>>   Cur LV                4
>>   Open LV               4
>>   Max PV                0
>>   Cur PV                3
>>   Act PV                3
>>   VG Size               52.79 GB
>>   PE Size               4.00 MB
>>   Total PE              13515
>>   Alloc PE / Size       12292 / 48.02 GB
>>   Free  PE / Size       1223 / 4.78 GB
>>   VG UUID               nttQ3x-4ecP-Q6ms-jt2u-UIs4-texj-Q9Nxdt
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_arch
>>   VG Name                vg_ora_jura
>>   LV UUID                8oUfYn-2TrP-yS6K-pcS2-cgI4-tcv1-33dSdX
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                5.00 GB
>>   Current LE             1280
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:28
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_export
>>   VG Name                vg_ora_jura
>>   LV UUID                NLfQT6-36TS-DRHq-PJRf-9UDv-L8mz-HjPea2
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                5.00 GB
>>   Current LE             1280
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:32
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_data
>>   VG Name                vg_ora_jura
>>   LV UUID                VtSBIL-XvCw-23xK-NVAH-DvYn-P2sE-OkZJro
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                12.00 GB
>>   Current LE             3072
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:40
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_redo
>>   VG Name                vg_ora_jura
>>   LV UUID                KRHKBG-71Qv-YBsA-oJDt-igzP-EYaI-gPwcBX
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                2.00 GB
>>   Current LE             512
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:48
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_arch_mimage_0
>>   VG Name                vg_ora_jura
>>   LV UUID                lQCOAt-aoK3-HBp1-xrQW-eh7L-6t94-CyAg5c
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                5.00 GB
>>   Current LE             1280
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:26
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_arch_mimage_1
>>   VG Name                vg_ora_jura
>>   LV UUID                snrnPc-8FxY-ekAk-ooNe-sBws-tuI0-cTFfj3
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                5.00 GB
>>   Current LE             1280
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:27
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_arch_mlog
>>   VG Name                vg_ora_jura
>>   LV UUID                ouqaCQ-Deex-iArv-xLe9-jg8b-5cLf-3SChQ1
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                4.00 MB
>>   Current LE             1
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:25
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_data_mlog
>>   VG Name                vg_ora_jura
>>   LV UUID                TmE2S0-r8ST-v624-RxUn-Qppw-2l8p-jM9EC9
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                4.00 MB
>>   Current LE             1
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:37
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_data_mimage_0
>>   VG Name                vg_ora_jura
>>   LV UUID                8hR0bP-g9mR-OSXS-KdUM-ouZ6-KVdS-sfz51c
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                12.00 GB
>>   Current LE             3072
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:38
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_data_mimage_1
>>   VG Name                vg_ora_jura
>>   LV UUID                fzdzrD-7p6d-XFkA-UHyr-CPad-F2nV-6QIU9p
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                12.00 GB
>>   Current LE             3072
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:39
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_export_mlog
>>   VG Name                vg_ora_jura
>>   LV UUID                29yLY8-N3Lv-46pN-1jze-50A2-wlhu-quuoMa
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                4.00 MB
>>   Current LE             1
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:29
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_export_mimage_0
>>   VG Name                vg_ora_jura
>>   LV UUID                1uMTsf-wPaQ-ItTy-rpma-m2La-TGZl-C4KIU4
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                5.00 GB
>>   Current LE             1280
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:30
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_export_mimage_1
>>   VG Name                vg_ora_jura
>>   LV UUID                cm8Kn7-knL3-mUPL-XFvU-geMm-Wxff-32x2va
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                5.00 GB
>>   Current LE             1280
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:31
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_redo_mlog
>>   VG Name                vg_ora_jura
>>   LV UUID                811tNy-eaC5-zfZQ-1QVf-cbYP-1MIM-v6waJF
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                4.00 MB
>>   Current LE             1
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:45
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_redo_mimage_0
>>   VG Name                vg_ora_jura
>>   LV UUID                aUZAer-f5rl-1f2X-9jgY-f8CJ-jdwe-F5Pmao
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                2.00 GB
>>   Current LE             512
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:46
>>
>>   --- Logical volume ---
>>   LV Name                /dev/vg_ora_jura/lv_ora_jura_redo_mimage_1
>>   VG Name                vg_ora_jura
>>   LV UUID                gAEJym-sSbq-rC4P-AjpI-OibV-k3yI-lDx1I6
>>   LV Write Access        read/write
>>   LV Status              available
>>   # open                 1
>>   LV Size                2.00 GB
>>   Current LE             512
>>   Segments               1
>>   Allocation             inherit
>>   Read ahead sectors     auto
>>   - currently set to     256
>>   Block device           253:47
>>
>>   --- Physical volumes ---
>>   PV Name               /dev/mapper/mpath-dc1-b
>>   PV UUID               hgjXU1-2qjo-RsmS-1XJI-d0kZ-oc4A-ZKCza8
>>   PV Status             allocatable
>>   Total PE / Free PE    6749 / 605
>>
>>   PV Name               /dev/mapper/mpath-dc2-b
>>   PV UUID               hcANwN-aeJT-PIAq-bPsf-9d3e-ylkS-GDjAGR
>>   PV Status             allocatable
>>   Total PE / Free PE    6749 / 605
>>
>>   PV Name               /dev/mapper/mpath-dc2-mlog1p1
>>   PV UUID               4l9Qvo-SaAV-Ojlk-D1YB-Tkud-Yjg0-e5RkgJ
>>   PV Status             allocatable
>>   Total PE / Free PE    17 / 13
>>
>>
>>
>> > On 4/15/10, jose nuno neto <jose.neto at liber4e.com> wrote:
>> >> hellos
>> >>
>> >> I spent more time on this, and it seems that since LVM can't write to any
>> >> PV on the volumes it has lost, it cannot record the failure of the devices
>> >> and update the metadata on the other PVs. So it hangs forever.
>> >>
>> >> Is this right?
>> >>
>> >>> GoodMornings
>> >>>
>> >>> This is what I have in multipath.conf:
>> >>>
>> >>> blacklist {
>> >>>         wwid SSun_VOL0_266DCF4A
>> >>>         wwid SSun_VOL0_5875CF4A
>> >>>         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
>> >>>         devnode "^hd[a-z]"
>> >>> }
>> >>> defaults {
>> >>>                 user_friendly_names             yes
>> >>> }
>> >>> devices {
>> >>>        device {
>> >>>                 vendor                          "HITACHI"
>> >>>                 product                         "OPEN-V"
>> >>>                 path_grouping_policy            group_by_node_name
>> >>>                 failback                        immediate
>> >>>                 no_path_retry                   fail
>> >>>        }
>> >>>        device {
>> >>>                 vendor                          "IET"
>> >>>                 product                         "VIRTUAL-DISK"
>> >>>                 path_checker                    tur
>> >>>                 path_grouping_policy            failover
>> >>>                 failback                        immediate
>> >>>                 no_path_retry                   fail
>> >>>        }
>> >>> }
>> >>>
>> >>> As an example, this is one LUN. It shows [features=0], so I'd say it
>> >>> should fail right away.
>> >>>
>> >>> mpath-dc2-a (360060e8004f240000000f24000000502) dm-15 HITACHI,OPEN-V
>> >>> -SU
>> >>> [size=26G][features=0][hwhandler=0][rw]
>> >>> \_ round-robin 0 [prio=4][active]
>> >>>  \_ 5:0:1:0     sdu  65:64  [active][ready]
>> >>>  \_ 5:0:1:16384 sdac 65:192 [active][ready]
>> >>>  \_ 5:0:1:32768 sdas 66:192 [active][ready]
>> >>>  \_ 5:0:1:49152 sdba 67:64  [active][ready]
>> >>> \_ round-robin 0 [prio=4][enabled]
>> >>>  \_ 3:0:1:0     sdaw 67:0   [active][ready]
>> >>>  \_ 3:0:1:16384 sdbe 67:128 [active][ready]
>> >>>  \_ 3:0:1:32768 sdbi 67:192 [active][ready]
>> >>>  \_ 3:0:1:49152 sdbm 68:0   [active][ready]
>> >>>
>> >>> I think they fail, since I see these messages from LVM:
>> >>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in
>> >>> vg_syb_roger-lv_syb_roger_admin
>> >>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices
>> >>> in vg_syb_roger-lv_syb_roger_admin
>> >>>
>> >>> But for some reason LVM can't remove them; is there any option I should
>> >>> have in lvm.conf?
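>> >>>
>> >>> I was thinking of something along these lines in the activation section
>> >>> of lvm.conf, though I'm not sure of the exact option names for this lvm2
>> >>> version (later releases call the device policy mirror_image_fault_policy):
>> >>>
>> >>> activation {
>> >>>         # drop a failed mirror leg instead of blocking on it
>> >>>         mirror_device_fault_policy = "remove"
>> >>>         # likewise for a failed mirror log device
>> >>>         mirror_log_fault_policy = "remove"
>> >>> }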
>> >>>
>> >>> BestRegards
>> >>> Jose
>> >>>> Post your multipath.conf file; you may be queuing forever?
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, 2010-04-14 at 15:03 +0000, jose nuno neto wrote:
>> >>>>> Hi2all
>> >>>>>
>> >>>>> I'm on RHEL 5.4 with
>> >>>>> lvm2-2.02.46-8.el5_4.1
>> >>>>> 2.6.18-164.2.1.el5
>> >>>>>
>> >>>>> I have a multipathed SAN connection with which I'm building LVs.
>> >>>>> It's a cluster system, and I want the LVs to switch on failure.
>> >>>>>
>> >>>>> If I simulate a failure through the OS via
>> >>>>> /sys/bus/scsi/devices/$DEVICE/delete
>> >>>>> I get an LV failure and the service switches to the other node.
>> >>>>>
>> >>>>> But if I do a "real" port-down on the SAN switch, multipath reports the
>> >>>>> path down, but LVM commands hang forever and nothing gets switched.
>> >>>>>
>> >>>>> From the logs I see multipath failing paths, and LVM "Failed to remove
>> >>>>> faulty devices".
>> >>>>>
>> >>>>> Any ideas on how I should "fix" it?
>> >>>>>
>> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Log device, 253:53, has
>> >>>>> failed.
>> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Device failure in
>> >>>>> vg_ora_scapa-lv_ora_scapa_redo
>> >>>>> Apr 14 16:02:45 dc1-x6250-a lvm[15622]: Another thread is handling an
>> >>>>> event.  Waiting...
>> >>>>>
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active
>> >>>>> paths: 0
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-a: remaining active
>> >>>>> paths: 0
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active
>> >>>>> paths: 0
>> >>>>> Apr 14 16:02:52 dc1-x6250-a multipathd: mpath-dc1-b: remaining active
>> >>>>> paths: 0
>> >>>>>
>> >>>>> Apr 14 16:03:05 dc1-x6250-a lvm[15622]: Device failure in
>> >>>>> vg_syb_roger-lv_syb_roger_admin
>> >>>>> Apr 14 16:03:14 dc1-x6250-a lvm[15622]: Failed to remove faulty devices
>> >>>>> in vg_syb_roger-lv_syb_roger_admin
>> >>>>>
>> >>>>> Much Thanks
>> >>>>> Jose
>> >>>>>
>> >>
>> >
>> > --
>> > Sent from my mobile device
>> >
>> > Regards,
>> > Eugene Vilensky
>> > evilensky at gmail.com
>> >
>> >
>>
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>




More information about the linux-lvm mailing list