[dm-devel] dm-multipath failover

Jimmie dm-devel at chaj.com
Mon Nov 29 18:48:38 UTC 2004


Can someone just confirm that DM-multipath is supposed to fail over in both
directions? That is: one fiber connection is pulled, and I/O fails over to the
other local FC port. Then the original FC connection is restored and the other
one is pulled, and I/O fails back. Tell me I'm not crazy.

Jim

On Mon, 29 Nov 2004, Jimmie wrote:

> I changed the debug level to 7 in the Makefile and recompiled. I don't see a
> daemon.log. Is it supposed to be in /var/log? Either way, I'll post the
> failover sequence.
>
> Multipath startup:
> Nov 28 12:27:04 nfstest1 multipathd: --------start up--------
> Nov 28 12:27:04 nfstest1 multipathd: read /etc/multipath.conf
> Nov 28 12:27:04 nfstest1 multipathd: ramfs maxsize is 94344
> Nov 28 12:27:04 nfstest1 multipathd: start DM events thread
> Nov 28 12:27:04 nfstest1 multipathd: path checkers start up
> Nov 28 12:27:04 nfstest1 multipathd: initial reconfigure multipath maps
> Nov 28 12:27:04 nfstest1 multipathd: refresh devmaps list
> Nov 28 12:27:04 nfstest1 multipathd: refresh failpaths list
> Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdc
> Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:32
> Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdd
> Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:48
> Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sde
> Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:64
> Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdf
> Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:80
> Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdg
> Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:96
> Nov 28 12:27:04 nfstest1 multipathd: set readsector0 path checker for sdh
> Nov 28 12:27:04 nfstest1 multipathd: path checker startup : 8:112
> Nov 28 12:27:04 nfstest1 multipathd: start up event loops
> Nov 28 12:27:04 nfstest1 multipathd: event checker startup : big01
>
> When I pull out the FC from port 1 of the QLogic card:
> Nov 29 10:48:55 nfstest1 kernel: qla2300 0000:03:0b.0: LIP reset occured (f8cb).
> Nov 29 10:48:57 nfstest1 kernel: qla2300 0000:03:0b.0: LOOP DOWN detected.
> Nov 29 10:49:57 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x20000
> Nov 29 10:49:57 nfstest1 kernel: end_request: I/O error, dev sdc, sector 5424
> Nov 29 10:49:57 nfstest1 kernel: end_request: I/O error, dev sdc, sector 5432
> Nov 29 10:49:57 nfstest1 multipathd: devmap event on big01
> Nov 29 10:49:57 nfstest1 multipathd: big01 : reconfigure multipath map
> Nov 29 10:50:25 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
> Nov 29 10:50:25 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
> Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 2> return code = 0x10000
> Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 3> return code = 0x10000
> Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
> Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 1> return code = 0x10000
> Nov 29 10:50:26 nfstest1 kernel: SCSI error : <1 0 0 2> return code = 0x10000
>
> Multipath detects the failure and remaps:
> Nov 29 10:50:26 nfstest1 multipathd: refresh devmaps list
> Nov 29 10:50:26 nfstest1 multipathd: refresh failpaths list
> Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:32
> Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:48
> Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:64
> Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:80
> Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:96
> Nov 29 10:50:26 nfstest1 multipathd: path checker already active : 8:112
> Nov 29 10:50:26 nfstest1 multipathd: start up event loops
> Nov 29 10:50:26 nfstest1 multipathd: event checker startup : big01
>
> After I put port 1 back in and then pull out port 2 (with a couple of
> minutes' wait in between):
> Nov 29 10:53:12 nfstest1 kernel: qla2300 0000:03:0b.1: LIP reset occured (b5b5).
> Nov 29 10:53:13 nfstest1 kernel: qla2300 0000:03:0b.1: LOOP DOWN detected.
> Nov 29 10:54:10 nfstest1 kernel: SCSI error : <2 0 0 1> return code = 0x20000
> Nov 29 10:54:10 nfstest1 kernel: end_request: I/O error, dev sdf, sector 7944
> Nov 29 10:54:10 nfstest1 kernel: end_request: I/O error, dev sdf, sector 7952
> Nov 29 10:54:11 nfstest1 multipathd: devmap event on big01
> Nov 29 10:54:11 nfstest1 multipathd: big01 : reconfigure multipath map
> Nov 29 10:54:11 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 1000
> Nov 29 10:54:11 nfstest1 kernel: lost page write due to I/O error on dm-0
> Nov 29 10:54:11 nfstest1 kernel: Aborting journal on device dm-0.
> Nov 29 10:54:12 nfstest1 kernel: ext3_abort called.
> Nov 29 10:54:12 nfstest1 kernel: EXT3-fs error (device dm-0): ext3_journal_start: Detected aborted journal
> Nov 29 10:54:12 nfstest1 kernel: Remounting filesystem read-only
>
> and then a bunch of:
> Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983554
> Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
> Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983555
> Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
> Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983556
> Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
> Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983557
> Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
> Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983558
> Nov 29 10:54:45 nfstest1 kernel: lost page write due to I/O error on dm-0
> Nov 29 10:54:45 nfstest1 kernel: Buffer I/O error on device dm-0, logical block 56983566
>
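To put numbers on the two sequences above, here is a small Python sketch (not
part of multipath-tools) that measures how long each pulled port took to show
up as a block-layer I/O error. The function names are made up for
illustration, and the year is an assumption taken from the date of this
thread, since syslog lines do not carry one.

```python
from datetime import datetime

# A few lines copied from the log excerpts in this thread.
LOG = [
    "Nov 29 10:48:57 nfstest1 kernel: qla2300 0000:03:0b.0: LOOP DOWN detected.",
    "Nov 29 10:49:57 nfstest1 kernel: end_request: I/O error, dev sdc, sector 5424",
    "Nov 29 10:53:13 nfstest1 kernel: qla2300 0000:03:0b.1: LOOP DOWN detected.",
    "Nov 29 10:54:10 nfstest1 kernel: end_request: I/O error, dev sdf, sector 7944",
]


def stamp(line, year=2004):
    """Parse the 15-character syslog timestamp; the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def loop_down_to_first_error(lines):
    """Seconds between each LOOP DOWN and the first I/O error after it."""
    down = None
    delays = []
    for line in lines:
        if "LOOP DOWN detected" in line:
            down = stamp(line)
        elif "I/O error" in line and down is not None:
            delays.append((stamp(line) - down).total_seconds())
            down = None  # only count the first error per LOOP DOWN
    return delays


print(loop_down_to_first_error(LOG))  # prints [60.0, 57.0]
```

Both pulls take about a minute to turn into I/O errors on the underlying sd
device. The difference is that only the second one goes on to produce errors
on dm-0 itself.
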
> So basically I get some SCSI-related errors. Is this normal? Does multipath
> failover only work one way? Any ideas? Please help.
>
> Jimmie
>
>
>
> On Thu, 25 Nov 2004, christophe varoqui wrote:
>
> > The daemon should log in daemon.log
> > You can push the debug level to the max and post the trace.
> >
> > In the meantime, you can also make sure you didn't apply the patchset
> > from Mike Christie which used to be appended at the tail of the -udm
> > patchset. These patches broke the event model used by the daemon.
> >
> > regards,
> > cvaroqui
> >
> > On Wednesday, 24 November 2004 at 17:10 -0500, Jims wrote:
> > > We have a Dell unit with 2 QLogic 23XX series cards which are providing
> > > multipathing to 3 EMC volumes. We're looking to have a failover setup (with
> > > /dev/sdc and /dev/sdf) so that if one of the FC connections is pulled,
> > > multipathd will reroute the path to the other card and also be able to
> > > reestablish the connection when the fiber is put back.
> > >
> > > dmsetup is able to create the device in /udev (/udev/big01) and we're able to
> > > mount it. When I pull an FC cable, the mount does indeed fail over. However,
> > > when we put it back in and pull the other, we get a bunch of SCSI errors and
> > > the filesystem gets remounted read-only. How can we remedy this? Any
> > > similar experiences and/or suggestions? Thanks.
> > >
> > > By the way, sda and sdb are the system drives. sdd, sde, sdg and sdh are other
> > > FC drives that we're not working with right now.
> > >
> > > our DMsetup table is as follows:
> > >
> > > DMsetup table <<start>>
> > > 0 1885645370 multipath 2 round-robin 1 0 /dev/sdc round-robin 1 0 /dev/sdf
> > > DMsetup table <<end>>
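For anyone reading the table line above, here is a rough Python sketch of how
it breaks down into two priority groups of one path each. The field layout
(group count, then per group: selector, path count, per-path argument count,
paths) is an assumption read off this one line, not taken from a
specification, and other device-mapper versions may lay the arguments out
differently. The function name is hypothetical.

```python
def parse_multipath_table(line):
    """Split a dmsetup multipath table line into its priority groups.

    Assumed layout (inferred from the table in this thread):
      <start> <length> multipath <num_groups>
        { <selector> <num_paths> <num_path_args> { <path> <args...> } } ...
    """
    tok = line.split()
    if len(tok) < 4 or tok[2] != "multipath":
        raise ValueError("not a multipath table line")
    start, length, num_groups = int(tok[0]), int(tok[1]), int(tok[3])
    pos = 4
    groups = []
    for _ in range(num_groups):
        selector = tok[pos]
        num_paths = int(tok[pos + 1])
        num_args = int(tok[pos + 2])
        pos += 3
        paths = []
        for _ in range(num_paths):
            # Each path is a device followed by num_args selector arguments.
            paths.append((tok[pos], tok[pos + 1:pos + 1 + num_args]))
            pos += 1 + num_args
        groups.append({"selector": selector, "paths": paths})
    return {"start": start, "length": length, "groups": groups}


table = "0 1885645370 multipath 2 round-robin 1 0 /dev/sdc round-robin 1 0 /dev/sdf"
for i, group in enumerate(parse_multipath_table(table)["groups"]):
    print(i, group["selector"], [dev for dev, _ in group["paths"]])
```

Read this way, /dev/sdc and /dev/sdf each sit alone in their own group, which
is what the failover grouping policy in the config is meant to produce.
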
> > >
> > > here is our multipath.conf:
> > >
> > > multipath.conf <<start>>
> > > defaults {
> > >         multipath_tool  "/sbin/multipath -v 0 -S"
> > >         udev_dir        /udev
> > >         polling_interval 5
> > >         default_selector        round-robin
> > >         default_selector_args   0
> > >         default_path_grouping_policy    failover
> > >         default_getuid_callout  "/sbin/scsi_id -g -u -s"
> > >         default_prio_callout    "/bin/false"
> > > }
> > >
> > > devnode_blacklist {
> > >         devnode cciss
> > >         devnode fd
> > >         devnode hd
> > >         devnode md
> > >         devnode dm
> > >         devnode sr
> > >         devnode scd
> > >         devnode st
> > >         devnode ram
> > >         devnode raw
> > >         devnode loop
> > >         devnode sda
> > >         devnode sdb
> > > }
> > > multipaths {
> > >         multipath {
> > >                 wwid    501566091000
> > >                 alias   big01
> > >                 path_grouping_policy    failover
> > >                 path_selector           round-robin
> > >         }
> > > }
> > > devices {
> > >         device {
> > >                 vendor                  "SEMC     "
> > >                 product                 "SYMMETRIX      "
> > >                 path_grouping_policy    failover
> > >                 getuid_callout          "/sbin/scsi_id -g -u -s"
> > >                 path_checker            readsector0
> > >                 path_selector           round-robin
> > >         }
> > > }
> > > multipath.conf <<end>>
> > >
> > > and finally output of multipath -v2
> > >
> > > output <<start>>
> > > #
> > > # all paths :
> > > #
> > > SEMC_____SYMMETRIX______501566091000 (1 0 0 1) sdc [ready ] (8:32) [SYMMETRIX       ]
> > > SEMC_____SYMMETRIX______5015660D1000 (1 0 0 2) sdd [ready ] (8:48) [SYMMETRIX       ]
> > > SEMC_____SYMMETRIX______501566111000 (1 0 0 3) sde [ready ] (8:64) [SYMMETRIX       ]
> > > SEMC_____SYMMETRIX______501566091000 (2 0 0 1) sdf [ready ] (8:80) [SYMMETRIX       ]
> > > SEMC_____SYMMETRIX______5015660D1000 (2 0 0 2) sdg [ready ] (8:96) [SYMMETRIX       ]
> > > SEMC_____SYMMETRIX______501566111000 (2 0 0 3) sdh [ready ] (8:112) [SYMMETRIX       ]
> > > #
> > > # all multipaths :
> > > #
> > > SEMC_____SYMMETRIX______501566091000 [SYMMETRIX       ]
> > >  \_(1 0 0 1) sdc [ready ] (8:32)
> > >  \_(2 0 0 1) sdf [ready ] (8:80)
> > > SEMC_____SYMMETRIX______5015660D1000 [SYMMETRIX       ]
> > >  \_(1 0 0 2) sdd [ready ] (8:48)
> > >  \_(2 0 0 2) sdg [ready ] (8:96)
> > > SEMC_____SYMMETRIX______501566111000 [SYMMETRIX       ]
> > >  \_(1 0 0 3) sde [ready ] (8:64)
> > >  \_(2 0 0 3) sdh [ready ] (8:112)
> > > #
> > > # device maps :
> > > #
> > > create:SEMC_____SYMMETRIX______501566091000:0 1885655040 multipath 2 round-robin 1 0 8:80 round-robin 1 0 8:32
> > > create:SEMC_____SYMMETRIX______5015660D1000:0 1885655040 multipath 2 round-robin 1 0 8:96 round-robin 1 0 8:48
> > > create:SEMC_____SYMMETRIX______501566111000:0 1885655040 multipath 2 round-robin 1 0 8:112 round-robin 1 0 8:64
> > > output <<end>>
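The device maps above name paths by major:minor, while the hand-written
dmsetup table uses /dev/sdX names. A minimal Python sketch of the conversion,
assuming the conventional Linux sd numbering (major 8, 16 minors per disk); it
only covers sda through sdp, and the helper name is hypothetical.

```python
def sd_name(devnum):
    """Map a 'major:minor' string to an sd device name (major 8 only)."""
    major, minor = (int(part) for part in devnum.split(":"))
    if major != 8 or not 0 <= minor < 256:
        raise ValueError(f"not a first-range sd device: {devnum}")
    # Each disk owns 16 minors: minor 0 is the whole disk, 1-15 are partitions.
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")


for devnum in ("8:80", "8:32"):
    print(devnum, "->", sd_name(devnum))
```

Under that assumption the first proposed map lists sdf ahead of sdc, the
reverse of the order in the dmsetup table shown earlier.
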
> > >
> > > Help please.
> > >
> > > --
> > > dm-devel mailing list
> > > dm-devel at redhat.com
> > > https://www.redhat.com/mailman/listinfo/dm-devel
> >
>



