[dm-devel] Multipath and HSG80 phase 2
christophe varoqui
christophe.varoqui at free.fr
Mon Dec 13 11:34:58 UTC 2004
As James Smart suggested, you can try decreasing the HBA driver's nodev
timeout value to 1 sec.
I have run the cable unplug / plug test regularly and it works reliably
for me. I'll try the controller restart test tomorrow.
regards,
cvaroqui
On Monday 13 December 2004 at 11:24 +0100, Nicola Ranaldo wrote:
> > Indeed,
> > can you audit your fixes in
> > http://christophe.varoqui.free.fr/multipath-tools/multipath-tools-0.4.0.tar.bz2
> > before I release it ?
>
> OK, the tools no longer segfault, but the last thing I have to check is
> the clone syscall: on my system (Slackware 10.0) I have to use fork to
> get the multipathd daemon to run.
> With clone, strace multipathd gives:
>
> brk(0) = 0x8051000
> brk(0x8052000) = 0x8052000
> brk(0) = 0x8052000
> brk(0) = 0x8052000
> brk(0x8056000) = 0x8056000
> clone(child_stack=0x8055040, flags=CLONE_NEWNS) = 2443
> exit_group(0) = ?
>
> and the process dies...
> Is the clone call necessary? Does the process run properly even if I use
> fork?
>
> > ... and report on general behaviour.
>
> OK, some progress has been made :)))
>
> Failover initiated by an "sg_start /dev/sgx 1" works properly! I can do
> a lot of switches between the active and ghost paths, with a 1/2 second
> delay between them, with no process disruption! Great :)
>
> However, a failover initiated by a "restart other" on the HSG80 console
> gives:
>
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 655327
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 656343
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 656351
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 657367
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 657375
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658391
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658399
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658407
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658903
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658911
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 659927
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 659935
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660951
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660959
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660967
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 661983
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 661991
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 663007
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 663015
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664031
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664039
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664047
> Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664791
> Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664799
> Dec 13 11:13:55 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:13:55 m3 kernel: counted 8, received 1
> Dec 13 11:13:55 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81523
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81524
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81525
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81526
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81527
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81528
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81529
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81530
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81531
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81532
> Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
> Dec 13 11:13:55 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:13:55 m3 kernel: counted 8, received 1
> Dec 13 11:13:55 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:06 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:06 m3 multipathd: 8:16 : tur checker reports path is down
> Dec 13 11:14:06 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:06 m3 last message repeated 4 times
> Dec 13 11:14:06 m3 multipathd: event checker startup : disk1
> Dec 13 11:14:16 m3 multipathd: 8:0 : tur checker reports path is up
> Dec 13 11:14:18 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:18 m3 last message repeated 2 times
> Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:14:42 m3 kernel: counted 8, received 1
> Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:42 m3 multipathd: devmap event on disk1
> Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:14:42 m3 kernel: counted 8, received 1
> Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
> Dec 13 11:14:42 m3 kernel: counted 8, received 1
> Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
> Dec 13 11:14:44 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:14:44 m3 last message repeated 2 times
> Dec 13 11:14:44 m3 multipathd: event checker startup : disk1
>
> After a long delay, the random write operation (blocked by the failure)
> restarts!
>
> But in the log I have:
>
> Dec 13 11:15:43 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
> Dec 13 11:16:16 m3 last message repeated 12 times
> Dec 13 11:16:44 m3 last message repeated 17 times
>
> and multipath -l -v3 gives:
> 0:0:1:2: sg_io failed status 0x0 0x1 0x0 0x0
> 0:0:1:2: Unable to get INQUIRY vpd 1 page 0x0.
> disk1 (360001fe1001613800009205005470164)
> [size=33 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active][first]
> \_ 0:0:0:2 sda 8:0 [ready ][active]
>
> The second path is lost!
>
> To double-check, issuing an sg_start on the lost path gives:
> start_stop: Host_status=0x01 [DID_NO_CONNECT]
>
> All this happens without an oops.
>
> thanks
>
> Nicola Ranaldo
>
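A note for reading the kernel log quoted above: the "return code" printed by
the SCSI layer packs several one-byte fields into one word, and the host byte
sits in bits 16-23. The following is only a minimal sketch, assuming the
standard Linux SCSI result layout and host-byte values; the function and table
names are made up for illustration and are not part of multipath-tools.

```python
# Host-byte values as defined by the Linux SCSI midlayer (assumed layout:
# status in bits 0-7, msg in 8-15, host in 16-23, driver in 24-31).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four one-byte fields."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def host_byte_name(result):
    """Return the symbolic name of the host byte, or a hex fallback."""
    host = decode_scsi_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, "unknown (0x%02x)" % host)


if __name__ == "__main__":
    # The two return codes that appear in the quoted log.
    for code in (0x20000, 0x10000):
        print("0x%05x -> %s" % (code, host_byte_name(code)))
```

Under that assumption, 0x20000 decodes to DID_BUS_BUSY (seen while the
controller restarts) and 0x10000 to DID_NO_CONNECT, which agrees with the
Host_status=0x01 [DID_NO_CONNECT] that sg_start reports on the lost path.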
--
christophe varoqui <christophe.varoqui at free.fr>