[dm-devel] Multipath and HSG80 phase 2

Nicola Ranaldo ranaldo at unina.it
Mon Dec 13 10:24:43 UTC 2004


> Indeed,
> can you audit your fixes in
> http://christophe.varoqui.free.fr/multipath-tools/multipath-tools-0.4.0.tar.bz2
> before I release it ?

OK, the tools no longer segfault, but the last thing I have to check is the
clone syscall: on my system (Slackware 10.0) I have to use fork to get the
multipathd daemon to run.
With clone, strace multipathd gives:

brk(0)                                  = 0x8051000
brk(0x8052000)                          = 0x8052000
brk(0)                                  = 0x8052000
brk(0)                                  = 0x8052000
brk(0x8056000)                          = 0x8056000
clone(child_stack=0x8055040, flags=CLONE_NEWNS) = 2443
exit_group(0)                           = ?

and the process dies...
Is the clone call necessary? Does the process run properly even if I use
fork?

> ... and report on general behaviour.

OK, some progress has been made :)))

Failover initiated by an "sg_start /dev/sgx 1" works properly, and I can do
many switches between the active and ghost paths, with a 1/2 second delay
between them, with no process disruption! Great :)

However, a failover initiated by a "restart other" on the HSG80 console
gives:

Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 655327
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 656343
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 656351
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 657367
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 657375
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658391
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658399
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658407
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658903
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 658911
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 659927
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 659935
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660951
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660959
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 660967
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 661983
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 661991
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 663007
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 663015
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664031
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664039
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664047
Dec 13 11:13:55 m3 kernel: SCSI error : <0 0 1 2> return code = 0x20000
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664791
Dec 13 11:13:55 m3 kernel: end_request: I/O error, dev sdb, sector 664799
Dec 13 11:13:55 m3 kernel: Incorrect number of segments after building list
Dec 13 11:13:55 m3 kernel: counted 8, received 1
Dec 13 11:13:55 m3 kernel: req nr_sec 1024, cur_nr_sec 8
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81523
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81524
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81525
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81526
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81527
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81528
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81529
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81530
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81531
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Buffer I/O error on device dm-1, logical block 81532
Dec 13 11:13:55 m3 kernel: lost page write due to I/O error on dm-1
Dec 13 11:13:55 m3 kernel: Incorrect number of segments after building list
Dec 13 11:13:55 m3 kernel: counted 8, received 1
Dec 13 11:13:55 m3 kernel: req nr_sec 1024, cur_nr_sec 8
Dec 13 11:14:06 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
Dec 13 11:14:06 m3 multipathd: 8:16 : tur checker reports path is down
Dec 13 11:14:06 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
Dec 13 11:14:06 m3 last message repeated 4 times
Dec 13 11:14:06 m3 multipathd: event checker startup : disk1
Dec 13 11:14:16 m3 multipathd: 8:0 : tur checker reports path is up
Dec 13 11:14:18 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
Dec 13 11:14:18 m3 last message repeated 2 times
Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
Dec 13 11:14:42 m3 kernel: counted 8, received 1
Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
Dec 13 11:14:42 m3 multipathd: devmap event on disk1
Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
Dec 13 11:14:42 m3 kernel: counted 8, received 1
Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
Dec 13 11:14:42 m3 kernel: Incorrect number of segments after building list
Dec 13 11:14:42 m3 kernel: counted 8, received 1
Dec 13 11:14:42 m3 kernel: req nr_sec 1024, cur_nr_sec 8
Dec 13 11:14:44 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
Dec 13 11:14:44 m3 last message repeated 2 times
Dec 13 11:14:44 m3 multipathd: event checker startup : disk1

After a long delay the random write operation (blocked by the failure)
restarts!

But in the log I have:

Dec 13 11:15:43 m3 kernel: SCSI error : <0 0 1 2> return code = 0x10000
Dec 13 11:16:16 m3 last message repeated 12 times
Dec 13 11:16:44 m3 last message repeated 17 times

and multipath -l -v3 gives:
0:0:1:2: sg_io failed status 0x0 0x1 0x0 0x0
0:0:1:2: Unable to get INQUIRY vpd 1 page 0x0.
disk1 (360001fe1001613800009205005470164)
[size=33 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active][first]
  \_ 0:0:0:2 sda  8:0     [ready ][active]

The second path is lost!

To double-check, issuing an sg_start on the lost path I get:
start_stop: Host_status=0x01 [DID_NO_CONNECT]
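
For reference, the "return code" values in the kernel log can be decoded by
hand. Below is a minimal Python sketch, assuming the usual Linux SCSI layout
of the 32-bit result (status byte in bits 0-7, message byte in bits 8-15,
host byte in bits 16-23, driver byte in bits 24-31) and the DID_* host byte
values from the kernel SCSI headers. The function name is made up for
illustration and is not part of multipath-tools or sg_utils.

```python
# Host byte values as defined in the Linux kernel SCSI headers (subset).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result into its four bytes (hypothetical helper)."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,          # SCSI status byte from the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": host,                     # host adapter / transport status
        "driver": (result >> 24) & 0xFF,  # driver byte
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


if __name__ == "__main__":
    # The two return codes that appear in the log above.
    for code in (0x20000, 0x10000):
        print(hex(code), decode_scsi_result(code)["host_name"])
```

Under that assumption, 0x20000 (seen at the moment of the restart) is
DID_BUS_BUSY and 0x10000 (seen afterwards) is DID_NO_CONNECT, which agrees
with the Host_status that sg_start reports.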

All this without an oops.

Thanks,

    Nicola Ranaldo



