[dm-devel] Re: multipath-tools support for SGI TP9700...

Thu Jun 26 04:49:18 UTC 2008

Christophe,
Unfortunately, the TP9700 does not appear to work with the multipath 
device settings you provided. 

In our configuration, the host (a Sun X4600 running RHEL 5.2) is 
connected to the TP9700 over two Fibre Channel connections:

No FC switches are used, just simple direct HBA-to-SP connectivity with 
two HBAs and two Storage Processors.  LUNs on the RAID are assigned to 
either SPA or SPB to spread the workload across the SPs and the 
Fibre Channel connections.

The TP9700 can be configured to present the storage to a host by setting 
the "Storage Array Host Type" (Linux, SGIRDAC, SGIAVT, Windows, etc).  For 
my tests, I've been experimenting with Linux and SGIRDAC.  I have been 
unable to determine which failover method the "Linux" host type uses, 
but I thought I had come across an article that said the Linux type is 
basic AVT.  I could be mistaken.

With the TP9700 host type set to "Linux", I then set up 
/etc/multipath.conf to mimic the defaults for the TP9500:

       device {
               vendor                  "SGI"
               product                 "TP9[457]00"
               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
               prio_callout            "/sbin/mpath_prio_tpc /dev/%n"
               features                "0"
               hardware_handler        "0"
               path_grouping_policy    group_by_prio
               failback                immediate
               rr_weight               uniform
               rr_min_io               1000
               path_checker            tur
       }
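
As a quick sanity check on the product pattern in the stanza above, the 
following Python sketch tests which model strings "TP9[457]00" would 
match.  It assumes the pattern is applied as an unanchored regular 
expression search, which is my understanding of how multipath-tools 
treats vendor/product strings.  The sample product strings are 
hypothetical, not taken from the array's actual inquiry data, and 
product_matches is just an illustrative helper name.

```python
import re

# Product pattern copied from the device stanza above.
PRODUCT_PATTERN = re.compile(r"TP9[457]00")

def product_matches(product: str) -> bool:
    """Return True if the pattern is found anywhere in the product string.

    Assumption: unanchored search, mirroring how the pattern is believed
    to be applied by multipath-tools.
    """
    return PRODUCT_PATTERN.search(product) is not None

# Hypothetical product strings, for illustration only.
for name in ("TP9400", "TP9500", "TP9700", "TP9300"):
    print(name, product_matches(name))
```

If the array reports a product string that differs from these (extra 
characters are fine with an unanchored search, a different model number 
is not), the stanza would not apply and the built-in defaults would be 
used instead.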

This configuration ran OK for a while, then began to log multipath 
failures, and eventually I/O buffer errors.  All LUNs on one SP trespassed 
to the other SP, and I had to manually place each trespassed LUN back to 
its primary path.

Changing the TP9700 host type to SGIRDAC and then trying the 
configuration you provided caused the host not to see the ghost path, so 
I effectively ended up with a single path.  Disconnecting an FC 
connection made all of the LUNs assigned to the associated SP invisible 
to the host.

I modified multipath.conf a little:

        device {
                vendor                  "SGI"
                product                 "TP9700"
                path_grouping_policy    failover
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                features                "1 queue_if_no_path"
                path_checker            rdac
                prio_callout            "/sbin/mpath_prio_tpc /dev/%n"
                hardware_handler        "1 rdac"
                prio                    rdac
                failback                immediate
        }

This worked OK, but I see lots of SCSI sense key errors:

Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error
Jun 23 12:16:42 p4dbl03 kernel:     <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Jun 23 12:16:42 p4dbl03 kernel:

I see those errors regardless of how I configure the RAID and 
multipath.conf, which is worrisome.  I especially see them if I run 
'fdisk -l'. 
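
To get a feel for how often these messages occur and on which devices, a 
small Python sketch like the following could tally them from syslog.  It 
assumes the kernel log lines have the same shape as the excerpt above.  
The function name count_recovered_errors is only illustrative.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, e.g.:
#   Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error
SENSE_LINE = re.compile(
    r"kernel: (sd[a-z]+): Current: sense key: Recovered Error"
)

def count_recovered_errors(lines):
    """Count 'Recovered Error' sense-key messages per sd device."""
    counts = Counter()
    for line in lines:
        match = SENSE_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample input taken from the log excerpt above.
sample = [
    "Jun 23 12:16:42 p4dbl03 kernel: sdbk: Current: sense key: Recovered Error",
    "Jun 23 12:16:42 p4dbl03 kernel:     <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1",
]
for device, count in count_recovered_errors(sample).items():
    print(device, count)
```

Passing an open log file instead of the sample list (for example the 
file object for /var/log/messages, assuming that is where kernel 
messages are logged on the host) would give a per-device count that can 
be compared against the SP each path belongs to.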

Disconnecting an FC cable on one HBA caused the associated volumes to 
trespass to the other SP; however, during this process I noticed buffer 
I/O errors.  I also noticed that the trespassed LUNs did not fail back to 
their original SP when the FC cable was reconnected.  Am I to assume that 
RDAC or other multipath software will not tell the storage to fail back 
trespassed LUNs?

Your assistance is appreciated,

- Kevin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 14003 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20080626/46d3d5b6/attachment.jpe>