[dm-devel] [PATCH]: multipath: fail path on zero size devices

Menny_Hamburger at Dell.com Menny_Hamburger at Dell.com
Thu Feb 10 10:19:27 UTC 2011


Hi,

I am sending this as a continuation of the situation described in the following patch:
https://patchwork.kernel.org/patch/62094

We are using MD32xxi storage arrays (ISCSI, RDAC) in a clustering environment where we have two machines logged into both controllers of the storage array.

This is the output of multipath -ll on one of the machines when all is well.
lun1 (36842b2b00063c13d000003594ce9f82b) dm-1 DELL,MD32xxi
[size=1.0T][features=2 pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=200][active]
\_ 6:0:0:1  sdf 8:80  [active][ready]
\_ 5:0:0:1  sdh 8:112 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 8:0:0:1  sdj 8:144 [active][ghost]
\_ 7:0:0:1  sdk 8:160 [active][ghost]
lun0 (36842b2b000571923000003933b4c0d04) dm-0 DELL,MD32xxi
[size=1.0T][features=2 pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=200][active]
\_ 8:0:0:0  sdd 8:48  [active][ready]
\_ 7:0:0:0  sdc 8:32  [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 6:0:0:0  sdb 8:16  [active][ghost]
\_ 5:0:0:0  sde 8:64  [active][ghost]

When I disconnect the switch sitting between the machines and the storage array (while there is I/O to the devices) and rescan the storage during this period, I get READ_CAPACITY failures, and as a result the devices end up with a zero size. In this case, not only do we get a ping-pong between path groups on the same machine, we also get a ping-pong of LUN ownership between the two machines, which causes havoc in our system.

This was the output of multipath -ll during the failure:
lun1 (36842b2b00063c13d000003594ce9f82b) dm-1 ,
[size=1.0T][features=2 pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][active]
\_ #:#:#:# sdf 8:80  [active][undef]
\_ #:#:#:# sdh 8:112 [active][undef]
\_ round-robin 0 [prio=0][enabled]
\_ #:#:#:# sdj 8:144 [active][undef]
\_ #:#:#:# sdk 8:160 [active][undef]
lun0 (36842b2b000571923000003933b4c0d04) dm-0 ,
[size=1.0T][features=2 pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][enabled]
\_ #:#:#:# sdd 8:48  [failed][undef]
\_ #:#:#:# sdc 8:32  [failed][undef]
\_ round-robin 0 [prio=0][enabled]
\_ #:#:#:# sdb 8:16  [failed][undef]
\_ #:#:#:# sde 8:64  [failed][undef]

Instead of refusing to let zero-size paths into the map, we should let them in but fail them, even though the path checker reports the path as up.
The following patch is over multipathd in device-mapper-multipath-0.4.7-42.el5 (RHEL56):

--- a/multipathd/main.c	2010-12-07 08:02:23.000000000 +0200
+++ b/multipathd/main.c	2011-02-10 10:51:57.000000000 +0200
@@ -1050,4 +1050,11 @@
 					   &(pp->checker.timeout));
 		newstate = checker_check(&pp->checker);
+		if (newstate != PATH_DOWN) {
+			unsigned long long size = 0;
+
+			sysfs_get_size(sysfs_path, pp->dev, &size);
+			if (size == 0)
+				newstate = PATH_DOWN;
+		}
 	}
With the path marked down, the path groups no longer compete for LUN ownership and things remain stable.
In addition, with the ping-pong running wild I need to rescan again after the storage comes back up; with this fix I do not.

Menny



Menny Hamburger
Engineer
Dell | IDC
office +972 97698789,  fax +972 97698889
Dell IDC. 4 Hacharoshet St, Raanana 43657, Israel


