[dm-devel] an interesting note on LUN coalescing

Brian Bunker brian at purestorage.com
Thu May 15 17:23:06 UTC 2014


I will capture both the page 0x83 and page 0x80 data for the path when the problem happens. In our case the two pages always agree: the page 0x83 data carries the additional encoding and OUI information, but the bytes after that prefix (3624a9370) are exactly the same as the page 0x80 serial. What I am wondering about is how the path gets its wwid when it comes back.
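To make the expected relationship concrete, here is a minimal sketch of the check, assuming the wwid is always the 3624a9370 prefix followed by the page 0x80 serial in lowercase, as it is in the examples in this thread. The helper name wwid_matches_serial is hypothetical and is not part of multipath-tools.

```python
# Hypothetical helper, not part of multipath-tools.
# Assumption: a healthy path has wwid == prefix + lowercase(serial),
# which is what the wwids and serials quoted in this thread look like.
WWID_PREFIX = "3624a9370"


def wwid_matches_serial(wwid: str, serial: str, prefix: str = WWID_PREFIX) -> bool:
    """Return True if the wwid is the prefix followed by the serial."""
    wwid = wwid.strip().lower()
    serial = serial.strip().lower()
    return wwid.startswith(prefix) and wwid[len(prefix):] == serial


if __name__ == "__main__":
    # sdk from the failing case: the devmap ends in 004, the serial in 002.
    print(wwid_matches_serial(
        "3624a9370c90d0d631ef8783e00010004",
        "C90D0D631EF8783E00010002",
    ))  # prints False
```

A path for which this returns False is one that has been coalesced under a devmap whose wwid does not match its own serial.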

In our case a controller comes back online after some operation. The initiator re-discovers all the LUNs, and an sd device is created for each LUN path that came back. I believe the serial of the sd device is known to the initiator at that point. multipathd then picks up that sd device and derives a wwid for it. I would expect that wwid to be just the serial number with some bytes prepended, but that is not what we see in the failing case: the path wwid and the path serial point to different volumes on the array. I expect the sd device to report the same data in page 0x83 and page 0x80, both pointing to the same volume on the array, but I will verify that with more logging.
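For the extra logging, a small script could flag the bad paths automatically. The following is only a sketch: it assumes the log contains both the stock "path added to devmap" message and the "path has a serial" line I added to multipathd/main.c (both appear in the quoted log below), and it assumes a healthy wwid ends with the lowercase serial. The function name find_mismatches is hypothetical.

```python
import re

# Stock multipathd message, as seen in the quoted log.
ADDED_RE = re.compile(r"multipathd: (\S+) path added to devmap (\S+)")
# Custom debug line added locally to multipathd/main.c (not upstream).
SERIAL_RE = re.compile(r"multipathd: (\S+) \[\d+:\d+\]: path has a serial (\S+)")


def find_mismatches(lines):
    """Return (device, devmap wwid, serial) for paths whose serial
    does not match the devmap they were most recently added to."""
    devmap_of = {}   # device name -> wwid of the devmap it was added to
    mismatches = []
    for line in lines:
        m = ADDED_RE.search(line)
        if m:
            devmap_of[m.group(1)] = m.group(2).lower()
            continue
        m = SERIAL_RE.search(line)
        if m:
            dev, serial = m.group(1), m.group(2).lower()
            wwid = devmap_of.get(dev)
            # Assumption: a healthy wwid ends with the page 0x80 serial.
            if wwid is not None and not wwid.endswith(serial):
                mismatches.append((dev, wwid, serial))
    return mismatches


if __name__ == "__main__":
    sample = [
        "May 15 05:09:55 rb9init4 multipathd: sdk path added to devmap 3624a9370c90d0d631ef8783e00010004",
        "May 15 05:09:55 rb9init4 multipathd: sdk [8:160]: path has a serial C90D0D631EF8783E00010002",
    ]
    for dev, wwid, serial in find_mismatches(sample):
        print(f"{dev}: devmap {wwid} does not match serial {serial}")
```

The script relies on the serial line following the "path added" line for the same device, which is the order in which they appear in the log below.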

As you can see from my previous posts on this issue (they go back some time), the page 0x83 and page 0x80 data from sg3_utils point to the same LUN, but that LUN is different from the one behind the dm device the path is coalesced under.

Thanks,
Brian

On May 15, 2014, at 10:07 AM, Stewart, Sean <Sean.Stewart at netapp.com> wrote:

> Hi Brian,
> 
> I responded to your other email before I saw this one, but I wanted to
> see if I could help, here, too. 
> 
> On Thu, 2014-05-15 at 09:35 -0700, Brian Bunker wrote:
>> I have added a line to multipathd/main.c to see if I could get some more information when the dm device gets the wrong LUNs underneath. So I have a dm device:
>> 
>> 3624a9370c90d0d631ef8783e00010004 dm-2 PURE,FlashArray
>> size=500G features='0' hwhandler='0' wp=rw
>> `-+- policy='round-robin 0' prio=0 status=active
>>  |- 3:0:0:3 sdav 66:240 active undef running
>>  |- 0:0:0:3 sdbc 67:96  active undef running
>>  |- 6:0:0:3 sdbt 68:112 active undef running
>>  |- 5:0:0:3 sdbu 68:128 active undef running
>>  |- 7:0:0:1 sdal 66:80  active undef running
>>  |- 2:0:0:1 sdb  8:16   active undef running
>>  `- 4:0:0:1 sdk  8:160  active undef running
>> 
>> I can see that when sdk is added to the dm, it is added to a dm ending in 004, but the serial number of sdk clearly ends in 002:
>> 
>> May 15 05:09:55 rb9init4 multipathd: sdk: add path (uevent)
>> May 15 05:09:55 rb9init4 multipathd: 3624a9370c90d0d631ef8783e00010004: load table [0 1048576000 multipath 0 0 1 1 round-robin 0 7 1 66:240 1 67:96 1 68:112 1 68:128 1 66:80 1 8:16 1 8:160 1]
>> May 15 05:09:55 rb9init4 multipathd: sdk path added to devmap 3624a9370c90d0d631ef8783e00010004
>> May 15 05:09:55 rb9init4 multipathd: sdk [8:160]: path has a serial C90D0D631EF8783E00010002
>> 
>> I am printing out pp->serial to get the serial number of sdk. 
> 
> pp->serial actually plays no part in determining the multipath device.
> It is queried from VPD page 0x80, via the function get_serial, whereas
> the wwid is obtained from VPD page 0x83, through the function get_uid().
> The pp->wwid attribute ultimately is what it compares to determine the
> multipath device.
> 
> Perhaps when you do this, we should check what the device reports as the
> wwid in inquiry page 0x83?  If you have sg3_utils on your system, you
> can do that through sg_vpd -p 0x83 <device>.  I'd be interested in
> knowing if the page 0x83 and the serial in 0x80 are disagreeing at the
> time it's being queried by multipathd.
> 
> 
> Thanks,
> Sean Stewart

Brian Bunker
brian at purestorage.com






