[dm-devel] Multipaths getting cross-linked on IBM Power-8 after adding wwids

Andy D'Arcy Jewell Andy.D'ArcyJewell at csiltd.co.uk
Mon Oct 31 13:40:33 UTC 2016


Hi all,

We have been battling with this issue with help from IBM support, but I'd like to understand the situation a bit better, so thought I'd go to the source ;-)

We seem to have a work-around, but it seems rather woolly and sledgehammer-like: we re-ran dracut to rebuild the initramfs copies of the /etc/multipath/bindings and wwids files, then effectively tore down and rebuilt the multipaths and the overlying LVM storage from scratch. It appears that the copy of bindings in the initramfs assigned a different mpath id to the affected wwids than the copy in /etc/multipath. I would not have expected multipathd to be inspecting the contents of the initramfs (and I don't think it is), but I'm also not sure whether this is a symptom or a cause.
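To make the suspected mismatch concrete, here is a minimal sketch of how the two copies of the bindings file could be compared. It assumes the usual "alias wwid" one-pair-per-line layout with '#' comments, and that the initramfs copy has already been extracted to text by some means (dracut's lsinitrd tool is one option). The function names and the sample data are hypothetical and only illustrate the idea; the sample is not the real content of either file.

```python
def parse_bindings(text):
    """Return {wwid: alias} from bindings-file text ("alias wwid" per line)."""
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            alias, wwid = fields[0], fields[1]
            bindings[wwid] = alias
    return bindings


def diff_bindings(root_text, initramfs_text):
    """Return {wwid: (root_alias, initramfs_alias)} wherever the two disagree.

    A missing entry on either side is reported as None.
    """
    root = parse_bindings(root_text)
    initrd = parse_bindings(initramfs_text)
    mismatches = {}
    for wwid in sorted(set(root) | set(initrd)):
        pair = (root.get(wwid), initrd.get(wwid))
        if pair[0] != pair[1]:
            mismatches[wwid] = pair
    return mismatches


# Hypothetical sample data: the WWIDs are the two from this report, but the
# initramfs content is invented to show what a stale copy might look like.
ROOT_BINDINGS = """\
# /etc/multipath/bindings (illustrative)
mpathm 360050763008080eef80000000000002c
mpathn 360050763008080eef80000000000002d
"""

INITRAMFS_BINDINGS = """\
mpathm 360050763008080eef80000000000002d
"""

if __name__ == "__main__":
    for wwid, (root_alias, initrd_alias) in diff_bindings(
        ROOT_BINDINGS, INITRAMFS_BINDINGS
    ).items():
        print(f"{wwid}: root={root_alias} initramfs={initrd_alias}")
```

Any WWID reported by diff_bindings would be a candidate for the kind of alias disagreement described above; an empty result would suggest the two copies agree.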

The multipaths had been added several weeks earlier and had been working properly ever since, but I don't believe there had been a reboot since they were added, nor had the initramfs been updated. We had a scheduled SAN hardware maintenance restart, and later noticed strange filesystem corruption on a guest server.

The issue is that several mpath devices ended up with overlapping block devices after an interruption to half of the paths, caused by a hardware restart of one of the SAN nodes. All the block devices from mpathn ended up also being used by another map, mpathm:

Truncated output of multipath -ll:

mpathn (360050763008080eef80000000000002d) dm-13 IBM     ,2145
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 3:0:1:12 sdbc 67:96  failed undef running
  `- 1:0:1:12 sdaa 65:160 active undef running
mpathm (360050763008080eef80000000000002c) dm-12 IBM     ,2145
| |- 1:0:1:12 sdaa 65:160 active undef running
| `- 3:0:1:12 sdbc 67:96  active undef running
| |- 1:0:1:11 sdz  65:144 active undef running
| |- 3:0:1:11 sdbb 67:80  failed undef running
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 1:0:0:11 sdl  8:176  active undef running
  `- 3:0:0:11 sdan 66:112 active undef running

Note: the failed sdbb device eventually came back as active; this output was taken just after the hardware was reset, before everything had settled down again. However, the overlap never resolved itself.
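As a rough illustration of the overlap, the following sketch scans multipath -ll text and lists every block device that appears under more than one map. It is only a sketch: the regular expressions are assumptions fitted to the truncated output shown above (a "name (wwid) dm-N" header line, then path lines of the form "H:C:T:L sdX major:minor"), and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shapes, based on the truncated output in this report.
MAP_RE = re.compile(r"^(\S+)\s+\((\w+)\)\s+dm-\d+")
PATH_RE = re.compile(r"\d+:\d+:\d+:\d+\s+(\w+)\s+\d+:\d+\s")


def find_shared_paths(text):
    """Return {device: [map names]} for devices listed under 2+ maps."""
    owners = defaultdict(set)
    current_map = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            current_map = header.group(1)
            continue
        path = PATH_RE.search(line)
        if path and current_map is not None:
            owners[path.group(1)].add(current_map)
    return {dev: sorted(maps) for dev, maps in owners.items() if len(maps) > 1}


# The truncated output from this report, reproduced verbatim.
SAMPLE = """\
mpathn (360050763008080eef80000000000002d) dm-13 IBM     ,2145
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 3:0:1:12 sdbc 67:96  failed undef running
  `- 1:0:1:12 sdaa 65:160 active undef running
mpathm (360050763008080eef80000000000002c) dm-12 IBM     ,2145
| |- 1:0:1:12 sdaa 65:160 active undef running
| `- 3:0:1:12 sdbc 67:96  active undef running
| |- 1:0:1:11 sdz  65:144 active undef running
| |- 3:0:1:11 sdbb 67:80  failed undef running
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 1:0:0:11 sdl  8:176  active undef running
  `- 3:0:0:11 sdan 66:112 active undef running
"""

if __name__ == "__main__":
    for dev, maps in sorted(find_shared_paths(SAMPLE).items()):
        print(f"{dev}: {', '.join(maps)}")
```

Run against the output above, this flags sdaa, sdao, sdbc and sdm, i.e. all four paths of mpathn are also claimed by mpathm.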

Environment:
uname -a
Linux p8-srvr1 3.10.82-2042.1.pkvm2_1_1.71.ppc64 #1 SMP Fri Jul 31 09:52:38 CDT 2015 ppc64 ppc64 ppc64 GNU/Linux

cat /etc/issue
IBM_PowerKVM release 2.1.1 build 62 service (pkvm2_1_1)
Kernel \r on a \m (\l)

rpm -qa |grep multipath
device-mapper-multipath-libs-0.4.9-51.pkvm2_1.5.ppc64
device-mapper-multipath-0.4.9-51.pkvm2_1.5.ppc64

So my questions are:
a) Is this expected behaviour or a bug?
b) If a bug, is there a fix?
c) Is there any further information you need to help diagnose?

Regards,

Andy D'Arcy Jewell
Linux/FOSS Operations
CSI LTD


