<div dir="ltr"><div>OK, I guess those changes can come incrementally on top of this patch then.<br><br></div>Applied.<br><div><div><br></div>Christophe Varoqui<br></div><a href="http://www.opensvc.com">www.opensvc.com</a><br></div>
<div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jun 9, 2014 at 10:22 PM, Benjamin Marzinski <span dir="ltr"><<a href="mailto:bmarzins@redhat.com" target="_blank">bmarzins@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Thu, May 15, 2014 at 11:45:40PM +0200, Christophe Varoqui wrote:<br>
> Ben,<br>
> I'd need your ack on this one.<br>
> Best regards,<br>
> Christophe Varoqui<br>
<br>
</div>Sorry I dropped the ball on this one.<br>
<br>
I'm o.k. with this patch. The biggest issue I have with it has nothing<br>
to do with its correctness, but with rlookup_wwid()'s use of scan_device.<br>
Previously, the only scan_device call always failed. Now we scan every<br>
device name, but we don't ever get anything out of it. First off, if we<br>
find a match, we will never use the id. Second, if we don't find a match we<br>
return the id of the alias we were looking for, but if we do find a<br>
match we return the next id after the one we were looking for (which is<br>
completely pointless).<br>
<br>
It seems like we could just make rlookup_wwid() return success or failure,<br>
and then call scan_device() from use_existing_alias() if we need to, and<br>
take out a bunch of pointless work that rlookup_wwid() is doing.<br>
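<br>
A rough sketch of that split, not the actual libmultipath code: the<br>
"_sketch" names, the in-memory bindings table, the enum and the simplified<br>
signatures are all assumptions made for illustration. The lookup only<br>
reports whether the alias is bound, and the id is parsed by the caller<br>
only when the alias turns out to be free.<br>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the bindings file: alias -> wwid. */
struct binding {
	const char *alias;
	const char *wwid;
};

/* Hypothetical outcomes for the caller; not taken from the real code. */
enum alias_action { KEEP_ALIAS, ADD_BINDING, PICK_NEW_ALIAS };

/*
 * Simplified stand-in for scan_device(): decode the letters after the
 * prefix as a base-26 id ("a" = 1, "z" = 26, "aa" = 27). Returns -1 if
 * the alias does not look like <prefix><letters>. No overflow checking.
 */
static int scan_device_sketch(const char *alias, const char *prefix)
{
	size_t plen;
	const char *c;
	int id = 0;

	assert(alias && prefix);
	plen = strlen(prefix);
	if (strncmp(alias, prefix, plen) != 0 || alias[plen] == '\0')
		return -1;
	for (c = alias + plen; *c; c++) {
		if (*c < 'a' || *c > 'z')
			return -1;
		id = id * 26 + (*c - 'a' + 1);
	}
	return id;
}

/* Reverse lookup that only reports success (1) or failure (0). */
static int rlookup_wwid_sketch(const struct binding *tbl, size_t n,
			       const char *alias, const char **wwid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].alias, alias) == 0) {
			*wwid = tbl[i].wwid;
			return 1;
		}
	}
	return 0;
}

/* Decide what to do with the alias the map already had at boot. */
static enum alias_action
use_existing_alias_sketch(const struct binding *tbl, size_t n,
			  const char *wwid, const char *alias_old,
			  const char *prefix, int *id)
{
	const char *bound = NULL;

	*id = -1;
	if (rlookup_wwid_sketch(tbl, n, alias_old, &bound))
		/* Alias is in the bindings: keep it only if it is ours. */
		return strcmp(bound, wwid) == 0 ? KEEP_ALIAS : PICK_NEW_ALIAS;

	/* Alias is free: this is the only place the id is needed. */
	*id = scan_device_sketch(alias_old, prefix);
	return *id > 0 ? ADD_BINDING : PICK_NEW_ALIAS;
}
```

With a table binding mpatha and mpathb, asking about a free alias such as<br>
"mpathc" is the only case that ends up parsing an id.<br>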
<br>
-Ben<br>
<div class=""><br>
><br>
> On Thu, May 15, 2014 at 9:21 PM, Stewart, Sean<br>
</div><div><div class="h5">> <<a href="mailto:Sean.Stewart@netapp.com">Sean.Stewart@netapp.com</a>> wrote:<br>
><br>
> Ping... Any additional comments or suggestions for this patch?<br>
> Bumping in case it got lost in the backlog. :)<br>
> On Fri, 2014-04-11 at 17:01 +0000, Stewart, Sean wrote:<br>
> > On Fri, 2014-04-11 at 17:03 +0100, Bryn M. Reeves wrote:<br>
> > > On Fri, Mar 28, 2014 at 09:01:14PM +0000, Stewart, Sean wrote:<br>
> > > > When a system is booted to the SAN, a condition can occur where one<br>
> > > > user friendly name is given to a disk during boot, but multipathd tries<br>
> > > > to allocate a different one after boot. If the second alias is already<br>
> > > > used by another device, multipathd can't rename it. Multipathd then has<br>
> > > > incorrect information about the alias/wwid relationships, which can<br>
> > > > result in paths being added to the wrong map.<br>
> > ><br>
> > > This should only happen if the initramfs and root file system have<br>
> > > inconsistent multipath configurations (either multipath.conf or bindings<br>
> > > / wwids file mismatched). That's not really a valid configuration for<br>
> > > the system to be in and leads to the type of problems you describe.<br>
> ><br>
> > It is true that it only happens if they are out of sync. We tried<br>
> > remaking the initramfs to fix the problem, but it didn't help.<br>
> > ><br>
> > > > This patch works around this problem by first trying to use the alias<br>
> > > > already bound to a device during boot. If the bindings file has that<br>
> > > > alias bound to a different device, it'll auto generate a new alias to<br>
> > > > rename it to.<br>
> > ><br>
> > > To be honest I'd prefer to see this cause an error. These types of<br>
> > > configurations currently run the risk of silent data corruption - I'd<br>
> > > much rather deal with a system that refuses to boot due to an out of<br>
> > > date initramfs image than one that quietly remaps paths in unexpected<br>
> > > ways.<br>
> ><br>
> > The issue, though, is that the system does not refuse to boot. In the<br>
> > case we saw, it booted anyway, our QA engineer ran a test, and it ended<br>
> > with a data corruption. A user could perform a fresh installation, map<br>
> > new luns, reboot, and without any way of realizing it have essentially a<br>
> > ticking time bomb on their hands, ready to go off as soon as there's a<br>
> > blip in the SAN.<br>
><br>
</div></div>
</blockquote></div><br></div>