[linux-lvm] progress, but... - re. fixing LVM/md snafu

Jayson Vantuyl jvantuyl at engineyard.com
Sun Apr 5 18:32:00 UTC 2009


Miles,

It seems like what's probably happened is that LVM detected the raw  
device instead of the MD device at some point early in the boot  
process.  This may be because the MD detection happened after LVM  
setup.  I'm unsure if it's possible for LVM to "steal" the device from  
MD.

Depending on your distribution, the fix will look different.  Stop
worrying about downtime: if the data is important, downtime is a price
worth paying.  If downtime really is critical, build a second machine,
get it working right, and transfer the data over.  Being in a hurry
and trying to "optimize" the recovery process is a really good way to
lose the data.

Assuming that you're going to try to fix this setup, I'd start out  
with a backup.  This is critical.  Everybody always says to do a  
backup.  Nobody ever does it.  Really, do one.  Get an S3 account, use  
an S3 backup utility.  There's just not an excuse these days.  Your  
data is one-MD-mistake away from oblivion.

So, right now MD should have sda/sdb but only has sda.  sdb is now  
newer than sda and may have important data if this server stores  
anything like that.  The challenge is that, according to MD, sda is  
newer.  Since MD isn't handling writes to sdb, it won't be updating  
its metadata to know that it's newer.  There are three options I can
think of, all of them ugly.  Pick one of:

1.  Destroy the MD.  Create a new one with the same UUID, using sdb3
as the source.  (You listed the UUID; getting it wrong can trip you up.)
2.  Sync the updated data from sdb3 onto md2.  Wipe sdb3.  Add it back
into md2.  (Might mean less downtime depending on data size; doesn't
nuke the MD.)
3.  Build another machine.  Get it working right.  Transfer the data
with rsync.  (Least downtime, most expensive.)
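Option 1 could look something like the sketch below.  This is hedged
guesswork based on the layout in Miles's mail (a RAID1 of sda3, sdb3,
and sdc3, with sdb3 holding the newest data); the UUID is a
placeholder, so substitute the real one from `mdadm --detail
/dev/md2`, and run this only from a rescue environment, never against
the live system:

```shell
# Recreate the array with sdb3 as the sole live member, then let MD
# resync the others from it.  UUID below is a placeholder.
mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=1 --raid-devices=3 \
      --uuid=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx \
      /dev/sdb3 missing missing     # sdb3 is the data source
mdadm /dev/md2 --add /dev/sda3      # resync copies sdb3 onto sda3
mdadm /dev/md2 --add /dev/sdc3      # ...and onto sdc3
cat /proc/mdstat                    # watch the rebuild progress
```

The "missing" placeholders matter: they keep the stale disks out of
the initial create so nothing can resync in the wrong direction.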

The first two options only set you up for it to break again.
The core problem is figuring out what happened during boot.  In a  
perfect world, you would just tell LVM to only consider MD devices.   
That's not hard, but it's complicated by the fact that you have LVM  
on /.  This means that the configuration that's used is likely not the  
version on / but a copy of it that is made when you set up your boot  
ramdisk (a.k.a. initrd, or possibly an initramfs).  Even if we get LVM  
locked down to use just MDs and get that config used at boot time,
there's the possibility that the MD won't get assembled (since it  
already may not have been when LVM was first activated) and the system  
won't boot.  Again, fraught with peril.

If you want to fix the MD, the first step will be booting a rescue
LiveCD and doing all of this from there.  With that LiveCD, you can also adjust
the LVM configuration and update the initrd (or whatever is used for  
boot).  You may need to chroot into the system and/or trick the initrd  
into seeing the right devices.  I don't really think I can walk you  
through this via an e-mail.
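That said, the chroot dance from a rescue CD usually goes something
like this.  The VG/LV names here are made up (substitute whatever
`lvscan` shows on Miles's system):

```shell
# From the rescue LiveCD: assemble the array, activate LVM on top of
# it, then chroot into the installed system to fix its config.
mdadm --assemble --scan              # bring up /dev/md2
vgchange -ay                         # activate the LVs on the PV
mount /dev/mapper/vg0-root /mnt      # hypothetical root LV name
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt /bin/bash                # now edit lvm.conf, rebuild initrd
```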

The LVM part is pretty easy.  Just set a filter line (you only get
one, so disable any other filter lines) in <root of
system>/etc/lvm/lvm.conf to:

> filter = [ "a|^/dev/md.*$|", "r/.*/" ]


That will prevent you from using anything but the MD.
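Once the filter is in place, a quick sanity check (device names per
Miles's mail; run once the array is assembled):

```shell
pvscan                    # should report only /dev/md2 as a PV
pvs -o pv_name,vg_name    # no /dev/sda3 or /dev/sdb3 rows should
                          # appear, and no "Found duplicate PV" warning
```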

Updating the initrd with this information depends on the distro (and
distro version).  It's usually either some invocation of "mkinitrd"
or some script that wraps it.  It will get the LVM configuration  
available at boot-time.  This *MIGHT* sort out the MD problem.  It  
might not.  If it doesn't, I'm not sure where to tell you to start.   
If mdadm is being used by your initrd, you'll need to tweak its  
configuration.  If it's relying on MD autodetection, you might have  
turned that off in your kernel.  If you have an IDE controller that  
takes too long to initialize, that can also cause this sort of thing  
(although that's REALLY unlikely these days).
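The usual invocations, by distro family (hedged: exact tool names vary
with distro version, and I don't know which distro this is):

```shell
# Run inside the chroot.  Note: with a LiveCD chroot, $(uname -r)
# reports the rescue kernel, so substitute the installed kernel's
# version string instead.
#
# Debian/Ubuntu:
update-initramfs -u -k all
#
# Red Hat/CentOS/Fedora:
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
```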

I hope that some of this helps, although it will be hard for anyone
to give you really solid advice without a little more insight into why
the MD isn't getting assembled prior to LVM's scan.

On Apr 5, 2009, at 10:05 AM, Miles Fidelman wrote:

> Hello again Folks,
>
> So.. I'm getting closer to fixing this messed up machine.
>
> Where things stand:
>
> I have root defined as an LVM2 LV that should use /dev/md2 as its
> PV.
> /dev/md2 in turn is a RAID1 array built from /dev/sda3 /dev/sdb3  
> and /dev/sdc3
>
> Instead, LVM is reporting: "Found duplicate PV  
> 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3 not /dev/sda3"
> and the /dev/md2 is reporting itself as inactive (cat /proc/mdstat)  
> and active,degraded (mdadm --detail)
>
> ---
> I'm guessing that, during boot:
>
> - the raid array failed to start
> - LVM found both copies of the PV, and picked one (/dev/sdb3)
> - everything then came up and my server is humming away
>
> but: the md array can't rebuild because the most current device in  
> it is already in use
>
> so...  I'm looking for the right sequence of events, with the  
> minimum downtime to:
>
> 1. stop changes to /dev/sdb3 (actually, to / - which complicates  
> things)
> 2. rebuild the RAID1 array, making sure to use /dev/sdb3 as the  
> starting point for current data
> 3. restart in such a way that LVM finds /dev/md2 as the right PV
> instead of one of its components
>
> Each of these is just tricky enough that I'm sure there are lots of  
> gotchas to watch out for.
>
> So.. any suggestions?
>
> Thanks very much,
>
> Miles Fidelman
>
>
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

-- 
Jayson Vantuyl
Founder and Architect
Engine Yard
jvantuyl at engineyard.com
1 866 518 9275 ext 204
IRC (freenode): kagato
