[linux-lvm] shift PV from disk to raid device?

Doug Ledford dledford at redhat.com
Tue Dec 9 19:56:24 UTC 2008


On Tue, 2008-12-09 at 20:42 +0100, Kai Schaetzl wrote:
> Stuart D. Gathman wrote on Tue, 9 Dec 2008 13:38:46 -0500 (EST):
> 
> > 1) create md2 from sda3 and sdb3
> > 2) pvcreate md2
> > 3) add sda3 to VG - LVM complains about dup and uses sdb3 instead
> 
> A bit different. But thanks for this hint. I checked my bash history now. 
> Short summary:
> 
> mdadm --create --verbose /dev/md3 --level=1 --raid-devices=2 /dev/sda3 missing
> mdadm --detail /dev/md3
> pvcreate /dev/md3
> pvcreate /dev/md3 -ff
> vgcreate dom1 /dev/md3 (to avoid name clash with dom0 on sdb3)
> .. some lvcreate and other stuff
> mdadm --detail /dev/md3
> reboot
> vgchange dom1 -an
> 
> vgrename dom1 dom0
> 
> 
> vgchange dom0 -ay
> mdadm /dev/md3 -a /dev/sdb3
> reboot
> 
> So, I did it all right after all. I was wrong in my assumption about having
> used the wrong device for the PV. However, I only noticed now that /dev/md3
> is gone for good; mdadm cannot find it. I had not been paying attention and
> thought that md2 was the array in question (thinking md0=sda1/sdb1,
> md1=sda2/sdb2, md2=sda3/sdb3, whereas I had actually used md0=sda1/sdb1,
> md2=sda2/sdb2, md3=sda3/sdb3), so I didn't notice it.
> 
> Going further back in the history, it seems I forgot to pvremove the
> pre-existing PV on /dev/sda3 (which had the same structure and data) before
> creating the PV on /dev/md3; that's why I had to use -ff on /dev/md3, I
> assume. It seems that after adding sdb3 to the array and rebooting, the
> array dissolved completely and the existing PVs on sda3 and sdb3 came back
> into existence. I had thought that was just an effect of the mirror sync.
> 
> So, I think I will just start over by removing /dev/sda3 completely,
> recreating the raid array and then recreating the PV on it. I should then be
> able to pvmove the LVs from /dev/sdb3 to /dev/md3, right? Problem solved,
> this time.
> 
> However, as this is the second time that this or a similar array has simply
> vanished, I wonder if there's something else going on, e.g. some kind of
> unwanted interaction between LVM and mdraid partitions. The other two raid
> arrays on the disk are absolutely stable and don't use LVM.
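
For what it's worth, that start-over sequence would look roughly like the
following (untested sketch; the device names and the VG name dom0 are taken
from your history, so double-check them against your actual setup, and it
assumes the live dom0 VG is currently on sdb3 with nothing on sda3 in use):

    # wipe the stale PV label and any leftover md superblock on sda3
    pvremove -ff /dev/sda3
    mdadm --zero-superblock /dev/sda3

    # recreate the degraded array and put a PV on it
    mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda3 missing
    pvcreate /dev/md3

    # add it to the existing VG, move the data off sdb3, then drop sdb3
    vgextend dom0 /dev/md3
    pvmove /dev/sdb3 /dev/md3
    vgreduce dom0 /dev/sdb3

    # only once sdb3 is out of the VG, clear it and add it to the mirror
    pvremove /dev/sdb3
    mdadm /dev/md3 -a /dev/sdb3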

One thing that might be happening here (and I'm not sure since I'm not
an LVM expert) is that when the lvm stack finds the pv on /dev/sdb3
or /dev/sda3, it sees a full-size partition (meaning it can access all
the way to the end of the device).  The md subsystem by default uses
version 0.90 superblocks, which live in the last 64KB of the
physical partition.  This means that when you create a pv on a raid
device, the size the kernel reports for the raid device is somewhat
smaller than the physical partition size, in order to reserve the
space at the end for the superblock.  However, accessing the bare
device does not reserve that space, so it's possible that lvm or
filesystem data written through the bare partition is overwriting part
of the raid superblock, causing it to fail its checksum check and
making the raid device disappear completely.
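
If you want to check whether that is what bit you, comparing the size the
kernel reports for the raid device against the underlying partition, and
looking at what is actually on the bare partition, should show it.  Something
along these lines (using the sdb3/md3 names from your mail, once the array
exists again):

    # sizes in bytes: /dev/md3 should come out slightly smaller than /dev/sdb3
    blockdev --getsize64 /dev/sdb3
    blockdev --getsize64 /dev/md3

    # is there (still) an md superblock on the bare partition?
    mdadm --examine /dev/sdb3

    # does LVM see a PV label on the bare partition, the md device, or both?
    pvs -o pv_name,pv_size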

The solution to this problem is to create the raid device with a
superblock format of 1.1 or 1.2 (i.e., -e 1.1 or -e 1.2).  This places
the raid superblock at the *beginning* of the partition instead of the
end, and offsets the data in the partition past the superblock.  That
way, if you attempt to read an lvm label (or a filesystem label) off
of the bare scsi device, the label won't be found because it sits in
the wrong place in the partition.  This ensures that you never
accidentally access the data via the bare scsi partition; instead you
can only get at it by bringing the raid device up.
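
In your case that would just mean adding the metadata version to the create
command you already used, something like (again, only a sketch):

    # 1.2 puts the superblock 4K into the partition; 1.1 puts it at the very start
    mdadm --create /dev/md3 --verbose -e 1.2 --level=1 --raid-devices=2 /dev/sda3 missing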

Note: superblock formats 1.1 and 1.2 cannot be used on bootable
partitions, as grub doesn't know how to deal with the offset to the
filesystem, but they can be used anywhere else.

It's also worth noting that adding an internal bitmap to the device,
while it slows down writes, will greatly reduce the time to resync the
device in case of an unexpected/unclean shutdown.
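
The bitmap can be requested at creation time or added to a running array
later, e.g.:

    # at creation time
    mdadm --create /dev/md3 -e 1.2 --bitmap=internal --level=1 --raid-devices=2 /dev/sda3 missing

    # or on an array that already exists
    mdadm --grow /dev/md3 --bitmap=internal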

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
