dmraid comments and a warning

Dax Kelson dax at gurulabs.com
Mon Feb 6 20:08:36 UTC 2006


It is understood that 'real' hardware RAID controllers are better than
'fake' RAID controllers commonly built into motherboards and cheap PCI
cards.

However, the 'fake' RAID is *extremely* common and is often used by
someone already running Windows who then wants to free up some room to
install and dual boot Linux.

The (single?) advantage of the 'fake' RAID over Linux software RAID is
effortless redundancy at boot time, thanks to the RAID BIOS integration.

==The Issue==

Once a RAID1 (mirror) has been defined and built inside of the BIOS
utility, you *never ever* want to boot to half of the RAID. If you do,
and you then go back to booting to the whole activated RAID, you get
massive file corruption.

Currently it is very easy for this to happen.

I have been testing the dmraid support out on a test box with onboard
Nvidia "nforce4" RAID and two mirrored drives. Twice I've gotten
completely scrambled data and had to re-install from scratch.

--Event One--

I installed rawhide a few weeks ago. Anaconda automatically activated
the motherboard RAID. Cool! Inside of Anaconda's disk druid I elected to
go the manual route and not setup LVM. Instead I defined:

/dev/mapper/nvidia_foo1 -- /boot
/dev/mapper/nvidia_foo2 -- swap
/dev/mapper/nvidia_foo3 -- /

GRUB installed fine, and all the packages installed OK. It rebooted OK,
and I did a 'yum -y upgrade'. So far so good (or so it seemed).

The standard root=LABEL=/ was used on the kernel command line, and what
happened is that it booted up to one side of the mirror. All the update
and new package activity (including a new kernel install, which modified
grub.conf) happened on just that one side of the mirror.
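A quick way to catch this on a non-LVM layout like the one above is to
look at which device is actually mounted on /. The following is only a
rough sketch: the function names are made up for illustration, and it
assumes /proc/mounts-style input where a properly activated RAID shows
up as /dev/mapper/... or /dev/dm-N. Some systems report the root device
as /dev/root, in which case this check cannot tell you anything.

```python
def root_device(mounts_text):
    """Return the device mounted at '/' in /proc/mounts-style text."""
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            # Later entries override earlier ones (e.g. an initial rootfs line).
            device = fields[0]
    return device


def is_device_mapper(device):
    """True if the device path looks like a device-mapper node (assumption)."""
    return device is not None and device.startswith(("/dev/mapper/", "/dev/dm-"))


def check_root(mounts_text):
    """Return (root device, whether it looks like a device-mapper node)."""
    dev = root_device(mounts_text)
    return dev, is_device_mapper(dev)


# Sample input mimicking a boot onto one side of the mirror.
sample = "rootfs / rootfs rw 0 0\n/dev/sda3 / ext3 rw 0 0\n"
dev, ok = check_root(sample)
if not ok:
    print("WARNING: / is on %s, not on the activated RAID" % dev)
```

On a live system the text would come from reading /proc/mounts instead
of the sample string.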

When I rebooted, GRUB read a garbled grub.conf because at that stage it
*is* using an 'activated' RAID (via the RAID BIOS support). I couldn't
boot.

So I booted to the rescue environment, which did the right thing and
activated the RAID and it even mounted the filesystems. When I went and
inspected the files though, anything that got touched while it booted to
the one side of the mirror was trashed.

Result = dead Linux system

--Event Two--

With the benefit of the experience of event one, I did a new install,
but this time I let Anaconda's disk druid do the "auto setup" thing and
create an LVM layout. I figured that LVM using device mapper and dmraid
would always "do the right thing" with regard to *always* using the
activated RAID partitions as the PVs.

This seemed to be the case. I installed and booted OK. I verified that I
was using LVM and inspected the physical volumes using 'pvdisplay'.

I was greeted with:

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/dm-1
[snip]

Looks good! Seeing /dev/dm-1 instead of /dev/mapper was a surprise, but
I agree with the idea.

This was two or three weeks ago and I have been using the system and
doing daily 'yum -y upgrade'.

Yesterday or the day before I did the 'yum -y upgrade' and rebooted. On
the GRUB screen it would not boot to the first listed kernel, with a
read error on the kernel binary. Looking at the GRUB menu, I noticed
some missing boot options (I had previously added memtest86 and there
was a menu item to boot a Windows Partition). I was able to get a boot
going choosing one of the older kernels.

On bootup I noticed an error flash by, something to the effect of "LVM
ignoring duplicate PV".

I ran pvdisplay and saw:

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda1
[snip]

It booted off one half of the mirror. It must have done the same on some
previous boot.
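The difference between the two pvdisplay outputs above is easy to check
for mechanically. Here is a minimal sketch, assuming the 'PV Name' line
format shown above and assuming that every PV on the box is supposed to
sit on the dmraid set (a PV on an ordinary non-RAID disk would be
flagged too). The function names are hypothetical.

```python
import re


def pv_names(pvdisplay_text):
    """Extract the device paths from 'PV Name' lines of pvdisplay output."""
    return re.findall(r"^[ \t]*PV Name[ \t]+(\S+)", pvdisplay_text,
                      flags=re.MULTILINE)


def raw_disk_pvs(pvdisplay_text):
    """Return PVs that are not device-mapper nodes (/dev/dm-N, /dev/mapper/...)."""
    return [pv for pv in pv_names(pvdisplay_text)
            if not pv.startswith(("/dev/dm-", "/dev/mapper/"))]


# Sample text copied from the second pvdisplay run above.
bad = "  --- Physical volume ---\n  PV Name               /dev/sda1\n"
for pv in raw_disk_pvs(bad):
    print("WARNING: PV %s is a raw disk partition, not the activated RAID" % pv)
```

In practice the text would be the captured output of 'pvdisplay' run as
root.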

For fun (the Linux install was dead now of course, it just didn't know
it) I ran a 'yum -y upgrade'.

On a reboot GRUB listed 3 kernel choices, and all of them gave read errors.

Result (a) = dead Linux system
Result (b) = inaccessible Windows system (if you don't know how to use
the GRUB command line)

==Comments==

There need to be more checks in place to prevent booting off of one
half of the mirror, or at a minimum to allow only a read-only boot on
one side of the mirror. Dead systems are no fun. Losing your personal
data is hell.

This isn't purely a Linux problem. Any operating system using fake RAID1
needs to be robust in this regard. I saw a Windows box using 'fake'
motherboard RAID where the motherboard BIOS got flashed, which reset the
"Use RAID" setting to 'off'. Then Windows booted off of half the RAID.
This was noticed, the BIOS setting was turned back on, and a boot was
attempted. Massive corruption and a dead Windows system were the result.
To Windows' credit, I haven't seen it accidentally boot off of half the
RAID as long as the BIOS RAID was turned on and the drivers installed.

The rules are:

1. Don't boot off half of the RAID1 in read-write mode
2. If rule 1 is violated, don't ever again boot using the RAID1
   - If you can abide by rule 2, you can do so indefinitely
3. There is no way to recover from a violated rule 1 without
   reinstalling.

Dax Kelson
Guru Labs



