Fedora 11 and dmraid with intel - problems

Dan Williams dan.j.williams at intel.com
Wed Nov 11 00:16:34 UTC 2009

On Thu, Nov 5, 2009 at 7:33 PM, Michael Cronenworth <mike at cchtml.com> wrote:
> First I attempted to shutdown and unplug the bad drive. Fedora wouldn't boot
> -- gets to Password: prompt for encrypted partition. Correct password is
> entered but the encrypted partition cannot be mounted. I narrowed it down to
> that the /dev/sda1 and /dev/sda2 partitions are not showing up so the kernel
> can't find the correct UUID from /etc/crypttab to mount /. It previously
> used /dev/dm-1 I think, but this link has vanished.

[ First let me preface this by saying I am not a dmraid expert so
please forgive if I state something incorrectly below ]

I believe some versions of dmraid fail to assemble the volume if it is
marked degraded.  When you booted with the drive missing the
option-ROM noticed that the drive was missing and marked-up the
metadata to reflect that fact.  Normally when dmraid assembles a raid
set it removes the partitions of the member devices since the entire
drive is managed by DM.  It sounds like in this case it failed to
assemble the raid set but still removed the partitions.  I suspect, since
this is a raid1, that you should be able to boot the single good disk
if dmraid is left out of the picture, but this isn't what you want.
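
A quick way to see whether the member partitions are still exposed is
to look at /proc/partitions.  The following is only a sketch: the
function name list_partitions is made up for illustration, and it
assumes that partitions removed by dmraid also disappear from
/proc/partitions.

```shell
#!/bin/sh
# list_partitions DISK [FILE]
# Print the partitions of DISK (e.g. "sda") that the kernel currently
# exposes. FILE defaults to /proc/partitions; a saved copy can be
# passed instead. The function name is hypothetical.
list_partitions() {
    disk="$1"
    # Rows look like "   8        1     204800 sda1"; the name is field 4.
    awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { print $4 }' \
        "${2:-/proc/partitions}"
}
```

If "list_partitions sda" prints nothing while the raid set is also not
active, you are most likely in the state described above.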

> Second, I plugged the bad drive back in. Fedora boots normally.

The option-ROM saw that you re-added a disk with out-of-date metadata
and marked the array as 'Rebuild'.  Since the array is no longer marked
degraded, dmraid allows assembly and everything works like before.

> Third, I get brave and remove the bad drive from the RAID using the Intel
> OROM. I then turn off the machine and remove the bad drive. Fedora won't
> boot -- gets past password prompt, but during bootup it cannot find my /boot
> partition and dumps me to a recovery shell.

Yes, back to a degraded case like your first state.

> Fourthly, I plug the bad drive back in and Fedora boots. The bad drive is no
> longer in the RAID, but its old partitions are exposed. The UUID of
> /dev/sdb1 matches and so it mounts /boot.

In this case the act of removing the drive via the option-ROM erased
the RAID metadata on it, so dmraid sees nothing to claim and leaves
/dev/sdb alone.  The partitions get exposed (i.e. not removed by
dmraid) allowing the crypto code to mount, but as you have probably
guessed this isn't what you want because the raid is now bypassed.

> Fifthly, I couldn't get the bad drive to rebuild so I used a Fedora Live USB
> install to start the rebuild. Rebooted and Fedora wouldn't boot. No
> /dev/sda1 partition and no /dev/sdb1 or /dev/sdb2 partitions!

You can't rebuild in this case because your root filesystem is mounted
on this drive.  dmraid can't claim this drive for its exclusive use
and leaves the drive alone.  Booting to the Live USB means /dev/sdb is
no longer in use.

When you say "start the rebuild", do you mean you didn't allow it to
finish?  I am not sure how the metadata updates are handled in the
version of dmraid you are using; maybe it waits to update the metadata
until after the rebuild is complete?

> Eventually I solved this by removing the "--rm_partitions" part in
> /etc/rc.sysinit. This allowed Fedora to find /dev/sda1 and boot. It seems
> somehow the device mapper mappings broke when the RAID broke. I don't know
> how to fix this.

In this case you are effectively mimicking your "Fourthly" with sda in
place of sdb.

> I RMA'd the bad drive and now I have a replacement drive. I installed the
> new drive and I told the Intel OROM to include it in my RAID1 volume. I
> booted into Fedora. During boot, I get mdadm messages that it is adding a
> drive. I haven't seen this before.

This concerns me.  I would hope that the Fedora 11 initramfs would
disable mdadm when dmraid is being used to activate a partition.  To
verify this conflict is/isn't happening you would need to get a prompt
in the initramfs and run "cat /proc/mdstat" to see what's being

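A rough sketch of that check, for use at a shell prompt: it lists every
md device the kernel reports in /proc/mdstat.  The function name
list_md_devices is made up for illustration.

```shell
#!/bin/sh
# list_md_devices [FILE]
# Print the name of every md device listed in /proc/mdstat, whether it
# is active or inactive. FILE defaults to /proc/mdstat; a saved copy
# can be passed instead. The function name is hypothetical.
list_md_devices() {
    # Array lines look like "md126 : active raid1 sdb[1] sda[0]".
    awk '$1 ~ /^md/ && $2 == ":" { print $1 }' "${1:-/proc/mdstat}"
}
```

If this prints anything while dmraid is supposed to own the disks, then
md has claimed them as well.
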
> When Fedora is loaded, dmraid wants to
> claim my RAID status is 'ok' yet no rebuilding is happening. I have no HDD
> activity. I attempt to initiate a rebuild. I get an error: ERROR: Unable to
> suspend device. Google searching for this exact string returns *zero*
> results.

Perhaps this version of dmraid does not support online rebuild?

> In order to even see any HDD activity I booted into my LiveUSB
> stick and ran "dmraid -R raidset" and it took 3 hours for the HDD light to
> turn off. I reboot and Intel OROM still says "Rebuild". Argh!!

Looks like the dmraid -R raidset command is not modifying the metadata
after the rebuild completes?

> In the middle of all this I ran "dmraid -n" and I see *three* hard drives in
> my RAID. One is the original sda drive, the second is the new sdb drive, and
> the third is the serial of the sdb drive, but with a :0 at the end. I don't
> see any way of removing that drive. How did it even get there?

I don't know how/if dmraid modified the metadata, but the option-ROM
will retain a ghost disk entry until the array is rebuilt.

> Now that I've shared my life story, I'm down to these questions:
> - How do I properly rebuild my RAID1? The Intel OROM still says "rebuild"
> - Does dmraid not support live rebuilding? That seems silly that I had to
> use a LiveUSB load to rebuild.
> - Does dmraid not support rebuild status? I had no idea if the rebuild was
> occurring besides the HDD light.
> - How do I fix device mapper so I don't have to remove the "--rm_partitions"
> out of /etc/rc.sysinit?
> - How do I get my RAID metadata looking good? (no extra ghost drives)

Newer dmraid releases may handle the rebuild case better.  However, I
suspect you should be able to rebuild it with mdadm via a Live USB/CD
image.  This should allow you to get the array back into a state that
will make the dmraid in your Fedora 11 environment happy.

0/ If you haven't already, get a backup of your one good drive in case
something goes wrong with the following steps.
1/ Boot to a Live USB/CD image with a recent version of mdadm (>= 3.0).
2/ Make sure that dmraid has not assembled the disks
3/ mdadm -A /dev/md/imsm /dev/sda # add the one good drive to an 'imsm' container
4/ mdadm -I /dev/md/imsm # start the container
5/ cat /proc/mdstat # verify that your raid volume was started in degraded mode
6/ mdadm --add /dev/md/imsm /dev/sdb # add the new disk to the container, which starts the rebuild
7/ <wait for rebuild to complete>
8/ mdadm -E /dev/sda # dump the metadata and check that it is no longer marked 'Degraded'/'Rebuild'
9/ mdadm -Ss # stop the array
10/ Boot back into Fedora 11 and let dmraid assemble the array normally.
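
For steps 5 and 7, something like the following could be used to watch
/proc/mdstat instead of the HDD light.  This is only a sketch: the
function names are made up, and it treats only lines that show a
percentage ("recovery = 12.3%") as a running rebuild, so a pending or
delayed resync is not waited on.  The check in step 8 is still the
real test.

```shell
#!/bin/sh
# Helpers that read /proc/mdstat, or a saved copy passed as an argument.
# All function names here are hypothetical.

# Succeeds if any array shows a missing member, e.g. "[U_]" or "[_U]".
md_is_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}

# Succeeds while a recovery or resync line with a percentage is present.
md_is_rebuilding() {
    grep -Eq '(recovery|resync) *= *[0-9]' "${1:-/proc/mdstat}"
}

# wait_for_rebuild [INTERVAL_SECONDS] [FILE]
# Print the progress line every INTERVAL_SECONDS until none remains.
wait_for_rebuild() {
    interval="${1:-60}"
    mdstat="${2:-/proc/mdstat}"
    while md_is_rebuilding "$mdstat"; do
        grep -E '(recovery|resync) *= *[0-9]' "$mdstat"
        sleep "$interval"
    done
}
```

After step 6, "wait_for_rebuild 60" would return once the kernel stops
reporting recovery progress.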

