Fedora 11 and dmraid with Intel - problems

Michael Cronenworth mike at cchtml.com
Fri Nov 6 02:33:46 UTC 2009


Hi everyone,

Apologies in advance for the long email, but I've researched this as best 
I can and I've run out of ideas.

I have a nice, new server mobo with an Intel 3420 chipset and two 1TB 
Seagate 7200.12 drives attached. I used the Intel OROM to set up RAID1 
across the two drives. I installed Fedora 11 with no issues and encrypted 
the LVM during the install.
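
For reference, the resulting stack is the Intel firmware RAID1 handled by 
dmraid, with the partitions, LUKS and LVM layered on top. On a working boot 
I can inspect it with something like this (exact set names depend on the 
metadata):

    dmraid -s              # RAID set name and state as dmraid sees it
    dmraid -r              # block devices carrying Intel RAID metadata
    dmsetup ls --tree      # device-mapper stack above the RAID set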

A few days ago one of the drives, sdb, reported a bad sector. I wanted to 
remove the bad drive so that I could slip in the new drive when it 
arrived. Here are the steps I took:

First, I shut down and unplugged the bad drive. Fedora wouldn't boot: it 
gets to the Password: prompt for the encrypted partition, but even with 
the correct password the encrypted partition cannot be mounted. I narrowed 
it down to the /dev/sda1 and /dev/sda2 partitions not showing up, so the 
kernel can't find the UUID listed in /etc/crypttab and can't mount /. It 
previously used /dev/dm-1, I think, but that link has vanished.
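
For context, /etc/crypttab refers to the encrypted partition by UUID; the 
entry is roughly of this form (the UUID below is a placeholder, not my 
real one):

    # /etc/crypttab: <mapped name>  <source device>  <key file>
    luks-XXXX  UUID=XXXX-XXXX  none

    # compare the UUID against what udev actually exposes:
    blkid | grep crypto_LUKS

If nothing exposes that UUID -- neither the raw partitions nor a dm-mapped 
one -- there is nothing to unlock, which matches what I'm seeing.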

Second, I plugged the bad drive back in. Fedora boots normally.

Third, I get brave and remove the bad drive from the RAID set using the 
Intel OROM, then power off the machine and physically remove the drive. 
Fedora won't boot: it gets past the password prompt, but during bootup it 
cannot find my /boot partition and dumps me to a recovery shell.

Fourth, I plug the bad drive back in and Fedora boots. The bad drive is no 
longer in the RAID set, but its old partitions are exposed; the UUID of 
/dev/sdb1 matches, so /boot gets mounted from it.
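
As I understand it, /boot is mounted by filesystem UUID from /etc/fstab, 
so any block device carrying that filesystem satisfies the mount, whether 
it's the dm-mapped partition or the exposed raw /dev/sdb1. Roughly (UUID 
and filesystem type below are placeholders):

    # /etc/fstab
    UUID=XXXX-XXXX  /boot  ext4  defaults  1 2

    # check which device(s) currently expose that UUID
    blkid | grep XXXX-XXXX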

Fifth, I couldn't get the bad drive to rebuild from the installed system, 
so I used a Fedora Live USB to start the rebuild. I rebooted and Fedora 
wouldn't boot: no /dev/sda1 partition and no /dev/sdb1 or /dev/sdb2 
partitions!
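
The only way I found to see what block devices actually existed at that 
point was from the recovery shell (or the Live USB), with something like:

    cat /proc/partitions     # raw partitions the kernel knows about
    ls -l /dev/mapper/       # device-mapper nodes dmraid created
    dmsetup ls --tree        # how (or whether) they stack up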

Eventually I solved this by removing the "--rm_partitions" option from the 
dmraid call in /etc/rc.sysinit. That allowed Fedora to find /dev/sda1 and 
boot. It seems the device-mapper mappings somehow broke when the RAID 
broke, and I don't know how to fix this.
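
For anyone following along, the line I edited is the dmraid activation 
call in rc.sysinit. I don't remember Fedora 11's exact arguments offhand, 
but it is along these lines, and all I dropped was the one flag:

    # /etc/rc.sysinit (approximate -- check your own copy)
    /sbin/dmraid -i -a y --rm_partitions ...

As I understand it, --rm_partitions tells dmraid to delete the partition 
nodes (/dev/sda1, /dev/sda2, ...) of the underlying disks so that only the 
/dev/mapper/ partitions get used; with the set degraded and the mappings 
broken, that left me with neither.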

I RMA'd the bad drive and now have a replacement. I installed the new 
drive and told the Intel OROM to include it in my RAID1 volume, then 
booted into Fedora. During boot I get mdadm messages about a drive being 
added, which I haven't seen before. Once Fedora is loaded, dmraid claims 
my RAID status is 'ok', yet no rebuilding is happening and there is no HDD 
activity. When I attempt to initiate a rebuild I get an error: "ERROR: 
Unable to suspend device". Google searching for this exact string returns 
*zero* results. To see any HDD activity at all I booted my Live USB stick 
and ran "dmraid -R raidset", and it took 3 hours for the HDD light to turn 
off. I rebooted and the Intel OROM still says "Rebuild". Argh!!

In the middle of all this I ran "dmraid -n" and saw *three* hard drives in 
my RAID set. One is the original sda drive, the second is the new sdb 
drive, and the third lists the serial number of the sdb drive but with a 
":0" appended. I don't see any way of removing that entry. How did it even 
get there?
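
In case it helps anyone suggest a fix, this is roughly how I've been 
poking at the metadata. I have *not* run the erase command yet, because 
I'm not sure which of the three entries it would take out:

    dmraid -r        # block devices dmraid considers RAID members
    dmraid -n        # raw native (ISW) metadata dump, where the third entry shows up

    # dmraid can also erase the metadata on a given disk, e.g.:
    # dmraid -r -E /dev/sdb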

Now that I've shared my life story, I'm down to these questions:
- How do I properly rebuild my RAID1? The Intel OROM still says "Rebuild".
- Does dmraid not support live rebuilding? It seems silly that I had to 
boot a Live USB to rebuild.
- Does dmraid not report rebuild status? I had no indication the rebuild 
was happening other than the HDD light.
- How do I fix device mapper so I don't have to remove "--rm_partitions" 
from /etc/rc.sysinit?
- How do I get my RAID metadata clean again (no extra ghost drives)?

Thanks,
Michael

P.S. It seems the easiest way out is to just nuke the array and start 
over, but I want to know why this is so hard... dmraid seems rather 
"experimental", and Fedora is moving to mdadm anyway.



