
Fedora 11 and dmraid with Intel RAID - problems



Hi everyone,

Sorry in advance for the long email, but I've researched this as best I can and I've run out of ideas.

I have a nice, new server motherboard with an Intel 3420 chipset and two 1TB Seagate 7200.12 drives attached. I used the Intel OROM to set up RAID1 on the two drives. I installed Fedora 11 with no issues and encrypted the LVM during the install.

A few days ago one of the drives reported a bad sector. The bad drive was sdb. I wanted to remove it so that I could slip in the new drive when it arrived. Here are the first steps I took:

First, I shut down the machine and unplugged the bad drive. Fedora wouldn't boot: it got to the Password: prompt for the encrypted partition, and although I entered the correct password, the encrypted partition could not be mounted. I narrowed it down to the /dev/sda1 and /dev/sda2 partitions not showing up, so the kernel couldn't find the UUID listed in /etc/crypttab to mount /. I think it previously used /dev/dm-1, but that link has vanished.

Second, I plugged the bad drive back in. Fedora booted normally.

Third, I got brave and removed the bad drive from the RAID using the Intel OROM. I then turned off the machine and removed the bad drive. Fedora wouldn't boot: it got past the password prompt, but during bootup it couldn't find my /boot partition and dropped me to a recovery shell.

Fourth, I plugged the bad drive back in and Fedora booted. The bad drive was no longer in the RAID, but its old partitions were exposed. The UUID of /dev/sdb1 matched, so it mounted /boot from there.

Fifth, I couldn't get the bad drive to rebuild, so I booted a Fedora Live USB to start the rebuild. After rebooting, Fedora wouldn't boot: there was no /dev/sda1 partition and no /dev/sdb1 or /dev/sdb2 partitions!

Eventually I solved this by removing the "--rm_partitions" option from /etc/rc.sysinit. This allowed Fedora to find /dev/sda1 and boot. It seems the device-mapper mappings somehow broke when the RAID broke, and I don't know how to fix them.

I RMA'd the bad drive and now have a replacement. I installed the new drive and told the Intel OROM to include it in my RAID1 volume, then booted into Fedora. During boot I got mdadm messages saying it was adding a drive, which I haven't seen before. Once Fedora is loaded, dmraid claims my RAID status is 'ok', yet no rebuilding is happening: there is no HDD activity. When I try to start a rebuild, I get an error: ERROR: Unable to suspend device. A Google search for this exact string returns *zero* results. To see any HDD activity at all, I had to boot from my Live USB stick and run "dmraid -R raidset"; it took 3 hours for the HDD light to turn off. After rebooting, the Intel OROM still says "Rebuild". Argh!!

In the middle of all this I ran "dmraid -n" and saw *three* hard drives in my RAID. The first is the original sda drive, the second is the new sdb drive, and the third has the serial number of the sdb drive with ":0" appended. I don't see any way to remove that third entry. How did it even get there?

Now that I've shared my life story, here are my questions:
- How do I properly rebuild my RAID1? The Intel OROM still says "Rebuild".
- Does dmraid not support live rebuilding? It seems silly that I had to boot from a Live USB to rebuild.
- Does dmraid not support rebuild status? I had no idea whether the rebuild was happening apart from the HDD light.
- How do I fix device mapper so that I don't have to remove "--rm_partitions" from /etc/rc.sysinit?
- How do I get my RAID metadata looking good again (no extra ghost drives)?

Thanks,
Michael

P.S. It seems the easiest way out is to just nuke the array and start over, but I want to know why this is so hard. dmraid seems rather "experimental", and Fedora is moving to mdadm anyway.

