bootable failed sw raid 1 with F9

Christopher K. Johnson ckjohnson at gwi.net
Tue Jun 17 14:20:07 UTC 2008


jrw wrote:
> Sander Hoentjen wrote:
>> Hi list,
>>
>> For the first time in my life I tried to install Fedora with sw raid.
>> See below what went wrong.
>>
>> Here is what I did:
>> Start with 2 empty 500GB sata disks.
>> Make sure nvraid is turned off in my BIOS.
>> Start an F9 install, creating 2 sw RAID partitions: md0 and md1.
>> md0 is 100MB and has an ext3 /boot.
>> md1 has the rest of the space and is LVM.
>> In the lvm I have created the rest of my partitions.
>>
>> Install went great, after reboot my system booted fine, so far so good.
>> I then shutdown my system, pulled out a disk and started again. I got
>> the message "GRUB Hard Disk Error". So I shut down, plugged the disk
>> back in, pulled out the other one and started again. This time I was met
>> by a GRUB shell, no boot logo, no idea what to do (no menu).
>> Shutdown again, replug the disk, start again, get on IRC, type:
>> grub
>> root (hd0,0)
>> setup (hd0)
>> root (hd1,0)
>> setup (hd1)
>>
>> After that: reboot minus 1 disk. I can see grub, with logo and boot
>> options. It starts ok, I even get rhgb for a second and then I see:
>> "fsck.ext3: Invalid argument while trying to open /dev/md0"
>> I can go into a maintenance shell and when I do cat /proc/mdstat I see:
>> md0 : inactive sda1[0](s)
>>
>> "mdadm --assemble /dev/md0" turns it active again, but well I have no
>> idea how I can continue normal boot, if it is even possible.
>>
>> So this is my story, now my questions:
>> - Did I do anything wrong? I performed the installation twice, with both
>> times the same result.
>> - Is this a bug somewhere? Do other people get the same or better
>> results?
>> - Is there anything I can do to fix this?
>>
>> Thanks for reading this far,
>>
>> Sander
>>
>>
>>   
> I have already experienced this problem and raised a report on Red Hat
> Bugzilla (no. 450722), although there has been no response to it so
> far. I spent some time pinning the problem down to Fedora 9 (it is OK
> on Fedora 8 plus updates).

Chances are excellent that the initial problem was grub not writing the
MBR correctly on both disks of the mirror.  And when you wrote it
yourself in your interactive grub session, I believe you created a
dependency on both disks being present through the use of root (hd1,0),
which in effect says "look for /boot on the first partition of the
second disk".  With one disk pulled, the survivor is normally seen by
the BIOS as the first disk, so there is no second disk to look on.

The subsequent problem with the mirror being broken may have been caused 
by the process of booting on one disk, not both, depending on the exact 
sequence of disk removal versus boots.  The raid superblock would be 
updated on one disk, and be stale on the other, and it is appropriate 
that you had to re-add the stale disk afterward.

Although this should definitely be addressed as a bug in the
installation process, it can also be dealt with proactively once you
have booted the newly installed system and the mirrors backing /boot
have finished synchronizing.
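
One way to confirm that synchronization has finished is to look for an
in-progress resync or recovery line in /proc/mdstat.  This is only a
minimal sketch, assuming the usual mdstat layout; the function name
md_sync_busy and its optional file argument are my own, purely for
illustration:

```shell
#!/bin/sh
# Return 0 (true) if any md array listed in the given mdstat file is
# still resyncing or recovering, 1 otherwise.
# The file argument defaults to /proc/mdstat; it is a parameter only so
# the check can be tried against a saved copy (hypothetical helper).
md_sync_busy() {
    mdstat_file="${1:-/proc/mdstat}"
    grep -Eq '(resync|recovery) *=' "$mdstat_file"
}

# Example: wait until the mirrors are in sync before touching the MBRs.
# while md_sync_busy; do sleep 10; done
```

Simply reading "cat /proc/mdstat" by eye and waiting for [UU] on md0
does the same job.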

If your grub.conf uses root (hd0,0), and the md device behind it (md0
in your case) is mirrored across disks sda and sdb:
[root@myhost]# grub
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> quit

The difference is that here we tell grub to pretend the second disk is
the first disk, and then to write the MBR for a root on that disk
accordingly.
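
The same idea can be scripted so that each half of the mirror gets an
MBR written as if it were the first disk.  What follows is a sketch, not
a tested procedure: the helper name grub_setup_cmds is made up, it
assumes /boot is the first partition on each disk, and by default it
only prints the commands rather than running grub:

```shell
#!/bin/sh
# Print the GRUB (legacy) shell commands that install the boot loader on
# one disk as if it were the first BIOS disk.
# Assumption: /boot is the first partition of the disk, i.e. root (hd0,0).
grub_setup_cmds() {
    disk="$1"
    printf 'device (hd0) %s\n' "$disk"
    printf 'root (hd0,0)\n'
    printf 'setup (hd0)\n'
    printf 'quit\n'
}

# Dry run: show what would be fed to grub for each half of the mirror.
for disk in /dev/sda /dev/sdb; do
    grub_setup_cmds "$disk"
done

# To actually install, pipe the commands into grub as root, for example:
#   grub_setup_cmds /dev/sdb | grub --batch
```

Because every disk is mapped to (hd0) before setup runs, neither MBR
ends up referring to a second disk that may not be there.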

Be aware that once you boot on one disk only, the RAID superblocks on
that disk are updated and no longer match those on the removed disk, so
you will need to re-synchronize your mirrors once you boot with both
disks present again.
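
As a rough illustration of that re-synchronization, the stale
partitions can be added back with mdadm.  The partition names below are
assumptions (sdb1 for md0 and sdb2 for md1, with sdb being the disk that
was pulled), and readd_member is a made-up helper that only prints the
command unless RUN=1 is set:

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the mdadm command that returns a
# removed partition to its mirror.
# Array and partition names are examples; adjust them to your layout.
readd_member() {
    array="$1"
    member="$2"
    if [ "${RUN:-0}" = "1" ]; then
        mdadm "$array" --add "$member"
    else
        echo "mdadm $array --add $member"
    fi
}

readd_member /dev/md0 /dev/sdb1
readd_member /dev/md1 /dev/sdb2
# Progress of the resulting resync can be watched with: cat /proc/mdstat
```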

-- 
   "Spend less!  Do more!  Go Open Source..." -- Dirigo.net
   Chris Johnson, RHCE #804005699817957



