Problem with raid module starting before scsi module

Aleksandar Milivojevic amilivojevic at pbl.ca
Mon Jan 10 15:39:23 UTC 2005


google at familycook1.net wrote:
> I am running an updated FC2 system.
> 
> My problem is that my raid module is starting before my scsi module and not
> auto-detecting my software raid 5 on SATA disks (with scsi drivers).

Are you sure the md (RAID) module is loaded before the SCSI module?  Check 
the init script inside the initrd file.  Something like:

$ mkdir /tmp/initrd
$ cd /tmp/initrd
$ gzip -dc < /boot/initrd-your-kernel.img | cpio -i
$ less init

(If you extract the initrd cpio archive as a normal user, you'll get 
some errors; you can ignore them.)

If the modules in the init script are loaded in the correct order (look 
for the insmod lines), then you are one of the few unlucky people hitting 
a known race condition.  The problem is that device drivers, when loaded, 
do not wait for each other to initialize completely.  When your SCSI 
driver is loaded, it starts to detect disk drives, and the insmod command 
that loaded it exits while detection is still in progress.  The md driver 
is then loaded, and if the SCSI driver hasn't detected your disks by that 
point, the md driver isn't going to find them.  Classic race condition.
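To see the load order at a glance, a small helper like the following can 
be run inside the extracted /tmp/initrd directory.  This is only a sketch: 
the function name show_insmod_order is made up for illustration, and it 
assumes the insmod calls each start on their own line in the init script.

```shell
# Print the insmod lines of an extracted init script, prefixed with their
# line numbers, so the order in which modules are loaded is easy to read.
# $1 = path to the init script (defaults to ./init)
show_insmod_order() {
    grep -n '^[[:space:]]*insmod[[:space:]]' "${1:-init}"
}
```

For example, "show_insmod_order init" from within /tmp/initrd.  The SCSI 
driver lines should show up with lower line numbers than the md/raid ones.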

If this is what happens to you, then each time you install a new kernel 
you'll have to build a custom initrd image: edit the init script and 
insert "sleep 10" (or 20, or 30, whatever works for you) to give the 
drivers enough time to initialize before the next stage is invoked.  
Inserting a sleep before/after udevstart is sometimes needed too.  On one 
system, I even had to do this for the ext3 driver (a sleep between loading 
the jbd and ext3 modules, sigh).

To do that, do this (this time as root):

# cp /boot/initrd-your-kernel.img /boot/initrd-your-kernel.img.orig
# mkdir /tmp/initrd
# cd /tmp/initrd
# gzip -dc /boot/initrd-your-kernel.img | cpio -idmv
# vi init  (insert some sleeps between invocations of insmod)
# find . -print | cpio -ocv | gzip -c > /boot/initrd-your-kernel.img

If you are using LILO, then you must do:

# /sbin/lilo

If you are using GRUB, do not run /sbin/lilo!  (Or you'll screw up 
GRUB.)
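The manual editing step in vi can also be scripted.  The sketch below 
prints a copy of the init script with a sleep line added after every 
insmod line.  The function name add_sleeps and the 10 second default are 
assumptions for illustration, and it blindly adds a sleep after every 
insmod, which is more than most systems need.

```shell
# Print the given init script to stdout with "sleep N" added after every
# line that starts with insmod.
# $1 = path to the init script, $2 = delay in seconds (default 10)
add_sleeps() {
    awk -v delay="${2:-10}" '
        { print }                                  # copy every line as-is
        /^[ \t]*insmod[ \t]/ { print "sleep " delay }  # then add the delay
    ' "$1"
}
```

Usage, from within /tmp/initrd: "add_sleeps init 10 > init.new", check 
the result, then "cat init.new > init" (rather than mv, so that init 
keeps its original permissions) and remove init.new before rebuilding the 
image.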

This was also discussed on the kernel mailing list.  When I asked, one of 
the responses was that PCI is a hot-pluggable subsystem, and that the init 
script should (after it loads a device driver) poll and wait for the 
needed devices to appear.  Currently it doesn't: it just blindly loads 
driver after driver and hopes that each device will appear before the 
next line that depends on it is executed.
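The polling approach suggested there could look roughly like the sketch 
below.  It assumes a POSIX shell is available inside the initrd; the 
stock init script is run by nash, which as far as I know cannot do loops, 
so this would not work as a drop-in.  The function name wait_for_path and 
the one second poll interval are made up for illustration.

```shell
# Wait until a path (e.g. a device node) exists, polling once per second.
# $1 = path to wait for, $2 = maximum number of polls (default 30)
# Returns 0 as soon as the path exists, non-zero if it never appears.
wait_for_path() {
    wfp_path=$1
    wfp_tries=${2:-30}
    while [ "$wfp_tries" -gt 0 ]; do
        [ -e "$wfp_path" ] && return 0
        sleep 1
        wfp_tries=$((wfp_tries - 1))
    done
    # One last check after the final sleep.
    [ -e "$wfp_path" ]
}
```

In the init script it would go between loading the SCSI driver and 
loading the md driver, for example "wait_for_path /dev/sda 30" (the 
device name here is hypothetical; use whatever your disks show up as).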

-- 
Aleksandar Milivojevic <amilivojevic at pbl.ca>    Pollard Banknote Limited
Systems Administrator                           1499 Buffalo Place
Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7




More information about the fedora-list mailing list