Rawhide does not boot since 2.6.27-0.398

Bruno GARDIN bgardin at gmail.com
Mon Oct 13 16:09:31 UTC 2008


Antonio, a very clear explanation indeed. It looks as though bug 466071
has already been filed for this problem. I am going to test the
workaround you propose.
Michael, you can disregard my previous message; I had not read
Antonio's explanation before sending it.

Regards.

On Mon, Oct 13, 2008 at 5:42 PM, Antonio Olivares
<olivares14031 at yahoo.com> wrote:
> --- On Mon, 10/13/08, Michael H. Warfield <mhw at WittsEnd.com> wrote:
>
>> From: Michael H. Warfield <mhw at WittsEnd.com>
>> Subject: Re: Rawhide does not boot since 2.6.27-0.398
>> To: olivares14031 at yahoo.com, "For testers of Fedora Core development releases" <fedora-test-list at redhat.com>
>> Cc: mhw at WittsEnd.com
>> Date: Monday, October 13, 2008, 7:56 AM
>> On Sat, 2008-10-11 at 12:48 -0700, Antonio Olivares wrote:
>> > --- On Sat, 10/11/08, Bruno GARDIN
>> <bgardin at gmail.com> wrote:
>>
>> > > From: Bruno GARDIN <bgardin at gmail.com>
>> > > Subject: Rawhide does not boot since 2.6.27-0.398
>> > > To: "For testers of Fedora Core development
>> releases" <fedora-test-list at redhat.com>
>> > > Date: Saturday, October 11, 2008, 10:55 AM
>> > > I have been testing Rawhide for a few months now, but I have
>> > > had boot problems since kernel 2.6.27-0.398. My Rawhide system
>> > > is a virtual machine on VMware Server, now at version 2.0.
>> > > Whenever I try to boot, I get the following errors at the end:
>> > > Activating logical volumes
>> > >   Volume group "VolGroup00" not found
>> > > Unable to access resume device (/dev/VolGroup00/LogVol01)
>> > > Creating root device
>> > > Mounting root filesystem
>> > > mount: error mounting /dev/root on /sysroot as ext3: No such file or directory
>> > > Setting up other filesystems
>> > > setuproot: moving /dev failed: No such file or directory
>> > > setuproot: error mounting /proc: No such file or directory
>> > > setuproot: error mounting /sys: No such file or directory
>> > > Mount failed for selinuxfs on /selinux: No such file or directory
>> > > Switching to new root and running init
>> > > switchroot: mount failed: No such file or directory
>> > > Booting has failed
>>
>> > > Boot works fine with kernel 2.6.27-0.382 but also fails with
>> > > 2.6.27-1. I have looked at the thread related to ext4, but I am
>> > > using ext3. I have also tried a new mkinitrd on 2.6.27-1, with
>> > > no change. Any idea what the problem could be?
>>
>>       The real source of the problem appears much earlier in the
>> boot messages than what was originally quoted.  I've been trying to
>> track this down myself.  Here is the critical bit of information:
>>
>> Kernel that boots:
>>
>> Loading dm-mirror module
>> scsi 2:0:0:0: Direct-Access
>> scsi target2:0:0: Beginning Domain Validation
>> scsi target2:0:0: Domain Validation skipping write tests
>> scsi target2:0:0: Ending Domain Validation: 1204k
>> scsi target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
>> Loading dm-zero module
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>> Loading dm-snapshot module
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>>  sda: sda1 sda2
>> sd 2:0:0:0: [sda] Attached SCSI disk
>> sd 2:0:0:0: Attached scsi generic sg1 type 0
>> Making device-mapper control node
>> Scanning logical volumes
>>   Reading all physical volumes.  This may take a while...
>>   Found volume group "VolGroup00" using metadata type lvm2
>>
>> Kernel that fails:
>>
>> Scanning logical volumes
>> scsi 2:0:0:0: Direct-Access
>> scsi target2:0:0: Beginning Domain Validation
>> scsi target2:0:0: Domain Validation skipping write tests
>> scsi target2:0:0: Ending Domain Validation: 1204k
>> scsi target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
>>   Reading all physical volumes.  This may take a while...
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>>  sda: sda1 sda2
>> sd 2:0:0:0: [sda] Attached SCSI disk
>> sd 2:0:0:0: Attached scsi generic sg1 type 0
>> Activating logical volumes
>>   Volume group "VolGroup00" not found
>> Unable to access resume device (/dev/VolGroup00/LogVol01)
>>
>>       Note that in the kernel that's failing, LVM starts scanning
>> for logical volumes before the SCSI devices have stabilized!  It
>> doesn't find any PVs and so it doesn't find "VolGroup00".  That's
>> the killer.  Now look closely at the one that booted above.  What's
>> interspersed with the scsi startup messages?  Loading dm-mirror,
>> loading dm-zero, loading dm-snapshot.  Then we see the "Scanning
>> logical volumes".  Well, guess what: those modules are not present
>> in the latest kernel as modules.  They created enough of a time
>> delay that lvm started after the scsi drivers had settled.  Now
>> that they are gone, we have a race condition, and if lvm starts too
>> early, it can't find the drives.  The "Scanning logical volumes" is
>> starting exactly where we see the "Loading dm-mirror" in the kernel
>> which does boot.  I would call that a smoking gun.
>>
>>       What's really interesting (to me) is that if you look at the
>> 2.6.26 kernels under F9 (and I'm testing this 2.6.27 kernel under
>> F9 as well as F10), you find ALL of the "Loading dm-*" messages
>> AFTER the scsi drivers have settled.  Something has changed in the
>> 2.6.27 kernels: either the scsi drivers are not settling as fast
>> (debugging messages and checks, perhaps) or the insmod is returning
>> sooner (before the drivers have settled), creating a race condition
>> which did not exist at all in 2.6.26.
>>
>>       I think this is a bug in mkinitrd: it's not emitting a wait
>> when one is needed in this case.  Down in mkinitrd, around line
>> 1483, is a check for the conditions under which it issues a wait
>> for the scsi devices to settle.  I think that check either needs to
>> be made unconditional or at least expanded to include other scsi
>> devices like the VMware ones.  I cheated.  Up at line 1411 I
>> changed wait_for_scsi="no" to wait_for_scsi="yes" and rebuilt the
>> initrd.  The problem goes away and lvm starts AFTER the scsi
>> devices have settled.  Another way to do this might be to force the
>> inclusion of usb-storage (mkinitrd --with-usb), which has its own
>> delay both in loading and settling.
>>
>> > > --
>> > > BeGe
>> > >
>> > > --
>>
>> > Try uninstalling the kernel and reinstalling it via yum.
>> > I tried several times and succeeded :)
>>
>>       No you didn't.  You got lucky and won the race that time.
>> That's the problem with race conditions.  Sometimes you win through
>> just plain dumb luck.  You may well find it fails on a subsequent
>> reboot and you're screwed.  Then again, you may well have changed
>> something else that changes the timing and the probability, and it
>> then works for you.
>>
>>       I'll be filing a bugzilla bug on this later if nobody else
>> gets to it first.
>>
>> > Regards,
>>
>> > Antonio
>>
>>       Mike
>> --
>> Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw at WittsEnd.com
>>    /\/\|=mhw=|\/\/          | (678) 463-0932 | http://www.wittsend.com/mhw/
>>    NIC whois: MHW9          | An optimist believes we live in the best of all
>>  PGP Key: 0xDF1DD471        | possible worlds.  A pessimist is sure of it!
>
> Michael,
>
> You have a very valid point.  My problem was that a partition (not the boot partition) was ext4 but was still listed as ext4dev, and the filesystem type had been renamed to plain ext4.  I had to change /etc/fstab from ext4dev to ext4, then reinstall the kernel, and it worked :)
>
> But you are correct about the scsi/dm volume group (LVM) race.
>
> Regards,
>
> Antonio
>
>
>
>
> --
> fedora-test-list mailing list
> fedora-test-list at redhat.com
> To unsubscribe:
> https://www.redhat.com/mailman/listinfo/fedora-test-list
>
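
For what it's worth, Michael's observation about the dm-* targets
should be easy to confirm by comparing the two kernels' configs. A
sketch, assuming the usual /boot/config-<version> naming; the exact
version strings on disk will differ:

    # If the race theory is right, the working kernel should list
    # these as modules (=m) and the failing kernel as built in (=y).
    grep 'CONFIG_DM_\(MIRROR\|ZERO\|SNAPSHOT\)' \
        /boot/config-2.6.27-0.382* /boot/config-2.6.27-1*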



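Antonio, your ext4 problem does not apply here since I am on ext3,
but for anyone else hitting it, the change you describe is presumably
just the filesystem type field in /etc/fstab, something like:

    # Back up fstab, then rename the old ext4dev type to ext4.
    cp /etc/fstab /etc/fstab.bak
    sed -i 's/ext4dev/ext4/' /etc/fstab
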
-- 
BeGe