Another slip in the FC6 schedule

Alfredo Ferrari list at pceet030.cern.ch
Tue Oct 17 22:45:25 UTC 2006


Thanks, Dave.

I really appreciate it. Knowing that the bug was already there in the
previous kernels makes things somewhat better. Although our 1k partitions
are often under heavy load, apparently we never hit it. Knowing that
a crash rather than filesystem corruption would occur is also somewhat
reassuring (in a relative sense, of course).

Obviously I am perfectly aware that you have a lot of work (and you are
always answering on this list). Sorry for my screaming, but I was
scared by the prospect of corrupting whole filesystems.

                    Alfredo

On Tue, 17 Oct 2006, Dave Jones wrote:

> On Wed, Oct 18, 2006 at 12:05:14AM +0200, Alfredo Ferrari wrote:
> > Seriously, I believe this is a big issue. Let me summarize:
> >
> > a) there was a kernel update for FC5
> > b) this kernel has a known bug which could result in corrupting
> >     ext3 filesystems with 1k block size under heavy load
>
> it doesn't corrupt filesystems, it crashes instantly when the bug is hit.
>
> > c) ... nevertheless it has been pushed out with no special warning
> > d) practically all /boot partitions are ext3 1k (anaconda generated)
> > e) many partitions on old machine upgraded from previous versions are
> >     ext3 1k as well
>
> /boot partitions don't see anywhere near the sustained IO that is needed
> to hit this bug.  it takes _hours_ of insane amounts of IO to hit it.
> It should be noted that I was the only person to ever see this.
> No bugzilla reports. No upstream reports.  This is a real corner case
> scenario, as usually filesystems that see that kind of IO want the higher
> throughput that a larger blocksize brings.
>
> > What was the rationale for releasing an official kernel update under such
> > dangerous conditions? Just "anaconda doesn't generate 1k partitions (not
> > true BTW)"? I still believe Linux is not (yet) Windows and if features are
> > in the system (like 1k blocksize partitions) people can use them if
> > they feel appropriate and they must work. Or perhaps there was a rush to
> > push this 2.6.18 kernel out to get some extra guinea pigs finding all
> > residual bugs? But this could be fair for the FC6 betas, not for FC5 where
> > people are expecting reasonable stability, anyway no life-threatening
> > issue like a (known) filesystem corruption bug.
>
> That code hasn't changed in months, so the 2.6.17 kernel in FC5 likely
> was already affected by the same bug, and yet despite this, no-one was
> hitting it because of the pathological circumstances needed to hit it.
>
> > Now how long do we have to wait before we have an update for FC5 fixing
> > this critical issue? Or do we have to manually rollback kernels on all
> > machines?
>
> I'm already working on the next update.
>
> 	Dave
>
>

-- 

+----------------------------------------------------------------------------+
|  Alfredo Ferrari                         ||  Tel.: +41.22.767.6119         |
|  C.E.R.N.                                ||  Fax.: +41.22.767.7555         |
|  European Laboratory for Particle Physics||                                |
|  AB Division / ATB Group                 ||  e-mail:                       |
|  1211 Geneva 23                          ||     Alfredo.Ferrari at cern.ch    |
|  Switzerland                             ||     Alfredo.Ferrari at mi.infn.it |
+----------------------------------------------------------------------------+



