tux on 2.4.27 kernel and referrer checking

Marek Habersack grendel at caudium.net
Thu Oct 28 22:37:23 UTC 2004


On Thu, Oct 28, 2004 at 04:24:59PM -0500, William Lovaton scribbled:
> Hi,
> 
> > On Thu, 28-10-2004 at 13:37, Marek Habersack wrote:
> > On Thu, Oct 28, 2004 at 10:14:05AM -0500, William Lovaton scribbled:
> > > Very interesting discussion...  A question for all of you: How do you
> > > define "stable"?  How do you measure it?  Have you seen crashes with 2.6
> > That's an interesting question indeed. As a developer, I measure 'stable'
> > with the number of fixes and changes (of any kind, not only bugfixes)
> > applied to the branch of a software product labelled as 'stable'. That's one
> > side of the coin, the flip side is the "real life" stability, i.e. in
> > production, there...
> 
> Mmmm... I don't see it like you do. Although you might have a point. 
> Even stable software needs maintenance.  In fact, "stable" or
Of course, but there is a difference between maintenance (understood as a
routine and marginal process of making sure the software is updated to
fix bugs, etc.) and active development where new features are added (drivers,
filesystems, subsystem rewrites). You cannot honestly expect a software
product to be stable when new code is constantly introduced into it.

> "production ready" means that the software is ready to receive _wider_
> testing.  E.g. production, which is the ultimate testing environment.
> ;-)
That's a bit of a risky statement. I guess it depends on how you define
'testing'. Part of serious testing is being proactive in what you do - i.e.
you should and are encouraged to stress the software in any imaginable way. 
If you can risk losing a client or several clients, then by all
means, test software on their servers - most businesses cannot afford that.
I used to think the same way you do, but my boss made me change the way I
think about it. On my own machine (desktop, server) I might be aggressive with
new software; on customers' machines I tend to be more and more conservative.
So, when the 2.6 patches get well below 1 meg in size (packed!), then it will
be time to consider 2.6 stable.

> Stability, from a developer's point of view, is how much API changes
> break backwards compatibility.  According to mailing lists, 2.6 doesn't
> seem to be that stable in this regard.
I sort of agree with you, but there is one small difference as far as the
kernel goes - the kernel developers never do and (hopefully) never will
guarantee that any API in the kernel won't change in the future. Making such
guarantees makes the software freeze at some point, or reach a state where
it becomes unmaintainable and beyond control. While providing (to some
extent) compatibility guarantees for software libraries is certainly a good
thing (along with mechanisms for backward compatibility and the seamless
introduction of new APIs), in the kernel that would rather be a mistake, IMHO.

> > > kernels?  Are they reproducible?
> > ...what's important is the number of security vulnerabilities (often being the
> > result of 'unstable' code in the sense described above) and glitches that
> > cause the machine to misbehave in production. I've seen a few 2.6 crashes
> > (mostly on desktop), freezes (desktop and fileserver) and spontaneous
> > reboots/freezes (on a moderately busy webserver). The desktop crashes/oopses
> > very often have to do with preemption, as for the rest, I'm not sure - the
> > BIO (which is still stabilizing IIRC) seems likely to be the problem since
> > the freezes occur under I/O load.
> 
> I haven't used FC2 on desktops that much, but I love what I see, feature-
> wise and stability-wise.  In fact, I'm dying to use FC3 on my
> workstation (home and office) ASAP.
I can't really join the discussion here, since I don't use any of the
RedHat systems anymore, but I do think it's not a good idea to use the same
kernel for both desktop and server (think about the i/o scheduler for
instance, or the filesystems you use).
> 
> > As for 2.4, it has serious problems with
> > the VM under stress (heavy webserver load with php/sql activity), to the
> > point where an OOM kill takes out not only the process(es) but also the machine.
> 
> OOM?? Maybe you have some serious issues with your lack of memory or
> software leaking a lot.
Unlikely. Even then, a user application should NOT be able to kill the
kernel. What we see is the dreaded 0-order allocation failure: apache
(because it's apache in 99% of cases) gets killed and the kernel freezes. No
oops, nothing in the logs. It's not an isolated issue; it happened on several
machines (ranging from P3 to P4 systems) with at least 0.5G of RAM and twice
as much swap.
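To illustrate the failure mode (just a rough C sketch, assuming the default
overcommit policy is in effect): malloc() keeps succeeding well past
RAM + swap, and the pain only starts once the pages are actually touched -
which is when the 0-order allocation failures and the OOM killer show up.
Don't run this on a production box:

/* oom-sketch.c - hypothetical illustration, not a fix.
 * gcc -O2 -o oom-sketch oom-sketch.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t chunk = 64UL * 1024 * 1024;   /* allocate 64 MB per step */
    size_t committed = 0;

    for (;;) {
        char *p = malloc(chunk);
        if (p == NULL) {
            /* with strict overcommit accounting we would fail here, cleanly */
            printf("malloc failed after %lu MB\n",
                   (unsigned long)(committed >> 20));
            return 0;
        }
        memset(p, 0xAA, chunk);   /* touching the pages forces real allocation */
        committed += chunk;
        printf("touched %lu MB\n", (unsigned long)(committed >> 20));
    }
}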

> Before the server I mentioned in this thread (SMP 4X 700MHz) we were
> using another one a little bit smaller (SMP 4X 550MHz).  It used RH9
> with 2.4 kernel and tux.  It was rock solid.
We have machines which never bomb, and others that bomb every now and
then. All of them run the same kernel on more or less the same hardware
(with the differences being negligible).

> > As for your are they reproducible question. It's another interesting issue -
> > for me, personally, the reproducible bugs/glitches are the better ones since
> > they are easier to spot and fix. If a software product suffers from
> > non-reproducible, random crashes which are definitely related (preemption,
> > VM for instance) but don't follow a pattern that's easy to reproduce, then
> > there is something in a state of flux inside the product and it's not
> > production quality. I don't really want to name names here, but if you look
> > at the sources of the kernel shipped by one major company, you will be
> > amazed that it ships with 2.5MB of bzipped diffs and 1.6MB of vendor
> > additions. This is not what I consider 'stable'. I might be wrong, of
> > course.
> 
> Preemption?? I don't think that any sane distributor will ship a kernel
> with preemption enabled.  It is useful for finding latency bugs, but it
> doesn't work very well for production use.  Ingo has been doing a great
> job releasing patches at a furious rate to improve this situation.
I certainly do hope nobody will release such kernels; it was just an example
of a possible situation where there would be a usability conflict for such a
kernel.

> Patched kernels from distributors are ok.  In fact that seems to be
> right according to the new development model.  kernel.org will be the
> most complete and fastest kernel but not necessarily the most stable.
> It's up to the distributors to stabilize the kernels they ship.
Sure, and merge the changes back to the mainline kernel. But if you take
into account that the patches are 36MB (unpacked), totalling 1133827 diff
lines with 3 lines of context (example taken from a real vendor kernel
source), then there is something wrong in the picture, don't you think?

> > > I'm using Fedora Core 2 (with official updates) in a high loaded, high
> > > traffic production server and it is very, very stable.  Right now it has
> > > 25 days of uptime.  It could be more by now, but some reboots have
> > > prevented it.
> > Can you define high load? We've had machines with over 150 days of
> > uptime under heavy load, only to crash suddenly under the same load. They
> > are running 2.4 and can, indeed, be considered stable.
> 
> In my case high load means 115 requests/sec according to the apache
> access_log; 70% of them are static content, the rest is dynamic (PHP).  I
> have seen peaks of 320 req/sec.  This is why TUX is so great.
> 
> The load average is between 40 and 80 during peak times without tux. 
> The server gets 4.5 million requests per day.
Some of our machines all of a sudden get a load average of over 300; one of
them had a load of over 900 last week. Surely, most of the cases can be
attributed to bad programming (php, perl scripts that do weird things), but
that should not crash the kernel. At least I hope so. We tried resource
limits, but on a reasonably busy machine it's rather hard to come up with
sensible resource limits for over 800 apache processes. If you give each of
them 1/3 of the RAM to use as the maximum, it's enough that 6 of them run
away and the machine dies. They are dedicated servers handling an awful lot
of traffic, so limiting the number of apache processes below 512 makes no
real sense. vservers might hold some promise for enforcing resource limits,
but only with 2.6, so not for us yet.
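For illustration only (a rough sketch with an arbitrary 256MB figure, not a
value we actually use): the kind of per-process ceiling I mean is
setrlimit(RLIMIT_AS, ...), applied before the process starts doing real work:

/* rlimit-sketch.c - hypothetical per-process memory ceiling.
 * gcc -O2 -o rlimit-sketch rlimit-sketch.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    void *p;

    rl.rlim_cur = 256UL * 1024 * 1024;   /* soft limit: 256 MB of address space */
    rl.rlim_max = 256UL * 1024 * 1024;   /* hard limit */

    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* A runaway allocation in this process now fails cleanly instead of
     * dragging the whole machine into an OOM situation. */
    p = malloc(512UL * 1024 * 1024);
    printf("512 MB malloc %s\n", p ? "succeeded" : "failed (limit enforced)");
    free(p);
    return 0;
}

IIRC apache's own RLimitMEM directive wraps the same mechanism, but only for
processes forked from the children (CGI and the like), so it doesn't help
when mod_php eats the child's own memory.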

> > > The only problem I have is TUX (not using it right now) and that's why
> > > I'm subscribed to this list.  Anyway TUX is not present in the official
> > > kernel anymore.
> > What is the problem with TUX?
> 
> http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=125091
Sounds as awful as the bug Ingo fixed some time ago for us - but we had a
problem with the user space modules for Tux.

regards,
marek

