4KSTACKS et al...

Ian Kent raven at themaw.net
Fri Aug 5 01:22:55 UTC 2005


On Tue, 2 Aug 2005, Paul A Houle wrote:

>     A few weeks ago we had a 4-way amd64 web server running RHEL 4 that 
> crashed sporadically -- nothing left in the syslog.  up2date didn't find 
> a new kernel,  so I just downloaded and installed the latest kernel from 
> kernel.org and the system has been stable ever since.  I'm not sure if I 
> could have gone to RH for support because Cornell has a site license,  
> and even if I had a direct line to RH management,  it would take me more 
> time to explain the problem than it would take to try a mainstream kernel.
> 
>     Overall,  I'm quite happy with the four-digit revision mainstream 
> linux kernels.  We had a crash on our main machine that left a stack 
> trace,  did some research on the web,  found that this had been fixed in 
> 2.6.11.something,  upgraded the kernel,  case closed.
> 
>     People are willing to pay $$ to get an "enterprise" product which is 
> reliable,  and supported,  but this is another case where the generic 
> product turns out to be more reliable than the branded product,  and 
> looking at what's happening with Fedora,  I've got a lot of concern that 
> RH's pursuit of innovation will always lead to a kernel long on gee-whiz 
> features and short on reliability.  Crashes mean I get calls from the 
> NOC at 4am,  and god forbid that my toddler hears the phone ring or me 
> walking down the stairs,  because I'll need to entertain him while 
> dealing with the crash and for the rest of the morning.  Then a week 
> later I go to netcraft and they say my uptime is seven days and I feel 
> like a jerk because the whole world knows about my problems.
> 
>     I think there are two reasons for the RHEL 4 instability:  (i) the 
> quarterly release cycle means that I have to wait for bug fixes -- and 
> if you're running a non-x86 architecture,  it seems like 2.6 is shaking 
> out bugs at a high rate,  and (ii) RH is aggressively pushing new features.
> 
>     I really don't know what's in RHEL 4 (it would take me more time to 
> look at the patches than it would to revert to mainstream) but the 
> activation of 4KSTACKS in Fedora is one of those changes that reduces 
> reliability.
> 
>     I've been looking,  and I've never found out what benefit 
> 4KSTACKS has for end users.  The kernel team is sensible,  so I'm sure 
> that there are some real benefits,  but looking at the problem reports 
> and at the attitudes of some people on this list,  I start to wonder if 
> it's just a vindictive attempt to put an end to ndiswrapper.  (I'd 
> really love to see an explanation of the benefits of 4KSTACKS)
> 
>     The real trouble is that 4KSTACKS problems aren't in kernel modules 
> per se,  but really are in the combination of modules that are running.  
> Yeah,  maybe they can get reiserfs running under 4KSTACKS,  but what if 
> you're running an NFSv4 server with all the whizzy options turned on,  
> and IPv6 with tunneling and it's a reiserfs filesystem and you're using 
> LVM and RAID and a particularly funky SCSI driver,  what then?
> 
>     By adopting 4KSTACKS early,  Fedora has helped shake out problems 
> with 4KSTACKS,  but when 4KSTACKS becomes the main option in the 
> mainstream kernel,  we'll see people dealing with weird problems that 
> happen sporadically on certain setups for years to come.  We seem to 
> have one of the worst workloads in the world,  and the last thing I need 
> is more crashes.

I find it hard to understand why it is so important to remove this option, 
making the kernel 4K stacks only, as was proposed not so long ago.

I also find it hard to understand why it is such a problem having a larger 
stack. As you point out, as software evolves it ultimately becomes more 
complex. If the developer's design needs it and the software is reliable 
and efficient (i.e. performs well), then why not?

A quick calculation:

2000 tasks * 4k of extra stack each is about 8M, out of, say, at least 
1G of RAM.

That is under 1%. Not a large percentage overhead, I think.
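
To make the arithmetic concrete, here is a rough sketch in Python. It 
assumes the 2000 is a count of tasks, each with its own kernel stack, and 
that the 4k is the extra per-task cost of an 8K stack over a 4K one. The 
function names are made up for illustration, and the sizes are treated 
as binary units (KiB, MiB, GiB).

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def extra_stack_bytes(tasks, small_stack=4 * KIB, large_stack=8 * KIB):
    """Extra memory used if every task gets the larger kernel stack."""
    return tasks * (large_stack - small_stack)


def overhead_fraction(tasks, ram_bytes):
    """Extra stack memory as a fraction of total RAM."""
    return extra_stack_bytes(tasks) / ram_bytes


if __name__ == "__main__":
    tasks = 2000   # assumed task count, from the estimate above
    ram = 1 * GIB  # assumed minimum RAM, from the estimate above
    extra = extra_stack_bytes(tasks)
    print(f"{extra / MIB:.1f} MiB extra, "
          f"{overhead_fraction(tasks, ram):.2%} of RAM")
```

With these assumed numbers the extra cost comes to a little under 8M, 
which is below 1% of 1G.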

But of course I don't know the reasoning behind the change (I missed 
the thread). I just get a little burned by the consequences from time to 
time, like yourself.

Ian



