CONFIG_DEBUG_STACKOVERFLOW hurts

Mon Sep 17 16:42:27 UTC 2007

On Sat, 2007-09-15 at 17:45 -0500, Eric Sandeen wrote:
> Gilboa Davara wrote:
> 
> >> I was looking at this from a slightly different angle, which is that the
> >> stack overflow warning is largely pointless - no matter how much you
> >> lighten up the dump_stack path, it will add something to the stack depth
> >> of the current process, effectively *reducing* the available stack for
> >> all processes, and increasing the risk that you'll actually overflow.
> >> (if you take an interrupt towards the end of the stack, the warning will
> >> go off and use the last bit - so you can't count on that stack space to
> >> be available).
> > 
> > While it is true,
> > A. If adding ~40 bytes to the kernel's stack usage is critical, we're
> > already past the all-doom-and-gloom point.
> 
> Though, just to play devil's advocate, say your absolute worst case
> stack depth goes to within 35 bytes of the end.  The warning (even if
> trimmed down to 40 bytes) now renders your system unstable in the long
> run.  Why waste it?

True.
... Though, playing devil's advocate in turn, if you don't have enough
stack space to call dump_stack, do_IRQ should -really- call BUG() and
halt everything before things turn really ugly.

> 
> > B. We can always calculate the available stack size, and if stack_remain
> > is bigger than, say, 80 bytes, call dump_stack.
> 
> That seems reasonable.  Today, the dump_stack depth ~= the warning
> threshold, so it's just broken and pointless as it stands.

Hopefully my patch will get accepted, reducing the stack usage to under
~80 bytes. (I still need to recalculate the call-stack usage.)
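
To make suggestion B and the BUG() idea above concrete, here is a rough
userspace sketch of the decision, not the actual do_IRQ code. It assumes
an x86-style layout where the stack grows down towards a thread_info
structure at the bottom of a THREAD_SIZE-aligned area. All constants and
the names stack_remain() and check_stack() are hypothetical, picked only
for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real ones depend on arch and config. */
#define THREAD_SIZE       8192u  /* assumed 8K kernel stack area */
#define THREAD_INFO_SIZE    64u  /* hypothetical sizeof(struct thread_info) */
#define STACK_WARN        1024u  /* hypothetical warning threshold */
#define DUMP_STACK_COST     80u  /* the ~80 byte figure from this thread */

enum stack_action { STACK_OK, STACK_WARN_AND_DUMP, STACK_BUG };

/*
 * Bytes left between the stack pointer and the thread_info that sits
 * at the lowest address of the THREAD_SIZE-aligned stack area.
 */
static long stack_remain(uintptr_t sp)
{
	/* The masking trick only works for a power-of-two THREAD_SIZE. */
	assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);

	uintptr_t offset = sp & (THREAD_SIZE - 1);
	return (long)offset - (long)THREAD_INFO_SIZE;
}

/* Decide what the IRQ entry path should do for a given stack pointer. */
static enum stack_action check_stack(uintptr_t sp)
{
	long remain = stack_remain(sp);

	if (remain >= (long)STACK_WARN)
		return STACK_OK;             /* plenty of room, stay quiet */
	if (remain > (long)DUMP_STACK_COST)
		return STACK_WARN_AND_DUMP;  /* the dump itself still fits */
	return STACK_BUG;                    /* dumping would overflow: halt */
}
```

With these numbers, Eric's "within 35 bytes of the end" case lands in
STACK_BUG rather than in a warning that would itself eat the last bytes.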

Speaking of which, who's the maintainer of kernel/*syms*?
The (second) patch [1] seems to have passed through LKML without any
(negative) comments, so I'll try my luck with the actual code
maintainer. (I couldn't find anyone in MAINTAINERS.)

> 
> > Yeah, but at least to me, as a developer, having a warning before
> > all-hell-breaks-loose is a good thing (tm).
> 
> Perhaps.  The current warning is fairly random, anyway, since it only
> fires on an IRQ.  If you randomly get the warning at your max stack
> excursion, but never go so far as to actually blow the stack, then you
> really didn't need the warning, did you? If you get the warning at
> maybe 85% of your stack excursion, then your thread continues post-IRQ
> and blows the stack, you get a nice backtrace anyway and the warning
> didn't help much.  I'm still not convinced that it's that useful.

I'm usually afraid of the randomness of things.
A minor stack overrun may only rear its nasty head -long- -long- after
the actual trashing took place, making it next to impossible to pinpoint
the original source of the problem.

> 
> I like the CONFIG_DEBUG_STACK_USAGE which accurately tells you what your
> max stack excursions have been.   I just wish it could tell you what the
> callchain *was* (not really possible, as it's written) - and making it
> resettable would be nice too (easy).

/me adds a note to self to look up the CONFIG_DEBUG_STACK_USAGE code so
I'll know what you're talking about ;)
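
For reference, as far as I can tell the stack-usage check works by
relying on the stack starting out zeroed and scanning up from the far
end for the first non-zero word. Below is a minimal userspace sketch of
that high-water-mark idea, plus the "resettable" part. The array stands
in for a kernel stack, and the names stack_mark_reset() and
stack_never_used() are hypothetical:

```c
#include <stddef.h>
#include <string.h>

#define STACK_WORDS 1024  /* hypothetical stack size, in words */

/*
 * Reset the high-water mark: zero every word below the live area.
 * 'live_from' is the index of the lowest word currently in use
 * (index 0 is the far end of the stack, which is reached last).
 */
static void stack_mark_reset(unsigned long *stack, size_t live_from)
{
	memset(stack, 0, live_from * sizeof(*stack));
}

/*
 * Bytes that were never written: scan up from the far end until the
 * first non-zero word. A deepest frame that happened to store only
 * zeros would be missed, so the result is an estimate.
 */
static size_t stack_never_used(const unsigned long *stack, size_t words)
{
	size_t i = 0;

	while (i < words && stack[i] == 0)
		i++;
	return i * sizeof(*stack);
}
```

This also shows why the callchain can't be recovered this way: the scan
only sees how deep the writes went, not who made them.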

> 
> -Eric

- Gilboa

[1] http://lkml.org/lkml/2007/9/15/159