RFC: Optimizing for 386 (Part 2)

Jean Francois Martinez jfm512 at free.fr
Sun Mar 27 18:40:46 UTC 2005


On Fri, 2005-03-25 at 09:34 -0600, Derek Moore wrote:
> > No, they aren't.  Not a single objective benchmark or measurement.  Just
> > "it feels faster" with no proof that it's not just a placebo effect.
> > You might have set out to prove something, but you haven't even started
> > to do so yet.  Show some benchmarks.  Show how much quicker the X
> > rendering show.  Show how much smaller the latency is.  Prove it.
> 
> He did the hard part of recompiling the distro, and he probably wanted
> to get that released to users as soon as possible.
> 
> > of RAM, but only ask for a paltry 80GB 7200 disk?  Either you have no
> > clue what you're doing, or you're trying to scam people into buying your
> > new gaming box for you.  Either way, I certainly hope nobody gives you
> > any money for that thing.  *snicker*
> 
> There's no need to be a prick in public.
> 
> He's not the only one that believes compiler optimizations have an
> effect at runtime (if he was, Gentoo wouldn't exist).
> 

And at one point there were people who believed they could get rich
by transmuting lead into gold. 

That kind of uberhacker tends to have some points in common:

1) They are completely unaware of trivial economic concepts like
"return on investment".  Let's take a practical example: assume my box
spends a week building Gentoo/Fedora with super-super-super-optimized
compiler flags and I get a 10% improvement on pure-CPU tasks.  That
means I will need 10 weeks (two and a half months) to recover my
investment, but this is assuming a) I don't power off my machine,
b) my machine always has something to do instead of simply waiting for
keyboard input, c) that the thing being done is a pure CPU task, i.e.
no disk I/O, and d) that bug fixes and security alerts don't force me
to upgrade (and thus recompile) any significant piece of software.
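For the curious, the break-even arithmetic above fits in a toy C
program (the numbers are the assumed ones from the example, not
measurements):

  /* Back-of-the-envelope break-even for the example above: a one-week
   * rebuild versus a claimed 10% speedup on pure-CPU work.             */
  #include <stdio.h>

  int main(void)
  {
      double rebuild_hours = 7.0 * 24.0;  /* one week spent recompiling  */
      double speedup       = 0.10;        /* assumed 10% CPU improvement */

      /* Hours of pure-CPU work needed before the time saved equals the
       * time spent rebuilding.                                          */
      double breakeven = rebuild_hours / speedup;

      printf("break-even after %.0f hours, i.e. %.1f weeks of pure-CPU work\n",
             breakeven, breakeven / (7.0 * 24.0));
      return 0;
  }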

2) They tend to completely ignore the issues involved in building
software.  Autoconf will build a binary according to what software is
installed on the box.  If something is missing, it sometimes aborts the
build, but most of the time it will just silently build the package
without the "optional" feature.  Then you spend days struggling with
Samba and wondering why the damned thing doesn't work with Active
Directory or doesn't support ACLs.  And it can be worse: imagine when
the problem is not your software failing to find the "optional"
library, but that when the library itself was built it didn't find that
other library with the nifty feature.  Did I mention that when you
start playing that kind of game you introduce variability that makes it
nearly impossible to support you?
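To make the point concrete, here is a made-up sketch of how such an
"optional" feature vanishes; HAVE_LDAP is a typical config.h macro name
and the functions are invented, not Samba's real code:

  /* Sketch of an autoconf-style optional feature.  If ./configure did
   * not find the LDAP headers it simply leaves HAVE_LDAP undefined and
   * the stub below gets compiled -- no build error anywhere.           */
  #include <stdio.h>
  #ifdef HAVE_CONFIG_H
  #include "config.h"          /* generated by ./configure */
  #endif

  int join_active_directory(const char *realm)
  {
  #ifdef HAVE_LDAP
      printf("joining %s via LDAP\n", realm);
      return 0;
  #else
      fprintf(stderr, "Active Directory support not compiled in\n");
      return -1;
  #endif
  }

  int main(void)
  {
      return join_active_directory("EXAMPLE.ORG") == 0 ? 0 : 1;
  }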

3) They haven't read the gcc manual.  They don't know about the
distinction between -mcpu= and -march=, and I have seen a guy recommend
recompiling the kernel with -O3.  Hint: at that time -O3 was just an
abbreviation for "-O2 -finline-functions", and having the compiler try
to inline functions in a program like the kernel, where the programmer
has already inlined the useful functions, will only SLOW it down.
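A small made-up example of the distinction (gcc 3.x syntax; -mcpu was
later renamed -mtune):

  /* Two ways of compiling the same file:
   *
   *   gcc -O2 -mcpu=athlon-xp  foo.c   schedules the code for an
   *                                    Athlon XP but sticks to the
   *                                    default instruction set, so the
   *                                    binary still runs on older x86
   *   gcc -O2 -march=athlon-xp foo.c   also uses Athlon-only
   *                                    instructions; the binary will
   *                                    not run on older processors
   *
   * And why -O3 buys nothing for code like the kernel: the programmer
   * already marked the hot helper inline, so -finline-functions has
   * nothing useful left to do.                                          */
  #include <stdio.h>

  static inline int fast_path(int x) { return x * 2 + 1; }

  int compute(int x) { return fast_path(x); }

  int main(void)
  {
      printf("%d\n", compute(20));
      return 0;
  }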

4) They don't cross-check assertions, they don't quantify, they don't
run benchmarks, and the rare times they do they compare apples and
oranges.  This could be acceptable for art students, but we are
supposed to be engineers, aren't we?  As an example I ran across an
article that discussed the need to do a burn-in test when you get a
new computer, and it claimed that while Linux distros don't come with
a dedicated burn-in program "you have a very good test, which is
compiling the kernel".  Of course the assertion is silly: compiling
the kernel executes zero FPU instructions, about zero MMX instructions,
and on an Athlon 2000 the processor spends over 40% of the time doing
nothing (I measured it :-)) and thus cooling down.  In fact compiling
the kernel raised the temperature of my Athlon by a mere 5 degrees
Celsius, i.e. it is completely useless for detecting flaky processors.
The author of the article was Daniel Robbins, Gentoo's author.
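By contrast, even a trivial loop like the one below keeps the FPU busy
the whole time, which a kernel compile never does.  It is only an
illustration, not a replacement for a real burn-in program (compile
with -lm, iteration count picked arbitrarily):

  /* Keep the FPU pipelines hot instead of letting the processor idle
   * 40% of the time waiting on the disk, as a kernel build does.       */
  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
      double acc = 0.0;
      long i;

      for (i = 1; i < 100000000L; i++)
          acc += sin((double)i) * sqrt((double)i);

      /* Print the result so the compiler cannot optimize the loop away. */
      printf("checksum: %g\n", acc);
      return 0;
  }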

BTW: On Fedora Core 3, gcc 3.4.2, Athlon XP 2000+, I ran the nbench
benchmark after compiling it a la Red Hat, i.e. with "-O2 -mcpu=i686",
and again with "-O2 -march=athlon-xp".  The optimized program was 4%
faster at the memory tests, 15% faster at the integer tests (due to
being 50% faster on one test), and only 0.8% faster on floating point.
It was not done "scientifically", i.e. I did not stop daemons,
disconnect the network, or make sure the processor temperature was the
same at the start of each test, but it was done with the same kernel
and the same glibc.  A kernel optimized for the Athlon would have had
zero influence since the test spends very little time in kernel mode;
an optimized glibc (like when you recompile your entire distribution)
could have had some effect, especially on the FP tests.  However
Red Hat does not compile glibc with -mcpu=i686 but with -march=i686,
and that is only 3 or 4% slower on an Athlon than -march=athlon.
Using the SSE unit for floating point in libm would have improved
results, but since SSE doesn't support double precision it would
create an interesting problem for maintaining the Makefile.  Using
-mfpmath=sse to compile the benchmark itself produced zero
improvement: I think most or all of the FP work is done through calls
to libm.
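To illustrate that last point, here is an invented loop in the style of
the FP tests (not nbench's real code): nearly all its time goes into
sin() and cos(), which live in the precompiled libm, so the flags used
on the benchmark itself change very little.

  /* Hypothetical compile lines for comparison:
   *   gcc -O2 -march=athlon-xp              fp.c -lm
   *   gcc -O2 -march=athlon-xp -mfpmath=sse fp.c -lm
   * Both call the very same precompiled sin()/cos(), and on an Athlon
   * XP only single precision could go through SSE anyway.              */
  #include <stdio.h>
  #include <math.h>

  double fourier_like(double omega, int n)
  {
      double sum = 0.0;
      int i;

      for (i = 1; i <= n; i++)
          /* The time is spent inside these libm calls, which were
           * compiled with Red Hat's flags when glibc was built.        */
          sum += sin(omega * i) / (cos(omega * i) + 2.0);
      return sum;
  }

  int main(void)
  {
      printf("%g\n", fourier_like(0.001, 5000000));
      return 0;
  }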

    
-- 
Jean Francois Martinez <jfm512 at free.fr>



