
Re: Changing CFLAGS for i386 packages on x86_64(new benchmarks included)



Kevin Kofler wrote:
> dragoran <dragoran <at> feuerpokemon.de> writes:
>> the last one (-m64) is weird (much slower!!) no -OX was used.
>
> Then your benchmarks are essentially worthless. Sorry, but since GCC defaults to -O0, which means no optimization whatsoever, i.e. very bad code, you should NEVER compile production code without an -O flag (generally -O2 or -Os), and especially not benchmarks!
>
> So please rerun your benchmarks with -O2 to get more useful results.
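
One way to guard a benchmark against being built at -O0 by accident is to have it report how it was compiled. The sketch below relies on the GCC predefined macros `__OPTIMIZE__` and `__OPTIMIZE_SIZE__`; other compilers may not define them, and the function names are made up for illustration (they are not part of flops.c).

```c
#include <string.h>

/* Returns a label for how this file was compiled.
 * GCC defines __OPTIMIZE__ for any optimizing build and additionally
 * defines __OPTIMIZE_SIZE__ when optimizing for size (e.g. -Os). */
const char *build_optimization(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}

/* Nonzero when the build used some optimization level above -O0. */
int build_is_optimized(void)
{
    return strcmp(build_optimization(), "none") != 0;
}
```

A benchmark could print `build_optimization()` next to its results, or refuse to run when `build_is_optimized()` returns 0.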

OK, done; see the attached results.
The native 64-bit build is now much faster than it was without -O, and is roughly equal to the 32-bit builds in performance. The other results are also better than the old ones, but the differences between them are no longer the same as before: all of the builds now perform almost equally.
So much for the old benchmarks with -O2 added.
I have also tried -mfpmath=sse and -ftree-vectorize, and they seem to have a positive effect on module 1, which is (looking at the source code):
/*******************************************************/
/* Module 1.  Calculate integral of df(x)/f(x) defined */
/*            below.  Result is ln(f(1)). There are 14 */
/*            double precision operations per loop     */
/*            ( 7 +, 0 -, 6 *, 1 / ) that are included */
/*            in the timing.                           */
/*            50.0% +, 00.0% -, 42.9% *, and 07.1% /   */
/*******************************************************/
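
To make the comment above concrete: the integral of df(x)/f(x) from 0 to 1 equals ln(f(1)) - ln(f(0)), which reduces to ln(f(1)) when f(0) = 1. The sketch below checks this with a hypothetical f(x) = 1 + x and a plain trapezoidal rule. Both are assumptions chosen for illustration; this is not the polynomial or the integration scheme that flops.c uses.

```c
#include <assert.h>

/* Hypothetical function for illustration only: f(x) = 1 + x,
 * so f(0) = 1 and df/dx = 1. NOT the function used by flops.c. */
static double f(double x)  { return 1.0 + x; }
static double df(double x) { (void)x; return 1.0; }

/* Trapezoidal estimate of the integral of df(x)/f(x) over [0, 1]
 * using n equal steps. The exact value is ln(f(1)) because f(0) = 1. */
double integrate_df_over_f(int n)
{
    assert(n > 0);
    double h = 1.0 / n;
    /* End points carry half weight in the trapezoidal rule. */
    double sum = 0.5 * (df(0.0) / f(0.0) + df(1.0) / f(1.0));
    for (int i = 1; i < n; i++) {
        double x = i * h;
        sum += df(x) / f(x);
    }
    return sum * h;
}
```

With this choice of f, the result should approach ln(2) as n grows. The loop body is dominated by additions, multiplications and one division, which is the kind of operation mix the comment describes.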
For x86_64 (the -m64 benchmarks), -Os seems to be better than -O2, maybe because x86_64 binaries are generally bigger and -Os reduces that effect, i.e. fewer cache misses?
        Kevin Kofler


gcc -O2 -DUNIX flops.c -m32 -march=i386 -mtune=generic -o flops
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -8.1208e-11      0.0100   1403.2013
     2      1.4704e-15      0.0086    815.2356
     3     -3.8213e-15      0.0076   2238.5432
     4      6.1151e-14      0.0079   1902.7553
     5     -4.4419e-14      0.0159   1819.4941
     6      7.7002e-15      0.0141   2059.8045
     7     -6.6161e-13      0.0236    508.9145
     8      2.2789e-14      0.0141   2127.2906

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =  1029.1609
   MFLOPS(2)       =  1014.7499
   MFLOPS(3)       =  1567.2928
   MFLOPS(4)       =  2084.3364
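
The Module 1 row can be sanity-checked against the "14 double precision operations per loop" stated in the flops.c comment. The sketch below assumes the RunTime column is microseconds per loop iteration; that is an assumption, but it is consistent with the numbers: 14 / 0.0100 is about 1400, close to the reported 1403.2013 (the displayed runtime is rounded to four decimals). The function names are made up for illustration.

```c
#include <assert.h>

/* MFLOPS = floating-point operations per microsecond.
 * Assumes runtime_usec is the time for one loop iteration. */
double mflops(double ops_per_loop, double runtime_usec)
{
    assert(runtime_usec > 0.0);
    return ops_per_loop / runtime_usec;
}

/* Share of one operation type in the loop, in percent. */
double op_share_percent(int count, int total)
{
    assert(total > 0);
    return 100.0 * count / total;
}
```

The same helper reproduces the operation mix in the comment: 7 of 14 operations are additions (50.0%), 6 of 14 are multiplications (42.9%), and 1 of 14 is a division (7.1%).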
-------------------------------------------------------------------
gcc -O2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -o flops
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -8.1208e-11      0.0100   1403.2011
     2      1.4704e-15      0.0091    773.0317
     3     -3.8213e-15      0.0075   2252.4475
     4      6.1151e-14      0.0079   1902.7549
     5     -4.4419e-14      0.0159   1822.1738
     6      7.7002e-15      0.0141   2057.5211
     7     -6.6161e-13      0.0236    509.0832
     8      2.2789e-14      0.0141   2128.4701

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =   984.4080
   MFLOPS(2)       =  1015.3324
   MFLOPS(3)       =  1568.4767
   MFLOPS(4)       =  2086.2031
----------------------------------------------------------------------
gcc -O2 -msse2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -o flops
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -8.1208e-11      0.0099   1409.8255
     2      1.4704e-15      0.0094    747.2427
     3     -3.8213e-15      0.0076   2233.9463
     4      6.1151e-14      0.0078   1914.1380
     5     -4.4419e-14      0.0160   1807.9726
     6      7.7002e-15      0.0141   2051.8342
     7     -6.6161e-13      0.0236    508.7459
     8      2.2789e-14      0.0141   2123.7611

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =   955.0271
   MFLOPS(2)       =  1014.0094
   MFLOPS(3)       =  1565.4547
   MFLOPS(4)       =  2082.1009
---------------------------------------------------------------------
gcc -O2 -DUNIX flops.c -m32 -march=k8 -mtune=k8  -o flops
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -8.1208e-11      0.0102   1372.0430
     2      1.4704e-15      0.0085    821.2137
     3     -3.8213e-15      0.0076   2250.1178
     4      6.1151e-14      0.0079   1908.4292
     5     -4.4419e-14      0.0160   1816.8225
     6      7.7002e-15      0.0143   2026.0742
     7     -6.6161e-13      0.0237    506.5646
     8      2.2789e-14      0.0141   2127.2906

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =  1036.3726
   MFLOPS(2)       =  1008.9609
   MFLOPS(3)       =  1558.4048
   MFLOPS(4)       =  2076.1624
-------------------------------------------------------------------------
gcc -O2 -DUNIX flops.c -m64 -march=k8 -mtune=k8  -o flops

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0098   1435.8077
     2     -1.4166e-13      0.0085    819.7111
     3      4.7184e-14      0.0080   2122.7938
     4     -1.2557e-13      0.0075   2008.2427
     5     -1.3800e-13      0.0156   1854.9566
     6      3.2380e-13      0.0145   2004.1944
     7     -8.4583e-11      0.0204    588.2436
     8      3.4867e-13      0.0148   2023.0560

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =  1025.5139
   MFLOPS(2)       =  1110.0527
   MFLOPS(3)       =  1612.1846
   MFLOPS(4)       =  2032.3280
------------------------------------------------------------------------
gcc -O2 -msse2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -mfpmath=sse -o flops
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0086   1631.9561
     2     -1.4166e-13      0.0078    901.3518
     3      4.7184e-14      0.0078   2186.7984
     4     -1.2557e-13      0.0073   2066.6088
     5     -1.3800e-13      0.0156   1856.8126
     6      3.2380e-13      0.0142   2038.3128
     7     -8.4583e-11      0.0203    591.6425
     8      3.4867e-13      0.0163   1840.7288

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =  1115.7724
   MFLOPS(2)       =  1129.3849
   MFLOPS(3)       =  1621.5578
   MFLOPS(4)       =  1997.4742
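
Note that the Error column in this run differs from the earlier -m32 runs and matches the -m64 run. A likely explanation (an assumption, not something stated in the thread) is that the x87 unit keeps intermediate results in extended precision, while SSE2 math works in plain double precision. The C99 macro `FLT_EVAL_METHOD` from `<float.h>` reports which evaluation model a build uses; the function name below is made up for illustration.

```c
#include <float.h>

/* Describes how floating-point expressions are evaluated in this build,
 * based on the C99 macro FLT_EVAL_METHOD. */
const char *fp_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}
```

With GCC, an x87 build typically reports 2 and a build with -msse2 -mfpmath=sse or -m64 typically reports 0, but this is worth checking on the compiler version at hand.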
--------------------------------------------------------------------
gcc -O2 -msse2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -mfpmath=sse -ftree-vectorize -o flops
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0089   1573.2115
     2     -1.4166e-13      0.0078    897.7396
     3      4.7184e-14      0.0078   2173.6899
     4     -1.2557e-13      0.0073   2057.7493
     5     -1.3800e-13      0.0157   1852.1796
     6      3.2380e-13      0.0141   2060.9484
     7     -8.4583e-11      0.0201    595.5424
     8      3.4867e-13      0.0162   1847.8153

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =  1110.9305
   MFLOPS(2)       =  1131.4868
   MFLOPS(3)       =  1620.0114
   MFLOPS(4)       =  2003.6594
---------------------------------------------------------------------
gcc -Os -DUNIX flops.c -m64 -march=k8 -mtune=generic  -o flops
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0096   1450.9217
     2     -1.4166e-13      0.0090    776.3812
     3      4.7184e-14      0.0078   2180.2247
     4     -1.2557e-13      0.0077   1953.0823
     5     -1.3800e-13      0.0151   1925.1909
     6      3.2380e-13      0.0140   2078.2574
     7     -8.4583e-11      0.0212    566.5452
     8      3.4867e-13      0.0145   2064.3871

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =   983.3900
   MFLOPS(2)       =  1094.5643
   MFLOPS(3)       =  1624.8007
   MFLOPS(4)       =  2069.8902
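
To put the -Os versus -O2 observation for -m64 into numbers, the per-module MFLOPS of the two -m64 runs above can be compared directly. The sketch below copies the values from those two tables; the helper names are made up for illustration. Keep in mind that the two runs also differ in -mtune (k8 versus generic), so this is not a pure -O2 versus -Os comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Per-module MFLOPS copied from the two -m64 runs in this message. */
static const double o2_m64[8] = {
    1435.8077, 819.7111, 2122.7938, 2008.2427,
    1854.9566, 2004.1944, 588.2436, 2023.0560
};
static const double os_m64[8] = {
    1450.9217, 776.3812, 2180.2247, 1953.0823,
    1925.1909, 2078.2574, 566.5452, 2064.3871
};

/* Relative change from base to other, in percent. */
double percent_change(double base, double other)
{
    assert(base > 0.0);
    return 100.0 * (other - base) / base;
}

/* Number of modules where other reports more MFLOPS than base. */
size_t count_faster(const double *base, const double *other, size_t n)
{
    size_t wins = 0;
    for (size_t i = 0; i < n; i++) {
        if (other[i] > base[i])
            wins++;
    }
    return wins;
}
```

With these numbers, `count_faster(o2_m64, os_m64, 8)` gives 5: -Os is ahead in modules 1, 3, 5, 6 and 8, and behind in modules 2, 4 and 7, with every difference within a few percent.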



