Extremely poor performance crunching random numbers under PIV-FC5
BankHacker
bankhacker at gmail.com
Fri May 19 15:38:27 UTC 2006
> Either have struct random_data randomdataState;
> and replace current uses of *randomdataState with randomdataState and
> current uses of randomdataState with &randomdataState, or initialize
> the pointer to an address of some struct random_data.
OK. It is now fixed and runs without errors, thanks to Jakub's and
Andy's help.
Unfortunately, performance is still very poor on the FC5-PIV @ 3 GHz
system, and only on that one, not on the others (Opteron and
PIV @ 2.4 GHz).
Here is the corrected full source code, followed by the results:
### test-cpu-2.c ################################################
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <string.h>
#include <fcntl.h>
#ifdef linux
inline void randomize(struct random_data *randomdataState) {
#else
inline void randomize() {
#endif
#ifdef linux
static int buf[32];
#else
#endif
time_t seconds;
time(&seconds);
srand((unsigned int) seconds);
#ifdef linux
/* Initialize the special random number generator */
srand48((unsigned int) seconds);
memset(randomdataState, 0, sizeof(*randomdataState));
initstate_r(seconds, (char *) buf, 128, randomdataState);
#else
#endif
}
/* int main(int argc, char ** argv) { */
int main() {
int i, r, numero_ciclos, numero_ciclosM;
clock_t start, end;
char* buf;
struct random_data randomdataState;
/* Initialize the random number generator */
#ifdef linux
randomize(&randomdataState);
#else
randomize();
#endif
start = clock();
/* Allocate 0.1 GB of memory */
buf=malloc(100*1024*1024);
end = clock();
printf("Reservado 0.1 Gb de memoria en %.3f s.\n",
(double)(end - start)/CLOCKS_PER_SEC);
start = clock();
/* Write to 0.1 GB of memory */
for(i=0; i<100*1024*1024; i++) {
buf[i]='0';
}
end = clock();
printf("Escritura sobre 0.1 Gb de memoria en %.3f s.\n",
(double)(end - start)/CLOCKS_PER_SEC);
numero_ciclos = 10000000; numero_ciclosM = numero_ciclos / 1E6;
start = clock();
for(i=0; i<numero_ciclos; i++) {
r = rand();
}
end = clock();
printf("%d M de rand() en %.3f s. (ejemplo.: %d)\n",
numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);
start = clock();
for(i=0; i<numero_ciclos; i++) {
r = sqrt(i);
}
end = clock();
printf("%d M de sqrt(i) en %.3f s. (ejemplo.: %d)\n",
numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);
start = clock();
for(i=0; i<numero_ciclos; i++) {
r = log(i);
}
end = clock();
printf("%d M de log(i) en %.3f s. (ejemplo.: %d)\n",
numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);
start = clock();
for(i=0; i<numero_ciclos; i++) {
r = log10(i);
}
end = clock();
printf("%d M de log10(i) en %.3f s. (ejemplo.: %d)\n",
numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);
#ifdef linux
start = clock();
for(i=0; i<numero_ciclos; i++) {
r = random();
}
end = clock();
printf("LINUX: %d M de random() en %.3f s. (ejemplo.: %d)\n",
numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);
start = clock();
for(i=0; i<numero_ciclos; i++) {
random_r(&randomdataState, &r);
}
end = clock();
printf("LINUX: %d M de random_r() en %.3f s. (ejemplo.: %d)\n",
numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);
start = clock();
for(i=0; i<numero_ciclos; i++) {
r = lrand48();
}
end = clock();
printf("LINUX: %d M de lrand48() en %.3f s. (ejemplo.: %d)\n",
numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);
#else
#endif
return (0);
}
#################################################################
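The listing repeats the same start/loop/end timing pattern for every
function. As a rough sketch only, that pattern could be factored into a
helper. The names time_calls, call_rand, call_random and call_lrand48
are hypothetical and not part of test-cpu-2.c, and the call through a
function pointer adds a small overhead that the original loops do not
have, so the absolute numbers would not be directly comparable:

```c
#define _DEFAULT_SOURCE   /* exposes random() and lrand48() in glibc headers */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Thin wrappers so that all generators share one signature. */
static long call_rand(void)    { return rand(); }
static long call_random(void)  { return random(); }
static long call_lrand48(void) { return lrand48(); }

/*
 * Hypothetical helper: calls fn() n times and returns the CPU time
 * consumed, in seconds, as measured by clock().
 * The last value returned by fn() is stored in *last, mirroring the
 * "ejemplo" value printed by the original test.
 */
static double time_calls(long (*fn)(void), long n, long *last)
{
    long i, r = 0;
    clock_t start, end;

    assert(fn != NULL && last != NULL);

    start = clock();
    for (i = 0; i < n; i++) {
        r = fn();
    }
    end = clock();

    *last = r;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With this, time_calls(call_rand, 10000000, &r) would correspond to the
rand() measurement above.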
I compile the code this way on the FC5-PIV @ 3 GHz system:
# gcc test-cpu-2.c -o randr-test-cpu-2 -lm -W -Wall -pedantic -O3
No errors, no warnings. Then I run it:
# ./randr-test-cpu-2
Reservado 0.1 Gb de memoria en 0.000 s.
Escritura sobre 0.1 Gb de memoria en 0.240 s.
10 M de rand() en 46.640 s. (ejemplo.: 1867229032)
10 M de sqrt(i) en 0.170 s. (ejemplo.: 3162)
10 M de log(i) en 0.810 s. (ejemplo.: 16)
10 M de log10(i) en 0.810 s. (ejemplo.: 6)
LINUX: 10 M de random() en 38.630 s. (ejemplo.: 19070960)
LINUX: 10 M de random_r() en 19.390 s. (ejemplo.: 1867229032)
LINUX: 10 M de lrand48() en 31.610 s. (ejemplo.: 1479483981)
random_r() is more than twice as fast as rand() (19.390 s against
46.640 s). That is better, but still not enough, because the results
on the PIV @ 2.4 GHz are these:
# gcc test-cpu-2.c -o randr-test-cpu-2 -lm -W -Wall -pedantic -O3
# ./randr-test-cpu-2
Reservado 0.1 Gb de memoria en 0.000 s.
Escritura sobre 0.1 Gb de memoria en 0.390 s.
10 M de rand() en 0.410 s. (ejemplo.: 1589201696)
10 M de sqrt(i) en 0.220 s. (ejemplo.: 3162)
10 M de log(i) en 1.110 s. (ejemplo.: 16)
10 M de log10(i) en 1.160 s. (ejemplo.: 6)
LINUX: 10 M de random() en 0.330 s. (ejemplo.: 158326915)
LINUX: 10 M de random_r() en 0.190 s. (ejemplo.: 1589201696)
LINUX: 10 M de lrand48() en 0.580 s. (ejemplo.: 447468310)
On the Opteron system, the results are similar:
# gcc test-cpu-2.c -o randr-test-cpu-2 -lm -W -Wall -pedantic -O3
# ./randr-test-cpu-2
Reservado 0.1 Gb de memoria en 0.000 s.
Escritura sobre 0.1 Gb de memoria en 0.170 s.
10 M de rand() en 0.160 s. (ejemplo.: 859117811)
10 M de sqrt(i) en 0.120 s. (ejemplo.: 3162)
10 M de log(i) en 1.060 s. (ejemplo.: 16)
10 M de log10(i) en 1.220 s. (ejemplo.: 6)
LINUX: 10 M de random() en 0.140 s. (ejemplo.: 304030109)
LINUX: 10 M de random_r() en 0.080 s. (ejemplo.: 859117811)
LINUX: 10 M de lrand48() en 0.140 s. (ejemplo.: 770314866)
Conclusions:
1st.- The last oddity is resolved. It was due to a bug in my source
code. Sorry!
2nd.- The random_r() function is harder to use, but it performs
better than rand().
3rd.- random_r() outputs exactly the same random numbers as rand().
Look at the example values in the tests. I don't know whether that is
correct, reasonable or a possible problem ...
4th.- We still don't know the cause of the extremely low performance
of the random functions on the FC5-PIV @ 3 GHz system.
5th.- We suspect that the problem may be due to an odd bug that
appears when combining the FC5 glibc (libc.so.6) version with certain
PIV CPUs.
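On the 3rd point, the matching example values may simply be expected.
As far as I understand glibc, rand() is a wrapper around random(), and
a 128-byte state passed to initstate_r() selects the same generator
type as the default state, so the same seed should give the same
sequence. The test seeds both with the same "seconds" value. A minimal
sketch to check this, assuming glibc (count_mismatches is a
hypothetical name, not something from the test program):

```c
#define _DEFAULT_SOURCE   /* exposes initstate_r() and random_r() (glibc) */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Seeds rand() and a private random_r() state with the same seed and
 * counts how many of the first n outputs differ.
 * Assumption: glibc, where both functions share one algorithm.
 */
static long count_mismatches(unsigned int seed, long n)
{
    static int32_t statebuf[32];        /* 128 bytes, as in the test */
    struct random_data state;
    int32_t r;
    long i, mismatches = 0;

    assert(n >= 0);

    /* The struct must be zeroed before initstate_r() is called. */
    memset(&state, 0, sizeof(state));
    initstate_r(seed, (char *) statebuf, sizeof(statebuf), &state);
    srand(seed);

    for (i = 0; i < n; i++) {
        random_r(&state, &r);
        if (r != rand()) {
            mismatches++;
        }
    }
    return mismatches;
}
```

If count_mismatches() returns 0 for any seed, the identical example
values in the results are a consequence of the shared seed and not a
sign of a problem.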
Already tried, without success:
1.- Enabling/disabling SELinux
2.- Enabling/disabling swap
3.- Linking statically against /usr/lib/libm.a
4.- Using the alternate random function random_r()
5.- Magic things under /proc/sys/kernel/
To do next:
1.- Andy told me to write my own bankhacker_random_r() function and
avoid glibc's (libc.so.6). I am going to work on it, but I think it is
not easy.
2.- Jakub said that "On PIV, atomic instructions are horribly
expensive. Either you have preloaded some library that called
pthread_create, or your CPU is unable to do the jump around lock
prefix trick quickly." It sounds very interesting, but I don't know how
to act on this ... any further explanation would be a great hint.
Thanks!
3.- Any more ideas? Thanks, honestly ...