
Low benchmark performance


While running a simple thread wake-up benchmark on a 4-way ia64 system I
noticed an interesting behaviour.

The program was stolen from Ian Wienand's pthreadbench suite.
It is a simple producer/consumer program with 1 producer and N consumers.
The thing is, with some values of N, the program runs almost 10 times
slower with NPTL than with LinuxThreads.

With 1 consumer, NPTL has a slight advantage, and with both libraries, the
2 threads run on a single CPU. (CPU activity was monitored using xosview)

With 2 consumers, NPTL uses 3 CPUs, LT uses 2. NPTL is FOUR times faster.

Actually, it seems that LT always uses 2 CPUs when there is more than 1
consumer.

With 3 consumers, NPTL uses 4 CPUs and is twice as fast as LT.
From this point on, NPTL always seems to use 4 CPUs.

With 4 or 5 consumers there is no clear winner.

But with more consumers, NPTL is MUCH slower than LT.
Even 25 times slower with 60 consumers...

This system is running Linux 2.5.67 and NPTL 0.36.

I guess the performance problem is due to the fact that NPTL uses more
CPUs than LT and therefore uses the cache much less efficiently than
LT does.

My questions are:
- Is this a benchmark-only problem, unlikely to happen in real programs?
- If not, who is to blame: NPTL or the kernel? Maybe it just needs some
  tuning.

You can find the test program as an attached file. It takes one parameter,
the number of consumers.

/* thread benchmark header */

#include <stdio.h>
#include <stdlib.h>	/* atoi */
#include <unistd.h>
#include <pthread.h>
#include <signal.h>
#include <sys/time.h>
#include <assert.h>

void do_test(void);

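/* assumption: 10-second run; the value used originally is not shown,
   and leaving it at 0 would make alarm() a no-op */
int time_to_wait = 10;
int nb_cons;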

/* what are we doing */
char *things = "wake-ups";
/* how many have we done */
unsigned long things_done=0;

/* threads */
pthread_t thread_id;
void *thread(void*);

/* globals */
unsigned long *data_accessed = &things_done;

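typedef struct condition_struct {
  pthread_mutex_t mutex;
  pthread_cond_t empty;
  pthread_cond_t full;
  unsigned long value;
} condition_t;

/* reconstructed: the definition is missing from the archived copy,
   but every function below refers to `condition` */
condition_t condition = {
  PTHREAD_MUTEX_INITIALIZER,
  PTHREAD_COND_INITIALIZER,
  PTHREAD_COND_INITIALIZER,
  0
};
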
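/* consumer: clear the 'queue' whenever the producer has filled it */
void *thread(void *arg) {
  while (1) {
    pthread_mutex_lock( &condition.mutex );
    while ( condition.value == 0 )
      pthread_cond_wait( &condition.full, &condition.mutex );
#ifdef DEBUG
    /* placeholder: the original debug statement is missing from the archive */
    printf("consumer: queue holds %lu\n", condition.value);
#endif
    /* reconstructed (lines missing from the archive):
       empty the queue and count one wake-up */
    condition.value = 0;
    things_done++;
    pthread_cond_signal( &condition.empty );
    pthread_mutex_unlock( &condition.mutex );
  }
  return NULL;	/* not reached */
}
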
void do_test(void) {
  pthread_t threads[100];
  int i = 0;

  /* start nb_cons consumer threads */
  for ( ; i < nb_cons ; i++ )
    pthread_create( &threads[i], NULL, thread, NULL );

  /* fill a queue, signal to threads to empty it */
  while ( 1 ) {

    pthread_mutex_lock( &condition.mutex );
    /* if the queue is full, signal for worker to clean it */
    if ( condition.value ) {
      pthread_cond_broadcast( &condition.full );
      //pthread_cond_signal( &condition.full );
      pthread_cond_wait( &condition.empty , &condition.mutex);
    }
    pthread_mutex_unlock( &condition.mutex );
    /* fill it back up */
    pthread_mutex_lock( &condition.mutex );
#ifdef DEBUG
    /* placeholder: the original debug statement is missing from the archive */
    printf("producer: refilling queue\n");
#endif
    condition.value = 5;
    pthread_mutex_unlock( &condition.mutex );
  }
}


/* global */
struct timeval start,end;
extern unsigned long things_done;
extern char * things;

/* on alarm print out results */
void on_alarm(int signo)
{
  struct timeval diff ;
  double diff_secs;

  /* grab things done before we continue */
  unsigned long stamp = things_done;
  gettimeofday(&end, NULL);
  timersub( &end, &start , &diff );
  diff_secs = diff.tv_sec + diff.tv_usec*1e-6;

  /* stamp is unsigned long, so print it with %lu rather than %d */
  printf("%lu %s in %g sec = ", stamp,  things, diff_secs);
  printf("%g per second\n", stamp / diff_secs );

  /* assumption: the run ends once the result is printed; whatever
     followed here is missing from the archived copy */
  fflush(stdout);
  _exit(0);
}


/* main */
int main(int argc, char *argv[])
{
	static struct sigaction alarm_m;

	assert(argc > 1);
	//time_to_wait = atoi(argv[1]);
	nb_cons = atoi(argv[1]);
	assert(nb_cons <= 100);

	printf("Testing during %d seconds with %d consumers\n", time_to_wait, nb_cons);

	/* setup alarm handler */
	alarm_m.sa_handler = on_alarm;
	sigaction(SIGALRM , &alarm_m, NULL);

	alarm( time_to_wait );

	gettimeofday( &start , NULL );

	while ( 1 ) do_test();

	return 0;	/* not reached */
}
