
Poor thread performance on Linux vs. Solaris



Hi all,

I've recently ported a heavily multithreaded application from Solaris to 
Linux, but upon running performance tests, I'm encountering some serious 
issues on a 4-way 2.8GHz Xeon box running Red Hat 9.  Here's some example 
top output while the app is running (kernel 2.6.0-test4 from 
http://people.redhat.com/arjanv/2.5/RPMS.kernel/):

CPU0 states:  63.2% user  22.3% system    0.1% iowait  13.1% idle
CPU1 states:  65.1% user  24.3% system    0.0% iowait  10.0% idle
CPU2 states:  63.1% user  24.0% system    0.0% iowait  12.2% idle
CPU3 states:  64.3% user  23.1% system    0.0% iowait  11.4% idle

Every so often, the app will freeze for a second or so, and then the 
system time will jump significantly:

CPU0 states:  25.2% user  68.0% system    0.0% iowait   6.2% idle
CPU1 states:  25.0% user  68.1% system    0.0% iowait   6.2% idle
CPU2 states:  25.3% user  67.0% system    0.0% iowait   7.1% idle
CPU3 states:  24.4% user  69.2% system    0.0% iowait   5.2% idle

After the freeze, the app continues as normal, and the system time will 
gradually drop until the next freeze.  This run-freeze-run-freeze cycle 
repeats until the performance test is complete.  The system time ranges 
from about 20% to 80%.

I get even worse results on RH9's latest 2.4 kernel: after the first 
freeze, the system time fluctuates constantly, ranging anywhere from 0% 
to 100% but usually hovering around 80%.  Overall throughput is worse as 
well.

Both Intel's VTune tool and oprofile show that the majority of the time is 
spent in '.text.lock.futex', which, after some research, I believe is the 
spin-wait code for the futex lock in kernel/futex.c.  The logical 
conclusion is that the system time represents the CPUs wasting their time 
spinning while waiting for that lock.
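
For what it's worth, here is a toy reproducer (not our actual code, just
the pattern I suspect we're hitting): a bunch of threads hammering a
single pthread mutex.  Under NPTL every contended lock/unlock falls
through to the futex() syscall, so running something like this under
"strace -c -f" or oprofile should show a similarly futex-heavy profile
(the thread and iteration counts are arbitrary):

  /* build: gcc -O2 -o hammer hammer.c -lpthread */
  #include <pthread.h>
  #include <stdio.h>

  #define NTHREADS 16
  #define ITERS    1000000

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static long counter;

  static void *worker(void *arg)
  {
      int i;
      (void)arg;
      for (i = 0; i < ITERS; i++) {
          pthread_mutex_lock(&lock);    /* contended -> futex(FUTEX_WAIT) */
          counter++;
          pthread_mutex_unlock(&lock);  /* waiters   -> futex(FUTEX_WAKE) */
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t t[NTHREADS];
      int i;

      for (i = 0; i < NTHREADS; i++)
          pthread_create(&t[i], NULL, worker, NULL);
      for (i = 0; i < NTHREADS; i++)
          pthread_join(t[i], NULL);
      printf("counter = %ld\n", counter);
      return 0;
  }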

Now my question is: how should I proceed?  The same app scales fine on 
Solaris, chewing up 100% user time on every CPU I can throw at it.  I 
have a few ideas of my own, and I was hoping someone here might be able 
to give me some feedback:

1) compare results under LinuxThreads vs. NPTL.  The problem is that the 
app doesn't run properly under LinuxThreads, probably because LT isn't 
fully POSIX-compliant, and I'm not sure I want to waste time fixing that.  
(A quick check for which thread library is actually in use is sketched 
after this list.)

2) use some sort of tool to track down the offending locks in our code.  
If nothing else, it may provide some insight, or reveal that they can be 
eliminated.  Unfortunately, I don't know of any such tool for Linux; the 
closest I can come up with is the rough LD_PRELOAD sketch after this list.

3) gather some more system metrics besides top output for clues.  But 
which?  (One candidate, the context-switch rate, is sketched after this 
list.)

4) try to eliminate the global locks in kernel/futex.c.  I'm afraid it 
might not be that easy though - the weird freezes make me wonder whether 
the scheduler plays a role here as well.

5) make some noise on linux-kernel.  Figured I'd check here first.
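
Regarding 1), a quick way to confirm which thread library a process
actually ends up with is confstr() with _CS_GNU_LIBPTHREAD_VERSION (a
glibc extension).  If I remember correctly, setting
LD_ASSUME_KERNEL=2.4.19 on RH9 forces LinuxThreads instead of NPTL, but
someone please correct me if that's wrong:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      char buf[128];

      /* prints e.g. "linuxthreads-0.10" or "NPTL 0.60" */
      if (confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof(buf)) > 0)
          printf("thread library: %s\n", buf);
      else
          printf("_CS_GNU_LIBPTHREAD_VERSION not supported\n");
      return 0;
  }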
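
Regarding 2), the closest thing I can come up with is a crude LD_PRELOAD
interposer along these lines (just a sketch, not an existing tool): it
wraps pthread_mutex_lock() and counts how often a lock isn't immediately
available, which is exactly when NPTL has to fall back to futex().  Build
with "gcc -shared -fPIC -o lockprof.so lockprof.c -ldl -lpthread" and run
the app with LD_PRELOAD=./lockprof.so; the counters aren't atomic, so the
numbers are only ballpark:

  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <pthread.h>
  #include <stdio.h>

  static int (*real_lock)(pthread_mutex_t *);
  static int (*real_trylock)(pthread_mutex_t *);
  static unsigned long total, contended;     /* racy, ballpark only */

  static void init(void)
  {
      real_lock = (int (*)(pthread_mutex_t *))
          dlsym(RTLD_NEXT, "pthread_mutex_lock");
      real_trylock = (int (*)(pthread_mutex_t *))
          dlsym(RTLD_NEXT, "pthread_mutex_trylock");
  }

  int pthread_mutex_lock(pthread_mutex_t *m)
  {
      if (!real_trylock)
          init();
      total++;
      if (real_trylock(m) == 0)
          return 0;             /* uncontended, no futex call needed */
      contended++;              /* this is where futex(FUTEX_WAIT) happens */
      return real_lock(m);
  }

  static void __attribute__((destructor)) report(void)
  {
      fprintf(stderr, "lockprof: %lu lock calls, %lu contended\n",
              total, contended);
  }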
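
Regarding 3), one cheap metric that might line up with the freezes is the
system-wide context switch rate ("vmstat 1" shows it in the "cs" column;
the snippet below just samples the ctxt counter in /proc/stat once a
second).  A big spike during a freeze would support the theory that the
threads are piling into futex wait/wake instead of doing real work:

  #include <stdio.h>
  #include <unistd.h>

  static unsigned long read_ctxt(void)
  {
      char line[256];
      unsigned long ctxt = 0;
      FILE *f = fopen("/proc/stat", "r");

      if (!f)
          return 0;
      while (fgets(line, sizeof(line), f))
          if (sscanf(line, "ctxt %lu", &ctxt) == 1)
              break;
      fclose(f);
      return ctxt;
  }

  int main(void)
  {
      unsigned long prev, now;

      prev = read_ctxt();
      for (;;) {
          sleep(1);
          now = read_ctxt();
          printf("context switches/sec: %lu\n", now - prev);
          prev = now;
      }
      return 0;
  }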

Thanks,
Bill





