
Re: Poor thread performance on Linux vs. Solaris



On Mon, 8 Sep 2003, Bill Soudan wrote:

> > have you tried oprofile? It should be able to profile your app too.
> 
> I have already; it was one of the tools I used to track the system time
> to the futex spinlock.  I've also used Intel's proprietary VTune tool,
> which points to the same culprit.

as a crude method, attach to one of your worker threads via gdb;
statistically the backtrace will land in one of the contention points.
Attach a couple of times to get a feel for which point is the most common
one.
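
something like this should work as a starting point (a rough sketch - the
binary path and the pid are placeholders, and 'thread apply all bt'
assumes a not-too-ancient gdb):

	gdb /path/to/your/app <worker-thread-pid>
	(gdb) thread apply all bt	# backtrace of every thread
	(gdb) detach
	(gdb) quit

if the lock is really contended, most attaches will show threads sitting
in pthread_mutex_lock()/futex-wait frames near the top of the stack.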

another method is to look at oprofile output for your app (oprofile can
profile your own userspace object files as well), and look for high
cache-miss profile hits on the _return_ instructions after a pthread
locking function. A contended lock will likely cause a reschedule, and
thus a colder cache upon return from the locking API. Arguably this is a
bit of an indirect method ...
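
e.g. roughly like this (a sketch only - it assumes a recent oprofile with
the opcontrol/opannotate tools; older releases used op_start/oprofpp
instead, and the binary name is a placeholder):

	opcontrol --setup --vmlinux=/boot/vmlinux
	opcontrol --start
	./your-app			# run the workload
	opcontrol --shutdown
	opannotate --assembly ./your-app | less

then look for hot instructions right after the call sites of
pthread_mutex_lock() (and friends) in the annotated assembly.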

> > How does 'vmstat 1' output look before, during and after such a freeze?
> > Does it show a sharp increase in context-switching? Is there a sharp
> > increase in runqueue length? [nr of tasks running]
> 
> To my untrained eye, context switching appears very high and volatile
> for the duration of the test... the only time it drops is during the
> mysterious freezes, during which system time spikes up to 80-100%.  
> I've included the vmstat output for the duration of the test.  Notice
> the particularly bad freeze near the end of the test, 99% system time
> for 2 seconds.

it would be interesting to know what happens during such sections:

 0  0  0      0 3535332   8020  58416    0    0     0     0 1342  6313 55 28 18
 0  0  0      0 3535268   8020  58416    0    0     0     0 1010    36  1 99  0
 0  0  0      0 3535268   8020  58416    0    0     0     0 1008    19  1 99  0

this is the 'freeze', right? Could you try to do 2-second readprofile
captures (this requires a kernel booted with the profile=2 boot parameter,
so that /proc/profile exists), the following way:

	N=0
	while true; do
		# reset the kernel profiling buffer
		readprofile -r
		# wait ~2 seconds; the 2nd output line covers this window
		vmstat -n 2 2 > vmstat-$N
		# snapshot the profile data gathered during the window
		cp /proc/profile profile-$N
		N=$[$N+1]
	done

this is a relatively low-overhead method of capturing performance data. 
(You can increase the measurement window to 4 seconds via 'vmstat -n 4 2'.)

after the run, check out the vmstat files and try to find the one that
best represents the 'freeze'. (only the second vmstat line in each file
covers the 2-second measurement; the first line is the average since
boot.) The file with a very low 'cs' field while having 99 as 'sy' is the
most likely candidate. Then please send us the symbolic readprofile output
for the profile file that accompanies that particular vmstat snapshot.
E.g. if vmstat-123 is the one representing the freeze, then the output of
the following line would be the interesting one:

	readprofile -n -p ./profile-123 | sort -n
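
to pick out the candidate files quickly, something like this should work
(the field numbers are derived from your vmstat output above - adjust
them if your vmstat version prints different columns):

	# 4th line = the 2-second sample (2 header lines + boot-average
	# line come first); $15 = 'sy', $13 = 'cs'
	awk 'FNR == 4 && $15 >= 90 && $13 < 100 { print FILENAME }' vmstat-*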

> Runqueue length is usually 0 for the duration of the test.

i'd exclude the scheduler at this point - the empty runqueue makes it
quite unlikely to be scheduler-related. But the high system time during
the 'freeze' period is unnatural; the readprofile snapshot ought to help
us more. It could be something your app does rarely that triggers some
code path in the kernel.

could you try one more tuning change? In kernel/futex.c, change
FUTEX_HASHBITS from 8 to 13 or so. How many locks is your app using,
total?
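
i.e. something like this (against a 2.6 tree - the define may look
slightly different in your kernel version):

	cd /usr/src/linux
	grep -n FUTEX_HASHBITS kernel/futex.c	# locate the define
	# change it by hand to:
	#	#define FUTEX_HASHBITS 13	/* was 8: 256 -> 8192 buckets */
	# then rebuild and reboot the kernel

a larger hash table means shorter hash chains, so fewer unrelated futexes
collide in the same bucket.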

> At the very least, the system time spikes look like a kernel problem to
> me.

agreed.

	Ingo



