RHELv4 and v5 - So slow as to be unusable.

Yong Huang yong321 at yahoo.com
Fri Oct 8 17:34:59 UTC 2010


Gary,

As you showed, not all performance problems can be identified by 
performance monitoring tools. In this case "performance" may not even 
be the right word; "locking" may be better.

We recently had a problem with TrendMicro on our RHEL 5 box. Copying a 
1GB file with cp took 35 minutes for the prompt to come back, even 
though the copy already had the same size and checksum as the original 
after about 1 minute. /proc/<cp pid>/status showed the process in disk 
sleep (D) state. The cp process could not be killed, which indicates it 
was stuck in kernel mode and not returning. Running strace or pstack 
against it hung as well (although strace and pstack themselves could be 
killed). The message in /var/log/messages shed light on the problem:

Sep 26 11:02:11 ourhostname kernel: INFO: task cp:10658 blocked for more than 120 seconds.
Sep 26 11:02:11 ourhostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
Sep 26 11:02:11 ourhostname kernel: Call Trace:
Sep 26 11:02:11 ourhostname kernel:  [<ffffffff884a45a8>] :splxmod:closeHook+0x784/0x9d8

So the closeHook function in the splxmod module is the suspect, since 
it's at the top of the call stack. A Google search indicates that 
splxmod is a module in TrendMicro's software. We contacted them and 
they quickly provided a patch.
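
If you want to scan a whole log for this kind of report rather than 
reading it by eye, something like the following Python sketch may help. 
It only assumes the two line formats shown above (the "INFO: task ... 
blocked" notice and the ":module:function+0x..." frame style); the 
function name find_hung_tasks and the regular expressions are my own, 
not part of any tool, and other kernels may format frames differently.

```python
import re

# Kernel hung-task notice, e.g.
# "INFO: task cp:10658 blocked for more than 120 seconds."
# Assumption: the task name contains no whitespace.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Stack frame that names a module, in the style seen in the log above, e.g.
# "[<ffffffff884a45a8>] :splxmod:closeHook+0x784/0x9d8"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+:(\w+):(\w+)\+0x")


def find_hung_tasks(lines):
    """Return one dict per hung-task notice, with the module frames logged after it."""
    reports = []
    current = None
    for line in lines:
        notice = HUNG_RE.search(line)
        if notice:
            current = {
                "task": notice.group(1),
                "pid": int(notice.group(2)),
                "seconds": int(notice.group(3)),
                "module_frames": [],  # (module, function) pairs, top of stack first
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["module_frames"].append((frame.group(1), frame.group(2)))
    return reports


# Demo on the lines quoted in this message.
sample = [
    "Sep 26 11:02:11 ourhostname kernel: INFO: task cp:10658 blocked for more than 120 seconds.",
    "Sep 26 11:02:11 ourhostname kernel: Call Trace:",
    "Sep 26 11:02:11 ourhostname kernel:  [<ffffffff884a45a8>] :splxmod:closeHook+0x784/0x9d8",
]
for report in find_hung_tasks(sample):
    print(report)
```

To run it against a real log, pass an open file instead of the sample 
list, e.g. find_hung_tasks(open("/var/log/messages")). Reading that 
file normally requires root.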

RHEL 4 doesn't have /proc/sys/kernel/hung_task_timeout_secs. I'm not sure 
if the kernel can be reconfigured to add that. For those interested, the 
source code is at
http://koders.com/c/fidFAF17DCD13DB287057ACC4136EEEFE2D9644BA9A.aspx
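
A quick way to tell whether a given box exposes the tunable is to try 
reading it. This is only a sketch; the helper name hung_task_timeout is 
made up, and it treats any read failure the same as the file being 
absent.

```python
def hung_task_timeout(path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Return the timeout in seconds, or None if the file cannot be read or parsed."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        # OSError covers a missing file (e.g. on RHEL 4) and permission problems.
        return None


timeout = hung_task_timeout()
if timeout is None:
    print("hung_task_timeout_secs is not available on this kernel")
else:
    print("hung_task_timeout_secs =", timeout)
```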

In your case, can you try pstack and strace on a simple process such as 
date (both programs need to be installed)? Also, tell us what 
/proc/<pid>/status shows.
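
If it's easier than reading the whole status file, the State line can 
be pulled out with a few lines of Python. This is a rough sketch; the 
function names parse_state and process_state are just illustrative. A 
process stuck the way our cp was would show "D" (disk sleep).

```python
import os


def parse_state(status_text):
    """Return (code, description) from the State: line of /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split(None, 2)  # e.g. ["State:", "D", "(disk sleep)"]
            code = fields[1]
            desc = fields[2].strip("()") if len(fields) > 2 else ""
            return code, desc
    return None  # no State: line found


def process_state(pid):
    """Read /proc/<pid>/status and return its parsed state (Linux only)."""
    with open("/proc/%s/status" % pid) as fh:
        return parse_state(fh.read())


# Demo: report this script's own state when /proc is available.
if os.path.exists("/proc/self/status"):
    print(process_state("self"))
```

Replace "self" with the pid of the process you are looking at.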

Yong Huang
