RHELv4 and v5 - So slow as to be unusable.
Yong Huang
yong321 at yahoo.com
Fri Oct 8 17:34:59 UTC 2010
Gary,
As you showed, not every performance problem can be identified with
performance monitoring tools. In this case "performance" may not even be
the right word; "locking" may be closer.
We recently had a problem with TrendMicro on our RHEL 5 box. Copying a
1GB file with cp took 35 minutes before the prompt came back, even though
the copy already had the same checksum and size as the original after
about 1 minute. /proc/<cp pid>/status showed the process in the disk
sleep state. The cp process could not be killed, indicating it was stuck
in kernel mode and not returning to user space. Running strace or pstack
against the process hung as well (although strace and pstack themselves
could be killed). The messages in /var/log/messages shed light on the
problem:
Sep 26 11:02:11 ourhostname kernel: INFO: task cp:10658 blocked for more than 120 seconds.
Sep 26 11:02:11 ourhostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
Sep 26 11:02:11 ourhostname kernel: Call Trace:
Sep 26 11:02:11 ourhostname kernel: [<ffffffff884a45a8>] :splxmod:closeHook+0x784/0x9d8
So the closeHook function of the splxmod module is the suspect, since it
is at the top of the call stack. A Google search indicates that splxmod
is a module in TrendMicro's software. We contacted them and they quickly
provided a patch.
RHEL 4 doesn't have /proc/sys/kernel/hung_task_timeout_secs. I'm not sure
whether its kernel can be reconfigured to add it. For those interested,
the source code is at
http://koders.com/c/fidFAF17DCD13DB287057ACC4136EEEFE2D9644BA9A.aspx
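
To check whether a given box has the tunable at all, something like the
following Python sketch should do. The function name hung_task_timeout is
made up for illustration; the only thing taken from the log above is the
/proc path.

```python
def hung_task_timeout(path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Return the timeout in seconds, or None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        # No such file (as on RHEL 4), no permission, or unexpected content.
        return None
```

A result of None means the kernel does not expose the setting (or it
could not be read); 0 means the message has been disabled as described in
the log line above.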
In your case, can you try pstack and strace on a simple process such as
date (both programs need to be installed)? Please also tell us what
/proc/<pid>/status shows.
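
If it is easier than reading the whole status file, here is a small
Python sketch that pulls out just the State line. It assumes the usual
"State:<tab>D (disk sleep)" layout; parse_state and process_state are
names I made up.

```python
def parse_state(status_text):
    """Return (code, description) from the State line, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split(None, 2)  # e.g. ["State:", "D", "(disk sleep)"]
            if len(fields) < 2:
                return None
            desc = fields[2].strip("()") if len(fields) > 2 else ""
            return fields[1], desc
    return None


def process_state(pid):
    """Read /proc/<pid>/status for a running process and parse its State line."""
    with open("/proc/%d/status" % pid) as f:
        return parse_state(f.read())
```

A process stuck the way our cp was would come back with the code "D" and
the description "disk sleep".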
Yong Huang