RHELv4 and v5 - So slow as to be unusable.

Fri Oct 8 20:04:09 UTC 2010

As soon as one of the machines slows down again I will try that.  I have 
been able to ^C out of a program that was taking a long time to start up 
however.

One user discovered this morning that when the system is in this state 
the output of "date" goes backwards.  A simple csh loop,
       while (1)
            date
       end
shows that time *mainly* goes forward, but perhaps every few seconds it 
suddenly jumps backwards by a few seconds.  This may explain why the GUI 
clocks remain nearly "frozen" overnight if a machine is left in this 
state.  The hardware clock (/usr/sbin/hwclock) is fine, but the kernel's 
concept of time (/bin/date) is not always running forward for some reason. 
ntp was not configured on that machine, so it was not running.
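
The csh loop above prints every reading, so the backward steps have to be 
spotted by eye.  A minimal sketch that reports only the steps is below.  It 
assumes Python 3 is available, which is not true of a stock RHEL 4 install, 
and the function names are made up for illustration.  time.time() reads the 
kernel's wall clock, the same source /bin/date uses.

```python
import time


def find_backward_steps(samples):
    """Return (index, previous, current) for each sample earlier than the one before it."""
    steps = []
    for i in range(1, len(samples)):
        if samples[i] < samples[i - 1]:
            steps.append((i, samples[i - 1], samples[i]))
    return steps


def sample_wall_clock(count=1000, interval=0.01):
    """Read the kernel wall clock `count` times, sleeping `interval` seconds between reads."""
    samples = []
    for _ in range(count):
        samples.append(time.time())
        time.sleep(interval)
    return samples


def report(samples):
    """Format one line per backward step found in `samples`."""
    lines = []
    for index, prev, cur in find_backward_steps(samples):
        lines.append("sample %d: clock moved back %.3f s" % (index, prev - cur))
    return lines
```

Running print("\n".join(report(sample_wall_clock(count=3000)))) samples for 
roughly 30 seconds.  On a healthy machine it should print nothing.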

Does anyone know whether a Pentium 4 machine should run a uniprocessor 
kernel such as 2.6.9-89.29.1.EL, or an SMP kernel such as 
2.6.9-89.29.1.ELsmp? 

Of the eight Pentium 4 machines I have, five have chosen the SMP kernel 
and the other three have not.  There are four different motherboards, and 
so presumably four different BIOS loads, and the two machines displaying 
the problem both use the same motherboard.  They both chose to run the SMP 
kernel.  I noticed when I loaded the RHELv4 CDs that only a non-SMP kernel 
was installed.  The first up2date run then brought in a new kernel, in 
both uniprocessor and SMP forms, and changed grub to boot the new SMP 
kernel.  At the moment I have one of the problem machines running the 
original 2.6.9-89.ELsmp kernel and the other one running the newer 
2.6.9-89.29.1.EL (non-SMP) kernel to see whether either change makes a 
difference.
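
To compare the eight machines, something like the following could record 
which kernel flavour each one booted and how many logical processors the 
kernel sees.  This is a sketch only.  It assumes Python 3, the RHEL 4 naming 
convention in which SMP kernel releases end in "smp" (as in 
2.6.9-89.29.1.ELsmp), and the usual /proc/cpuinfo layout with one 
"processor" line per logical CPU.  The helper names are hypothetical.

```python
import platform


def is_smp_release(release):
    """True if the kernel release string carries the 'smp' suffix."""
    return release.lower().endswith("smp")


def count_logical_cpus(cpuinfo_text):
    """Count 'processor' entries in the text of /proc/cpuinfo."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key == "processor":
            count += 1
    return count


def describe_machine(release, cpuinfo_text):
    """Summarise kernel flavour and logical CPU count in one line."""
    flavour = "SMP" if is_smp_release(release) else "non-SMP"
    return "%s (%s kernel), %d logical CPU(s)" % (
        release, flavour, count_logical_cpus(cpuinfo_text))


def describe_local_machine():
    """Describe the machine this script is running on (Linux only)."""
    with open("/proc/cpuinfo") as f:
        return describe_machine(platform.release(), f.read())
```

Printing describe_local_machine() on each box gives one line per machine 
that can be set beside the motherboard model.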

        Gary

Yong Huang <yong321 at yahoo.com> wrote on 10/08/2010 10:34:59 AM:

> From: Yong Huang <yong321 at yahoo.com>
> To: Gary E Barnes/Cupertino/IBM at IBMUS
> Cc: redhat-list at redhat.com
> Date: 10/08/2010 10:41 AM
> Subject: Re: RHELv4 and v5 - So slow as to be unusable.
> 
> Gary,
> 
> As you proved, not all performance problems can be identified by 
> performance monitoring tools. In this case, "performance" is not a good 
> word. "Locking" may be better.
> 
> We recently had a problem with TrendMicro on our RHEL 5 box. cp a 1GB 
> file took 35 minutes for the prompt to come back, even though the copied 
> file started to have the same checksum and size after about 1 minute. 
> /proc/<cp pid>/status shows disk sleep state. The cp command is not 
> killable, indicating it's in kernel mode not coming back up. strace or 
> pstack the process hangs (but strace or pstack is killable). The message 
> in /var/log/messages sheds light on the problem:
> 
> Sep 26 11:02:11 ourhostname kernel: INFO: task cp:10658 blocked for more than 120 seconds.
> Sep 26 11:02:11 ourhostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ...
> Sep 26 11:02:11 ourhostname kernel: Call Trace:
> Sep 26 11:02:11 ourhostname kernel:  [<ffffffff884a45a8>] :splxmod:closeHook+0x784/0x9d8
> 
> So some splxmod module's closeHook function is the suspect since it's at 
> the top of the call stack. Searching on Google indicates it's a module 
> in TrendMicro's software. We contacted them and they quickly provided a 
> patch.
> 
> RHEL 4 doesn't have /proc/sys/kernel/hung_task_timeout_secs. I'm not 
> sure if the kernel can be reconfigured to add that. For those 
> interested, the source code is at
> http://koders.com/c/fidFAF17DCD13DB287057ACC4136EEEFE2D9644BA9A.aspx
> 
> In your case, can you try pstack and strace on a simple process such as 
> date (both programs need to be installed)? And tell us 
> /proc/<pid>/status.
> 
> Yong Huang
> 
> 
>