Frequent RHEL Server crash/restarts

Barry Brimer lists at brimer.org
Tue Jul 10 04:08:40 UTC 2007



On Mon, 9 Jul 2007, aix tiger wrote:

> Hi Friends
>
>  I am facing a strange problem on one of my RHEL servers: it crashes and restarts frequently. This is an HP ProLiant DL740 and part of a RHEL cluster (V4U4).
>
>  Another HP ProLiant DL740 is part of that cluster, with the same version of the RHEL OS and cluster software, but it has no such problems.
>
>  I see no errors in /var/log/messages. In the HP iLO messages there is no error mentioned except the message " A critical server error occured before this POST".
>
>  I have asked the HP hardware engineer to check for all possible hardware errors, but he says the diagnostics show no issues.
>
>  How can I troubleshoot this problem? It does not happen at any specific time; it can happen at any moment (at least once a week). Please advise how to solve this issue.

I had a similar problem with an Oracle RAC cluster.  One node rebooted,
one didn't.  I have not solved the problem yet, but only because the
condition stopped occurring.  I set up netconsole (part of netdump), which
eventually told me that hangcheck-timer was rebooting my system.  I am
also running hangwatch (http://people.redhat.com/csnook/hangwatch/), which
runs sysrq commands to capture the system state when the system load
spikes.
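
To illustrate the idea behind a load-triggered sysrq capture, here is a
minimal Python sketch.  It is not hangwatch itself and does not reproduce
its options.  It assumes a Linux kernel that exposes /proc/loadavg and
/proc/sysrq-trigger, and it must run as root to write to the latter.  The
threshold, the key set and the polling interval are hypothetical values
chosen for the example, and the function names are my own.

```python
#!/usr/bin/env python3
"""Load-triggered SysRq capture: a sketch of the idea, not the real hangwatch."""
import time

LOADAVG_PATH = "/proc/loadavg"      # standard Linux procfs file
SYSRQ_PATH = "/proc/sysrq-trigger"  # writing a key here fires that SysRq action
THRESHOLD = 8.0                     # hypothetical 1-minute load limit
SYSRQ_KEYS = "mt"                   # m = memory info, t = task list
POLL_SECONDS = 5                    # hypothetical polling interval


def read_load(path=LOADAVG_PATH):
    """Return the 1-minute load average (first field of the file)."""
    with open(path) as f:
        return float(f.read().split()[0])


def trigger_sysrq(keys, path=SYSRQ_PATH):
    """Send each SysRq key in its own write; the kernel acts on one key per write."""
    for key in keys:
        with open(path, "w") as f:
            f.write(key)


def check_once(threshold=THRESHOLD, keys=SYSRQ_KEYS,
               loadavg_path=LOADAVG_PATH, sysrq_path=SYSRQ_PATH):
    """Fire the SysRq keys if the load exceeds the threshold; return True if fired."""
    if read_load(loadavg_path) > threshold:
        trigger_sysrq(keys, sysrq_path)
        return True
    return False


def watch(poll_seconds=POLL_SECONDS):
    """Poll forever.  Call this as root to start watching."""
    while True:
        check_once()
        time.sleep(poll_seconds)
```

The SysRq output goes to the kernel log, so with netconsole pointed at
another machine the dumps should still be visible there if the node
reboots before anything reaches /var/log/messages.  To start the loop,
call watch() from a root shell.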

HTH,
Barry



