Frequent RHEL Server crash/restarts

Ray Van Dolson rvandolson at esri.com
Mon Jul 9 21:13:57 UTC 2007


On Mon, Jul 09, 2007 at 02:05:55PM -0700, aix tiger wrote:
> Hi Friends
>    
>   I am facing a strange problem on one of my RHEL servers: it
>   crashes and restarts frequently. This is an HP ProLiant DL740
>   and part of a RHEL cluster (V4U4).
>    
>   Another HP ProLiant DL740 in the same cluster, running the same
>   versions of the RHEL OS and cluster software, has no such problems...
>    
>   In /var/log/messages I see no errors. The HP iLO log mentions no
>   error except the message "A critical server error occured before
>   this POST"...
>    
>   I have asked an HP hardware engineer to check for all possible
>   hardware errors, but he says the diagnostics show no issues.
>    
>   How can I troubleshoot this problem? It has no specific timing;
>   it can happen at any time (it reliably happens at least once a
>   week). Please advise how to solve this issue.
>    

It's odd that the other machine would be fine while this one is
not...

What are the details of the version of RHEL on your "bad" system?  What
update?  Are you at the latest kernel errata version?
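A quick way to answer these questions on both nodes and compare the answers side by side is a small script. This is only a sketch under assumptions: it is written in Python 3 (the stock RHEL 4 Python is older, so `cat /etc/redhat-release` and `uname -r` give the same information by hand), and the function names are hypothetical. It reads /etc/redhat-release, the standard release file on Red Hat systems, and reports the running kernel through Python's `platform` module.

```python
import platform
from pathlib import Path


def read_release(path="/etc/redhat-release"):
    """Return the distribution release string, or None if the file is absent."""
    p = Path(path)
    if not p.is_file():
        return None
    return p.read_text().strip()


def system_summary(release_path="/etc/redhat-release"):
    """Collect the details worth comparing between the 'good' and 'bad' nodes."""
    return {
        "hostname": platform.node(),
        "release": read_release(release_path),  # e.g. the RHEL update level
        "kernel": platform.release(),           # same value as `uname -r`
        "arch": platform.machine(),
    }


if __name__ == "__main__":
    # Print one "key: value" line per item so two nodes can be diffed.
    for key, value in system_summary().items():
        print(f"{key}: {value}")
```

Running it on each cluster member and diffing the output makes any kernel errata mismatch between the two DL740s obvious.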

Have you run your own diagnostics like memtest86?

Does the server get proper ventilation (i.e., could it be overheating)?
Graphing the CPU or chassis temperature may make this easier to
identify and compare against the other machines...
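For the temperature graphing, a minimal logging sketch follows. It assumes Python 3 and the hwmon sysfs interface of newer Linux kernels (`/sys/class/hwmon/hwmon*/temp*_input`, integer values in millidegrees Celsius); older kernels such as the 2.6.9 series in RHEL 4 may not expose that path, in which case lm_sensors or HP's own management agents would have to supply the readings. All function names and the output path are hypothetical.

```python
import csv
import glob
import time
from pathlib import Path

# Assumed sensor location: the hwmon sysfs interface of newer Linux kernels.
HWMON_PATTERN = "/sys/class/hwmon/hwmon*/temp*_input"


def parse_millidegrees(raw):
    """hwmon temp*_input files hold an integer in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_temps(pattern=HWMON_PATTERN):
    """Return {sensor_path: degrees_celsius} for every readable sensor."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            temps[path] = parse_millidegrees(Path(path).read_text())
        except (OSError, ValueError):
            continue  # sensor unreadable or returned junk; skip it
    return temps


def append_sample(csv_path, temps, now=None):
    """Append one timestamped row per sensor so the file can be graphed later."""
    now = time.time() if now is None else now
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for sensor, celsius in temps.items():
            writer.writerow([int(now), sensor, celsius])


def sample_once(csv_path):
    """Take one reading of all sensors and append it to csv_path."""
    append_sample(csv_path, read_temps())
```

Calling `sample_once("/var/tmp/temps.csv")` from cron every few minutes on both nodes yields two CSV files whose curves can be plotted side by side, which should show whether the "bad" node runs hotter in the period before a crash.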

When did this start happening?  Any changes you're aware of around that
time?

Ray




More information about the redhat-list mailing list