2.6.9-55.ELsmp EDAC Errors

Mike Hanby mhanby at uab.edu
Mon Jul 30 14:11:41 UTC 2007


Thanks for the info on the firmware. I'll have to check with our vendor.

I'll tell you what, the constant stream of EDAC errors in the log file
will bring the system down a heck of a lot faster than an actual ECC
problem would, once the partition fills up :-)
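
For anyone who wants to gauge how quickly these messages pile up, here is
a minimal Python sketch. It is only an illustration, not anything from
Red Hat or a hardware vendor. It assumes each syslog entry sits on a
single line in the same format as the excerpt quoted below; the function
name summarize_edac and the sample lines are made up for the example.

```python
import re
from collections import Counter

# Matches kernel EDAC entries such as
# "Jul 25 11:07:30 node1.local kernel: EDAC k8 MC0: ..."
# Assumption: classic syslog layout "Mon DD HH:MM:SS host kernel: EDAC ...".
EDAC_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+EDAC\b"
)


def summarize_edac(lines):
    """Return (count, total_bytes, per_second) for EDAC kernel messages.

    per_second is a Counter keyed by the timestamp string, so it shows
    how many EDAC entries were logged in each second.
    """
    per_second = Counter()
    total_bytes = 0
    count = 0
    for line in lines:
        match = EDAC_LINE.match(line)
        if match is None:
            continue  # not an EDAC kernel message (e.g. the sshd line)
        count += 1
        total_bytes += len(line.encode("utf-8"))
        per_second[match.group(1)] += 1
    return count, total_bytes, per_second


if __name__ == "__main__":
    # Sample entries rebuilt from the quoted log, one entry per line.
    sample = [
        "Jul 25 11:07:30 node1.local kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error\n",
        "Jul 25 11:07:30 node1.local sshd[3027]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.\n",
        "Jul 25 11:07:31 node1.local kernel: EDAC k8 MC0: extended error code: ECC chipkill x4 error\n",
    ]
    count, total_bytes, per_second = summarize_edac(sample)
    print(count, total_bytes, dict(per_second))
```

To run it against a real log, pass an open file object instead of the
sample list, for example summarize_edac(open(path)) where path is
wherever syslog writes kernel messages on the node. Dividing total_bytes
by the time span covered gives a rough idea of how long the partition
will last.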

-----Original Message-----
From: redhat-list-bounces at redhat.com
[mailto:redhat-list-bounces at redhat.com] On Behalf Of Troy Knabe
Sent: Wednesday, July 25, 2007 14:17
To: General Red Hat Linux discussion list
Subject: Re: 2.6.9-55.ELsmp EDAC Errors

I got these on Sun x4100's with Opteron 280 and 285 procs.  I did not 
get them with 4.4 (42 version of the kernel), but did with 4.5 (55 
version of the kernel).  I found that Sun had a firmware patch that was 
required to work with the new 4.5 version of Red Hat.  You might check 
with your hardware vendor.  After applying the patch, I have received no
more of the errors, and they were happening constantly before the 
firmware upgrade.

Hope this helps.

-Troy

Mike Hanby wrote:
> Howdy,
> 
>  
> 
> I just installed RedHat 4.5 on 64 identical nodes (Dual AMD Opteron
> 242 with 2GB ECC RAM).
> 
>  
> 
> 4 of the nodes are logging errors similar to:
> 
>  
> 
> Jul 25 11:07:30 node1.local kernel: EDAC k8 MC0: extended error code:
> ECC chipkill x4 error
> 
> Jul 25 11:07:30 node1.local sshd[3027]: error: Bind to port 22 on
> 0.0.0.0 failed: Address already in use.
> 
> Jul 25 11:07:31 node1.local kernel: EDAC k8 MC0: general bus error:
> participating processor(local node response), time-out(no timeout)
> memory transaction type(generic read), mem or i/o(mem access), cache
> level(generic)
> 
> Jul 25 11:07:31 node1.local kernel: EDAC k8 MC0: extended error code:
> ECC chipkill x4 error
> 
> Jul 25 11:07:32 node1.local kernel: EDAC k8 MC0: general bus error:
> participating processor(local node response), time-out(no timeout)
> memory transaction type(generic read), mem or i/o(mem access), cache
> level(generic)
> 
>  
> 
> I previously had RedHat 4.0 installed on these nodes and didn't see
> any EDAC errors.
> 
>  
> 
> I've run memtest86 on the 4 nodes (100 passes) and each of them passed
> without error.
> 
>  
> 
> Does anyone know if it is possible to disable the EDAC checking and
> error reporting?  I ask primarily because these errors are logged every
> second and quickly fill up the /var partition.
> 
>  
> 
> Thanks for any info,
> 
>  
> 
> Mike
> 

-- 
redhat-list mailing list
unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list