Got my first EDAC error today

Steve Snyder swsnyder at insightbb.com
Mon Apr 3 14:48:31 UTC 2006


On Monday 03 April 2006 10:21 am, Roger Heflin wrote:
> > -----Original Message-----
> > From: fedora-list-bounces at redhat.com
> > [mailto:fedora-list-bounces at redhat.com] On Behalf Of Steve Snyder
> > Sent: Sunday, April 02, 2006 6:54 AM
> > To: fedora-list at redhat.com
> > Subject: Got my first EDAC error today
> >
> > Got my first error report from the shiny-new EDAC driver
> > today.  A kWriteD window popped up and displayed:
> >
> > EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, row 0, labels
> > "": i82860 UE
> >
> > Great.  Now where do I find how to interpret these error reports?
>
> Some questions:
>
> Does your system have ECC RAM?  If you don't have ECC and/or your
> chipset is not supported, EDAC will pretty much only check PCI parity.
> Given that it is reporting i82860, I would guess that your chipset is
> supported, and that it believes that you have ECC.

Yes, I do have ECC RAM, and the BIOS is configured to use it for error 
correction.  Specifically, I have 2 sticks of 512MB dual-channel PC800 
RDRAM.

> UE means uncorrectable error, which means that more than 1 bit was
> messed up in your memory; generally you won't get these without first
> getting lots of single-bit (CE) errors.

Well, it is possible I've been getting single-bit errors and didn't know 
it.  Still, I would have expected uncorrectable RAM errors to have 
crashed my machine, or at least generated alarming system errors, in 
the past.  Instead this machine has been rock-solid stable in the 3 years 
I've had it, and I've been using the same RAM throughout that period.

Certainly, RAM can go bad, but the lack of "unexplained" lockups makes me 
a little skeptical that uncorrectable RAM errors are occurring on a 
regular basis.
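
For reference, the fields in the report quoted above can be pulled apart
mechanically.  This is a minimal sketch, not part of EDAC itself: it
assumes the exact one-line layout of the UE message from this thread,
assumes 4 KiB pages (PAGE_SHIFT = 12, as on i386), and treats "page" as a
page frame number, so the physical address is (page << 12) + offset.
"grain" is read here as the reporting granularity in bytes, which is an
interpretation, not something the message states.  The name
parse_edac_line is hypothetical, and CE reports or other chipset drivers
may print extra fields that this pattern will not match.

```python
import re

# Pattern for the UE message quoted in this thread, e.g.
# EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, row 0, labels "": i82860 UE
EDAC_RE = re.compile(
    r'EDAC MC(?P<mc>\d+): (?P<kind>UE|CE) page 0x(?P<page>[0-9a-fA-F]+), '
    r'offset 0x(?P<offset>[0-9a-fA-F]+), grain (?P<grain>\d+), '
    r'row (?P<row>\d+), labels "(?P<label>[^"]*)": (?P<detail>.*)'
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on i386


def parse_edac_line(line):
    """Hypothetical helper: split one EDAC report into its fields.

    Returns None if the line does not match the assumed layout.
    """
    m = EDAC_RE.search(line)
    if m is None:
        return None
    page = int(m.group('page'), 16)
    offset = int(m.group('offset'), 16)
    return {
        'mc': int(m.group('mc')),
        'uncorrectable': m.group('kind') == 'UE',
        # Page frame number shifted to a byte address, plus the offset.
        'phys_addr': (page << PAGE_SHIFT) + offset,
        'grain': int(m.group('grain')),
        'row': int(m.group('row')),
        'label': m.group('label'),  # empty when no DIMM label is set
        'detail': m.group('detail'),
    }


if __name__ == '__main__':
    report = ('EDAC MC0: UE page 0x2c, offset 0x0, grain 4096, '
              'row 0, labels "": i82860 UE')
    print(hex(parse_edac_line(report)['phys_addr']))  # prints 0x2c000
```

Under those assumptions, the report above points at physical address
0x2c000 on row 0, with an empty DIMM label.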

> You can check /proc/mc/0; that may give you better information.  Where
> the "" is, there is supposed to be a label for the DIMM location on the
> motherboard, but no one has yet mapped the locations that will be listed
> to the actual slots on most motherboards.

Actually, I can't check that:

$ ll /proc/mc*
ls: /proc/mc*: No such file or directory
$ ll /proc/edac*
ls: /proc/edac*: No such file or directory

The EDAC info doesn't seem to be exposed in the /proc filesystem, at 
least not in the kernel-smp-2.6.16-1.2069_FC4 that I'm running.
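
If the counters are not in /proc, sysfs may be the place to look.  This
is a minimal sketch under a loudly stated assumption: that the mainline
EDAC code in 2.6.16 kernels exposes per-controller counters as
/sys/devices/system/edac/mc/mc0/ce_count and ue_count (presumably the
/proc/mc layout belongs to the older out-of-tree releases).  The exact
layout may differ between kernel versions, and read_error_counts is a
hypothetical helper name.

```python
import os

# Assumption: sysfs location of the EDAC memory-controller directories.
EDAC_MC_ROOT = '/sys/devices/system/edac/mc'


def read_error_counts(root=EDAC_MC_ROOT):
    """Return {controller: (ce_count, ue_count)} for each mc* directory.

    Returns an empty dict if the root directory does not exist.
    """
    counts = {}
    if not os.path.isdir(root):
        return counts
    for name in sorted(os.listdir(root)):
        if not name.startswith('mc'):
            continue
        mc_dir = os.path.join(root, name)
        try:
            with open(os.path.join(mc_dir, 'ce_count')) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc_dir, 'ue_count')) as f:
                ue = int(f.read().strip())
        except (OSError, ValueError):
            continue  # counters missing or unreadable for this controller
        counts[name] = (ce, ue)
    return counts


if __name__ == '__main__':
    for mc, (ce, ue) in read_error_counts().items():
        print(f'{mc}: {ce} corrected, {ue} uncorrected')
```

Because a missing directory yields an empty dict rather than an
exception, "EDAC not exposed here" stays distinguishable from "zero
errors counted".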

Thanks for the response.