memtest86+ ECC oddity

Rex Dieter rdieter at math.unl.edu
Thu May 4 15:02:04 UTC 2006


Jack Howarth wrote:
>    We have a machine with ECC support enabled in the motherboard firmware
> and ECC DIMMs installed. Recently this machine has suffered a couple of
> random freezes, and yesterday it began to report the following kernel error...
> 
> kernel: EDAC MC0: UE page 0x8e0, offset 0x0, grain 4096, row 0, labels ":": i82875p UE
> 
> ...indicating it had uncorrectable (UE) memory errors. However, when I
> boot into memtest86+, the default settings (ECC disabled) don't report
> any memory errors during the test. If I enable the ECC mode in
> memtest86+, I finally do see a bad memory location appear repeatedly.
>    What exactly is happening in this situation? I am guessing that the
> ECC-enabled memory is suppressing the bad memory location just enough
> that it passes when the memtest86+ memory test is run with ECC disabled.
> This would only make sense if memtest86+ somehow short-circuited the
> ECC feature when the ECC mode in memtest86+ is enabled so that it could
> see if ECC is correcting memory errors in the background silently. Is
> this a correct read on the situation?

IMO, no.  *I* think the ECC feature of your chips/mobo is itself the
culprit here, rather than ECC masking a real problem.  That's just a
guess though.
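
If it helps to compare the EDAC report with the address memtest86+ flags,
here is a rough Python sketch that pulls the fields out of the log line
quoted above. The regular expression is based only on that one line, and
the 4 KiB page size and the helper name parse_edac_line are my own
assumptions, so treat it as illustrative rather than a general EDAC parser.

```python
import re

# Field layout assumed from the single log line quoted in this thread;
# other EDAC drivers or kernel versions may format the message differently.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<kind>UE|CE) page (?P<page>0x[0-9a-fA-F]+), "
    r"offset (?P<offset>0x[0-9a-fA-F]+), grain (?P<grain>\d+), row (?P<row>\d+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_edac_line(line):
    """Return the fields of an EDAC error line as a dict, or None if no match."""
    m = EDAC_RE.search(line)
    if m is None:
        return None
    page = int(m.group("page"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "controller": int(m.group("mc")),
        "uncorrectable": m.group("kind") == "UE",
        "page": page,
        "offset": offset,
        "grain": int(m.group("grain")),
        "row": int(m.group("row")),
        # Byte address, assuming "page" is a page frame number.
        "phys_addr": page * PAGE_SIZE + offset,
    }


line = ('kernel: EDAC MC0: UE page 0x8e0, offset 0x0, grain 4096, '
        'row 0, labels ":": i82875p UE')
info = parse_edac_line(line)
print(hex(info["phys_addr"]))  # prints 0x8e0000
```

If the failing address memtest86+ shows in ECC mode falls in the same page,
that would at least suggest both tools are pointing at the same spot.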

-- Rex



