memtest86+ ECC oddity - EDAC in kernel 2.6.16 (ie FC5)

David Timms dtimms at bigpond.net.au
Thu May 4 22:13:05 UTC 2006


Jack Howarth wrote:
>     Well it appears while memtest86+ may be able to disable the ECC enabled
> by the BIOS during its default testing mode, memtest86+ can't enable ECC if
> it is disabled in the BIOS. FYI.
I noticed that one of my machines with ECC RAM, which happens to have the 
Intel 865P ECC chipset, began displaying these errors immediately after 
it was upgraded to FC5. Following the information provided at 
http://buttersideup.com/edacwiki/WhyAmIgettingMemoryErrors :

1. ran memtest for 60 hours one weekend (but did not know to enable the 
ECC test): OK
2. removed 1 stick of RAM: UE (uncorrectable errors, but fewer).
3. other stick on its own: UE (but fewer).
4. both sticks in different (allowed) slots: UE (1 to 3 a second)
5. BIOS disable ECC: OK
6. BIOS re-enable ECC, spread spectrum off: UE
7. made a few guesses at memory timing: UE, no change.
8. ran memtest for a few minutes with the ECC test on: UEs shown.

As I understand it, the EDAC module is simply reading the ECC results 
from the chipset that talks to the RAM. It is difficult to decide whether 
the cause is:
a. RAM that is faulty in its data values
b. RAM that is faulty in its parity values
c. a problem with the board design
d. a problem with the memory chipset design
etc.
I guess a good bet would be to install a different flavour of ECC RAM, or 
to try more pessimistic memory timings.
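
To compare configurations like the ones in the list above, it can help to read the EDAC counters directly rather than watch the log. A rough sketch, assuming the usual sysfs layout where each memory controller appears as /sys/devices/system/edac/mc/mcN with ce_count and ue_count files; the function name read_edac_counts and its root parameter are made up for illustration:

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} for each mc* directory.

    The default path is an assumption about the sysfs layout; pass another
    root if your kernel exposes the counters elsewhere.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # EDAC driver not loaded, or a different layout
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / fname).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # file missing or not a plain number
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

Running it before and after a change (different stick, different slot, different timing) gives a number to compare instead of a guess at the error rate.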

I think the error-correcting code is implemented at the chipset level (the 
ECC RAM just provides the extra storage bit per byte that is needed to 
implement it). Perhaps what is happening is that the chipset detects and 
fixes the single-bit error, which would mean no data loss or corruption. 
The difference is that we now know the RAM itself is faulty (the UE 
reports in FC5), but at the moment few enough bits are bad at once that 
ECC can fix the errors - pretty cool. This would be why memtest 
can't detect a problem: the chipset has detected the error and fixed it 
before presenting the data to memtest... [sorry, I just nutted this out 
myself :~)]
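
To make the detect-and-fix idea concrete, here is a toy single-error-correct, double-error-detect code on 4 data bits (a Hamming code plus one overall parity bit). This is only an illustration: it is not the code the 865P actually uses, and real ECC memory typically protects 64 data bits with 8 check bits. The names encode and decode are made up for the sketch. One flipped bit is silently repaired and reported as "CE" (corrected), while two flipped bits can only be reported as "UE" (uncorrectable):

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits).

    Index 0 is the overall parity bit; indexes 1, 2 and 4 are Hamming
    check bits; indexes 3, 5, 6 and 7 hold the data.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole word even
    return code


def decode(code):
    """Return (status, nibble) where status is "ok", "CE" or "UE"."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    odd_parity = sum(code) % 2
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # One bit flipped; the syndrome is its position (0 = parity bit).
        code[syndrome] ^= 1
        status = "CE"
    else:
        # Nonzero syndrome with even parity: two bits flipped, cannot fix.
        return "UE", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1           # simulate one bad bit in RAM
    print(decode(word))    # the reader still gets the original data back
    word[6] ^= 1           # a second bad bit in the same word
    print(decode(word))
```

A program reading through such a decoder (as memtest reads through the chipset) sees correct data after a single-bit fault and has no way to notice the repair unless the error counters are consulted.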

DaveT.



