EDAC error

Brent Snow, Mr. brent.snow at mcgill.ca
Tue Mar 25 13:22:55 UTC 2008


Hi Roger,

	Thanks for the insight!!! 

	This is the odd thing. A standard, out-of-the-box Fedora 7 kernel
(no updates) does not produce this error; it starts as soon as I upgrade
the kernel. Since I want to take advantage of the latest kernel features,
I would like to run the latest kernel.

	I am going to reinstall the system as Fedora 7 and see what
happens. I updated the BIOS to the latest release, and would turn off
ECC if the BIOS allowed it, but that setting is not available.
I guess I will start there and keep testing and searching for a
solution.

	Thanks for all the advice.

	Brent

-----Original Message-----
From: fedora-list-bounces at redhat.com
[mailto:fedora-list-bounces at redhat.com] On Behalf Of Roger Heflin
Sent: Monday, March 24, 2008 10:27 AM
To: For users of Fedora
Subject: Re: EDAC error

Ric Moore wrote:
> On Sat, 2008-03-22 at 10:03 -0500, Roger Heflin wrote:
>> Ric Moore wrote:
>>> On Thu, 2008-03-20 at 21:58 -0500, Roger Heflin wrote:
>>>> Brent Snow, Mr. wrote:
>>>>> Hi All,
>>>>>
>>>>>             I am having a problem with a new Dell PowerEdge 1900 Server
>>>>> running Fedora 8.
>>>>>
>>>>>             The System setup is as follows:
>>>>>
>>>>>             2 - Xeon E5310 (Quad-Core 1.6 GHz) processors
>>>>>
>>>>>             16 GB of RAM, 1 SATA 80 GB HDD.
>>>>>
>>> Holy Smokes! 2 quad cores? That's 8 cores total(?) and 16 GIGS of Ram??
>>> My Gawd, not only am I jealous as all hell, I'm wondering what kinda
>>> kernel are you running?? Any sort of stock kernel would roll over and
>>> join the Choir Eternal.
>> Actually, fairly normal kernels work just fine on the large boxes; I have run
>> stock FC6 kernels on up to 8 cpus/16 cores and up to 64GB of ram with no issues.
>>
>>> Wouldn't you be running some sort of mini clustering setup?? Setup
>>> right, it should really blow serious coal. Your problem might lie in
>>> that direction. You might have training wheels on a Dodge Hemi. With a
>>> machine like that, I could almost do without eating!
>>> <huge drooling grins> Ric
>>>
>> Clustering setups are only needed when you have more than 1 machine. Having
>> lots of cpus on a single machine is much easier than clustering, as you don't
>> have to worry about the networking, and the memory can be shared easily
>> between the cpus.
> 
> Huh, I wonder then why he's having problems. In the -OLD- days he'd be
> rolling a new kernel. Is the stock kernel multi-cpu aware or does he
> need a more specialized kernel, or is it the kernel at all?? That's
> where I would be looking, fer sure. God, I want one like he's got.
> <scratching strong itch> I always stay a couple of years behind. :) Ric
> 

Hyperthreading and dual-core CPUs have both been around long enough that
pretty much every distribution ships with SMP enabled *NOW*. And you are
correct: several years ago SMP was off by default on a number of
distributions, so you almost always had to compile your own kernel.

EDAC errors mean either that the memory is actually bad (or not correctly
seated, or has dirty connectors, or has some other issue), or that EDAC has
some sort of issue with his BIOS or his hardware. I guess the easiest way to
check would be to start from a minimum RAM configuration and see if *ANY*
configuration gets no EDAC errors. If he can find a configuration with no
errors, then EDAC most likely works correctly on that MB, and the errors in
the full configuration most likely come from one of the memory problems above.
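One way to run the minimum-RAM test described above is to record the kernel's EDAC error counters before and after exercising each memory configuration. The sketch below assumes the EDAC sysfs layout under /sys/devices/system/edac/mc/, where each memory controller directory (mc0, mc1, ...) exposes ce_count (corrected errors) and ue_count (uncorrected errors). The helper names read_edac_counts and new_errors are hypothetical, and which files are present depends on the kernel version and the EDAC driver for the chipset.

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller directories in sysfs.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_MC_ROOT):
    """Snapshot {controller: (corrected, uncorrected)} error counts.

    Returns an empty dict if the directory is missing, e.g. when no
    EDAC driver is loaded for this chipset.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    # One directory per memory controller: mc0, mc1, ...
    for mc in sorted(base.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (ce, ue)
    return counts


def new_errors(before, after):
    """Per-controller increase in (corrected, uncorrected) counts between snapshots."""
    delta = {}
    for name, (ce, ue) in after.items():
        old_ce, old_ue = before.get(name, (0, 0))
        delta[name] = (ce - old_ce, ue - old_ue)
    return delta
```

Usage: take a snapshot with read_edac_counts(), run a memory-heavy workload with the DIMM configuration under test, take a second snapshot, and check whether new_errors(before, after) reports zero for every controller. Repeating this while adding DIMMs back one at a time should narrow the errors down to a particular module or slot.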

It is really much harder to build the big machines. They have more DIMMs to
start with, and each DIMM has 2x-4x the number of chips that a normal PC DIMM
has (ignoring the ECC chips). This is because the DIMMs are often
double-sided, and sometimes on top of that have 2 or 4 chips stacked on top
of each other to increase the capacity (I don't remember the term for that).
Once you start stacking, the fanout on the memory controller rises, and
everything gets a lot nastier and harder to get to work reliably: the timing
has to be changed, and how much it has to change depends on the number of
DIMMs on the controller. It just gets messy. I have seen some really weird
failures when using all of the DIMM slots on MBs; often these configurations
are not adequately tested by the MB companies and/or noted in the MB manual.


                                  Roger

-- 
fedora-list mailing list
fedora-list at redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list