I need help debugging a random lockup

Marshall Lewis marshall at novafoundry.com
Thu May 6 00:02:07 UTC 2004


My workstation is actually running on a msi k8t master2-far.  Single
opteron 246 w/1gig ecc ram (2 512 sticks).

How'd you notice the activity on the eth port?  watching the light?  
I'm using the onboard sata, and it has an ide dvd-rom, but it's never
been running during one of these lockups.

I've noticed some problems with the broadcom nic as well... problems
with it being detected by the switch; sometimes I have to drop the
interface and bring it back up before the switch will notice it.  I've
also seen cases on some of our other opteron boxes (might be a
coincidence, but I do believe the boxes that have had the problem are
using the tg3 driver) where the switch won't auto-sense the speed
correctly (registers 100mbit instead of 1000mbit) but the nic thinks
it's running at full speed.... obviously this leads to issues : )  
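
For what it's worth, a rough way to catch that speed mismatch from the
nic side is to compare what `ethtool eth0` reports against what the
switch port shows.  Below is a minimal Python sketch, assuming the usual
`ethtool` text output ("Speed: 1000Mb/s", "Duplex: Full", "Link
detected: yes").  The function names and the sample text are made up for
illustration, and the switch-side speed still has to be read off the
switch by hand.

```python
import re


def parse_ethtool(output):
    """Pull speed (Mb/s), duplex and link state out of `ethtool <iface>` text."""
    info = {"speed_mbps": None, "duplex": None, "link": None}
    for line in output.splitlines():
        # Each setting is printed as "Key: value"; split on the first colon.
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key == "Speed":
            # e.g. "1000Mb/s"; anything else (such as "Unknown!") stays None.
            m = re.match(r"(\d+)Mb/s", value)
            info["speed_mbps"] = int(m.group(1)) if m else None
        elif key == "Duplex":
            info["duplex"] = value
        elif key == "Link detected":
            info["link"] = (value == "yes")
    return info


def speed_mismatch(output, switch_mbps):
    """True when the nic reports a different speed than the switch port shows."""
    nic = parse_ethtool(output)["speed_mbps"]
    return nic is not None and nic != switch_mbps


# Hypothetical sample, shaped like typical `ethtool eth0` output.
SAMPLE = """Settings for eth0:
\tSpeed: 1000Mb/s
\tDuplex: Full
\tAuto-negotiation: on
\tLink detected: yes
"""

if __name__ == "__main__":
    # Switch says 100mbit, nic says 1000mbit: the case described above.
    print(speed_mismatch(SAMPLE, 100))
```

In practice the text would come from running `ethtool eth0` on the box
rather than from a canned string.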

--
 Marshall
 

On Wed, 2004-05-05 at 16:27, John Griffin wrote:
> We've been struggling w/ a lockup issue and I'm wondering if there is any
> relationship. We have:
> dual opterons (244s)
> MSI K8D Master-F.
> ECC memory
> We're having a devil of a time getting our CRON scripts to burn backups on a
> CDRW. I should say the isolated issue is on the portion of the script related
> to the CD Burn program: when the system is at the point of completing the
> burn (writing the table of contents?), our system locks up.
> We noticed activity on our eth0 port and SCSI array at the moment the system
> locks.
> Our thought is there is a conflict w/ our RAID (through our Adaptec
> 39320A-R).
> We also have discovered issues in our driver for the Broadcom gig-e on
> board. We have been unable to implement the updated drivers for the
> Broadcoms and are still using the Tigon3 drivers.
> 
> Has anyone had this type of lockup?
> 
> John Griffin
> V1 Datacom Inc
> 
> 
> 
> ----- Original Message ----- 
> From: "Marshall Lewis" <marshall at novafoundry.com>
> To: <amd64-list at redhat.com>
> Sent: Wednesday, May 05, 2004 1:10 PM
> Subject: Re: I need help debugging a random lockup
> 
> 
> > Well.. now I can reproduce it .. sorta..  of course right after sending
> > this email I decide to try and chart some stuff in oo calc, and it has
> > locked 5 times in the last half hour.  I'm not any closer to determining
> > the cause though... maybe it's 32 bit apps? (or maybe it's power
> > fluctuations, or the room is too hot, or my monitors are too close
> > together... )        : )
> >
> > --
> >  Marshall
> >
> > On Wed, 2004-05-05 at 15:34, Marshall Lewis wrote:
> > > > I'm not sure how to begin debugging this; it's not something that I've
> > > ever had happen on a linux system before.  I have an opteron workstation
> > > here at work, and it locks up more or less randomly... It's a complete
> > > system freeze.  There are no messages about it in /var/log/messages, and
> > > I'm guessing there aren't any messages on the consoles, although I have
> > > no real way to confirm.
> > >
> > > It doesn't seem to be application related (unless maybe it's X itself or
> > > the nvidia drivers), and is not load related.  .. it can happen while
> > > > I'm clicking around in firefox, typing in openoffice, coding in nedit
> > > or zend studio, highlighting text, etc.. etc... .. and of course it even
> > > happens when my back is turned (probably out of spite).
> > >
> > >  On any other machine I might suspect the ram.. and I guess I'm kinda
> > > suspecting the ram here, but it is an opteron, and it's running with ECC
> > > ram... so ram really shouldn't be an issue right?  or wrong? : )
> > >
> > >
> > > Any suggestions on how I can determine the cause of the lockup? (or at
> > > > least narrow it down).  BTW these lockups occurred in FC1 as well
> > > (currently running FC2T3).
> > >
> > > --
> > >  Marshall
> > >
> >
> >
> > -- 
> > amd64-list mailing list
> > amd64-list at redhat.com
> > https://www.redhat.com/mailman/listinfo/amd64-list
> 




