Disk thrashing, Please HELP!

Mon May 17 13:36:23 UTC 2004

Thanks for the great suggestions.

I rebooted before trying them, to get the problem going again, and
... no problem!  I've tried to remember what I changed, but other
than installing some of the recent updates from Red Hat (not all
of them yet; that takes a while on a 56K modem), I don't think I
did anything that would change it.  So I'm probably going to be
left with a mystery.

I know I needed to follow these suggestions while the problem
process(es) were running, but of course I cannot get it to happen
now.  So here's what I found without them running; perhaps you'll
see something that I missed.  I admit that while I'm not a newbie
to Linux, I'm not a power user either, so please forgive my
ignorance.  My computer serves as the gateway for several others
on our home network; the other computers run various versions of
Windows.

Jason Dixon wrote:
> Without more information on what services you're running, it's going to 

In run level 5, I'm running the following: apmd, autofs, crond, 
cups, dhcpd, gpm, hpoj, ip6tables, iptables, irqbalance, isdn, 
kudzu, mdmonitor, mdmpd, microcode_ctl, named, netfs, network, 
nfslock, pcmcia, portmap, random, rawdevices, rhnsd, sendmail, 
sgi_fam, sshd, syslog, vmware, wine, xinetd.  Anything suspicious 
here I should investigate further?

> be tough.  Use "ps afx" while the python process(es) is running to see 
> what's actually calling python.

This showed that up2date called python, but up2date was not
running when I had the problem, so that's not it.

Ed Wilts wrote:
> It sounds like you need to look at your scheduled tasks to see what is
> starting python.  One of the ways to do this is to use lsof.  For
> example:
> # lsof / | grep python
> 
> The second column is the pid of the process that's running python.  Now
> see if you can track down the guilty culprit from there.  
> 

Again, only up2date, which was not running when I experienced the
problem.

> You can also check which cron jobs are running python with:
> [root@p6000 ewilts]# grep python /etc/cron.*/*
> [root@p6000 ewilts]# grep python /var/spool/cron/*
> That will help find some, but obviously not all since the cron entry
> could simply be to a script that in turn runs python.
> 

No dice.  Both commands showed nothing.
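
To cover the indirect case from the caveat above (a cron entry that
points to a script which in turn runs python), something like the
following rough sketch could be used.  It is only an illustration:
the function name indirect_python is made up, it assumes a grep that
supports -o (GNU grep does), and it only follows absolute paths that
appear literally in the cron files.

```shell
#!/bin/sh
# indirect_python FILE...
# Hypothetical helper: pull every absolute path out of the given cron
# files, then print those paths that are regular files containing the
# string "python" (for example a wrapper script that calls python).
indirect_python() {
    grep -ho '/[A-Za-z0-9_./-]*' "$@" 2>/dev/null | sort -u |
    while read -r path; do
        # Skip anything that is not an existing regular file.
        [ -f "$path" ] || continue
        if grep -q python "$path" 2>/dev/null; then
            printf '%s\n' "$path"
        fi
    done
}

# Example (as root):
#   indirect_python /etc/crontab /etc/cron.*/* /var/spool/cron/*
```

This goes only one level deep, so a script that calls another script
that calls python would still be missed.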

> My gut tells me you're running mailman since it does have the rare habit
> of thrashing a system like you're describing.  Are you running mailman,
> and if so, are you current?

No, I'm using sendmail.

Larry Brown wrote:
> If after rebooting you get the same problem, I'd grep the contents of your
> startup scripts looking for the bang for python.  I have not written
> anything in python, however, you may do ...
> 
> fgrep python *
> 
> from within /etc/rc.d/init.d and see which ones were written in it.  Then
> temporarily disable them by changing SXXscriptName to sXXscriptName where XX
> represents the given number of the script.  This will prevent it from
> starting up.  If upon reboot after that, the thrashing has stopped, you can
> one by one change the s to S and reboot until you find it.  I don't know of

Nope, nothing starts python directly.

> any faster way off hand.  Did this start after loading some package or is
> this some existing package that recently started giving you the problem?

It's some existing package.  I had not installed anything in
several days when this started.  Right before it happened I had
been installing several (many) fonts (not packages, just
individual fonts).  That took a long time in and of itself, and
it hadn't finished when I stopped it and rebooted, which is when
I ended up with the thrashing problem.  That is why I suspected
the fonts might have something to do with it, but I can't find
any evidence to support that.  I've since re-added some of the
fonts, more slowly and not so many at once, without the same
thing happening.

Looking further into the system log (once I didn't have so much
going on and could actually look at such a huge file), I found
the following messages repeated many, many times:

May 16 14:04:54 rosina xinetd[20798]: warning: can't get client address: Transport endpoint is not connected
May 16 14:04:54 rosina xinetd[20798]: libwrap refused connection to sgi_fam (libwrap=fam) from <no address>

The process ID changed, but other than that, these repeated for a
long time.  So I'm guessing that's what syslogd was writing, and
that all that logging is what caused the disk to thrash.  I've
never done anything with xinetd, so I don't even know how it
works, let alone what it was doing that might have caused this.
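
For what it's worth, a flood like this can be confirmed by counting
messages with the timestamp and PID stripped out.  The sketch below
is only an illustration: the function name top_messages is made up,
and it assumes the usual syslog layout of month, day, time and host
in front of each message, as in the lines above.  On Red Hat the file
to feed it would normally be /var/log/messages.

```shell
#!/bin/sh
# top_messages LOGFILE [N]
# Hypothetical helper: print the N (default 10) most frequent messages
# in a syslog-style file, most frequent first, each prefixed by its count.
top_messages() {
    awk '{
        msg = $0
        # Drop the leading month, day, time and host fields.
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +/, "", msg)
        # Replace the first "[1234]" with a placeholder so that lines
        # that differ only in PID are counted together.
        sub(/\[[0-9]+\]/, "[PID]", msg)
        count[msg]++
    } END {
        for (m in count) print count[m], m
    }' "$1" | sort -rn | head -n "${2:-10}"
}

# Example (as root):
#   top_messages /var/log/messages 5
```

If one or two xinetd lines dominate the counts, that would fit the
guess that the logging itself was keeping the disk busy.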

Perhaps it was a cron job that finally finished.  But I would
have thought that rebooting would stop the cron job, and that it
would not start again until the next time that job was scheduled,
rather than right after a reboot, right?  This lasted through
several reboots.  Also, I had let it run and thrash my hard drive
for over a day, hoping that whatever was running would finish,
but it did not.

Thanks for your help,
Rosina

-- 

Rosina Bignall
rbignall at earthlink.net