Disk thrashing, Please HELP!
Rosina Bignall
rbignall at earthlink.net
Mon May 17 13:36:23 UTC 2004
Thanks for the great suggestions.
I rebooted before trying them, to get the problem going again, and
... no problem! I've tried to remember what I changed, but other
than installing some of the recent updates from Red Hat (not all
of them yet; that takes a while on a 56K modem), I don't think I
did anything that would change it. So I'm probably going to be
left with a mystery.
I know I needed to follow these suggestions while the problem
process(es) were running, but of course, I cannot get it to
happen now, so here's what I found without them running and
perhaps you'll see something that I missed. I admit that while
I'm not a newbie to Linux, I'm not a power user either, so please
forgive my ignorance. My computer serves as the gateway to
several others in our home network - the other computers run
various versions of Windows.
Jason Dixon wrote:
> Without more information on what services you're running, it's going to
In run level 5, I'm running the following: apmd, autofs, crond,
cups, dhcpd, gpm, hpoj, ip6tables, iptables, irqbalance, isdn,
kudzu, mdmonitor, mdmpd, microcode_ctl, named, netfs, network,
nfslock, pcmcia, portmap, random, rawdevices, rhnsd, sendmail,
sgi_fam, sshd, syslog, vmware, wine, xinetd. Anything suspicious
here I should investigate further?
> be tough. Use "ps afx" while the python process(es) is running to see
> what's actually calling python.
This showed that up2date called python, but up2date was not
running when the problem occurred, so that's not it.
Ed Wilts wrote:
> It sounds like you need to look at your scheduled tasks to see what is
> starting python. One of the ways to do this is to use lsof. For
> example:
> # lsof / | grep python
>
> The second column is the pid of the process that's running python. Now
> see if you can track down the guilty culprit from there.
>
Again, only up2date, which was not running when I experienced the
problem.
> You can also check which cron jobs are running python with:
> [root@p6000 ewilts]# grep python /etc/cron.*/*
> [root@p6000 ewilts]# grep python /var/spool/cron/*
> That will help find some, but obviously not all since the cron entry
> could simply be to a script that in turn runs python.
>
No dice. Both commands showed nothing.
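
A minimal sketch of the same cron check wrapped in one function, which also
looks in /etc/crontab (not covered by the two globs above). The function name
cron_mentions and its optional second argument (a directory prefix, useful
only for trying it against a test tree) are hypothetical, not part of any
package:

```shell
# List cron files that mention a pattern.
# $1 = pattern to search for, $2 = optional root prefix (hypothetical, for testing).
cron_mentions() {
    pattern="$1"
    root="${2:-}"
    # -l prints only the names of matching files; -s silences errors about
    # missing or unreadable files (globs that match nothing are passed literally).
    grep -ls -- "$pattern" \
        "$root"/etc/crontab \
        "$root"/etc/cron.*/* \
        "$root"/var/spool/cron/* 2>/dev/null
    # grep exits non-zero when nothing matches; that is not an error here.
    return 0
}
```

Run as root, `cron_mentions python` prints one file name per match. As noted
in the quoted text, it still cannot see a cron entry that calls a script which
in turn runs python.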
> My gut tells me you're running mailman since it does have the rare habit
> of thrashing a system like you're describing. Are you running mailman, and if
> so, are you current?
No, I'm using sendmail.
Larry Brown wrote:
> If after rebooting you get the same problem, I'd grep the contents of your
> startup scripts looking for the bang for python. I have not written
> anything in python, however, you may do ...
>
> fgrep python *
>
> from within /etc/rc.d/init.d and see which ones were written in it. Then
> temporarily disable them by changing SXXscriptName to sXXscriptName where XX
> represents the given number of the script. This will prevent it from
> starting up. If upon reboot after that, the thrashing has stopped, you can
> one by one change the s to S and reboot until you find it. I don't know of
Nope, nothing starts python directly.
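
A narrower variant of the fgrep check, sketched under the assumption that
"the bang for python" means a first line such as #!/usr/bin/python. The
function name python_shebang_scripts is hypothetical, and the default
directory is the one named in the quoted text:

```shell
# Print the scripts in a directory whose first line is a python shebang.
# $1 = directory to scan (defaults to /etc/rc.d/init.d).
python_shebang_scripts() {
    dir="${1:-/etc/rc.d/init.d}"
    for f in "$dir"/*; do
        # Skip subdirectories and an unmatched glob.
        [ -f "$f" ] || continue
        # Look only at line 1, so a shell script that merely mentions
        # python further down is not reported.
        if head -n 1 "$f" | grep -q '^#!.*python'; then
            echo "$f"
        fi
    done
}
```

Unlike `fgrep python *`, this does not report shell scripts that launch
python somewhere in their body, so the two checks complement each other.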
> any faster way off hand. Did this start after loading some package or is
> this some existing package that recently started giving you the problem?
It's some existing package. I had not installed anything in
several days when this started. Right before it happened I had
been installing many fonts (not packages, just individual fonts).
That took a long time in and of itself, and I stopped it before
it finished and rebooted, after which the thrashing problem
appeared. That is why I suspected the fonts might have something
to do with it, but I can't find any evidence to support that.
I've since re-added some of the fonts, more slowly and fewer at a
time, without the same thing happening.
Looking further into the system log (once things had quieted down
enough that I could actually look at such a huge file), I found
the following messages repeated many, many times:
May 16 14:04:54 rosina xinetd[20798]: warning: can't get client
address: Transport endpoint is not connected
May 16 14:04:54 rosina xinetd[20798]: libwrap refused connection
to sgi_fam (libwrap=fam) from <no address>
The process ID changed, but other than that, these repeated for a
long time. So I'm guessing that's what syslogd was writing, and
that the constant logging is what caused the disk to thrash. I've
never done anything with xinetd, so I don't even know how it
works, let alone what it was doing that might have caused this.
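
One way to confirm that a single program is flooding the log is to count
lines per program. This is only a sketch: it assumes the traditional
"Mon DD HH:MM:SS host program[pid]: message" layout shown in the excerpt
above and a default log path of /var/log/messages, and the function name
top_loggers is hypothetical:

```shell
# Show the ten programs that wrote the most lines to a syslog-format file.
# $1 = log file (assumed default: /var/log/messages).
top_loggers() {
    logfile="${1:-/var/log/messages}"
    # Field 5 is "program[pid]:" or "program:". Strip the pid and colon so
    # that lines differing only by process ID are counted together.
    awk '{ sub(/\[[0-9]+\]:$/, "", $5); sub(/:$/, "", $5); print $5 }' "$logfile" \
        | sort | uniq -c | sort -rn | head -n 10
}
```

If xinetd dominates the output, `grep -c 'libwrap refused connection'` on
the same file gives a count of that specific message.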
Perhaps it was a cron job that finally finished, but I would have
thought that rebooting would have stopped the cron job, and that
it would not have started again until its next scheduled time
rather than right after a reboot, right? This lasted through several
reboots. Also, I had allowed it to run and thrash my hard drive
for over a day, hoping that whatever was running would finish,
but it did not.
Thanks for your help,
Rosina
--
Rosina Bignall
rbignall at earthlink.net