lhh at redhat.com
Thu Aug 2 15:51:55 UTC 2007
On Tue, Jul 31, 2007 at 08:11:23PM +0300, Janne Peltonen wrote:
> On Tue, Jul 31, 2007 at 05:54:41PM +0300, Janne Peltonen wrote:
> > On Tue, Jul 31, 2007 at 09:41:21AM -0400, Lon Hohberger wrote:
> > > On Tue, Jul 31, 2007 at 03:14:38PM +0300, Janne Peltonen wrote:
> > > > On Tue, Jul 10, 2007 at 06:19:22PM -0400, Lon Hohberger wrote:
> > > > >
> > > > > http://people.redhat.com/lhh/rhel5-test
> > > > >
> > > > > You'll need at least the updated cman package. The -2.1lhh build of
> > > > > rgmanager is the one I just built today; the others are a bit older.
> > > >
> > > > Well, I installed the new versions of the cman and rgmanager packages I
> > > > found there, but to no avail: I still get 1500 invocations of fs.sh per
> > > > second.
> > >
> > > I put a log message in fs.sh:
> > >
> > > Jul 31 09:27:29 bart clurgmgrd: : <err> /usr/share/cluster/fs.sh
> > > TEST
> > >
> > > It comes up once every several (10-20) seconds like it's supposed to.
> > I did the same, with the same results. It seems to me that the clurgmgrd
> > process isn't calling the complete script any more often than it's
> > supposed to. What I'm counting are the execs of fs.sh, and that count
> > includes every ( ) and `` and so on. Each fs.sh invocation seems to
> > create quite a number of subshells.
> >
> > I'm sorry for having misled you. This also means there probably isn't
> > much reason to read the cluster.conf and rg_test rules output, but
> > I'll attach them anyway.
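
The subshell effect described above can be reproduced in isolation. The
following is a minimal sketch, not code from fs.sh: it assumes a POSIX shell
such as bash or dash, and the helper name current_pid is made up for
illustration. It shows that both ( ) and `` run in a child process whose PID
differs from the script's own, which would be consistent with process
accounting seeing far more fs.sh processes than script invocations.

```shell
#!/bin/sh
# Hypothetical demonstration; none of this is taken from fs.sh.

# Print the PID of the shell process that calls this function.
# A child "sh" reports its parent via $PPID. The trailing ":" ensures the
# child is forked rather than exec'ed in place of the calling shell.
current_pid() { sh -c 'echo $PPID'; :; }

main_pid=$$

# Explicit subshell: ( ) runs in a forked child of the script.
tmp=$(mktemp)
( current_pid > "$tmp" )
read -r paren_pid < "$tmp"
rm -f "$tmp"

# Command substitution: backticks also run in a forked child.
backtick_pid=`current_pid`

echo "script PID:            $main_pid"
echo "PID inside ( ):        $paren_pid"
echo "PID inside backticks:  $backtick_pid"
```

Each of the two constructs reports a PID other than the script's, so a script
that uses them freely creates several short-lived processes per run.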
> I ran the new rgmanager packages for about four hours without any of the
> load fluctuation I'd experienced before. That fluctuation has a
> more-or-less four-hour interval: system load first increases slowly until
> it reaches a high level (dependent on overall system load) and then
> swiftly decreases to near zero, only to start increasing again. It peaks
> at about 5.0 on a system with no users at all but many services. If there
> are many users and the user peak coincides with the base peak, the system
> experiences a short load peak of about 100.0, after which it recovers and
> the basic load fluctuation becomes visible again.
>
> Then the load averages started increasing again, to around 10.0, so,
> frustrated, I edited /usr/share/cluster/fs.sh and put an exit 0 in the
> "status|monitor" branch of the case statement on $1. Load averages
> promptly fell back to under 0.5, disk usage% fell by 30 percentage points,
> and overall system responsiveness increased considerably.
>
> So I'll be running my cluster without fs status checks for now. I hope
> someone'll work out what's wrong with fs.sh soon... ;)
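
For reference, the workaround described above amounts to short-circuiting one
branch of the action dispatch. This is only a sketch under assumptions: the
function name dispatch and the catch-all branch are invented for illustration,
the real fs.sh is much larger, and the post uses exit 0 where this sketch uses
return 0 so that it can be called repeatedly.

```shell
#!/bin/sh
# Hypothetical skeleton of a resource agent's action dispatch.
# The actual branches of fs.sh are not reproduced here.
dispatch() {
    case "$1" in
    status|monitor)
        # Workaround from the post: report success without running any
        # checks. (In fs.sh itself this was written as "exit 0".)
        return 0
        ;;
    *)
        # Placeholder for start, stop and the other actions.
        echo "action '$1' is not handled in this sketch" >&2
        return 1
        ;;
    esac
}

dispatch status && echo "status: reported OK without checking"
```

Note that this disables the checks entirely, so a filesystem that really has
failed would still be reported as healthy.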

There are a number of things we can do. Can you file a bugzilla about
this, now that we know what's going on (and that it's not internal
rgmanager difficulties, just inefficient scripting)?

Lon Hohberger - Software Engineer - Red Hat, Inc.