[Linux-cluster] fs.sh?

Janne Peltonen janne.peltonen at helsinki.fi
Tue Jul 31 17:11:23 UTC 2007


On Tue, Jul 31, 2007 at 05:54:41PM +0300, Janne Peltonen wrote:
> On Tue, Jul 31, 2007 at 09:41:21AM -0400, Lon Hohberger wrote:
> > On Tue, Jul 31, 2007 at 03:14:38PM +0300, Janne Peltonen wrote:
> > > On Tue, Jul 10, 2007 at 06:19:22PM -0400, Lon Hohberger wrote:
> > > > 
> > > > http://people.redhat.com/lhh/rhel5-test
> > > > 
> > > > You'll need at least the updated cman package.  The -2.1lhh build of
> > > > rgmanager is the one I just built today; the others are a bit older.
> > > 
> > > Well, I installed the new versions of the cman and rgmanager packages I
> > > found there, but to no avail: I still get 1500 invocations of fs.sh per
> > > second.
> > 
> > I put a log message in fs.sh:
> > 
> > Jul 31 09:27:29 bart clurgmgrd: [4395]: <err> /usr/share/cluster/fs.sh
> > TEST 
> > 
> > It comes up once every several (10-20) seconds like it's supposed to. 
> 
> I did the same, with the same results. It seems to me that the clurgmgrd
> process isn't calling the complete script any more times than it's
> supposed to.  What I'm seeing are the execs of fs.sh, that is, it
> includes each () and `` and so on. Each fs.sh invocation seems to create
> quite an amount of subshells.
> 
> I'm sorry for having misled you. And this all means, there isn't
> probably much reason to read the cluster.conf and rg_test rules output -
> I'll attach them anyway.

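For reference, the test message I added was roughly along these lines (a sketch; I'm assuming fs.sh sources the rgmanager ocf-shellfuncs, which provide ocf_log, and the exact placement within the script may differ):

    # somewhere near fs.sh's status/monitor handling (placement approximate)
    # should show up in syslog looking like:
    #   clurgmgrd: [...]: <err> /usr/share/cluster/fs.sh TEST
    ocf_log err "$0 TEST"
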
After running the new rgmanager packages for about four hours, I saw none of the
load fluctuation I'd experienced before. (Previously, on a roughly four-hour
cycle, the system load would climb slowly until it reached a high level -
dependent on overall system load - and then drop swiftly to near zero, only to
start climbing again. This fluctuation peaked at about 5.0 on a system with no
users at all but many services. If there were many users and a user peak
coincided with the base peak, the system would hit a short load peak of about
100.0, after which it recovered and the basic fluctuation became visible
again.) Then the load averages started climbing again, to somewhere around
10.0, so - frustrated - I edited /usr/share/cluster/fs.sh and added an exit 0
at the start of the "status|monitor" case of the switch on $1. Load averages
promptly fell back to under 0.5, disk utilization fell by about 30 percentage
points, and overall system responsiveness improved considerably.
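
For the record, the change was essentially this (a sketch from memory; the
surrounding code in fs.sh is approximate, only the added exit 0 is the actual
edit):

    # in /usr/share/cluster/fs.sh, in the main "case $1 in" dispatch block
    status|monitor)
            exit 0    # added: always report success, skipping the real check
            # ...the original status-check code below is never reached now...
            ;;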

So I'll be running my cluster without fs status checks for now. I hope
someone works out what's wrong with fs.sh soon... ;)


--Janne



