[Linux-cluster] write's pausing - which tools to debug?

Troy Dawson dawson at fnal.gov
Tue Oct 18 14:20:14 UTC 2005


Hi,
We've been having problems with writes to our GFS file system: a write 
will pause for a long time (typically 5 to 10 seconds, sometimes 30 
seconds, and occasionally 5 minutes).  After the pause, it's as if 
nothing happened; whatever process was writing just keeps going, happy 
as can be.
Except for these pauses, our GFS is quite zippy for both reads and 
writes, but the pauses are holding us back from going into full 
production.
I need to know what tools I should use to figure out what is causing 
these pauses.

Here is the setup.
-------------------
All machines: RHEL 4 update 1 (ok, actually S.L. 4.1), kernel 
2.6.9-11.ELsmp, GFS 6.1.0, ccs 1.0.0, gulm 1.0.0, rgmanager 1.9.34

I have no ability to do fencing yet, so I chose to use the gulm locking 
mechanism.  I have it set up so that there are 3 lock servers, for 
failover.  I have tested the failover, and it works quite well.

I have 5 machines in the cluster.  1 isn't connected to the SAN and 
isn't using GFS.  It is just a failover gulm lock server in case the 
other two lock servers go down.

So I have 4 machines connected to our SAN and using GFS.  3 are 
read-only, 1 is read-write.  If it is important, the 3 read-only are 
x86_64, the 1 read-write and the 1 not connected are i386.

The read/write machine is our master lock server.  Then one of the 
read-only is a fallback lock server, as is the machine not using GFS.
----------------

Anyway, we're getting these pauses when writing, and I'm having a hard 
time tracking down where the problem is.  I *think* that we can still 
read from the other machines during a pause, but since the problem 
comes and goes, I haven't been able to verify that.
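
A small write probe might help pin down when the stalls happen, so they 
can be lined up with the logs on the lock servers.  Below is a minimal 
sketch, not a tested tool: it assumes a Python 3 interpreter on the 
read-write node, and the names timed_write and probe, the 4 KB payload, 
and the 2-second threshold are all made up for illustration.

```python
import os
import time


def timed_write(path, payload=b"x" * 4096):
    """Append payload to path, fsync it, and return the elapsed seconds."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # force the data out so a stall shows up in this call
    finally:
        os.close(fd)
    return time.monotonic() - start


def probe(path, count, interval=1.0, threshold=2.0):
    """Run count timed writes and return the slow ones.

    Each entry is (wall-clock time when the write finished, seconds taken)
    for a write that took at least threshold seconds.
    """
    stalls = []
    for _ in range(count):
        elapsed = timed_write(path)
        if elapsed >= threshold:
            stalls.append((time.strftime("%Y-%m-%d %H:%M:%S"), elapsed))
        time.sleep(interval)
    return stalls
```

Calling probe() with a scratch file on the GFS mount (the path is 
whatever suits the site) and count=3600 would sample roughly once a 
second for an hour.  The same idea with a timed read, run on one of the 
read-only nodes, would show whether reads stall at the same moments.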

Anyway, which tools do you think would be best in diagnosing this?

Many Thanks
Troy Dawson
-- 
__________________________________________________
Troy Dawson  dawson at fnal.gov  (630)840-6468
Fermilab  ComputingDivision/CSS  CSI Group
__________________________________________________



