[Linux-cluster] Hard lockups when writing a lot to GFS
Rick Stevens
rstevens at vitalstream.com
Thu Dec 9 22:31:29 UTC 2004
I have a two-node setup on a dual-port SCSI SAN. Note this is just
for test purposes. Part of the SAN is a GFS filesystem shared between
the two nodes.
When we fetch content to the GFS filesystem via an rsync pull (well,
several rsync pulls) on node 1, it runs for a while and then node 1 hard
locks (nothing on the console, network dies, console dies, it's frozen
solid). Of course, node 2 notices it and marks node 1 down
(/proc/cluster/nodes shows an "X" for node 1 under "Sts"). So the
cluster behaviour is OK. If I run "fence_ack_manual -n node1" on node 2,
node 2 runs along happily. I can reboot node 1 and everything returns to
normal.
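For anyone who wants to spot the same state from a script rather than by
eye, here is a rough sketch that picks out nodes marked "X" under "Sts".
It assumes the table is whitespace-separated with a header row containing
"Sts" and "Name" columns; the sample text and the down_nodes() helper are
made up for illustration and are not copied from a real
/proc/cluster/nodes, so the real layout may differ.

```python
def down_nodes(table_text):
    """Return the names of nodes whose Sts column reads 'X'.

    Assumes a whitespace-separated table whose first non-blank line is a
    header containing 'Sts' and 'Name'. Raises ValueError if either
    column is missing from the header.
    """
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    if not rows:
        return []
    header = rows[0]
    sts_col = header.index("Sts")
    name_col = header.index("Name")
    needed = max(sts_col, name_col)
    return [
        row[name_col]
        for row in rows[1:]
        if len(row) > needed and row[sts_col] == "X"
    ]


# Hypothetical sample only; the real column layout may differ.
SAMPLE = """\
Node  Votes Exp Sts  Name
   1    1    1   X   node1
   2    1    1   M   node2
"""

if __name__ == "__main__":
    print(down_nodes(SAMPLE))
```

On a live node the same function could be fed the contents of
/proc/cluster/nodes instead of SAMPLE, provided the columns match the
assumption above.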
The problem is, why is node 1 dying like this? It is important that
this get sorted out as we have a LOT of data to synchronize (rsync is
just the test case--we'll probably use a different scheme on
deployment), and I suspect it's heavy write activity on that node
that's causing the crash.
Oh, both nodes have the GFS filesystem mounted with "-o rw,noatime".
Any ideas would be GREATLY appreciated!
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer rstevens at vitalstream.com -
- VitalStream, Inc. http://www.vitalstream.com -
- -
- Do you know how to save five drowning lawyers? No? GOOD! -
----------------------------------------------------------------------