[Linux-cluster] Can't write to file systems

Ben Yarwood ben.yarwood at juno.co.uk
Sat Mar 25 16:37:55 UTC 2006


Unfortunately I had to resolve the problem before I got your reply, so I
didn't have a chance to try your suggestions.  The mailing list seems to
have been having slowness issues.

In a slightly desperate act I fenced and rejoined each node one after the
other but didn't start gfs.  When I rejoined the second of the three nodes,
everything magically started working again: the third node started accessing
the gfs file systems as expected.  Next I remounted the gfs file systems on
the other boxes and they have also been behaving themselves.  Finally I
restarted rgmanager on one node and all my nfs mounts were restored.
Slightly irrationally, I haven't started rgmanager on the other nodes as I
think it may for some reason be to blame.
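
For reference, this is roughly the sequence I went through; the node,
volume and mount point names here are made-up examples rather than my
real ones:

  # from a healthy node, fence the node being recycled
  fence_node node2

  # on node2 once it is back up, rejoin the cluster but don't mount gfs
  service ccsd start
  service cman start
  service fenced start

  # after writes started working again, remount gfs on the other boxes
  mount -t gfs /dev/vg0/gfs01 /gfs01

  # and finally restore the nfs services, on one node only
  service rgmanager start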

In summary, it seemed that access to the file systems was being prevented by
the cluster for some reason.  Any more thoughts anyone may have would be
appreciated.

In addition, is there a way to check:

1 Whether a file system is suspended?
2 Why it is suspended?
3 Which node is causing the problem?
4 Is there a guide to interpreting the structure/contents of
  /proc/cluster/services?
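
For anyone else poking at the same questions, these are the inspection
commands I know of; I'm not sure they can actually answer 1-3, and the
mount point below is just an example:

  cat /proc/cluster/services      # state of the fence/dlm/gfs service groups
  cman_tool status                # membership and quorum summary
  cman_tool services              # the same service-group info, formatted
  gfs_tool df /gfs01              # per-fs space and journal details
  gfs_tool lockdump /gfs01        # dump the gfs locks held on one mount
  gfs_tool counters /gfs01        # per-fs lock and operation counters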


Cheers
Ben
 


> On Fri, Mar 24, 2006 at 12:26:16AM -0000, Ben Yarwood wrote:
> > 
> > I am really stumped and could do with some help.
> > 
> > I have a 3 node gfs cluster running gfs 6.1 and it has started to 
> > behave very strangely after I had some problems earlier today 
> > expanding one of the file systems.
> > 
> > At the moment all the nodes are in the cluster and it is quorate, 
> > and all the gfs file systems are mounted.  Reading from the gfs 
> > file systems works fine but anything that tries to write to them 
> > causes the file system to hang.
> 
> I've never heard of that happening before.  Here are a couple 
> things you could try.
> 
> - Try mounting one fs at a time and seeing if there's problems with
>   just one of them, or all, or only when there's more than one fs
>   mounted...
> 
> - Shut down the cluster and have just a single node mount each fs
>   using lock_nolock:
> 
>     mount -t gfs /dev/foo /gfs -o lockproto=lock_nolock
> 
>   This mounts the fs as a local file system, so make sure you don't
>   have other nodes mounting the fs!  Once it's mounted like this, see
>   if you can access the fs normally.
> 
> > I see a lot of gfs_recoverd and dlm_recoverd processes running.  Do 
> > these have something to do with it?
> 
> Those will be running all the time.
> 
> Dave
> 
> 

