[Linux-cluster] fc6 two-node cluster with gfs2 not working

Thu Nov 2 21:28:48 UTC 2006

On Thu, Nov 02, 2006 at 02:56:44PM -0600, Greg Swift wrote:
> The first box starts gfs2 just fine, but the second won't; it hangs at this
> (from /var/log/messages):
> 
> Nov 1 22:41:07 box2 kernel: audit(1162442467.427:150): avc: denied { 
> connectto } for pid=3724 comm="mount.gfs2" 
> path=006766735F636F6E74726F6C645F736F6$
> Nov 1 22:41:07 box2 kernel: GFS2: fsid=: Trying to join cluster 
> "lock_dlm", "outMail:data"
> Nov 1 22:41:07 box2 kernel: audit(1162442467.451:151): avc: denied { 
> search } for pid=3724 comm="mount.gfs2" name="dlm" dev=debugfs ino=13186 
> scontext$
> Nov 1 22:41:07 box2 kernel: dlm: data: recover 1
> Nov 1 22:41:07 box2 kernel: GFS2: fsid=outMail:data.1: Joined cluster. 
> Now mounting FS...
> Nov 1 22:41:07 box2 kernel: dlm: data: add member 1
> Nov 1 22:41:07 box2 kernel: dlm: data: add member 2
> Nov 1 22:49:07 box2 gfs_controld[3639]: mount: failed -17
> 
> 
> Remember, SELinux is set to permissive.
> 
> So I shut down the box that came up fine on its own, manually started
> the services on box2 (the box that wasn't coming up), and it works fine.
> Then I turned on box1, and at boot it hangs at the same place box2 did.
> 
> I also realize that a two-node cluster is not preferred, but it's what I'm
> setting up and what I have access to at the moment, and honestly I'm not
> sure I believe a third box would help (though it might).

A couple of things about your cluster.conf:

1. You probably want to set post_join_delay to around 10 (seconds) to avoid
   nodes being fenced at startup, e.g.:

  <fence_daemon post_join_delay="10">
  </fence_daemon>

2. The element names are "fencedevices" and "fencedevice", with no "_".
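For reference, a minimal two-node cluster.conf with both of those points applied might look roughly like the sketch below. It is only a sketch: the cluster name outMail and the node names box1/box2 come from your logs, but the fence device name "human", the fence_manual agent, the method name and the config_version value are placeholders I've assumed, so substitute your real fencing setup. The cman two_node line is also an assumption on my part; two-node clusters normally need it so that one node alone can stay quorate.

```xml
<?xml version="1.0"?>
<!-- Sketch only: device name, agent and method name are hypothetical placeholders -->
<cluster name="outMail" config_version="1">
  <!-- Wait 10 seconds after a node joins before fencing anything -->
  <fence_daemon post_join_delay="10">
  </fence_daemon>
  <!-- Assumed two-node quorum settings -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="box1" nodeid="1">
      <fence>
        <method name="single">
          <device name="human" nodename="box1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="box2" nodeid="2">
      <fence>
        <method name="single">
          <device name="human" nodename="box2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <!-- Note the spelling: fencedevices / fencedevice, no underscore -->
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
```

Whatever agent you actually use, the device name in each clusternode's fence method has to match the name of a fencedevice entry.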

With both machines in a clean/reset state, run "service cman start" on both,
then start clvmd on both, then, _before_ doing anything with GFS, check the
status. On both nodes run:

$ cman_tool status
$ cman_tool nodes
$ group_tool -v

then do "mount -v /dev/foo /dir" on one node
then do group_tool -v on that node
then do "mount -v /dev/foo /dir" on the other node
then do group_tool -v on _both_ nodes

Send the output of all that and we'll try to see where things are going
off track.

Dave