[Linux-cluster] gfs2 resource not mounting
Neale Ferguson
neale at sinenomine.net
Fri Oct 3 19:32:34 UTC 2014
Using the same two-node configuration I described in an earlier post to this forum, I'm having problems getting a gfs2 resource started on one of the nodes. The resource in question:
Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/vg_cluster/ha_lv directory=/mnt/gfs2-demo fstype=gfs2 options=noatime
Operations: start interval=0s timeout=60 (clusterfs-start-timeout-60)
stop interval=0s timeout=60 (clusterfs-stop-timeout-60)
monitor interval=10s on-fail=fence (clusterfs-monitor-interval-10s)
pcs status shows:
Clone Set: dlm-clone [dlm]
Started: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]
Clone Set: clvmd-clone [clvmd]
Started: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]
Clone Set: clusterfs-clone [clusterfs]
Started: [ rh7cn1.devlab.sinenomine.net ]
Stopped: [ rh7cn2.devlab.sinenomine.net ]
Failed actions:
clusterfs_start_0 on rh7cn2.devlab.sinenomine.net 'unknown error' (1): call=46, status=complete, last-rc-change='Fri Oct 3 14:41:26 2014', queued=4702ms, exec=0ms
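To make the failed-action entry easier to compare across runs, here is a minimal Python sketch that splits it into fields. The line format is assumed only from the single entry shown above; `parse_failed_action` is a hypothetical helper, not part of pcs or Pacemaker.

```python
import re

# The failed-action line exactly as reported by pcs status above.
FAILED = ("clusterfs_start_0 on rh7cn2.devlab.sinenomine.net 'unknown error' (1): "
          "call=46, status=complete, last-rc-change='Fri Oct 3 14:41:26 2014', "
          "queued=4702ms, exec=0ms")

# Assumed layout: <operation> on <node> '<reason>' (<rc>): key=value, key=value, ...
HEAD = re.compile(r"^(?P<op>\S+) on (?P<node>\S+) '(?P<reason>[^']*)' \((?P<rc>\d+)\): (?P<rest>.*)$")
PAIR = re.compile(r"(\w[\w-]*)=('[^']*'|[^,]+)")


def parse_failed_action(line):
    """Return a dict of the fields in one failed-action line (hypothetical helper)."""
    m = HEAD.match(line.strip())
    if m is None:
        raise ValueError("line does not match the assumed failed-action layout")
    fields = {"op": m.group("op"), "node": m.group("node"),
              "reason": m.group("reason"), "rc": int(m.group("rc"))}
    # Split the trailing key=value list, dropping quotes around quoted values.
    for key, value in PAIR.findall(m.group("rest")):
        fields[key] = value.strip("'")
    return fields


info = parse_failed_action(FAILED)
for key, value in info.items():
    print(f"{key}: {value}")
```

This only restructures what pcs already printed; it does not explain why the start operation returned 1.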
Using pcs resource debug-start I see:
Operation start for clusterfs:0 (ocf:heartbeat:Filesystem) returned 1
> stderr: INFO: Running start for /dev/vg_cluster/ha_lv on /mnt/gfs2-demo
> stderr: mount: permission denied
> stderr: ERROR: Couldn't mount filesystem /dev/vg_cluster/ha_lv on /mnt/gfs2-demo
The log on the node shows:
Oct 3 14:57:37 rh7cn2 kernel: GFS2: fsid=rh7cluster:vol1: Trying to join cluster "lock_dlm", "rh7cluster:vol1"
Oct 3 14:57:38 rh7cn2 kernel: GFS2: fsid=rh7cluster:vol1: Joined cluster. Now mounting FS...
Oct 3 14:57:38 rh7cn2 dlm_controld[5857]: 1564 cpg_dispatch error 9
On the other node:
Oct 3 15:09:47 rh7cn1 kernel: GFS2: fsid=rh7cluster:vol1.0: recover generation 14 done
Oct 3 15:09:48 rh7cn1 kernel: GFS2: fsid=rh7cluster:vol1.0: recover generation 15 done
I'm assuming I didn't define the gfs2 resource such that it could be used concurrently by both nodes. Here's the cib.xml definition for it:
<clone id="clusterfs-clone">
<primitive class="ocf" id="clusterfs" provider="heartbeat" type="Filesystem">
<instance_attributes id="clusterfs-instance_attributes">
<nvpair id="clusterfs-instance_attributes-device" name="device" value="/dev/vg_cluster/ha_lv"/>
<nvpair id="clusterfs-instance_attributes-directory" name="directory" value="/mnt/gfs2-demo"/>
<nvpair id="clusterfs-instance_attributes-fstype" name="fstype" value="gfs2"/>
<nvpair id="clusterfs-instance_attributes-options" name="options" value="noatime"/>
</instance_attributes>
<operations>
<op id="clusterfs-start-timeout-60" interval="0s" name="start" timeout="60"/>
<op id="clusterfs-stop-timeout-60" interval="0s" name="stop" timeout="60"/>
<op id="clusterfs-monitor-interval-10s" interval="10s" name="monitor" on-fail="fence"/>
</operations>
</primitive>
<meta_attributes id="clusterfs-clone-meta">
<nvpair id="clusterfs-interleave" name="interleave" value="true"/>
</meta_attributes>
</clone>
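As a quick check of what the clone definition above actually sets, the following minimal Python sketch parses the fragment and lists the clone's meta attributes and the primitive's instance attributes. The helper names (`nvpairs`, `summarize_clone`) are hypothetical, and the XML is pasted in as a string rather than read from the live CIB. In this fragment the only clone meta attribute is interleave; neither clone-max nor clone-node-max is set explicitly.

```python
import xml.etree.ElementTree as ET

# The clone definition from cib.xml, copied from the fragment above.
CLONE_XML = """
<clone id="clusterfs-clone">
  <primitive class="ocf" id="clusterfs" provider="heartbeat" type="Filesystem">
    <instance_attributes id="clusterfs-instance_attributes">
      <nvpair id="clusterfs-instance_attributes-device" name="device" value="/dev/vg_cluster/ha_lv"/>
      <nvpair id="clusterfs-instance_attributes-directory" name="directory" value="/mnt/gfs2-demo"/>
      <nvpair id="clusterfs-instance_attributes-fstype" name="fstype" value="gfs2"/>
      <nvpair id="clusterfs-instance_attributes-options" name="options" value="noatime"/>
    </instance_attributes>
    <operations>
      <op id="clusterfs-start-timeout-60" interval="0s" name="start" timeout="60"/>
      <op id="clusterfs-stop-timeout-60" interval="0s" name="stop" timeout="60"/>
      <op id="clusterfs-monitor-interval-10s" interval="10s" name="monitor" on-fail="fence"/>
    </operations>
  </primitive>
  <meta_attributes id="clusterfs-clone-meta">
    <nvpair id="clusterfs-interleave" name="interleave" value="true"/>
  </meta_attributes>
</clone>
"""


def nvpairs(parent, tag):
    """Collect name -> value from the <nvpair> children of each direct <tag> child."""
    return {nv.get("name"): nv.get("value")
            for block in parent.findall(tag)
            for nv in block.findall("nvpair")}


def summarize_clone(xml_text):
    """Summarize one <clone> element (hypothetical helper for illustration)."""
    clone = ET.fromstring(xml_text)
    primitive = clone.find("primitive")
    return {
        "clone_id": clone.get("id"),
        "clone_meta": nvpairs(clone, "meta_attributes"),
        "primitive_attrs": nvpairs(primitive, "instance_attributes"),
    }


summary = summarize_clone(CLONE_XML)
print("clone:", summary["clone_id"])
print("clone meta attributes:", summary["clone_meta"])
print("primitive attributes:", summary["primitive_attrs"])
# Report whether the definition explicitly limits the number of clone instances.
for name in ("clone-max", "clone-node-max"):
    print(name, "explicitly set:", name in summary["clone_meta"])
```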
-------------------------------
Unrelated (I believe) to the above, I also note the following messages in /var/log/messages which appear to be related to pacemaker and http (another resource I have defined):
Oct 3 15:05:06 rh7cn2 systemd: pacemaker.service: Got notification message from PID 6036, but reception only permitted for PID 5575
I'm running systemd-208-11.el7_0.2. A Bugzilla search turns up one matching report, but the fix for it was put into -11.
Neale