[Linux-cluster] Adding new file system caused problems

Randy Brown randy.brown at noaa.gov
Tue Nov 27 17:22:40 UTC 2007


I am running a two-node cluster using CentOS 5 that is basically being
used as a NAS head for our iSCSI-based storage.  Here are the related
RPMs and the versions I am using:
kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
system-config-lvm-1.0.22-1.0.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
gfs-utils-0.1.11-3.el5
lvm2-2.02.16-3.el5
lvm2-cluster-2.02.16-3.el5

This morning I created a 100GB volume on our storage unit and proceeded 
to make it available to the cluster so it could be served via NFS to a 
client on our network.  I used pvcreate and vgcreate as I always do and 
created a new volume group.  When I went to create the logical volume I 
saw this message:
Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid 
not found: 9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs
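
For reference, the sequence was roughly the following (the device path
and the VG/LV names here are just placeholders, not necessarily the
exact ones on our system):

    pvcreate /dev/sdd                              (new 100GB iSCSI LUN - example path)
    vgcreate VolGroupNew /dev/sdd
    lvcreate -n LogVol-new -L 100G VolGroupNew     <- this is the step that printed the error above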

I figured I had done something wrong and tried to remove the LV and
couldn't.  lvdisplay showed that the logical volume had been created,
and vgdisplay looked good with the exception of the volume not being
activated.  So I ran vgchange -aly <VolumeGroupName>, which didn't
return any error but also did not activate the volume.  I then rebooted
the node, which made everything OK.  I could now see the VG and LV,
both were active, and I could create the gfs file system on the LV.
The file system mounted and I thought I was in the clear.
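
In case it helps, the mkfs and mount were along these lines (the VG/LV
and mount point names are illustrative; ohd_cluster is the cluster name
from the cluster.conf below):

    gfs_mkfs -p lock_dlm -t ohd_cluster:LogVol-new -j 2 /dev/VolGroupNew/LogVol-new
    mount -t gfs /dev/VolGroupNew/LogVol-new /fs/new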

However, node #2 wasn't picking this new filesystem up at all.  I
stopped the cluster services on this node, which all stopped cleanly,
and then tried to restart them.  cman started fine but clvmd didn't;
it hung on the vgscan.  Even after a reboot of node #2, clvmd would not
start and would hang on the vgscan.  It wasn't until I shut down both
nodes completely and started the cluster that both nodes could see the
new filesystem.
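
Roughly what I tried on node #2, using the standard init scripts
(sketched from memory, so the exact order may be slightly off):

    service rgmanager stop
    service gfs stop
    service clvmd stop
    service cman stop
    service cman start          <- started fine
    service clvmd start         <- hung here, at the vgscan the init script runs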

I'm sure it's my own ignorance that's making this more difficult than it 
needs to be.  Am I missing a step?  Is more information required to 
help?  Any assistance in figuring out what happened here would be 
greatly appreciated.  I know I'm going to need to do similar tasks in the
future and obviously can't afford to bring everything down in order for 
the cluster to see a new filesystem.
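
For what it's worth, this is the sequence I was expecting to be able to
run in the future without a full cluster restart (names are examples
only):

    pvcreate /dev/sde
    vgcreate -c y VolGroupNext /dev/sde         <- clustered VG, so clvmd handles the locking
    lvcreate -n LogVol-next -L 100G VolGroupNext
    gfs_mkfs -p lock_dlm -t ohd_cluster:LogVol-next -j 2 /dev/VolGroupNext/LogVol-next
    mount -t gfs /dev/VolGroupNext/LogVol-next /fs/next      <- on each node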

Thank you,

Randy

P.S.  Here is my cluster.conf:
[root@nfs2-cluster ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="60"/>
        <clusternodes>
                <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="8" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="7" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="nfs-failover" ordered="0" restricted="1">
                                <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
                                <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="140.90.91.244" monitor_link="1"/>
                        <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
                        <nfsexport name="fs-shared-exp"/>
                        <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                        <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
                        <nfsexport name="rfcdata-exp"/>
                        <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                </resources>
                <service autostart="1" domain="nfs-failover" name="nfs">
                        <clusterfs ref="fs-shared">
                                <nfsexport ref="fs-shared-exp">
                                        <nfsclient ref="fs-shared-client"/>
                                </nfsexport>
                        </clusterfs>
                        <ip ref="140.90.91.244"/>
                        <clusterfs ref="rfcdata">
                                <nfsexport ref="rfcdata-exp">
                                        <nfsclient ref="rfcdata-client"/>
                                </nfsexport>
                                <ip ref="140.90.91.244"/>
                        </clusterfs>
                </service>
        </rm>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
        </fencedevices>
</cluster>
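
(When the new filesystem does get served out, I expect to add something
like the following to the <resources> section, plus a matching
<clusterfs ref="fs-new"> block inside the <service>, bump
config_version, and push it with "ccs_tool update
/etc/cluster/cluster.conf" - the names and fsid below are placeholders:)

        <clusterfs device="/dev/VolGroupNew/LogVol-new" force_unmount="0" fsid="12345" fstype="gfs" mountpoint="/fs/new" name="fs-new" options="acl"/>
        <nfsexport name="fs-new-exp"/>
        <nfsclient name="fs-new-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>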