[Linux-cluster] rgmanager causing hard lock ups

Ryan Thomson thomsonr at ucalgary.ca
Sat Dec 10 01:14:45 UTC 2005


Correction: it doesn't always happen on the same node, just on whichever
one gets the "NFS exports" service.
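
Since the lock-up seems to follow the service rather than the node, one
cheap thing to rule out before starting rgmanager is an inconsistent
config. Below is a rough Python sketch, not part of rgmanager or any Red
Hat tool; the function name check_cluster_conf and the trimmed SAMPLE
config (including its deliberately undefined last ref) are made up for
illustration. It lists each <service> with whatever autostart attribute it
carries (None if the attribute is absent) and flags any ref= that does not
match a name of the same element type under <resources>. It only inspects
the XML and says nothing about what rgmanager actually does with it.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample modelled on the cluster.conf quoted below.
# The last <clusterfs ref=...> is intentionally NOT defined in <resources>.
SAMPLE = """
<cluster config_version="99" name="biocomp_cluster">
  <rm>
    <resources>
      <nfsexport name="Cluster Export"/>
      <nfsclient name="Biocomp Clients" options="rw,sync" target="xxx.xxx.xxx.xxx/24"/>
      <clusterfs device="/dev/BIOCOMP/docs" force_unmount="0" fstype="gfs"
                 mountpoint="/projects/docs" name="Documentation" options="acl"/>
    </resources>
    <service domain="Cluster Failover" exclusive="1" name="NFS Exports">
      <clusterfs ref="Documentation">
        <nfsexport ref="Cluster Export">
          <nfsclient ref="Biocomp Clients"/>
        </nfsexport>
      </clusterfs>
      <clusterfs ref="Project - JM LJ"/>
    </service>
  </rm>
</cluster>
"""


def check_cluster_conf(xml_data):
    """Return (services, dangling) for a cluster.conf document.

    services: list of (service name, autostart attribute or None)
    dangling: list of (service name, element tag, ref) for every ref=
              with no same-tag, same-name entry under <resources>
    """
    root = ET.fromstring(xml_data)
    rm = root.find("rm")
    if rm is None:
        return [], []

    # Collect (tag, name) pairs of everything defined under <resources>.
    defined = set()
    resources = rm.find("resources")
    if resources is not None:
        for res in resources:
            defined.add((res.tag, res.get("name")))

    services, dangling = [], []
    for svc in rm.findall("service"):
        services.append((svc.get("name"), svc.get("autostart")))
        # Walk the whole service subtree, nested children included.
        for node in svc.iter():
            ref = node.get("ref")
            if ref is not None and (node.tag, ref) not in defined:
                dangling.append((svc.get("name"), node.tag, ref))
    return services, dangling


if __name__ == "__main__":
    services, dangling = check_cluster_conf(SAMPLE)
    for name, autostart in services:
        print("service %r: autostart attribute = %r" % (name, autostart))
    for name, tag, ref in dangling:
        print("service %r: <%s ref=%r> has no match in <resources>" % (name, tag, ref))
```

Against a real file, read it as bytes and pass it in, for example
check_cluster_conf(open("/etc/cluster/cluster.conf", "rb").read()),
assuming the config lives in the usual /etc/cluster location.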

--
Ryan


> I am now certain it has *something* to do with either the "NFS exports"
> service and/or the "Backup Mounts" service as when I removed both
> services, everything started fine. When I re-added them, this time with
> only a couple GFS mounts and NFS exports, it did the same thing...
>
> The interesting part is that it keeps happening to the same node, not
> different nodes.
>
> Any help or insights are appreciated.
>
> --
> Ryan
>
>
>> Hi List,
>>
>> I have an RHCS cluster with four nodes on RHEL4U2 using the RHN RPMs and
>> GFS CVS (RHEL4) and LVM2 (clvmd) from source tarball (2.2.01.09).
>>
>> I'm seeing some rather disturbing behavior from my cluster. I can get all
>> the nodes to join, fence each other properly, etc. I also have some
>> services set up, mainly GFS mounts and NFS exports.
>>
>> However, now if I bring up the cluster and start rgmanager, the node that
>> tries to start one or more of the services (I can't tell which service, but
>> I suspect the NFS export service) will hard lock with the caps lock and
>> scroll lock lights blinking, and the rest of the cluster is useless:
>> services don't start and rgmanager won't stop or reload or do anything...
>> on all the nodes. Also, I have all but one of my services set to NOT
>> autostart, yet when I start rgmanager, they begin starting anyway...
>>
>> Here is my cluster.conf file. I suspect the problem is with my NFS export
>> service, as that is the only one I've changed since I started seeing this
>> behavior:
>>
>> <?xml version="1.0" ?>
>> <cluster config_version="99" name="biocomp_cluster">
>>         <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
>>         <clusternodes>
>>                 <clusternode name="wolverine" votes="1">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="apcfence" port="1" switch="0"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="skunk" votes="1">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="apcfence" port="2" switch="0"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="cottontail" votes="1">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="apcfence" port="3" switch="0"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="walrus" votes="1">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="apcfence" port="4" switch="0"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>         </clusternodes>
>>         <cman/>
>>         <fencedevices>
>>                 <fencedevice agent="fence_apc" ipaddr="10.1.1.254" login="fence_user" name="apcfence" passwd="xxx"/>
>>         </fencedevices>
>>         <rm>
>>                 <failoverdomains>
>>                         <failoverdomain name="Cluster Failover" ordered="0" restricted="1">
>>                                 <failoverdomainnode name="wolverine" priority="1"/>
>>                                 <failoverdomainnode name="skunk" priority="1"/>
>>                                 <failoverdomainnode name="cottontail" priority="1"/>
>>                         </failoverdomain>
>>                         <failoverdomain name="Backup" ordered="0" restricted="1">
>>                                 <failoverdomainnode name="walrus" priority="1"/>
>>                         </failoverdomain>
>>                 </failoverdomains>
>>                 <resources>
>>                         <nfsexport name="Cluster Export"/>
>>                         <nfsclient name="Biocomp Clients" options="rw,sync" target="xxx.xxx.xxx.xxx/24"/>
>>                         <clusterfs device="/dev/BIOCOMP/docs" force_unmount="0" fstype="gfs" mountpoint="/projects/docs" name="Documentation" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/ryan" force_unmount="0" fstype="gfs" mountpoint="/people/ryan" name="Home - Ryan" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/luca" force_unmount="0" fstype="gfs" mountpoint="/people/luca" name="Home - Luca" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jlmaccal" force_unmount="0" fstype="gfs" mountpoint="/people/jlmaccal" name="Home - Justin" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_hexane" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/hexane" name="Project - JM Hexane" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_LJ" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/LJ" name="Project - JM LJ" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_sidechain_pmf" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/sidechain_pmf" name="Project - JM sidechain_pmf" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_CG" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/CG" name="Project - JM CG" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_CISS3" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/CISS3" name="Project - JM CISS3" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_OPLS-sidechain" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/OPLS-sidechain" name="Project - JM OPLS-sidechain" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_arg_pull" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/arg_pull" name="Project - JM arg_pull" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_halo" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/halo" name="Project - JM halo" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_old_bison" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/old_bison" name="Project - JM old_bison" options="acl"/>
>>                         <clusterfs device="/dev/BIOCOMP/jm_CISS2" force_unmount="0" fstype="gfs" mountpoint="/projects/jlmaccal/CISS2" name="Project - JM CISS2" options="acl"/>
>>                 </resources>
>>                 <service domain="Cluster Failover" name="cluster NAT">
>>                         <ip address="10.1.1.1" monitor_link="1"/>
>>                         <script file="/cluster/scripts/cluster_nat" name="cluster NAT script"/>
>>                 </service>
>>                 <service domain="Cluster Failover" name="FDS Service">
>>                         <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
>>                         <script file="/cluster/scripts/fds" name="FDS script"/>
>>                 </service>
>>                 <service domain="Cluster Failover" exclusive="1" name="NFS Exports">
>>                         <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
>>                         <clusterfs ref="Documentation">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Home - Ryan">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Home - Luca">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Home - Justin">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM Hexane">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM LJ">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM sidechain_pmf">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM CG">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM CISS3">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM OPLS-sidechain">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM arg_pull">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM halo">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM old_bison">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                         <clusterfs ref="Project - JM CISS2">
>>                                 <nfsexport ref="Cluster Export">
>>                                         <nfsclient ref="Biocomp Clients"/>
>>                                 </nfsexport>
>>                         </clusterfs>
>>                 </service>
>>                 <service domain="Backup" name="Backup Mounts">
>>                         <clusterfs ref="Documentation"/>
>>                         <clusterfs ref="Home - Ryan"/>
>>                         <clusterfs ref="Home - Luca"/>
>>                         <clusterfs ref="Home - Justin"/>
>>                         <clusterfs ref="Project - JM Hexane"/>
>>                         <clusterfs ref="Project - JM LJ"/>
>>                         <clusterfs ref="Project - JM sidechain_pmf"/>
>>                         <clusterfs ref="Project - JM CG"/>
>>                         <clusterfs ref="Project - JM CISS3"/>
>>                         <clusterfs ref="Project - JM OPLS-sidechain"/>
>>                         <clusterfs ref="Project - JM arg_pull"/>
>>                         <clusterfs ref="Project - JM halo"/>
>>                         <clusterfs ref="Project - JM old_bison"/>
>>                         <clusterfs ref="Project - JM CISS2"/>
>>                 </service>
>>         </rm>
>> </cluster>
>>
>> I'm wondering whether I set up the NFS exports in the "correct" fashion or
>> not... It was working this way just fine until I started adding a lot of
>> GFS volumes and NFS exports for each one.
>>
>> Any clues?
>>
>> --
>> Ryan
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>

