[Linux-cluster] RHCS TestCluster with ScientificLinux 5.2
Rainer Schwierz
R.Schwierz at physik.tu-dresden.de
Thu Sep 17 05:30:09 UTC 2009
Hello,
Hmm, in the meantime the fence_apc problem has been fixed by a more
recent version of fence_apc.
But the NFS lock problem is still open. Does that mean I definitely
should not use Scientific Linux and should switch to Fedora 11 or RHEL 5.4?
Cheers, Rainer
Rainer Schwierz wrote:
> Hello experts,
>
> In preparation for a new production system I have set up a test system
> with RHCS under Scientific Linux 5.2.
> It consists of two identical FSC/RX200 nodes, a Brocade Fibre Channel
> switch, an FSC/SX80 Fibre Channel RAID array, and an APC power switch.
> The configuration is attached at the end.
> I want to have three (GFS) filesystems that are
> - exported via NFS to a number of clients, each service with its own IP
> - backed up via TSM to a TSM server
>
> I see some problems for which I need an explanation/solution:
> 1) If I connect the NFS clients to the IP of the configured NFS service
>    started e.g. on tnode02, the filesystem is mounted, but I see a
>    strange lock problem:
>       tnode02 kernel: portmap: server "client-IP" not responding, timed out
>       tnode02 kernel: lockd: server "client-IP" not responding, timed out
>    It goes away if I bind the NFS clients directly to the IP of the
>    node tnode02. If I start the services on tnode01, it is exactly the
>    same problem, solved by binding the clients directly to tnode01. It
>    does not depend on the firewall configuration; it is the same if I
>    switch iptables off on both tnode0[12] and the clients.
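>    As a sketch of how one could narrow this down from a client
>    (addresses are the ones from my config below; tnode02 stands for
>    the node's own address):
>
>       # is the portmapper reachable via the service IP?
>       rpcinfo -p 111.22.33.32
>       # watch the portmap/NLM traffic for both addresses; my
>       # suspicion (unverified) is that lockd answers from the node's
>       # own address instead of the service IP, so the client discards
>       # the replies
>       tcpdump -n -i eth0 host 111.22.33.32 or host tnode02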
>
> 2) tnode02 died with a kernel panic; no really helpful logs were found
>    regarding the panic, I only see a lot of messages about problems
>    with NFS locking over GFS:
>
>       kernel: lockd: grant for unknown block
>       kernel: dlm: dlm_plock_callback: lock granted after lock request failed
>
>    before the kernel panicked. But is this a real reason to panic?
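>    (Next time I will try to catch the panic itself; the standard RHEL5
>    kdump setup should do, something like
>
>       # reserve memory for the crash kernel on the kernel line
>       # in /boot/grub/grub.conf:
>       #   kernel ... crashkernel=128M@16M
>       chkconfig kdump on
>       service kdump start
>
>    but so far this is only a plan, not something I have running.)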
>
> At this point tnode01 tried to take over the cluster and to fence
> tnode02, which gave an error I do not understand, because fence_apc
> run by hand (On, Off, Status) works properly; see the commands after
> the log excerpt below.
>
> tnode01 fenced[3127]: fencing node "tnode02.phy.tu-dresden.de"
> tnode01 fenced[3127]: agent "fence_apc" reports:
>   Traceback (most recent call last):
>     File "/sbin/fence_apc", line 829, in ?  main()
>     File "/sbin/fence_apc", line 349, in main  do_power_off(sock)
>     File "/sbin/fence_apc", line 813, in do_power_off  x = do_power_switch(sock, "off")
>     File "/sbin/fence_apc", line 611, in do_power_switch  result_code, response = power_off(txt + ndbuf)
>     File "/sbin/fence_apc", line 817, in power_off  x = power_switch(buffer, False, "2", "3");
>     File "/sbin/fence_apc", line 810, in power_switch  raise "unknown screen encountered in \n" + str(lines) + "\n"
>   unknown screen encountered in
>   ['', '> 2', '', '',
>    '------- Configure Outlet ------------------------------------------------------',
>    '', ' #  State Ph Name      Pwr On Dly Pwr Off Dly Reboot Dur.',
>    ' ----------------------------------------------------------------------------',
>    ' 2  ON    1  Outlet 2  0 sec      0 sec       5 sec', '',
>    ' 1- Outlet Name         : Outlet 2',
>    ' 2- Power On Delay(sec) : 0',
>    ' 3- Power Off Delay(sec): 0',
>    ' 4- Reboot Duration(sec): 5',
>    ' 5- Accept Changes      : ', '',
>    ' ?- Help, <ESC>- Back, <ENTER>- Refresh, <CTRL-L>- Event Log']
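> By hand the agent works, e.g. with something like this (IP, login and
> outlet number as in the cluster.conf below, password shortened):
>
>       fence_apc -a 192.168.0.10 -l xxx -p yy-xxxx -n 2 -o status
>       fence_apc -a 192.168.0.10 -l xxx -p yy-xxxx -n 2 -o off
>       fence_apc -a 192.168.0.10 -l xxx -p yy-xxxx -n 2 -o on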
>
> So tnode01 did not stop trying to fence tnode02 and was therefore not
> able to take over the cluster services. Via system-config-cluster one
> was also not able to stop any service. Stopping processes by hand did
> not really help. The only solution at this point was to power down
> both nodes and restart the cluster.
>
> So my questions are:
>
> Is there a solution for the locking problem if one binds the NFS
> clients to the configured NFS service IP?
>
> Is there an explanation or solution for the NFS (DLM) locking problem
> over GFS?
>
> Is there a significant update to fence_apc that I have missed?
>
> Why do I have to configure the GFS resources with the "force umount"
> option? I was under the impression that one can mount GFS filesystems
> simultaneously on a number of nodes. If I define the GFS resources
> without "force umount", the filesystem is not mounted at all. But the
> defined TSM service depends on all filesystems being mounted.
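> (That impression comes from GFS being a cluster filesystem; I would
> expect a plain parallel mount to work, i.e. something like
>
>       # on tnode01 and tnode02 at the same time, device and
>       # mountpoint as in the resource definitions below
>       mount -t gfs /dev/VG1/LV00 /global_home
>
> but I have not understood how force_unmount interacts with that.)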
>
> Thanks for any help, Rainer
>
> The configuration is
> Scientific Linux SL release 5.2 (Boron)
> kernel 2.6.18-128.4.1.el5 #1 SMP Tue Aug 4 12:51:10 EDT 2009 x86_64
> x86_64 x86_64 GNU/Linux
> device-mapper-multipath-0.4.7-23.el5_3.2.x86_64
> rgmanager-2.0.38-2.el5_2.1.x86_64
> system-config-cluster-1.0.52-1.1.noarch
> cman-2.0.84-2.el5.x86_64
> kmod-gfs-0.1.23-5.el5_2.4.x86_64
> gfs2-utils-0.1.44-1.el5.x86_64
> gfs-utils-0.1.17-1.el5.x86_64
> lvm2-cluster-2.02.32-4.el5.x86_64
> modcluster-0.12.0-7.el5.x86_64
> ricci-0.12.0-7.el5.x86_64
> openais-0.80.3-15.el5.x86_64
>
> cluster.conf
> <?xml version="1.0"?>
> <cluster alias="tstw_HA2" config_version="115" name="tstw_HA2">
> <fence_daemon clean_start="0" post_fail_delay="0"
> post_join_delay="3"/>
> <clusternodes>
> <clusternode name="tnode02.tst.tu-dresden.de" nodeid="1"
> votes="1">
> <fence>
> <method name="1">
> <device name="HA_APC" port="2"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="tnode01.tst.tu-dresden.de" nodeid="2"
> votes="1">
> <fence>
> <method name="1">
> <device name="HA_APC" port="1"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <fencedevices>
> <fencedevice agent="fence_apc" ipaddr="192.168.0.10"
> login="xxx" name="HA_APC" passwd="yy-xxxx"/>
> </fencedevices>
> <rm>
> <failoverdomains>
> <failoverdomain name="HA_new_failover"
> ordered="1" restricted="1">
> <failoverdomainnode
> name="tnode01.tst.tu-dresden.de" priority="1"/>
> <failoverdomainnode
> name="tnode02.tst.tu-dresden.de" priority="2"/>
> </failoverdomain>
> </failoverdomains>
> <resources>
> <clusterfs device="/dev/VG1/LV00"
> force_unmount="1" fsid="53422" fstype="gfs" mountpoint="/global_home"
> name="home_GFS" options=""/>
> <nfsexport name="home_nfsexport"/>
> <nfsclient name="tstw_home"
> options="rw,root_squash" path="/global_home"
> target="tstw*.tst.tu-dresden.de"/>
> <ip address="111.22.33.32" monitor_link="1"/>
> <ip address="192.168.20.30" monitor_link="1"/>
> <nfsclient name="fast_nfs_home_clients"
> options="rw,root_squash" path="/global_home" target="192.168.20.0/24"/>
> <nfsexport name="cluster_nfsexport"/>
> <nfsclient name="tstw_cluster"
> options="no_root_squash,ro" path="/global_cluster"
> target="tstw*.tst.tu-dresden.de"/>
> <nfsclient name="fast_nfs_cluster_clients"
> options="no_root_squash,ro" path="/global_cluster"
> target="192.168.20.0/24"/>
> <script file="/etc/rc.d/init.d/tsm"
> name="TSM_backup"/>
> <clusterfs device="/dev/VG1/LV10"
> force_unmount="1" fsid="192" fstype="gfs" mountpoint="/global_cluster"
> name="cluster_GFS" options=""/>
> <clusterfs device="/dev/VG1/LV20"
> force_unmount="1" fsid="63016" fstype="gfs" mountpoint="/global_soft"
> name="software_GFS" options=""/>
> <nfsexport name="soft_nfsexport"/>
> <nfsclient name="tstw_soft"
> options="rw,root_squash" path="/global_soft"
> target="tstw*.tst.tu-dresden.de"/>
> <nfsclient name="fast_nfs_soft_clients"
> options="rw,root_squash" path="/global_soft" target="192.168.20.0/24"/>
> <nfsclient name="tsts_home"
> options="no_root_squash,rw" path="/global_home"
> target="tsts0*.tst.tu-dresden.de"/>
> <nfsclient name="tsts_cluster"
> options="rw,root_squash" path="/global_cluster"
> target="tsts0*.tst.tu-dresden.de"/>
> <nfsclient name="tsts_soft"
> options="rw,root_squash" path="/global_soft"
> target="tsts0*.tst.tu-dresden.de"/>
> <nfsclient name="tstf_home"
> options="rw,root_squash" path="/global_home"
> target="tstf*.tst.tu-dresden.de"/>
> <nfsclient name="tstf_cluster"
> options="rw,root_squash" path="/global_cluster"
> target="tstf*.tst.tu-dresden.de"/>
> <nfsclient name="tstf_soft"
> options="rw,root_squash" path="/global_soft"
> target="tstf*.tst.tu-dresden.de"/>
> <ip address="111.22.33.31" monitor_link="1"/>
> <ip address="111.22.33.30" monitor_link="1"/>
> <ip address="192.168.20.31" monitor_link="1"/>
> <ip address="192.168.20.32" monitor_link="1"/>
> <clusterfs device="/dev/VG1/LV20"
> force_unmount="0" fsid="11728" fstype="gfs" mountpoint="/global_soft"
> name="Software_GFS" options=""/>
> <clusterfs device="/dev/VG1/LV10"
> force_unmount="0" fsid="36631" fstype="gfs" mountpoint="/global_cluster"
> name="Cluster_GFS" options=""/>
> <clusterfs device="/dev/VG1/LV00"
> force_unmount="0" fsid="45816" fstype="gfs" mountpoint="/global_home"
> name="Home_GFS" options=""/>
> </resources>
> <service autostart="1" domain="HA_new_failover"
> name="service_nfs_home">
> <nfsexport ref="home_nfsexport"/>
> <nfsclient ref="tstw_home"/>
> <ip ref="111.22.33.32"/>
> <nfsclient ref="tsts_home"/>
> <nfsclient ref="tstf_home"/>
> <clusterfs ref="home_GFS"/>
> </service>
> <service autostart="1" domain="HA_new_failover"
> name="service_nfs_home_fast">
> <nfsexport ref="home_nfsexport"/>
> <nfsclient ref="fast_nfs_home_clients"/>
> <ip ref="192.168.20.32"/>
> <clusterfs ref="Home_GFS"/>
> </service>
> <service autostart="1" domain="HA_new_failover"
> name="service_nfs_cluster">
> <nfsexport ref="cluster_nfsexport"/>
> <nfsclient ref="tstw_cluster"/>
> <nfsclient ref="tsts_cluster"/>
> <nfsclient ref="tstf_cluster"/>
> <ip ref="111.22.33.30"/>
> <clusterfs ref="cluster_GFS"/>
> </service>
> <service autostart="1" name="service_nfs_cluster_fast">
> <nfsexport ref="cluster_nfsexport"/>
> <ip ref="192.168.20.30"/>
> <nfsclient ref="fast_nfs_cluster_clients"/>
> <clusterfs ref="Cluster_GFS"/>
> </service>
> <service autostart="1" domain="HA_new_failover"
> name="service_TSM">
> <ip ref="111.22.33.31"/>
> <script ref="TSM_backup"/>
> <clusterfs ref="Software_GFS"/>
> <clusterfs ref="Cluster_GFS"/>
> <clusterfs ref="Home_GFS"/>
> </service>
> <service autostart="1" domain="HA_new_failover"
> name="service_nfs_soft">
> <nfsexport ref="soft_nfsexport"/>
> <nfsclient ref="tstw_soft"/>
> <nfsclient ref="tsts_soft"/>
> <nfsclient ref="tstf_soft"/>
> <ip ref="111.22.33.31"/>
> <clusterfs ref="software_GFS"/>
> </service>
> <service autostart="1" domain="HA_new_failover"
> name="service_nfs_soft_fast">
> <nfsexport ref="soft_nfsexport"/>
> <nfsclient ref="fast_nfs_soft_clients"/>
> <ip ref="192.168.20.31"/>
> <clusterfs ref="Software_GFS"/>
> </service>
> </rm>
> </cluster>
>
--
| R.Schwierz at physik.tu-dresden.de |
| Rainer Schwierz, Inst. f. Kern- und Teilchenphysik |
| TU Dresden, D-01062 Dresden |
| Tel. ++49 351 463 32957 FAX ++49 351 463 37292 |
| http://iktp.tu-dresden.de/~schwierz/ |