From claudio.tassini at gmail.com Sat Sep 1 00:57:48 2007 From: claudio.tassini at gmail.com (Claudio Tassini) Date: Sat, 1 Sep 2007 02:57:48 +0200 Subject: [Linux-cluster] Multipathed quorum disk Message-ID: <39fdf1c70708311757h75a57fc3r15b740ed8ad0f58b@mail.gmail.com> Hi, I recently upgraded a 2-node cluster by adding two more nodes. I would like a single node to remain in the cluster even if the other three are out of service, so I'm trying to add a quorum disk to the cluster. The problem is that the quorum disk is a LUN in shared storage which does not have the same device name on all the cluster nodes. Moreover, we use device-mapper AND lvm. I could resolve the problem by using an lvm logical volume, because it would always have the same name and would find the underlying "dm" or "sd" device even if its name changes across a reboot, but I've read that it's not advisable to use a logical volume as a quorum device. Any idea? -- Claudio Tassini -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ianbrn at gmail.com Sat Sep 1 10:50:27 2007 From: ianbrn at gmail.com (Ian Brown) Date: Sat, 1 Sep 2007 13:50:27 +0300 Subject: [Linux-cluster] GFS and GFS2 : two questions: which is bettter; gfs_controld error Message-ID: - Hello, I installed RHEL5 on two x86_64 machines on the same LAN; afterwards I installed the RHEL5 cluster suite package (cman-2.0.60-1.el5) and openais-0.80.2-1.el5. I also installed kmod-gfs-0.1.16-5.2.6.18_8.el5, gfs-utils and gfs2-utils. I created a 2-node cluster and started the cman service OK on both nodes. Now I tried to create a gfs partition with gfs_mkfs (with -p lock_dlm) and mount it, and I got errors when trying to mount it (these errors mention gfs_controld). I made a second try with mkfs.gfs2 (also with -p lock_dlm); this time I **could** mount the gfs2 partition successfully. My questions are: - should I be able to create and mount a gfs partition with this installation? If this is possible, what could be my mistake? - is gfs2 considered safe to work with, or is it still experimental and not recommended? Which features do I have in GFS2 that I don't have in GFS? Regards, Ian
From wcheng at redhat.com Sat Sep 1 17:38:19 2007 From: wcheng at redhat.com (Wendy Cheng) Date: Sat, 01 Sep 2007 13:38:19 -0400 Subject: [Linux-cluster] GFS and GFS2 : two questions: which is bettter; gfs_controld error In-Reply-To: References: Message-ID: <46D9A38B.50304@redhat.com> Ian Brown wrote: > - Hello, > I had installed RHEL5 on two x86_64 machine on the same LAN; afterwards I > had installed the RHEL5 cluster suite packege (cman-2.0.60-1.el5) and > openais-0.80.2-1.el5. > > > I had also installed kmod-gfs-0.1.16-5.2.6.18_8.el5 and gfs-utils >and gfs2-utils. > > I had crated a 2-node cluster and started the cman service OK on both nodes. > > Now I tried to create a gfs partition with gfs_mkfs (with -p lock_dlm) > and mount it, and I got errors when trying to mount it (this errors >talk about > gfs_controld). > > You didn't include the error message here? This could be a known issue where the gfs kernel module is not loaded by default (due to an RPM dependency problem). To check it out, before mounting the gfs partition ... 1) shell> lsmod This is to check whether the gfs (not gfs2) kernel module is loaded. If yes, mount the gfs partition, then read the /var/log/messages file and cut-and-paste the print-out (a.k.a. the gfs_controld error messages) and repost here. 
2) shell> cd /lib/modules/'your kernel version'/ extra/gfs Check if gfs.ko is there. If not, you have installation problems. 3) shell> insmod gfs.ko This is to manually load gfs kernel module 4) Retry the mount. If still failing, send us the /var/log/messages file. > I made a second try with mkfs.gfs2 (also with -p lock_dlm) ); > this time I **could** mounted the gfs2 partition succesfully. > > GFS2 is part of the base kernel, so it doesn't need to worry about RPM dependency. > My questions are: > > - should I be able with this installation to create and mount a gfs > partition ? in case this is possible - what can be my mistale ? > > See above. > - is gfs2 considered safe to work with ? or is it still experimental and > not recommended ? which features do I have in GFS2 which I don't have in > GFS? > > > The advantage of GFS2 are (my personal opinion - not necessarily Red Hat's) : 1. It is mainstream and will be well maintained and updated; vs. GFS starts to enter maintanence mode. We're hoping to phase out GFS as soon as GFS2 is proved to be stable. 2. It preforms better (faster), particularly for smaller file size, but not as stable as GFS. However, there are tools to facilitate people to migrate from GFS to GFS2. So if you want stability, GFS is not a bad choice at this moment. -- Wendy From wcheng at redhat.com Sat Sep 1 17:42:28 2007 From: wcheng at redhat.com (Wendy Cheng) Date: Sat, 01 Sep 2007 13:42:28 -0400 Subject: [Linux-cluster] GFS and GFS2 : two questions: which is bettter; gfs_controld error In-Reply-To: <46D9A38B.50304@redhat.com> References: <46D9A38B.50304@redhat.com> Message-ID: <46D9A484.5050908@redhat.com> > 2) shell> cd /lib/modules/'your kernel version'/ extra/gfs Hope you catch I have a (wrong) white space before the "extra" in above sentence. > 3) shell> insmod gfs.ko > This is to manually load gfs kernel module should be "insmod ./gfs.ko" -- Wendy From kadlec at sunserv.kfki.hu Sat Sep 1 19:42:33 2007 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsi) Date: Sat, 1 Sep 2007 21:42:33 +0200 (MEST) Subject: [Linux-cluster] quorum lost in spite of 'leave remove' In-Reply-To: References: <46D828EE.5070103@redhat.com> Message-ID: On Fri, 31 Aug 2007, Kadlecsik Jozsi wrote: > On Fri, 31 Aug 2007, Kadlecsik Jozsi wrote: > > > > > '/etc/init.d/cman stop' was issued and executed successfully on the tree > > > > other nodes. > > > > > > It looks like a bug to me. Congratulations - you're the first person to spot that! I collected some debug logs on a node when at another one the command 'cman stop' was issued and attached it. I hope it helps. Best regards, Jozsef -- E-mail : kadlec at sunserv.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary -------------- next part -------------- waiting for aisexec to start waiting for aisexec to start [MAIN ] AIS Executive Service RELEASE 'subrev 1358 version 0.80.3' [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors. [MAIN ] Copyright (C) 2006 Red Hat, Inc. [MAIN ] AIS Executive Service: started and ready to provide service. [MAIN ] Using override node name lxserv0-gfs [MAIN ] openais component openais_cpg loaded. [MAIN ] Registering service handler 'openais cluster closed process group service v1.01' [MAIN ] openais component openais_cfg loaded. [MAIN ] Registering service handler 'openais configuration service' [MAIN ] openais component openais_msg loaded. 
[MAIN ] Registering service handler 'openais message service B.01.01' [MAIN ] openais component openais_lck loaded. [MAIN ] Registering service handler 'openais distributed locking service B.01.01' [MAIN ] openais component openais_evt loaded. [MAIN ] Registering service handler 'openais event service B.01.01' [MAIN ] openais component openais_ckpt loaded. [MAIN ] Registering service handler 'openais checkpoint service B.01.01' [MAIN ] openais component openais_amf loaded. [MAIN ] Registering service handler 'openais availability management framework B.01.01' [MAIN ] openais component openais_clm loaded. [MAIN ] Registering service handler 'openais cluster membership service B.01.01' [MAIN ] openais component openais_evs loaded. [MAIN ] Registering service handler 'openais extended virtual synchrony service' [MAIN ] openais component openais_cman loaded. [MAIN ] Registering service handler 'openais CMAN membership service 2.01' [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms) [TOTEM] token hold (386 ms) retransmits before loss (20 retrans) [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms) [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs) [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500 [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages) [TOTEM] send threads (0 threads) [TOTEM] RRP token expired timeout (495 ms) [TOTEM] RRP token problem counter (2000 ms) [TOTEM] RRP threshold (10 problem count) [TOTEM] RRP mode set to none. [TOTEM] heartbeat_failures_allowed (0) [TOTEM] max_network_delay (50 ms) [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0 [TOTEM] Receive multicast socket recv buffer size (288000 bytes). [TOTEM] Transmit multicast socket send buffer size (262142 bytes). [TOTEM] The network interface [192.168.192.15] is now up. [TOTEM] Created or loaded sequence id 944.192.168.192.15 for this ring. [TOTEM] entering GATHER state from 15. [SERV ] Initialising service handler 'openais extended virtual synchrony service' [SERV ] Initialising service handler 'openais cluster membership service B.01.01' [SERV ] Initialising service handler 'openais availability management framework B.01.01' [SERV ] Initialising service handler 'openais checkpoint service B.01.01' [SERV ] Initialising service handler 'openais event service B.01.01' [SERV ] Initialising service handler 'openais distributed locking service B.01.01' [SERV ] Initialising service handler 'openais message service B.01.01' [SERV ] Initialising service handler 'openais configuration service' [SERV ] Initialising service handler 'openais cluster closed process group service v1.01' [SERV ] Initialising service handler 'openais CMAN membership service 2.01' [CMAN ] CMAN 2.01.00 (built Aug 23 2007 12:19:58) started [SYNC ] Not using a virtual synchrony filter. [TOTEM] Creating commit token because I am the rep. [TOTEM] Saving state aru 0 high seq received 0 [TOTEM] entering COMMIT state. [TOTEM] entering RECOVERY state. [TOTEM] position [0] member 192.168.192.15: [TOTEM] previous ring seq 944 rep 192.168.192.15 [TOTEM] aru 0 high delivered 0 received flag 0 [TOTEM] Did not need to originate any messages in recovery. [TOTEM] Storing new sequence id for ring 3b4 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is b [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is b [logging.c:0090] daemon: Returning command data. 
length = 0 [logging.c:0090] daemon: sending reply 4000000b to fd 10 [TOTEM] Sending initial ORF token [logging.c:0090] daemon: read 0 bytes from fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 7 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 7 [logging.c:0090] memb: command return code is -2 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000007 to fd 10 [CLM ] CLM CONFIGURATION CHANGE [CLM ] New Configuration: [CLM ] Members Left: [CLM ] Members Joined: [logging.c:0090] ais: confchg_fn called type = 1, seq=948 [SYNC ] This node is within the primary component and will provide service. [CLM ] CLM CONFIGURATION CHANGE [CLM ] New Configuration: [CLM ] r(0) ip(192.168.192.15) [CLM ] Members Left: [CLM ] Members Joined: [CLM ] r(0) ip(192.168.192.15) [logging.c:0090] ais: confchg_fn called type = 0, seq=948 [logging.c:0090] ais: last memb_count = 0, current = 1 [logging.c:0090] memb: sending TRANSITION message. cluster_name = kfki [logging.c:0090] ais: comms send message 0xbfadf2dc len = 65 [logging.c:0090] daemon: sending reply 103 to fd 10 [SYNC ] This node is within the primary component and will provide service. [TOTEM] entering OPERATIONAL state. [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 1, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 1 [logging.c:0090] memb: add_ais_node ID=1, incarnation = 948 [CLM ] got nodejoin message 192.168.192.15 [TOTEM] entering GATHER state from 11. [TOTEM] Saving state aru 9 high seq received 9 [TOTEM] entering COMMIT state. [TOTEM] entering RECOVERY state. [TOTEM] position [0] member 192.168.192.6: [TOTEM] previous ring seq 952 rep 192.168.192.6 [TOTEM] aru 2d high delivered 2d received flag 0 [TOTEM] position [1] member 192.168.192.7: [TOTEM] previous ring seq 952 rep 192.168.192.6 [TOTEM] aru 2d high delivered 2d received flag 0 [TOTEM] position [2] member 192.168.192.15: [TOTEM] previous ring seq 948 rep 192.168.192.15 [TOTEM] aru 9 high delivered 9 received flag 0 [TOTEM] position [3] member 192.168.192.17: [TOTEM] previous ring seq 952 rep 192.168.192.6 [TOTEM] aru 2d high delivered 2d received flag 0 [TOTEM] position [4] member 192.168.192.18: [TOTEM] previous ring seq 952 rep 192.168.192.6 [TOTEM] aru 2d high delivered 2d received flag 0 [TOTEM] Did not need to originate any messages in recovery. [TOTEM] Storing new sequence id for ring 3bc [CLM ] CLM CONFIGURATION CHANGE [CLM ] New Configuration: [CLM ] r(0) ip(192.168.192.15) [CLM ] Members Left: [CLM ] Members Joined: [logging.c:0090] ais: confchg_fn called type = 1, seq=956 [SYNC ] This node is within the primary component and will provide service. [CLM ] CLM CONFIGURATION CHANGE [CLM ] New Configuration: [CLM ] r(0) ip(192.168.192.6) [CLM ] r(0) ip(192.168.192.7) [CLM ] r(0) ip(192.168.192.15) [CLM ] r(0) ip(192.168.192.17) [CLM ] r(0) ip(192.168.192.18) [CLM ] Members Left: [CLM ] Members Joined: [CLM ] r(0) ip(192.168.192.6) [CLM ] r(0) ip(192.168.192.7) [CLM ] r(0) ip(192.168.192.17) [CLM ] r(0) ip(192.168.192.18) [logging.c:0090] ais: confchg_fn called type = 0, seq=956 [logging.c:0090] ais: last memb_count = 1, current = 5 [logging.c:0090] memb: sending TRANSITION message. 
cluster_name = kfki [logging.c:0090] ais: comms send message 0xbfadf2dc len = 65 [logging.c:0090] daemon: sending reply 103 to fd 10 [SYNC ] This node is within the primary component and will provide service. [TOTEM] entering OPERATIONAL state. [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 2, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 2 [logging.c:0090] memb: add_ais_node ID=2, incarnation = 956 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 1, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 1 [logging.c:0090] memb: add_ais_node ID=1, incarnation = 956 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 3, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 3 [logging.c:0090] memb: add_ais_node ID=3, incarnation = 956 [CMAN ] quorum regained, resuming activity [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 5, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 5 [logging.c:0090] memb: add_ais_node ID=5, incarnation = 956 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 4, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 4 [logging.c:0090] memb: add_ais_node ID=4, incarnation = 956 [CLM ] got nodejoin message 192.168.192.6 [CLM ] got nodejoin message 192.168.192.7 [CLM ] got nodejoin message 192.168.192.15 [CLM ] got nodejoin message 192.168.192.17 [CLM ] got nodejoin message 192.168.192.18 [CPG ] got joinlist message from node 2 [CPG ] got joinlist message from node 3 [CPG ] got joinlist message from node 5 [CPG ] got joinlist message from node 4 [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is 1 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 1 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000001 to fd 11 [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is 5 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 5 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000005 to fd 11 [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is 7 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 7 [logging.c:0090] memb: get_all_members: allocated new buffer (retsize=1024) [logging.c:0090] memb: get_all_members: retlen = 2120 [logging.c:0090] memb: command return code is 5 [logging.c:0090] daemon: Returning command data. 
length = 2120 [logging.c:0090] daemon: sending reply 40000007 to fd 11 [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is 7 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 7 [logging.c:0090] memb: get_all_members: allocated new buffer (retsize=1024) [logging.c:0090] memb: get_all_members: retlen = 2120 [logging.c:0090] memb: command return code is 5 [logging.c:0090] daemon: Returning command data. length = 2120 [logging.c:0090] daemon: sending reply 40000007 to fd 11 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 7 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 7 [logging.c:0090] memb: get_all_members: allocated new buffer (retsize=1024) [logging.c:0090] memb: get_all_members: retlen = 2120 [logging.c:0090] memb: command return code is 5 [logging.c:0090] daemon: Returning command data. length = 2120 [logging.c:0090] daemon: sending reply 40000007 to fd 10 [logging.c:0090] daemon: read 0 bytes from fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 91 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 91 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 24 [logging.c:0090] daemon: sending reply 40000091 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 9 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 9 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 16 [logging.c:0090] daemon: sending reply 40000009 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 92 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 92 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 320 [logging.c:0090] daemon: sending reply 40000092 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 5 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 5 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000005 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is d [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is d [logging.c:0090] memb: command return code is 2 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 4000000d to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 90 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 90 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 424 [logging.c:0090] daemon: sending reply 40000090 to fd 10 [logging.c:0090] daemon: read 0 bytes from fd 10 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 20, source nodeid = 2, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 7 (len = 4) [logging.c:0090] memb: got LEAVE from node 2, reason = 3 [TOTEM] The token was lost in the OPERATIONAL state. 
[TOTEM] Receive multicast socket recv buffer size (288000 bytes). [TOTEM] Transmit multicast socket send buffer size (262142 bytes). [TOTEM] entering GATHER state from 2. [TOTEM] entering GATHER state from 11. [TOTEM] Saving state aru 43 high seq received 43 [TOTEM] entering COMMIT state. [TOTEM] entering RECOVERY state. [TOTEM] position [0] member 192.168.192.6: [TOTEM] previous ring seq 956 rep 192.168.192.6 [TOTEM] aru 43 high delivered 43 received flag 0 [TOTEM] position [1] member 192.168.192.15: [TOTEM] previous ring seq 956 rep 192.168.192.6 [TOTEM] aru 43 high delivered 43 received flag 0 [TOTEM] position [2] member 192.168.192.17: [TOTEM] previous ring seq 956 rep 192.168.192.6 [TOTEM] aru 43 high delivered 43 received flag 0 [TOTEM] position [3] member 192.168.192.18: [TOTEM] previous ring seq 956 rep 192.168.192.6 [TOTEM] aru 43 high delivered 43 received flag 0 [TOTEM] Did not need to originate any messages in recovery. [TOTEM] Storing new sequence id for ring 3c4 [CLM ] CLM CONFIGURATION CHANGE [CLM ] New Configuration: [CLM ] r(0) ip(192.168.192.6) [CLM ] r(0) ip(192.168.192.15) [CLM ] r(0) ip(192.168.192.17) [CLM ] r(0) ip(192.168.192.18) [CLM ] Members Left: [CLM ] r(0) ip(192.168.192.7) [CLM ] Members Joined: [logging.c:0090] ais: confchg_fn called type = 1, seq=964 [logging.c:0090] memb: del_ais_node 2 [logging.c:0090] daemon: sending reply 102 to fd 11 [SYNC ] This node is within the primary component and will provide service. [CLM ] CLM CONFIGURATION CHANGE [CLM ] New Configuration: [CLM ] r(0) ip(192.168.192.6) [CLM ] r(0) ip(192.168.192.15) [CLM ] r(0) ip(192.168.192.17) [CLM ] r(0) ip(192.168.192.18) [CLM ] Members Left: [CLM ] Members Joined: [logging.c:0090] ais: confchg_fn called type = 0, seq=964 [logging.c:0090] ais: last memb_count = 5, current = 4 [logging.c:0090] memb: sending TRANSITION message. cluster_name = kfki [logging.c:0090] ais: comms send message 0xbfadf2dc len = 65 [SYNC ] This node is within the primary component and will provide service. [TOTEM] entering OPERATIONAL state. [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is 5 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 5 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000005 to fd 11 [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is 7 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 7 [logging.c:0090] memb: get_all_members: allocated new buffer (retsize=1024) [logging.c:0090] memb: get_all_members: retlen = 2120 [logging.c:0090] memb: command return code is 5 [logging.c:0090] daemon: Returning command data. length = 2120 [logging.c:0090] daemon: sending reply 40000007 to fd 11 [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is 7 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 7 [logging.c:0090] memb: get_all_members: allocated new buffer (retsize=1024) [logging.c:0090] memb: get_all_members: retlen = 2120 [logging.c:0090] memb: command return code is 5 [logging.c:0090] daemon: Returning command data. 
length = 2120 [logging.c:0090] daemon: sending reply 40000007 to fd 11 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 1, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 1 [logging.c:0090] memb: add_ais_node ID=1, incarnation = 964 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 3, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 3 [logging.c:0090] memb: add_ais_node ID=3, incarnation = 964 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 5, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 5 [logging.c:0090] memb: add_ais_node ID=5, incarnation = 964 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 81, source nodeid = 4, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 5 (len = 65) [logging.c:0090] memb: got TRANSITION from node 4 [logging.c:0090] memb: add_ais_node ID=4, incarnation = 964 [CLM ] got nodejoin message 192.168.192.6 [CLM ] got nodejoin message 192.168.192.15 [CLM ] got nodejoin message 192.168.192.17 [CLM ] got nodejoin message 192.168.192.18 [CPG ] got joinlist message from node 3 [CPG ] got joinlist message from node 5 [CPG ] got joinlist message from node 4 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 91 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 91 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 24 [logging.c:0090] daemon: sending reply 40000091 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 9 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 9 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 16 [logging.c:0090] daemon: sending reply 40000009 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 92 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 92 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 320 [logging.c:0090] daemon: sending reply 40000092 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 5 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 5 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000005 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is d [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is d [logging.c:0090] memb: command return code is 2 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 4000000d to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 90 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 90 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. 
length = 424 [logging.c:0090] daemon: sending reply 40000090 to fd 10 [logging.c:0090] daemon: read 0 bytes from fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 91 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 91 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 24 [logging.c:0090] daemon: sending reply 40000091 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 9 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 9 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 16 [logging.c:0090] daemon: sending reply 40000009 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 92 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 92 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 320 [logging.c:0090] daemon: sending reply 40000092 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 5 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 5 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000005 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is d [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is d [logging.c:0090] memb: command return code is 2 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 4000000d to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 90 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 90 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 424 [logging.c:0090] daemon: sending reply 40000090 to fd 10 [logging.c:0090] daemon: read 0 bytes from fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 91 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 91 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 24 [logging.c:0090] daemon: sending reply 40000091 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 9 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 9 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 16 [logging.c:0090] daemon: sending reply 40000009 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 92 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 92 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. 
length = 320 [logging.c:0090] daemon: sending reply 40000092 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 5 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 5 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 40000005 to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is d [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is d [logging.c:0090] memb: command return code is 2 [logging.c:0090] daemon: Returning command data. length = 0 [logging.c:0090] daemon: sending reply 4000000d to fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 90 [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 90 [logging.c:0090] memb: command return code is 0 [logging.c:0090] daemon: Returning command data. length = 424 [logging.c:0090] daemon: sending reply 40000090 to fd 10 [logging.c:0090] daemon: read 0 bytes from fd 10 [logging.c:0090] daemon: read 20 bytes from fd 10 [logging.c:0090] daemon: client command is 800000bb [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is 800000bb [logging.c:0090] daemon: sending reply 102 to fd 11 [logging.c:0090] memb: command return code is -11 [logging.c:0090] daemon: read 20 bytes from fd 11 [logging.c:0090] daemon: client command is bc [logging.c:0090] daemon: About to process command [logging.c:0090] memb: command to process is bc [logging.c:0090] memb: Shutdown reply is 1 [logging.c:0090] memb: Sending LEAVE, reason 3 [logging.c:0090] ais: comms send message 0xbfae5bdc len = 4 [logging.c:0090] memb: shutdown decision is: 0 (yes=1, no=0) flags=2 [logging.c:0090] memb: command return code is -11 [logging.c:0090] ais: deliver_fn called, iov_len = 1, iov[0].len = 20, source nodeid = 1, conversion reqd=0 [logging.c:0090] memb: Message on port 0 is 7 (len = 4) [logging.c:0090] memb: got LEAVE from node 1, reason = 3 [logging.c:0090] daemon: send status return: 0 [logging.c:0090] daemon: sending reply c00000bb to fd 10 From carlopmart at gmail.com Sat Sep 1 20:00:03 2007 From: carlopmart at gmail.com (carlopmart) Date: Sat, 01 Sep 2007 22:00:03 +0200 Subject: [Linux-cluster] Re: fence_xvmd doesn't starts In-Reply-To: <46D7E431.2020100@gmail.com> References: <46D7E431.2020100@gmail.com> Message-ID: <46D9C4C3.3070009@gmail.com> carlopmart wrote: > Hi all, > > I am running standalone xen host using rhel5 with three rhel5 xen guest > with cluster-suite. I have setup fence_xvm as a fence device on all > three guest. On the host side I have setup fence_xvmd on cluster.conf file. > > My problems starts when I need to restart xen server host. Every time > that reboots, fence_xvmd doesn't starts. If I execute "service cman > restart" all its ok: fence_xvmd starts. Why?? How can I fix it?? > > Many thanks. > Please I need an answer about this ... -- CL Martinez carlopmart {at} gmail {d0t} com From Nick.Couchman at seakr.com Sat Sep 1 22:17:27 2007 From: Nick.Couchman at seakr.com (Nick Couchman) Date: Sat, 01 Sep 2007 16:17:27 -0600 Subject: [Linux-cluster] GFS and GFS2 : two questions: which is bettter; gfs_controld error Message-ID: <46D99096.87A6.0099.1@seakr.com> In my opinion, GFS2 is still not stable enough for production use. GFS2 is designed to be better than GFS, but still lacks some stability. 
GFS2 has better support for certain features (extended attributes, for example), and is supposed to perform better. You can start with a GFS filesystem, then use the gfs2_convert utility when GFS2 becomes stable to move to GFS2. --Nick >>> On 2007/09/01 at 04:50:27, "Ian Brown" wrote: - Hello, I had installed RHEL5 on two x86_64 machine on the same LAN; afterwards I had installed the RHEL5 cluster suite packege (cman-2.0.60-1.el5) and openais-0.80.2-1.el5. I had also installed kmod-gfs-0.1.16-5.2.6.18_8.el5 and gfs-utils and gfs2-utils. I had crated a 2-node cluster and started the cman service OK on both nodes. Now I tried to create a gfs partition with gfs_mkfs (with -p lock_dlm) and mount it, and I got errors when trying to mount it (this errors talk about gfs_controld). I made a second try with mkfs.gfs2 (also with -p lock_dlm) ); this time I **could** mounted the gfs2 partition succesfully. My questions are: - should I be able with this installation to create and mount a gfs partition ? in case this is possible - what can be my mistale ? - is gfs2 considered safe to work with ? or is it still experimental and not recommended ? which features do I have in GFS2 which I don't have in GFS? Regards, Ian -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianbrn at gmail.com Sun Sep 2 08:04:23 2007 From: ianbrn at gmail.com (Ian Brown) Date: Sun, 2 Sep 2007 11:04:23 +0300 Subject: [Linux-cluster] GFS and GFS2 : two questions: which is bettter; gfs_controld error In-Reply-To: <46D9A38B.50304@redhat.com> References: <46D9A38B.50304@redhat.com> Message-ID: Hello, I had ran "modprobe gfs" and I see by lsmod the the gfs module is loaded. also I had verified that under /lib/modules/MyKernelVersion/extra/gfs/ there is gfs.ko. Then I try: gfs_mkfs -p lock_dlm -t myCLuster -j 32 /dev/cciss/c0d1p2 mount /dev/cciss/c0d1p2 /mnt/gfs The errors I see in the console are: /sbin/mount.gfs: lock_dlm_join: gfs_controld join error: -22 /sbin/mount.gfs: error mounting lockproto lock_dlm The error I see in kernel log is: gfs_controld[32629]: mount: not in default fence domain I want to add that the cman service is started succesfully as the kernel log shows. I want also to add that "service cman start" performs modprbe of gfs2 module and not gfs module ! Namely, I ran rmmod gfs; then, after : service cman stop and rmmod lock_dlm rmmod gfs2 running lsmod | grep gfs2 shows that no gfs2 is loaded, and after "service cman start" I see by lsmod | grep gfs2 gfs2 522965 1 lock_dlm which means that starting the cman service performed modprobe/insmod of gfs2 and lock_dlm Is this how things should be? rgs, Ian On 9/1/07, Wendy Cheng wrote: > Ian Brown wrote: > > > - Hello, > > I had installed RHEL5 on two x86_64 machine on the same LAN; afterwards I > > had installed the RHEL5 cluster suite packege (cman-2.0.60-1.el5) and > > openais-0.80.2-1.el5. > > > > > > I had also installed kmod-gfs-0.1.16-5.2.6.18_8.el5 and gfs-utils > >and gfs2-utils. > > > > I had crated a 2-node cluster and started the cman service OK on both nodes. > > > > Now I tried to create a gfs partition with gfs_mkfs (with -p lock_dlm) > > and mount it, and I got errors when trying to mount it (this errors > >talk about > > gfs_controld). > > > > > You didn't include the error message here ? This could be a known issue > where gfs kernel module is not loaded by default (due to a RPM > dependency problem). To check it out: before mounting the gfs partition ... 
> > 1) shell> lsmod > This is to check whether gfs (not gfs2) kernel module is loaded. If yes, > mount the gfs partition, then read the /var/log/messages file and > cut-and-paste the print-out (a.k.a the gfs_controld error messages) and > repost here. > > 2) shell> cd /lib/modules/'your kernel version'/ extra/gfs > Check if gfs.ko is there. If not, you have installation problems. > > 3) shell> insmod gfs.ko > This is to manually load gfs kernel module > > 4) Retry the mount. If still failing, send us the /var/log/messages file. > > > I made a second try with mkfs.gfs2 (also with -p lock_dlm) ); > > this time I **could** mounted the gfs2 partition succesfully. > > > > > > GFS2 is part of the base kernel, so it doesn't need to worry about RPM > dependency. > > > My questions are: > > > > - should I be able with this installation to create and mount a gfs > > partition ? in case this is possible - what can be my mistale ? > > > > > > See above. > > > - is gfs2 considered safe to work with ? or is it still experimental and > > not recommended ? which features do I have in GFS2 which I don't have in > > GFS? > > > > > > > The advantage of GFS2 are (my personal opinion - not necessarily Red > Hat's) : > 1. It is mainstream and will be well maintained and updated; vs. GFS > starts to enter maintanence mode. We're hoping to phase out GFS as soon > as GFS2 is proved to be stable. > 2. It preforms better (faster), particularly for smaller file size, but > not as stable as GFS. > > However, there are tools to facilitate people to migrate from GFS to > GFS2. So if you want stability, GFS is not a bad choice at this moment. > > -- Wendy > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From isplist at logicore.net Sun Sep 2 10:44:04 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Sun, 2 Sep 2007 05:44:04 -0500 Subject: [Linux-cluster] Cluster won't come up when T1 is down??? Message-ID: <2007925444.953113@leena> Here's a very weird one. I have a cluster of web servers outgoing over a T1. When the T1 went down this morning, the cluster, which is all internal, non routable IP's, would not come back. All of the machines locked up around the loading DLM section on bootup. Once the T1 came back, they all booted just fine and went into cluster mode. What in the world would cause that? There aren't any external services required to fire up my local cluster, never were, it's always been fine before. Mike From kmoriwak at redhat.com Mon Sep 3 01:42:20 2007 From: kmoriwak at redhat.com (Kazuo Moriwaka) Date: Mon, 03 Sep 2007 10:42:20 +0900 Subject: [Linux-cluster] Discovering the world of clustering In-Reply-To: <200708312133.14035.mm@yuhu.biz> References: <200708312133.14035.mm@yuhu.biz> Message-ID: <1188783740.4413.141.camel@kmoriwak> Hi Claudio, I'm learning gfs 3 node cluster with xen. There are some references: 'Virtualization for Dummies' will be great help to use xen. http://intranet.corp.redhat.com/ic/intranet/RHEL5info I put some configuration files in svn, you can see them at: https://trac.nrt.redhat.com/trac/browser/tools/VMconfigs regards, 2007-08-31 (Fri) ? 21:33 +0300 ? Marian Marinov ????????: > You can always use Xen virtual machines which can easy migrate from machine to > machine. > > http://www.cl.cam.ac.uk/research/srg/netos/xen/ > > You can have the Xen virtual machines over a GFS cluster. 
> > Best regards > Marian Marinov > On Friday 31 August 2007 18:07:49 Augusto Lima wrote: > > Hi, i'm Augusto and i don't know much about clusters. > > I'm from brazil, so my english it's not quite good. > > I have an idea to test in my organization. > > > > We have two large DELL servers with 6GB RAM each and a Xeon 3GHz > > processor each. > > They have also lots of disk space. > > > > We want to cluster the 2 servers and run VMware Server on them, trying > > to utilize most of the processors and the available RAM all the time. > > We have plans to make 6 Virtual Machines running on top of them. > > We also want to take advantage of High Availbility on our > > configuration, meaning that if one servers goes down, the other have to > > hold the 6 VMs for a period of time. > > We can't afford any paid solution, since our organization does'nt > > support that kind of implementation. > > > > So, i'm wondering if anyone can give a opinion about if it is possible > > and how can i do it using only free solutions. > > > > Thanks in advance, > > > > Augusto > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From isplist at logicore.net Sat Sep 1 00:55:07 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Fri, 31 Aug 2007 19:55:07 -0500 Subject: [Linux-cluster] Cluster won't come up when T1 is down??? Message-ID: <200783119557.749480@leena> Here's a very weird one. I have a cluster of web servers outgoing over a T1. When the T1 went down this morning, the cluster, which is all internal, non routable IP's, would not come back. All of the machines locked up around the loading DLM section on bootup. Once the T1 came back, they all booted just fine and went into cluster mode. What in the world would cause that? There aren't any external services required to fire up my local cluster, never were, it's always been fine before. Mike From kmoriwak at redhat.com Mon Sep 3 01:58:04 2007 From: kmoriwak at redhat.com (Kazuo Moriwaka) Date: Mon, 03 Sep 2007 10:58:04 +0900 Subject: [Linux-cluster] Discovering the world of clustering In-Reply-To: <1188783740.4413.141.camel@kmoriwak> References: <200708312133.14035.mm@yuhu.biz> <1188783740.4413.141.camel@kmoriwak> Message-ID: <1188784684.4413.153.camel@kmoriwak> Hi, I'm very sorry for I mistaked that this list is red hat internal list.. Links which I sent are unavailable from outside from redhat.com. I'll show some tips when using xen from them. - use 'w!' attribute for shared block device ex. disk = [ 'tap:aio:/media/disk/VMImages/rhel5_1,xvda,w', 'tap:aio:/media/disk/VMImages/gfs_disk,xvdb,w!',] - make a dummy network interface to build virtual network, in /etc/xen/xend-config.sxp: (network-script 'network-bridge bridge=xenbr0 netdev=dummy0') - dnsmasq is very useful for constructing dns server for virtual network. http://www.thekelleys.org.uk/dnsmasq/doc.html regards, 2007-09-03 (Mon) ? 10:42 +0900 ? Kazuo Moriwaka ????????: > Hi Claudio, > > I'm learning gfs 3 node cluster with xen. There are some > references: > > 'Virtualization for Dummies' will be great help to use xen. > http://intranet.corp.redhat.com/ic/intranet/RHEL5info > > I put some configuration files in svn, you can see them at: > https://trac.nrt.redhat.com/trac/browser/tools/VMconfigs > > regards, > > 2007-08-31 (Fri) ? 21:33 +0300 ? 
Marian Marinov ????????: > > You can always use Xen virtual machines which can easy migrate from machine to > > machine. > > > > http://www.cl.cam.ac.uk/research/srg/netos/xen/ > > > > You can have the Xen virtual machines over a GFS cluster. > > > > Best regards > > Marian Marinov > > On Friday 31 August 2007 18:07:49 Augusto Lima wrote: > > > Hi, i'm Augusto and i don't know much about clusters. > > > I'm from brazil, so my english it's not quite good. > > > I have an idea to test in my organization. > > > > > > We have two large DELL servers with 6GB RAM each and a Xeon 3GHz > > > processor each. > > > They have also lots of disk space. > > > > > > We want to cluster the 2 servers and run VMware Server on them, trying > > > to utilize most of the processors and the available RAM all the time. > > > We have plans to make 6 Virtual Machines running on top of them. > > > We also want to take advantage of High Availbility on our > > > configuration, meaning that if one servers goes down, the other have to > > > hold the 6 VMs for a period of time. > > > We can't afford any paid solution, since our organization does'nt > > > support that kind of implementation. > > > > > > So, i'm wondering if anyone can give a opinion about if it is possible > > > and how can i do it using only free solutions. > > > > > > Thanks in advance, > > > > > > Augusto > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From carlopmart at gmail.com Mon Sep 3 06:42:37 2007 From: carlopmart at gmail.com (carlopmart) Date: Mon, 03 Sep 2007 08:42:37 +0200 Subject: [Linux-cluster] Re: fence_xvmd doesn't starts In-Reply-To: <46D9C4C3.3070009@gmail.com> References: <46D7E431.2020100@gmail.com> <46D9C4C3.3070009@gmail.com> Message-ID: <46DBACDD.6060803@gmail.com> carlopmart wrote: > carlopmart wrote: >> Hi all, >> >> I am running standalone xen host using rhel5 with three rhel5 xen >> guest with cluster-suite. I have setup fence_xvm as a fence device on >> all three guest. On the host side I have setup fence_xvmd on >> cluster.conf file. >> >> My problems starts when I need to restart xen server host. Every time >> that reboots, fence_xvmd doesn't starts. If I execute "service cman >> restart" all its ok: fence_xvmd starts. Why?? How can I fix it?? >> >> Many thanks. >> > Please I need an answer about this ... > > Well I think that I found the problem: cman startup script. In this line: # Check for presence of Domain-0; if it's not there, we can't # run xvmd. # xm list --long 2> /dev/null | grep -q "Domain-0" || return 1 If it is executed from command line any result is returned: [root at xenhost xen]# xm list --long 2> /dev/null | grep -q "Domain-0" [root at xenhost xen]# If I put -X under /etc/sysconfig/cman on FENCE_XVMD_OPTS, nothing happens. Is this a bug??? -- CL Martinez carlopmart {at} gmail {d0t} com From maalgi at ono.com Mon Sep 3 11:03:58 2007 From: maalgi at ono.com (maalgi at ono.com) Date: Mon, 3 Sep 2007 13:03:58 +0200 (CEST) Subject: [Linux-cluster] Fence Device (Ethernet) Message-ID: <18776480.239821188817438698.JavaMail.root@resprs03> Hi, the first thing, sorry for my english i'm spanish. 
I'm trying to set up a test cluster with two PCs. Neither PC has a fence device installed. Each PC has two ethernet network cards; one of them is on the private network 10.0.0.x, used to reach the other node. The other card has an IP 192.168.x.x on my network and a virtual IP 192.168.x.x1 for access to the services of the cluster. NODE1 eth0------->192.168.56.15 eth0:1----->192.168.56.24 eth1------->10.0.0.1 NODE2 eth0----->192.168.56.16 eth0:1--->192.168.56.24 eth1------>10.0.0.2 The interface "eth0:1" is up when the cluster is up, while on the other node the interface and the cluster are down. Is it possible to configure the cluster this way?? What type of fence device must I use?? I tried to configure the cluster (eth0:1) with WTI, APC.... fence devices, and I get the same results. /etc/init.d/ccsd start OK /etc/init.d/fenced start FAIL ...... Thank you very much and regards. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From gsrlinux at gmail.com Mon Sep 3 11:24:56 2007 From: gsrlinux at gmail.com (GS R) Date: Mon, 3 Sep 2007 16:54:56 +0530 Subject: [Linux-cluster] Fence Device (Ethernet) In-Reply-To: <18776480.239821188817438698.JavaMail.root@resprs03> References: <18776480.239821188817438698.JavaMail.root@resprs03> Message-ID: > > NODE1 > eth0------->192.168.56.15 > eth0:1----->192.168.56.24 > eth1------->10.0.0.1 > > NODE2 > eth0----->192.168.56.16 > eth0:1--->192.168.56.24 > eth1------>10.0.0.2 > > The interfaz "eth0:1" is up when cluster is up, while in the other node > interfaz and cluster are down. > Is possible configure the cluster this way?? Yes, assuming that you have common storage. > That type of device fence must i to use?? You can use GNBD as your fencing device. > I probe to configure cluster (eth0:1) with WTI, APC.... fence devices, and i > have same ressults. > > /etc/init.d/ccsd start OK > /etc/init.d/fenced start FAIL Which version of RedHat are you using? Use /etc/init.d/cman start; there is no /etc/init.d/fenced init script, start fenced with /sbin/fenced instead. Thanks Gowrishankar Rajaiyan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Augusto at prpb.mpf.gov.br Mon Sep 3 16:26:52 2007 From: Augusto at prpb.mpf.gov.br (Augusto Lima) Date: Mon, 03 Sep 2007 13:26:52 -0300 Subject: [Linux-cluster] Discovering the world of clustering Message-ID: Thanks very much for the answers but as i was tolding you, its worth for me a cluster which share computational powers between the machines, and not only the HD. Regards, Augusto >>> kmoriwak at redhat.com 02/09/2007 22:58 >>> Hi, I'm very sorry for I mistaked that this list is red hat internal list.. Links which I sent are unavailable from outside from redhat.com. I'll show some tips when using xen from them. - use 'w!' attribute for shared block device ex. disk = [ 'tap:aio:/media/disk/VMImages/rhel5_1,xvda,w', 'tap:aio:/media/disk/VMImages/gfs_disk,xvdb,w!',] - make a dummy network interface to build virtual network, in /etc/xen/xend-config.sxp: (network-script 'network-bridge bridge=xenbr0 netdev=dummy0') - dnsmasq is very useful for constructing dns server for virtual network. http://www.thekelleys.org.uk/dnsmasq/doc.html regards, 2007-09-03 (Mon) ? 10:42 +0900 ? Kazuo Moriwaka ????????: > Hi Claudio, > > I'm learning gfs 3 node cluster with xen. There are some > references: > > 'Virtualization for Dummies' will be great help to use xen. > http://intranet.corp.redhat.com/ic/intranet/RHEL5info > > I put some configuration files in svn, you can see them at: > https://trac.nrt.redhat.com/trac/browser/tools/VMconfigs > > regards, > > 2007-08-31 (Fri) ? 21:33 +0300 ? Marian Marinov ????????: > > You can always use Xen virtual machines which can easy migrate from machine to > > machine. > > > > http://www.cl.cam.ac.uk/research/srg/netos/xen/ > > > > You can have the Xen virtual machines over a GFS cluster. > > > > Best regards > > Marian Marinov > > On Friday 31 August 2007 18:07:49 Augusto Lima wrote: > > > Hi, i'm Augusto and i don't know much about clusters. > > > I'm from brazil, so my english it's not quite good. > > > I have an idea to test in my organization. > > > > > > We have two large DELL servers with 6GB RAM each and a Xeon 3GHz > > > processor each. > > > They have also lots of disk space. > > > > > > We want to cluster the 2 servers and run VMware Server on them, trying > > > to utilize most of the processors and the available RAM all the time. > > > We have plans to make 6 Virtual Machines running on top of them. > > > We also want to take advantage of High Availbility on our > > > configuration, meaning that if one servers goes down, the other have to > > > hold the 6 VMs for a period of time. > > > We can't afford any paid solution, since our organization does'nt > > > support that kind of implementation. > > > > > > So, i'm wondering if anyone can give a opinion about if it is possible > > > and how can i do it using only free solutions. 
> > > > > > Thanks in advance, > > > > > > Augusto > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From wcheng at redhat.com Mon Sep 3 21:49:48 2007 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 03 Sep 2007 17:49:48 -0400 Subject: [Linux-cluster] GFS and GFS2 : two questions: which is bettter; gfs_controld error In-Reply-To: References: <46D9A38B.50304@redhat.com> Message-ID: <46DC817C.1050208@redhat.com> Ian Brown wrote: >gfs_mkfs -p lock_dlm -t myCLuster -j 32 /dev/cciss/c0d1p2 > > Few things: First, not sure why gfs_mkfs let you get away without specifying filesystem-name (-t option) .. ideally a gfs_mkfs should be dispatched as: shell> gfs_mkfs -t mycluster:myfs -p lock_dlm -j 2 /dev/vg0/mygfs (see the ":" between the cluster name and fs name here ?). Do a "man gfs_mkfs" to get the correct syntax of "-t" (locktable) Second, I notice you didn't use (c)lvm partition but a cciss raw device. How many nodes do you have in the cluster (or how many nodes do you plan to access this particular filesystem) ? If it is planned for multiple nodes access, please use (cluster version of) LVM (clvm). If this is for single node access, it is probably better using "-p nolock" protocol but "-p lock_dlm" should work fine. >mount /dev/cciss/c0d1p2 /mnt/gfs > >The errors I see in the console are: >/sbin/mount.gfs: lock_dlm_join: gfs_controld join error: -22 >/sbin/mount.gfs: error mounting lockproto lock_dlm > >The error I see in kernel log is: >gfs_controld[32629]: mount: not in default fence domain > > In theory, when you do "mount", the gfs-kmod should be loaded automatically (assume "service cman start" has been run). Check your /etc/cluster/cluster.conf file please! Also make sure "fenced" is up and runnning ("service cman start" should bring it up) when you do the mount. >I want to add that the cman service is started succesfully as the >kernel log shows. > >I want also to add that "service cman start" performs modprbe of gfs2 module >and not gfs module ! > >Namely, I ran rmmod gfs; then, after : >service cman stop >and >rmmod lock_dlm >rmmod gfs2 > >running lsmod | grep gfs2 shows that >no gfs2 is loaded, >and after "service cman start" I see by > lsmod | grep gfs2 >gfs2 522965 1 lock_dlm > >which means that starting the cman service performed modprobe/insmod >of gfs2 and lock_dlm > >Is this how things should be? > > > Yes, it was the original design for RHEL5 (i.e., gfs2 is the default). However, you really shouldn't worry about this module loading business. The "mount" should be able to find the correct module and load the module behind the scene. If your gfs-kmod correctly exists in /lib/modules directory, then I don't have goold clues why things go wrong (it works for me). Open a service ticket if you have RHEL subscription (so support folks can look into the details). Or maybe GFS team's other team member can spot anything that I've missed ? 
-- Wendy From kadlec at sunserv.kfki.hu Tue Sep 4 09:26:18 2007 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsi) Date: Tue, 4 Sep 2007 11:26:18 +0200 (MEST) Subject: [Linux-cluster] quorum lost in spite of 'leave remove' In-Reply-To: References: Message-ID: On Fri, 31 Aug 2007, Kadlecsik Jozsi wrote: > In spite of having 'fence_tool leave' and 'cman_tool leave remove' in the > 'cman' init script, when stopping the five-member cluster, it looses > quorum when only two machines run the cluster components: > > root at web1:~# cman_tool status > Version: 6.0.1 > Config Version: 6 > Cluster Name: kfki > Cluster Id: 1583 > Cluster Member: Yes > Cluster Generation: 748 > Membership state: Cluster-Member > Nodes: 2 > Expected votes: 5 > Total votes: 2 > Quorum: 3 Activity blocked > Active subsystems: 7 > Flags: > Ports Bound: 0 11 > Node name: web1-gfs > Node ID: 4 > Multicast addresses: 224.0.0.3 > Node addresses: 192.168.192.6 > > root at web1:~# cman_tool nodes > Node Sts Inc Joined Name > 1 X 728 lxserv0-gfs > 2 M 728 2007-08-31 09:19:09 lxserv1-gfs > 3 X 728 web0-gfs > 4 M 724 2007-08-31 09:18:48 web1-gfs > 5 X 728 saturn-gfs > > '/etc/init.d/cman stop' was issued and executed successfully on the tree > other nodes. As I see it happens because the 'expected_votes' of the nodes are not adjusted when nodes are removed. So even when decreasing of the quorum is allowed, the highest expected vote value prevents decreasing the value of the quorum. I wrote the attached patch to adjust expected_votes when a node is removed (and when it appears again). Please review it and apply if you agree with it. Best regards, Jozsef -- E-mail : kadlec at sunserv.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary -------------- next part -------------- diff -urN --exclude=deb cluster-2.01.00.orig/cman/daemon/commands.c cluster-2.01.00/cman/daemon/commands.c --- cluster-2.01.00.orig/cman/daemon/commands.c 2007-06-26 11:09:13.000000000 +0200 +++ cluster-2.01.00/cman/daemon/commands.c 2007-09-04 10:43:27.000000000 +0200 @@ -1867,7 +1867,7 @@ } } -void override_expected(int newexp) +void reset_expected(int may_increase, int newexp) { struct list *nodelist; struct cluster_node *node; @@ -1875,13 +1875,12 @@ list_iterate(nodelist, &cluster_members_list) { node = list_item(nodelist, struct cluster_node); if (node->state == NODESTATE_MEMBER - && node->expected_votes > newexp) { + && (node->expected_votes > newexp || may_increase)) { node->expected_votes = newexp; } } } - /* Add a node from CCS, note that it may already exist if user has simply updated the config file */ void add_ccs_node(char *nodename, int nodeid, int votes, int expected_votes) { @@ -1942,6 +1941,8 @@ node->incarnation = incarnation; node->state = NODESTATE_MEMBER; cluster_members++; + if ((node->leave_reason & 0xF) == CLUSTER_LEAVEFLAG_REMOVED) + reset_expected(1, us->expected_votes + node->votes); recalculate_quorum(0); } } @@ -1983,9 +1984,11 @@ node->state = NODESTATE_DEAD; cluster_members--; - if ((node->leave_reason & 0xF) == CLUSTER_LEAVEFLAG_REMOVED) + if ((node->leave_reason & 0xF) == CLUSTER_LEAVEFLAG_REMOVED) { + override_expected(us->expected_votes > node->votes ? 
+ us->expected_votes - node->votes : 1); recalculate_quorum(1); - else + } else recalculate_quorum(0); break; diff -urN --exclude=deb cluster-2.01.00.orig/cman/daemon/commands.h cluster-2.01.00/cman/daemon/commands.h --- cluster-2.01.00.orig/cman/daemon/commands.h 2006-08-17 15:22:39.000000000 +0200 +++ cluster-2.01.00/cman/daemon/commands.h 2007-09-04 10:28:17.000000000 +0200 @@ -29,12 +29,12 @@ extern void add_ais_node(int nodeid, uint64_t incarnation, int total_members); extern void del_ais_node(int nodeid); extern void add_ccs_node(char *name, int nodeid, int votes, int expected_votes); -extern void override_expected(int expected); +extern void reset_expected(int may_increase, int expected); extern void cman_send_confchg(unsigned int *member_list, int member_list_entries, unsigned int *left_list, int left_list_entries, unsigned int *joined_list, int joined_list_entries); - +#define override_expected(expected) reset_expected(0, expected) /* Startup stuff called from cmanccs: */ extern int cman_set_nodename(char *name); From alain.richard at equation.fr Tue Sep 4 09:53:42 2007 From: alain.richard at equation.fr (Alain RICHARD) Date: Tue, 4 Sep 2007 11:53:42 +0200 Subject: [Linux-cluster] Multipathed quorum disk In-Reply-To: <39fdf1c70708311757h75a57fc3r15b740ed8ad0f58b@mail.gmail.com> References: <39fdf1c70708311757h75a57fc3r15b740ed8ad0f58b@mail.gmail.com> Message-ID: <46724B2A-C44E-4D0C-98C5-34B33CC5F253@equation.fr> Le 1 sept. 07 ? 02:57, Claudio Tassini a ?crit : > Hi, > > I recently upgraded a 2-nodes cluster adding two more nodes. I > would like a single node to remain in cluster even if the other > three are out of service, so I'm trying to add a quorum disk to the > cluster. > > The problem is that the quorum disk is a LUN in a shared storage > which has not the same device name through all the cluster nodes. > Moreover, we use device-mapper AND lvm. I could resolve the problem > using an lvm logical volume, because it would always have the same > name and recognize the underlying "dm" or "sd" device name even if > it changes across a reboot, but I've read that it's not advisable > to use a logical volume as quorum device. > > Any idea? > > -- > Claudio Tassini > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster using multipath -ll you'll see that your multipath device has got a unique id (wwid) : # multipath -ll /dev/mpath/mpath2 (200c0b60a76000032) dm-3 SNAP,FILEDISK [size=10M][features=0][hwhandler=0] \_ round-robin 0 [prio=0][active] \_ 2:0:0:0 sdd 8:48 [active][ready] \_ 1:0:0:0 sdc 8:32 [active][ready] all you have to do then, is to modify your /etc/multipath.conf file to ask a fixed name for this multipath device instead of having it get a dynamique name (/dev/mpath/mpathx) : /etc/multipath.conf : ... multipaths { multipath { wwid 200c0b60a76000032 alias qdsk1 } } and then : # multipath -ll [root at titan2 ~]# multipath -ll qdsk1 (200c0b60a76000032) dm-4 SNAP,FILEDISK [size=10M][features=0][hwhandler=0] \_ round-robin 0 [prio=0][active] \_ 4:0:0:0 sdf 8:80 [active][ready] \_ 3:0:0:0 sde 8:64 [active][ready] (please, be warn that the first time you do it, it rename the multipath device to the name you have ask for, but it fails to rename the /dev/mpath/ device, so you have to do it manually once). do it on all your cluster members and they all get the multipath device with the same name. 
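Once the alias is in place on every node, the quorum disk can then be referenced by that stable path in cluster.conf. The fragment below is only a sketch: the interval, tko and heuristic values are placeholders to be tuned, the ping target 10.0.0.254 stands in for a real tie-breaker address, and votes="3" simply illustrates the case of four one-vote nodes where a single surviving node plus the quorum disk should remain quorate.

/etc/cluster/cluster.conf (fragment, identical on all nodes):

  <quorumd interval="1" tko="10" votes="3" device="/dev/mpath/qdsk1">
      <heuristic program="ping -c 1 -w 1 10.0.0.254" score="1" interval="2"/>
  </quorumd>

Afterwards, cman_tool status can be used to confirm that the quorum disk votes are actually being counted.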
I have also encountered a problem with cman, which refuses to register a node name with more than 16 chars (qdiskd registers the qdisk device as a node name). So you must ensure your device path is less than 16 chars for a qdisk device (this is why I use /dev/mpath/qdsk1 instead of /dev/mpath/qdisk1). Regards, -- Alain RICHARD EQUATION SA Tel : +33 477 79 48 00 Fax : +33 477 79 48 01 E-Liance, Opérateur des entreprises et collectivités, Liaisons Fibre optique, SDSL et ADSL -------------- next part -------------- An HTML attachment was scrubbed... URL: From pawel.mastalerz at mainseek.com Tue Sep 4 11:41:37 2007 From: pawel.mastalerz at mainseek.com (=?ISO-8859-2?Q?Pawe=B3_Mastalerz?=) Date: Tue, 04 Sep 2007 13:41:37 +0200 Subject: [Linux-cluster] GFS and iscsi problem Message-ID: <46DD4471.6010304@mainseek.com> Hi, I have a problem with a GFS cluster and an iSCSI VTrak M500i. The cluster structure looks like this: each of 14 nodes is connected to the VTrak and has an sdb7 disk mounted with GFS. Right now 6 machines are using that disk to read & write images. Those 6 machines, on which the site is stored, are plugged into the LB. The scheme looks like this: *iscsi* | | | | | | node1 node2 node3 node4... etc config: . (...) From time to time there is a problem on one of those nodes with losing its connection to iSCSI; when that happens the whole GFS is blocked and the rest of the nodes have no access to that partition (sdb7) :( Question - why is GFS blocking access to that directory for all nodes if, on the node which caused the problems, the connection to iSCSI has been recovered? I suppose it is GFS's fault, but why don't the logs show that? The only thing I can do now is to reload the cluster and GFS. -- Pawel Mastalerz pawel[dot]mastalerz[at]mainseek[dot]com http://mainseek.net/ From Alexandre.Racine at mhicc.org Tue Sep 4 16:57:10 2007 From: Alexandre.Racine at mhicc.org (Alexandre Racine) Date: Tue, 4 Sep 2007 12:57:10 -0400 Subject: [Linux-cluster] GFS and iscsi problem References: <46DD4471.6010304@mainseek.com> Message-ID: Hi all, I would like to know that too, since I made some similar tests and GFS seems simply to hang. My config: # cat /etc/cluster/cluster.conf Alexandre Racine Projets spéciaux 514-461-1300 poste 3304 alexandre.racine at mhicc.org -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Pawel Mastalerz Sent: Tue 2007-09-04 07:41 To: linux-cluster at redhat.com Subject: [Linux-cluster] GFS and iscsi problem Hi, I have a problem with a GFS cluster and an iSCSI VTrak M500i. The cluster structure looks like this: each of 14 nodes is connected to the VTrak and has an sdb7 disk mounted with GFS. Right now 6 machines are using that disk to read & write images. Those 6 machines, on which the site is stored, are plugged into the LB. The scheme looks like this: *iscsi* | | | | | | node1 node2 node3 node4... etc config: . (...) From time to time there is a problem on one of those nodes with losing its connection to iSCSI; when that happens the whole GFS is blocked and the rest of the nodes have no access to that partition (sdb7) :( Question - why is GFS blocking access to that directory for all nodes if, on the node which caused the problems, the connection to iSCSI has been recovered? I suppose it is GFS's fault, but why don't the logs show that? The only thing I can do now is to reload the cluster and GFS. 
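When a GFS mount hangs on every node like this, the usual cause is a fence or journal-recovery operation that has not completed rather than a fault inside GFS itself. A quick way to check, sketched here for the RHEL4-era tools used in this thread (the exact state names vary between releases, so treat the annotations as approximate), is:

shell> cman_tool nodes             (a node listed with status "X" has left the cluster or been declared dead)
shell> cman_tool services          (a Fence Domain or GFS mount group stuck in a recover or update state means recovery is still pending)
shell> grep -i fence /var/log/messages     (fenced reports which node it is waiting to fence)

If manual fencing is configured, everything stays blocked until fence_ack_manual is run for the failed node, which is essentially the point made in the replies that follow about using a real fence device.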
-- Pawel Mastalerz pawel[dot]mastalerz[at]mainseek[dot]com http://mainseek.net/ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3862 bytes Desc: not available URL: From orkcu at yahoo.com Tue Sep 4 18:16:15 2007 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Tue, 4 Sep 2007 11:16:15 -0700 (PDT) Subject: [Linux-cluster] GFS and iscsi problem In-Reply-To: Message-ID: <353580.21090.qm@web50606.mail.re2.yahoo.com> --- Alexandre Racine wrote: > Hi all, I would like to know that too, since I made > some similar tests and GFS seems simply to hang. > people getting this king of problem usually have a problem with fencing, your fencing is manual, this is really bad for production because if there is a problem with the GFS in one node,the cluster will wait for that node to be feced and if it fenced by humand hand..... until you send the aknowledge that the cluster will be in standby. > > > My config: > # cat /etc/cluster/cluster.conf > > > > > > > > > > > > > > > > > > use a real fence device and try gfs again :-) cu roger __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Moody friends. Drama queens. Your life? Nope! - their life, your story. Play Sims Stories at Yahoo! Games. http://sims.yahoo.com/ From pawel.mastalerz at mainseek.com Tue Sep 4 18:28:10 2007 From: pawel.mastalerz at mainseek.com (=?UTF-8?B?UGF3ZcWCIE1hc3RhbGVyeg==?=) Date: Tue, 04 Sep 2007 20:28:10 +0200 Subject: [Linux-cluster] GFS and iscsi problem In-Reply-To: <353580.21090.qm@web50606.mail.re2.yahoo.com> References: <353580.21090.qm@web50606.mail.re2.yahoo.com> Message-ID: <46DDA3BA.9010403@mainseek.com> Roger Pe?a pisze: > --- Alexandre Racine > wrote: > >> Hi all, I would like to know that too, since I made >> some similar tests and GFS seems simply to hang. >> > people getting this king of problem usually have a > problem with fencing, your fencing is manual, this is > really bad for production because if there is a > problem with the GFS in one node,the cluster will wait > for that node to be feced and if it fenced by humand > hand..... > until you send the aknowledge that the cluster will be > in standby. Yes, but i use: Message-ID: <7398.83425.qm@web50611.mail.re2.yahoo.com> --- Pawe?? Mastalerz wrote: > Roger Pe?a pisze: > > --- Alexandre Racine > > wrote: > > > >> Hi all, I would like to know that too, since I > made > >> some similar tests and GFS seems simply to hang. > >> > > people getting this king of problem usually have > a > > problem with fencing, your fencing is manual, this > is > > really bad for production because if there is a > > problem with the GFS in one node,the cluster will > wait > > for that node to be feced and if it fenced by > humand > > hand..... > > until you send the aknowledge that the cluster > will be > > in standby. > > Yes, but i use: > > ipaddr=.... > sorry, I didn't read throught your messages, just looked at Alexandre Racine's configuration > and it's not a problem, fence work fine. Fence work > only when one of > nodes is down or have some other problem with > connection to other nodes. 
well, I would expect if one node has a problem with its GFS filesystem ( for example, network failure in the iscsi scenario), the cluster should-must fence that node just to avoid filesystem corruption but I could be wrong... cu roger __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Pinpoint customers who are looking for what you sell. http://searchmarketing.yahoo.com/ From carlopmart at gmail.com Tue Sep 4 19:21:02 2007 From: carlopmart at gmail.com (carlopmart) Date: Tue, 04 Sep 2007 21:21:02 +0200 Subject: [Linux-cluster] Re: fence_xvmd doesn't starts In-Reply-To: <46DBACDD.6060803@gmail.com> References: <46D7E431.2020100@gmail.com> <46D9C4C3.3070009@gmail.com> <46DBACDD.6060803@gmail.com> Message-ID: <46DDB01E.4030803@gmail.com> carlopmart wrote: > carlopmart wrote: >> carlopmart wrote: >>> Hi all, >>> >>> I am running standalone xen host using rhel5 with three rhel5 xen >>> guest with cluster-suite. I have setup fence_xvm as a fence device on >>> all three guest. On the host side I have setup fence_xvmd on >>> cluster.conf file. >>> >>> My problems starts when I need to restart xen server host. Every >>> time that reboots, fence_xvmd doesn't starts. If I execute "service >>> cman restart" all its ok: fence_xvmd starts. Why?? How can I fix it?? >>> >>> Many thanks. >>> >> Please I need an answer about this ... >> >> > > Well I think that I found the problem: cman startup script. In this line: > > > # Check for presence of Domain-0; if it's not there, we can't > # run xvmd. > # > xm list --long 2> /dev/null | grep -q "Domain-0" || return 1 > > If it is executed from command line any result is returned: > > [root at xenhost xen]# xm list --long 2> /dev/null | grep -q "Domain-0" > [root at xenhost xen]# > > If I put -X under /etc/sysconfig/cman on FENCE_XVMD_OPTS, nothing > happens. Is this a bug??? > Please any hints about this??? -- CL Martinez carlopmart {at} gmail {d0t} com From Alexandre.Racine at mhicc.org Tue Sep 4 19:50:47 2007 From: Alexandre.Racine at mhicc.org (Alexandre Racine) Date: Tue, 4 Sep 2007 15:50:47 -0400 Subject: [Linux-cluster] GFS and iscsi problem References: <353580.21090.qm@web50606.mail.re2.yahoo.com> Message-ID: Haaaaaa. I can see the light now. This was the last part of the puzzle I needed and that I had put beside at first. Thanks. For those who want to have more infos on this: http://sourceware.org/cluster/faq.html#fence_what -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Roger Pe?a Sent: Tue 2007-09-04 14:16 To: linux clustering Subject: RE: [Linux-cluster] GFS and iscsi problem --- Alexandre Racine wrote: > Hi all, I would like to know that too, since I made > some similar tests and GFS seems simply to hang. > people getting this king of problem usually have a problem with fencing, your fencing is manual, this is really bad for production because if there is a problem with the GFS in one node,the cluster will wait for that node to be feced and if it fenced by humand hand..... until you send the aknowledge that the cluster will be in standby. > > > My config: > # cat /etc/cluster/cluster.conf > > > > > > > > > > > > > > > > > > use a real fence device and try gfs again :-) cu roger __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Moody friends. 
Drama queens. Your life? Nope! - their life, your story. Play Sims Stories at Yahoo! Games. http://sims.yahoo.com/ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3458 bytes Desc: not available URL: From lhh at redhat.com Tue Sep 4 21:03:28 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 4 Sep 2007 17:03:28 -0400 Subject: [Linux-cluster] Fence Device (Ethernet) In-Reply-To: <18776480.239821188817438698.JavaMail.root@resprs03> References: <18776480.239821188817438698.JavaMail.root@resprs03> Message-ID: <20070904210328.GF19477@redhat.com> On Mon, Sep 03, 2007 at 01:03:58PM +0200, maalgi at ono.com wrote: > Hi, the first thing, sorry for my english i'm spanish. We'll try to help anyway. > I probe to configure cluster (eth0:1) with WTI, APC.... fence devices, and i have same ressults. > > /etc/init.d/ccsd start OK > /etc/init.d/fenced start FAIL cman must start before fenced; /etc/init.d/ccsd start /etc/init.d/cman start /etc/init.d/fenced start -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Tue Sep 4 21:04:34 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 4 Sep 2007 17:04:34 -0400 Subject: [Linux-cluster] Fence Device (Ethernet) In-Reply-To: References: <18776480.239821188817438698.JavaMail.root@resprs03> Message-ID: <20070904210434.GG19477@redhat.com> On Mon, Sep 03, 2007 at 04:54:56PM +0530, GS R wrote: > Which version of RedHat are you using? > /etc/init.d/cman start > & > There is no /etc/init.d/fenced start use /sbin/fenced instead. On RHEL4 / cluster-1.0x, there is. -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Tue Sep 4 21:10:05 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 4 Sep 2007 17:10:05 -0400 Subject: [Linux-cluster] Multipathed quorum disk In-Reply-To: <39fdf1c70708311757h75a57fc3r15b740ed8ad0f58b@mail.gmail.com> References: <39fdf1c70708311757h75a57fc3r15b740ed8ad0f58b@mail.gmail.com> Message-ID: <20070904211005.GH19477@redhat.com> On Sat, Sep 01, 2007 at 02:57:48AM +0200, Claudio Tassini wrote: > Hi, > I recently upgraded a 2-nodes cluster adding two more nodes. I would like a > single node to remain in cluster even if the other three are out of service, > so I'm trying to add a quorum disk to the cluster. > > The problem is that the quorum disk is a LUN in a shared storage which has > not the same device name through all the cluster nodes. Moreover, we use > device-mapper AND lvm. I could resolve the problem using an lvm logical > volume, because it would always have the same name and recognize the > underlying "dm" or "sd" device name even if it changes across a reboot, but > I've read that it's not advisable to use a logical volume as quorum device. You can do it, but if the LVM volume is clustered, you can introduce a circular dependency: * need quorum to access CLVM volume * need CLVM volume to become quorate... You can work around this by making qdisk's votes 1 less than the number of nodes in the cluster. Ex: 1 vote for qdisk and 1 vote per node, expected_votes = 3 & two_node = 0 for CMAN. Then, both nodes can come online before you start qdisk to eliminate the "chicken and egg" problem. This might also be related - There's a bugzilla open against qdisk; as qdisk doesn't work with devices which do not have a 512 byte sector size. I should have a fix for it this week. 
https://bugzilla.redhat.com/show_bug.cgi?id=272861 -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Tue Sep 4 21:13:23 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 4 Sep 2007 17:13:23 -0400 Subject: [Linux-cluster] RE: qdisk votes not in cman In-Reply-To: <30E8283B-B35E-4DE2-A8B6-9D59ED51C3E8@equation.fr> References: <30E8283B-B35E-4DE2-A8B6-9D59ED51C3E8@equation.fr> Message-ID: <20070904211323.GI19477@redhat.com> On Fri, Aug 31, 2007 at 12:46:50PM +0200, Alain RICHARD wrote: > Perhaps a better error reporting is needed in qdiskd to shows that we > have hit this problem. Also using a generic name like "qdisk device" > when qdiskd is registering its node to cman is a better approach. What about using the label instead of the device name, and restricting the label to 16 chars when advertising to cman? -- Lon -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Tue Sep 4 21:14:04 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 4 Sep 2007 17:14:04 -0400 Subject: [Linux-cluster] Re: fence_xvmd doesn't starts In-Reply-To: <46DBACDD.6060803@gmail.com> References: <46D7E431.2020100@gmail.com> <46D9C4C3.3070009@gmail.com> <46DBACDD.6060803@gmail.com> Message-ID: <20070904211404.GJ19477@redhat.com> On Mon, Sep 03, 2007 at 08:42:37AM +0200, carlopmart wrote: > carlopmart wrote: > >carlopmart wrote: > >>Hi all, > >> > >> I am running standalone xen host using rhel5 with three rhel5 xen > >>guest with cluster-suite. I have setup fence_xvm as a fence device on > >>all three guest. On the host side I have setup fence_xvmd on > >>cluster.conf file. > >> > >> My problems starts when I need to restart xen server host. Every time > >>that reboots, fence_xvmd doesn't starts. If I execute "service cman > >>restart" all its ok: fence_xvmd starts. Why?? How can I fix it?? > >> > >>Many thanks. > >> > >Please I need an answer about this ... > > > > > > Well I think that I found the problem: cman startup script. In this line: > > > # Check for presence of Domain-0; if it's not there, we can't > # run xvmd. > # > xm list --long 2> /dev/null | grep -q "Domain-0" || return 1 > > If it is executed from command line any result is returned: > > [root at xenhost xen]# xm list --long 2> /dev/null | grep -q "Domain-0" > [root at xenhost xen]# > > If I put -X under /etc/sysconfig/cman on FENCE_XVMD_OPTS, nothing > happens. Is this a bug??? Yes. -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Tue Sep 4 21:15:37 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 4 Sep 2007 17:15:37 -0400 Subject: [Linux-cluster] Re: fence_xvmd doesn't starts In-Reply-To: <46DDB01E.4030803@gmail.com> References: <46D7E431.2020100@gmail.com> <46D9C4C3.3070009@gmail.com> <46DBACDD.6060803@gmail.com> <46DDB01E.4030803@gmail.com> Message-ID: <20070904211537.GK19477@redhat.com> On Tue, Sep 04, 2007 at 09:21:02PM +0200, carlopmart wrote: > carlopmart wrote: > >carlopmart wrote: > >>carlopmart wrote: > >>>Hi all, > >>> > >>> I am running standalone xen host using rhel5 with three rhel5 xen > >>>guest with cluster-suite. I have setup fence_xvm as a fence device on > >>>all three guest. On the host side I have setup fence_xvmd on > >>>cluster.conf file. > >>> > >>> My problems starts when I need to restart xen server host. Every > >>>time that reboots, fence_xvmd doesn't starts. If I execute "service > >>>cman restart" all its ok: fence_xvmd starts. Why?? How can I fix it?? > >>> > >>>Many thanks. > >>> > >>Please I need an answer about this ... 
> >> > >> > > > >Well I think that I found the problem: cman startup script. In this line: > > > > > > # Check for presence of Domain-0; if it's not there, we can't > > # run xvmd. > > # > > xm list --long 2> /dev/null | grep -q "Domain-0" || return 1 > > > >If it is executed from command line any result is returned: > > > > [root at xenhost xen]# xm list --long 2> /dev/null | grep -q "Domain-0" > > [root at xenhost xen]# > > > >If I put -X under /etc/sysconfig/cman on FENCE_XVMD_OPTS, nothing > >happens. Is this a bug??? > > > > Please any hints about this??? It sounds like a bug that is fixed in 5.1 beta. fence_xvmd needs xend to be running. Now, in 5.0, if xend didn't start, fence_xvmd didn't correctly start. In 5.1 beta, fence_xvmd will wait for xend to start. -- Lon Hohberger - Software Engineer - Red Hat, Inc. From dwalgamo at gmail.com Tue Sep 4 21:20:35 2007 From: dwalgamo at gmail.com (David Walgamotte) Date: Tue, 4 Sep 2007 16:20:35 -0500 Subject: [Linux-cluster] howto Message-ID: <77ad9a6b0709041420t18101f19vceef4ec5b49c98b5@mail.gmail.com> any1 know of a good howto for web cluster with gfs. I need good step by step guide as the redhat docs are not working. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Tue Sep 4 21:23:43 2007 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 04 Sep 2007 16:23:43 -0500 Subject: [Linux-cluster] howto In-Reply-To: <77ad9a6b0709041420t18101f19vceef4ec5b49c98b5@mail.gmail.com> References: <77ad9a6b0709041420t18101f19vceef4ec5b49c98b5@mail.gmail.com> Message-ID: <1188941023.661.2.camel@technetium.msp.redhat.com> On Tue, 2007-09-04 at 16:20 -0500, David Walgamotte wrote: > any1 know of a good howto for web cluster with gfs. I need good step > by step guide as the redhat docs are not working. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hi David, I don't know which Red Hat docs aren't working for you, but here are some you can try: http://sources.redhat.com/cluster/doc/nfscookbook.pdf http://sources.redhat.com/cluster/doc/usage.txt http://sources.redhat.com/cluster/faq.html Regards, Bob Peterson Red Hat Cluster Suite From carlopmart at gmail.com Wed Sep 5 07:09:15 2007 From: carlopmart at gmail.com (carlopmart) Date: Wed, 05 Sep 2007 09:09:15 +0200 Subject: [Linux-cluster] Re: fence_xvmd doesn't starts In-Reply-To: <20070904211537.GK19477@redhat.com> References: <46D7E431.2020100@gmail.com> <46D9C4C3.3070009@gmail.com> <46DBACDD.6060803@gmail.com> <46DDB01E.4030803@gmail.com> <20070904211537.GK19477@redhat.com> Message-ID: <46DE561B.7040109@gmail.com> Lon Hohberger wrote: > On Tue, Sep 04, 2007 at 09:21:02PM +0200, carlopmart wrote: >> carlopmart wrote: >>> carlopmart wrote: >>>> carlopmart wrote: >>>>> Hi all, >>>>> >>>>> I am running standalone xen host using rhel5 with three rhel5 xen >>>>> guest with cluster-suite. I have setup fence_xvm as a fence device on >>>>> all three guest. On the host side I have setup fence_xvmd on >>>>> cluster.conf file. >>>>> >>>>> My problems starts when I need to restart xen server host. Every >>>>> time that reboots, fence_xvmd doesn't starts. If I execute "service >>>>> cman restart" all its ok: fence_xvmd starts. Why?? How can I fix it?? >>>>> >>>>> Many thanks. >>>>> >>>> Please I need an answer about this ... >>>> >>>> >>> Well I think that I found the problem: cman startup script. 
In this line: >>> >>> >>> # Check for presence of Domain-0; if it's not there, we can't >>> # run xvmd. >>> # >>> xm list --long 2> /dev/null | grep -q "Domain-0" || return 1 >>> >>> If it is executed from command line any result is returned: >>> >>> [root at xenhost xen]# xm list --long 2> /dev/null | grep -q "Domain-0" >>> [root at xenhost xen]# >>> >>> If I put -X under /etc/sysconfig/cman on FENCE_XVMD_OPTS, nothing >>> happens. Is this a bug??? >>> >> Please any hints about this??? > > It sounds like a bug that is fixed in 5.1 beta. fence_xvmd needs xend > to be running. > > Now, in 5.0, if xend didn't start, fence_xvmd didn't correctly start. > > In 5.1 beta, fence_xvmd will wait for xend to start. > Mant thanks Lon. I will wait until rhel 5.1 is released. Meanwhile, i will start fence_xvmd manually from rc.local. -- CL Martinez carlopmart {at} gmail {d0t} com From hlawatschek at atix.de Wed Sep 5 07:48:32 2007 From: hlawatschek at atix.de (Mark Hlawatschek) Date: Wed, 5 Sep 2007 09:48:32 +0200 Subject: [Linux-cluster] howto In-Reply-To: <1188941023.661.2.camel@technetium.msp.redhat.com> References: <77ad9a6b0709041420t18101f19vceef4ec5b49c98b5@mail.gmail.com> <1188941023.661.2.camel@technetium.msp.redhat.com> Message-ID: <200709050948.34577.hlawatschek@atix.de> On Tuesday 04 September 2007 23:23:43 Bob Peterson wrote: > On Tue, 2007-09-04 at 16:20 -0500, David Walgamotte wrote: > > any1 know of a good howto for web cluster with gfs. I need good step > > by step guide as the redhat docs are not working. > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > Hi David, > > I don't know which Red Hat docs aren't working for you, but here are > some you can try: > > http://sources.redhat.com/cluster/doc/nfscookbook.pdf > http://sources.redhat.com/cluster/doc/usage.txt > http://sources.redhat.com/cluster/faq.html There's also a howto for setting up a GFS cluster at http://open-sharedroot.org/documentation/the-opensharedroot-mini-howto/ Note, that this howto mainly covers the installation of a GFS bases diskless sharedroot cluster. -- Gruss / Regards, Dipl.-Ing. Mark Hlawatschek http://www.atix.de/ http://www.open-sharedroot.org/ ** ATIX Informationstechnologie und Consulting AG Einsteinstr. 10 85716 Unterschleissheim Deutschland/Germany From pawel.mastalerz at mainseek.com Wed Sep 5 08:41:10 2007 From: pawel.mastalerz at mainseek.com (=?UTF-8?B?UGF3ZcWCIE1hc3RhbGVyeg==?=) Date: Wed, 05 Sep 2007 10:41:10 +0200 Subject: [Linux-cluster] GFS and iscsi problem In-Reply-To: <7398.83425.qm@web50611.mail.re2.yahoo.com> References: <7398.83425.qm@web50611.mail.re2.yahoo.com> Message-ID: <46DE6BA6.9090308@mainseek.com> Roger Pe?a pisze: >> Yes, but i use: >> >> > ipaddr=.... >> > sorry, I didn't read throught your messages, just > looked at Alexandre Racine's configuration > > >> and it's not a problem, fence work fine. Fence work >> only when one of >> nodes is down or have some other problem with >> connection to other nodes. > well, I would expect if one node has a problem with > its GFS filesystem ( for example, network failure in > the iscsi scenario), the cluster should-must fence > that node just to avoid filesystem corruption > but I could be wrong... > Yes should... :) but when one of nodes lost connection to iscsi nothing happen, only i cant access to gfs, and gfs dont write to klog info about that... 
Please help :) -- Pawel Mastalerz pawel[dot]mastalerz[at]mainseek[dot]com http://mainseek.net/ From mgrac at redhat.com Wed Sep 5 14:04:08 2007 From: mgrac at redhat.com (Marek 'marx' Grac) Date: Wed, 05 Sep 2007 16:04:08 +0200 Subject: [Linux-cluster] postgres-8 resource In-Reply-To: References: Message-ID: <46DEB758.4070007@redhat.com> Hi, Hell, Robert wrote: > > Aug 30 19:37:06 pg-ba-001 clurgmgrd: [31089]: Trying to execute > sudo -u postgres /usr/bin/postmaster -c > config_file=/etc/cluster/postgres-8/postgres-8:postgresql_vts1/postgresql.conf > -> *some debugging, works fine when executed manually* > > Aug 30 19:37:06 pg-ba-001 clurgmgrd: [31089]: Starting Service > postgres-8:postgresql_vts1 > Failed > > Aug 30 19:37:06 pg-ba-001 clurgmgrd[31089]: start on > postgres-8 "postgresql_vts1" returned 1 (generic error) > > Aug 30 19:37:06 pg-ba-001 clurgmgrd[31089]: #68: Failed to > start service:pg-ba-vts1; return value: 1 > > > > Any ideas how to determine why it won't start? > Sorry for the late response (vacations :)). You found real problems with the resource agent for postgres-8, please file a bug in Bugzilla. In the attachment is a patch which should work (extract to /usr/share/cluster; it fixes postgres-8.sh and utils/config-utils.sh). It fixes problems with listen_address, the directory for the pid file, and running postmaster in the background. If it works I will put it in CVS. Thanks, marx -- Marek Grac Red Hat Czech s.r.o. -------------- next part -------------- A non-text attachment was scrubbed... Name: postgres-ra.tgz Type: application/x-compressed-tar Size: 4175 bytes Desc: not available URL: From beres.laszlo at sys-admin.hu Wed Sep 5 19:02:27 2007 From: beres.laszlo at sys-admin.hu (BERES Laszlo) Date: Wed, 05 Sep 2007 21:02:27 +0200 Subject: [Linux-cluster] Cluster won't come up when T1 is down??? In-Reply-To: <2007925444.953113@leena> References: <2007925444.953113@leena> Message-ID: <46DEFD43.8090206@sys-admin.hu> isplist at logicore.net wrote: > What in the world would cause that? There aren't any external services > required to fire up my local cluster, never were, it's always been fine > before. Just a silly question: how about name resolution? Is it independent from the external DNS? Are the members available without nameservers? -- BÉRES László RHCE, RHCX senior IT engineer, trainer From Alexandre.Racine at mhicc.org Wed Sep 5 19:55:15 2007 From: Alexandre.Racine at mhicc.org (Alexandre Racine) Date: Wed, 5 Sep 2007 15:55:15 -0400 Subject: [Linux-cluster] gfs-kernel with 2.6.22 References: <46DEB758.4070007@redhat.com> Message-ID: Is there someone who is using 2.6.22 with gfs-kernel? (1.03 or 1.04) All of this was working fine with 2.6.20 ... I am using gentoo-2.6.22-r5 and was just wondering if I need to bug hunt or not. Thanks. Error message below... 
make[3]: Entering directory `/usr/src/linux-2.6.22-gentoo-r5' CC [M] /var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.o /var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.c: In function 'nolock_plock_get': /var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.c:250: error: too many arguments to function 'posix_test_lock' make[4]: *** [/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.o] Error 1 make[3]: *** [_module_/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock] Error 2 make[3]: Leaving directory `/usr/src/linux-2.6.22-gentoo-r5' make[2]: *** [all] Error 2 make[2]: Leaving directory `/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock' make[1]: *** [all] Error 2 make[1]: Leaving directory `/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src' make: *** [all] Error 2 !!! ERROR: sys-cluster/gfs-kernel-1.03.00-r1 failed. Call stack: ebuild.sh, line 1638: Called dyn_compile ebuild.sh, line 985: Called qa_call 'src_compile' ebuild.sh, line 44: Called src_compile gfs-kernel-1.03.00-r1.ebuild, line 59: Called die Alexandre Racine Projets sp?ciaux 514-461-1300 poste 3304 alexandre.racine at mhicc.org -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3011 bytes Desc: not available URL: From Christopher.Barry at qlogic.com Wed Sep 5 22:16:32 2007 From: Christopher.Barry at qlogic.com (Christopher Barry) Date: Wed, 05 Sep 2007 18:16:32 -0400 Subject: [Linux-cluster] Quorum question / split brain paranoia Message-ID: <1189030593.5447.94.camel@localhost> Greetings all, I'm building a 6-node hybrid virtual cluster, and would like a little advice about quorum concepts, with the goal being functioning of at least one virtual node, on one physical box, and the complete inability of split-brain occurring. I have Two Physical machines (PM) that will host many virtual machines, however only three VMs per PM will actually be members of the cluster. _Each_ PM is running VMware ESX 3, with: * 3 es4ud5 cluster node VMs * director node VM in an Active/Passive config * various other, non-cluster nodes, out of scope. The diagram of one physical machine with the relevant VMs, virtual switches and virtual wiring can be seen below. Each PM is a mirror image of the other: (view this as a fixed font) +------------------------------+ | PHYSICAL ESX BOX | | +-----------------------+ | | | VM Cluster Node1 |--+ | | +-----------------------+ | | | +-----------------------+ | | | | VM Cluster Node2 |--+ | | +-----------------------+ | | | +-----------------------+ | | | | VM Cluster Node3 |--+ | | +-----------------------+ | | | +---+---+------' | | +-----------|---|---|---+ | | |Cluster Virtual Switch |--+ | | +--------------|--------+ | | | 10.0.1.0/24 +-----+ | | | +--------------------|--+ | | | | Director VM Node (NAT)| | | | +---|-------------------+ | | | +-' 10.0.0.0/24 | | | +-|---------------------+ | | | |Director Virtual Switch| | | | +----------|------------+ | | | | | | | | +-------' | +===[fc0]===[e0]===[e1]========+ | | | to SAN | `---> x-over cable to mirror box To LAN The cluster nodes will run GFS, the director will not. Only one director will be active with a VIP, load will balance across all 6 VMs. 
The crossover will actually have VLANs on it that will allow a separate heartbeat net, but it was getting a bit tricky with ASCII art ;) Can anyone see any issues that may arise where quorum could create a split brain scenario? What would be the best way to approach votes, etc. here? Thanks all for your time. -- Regards, -C Christopher Barry Systems Engineer, Principal QLogic Corporation 780 Fifth Avenue, Suite 140 King of Prussia, PA 19406 o/f: 610-233-4870 / 4777 m: 267-242-9306 From basv at sara.nl Thu Sep 6 06:02:39 2007 From: basv at sara.nl (Bas van der Vlies) Date: Thu, 6 Sep 2007 08:02:39 +0200 Subject: [Linux-cluster] gfs-kernel with 2.6.22 In-Reply-To: References: <46DEB758.4070007@redhat.com> Message-ID: <46DF97FF.9090305@sara.nl> Alexandre Racine wrote: > > Is there someone who is using 2.6.22 with gfs-kernel? (1.03 or 1.04) > All of this was working fine with 2.6.20 ... > The release versions are tight to a kernel version. I know for 1.04 it is 2.6.20. For 1.03 it 2.6.16 or 2.6.17. If you want to use a newer kernel you have to get the source form cvs STABLE branch. I do not know if this version compiles against your kernel version because the STABLE branch has not many updates lately. > I am using gentoo-2.6.22-r5 and was just wondering if I need to bug hunt or not. Thanks. > > Error message below... > > > make[3]: Entering directory `/usr/src/linux-2.6.22-gentoo-r5' > CC [M] /var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.o > /var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.c: In function 'nolock_plock_get': > /var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.c:250: error: too many arguments to function 'posix_test_lock' > make[4]: *** [/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock/main.o] Error 1 > make[3]: *** [_module_/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock] Error 2 > make[3]: Leaving directory `/usr/src/linux-2.6.22-gentoo-r5' > make[2]: *** [all] Error 2 > make[2]: Leaving directory `/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src/nolock' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/var/tmp/portage/sys-cluster/gfs-kernel-1.03.00-r1/work/cluster-1.03.00/gfs-kernel/src' > make: *** [all] Error 2 > > !!! ERROR: sys-cluster/gfs-kernel-1.03.00-r1 failed. 
> Call stack: > ebuild.sh, line 1638: Called dyn_compile > ebuild.sh, line 985: Called qa_call 'src_compile' > ebuild.sh, line 44: Called src_compile > gfs-kernel-1.03.00-r1.ebuild, line 59: Called die > > > Alexandre Racine > Projets sp?ciaux > 514-461-1300 poste 3304 > alexandre.racine at mhicc.org > > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ******************************************************************** * * * Bas van der Vlies e-mail: basv at sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From pcaulfie at redhat.com Thu Sep 6 08:10:12 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 06 Sep 2007 09:10:12 +0100 Subject: [Linux-cluster] quorum lost in spite of 'leave remove' In-Reply-To: References: <46D828EE.5070103@redhat.com> Message-ID: <46DFB5E4.2010108@redhat.com> Hmm, my outgoing email seems to have eaten while I was away... I did raise a bugzilla for this last week and it has a patch attached. if you get chance, you might like to try it. https://bugzilla.redhat.com/show_bug.cgi?id=271701 -- Patrick From hlawatschek at atix.de Thu Sep 6 08:58:40 2007 From: hlawatschek at atix.de (Mark Hlawatschek) Date: Thu, 6 Sep 2007 10:58:40 +0200 Subject: [Linux-cluster] GFS profiling result Message-ID: <200709061058.40741.hlawatschek@atix.de> Hi, during a performance analysis and tuning session, I did some profiling with oprofile on GFS and dlm. I got some weird results ... The installed software is: RHEL4u5, kernel 2.6.9-55.0.2.ELsmp GFS: 2.6.9-72.2.0.2 DLM: 2.6.9-46.16.0.1 The configuration includes 2 clusternodes. I put the following load on one cluster node: 100 processes are doing in parallel: - create 1000 files with 100kb size each (ie altogether we have 100.000 files) - flock 1000 files - unlink 1000 files. The following oprofile output shows, that the system spends about 49% (75%*65%*) of the time in gfs_unlinked_get. Looking into the code whe can see, that this is related to unlinked.c: 53 9394211 58.7081 : ul = list_entry(tmp, struct gfs_unlinked, ul_list); It can also be observed, that dlm spends more than 50% of its time in searching for hashes... Is this the expected behaviour or can this be tuned somewhere ? Thanks, Mark oprofile shows the following: # opreport --long-filenames --threshold 1 samples| %| ------------------ 168187984 75.4905 /gfs 37896161 17.0095 /usr/lib/debug/lib/modules/2.6.9-55.0.2.ELsmp/vmlinux 11686302 5.2453 /dlm # opreport image:/gfs -l --threshold 1 110838927 65.8899 gfs_unlinked_get 12918468 7.6796 gfs_unlinked_hold 10958430 6.5144 scan_glock 9448317 5.6167 examine_bucket 5504188 3.2720 gfs_unlinked_unlock 3795382 2.2562 trylock_on_glock 3368017 2.0022 unlock_on_glock 1939971 1.1532 run_queue # opreport image:/dlm -l --threshold 1 samples % symbol name 5853674 50.0875 search_hashchain 3726276 31.8842 search_bucket 1506327 12.8890 __find_lock_by_id -- Gruss / Regards, Dipl.-Ing. Mark Hlawatschek http://www.atix.de/ http://www.open-sharedroot.org/ ** ATIX Informationstechnologie und Consulting AG Einsteinstr. 
10 85716 Unterschleissheim Deutschland/Germany From lhh at redhat.com Thu Sep 6 12:19:26 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 6 Sep 2007 08:19:26 -0400 Subject: [Linux-cluster] Re: fence_xvmd doesn't starts In-Reply-To: <46DE561B.7040109@gmail.com> References: <46D7E431.2020100@gmail.com> <46D9C4C3.3070009@gmail.com> <46DBACDD.6060803@gmail.com> <46DDB01E.4030803@gmail.com> <20070904211537.GK19477@redhat.com> <46DE561B.7040109@gmail.com> Message-ID: <20070906121926.GD30969@redhat.com> On Wed, Sep 05, 2007 at 09:09:15AM +0200, carlopmart wrote: > >It sounds like a bug that is fixed in 5.1 beta. fence_xvmd needs xend > >to be running. > > > >Now, in 5.0, if xend didn't start, fence_xvmd didn't correctly start. > > > >In 5.1 beta, fence_xvmd will wait for xend to start. > > > > Mant thanks Lon. I will wait until rhel 5.1 is released. Meanwhile, i > will start fence_xvmd manually from rc.local. > No problem, though, beta is available and you should test it if you have time. More testing over wider audience = better. (You can just pull fence_xvmd out of the 5.1 beta cman package, if you want.) -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Thu Sep 6 12:22:35 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 6 Sep 2007 08:22:35 -0400 Subject: [Linux-cluster] Quorum question / split brain paranoia In-Reply-To: <1189030593.5447.94.camel@localhost> References: <1189030593.5447.94.camel@localhost> Message-ID: <20070906122235.GE30969@redhat.com> On Wed, Sep 05, 2007 at 06:16:32PM -0400, Christopher Barry wrote: > > The cluster nodes will run GFS, the director will not. Only one director > will be active with a VIP, load will balance across all 6 VMs. The > crossover will actually have VLANs on it that will allow a separate > heartbeat net, but it was getting a bit tricky with ASCII art ;) > > Can anyone see any issues that may arise where quorum could create a > split brain scenario? What would be the best way to approach votes, etc. > here? So, two physical boxes hosting LVS to virtual machines as the real servers (how ironic, actually...). Said real server cluster is using GFS to share the data? (I want to make sure I understand the question here) -- Lon Hohberger - Software Engineer - Red Hat, Inc. From Christopher.Barry at qlogic.com Thu Sep 6 13:30:55 2007 From: Christopher.Barry at qlogic.com (Christopher Barry) Date: Thu, 06 Sep 2007 09:30:55 -0400 Subject: [Linux-cluster] Quorum question / split brain paranoia In-Reply-To: <20070906122235.GE30969@redhat.com> References: <1189030593.5447.94.camel@localhost> <20070906122235.GE30969@redhat.com> Message-ID: <1189085455.5276.4.camel@localhost> On Thu, 2007-09-06 at 08:22 -0400, Lon Hohberger wrote: > On Wed, Sep 05, 2007 at 06:16:32PM -0400, Christopher Barry wrote: > > > > The cluster nodes will run GFS, the director will not. Only one director > > will be active with a VIP, load will balance across all 6 VMs. The > > crossover will actually have VLANs on it that will allow a separate > > heartbeat net, but it was getting a bit tricky with ASCII art ;) > > > > Can anyone see any issues that may arise where quorum could create a > > split brain scenario? What would be the best way to approach votes, etc. > > here? > > So, two physical boxes hosting LVS to virtual machines as the real > servers (how ironic, actually...). Said real server cluster is using > GFS to share the data? 
> > (I want to make sure I understand the question here) > Hi Lon, It is a bit ironic, isn't it ;) Yes, you are correct; the vm real-servers are sharing a gfs volume. -- Regards, -C From orkcu at yahoo.com Fri Sep 7 02:41:01 2007 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Thu, 6 Sep 2007 19:41:01 -0700 (PDT) Subject: [Linux-cluster] GFS and SELinux work together in RHEL4? Message-ID: <93478.57314.qm@web50605.mail.re2.yahoo.com> Hi yesterday I upgrade a RHEL4.4 to RHEL4.5 and the apacher server start complaining about, the logs point to selinux support in the GFS filesystem that hold the DocumentRoot of several VHosts when I try to see the context of the files and directories I realice that the version of GFS-kernel did not support selinux. I am using centos csgfs over RHEL and because Centos do not have yet the *-kernel packages for the newest kernel of rhel4.5 I am still running the old kernel. I am planing to recompile the srpm of the packages fo the new kernel but first I am trying to find if GFS-kernel-2.6.9-72.2 bring SELinux support to GFS filesystems, I could find any hint to confirm a yes or no. Well the lack of information subjest a "no" :-) I found this FAQ entry: http://sourceware.org/cluster/faq.html#gfs_selinux but I do not know if it is updated :-) also I found serveral places where it is mention that SELinux xattr is supported since the end of last year. so, the question: RH GFS 4.5 bring SELinux support for GFS silesystems? thanks roger __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center. http://autos.yahoo.com/green_center/ From kadlec at sunserv.kfki.hu Fri Sep 7 08:13:57 2007 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsi) Date: Fri, 7 Sep 2007 10:13:57 +0200 (MEST) Subject: [Linux-cluster] quorum lost in spite of 'leave remove' In-Reply-To: <46DFB5E4.2010108@redhat.com> References: <46D828EE.5070103@redhat.com> <46DFB5E4.2010108@redhat.com> Message-ID: Hi, On Thu, 6 Sep 2007, Patrick Caulfield wrote: > I did raise a bugzilla for this last week and it has a patch attached. if you > get chance, you might like to try it. > > https://bugzilla.redhat.com/show_bug.cgi?id=271701 I tested it and the patch fixes bug and works fine. Thank you very much indeed. Best regards, Jozsef -- E-mail : kadlec at sunserv.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From frankie.montenegro at gmail.com Fri Sep 7 10:28:01 2007 From: frankie.montenegro at gmail.com (Frankie Montenegro) Date: Fri, 7 Sep 2007 06:28:01 -0400 Subject: [Linux-cluster] Small Cluster, Port Trunking, To use switch or not? Message-ID: <46b1e5210709070328k41d02047sc0ee4e0730c78376@mail.gmail.com> Hi everyone, I am building a small HPC cluster with two slaves and a master. I can put together two slave nodes with under 350$ per node, so I don't want to spend more then 70-80$ for networking. Buying a gigabit ethernet 4 port switch would be the most straightforward solution. However, I was hoping to get "port trunking" set up, and doubling the network speed. 
Since this is not supported by switches that are within my budget, I wondered if I can achieve port trunking without a switch, just adding couple of network cards to the master node and having a master node be a switch. WIll I be able to get 2Gbps or is this idea completely idiotic? Thanks, F. From bob.marcan at interstudio.homeunix.net Fri Sep 7 10:44:04 2007 From: bob.marcan at interstudio.homeunix.net (Bob Marcan) Date: Fri, 07 Sep 2007 12:44:04 +0200 Subject: [Linux-cluster] Small Cluster, Port Trunking, To use switch or not? In-Reply-To: <46b1e5210709070328k41d02047sc0ee4e0730c78376@mail.gmail.com> References: <46b1e5210709070328k41d02047sc0ee4e0730c78376@mail.gmail.com> Message-ID: <46E12B74.6030406@interstudio.homeunix.net> Frankie Montenegro wrote: > Hi everyone, > > I am building a small HPC cluster with two slaves and a master. I can > put together two slave nodes with under 350$ per node, so I don't want > to spend more then 70-80$ for networking. > > Buying a gigabit ethernet 4 port switch would be the most > straightforward solution. However, I was hoping to get "port trunking" > set up, and doubling the network speed. Since this is not supported > by switches that are within my budget, I wondered if I can achieve > port trunking without a switch, just > adding couple of network cards to the master node and having a > master node be a switch. WIll I be able to get 2Gbps or is this idea > completely idiotic? > > Thanks, > F. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster /usr/share/doc/kernel-doc-XXXXXXX/Documentation/networking/bonding.txt Look for mode. Regards, Bob -- Bob Marcan, Consultant mailto:bob.marcan at snt.si S&T Slovenija d.d. tel: +386 (1) 5895-300 Leskoskova cesta 6 fax: +386 (1) 5895-202 1000 Ljubljana, Slovenia url: http://www.snt.si From Alexandre.Racine at mhicc.org Fri Sep 7 15:17:04 2007 From: Alexandre.Racine at mhicc.org (Alexandre Racine) Date: Fri, 7 Sep 2007 11:17:04 -0400 Subject: [Linux-cluster] users... References: <46DEB758.4070007@redhat.com> <46DF97FF.9090305@sara.nl> Message-ID: Hi, I'll install my first SGE soon! Only two little problems and there we go. One of them is users. In the docs, it stipulate : "Ensure that all users of the grid engine system have the same user names on all submit and execution hosts." That's good, but do we need to have passwordless login between servers or it is not needed? Or another question would be, how do you manage user accounts on all the cluster servers? Alexandre Racine Projets sp?ciaux 514-461-1300 poste 3304 alexandre.racine at mhicc.org -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2689 bytes Desc: not available URL: From fait at anl.gov Fri Sep 7 15:16:29 2007 From: fait at anl.gov (James Fait) Date: Fri, 07 Sep 2007 10:16:29 -0500 Subject: [Linux-cluster] users... In-Reply-To: References: <46DEB758.4070007@redhat.com> <46DF97FF.9090305@sara.nl> Message-ID: <46E16B4D.1060500@anl.gov> Alexandre Racine wrote: > Hi, > > I'll install my first SGE soon! Only two little problems and there we go. > > One of them is users. In the docs, it stipulate : "Ensure that all users of the grid engine system have the same user names on all submit and execution hosts." > > That's good, but do we need to have passwordless login between servers or it is not needed? Or another question would be, how do you manage user accounts on all the cluster servers? 
> > > > > > Alexandre Racine > Projets sp?ciaux > 514-461-1300 poste 3304 > alexandre.racine at mhicc.org > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster The easiest way is to set up Fedora Directory Services or equivalent for LDAP authentication. That way you do all user administration at one point. Sincerely James Fait, Ph.D. Beamline Scientist, SER-CAT APS, ANL -------------- next part -------------- An HTML attachment was scrubbed... URL: From frankie.montenegro at gmail.com Fri Sep 7 15:52:31 2007 From: frankie.montenegro at gmail.com (Frankie Montenegro) Date: Fri, 7 Sep 2007 11:52:31 -0400 Subject: [Linux-cluster] Small Cluster, Port Trunking, To use switch or not? In-Reply-To: <46E12B74.6030406@interstudio.homeunix.net> References: <46b1e5210709070328k41d02047sc0ee4e0730c78376@mail.gmail.com> <46E12B74.6030406@interstudio.homeunix.net> Message-ID: <46b1e5210709070852q2ad913cfx4e5e61c59813b497@mail.gmail.com> Thanks. That will be very usefull when I start putting things together. Did I understand this howto correctly: either way, switch or not, my network devices need to be complient with this 802.3ad protocol if I want to bond them. RIght? Well that's a bummer: I can't use the network cards on board of my "cheapo" motherboards, which means I have to buy 2 cards per node ( and the cheapest card with support of this protocol was around 50$) . I guess I better forget about bonding then. F. On 9/7/07, Bob Marcan wrote: > Frankie Montenegro wrote: > > Hi everyone, > > > > I am building a small HPC cluster with two slaves and a master. I can > > put together two slave nodes with under 350$ per node, so I don't want > > to spend more then 70-80$ for networking. > > > > Buying a gigabit ethernet 4 port switch would be the most > > straightforward solution. However, I was hoping to get "port trunking" > > set up, and doubling the network speed. Since this is not supported > > by switches that are within my budget, I wondered if I can achieve > > port trunking without a switch, just > > adding couple of network cards to the master node and having a > > master node be a switch. WIll I be able to get 2Gbps or is this idea > > completely idiotic? > > > > Thanks, > > F. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > /usr/share/doc/kernel-doc-XXXXXXX/Documentation/networking/bonding.txt > Look for mode. > > Regards, Bob > -- > Bob Marcan, Consultant mailto:bob.marcan at snt.si > S&T Slovenija d.d. tel: +386 (1) 5895-300 > Leskoskova cesta 6 fax: +386 (1) 5895-202 > 1000 Ljubljana, Slovenia url: http://www.snt.si > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From pillai at mathstat.dal.ca Fri Sep 7 16:17:02 2007 From: pillai at mathstat.dal.ca (Balagopal Pillai) Date: Fri, 07 Sep 2007 13:17:02 -0300 Subject: [Linux-cluster] Small Cluster, Port Trunking, To use switch or not? In-Reply-To: <46b1e5210709070852q2ad913cfx4e5e61c59813b497@mail.gmail.com> References: <46b1e5210709070328k41d02047sc0ee4e0730c78376@mail.gmail.com> <46E12B74.6030406@interstudio.homeunix.net> <46b1e5210709070852q2ad913cfx4e5e61c59813b497@mail.gmail.com> Message-ID: <46E1797E.3020308@mathstat.dal.ca> Hi, For 802.3ad bonding mode, the switch needs to support lacp. 
Static trunking feature on the switch is not enough. In your case with no switch support, mode 6 or adaptive load balancing is a good option. Round robin is the only mode that will give you more than an interface worth of throughput on a single connection. But that needs some switch support. (like cisco etherchannel for example) Also there is additional overhead due to out of the order packets. The other modes will give better aggregate throughput. Regards Balagopal Frankie Montenegro wrote: > Thanks. That will be very usefull when I start putting things together. > > Did I understand this howto correctly: either way, switch or not, my > network devices need to be complient with this 802.3ad protocol if I > want to bond them. RIght? > > Well that's a bummer: I can't use the network cards on board of my > "cheapo" motherboards, which means I have to buy 2 cards per node ( > and the cheapest card with support of this > protocol was around 50$) . I guess I better forget about bonding then. > > F. > > On 9/7/07, Bob Marcan wrote: > >> Frankie Montenegro wrote: >> >>> Hi everyone, >>> >>> I am building a small HPC cluster with two slaves and a master. I can >>> put together two slave nodes with under 350$ per node, so I don't want >>> to spend more then 70-80$ for networking. >>> >>> Buying a gigabit ethernet 4 port switch would be the most >>> straightforward solution. However, I was hoping to get "port trunking" >>> set up, and doubling the network speed. Since this is not supported >>> by switches that are within my budget, I wondered if I can achieve >>> port trunking without a switch, just >>> adding couple of network cards to the master node and having a >>> master node be a switch. WIll I be able to get 2Gbps or is this idea >>> completely idiotic? >>> >>> Thanks, >>> F. >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> /usr/share/doc/kernel-doc-XXXXXXX/Documentation/networking/bonding.txt >> Look for mode. >> >> Regards, Bob >> -- >> Bob Marcan, Consultant mailto:bob.marcan at snt.si >> S&T Slovenija d.d. tel: +386 (1) 5895-300 >> Leskoskova cesta 6 fax: +386 (1) 5895-202 >> 1000 Ljubljana, Slovenia url: http://www.snt.si >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From rstevens at internap.com Fri Sep 7 18:01:49 2007 From: rstevens at internap.com (Rick Stevens) Date: Fri, 07 Sep 2007 11:01:49 -0700 Subject: [Linux-cluster] users... In-Reply-To: <46E16B4D.1060500@anl.gov> References: <46DEB758.4070007@redhat.com> <46DF97FF.9090305@sara.nl> <46E16B4D.1060500@anl.gov> Message-ID: <1189188109.29112.28.camel@prophead.corp.publichost.com> On Fri, 2007-09-07 at 10:16 -0500, James Fait wrote: > Alexandre Racine wrote: > > Hi, > > > > I'll install my first SGE soon! Only two little problems and there we go. > > > > One of them is users. In the docs, it stipulate : "Ensure that all users of the grid engine system have the same user names on all submit and execution hosts." > > > > That's good, but do we need to have passwordless login between servers or it is not needed? Or another question would be, how do you manage user accounts on all the cluster servers? > The easiest way is to set up Fedora Directory Services or equivalent > for LDAP authentication. 
That way you do all user administration at > one point. LDAP is one solution, so's NIS/NIS+ (and a bit easier to set up IMHO). ---------------------------------------------------------------------- - Rick Stevens, Principal Engineer rstevens at internap.com - - CDN Systems, Internap, Inc. http://www.internap.com - - - - Better to understand a little than to misunderstand a lot. - ---------------------------------------------------------------------- From Alexandre.Racine at mhicc.org Fri Sep 7 18:48:02 2007 From: Alexandre.Racine at mhicc.org (Alexandre Racine) Date: Fri, 7 Sep 2007 14:48:02 -0400 Subject: [Linux-cluster] users... References: <46DEB758.4070007@redhat.com> <46DF97FF.9090305@sara.nl> <46E16B4D.1060500@anl.gov> Message-ID: So, if I use the GFS, all UID and GID must be the same on all servers of the cluster, right? Alexandre Racine Projets sp?ciaux 514-461-1300 poste 3304 alexandre.racine at mhicc.org -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of James Fait Sent: Fri 2007-09-07 11:16 To: linux clustering Subject: Re: [Linux-cluster] users... Alexandre Racine wrote: > Hi, > > I'll install my first SGE soon! Only two little problems and there we go. > > One of them is users. In the docs, it stipulate : "Ensure that all users of the grid engine system have the same user names on all submit and execution hosts." > > That's good, but do we need to have passwordless login between servers or it is not needed? Or another question would be, how do you manage user accounts on all the cluster servers? > > > > > > Alexandre Racine > Projets sp?ciaux > 514-461-1300 poste 3304 > alexandre.racine at mhicc.org > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster The easiest way is to set up Fedora Directory Services or equivalent for LDAP authentication. That way you do all user administration at one point. Sincerely James Fait, Ph.D. Beamline Scientist, SER-CAT APS, ANL -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3353 bytes Desc: not available URL: From hlawatschek at atix.de Fri Sep 7 18:59:06 2007 From: hlawatschek at atix.de (Mark Hlawatschek) Date: Fri, 7 Sep 2007 20:59:06 +0200 Subject: [Linux-cluster] users... In-Reply-To: References: <46E16B4D.1060500@anl.gov> Message-ID: <200709072059.06506.hlawatschek@atix.de> You could create a shared root configuration. That would mean that all cluster nodes use the same user database per concept. have a look at http://open-sharedroot.org/ for details. Mark On Friday 07 September 2007 20:48:02 Alexandre Racine wrote: > So, if I use the GFS, all UID and GID must be the same on all servers of > the cluster, right? > > > Alexandre Racine > Projets sp?ciaux > 514-461-1300 poste 3304 > alexandre.racine at mhicc.org > > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com on behalf of James Fait > Sent: Fri 2007-09-07 11:16 > To: linux clustering > Subject: Re: [Linux-cluster] users... > > Alexandre Racine wrote: > > Hi, > > > > I'll install my first SGE soon! Only two little problems and there we go. > > > > One of them is users. In the docs, it stipulate : "Ensure that all users > > of the grid engine system have the same user names on all submit and > > execution hosts." 
> > > > That's good, but do we need to have passwordless login between servers or > > it is not needed? Or another question would be, how do you manage user > > accounts on all the cluster servers? > > > > > > > > > > > > Alexandre Racine > > Projets sp?ciaux > > 514-461-1300 poste 3304 > > alexandre.racine at mhicc.org > > > > > > > > ------------------------------------------------------------------------ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > The easiest way is to set up Fedora Directory Services or equivalent for > LDAP authentication. That way you do all user administration at one point. > > Sincerely > > James Fait, Ph.D. > Beamline Scientist, SER-CAT > APS, ANL -- Gruss / Regards, Dipl.-Ing. Mark Hlawatschek http://www.atix.de/ http://www.open-sharedroot.org/ ** ATIX Informationstechnologie und Consulting AG Einsteinstr. 10 85716 Unterschleissheim Deutschland/Germany From c_triantafillou at hotmail.com Fri Sep 7 20:56:58 2007 From: c_triantafillou at hotmail.com (Christos Triantafillou) Date: Fri, 7 Sep 2007 21:56:58 +0100 Subject: [Linux-cluster] DLM - Lock Value Block error Message-ID: Hi, I am using RHEL 4.5 and DLM 1.0.3 on a 4-node cluster. I noticed the following regarding the LVB: 1. there are two processes: one that sets the LVB of a resource while holding an EX lock and another one that has a NL lock on the same resource and is blocked on a dlm_lock_wait for getting a CR lock and reading the LVB.2. when the first process is interrupted with control-C or killed, the second process getsan invalid LVB error. It seems that DLM falsely releases the resource after the first process is gone and then the second process reads an uninitialized LVB. Can you please confirm this error and create a bug report if necessary? Kind regards, Christos Triantafillou _________________________________________________________________ Explore the seven wonders of the world http://search.msn.com/results.aspx?q=7+wonders+world&mkt=en-US&form=QBRE -------------- next part -------------- An HTML attachment was scrubbed... URL: From rstevens at internap.com Fri Sep 7 21:28:45 2007 From: rstevens at internap.com (Rick Stevens) Date: Fri, 07 Sep 2007 14:28:45 -0700 Subject: [Linux-cluster] users... In-Reply-To: References: <46DEB758.4070007@redhat.com> <46DF97FF.9090305@sara.nl> <46E16B4D.1060500@anl.gov> Message-ID: <1189200525.31171.23.camel@prophead.corp.publichost.com> On Fri, 2007-09-07 at 14:48 -0400, Alexandre Racine wrote: > So, if I use the GFS, all UID and GID must be the same on all servers > of the cluster, right? Yes. They'll all see the same filesystem, so if the UID/GIDs don't match across all systems, you'll have permissions and file ownership problems. > -----Original Message----- > From: linux-cluster-bounces at redhat.com on behalf of James Fait > Sent: Fri 2007-09-07 11:16 > To: linux clustering > Subject: Re: [Linux-cluster] users... > > Alexandre Racine wrote: > > Hi, > > > > I'll install my first SGE soon! Only two little problems and there > we go. > > > > One of them is users. In the docs, it stipulate : "Ensure that all > users of the grid engine system have the same user names on all submit > and execution hosts." > > > > > That's good, but do we need to have passwordless login between > servers or it is not needed? Or another question would be, how do you > manage user accounts on all the cluster servers? 
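Whatever ends up managing the accounts, the requirement itself is easy to check by hand; a small sketch, assuming ssh access to each node and using placeholder host names node1-node3 and a placeholder account sgeuser:
-------
# compare how every node resolves the same account
for h in node1 node2 node3; do
    echo -n "$h: "
    ssh $h 'id sgeuser'
done
# the uid= and gid= fields must match on every node, otherwise files created
# on the shared filesystem show up with the wrong owner on the other nodes
-------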
> > > > > > > > > > > > > Alexandre Racine > > Projets sp?ciaux > > 514-461-1300 poste 3304 > > alexandre.racine at mhicc.org > > > > > > > > > ------------------------------------------------------------------------ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > The easiest way is to set up Fedora Directory Services or equivalent > for > LDAP authentication. That way you do all user administration at one > point. > > Sincerely > > James Fait, Ph.D. > Beamline Scientist, SER-CAT > APS, ANL > ---------------------------------------------------------------------- - Rick Stevens, Principal Engineer rstevens at internap.com - - CDN Systems, Internap, Inc. http://www.internap.com - - - - To understand recursion, you must first understand recursion. - ---------------------------------------------------------------------- From mhanafi at csc.com Sat Sep 8 20:34:37 2007 From: mhanafi at csc.com (Mahmoud Hanafi) Date: Sat, 8 Sep 2007 16:34:37 -0400 Subject: [Linux-cluster] vip device selection Message-ID: Cluster nodes having more than 1 network device how do you select which device is used for the VIP. Mahmoud Hanafi Sr. System Administrator CSC HPC COE Bld. 676 2435 Fifth Street WPAFB, Ohio 45433 (937) 255-1536 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From pcaulfie at redhat.com Mon Sep 10 07:44:28 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 10 Sep 2007 08:44:28 +0100 Subject: [Linux-cluster] DLM - Lock Value Block error In-Reply-To: References: Message-ID: <46E4F5DC.3080209@redhat.com> Christos Triantafillou wrote: > Hi, > > I am using RHEL 4.5 and DLM 1.0.3 on a 4-node cluster. > > I noticed the following regarding the LVB: > 1. there are two processes: one that sets the LVB of a resource while > holding an EX lock > and another one that has a NL lock on the same resource and is blocked > on a dlm_lock_wait > for getting a CR lock and reading the LVB. > 2. when the first process is interrupted with control-C or killed, the > second process gets > an invalid LVB error. > > It seems that DLM falsely releases the resource after the first process > is gone and then > the second process reads an uninitialized LVB. > > Can you please confirm this error and create a bug report if necessary? I don't know of this bug in particular, though it might be so. Can you raise a bug and put as much information as possible into it please (example programs, sample output, and contents of /proc/cluster/dlm_locks on the master node before and after the incident). Thanks. 
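For reference, the dump being asked for here is taken by selecting a lockspace and reading the same proc file back; a sketch, with the lockspace name as a placeholder for whatever lockspace the test program creates:
-------
# on the DLM master node
echo "mylockspace" > /proc/cluster/dlm_locks
cat /proc/cluster/dlm_locks > /tmp/dlm_locks.before
# reproduce the problem, then capture the state again
cat /proc/cluster/dlm_locks > /tmp/dlm_locks.after
-------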
-- Patrick From Alain.Moulle at bull.net Mon Sep 10 09:14:47 2007 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 10 Sep 2007 11:14:47 +0200 Subject: [Linux-cluster] CS4 U4 / questions about quorum disk Message-ID: <46E50B07.7040807@bull.net> Hi Some questions about quorum disk : 1. is the quorum disk working correctly on CS4 Update 4 ? or is there any known issue which could lead to problems ? 2. when you have two or three shared disk_array between two HA nodes, is it needed to have a quorum disk each disk-array or is one quorum disk on only one disk_array sufficient ? (I think one is sufficient but just to have your opinion ...) Thanks for your response. Regards. Alain Moull? From claudio.tassini at gmail.com Mon Sep 10 11:19:25 2007 From: claudio.tassini at gmail.com (Claudio Tassini) Date: Mon, 10 Sep 2007 13:19:25 +0200 Subject: [Linux-cluster] GFS: drop_count and drop_period tuning Message-ID: <39fdf1c70709100418j44935e4sd9bae4da92319a11@mail.gmail.com> Hi all, I have a four-nodes GFS cluster on RH 4.5 (last versions, updated yesterday). There are three GFS filesystems ( 1 TB, 450 GB and 5GB), serving some mail domains with postfix/courier imap in a "maildir" configuration. As you can suspect, this is not exactly the best for GFS: we have a lot (thousands) of very small files (emails) in a very lot of directories. I'm trying to tune up things to reach the best performance. I found that tuning the drop_count parameter in /proc/cluster/lock_dlm/drop_period , setting it to a very large value (it was 500000 and now, after a memory upgrade, I've set it to 1500000 ), uses a lot of memory (about 10GB out of 16 that I've installed in every machine) and seems to "boost" performance limiting the iowait CPU usage. The bad thing is that when I umount a filesystem, it must clean up all that locks (I think), and sometimes it causes problems to the whole cluster, with the other nodes that stop writes to the filesystem while I'm umounting on one node only. Is this normal? How can I tune this to clean memory faster when I umount the FS? I've read something about setting more gfs_glockd daemons per fs with the num_glockd mount option, but it seems to be quite deprecated because it shouldn't be necessary.. -- Claudio Tassini -------------- next part -------------- An HTML attachment was scrubbed... URL: From Vinoda_Kumar at Satyam.com Mon Sep 10 11:38:10 2007 From: Vinoda_Kumar at Satyam.com (Vinoda_Kumar) Date: Mon, 10 Sep 2007 17:08:10 +0530 Subject: [Linux-cluster] Cluster Suite on mainframe (s390x)? Message-ID: Hi All, Is Cluster Suite bundled with RHEL 5 AP for mainframe (systemZ / s390x)? Thanks Vinoda Kumar S Project Lead, System Computer Services Limited +91 80 6658 3215 DISCLAIMER: This email (including any attachments) is intended for the sole use of the intended recipient/s and may contain material that is CONFIDENTIAL AND PRIVATE COMPANY INFORMATION. Any review or reliance by others or copying or distribution or forwarding of any or all of the contents in this message is STRICTLY PROHIBITED. If you are not the intended recipient, please contact the sender by email and delete all copies; your cooperation in this regard is appreciated. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jwilson at transolutions.net Mon Sep 10 19:44:10 2007 From: jwilson at transolutions.net (James Wilson) Date: Mon, 10 Sep 2007 14:44:10 -0500 Subject: [Linux-cluster] Cluster not starting backup after reboot Message-ID: <46E59E8A.4000407@transolutions.net> I had 2 host cluster up and going over the weekend. I came in today and shutdown the cluster and added 2 more hosts to my current cluster. The new hosts are xen domU's. When I rebooted everything the cluster will not come back up. And my /var/log/messeges file has a lot of these errors below. Does anyone know why I would be getting these errors now? Any help is appreciated. ccsd[8297]: Cluster is not quorate. Refusing connection. ccsd[8297]: Error while processing connect: Connection refused From Michael.Hagmann at hilti.com Mon Sep 10 21:17:56 2007 From: Michael.Hagmann at hilti.com (Hagmann, Michael) Date: Mon, 10 Sep 2007 23:17:56 +0200 Subject: [Linux-cluster] GFS: drop_count and drop_period tuning In-Reply-To: <39fdf1c70709100418j44935e4sd9bae4da92319a11@mail.gmail.com> References: <39fdf1c70709100418j44935e4sd9bae4da92319a11@mail.gmail.com> Message-ID: <9C203D6FD2BF9D49BFF3450201DEDA5301EACA71@LI-OWL.hag.hilti.com> Hi When you are on RHEL4.5 then I highly suggest you to use the new glock_purge Parameter for every gfs Filesystem add to /etc/rc.local ------- gfs_tool settune / glock_purge 50 gfs_tool settune /scratch glock_purge 50 ------- also this Parameter has to set new on every mount. That mean when you umount it and then mount it again, run the /etc/rc.local again, otherway the parameter are gone! maybe also checkout this page --> http://www.open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimm ing-patch mike ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Claudio Tassini Sent: Montag, 10. September 2007 13:19 To: linux clustering Subject: [Linux-cluster] GFS: drop_count and drop_period tuning Hi all, I have a four-nodes GFS cluster on RH 4.5 (last versions, updated yesterday). There are three GFS filesystems ( 1 TB, 450 GB and 5GB), serving some mail domains with postfix/courier imap in a "maildir" configuration. As you can suspect, this is not exactly the best for GFS: we have a lot (thousands) of very small files (emails) in a very lot of directories. I'm trying to tune up things to reach the best performance. I found that tuning the drop_count parameter in /proc/cluster/lock_dlm/drop_period , setting it to a very large value (it was 500000 and now, after a memory upgrade, I've set it to 1500000 ), uses a lot of memory (about 10GB out of 16 that I've installed in every machine) and seems to "boost" performance limiting the iowait CPU usage. The bad thing is that when I umount a filesystem, it must clean up all that locks (I think), and sometimes it causes problems to the whole cluster, with the other nodes that stop writes to the filesystem while I'm umounting on one node only. Is this normal? How can I tune this to clean memory faster when I umount the FS? I've read something about setting more gfs_glockd daemons per fs with the num_glockd mount option, but it seems to be quite deprecated because it shouldn't be necessary.. -- Claudio Tassini -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lhh at redhat.com Mon Sep 10 21:30:59 2007 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 10 Sep 2007 17:30:59 -0400 Subject: [Linux-cluster] Quorum question / split brain paranoia In-Reply-To: <1189085455.5276.4.camel@localhost> References: <1189030593.5447.94.camel@localhost> <20070906122235.GE30969@redhat.com> <1189085455.5276.4.camel@localhost> Message-ID: <20070910213059.GF7563@redhat.com> On Thu, Sep 06, 2007 at 09:30:55AM -0400, Christopher Barry wrote: > On Thu, 2007-09-06 at 08:22 -0400, Lon Hohberger wrote: > > On Wed, Sep 05, 2007 at 06:16:32PM -0400, Christopher Barry wrote: > > > > > > The cluster nodes will run GFS, the director will not. Only one director > > > will be active with a VIP, load will balance across all 6 VMs. The > > > crossover will actually have VLANs on it that will allow a separate > > > heartbeat net, but it was getting a bit tricky with ASCII art ;) > > > > > > Can anyone see any issues that may arise where quorum could create a > > > split brain scenario? What would be the best way to approach votes, etc. > > > here? > > > > So, two physical boxes hosting LVS to virtual machines as the real > > servers (how ironic, actually...). Said real server cluster is using > > GFS to share the data? > > > > (I want to make sure I understand the question here) > > > > > Hi Lon, > > It is a bit ironic, isn't it ;) Yes, you are correct; the vm > real-servers are sharing a gfs volume. No real issues, but your qdiskd heuristics should be based on "can I talk to a physical node in the cluster" or something like that. Basically, you need to implement a solution which will allow all-but-one node to fail in the "virtual machine" cluster. This way, if you lose half of the VMs, you can still maintain a quorum. -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Mon Sep 10 21:42:09 2007 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 10 Sep 2007 17:42:09 -0400 Subject: [Linux-cluster] CS4 U4 / questions about quorum disk In-Reply-To: <46E50B07.7040807@bull.net> References: <46E50B07.7040807@bull.net> Message-ID: <20070910214209.GG7563@redhat.com> On Mon, Sep 10, 2007 at 11:14:47AM +0200, Alain Moulle wrote: > Hi > > Some questions about quorum disk : > > 1. is the quorum disk working correctly on CS4 Update 4 ? > or is there any known issue which could lead to problems ? I'd recommend using U5. > 2. when you have two or three shared disk_array between two > HA nodes, is it needed to have a quorum disk each disk-array > or is one quorum disk on only one disk_array sufficient ? > (I think one is sufficient but just to have your opinion ...) You can only have quorum disk "device". (Multipathed / mirrored devices should be fine...) -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Mon Sep 10 21:43:25 2007 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 10 Sep 2007 17:43:25 -0400 Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <46E59E8A.4000407@transolutions.net> References: <46E59E8A.4000407@transolutions.net> Message-ID: <20070910214325.GH7563@redhat.com> On Mon, Sep 10, 2007 at 02:44:10PM -0500, James Wilson wrote: > I had 2 host cluster up and going over the weekend. I came in today and > shutdown the cluster and added 2 more hosts to my current cluster. The > new hosts are xen domU's. When I rebooted everything the cluster will > not come back up. And my /var/log/messeges file has a lot of these > errors below. Does anyone know why I would be getting these errors now? 
> Any help is appreciated. > > ccsd[8297]: Cluster is not quorate. Refusing connection. > ccsd[8297]: Error while processing connect: Connection refused You need at least 3 nodes online and the configuration file version # matching on all of them. I'd start checking there. -- Lon -- Lon Hohberger - Software Engineer - Red Hat, Inc. From smeacham at charter.net Mon Sep 10 21:59:31 2007 From: smeacham at charter.net (smeacham at charter.net) Date: Mon, 10 Sep 2007 21:59:31 +0000 Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <20070910214325.GH7563@redhat.com> References: <46E59E8A.4000407@transolutions.net><20070910214325.GH7563@redhat.com> Message-ID: <1381753941-1189461572-cardhu_decombobulator_blackberry.rim.net-1440139959-@bxe019.bisx.prod.on.blackberry> Sent via BlackBerry by AT&T -----Original Message----- From: Lon Hohberger Date: Mon, 10 Sep 2007 17:43:25 To:jwilson at transolutions.net,linux clustering Subject: Re: [Linux-cluster] Cluster not starting backup after reboot On Mon, Sep 10, 2007 at 02:44:10PM -0500, James Wilson wrote: > I had 2 host cluster up and going over the weekend. I came in today and > shutdown the cluster and added 2 more hosts to my current cluster. The > new hosts are xen domU's. When I rebooted everything the cluster will > not come back up. And my /var/log/messeges file has a lot of these > errors below. Does anyone know why I would be getting these errors now? > Any help is appreciated. > > ccsd[8297]: Cluster is not quorate. Refusing connection. > ccsd[8297]: Error while processing connect: Connection refused You need at least 3 nodes online and the configuration file version # matching on all of them. I'd start checking there. -- Lon -- Lon Hohberger - Software Engineer - Red Hat, Inc. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jwilson at transolutions.net Mon Sep 10 22:18:37 2007 From: jwilson at transolutions.net (James Wilson) Date: Mon, 10 Sep 2007 17:18:37 -0500 Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <1381753941-1189461572-cardhu_decombobulator_blackberry.rim.net-1440139959-@bxe019.bisx.prod.on.blackberry> References: <46E59E8A.4000407@transolutions.net><20070910214325.GH7563@redhat.com> <1381753941-1189461572-cardhu_decombobulator_blackberry.rim.net-1440139959-@bxe019.bisx.prod.on.blackberry> Message-ID: <46E5C2BD.7090705@transolutions.net> When I remove the xen domU's from the configuration everything comes up fine. Should the domU's be apart of their own cluster? But then I wouldn't be able to mount gfs from the dom0 right? smeacham at charter.net wrote: > Sent via BlackBerry by AT&T > > -----Original Message----- > From: Lon Hohberger > > Date: Mon, 10 Sep 2007 17:43:25 > To:jwilson at transolutions.net,linux clustering > Subject: Re: [Linux-cluster] Cluster not starting backup after reboot > > > On Mon, Sep 10, 2007 at 02:44:10PM -0500, James Wilson wrote: > >> I had 2 host cluster up and going over the weekend. I came in today and >> shutdown the cluster and added 2 more hosts to my current cluster. The >> new hosts are xen domU's. When I rebooted everything the cluster will >> not come back up. And my /var/log/messeges file has a lot of these >> errors below. Does anyone know why I would be getting these errors now? >> Any help is appreciated. >> >> ccsd[8297]: Cluster is not quorate. Refusing connection. 
>> ccsd[8297]: Error while processing connect: Connection refused >> > > You need at least 3 nodes online and the configuration file version # > matching on all of them. I'd start checking there. > > -- Lon > > From orkcu at yahoo.com Tue Sep 11 01:26:33 2007 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Mon, 10 Sep 2007 18:26:33 -0700 (PDT) Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <46E5C2BD.7090705@transolutions.net> Message-ID: <449347.296.qm@web50604.mail.re2.yahoo.com> --- James Wilson wrote: > When I remove the xen domU's from the configuration > everything comes up > fine. Should the domU's be apart of their own > cluster? But then I > wouldn't be able to mount gfs from the dom0 right? if you remove the 2 domU, then your cluster will be quorate with the other 2 nodes but if you add the 2 domUs, _and_ none of then join the cluster, the cluster will not be quorate :-( I sujest adding one-by-one domUs, because if you add just one domU to the cluster, it become a 3 node cluster and will quorated with just 2 nodes (the old ones), ultil your 1es domU join succefully the cluster don try to add the second domU. check the firewall of the domUs (comunications between the nodes) cu roger > > smeacham at charter.net wrote: > > Sent via BlackBerry by AT&T > > > > -----Original Message----- > > From: Lon Hohberger > > > > Date: Mon, 10 Sep 2007 17:43:25 > > To:jwilson at transolutions.net,linux clustering > > > Subject: Re: [Linux-cluster] Cluster not starting > backup after reboot > > > > > > On Mon, Sep 10, 2007 at 02:44:10PM -0500, James > Wilson wrote: > > > >> I had 2 host cluster up and going over the > weekend. I came in today and > >> shutdown the cluster and added 2 more hosts to my > current cluster. The > >> new hosts are xen domU's. When I rebooted > everything the cluster will > >> not come back up. And my /var/log/messeges file > has a lot of these > >> errors below. Does anyone know why I would be > getting these errors now? > >> Any help is appreciated. > >> > >> ccsd[8297]: Cluster is not quorate. Refusing > connection. > >> ccsd[8297]: Error while processing connect: > Connection refused > >> > > > > You need at least 3 nodes online and the > configuration file version # > > matching on all of them. I'd start checking > there. > > > > -- Lon > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Be a better Heartthrob. Get better relationship answers from someone who knows. Yahoo! Answers - Check it out. http://answers.yahoo.com/dir/?link=list&sid=396545433 From bernard.chew at muvee.com Tue Sep 11 03:54:42 2007 From: bernard.chew at muvee.com (Bernard Chew) Date: Tue, 11 Sep 2007 11:54:42 +0800 Subject: [Linux-cluster] See DLM locks that are held Message-ID: <229C73600EB0E54DA818AB599482BCE901AC5A42@shadowfax.sg.muvee.net> Hi, I have a cluster with 4 nodes each running RHEL5. I remember able to see the DLM locks held in RHEL4 by echo the lockspace name into /proc/cluster/dlm_locks. How do I do this in RHEL5? I cannot see any cluster related directory in /proc. 
Regards, Bernard Chew From pcaulfie at redhat.com Tue Sep 11 07:02:48 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 11 Sep 2007 08:02:48 +0100 Subject: [Linux-cluster] See DLM locks that are held In-Reply-To: <229C73600EB0E54DA818AB599482BCE901AC5A42@shadowfax.sg.muvee.net> References: <229C73600EB0E54DA818AB599482BCE901AC5A42@shadowfax.sg.muvee.net> Message-ID: <46E63D98.9040604@redhat.com> Bernard Chew wrote: > Hi, > > I have a cluster with 4 nodes each running RHEL5. I remember able to see > the DLM locks held in RHEL4 by echo the lockspace name into > /proc/cluster/dlm_locks. How do I do this in RHEL5? I cannot see any > cluster related directory in /proc. Mount debugfs (eg on /debug) then look in /debug/dlm//locks -- Patrick From bernard.chew at muvee.com Tue Sep 11 08:03:49 2007 From: bernard.chew at muvee.com (Bernard Chew) Date: Tue, 11 Sep 2007 16:03:49 +0800 Subject: [Linux-cluster] See DLM locks that are held In-Reply-To: <46E63D98.9040604@redhat.com> References: <229C73600EB0E54DA818AB599482BCE901AC5A42@shadowfax.sg.muvee.net> <46E63D98.9040604@redhat.com> Message-ID: <229C73600EB0E54DA818AB599482BCE901AC5AEA@shadowfax.sg.muvee.net> > Bernard Chew wrote: > Hi, > > I have a cluster with 4 nodes each running RHEL5. I remember able to see > the DLM locks held in RHEL4 by echo the lockspace name into > /proc/cluster/dlm_locks. How do I do this in RHEL5? I cannot see any > cluster related directory in /proc. > > Mount debugfs (eg on /debug) then look in > > /debug/dlm//locks > > -- > Patrick Thanks Patrick! Regards, Bernard Chew From claudio.tassini at gmail.com Tue Sep 11 08:35:43 2007 From: claudio.tassini at gmail.com (Claudio Tassini) Date: Tue, 11 Sep 2007 10:35:43 +0200 Subject: [Linux-cluster] GFS: drop_count and drop_period tuning In-Reply-To: <9C203D6FD2BF9D49BFF3450201DEDA5301EACA71@LI-OWL.hag.hilti.com> References: <39fdf1c70709100418j44935e4sd9bae4da92319a11@mail.gmail.com> <9C203D6FD2BF9D49BFF3450201DEDA5301EACA71@LI-OWL.hag.hilti.com> Message-ID: <39fdf1c70709110135n7e50bb81p83237ff901b8bc87@mail.gmail.com> Thanks Michael, I've set this option on my filesystems. How should this impact to the system performance/behaviour? More/less memory usage? I guess that, by trimming the 50% of unused locks every 5 secs, it should cut off memory usage too.. am I right? If this works, I could also raise the drop_count value? 2007/9/10, Hagmann, Michael : > > Hi > > When you are on RHEL4.5 then I highly suggest you to use the new > glock_purge Parameter for every gfs Filesystem add to /etc/rc.local > ------- > gfs_tool settune / glock_purge 50 > gfs_tool settune /scratch glock_purge 50 > ------- > > also this Parameter has to set new on every mount. That mean when you > umount it and then mount it again, run the /etc/rc.local again, otherway the > parameter are gone! > > maybe also checkout this page --> http://www.open-sharedroot.org > /Members/marc/blog/blog-on-gfs/glock-trimming-patch > > mike > > ------------------------------ > *From:* linux-cluster-bounces at redhat.com [mailto: > linux-cluster-bounces at redhat .com] > *On Behalf Of *Claudio Tassini > *Sent:* Montag, 10. September 2007 13:19 > *To:* linux clustering > *Subject:* [Linux-cluster] GFS: drop_count and drop_period tuning > > Hi all, > > I have a four-nodes GFS cluster on RH 4.5 (last versions, updated > yesterday). There are three GFS filesystems ( 1 TB, 450 GB and 5GB), serving > some mail domains with postfix/courier imap in a "maildir" configuration. 
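Tying this back to the debugfs answer a few messages up, the RHEL5 way of getting at the same lock information looks roughly like this (a sketch; only the mount step and the per-lockspace layout are taken from that reply):
-------
mkdir -p /debug
mount -t debugfs none /debug
ls /debug/dlm/        # lock state is exposed per lockspace under this directory
-------
From there, cat the entry for the lockspace you are interested in.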
> > > As you can suspect, this is not exactly the best for GFS: we have a lot > (thousands) of very small files (emails) in a very lot of directories. I'm > trying to tune up things to reach the best performance. I found that tuning > the drop_count parameter in /proc/cluster/lock_dlm/drop_period , setting > it to a very large value (it was 500000 and now, after a memory upgrade, > I've set it to 1500000 ), uses a lot of memory (about 10GB out of 16 that > I've installed in every machine) and seems to "boost" performance limiting > the iowait CPU usage. > > > The bad thing is that when I umount a filesystem, it must clean up all > that locks (I think), and sometimes it causes problems to the whole cluster, > with the other nodes that stop writes to the filesystem while I'm umounting > on one node only. > Is this normal? How can I tune this to clean memory faster when I umount > the FS? I've read something about setting more gfs_glockd daemons per fs > with the num_glockd mount option, but it seems to be quite deprecated > because it shouldn't be necessary.. > > > > > -- > Claudio Tassini > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman > /listinfo/linux-cluster > -- Claudio Tassini -------------- next part -------------- An HTML attachment was scrubbed... URL: From jwilson at transolutions.net Tue Sep 11 13:18:08 2007 From: jwilson at transolutions.net (James Wilson) Date: Tue, 11 Sep 2007 08:18:08 -0500 Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <449347.296.qm@web50604.mail.re2.yahoo.com> References: <449347.296.qm@web50604.mail.re2.yahoo.com> Message-ID: <46E69590.3070707@transolutions.net> I think I have the wrong syntax for fencing xen. Do I add to the config file or ? Roger Pe?a wrote: > --- James Wilson wrote: > > >> When I remove the xen domU's from the configuration >> everything comes up >> fine. Should the domU's be apart of their own >> cluster? But then I >> wouldn't be able to mount gfs from the dom0 right? >> > if you remove the 2 domU, then your cluster will be > quorate with the other 2 nodes but > if you add the 2 domUs, _and_ none of then join the > cluster, the cluster will not be quorate :-( > > I sujest adding one-by-one domUs, because if you add > just one domU to the cluster, it become a 3 node > cluster and will quorated with just 2 nodes (the old > ones), ultil your 1es domU join succefully the cluster > don try to add the second domU. > > > check the firewall of the domUs (comunications between > the nodes) > > cu > roger > > > >> smeacham at charter.net wrote: >> >>> Sent via BlackBerry by AT&T >>> >>> -----Original Message----- >>> From: Lon Hohberger >>> >>> Date: Mon, 10 Sep 2007 17:43:25 >>> To:jwilson at transolutions.net,linux clustering >>> >> >> >>> Subject: Re: [Linux-cluster] Cluster not starting >>> >> backup after reboot >> >>> On Mon, Sep 10, 2007 at 02:44:10PM -0500, James >>> >> Wilson wrote: >> >>> >>> >>>> I had 2 host cluster up and going over the >>>> >> weekend. I came in today and >> >>>> shutdown the cluster and added 2 more hosts to my >>>> >> current cluster. The >> >>>> new hosts are xen domU's. When I rebooted >>>> >> everything the cluster will >> >>>> not come back up. And my /var/log/messeges file >>>> >> has a lot of these >> >>>> errors below. Does anyone know why I would be >>>> >> getting these errors now? >> >>>> Any help is appreciated. >>>> >>>> ccsd[8297]: Cluster is not quorate. Refusing >>>> >> connection. 
>> >>>> ccsd[8297]: Error while processing connect: >>>> >> Connection refused >> >>>> >>>> >>> You need at least 3 nodes online and the >>> >> configuration file version # >> >>> matching on all of them. I'd start checking >>> >> there. >> >>> -- Lon >>> >>> >>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> >> > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > __________________________________________ > RedHat Certified ( RHCE ) > Cisco Certified ( CCNA & CCDA ) > > > > ____________________________________________________________________________________ > Be a better Heartthrob. Get better relationship answers from someone who knows. Yahoo! Answers - Check it out. > http://answers.yahoo.com/dir/?link=list&sid=396545433 > > From orkcu at yahoo.com Tue Sep 11 14:02:19 2007 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Tue, 11 Sep 2007 07:02:19 -0700 (PDT) Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <46E69590.3070707@transolutions.net> Message-ID: <517240.11618.qm@web50606.mail.re2.yahoo.com> --- James Wilson wrote: > I think I have the wrong syntax for fencing xen. Do > I add > to the config file or ? > I think, but I could be wrong, that for the purpose of "joining a cluster" a fence configuration for a node is not sooo important I would try to garantie the comunication between nodes, and the instalation of proper *-kernel packages according to the kernel running at the node just yesterday I had a funny problem, funny because cman report that que node join the cluster (and the cluster was quorated with its all 4 nodes in, the others nodes reported the problematic node as 'online') but ccsd was saying the opposite and refuse to listen to connections (complaining about "can't comunicate with cluster infrasture...") so everything wasn't unable to start (fence, clvmd, rgmanager, etc, etc) the problem was: running kernel-smp in the node but installed cman-kernel and not cman-kernel-smp ;-) so cman kernel module was not loaded ... BTW, cman start succefull without complain about not able to load its kernel module .... (I really don't like to top posting but how to follow the threat if not do it? ) > Roger Pe?a wrote: > > --- James Wilson > wrote: > > > > > >> When I remove the xen domU's from the > configuration > >> everything comes up > >> fine. Should the domU's be apart of their own > >> cluster? But then I > >> wouldn't be able to mount gfs from the dom0 > right? > >> > > if you remove the 2 domU, then your cluster will > be > > quorate with the other 2 nodes but > > if you add the 2 domUs, _and_ none of then join > the > > cluster, the cluster will not be quorate :-( > > > > I sujest adding one-by-one domUs, because if you > add > > just one domU to the cluster, it become a 3 node > > cluster and will quorated with just 2 nodes (the > old > > ones), ultil your 1es domU join succefully the > cluster > > don try to add the second domU. 
> > > > > > check the firewall of the domUs (comunications > between > > the nodes) > > > > cu > > roger > > > > > > > >> smeacham at charter.net wrote: > >> > >>> Sent via BlackBerry by AT&T > >>> > >>> -----Original Message----- > >>> From: Lon Hohberger > >>> > >>> Date: Mon, 10 Sep 2007 17:43:25 > >>> To:jwilson at transolutions.net,linux clustering > >>> > >> > >> > >>> Subject: Re: [Linux-cluster] Cluster not > starting > >>> > >> backup after reboot > >> > >>> On Mon, Sep 10, 2007 at 02:44:10PM -0500, James > >>> > >> Wilson wrote: > >> > >>> > >>> > >>>> I had 2 host cluster up and going over the > >>>> > >> weekend. I came in today and > >> > >>>> shutdown the cluster and added 2 more hosts to > my > >>>> > >> current cluster. The > >> > >>>> new hosts are xen domU's. When I rebooted > >>>> > >> everything the cluster will > >> > >>>> not come back up. And my /var/log/messeges file > >>>> > >> has a lot of these > >> > >>>> errors below. Does anyone know why I would be > >>>> > >> getting these errors now? > >> > >>>> Any help is appreciated. > >>>> > >>>> ccsd[8297]: Cluster is not quorate. Refusing > >>>> > >> connection. > >> > >>>> ccsd[8297]: Error while processing connect: > >>>> > >> Connection refused > >> > >>>> > >>>> > >>> You need at least 3 nodes online and the > >>> > >> configuration file version # > >> > >>> matching on all of them. I'd start checking > >>> > >> there. > >> > >>> -- Lon cu roger __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Need a vacation? Get great deals to amazing places on Yahoo! Travel. http://travel.yahoo.com/ From jparsons at redhat.com Tue Sep 11 14:32:48 2007 From: jparsons at redhat.com (James Parsons) Date: Tue, 11 Sep 2007 10:32:48 -0400 Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <46E69590.3070707@transolutions.net> References: <449347.296.qm@web50604.mail.re2.yahoo.com> <46E69590.3070707@transolutions.net> Message-ID: <46E6A710.8050408@redhat.com> James Wilson wrote: > I think I have the wrong syntax for fencing xen. Do I add > to the config file or ? The tag should be in the cluster.conf file as a child of the tag, in the dom0 cluster. This just tells the outside cluster that vm fencing is going to be employed, so the fence_xvm daemon is started and begins listening for distress from the virtual cluster it is hosting. BTW, I am pretty sure that if you include this tag and you do NOT have a virtual cluster set up (yet), then nothing bad happens except that a few cpu cycles are stolen from the dom0 machine running the daemon for nothing. The inverse, however, is not true. A virtual cluster cannot be depended upon without the daemon running on the physical host(s). I hope this explanation buys you some insight. The simple reason for all of this, is that DomU machines do not know they are virtual and they cannot call 'vm destroy' on another vm even if they did know. They call the fence_xvm fence agent when there is trouble, and this agent calls out to the fence_xvm daemon running in the outer physical cluster and asks it to please shut a particular VM down. Perhaps xen kernels should include Kierkagaard and Sarte libraries for helping them deal with their isolation, alienation, and dreaded lonliness. 
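The element names in the explanation above were stripped by the HTML-to-text conversion; roughly, the two sides look like this (a sketch only: the cluster names, the domU name and the key path are placeholders, and both snippets are fragments rather than complete files):
-------
cluster.conf on the physical (dom0) cluster, one empty element under <cluster>:

  <cluster name="phys-cluster" config_version="2">
    ...
    <fence_xvmd/>
  </cluster>

cluster.conf on the virtual (domU) cluster, fencing every guest through fence_xvm:

  <fencedevices>
    <fencedevice agent="fence_xvm" name="xvm" key_file="/etc/cluster/fence_xvm.key"/>
  </fencedevices>
  ...
  <clusternode name="domU1" votes="1">
    <fence>
      <method name="1">
        <device name="xvm" domain="domU1"/>
      </method>
    </fence>
  </clusternode>
-------
The agent and the daemon authenticate with a shared key, so the same key file has to be present on the dom0 host and inside the guests.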
-J From furor_hater at hotmail.com Tue Sep 11 19:32:07 2007 From: furor_hater at hotmail.com (notol Perc) Date: Tue, 11 Sep 2007 19:32:07 +0000 Subject: [Linux-cluster] GNBD Problems loading module Message-ID: Using the latest CVS Cluster Source (09-11-2007) I have configured a cluster on kernel 2.6.23-rc5 (running under Debian Etch) I can get everything running short of importing GNBD due to the fact that I can not find the kernal module. I can directly make cluster/gnbd-kernel/src/ I get the following: make -C /usr/src/linux-2.6.23-rc5 M=/usr/src/cluster/gnbd-kernel/src symverfile=/usr/src/linux-2.6.23-rc5/Module.symvers modules USING_KBUILD=yes make[1]: Entering directory `/usr/src/linux-2.6.23-rc5' Building modules, stage 2. MODPOST 1 modules make[1]: Leaving directory `/usr/src/linux-2.6.23-rc5' then make install make -C /usr/src/linux-2.6.23-rc5 M=/usr/src/cluster/gnbd-kernel/src symverfile=/usr/src/linux-2.6.23-rc5/Module.symvers modules USING_KBUILD=yes make[1]: Entering directory `/usr/src/linux-2.6.23-rc5' Building modules, stage 2. MODPOST 1 modules make[1]: Leaving directory `/usr/src/linux-2.6.23-rc5' install -d /usr/include/linux install gnbd.h /usr/include/linux install -d /lib/modules/`uname -r`/kernel/drivers/block/gnbd install gnbd.ko /lib/modules/`uname -r`/kernel/drivers/block/gnbd Ca some one pleas help be get this going? _________________________________________________________________ Get a FREE small business Web site and more from Microsoft? Office Live! http://clk.atdmt.com/MRT/go/aub0930003811mrt/direct/01/ From Abdel.Sadek at lsi.com Tue Sep 11 21:27:16 2007 From: Abdel.Sadek at lsi.com (Sadek, Abdel) Date: Tue, 11 Sep 2007 15:27:16 -0600 Subject: [Linux-cluster] fence_scsi agent on RHEL 4.5 Message-ID: I am running a 2-node cluster with RHEL 4.5 Native cluster. I am using scsi persistent reservation as my fencing device. I have noticed when I shutdown one of the nodes, the fence_scsi agent on the surviving node fails to fence the dying node. I get the following message: Sep 11 16:18:13 troy fenced[3614]: agent "fence_scsi" reports: parse error: unknown option "nodename=porsche" Sep 11 16:18:13 troy fenced[3614]: fence "porsche" failed it looks like the fence_scsi command is executed using with the nodename parameter instead of the -n option. when I run fence_scsi -h I get the following (there is no nodename parameter) Usage fence_scsi [options] Options -n IP address or hostname of node to fence -h usage -V version -v verbose But the man page of the fence_scsi command talks about using both the "-n" and "nodename=" options. So, how do I make the fence_scsi run with the -n instead of the nodename= option? Thanks. Abdel... -------------- next part -------------- An HTML attachment was scrubbed... URL: From Joel.Becker at oracle.com Tue Sep 11 23:46:08 2007 From: Joel.Becker at oracle.com (Joel Becker) Date: Tue, 11 Sep 2007 16:46:08 -0700 Subject: [Linux-cluster] changing configuration Message-ID: <20070911234607.GD27482@tasint.org> Hey everyone, How do I update the IP addresses of existing nodes? I have a simple cluster. I had two nodes on a private network (10.x.x.x). I decided to add two more nodes, but they are only on the public network. So I wanted to add them as well as change the existing nodes to use the public network. I shut down cman/ccs on all nodes. I edited cluster.conf. I started cman back on one node, and I ensured that cman_tool went to the new version of the config via "cman_tool version -r N+1". 
The problem is that it still appears to be using the private network addresses. I see this in the log and with "cman_tool nodes -a". What can I do to fix this, short of hunting down all cman and openais droppings and removing them? I want the "right" way :-) Joel -- "To fall in love is to create a religion that has a fallible god." -Jorge Luis Borges Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 From orkcu at yahoo.com Wed Sep 12 01:42:32 2007 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Tue, 11 Sep 2007 18:42:32 -0700 (PDT) Subject: [Linux-cluster] RHEL4.5, GFS and selinux, are they playing nice? Message-ID: <724236.51256.qm@web50608.mail.re2.yahoo.com> Hello everybody ;-) I keep working in making a web cluster play nice after the upgrade from RHEL4.4 -> RHEL4.5 with this upgrade, the relation httpd-selinux become more strict, my first problem came when the RHGFS4.4 do not support xattr (our web content is in a gfs filesystem) so I must update RHGFS and RHCS to 4.5 (from centos recompilation) so now I have support to xattr in ours GFS filesystems but, here is the problem: the httpd do not want to start because some config files (witch reside in another GFS filesystem) have a forbidden context (httpd can not read file with that context) (those files are included from the main apache configuration) even if I change the context and ls -Z show me that I change the context for every parent and final dir in the GFS filesystem. here are the error from selinux: { search } for pid=2289 comm="httpd" name="/" dev=dm-7 ino=25 scontext=root:system_r:httpd_t tcontext=system_u:object_r:nfs_t tclass=dir as you can see, selinux is dening access to httpd process to make a search in / (root of the filesystem in device dm-7), with inode 25 and that inode is a directory, it deny access because the context of that directory is system_u:object_r:nfs_t am I right? but, that directory is /opt/soft: ll -di /opt/soft/ 25 drwxr-xr-x 8 root root 3864 Sep 11 2007 /opt/soft/ ^^ <--- this is the inode and it context is system_u:object_r:httpd_config_t: ll -dZ /opt/soft/ drwxr-xr-x root root system_u:object_r:httpd_config_t /opt/soft/ so, who is wrong? ls -Z or "global selinux kernel module" ? 
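One way to pin down which side is wrong is to compare the label stored in the security.selinux attribute with the tcontext= field of the matching AVC denial, which is the label the kernel actually used for the check. A sketch, reusing the paths from the post and assuming the attr package is installed:
-------
# label recorded on the GFS inode
getfattr -n security.selinux /opt/soft
# label the failed access check was made against
dmesg | grep avc | grep 'dev=dm-7'
-------
A mismatch points at the filesystem type being labelled by a policy-wide rule rather than by its own xattrs.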
because ls -Z show that the context of that directory is system_u:object_r:httpd_config_t if I set selinux to be in permissive mode, then apache can start, of course, but with some complains like this: Sep 11 14:18:08 blade26 kernel: audit(1189534688.151:38): avc: denied { search } for pid=2333 comm="httpd" name="/" dev=dm-7 ino=25 scontext=root:system_r:httpd_t tcontext=system_u:object_r:nfs_t tclass=dir Sep 11 14:18:08 blade26 kernel: audit(1189534688.155:39): avc: denied { getattr } for pid=2333 comm="httpd" name="apache" dev=dm-7 ino=31 scontext=root:system_r:httpd_t tcontext=system_u:object_r:nfs_t tclass=dir Sep 11 14:18:08 blade26 kernel: audit(1189534688.155:40): avc: denied { read } for pid=2333 comm="httpd" name="apache" dev=dm-7 ino=31 scontext=root:system_r:httpd_t tcontext=system_u:object_r:nfs_t tclass=dir Sep 11 14:18:08 blade26 kernel: audit(1189534688.158:41): avc: denied { getattr } for pid=2333 comm="httpd" name="httpd.conf" dev=dm-7 ino=484983 scontext=root:system_r:httpd_t tcontext=system_u:object_r:nfs_t tclass=file Sep 11 14:18:08 blade26 kernel: audit(1189534688.158:42): avc: denied { read } for pid=2333 comm="httpd" name="httpd.conf" dev=dm-7 ino=484983 scontext=root:system_r:httpd_t tcontext=system_u:object_r:nfs_t tclass=file this mean: access deny to do 1- search in /opt/soft 2- getattr and read directory /opt/soft/conf/apache 3- getattr and read file httpd.conf but: all this files or directory has context system_u:object_r:httpd_config_t ll -dZ /opt/soft/conf/apache/ drwxr-xr-x root root system_u:object_r:httpd_config_t /opt/soft/conf/apache/ ll -di /opt/soft/conf/apache/ 31 drwxr-xr-x 2 root root 3864 Sep 11 09:44 /opt/soft/conf/apache/ is this related to the fact that selinux policy stated this: genfscon gfs / system_u:object_r:nfs_t what do you recomment to solve this complains of selinux? mount the gfs filesystem with the option fscontext ? but that filesystem has other stuff, not related with apache, so, what context should I use? thanks roger __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Don't let your dream ride pass you by. Make it a reality with Yahoo! Autos. http://autos.yahoo.com/index.html From alain.richard at equation.fr Wed Sep 12 05:05:43 2007 From: alain.richard at equation.fr (Alain Richard) Date: Wed, 12 Sep 2007 07:05:43 +0200 Subject: [Linux-cluster] RE: qdisk votes not in cman In-Reply-To: <20070904211323.GI19477@redhat.com> References: <30E8283B-B35E-4DE2-A8B6-9D59ED51C3E8@equation.fr> <20070904211323.GI19477@redhat.com> Message-ID: Le 4 sept. 07 ? 23:13, Lon Hohberger a ?crit : > On Fri, Aug 31, 2007 at 12:46:50PM +0200, Alain RICHARD wrote: >> Perhaps a better error reporting is needed in qdiskd to shows that we >> have hit this problem. Also using a generic name like "qdisk device" >> when qdiskd is registering its node to cman is a better approach. > > What about using the label instead of the device name, and restricting > the label to 16 chars when advertising to cman? > > -- Lon Because when using multipath devices (for example a two paths device), all the paths and the multi-path device are recognized as having the same label, so qdisk fails to get the good device (the multi-path device). 
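For readers following along, the two ways qdiskd gets pointed at the disk look roughly like this; the multipath map name, the label, the heuristic address and the vote counts are all placeholders, and the XML is a fragment of cluster.conf, not a complete file.
-------
# label the quorum partition once, from one node, against the multipath map
mkqdisk -c /dev/mapper/mpath0 -l cluqdisk
# list every device carrying a qdisk label, as seen from each node
mkqdisk -L

<!-- variant 1: fixed device path (point it at the dm map, never at an sd path) -->
<quorumd interval="1" tko="10" votes="3" device="/dev/mapper/mpath0">
    <heuristic program="ping -c1 -w1 10.0.0.254" score="1" interval="2"/>
</quorumd>

<!-- variant 2: find the device by its mkqdisk label, which is where the
     "same label on every path" problem described above shows up -->
<quorumd interval="1" tko="10" votes="3" label="cluqdisk">
    <heuristic program="ping -c1 -w1 10.0.0.254" score="1" interval="2"/>
</quorumd>
-------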
Regards, -- Alain RICHARD EQUATION SA Tel : +33 477 79 48 00 Fax : +33 477 79 48 01 Applications client/serveur, ing?nierie r?seau et Linux -------------- next part -------------- An HTML attachment was scrubbed... URL: From jprats at cesca.es Wed Sep 12 07:14:04 2007 From: jprats at cesca.es (Jordi Prats) Date: Wed, 12 Sep 2007 09:14:04 +0200 Subject: [Linux-cluster] Services timeout Message-ID: <46E791BC.2090006@cesca.es> Hi, I have a NFS server with RedHat Cluster. Sometimes when is on heavy load it sets the service status to failed. There's no fs corruption and no daemon is down. I suspect this is caused by some timeout while is checking the fs is mounted. There is any way to define the check interval or the check timeout? Thank you! Jordi -- ...................................................................... __ / / Jordi Prats C E / S / C A Dept. de Sistemes /_/ Centre de Supercomputaci? de Catalunya Gran Capit?, 2-4 (Edifici Nexus) ? 08034 Barcelona T. 93 205 6464 ? F. 93 205 6979 ? jprats at cesca.es ...................................................................... From pcaulfie at redhat.com Wed Sep 12 11:45:41 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 12 Sep 2007 12:45:41 +0100 Subject: [Linux-cluster] DLM - Lock Value Block error In-Reply-To: References: Message-ID: <46E7D165.4040301@redhat.com> Christos Triantafillou wrote: > Hi, > > I am using RHEL 4.5 and DLM 1.0.3 on a 4-node cluster. > > I noticed the following regarding the LVB: > 1. there are two processes: one that sets the LVB of a resource while > holding an EX lock > and another one that has a NL lock on the same resource and is blocked > on a dlm_lock_wait > for getting a CR lock and reading the LVB. > 2. when the first process is interrupted with control-C or killed, the > second process gets > an invalid LVB error. > > It seems that DLM falsely releases the resource after the first process > is gone and then > the second process reads an uninitialized LVB. > > Can you please confirm this error and create a bug report if necessary? I've just run the program on VMS and it exhibits exactly the same behaviour. Therefore I suspect this is not a bug ;-) -- Patrick From lhh at redhat.com Wed Sep 12 18:52:49 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 12 Sep 2007 14:52:49 -0400 Subject: [Linux-cluster] RE: qdisk votes not in cman In-Reply-To: References: <30E8283B-B35E-4DE2-A8B6-9D59ED51C3E8@equation.fr> <20070904211323.GI19477@redhat.com> Message-ID: <20070912185249.GL7563@redhat.com> On Wed, Sep 12, 2007 at 07:05:43AM +0200, Alain Richard wrote: > > Le 4 sept. 07 ? 23:13, Lon Hohberger a ?crit : > > >On Fri, Aug 31, 2007 at 12:46:50PM +0200, Alain RICHARD wrote: > >>Perhaps a better error reporting is needed in qdiskd to shows that we > >>have hit this problem. Also using a generic name like "qdisk device" > >>when qdiskd is registering its node to cman is a better approach. > > > >What about using the label instead of the device name, and restricting > >the label to 16 chars when advertising to cman? > Because when using multipath devices (for example a two paths > device), all the paths and the multi-path device are recognized as > having the same label, so qdisk fails to get the good device (the > multi-path device). I meant implementation-wise, using the label instead of the device name to solve or work around the 16 character limit when talking to CMAN... -- Lon Hohberger - Software Engineer - Red Hat, Inc. 
From lhh at redhat.com Wed Sep 12 18:54:23 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 12 Sep 2007 14:54:23 -0400 Subject: [Linux-cluster] Cluster not starting backup after reboot In-Reply-To: <46E5C2BD.7090705@transolutions.net> References: <1381753941-1189461572-cardhu_decombobulator_blackberry.rim.net-1440139959-@bxe019.bisx.prod.on.blackberry> <46E5C2BD.7090705@transolutions.net> Message-ID: <20070912185421.GM7563@redhat.com> On Mon, Sep 10, 2007 at 05:18:37PM -0500, James Wilson wrote: > When I remove the xen domU's from the configuration everything comes up > fine. Should the domU's be apart of their own cluster? But then I > wouldn't be able to mount gfs from the dom0 right? Yes, I wouldn't mix physical and virtual nodes in the same cluster. *that* introduces ugly quorum problems :) -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Wed Sep 12 18:57:59 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 12 Sep 2007 14:57:59 -0400 Subject: [Linux-cluster] changing configuration In-Reply-To: <20070911234607.GD27482@tasint.org> References: <20070911234607.GD27482@tasint.org> Message-ID: <20070912185759.GN7563@redhat.com> On Tue, Sep 11, 2007 at 04:46:08PM -0700, Joel Becker wrote: > Hey everyone, > How do I update the IP addresses of existing nodes? > I have a simple cluster. I had two nodes on a private network > (10.x.x.x). I decided to add two more nodes, but they are only on the > public network. So I wanted to add them as well as change the existing > nodes to use the public network. The cluster node names need to resolve to the public network interface address, and I think 'uname -n' will need to match in some cases. Otherwise, you can issue: 'cman_tool join -n ' -- Lon -- Lon Hohberger - Software Engineer - Red Hat, Inc. From lhh at redhat.com Wed Sep 12 18:59:03 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 12 Sep 2007 14:59:03 -0400 Subject: [Linux-cluster] Services timeout In-Reply-To: <46E791BC.2090006@cesca.es> References: <46E791BC.2090006@cesca.es> Message-ID: <20070912185903.GO7563@redhat.com> On Wed, Sep 12, 2007 at 09:14:04AM +0200, Jordi Prats wrote: > Hi, > I have a NFS server with RedHat Cluster. Sometimes when is on heavy load > it sets the service status to failed. There's no fs corruption and no > daemon is down. I suspect this is caused by some timeout while is > checking the fs is mounted. There is any way to define the check > interval or the check timeout? It shouldn't matter about load - a fail only occurs on fail-to-stop cases. Do you have any log messages from the incident? -- Lon Hohberger - Software Engineer - Red Hat, Inc. From Michael.Hagmann at hilti.com Wed Sep 12 19:50:20 2007 From: Michael.Hagmann at hilti.com (Hagmann, Michael) Date: Wed, 12 Sep 2007 21:50:20 +0200 Subject: [Linux-cluster] GFS: drop_count and drop_period tuning References: <39fdf1c70709100418j44935e4sd9bae4da92319a11@mail.gmail.com><9C203D6FD2BF9D49BFF3450201DEDA5301EACA71@LI-OWL.hag.hilti.com> <39fdf1c70709110135n7e50bb81p83237ff901b8bc87@mail.gmail.com> Message-ID: <9C203D6FD2BF9D49BFF3450201DEDA530D101D@LI-OWL.hag.hilti.com> Claudio the Problem is that ( befor glock_purge Parameter ) no real mechanism to release glocks exists, the only limit is the memory size. Because we have a lot of Memory ( min. 32 GB RAM ) and 6 Nodes, DLM cam on its limit to handle the locks ( over 6 million ) and timed out ! 
For you that means you may use less memory but, more importantly, better
performance: the DLM has fewer glocks to handle and is faster! In our case
the cluster was not able to run without this parameter! But I don't know
how this impacts the drop_count value.

mike

Michael Hagmann
UNIX Systems Engineering
Enterprise Systems Technology

Hilti Corporation
9494 Schaan
Liechtenstein

Department FIBS
Feldkircherstrasse 100
P.O.Box 333

P +423-234 2467
F +423-234 6467
E michael.hagmann at hilti.com
www.hilti.com

-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Claudio Tassini
Sent: Tue 9/11/2007 10:35
To: linux clustering
Subject: Re: [Linux-cluster] GFS: drop_count and drop_period tuning

Thanks Michael, I've set this option on my filesystems. How should this
impact the system performance/behaviour? More/less memory usage? I guess
that, by trimming 50% of the unused locks every 5 secs, it should cut
memory usage too... am I right? If this works, could I also raise the
drop_count value?

2007/9/10, Hagmann, Michael <Michael.Hagmann at hilti.com>:

Hi

When you are on RHEL 4.5 then I highly suggest you use the new
glock_purge parameter for every GFS filesystem. Add to /etc/rc.local:

-------
gfs_tool settune / glock_purge 50
gfs_tool settune /scratch glock_purge 50
-------

Also, this parameter has to be set again on every mount. That means when
you umount it and then mount it again, run /etc/rc.local again, otherwise
the parameter is gone!

Maybe also check out this page -->
http://www.open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch

mike

________________________________

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Claudio Tassini
Sent: Montag, 10. September 2007 13:19
To: linux clustering
Subject: [Linux-cluster] GFS: drop_count and drop_period tuning

Hi all,

I have a four-node GFS cluster on RH 4.5 (latest versions, updated
yesterday). There are three GFS filesystems (1 TB, 450 GB and 5 GB),
serving some mail domains with postfix/courier imap in a "maildir"
configuration.

As you can imagine, this is not exactly the best workload for GFS: we
have a lot (thousands) of very small files (emails) in a great many
directories. I'm trying to tune things to reach the best performance.

I found that tuning the drop_count parameter in
/proc/cluster/lock_dlm/drop_count, setting it to a very large value (it
was 500000 and now, after a memory upgrade, I've set it to 1500000), uses
a lot of memory (about 10 GB out of the 16 that I've installed in every
machine) and seems to "boost" performance, limiting the iowait CPU usage.

The bad thing is that when I umount a filesystem, it must clean up all
those locks (I think), and sometimes it causes problems for the whole
cluster, with the other nodes stopping writes to the filesystem while I'm
umounting on one node only. Is this normal? How can I tune this to clean
memory faster when I umount the FS? I've read something about setting
more gfs_glockd daemons per fs with the num_glockd mount option, but it
seems to be rather deprecated because it shouldn't be necessary...

--
Claudio Tassini

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

--
Claudio Tassini
-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 4879 bytes
Desc: not available
URL:
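For readers trying to reproduce this kind of tuning, a rough sketch of how
these knobs are usually inspected and set on a RHEL4-era GFS node; the /gfs
mount point and the values are illustrative assumptions, not the ones used
in the thread above:

-------
# Per-mount statistics, including how many glocks the node currently holds
gfs_tool counters /gfs

# Current tunables for the mount (glock_purge defaults to 0 = disabled)
gfs_tool gettune /gfs

# Trim roughly 50% of unused glocks on each scan; as noted above, this has
# to be reissued after every mount (hence the /etc/rc.local approach)
gfs_tool settune /gfs glock_purge 50

# The lock_dlm drop-lock tunables this thread is named after
cat /proc/cluster/lock_dlm/drop_count /proc/cluster/lock_dlm/drop_period
echo 500000 > /proc/cluster/lock_dlm/drop_count
-------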
From orkcu at yahoo.com Wed Sep 12 19:50:27 2007
From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=)
Date: Wed, 12 Sep 2007 12:50:27 -0700 (PDT)
Subject: [Linux-cluster] RHEL4.5, GFS and selinux, are they playing nice?
In-Reply-To: <724236.51256.qm@web50608.mail.re2.yahoo.com>
Message-ID: <141914.88451.qm@web50602.mail.re2.yahoo.com>

--- Roger Peña wrote:

> Hello everybody ;-)
>
> I keep working on making a web cluster play nice after the upgrade
> from RHEL4.4 -> RHEL4.5. With this upgrade, the httpd-selinux
> relationship became more strict.
[bla bla bla]
> So now I have xattr support in our GFS filesystems, but here is the
> problem: httpd does not want to start because some config files
> (which reside in another GFS filesystem) have a forbidden context
> (httpd cannot read files with that context); those files are included
> from the main apache configuration.
> Here is the error from selinux:
> { search } for pid=2289 comm="httpd" name="/" dev=dm-7 ino=25
> scontext=root:system_r:httpd_t tcontext=system_u:object_r:nfs_t
> tclass=dir
[bla bla bla]
> But that directory is /opt/soft:
> ll -di /opt/soft/
> 25 drwxr-xr-x  8 root root 3864 Sep 11  2007 /opt/soft/
> ^^ <--- this is the inode
>
> and its context is system_u:object_r:httpd_config_t:
> ll -dZ /opt/soft/
> drwxr-xr-x  root root  system_u:object_r:httpd_config_t  /opt/soft/
>
> So, who is wrong? ls -Z or the "global selinux kernel module"?
> Because ls -Z shows that the context of that directory is
> system_u:object_r:httpd_config_t.
[lots of bla bla]
> Is this related to the fact that the selinux policy states this:
> genfscon gfs / system_u:object_r:nfs_t

Should I follow what is stated for reiserfs in this url:
http://james-morris.livejournal.com/3580.html
?

If I should do it, because it is the right thing to do, why:
1- redhat did not do it for the release of 4.5 ?
2- others aren't getting this kind of problem?
Am I the only one with GFS-selinux problems?

cu
roger

__________________________________________
RedHat Certified ( RHCE )
Cisco Certified ( CCNA & CCDA )

____________________________________________________________________________________
Yahoo! oneSearch: Finally, mobile search that gives answers, not web links.
http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC

From rohara at redhat.com Wed Sep 12 20:26:00 2007
From: rohara at redhat.com (Ryan O'Hara)
Date: Wed, 12 Sep 2007 15:26:00 -0500
Subject: [Linux-cluster] RHEL4.5, GFS and selinux, are they playing nice?
In-Reply-To: <141914.88451.qm@web50602.mail.re2.yahoo.com>
References: <141914.88451.qm@web50602.mail.re2.yahoo.com>
Message-ID: <46E84B58.7060209@redhat.com>

Roger Peña wrote:

>> Is this related to the fact that the selinux policy states this:
>> genfscon gfs / system_u:object_r:nfs_t

Yes. This is what would be used for a filesystem that does not support
selinux xattrs. In RHEL4.5, SELinux xattr support was added to GFS.
However...

> Should I follow what is stated for reiserfs in this url:
> http://james-morris.livejournal.com/3580.html

Yes. GFS needs to be defined as a filesystem that supports selinux
xattrs.

> If I should do it, because it is the right thing to do, why:
> 1- redhat did not do it for the release of 4.5 ?

The reason that the selinux policy was not updated for RHEL4.5 (in
regards to selinux xattr support for GFS) is described in BZ 215559,
comment #3:

"Changing this on the installed environment could have unexpected
results.
For example, currently all files on gfs are unlabeled and treated as
nfs_t. If I suddenly make this change, these files would then be treated
as file_t and any domain that was using them would become unable to .
This would require a relabel to fix. And could cause hundreds of AVC
messages. I do not feel this is worth it since almost everyone will not
use the labels on GFS to treat one file differently than another. In the
future, where you might have /usr mounted on a gfs or gfs2 partition,
this would become more valuable."

> 2- others aren't getting this kind of problem?

I'm not sure how many people are using GFS with SELinux enabled. :)

-Ryan

From jwilson at transolutions.net Wed Sep 12 20:33:21 2007
From: jwilson at transolutions.net (James Wilson)
Date: Wed, 12 Sep 2007 15:33:21 -0500
Subject: [Linux-cluster] Cluster not starting backup after reboot
In-Reply-To: <20070912185421.GM7563@redhat.com>
References: <1381753941-1189461572-cardhu_decombobulator_blackberry.rim.net-1440139959-@bxe019.bisx.prod.on.blackberry>
	<46E5C2BD.7090705@transolutions.net>
	<20070912185421.GM7563@redhat.com>
Message-ID: <46E84D11.80305@transolutions.net>

Thanks for the replies. I have decided to have the dom0's in one cluster
and the domU's in another. I import the storage into the xen instances as
raw storage and configure gfs from within the domU, and it is working
fine now. The only thing is that when I test failover, the IP does not
move over. When I checked the service it was still assigned to the
instance that got fenced. Any ideas?

Lon Hohberger wrote:
> On Mon, Sep 10, 2007 at 05:18:37PM -0500, James Wilson wrote:
>
>> When I remove the xen domU's from the configuration everything comes up
>> fine. Should the domU's be a part of their own cluster? But then I
>> wouldn't be able to mount gfs from the dom0, right?
>
> Yes, I wouldn't mix physical and virtual nodes in the same cluster.
> *that* introduces ugly quorum problems :)

From orkcu at yahoo.com Wed Sep 12 20:37:03 2007
From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=)
Date: Wed, 12 Sep 2007 13:37:03 -0700 (PDT)
Subject: [Linux-cluster] RHEL4.5, GFS and selinux, are they playing nice?
In-Reply-To: <46E84B58.7060209@redhat.com>
Message-ID: <231325.27769.qm@web50607.mail.re2.yahoo.com>

--- Ryan O'Hara wrote:

> Roger Peña wrote:
>
> >> Is this related to the fact that the selinux policy states this:
> >> genfscon gfs / system_u:object_r:nfs_t
>
> Yes. This is what would be used for a filesystem that does not support
> selinux xattrs. In RHEL4.5, SELinux xattr support was added to GFS.
> However...
>
> > Should I follow what is stated for reiserfs in this url:
> > http://james-morris.livejournal.com/3580.html
>
> Yes. GFS needs to be defined as a filesystem that supports selinux
> xattrs.
>
> > If I should do it, because it is the right thing to do, why:
> > 1- redhat did not do it for the release of 4.5 ?
>
> The reason that the selinux policy was not updated for RHEL4.5 (in
> regards to selinux xattr support for GFS) is described in BZ 215559,
> comment #3:
>
> "Changing this on the installed environment could have unexpected
> results. For example, currently all files on gfs are unlabeled and
> treated as nfs_t. If I suddenly make this change, these files would
> then be treated as file_t and any domain that was using them would
> become unable to . This would require a relabel to fix. And could
> cause hundreds of AVC messages. I do not feel this is worth it since
> almost everyone will not use the labels on GFS to treat one file
> differently than another. In the future, where you might have /usr
> mounted on a gfs or gfs2 partition, this would become more valuable."

Thanks a lot. I had spent a few days looking on the net but never looked
in bugzilla :-( jejejeje

> > 2- others aren't getting this kind of problem?
>
> I'm not sure how many people are using GFS with SELinux enabled. :)

I was forced to by httpd: it complained about not being able to open
configuration files and DocumentRoots...

OK, I will try to follow what is stated in the webpage and relabel the
system, but only after I study a little bit more about selinux :-)

Thanks again
roger

__________________________________________
RedHat Certified ( RHCE )
Cisco Certified ( CCNA & CCDA )

____________________________________________________________________________________
Fussy? Opinionated? Impossible to please? Perfect. Join Yahoo!'s user
panel and lay it on us.
http://surveylink.yahoo.com/gmrs/yahoo_panel_invite.asp?a=7
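To make the policy change discussed in the messages above concrete, a short
sketch of the difference; the statements follow RHEL4-era SELinux policy
syntax and the exact types are assumptions, so treat this as an outline
rather than a drop-in fix:

-------
# What the shipped targeted policy does today: one genfs context covers
# every inode on a gfs mount, so per-file xattr labels are ignored:
#   genfscon gfs / system_u:object_r:nfs_t
#
# What "defined as a filesystem that supports selinux xattrs" would look
# like instead (the same approach the blog post above takes for reiserfs):
#   fs_use_xattr gfs system_u:object_r:fs_t;
#
# After rebuilding and loading a policy with that change, the contexts
# that ls -Z already reports should be the ones the kernel enforces;
# files that were never labelled would show up as file_t and need a
# relabel, for example:
ls -dZ /opt/soft
restorecon -R -v /opt/soft   # or chcon, if no file_contexts entry matches
-------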
From teigland at redhat.com Wed Sep 12 20:45:25 2007
From: teigland at redhat.com (David Teigland)
Date: Wed, 12 Sep 2007 15:45:25 -0500
Subject: [Linux-cluster] GFS profiling result
In-Reply-To: <200709061058.40741.hlawatschek@atix.de>
References: <200709061058.40741.hlawatschek@atix.de>
Message-ID: <20070912204525.GE5634@redhat.com>

On Thu, Sep 06, 2007 at 10:58:40AM +0200, Mark Hlawatschek wrote:
> Hi,
>
> during a performance analysis and tuning session, I did some profiling
> with oprofile on GFS and dlm. I got some weird results ...
>
> The installed software is:
> RHEL4u5, kernel 2.6.9-55.0.2.ELsmp
> GFS: 2.6.9-72.2.0.2
> DLM: 2.6.9-46.16.0.1
>
> The configuration includes 2 cluster nodes.
>
> I put the following load on one cluster node:
>
> 100 processes are doing in parallel:
> - create 1000 files with 100kb size each (i.e. altogether we have
>   100,000 files)
> - flock 1000 files
> - unlink 1000 files.
>
> The following oprofile output shows that the system spends about 49%
> (75% * 65%) of the time in gfs_unlinked_get. Looking into the code we
> can see that this is related to unlinked.c:
> 53 9394211 58.7081 :   ul = list_entry(tmp, struct gfs_unlinked, ul_list);
>
> It can also be observed that dlm spends more than 50% of its time
> searching for hashes...
>
> Is this the expected behaviour or can this be tuned somewhere?

Thanks for doing this, it's very interesting.

For the dlm search_hashchain, could you try changing rsbtbl_size to 1024
(the default is 256).

  echo 1024 > /proc/.../rsbtbl_size

after loading the dlm module, but before the lockspace is created.

For gfs, I haven't looked very closely, but the linked list could
probably be simply turned into a hash table. We'd want to study it more
closely to make sure that the long non-hashed list is really the right
thing to fix (i.e. we don't want to just fix a symptom of something
else).

Dave

From david.costakos at gmail.com Wed Sep 12 21:03:38 2007
From: david.costakos at gmail.com (Dave Costakos)
Date: Wed, 12 Sep 2007 14:03:38 -0700
Subject: [Linux-cluster] RE: qdisk votes not in cman
In-Reply-To: <20070912185249.GL7563@redhat.com>
References: <30E8283B-B35E-4DE2-A8B6-9D59ED51C3E8@equation.fr>
	<20070904211323.GI19477@redhat.com>
	<20070912185249.GL7563@redhat.com>
Message-ID: <6b6836c60709121403x53061da6r2a061627e0cd388c@mail.gmail.com>

For my part, I'd at least like to see an error message logged. Would've
saved us all some time here.
On 9/12/07, Lon Hohberger wrote:
>
> On Wed, Sep 12, 2007 at 07:05:43AM +0200, Alain Richard wrote:
> >
> > Le 4 sept. 07 à 23:13, Lon Hohberger a écrit :
> >
> > >On Fri, Aug 31, 2007 at 12:46:50PM +0200, Alain RICHARD wrote:
> > >>Perhaps better error reporting is needed in qdiskd to show that we
> > >>have hit this problem. Also, using a generic name like "qdisk device"
> > >>when qdiskd is registering its node to cman is a better approach.
> > >
> > >What about using the label instead of the device name, and restricting
> > >the label to 16 chars when advertising to cman?
> >
> > Because when using multipath devices (for example a two-path device),
> > all the paths and the multipath device are recognized as having the
> > same label, so qdisk fails to pick the right device (the multipath
> > device).
>
> I meant implementation-wise, using the label instead of the device name
> to solve or work around the 16 character limit when talking to CMAN...
>
> --
> Lon Hohberger - Software Engineer - Red Hat, Inc.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

--
Dave Costakos
mailto:david.costakos at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Timothy.Ward at itt.com Wed Sep 12 22:03:20 2007
From: Timothy.Ward at itt.com (Ward, Timothy - SSD)
Date: Wed, 12 Sep 2007 18:03:20 -0400
Subject: [Linux-cluster] Cluster NFS causes kernel bug
In-Reply-To: <77E700AE7021314DB6CDF6D6E8F661320396FC24@ACDFWMAIL1.acd.de.ittind.com>
Message-ID: <77E700AE7021314DB6CDF6D6E8F661320396FC25@ACDFWMAIL1.acd.de.ittind.com>

I have successfully set up apache and samba as cluster services. I am now
trying to set up NFS, but I am encountering a kernel bug. Any ideas where
I should start looking to fix this?

Thanks,
Tim

System
------
node1# uname -a
Linux node1.cluster.com 2.6.18-1.2798.fc6 #1 SMP Mon Oct 16 14:39:22 EDT
2006 x86_64 x86_64 x86_64 GNU/Linux

FC6 64bit RPMs
--------------
rpm -ivh fc6_rpm/openais-0.80.1-3.x86_64.rpm
rpm -ivh fc6_rpm/perl-Net-Telnet-3.03-5.noarch.rpm
rpm -ivh fc6_rpm_more/xen-libs-3.0.3-9.fc6.x86_64.rpm
rpm -ivh fc6_rpm_more/bridge-utils-1.1-2.x86_64.rpm
rpm -ivh --nodeps fc6_rpm_more/libvirt-0.2.3-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_more/libvirt-python-0.2.3-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_more/python-virtinst-0.95.0-1.fc6.noarch.rpm
rpm -ivh fc6_rpm_more/xen-3.0.3-9.fc6.x86_64.rpm
rpm -ivh fc6_rpm_updates/cman-2.0.60-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_updates/gfs2-utils-0.1.25-1.fc6.x86_64.rpm
rpm -ivh --force fc6_rpm_updates/device-mapper-1.02.13-1.fc6.x86_64.rpm
rpm -ivh --force fc6_rpm_updates/lvm2-2.02.17-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm_updates/lvm2-cluster-2.02.17-1.fc6.x86_64.rpm
rpm -ivh fc6_rpm/rgmanager-2.0.8-1.fc6.x86_64.rpm

Luci
rpm -ivh conga/python-imaging-1.1.6-3.fc6.x86_64.rpm
rpm -ivh conga/zope-2.9.7-2.fc6.x86_64.rpm
rpm -ivh conga/plone-2.5.3-1.fc6.x86_64.rpm
rpm -ivh conga/luci-0.9.3-2.fc6.x86_64.rpm

Ricci
rpm -ivh --nodeps conga/oddjob-libs-0.27-8.x86_64.rpm
rpm -ivh conga/oddjob-0.27-8.x86_64.rpm
rpm -ivh conga/modcluster-0.9.3-2.fc6.x86_64.rpm
rpm -ivh conga/ricci-0.9.3-2.fc6.x86_64.rpm

/etc/cluster/cluster.conf
-------------------------