From sco at adviseo.fr Sat Apr 1 21:06:32 2006 From: sco at adviseo.fr (Sylvain Coutant) Date: Sat, 1 Apr 2006 23:06:32 +0200 Subject: [Linux-cluster] gnbd server & cache Message-ID: <003001c655d0$2706e680$6300000a@ELTON>
Hi, Could someone help me understand why gnbd server does not support non-caching exports when not coupled with the cluster suite ? I wonder what's the link between both ... BR, -- Sylvain COUTANT ADVISEO http://www.adviseo.fr/ http://www.open-sp.fr/
From halomoan at powere2e.com Sun Apr 2 04:58:35 2006 From: halomoan at powere2e.com (Halomoan ) Date: Sun, 2 Apr 2006 12:58:35 +0800 Subject: [Linux-cluster] GFS is for what and how it works ? Message-ID: <200604021258.AA403309094@mail.powere2e.com>
Sorry, I'm newbie in GFS. Followed Redhat's GFS documentation To find out how GFS works, I have 2 nodes (node A and node B) for GFS and 1 node (node C) for GNBD server. It runs with no error but i don't know how to use it (GFS) I attached my /etc/cluster/cluster.conf below. My question is:
1. At a time, how many nodes have GFS filesystem mounted ? Where is the cluster's work in GFS ?
2. How do I shared the GFS filesystem to other server ? Do I need other software ?
3. From this configuration, if node A failed, what happen to the GFS filesystem ? failover to node B ? How about with the other server that is using the GFS filesystem in node A ?
4. Could you give me example what is actually the GFS real usage in real live ?
I'm absolutely confuse with this GFS on how they works. Thanks Regards, Halomoan --------------------- Cluster.conf ------------------------ Sent via the KillerWebMail system at mail.powere2e.com
From pcaulfie at redhat.com Mon Apr 3 09:04:11 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 03 Apr 2006 10:04:11 +0100 Subject: [Linux-cluster] standard mechanism to communicate between cluster nodes from kernel In-Reply-To: References: Message-ID: <4430E50B.9020104@redhat.com>
Aneesh Kumar wrote: > Hi all, > > I was trying to understand whether there is a standard set of API we > are working on for communicating between different nodes in a cluster > inside kernel. I looked at ocfs2 and the ocfs2 dlm code base seems to > use tcp via o2net_send_tcp_msg and the redhat dlm seems to sctp. There > is also tipc (net/tipc) code in the kernel now ( I am not sure about > the details of tipc). This confuses me a lot. If i want to use all > these cluster components what is the standard way. I am right now > looking at clusterproc > (http://www.openssi.org/cgi-bin/view?page=proc-hooks.html ) and > wondering what should be the communication mechanism. clusterproc was > earlier based on CI which provided a simple easy way to define > different cluster services( more or less like rpcgen style > http://ci-linux.sourceforge.net/ics.shtml ). Does we are looking for a > framework like that ? > > NOTE: I am not trying to find out which one is the best. I am trying > to find out if there is a standard way of doing this >
I'll repeat the reply I sent you when you asked me this via private email, just for the record... I think you've answered your own question. Each cluster manager has its own way of communicating between nodes. As for which is best, that depends on what you mean by "best". There are lots of variables in cluster comms. Do you want speed? reliability? predictability? ordering?"
-- patrick From thaidn at gmail.com Mon Apr 3 10:30:16 2006 From: thaidn at gmail.com (Thai Duong) Date: Mon, 3 Apr 2006 17:30:16 +0700 Subject: [Linux-cluster] Manual fencing doest work Message-ID: Hi all, I have a 2 node GFS 6.1 cluster with the following configuration: It turns out that manual fencing doest work as expected. When I force power down a node, the other could not fence it and worse, the whole GFS file system is freeze waiting for the downed node to be up again. I got something like below in kernel log Apr 2 16:46:28 fcc1 fenced[3444]: fencing node "fcc4" Apr 2 16:46:28 fcc1 fenced[3444]: fence "fcc4" failed Some information about GFS and kernel: [root at fcc1 ~]# rpm -qa | grep GFS GFS-6.1.3-0 GFS-kernel-2.6.9-45.0.2 [root at fcc1 ~]# uname -a Linux fcc1 2.6.9-22.0.2.EL #1 SMP Thu Jan 5 17:04:58 EST 2006 ia64 ia64 ia64 GNU/Linux Please help. TIA, Thai Duong. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sunjw at onewaveinc.com Mon Apr 3 09:51:36 2006 From: sunjw at onewaveinc.com (=?GB2312?B?y++/oc6w?=) Date: Mon, 3 Apr 2006 17:51:36 +0800 Subject: [Linux-cluster] kernel panic about lock_dlm Message-ID: Hi, everyone I use kernel 2.6.15-rc7 and the latest STABLE cvs branch of GFS when the newest kernel is 2.6.15-rc7? I've started a GFS cluster with 4 nodes, but after about 4 days, the cluster did not work.I found the /var/log/messages as follows: <-- Mar 28 15:31:29 nd05 kernel: d 1 locks Mar 28 15:31:29 nd05 kernel: gfs-sda1 update remastered resources Mar 28 15:31:29 nd05 kernel: gfs-sda1 updated 0 resources Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuild locks Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuilt 0 locks Mar 28 15:31:29 nd05 kernel: gfs-sda1 recover event 11 done Mar 28 15:31:29 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 8,11,11 Mar 28 15:31:29 nd05 kernel: gfs-sda1 process held requests Mar 28 15:31:29 nd05 kernel: gfs-sda1 processed 0 requests Mar 28 15:31:29 nd05 kernel: gfs-sda1 resend marked requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 11 finished Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 1,0,0 ids 11,11,11 Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,1,0 ids 11,14,11 Mar 28 15:31:30 nd05 kernel: gfs-sda1 move use event 14 Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 Mar 28 15:31:30 nd05 kernel: gfs-sda1 add node 2 Mar 28 15:31:30 nd05 kernel: gfs-sda1 total nodes 4 Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuild resource directory Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuilt 1552 resources Mar 28 15:31:30 nd05 kernel: gfs-sda1 purge requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 purged 0 requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 mark waiting requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 marked 0 requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 done Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 11,14,14 Mar 28 15:31:30 nd05 kernel: gfs-sda1 process held requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 processed 0 requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 resend marked requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 finished Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2 Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id 9190386 state 0 Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2 Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id eab0065 state 0 Mar 28 
15:31:30 nd05 kernel: gfs-sda1 unlock fb040350 no id Mar 28 15:31:30 nd05 kernel: recovery_done jid 3 msg 309 a Mar 28 15:31:30 nd05 kernel: 3961 recovery_done nodeid 4 flg 18 Mar 28 15:31:30 nd05 kernel: 3977 pr_start last_stop 3 last_start 4 last_finish 3 Mar 28 15:31:31 nd05 kernel: 3977 pr_start count 3 type 3 event 4 flags 21a Mar 28 15:31:31 nd05 kernel: 3977 pr_start 4 done 1 Mar 28 15:31:31 nd05 kernel: 3976 pr_finish flags 1a Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13415b4b id 163005c 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13425b42 id 180002f 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13435b39 id 1a00360 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13445b30 id 1760186 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13455b27 id 17a038b 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13465b1e id 15a01a8 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13475b15 id 1910380 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13485b0c id 1880309 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13495b03 id 17001e6 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134a5afa id 1940352 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134b5af1 id 1650349 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134c5ae8 id 167001d 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,134d5adf id 15c0083 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134e5ad6 id 1770155 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134f5acd id 16400cb 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13505ac4 id 1680102 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13515abb id 1920051 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13525ab2 id 1850182 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13535aa9 id 17301cb 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13545aa0 id 17803ed 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13555a97 id 18a0111 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13565a8e id 16d03c5 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13575a85 id 1870026 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13585a7c id 185030b 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13595a73 id 15d0190 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135a5a6a id 14b03f1 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135b5a61 id 177025e 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135c5a58 id 198016f 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135d5a4f id 1640163 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135e5a46 id 1730233 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135f5a3d id 1880130 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13495b03 id 17001e6 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134a5afa id 1940352 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134b5af1 id 1650349 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134c5ae8 id 167001d 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,134d5adf id 15c0083 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134e5ad6 id 1770155 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134f5acd id 16400cb 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13505ac4 id 1680102 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13515abb id 1920051 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13525ab2 id 1850182 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13535aa9 id 17301cb 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13545aa0 id 17803ed 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13555a97 id 18a0111 3,0 Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13565a8e id 16d03c5 3,0 Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13575a85 id 1870026 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13585a7c id 185030b 3,0 Mar 28 
15:31:32 nd05 kernel: 3976 rereq 3,13595a73 id 15d0190 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135a5a6a id 14b03f1 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135b5a61 id 177025e 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135c5a58 id 198016f 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135d5a4f id 1640163 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135e5a46 id 1730233 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135f5a3d id 1880130 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13605a34 id 16f00aa 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13615a2b id 17400e1 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13625a22 id 16b03c1 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13635a19 id 16b03ad 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13645a10 id 17e03d4 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13655a07 id 18202c0 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136659fe id 170036c 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136759f5 id 155031c 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136859ec id 1660212 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136959e3 id 15c0114 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136a59da id 15a038f 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136b59d1 id 17600bb 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136c59c8 id 1a20336 3,0 Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136d59bf id 171003c 3,0 Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136e59b6 id 1500008 3,0 Mar 28 15:31:32 nd05 kernel: 3976 pr_start last_stop 4 last_start 9 last_finish 4 Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 4 type 2 event 9 flags 21a Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,136f59ad id 15e026f 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,137059a4 id 170017e 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1371599b id 16b01e3 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13725992 id 18000a2 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13735989 id 177017c 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13745980 id 16d035a 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13755977 id 18102d6 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1376596e id 1740020 3,0 Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13775965 id 1780207 3,0 Mar 28 15:31:33 nd05 kernel: 3976 pr_start 9 done 1 Mar 28 15:31:33 nd05 kernel: 3976 pr_finish flags 1a Mar 28 15:31:33 nd05 kernel: 3976 pr_start last_stop 9 last_start 10 last_finish 9 Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 3 type 3 event 10 flags 21a Mar 28 15:31:33 nd05 kernel: 3976 pr_start 10 done 1 Mar 28 15:31:33 nd05 kernel: 3977 pr_finish flags 1a Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,370232 id 23a010e 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,380229 id 2630143 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,390220 id 29f0338 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3a0217 id 2850133 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3b020e id 268035b 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3c0205 id 2710344 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3d01fc id 27701f4 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3e01f3 id 28203f7 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3f01ea id 236011f 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4001e1 id 25e0387 3,0 Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4101d8 id 2810157 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4201cf id 248035a 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4301c6 id 24d0297 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4401bd id 2920280 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4501b4 id 267000b 3,0 Mar 28 15:31:34 nd05 
kernel: 3976 rereq 3,4601ab id 263012c 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4701a2 id 2930281 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,480199 id 28e028d 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,490190 id 243031a 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4a0187 id 259000d 3,0 Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4b017e id 2650370 3,0 Mar 28 15:31:35 nd05 kernel: 3976 pr_start last_stop 10 last_start 15 last_finish 10 Mar 28 15:31:35 nd05 kernel: 3976 pr_start count 4 type 2 event 15 flags 21a Mar 28 15:31:35 nd05 kernel: 3976 pr_start 15 done 1 Mar 28 15:31:35 nd05 kernel: 3976 pr_finish flags 1a Mar 28 15:31:35 nd05 kernel: Mar 28 15:31:35 nd05 kernel: lock_dlm: Assertion failed on line 357 of file /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c Mar 28 15:31:35 nd05 kernel: lock_dlm: assertion: "!error" Mar 28 15:31:35 nd05 kernel: lock_dlm: time = 79185725 Mar 28 15:31:35 nd05 kernel: gfs-sda1: error=-22 num=3,133b5b81 lkf=9 flags=84 Mar 28 15:31:35 nd05 kernel: Mar 28 15:31:37 nd05 kernel: ------------[ cut here ]------------ Mar 28 15:31:37 nd05 kernel: kernel BUG at /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c:357! Mar 28 15:31:37 nd05 kernel: invalid operand: 0000 [#1] Mar 28 15:31:37 nd05 kernel: SMP Mar 28 15:31:37 nd05 kernel: Modules linked in: lock_dlm dlm cman gfs lock_harness ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msgha ndler binfmt_misc dm_mirror dm_round_robin dm_multipath dm_mod video thermal processor fan button battery ac uhci_hcd usbcore hw_random shpchp pci_hotplug e1000 bonding qla2300 qla2xxx scsi_transport_fc sd_mod Mar 28 15:31:37 nd05 kernel: CPU: 1 Mar 28 15:31:37 nd05 kernel: EIP: 0060:[] Not tainted VLI Mar 28 15:31:37 nd05 kernel: EFLAGS: 00010282 (2.6.15-rc7smp) Mar 28 15:31:37 nd05 kernel: EIP is at do_dlm_unlock+0x8f/0xa4 [lock_dlm] Mar 28 15:31:37 nd05 kernel: eax: 00000004 ebx: f560c180 ecx: f5cf7f10 edx: f89edf11 Mar 28 15:31:37 nd05 kernel: esi: ffffffea edi: f8a7f000 ebp: f8a61580 esp: f5cf7f0c Mar 28 15:31:37 nd05 kernel: ds: 007b es: 007b ss: 0068 Mar 28 15:31:37 nd05 kernel: Process gfs_glockd (pid: 3979, threadinfo=f5cf6000 task=f6735030) Mar 28 15:31:37 nd05 kernel: Stack: f89edf11 f8a7f000 f55517b0 f89e97f0 f560c180 f8a3c64f f560c180 00000003 Mar 28 15:31:37 nd05 kernel: f55517d4 f8a329d8 f8a7f000 f560c180 00000003 f55517b0 f8a61580 f55517b0 Mar 28 15:31:37 nd05 kernel: f8a7f000 f8a31f28 f55517b0 f55517b0 00000001 f8a31fdc d82c34c0 f55517b0 Mar 28 15:31:37 nd05 kernel: Call Trace: Mar 28 15:31:37 nd05 kernel: [] lm_dlm_unlock+0x19/0x20 [lock_dlm] Mar 28 15:31:37 nd05 kernel: [] gfs_lm_unlock+0x2c/0x43 [gfs] Mar 28 15:31:37 nd05 kernel: [] gfs_glock_drop_th+0xe8/0x122 [gfs] Mar 28 15:31:37 nd05 kernel: [] rq_demote+0x76/0x92 [gfs] Mar 28 15:31:37 nd05 kernel: [] run_queue+0x54/0xb5 [gfs] Mar 28 15:31:37 nd05 kernel: [] unlock_on_glock+0x1d/0x24 [gfs] Mar 28 15:31:37 nd05 kernel: [] gfs_reclaim_glock+0xbd/0x135 [gfs] Mar 28 15:31:37 nd05 kernel: [] gfs_glockd+0x3a/0xe3 [gfs] Mar 28 15:31:37 nd05 kernel: [] default_wake_function+0x0/0x12 Mar 28 15:31:37 nd05 kernel: [] ret_from_fork+0x6/0x14 Mar 28 15:31:37 nd05 kernel: [] default_wake_function+0x0/0x12 Mar 28 15:31:37 nd05 kernel: [] gfs_glockd+0x0/0xe3 [gfs] Mar 28 15:31:37 nd05 kernel: [] kernel_thread_helper+0x5/0xb Mar 28 15:31:37 nd05 kernel: Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 8b 03 ff 70 18 68 09 e0 9e f8 e8 ac 14 73 c7 83 c4 34 68 11 df 9e f8 e8 9f 14 73 c7 <0f> 0b 65 01 58 de 9e f8 68 13 df 9e f8 e8 23 
0d 73 c7 5b 5e c3 --> What problem may be there? Thanks for any reply! Luckey From troels at arvin.dk Mon Apr 3 14:16:55 2006 From: troels at arvin.dk (Troels Arvin) Date: Mon, 03 Apr 2006 16:16:55 +0200 Subject: [Linux-cluster] Using a null modem for heartbeat with CS4? Message-ID: Hello, I would like to have to heartbeat channels between my cluster nodes: A cross-over ethernet cable and a null modem cable. In the manual for Cluster Suite 3 (CS2), it's stated that a null modem cable can be used for heartbeat. The manual for CS4 doesn't mention null modem cables. Isn't it possible to use null modem cables for heartbeat in CS4? -- Greetings from Troels Arvin From libregeek at gmail.com Mon Apr 3 14:20:03 2006 From: libregeek at gmail.com (Manilal K M) Date: Mon, 3 Apr 2006 19:50:03 +0530 Subject: [Linux-cluster] Using a null modem for heartbeat with CS4? In-Reply-To: References: Message-ID: <2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com> On 03/04/06, Troels Arvin wrote: > Hello, > > I would like to have to heartbeat channels between my cluster nodes: A > cross-over ethernet cable and a null modem cable. > > In the manual for Cluster Suite 3 (CS2), it's stated that a null modem > cable can be used for heartbeat. > > The manual for CS4 doesn't mention null modem cables. Isn't it possible to > use null modem cables for heartbeat in CS4? AFAIK, Null modems are not supported in CS4. regards Manilal From Bowie_Bailey at BUC.com Mon Apr 3 14:30:36 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Mon, 3 Apr 2006 10:30:36 -0400 Subject: [Linux-cluster] GFS is for what and how it works ? Message-ID: <4766EEE585A6D311ADF500E018C154E302133870@bnifex.cis.buc.com> Halomoan wrote: > Sorry, I'm newbie in GFS. > > Followed Redhat's GFS documentation > To find out how GFS works, I have 2 nodes (node A and node B) for > GFS and 1 node (node C) for GNBD server. It runs with no error but i > don't know how to use it (GFS) > > I attached my /etc/cluster/cluster.conf below. > > My question is: > > 1. At a time, how many nodes have GFS filesystem mounted ? Where is > the cluster's work in GFS ? You can mount one node for each journal you created when you built the GFS filesystem. What the cluster does is manage access to the GFS filesystem and (attempt to) ensure that if one node starts having problems, it can't corrupt the filesystem. > 2. How do I shared the GFS filesystem to other server ? Do I need > other software ? GFS is simply a filesystem which is capable of being used on multiple nodes at the same time. How you mount it depends on what software or hardware you are using to share the media. GNBD can be used by a server to share it's storage with the other nodes. You can also use iSCSI, aoe, and others to connect each node directly to a separate storage unit. > 3. From this configuration, if node A failed, what happen to the GFS > filesystem ? failover to node B ? How about with the other server > that is using the GFS filesystem in node A ? There is no failover. Everything is always active. As long as the storage itself doesn't fail, the failure of one node should not be a problem. Unless, of course, it causes your cluster to lose quorum (drop below the minimum number of servers necessary to maintain the cluster). > 4. Could you give me example what is actually the GFS real usage in > real live ? I'm using it to share a 1.2 TB storage area between two systems that use it for processing and a third system that has direct access for making backups. 
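(For reference, a minimal sketch of the journal-per-node rule described above -- the cluster name, filesystem name, and device here are illustrative assumptions, not the actual setup being discussed; check gfs_mkfs(8) on your release:

    # one journal for each node that will mount the filesystem (3 in this example)
    gfs_mkfs -p lock_dlm -t mycluster:storage1 -j 3 /dev/vg01/lvol0

    # then, on each of the three nodes:
    mount -t gfs /dev/vg01/lvol0 /mnt/storage1

A node added later needs its own journal, which gfs_jadd can add to an existing filesystem.)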
> I'm absolutely confuse with this GFS on how they works. Yea. The documentation is not very extensive at this point. -- Bowie From JACOB_LIBERMAN at Dell.com Mon Apr 3 20:16:17 2006 From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com) Date: Mon, 3 Apr 2006 15:16:17 -0500 Subject: [Linux-cluster] Order of execution Message-ID: Hi cluster geniuses, I have a quick question. I am trying to write a custom startup script for an application called adsi rms. The application comes with its own startup script that requires the disk resource and network interface. Here is my question: When I create a custom startup script for the service and place it in /etc/init.d/, the cluster service can start the application successfully but not all services come online because the shared disk and IP do not appear to be available when the service starts. Is there a way to set the order of execution for a service so that the application will not start until AFTER the disk and network interface are available? Thanks again, Jacob From eric at bootseg.com Mon Apr 3 20:26:44 2006 From: eric at bootseg.com (Eric Kerin) Date: Mon, 03 Apr 2006 16:26:44 -0400 Subject: [Linux-cluster] Order of execution In-Reply-To: References: Message-ID: <1144096004.4004.14.camel@auh5-0479.corp.jabil.org> Jacob, The start/stop orders are defined in /usr/share/cluster/service.sh look under the special tag, there should be a child tag for each type of child node of service. Mine looks like so (current rgmanager rpm from RHN): For starting, fs should start first, then clusterfs, etc... finally smb and script start. For stopping, script would be stopped first, then ip, etc... finally fs. Thanks, Eric Kerin eric at bootseg.com On Mon, 2006-04-03 at 15:16 -0500, JACOB_LIBERMAN at Dell.com wrote: > Hi cluster geniuses, > > I have a quick question. > > I am trying to write a custom startup script for an application called > adsi rms. The application comes with its own startup script that > requires the disk resource and network interface. Here is my question: > > When I create a custom startup script for the service and place it in > /etc/init.d/, the cluster service can start the application successfully > but not all services come online because the shared disk and IP do not > appear to be available when the service starts. > > Is there a way to set the order of execution for a service so that the > application will not start until AFTER the disk and network interface > are available? > > Thanks again, Jacob > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jbrassow at redhat.com Mon Apr 3 22:37:56 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Mon, 3 Apr 2006 17:37:56 -0500 Subject: [Linux-cluster] Manual fencing doest work In-Reply-To: References: Message-ID: <6475746f533faa0d27117afbbcf54e7f@redhat.com> Fence manual setup simply waits until either 1) the user reboots the failed node _and_ uses fence_ack_manaul to notify the node asking for the fence that you have done so. or 2) the node that "failed" comes back up In the steps you described, you never acknowledged the request for fencing - hence, you have to wait for the machine to come back up. brassow BTW, i'd never use manual fencing in production. On Apr 3, 2006, at 5:30 AM, Thai Duong wrote: > Hi all, > > I have a 2 node GFS 6.1 cluster with the following configuration: > > > > > ??? > ??? > > ??? > ????? > ?????? > ??????? > ???????? > ??????? > ?????? > ????? > > ????? > ?????? 
> ??????? > ???????? > ??????? > ?????? > ????? > ?? > > ? > ?? > ? > > ? > > It turns out that manual fencing doest work as expected. When I force > power down a node, the other could not fence it and worse, the whole > GFS file system is freeze waiting for the downed node to be up again. > I got something like below in kernel log > > Apr? 2 16:46:28 fcc1 fenced[3444]: fencing node "fcc4" > Apr? 2 16:46:28 fcc1 fenced[3444]: fence "fcc4" failed > > Some information about GFS and kernel: > > [root at fcc1 ~]# rpm -qa | grep GFS > GFS-6.1.3-0 > GFS-kernel-2.6.9-45.0.2 > > [root at fcc1 ~]# uname -a > Linux fcc1 2.6.9-22.0.2.EL #1 SMP Thu Jan 5 17:04:58 EST 2006 ia64 > ia64 ia64 GNU/Linux > > Please help. > > TIA, > > Thai Duong. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From teigland at redhat.com Tue Apr 4 03:08:53 2006 From: teigland at redhat.com (David Teigland) Date: Mon, 3 Apr 2006 22:08:53 -0500 Subject: [Linux-cluster] Manual fencing doest work In-Reply-To: References: Message-ID: <20060404030853.GA12817@redhat.com> On Mon, Apr 03, 2006 at 05:30:16PM +0700, Thai Duong wrote: > > > Try "fencedevices" and "fencedevice". Dave From halomoan at powere2e.com Tue Apr 4 06:11:18 2006 From: halomoan at powere2e.com (Halomoan Chow) Date: Tue, 4 Apr 2006 14:11:18 +0800 Subject: [Linux-cluster] GFS is for what and how it works ? In-Reply-To: <4766EEE585A6D311ADF500E018C154E302133870@bnifex.cis.buc.com> Message-ID: <001c01c657ae$9d9595f0$100fcc0a@pc002> Thank you Bowie You gave me a little light in GFS jungle :D Regards, Halomoan -----Original Message----- From: Bowie Bailey [mailto:Bowie_Bailey at BUC.com] Sent: Monday, April 03, 2006 10:31 PM To: halomoan at powere2e.com Cc: linux clustering Subject: RE: [Linux-cluster] GFS is for what and how it works ? Halomoan wrote: > Sorry, I'm newbie in GFS. > > Followed Redhat's GFS documentation > To find out how GFS works, I have 2 nodes (node A and node B) for > GFS and 1 node (node C) for GNBD server. It runs with no error but i > don't know how to use it (GFS) > > I attached my /etc/cluster/cluster.conf below. > > My question is: > > 1. At a time, how many nodes have GFS filesystem mounted ? Where is > the cluster's work in GFS ? You can mount one node for each journal you created when you built the GFS filesystem. What the cluster does is manage access to the GFS filesystem and (attempt to) ensure that if one node starts having problems, it can't corrupt the filesystem. > 2. How do I shared the GFS filesystem to other server ? Do I need > other software ? GFS is simply a filesystem which is capable of being used on multiple nodes at the same time. How you mount it depends on what software or hardware you are using to share the media. GNBD can be used by a server to share it's storage with the other nodes. You can also use iSCSI, aoe, and others to connect each node directly to a separate storage unit. > 3. From this configuration, if node A failed, what happen to the GFS > filesystem ? failover to node B ? How about with the other server > that is using the GFS filesystem in node A ? There is no failover. Everything is always active. As long as the storage itself doesn't fail, the failure of one node should not be a problem. Unless, of course, it causes your cluster to lose quorum (drop below the minimum number of servers necessary to maintain the cluster). > 4. Could you give me example what is actually the GFS real usage in > real live ? 
I'm using it to share a 1.2 TB storage area between two systems that use it for processing and a third system that has direct access for making backups. > I'm absolutely confuse with this GFS on how they works. Yea. The documentation is not very extensive at this point. -- Bowie From JACOB_LIBERMAN at Dell.com Tue Apr 4 12:55:44 2006 From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com) Date: Tue, 4 Apr 2006 07:55:44 -0500 Subject: [Linux-cluster] Order of execution Message-ID: Eric, I am running RHEL3 U4 with clumanager 1.2.22. I do not have the options listed below. Does anyone have an example script for this version? Lon? Thanks, Jacob > -----Original Message----- > From: Eric Kerin [mailto:eric at bootseg.com] > Sent: Monday, April 03, 2006 3:27 PM > To: Liberman, Jacob > Cc: linux clustering > Subject: Re: [Linux-cluster] Order of execution > > Jacob, > > The start/stop orders are defined in > /usr/share/cluster/service.sh look under the special tag, > there should be a child tag for each type of child node of service. > > Mine looks like so (current rgmanager rpm from RHN): > > > > > > > > > > > > > For starting, fs should start first, then clusterfs, etc... > finally smb and script start. > > For stopping, script would be stopped first, then ip, etc... > finally fs. > > Thanks, > Eric Kerin > eric at bootseg.com > > > On Mon, 2006-04-03 at 15:16 -0500, JACOB_LIBERMAN at Dell.com wrote: > > Hi cluster geniuses, > > > > I have a quick question. > > > > I am trying to write a custom startup script for an > application called > > adsi rms. The application comes with its own startup script that > > requires the disk resource and network interface. Here is > my question: > > > > When I create a custom startup script for the service and > place it in > > /etc/init.d/, the cluster service can start the application > successfully > > but not all services come online because the shared disk > and IP do not > > appear to be available when the service starts. > > > > Is there a way to set the order of execution for a service > so that the > > application will not start until AFTER the disk and network > interface > > are available? > > > > Thanks again, Jacob > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > From pcaulfie at redhat.com Tue Apr 4 13:40:52 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 04 Apr 2006 14:40:52 +0100 Subject: [Linux-cluster] Using a null modem for heartbeat with CS4? In-Reply-To: <2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com> References: <2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com> Message-ID: <44327764.4080108@redhat.com> Manilal K M wrote: > On 03/04/06, Troels Arvin wrote: >> Hello, >> >> I would like to have to heartbeat channels between my cluster nodes: A >> cross-over ethernet cable and a null modem cable. >> >> In the manual for Cluster Suite 3 (CS2), it's stated that a null modem >> cable can be used for heartbeat. >> >> The manual for CS4 doesn't mention null modem cables. Isn't it possible to >> use null modem cables for heartbeat in CS4? > AFAIK, Null modems are not supported in CS4. > If you're really desperate you could set up a serial PPP link between the two machines and do the IP heartbeat over that. 
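(A rough sketch of that serial PPP link, assuming a null modem on ttyS0 and made-up point-to-point addresses -- see pppd(8) for the exact options shipped with your distribution:

    # on the first node:
    pppd /dev/ttyS0 115200 10.0.0.1:10.0.0.2 local noauth persist

    # on the second node, with the addresses reversed:
    pppd /dev/ttyS0 115200 10.0.0.2:10.0.0.1 local noauth persist

"local" tells pppd not to wait for modem control lines, which is what you want on a null modem cable; the resulting ppp interface then carries the IP heartbeat.)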
Don't tell anyone I said that though ;-) -- patrick From Alain.Moulle at bull.net Wed Apr 5 08:51:33 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Wed, 05 Apr 2006 10:51:33 +0200 Subject: [Linux-cluster] CS4 Update2 / cman systematically FAILED on service stop Message-ID: <44338515.10200@bull.net> Hi I have a systematic problem with cman stop on my configuration : knowing that there is no service with autostart in the cluster.conf, and that I have only one main service to be started by : clusvcadm -e SERVICE -m First test : launch CS4 OK stop CS4 OK no problem Second test : launch CS4 clusvcadm -e SERVICE -m then clusvcadm -d SERVICE stop CS4 ... in this case, cman stop is systematically FAILED ... This is true if both cases where CS4 is started on peer node as well as where is it stopped. Any clue or track to identify the problem ? Thanks Alain Moull? From ben.yarwood at juno.co.uk Wed Apr 5 11:51:31 2006 From: ben.yarwood at juno.co.uk (Ben Yarwood) Date: Wed, 5 Apr 2006 12:51:31 +0100 Subject: [Linux-cluster] Monitoring Cluster Services Message-ID: <089401c658a7$481d72b0$3964a8c0@WS076> I have set up a monitoring tool to check that all the appropriate processes are running on our cluster nodes. I am currently checking for the following: ccsd , 1 instance cman_comms, 1 instance cman_memb , 1 instance cman_serviced, 1 instance cman_hbeat, 1 instance fenced, 1 instance clvmd, 1 instance gfs_inoded, 1 instance for each gfs mount clurgmgrd, 1 instance Can anyone tell me if this is a correct and exhaustive list. Regards Ben From ilya at cs.msu.su Wed Apr 5 15:27:57 2006 From: ilya at cs.msu.su (Ilya M. Slepnev) Date: Wed, 05 Apr 2006 19:27:57 +0400 Subject: [Linux-cluster] Problems with compilation. Message-ID: <1144250877.8183.19.camel@localhost.localdomain> Hi, I'm sorry for inconvenience, did anybody faced such problem with configuring cluster-suite? It writes, that there is no directory named "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is there... Am I doing something wrong? Is there some FAQ about that? Thanks, Ilya... khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make cd dlm-kernel && make make[1]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' cd src2 && make all make[2]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 modules USING_KBUILD=yes make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No such file or directory. Stop. make: Entering an unknown directorymake: Leaving an unknown directorymake[2]: *** [all] Error 2 make[2]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' make[1]: *** [all] Error 2 make[1]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' make: *** [all] Error 2 khext at hess:~/nigma/ext3/gfs/cvs/cluster$ From jbrassow at redhat.com Wed Apr 5 15:40:45 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Wed, 5 Apr 2006 10:40:45 -0500 Subject: [Linux-cluster] Problems with compilation. In-Reply-To: <1144250877.8183.19.camel@localhost.localdomain> References: <1144250877.8183.19.camel@localhost.localdomain> Message-ID: <6e718842c9112d2f91e40fc31e3b29b9@redhat.com> might want to skip the 'make' by itself... try: dir/cluster> make clean; make distclean dir/cluster> ./configure --kernel_src= dir/cluster> make install brassow On Apr 5, 2006, at 10:27 AM, Ilya M. 
Slepnev wrote: > Hi, > > I'm sorry for inconvenience, did anybody faced such problem with > configuring cluster-suite? It writes, that there is no directory named > "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is > there... Am I doing something wrong? Is there some FAQ about that? > > Thanks, Ilya... > > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make > cd dlm-kernel && make > make[1]: Entering directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' > cd src2 && make all > make[2]: Entering directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' > make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 > modules USING_KBUILD=yes > make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No > such file or directory. Stop. > make: Entering an unknown directorymake: Leaving an unknown > directorymake[2]: *** [all] Error 2 > make[2]: Leaving directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' > make[1]: *** [all] Error 2 > make[1]: Leaving directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' > make: *** [all] Error 2 > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ilya at cs.msu.su Wed Apr 5 16:16:25 2006 From: ilya at cs.msu.su (Ilya M. Slepnev) Date: Wed, 05 Apr 2006 20:16:25 +0400 Subject: [Linux-cluster] Problems with compilation. In-Reply-To: <6e718842c9112d2f91e40fc31e3b29b9@redhat.com> References: <1144250877.8183.19.camel@localhost.localdomain> <6e718842c9112d2f91e40fc31e3b29b9@redhat.com> Message-ID: <1144253785.8185.27.camel@localhost.localdomain> Surely, I tried that first... Here is a lot of output of configure and "make install"... It seems not better than previous!-) khext at hess:~/nigma/ext3/gfs/cvs/cluster$ ./configure --kernel_src=/home/khext/nigma/ext3/linux-2.6.16.1 configure dlm-kernel Configuring Makefiles for your system... Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 101. configure gnbd-kernel Configuring Makefiles for your system... Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95. configure magma Configuring Makefiles for your system... Completed Makefile configuration configure ccs Configuring Makefiles for your system... Completed Makefile configuration configure cman Configuring Makefiles for your system... Completed Makefile configuration configure dlm Configuring Makefiles for your system... Completed Makefile configuration configure fence Configuring Makefiles for your system... Completed Makefile configuration configure iddev Configuring Makefiles for your system... Completed Makefile configuration configure gulm Configuring Makefiles for your system... Completed Makefile configuration configure gfs-kernel Configuring Makefiles for your system... Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 107. configure gfs2-kernel Configuring Makefiles for your system... Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95. configure gfs Configuring Makefiles for your system... Completed Makefile configuration configure gfs2 Configuring Makefiles for your system... Completed Makefile configuration configure gnbd Configuring Makefiles for your system... Completed Makefile configuration configure magma-plugins Configuring Makefiles for your system... 
Completed Makefile configuration configure rgmanager Configuring Makefiles for your system... Completed Makefile configuration configure cmirror Configuring Makefiles for your system... Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95. khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make install cd dlm-kernel && make install make[1]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' cd src2 && make install make[2]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 modules USING_KBUILD=yes make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No such file or directory. Stop. make: Entering an unknown directorymake: Leaving an unknown directorymake[2]: *** [all] Error 2 make[2]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' make[1]: *** [install] Error 2 make[1]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' make: *** [install] Error 2 khext at hess:~/nigma/ext3/gfs/cvs/cluster$ On Wed, 2006-04-05 at 10:40 -0500, Jonathan E Brassow wrote: > might want to skip the 'make' by itself... try: > > dir/cluster> make clean; make distclean > dir/cluster> ./configure --kernel_src= > dir/cluster> make install > > brassow > On Apr 5, 2006, at 10:27 AM, Ilya M. Slepnev wrote: > > > Hi, > > > > I'm sorry for inconvenience, did anybody faced such problem with > > configuring cluster-suite? It writes, that there is no directory named > > "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is > > there... Am I doing something wrong? Is there some FAQ about that? > > > > Thanks, Ilya... > > > > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make > > cd dlm-kernel && make > > make[1]: Entering directory > > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' > > cd src2 && make all > > make[2]: Entering directory > > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' > > make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 > > modules USING_KBUILD=yes > > make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No > > such file or directory. Stop. > > make: Entering an unknown directorymake: Leaving an unknown > > directorymake[2]: *** [all] Error 2 > > make[2]: Leaving directory > > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' > > make[1]: *** [all] Error 2 > > make[1]: Leaving directory > > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' > > make: *** [all] Error 2 > > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jbrassow at redhat.com Wed Apr 5 18:36:55 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Wed, 5 Apr 2006 13:36:55 -0500 Subject: [Linux-cluster] Problems with compilation. In-Reply-To: <1144253785.8185.27.camel@localhost.localdomain> References: <1144250877.8183.19.camel@localhost.localdomain> <6e718842c9112d2f91e40fc31e3b29b9@redhat.com> <1144253785.8185.27.camel@localhost.localdomain> Message-ID: <23453f82d4985b73787dc15e364ee7aa@redhat.com> did you setup and do a 'make' in your kernel tree. Failing to do that will give those errors. brassow On Apr 5, 2006, at 11:16 AM, Ilya M. Slepnev wrote: > Surely, I tried that first... 
Here is a lot of output of configure and > "make install"... It seems not better than previous!-) > > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ ./configure > --kernel_src=/home/khext/nigma/ext3/linux-2.6.16.1 > configure dlm-kernel > > Configuring Makefiles for your system... > Can't open > /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at > ./configure line 101. > configure gnbd-kernel > > Configuring Makefiles for your system... > Can't open > /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at > ./configure line 95. > configure magma > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure ccs > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure cman > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure dlm > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure fence > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure iddev > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure gulm > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure gfs-kernel > > Configuring Makefiles for your system... > Can't open > /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at > ./configure line 107. > configure gfs2-kernel > > Configuring Makefiles for your system... > Can't open > /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at > ./configure line 95. > configure gfs > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure gfs2 > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure gnbd > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure magma-plugins > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure rgmanager > > Configuring Makefiles for your system... > Completed Makefile configuration > > configure cmirror > > Configuring Makefiles for your system... > Can't open > /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at > ./configure line 95. > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make install > cd dlm-kernel && make install > make[1]: Entering directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' > cd src2 && make install > make[2]: Entering directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' > make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 > modules USING_KBUILD=yes > make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No > such file or directory. Stop. > make: Entering an unknown directorymake: Leaving an unknown > directorymake[2]: *** [all] Error 2 > make[2]: Leaving directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' > make[1]: *** [install] Error 2 > make[1]: Leaving directory > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' > make: *** [install] Error 2 > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ > > > > > On Wed, 2006-04-05 at 10:40 -0500, Jonathan E Brassow wrote: >> might want to skip the 'make' by itself... try: >> >> dir/cluster> make clean; make distclean >> dir/cluster> ./configure --kernel_src= >> dir/cluster> make install >> >> brassow >> On Apr 5, 2006, at 10:27 AM, Ilya M. 
Slepnev wrote: >> >>> Hi, >>> >>> I'm sorry for inconvenience, did anybody faced such problem with >>> configuring cluster-suite? It writes, that there is no directory >>> named >>> "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is >>> there... Am I doing something wrong? Is there some FAQ about that? >>> >>> Thanks, Ilya... >>> >>> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make >>> cd dlm-kernel && make >>> make[1]: Entering directory >>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' >>> cd src2 && make all >>> make[2]: Entering directory >>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' >>> make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 >>> modules USING_KBUILD=yes >>> make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: >>> No >>> such file or directory. Stop. >>> make: Entering an unknown directorymake: Leaving an unknown >>> directorymake[2]: *** [all] Error 2 >>> make[2]: Leaving directory >>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2' >>> make[1]: *** [all] Error 2 >>> make[1]: Leaving directory >>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel' >>> make: *** [all] Error 2 >>> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ >>> >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jeffbethke at aol.net Wed Apr 5 20:57:54 2006 From: jeffbethke at aol.net (Jeffrey Bethke) Date: Wed, 05 Apr 2006 16:57:54 -0400 Subject: [Linux-cluster] speeding up df/statvfs( ) calls to large GFS volumes? Message-ID: <44342F52.3030608@aol.net> Hi! Is there a way to speed up the return vaules for df/statvfs( ) when using large GFS volume(e.g 25TB+)? I'm currently working a problem where, as part of disk monitoring, we need to run a statvfs( ) every few minutes. The problem is that we can't determine the interval of running the tool as GFS can, on occasion, take a long time to return a value! So, is there any variable I can tweak w/ gfs_tool, or mount option I can apply outside of 'noatime', that will help things like 'df -h' run consistently faster? Help? Thanks! .jeff From mtp at tilted.com Thu Apr 6 01:22:08 2006 From: mtp at tilted.com (Mark Petersen) Date: Wed, 05 Apr 2006 20:22:08 -0500 Subject: [Linux-cluster] GNBD, CLVM and snapshots Message-ID: <7.0.1.0.2.20060405195416.02784ab0@tilted.com> I'm wanting to use gnbd with clvm to export block devices for 3 (possibly more) hosts running Xen. Each host will have access to the single gnbd export with LVM. Only a single host will ever actually have the device mounted. GNBD can support live migrations with a block device, which is the main attraction. So a little info on Xen and what I want to do. There are dom0's (privileged VM) that have full access to any running domU (VM instances started by the dom0.) The dom0 will be running clvm/CCS/gnbd-Client/etc. The dom0 will start a domU that mounts the lv, only the dom0 needs direct access to this resource. In this configuration, would it be possible to take snapshots of the LV from the dom0? What about from another dom0 in the cluster? What about the gnbd-server? Is work still be done on csnap? There isn't much documentation on this, and it seems like it might be GFS specific. 
If this won't work with clvm and gnbd, is there an alternative that would work? I really want to be able to do snapshots and live migration with block devices. I'm not sure this is possible. I may fallback to only live migrations with gnbd if I have to. Finally, ideally this would be backed by DRBD, but can gnbd handle a primary/secondary role instead of doing multipath (which won't work with drbd.) Failover mode was mentioned in posts from over a year ago, and it sounds promising. From starstom at gmail.com Thu Apr 6 03:53:34 2006 From: starstom at gmail.com (Tom Stars) Date: Thu, 6 Apr 2006 09:23:34 +0530 Subject: [Linux-cluster] About Linux Cluster Message-ID: <551992020604052053m7bbc7f8cua7f20da14cf0d28f@mail.gmail.com> Hi I am newbie to linux clusters. i would like to setup a linux cluster of 4 nodes, and a DAS box for Storage connected to linux systems.through an optical fiber. All linux systems are running RHEL 4.0. AS Q1)Do i need GFS to be configured in case i have to run oracle on the cluster nodes . (Oracle 11i Application Server) Q2) when do i need GFS. Q3) If the DAS is mounted on 1 node and create an NFS Server and provides shares to other nodes, does it affect the performance. Thanks. Tom. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Thu Apr 6 07:13:28 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 06 Apr 2006 09:13:28 +0200 Subject: [Linux-cluster] RE: CS4 Update2 / cman systematically FAILED on service stop /// New question /// Message-ID: <4434BF98.8070002@bull.net> I've identified the problem : in fact, that was due to a process launched via the SERVICE script, but which was not stopped on clusvcadm -s SERVICE (or -d) . Then, on service cman stop, the modprobe -r dlm was successful but at the end of this modprobe -r, the lsmod indicates one user left on cman : cman 136480 1 but without user identification (such as "cman 136480 10 dlm" when cs4 is all active). So the modprobe -r cman was then impossible. Could someone explain to me the link between a process managed in the SERVICE script and the remaining 1 user on cman ? Thanks Alain Moull? >> I have a systematic problem with cman stop on my configuration : >> knowing that there is no service with autostart in >> the cluster.conf, and that I have only one main service >> to be started by : clusvcadm -e SERVICE -m >> First test : >> launch CS4 OK >> stop CS4 OK >> no problem >> Second test : >> launch CS4 >> clusvcadm -e SERVICE -m >> then >> clusvcadm -d SERVICE >> stop CS4 ... >> in this case, cman stop is systematically FAILED ... >> This is true if both cases where CS4 is started >> on peer node as well as where is it stopped. >> Any clue or track to identify the problem ? From pcaulfie at redhat.com Thu Apr 6 07:25:53 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 06 Apr 2006 08:25:53 +0100 Subject: [Linux-cluster] RE: CS4 Update2 / cman systematically FAILED on service stop /// New question /// In-Reply-To: <4434BF98.8070002@bull.net> References: <4434BF98.8070002@bull.net> Message-ID: <4434C281.6010804@redhat.com> Alain Moulle wrote: > I've identified the problem : in fact, that was due to > a process launched via the SERVICE script, but which > was not stopped on clusvcadm -s SERVICE (or -d) . 
> Then, on service cman stop, the modprobe -r dlm was successful > but at the end of this modprobe -r, the lsmod > indicates one user left on cman : > cman 136480 1 > but without user identification (such as "cman 136480 10 dlm" when cs4 > is all active). > So the modprobe -r cman was then impossible. > > Could someone explain to me the link between a process > managed in the SERVICE script and the remaining 1 user > on cman ? There's no direct link. The usage count on cman is simply the number of links to it. They could be kernel or userspace users. In this case it could be CCS. Even if the cluster isn't operating, ccs polls the cluster manager to see if has come back up. -- patrick From figaro at neo-info.net Thu Apr 6 09:44:27 2006 From: figaro at neo-info.net (Figaro Yang) Date: Thu, 6 Apr 2006 17:44:27 +0800 Subject: [Linux-cluster] lock_gulm.ko needs unknown symbol tap_sig Message-ID: <011701c6595e$b8837a60$c800a8c0@neooffice> Hi ~ All? I have some question for rebuild gfs kernel , that has some error messages : if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map 2.6.11.img;fi WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko needs unknown symbol tap_sig WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko needs unknown symbol watch_sig WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko needs unknown symbol sig_watcher_init WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko needs unknown symbol sig_watcher_lock_drop how to fix this error ? thanks all help !! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ocrete at max-t.com Thu Apr 6 16:34:41 2006 From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=) Date: Thu, 06 Apr 2006 12:34:41 -0400 Subject: [Linux-cluster] cman kickout out nodes for no good reason Message-ID: <1144341281.355.38.camel@cocagne.max-t.internal> Hi, I have a strange problem where cman suddenly starts kicking out members of the cluster with "Inconsistent cluster view" when I join a new node (sometimes). It takes a few minutes between each kicking. I'm using a snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is in transition state at that point and I can't stop/start services or do anything else. It did not do that with a snapshot I took a few months ago. -- Olivier Cr?te ocrete at max-t.com Maximum Throughput Inc. From charlie.sharkey at bustech.com Wed Apr 5 17:40:48 2006 From: charlie.sharkey at bustech.com (Charlie Sharkey) Date: Wed, 5 Apr 2006 13:40:48 -0400 Subject: [Linux-cluster] two node cluster startup problem Message-ID: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com> Hi, I'm having trouble with a two node cluster. The second node ("one") gets the config from "zero" ok, but won't join the cluster. It instead starts it's own cluster (according to /proc/cluster/nodes). My config file is below, any help would be appreciated. thanks ! From lhh at redhat.com Thu Apr 6 20:34:25 2006 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 06 Apr 2006 16:34:25 -0400 Subject: [Linux-cluster] Monitoring Cluster Services In-Reply-To: <089401c658a7$481d72b0$3964a8c0@WS076> References: <089401c658a7$481d72b0$3964a8c0@WS076> Message-ID: <1144355665.3723.1.camel@ayanami.boston.redhat.com> On Wed, 2006-04-05 at 12:51 +0100, Ben Yarwood wrote: > I have set up a monitoring tool to check that all the appropriate processes > are running on our cluster nodes. 
I am currently checking for the > following: > > ccsd , 1 instance > cman_comms, 1 instance > cman_memb , 1 instance > cman_serviced, 1 instance > cman_hbeat, 1 instance > fenced, 1 instance > clvmd, 1 instance > gfs_inoded, 1 instance for each gfs mount > clurgmgrd, 1 instance > > Can anyone tell me if this is a correct and exhaustive list. Looks like it's missing DLM threads. -- Lon From lhh at redhat.com Thu Apr 6 20:41:17 2006 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 06 Apr 2006 16:41:17 -0400 Subject: [Linux-cluster] Order of execution In-Reply-To: References: Message-ID: <1144356077.3723.10.camel@ayanami.boston.redhat.com> On Tue, 2006-04-04 at 07:55 -0500, JACOB_LIBERMAN at Dell.com wrote: > Eric, > > I am running RHEL3 U4 with clumanager 1.2.22. I do not have the options > listed below. > > Does anyone have an example script for this version? Lon? The linux-cluster / RHCS4 ordering is directly taken from RHCS3: (a) mount file systems (b) bring up IPs (c) start user service (only can have one in RHCS3) Is the cluster controlling all of the components, or is it only controlling some of them? It sounds like it should work. -- Lon From gstaltari at arnet.net.ar Thu Apr 6 21:19:47 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Thu, 06 Apr 2006 18:19:47 -0300 Subject: [Linux-cluster] GFS and CPU time Message-ID: <443585F3.4090100@arnet.net.ar> Hi, we've created a 6 node cluster with GFS filesystem. The question is why there's always one node that the CPU time of those GFS/lock related processes is a lot higher than the others. Node 1 root 3799 0.0 0.0 0 0 ? S< Mar31 0:00 [dlm_recoverd] root 3806 0.1 0.0 0 0 ? S< Mar31 16:37 [lock_dlm1] root 3807 0.1 0.0 0 0 ? S< Mar31 16:40 [lock_dlm2] root 3808 1.0 0.0 0 0 ? S Mar31 102:27 [gfs_scand] root 3809 0.1 0.0 0 0 ? S Mar31 18:05 [gfs_glockd] root 3810 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_recoverd] root 3811 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_logd] root 3812 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_quotad] root 3813 0.0 0.0 0 0 ? S Mar31 0:18 [gfs_inoded] Node 2 root 4230 0.0 0.0 0 0 ? S< Mar31 0:00 [dlm_recoverd] root 4237 0.0 0.0 0 0 ? S< Mar31 4:16 [lock_dlm1] root 4238 0.0 0.0 0 0 ? S< Mar31 4:13 [lock_dlm2] root 4239 0.4 0.0 0 0 ? S Mar31 38:01 [gfs_scand] root 4240 0.0 0.0 0 0 ? S Mar31 2:58 [gfs_glockd] root 4241 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_recoverd] root 4242 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_logd] root 4243 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_quotad] root 4244 0.0 0.0 0 0 ? S Mar31 0:45 [gfs_inoded] Node 3 root 4124 0.0 0.0 0 0 ? S< Mar31 0:00 [dlm_recoverd] root 4131 0.0 0.0 0 0 ? S< Mar31 2:29 [lock_dlm1] root 4132 0.0 0.0 0 0 ? S< Mar31 2:29 [lock_dlm2] root 4133 0.9 0.0 0 0 ? S Mar31 88:45 [gfs_scand] root 4134 0.0 0.0 0 0 ? S Mar31 2:35 [gfs_glockd] root 4135 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_recoverd] root 4136 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_logd] root 4137 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_quotad] root 4138 0.0 0.0 0 0 ? S Mar31 0:06 [gfs_inoded] Node 4 root 17576 0.0 0.0 0 0 ? S< Mar31 0:00 [dlm_recoverd] root 17577 0.0 0.0 0 0 ? S< Mar31 0:00 [lock_dlm1] root 17578 0.0 0.0 0 0 ? S< Mar31 0:00 [lock_dlm2] root 17579 0.0 0.0 0 0 ? S Mar31 0:01 [gfs_scand] root 17580 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_glockd] root 17581 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_recoverd] root 17582 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_logd] root 17583 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_quotad] root 17584 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_inoded] Node 5 root 30784 0.0 0.0 0 0 ? S< Mar31 0:00 [dlm_recoverd] root 30785 0.0 0.0 0 0 ? 
S< Mar31 0:47 [lock_dlm1] root 30786 0.0 0.0 0 0 ? S< Mar31 0:46 [lock_dlm2] root 30787 0.2 0.0 0 0 ? S Mar31 10:00 [gfs_scand] root 30788 0.0 0.0 0 0 ? S Mar31 0:50 [gfs_glockd] root 30789 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_recoverd] root 30790 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_logd] root 30791 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_quotad] root 30792 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_inoded] Node 6 root 4273 0.0 0.0 0 0 ? S< Mar31 0:00 [dlm_recoverd] root 4274 0.0 0.0 0 0 ? S< Mar31 0:18 [lock_dlm1] root 4275 0.0 0.0 0 0 ? S< Mar31 0:17 [lock_dlm2] root 4276 0.1 0.0 0 0 ? S Mar31 5:36 [gfs_scand] root 4277 0.0 0.0 0 0 ? S Mar31 0:22 [gfs_glockd] root 4278 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_recoverd] root 4279 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_logd] root 4280 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_quotad] root 4281 0.0 0.0 0 0 ? S Mar31 0:00 [gfs_inoded] FC 4 kernel-smp-2.6.15-1.1831_FC4 dlm-kernel-smp-2.6.11.5-20050601.152643.FC4.21 GFS-kernel-smp-2.6.11.8-20050601.152643.FC4.24 cman-kernel-smp-2.6.11.5-20050601.152643.FC4.22 TIA German Staltari From ben.yarwood at juno.co.uk Thu Apr 6 22:45:59 2006 From: ben.yarwood at juno.co.uk (Ben Yarwood) Date: Thu, 6 Apr 2006 23:45:59 +0100 Subject: [Linux-cluster] Monitoring Cluster Services In-Reply-To: <1144355665.3723.1.camel@ayanami.boston.redhat.com> Message-ID: <093c01c659cb$df9bf150$3964a8c0@WS076> Is there one instance of each of the following? dlm_astd dlm_recvd dlm_sendd Cheers Ben > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger > Sent: 06 April 2006 21:34 > To: linux clustering > Subject: Re: [Linux-cluster] Monitoring Cluster Services > > On Wed, 2006-04-05 at 12:51 +0100, Ben Yarwood wrote: > > I have set up a monitoring tool to check that all the appropriate > > processes are running on our cluster nodes. I am currently > checking > > for the > > following: > > > > ccsd , 1 instance > > cman_comms, 1 instance > > cman_memb , 1 instance > > cman_serviced, 1 instance > > cman_hbeat, 1 instance > > fenced, 1 instance > > clvmd, 1 instance > > gfs_inoded, 1 instance for each gfs mount clurgmgrd, 1 instance > > > > Can anyone tell me if this is a correct and exhaustive list. > > Looks like it's missing DLM threads. > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ookami at gmx.de Fri Apr 7 04:36:04 2006 From: ookami at gmx.de (wolfgang pauli) Date: Fri, 7 Apr 2006 06:36:04 +0200 (MEST) Subject: [Linux-cluster] newbie: gfs merge Message-ID: <5174.1144384564@www022.gmx.net> Hi, I installed gfs and all the cluster stuff on our systems and I didn't have the impression that I missed any of the steps in the manual. So I have to nodes which both have a gfs partition mounted. I can also mount these, if I exported them with gnbd. But I don't see the big difference to nfs yet (apart from maybe performance). I thought that if I name the gfs-partitions the same (clustername:gfs1) they would be magically merged or something like that. I thought this was meant by the notion in the docs that GFS does not have a single point of failure. Or that we could have redundant file-servers. What did I get wrong about all that? P.S.: I did the changes to /etc/lvm/lvm.conf regarding the locking (locking_type=2). Thanks for any help!!! wolfgang -- E-Mails und Internet immer und ?berall! 
1&1 PocketWeb, perfekt mit GMX: http://www.gmx.net/de/go/pocketweb From pcaulfie at redhat.com Fri Apr 7 07:20:23 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 07 Apr 2006 08:20:23 +0100 Subject: [Linux-cluster] two node cluster startup problem In-Reply-To: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com> References: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com> Message-ID: <443612B7.6010202@redhat.com> Charlie Sharkey wrote: > Hi, > > I'm having trouble with a two node cluster. The second node ("one") > gets the config from "zero" ok, but won't join the cluster. It instead > starts it's own cluster (according to /proc/cluster/nodes). My config > file is below, any help would be appreciated. thanks ! > Check you don't have any firewalling enabled. It's most likely that the nodes can't talk to each other. You'll need to open ports 6809/udp and 21064/tcp. Also check that you can ping and/or ssh between the machines. -- patrick From Michael.Roethlein at ri-solution.com Fri Apr 7 08:51:29 2006 From: Michael.Roethlein at ri-solution.com (=?iso-8859-1?Q?R=F6thlein_Michael_=28RI-Solution=29?=) Date: Fri, 7 Apr 2006 10:51:29 +0200 Subject: [Linux-cluster] GFS freezes without a trace Message-ID: <992633B6A0E42B49BC5A41C10A8C841B01DB222B@MUCEX004.root.local> Hi, In the last days it occured several times that gfs got lost, but I could not find any trace in any logfile I could think of. We have a 4 node cluster with each node attached to one storage with one gfs partition. Is there a gfs or whatever logfile i might have not found or is it possible to enable debugging? Thanks in Advance Yours Michael From Bowie_Bailey at BUC.com Fri Apr 7 13:42:26 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Fri, 7 Apr 2006 09:42:26 -0400 Subject: [Linux-cluster] newbie: gfs merge Message-ID: <4766EEE585A6D311ADF500E018C154E3021338A7@bnifex.cis.buc.com> wolfgang pauli wrote: > > I installed gfs and all the cluster stuff on our systems and I didn't > have the impression that I missed any of the steps in the manual. So > I have to nodes which both have a gfs partition mounted. I can also > mount these, if I exported them with gnbd. But I don't see the big > difference to nfs yet (apart from maybe performance). I thought that > if I name the gfs-partitions the same (clustername:gfs1) they would > be magically merged or something like that. I thought this was meant > by the notion in the docs that GFS does not have a single point of > failure. Or that we could have redundant file-servers. What did I get > wrong about all that? It sounds like you are a bit confused about what GFS does. I replied to someone within the last week or so on almost the same issue. Check the archives. GFS is a filesystem that allows multiple nodes to access and update it at the same time. The cluster services manage the nodes and try to prevent a misbehaving node from corrupting the filesystem. If you have hard drives in all of your nodes, GFS and the cluster will not help you make them into one big shared storage area -- at least not yet, I believe there is a beta (alpha?) project out there somewhere. If you have a big storage area, GFS and the cluster _will_ allow you to connect all of your nodes to it. The redundancy comes in the fact that you have multiple machines running from the same storage area. If one of the machines goes down, the others can continue working. In a load-balanced configuration, the loss of one of the nodes will be transparent to the users. 
In theory, of course... If the storage dies, that's another issue. Hopefully, your storage is raid and can handle a disk failure. -- Bowie From charlie.sharkey at bustech.com Fri Apr 7 14:00:08 2006 From: charlie.sharkey at bustech.com (Charlie Sharkey) Date: Fri, 7 Apr 2006 10:00:08 -0400 Subject: [Linux-cluster] two node cluster startup problem Message-ID: <03FB5D708BE3C8448E8079186A56CDE67659B4@BTIBURMAIL.bustech.com> That was it, problem solved. Ping worked ok, but not ssh. I stopped both the portmap and iptables services and now it joins ok. Thanks for your help ! charlie From ookami at gmx.de Fri Apr 7 19:22:51 2006 From: ookami at gmx.de (wolfgang pauli) Date: Fri, 7 Apr 2006 21:22:51 +0200 (MEST) Subject: [Linux-cluster] newbie: gfs merge References: <4766EEE585A6D311ADF500E018C154E3021338A7@bnifex.cis.buc.com> Message-ID: <20750.1144437771@www010.gmx.net> > > I installed gfs and all the cluster stuff on our systems and I didn't > > have the impression that I missed any of the steps in the manual. So > > I have to nodes which both have a gfs partition mounted. I can also > > mount these, if I exported them with gnbd. But I don't see the big > > difference to nfs yet (apart from maybe performance). I thought that > > if I name the gfs-partitions the same (clustername:gfs1) they would > > be magically merged or something like that. I thought this was meant > > by the notion in the docs that GFS does not have a single point of > > failure. Or that we could have redundant file-servers. What did I get > > wrong about all that? > > It sounds like you are a bit confused about what GFS does. I replied > to someone within the last week or so on almost the same issue. Check > the archives. > > GFS is a filesystem that allows multiple nodes to access and update it > at the same time. The cluster services manage the nodes and try to > prevent a misbehaving node from corrupting the filesystem. > > If you have hard drives in all of your nodes, GFS and the cluster will > not help you make them into one big shared storage area -- at least not > yet, I believe there is a beta (alpha?) project out there somewhere. > If you have a big storage area, GFS and the cluster _will_ allow you > to connect all of your nodes to it. > > The redundancy comes in the fact that you have multiple machines > running from the same storage area. If one of the machines goes down, > the others can continue working. In a load-balanced configuration, > the loss of one of the nodes will be transparent to the users. In > theory, of course... If the storage dies, that's another issue. > Hopefully, your storage is raid and can handle a disk failure. > > -- > Bowie Hm... Thanks for you answer! I am definetelly confused a bit. Even after reading you post of last week. I understand that i can not merge the file systems. Our setup is very basic. We have to linux machines who could act as file server and we thought that we could one (A) have working as an active backup of the other (B). Is that what the documentation calls a failover domain, with (B) being the failover "domain" for (A)? Until now, we were running rsync at night, so that if the first of the two servers failed, clients could mount the NFS from the other server. There is nothing fancy here, like a SAN I guess, just machines connected via ethernet switches. So basically the question is, whether it is possible to keep the filesystems on the two servers in total sync, so that it would not matter whether clients mount the remote share from (A) or (B). 
Whether the clients would automatically be able to mount the GFS from (B), if (A) fails. Wolfgang -- GMX Produkte empfehlen und ganz einfach Geld verdienen! Satte Provisionen f?r GMX Partner: http://www.gmx.net/de/go/partner From Bowie_Bailey at BUC.com Fri Apr 7 19:32:38 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Fri, 7 Apr 2006 15:32:38 -0400 Subject: [Linux-cluster] newbie: gfs merge Message-ID: <4766EEE585A6D311ADF500E018C154E3021338B8@bnifex.cis.buc.com> wolfgang pauli wrote: > > > I installed gfs and all the cluster stuff on our systems and I > > > didn't have the impression that I missed any of the steps in the > > > manual. So I have to nodes which both have a gfs partition > > > mounted. I can also mount these, if I exported them with gnbd. > > > But I don't see the big difference to nfs yet (apart from maybe > > > performance). I thought that if I name the gfs-partitions the > > > same (clustername:gfs1) they would be magically merged or > > > something like that. I thought this was meant by the notion in > > > the docs that GFS does not have a single point of failure. Or > > > that we could have redundant file-servers. What did I get wrong > > > about all that? > > > > It sounds like you are a bit confused about what GFS does. I > > replied to someone within the last week or so on almost the same > > issue. Check the archives. > > > > GFS is a filesystem that allows multiple nodes to access and > > update it at the same time. The cluster services manage the nodes > > and try to prevent a misbehaving node from corrupting the > > filesystem. > > > > If you have hard drives in all of your nodes, GFS and the cluster > > will not help you make them into one big shared storage area -- at > > least not yet, I believe there is a beta (alpha?) project out > > there somewhere. If you have a big storage area, GFS and the > > cluster _will_ allow you to connect all of your nodes to it. > > > > The redundancy comes in the fact that you have multiple machines > > running from the same storage area. If one of the machines goes > > down, the others can continue working. In a load-balanced > > configuration, the loss of one of the nodes will be transparent to > > the users. In theory, of course... If the storage dies, that's > > another issue. Hopefully, your storage is raid and can handle a > > disk failure. > > Hm... Thanks for you answer! I am definetelly confused a bit. Even > after reading you post of last week. I understand that i can not > merge the file systems. Our setup is very basic. We have to linux > machines who could act as file server and we thought that we could > one (A) have working as an active backup of the other (B). Is that > what the documentation calls a failover domain, with (B) being the > failover "domain" for (A)? Until now, we were running rsync at > night, so that if the first of the two servers failed, clients could > mount the NFS from the other server. There is nothing fancy here, > like a SAN I guess, just machines connected via ethernet switches. > So basically the question is, whether it is possible to keep the > filesystems on the two servers in total sync, so that it would not > matter whether clients mount the remote share from (A) or (B). > Whether the clients would automatically be able to mount the GFS > from (B), if (A) fails. No, GFS doesn't work quite like that. What you have is something more like this: Two machines, (A) and (B), are file servers. 
A third machine, (C), is either a linux box exporting it's filesystem via GNBD, or a dedicated storage box running iSCSI, AoE, or something similar that will allow multiple connections. (A) and (B) are both connected to the GFS filesystem exported by (C). If either (A) or (B) goes down, the other one can continue serving the data from (C). They don't need to be synchronized because they are using the same physical storage. And, if the application permits, you can even run them both simultaneously. You are looking for something different. There is a project out there for that, but it is not production ready at this point. Maybe someone else remembers the name. -- Bowie From ookami at gmx.de Fri Apr 7 21:01:06 2006 From: ookami at gmx.de (wolfgang pauli) Date: Fri, 7 Apr 2006 23:01:06 +0200 (MEST) Subject: [Linux-cluster] newbie: gfs merge References: <4766EEE585A6D311ADF500E018C154E3021338B8@bnifex.cis.buc.com> Message-ID: <4720.1144443666@www010.gmx.net> > > > > I installed gfs and all the cluster stuff on our systems and I > > > > didn't have the impression that I missed any of the steps in the > > > > manual. So I have to nodes which both have a gfs partition > > > > mounted. I can also mount these, if I exported them with gnbd. > > > > But I don't see the big difference to nfs yet (apart from maybe > > > > performance). I thought that if I name the gfs-partitions the > > > > same (clustername:gfs1) they would be magically merged or > > > > something like that. I thought this was meant by the notion in > > > > the docs that GFS does not have a single point of failure. Or > > > > that we could have redundant file-servers. What did I get wrong > > > > about all that? > > > > > > It sounds like you are a bit confused about what GFS does. I > > > replied to someone within the last week or so on almost the same > > > issue. Check the archives. > > > > > > GFS is a filesystem that allows multiple nodes to access and > > > update it at the same time. The cluster services manage the nodes > > > and try to prevent a misbehaving node from corrupting the > > > filesystem. > > > > > > If you have hard drives in all of your nodes, GFS and the cluster > > > will not help you make them into one big shared storage area -- at > > > least not yet, I believe there is a beta (alpha?) project out > > > there somewhere. If you have a big storage area, GFS and the > > > cluster _will_ allow you to connect all of your nodes to it. > > > > > > The redundancy comes in the fact that you have multiple machines > > > running from the same storage area. If one of the machines goes > > > down, the others can continue working. In a load-balanced > > > configuration, the loss of one of the nodes will be transparent to > > > the users. In theory, of course... If the storage dies, that's > > > another issue. Hopefully, your storage is raid and can handle a > > > disk failure. > > > > Hm... Thanks for you answer! I am definetelly confused a bit. Even > > after reading you post of last week. I understand that i can not > > merge the file systems. Our setup is very basic. We have to linux > > machines who could act as file server and we thought that we could > > one (A) have working as an active backup of the other (B). Is that > > what the documentation calls a failover domain, with (B) being the > > failover "domain" for (A)? Until now, we were running rsync at > > night, so that if the first of the two servers failed, clients could > > mount the NFS from the other server. 
There is nothing fancy here, > > like a SAN I guess, just machines connected via ethernet switches. > > So basically the question is, whether it is possible to keep the > > filesystems on the two servers in total sync, so that it would not > > matter whether clients mount the remote share from (A) or (B). > > Whether the clients would automatically be able to mount the GFS > > from (B), if (A) fails. > > No, GFS doesn't work quite like that. What you have is something more > like this: Two machines, (A) and (B), are file servers. A third > machine, (C), is either a linux box exporting it's filesystem via > GNBD, or a dedicated storage box running iSCSI, AoE, or something > similar that will allow multiple connections. (A) and (B) are both > connected to the GFS filesystem exported by (C). If either (A) or (B) > goes down, the other one can continue serving the data from (C). They > don't need to be synchronized because they are using the same physical > storage. And, if the application permits, you can even run them both > simultaneously. > > You are looking for something different. There is a project out there > for that, but it is not production ready at this point. Maybe someone > else remembers the name. > > -- > Bowie > Oh, OK. This would make sense to me. But I still have some questions: 1. Would this reduce the load on (C)? 2. I know how to export the gfs from (C) and mount it on (A) and (B), but how do the clients know whether they should connect to (A) or (B)? Is this managed by clvmd? Thanks for the great help so far!! wolfgang -- Analog-/ISDN-Nutzer sparen mit GMX SmartSurfer bis zu 70%! Kostenlos downloaden: http://www.gmx.net/de/go/smartsurfer From kumaresh81 at yahoo.co.in Sat Apr 8 16:48:04 2006 From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy) Date: Sat, 8 Apr 2006 17:48:04 +0100 (BST) Subject: [Linux-cluster] issues with rhcs 4.2 Message-ID: <20060408164804.54434.qmail@web8319.mail.in.yahoo.com> hi, I recently migrated from rhcs 3 to rhcs 4.2 and since then I am unable to bring up the clustered services. Even though the services are getting executed (like the VIP, shared devices etc), the status in clustat and system-config-cluster still displays failed and because of this the failover is not happening. Any light on this will be much appreciated. Cluster is on RHEL AS 4U2 with two nodes. Regards, Kumaresh --------------------------------- Jiyo cricket on Yahoo! India cricket Yahoo! Messenger Mobile Stay in touch with your buddies all the time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.dardini at comune.prato.it Sat Apr 8 17:05:18 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Sat, 8 Apr 2006 19:05:18 +0200 Subject: [Linux-cluster] Cluster node not able to access all cluster resource Message-ID: <0C5C8B118420264EBB94D7D7050150011EFA92@exchange2.comune.prato.local> The subject is not a problem I am having, but something I want to do. I have a lot of services, each one now run by its own two-node cluster. This is very bad because each node fences the other one during a network blackout. I'd like to create only one cluster, but each resource, including the GFS filesystems, must be readable only by a limited number of nodes. For example, taking a Cluster "test" made of node A, node B, node C, node D and with the following resources: GFS Filesystem alpha and GFS Filesystem beta. I want only node A and node B to be able to access GFS Filesystem alpha and only node C and node D to be able to access GFS Filesystem beta. Is it possible?
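(A minimal sketch of the layout being asked about, not a confirmed answer: within a single cluster a node only touches a GFS volume it actually mounts, so the alpha/beta split could simply be reflected in which nodes carry which mount. The volume paths and mount points below are made up for illustration.)

    # on node A and node B only (hypothetical device and mount point)
    mount -t gfs /dev/vg_alpha/lv_alpha /mnt/alpha
    # on node C and node D only (hypothetical device and mount point)
    mount -t gfs /dev/vg_beta/lv_beta /mnt/beta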
Leandro From ookami at gmx.de Sun Apr 9 00:44:15 2006 From: ookami at gmx.de (wolfgang pauli) Date: Sun, 9 Apr 2006 02:44:15 +0200 (MEST) Subject: [Linux-cluster] hangs when copying with gnbd and gfs Message-ID: <20347.1144543455@www012.gmx.net> Hi, I could successfully mount a gfs partition and export with gnbd. It was also very fast, when I was moving a file from the client to the server, but if I try a second operation, like copying the file back, it always hangs. I can not even do copy files locally to the gfs partition anymore. Unfortunately, there is no info at all in the syslog or any other logfile. And the "gnbd_import -vl" and "gnbd_export -vl" don't show any error either. I guess it has something to do with the locking or fencing, but I don't understand that very well. Below it my config etc. Thanks for any hints!! I exported/imported the file system like that: gnbd_export -d /dev/hdd1 -e testgfs gnbd_import -i eon mount -t gfs /dev/gnbd/testgfs /mnt/gfs1/ -- GMX Produkte empfehlen und ganz einfach Geld verdienen! Satte Provisionen f?r GMX Partner: http://www.gmx.net/de/go/partner From ookami at gmx.de Sun Apr 9 02:54:55 2006 From: ookami at gmx.de (wolfgang pauli) Date: Sun, 9 Apr 2006 04:54:55 +0200 (MEST) Subject: [Linux-cluster] hangs when copying with gnbd and gfs References: <20347.1144543455@www012.gmx.net> Message-ID: <22376.1144551295@www084.gmx.net> Could this be related to automount? I just tried it again copied back a forth some mpg files and everything worked fine. But then I copied another file (230MB of /dev/zero) and the copying froze. The only think I could find in the log file was this: Apr 8 20:44:26 echo automount[5176]: failed to mount /misc/.directory Apr 8 20:44:26 echo automount[5177]: failed to mount /misc/.directory Apr 8 20:44:26 echo automount[5178]: >> /usr/sbin/showmount: can't get address for .directory Apr 8 20:44:26 echo automount[5178]: lookup(program): lookup for .directory failed Apr 8 20:44:26 echo automount[5178]: failed to mount /net/.directory Apr 8 20:44:26 echo automount[5183]: >> /usr/sbin/showmount: can't get address for .directory Apr 8 20:44:26 echo automount[5183]: lookup(program): lookup for .directory failed Apr 8 20:44:26 echo automount[5183]: failed to mount /net/.directory Another question I have is whether it is possible to mount the gfs on the server while it gnbd-exports the filesystem? wolfgang -- GMX Produkte empfehlen und ganz einfach Geld verdienen! Satte Provisionen f?r GMX Partner: http://www.gmx.net/de/go/partner From Alain.Moulle at bull.net Mon Apr 10 11:02:08 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 10 Apr 2006 13:02:08 +0200 Subject: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster Message-ID: <443A3B30.10307@bull.net> Hi I'm trying to configure a simple 3 nodes cluster with simple tests scripts. But I can't start cman, it remains stalled with this message in syslog : Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 16:04:34) installed Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster name = HA_METADATA_3N, version = 8) found. 
Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or form a Linux-cluster Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2 Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster and nothing more. The graphic tool dos not detect any error in configuration; I 've attached my cluster.conf for the three nodes, knowing that I wanted two nodes (yack10 and yack21) running theirs applications and the 3rd one (yack23) as a backup for yack10 and/or yack21, but I don't want any failover between yack10 and yack21. PS : I 've verified all ssh connections between the 3 nodes, and all the fence paths as described in the cluster.conf. Thanks again for your help. Alain -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: text/xml Size: 1500 bytes Desc: not available URL: From l.dardini at comune.prato.it Mon Apr 10 11:11:04 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Mon, 10 Apr 2006 13:11:04 +0200 Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster Message-ID: <0C5C8B118420264EBB94D7D7050150011EFACF@exchange2.comune.prato.local> > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di Alain Moulle > Inviato: luned? 10 aprile 2006 13.02 > A: linux-cluster at redhat.com > Oggetto: [Linux-cluster] CS4 U2 / problem to configure a 3 > nodes cluster > > Hi > > I'm trying to configure a simple 3 nodes cluster with simple > tests scripts. > But I can't start cman, it remains stalled with this message > in syslog : > Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr 10 > 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 > 16:04:34) installed > Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered > protocol family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: > cluster.conf (cluster name = HA_METADATA_3N, version = 8) found. > Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join > or form a Linux-cluster Apr 10 11:38:01 s_sys at yack21 > ccsd[25004]: Connected to cluster infrastruture > via: CMAN/SM Plugin v1.1.2 > Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: > Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: > forming a new cluster > > and nothing more. > > The graphic tool dos not detect any error in configuration; I > 've attached my cluster.conf for the three nodes, knowing > that I wanted two nodes (yack10 and yack21) running theirs > applications and the 3rd one (yack23) as a backup for yack10 > and/or yack21, but I don't want any failover between yack10 > and yack21. > > PS : I 've verified all ssh connections between the 3 nodes, > and all the fence paths as described in the cluster.conf. > Thanks again for your help. > > Alain > Are you starting the cman on all three nodes in the same time? A node doesn't start until each other node is starting. Timing is important during booting. 
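(A minimal sketch of the advice above, assuming the stock ccsd/cman init scripts and passwordless ssh: bring the stack up on all three nodes at roughly the same time, so each node can see the others while the cluster is forming. The node names are the ones used in this thread.)

    # start ccsd and cman on the three nodes in parallel
    for node in yack10 yack21 yack23; do
        ssh root@"$node" "service ccsd start && service cman start" &
    done
    wait
    # once they have joined, /proc/cluster/nodes on any member should list all three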
Leandro From pcaulfie at redhat.com Mon Apr 10 12:02:58 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 10 Apr 2006 13:02:58 +0100 Subject: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster In-Reply-To: <443A3B30.10307@bull.net> References: <443A3B30.10307@bull.net> Message-ID: <443A4972.5030000@redhat.com> Alain Moulle wrote: > Hi > > I'm trying to configure a simple 3 nodes cluster > with simple tests scripts. > But I can't start cman, it remains stalled with > this message in syslog : > Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded > Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 > 16:04:34) installed > Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol family 30 > Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster name = > HA_METADATA_3N, version = 8) found. > Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or form a > Linux-cluster > Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster infrastruture > via: CMAN/SM Plugin v1.1.2 > Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate > Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster > > and nothing more. > > The graphic tool dos not detect any error in configuration; I 've > attached my cluster.conf for the three nodes, knowing that > I wanted two nodes (yack10 and yack21) running theirs applications > and the 3rd one (yack23) as a backup for yack10 and/or yack21, > but I don't want any failover between yack10 and yack21. > > PS : I 've verified all ssh connections between the 3 nodes, and > all the fence paths as described in the cluster.conf. > Thanks again for your help. Check that the cluster ports are not blocked by any firewalling. You'll need 6809/udp & 21064/tcp opened. -- patrick From ugo.parsi at gmail.com Mon Apr 10 14:25:20 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Mon, 10 Apr 2006 16:25:20 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) Message-ID: Hello, Do you know how to run GFS / linux-cluster suite under a 2.6.16 vanilla kernel ? All I've got is : /usr/src/cluster/dlm-kernel/src2/lockspace.c: In function `do_uevent': /usr/src/cluster/dlm-kernel/src2/lockspace.c:160: error: too many arguments to function `kobject_uevent' /usr/src/cluster/dlm-kernel/src2/lockspace.c:162: error: too many arguments to function `kobject_uevent' make[4]: *** [/usr/src/cluster/dlm-kernel/src2/lockspace.o] Error 1 I've removed the last argument in the kobject_uvent call wich was "NULL", it does compile, but I don't really know if it's safe to do this that way... 
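(A hedged way to check whether that workaround is safe before relying on it: look at the prototype the vanilla kernel actually declares. The kernel path is the 2.6.16.1 tree that appears later in this thread.)

    # show the kobject_uevent() declaration in the kernel being built against
    grep -n "kobject_uevent" /usr/src/linux-2.6.16.1/include/linux/kobject.h
    # if it now takes only two parameters (the kobject and the action),
    # dropping the trailing NULL argument matches the new prototype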
Anyway, I'm stuck with another error which seem due to a missing include .h file (dlm.h) : libdlm.c:44:17: dlm.h: No such file or directory In file included from libdlm.c:46: libdlm.h:142: warning: `struct dlm_lksb' declared inside parameter list libdlm.h:142: warning: its scope is only this definition or declaration, which is probably not what you want libdlm.h:145: warning: `struct dlm_lksb' declared inside parameter list libdlm.h:156: warning: `struct dlm_lksb' declared inside parameter list libdlm.h:160: warning: `struct dlm_lksb' declared inside parameter list libdlm.h:210: warning: `struct dlm_lksb' declared inside parameter list libdlm.h:221: warning: `struct dlm_lksb' declared inside parameter list libdlm.h:225: warning: `struct dlm_lksb' declared inside parameter list libdlm.h:229: warning: `struct dlm_lksb' declared inside parameter list libdlm.c:47:24: dlm_device.h: No such file or directory libdlm.c:70: warning: `struct dlm_lock_result' declared inside parameter list libdlm.c:71: warning: `struct dlm_lock_result' declared inside parameter list libdlm.c:72: warning: `struct dlm_write_request' declared inside parameter list libdlm.c:120: error: field `lksb' has incomplete type libdlm.c: In function `unlock_resource': libdlm.c:215: error: `DLM_EUNLOCK' undeclared (first use in this function) libdlm.c:215: error: (Each undeclared identifier is reported only once libdlm.c:215: error: for each function it appears in.) libdlm.c: At top level: libdlm.c:268: warning: `struct dlm_write_request' declared inside parameter list libdlm.c: In function `set_version': libdlm.c:270: error: dereferencing pointer to incomplete type libdlm.c:270: error: `DLM_DEVICE_VERSION_MAJOR' undeclared (first use in this function) libdlm.c:271: error: dereferencing pointer to incomplete type Any ideas ? Thanks a lot, Ugo PARSI From jerome.castang at adelpha-lan.org Mon Apr 10 14:33:21 2006 From: jerome.castang at adelpha-lan.org (Castang Jerome) Date: Mon, 10 Apr 2006 16:33:21 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: Message-ID: <443A6CB1.7010307@adelpha-lan.org> Ugo PARSI a ?crit : >Hello, > >Do you know how to run GFS / linux-cluster suite under a 2.6.16 vanilla kernel ? > >All I've got is : > >/usr/src/cluster/dlm-kernel/src2/lockspace.c: In function `do_uevent': >/usr/src/cluster/dlm-kernel/src2/lockspace.c:160: error: too many >arguments to function `kobject_uevent' >/usr/src/cluster/dlm-kernel/src2/lockspace.c:162: error: too many >arguments to function `kobject_uevent' >make[4]: *** [/usr/src/cluster/dlm-kernel/src2/lockspace.o] Error 1 > >I've removed the last argument in the kobject_uvent call wich was >"NULL", it does compile, but I don't really know if it's safe to do >this that way... 
> >Anyway, I'm stuck with another error which seem due to a missing >include .h file (dlm.h) : > >libdlm.c:44:17: dlm.h: No such file or directory >In file included from libdlm.c:46: >libdlm.h:142: warning: `struct dlm_lksb' declared inside parameter list >libdlm.h:142: warning: its scope is only this definition or >declaration, which is probably not what you want >libdlm.h:145: warning: `struct dlm_lksb' declared inside parameter list >libdlm.h:156: warning: `struct dlm_lksb' declared inside parameter list >libdlm.h:160: warning: `struct dlm_lksb' declared inside parameter list >libdlm.h:210: warning: `struct dlm_lksb' declared inside parameter list >libdlm.h:221: warning: `struct dlm_lksb' declared inside parameter list >libdlm.h:225: warning: `struct dlm_lksb' declared inside parameter list >libdlm.h:229: warning: `struct dlm_lksb' declared inside parameter list >libdlm.c:47:24: dlm_device.h: No such file or directory >libdlm.c:70: warning: `struct dlm_lock_result' declared inside parameter list >libdlm.c:71: warning: `struct dlm_lock_result' declared inside parameter list >libdlm.c:72: warning: `struct dlm_write_request' declared inside parameter list >libdlm.c:120: error: field `lksb' has incomplete type >libdlm.c: In function `unlock_resource': >libdlm.c:215: error: `DLM_EUNLOCK' undeclared (first use in this function) >libdlm.c:215: error: (Each undeclared identifier is reported only once >libdlm.c:215: error: for each function it appears in.) >libdlm.c: At top level: >libdlm.c:268: warning: `struct dlm_write_request' declared inside parameter list >libdlm.c: In function `set_version': >libdlm.c:270: error: dereferencing pointer to incomplete type >libdlm.c:270: error: `DLM_DEVICE_VERSION_MAJOR' undeclared (first use >in this function) >libdlm.c:271: error: dereferencing pointer to incomplete type > >Any ideas ? > >Thanks a lot, > >Ugo PARSI > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > For the problem with dlm.h i found this: http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html Seems that dlm.h is provided by dlm-kernel-debuginfo . -- Jerome Castang mail: jcastang at adelpha-lan.org From ugo.parsi at gmail.com Mon Apr 10 14:39:25 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Mon, 10 Apr 2006 16:39:25 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: <443A6CB1.7010307@adelpha-lan.org> References: <443A6CB1.7010307@adelpha-lan.org> Message-ID: > > For the problem with dlm.h i found this: > >http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html The link is dead :( > > Seems that dlm.h is provided by dlm-kernel-debuginfo > . > I've installed two packages on Debian # apt-cache search dlm libdlm-dev - Distributed lock manager - development files libdlm0 - Distributed lock manager - library Here's all I've got : # locate dlm.h /usr/include/libdlm.h /usr/src/cluster/dlm-kernel/src2/dlm.h /usr/src/cluster/dlm-kernel/src/dlm.h /usr/src/cluster/dlm/lib/libdlm.h /usr/src/cluster/gfs-kernel/src/dlm/lock_dlm.h /usr/src/cluster/gfs/lock_dlm/daemon/lock_dlm.h /usr/src/linux-2.6.16.1/fs/ocfs2/dlm/userdlm.h I'm trying your package, but I suppose it's redhat-only... 
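(A small hedged check, based only on the locate output above: the header the userland build is complaining about is already present in the CVS checkout, so it may just not be on the include path yet.)

    # headers shipped with the dlm-kernel source in the checkout
    ls -l /usr/src/cluster/dlm-kernel/src/dlm.h /usr/src/cluster/dlm-kernel/src2/dlm.h
    # the library that fails to build lives next to its own libdlm.h
    ls -l /usr/src/cluster/dlm/lib/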
Thanks, Ugo PARSI From jerome.castang at adelpha-lan.org Mon Apr 10 14:51:26 2006 From: jerome.castang at adelpha-lan.org (Castang Jerome) Date: Mon, 10 Apr 2006 16:51:26 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> Message-ID: <443A70EE.4070907@adelpha-lan.org> Ugo PARSI a ?crit : >>For the problem with dlm.h i found this: >> >> >>>http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html >>> >>> > >The link is dead :( > > Link is dead ? It works perfectly for me... > > >>Seems that dlm.h is provided by dlm-kernel-debuginfo >>. >> >> >> > >I've installed two packages on Debian > ># apt-cache search dlm >libdlm-dev - Distributed lock manager - development files >libdlm0 - Distributed lock manager - library > > >Here's all I've got : > ># locate dlm.h >/usr/include/libdlm.h >/usr/src/cluster/dlm-kernel/src2/dlm.h >/usr/src/cluster/dlm-kernel/src/dlm.h >/usr/src/cluster/dlm/lib/libdlm.h >/usr/src/cluster/gfs-kernel/src/dlm/lock_dlm.h >/usr/src/cluster/gfs/lock_dlm/daemon/lock_dlm.h >/usr/src/linux-2.6.16.1/fs/ocfs2/dlm/userdlm.h > >I'm trying your package, but I suppose it's redhat-only... > >Thanks, > >Ugo PARSI > >--88 >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > I suppose you can try to get this RH package and unpack it to get files and put them where they should be... -- Jerome Castang mail: jcastang at adelpha-lan.org From ugo.parsi at gmail.com Mon Apr 10 14:57:14 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Mon, 10 Apr 2006 16:57:14 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: <443A70EE.4070907@adelpha-lan.org> References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> Message-ID: > I suppose you can try to get this RH package and unpack it to get files > and put them where they should be... > Well I've just did and it doesn't change pretty much :( Ugo PARSI From jerome.castang at adelpha-lan.org Mon Apr 10 15:16:18 2006 From: jerome.castang at adelpha-lan.org (Castang Jerome) Date: Mon, 10 Apr 2006 17:16:18 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> Message-ID: <443A76C2.8070900@adelpha-lan.org> Ugo PARSI a ?crit : >>I suppose you can try to get this RH package and unpack it to get files >>and put them where they should be... >> >> >> > >Well I've just did and it doesn't change pretty much :( > >Ugo PARSI > > Have you tried to start with the cvs of Cluster Project ? I think cvs provides all you need. -- Jerome Castang mail: jcastang at adelpha-lan.org From basv at sara.nl Mon Apr 10 15:26:06 2006 From: basv at sara.nl (Bas van der Vlies) Date: Mon, 10 Apr 2006 17:26:06 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> Message-ID: <443A790E.1040002@sara.nl> Ugo PARSI wrote: >> I suppose you can try to get this RH package and unpack it to get files >> and put them where they should be... >> > > Well I've just did and it doesn't change pretty much :( > > Ugo PARSI Ugo, Which version for GFS do you use cvs STABLE or HEAD? I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS STABLE branch. 
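(To make that route concrete, a minimal sketch of checking out the STABLE branch and building it against a vanilla kernel tree. The kernel path is only an example, and the --kernel_src switch is an assumption based on usage.txt, so it is worth checking ./configure --help for the exact option name.)

    # grab the STABLE branch rather than HEAD
    cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster checkout -r STABLE cluster
    cd cluster
    # point the build at the vanilla kernel source; the top-level Makefile then
    # builds and installs the kernel pieces before the userland libraries
    ./configure --kernel_src=/usr/src/linux-2.6.16.2
    make install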
Regards -- -- ******************************************************************** * * * Bas van der Vlies e-mail: basv at sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From carlopmart at gmail.com Mon Apr 10 15:52:20 2006 From: carlopmart at gmail.com (carlopmart) Date: Mon, 10 Apr 2006 17:52:20 +0200 Subject: [Linux-cluster] Question about manual fencing Message-ID: <443A7F34.7000901@gmail.com> Hi all, I would like to test manual fencing on two nodes for testing pourposes. I have read RedHat's docs about this but I don't see very clear. If I setup manual fencing, when one node shutdowns, the other node startups all services that I have configured on the another node automatically? Thanks. -- CL Martinez carlopmart {at} gmail {d0t} com From jerome.castang at adelpha-lan.org Mon Apr 10 15:59:14 2006 From: jerome.castang at adelpha-lan.org (Castang Jerome) Date: Mon, 10 Apr 2006 17:59:14 +0200 Subject: [Linux-cluster] Question about manual fencing In-Reply-To: <443A7F34.7000901@gmail.com> References: <443A7F34.7000901@gmail.com> Message-ID: <443A80D2.6050806@adelpha-lan.org> carlopmart a ?crit : > Hi all, > > I would like to test manual fencing on two nodes for testing > pourposes. I have read RedHat's docs about this but I don't see very > clear. If I setup manual fencing, when one node shutdowns, the other > node startups all services that I have configured on the another node > automatically? > > Thanks. > I don't think so. Fencing a node is to stop it, or make it leaving the cluster (using any method like shutdown...) So if you use manual fencing, the other nodes will not start automaticly their services... -- Jerome Castang mail: jcastang at adelpha-lan.org From tf0054 at gmail.com Sat Apr 8 16:23:05 2006 From: tf0054 at gmail.com (=?ISO-2022-JP?B?GyRCQ2ZMbkxUGyhC?=) Date: Sun, 9 Apr 2006 01:23:05 +0900 Subject: [Linux-cluster] Cisco fence agent Message-ID: Hi all. Do anyone have cisco catalyst fence agent? If nobody make that, I will make. Thanks. From Bowie_Bailey at BUC.com Mon Apr 10 16:09:03 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Mon, 10 Apr 2006 12:09:03 -0400 Subject: [Linux-cluster] newbie: gfs merge Message-ID: <4766EEE585A6D311ADF500E018C154E3021338C7@bnifex.cis.buc.com> wolfgang pauli wrote: > > > > > > Hm... Thanks for you answer! I am definetelly confused a bit. Even > > > after reading you post of last week. I understand that i can not > > > merge the file systems. Our setup is very basic. We have to linux > > > machines who could act as file server and we thought that we could > > > one (A) have working as an active backup of the other (B). Is that > > > what the documentation calls a failover domain, with (B) being the > > > failover "domain" for (A)? Until now, we were running rsync at > > > night, so that if the first of the two servers failed, clients > > > could mount the NFS from the other server. There is nothing fancy > > > here, like a SAN I guess, just machines connected via ethernet > > > switches. So basically the question is, whether it is possible to > > > keep the filesystems on the two servers in total sync, so that it > > > would not matter whether clients mount the remote share from (A) > > > or (B). Whether the clients would automatically be able to mount > > > the GFS from (B), if (A) fails. > > > > No, GFS doesn't work quite like that. 
What you have is something > > more like this: Two machines, (A) and (B), are file servers. A > > third machine, (C), is either a linux box exporting it's filesystem > > via GNBD, or a dedicated storage box running iSCSI, AoE, or > > something similar that will allow multiple connections. (A) and > > (B) are both connected to the GFS filesystem exported by (C). If > > either (A) or (B) goes down, the other one can continue serving the > > data from (C). They don't need to be synchronized because they are > > using the same physical storage. And, if the application permits, > > you can even run them both simultaneously. > > > > You are looking for something different. There is a project out > > there for that, but it is not production ready at this point. > > Maybe someone else remembers the name. > > Oh, OK. This would makes sense to me. But I still have some > questions.. > > 1. Would this reduce the load on (C)? Reduce it from what? (C) would be a completely different type of machine from (A) and (B). (A) and (B) are application systems, while (C) is just a fileserver. (C) would not need to be quite as fast as the others, just fast enough to keep up with the I/O requirements of the storage and the GFS/Cluster overhead. > 2. I know how to export the gfs from (C) and mount it on (A) and (B), > but how to the clients know whether they should connect to (A) or > (B). Is this managed my clvmd? No, this is managed by your network. If (A) and (B) are running the same software, it doesn't matter which one they connect to. On my system, I have a Foundry ServerIron that load-balances the two machines. You can also do it using LVS software, such as the stuff in the Linux HA project. -- Bowie From schlegel at riege.com Mon Apr 10 16:20:20 2006 From: schlegel at riege.com (Gunther Schlegel) Date: Tue, 11 Apr 2006 00:20:20 +0800 Subject: [Linux-cluster] gfs file locking Message-ID: <443A85C4.2060608@riege.com> Hi, does GFS support the same ways of file locking a local filesystem does? I am evaluating to put an application on gfs that runs pretty fine on local filesystems but tends to have severe problems on NFS. I know NFS is totally different from GFS, but from the applications point of view both are just filesystems. best regards, Gunther -------------- next part -------------- A non-text attachment was scrubbed... Name: schlegel.vcf Type: text/x-vcard Size: 344 bytes Desc: not available URL: From ugo.parsi at gmail.com Mon Apr 10 16:53:41 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Mon, 10 Apr 2006 18:53:41 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A76C2.8070900@adelpha-lan.org> Message-ID: Reposting sorry : On 4/10/06, Ugo PARSI wrote: > > Have you tried to start with the cvs of Cluster Project ? > > I think cvs provides all you need. > > > > Well, that's the only thing I did....I guess ?! > > I've followed that document indeed : > > http://sources.redhat.com/cluster/doc/usage.txt > > So I did a cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster > checkout cluster > > Is that okay ? 
> > Thanks a lot, > > Ugo PARSI > From ugo.parsi at gmail.com Mon Apr 10 16:57:02 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Mon, 10 Apr 2006 18:57:02 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: <443A790E.1040002@sara.nl> References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> Message-ID: > Which version for GFS do you use cvs STABLE or HEAD? > I don't know how to tell... Is stable this thing ? - The 'cluster' cvs head can be unstable, so it's recommended that you checkout from the RHEL4 branch -- 'checkout -r RHEL4 cluster' I've tried both with or without anyway.... > I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS > STABLE branch. > >From a vanilla kernel ? Because basically, I've just tried all of this from a fresh vanilla 2.6.16.1 (I'm gonna try the 2.6.16.2) downloaded from kernel.org. System was running that kernel at time of compilation, and I provided the path of the kernel to the configure script. Anything wrong ? Any ideas ? You've made some fixes/patches ? Thanks a lot, Ugo PARSI From basv at sara.nl Mon Apr 10 19:24:58 2006 From: basv at sara.nl (Bas van der Vlies) Date: Mon, 10 Apr 2006 21:24:58 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> Message-ID: > >> I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS >> STABLE branch. >> > > You have to download, from cvs STABLE: cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r STABLE cluster Some packages need header files that are provided by others. So you most install them before compiling the rest. I have made debian package scripts for all cluster packages. If i have some time a will put them on our ftp-server. I have made a small document it is in dutch, but it is not that difficult. You have to install each package before building the others. It make life for me easier then examine all the dependencies. cd cluster/cman-kernel dch -i (vullen met juiste kernel versie) debian/rules clean debian/rules build debian/rules binary dpkg -i ../cman-kernel_.deb depmod -a Nu de volgende delen maken op de bovenstaande manier: dlm-kernel cd juiste pad dpkg -i ../dlm-kernel_.deb gnbd-kernel dpkg -i ../gnbd-kernel_.deb gfs-kernel dpkg -i ../gfs-kernel_.deb Nu de volgende kernel onafhankelijke delen bouwen: magma dch -i (juiste cvs versie) debian/rules clean debian/rules binary dpkg -i ../magma.deb idem: iddev dpkg -i ../iddev.deb ccs dpkg -i ../ccs.deb cman dlm dpkg -i ../dlm.deb gnbd gfs fence gulm dpkg -i ../gulm.deb magma-plugins rgmanager -- Bas van der Vlies basv at sara.nl From ocrete at max-t.com Mon Apr 10 21:01:47 2006 From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=) Date: Mon, 10 Apr 2006 17:01:47 -0400 Subject: [Linux-cluster] cman kickout out nodes for no good reason In-Reply-To: <1144341281.355.38.camel@cocagne.max-t.internal> References: <1144341281.355.38.camel@cocagne.max-t.internal> Message-ID: <1144702908.21093.7.camel@cocagne.max-t.internal> On Thu, 2006-06-04 at 12:34 -0400, Olivier Cr?te wrote: > I have a strange problem where cman suddenly starts kicking out members > of the cluster with "Inconsistent cluster view" when I join a new node > (sometimes). It takes a few minutes between each kicking. I'm using a > snapshot for March 12th of the STABLE branch on 2.6.16. 
The cluster is > in transition state at that point and I can't stop/start services or do > anything else. It did not do that with a snapshot I took a few months > ago. Its still happening, the node that joins says "Transition master unknown", while all of the other nodes who the master is, then the master gets kicked out. Then a new master is selected, all of the nodes seem to know who the master is, but refuse to act on it. After a while, the new master is kicked out and the process restarts. I guess its related to the changes with the timestamps to prevent master desync, I dont see any other recent change that could have caused it. -- Olivier Cr?te ocrete at max-t.com Maximum Throughput Inc. From ookami at gmx.de Mon Apr 10 23:07:48 2006 From: ookami at gmx.de (wolfgang pauli) Date: Tue, 11 Apr 2006 01:07:48 +0200 (MEST) Subject: [Linux-cluster] hangs when copying with gnbd and gfs References: <22376.1144551295@www084.gmx.net> Message-ID: <28595.1144710468@www031.gmx.net> > Could this be related to automount? I just tried it again copied back a > forth some mpg files and everything worked fine. But then I copied another > file (230MB of /dev/zero) and the copying froze. The only think I could > find in the log file was this: > Apr 8 20:44:26 echo automount[5176]: failed to mount /misc/.directory > Apr 8 20:44:26 echo automount[5177]: failed to mount /misc/.directory > Apr 8 20:44:26 echo automount[5178]: >> /usr/sbin/showmount: can't get > address for .directory > Apr 8 20:44:26 echo automount[5178]: lookup(program): lookup > for .directory failed > Apr 8 20:44:26 echo automount[5178]: failed to mount /net/.directory > Apr 8 20:44:26 echo automount[5183]: >> /usr/sbin/showmount: can't get > address for .directory > Apr 8 20:44:26 echo automount[5183]: lookup(program): lookup > for .directory failed > Apr 8 20:44:26 echo automount[5183]: failed to mount /net/.directory > > Another question I have is whether it is possible to mount the gfs on the > server while it gnbd-exports the filesystem? > > wolfgang > OK, I think I solved it. I switched from GNBD to iSCSI. I have iscsitarget running on the server and open-iscsi on the client. I had to export the logical volume rather then the war device to be able to mount it on the client. -- "Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ... 
Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail From forigato at gmail.com Mon Apr 10 23:57:16 2006 From: forigato at gmail.com (ANDRE LUIS FORIGATO) Date: Mon, 10 Apr 2006 20:57:16 -0300 Subject: [Linux-cluster] Help-me, Please Message-ID: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com> Linux xlx2 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux Redhat-config-cluster 1.0.3 clumanager 1.2.22 Apr 10 01:18:07 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 05:13:49 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung) Apr 10 05:13:54 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:31:07 xlx2 clumembd[4493]: Membership View #5:0x00000002 Apr 10 11:31:08 xlx2 cluquorumd[4463]: Membership reports #0 as down, but disk reports as up: State uncertain! Apr 10 11:31:08 xlx2 cluquorumd[4463]: --> Commencing STONITH <-- Apr 10 11:31:08 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung) Apr 10 11:31:10 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #12 0x00000002 Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Member 200.254.254.171's state is uncertain: Some services may be unavailable! Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #13 0x00000002 Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:31:34 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung) Apr 10 11:31:38 xlx2 cluquorumd[4463]: --> Commencing STONITH <-- Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Falsely claiming that 200.254.254.171 has been fenced Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Data integrity may be compromised! 
Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Quorum Event: View #15 0x00000002 Apr 10 11:31:41 xlx2 clusvcmgrd[4671]: State change: 200.254.254.172 DOWN Apr 10 11:34:08 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP Apr 10 11:34:09 xlx2 clusvcmgrd[4671]: Quorum Event: View #16 0x00000002 Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available Apr 10 11:34:50 xlx2 clumembd[4493]: Member 200.254.254.171 UP Apr 10 11:34:50 xlx2 clumembd[4493]: Membership View #6:0x00000003 Apr 10 11:34:50 xlx2 cluquorumd[4463]: __msg_send: Incomplete write to 13. Error: Connection reset by peer Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: Quorum Event: View #17 0x00000003 Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: State change: Local UP Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: State change: 200.254.254.171 UP Apr 10 13:21:25 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 17:03:22 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out Apr 10 20:30:30 xlx2 clulockd[4498]: Denied 200.254.254.171: Broken pipe Apr 10 20:30:30 xlx2 clulockd[4498]: select error: Broken pipe Att, Forigas From Alain.Moulle at bull.net Tue Apr 11 06:08:57 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 11 Apr 2006 08:08:57 +0200 Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster Message-ID: <443B47F9.6090506@bull.net> >Hi >> >> I'm trying to configure a simple 3 nodes cluster with simple >> tests scripts. >> But I can't start cman, it remains stalled with this message >> in syslog : >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr 10 >> 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 >> 16:04:34) installed >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered >> protocol family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: >> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found. >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join >> or form a Linux-cluster Apr 10 11:38:01 s_sys at yack21 >> ccsd[25004]: Connected to cluster infrastruture >> via: CMAN/SM Plugin v1.1.2 >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: >> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: >> forming a new cluster >> >> and nothing more. >> >> The graphic tool dos not detect any error in configuration; I >> 've attached my cluster.conf for the three nodes, knowing >> that I wanted two nodes (yack10 and yack21) running theirs >> applications and the 3rd one (yack23) as a backup for yack10 >> and/or yack21, but I don't want any failover between yack10 >> and yack21. 
>> >> PS : I 've verified all ssh connections between the 3 nodes, >> and all the fence paths as described in the cluster.conf. >> Thanks again for your help. >> >> Alain >> >Are you starting the cman on all three nodes in the same time? A node doesn't >start until each other node is starting. Timing is important during booting. >Leandro Hi, no I wasn't ... I've tried now, and this is ok on yack21 and yack23, but not on yack10, is there something wrong in the cluster.conf to explain this behavior ? On yack10 , cman is trying to : CMAN: forming a new cluster but fails with a timeout ... ?? Thanks Alain -- mailto:Alain.Moulle at bull.net +------------------------------+--------------------------------+ | Alain Moull? | from France : 04 76 29 75 99 | | | FAX number : 04 76 29 72 49 | | Bull SA | | | 1, Rue de Provence | Adr : FREC B1-041 | | B.P. 208 | | | 38432 Echirolles - CEDEX | Email: Alain.Moulle at bull.net | | France | BCOM : 229 7599 | +-------------------------------+-------------------------------+ -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: text/xml Size: 1500 bytes Desc: not available URL: From l.dardini at comune.prato.it Tue Apr 11 06:59:13 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Tue, 11 Apr 2006 08:59:13 +0200 Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster Message-ID: <0C5C8B118420264EBB94D7D7050150011EFAEB@exchange2.comune.prato.local> > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di Alain Moulle > Inviato: marted? 11 aprile 2006 8.09 > A: linux-cluster at redhat.com > Oggetto: R: [Linux-cluster] CS4 U2 / problem to configure a 3 > nodes cluster > > >Hi > >> > >> I'm trying to configure a simple 3 nodes cluster with simple tests > >> scripts. > >> But I can't start cman, it remains stalled with this message in > >> syslog : > >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr > 10 11:38:00 > >> s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 > >> 16:04:34) installed > >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol > >> family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: > >> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found. > >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to > join or form > >> a Linux-cluster Apr 10 11:38:01 s_sys at yack21 > >> ccsd[25004]: Connected to cluster infrastruture > >> via: CMAN/SM Plugin v1.1.2 > >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: > >> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: > >> forming a new cluster > >> > >> and nothing more. > >> > >> The graphic tool dos not detect any error in configuration; I 've > >> attached my cluster.conf for the three nodes, knowing that > I wanted > >> two nodes (yack10 and yack21) running theirs applications > and the 3rd > >> one (yack23) as a backup for yack10 and/or yack21, but I > don't want > >> any failover between yack10 and yack21. > >> > >> PS : I 've verified all ssh connections between the 3 > nodes, and all > >> the fence paths as described in the cluster.conf. > >> Thanks again for your help. > >> > >> Alain > >> > > > >Are you starting the cman on all three nodes in the same > time? A node > >doesn't start until each other node is starting. Timing is > important during booting. > > >Leandro > > Hi, no I wasn't ... 
> I've tried now, and this is ok on yack21 and yack23, but not > on yack10, is there something wrong in the cluster.conf to > explain this behavior ? > On yack10 , cman is trying to : > CMAN: forming a new cluster > but fails with a timeout ... > > ?? > Thanks > Alain > -- > Maybe this time is due to a firewall setup, as already stated on the list. A tcpdump from yack10 to the other nodes may help you catch the bug. Leandro From ugo.parsi at gmail.com Tue Apr 11 07:44:56 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Tue, 11 Apr 2006 09:44:56 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> Message-ID: > You have to download, from cvs STABLE: > cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r > STABLE cluster > Ok I've tried it, thanks, it does seem to work better but I have still issues.... This time there's no kernel issues....but another missing .h file : [...] make[2]: Entering directory `/usr/src/cluster/cman/lib' gcc -Wall -g -O -I. -fPIC -I/usr/src/cluster/build/incdir/cluster -c -o libcman.o libcman.c libcman.c:31:35: cluster/cnxman-socket.h: No such file or directory libcman.c:44: warning: `struct cl_cluster_node' declared inside parameter list libcman.c:44: warning: its scope is only this definition or declaration, which is probably not what you want libcman.c: In function `copy_node': libcman.c:46: error: dereferencing pointer to incomplete type libcman.c:47: error: dereferencing pointer to incomplete type [...] > Some packages need header files that are provided by others. So you > most install them > before compiling the rest. I have made debian package scripts for > all cluster packages. True, but well, that's what the main Makefile is doing, right ? [....] cd cman-kernel && ${MAKE} install ${MAKELINE} cd dlm-kernel && ${MAKE} install ${MAKELINE} cd gfs-kernel && ${MAKE} install ${MAKELINE} cd gnbd-kernel && ${MAKE} install ${MAKELINE} cd magma && ${MAKE} install ${MAKELINE} cd ccs && ${MAKE} install ${MAKELINE} [....] So I don't see what you are doing more.... except the fact you are building Debian packages ? Thanks a lot, Ugo PARSI From pcaulfie at redhat.com Tue Apr 11 07:47:52 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 11 Apr 2006 08:47:52 +0100 Subject: [Linux-cluster] cman kickout out nodes for no good reason In-Reply-To: <1144702908.21093.7.camel@cocagne.max-t.internal> References: <1144341281.355.38.camel@cocagne.max-t.internal> <1144702908.21093.7.camel@cocagne.max-t.internal> Message-ID: <443B5F28.1060004@redhat.com> Olivier Cr?te wrote: > On Thu, 2006-06-04 at 12:34 -0400, Olivier Cr?te wrote: >> I have a strange problem where cman suddenly starts kicking out members >> of the cluster with "Inconsistent cluster view" when I join a new node >> (sometimes). It takes a few minutes between each kicking. I'm using a >> snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is >> in transition state at that point and I can't stop/start services or do >> anything else. It did not do that with a snapshot I took a few months >> ago. > > Its still happening, the node that joins says "Transition master > unknown", while all of the other nodes who the master is, then the > master gets kicked out. Then a new master is selected, all of the nodes > seem to know who the master is, but refuse to act on it. After a while, > the new master is kicked out and the process restarts. 
I guess its > related to the changes with the timestamps to prevent master desync, I > dont see any other recent change that could have caused it. > That's very peculiar behaviour, and it's going to be hard to pin down. How consistently does it happen ? It could be caused by extreme network packet loss, or something blocking the progress of cman processes. Are the already joined nodes very busy when you bring the new node into the cluster (if so, doing what?) I think the best way to try and track this down is to get a tcpdump of the cluster traffic (port 6809/udp) happening at the time of the join - make sure that all nodes are included in the dump and that all of the packet is captured. -- patrick From pcaulfie at redhat.com Tue Apr 11 08:46:15 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 11 Apr 2006 09:46:15 +0100 Subject: [Linux-cluster] DLM messages In-Reply-To: <4427CB55.2060203@sara.nl> References: <20060327084643.GB27410@redhat.com> <4427AA3F.3040009@sara.nl> <4427CB55.2060203@sara.nl> Message-ID: <443B6CD7.8050704@redhat.com> > === FS2 == > Mar 27 12:28:25 ifs2 kernel: ------------[ cut here ]------------ > Mar 27 12:28:25 ifs2 kernel: kernel BUG at > /usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151! > Mar 27 12:28:25 ifs2 kernel: invalid opcode: 0000 [#1] > Mar 27 12:28:25 ifs2 kernel: SMP > Mar 27 12:28:25 ifs2 kernel: Modules linked in: lock_dlm dlm cman > dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix > e1000 gfs lock_harness dm_mod > Mar 27 12:28:25 ifs2 kernel: CPU: 0 > Mar 27 12:28:25 ifs2 kernel: EIP: 0060:[] Tainted: GF VLI > Mar 27 12:28:25 ifs2 kernel: EFLAGS: 00010246 (2.6.16-rc5-sara3 #1) > Mar 27 12:28:25 ifs2 kernel: EIP is at elect_master+0x34/0x41 [cman] That cman crash looks nasty, though it may be related to "disabing the heartbeat-network interface". Is this the node you are referring to ? -- patrick From basv at sara.nl Tue Apr 11 10:13:33 2006 From: basv at sara.nl (Bas van der Vlies) Date: Tue, 11 Apr 2006 12:13:33 +0200 Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> Message-ID: <443B814D.6030706@sara.nl> Ugo PARSI wrote: >> You have to download, from cvs STABLE: >> cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r >> STABLE cluster >> > > Ok I've tried it, thanks, it does seem to work better but I have still > issues.... > This time there's no kernel issues....but another missing .h file : > > [...] > make[2]: Entering directory `/usr/src/cluster/cman/lib' > gcc -Wall -g -O -I. -fPIC -I/usr/src/cluster/build/incdir/cluster -c > -o libcman.o libcman.c > libcman.c:31:35: cluster/cnxman-socket.h: No such file or directory > libcman.c:44: warning: `struct cl_cluster_node' declared inside parameter list > libcman.c:44: warning: its scope is only this definition or > declaration, which is probably not what you want > libcman.c: In function `copy_node': > libcman.c:46: error: dereferencing pointer to incomplete type > libcman.c:47: error: dereferencing pointer to incomplete type > [...] > This a bug i reported it to his list, but no replies. I think i removed the cluster from the include cluster/cnxman-socket.h line. Your are using debian or not. I can put the deb-packages that kernel independed on our ftp-server. No warranty they include all init.d script and start at runlevel 3. 
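For anyone hitting the same libcman.c:31 error, a minimal sketch of the include fix Bas mentions above. This is only a guess at the exact commands, assuming the STABLE checkout sits under /usr/src/cluster (as in the make output) and that cnxman-socket.h lives in cman-kernel/src:

   cd /usr/src/cluster/cman/lib
   # drop the cluster/ prefix so the header is looked up on the local -I. path
   sed -i 's#cluster/cnxman-socket.h#cnxman-socket.h#' libcman.c
   # put the header next to libcman.c if no existing -I path already provides it
   cp ../../cman-kernel/src/cnxman-socket.h .

Once the header is found, the "dereferencing pointer to incomplete type" errors at lines 46/47 should go away too, since they come from the struct declarations in that same header.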
When i machine start in starts at runlevel 2, not in cluster enabled mode. To enable cluster mode we do a init 3, to can remove a node from a cluster with the init 2 command. Regards -- -- ******************************************************************** * * * Bas van der Vlies e-mail: basv at sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From basv at sara.nl Tue Apr 11 10:19:45 2006 From: basv at sara.nl (Bas van der Vlies) Date: Tue, 11 Apr 2006 12:19:45 +0200 Subject: [Linux-cluster] DLM messages In-Reply-To: <443B6CD7.8050704@redhat.com> References: <20060327084643.GB27410@redhat.com> <4427AA3F.3040009@sara.nl> <4427CB55.2060203@sara.nl> <443B6CD7.8050704@redhat.com> Message-ID: <443B82C1.7010603@sara.nl> Patrick Caulfield wrote: >> === FS2 == >> Mar 27 12:28:25 ifs2 kernel: ------------[ cut here ]------------ >> Mar 27 12:28:25 ifs2 kernel: kernel BUG at >> /usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151! >> Mar 27 12:28:25 ifs2 kernel: invalid opcode: 0000 [#1] >> Mar 27 12:28:25 ifs2 kernel: SMP >> Mar 27 12:28:25 ifs2 kernel: Modules linked in: lock_dlm dlm cman >> dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix >> e1000 gfs lock_harness dm_mod >> Mar 27 12:28:25 ifs2 kernel: CPU: 0 >> Mar 27 12:28:25 ifs2 kernel: EIP: 0060:[] Tainted: GF VLI >> Mar 27 12:28:25 ifs2 kernel: EFLAGS: 00010246 (2.6.16-rc5-sara3 #1) >> Mar 27 12:28:25 ifs2 kernel: EIP is at elect_master+0x34/0x41 [cman] > > That cman crash looks nasty, though it may be related to "disabing the > heartbeat-network interface". Is this the node you are referring to ? > As i read the thread this must be the node that i disabled the heartbeat-network. So the other nodes could fence this node and they did but the other nodes also crashed. Regards -- -- ******************************************************************** * * * Bas van der Vlies e-mail: basv at sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From Alain.Moulle at bull.net Tue Apr 11 10:58:30 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 11 Apr 2006 12:58:30 +0200 Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster Message-ID: <443B8BD6.80906@bull.net> >>Hi >> >>>> >> >>>> >> I'm trying to configure a simple 3 nodes cluster with simple tests >>>> >> scripts. >>>> >> But I can't start cman, it remains stalled with this message in >>>> >> syslog : >>>> >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr > >> 10 11:38:00 > >>>> >> s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 >>>> >> 16:04:34) installed >>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol >>>> >> family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: >>>> >> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found. 
>>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to > >> join or form > >>>> >> a Linux-cluster Apr 10 11:38:01 s_sys at yack21 >>>> >> ccsd[25004]: Connected to cluster infrastruture >>>> >> via: CMAN/SM Plugin v1.1.2 >>>> >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: >>>> >> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: >>>> >> forming a new cluster >>>> >> >>>> >> and nothing more. >>>> >> >>>> >> The graphic tool dos not detect any error in configuration; I 've >>>> >> attached my cluster.conf for the three nodes, knowing that > >> I wanted > >>>> >> two nodes (yack10 and yack21) running theirs applications > >> and the 3rd > >>>> >> one (yack23) as a backup for yack10 and/or yack21, but I > >> don't want > >>>> >> any failover between yack10 and yack21. >>>> >> >>>> >> PS : I 've verified all ssh connections between the 3 > >> nodes, and all > >>>> >> the fence paths as described in the cluster.conf. >>>> >> Thanks again for your help. >>>> >> >>>> >> Alain >>>> >> > >> >> > >>> >Are you starting the cman on all three nodes in the same > >> time? A node > >>> >doesn't start until each other node is starting. Timing is > >> important during booting. >> > >>> >Leandro > >> >> Hi, no I wasn't ... >> I've tried now, and this is ok on yack21 and yack23, but not >> on yack10, is there something wrong in the cluster.conf to >> explain this behavior ? >> On yack10 , cman is trying to : >> CMAN: forming a new cluster >> but fails with a timeout ... >> >> ?? >> Thanks >> Alain >> -- >> >Maybe this time is due to a firewall setup, as already stated on the list. A >tcpdump from yack10 to the other nodes may help you catch the bug. >Leandro No firewall setup on yack10, neither on yack21 nor yack23. Besides the ssh connections are all valid between the three nodes in all combinations without passwd request. And still the problem ... Any other idea ? Is my cluster.conf correct ? Besides, with regard to you first answer, I've tested on yack21 and yack23 : if I start cman only on yack21, it does end in timeout. And if I start cman quite at the same time on yack21 and yack23, it works on both nodes. I haven't found in documentation any recommandation about this point. Besides, if one node is breakdowned, that mean that we will never be able to reboot the other node and launch the CS4 again with all applications ... sounds strange, doesn't it ? Thanks Alain Moull? From pcaulfie at redhat.com Tue Apr 11 11:52:23 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 11 Apr 2006 12:52:23 +0100 Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster In-Reply-To: <443B8BD6.80906@bull.net> References: <443B8BD6.80906@bull.net> Message-ID: <443B9877.2020505@redhat.com> Alain Moulle wrote: >> Maybe this time is due to a firewall setup, as already stated on the list. A >> tcpdump from yack10 to the other nodes may help you catch the bug. >> Leandro > > No firewall setup on yack10, neither on yack21 nor yack23. Besides > the ssh connections are all valid between the three nodes in all > combinations without passwd request. And still the problem ... > Any other idea ? > Is my cluster.conf correct ? > > Besides, with regard to you first answer, I've tested on yack21 and yack23 : > if I start cman only on yack21, it does end in timeout. > And if I start cman quite at the same time on yack21 and yack23, it > works on both nodes. > I haven't found in documentation any recommandation about this point. 
> Besides, if one node is breakdowned, that mean that we will never be > able to reboot the other node and launch the CS4 again with all > applications ... sounds strange, doesn't it ? > Can you be a little clearer exactly what you mean by this? and post some exact messages please. It's not clear to me now just what your problem is. >From your initial post it sounded like the nodes in the cluster were forming separate clusters, but that last sentence makes it sound like you're seeing something else. -- patrick From l.dardini at comune.prato.it Tue Apr 11 12:48:35 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Tue, 11 Apr 2006 14:48:35 +0200 Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster Message-ID: <0C5C8B118420264EBB94D7D7050150011EFAFC@exchange2.comune.prato.local> > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di Alain Moulle > Inviato: marted? 11 aprile 2006 12.59 > A: linux-cluster at redhat.com > Oggetto: R: [Linux-cluster] CS4 U2 / problem to configure a 3 > nodes cluster > > >>Hi > >> > >>>> >> > >>>> >> I'm trying to configure a simple 3 nodes cluster with simple > >>>> >> tests scripts. > >>>> >> But I can't start cman, it remains stalled with this > message in > >>>> >> syslog : > >>>> >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr > > > >> 10 11:38:00 > > > >>>> >> s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 > >>>> >> 16:04:34) installed > >>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: > Registered protocol > >>>> >> family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: > >>>> >> cluster.conf (cluster name = HA_METADATA_3N, version > = 8) found. > >>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to > > > >> join or form > > > >>>> >> a Linux-cluster Apr 10 11:38:01 s_sys at yack21 > >>>> >> ccsd[25004]: Connected to cluster infrastruture > >>>> >> via: CMAN/SM Plugin v1.1.2 > >>>> >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: > >>>> >> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: > >>>> >> forming a new cluster > >>>> >> > >>>> >> and nothing more. > >>>> >> > >>>> >> The graphic tool dos not detect any error in configuration; I > >>>> >> 've attached my cluster.conf for the three nodes, knowing that > > > >> I wanted > > > >>>> >> two nodes (yack10 and yack21) running theirs applications > > > >> and the 3rd > > > >>>> >> one (yack23) as a backup for yack10 and/or yack21, but I > > > >> don't want > > > >>>> >> any failover between yack10 and yack21. > >>>> >> > >>>> >> PS : I 've verified all ssh connections between the 3 > > > >> nodes, and all > > > >>>> >> the fence paths as described in the cluster.conf. > >>>> >> Thanks again for your help. > >>>> >> > >>>> >> Alain > >>>> >> > > > >> > >> > > > >>> >Are you starting the cman on all three nodes in the same > > > >> time? A node > > > >>> >doesn't start until each other node is starting. Timing is > > > >> important during booting. > >> > > > >>> >Leandro > > > >> > >> Hi, no I wasn't ... > >> I've tried now, and this is ok on yack21 and yack23, but not on > >> yack10, is there something wrong in the cluster.conf to > explain this > >> behavior ? > >> On yack10 , cman is trying to : > >> CMAN: forming a new cluster > >> but fails with a timeout ... > >> > >> ?? > >> Thanks > >> Alain > >> -- > >> > > > >Maybe this time is due to a firewall setup, as already stated on the > >list. 
A tcpdump from yack10 to the other nodes may help you > catch the bug. > >Leandro > > No firewall setup on yack10, neither on yack21 nor yack23. > Besides the ssh connections are all valid between the three > nodes in all combinations without passwd request. And still > the problem ... > Any other idea ? > Is my cluster.conf correct ? > > Besides, with regard to you first answer, I've tested on > yack21 and yack23 : > if I start cman only on yack21, it does end in timeout. > And if I start cman quite at the same time on yack21 and > yack23, it works on both nodes. > I haven't found in documentation any recommandation about this point. > Besides, if one node is breakdowned, that mean that we will > never be able to reboot the other node and launch the CS4 > again with all applications ... sounds strange, doesn't it ? > No, this doesn't sound strange. Cluster must be quorate to operate. Quorum can be reduced while a node is down, fencing it or simply removing it, by cman or by hand editing cluster.conf. Try this: start all the node without cman, gfs and other GFS suite packages. Then start by hand, one a time on each node, ccsd, cman, lock_gulm(?), fenced, clvmd and rgmanager init scripts. After each run, check the /var/log/messages output and connectivity between nodes. Unfortunately the configuration is far different from the one I use, so I cannot help you. Leandro From akpinar_haydar at hotmail.com Tue Apr 11 05:17:53 2006 From: akpinar_haydar at hotmail.com (Haydar Akpinar) Date: Tue, 11 Apr 2006 05:17:53 +0000 Subject: [Linux-cluster] Linux (qmail) clustering Message-ID: Hello every one. I am a newbe so don't really know much about UNIX nor Linux for that matter I have been asked to do a high availability qmail(non LDAP) clustering which is running on Redhat 9. I would like to know if it is possible to do and also if any one has done qmail clustering on a Linux box. And if any one can direct me with finding the information on How To Thanks for your time. _________________________________________________________________ Hava durumunu bizden ?grenin ve evden ?yle ?ikin! http://www.msn.com.tr/havadurumu/ From ocrete at max-t.com Tue Apr 11 14:06:40 2006 From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=) Date: Tue, 11 Apr 2006 10:06:40 -0400 Subject: [Linux-cluster] cman kickout out nodes for no good reason In-Reply-To: <443B5F28.1060004@redhat.com> References: <1144341281.355.38.camel@cocagne.max-t.internal> <1144702908.21093.7.camel@cocagne.max-t.internal> <443B5F28.1060004@redhat.com> Message-ID: <1144764400.9106.3.camel@TesterBox.tester.ca> On Tue, 2006-11-04 at 08:47 +0100, Patrick Caulfield wrote: > Olivier Cr?te wrote: > > On Thu, 2006-06-04 at 12:34 -0400, Olivier Cr?te wrote: > >> I have a strange problem where cman suddenly starts kicking out members > >> of the cluster with "Inconsistent cluster view" when I join a new node > >> (sometimes). It takes a few minutes between each kicking. I'm using a > >> snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is > >> in transition state at that point and I can't stop/start services or do > >> anything else. It did not do that with a snapshot I took a few months > >> ago. > > > > Its still happening, the node that joins says "Transition master > > unknown", while all of the other nodes who the master is, then the > > master gets kicked out. Then a new master is selected, all of the nodes > > seem to know who the master is, but refuse to act on it. 
After a while, > > the new master is kicked out and the process restarts. I guess its > > related to the changes with the timestamps to prevent master desync, I > > dont see any other recent change that could have caused it. > > > > That's very peculiar behaviour, and it's going to be hard to pin down. How > consistently does it happen ? Often, but I haven't found the exact sequence to reproduce it. > It could be caused by extreme network packet loss, or something blocking the > progress of cman processes. Are the already joined nodes very busy when you > bring the new node into the cluster (if so, doing what?) I doubt its packet loss since cman is running over myrinet's ethernet/ip layer and its the only user of that port (so it shouldn't be affected by the rest of the traffic over the myrinet). The other nodes may be busy, but the CPU isn't at 100% us on any of them, although the PCIX bus may be used a lot. > I think the best way to try and track this down is to get a tcpdump of the > cluster traffic (port 6809/udp) happening at the time of the join - make sure > that all nodes are included in the dump and that all of the packet is captured. I will try to get a tcpdump. Thanks for you help, -- Olivier Cr?te ocrete at max-t.com Maximum Throughput Inc. From mbrookov at mines.edu Tue Apr 11 14:49:04 2006 From: mbrookov at mines.edu (Matthew B. Brookover) Date: Tue, 11 Apr 2006 08:49:04 -0600 Subject: [Linux-cluster] Cisco fence agent In-Reply-To: References: Message-ID: <1144766944.16956.10.camel@merlin.Mines.EDU> I do not know if this will help, but here is what I put together. We have 3 Cisco 3750 switches. I am currently using SNMP to turn off the ports of a host that is being fenced. I wrote a perl script called fence_cisco that works with GFS 6. I have attached a copy of fence_cisco to this message and its config file. I do not have much in the way of documentation for it, and it will probably take some hacking to get it to work with a current version of GFS. If you know a little perl, writing a fencing agent is not very difficult. I have also included a copy for the config file for fence_cisco. The first two lines specify the SNMP community string and the IP address for the switch. The rest is a list of hosts and the ports they use. You will have to talk to your local network guru to figure out Cisco community strings and the numbers involved. It took some tinkering to figure out how Cisco does this stuff, and even after writing the code, I am still not sure that I understand it. I do know that it does work, GFS does do the correct things during a crash. Most people use one of the power supply switches. Redhat provides the fence_apc agent that will turn off the power to a node that needs to be fenced. I like the network option because the host that is having problems will be able to write log entries after it has been fenced. You will need to get the Net::SNMP module from cpan.org to use fence_cisco. Matt On Sun, 2006-04-09 at 01:23 +0900, ??? wrote: > Hi all. > Do anyone have cisco catalyst fence agent? > If nobody make that, I will make. > > Thanks. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: fence_cisco Type: application/x-perl Size: 10442 bytes Desc: not available URL: -------------- next part -------------- community:YOURSTRINGHERE switch:1.1.1.1 imagine:GigabitEthernet1/0/9:GigabitEthernet2/0/9:GigabitEthernet1/0/5 illuminate:GigabitEthernet2/0/10:GigabitEthernet3/0/9:GigabitEthernet2/0/6 illusion:GigabitEthernet1/0/10:GigabitEthernet3/0/10:GigabitEthernet1/0/6 inception:GigabitEthernet1/0/11:GigabitEthernet2/0/11:GigabitEthernet1/0/7 inspire:GigabitEthernet2/0/12:GigabitEthernet3/0/11:GigabitEthernet2/0/8 incantation:GigabitEthernet1/0/12:GigabitEthernet3/0/12:GigabitEthernet1/0/8 From carlopmart at gmail.com Tue Apr 11 15:01:16 2006 From: carlopmart at gmail.com (carlopmart) Date: Tue, 11 Apr 2006 17:01:16 +0200 Subject: [Linux-cluster] Question about manual fencing In-Reply-To: <443A80D2.6050806@adelpha-lan.org> References: <443A7F34.7000901@gmail.com> <443A80D2.6050806@adelpha-lan.org> Message-ID: <443BC4BC.1030405@gmail.com> Thanks Jerome. Castang Jerome wrote: > carlopmart a ?crit : > >> Hi all, >> >> I would like to test manual fencing on two nodes for testing >> pourposes. I have read RedHat's docs about this but I don't see very >> clear. If I setup manual fencing, when one node shutdowns, the other >> node startups all services that I have configured on the another node >> automatically? >> >> Thanks. >> > > I don't think so. > Fencing a node is to stop it, or make it leaving the cluster (using any > method like shutdown...) > So if you use manual fencing, the other nodes will not start automaticly > their services... > > -- CL Martinez carlopmart {at} gmail {d0t} com From basv at sara.nl Tue Apr 11 15:35:47 2006 From: basv at sara.nl (Bas van der Vlies) Date: Tue, 11 Apr 2006 17:35:47 +0200 Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> Message-ID: On Apr 11, 2006, at 3:58 PM, Nate Carlson wrote: > On Mon, 10 Apr 2006, Bas van der Vlies wrote: >> I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS >> STABLE branch. > > Do you have the source packages? It'd be really handy to be able to > build module packages. :) > > I did not make source packages, its is a good suggestion, but i use gfs from CVS and use different kind of kernels. So i regularly make new versions. For every package i creates a debian directory and i made i global script that compiles everything and make debian packages - for the kernel modules, the kernel version is in the package - for the user space tools i only update the version number. Regards -- Bas van der Vlies basv at sara.nl From natecars at natecarlson.com Tue Apr 11 15:37:58 2006 From: natecars at natecarlson.com (Nate Carlson) Date: Tue, 11 Apr 2006 10:37:58 -0500 (CDT) Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> Message-ID: On Tue, 11 Apr 2006, Bas van der Vlies wrote: > I did not make source packages, its is a good suggestion, but i use gfs > from CVS and use different kind of kernels. So i regularly make new > versions. > > For every package i creates a debian directory and i made i global script > that compiles everything and make debian packages > - for the kernel modules, the kernel version is in the package > - for the user space tools i only update the version number. Would you mind sharing the scripts? 
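Coming back to Matt's fence_cisco mail above: for anyone who wants to see the mechanics before the Perl script is repackaged, here is a rough sketch of the same idea using the net-snmp command line tools. This is my guess at the approach, not Matt's code; the community string, switch address and port name are taken from the sample config he posted:

   # find the ifIndex whose ifDescr matches a port name from the config file
   snmpwalk -v1 -c YOURSTRINGHERE 1.1.1.1 IF-MIB::ifDescr | grep 'GigabitEthernet1/0/9'
   # set ifAdminStatus for that index to 2 (down) to fence, or back to 1 (up) to unfence
   snmpset -v1 -c YOURSTRINGHERE 1.1.1.1 IF-MIB::ifAdminStatus.<ifIndex> i 2

A real agent has to repeat the set for every port listed against the victim node, since each host in Matt's config is reachable through several switch ports.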
That'd make my life a bit easier when packaging GFS for debian. :) ------------------------------------------------------------------------ | nate carlson | natecars at natecarlson.com | http://www.natecarlson.com | | depriving some poor village of its idiot since 1981 | ------------------------------------------------------------------------ From basv at sara.nl Tue Apr 11 15:43:37 2006 From: basv at sara.nl (Bas van der Vlies) Date: Tue, 11 Apr 2006 17:43:37 +0200 Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> Message-ID: <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl> On Apr 11, 2006, at 5:37 PM, Nate Carlson wrote: > On Tue, 11 Apr 2006, Bas van der Vlies wrote: >> I did not make source packages, its is a good suggestion, but i >> use gfs from CVS and use different kind of kernels. So i >> regularly make new versions. >> >> For every package i creates a debian directory and i made i global >> script that compiles everything and make debian packages >> - for the kernel modules, the kernel version is in the package >> - for the user space tools i only update the version number. > > Would you mind sharing the scripts? That'd make my life a bit > easier when packaging GFS for debian. :) > No problem, I have to package it and make it available on our ftp- server. If find bug or have improvements mail them. I will send an email to list if i have made release ;-) Regards -- Bas van der Vlies basv at sara.nl From natecars at natecarlson.com Tue Apr 11 15:43:59 2006 From: natecars at natecarlson.com (Nate Carlson) Date: Tue, 11 Apr 2006 10:43:59 -0500 (CDT) Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel (2.6.16) In-Reply-To: <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl> References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl> Message-ID: On Tue, 11 Apr 2006, Bas van der Vlies wrote: > No problem, I have to package it and make it available on our ftp-server. > If find bug or have improvements mail them. > I will send an email to list if i have made release ;-) Great - thanks! :) ------------------------------------------------------------------------ | nate carlson | natecars at natecarlson.com | http://www.natecarlson.com | | depriving some poor village of its idiot since 1981 | ------------------------------------------------------------------------ From jbrassow at redhat.com Tue Apr 11 15:48:25 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Tue, 11 Apr 2006 10:48:25 -0500 Subject: [Linux-cluster] Question about manual fencing In-Reply-To: <443BC4BC.1030405@gmail.com> References: <443A7F34.7000901@gmail.com> <443A80D2.6050806@adelpha-lan.org> <443BC4BC.1030405@gmail.com> Message-ID: <634f53a0e00f383b47d142f530b9dbf7@redhat.com> manual fencing gets it's name because it requires manual intervention... that is, it is not automatic. brassow On Apr 11, 2006, at 10:01 AM, carlopmart wrote: > Thanks Jerome. > > Castang Jerome wrote: >> carlopmart a ?crit : >>> Hi all, >>> >>> I would like to test manual fencing on two nodes for testing >>> pourposes. I have read RedHat's docs about this but I don't see >>> very clear. If I setup manual fencing, when one node shutdowns, the >>> other node startups all services that I have configured on the >>> another node automatically? >>> >>> Thanks. >>> >> I don't think so. 
>> Fencing a node is to stop it, or make it leaving the cluster (using >> any method like shutdown...) >> So if you use manual fencing, the other nodes will not start >> automaticly their services... > > -- > CL Martinez > carlopmart {at} gmail {d0t} com > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From Alain.Moulle at bull.net Tue Apr 11 15:56:02 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 11 Apr 2006 17:56:02 +0200 Subject: [Linux-cluster] CS4 Update 2 / cluster 3 noeuds question Message-ID: <443BD192.1000407@bull.net> Hi Finally I've found the problem (a bad alias in /etc/hosts !). But I've another question : As told before, I have yack10 and yack23 with each one a service to run, and yack23 as backup for both nodes (see attached cluster.conf) I've tested with a poweroff on yack10 and the service is well failoverd on yack23. But then I tried to do poweroff on yack21, but it does not failover because "missing two many heart beats". I suspect that it is normal because we have only one node left among the three, and so there is not enough votes ... But I would like to have a confirmation ? And if so, is there a way to configure so that yack23 could failover the services of both other nodes stopped at the same time ? Thanks Alain -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: text/xml Size: 2015 bytes Desc: not available URL: From teigland at redhat.com Tue Apr 11 16:52:59 2006 From: teigland at redhat.com (David Teigland) Date: Tue, 11 Apr 2006 11:52:59 -0500 Subject: [Linux-cluster] cluster-1.02.00 Message-ID: <20060411165259.GB5820@redhat.com> A new source tarball from the STABLE branch has been released; it builds and runs on 2.6.16: ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz Version 1.02.00 - 10 April 2006 =============================== dlm-kernel: Allow DLM to start if the node gets a different nodeid. dlm-kernel: Add WARNING printk when cman calls emergency_shutdown. dlm-kernel: The in_recovery semaphore wasn't being released in corner case where grant message is ignored for lock being unlocked. dlm-kernel: Remove an assertion that triggers unnecessarily in rare cases of overlapping and invalid master lookups. dlm-kernel: Don't close existing connection if a double-connect is attempted - just ignore the last one. dlm-kernel: Fix a race where an attempt to unlock a lock in the completion AST routine could crash on SMP. dlm-kernel: Fix transient hangs that could be caused by incorrect handling of locks granted due to ALTMODE. bz#178738 dlm-kernel: Allow any old user to create the default lockspace. You need Udev running AND build dlm with ./configure --have_udev. dlm-kernel: Only release a lockspace if all users have closed it. bz#177934 cman-kernel: Fix cman master confusion during recovery. bz#158592 cman-kernel: Add printk to assert failure when a nodeid lookup fails. cman-kernel: Give an interface "max-retries" attempts to get fixed after an error before we give up and shut down the cluster. cman-kernel: IPv6 FF1x:: multicast addresses don't work. Always send out of the locally bound address. bz#166752 cman-kernel: Ignore really badly delayed old duplicates that might get sent via a bonded interface. bz#173621 cman-kernel: /proc/cluster/services seq_start needs to initialise the pointer, we may not be starting from the beginning every time. 
bz#175372 cman-kernel: Fix memory leak when reading from /proc/cluster/nodes or /proc/cluster/services. bz#178367 cman-kernel: Send a userspace notification when we are the last node in a cluster. bz#182233 cman-kernel: add quorum device interface for userspace cman-kernel: Add node ID to /proc/cluster/status cman: Allow "cman_tool leave force" to cause cman to leave the cluster even if it's in transition or joining. cman: Look over more than 16 interfaces when searching for the broadcast address. cman: init script does 'cman_tool leave remove' on stop cman: add cman_get/set_private to libcman cman: add quorum device API to libcman gfs-kernel: Fix performance with sync mount option; pages were not being flushed when gfs_writepage is called. bz#173147 gfs-kernel: Flush pages into storage in case of DirectIO falling back to BufferIO. DirectIO reads were sometimes getting stale data. gfs-kernel: Make sendfile work with stuffed inodes; after a write on stuffed inode, mark cached page as not uptodate. bz#142849 gfs-kernel: Fix spot where the quota_enforce setting is ignored. gfs-kernel: Fix case of big allocation slowdown. The allocator could end up failing its passive attempts to lock all recent rgrps because another node had deallocated from them and was caching the locks. The allocator now switches from passive to forceful requests after try_threshold failures. gfs-kernel: Fix rare case of bad NFS file handles leading to stale file handle errors. bz#178469 gfs-kernel: Properly handle error return code from verify_jhead(). gfs-kernel: Fix possible umount panic due to the ordering of log flushes and log shutdown. bz#164331, bz#178469 gfs-kernel: Fix directory delete out of memory error. bz#182057 gfs-kernel: Return code was not being propagated while setting default ACLs causing an EPERM everytime. bz#182066 gulm: Fix bug that would cause luck_gulmd to not call waitpid unless SIGCHLD was received from the child. bz#171246 gulm: Fix problems with host lookups. Now try to match the ip if we are unable to match the name of a lock server as well as fixing the expiration of locks if gulm somehow gets a FQDN. bz#169171 fence/fenced: Multiple devices in one method were not being translated into multiple calls to an agent, but all the device data was lumped together for one agent call. bz#172401 fence/fence_apc: Make agent work with 7900 series apc switches. bz#172441 fence/fence_ipmilan: fixes for bz#178314 fence/fence_drac: support for drac 4/I fence/fence_drac: interface change in drac_mc firmware version 1.2 fence: Add support for IBM rsa fence agent gnbd-kernel: gnbd_monitor wouldn't correctly reset after an uncached gnbd had failed and been restored. bz#155304 gnbd-kernel: kill gnbd_monitor when all uncached gnbds have been removed. bz#127042 gnbd: changes to let multipath run over gnbd. gfs_fsck: Fix small window where another node can mount during a gfs_fsck. bz#169087 gfs_fsck: gfs_fsck crashed on many types of extended attribute corruptions. bz#173697 gfs_fsck: Check result code and handle failure's in fsck rgrp read code. bz#169340 gfs_fsck: fix errors checking large (multi-TB) filesystems. bz#186125 gfs_edit: new version with more options that uses ncurses. ccs: Make ccs connection descriptors time out, fixing a problem where all descriptors could be used up, even though none are in use. ccs: Increase number of connection descriptors from 10 to 30. ccs: Ignore SIGPIPE, don't catch SIGSEV, allowing for core dumps. 
ccs: endian fixes for clusters of machines with different endianness ccs: Fix error printing. bz#178812 ccs: fix ccs_tool seg fault on upgrade. bz#186121 magma-plugins/sm: Fix reads of /proc/cluster/services. bz#175033 magma-plugins/gulm: Fix clu_lock() return value that resulted in "Resource temporarily unavailable" messages at times. bz#171253 rgmanager: Add support for inheritance in the form "type%attribute" instead of just attribute so as to avoid confusion. rgmanager: Fix bz#150346 - Clustat usability problems rgmanager: Fix bz#170859 - VIPs show up on multiple members. rgmanager: Fix bz#171034 - Missing: Monitoring for local and cluster fs's rgmanager: Fix bz#171036 - RFE: Log messages in resource agents rgmanager: Fix bz#165447 - ip.sh fails when using VLAN on bonded interface rgmanager: Fix bz#171153 - clustat withholds information if run on multiple members simultaneously rgmanager: Fix bz#171236 - ia64 alignment warnings rgmanager: Fix bz#173526 - Samba Resource Agent rgmanager: Fix bz#173916 - rgmanager log level change requires restart rgmanager: Fix bz#174819 - clustat crashes if ccsd is not running rgmanager: Fix bz#175106 - lsof -b blocks when using gethostbyname causing slow force-unmount when DNS is broken rgmanager: Fix bz#175108 - rgmanager storing extraneous info using VF rgmanager: Fix bz#175114 - rgmanager uses wrong stop-order for unspecified resource agents rgmanager: Implement bz#175215: Inherit fsid for nfs exports rgmanager: Fix bz#175229 - remove unneeded references to clurmtabd; it is no longer a necessary piece for NFS failover rgmanager: Fix bz#176343 - __builtin_return_address(x) for x>0 is never guaranteed to work rgmanager: Ensure rgmanager doesn't block SIGSEGV when debug is not enabled. rgmanager: Fix bz#172177, bz#172178 rgmanager: Allow scripts to inherit the name attr of a parent in case the script wants to know it. bz#172310 rgmanager: Fix #166109 - random segfault in clurgmgrd rgmanager: Fix most of 177467 - clustat hang From gstaltari at arnet.net.ar Tue Apr 11 19:25:20 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Tue, 11 Apr 2006 16:25:20 -0300 Subject: [Linux-cluster] cluster-1.02.00 In-Reply-To: <20060411165259.GB5820@redhat.com> References: <20060411165259.GB5820@redhat.com> Message-ID: <443C02A0.5010103@arnet.net.ar> David Teigland wrote: > A new source tarball from the STABLE branch has been released; it builds > and runs on 2.6.16: > > ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz > > Version 1.02.00 - 10 April 2006 > =============================== > dlm-kernel: Allow DLM to start if the node gets a different nodeid. > dlm-kernel: Add WARNING printk when cman calls emergency_shutdown. > dlm-kernel: The in_recovery semaphore wasn't being released in corner case > where grant message is ignored for lock being unlocked. > dlm-kernel: Remove an assertion that triggers unnecessarily in rare > cases of overlapping and invalid master lookups. > dlm-kernel: Don't close existing connection if a double-connect is > attempted - just ignore the last one. > dlm-kernel: Fix a race where an attempt to unlock a lock in the completion > AST routine could crash on SMP. > dlm-kernel: Fix transient hangs that could be caused by incorrect handling > of locks granted due to ALTMODE. bz#178738 > dlm-kernel: Allow any old user to create the default lockspace. You need Udev > running AND build dlm with ./configure --have_udev. > dlm-kernel: Only release a lockspace if all users have closed it. 
bz#177934 > cman-kernel: Fix cman master confusion during recovery. bz#158592 > cman-kernel: Add printk to assert failure when a nodeid lookup fails. > cman-kernel: Give an interface "max-retries" attempts to get fixed after > an error before we give up and shut down the cluster. > cman-kernel: IPv6 FF1x:: multicast addresses don't work. Always send out > of the locally bound address. bz#166752 > cman-kernel: Ignore really badly delayed old duplicates that might get > sent via a bonded interface. bz#173621 > cman-kernel: /proc/cluster/services seq_start needs to initialise the pointer, > we may not be starting from the beginning every time. bz#175372 > cman-kernel: Fix memory leak when reading from /proc/cluster/nodes or > /proc/cluster/services. bz#178367 > cman-kernel: Send a userspace notification when we are the last node in > a cluster. bz#182233 > cman-kernel: add quorum device interface for userspace > cman-kernel: Add node ID to /proc/cluster/status > cman: Allow "cman_tool leave force" to cause cman to leave the cluster > even if it's in transition or joining. > cman: Look over more than 16 interfaces when searching for the broadcast > address. > cman: init script does 'cman_tool leave remove' on stop > cman: add cman_get/set_private to libcman > cman: add quorum device API to libcman > gfs-kernel: Fix performance with sync mount option; pages were not being > flushed when gfs_writepage is called. bz#173147 > gfs-kernel: Flush pages into storage in case of DirectIO falling back to > BufferIO. DirectIO reads were sometimes getting stale data. > gfs-kernel: Make sendfile work with stuffed inodes; after a write on > stuffed inode, mark cached page as not uptodate. bz#142849 > gfs-kernel: Fix spot where the quota_enforce setting is ignored. > gfs-kernel: Fix case of big allocation slowdown. The allocator could end > up failing its passive attempts to lock all recent rgrps because another > node had deallocated from them and was caching the locks. The allocator now > switches from passive to forceful requests after try_threshold failures. > gfs-kernel: Fix rare case of bad NFS file handles leading to stale file > handle errors. bz#178469 > gfs-kernel: Properly handle error return code from verify_jhead(). > gfs-kernel: Fix possible umount panic due to the ordering of log flushes > and log shutdown. bz#164331, bz#178469 > gfs-kernel: Fix directory delete out of memory error. bz#182057 > gfs-kernel: Return code was not being propagated while setting default > ACLs causing an EPERM everytime. bz#182066 > gulm: Fix bug that would cause luck_gulmd to not call waitpid unless > SIGCHLD was received from the child. bz#171246 > gulm: Fix problems with host lookups. Now try to match the ip if we are > unable to match the name of a lock server as well as fixing the expiration > of locks if gulm somehow gets a FQDN. bz#169171 > fence/fenced: Multiple devices in one method were not being translated > into multiple calls to an agent, but all the device data was lumped together > for one agent call. bz#172401 > fence/fence_apc: Make agent work with 7900 series apc switches. bz#172441 > fence/fence_ipmilan: fixes for bz#178314 > fence/fence_drac: support for drac 4/I > fence/fence_drac: interface change in drac_mc firmware version 1.2 > fence: Add support for IBM rsa fence agent > gnbd-kernel: gnbd_monitor wouldn't correctly reset after an uncached gnbd had > failed and been restored. bz#155304 > gnbd-kernel: kill gnbd_monitor when all uncached gnbds have been removed. 
> bz#127042 > gnbd: changes to let multipath run over gnbd. > gfs_fsck: Fix small window where another node can mount during a gfs_fsck. > bz#169087 > gfs_fsck: gfs_fsck crashed on many types of extended attribute corruptions. > bz#173697 > gfs_fsck: Check result code and handle failure's in fsck rgrp read code. > bz#169340 > gfs_fsck: fix errors checking large (multi-TB) filesystems. bz#186125 > gfs_edit: new version with more options that uses ncurses. > ccs: Make ccs connection descriptors time out, fixing a problem where all > descriptors could be used up, even though none are in use. > ccs: Increase number of connection descriptors from 10 to 30. > ccs: Ignore SIGPIPE, don't catch SIGSEV, allowing for core dumps. > ccs: endian fixes for clusters of machines with different endianness > ccs: Fix error printing. bz#178812 > ccs: fix ccs_tool seg fault on upgrade. bz#186121 > magma-plugins/sm: Fix reads of /proc/cluster/services. bz#175033 > magma-plugins/gulm: Fix clu_lock() return value that resulted in > "Resource temporarily unavailable" messages at times. bz#171253 > rgmanager: Add support for inheritance in the form "type%attribute" > instead of just attribute so as to avoid confusion. > rgmanager: Fix bz#150346 - Clustat usability problems > rgmanager: Fix bz#170859 - VIPs show up on multiple members. > rgmanager: Fix bz#171034 - Missing: Monitoring for local and cluster fs's > rgmanager: Fix bz#171036 - RFE: Log messages in resource agents > rgmanager: Fix bz#165447 - ip.sh fails when using VLAN on bonded interface > rgmanager: Fix bz#171153 - clustat withholds information if run on multiple > members simultaneously > rgmanager: Fix bz#171236 - ia64 alignment warnings > rgmanager: Fix bz#173526 - Samba Resource Agent > rgmanager: Fix bz#173916 - rgmanager log level change requires restart > rgmanager: Fix bz#174819 - clustat crashes if ccsd is not running > rgmanager: Fix bz#175106 - lsof -b blocks when using gethostbyname causing > slow force-unmount when DNS is broken > rgmanager: Fix bz#175108 - rgmanager storing extraneous info using VF > rgmanager: Fix bz#175114 - rgmanager uses wrong stop-order for unspecified > resource agents > rgmanager: Implement bz#175215: Inherit fsid for nfs exports > rgmanager: Fix bz#175229 - remove unneeded references to clurmtabd; it is no > longer a necessary piece for NFS failover > rgmanager: Fix bz#176343 - __builtin_return_address(x) for x>0 is never > guaranteed to work > rgmanager: Ensure rgmanager doesn't block SIGSEGV when debug is not enabled. > rgmanager: Fix bz#172177, bz#172178 > rgmanager: Allow scripts to inherit the name attr of a parent in case the > script wants to know it. bz#172310 > rgmanager: Fix #166109 - random segfault in clurgmgrd > rgmanager: Fix most of 177467 - clustat hang > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > It would be nice to have the rpm for FC4 from this new update. TIA German From gregp at liveammo.com Wed Apr 12 03:13:31 2006 From: gregp at liveammo.com (Greg Perry) Date: Tue, 11 Apr 2006 23:13:31 -0400 Subject: [Linux-cluster] Questions about GFS Message-ID: <443C705B.6020606@liveammo.com> Hello, I have been researching GFS for a few days, and I have some questions that hopefully some seasoned users of GFS may be able to answer. I am working on the design of a linux cluster that needs to be scalable, it will be primarily an RDBMS-driven data warehouse used for data mining and content indexing. 
In an ideal world, we would be able to start with a small (say 4 node) cluster, then add machines (and storage) as the various RDBMS' grow in size (as well as the use virtual IPs for load balancing across multiple lighttpd instances. All machines on the node need to be able to talk to the same volume of information, and GFS (in theory at least) would be used to aggregate the drives from each machine into that huge shared logical volume). With that being said, here are some questions: 1) What is the preference on the RDBMS, will MySQL 5.x work and are there any locking issues to consider? What would the best open source RDBMS be (MySQL vs. Postgresql etc) 2) If there was a 10 machine cluster, each with a 300GB SATA drive, can you use GFS to aggregate all 10 drives into one big logical 3000GB volume? Would that scenario work similar to a RAID array? If one or two nodes fail, but the GFS quorum is maintained, can those nodes be replaced and repopulated just like a RAID-5 array? If this scenario is possible, how difficult is it to "grow" the shared logical volume by adding additional nodes (say I had two more machines each with a 300GB SATA drive)? 3) How stable is GFS currently, and is it used in many production environments? 4) How stable is the FC5 version, and does it include all of the configuration utilities in the RH Enterprise Cluster version? (the idea would be to prove the point on FC5, then migrate to RH Enterprise). 5) Would CentOS be preferred over FC5 for the initial proof of concept and early adoption? 6) Are there any restrictions or performance advantages of using all drives with the same geometry, or can you mix and match different size drives and just add to the aggregate volume size? Thanks in advance, Greg From pcaulfie at redhat.com Wed Apr 12 07:06:17 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 12 Apr 2006 08:06:17 +0100 Subject: [Linux-cluster] CS4 Update 2 / cluster 3 noeuds question In-Reply-To: <443BD192.1000407@bull.net> References: <443BD192.1000407@bull.net> Message-ID: <443CA6E9.9000402@redhat.com> Alain Moulle wrote: > Hi > Finally I've found the problem (a bad alias in /etc/hosts !). > > But I've another question : > As told before, I have yack10 and yack23 with each one a service > to run, and yack23 as backup for both nodes (see attached cluster.conf) > > I've tested with a poweroff on yack10 and the service > is well failoverd on yack23. But then I tried to > do poweroff on yack21, but it does not failover > because "missing two many heart beats". > I suspect that it is normal because we have only > one node left among the three, and so there is > not enough votes ... > But I would like to have a confirmation ? Yes, that's correct. If you have a three-node cluster then there needs to be two active nodes for it to have quorum. Otherwise single nodes could split form "clusters" on their own and corrupt the filesystem (in the case of GFS) > And if so, is there a way to configure so that > yack23 could failover the services of both > other nodes stopped at the same time ? > -- patrick From kumaresh81 at yahoo.co.in Wed Apr 12 08:12:24 2006 From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy) Date: Wed, 12 Apr 2006 09:12:24 +0100 (BST) Subject: [Linux-cluster] a doubt on quorums Message-ID: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com> Hi, I have a problem with my cluster and quorum settings and any help will be appreciated. I have a five node cluster with quorum vote of 1 for all the 5 nodes. 
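The vote arithmetic here is the same as in the three node thread above: cman only stays quorate while more than half of the expected votes are present. A quick sketch of the sums, assuming one vote per node and no quorum disk:

   # quorum threshold = expected_votes / 2 + 1 (integer division)
   # 3 nodes x 1 vote -> quorum 2 : a single surviving node cannot run services
   # 5 nodes x 1 vote -> quorum 3 : with 3 nodes shut down only 2 votes remain,
   #                                the cluster goes inquorate and services stop
   # 2 nodes is the special case: it only works with the two_node setting in cluster.conf
   cat /proc/cluster/status   # the Quorum: line shows the current threshold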
They have a GFS shared file system on all the five nodes, and, two domains and two services involving two nodes. When I shut down the 3 nodes that don't participate in the two domains and clustered services, both the services stop and fail to start when tried manually also. I guess it is something to do with the quorum settings, but not sure on the way forward. The environment is on RHEL AS 4U2 with GFS 6.1 and RHCS 4U2. Regards, Kumaresh --------------------------------- Jiyo cricket on Yahoo! India cricket Yahoo! Messenger Mobile Stay in touch with your buddies all the time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From placid at adelpha-lan.org Wed Apr 12 08:18:20 2006 From: placid at adelpha-lan.org (Castang Jerome) Date: Wed, 12 Apr 2006 10:18:20 +0200 Subject: [Linux-cluster] a doubt on quorums In-Reply-To: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com> References: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com> Message-ID: <443CB7CC.5@adelpha-lan.org> Kumaresh Ponnuswamy a ?crit : > Hi, > > I have a problem with my cluster and quorum settings and any help will > be appreciated. > > I have a five node cluster with quorum vote of 1 for all the 5 nodes. > They have a GFS shared file system on all the five nodes, and, two > domains and two services involving two nodes. > > When I shut down the 3 nodes that don't participate in the two domains > and clustered services, both the services stop and fail to start when > tried manually also. > > I guess it is something to do with the quorum settings, but not sure > on the way forward. > > The environment is on RHEL AS 4U2 with GFS 6.1 and RHCS 4U2. > > Regards, > Kumaresh > > ------------------------------------------------------------------------ If you have 3 nodes of 5 falling down, your cluster becomes a two node cluster. So, as it is written in documentation, it's a "special cluster" and it has to be specified (in cluster.conf or by this command "can_tool join -2") When you have a two node cluster, it is possible that each node is isolated (this is the "splitbrain" ). -- Jerome CASTANG Tel: 06.85.74.33.02 mail: jerome.castang at adelpha-lan.org --------------------------------------------- Comme le dit un vieu proverbe chinois: RTFM ! From erwan at seanodes.com Wed Apr 12 08:18:48 2006 From: erwan at seanodes.com (Velu Erwan) Date: Wed, 12 Apr 2006 10:18:48 +0200 Subject: [Linux-cluster] cluster-1.02.00 In-Reply-To: <20060411165259.GB5820@redhat.com> References: <20060411165259.GB5820@redhat.com> Message-ID: <443CB7E8.3020508@seanodes.com> David Teigland a ?crit : >A new source tarball from the STABLE branch has been released; it builds >and runs on 2.6.16: > > Is it possible to split the kernel part from the binaries part in the make process ? If yes, it could helps to have a dkms package that help us to use this release in an easiest way ;o) My build host don't have the same kernel source as my nodes, so I'd like to build the binaries on it and then generate the dkms package. When you install this dkms package on a new host, the kernel part of gfs recompiles itself.. 
This is very useful ;) Erwan, From basv at sara.nl Wed Apr 12 08:37:30 2006 From: basv at sara.nl (Bas van der Vlies) Date: Wed, 12 Apr 2006 10:37:30 +0200 Subject: [Linux-cluster] ANNOUNCE: gfs_2_deb utils initial version In-Reply-To: References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl> <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl> Message-ID: <443CBC4A.5080607@sara.nl> = gfs_2_deb - utilities = This is a release of the SARA package gfs_2_deb, which contains the utilities that we use to build Debian packages from the Red Hat Cluster Software (GFS). All init.d scripts in the Debian package start at runlevel 3, and the scripts start in the right order. We have chosen this setup for these reasons (the default runlevel is 2): 1) When a node is fenced, the node is rebooted and comes back ready for cluster mode. 2) We can easily switch between runlevels to join or leave the cluster. See the README for further info. The package can be downloaded at: ftp://ftp.sara.nl/pub/outgoing/gfs_2_deb-0.1.tar.gz Regards -- -- ******************************************************************** * * * Bas van der Vlies e-mail: basv at sara.nl * * SARA - Academic Computing Services phone: +31 20 592 8012 * * Kruislaan 415 fax: +31 20 6683167 * * 1098 SJ Amsterdam * * * ******************************************************************** From deval.kulshrestha at progression.com Wed Apr 12 08:57:41 2006 From: deval.kulshrestha at progression.com (Deval kulshrestha) Date: Wed, 12 Apr 2006 14:27:41 +0530 Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster file system resources? Message-ID: <004501c65e0f$2afde300$7600a8c0@PROGRESSION> Hi I am using one MSA 500 G2 and two HP DL360 G4P servers with HP's HBA 642. The servers are installed with RHEL 4 ES U1 and RHCS 4 with DLM as the lock manager. I have to run around 14 different services in HA mode, and I have split them into two failover domains with different priorities: 7 services run on node1 in HA mode with node2 as their failover host, and the remaining 7 services run on node2 in HA mode with node1 as their failover domain. In my scenario simultaneous logical drive access is not required, so I am not using GFS here. Everything that is needed is configured properly and working fine. But the cluster can still hit data inconsistency errors if somebody manually mounts a partition that is already in use by the other node. I understand that this is against the basics of a non-shared file system, and it can be documented, but everybody knows that 2-3 years down the line, when the support staff has been replaced by new people who come in with a very limited understanding of what is running, they can make a mount mistake. (umount is a documented screw-up, but mount is an undocumented one here.) Everybody knows mount is just a simple command that does not appear to harm anything; if I just want to read data, mount looks fine. But in our case we want to restrict other users from using the mount command when a logical volume is already mounted on one node. I want some help on this: when a shared file system is not implemented, how can we restrict manual mounting of cluster file system resources while they are in use by cluster services? Any help would be highly appreciated. With regards, Deval K. =========================================================== Privileged or confidential information may be contained in this message.
If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), please delete this message and kindly notify the sender by an emailed reply. Opinions, conclusions and other information in this message that do not relate to the official business of Progression and its associate entities shall be understood as neither given nor endorsed by them. ------------------------------------------------------------- Progression Infonet Private Limited, Gurgaon (Haryana), India -------------- next part -------------- An HTML attachment was scrubbed... URL: From kumaresh81 at yahoo.co.in Wed Apr 12 10:03:38 2006 From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy) Date: Wed, 12 Apr 2006 11:03:38 +0100 (BST) Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster file system resources? In-Reply-To: <004501c65e0f$2afde300$7600a8c0@PROGRESSION> Message-ID: <20060412100338.70764.qmail@web8327.mail.in.yahoo.com> Hi, In your case, I guess removing the SUID on mount for normal users is the best solution. This is will prevent non root members from mounting the file systesm. Regards, Kumaresh Deval kulshrestha wrote: Hi I am using one MSA 500 G2 , two no. of HP DL360 G4P server with HP?s HBA 642, Server installed with RHEL 4 ES U1 and RHCS4 with lock mgr as DLM I have to run around 14 different services in HA mode, I have break them up in two different priority domain. Now 7 services runs on node1 in HA mode, node2 is failover host for them, Remaining 7 services runs on node2 in HA mode and node1 is failover domain for them. In my scenario Simultaneous logical drive access is not required, thus I am not using GFS here What ever is needed is configured properly and working fine. But this cluster is still causes some data inconsistency error if somebody manually mounts the partitions which is already in access by other node. I understand that this is against the basics of non-shared file system. This can be documented also, but everybody knows that after 2-3 yrs down the line when support staff replaced by new people, when they come in with very limited understanding about the running stuff they can do some mount mistake.(umount is a document screw up, but mount is here undocumented screw up) every body knows mount is just a simple command, it does not harm anything, if I just want to read data mount is ok. But in our case we wanted to restrict other users to use mount command when some logical volume is already mounted on one node. I want some help on this, when shared file system is not implemented. How we can restrict manual mount of cluster file system resources when its being in use by some cluster services? Any help would be highly appreciable here. With regard Deval K. =========================================================== Privileged or confidential information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), please delete this message and kindly notify the sender by an emailed reply. Opinions, conclusions and other information in this message that do not relate to the official business of Progression and its associate entities shall be understood as neither given nor endorsed by them. 
------------------------------------------------------------- Progression Infonet Private Limited, Gurgaon (Haryana), India -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster --------------------------------- Jiyo cricket on Yahoo! India cricket Yahoo! Messenger Mobile Stay in touch with your buddies all the time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From deval.kulshrestha at progression.com Wed Apr 12 10:59:13 2006 From: deval.kulshrestha at progression.com (Deval kulshrestha) Date: Wed, 12 Apr 2006 16:29:13 +0530 Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster file system resources? In-Reply-To: <20060412100338.70764.qmail@web8327.mail.in.yahoo.com> Message-ID: <005501c65e20$22437b10$7600a8c0@PROGRESSION> Hi Kumaresh Thanks for the reply/inputs SAN LUN's are not defined in /etc/fstab. They don't have to be mounted while OS boots. SAN volumes are the part of Cluster resources groups, they are in control of Cluster services rgmanager. I did not understand how we can make it work, please suggest how we can go ahead. Regards Deval -----Original Message----- From: Kumaresh Ponnuswamy [mailto:kumaresh81 at yahoo.co.in] Sent: Wednesday, April 12, 2006 3:34 PM To: Deval kulshrestha; linux clustering Subject: Re: [Linux-cluster] RE: how to dis-allow manual mounting of cluster file system resources? Hi, In your case, I guess removing the SUID on mount for normal users is the best solution. This is will prevent non root members from mounting the file systesm. Regards, Kumaresh Deval kulshrestha wrote: Hi I am using one MSA 500 G2 , two no. of HP DL360 G4P server with HP's HBA 642, Server installed with RHEL 4 ES U1 and RHCS4 with lock mgr as DLM I have to run around 14 different services in HA mode, I have break them up in two different priority domain. Now 7 services runs on node1 in HA mode, node2 is failover host for them, Remaining 7 services runs on node2 in HA mode and node1 is failover domain for them. In my scenario Simultaneous logical drive access is not required, thus I am not using GFS here What ever is needed is configured properly and working fine. But this cluster is still causes some data inconsistency error if somebody manually mounts the partitions which is already in access by other node. I understand that this is against the basics of non-shared file system. This can be documented also, but everybody knows that after 2-3 yrs down the line when support staff replaced by new people, when they come in with very limited understanding about the running stuff they can do some mount mistake.(umount is a document screw up, but mount is here undocumented screw up) every body knows mount is just a simple command, it does not harm anything, if I just want to read data mount is ok. But in our case we wanted to restrict other users to use mount command when some logical volume is already mounted on one node. I want some help on this, when shared file system is not implemented. How we can restrict manual mount of cluster file system resources when its being in use by some cluster services? Any help would be highly appreciable here. With regard Deval K. =========================================================== Privileged or confidential information may be contained in this message. 
If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), please delete this message and kindly notify the sender by an emailed reply. Opinions, conclusions and other information in this message that do not relate to the official business of Progression and its associate entities shall be understood as neither given nor endorsed by them. ------------------------------------------------------------- Progression Infonet Private Limited, Gurgaon (Haryana), India -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster _____ Jiyo cricket on Yahoo! India cricket Yahoo! Messenger Mobile Stay in touch with your buddies all the time. =========================================================== Privileged or confidential information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), please delete this message and kindly notify the sender by an emailed reply. Opinions, conclusions and other information in this message that do not relate to the official business of Progression and its associate entities shall be understood as neither given nor endorsed by them. ------------------------------------------------------------- Progression Infonet Private Limited, Gurgaon (Haryana), India -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowie_Bailey at BUC.com Wed Apr 12 14:56:13 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Wed, 12 Apr 2006 10:56:13 -0400 Subject: [Linux-cluster] Questions about GFS Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com> Greg Perry wrote: > > I have been researching GFS for a few days, and I have some questions > that hopefully some seasoned users of GFS may be able to answer. > > I am working on the design of a linux cluster that needs to be > scalable, it will be primarily an RDBMS-driven data warehouse used > for data mining and content indexing. In an ideal world, we would be > able to start with a small (say 4 node) cluster, then add machines > (and storage) as the various RDBMS' grow in size (as well as the use > virtual IPs for load balancing across multiple lighttpd instances. > All machines on the node need to be able to talk to the same volume > of information, and GFS (in theory at least) would be used to > aggregate the drives from each machine into that huge shared logical > volume). > > With that being said, here are some questions: > > 1) What is the preference on the RDBMS, will MySQL 5.x work and are > there any locking issues to consider? What would the best open source > RDBMS be (MySQL vs. Postgresql etc) Someone more qualified than me will have to answer that question. > 2) If there was a 10 machine cluster, each with a 300GB SATA drive, > can you use GFS to aggregate all 10 drives into one big logical 3000GB > volume? Would that scenario work similar to a RAID array? If one or > two nodes fail, but the GFS quorum is maintained, can those nodes be > replaced and repopulated just like a RAID-5 array? If this scenario > is possible, how difficult is it to "grow" the shared logical volume > by adding additional nodes (say I had two more machines each with a > 300GB SATA drive)? GFS doesn't work that way. GFS is just a fancy filesystem. It takes an already shared volume and allows all of the nodes to access it at the same time. 
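To make that a bit more concrete: the usual shape of such a setup is to export one block device to every node (over GNBD, iSCSI, AoE, or a SAN), optionally pool it with clustered LVM, and only then put GFS on top. What follows is just a rough sketch of that idea, not a recipe from this thread -- the server name storage1, the cluster name alpha, the device /dev/sdb1, and the volume names are made-up placeholders, and exact options vary between releases, so check the man pages:

----------------------------
# on the storage server: export a local disk over GNBD
gnbd_export -d /dev/sdb1 -e shared0

# on each cluster node: import the exported device
gnbd_import -i storage1

# on one node (clvmd running cluster-wide): clustered VG plus a GFS filesystem
pvcreate /dev/gnbd/shared0
vgcreate -c y vg_shared /dev/gnbd/shared0
lvcreate -l 100%FREE -n data vg_shared
gfs_mkfs -p lock_dlm -t alpha:data -j 4 /dev/vg_shared/data

# on every node that should use it
mount -t gfs /dev/vg_shared/data /mnt/data
----------------------------

The -j 4 above is the number of journals, which caps how many nodes can mount the filesystem at once; that is the journal limit mentioned later in this thread.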
> 3) How stable is GFS currently, and is it used in many production > environments? It seems to be stable for me, but we are still in testing mode at the moment. > 4) How stable is the FC5 version, and does it include all of the > configuration utilities in the RH Enterprise Cluster version? (the > idea would be to prove the point on FC5, then migrate to RH > Enterprise). Haven't used that one. > 5) Would CentOS be preferred over FC5 for the initial > proof of concept and early adoption? If your eventual platform is RHEL, then CentOS would make more sense for a testing platform since it is almost identical to RHEL. Fedora can be less stable and may introduce some issues that you wouldn't have with RHEL. On the other hand, RHEL may have some problems that don't appear on Fedora because of updated packages. If you want bleeding edge, use Fedora. If you want stability, use CentOS or RHEL. > 6) Are there any restrictions or performance advantages of using all > drives with the same geometry, or can you mix and match different size > drives and just add to the aggregate volume size? As I said earlier, GFS does not do the aggregation. What you get with GFS is the ability to share an already networked storage volume. You can use iSCSI, AoE, GNBD, or others to connect the storage to all of the cluster nodes. Then you format the volume with GFS so that it can be used with all of the nodes. I believe there is a project for the aggregate filesystem that you are looking for, but as far as I know, it is still beta. -- Bowie From gregp at liveammo.com Wed Apr 12 15:21:27 2006 From: gregp at liveammo.com (Greg Perry) Date: Wed, 12 Apr 2006 11:21:27 -0400 Subject: [Linux-cluster] Questions about GFS In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com> References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com> Message-ID: <443D1AF7.8090105@liveammo.com> Thanks Bowie, I understand more now. So within this architecture, it would make more sense to utilize a RAID-5/10 SAN, then add diskless workstations as needed for performance...? For said diskless workstations, does it make sense to run Stateless Linux to keep the images the same across all of the workstations/client machines? Regards Greg Bowie Bailey wrote: > Greg Perry wrote: >> I have been researching GFS for a few days, and I have some questions >> that hopefully some seasoned users of GFS may be able to answer. >> >> I am working on the design of a linux cluster that needs to be >> scalable, it will be primarily an RDBMS-driven data warehouse used >> for data mining and content indexing. In an ideal world, we would be >> able to start with a small (say 4 node) cluster, then add machines >> (and storage) as the various RDBMS' grow in size (as well as the use >> virtual IPs for load balancing across multiple lighttpd instances. >> All machines on the node need to be able to talk to the same volume >> of information, and GFS (in theory at least) would be used to >> aggregate the drives from each machine into that huge shared logical >> volume). >> >> With that being said, here are some questions: >> >> 1) What is the preference on the RDBMS, will MySQL 5.x work and are >> there any locking issues to consider? What would the best open source >> RDBMS be (MySQL vs. Postgresql etc) > > Someone more qualified than me will have to answer that question. > >> 2) If there was a 10 machine cluster, each with a 300GB SATA drive, >> can you use GFS to aggregate all 10 drives into one big logical 3000GB >> volume? 
Would that scenario work similar to a RAID array? If one or >> two nodes fail, but the GFS quorum is maintained, can those nodes be >> replaced and repopulated just like a RAID-5 array? If this scenario >> is possible, how difficult is it to "grow" the shared logical volume >> by adding additional nodes (say I had two more machines each with a >> 300GB SATA drive)? > > GFS doesn't work that way. GFS is just a fancy filesystem. It takes > an already shared volume and allows all of the nodes to access it at > the same time. > >> 3) How stable is GFS currently, and is it used in many production >> environments? > > It seems to be stable for me, but we are still in testing mode at the > moment. > >> 4) How stable is the FC5 version, and does it include all of the >> configuration utilities in the RH Enterprise Cluster version? (the >> idea would be to prove the point on FC5, then migrate to RH >> Enterprise). > > Haven't used that one. > >> 5) Would CentOS be preferred over FC5 for the initial >> proof of concept and early adoption? > > If your eventual platform is RHEL, then CentOS would make more sense > for a testing platform since it is almost identical to RHEL. Fedora > can be less stable and may introduce some issues that you wouldn't have > with RHEL. On the other hand, RHEL may have some problems that don't > appear on Fedora because of updated packages. > > If you want bleeding edge, use Fedora. > If you want stability, use CentOS or RHEL. > >> 6) Are there any restrictions or performance advantages of using all >> drives with the same geometry, or can you mix and match different size >> drives and just add to the aggregate volume size? > > As I said earlier, GFS does not do the aggregation. > > What you get with GFS is the ability to share an already networked > storage volume. You can use iSCSI, AoE, GNBD, or others to connect > the storage to all of the cluster nodes. Then you format the volume > with GFS so that it can be used with all of the nodes. > > I believe there is a project for the aggregate filesystem that you are > looking for, but as far as I know, it is still beta. > From gregp at liveammo.com Wed Apr 12 15:28:13 2006 From: gregp at liveammo.com (Greg Perry) Date: Wed, 12 Apr 2006 11:28:13 -0400 Subject: [Linux-cluster] Questions about GFS In-Reply-To: <443D1AF7.8090105@liveammo.com> References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com> <443D1AF7.8090105@liveammo.com> Message-ID: <443D1C8D.5080503@liveammo.com> Also, after reviewing the GFS architecture it seems there would be significant security issues to consider, ie if one client/member of the GFS volume were compromised, that would lead to a full compromise of the filesystem across all nodes (and the ability to create special devices and modify the filesystem on any other GFS node member). Are there any plans to include any form of discretionary or mandatory access controls for GFS in the upcoming v2 release? Greg Greg Perry wrote: > Thanks Bowie, I understand more now. So within this architecture, it > would make more sense to utilize a RAID-5/10 SAN, then add diskless > workstations as needed for performance...? > > For said diskless workstations, does it make sense to run Stateless > Linux to keep the images the same across all of the workstations/client > machines? > > Regards > > Greg > > Bowie Bailey wrote: >> Greg Perry wrote: >>> I have been researching GFS for a few days, and I have some questions >>> that hopefully some seasoned users of GFS may be able to answer. 
>>> >>> I am working on the design of a linux cluster that needs to be >>> scalable, it will be primarily an RDBMS-driven data warehouse used >>> for data mining and content indexing. In an ideal world, we would be >>> able to start with a small (say 4 node) cluster, then add machines >>> (and storage) as the various RDBMS' grow in size (as well as the use >>> virtual IPs for load balancing across multiple lighttpd instances. >>> All machines on the node need to be able to talk to the same volume >>> of information, and GFS (in theory at least) would be used to >>> aggregate the drives from each machine into that huge shared logical >>> volume). >>> With that being said, here are some questions: >>> >>> 1) What is the preference on the RDBMS, will MySQL 5.x work and are >>> there any locking issues to consider? What would the best open source >>> RDBMS be (MySQL vs. Postgresql etc) >> >> Someone more qualified than me will have to answer that question. >> >>> 2) If there was a 10 machine cluster, each with a 300GB SATA drive, >>> can you use GFS to aggregate all 10 drives into one big logical 3000GB >>> volume? Would that scenario work similar to a RAID array? If one or >>> two nodes fail, but the GFS quorum is maintained, can those nodes be >>> replaced and repopulated just like a RAID-5 array? If this scenario >>> is possible, how difficult is it to "grow" the shared logical volume >>> by adding additional nodes (say I had two more machines each with a >>> 300GB SATA drive)? >> >> GFS doesn't work that way. GFS is just a fancy filesystem. It takes >> an already shared volume and allows all of the nodes to access it at >> the same time. >> >>> 3) How stable is GFS currently, and is it used in many production >>> environments? >> >> It seems to be stable for me, but we are still in testing mode at the >> moment. >> >>> 4) How stable is the FC5 version, and does it include all of the >>> configuration utilities in the RH Enterprise Cluster version? (the >>> idea would be to prove the point on FC5, then migrate to RH >>> Enterprise). >> >> Haven't used that one. >> >>> 5) Would CentOS be preferred over FC5 for the initial >>> proof of concept and early adoption? >> >> If your eventual platform is RHEL, then CentOS would make more sense >> for a testing platform since it is almost identical to RHEL. Fedora >> can be less stable and may introduce some issues that you wouldn't have >> with RHEL. On the other hand, RHEL may have some problems that don't >> appear on Fedora because of updated packages. >> >> If you want bleeding edge, use Fedora. >> If you want stability, use CentOS or RHEL. >> >>> 6) Are there any restrictions or performance advantages of using all >>> drives with the same geometry, or can you mix and match different size >>> drives and just add to the aggregate volume size? >> >> As I said earlier, GFS does not do the aggregation. >> >> What you get with GFS is the ability to share an already networked >> storage volume. You can use iSCSI, AoE, GNBD, or others to connect >> the storage to all of the cluster nodes. Then you format the volume >> with GFS so that it can be used with all of the nodes. >> >> I believe there is a project for the aggregate filesystem that you are >> looking for, but as far as I know, it is still beta. 
>> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From hlawatschek at atix.de Wed Apr 12 15:36:46 2006 From: hlawatschek at atix.de (Mark Hlawatschek) Date: Wed, 12 Apr 2006 17:36:46 +0200 Subject: [Linux-cluster] Questions about GFS In-Reply-To: <443D1AF7.8090105@liveammo.com> References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com> <443D1AF7.8090105@liveammo.com> Message-ID: <200604121736.46956.hlawatschek@atix.de> Greg, you can use a diskless shared root configuration with gfs. This setup would enable you to add cluster nodes as you need them. Have a look at http://www.open-sharedroot.org/ Mark On Wednesday 12 April 2006 17:21, Greg Perry wrote: > Thanks Bowie, I understand more now. So within this architecture, it > would make more sense to utilize a RAID-5/10 SAN, then add diskless > workstations as needed for performance...? > > For said diskless workstations, does it make sense to run Stateless > Linux to keep the images the same across all of the workstations/client > machines? > > Regards > > Greg > > Bowie Bailey wrote: > > Greg Perry wrote: > >> I have been researching GFS for a few days, and I have some questions > >> that hopefully some seasoned users of GFS may be able to answer. > >> > >> I am working on the design of a linux cluster that needs to be > >> scalable, it will be primarily an RDBMS-driven data warehouse used > >> for data mining and content indexing. In an ideal world, we would be > >> able to start with a small (say 4 node) cluster, then add machines > >> (and storage) as the various RDBMS' grow in size (as well as the use > >> virtual IPs for load balancing across multiple lighttpd instances. > >> All machines on the node need to be able to talk to the same volume > >> of information, and GFS (in theory at least) would be used to > >> aggregate the drives from each machine into that huge shared logical > >> volume). > >> > >> With that being said, here are some questions: > >> > >> 1) What is the preference on the RDBMS, will MySQL 5.x work and are > >> there any locking issues to consider? What would the best open source > >> RDBMS be (MySQL vs. Postgresql etc) > > > > Someone more qualified than me will have to answer that question. > > > >> 2) If there was a 10 machine cluster, each with a 300GB SATA drive, > >> can you use GFS to aggregate all 10 drives into one big logical 3000GB > >> volume? Would that scenario work similar to a RAID array? If one or > >> two nodes fail, but the GFS quorum is maintained, can those nodes be > >> replaced and repopulated just like a RAID-5 array? If this scenario > >> is possible, how difficult is it to "grow" the shared logical volume > >> by adding additional nodes (say I had two more machines each with a > >> 300GB SATA drive)? > > > > GFS doesn't work that way. GFS is just a fancy filesystem. It takes > > an already shared volume and allows all of the nodes to access it at > > the same time. > > > >> 3) How stable is GFS currently, and is it used in many production > >> environments? > > > > It seems to be stable for me, but we are still in testing mode at the > > moment. > > > >> 4) How stable is the FC5 version, and does it include all of the > >> configuration utilities in the RH Enterprise Cluster version? (the > >> idea would be to prove the point on FC5, then migrate to RH > >> Enterprise). > > > > Haven't used that one. 
> > > >> 5) Would CentOS be preferred over FC5 for the initial > >> proof of concept and early adoption? > > > > If your eventual platform is RHEL, then CentOS would make more sense > > for a testing platform since it is almost identical to RHEL. Fedora > > can be less stable and may introduce some issues that you wouldn't have > > with RHEL. On the other hand, RHEL may have some problems that don't > > appear on Fedora because of updated packages. > > > > If you want bleeding edge, use Fedora. > > If you want stability, use CentOS or RHEL. > > > >> 6) Are there any restrictions or performance advantages of using all > >> drives with the same geometry, or can you mix and match different size > >> drives and just add to the aggregate volume size? > > > > As I said earlier, GFS does not do the aggregation. > > > > What you get with GFS is the ability to share an already networked > > storage volume. You can use iSCSI, AoE, GNBD, or others to connect > > the storage to all of the cluster nodes. Then you format the volume > > with GFS so that it can be used with all of the nodes. > > > > I believe there is a project for the aggregate filesystem that you are > > looking for, but as far as I know, it is still beta. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Gruss / Regards, Dipl.-Ing. Mark Hlawatschek Phone: +49-89 121 409-55 http://www.atix.de/ http://www.open-sharedroot.org/ ** ATIX - Ges. fuer Informationstechnologie und Consulting mbH Einsteinstr. 10 - 85716 Unterschleissheim - Germany From Bowie_Bailey at BUC.com Wed Apr 12 15:45:19 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Wed, 12 Apr 2006 11:45:19 -0400 Subject: [Linux-cluster] Questions about GFS Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DB@bnifex.cis.buc.com> As someone else pointed out, it is possible to run diskless workstations with their root on the GFS. I haven't tried this configuration, so I don't know what issues their may be. The security issue is there. Since they are all running from the same disk, a compromise on one can corrupt the entire cluster. On my systems, I just have a small hard drive to hold the OS and applications and then mount the GFS as a data partition. Bowie Greg Perry wrote: > Also, after reviewing the GFS architecture it seems there would be > significant security issues to consider, ie if one client/member of > the GFS volume were compromised, that would lead to a full compromise > of the filesystem across all nodes (and the ability to create special > devices and modify the filesystem on any other GFS node member). Are > there any plans to include any form of discretionary or mandatory > access controls for GFS in the upcoming v2 release? > > Greg > > Greg Perry wrote: > > Thanks Bowie, I understand more now. So within this architecture, > > it would make more sense to utilize a RAID-5/10 SAN, then add > > diskless workstations as needed for performance...? > > > > For said diskless workstations, does it make sense to run Stateless > > Linux to keep the images the same across all of the > > workstations/client machines? > > > > Regards > > > > Greg > > > > Bowie Bailey wrote: > > > Greg Perry wrote: > > > > I have been researching GFS for a few days, and I have some > > > > questions that hopefully some seasoned users of GFS may be able > > > > to answer. 
> > > > > > > > I am working on the design of a linux cluster that needs to be > > > > scalable, it will be primarily an RDBMS-driven data warehouse > > > > used for data mining and content indexing. In an ideal world, > > > > we would be able to start with a small (say 4 node) cluster, > > > > then add machines (and storage) as the various RDBMS' grow in > > > > size (as well as the use virtual IPs for load balancing across > > > > multiple lighttpd instances. All machines on the node need to > > > > be able to talk to the same volume of information, and GFS (in > > > > theory at least) would be used to aggregate the drives from > > > > each machine into that huge shared logical volume). With that > > > > being said, here are some questions: > > > > > > > > 1) What is the preference on the RDBMS, will MySQL 5.x work and > > > > are there any locking issues to consider? What would the best > > > > open source RDBMS be (MySQL vs. Postgresql etc) > > > > > > Someone more qualified than me will have to answer that question. > > > > > > > 2) If there was a 10 machine cluster, each with a 300GB SATA > > > > drive, can you use GFS to aggregate all 10 drives into one big > > > > logical 3000GB volume? Would that scenario work similar to a > > > > RAID array? If one or two nodes fail, but the GFS quorum is > > > > maintained, can those nodes be replaced and repopulated just > > > > like a RAID-5 array? If this scenario is possible, how > > > > difficult is it to "grow" the shared logical volume by adding > > > > additional nodes (say I had two more machines each with a 300GB > > > > SATA drive)? > > > > > > GFS doesn't work that way. GFS is just a fancy filesystem. It > > > takes an already shared volume and allows all of the nodes to > > > access it at the same time. > > > > > > > 3) How stable is GFS currently, and is it used in many > > > > production environments? > > > > > > It seems to be stable for me, but we are still in testing mode at > > > the moment. > > > > > > > 4) How stable is the FC5 version, and does it include all of the > > > > configuration utilities in the RH Enterprise Cluster version? > > > > (the idea would be to prove the point on FC5, then migrate to RH > > > > Enterprise). > > > > > > Haven't used that one. > > > > > > > 5) Would CentOS be preferred over FC5 for the initial > > > > proof of concept and early adoption? > > > > > > If your eventual platform is RHEL, then CentOS would make more > > > sense for a testing platform since it is almost identical to > > > RHEL. Fedora can be less stable and may introduce some issues > > > that you wouldn't have with RHEL. On the other hand, RHEL may > > > have some problems that don't appear on Fedora because of updated > > > packages. > > > > > > If you want bleeding edge, use Fedora. > > > If you want stability, use CentOS or RHEL. > > > > > > > 6) Are there any restrictions or performance advantages of > > > > using all drives with the same geometry, or can you mix and > > > > match different size drives and just add to the aggregate > > > > volume size? > > > > > > As I said earlier, GFS does not do the aggregation. > > > > > > What you get with GFS is the ability to share an already networked > > > storage volume. You can use iSCSI, AoE, GNBD, or others to > > > connect the storage to all of the cluster nodes. Then you format > > > the volume with GFS so that it can be used with all of the nodes. 
> > > > > > I believe there is a project for the aggregate filesystem that > > > you are looking for, but as far as I know, it is still beta. > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster From Bowie_Bailey at BUC.com Wed Apr 12 15:48:19 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Wed, 12 Apr 2006 11:48:19 -0400 Subject: [Linux-cluster] Questions about GFS Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DC@bnifex.cis.buc.com> Also, keep in mind that the number of nodes is limited by the number of journals on your GFS filesystem. So when you create the filesystem, you should add a few extra journals to accommodate expansion. If you run out, you have to add disks to the GFS in order to create more journals. Bowie Mark Hlawatschek wrote: > Greg, > > you can use a diskless shared root configuration with gfs. This setup > would enable you to add cluster nodes as you need them. > Have a look at http://www.open-sharedroot.org/ > > Mark > > On Wednesday 12 April 2006 17:21, Greg Perry wrote: > > Thanks Bowie, I understand more now. So within this architecture, > > it would make more sense to utilize a RAID-5/10 SAN, then add > > diskless workstations as needed for performance...? > > > > For said diskless workstations, does it make sense to run Stateless > > Linux to keep the images the same across all of the > > workstations/client machines? From tf0054 at gmail.com Wed Apr 12 17:10:52 2006 From: tf0054 at gmail.com (Takeshi NAKANO) Date: Thu, 13 Apr 2006 02:10:52 +0900 Subject: [Linux-cluster] Cisco fence agent In-Reply-To: <1144766944.16956.10.camel@merlin.Mines.EDU> References: <1144766944.16956.10.camel@merlin.Mines.EDU> Message-ID: Hello Matthew. Thank for showing your code! That is exactly same one which I will make. > I like the network option because the host that is having problems > will be able to write log entries after it has been fenced. I can not agree more. Thanks a lot. Takeshi NAKANO. 2006/4/11, Matthew B. Brookover : > I do not know if this will help, but here is what I put together. > > We have 3 Cisco 3750 switches. I am currently using SNMP to turn off the > ports of a host that is being fenced. I wrote a perl script called > fence_cisco that works with GFS 6. I have attached a copy of fence_cisco to > this message and its config file. I do not have much in the way of > documentation for it, and it will probably take some hacking to get it to > work with a current version of GFS. If you know a little perl, writing a > fencing agent is not very difficult. > > I have also included a copy for the config file for fence_cisco. The first > two lines specify the SNMP community string and the IP address for the > switch. The rest is a list of hosts and the ports they use. You will have > to talk to your local network guru to figure out Cisco community strings and > the numbers involved. It took some tinkering to figure out how Cisco does > this stuff, and even after writing the code, I am still not sure that I > understand it. I do know that it does work, GFS does do the correct things > during a crash. > > Most people use one of the power supply switches. Redhat provides the > fence_apc agent that will turn off the power to a node that needs to be > fenced. I like the network option because the host that is having problems > will be able to write log entries after it has been fenced. > > You will need to get the Net::SNMP module from cpan.org to use fence_cisco. 
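For anyone who wants to poke at the same idea from the command line before writing an agent: disabling a switch port over SNMP is just a set of ifAdminStatus (1 = up, 2 = down) for the port's ifIndex. This is only an illustration of the approach described above, not the posted fence_cisco script; the community string fencerw, the switch name switch1, and the ifIndex 10117 are made-up placeholders, and the community needs write access:

----------------------------
# take the port down (ifAdminStatus is 1.3.6.1.2.1.2.2.1.7; 2 = down)
snmpset -v 2c -c fencerw switch1 1.3.6.1.2.1.2.2.1.7.10117 i 2

# verify the result (ifOperStatus is 1.3.6.1.2.1.2.2.1.8)
snmpget -v 2c -c fencerw switch1 1.3.6.1.2.1.2.2.1.8.10117
----------------------------

A real fence agent also has to report success or failure back to fenced, and it should disable every port the victim node uses, not just one.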
> Matt > > > > On Sun, 2006-04-09 at 01:23 +0900, ??? wrote: > > Hi all. Do anyone have cisco catalyst fence agent? If nobody make that, I > will make. Thanks. > -- Linux-cluster mailing list Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From aaron at firebright.com Wed Apr 12 19:25:49 2006 From: aaron at firebright.com (Aaron Stewart) Date: Wed, 12 Apr 2006 12:25:49 -0700 Subject: [Linux-cluster] CLVM and AoE Message-ID: <443D543D.2030202@firebright.com> Hey All, I'm currently in process of setting up a Coraid ATA over Ethernet device as a backend storage for multiple systems that export individual partitions to Xen virtual servers. In our discussions with Coraid, they suggested looking into CLVM in order to handle this. Obviously, I have some questions.. :) - Has anyone used this kind of setup? I have very little experience with Redhat's cluster management, but have a fairly high level of expertise overall in this arena. - How does management of LVM logical volumes occur? Do we need to maintain one server that administers the volume group? - What kind of pitfalls should we be aware of? Can anyone point to any experience or any HOWTO's that discuss setting something like this up? Here's the setup: 1. Coraid SR1520 configured in one lblade, exported via AoE on a dedicated storage network as one LUN 2. Centos4.2 on all cluster nodes 3. logical volumes get masked when getting passed into Xen, so on the Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which shows up in the virtual as /dev/sda1) 4. only one host need access to a given logical volume at any given time. If migration needs to occur, the volume should be unmounted and remounted on another physical system. 5. Despite the fact that AoE is a layer 4 protocol, apparently it can coexist with IP on the same network interface, so we can transport cluster metadata over the same interface. Barring that, there is a second (public) interface on each box. 6. We want to avoid a single point of failure (such as a second AoE server that exports luns from lvm lv's) Thanks in advance.. -=Aaron Stewart -------------- next part -------------- A non-text attachment was scrubbed... Name: aaron.vcf Type: text/x-vcard Size: 289 bytes Desc: not available URL: From sanelson at gmail.com Wed Apr 12 20:10:42 2006 From: sanelson at gmail.com (Steve Nelson) Date: Wed, 12 Apr 2006 21:10:42 +0100 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 Message-ID: Hi All, I'm assuming that most of us on this list have used HP MSA kit, so excuse me a slightly off-topic question! I've got a cluster connected to an MSA1000, but want to make some changes on the MSA1000 itself. I've got a dumb terminal that runs procom, but its pretty horrid, so I've connected the controller direct to the serial port of one of the linux machines to use minicom. As per HP's documentation, I've set it up as: pr port /dev/ttyS0 pu baudrate 19200 pu bits 8 pu parity N pu stopbits 1 However, I get no response. Any ideas on how to troubleshoot? Anyone got this working? S. From greg.freemyer at gmail.com Wed Apr 12 20:18:06 2006 From: greg.freemyer at gmail.com (Greg Freemyer) Date: Wed, 12 Apr 2006 16:18:06 -0400 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 In-Reply-To: References: Message-ID: <87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com> Did you try 9600 baud? Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200. 
I don't know what the HP stuff uses that is not from the old Dec storageworks line. On 4/12/06, Steve Nelson wrote: > Hi All, > > I'm assuming that most of us on this list have used HP MSA kit, so > excuse me a slightly off-topic question! > > I've got a cluster connected to an MSA1000, but want to make some > changes on the MSA1000 itself. > > I've got a dumb terminal that runs procom, but its pretty horrid, so > I've connected the controller direct to the serial port of one of the > linux machines to use minicom. > > As per HP's documentation, I've set it up as: > > pr port /dev/ttyS0 > pu baudrate 19200 > pu bits 8 > pu parity N > pu stopbits 1 > > However, I get no response. > > Any ideas on how to troubleshoot? Anyone got this working? > > S. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Greg Freemyer The Norcross Group Forensics for the 21st Century From cjk at techma.com Wed Apr 12 20:28:30 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Wed, 12 Apr 2006 16:28:30 -0400 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 Message-ID: Turn off flow control if it's on, save the config as default and restart minicom. Also, make sure you are using the HP supplied cable and not some one off or general serial cable. In true HP form, it's a custom cable... If that doesn't work, here are some things to check.. 1. The HP cable is plugged into the _front_ of the MSA (the back is all fibre) 2. Make sure your serial port is not being used by something else (serial terminal) 3. umm, I dunno, these are pretty simple... Good luck Regards, Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson Sent: Wednesday, April 12, 2006 4:11 PM To: linux clustering Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 Hi All, I'm assuming that most of us on this list have used HP MSA kit, so excuse me a slightly off-topic question! I've got a cluster connected to an MSA1000, but want to make some changes on the MSA1000 itself. I've got a dumb terminal that runs procom, but its pretty horrid, so I've connected the controller direct to the serial port of one of the linux machines to use minicom. As per HP's documentation, I've set it up as: pr port /dev/ttyS0 pu baudrate 19200 pu bits 8 pu parity N pu stopbits 1 However, I get no response. Any ideas on how to troubleshoot? Anyone got this working? S. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cjk at techma.com Wed Apr 12 20:29:05 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Wed, 12 Apr 2006 16:29:05 -0400 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 Message-ID: MSA1x00's use 19200... it's an oddball Regards Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Freemyer Sent: Wednesday, April 12, 2006 4:18 PM To: linux clustering Subject: Re: [Linux-cluster] [OT] Serial Connection to MSA1000 Did you try 9600 baud? Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200. I don't know what the HP stuff uses that is not from the old Dec storageworks line. On 4/12/06, Steve Nelson wrote: > Hi All, > > I'm assuming that most of us on this list have used HP MSA kit, so > excuse me a slightly off-topic question! 
> > I've got a cluster connected to an MSA1000, but want to make some > changes on the MSA1000 itself. > > I've got a dumb terminal that runs procom, but its pretty horrid, so > I've connected the controller direct to the serial port of one of the > linux machines to use minicom. > > As per HP's documentation, I've set it up as: > > pr port /dev/ttyS0 > pu baudrate 19200 > pu bits 8 > pu parity N > pu stopbits 1 > > However, I get no response. > > Any ideas on how to troubleshoot? Anyone got this working? > > S. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Greg Freemyer The Norcross Group Forensics for the 21st Century -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cjk at techma.com Wed Apr 12 20:30:42 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Wed, 12 Apr 2006 16:30:42 -0400 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 Message-ID: Could be that someone else changed the baud setting tho, so Greg has a good point.. If someone used to 9600 worked on it, they might have changed it cuz the default wuz "wrong" :) Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Freemyer Sent: Wednesday, April 12, 2006 4:18 PM To: linux clustering Subject: Re: [Linux-cluster] [OT] Serial Connection to MSA1000 Did you try 9600 baud? Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200. I don't know what the HP stuff uses that is not from the old Dec storageworks line. On 4/12/06, Steve Nelson wrote: > Hi All, > > I'm assuming that most of us on this list have used HP MSA kit, so > excuse me a slightly off-topic question! > > I've got a cluster connected to an MSA1000, but want to make some > changes on the MSA1000 itself. > > I've got a dumb terminal that runs procom, but its pretty horrid, so > I've connected the controller direct to the serial port of one of the > linux machines to use minicom. > > As per HP's documentation, I've set it up as: > > pr port /dev/ttyS0 > pu baudrate 19200 > pu bits 8 > pu parity N > pu stopbits 1 > > However, I get no response. > > Any ideas on how to troubleshoot? Anyone got this working? > > S. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Greg Freemyer The Norcross Group Forensics for the 21st Century -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From sanelson at gmail.com Wed Apr 12 20:28:51 2006 From: sanelson at gmail.com (Steve Nelson) Date: Wed, 12 Apr 2006 21:28:51 +0100 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 In-Reply-To: <87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com> References: <87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com> Message-ID: On 4/12/06, Greg Freemyer wrote: > Did you try 9600 baud? I did... I am assuming /dev/ttyS0 is correct - it only has one serial port! S. From sanelson at gmail.com Wed Apr 12 20:40:49 2006 From: sanelson at gmail.com (Steve Nelson) Date: Wed, 12 Apr 2006 21:40:49 +0100 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 In-Reply-To: References: Message-ID: On 4/12/06, Kovacs, Corey J. wrote: > Turn off flow control if it's on, save the config as default and restart > minicom. Thanks very much. 
I had turned off flow control, but saving as default, and restarting appeared to make the difference. Welcome to minicom 2.00.0 OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n Compiled on Sep 12 2003, 17:33:22. Press CTRL-A Z for help on special keys Invalid CLI command. CLI> AT S7=45 S0=0 L1 V1 X4 &c1 E1 Q0 Invalid CLI command. CLI> Incidentally, how do I get it not to send that dialling stuff? > Corey S. From Bowie_Bailey at BUC.com Wed Apr 12 20:59:26 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Wed, 12 Apr 2006 16:59:26 -0400 Subject: [Linux-cluster] CLVM and AoE Message-ID: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com> Aaron Stewart wrote: > > I'm currently in process of setting up a Coraid ATA over Ethernet > device as a backend storage for multiple systems that export > individual partitions to Xen virtual servers. In our discussions > with Coraid, they suggested looking into CLVM in order to handle this. > > Obviously, I have some questions.. :) > > - Has anyone used this kind of setup? I have very little experience > with Redhat's cluster management, but have a fairly high level of > expertise overall in this arena. I don't know anything about Xen, but I am using this same basic setup on my systems. > - How does management of LVM logical volumes occur? Do we need to > maintain one server that administers the volume group? The management is distributed. You can manage the cluster and volume groups from any node. > - What kind of pitfalls should we be aware of? Some people have complained about throughput issues with GFS. Our application doesn't require high throughput, so I can't comment on this. I haven't found any issues in my testing so far. > Can anyone point to any experience or any HOWTO's that discuss setting > something like this up? There are a few documents, but most of the ones that I've seen are out of date. If you have specific questions, you can ask here. If you don't have it already, here is the yum config with the current cluster RPMs for CentOS. Just drop it in a file in /etc/yum.repos.d/. Note that the current cluster RPMs are for the new 2.6.9-34.EL kernel. ---------------------------- [csgfs] name=CentOS-4 - CSGFS baseurl=http://mirror.centos.org/centos/$releasever/csgfs/$basearch/ gpgcheck=1 enabled=1 ---------------------------- The only thing you need to build from source is the AoE driver from CoRaid. > Here's the setup: > > 1. Coraid SR1520 configured in one lblade, exported via AoE on a > dedicated storage network as one LUN > 2. Centos4.2 on all cluster nodes > 3. logical volumes get masked when getting passed into Xen, so on the > Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which > shows up in the virtual as /dev/sda1) > 4. only one host need access to a given logical volume at any given > time. If migration needs to occur, the volume should be unmounted and > remounted on another physical system. This can be done, but the cluster will not do it for you. Each logical volume can be accessed by as many nodes as you need. Note that you need one GFS journal per node that needs simultaneous access. > 5. Despite the fact that AoE is a layer 4 protocol, apparently it can > coexist with IP on the same network interface, so we can transport > cluster metadata over the same interface. Barring that, there is a > second (public) interface on each box. > 6. 
We want to avoid a single point of failure (such as a second AoE > server that exports luns from lvm lv's) Now that DLM is the recommended locking manager, everything is distributed. Your only single point of failure is the CoRaid box. -- Bowie From aaron at firebright.com Wed Apr 12 21:11:24 2006 From: aaron at firebright.com (Aaron Stewart) Date: Wed, 12 Apr 2006 14:11:24 -0700 Subject: [Linux-cluster] CLVM and AoE In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com> References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com> Message-ID: <443D6CFC.7000507@firebright.com> Hey Bowie, Wow.. That's perfect. Thanks for the response. I have a question about whether GFS is a requirement.. Since each lv is a separate partition mounted on xen, does GFS make sense, or can we use ext3/xfs/etc.? -=Aaron Bowie Bailey wrote: > Aaron Stewart wrote: > >> I'm currently in process of setting up a Coraid ATA over Ethernet >> device as a backend storage for multiple systems that export >> individual partitions to Xen virtual servers. In our discussions >> with Coraid, they suggested looking into CLVM in order to handle this. >> >> Obviously, I have some questions.. :) >> >> - Has anyone used this kind of setup? I have very little experience >> with Redhat's cluster management, but have a fairly high level of >> expertise overall in this arena. >> > > I don't know anything about Xen, but I am using this same basic setup > on my systems. > > >> - How does management of LVM logical volumes occur? Do we need to >> maintain one server that administers the volume group? >> > > The management is distributed. You can manage the cluster and volume > groups from any node. > > >> - What kind of pitfalls should we be aware of? >> > > Some people have complained about throughput issues with GFS. Our > application doesn't require high throughput, so I can't comment on > this. I haven't found any issues in my testing so far. > > >> Can anyone point to any experience or any HOWTO's that discuss setting >> something like this up? >> > > There are a few documents, but most of the ones that I've seen are out > of date. If you have specific questions, you can ask here. > > If you don't have it already, here is the yum config with the current > cluster RPMs for CentOS. Just drop it in a file in /etc/yum.repos.d/. > Note that the current cluster RPMs are for the new 2.6.9-34.EL kernel. > > ---------------------------- > [csgfs] > name=CentOS-4 - CSGFS > baseurl=http://mirror.centos.org/centos/$releasever/csgfs/$basearch/ > gpgcheck=1 > enabled=1 > ---------------------------- > > The only thing you need to build from source is the AoE driver from > CoRaid. > > >> Here's the setup: >> >> 1. Coraid SR1520 configured in one lblade, exported via AoE on a >> dedicated storage network as one LUN >> 2. Centos4.2 on all cluster nodes >> 3. logical volumes get masked when getting passed into Xen, so on the >> Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which >> shows up in the virtual as /dev/sda1) >> 4. only one host need access to a given logical volume at any given >> time. If migration needs to occur, the volume should be unmounted and >> remounted on another physical system. >> > > This can be done, but the cluster will not do it for you. Each > logical volume can be accessed by as many nodes as you need. Note > that you need one GFS journal per node that needs simultaneous access. > > >> 5. 
Despite the fact that AoE is a layer 4 protocol, apparently it can >> coexist with IP on the same network interface, so we can transport >> cluster metadata over the same interface. Barring that, there is a >> second (public) interface on each box. >> 6. We want to avoid a single point of failure (such as a second AoE >> server that exports luns from lvm lv's) >> > > Now that DLM is the recommended locking manager, everything is > distributed. Your only single point of failure is the CoRaid box. > > -------------- next part -------------- A non-text attachment was scrubbed... Name: aaron.vcf Type: text/x-vcard Size: 289 bytes Desc: not available URL: From mtp at tilted.com Wed Apr 12 21:29:00 2006 From: mtp at tilted.com (Mark Petersen) Date: Wed, 12 Apr 2006 16:29:00 -0500 Subject: [Linux-cluster] CLVM and AoE In-Reply-To: <443D6CFC.7000507@firebright.com> References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com> <443D6CFC.7000507@firebright.com> Message-ID: <7.0.1.0.2.20060412162416.028964f0@tilted.com> At 04:11 PM 4/12/2006, you wrote: >Hey Bowie, > >Wow.. That's perfect. Thanks for the response. > >I have a question about whether GFS is a requirement.. Since each lv >is a separate partition mounted on xen, does GFS make sense, or can >we use ext3/xfs/etc.? So is every dom0 going to mount the CoRaid device directly using AoE? And CLVM will notify the whole cluster when any single node makes LVM changes? If not, then you'll need to use GNBD to export the lv's I guess. Either way you can use whatever fs you have support for in a xenU kernel. You shouldn't need to format anything GFS at all. From lhh at redhat.com Wed Apr 12 22:07:05 2006 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 12 Apr 2006 18:07:05 -0400 Subject: [Linux-cluster] Help-me, Please In-Reply-To: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com> References: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com> Message-ID: <1144879625.15794.48.camel@ayanami.boston.redhat.com> On Mon, 2006-04-10 at 20:57 -0300, ANDRE LUIS FORIGATO wrote: > Linux xlx2 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 > i686 i686 i386 GNU/Linux > Apr 10 01:18:07 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 05:13:49 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN > (Dead/Hung) > Apr 10 05:13:54 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP > Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:31:07 xlx2 clumembd[4493]: Membership View #5:0x00000002 > Apr 10 11:31:08 xlx2 cluquorumd[4463]: Membership reports #0 > as down, but disk reports as up: State uncertain! 
> Apr 10 11:31:08 xlx2 cluquorumd[4463]: --> Commencing STONITH <-- > Apr 10 11:31:08 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN > (Dead/Hung) > Apr 10 11:31:10 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP > Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #12 0x00000002 > Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Member > 200.254.254.171's state is uncertain: Some services may be > unavailable! > Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #13 0x00000002 > Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:31:34 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN > (Dead/Hung) > Apr 10 11:31:38 xlx2 cluquorumd[4463]: --> Commencing STONITH <-- > Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Falsely > claiming that 200.254.254.171 has been fenced > Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Data integrity > may be compromised! > Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Quorum Event: View #15 0x00000002 > Apr 10 11:31:41 xlx2 clusvcmgrd[4671]: State change: > 200.254.254.172 DOWN > Apr 10 11:34:08 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP > Apr 10 11:34:09 xlx2 clusvcmgrd[4671]: Quorum Event: View #16 0x00000002 > Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: No route to host > Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: No route to host > Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: No route to host > Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: No route to host > Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: Unable to obtain cluster > lock: No locks available > Apr 10 11:34:50 xlx2 clumembd[4493]: Member 200.254.254.171 UP > Apr 10 11:34:50 xlx2 clumembd[4493]: Membership View #6:0x00000003 > Apr 10 11:34:50 xlx2 cluquorumd[4463]: __msg_send: Incomplete > write to 13. Error: Connection reset by peer > Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: Quorum Event: View #17 0x00000003 > Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: State change: Local UP > Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: State change: 200.254.254.171 UP > Apr 10 13:21:25 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 17:03:22 xlx2 clusvcmgrd[4671]: Couldn't connect to > member #0: Connection timed out > Apr 10 20:30:30 xlx2 clulockd[4498]: Denied 200.254.254.171: > Broken pipe > Apr 10 20:30:30 xlx2 clulockd[4498]: select error: Broken pipe What were you doing when this happened? 
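When it happens again it is worth grabbing the view from both members at that moment, something along these lines (the address is just the partner node from your log):

   clustat                      # run on each member: does each side still list the other as UP?
   ping -c 3 200.254.254.171    # from the member that is logging the connect timeouts

If the two clustat outputs disagree, or the ping fails while the disk tiebreaker still reports the partner UP, the problem is most likely the network path between the members rather than the shared state partition.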
-- Lon From lhh at redhat.com Wed Apr 12 22:13:25 2006 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 12 Apr 2006 18:13:25 -0400 Subject: [Linux-cluster] Cluster node not able to access all cluster resource In-Reply-To: <0C5C8B118420264EBB94D7D7050150011EFA92@exchange2.comune.prato.local> References: <0C5C8B118420264EBB94D7D7050150011EFA92@exchange2.comune.prato.local> Message-ID: <1144880005.15794.54.camel@ayanami.boston.redhat.com> On Sat, 2006-04-08 at 19:05 +0200, Leandro Dardini wrote: > The topic is not a problem, but what I want to do. I have a lots of > service, each on is now run by a two node cluster. This is very bad due > to each node fencing other one during network blackout. I'd like to > create only one cluster, but each resource, either GFS filesystems, must > be readable only by a limited number of nodes. > > For example, taking a Cluster "test" made of node A, node B, node C, > node D and with the following resources: GFS Filesystem alpha and GFS > Filesystem beta. I want that only node A and node B can access GFS > Filesystem alpha and only node C and node D can access GFS Filesystem > beta. > > Is it possible? You can just mount alpha on {A B} and beta on {C D}, but I don't think there is an easy way to forcefully prevent mounting alpha on {C D} currently; someone else might know better. -- Lon From lhh at redhat.com Wed Apr 12 22:36:27 2006 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 12 Apr 2006 18:36:27 -0400 Subject: [Linux-cluster] issues with rhcs 4.2 In-Reply-To: <20060408164804.54434.qmail@web8319.mail.in.yahoo.com> References: <20060408164804.54434.qmail@web8319.mail.in.yahoo.com> Message-ID: <1144881387.15794.66.camel@ayanami.boston.redhat.com> On Sat, 2006-04-08 at 17:48 +0100, Kumaresh Ponnuswamy wrote: > hi, > > I recently migrated from rhcs 3 to rhcs 4.2 and since then I am unable > to bring up the clustered services. > > Even though the services are getting executed (like the VIP, shared > devices etc), the status in clustat and system-config-cluster still > displays failed and because of this the failover is not happening. > Any light on this will be much appreciated. Cluster is on RHEL AS 4U2 > with two nodes. The part that fails should be in the log. My guess is that it is the script. Rgmanager expects LSB behavior - i.e. "stop after stop" should return 0, not 1. If we have a '1' return code, rgmanager thinks the service has failed to stop - so the service can not fail over (resources might still be allocated!). See: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173991 -- Lon From cjkovacs at verizon.net Thu Apr 13 01:42:55 2006 From: cjkovacs at verizon.net (Corey Kovacs) Date: Wed, 12 Apr 2006 21:42:55 -0400 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 In-Reply-To: References: Message-ID: <1144892575.8357.1.camel@ronin.home.net> Good to hear it's working... You can get rid of the modem stuff by pressing CTRL+O, then select the modem settings option. Just clean everything that you can out and save again. Have fun... Corey On Wed, 2006-04-12 at 21:40 +0100, Steve Nelson wrote: > On 4/12/06, Kovacs, Corey J. wrote: > > Turn off flow control if it's on, save the config as default and restart > > minicom. > > Thanks very much. I had turned off flow control, but saving as > default, and restarting appeared to make the difference. > > Welcome to minicom 2.00.0 > > OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n > Compiled on Sep 12 2003, 17:33:22. 
> > Press CTRL-A Z for help on special keys > > > Invalid CLI command. > > CLI> AT S7=45 S0=0 L1 V1 X4 &c1 E1 Q0 > Invalid CLI command. > > CLI> > > Incidentally, how do I get it not to send that dialling stuff? > > > Corey > > S. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cjkovacs at verizon.net Thu Apr 13 04:07:53 2006 From: cjkovacs at verizon.net (Corey Kovacs) Date: Thu, 13 Apr 2006 00:07:53 -0400 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 In-Reply-To: <1144892575.8357.1.camel@ronin.home.net> References: <1144892575.8357.1.camel@ronin.home.net> Message-ID: <1144901273.8357.3.camel@ronin.home.net> Sorry, that should be "CTRL+A, then o" On Wed, 2006-04-12 at 21:42 -0400, Corey Kovacs wrote: > Good to hear it's working... > > You can get rid of the modem stuff by pressing CTRL+O, then select the > modem settings option. Just clean everything that you can out and save > again. > > Have fun... > > > Corey > > > > On Wed, 2006-04-12 at 21:40 +0100, Steve Nelson wrote: > > On 4/12/06, Kovacs, Corey J. wrote: > > > Turn off flow control if it's on, save the config as default and restart > > > minicom. > > > > Thanks very much. I had turned off flow control, but saving as > > default, and restarting appeared to make the difference. > > > > Welcome to minicom 2.00.0 > > > > OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n > > Compiled on Sep 12 2003, 17:33:22. > > > > Press CTRL-A Z for help on special keys > > > > > > Invalid CLI command. > > > > CLI> AT S7=45 S0=0 L1 V1 X4 &c1 E1 Q0 > > Invalid CLI command. > > > > CLI> > > > > Incidentally, how do I get it not to send that dialling stuff? > > > > > Corey > > > > S. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kumaresh81 at yahoo.co.in Thu Apr 13 07:06:53 2006 From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy) Date: Thu, 13 Apr 2006 08:06:53 +0100 (BST) Subject: [Linux-cluster] issues with rhcs 4.2 In-Reply-To: <1144881387.15794.66.camel@ayanami.boston.redhat.com> Message-ID: <20060413070653.37951.qmail@web8326.mail.in.yahoo.com> Hi, thanks for the mail. the issue is that RHCS 4 expects an RC script rather than a normal script. After making it an RC script, the cluster is working. Regards, Kumaresh Lon Hohberger wrote: On Sat, 2006-04-08 at 17:48 +0100, Kumaresh Ponnuswamy wrote: > hi, > > I recently migrated from rhcs 3 to rhcs 4.2 and since then I am unable > to bring up the clustered services. > > Even though the services are getting executed (like the VIP, shared > devices etc), the status in clustat and system-config-cluster still > displays failed and because of this the failover is not happening. > Any light on this will be much appreciated. Cluster is on RHEL AS 4U2 > with two nodes. The part that fails should be in the log. My guess is that it is the script. Rgmanager expects LSB behavior - i.e. "stop after stop" should return 0, not 1. If we have a '1' return code, rgmanager thinks the service has failed to stop - so the service can not fail over (resources might still be allocated!). 
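As a rough sketch, a stop function along these lines keeps the exit codes LSB-friendly (the daemon name is only a placeholder):

   stop() {
       if ! pidof mydaemon >/dev/null 2>&1; then
           echo "mydaemon already stopped"
           return 0        # stop-after-stop must still report success
       fi
       killproc mydaemon
       return $?
   }

(killproc comes from /etc/init.d/functions on RHEL; the key point is simply that calling stop on an already-stopped service returns 0, so rgmanager can go ahead and relocate the service.)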
See: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173991 -- Lon -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster --------------------------------- Jiyo cricket on Yahoo! India cricket Yahoo! Messenger Mobile Stay in touch with your buddies all the time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sanelson at gmail.com Thu Apr 13 07:54:32 2006 From: sanelson at gmail.com (Steve Nelson) Date: Thu, 13 Apr 2006 08:54:32 +0100 Subject: [Linux-cluster] [OT] Serial Connection to MSA1000 In-Reply-To: <1144901273.8357.3.camel@ronin.home.net> References: <1144892575.8357.1.camel@ronin.home.net> <1144901273.8357.3.camel@ronin.home.net> Message-ID: On 4/13/06, Corey Kovacs wrote: > Sorry, that should be "CTRL+A, then o" Yeah, I quickly got the hang of the minicom interface as it seems to be just the same as screen and ratpoison (and thus I suppose emacs?). Thanks for your help! S. From Alain.Moulle at bull.net Thu Apr 13 09:04:09 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 13 Apr 2006 11:04:09 +0200 Subject: [Linux-cluster] CS4 Update 2 / question about quorum Message-ID: <443E1409.4050700@bull.net> Hi A question a little bit theoretical for my understanding : for a cluster with 8 nodes, I understand that each node has by default a Quorum Votes value = 1 , so does that mean that until 3 nodes are failed , services are failovered by others, and that at the 4th failed one, the cluster is stalled in the current state ? And in which cases would it be judicious to set the Quorum Vote for some nodes at 2 or more ? Or is there a way to modify the % to define if the cluster is quorate or note ? For example, let's suppose that on the 8 nodes cluster, whatever 2 nodes are able (in term of capacity/perf ...) to run all HA services of the 8 nodes, is-it possible to configure the cs4 such as the failover will be possible even if 6 nodes are failed ? Thanks Alain Moull? From pcaulfie at redhat.com Thu Apr 13 09:16:07 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 13 Apr 2006 10:16:07 +0100 Subject: [Linux-cluster] CS4 Update 2 / question about quorum In-Reply-To: <443E1409.4050700@bull.net> References: <443E1409.4050700@bull.net> Message-ID: <443E16D7.5000606@redhat.com> Alain Moulle wrote: > Hi > > A question a little bit theoretical for my understanding : > for a cluster with 8 nodes, I understand that each node > has by default a Quorum Votes value = 1 , so does that > mean that until 3 nodes are failed , services are failovered > by others, and that at the 4th failed one, the cluster > is stalled in the current state ? > And in which cases would it be judicious to set the > Quorum Vote for some nodes at 2 or more ? > Or is there a way to modify the % to define if > the cluster is quorate or note ? > For example, let's suppose that on the 8 nodes cluster, > whatever 2 nodes are able (in term of capacity/perf ...) > to run all HA services of the 8 nodes, is-it possible > to configure the cs4 such as the failover will be possible > even if 6 nodes are failed ? Quorum is not really related to failover, it's to prevent "split-brain" so that (eg) a service doesn't end up running on two nodes that can't talk to each other or (more importantly) that a GFS filesystem doesn't get corrupted by two non-cooperating systems. 
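As a rough sketch of the arithmetic (quorum is, roughly, more than half of the expected votes, so with the default of one vote per node an 8-node cluster needs 5 votes, which matches the "until 3 nodes are failed" reading above; the weighted example below uses made-up numbers):

   8 nodes x 1 vote:                expected_votes = 8    quorum = 5
   2 nodes x 4 votes + 6 x 1 vote:  expected_votes = 14   quorum = 8
       the two 4-vote nodes alone  = 8 votes -> quorate
       the six 1-vote nodes alone  = 6 votes -> inquorate, so they can never
                                                form a competing cluster on their own
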
Yes, it's possible to set votes on some machines higher than others but you need to be very careful that you do your calculations correctly such that you can't get into a split brain situation if two higher-rated nodes split off into two separate clusters. Fiddling with the node votes is most useful where you have server (perhaps gnbd) nodes in the cluster without which the satellites can't work. patrick From carlopmart at gmail.com Thu Apr 13 10:39:50 2006 From: carlopmart at gmail.com (carlopmart) Date: Thu, 13 Apr 2006 12:39:50 +0200 Subject: [Linux-cluster] OT: Tomcat with RHEL on CS Message-ID: <443E2A76.8070705@gmail.com> Hi all, I need some help to accomplish the following task: I need to setup a high availability cluster for Tomcat+Apache. I can not use a shared storage (content pages html, jsp and so on are static). Which can be the best form: use Cluster Suite with RHEL, heartbeat from linx-ha.org, keepalived or another one?? Many thanks. -- CL Martinez carlopmart {at} gmail {d0t} com From dgolden at cp.dias.ie Thu Apr 13 11:13:43 2006 From: dgolden at cp.dias.ie (David Golden) Date: Thu, 13 Apr 2006 12:13:43 +0100 Subject: [Linux-cluster] CLVM and AoE In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com> References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com> Message-ID: <20060413111342.GA3168@ariadne.cp.dias.ie> On 2006-04-12 16:59:26 -0400, Bowie Bailey wrote: > > - What kind of pitfalls should we be aware of? > > Some people have complained about throughput issues with GFS. Our > application doesn't require high throughput, so I can't comment on > this. I haven't found any issues in my testing so far. > Well, a thing I _think_ we've seen a few times is that the case of many simultaneous writes to different files in different directories is MUCH faster than many simultaneous writes to different files in the same directory. I think this may have been mentioned before on-list, IIRC it's a design trade-off, something to do with GFS's efforts to preserve strict unix-like consistency (generally regarded as a major advantage of GFS over the horrors of NFS), directory metadata about the files needs to be updated an awful lot in the same directory case, and the directory therefore needs to be locked for update an awful lot, which can lead to much slowdown. I don't have hard numbers, nor available facilities to generate them right now, so feel free to regard this as FUD. From pcaulfie at redhat.com Thu Apr 13 13:27:10 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 13 Apr 2006 14:27:10 +0100 Subject: [Linux-cluster] New cman & ccs Message-ID: <443E51AE.5090007@redhat.com> I've written a short web page on the differences between the 'old' in-kernel cman (in the RHEL4 & STABLE branches) and the 'new' userspace openAIS-based cman. http://people.redhat.com/pcaulfie/cmanccs.html This isn't a tutorial on CCS or cluster.conf, it just outlines what is different between the two. The only non-forwards compatible bit is that the userland version needs nodeids assigning. ccs_tool now has a subcommand to do this for you. -- patrick From Bowie_Bailey at BUC.com Thu Apr 13 14:40:22 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Thu, 13 Apr 2006 10:40:22 -0400 Subject: [Linux-cluster] CLVM and AoE Message-ID: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com> Aaron Stewart wrote: > Hey Bowie, > > Wow.. That's perfect. Thanks for the response. > > I have a question about whether GFS is a requirement.. 
Since each lv > is a separate partition mounted on xen, does GFS make sense, or can > we use ext3/xfs/etc.? What you get from GFS is the ability for multiple nodes to mount the filesystem simultaneously. If you are never going to do this, then you can use any filesystem you want. CLVM can handle management of the lv's across the nodes. If you don't use GFS, just make absolutely sure that there is no way that two nodes could mount the same lv. As far as I know, there is nothing in the cluster that will prevent an ext3 or xfs filesystem from being mounted by multiple nodes. And if it happens, you have almost guaranteed data corruption. -- Bowie From Bowie_Bailey at BUC.com Thu Apr 13 15:33:07 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Thu, 13 Apr 2006 11:33:07 -0400 Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf after update Message-ID: <4766EEE585A6D311ADF500E018C154E3021338EF@bnifex.cis.buc.com> This is an x86_64 system that I just updated to the newest Cluster rpms. When I watch the bootup on the console, I see an error: lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so: undefined symbol: lvm_snprintf This error comes immediately after the "Activating VGs" line, so it appears to be triggered by the vgchange command in the clvmd startup file. I have another, identically configured, server which I have not updated yet. This server does not give the error. Everything seems to be working fine, so is this something I need to worry about? -- Bowie From sanelson at gmail.com Thu Apr 13 16:27:54 2006 From: sanelson at gmail.com (Steve Nelson) Date: Thu, 13 Apr 2006 17:27:54 +0100 Subject: [Linux-cluster] Order to Power Up Message-ID: Hi All, I've had to power down all the machines in a GFS 6.0 cluster - 2 nodes and a lock_gulmd qurum server. If I bring them up one at a time, the first server will hang waiting to start lock_gulmd. What's the best way to do this, and the best order? Should I bring them up in single user mode first and then start the services manually? S. From mtp at tilted.com Thu Apr 13 18:03:13 2006 From: mtp at tilted.com (Mark Petersen) Date: Thu, 13 Apr 2006 13:03:13 -0500 Subject: [Linux-cluster] CLVM and AoE In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.co m> References: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com> Message-ID: <7.0.1.0.2.20060413125753.0281dc00@tilted.com> At 09:40 AM 4/13/2006, you wrote: >If you don't use GFS, just make absolutely sure that there is no way >that two nodes could mount the same lv. As far as I know, there is >nothing in the cluster that will prevent an ext3 or xfs filesystem >from being mounted by multiple nodes. And if it happens, you have >almost guaranteed data corruption. The thing with Xen is, if you use GFS on the dom0 then you'll be using loopback filesystems for the domUs, so data corruption could still happen. xfs has protection against being mounted twice, you may want to consider using xfs if you're concerned about a domU being run from two different dom0's causing data corruption on the fs. I'm not aware of any other fs that provides this feature. From Britt.Treece at savvis.net Thu Apr 13 18:33:35 2006 From: Britt.Treece at savvis.net (Treece, Britt) Date: Thu, 13 Apr 2006 13:33:35 -0500 Subject: [Linux-cluster] Order to Power Up Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B90152A2F4@s228130hz1ew08.apptix-01.savvis.net> Steve, In what order are you currently bringing them up? 
The client (non-lock master) servers will wait for 600s (default timeout) until a master lock server is available to handle the locking. If one becomes available in that time frame lock_gulmd will start. If one does not become available the lock_gulmd process will time out based on the aforementioned value and the cluster won't be able to start. If all 3 servers are down you should likely power on the lock server and then a moment later power on the client (GFS mounting) servers. Regards, Britt -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson Sent: Thursday, April 13, 2006 11:28 AM To: linux clustering Subject: [Linux-cluster] Order to Power Up Hi All, I've had to power down all the machines in a GFS 6.0 cluster - 2 nodes and a lock_gulmd qurum server. If I bring them up one at a time, the first server will hang waiting to start lock_gulmd. What's the best way to do this, and the best order? Should I bring them up in single user mode first and then start the services manually? S. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From pegasus at nerv.eu.org Thu Apr 13 18:40:30 2006 From: pegasus at nerv.eu.org (Jure =?UTF-8?Q?Pe=C4=8Dar?=) Date: Thu, 13 Apr 2006 20:40:30 +0200 Subject: [Linux-cluster] CLVM and AoE In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com> References: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com> Message-ID: <20060413204030.dddf419a.pegasus@nerv.eu.org> On Thu, 13 Apr 2006 10:40:22 -0400 Bowie Bailey wrote: > If you don't use GFS, just make absolutely sure that there is no way > that two nodes could mount the same lv. As far as I know, there is > nothing in the cluster that will prevent an ext3 or xfs filesystem > from being mounted by multiple nodes. And if it happens, you have > almost guaranteed data corruption. If the underlying storage is scsi3, one can use persistent scsi reservations, which can be set with some tool from the sg3_utils package. In case of AoE, this is of course not possible. -- Jure Pe?ar http://jure.pecar.org/ From aaron at firebright.com Thu Apr 13 19:01:27 2006 From: aaron at firebright.com (Aaron Stewart) Date: Thu, 13 Apr 2006 12:01:27 -0700 Subject: [Linux-cluster] CLVM and AoE In-Reply-To: <20060413204030.dddf419a.pegasus@nerv.eu.org> References: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com> <20060413204030.dddf419a.pegasus@nerv.eu.org> Message-ID: <443EA007.2090701@firebright.com> Hey Jure, We're already committed to the AoE route unfortunately, but we're setting up next week, and I'll keep everyone posted on any performance benchmarks we glean. -=Aaron Jure Pe?ar wrote: > On Thu, 13 Apr 2006 10:40:22 -0400 > Bowie Bailey wrote: > > >> If you don't use GFS, just make absolutely sure that there is no way >> that two nodes could mount the same lv. As far as I know, there is >> nothing in the cluster that will prevent an ext3 or xfs filesystem >> from being mounted by multiple nodes. And if it happens, you have >> almost guaranteed data corruption. >> > > If the underlying storage is scsi3, one can use persistent scsi reservations, which can be set with some tool from the sg3_utils package. In case of AoE, this is of course not possible. > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: aaron.vcf Type: text/x-vcard Size: 302 bytes Desc: not available URL: From sanelson at gmail.com Thu Apr 13 20:34:23 2006 From: sanelson at gmail.com (Steve Nelson) Date: Thu, 13 Apr 2006 21:34:23 +0100 Subject: [Linux-cluster] Order to Power Up In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B90152A2F4@s228130hz1ew08.apptix-01.savvis.net> References: <9A6FE0FCC2B29846824C5CD81C6647B90152A2F4@s228130hz1ew08.apptix-01.savvis.net> Message-ID: On 4/13/06, Treece, Britt wrote: > Steve, > > In what order are you currently bringing them up? The client (non-lock > master) servers will wait for 600s (default timeout) until a master lock > server is available to handle the locking. Yes, I discovered this, and realised if I bring up a client server, and then a master server, quorum is formed. > If all 3 servers are down you should likely power on the lock server and > then a moment later power on the client (GFS mounting) servers. Thank you - this is what I did, and it worked fine. > Regards, > > Britt S. From robert at deakin.edu.au Fri Apr 14 03:39:33 2006 From: robert at deakin.edu.au (Robert Ruge) Date: Fri, 14 Apr 2006 13:39:33 +1000 Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf afterupdate In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338EF@bnifex.cis.buc.com> Message-ID: <002c01c65f75$0bd9e700$0132a8c0@eit.deakin.edu.au> I have jist experienced a similar thing but with a different undefined symbol. In my case I have installed clvm from a self compiled directory and when the system updates the lvm2 package it has conflicted with my self installed software. The simple answer for me was to reinstall clvm and edit /etc/init.d/clvmd to change the paths from /usr/sbin to /sbin. Robert > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bowie Bailey > Sent: Friday, 14 April 2006 1:33 > To: Linux-Cluster Mailing List (E-mail) > Subject: [Linux-cluster] bootup error - undefined symbol: > lvm_snprintf afterupdate > > This is an x86_64 system that I just updated to the newest > Cluster rpms. > > When I watch the bootup on the console, I see an error: > > lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so: > undefined symbol: lvm_snprintf > > This error comes immediately after the "Activating VGs" line, so it > appears to be triggered by the vgchange command in the clvmd startup > file. I have another, identically configured, server which I have not > updated yet. This server does not give the error. > > Everything seems to be working fine, so is this something I need to > worry about? > > -- > Bowie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From Bowie_Bailey at BUC.com Fri Apr 14 13:41:45 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Fri, 14 Apr 2006 09:41:45 -0400 Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf afterupdate Message-ID: <4766EEE585A6D311ADF500E018C154E3021338FA@bnifex.cis.buc.com> Robert Ruge wrote: > Bowie Bailey wrote: > > > > This is an x86_64 system that I just updated to the newest > > Cluster rpms. > > > > When I watch the bootup on the console, I see an error: > > > > lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so: > > undefined symbol: lvm_snprintf > > > > This error comes immediately after the "Activating VGs" line, so it > > appears to be triggered by the vgchange command in the clvmd startup > > file. 
I have another, identically configured, server which I have > > not updated yet. This server does not give the error. > > > > Everything seems to be working fine, so is this something I need to > > worry about? > > I have jist experienced a similar thing but with a different undefined > symbol. > > In my case I have installed clvm from a self compiled directory and > when the system updates the lvm2 package it has conflicted with my > self installed software. The simple answer for me was to reinstall > clvm and edit /etc/init.d/clvmd to change the paths from /usr/sbin to > /sbin. Interesting, but in my case, there are no self-compiled pieces. Everything was pre-packaged rpms for both the original install and the upgrade. -- Bowie From ugo.parsi at gmail.com Fri Apr 14 15:01:41 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Fri, 14 Apr 2006 17:01:41 +0200 Subject: [Linux-cluster] Aggregating filesystem Message-ID: Hello, I would like to aggregate multiple hard drives (on multiple computers) inside a big filesystem with RAID / failure tolerant capatibilities. I thought GFS could do that part, but it seems it does not... Any ideas on how I could that ? Thanks a lot, Ugo PARSI From deval.kulshrestha at progression.com Thu Apr 13 09:50:26 2006 From: deval.kulshrestha at progression.com (Deval kulshrestha) Date: Thu, 13 Apr 2006 15:20:26 +0530 Subject: *SPAM* Re: [Linux-cluster] Cluster node not able to access all cluster resource In-Reply-To: <1144880005.15794.54.camel@ayanami.boston.redhat.com> Message-ID: <003a01c65edf$b0e36b90$7600a8c0@PROGRESSION> Hi if you are using fibre based storage solution , you can configure either zoning on switch level or Lun Masking at HBA-> logical Volume level. That can restrict the access path for nodes. It's a kind of LUN access security mechanism. Regards, Deval K. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger Sent: Thursday, April 13, 2006 3:43 AM To: linux clustering Subject: *SPAM* Re: [Linux-cluster] Cluster node not able to access all cluster resource On Sat, 2006-04-08 at 19:05 +0200, Leandro Dardini wrote: > The topic is not a problem, but what I want to do. I have a lots of > service, each on is now run by a two node cluster. This is very bad due > to each node fencing other one during network blackout. I'd like to > create only one cluster, but each resource, either GFS filesystems, must > be readable only by a limited number of nodes. > > For example, taking a Cluster "test" made of node A, node B, node C, > node D and with the following resources: GFS Filesystem alpha and GFS > Filesystem beta. I want that only node A and node B can access GFS > Filesystem alpha and only node C and node D can access GFS Filesystem > beta. > > Is it possible? You can just mount alpha on {A B} and beta on {C D}, but I don't think there is an easy way to forcefully prevent mounting alpha on {C D} currently; someone else might know better. -- Lon -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster =========================================================== Privileged or confidential information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), please delete this message and kindly notify the sender by an emailed reply. 
Opinions, conclusions and other information in this message that do not relate to the official business of Progression and its associate entities shall be understood as neither given nor endorsed by them. ------------------------------------------------------------- Progression Infonet Private Limited, Gurgaon (Haryana), India From jason at monsterjam.org Sat Apr 15 17:41:04 2006 From: jason at monsterjam.org (Jason) Date: Sat, 15 Apr 2006 13:41:04 -0400 Subject: [Linux-cluster] newbie questions.. Message-ID: <20060415174104.GE41043@monsterjam.org> hey folks, I just downloaded ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz and am planning to run it on a Red Hat Enterprise Linux AS release 3 box, with Linux tf1.localdomain 2.4.21-40.ELsmp #1 SMP Thu Feb 2 22:22:39 EST 2006 i686 i686 i386 GNU/Linux kernel.. I check the INSTALL file and BAM! ./configure --kernel_src=/path/to/linux-2.6.x ^^^^^^^^^^^ so do I HAVE to be running 2.6 kernel to use this software? If so, what are my options? (i checked the older cluster-1.0x releases and they all say 2.6 kernel too). regards, Jason From sgray at bluestarinc.com Sun Apr 16 19:42:55 2006 From: sgray at bluestarinc.com (Sean Gray) Date: Sun, 16 Apr 2006 15:42:55 -0400 Subject: [Linux-cluster] RE: RHEL+RAC+GFS In-Reply-To: <55D425252A666646B456CFF8E3248DCCAA9339@ILEX5.IL.NDS.COM> Message-ID: <006001c6618d$f88003e0$6500000a@BLS105> Udi, I never did receive an answer on this. Metalink was no help either. What I believe is that RedHat sells GFS and RHCS to Oracle customers so they can get the 2k-3k US per node income, I guess I would as well if I was them : ), they need to eat too. WARNING! A couple days ago I found out that RHEL+RAC+GFS is NOT covered under Oracle?s, ?Unbreakable Support? and they will NOT assist with ANY GFS issues! Here is what I have done thus far as a proof of concept for our 11i implementation conference room pilots: Part A ? Purchased RHEL subscription ? Downloaded GFS and RHCS SRPMs, compiled and installed ? Made a 4 node cluster with ILO fencing and 2 CLVM2/GFS volumes from an EMC cx300 ? Made my staging area for the 11i install Part A Results ? At first look things seemed fine ? Did basic testing with tools like dd, touch, ls, etc. ? Installed Stage11i, install seemed slow ? Under heavy IO (simultaneous 1G file creation using dd) received kernel panics, added numa=off to boot string fixed this ? Installed CRP1 on a single node ? CRP1 is operational, but seems sluggish ? Destroyed cluster, and moved CRP1 to a single node cluster, same result operational but sluggish Part B ? Made 3 CLVM2/GFS volumes DB/APPS/BACKUP ? Mounted all three volumes on both nodes ? Installed CRP2 with node1 as DB and node 2 as APPS Part B Results ? Install was slow ? CRP2 was sluggish and after a few hours dlm_sendd became a giant CPU hog, if the db and apps were bounced (no reboot) things would be OK for a while ? Switched over to the older lock mechanism GULM, but had exact same results ? At this point great disappointment sets in : ( and I reach out to this mailing list for help, no response(!) Part C ? I reformatted the db and apps volumes as ext3 but left them managed by CLVM2 (I never thought to do otherwise) ? I removed the backup volume as a cluster resource, but since I still had CLVM2 in play I found that I had to have cman, ccsd, and clvm enabled so everything would work. ? Now, only apps was mounted on the apps node(CLVM+EXT3), db was only mounted on the db node(CLVM+EXT3), and backup was not mounted. Part C Results ? 
Same, CRP2 was sluggish and after a few hours dlm_sendd became a giant CPU hog, if the db and apps were bounced (no reboot) things would be OK for a while ? If the backup volume (CLVM+GFS) was mounted it got even worse Part D ? Destroyed the CLVM setup on the backup volume ? Formatted the entire device (backup volume) as EXT3 without any partitioning ? NOTE* the db and apps volumes are RAID 1+0 arrays on fibre channel(fast) and the backup volume is a RAID 5 ATA array (slow). ? So now the setup is as follows: o db node, mounts db volume - fibre channel+CLVM+EXT3 o apps node, mounts apps volume - fibre channel+CLVM+EXT3 o apps node shares apps volome to db node via NFS read-only o db node, mounts backup volume, - ATA+EXT3 Part D Results ? Same, CRP2 was sluggish and after a few hours dlm_sendd became a giant CPU hog, if the db and apps were bounced (no reboot) things would be OK for a while ? Throughput on backup volume is fantastic in comparison!!! Conclusion RHEL+RAC+GFS may be possible. However, I have not been able to put together the recipe, have had no real assistance from outside resources, and think there is a possible bug in dlm_sendd. Until a true recipe is developed I cannot personally recommend this configuration regardless of what http://www.redhat.com/whitepapers/rha/gfs/Oracle9i_RAC_GFS.pdf says. I do not intend to slight any company or product; it is entirely possible my results are due to my own ignorance. Final Note (off-topic and off-list) Some may be wondering what I plan to do next. I am currently pursuing OCFS2 as a file system and clustering solution. Here is why: ? It is GPL?d and free (as in beer) ? It has freely available binaries for stock RedHat kernels ? It has much in common with EXT3 ? It is included in the newer versions lf Linus?s kernel tree ? It will qualify for ?Unbreakable Support? ? It appears to have applications totally outside of the Oracle world, as in creating a shared root (/) volume and still being able to maintain node specific configuration files. Cool stuff. Happy clustering, I hope some of my months of frustrations are useful to someone. -- Sean N. Gray Director of Information Technology United Radio Incorporated, DBA BlueStar 24 Spiral Drive Florence, Kentucky 41042 office: 859.371.4423 x3263 toll free: 800.371.4423 x3263 fax: 859.371.4425 mobile: 513.616.3379 ________________________________________ From: Yaffe, Udi Sent: Sunday, April 16, 2006 5:49 AM To: Sean Gray Subject: RHEL+RAC+GFS Sean, ? I read your message in the RedHat forum, 14 Mar 2006?(about Oracle Rac on RedHat, using GFS from the) and curious to know whether you got an answer ? I spend the last three weeks looking for a document or any other article on the web,?explaining how to install RAC on GFS, but couldn't find any. if you do have an answer, can you please give me an advice how to start with ? ? Regards, ? ????? Udi ??????Senior System Engineer - Project delivery ? From deval.kulshrestha at progression.com Mon Apr 17 07:17:35 2006 From: deval.kulshrestha at progression.com (Deval kulshrestha) Date: Mon, 17 Apr 2006 12:47:35 +0530 Subject: [Linux-cluster] cluster suit 4.2 In-Reply-To: <004c01c649e7$52cd7f30$4ee17bcb@golie> Message-ID: <001001c661ef$00507a30$6800a8c0@PROGRESSION> Hi Paul I am also using HP Servers with ILO port. I have had used hp_ilo as an fencing agent. Basic ILO functionality comes by default with all HP servers, Advance ILO is Licensed feature(Except Blade Server). You can configure IP address and User name and Password for ILO port. 
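One way to sanity-check the iLO before wiring it into the cluster config is to drive the fence agent by hand from each node, something along these lines (the address and credentials are placeholders):

   fence_ilo -a 10.0.0.50 -l Administrator -p password -o status

If that reports the power state correctly, the agent, the LAN path and the credentials are all working.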
In fencing devices you have to configure HP_ilo as fencing agent. But before that ensure ILO is accessible over LAN. It's working fine with updated fence package Regards Deval -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Paul Sent: Friday, March 17, 2006 10:52 PM To: Lon Hohberger Cc: linux-cluster at redhat.com Subject: Re: [Linux-cluster] cluster suit 4.2 Mr Lon, I still face the same problem. If network cable of two NIC of one node are disconnected, the cluster system stall/hang. For your input, I use manual fence for the time being and using two network interfaces for each node as a bonding port. I have no power switch for fencing but my servers have ILO port ( Proliant DL380G4 ). If manual fence is not recommended for this case, can I use ILO port for fencing. Can you tell me how to set ILO fenced and what things are needed for the setting. Is ILO need license. Can you give me the solution for this problem. Thanks in advance. Rgds, paul would you like give us the solution ----- Original Message ----- From: "Lon Hohberger" To: "Paul" Cc: Sent: Thursday, March 16, 2006 11:27 PM Subject: Re: [Linux-cluster] cluster suit 4.2 > On Thu, 2006-03-16 at 23:13 +0700, Paul wrote: >> manual fence, because we have redundance PS, thx >> > > You need to run fence_ack_manual on the surviving node. Note that > running manual fencing in production environments is not supported. > > There is plenty of adequate remote power fencing hardware available > which will handle multiple power supplies. > > -- Lon > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster =========================================================== Privileged or confidential information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), please delete this message and kindly notify the sender by an emailed reply. Opinions, conclusions and other information in this message that do not relate to the official business of Progression and its associate entities shall be understood as neither given nor endorsed by them. ------------------------------------------------------------- Progression Infonet Private Limited, Gurgaon (Haryana), India From nemanja at yu.net Mon Apr 17 09:29:24 2006 From: nemanja at yu.net (Nemanja Miletic) Date: Mon, 17 Apr 2006 11:29:24 +0200 Subject: [Linux-cluster] problems with 8 node production gfs cluster Message-ID: <1145266165.27997.57.camel@nemanja.eunet.yu> Hello, I am working for major ISP and we have gfs cluster deployed for our mail system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on the same gigabit subnet. Partition that holds mailboxes is shared. We are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and 1.02 linux-cluster. We have a problem that in busy hours reading and writing from gfs partition gets very slow. This causes many processes that need to use the disk to go in D state. This further causes great load (over 200) on the machines in the cluster. When we cut the pop3 and smtp access on the firewall load slowly decreases. At the moment we have a limit on syn connection on 110 port on our load balancers (LVS based) in order to control the load. 
Thank You, -- Nemanja Miletic, System Engineer ----- YUnet International http://www.EUnet.yu Dubrovacka 35/III, 11000 Belgrade Tel: +381 11 3305633; Fax: +381 11 3282760 ----- This e-mail is confidential and intended only for the recipient. Unauthorized distribution, modification or disclosure of its contents is prohibited. If you have received this e-mail in error, please notify the sender by telephone +381 11 3305633. From rainer at ultra-secure.de Mon Apr 17 10:31:00 2006 From: rainer at ultra-secure.de (Rainer Duffner) Date: Mon, 17 Apr 2006 12:31:00 +0200 Subject: [Linux-cluster] problems with 8 node production gfs cluster In-Reply-To: <1145266165.27997.57.camel@nemanja.eunet.yu> References: <1145266165.27997.57.camel@nemanja.eunet.yu> Message-ID: <44436E64.4080407@ultra-secure.de> Nemanja Miletic wrote: > Hello, > > I am working for major ISP and we have gfs cluster deployed for our mail > system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade > servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on > the same gigabit subnet. Partition that holds mailboxes is shared. We > are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and > 1.02 linux-cluster. > > We have a problem that in busy hours reading and writing from gfs > partition gets very slow. This causes many processes that need to use > the disk to go in D state. This further causes great load (over 200) on > the machines in the cluster. When we cut the pop3 and smtp access on the > firewall load slowly decreases. At the moment we have a limit on syn > connection on 110 port on our load balancers (LVS based) in order to > control the load. > > > > Thank You, > Out of personal interest: what MTA/MDA are you running? cheers, Rainer From nemanja at yu.net Mon Apr 17 11:42:24 2006 From: nemanja at yu.net (nemanja at yu.net) Date: Mon, 17 Apr 2006 13:42:24 +0200 Subject: [Linux-cluster] problems with 8 node production gfs cluster In-Reply-To: <44436E64.4080407@ultra-secure.de> References: <1145266165.27997.57.camel@nemanja.eunet.yu> <44436E64.4080407@ultra-secure.de> Message-ID: <20060417134224.bgcspg8kckk4wo8c@mail.yu.net> We are using sendmail/procmail and popa3d for pop3. Quoting Rainer Duffner : > Nemanja Miletic wrote: >> Hello, >> >> I am working for major ISP and we have gfs cluster deployed for our mail >> system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade >> servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on >> the same gigabit subnet. Partition that holds mailboxes is shared. We >> are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and >> 1.02 linux-cluster. We have a problem that in busy hours reading >> and writing from gfs >> partition gets very slow. This causes many processes that need to use >> the disk to go in D state. This further causes great load (over 200) on >> the machines in the cluster. When we cut the pop3 and smtp access on the >> firewall load slowly decreases. At the moment we have a limit on syn >> connection on 110 port on our load balancers (LVS based) in order to >> control the load. >> >> >> >> Thank You, >> > > > > Out of personal interest: what MTA/MDA are you running? > > > > cheers, > Rainer > From lhh at redhat.com Mon Apr 17 14:04:22 2006 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 17 Apr 2006 10:04:22 -0400 Subject: [Linux-cluster] newbie questions.. 
In-Reply-To: <20060415174104.GE41043@monsterjam.org> References: <20060415174104.GE41043@monsterjam.org> Message-ID: <1145282662.15794.90.camel@ayanami.boston.redhat.com> On Sat, 2006-04-15 at 13:41 -0400, Jason wrote: > hey folks, I just downloaded ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz > and am planning to run it on a Red Hat Enterprise Linux AS release 3 box, with > Linux tf1.localdomain 2.4.21-40.ELsmp #1 SMP Thu Feb 2 22:22:39 EST 2006 i686 i686 i386 GNU/Linux > kernel.. I check the INSTALL file and BAM! > ./configure --kernel_src=/path/to/linux-2.6.x > ^^^^^^^^^^^ > > so do I HAVE to be running 2.6 kernel to use this software? Yes. > If so, what are my options? (i checked the older cluster-1.0x releases and they all say 2.6 > kernel too). You can run clumanager 1.2.x + GFS 6.0 -- Lon From nemanja at yu.net Mon Apr 17 15:41:39 2006 From: nemanja at yu.net (Nemanja Miletic) Date: Mon, 17 Apr 2006 17:41:39 +0200 Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster In-Reply-To: <1145266165.27997.57.camel@nemanja.eunet.yu> References: <1145266165.27997.57.camel@nemanja.eunet.yu> Message-ID: <1145288499.6000.15.camel@nemanja.eunet.yu> Hi, Does anyone think that turning on journaling on files could help us speed up the access to gfs partition? This would be difficult because journaling can be turned on only on files that are empty. We have a large number of empty files of active users that download all their mail from pop3 server, so turning on jurnaling for them should be possible. What size should be the journals when file journaling is on? Thank You On Mon, 2006-04-17 at 11:29 +0200, Nemanja Miletic wrote: > Hello, > > I am working for major ISP and we have gfs cluster deployed for our mail > system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade > servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on > the same gigabit subnet. Partition that holds mailboxes is shared. We > are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and > 1.02 linux-cluster. > > We have a problem that in busy hours reading and writing from gfs > partition gets very slow. This causes many processes that need to use > the disk to go in D state. This further causes great load (over 200) on > the machines in the cluster. When we cut the pop3 and smtp access on the > firewall load slowly decreases. At the moment we have a limit on syn > connection on 110 port on our load balancers (LVS based) in order to > control the load. > > > > Thank You, -- Nemanja Miletic, System Engineer ----- YUnet International http://www.EUnet.yu Dubrovacka 35/III, 11000 Belgrade Tel: +381 11 3305633; Fax: +381 11 3282760 ----- This e-mail is confidential and intended only for the recipient. Unauthorized distribution, modification or disclosure of its contents is prohibited. If you have received this e-mail in error, please notify the sender by telephone +381 11 3305633. From bole at yu.net Mon Apr 17 15:43:33 2006 From: bole at yu.net (Bosko Radivojevic) Date: Mon, 17 Apr 2006 17:43:33 +0200 Subject: [Linux-cluster] Aggregating filesystem In-Reply-To: References: Message-ID: <200604171743.33987.bole@yu.net> Hi, You need a parallel file system. It seems that IBM's GPFS is good choice. On Friday 14 April 2006 17:01, Ugo PARSI wrote: > Hello, > > I would like to aggregate multiple hard drives (on multiple computers) > inside a big filesystem with RAID / failure tolerant capatibilities. 
> > I thought GFS could do that part, but it seems it does not... > > Any ideas on how I could that ? > > Thanks a lot, > > Ugo PARSI > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jason at monsterjam.org Tue Apr 18 00:47:29 2006 From: jason at monsterjam.org (Jason) Date: Mon, 17 Apr 2006 20:47:29 -0400 Subject: [Linux-cluster] newbie questions.. In-Reply-To: <1145282662.15794.90.camel@ayanami.boston.redhat.com> References: <20060415174104.GE41043@monsterjam.org> <1145282662.15794.90.camel@ayanami.boston.redhat.com> Message-ID: <20060418004729.GA13973@monsterjam.org> ok, so I guess Ill shoot for the GFS 6.0. Im trying to figure out where the source can be found from the ftp://sources.redhat.com/pub/cluster/releases/ and not having much luck.. Is it possible to get the source and use GFS 6.0 standalone? regards, Jason On Mon, Apr 17, 2006 at 10:04:22AM -0400, Lon Hohberger wrote: > On Sat, 2006-04-15 at 13:41 -0400, Jason wrote: > > hey folks, I just downloaded ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz > > and am planning to run it on a Red Hat Enterprise Linux AS release 3 box, with > > Linux tf1.localdomain 2.4.21-40.ELsmp #1 SMP Thu Feb 2 22:22:39 EST 2006 i686 i686 i386 GNU/Linux > > kernel.. I check the INSTALL file and BAM! > > ./configure --kernel_src=/path/to/linux-2.6.x > > ^^^^^^^^^^^ > > > > so do I HAVE to be running 2.6 kernel to use this software? > > Yes. > > > If so, what are my options? (i checked the older cluster-1.0x releases and they all say 2.6 > > kernel too). > > You can run clumanager 1.2.x + GFS 6.0 > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ From aberoham at gmail.com Tue Apr 18 02:38:21 2006 From: aberoham at gmail.com (aberoham at gmail.com) Date: Mon, 17 Apr 2006 19:38:21 -0700 Subject: [Linux-cluster] kernel noise, "Neighbour table overflow." ? Message-ID: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com> I'm running a test three-node CS/GFS cluster. At random intervals I get the following kernel messages streaming out to /dev/console on all three nodes. --- Neighbour table overflow. printk: 166 messages suppressed. Neighbour table overflow. printk: 1 messages suppressed. Neighbour table overflow. printk: 1 messages suppressed. Neighbour table overflow. printk: 6 messages suppressed. Neighbour table overflow. printk: 5 messages suppressed. Neighbour table overflow. printk: 15 messages suppressed. Neighbour table overflow. printk: 7 messages suppressed. Neighbour table overflow. printk: 11 messages suppressed. --- Are these messages related to CS/GFS? What triggers 'em? And should I worry about it? I'm running Linux 2.6.9-34.ELsmp, GFS-kernel-smp-2.6.9-45, GFS-6.1.5-0 and dlm-kernel-smp-2.6.9-41.7. 
[root at gfs02 ~]# service cman status Protocol version: 5.0.1 Config version: 73 Cluster name: gfscluster Cluster ID: 41396 Cluster Member: Yes Membership state: Cluster-Member Nodes: 3 Expected_votes: 3 Total_votes: 3 Quorum: 2 Active subsystems: 8 Node name: gfs02 Node addresses: 10.0.19.11 [root at gfs02 ~]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 run - [1 2 3] DLM Lock Space: "clvmd" 2 3 run - [1 2 3] DLM Lock Space: "Magma" 4 5 run - [1 2 3] DLM Lock Space: "gfstest" 5 6 run - [1 2] GFS Mount Group: "gfstest" 6 7 run - [1 2] User: "usrm::manager" 3 4 run - [1 2 3] Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.dardini at comune.prato.it Tue Apr 18 08:56:55 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Tue, 18 Apr 2006 10:56:55 +0200 Subject: R: Re: [Linux-cluster] Cluster node not able to access allcluster resource Message-ID: <0C5C8B118420264EBB94D7D7050150011EFBB7@exchange2.comune.prato.local> > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di Deval > kulshrestha > Inviato: gioved? 13 aprile 2006 11.50 > A: 'linux clustering' > Oggetto: RE: *SPAM* Re: [Linux-cluster] Cluster node not able > to access allcluster resource > > Hi > if you are using fibre based storage solution , you can > configure either zoning on switch level or Lun Masking at > HBA-> logical Volume level. That can restrict the access path > for nodes. It's a kind of LUN access security mechanism. > > Regards, > Deval K. Unfortunately this approach was already tested and cannot be followed: If the service was owned by a node who is not allowed to access the GFS filesystem, an error occurs. Leandro > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger > Sent: Thursday, April 13, 2006 3:43 AM > To: linux clustering > Subject: *SPAM* Re: [Linux-cluster] Cluster node not able to > access all cluster resource > > On Sat, 2006-04-08 at 19:05 +0200, Leandro Dardini wrote: > > The topic is not a problem, but what I want to do. I have a lots of > > service, each on is now run by a two node cluster. This is very bad > > due to each node fencing other one during network blackout. > I'd like > > to create only one cluster, but each resource, either GFS > filesystems, > > must be readable only by a limited number of nodes. > > > > For example, taking a Cluster "test" made of node A, node > B, node C, > > node D and with the following resources: GFS Filesystem > alpha and GFS > > Filesystem beta. I want that only node A and node B can access GFS > > Filesystem alpha and only node C and node D can access GFS > Filesystem > > beta. > > > > Is it possible? > > You can just mount alpha on {A B} and beta on {C D}, but I > don't think there is an easy way to forcefully prevent > mounting alpha on {C D} currently; someone else might know better. > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > =========================================================== > Privileged or confidential information may be contained in > this message. If you are not the addressee indicated in this > message (or responsible for delivery of the message to such > person), please delete this message and kindly notify the > sender by an emailed reply. 
Opinions, conclusions and other > information in this message that do not relate to the > official business of Progression and its associate entities > shall be understood as neither given nor endorsed by them. > > > ------------------------------------------------------------- > Progression Infonet Private Limited, Gurgaon (Haryana), India > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From sanelson at gmail.com Tue Apr 18 13:11:01 2006 From: sanelson at gmail.com (Steve Nelson) Date: Tue, 18 Apr 2006 14:11:01 +0100 Subject: [Linux-cluster] Clumanager and Chkconfig Message-ID: Hi All, Should clumanager be set to automatically start on all nodes? I have a 2 node cluster (+ quorum) were if I kill an interface, the cluster fails over and the failed node reboots. However, the node rejoins the cluster automatically - should this happen? # chkconfig --list clumanager clumanager 0:off 1:off 2:on 3:on 4:on 5:on 6:off This is in chkconfig because I ran chkconfig --add clumanager. On another cluster, I have not run this, but this is currently in production so I can't test failover. My feeling was that Oracle should transfer to the other node, and clustat should shown one node is inactive, and should be started manually. Does this seem right? S. From cjk at techma.com Tue Apr 18 13:28:24 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Tue, 18 Apr 2006 09:28:24 -0400 Subject: [Linux-cluster] Clumanager and Chkconfig Message-ID: It's basically a policy issue on your part. Some folks like to have problem nodes boot up "dumb" to avoid the system taking a beating due to a major problem. It's possible that the cluster would ride this sort of thing out, but if you have a node go down, you'd be investigating anyway so booting "dumb" is not a bad idea anyway. Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson Sent: Tuesday, April 18, 2006 9:11 AM To: linux clustering Subject: [Linux-cluster] Clumanager and Chkconfig Hi All, Should clumanager be set to automatically start on all nodes? I have a 2 node cluster (+ quorum) were if I kill an interface, the cluster fails over and the failed node reboots. However, the node rejoins the cluster automatically - should this happen? # chkconfig --list clumanager clumanager 0:off 1:off 2:on 3:on 4:on 5:on 6:off This is in chkconfig because I ran chkconfig --add clumanager. On another cluster, I have not run this, but this is currently in production so I can't test failover. My feeling was that Oracle should transfer to the other node, and clustat should shown one node is inactive, and should be started manually. Does this seem right? S. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From teigland at redhat.com Tue Apr 18 13:37:04 2006 From: teigland at redhat.com (David Teigland) Date: Tue, 18 Apr 2006 08:37:04 -0500 Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster In-Reply-To: <1145288499.6000.15.camel@nemanja.eunet.yu> References: <1145266165.27997.57.camel@nemanja.eunet.yu> <1145288499.6000.15.camel@nemanja.eunet.yu> Message-ID: <20060418133704.GA16121@redhat.com> On Mon, Apr 17, 2006 at 05:41:39PM +0200, Nemanja Miletic wrote: > Hi, > > Does anyone think that turning on journaling on files could help us > speed up the access to gfs partition? 
> > This would be difficult because journaling can be turned on only on > files that are empty. We have a large number of empty files of active > users that download all their mail from pop3 server, so turning on > jurnaling for them should be possible. Data journaling might help, it will speed up fsync(), but will increase the i/o going to your storage. > What size should be the journals when file journaling is on? Continue to use the default. Another thing you might try is disabling the drop-locks callback, allowing GFS to cache more locks. Do this before you mount: echo "0" >> /proc/cluster/lock_dlm/drop_count Dave From 14117614 at sun.ac.za Tue Apr 18 20:14:44 2006 From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>) Date: Tue, 18 Apr 2006 22:14:44 +0200 Subject: [Linux-cluster] < cluster.conf problem > Message-ID: <2C04D2F14FD8254386851063BC2B6706574C40@STBEVS01.stb.sun.ac.za> Hi... I have 5 nodes and 1 head node. I want to setup gfs so that I can bunch together the 5 nodes, each have lvm's. I'm having trouble setting up cluster.conf. I follow the manuals example for gfs, not gfs2, and it says that it cant connect to css. I'm running FC5 on all my machines. Lee He who has a why to live can bear with almost any how. Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 862 bytes Desc: image001.gif URL: From lhh at redhat.com Tue Apr 18 20:27:03 2006 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 18 Apr 2006 16:27:03 -0400 Subject: [Linux-cluster] Clumanager and Chkconfig In-Reply-To: References: Message-ID: <1145392023.24818.36.camel@ayanami.boston.redhat.com> On Tue, 2006-04-18 at 14:11 +0100, Steve Nelson wrote: > Hi All, > > Should clumanager be set to automatically start on all nodes? I have > a 2 node cluster (+ quorum) were if I kill an interface, the cluster > fails over and the failed node reboots. However, the node rejoins the > cluster automatically - should this happen? If you don't want clumanager to attempt to rejoin the cluster, chkconfig --del it. If you want it to attempt to rejoin, chkconfig --add it. It's your choice. Many failures which cause a node to be kicked out are temporary (ex: kernel panic), and can be recovered from after a power-cycle. Many hardware failures (ex: motherboard catching fire, hard disk crash) generally (but *not* always) prevent the node from ever getting far enough to rejoin the cluster. -- Lon From Bowie_Bailey at BUC.com Tue Apr 18 20:27:15 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Tue, 18 Apr 2006 16:27:15 -0400 Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf after update Message-ID: <4766EEE585A6D311ADF500E018C154E302133922@bnifex.cis.buc.com> Hmm... only one response with an apparently unrelated cause. Has anyone else seen this error? I'm reluctant to move toward production with this server if I can't find out something about this error. Bowie Bowie wrote: > This is an x86_64 system that I just updated to the newest Cluster > rpms. > > When I watch the bootup on the console, I see an error: > > lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so: > undefined symbol: lvm_snprintf > > This error comes immediately after the "Activating VGs" line, so it > appears to be triggered by the vgchange command in the clvmd startup > file. I have another, identically configured, server which I have not > updated yet. 
This server does not give the error. > > Everything seems to be working fine, so is this something I need to > worry about? From filipe.miranda at gmail.com Tue Apr 18 20:33:13 2006 From: filipe.miranda at gmail.com (Filipe Miranda) Date: Tue, 18 Apr 2006 17:33:13 -0300 Subject: [Linux-cluster] RHCS for RHEL3 Resource temporarily unavailable Message-ID: Hello, We have a Red Hat Cluster Suite for Red Hat Enteprise Linux 3. After one month, the master host cluster (node1) service was interrupted. This is what shows in the file /var/log/messages: Apr 13 16:05:06 Node1 clumembd[3222]: sending broadcast message failed Resource temporarily unavailable Apr 13 16:05:29 Node1 last message repeated 32 times Apr 13 16:06:31 Node1 clumembd[3222]: sending broadcast message failed Resource temporarily unavailable This is the cluster config file: cluster.xml - <#> - <#> - <#> - <#> - <#> - <#> - <#> - <#> Any ideas what is going on? what seems to be the problem? The RHEL3 is on U4, so is the RHCS. Did we do anything wrong in this configuration? Suggestions? I would appreciate any help. Regards, FTM -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Tue Apr 18 20:35:14 2006 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 18 Apr 2006 16:35:14 -0400 Subject: [Linux-cluster] newbie questions.. In-Reply-To: <20060418004729.GA13973@monsterjam.org> References: <20060415174104.GE41043@monsterjam.org> <1145282662.15794.90.camel@ayanami.boston.redhat.com> <20060418004729.GA13973@monsterjam.org> Message-ID: <1145392514.24818.43.camel@ayanami.boston.redhat.com> On Mon, 2006-04-17 at 20:47 -0400, Jason wrote: > ok, so I guess Ill shoot for the GFS 6.0. Im trying to figure out where the source can be found > from the ftp://sources.redhat.com/pub/cluster/releases/ > and not having much luck.. Is it possible to get the source and use GFS 6.0 standalone? > ftp://updates.redhat.com/enterprise/3AS/en/RHCS/SRPMS ftp://updates.redhat.com/enterprise/3AS/en/RHGFS/SRPMS Good luck! If it breaks, you get to keep the pieces. Red Hat has evaluation programs available, you know... -- Lon Red Hat's going to Nashville! http://www.redhat.com/promo/summit/ From 14117614 at sun.ac.za Tue Apr 18 20:44:54 2006 From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>) Date: Tue, 18 Apr 2006 22:44:54 +0200 Subject: [Linux-cluster] < newbie question > Message-ID: <2C04D2F14FD8254386851063BC2B6706574C41@STBEVS01.stb.sun.ac.za> What does waiting for cluster quorum mean? Lee He who has a why to live can bear with almost any how. Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 862 bytes Desc: image001.gif URL: From brentonr at dorm.org Tue Apr 18 21:36:54 2006 From: brentonr at dorm.org (Brenton Rothchild) Date: Tue, 18 Apr 2006 16:36:54 -0500 Subject: [Linux-cluster] [Fwd: Re: [Iscsitarget-devel] iSCSI on Fedora Core 5 or custom] Message-ID: <44455BF6.7070101@dorm.org> Hello! There has been some minor discussion on the iscsitarget-devel list regarding the fact that the current iscsitarget code doesn't fully implement the SCSI commands Release and Reserve, nor does it implement persistent reserve in/out. It was also mentioned that some cluster software, such as MSCS (Microsoft Cluster), will possibly corrupt data. 
It was then asked if GFS would depend on these unimplemented SCSI commands, and the response from Ming Zhang (below) wasn't 100% sure. So, would an iSCSI target lacking Release, Reserve, and persistent reserve in/out cause problems with GFS? Thanks! -Brenton -------- Original Message -------- Subject: Re: [Iscsitarget-devel] iSCSI on Fedora Core 5 or custom Date: Tue, 18 Apr 2006 17:16:39 -0400 From: Ming Zhang Reply-To: mingz at ele.uri.edu To: Jos Vos CC: iscsitarget-devel at lists.sourceforge.net On Tue, 2006-04-18 at 22:50 +0200, Jos Vos wrote: > On Tue, Apr 18, 2006 at 04:21:29PM -0400, Ming Zhang wrote: > > > cluster like MSCS, cluster file system mostly depends on scsi command > > like reserve/release, persistent reserve in/out. But IET does not really > > implement them. So there will be chance that you fail the cluster and > > corrupt u data. > > Does this also apply to the GFS and OCFS2 filesystems? i am not 100% sure, but i remember that once a GFS guy told me that GFS will run ok even with persistent reserve in/out support. not sure about OCFS2. Ming ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 _______________________________________________ Iscsitarget-devel mailing list Iscsitarget-devel at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/iscsitarget-devel From erling.nygaard at gmail.com Wed Apr 19 07:21:46 2006 From: erling.nygaard at gmail.com (Erling Nygaard) Date: Wed, 19 Apr 2006 09:21:46 +0200 Subject: [Linux-cluster] [Fwd: Re: [Iscsitarget-devel] iSCSI on Fedora Core 5 or custom] In-Reply-To: <44455BF6.7070101@dorm.org> References: <44455BF6.7070101@dorm.org> Message-ID: Brenton GFS does not depend on any of the SCSI Release, Reserve or persistent reserve commands. GFS does not really depend on SCSI at all, just so happens that most shared storage devices are SCSI based :-) Once upon a time, in a galaxy far, far away GFS did attempt to use special SCSI commands for locking. The attempt was discontinued.... Erling On 4/18/06, Brenton Rothchild wrote: > Hello! > > There has been some minor discussion on the iscsitarget-devel list > regarding the fact that the current iscsitarget code doesn't fully > implement the SCSI commands Release and Reserve, nor does it implement > persistent reserve in/out. > > It was also mentioned that some cluster software, such as > MSCS (Microsoft Cluster), will possibly corrupt data. > > It was then asked if GFS would depend on these unimplemented > SCSI commands, and the response from Ming Zhang (below) wasn't 100% > sure. > > So, would an iSCSI target lacking Release, Reserve, and persistent > reserve in/out cause problems with GFS? > > Thanks! > -Brenton > > -------- Original Message -------- > Subject: Re: [Iscsitarget-devel] iSCSI on Fedora Core 5 or custom > Date: Tue, 18 Apr 2006 17:16:39 -0400 > From: Ming Zhang > Reply-To: mingz at ele.uri.edu > To: Jos Vos > CC: iscsitarget-devel at lists.sourceforge.net > > On Tue, 2006-04-18 at 22:50 +0200, Jos Vos wrote: > > On Tue, Apr 18, 2006 at 04:21:29PM -0400, Ming Zhang wrote: > > > > > cluster like MSCS, cluster file system mostly depends on scsi command > > > like reserve/release, persistent reserve in/out. But IET does not really > > > implement them. 
So there will be chance that you fail the cluster and > > > corrupt u data. > > > > Does this also apply to the GFS and OCFS2 filesystems? > > i am not 100% sure, but i remember that once a GFS guy told me that GFS > will run ok even with persistent reserve in/out support. not sure about > OCFS2. > > Ming > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Iscsitarget-devel mailing list > Iscsitarget-devel at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/iscsitarget-devel > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- - Mac OS X. Because making Unix user-friendly is easier than debugging Windows From saju8 at rediffmail.com Wed Apr 19 10:47:47 2006 From: saju8 at rediffmail.com (saju john) Date: 19 Apr 2006 10:47:47 -0000 Subject: [Linux-cluster] clumanager sending broadcast packets ? Message-ID: <20060419104747.6117.qmail@webmail57.rediffmail.com> Dear All, It seems that redhat clumanager is sending continous broadcast packet while it is running. Can any one confirm. Thank You, Saju John -------------- next part -------------- An HTML attachment was scrubbed... URL: From pcaulfie at redhat.com Wed Apr 19 11:56:21 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 19 Apr 2006 12:56:21 +0100 Subject: [Linux-cluster] clumanager sending broadcast packets ? In-Reply-To: <20060419104747.6117.qmail@webmail57.rediffmail.com> References: <20060419104747.6117.qmail@webmail57.rediffmail.com> Message-ID: <44462565.3020908@redhat.com> saju john wrote: > > > Dear All, > > It seems that redhat clumanager is sending continous broadcast packet > while it is running. Can any one confirm. > cman will send out a broadcast packet every 5 seconds or so, on port 6809. It's so that other nodes can check that the cluster is still connected. -- patrick From placid at adelpha-lan.org Wed Apr 19 13:02:56 2006 From: placid at adelpha-lan.org (Castang Jerome) Date: Wed, 19 Apr 2006 15:02:56 +0200 Subject: [Linux-cluster] clumanager sending broadcast packets ? In-Reply-To: <20060419104747.6117.qmail@webmail57.rediffmail.com> References: <20060419104747.6117.qmail@webmail57.rediffmail.com> Message-ID: <44463500.9080809@adelpha-lan.org> saju john a ?crit : > > > > Dear All, > > It seems that redhat clumanager is sending continous broadcast packet > while it is running. Can any one confirm. > > Thank You, > Saju John > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster I can confirm :) cman sends broadcast packets to check if all nodes are present. -- Jerome CASTANG Tel: 06.85.74.33.02 mail: jerome.castang at adelpha-lan.org --------------------------------------------- RTFM ! 
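For anyone who wants to see this heartbeat traffic for themselves, it can be watched with tcpdump. This is only a sketch: eth0 is a placeholder and should be replaced with whichever interface actually carries the cluster traffic.

# watch the cman membership broadcasts (UDP port 6809, roughly every 5 seconds)
tcpdump -n -i eth0 udp port 6809

If nothing shows up there, it is worth checking that the nodes are really using the interface you expect for cluster communication.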
From teigland at redhat.com Wed Apr 19 13:20:46 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 19 Apr 2006 08:20:46 -0500 Subject: [Linux-cluster] [Fwd: Re: [Iscsitarget-devel] iSCSI on Fedora Core 5 or custom] In-Reply-To: <44455BF6.7070101@dorm.org> References: <44455BF6.7070101@dorm.org> Message-ID: <20060419132046.GA2683@redhat.com> On Tue, Apr 18, 2006 at 04:36:54PM -0500, Brenton Rothchild wrote: > Hello! > > There has been some minor discussion on the iscsitarget-devel list > regarding the fact that the current iscsitarget code doesn't fully > implement the SCSI commands Release and Reserve, nor does it implement > persistent reserve in/out. > > It was also mentioned that some cluster software, such as > MSCS (Microsoft Cluster), will possibly corrupt data. > > It was then asked if GFS would depend on these unimplemented > SCSI commands, and the response from Ming Zhang (below) wasn't 100% > sure. > > So, would an iSCSI target lacking Release, Reserve, and persistent > reserve in/out cause problems with GFS? GFS requires the clustering software to do i/o fencing and persistent reservations would be a good way to do fencing. Dave From jbrassow at redhat.com Wed Apr 19 14:13:46 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Wed, 19 Apr 2006 09:13:46 -0500 Subject: [Linux-cluster] < cluster.conf problem > In-Reply-To: <2C04D2F14FD8254386851063BC2B6706574C40@STBEVS01.stb.sun.ac.za> References: <2C04D2F14FD8254386851063BC2B6706574C40@STBEVS01.stb.sun.ac.za> Message-ID: <23f9ac66cf0e830ed3672b0a370ac54d@redhat.com> Have you started the CCS daemon (ccsd)? The init script should start this on bootup. You really just need to create the cluster.conf file (by hand or by GUI) and copy it to all your nodes. Then reboot. brassow On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: > Hi? > ? > I have 5 nodes and 1 head node. I want to setup gfs so that I can > bunch together the 5 nodes, each have lvm?s. > ? > I?m having trouble setting up cluster.conf. I follow the manuals > example for gfs, not gfs2, and it says that it cant connect to css. > ? > I?m running FC5 on all my machines. > ? > Lee > ? > > He who has a why to live can bear with almost any how. > Friedrich Nietzsche? > ? > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 1980 bytes Desc: not available URL: From 14117614 at sun.ac.za Wed Apr 19 14:28:18 2006 From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>) Date: Wed, 19 Apr 2006 16:28:18 +0200 Subject: [Linux-cluster] < cluster.conf problem > Message-ID: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za> Hi... I did that, thanks. Now I have trouble getting fence_tool join to work. I did cman_tool join and I'm able to see the nodes under /proc/cluster/nodes... But fence seems to wait for something called "quorum".. In the cluster.conf file just said fence manually... What would be the problem? Lee He who has a why to live can bear with almost any how. Friedrich Nietzsche ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jonathan E Brassow Sent: 19 April 2006 04:14 PM To: linux clustering Subject: Re: [Linux-cluster] < cluster.conf problem > Have you started the CCS daemon (ccsd)? The init script should start this on bootup. 
You really just need to create the cluster.conf file (by hand or by GUI) and copy it to all your nodes. Then reboot. brassow On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: Hi... I have 5 nodes and 1 head node. I want to setup gfs so that I can bunch together the 5 nodes, each have lvm's. I'm having trouble setting up cluster.conf. I follow the manuals example for gfs, not gfs2, and it says that it cant connect to css. I'm running FC5 on all my machines. Lee He who has a why to live can bear with almost any how. Friedrich Nietzsche -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrassow at redhat.com Wed Apr 19 16:45:04 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Wed, 19 Apr 2006 11:45:04 -0500 Subject: [Linux-cluster] < cluster.conf problem > In-Reply-To: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za> References: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za> Message-ID: <540618d44815ff3d4897d82dd26f9538@redhat.com> Quorum is a number of machines greater than or equal to (n/2 +1) of the total cluster machines. There is no way for two groups of machines in a cluster to have "quorum". Only one group (which may be all the machines) can have this status. When a group of machines has quorum, they can perform cluster operations. The idea of quorum prevents "split-brain" or two separate groups of machines from thinking they are in control of the cluster - and thus potentially corrupting resources because they do not acknowledge the existence of the other group. (Think multiple writer problem.) You should reboot all your machines at the same time. (Or at least do cman_tool join on all the machines at close to the same time.) This allows the machines to form a quorate group and start performing cluster operations - like starting and performing fencing. brassow P.S. Manual fencing sucks for anything more than simple evaluation. My guess is that you will encounter more problems/questions because of manual fencing. On Apr 19, 2006, at 9:28 AM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: > Hi? > ? > I did that, thanks. > ? > Now I have trouble getting fence_tool join to work. I did cman_tool > join and I?m able to see the nodes under /proc/cluster/nodes? > ? > But fence seems to wait for something called ??quorum?.. > In the cluster.conf file just said fence manually? > ? > What would be the problem? > ? > Lee > ? > > He who has a why to live can bear with almost any how. > Friedrich Nietzsche > > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jonathan E > Brassow > Sent: 19 April 2006 04:14 PM > To: linux clustering > Subject: Re: [Linux-cluster] < cluster.conf problem > > ? > Have you started the CCS daemon (ccsd)? The init script should start > this on bootup. You really just need to create the cluster.conf file > (by hand or by GUI) and copy it to all your nodes. Then reboot. > ? > brassow > > ? > On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: >> ? >> Hi? >> ? >> I have 5 nodes and 1 head node. I want to setup gfs so that I can >> bunch together the 5 nodes, each have lvm?s. >> ? >> I?m having trouble setting up cluster.conf. I follow the manuals >> example for gfs, not gfs2, and it says that it cant connect to css. >> ? >> I?m running FC5 on all my machines. >> ? >> Lee >> ? >> ? 
>> He who has a why to live can bear with almost any how. >> Friedrich Nietzsche? >> ? >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 8555 bytes Desc: not available URL: From ehimmel at burlingtontelecom.com Wed Apr 19 17:23:18 2006 From: ehimmel at burlingtontelecom.com (Evan Himmel) Date: Wed, 19 Apr 2006 13:23:18 -0400 Subject: [Linux-cluster] Cluster Help Message-ID: <44467206.8060802@burlingtontelecom.com> I installed all the necessary rpms to run RH Cluster Suite and GFS on Fedora Core 5. I am running the 64-bit version with the xen kernel. I can't seem to get cman to start. It gives me the following error: can't open cluster socket: Address family not supported by protocol cman_tool: The cman kernel module may not be loaded I also noticed there is no support for the SMP kernels via 64-bit. (cman-kernel-smp) and that lvm2-cluster is also not available. Any help would be great. Evan From 14117614 at sun.ac.za Wed Apr 19 17:32:00 2006 From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>) Date: Wed, 19 Apr 2006 19:32:00 +0200 Subject: [Linux-cluster] < cluster.conf problem > References: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za> <540618d44815ff3d4897d82dd26f9538@redhat.com> Message-ID: <2C04D2F14FD8254386851063BC2B67065E08A7@STBEVS01.stb.sun.ac.za> Hi... Thanks for the answer.. Lee He who has a why to live can bear with almost any how. Friedrich Nietzsche ________________________________ From: linux-cluster-bounces at redhat.com on behalf of Jonathan E Brassow Sent: Wed 2006/04/19 06:45 PM To: linux clustering Subject: Re: [Linux-cluster] < cluster.conf problem > Quorum is a number of machines greater than or equal to (n/2 +1) of the total cluster machines. There is no way for two groups of machines in a cluster to have "quorum". Only one group (which may be all the machines) can have this status. When a group of machines has quorum, they can perform cluster operations. The idea of quorum prevents "split-brain" or two separate groups of machines from thinking they are in control of the cluster - and thus potentially corrupting resources because they do not acknowledge the existence of the other group. (Think multiple writer problem.) You should reboot all your machines at the same time. (Or at least do cman_tool join on all the machines at close to the same time.) This allows the machines to form a quorate group and start performing cluster operations - like starting and performing fencing. brassow P.S. Manual fencing sucks for anything more than simple evaluation. My guess is that you will encounter more problems/questions because of manual fencing. On Apr 19, 2006, at 9:28 AM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: Hi... I did that, thanks. Now I have trouble getting fence_tool join to work. I did cman_tool join and I'm able to see the nodes under /proc/cluster/nodes... But fence seems to wait for something called "quorum".. In the cluster.conf file just said fence manually... What would be the problem? Lee He who has a why to live can bear with almost any how. 
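A minimal sketch of the checks involved, assuming a stock FC5 install; the exact package names depend on which kernel flavour (smp, xen, ...) is running:

modprobe cman && lsmod | grep -w cman    # try loading the module and confirm it is present
uname -r                                 # which kernel flavour is actually booted
rpm -qa | grep cman-kernel               # is a cman-kernel* package installed for that flavour?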
Friedrich Nietzsche From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jonathan E Brassow Sent: 19 April 2006 04:14 PM To: linux clustering Subject: Re: [Linux-cluster] < cluster.conf problem > Have you started the CCS daemon (ccsd)? The init script should start this on bootup. You really just need to create the cluster.conf file (by hand or by GUI) and copy it to all your nodes. Then reboot. brassow On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: Hi... I have 5 nodes and 1 head node. I want to setup gfs so that I can bunch together the 5 nodes, each have lvm's. I'm having trouble setting up cluster.conf. I follow the manuals example for gfs, not gfs2, and it says that it cant connect to css. I'm running FC5 on all my machines. Lee He who has a why to live can bear with almost any how. Friedrich Nietzsche -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 10701 bytes Desc: not available URL: From jbrassow at redhat.com Wed Apr 19 18:33:05 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Wed, 19 Apr 2006 13:33:05 -0500 Subject: [Linux-cluster] Cluster Help In-Reply-To: <44467206.8060802@burlingtontelecom.com> References: <44467206.8060802@burlingtontelecom.com> Message-ID: <814fa887df569f69d883689d0e4fc100@redhat.com> On Apr 19, 2006, at 12:23 PM, Evan Himmel wrote: > I installed all the necessary rpms to run RH Cluster Suite and GFS on > Fedora Core 5. I am running the 64-bit version with the xen kernel. > I can't seem to get cman to start. It gives me the following error: > > can't open cluster socket: Address family not supported by protocol > cman_tool: The cman kernel module may not be loaded This error occurs when the 'cman' module is not loaded in the kernel. You can do 'modprobe cman' to load it. If that doesn't work, it likely means that you don't have a cman-kernel*rpm for the particular kernel you are running. (I'm not sure we are currently building for xen... anyone?) brassow From cfeist at redhat.com Wed Apr 19 20:27:52 2006 From: cfeist at redhat.com (Chris Feist) Date: Wed, 19 Apr 2006 15:27:52 -0500 Subject: [Linux-cluster] Cluster Help In-Reply-To: <814fa887df569f69d883689d0e4fc100@redhat.com> References: <44467206.8060802@burlingtontelecom.com> <814fa887df569f69d883689d0e4fc100@redhat.com> Message-ID: <44469D48.3010105@redhat.com> There are xen packages available for both x86_64 & i686 archs. In the orignal FC5 release there weren't any x86_64 xen rpms for GFS and Cluster Suite. You need to make sure you run 'yum update' to get the latest updates. Thanks, Chris Jonathan E Brassow wrote: > > On Apr 19, 2006, at 12:23 PM, Evan Himmel wrote: > >> I installed all the necessary rpms to run RH Cluster Suite and GFS on >> Fedora Core 5. I am running the 64-bit version with the xen kernel. >> I can't seem to get cman to start. It gives me the following error: >> >> can't open cluster socket: Address family not supported by protocol >> cman_tool: The cman kernel module may not be loaded > > This error occurs when the 'cman' module is not loaded in the kernel. > You can do 'modprobe cman' to load it. 
If that doesn't work, it likely > means that you don't have a cman-kernel*rpm for the particular kernel > you are running. (I'm not sure we are currently building for xen... > anyone?) > > brassow > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From sanelson at gmail.com Wed Apr 19 21:35:57 2006 From: sanelson at gmail.com (Steve Nelson) Date: Wed, 19 Apr 2006 22:35:57 +0100 Subject: [Linux-cluster] Clustat in user's profile Message-ID: Hi All, On all of my clusters, I have clustat run in the user's profile, so the status of the cluster is visible whenever someone logs in. Someone has suggested to me that clustat could hang, and prevent user access. Is this a valid point? Under what (if any) circumstances would clustat hang? S. From ehimmel at burlingtontelecom.com Thu Apr 20 01:53:03 2006 From: ehimmel at burlingtontelecom.com (Evan Himmel) Date: Wed, 19 Apr 2006 21:53:03 -0400 Subject: [Linux-cluster] Cluster Help In-Reply-To: <44469D48.3010105@redhat.com> References: <44467206.8060802@burlingtontelecom.com> <814fa887df569f69d883689d0e4fc100@redhat.com> <44469D48.3010105@redhat.com> Message-ID: <4446E97F.7050603@burlingtontelecom.com> What about lvm2-cluster? I got kernel-xen for the rest. Thanks! Chris Feist wrote: > There are xen packages available for both x86_64 & i686 archs. In the > orignal FC5 release there weren't any x86_64 xen rpms for GFS and > Cluster Suite. You need to make sure you run 'yum update' to get the > latest updates. > > Thanks, > Chris > > Jonathan E Brassow wrote: >> >> On Apr 19, 2006, at 12:23 PM, Evan Himmel wrote: >> >>> I installed all the necessary rpms to run RH Cluster Suite and GFS >>> on Fedora Core 5. I am running the 64-bit version with the xen >>> kernel. I can't seem to get cman to start. It gives me the >>> following error: >>> >>> can't open cluster socket: Address family not supported by protocol >>> cman_tool: The cman kernel module may not be loaded >> >> This error occurs when the 'cman' module is not loaded in the >> kernel. You can do 'modprobe cman' to load it. If that doesn't >> work, it likely means that you don't have a cman-kernel*rpm for the >> particular kernel you are running. (I'm not sure we are currently >> building for xen... anyone?) >> >> brassow >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Evan Himmel Burlington Telecom http://www.burlingtontelecom.com __________________________________________________________________________________________________________________________________________________ Attention! This electronic message contains information that may be legally confidential and/or privileged. The information is intended solely for the individual or entity named above and access by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it. 
From Fernando.Nino at medias.cnes.fr Thu Apr 20 07:56:20 2006
From: Fernando.Nino at medias.cnes.fr (Fernando Nino)
Date: Thu, 20 Apr 2006 09:56:20 +0200
Subject: [Linux-cluster] GFS join hang
Message-ID: <200604200756.k3K7uDo25619@cnes.fr>

Dear all,

I am running GFS 6.1 with dlm on a cluster (4 nodes + front-end) of dual-headed Opterons and RHEL4U3. Because of some problems (kernel panic...) I had to hard boot some nodes of the cluster. Now, some gfs partitions won't mount. They simply keep waiting forever for the "join" of the GFS group.

So... three questions:
- What is the join exactly doing ? Cluster status is fine, everybody is a member ...
- What does the status code mean in the cman_tool output ?
- What can I do to restart this cluster ?

NB: Before testing this (below) I rebooted the complete cluster and gfs_fsck'ed all nodes with everything unmounted.

----------------------------------------------------------------------------------------------------
root # service clvmd start
root # service gfs start
Mounting GFS filesystems:          # forever !

in another console I get:

root # dmesg | tail
...
GFS: fsid=globcover:baieGC2b.0: jid=14: Done
GFS: fsid=globcover:baieGC2b.0: jid=15: Trying to acquire journal lock...
GFS: fsid=globcover:baieGC2b.0: jid=15: Looking at journal...
GFS: fsid=globcover:baieGC2b.0: jid=15: Done
GFS: Trying to join cluster "lock_dlm", "globcover:baieGC3a"

root # cman_tool services
Service          Name           GID LID State Code
Fence Domain:    "default"       11   2 run   -
[1 5 4 3 2]
DLM Lock Space:  "clvmd"         12   3 run   -
[1 5 4 3 2]
DLM Lock Space:  "baieGC2b"      13   4 run   -
[1 5]
DLM Lock Space:  "baieGC3a"      15   6 run   -
[1 5 2 4 3]
GFS Mount Group: "baieGC2b"      14   5 run   -
[1 5]
GFS Mount Group: "baieGC3a"       0   7 join  S-2,2,4
[]

root # cman_tool status
Protocol version: 5.0.1
Config version: 8
Cluster name: globcover
Cluster ID: 53692
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 5
Expected_votes: 5
Total_votes: 5
Quorum: 3
Active subsystems: 9
Node name: globcover-fe
Node addresses: 10.1.1.1

root # cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    5   M   globcover-fe
   2    1    5   M   compute-0-3
   3    1    5   M   compute-0-2
   4    1    5   M   compute-0-1
   5    1    5   M   compute-0-0
----------------------------------------------------------------------------------------------------

Thanks,
--
------------------------------------------------------------------------
Fernando NIÑO               CNES - BPi 2102
Medias-France/IRD           18, Av. Edouard Belin
Tél: 05.61.27.40.74         31401 Toulouse Cedex 9

From hlawatschek at atix.de Thu Apr 20 08:03:22 2006
From: hlawatschek at atix.de (Mark Hlawatschek)
Date: Thu, 20 Apr 2006 10:03:22 +0200
Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root
Message-ID: <200604201003.23057.hlawatschek@atix.de>

Hi,

as posted before, we are using GFS for our diskless shared root cluster solutions. In these file-system-based SSI configurations all servers are "stateless" and share the same root partition and boot device in the SAN. The server, infrastructure and storage tiers of the diskless shared root cluster can be scaled independently and incrementally.

As we want to be independent of the servers' hostnames at initrd boot time, we wrote a small GFS patch that uses cman's nodeid parameter for a context dependent path name.

I attached the patches to the mail. Note that the GFS and cman patches are completely independent of each other and the cman patch is only for user information.

What do you think about nodeid cdpns ?

The Readme:
1. Reason for the patch
Create context dependent symbolic links (cdsl) keyed on cman's nodeid,
e.g. ln -s @nodeid mynode

2. Contents
- cman-kernel-nodeid.patch
  Applies against cman-kernel-2.6.9-41
- kernel-nodeid-symlink.patch
  Applies against gfs-kernel-2.6.9-42

3. Changes:
3.1 cman-kernel
Added a line to the /proc/cluster/status output, e.g. "Node ID: 4"
3.2 gfs-kernel
Added a new parameter for cdsl symlinks: @nodeid

Thanks,

Mark

--
Gruss / Regards,

Dipl.-Ing. Mark Hlawatschek
Phone: +49-89 121 409-55
http://www.atix.de/
http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gfs-kernel-nodeid-symlink.patch
Type: text/x-diff
Size: 2473 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cman-kernel-nodeid.patch
Type: text/x-diff
Size: 500 bytes
Desc: not available
URL:

From pcaulfie at redhat.com Thu Apr 20 08:08:08 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 20 Apr 2006 09:08:08 +0100
Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root
In-Reply-To: <200604201003.23057.hlawatschek@atix.de>
References: <200604201003.23057.hlawatschek@atix.de>
Message-ID: <44474168.8030703@redhat.com>

Mark Hlawatschek wrote:
> Hi,
>
> diff -Naur cman-kernel-2.6.9-41.orig/src/proc.c cman-kernel-2.6.9-41/src/proc.c
> --- cman-kernel-2.6.9-41.orig/src/proc.c 2005-11-28 17:20:39.000000000 +0100
> +++ cman-kernel-2.6.9-41/src/proc.c 2006-01-23 23:20:15.000000000 +0100
> @@ -149,6 +149,8 @@
>  		atomic_read(&use_count));
>
>  	c += sprintf(b+c, "Node name: %s\n", nodename);
> +
> +	c += sprintf(b+c, "Node ID: %i\n", us->node_id);
>
>  	c += sprintf(b+c, "Node addresses: ");
>  	list_for_each_entry(node_addr, &us->addr_list, list) {
>

This patch is already in CVS for cman. I can't comment on the GFS parts.

--
patrick

From cjk at techma.com Thu Apr 20 11:16:52 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 20 Apr 2006 07:16:52 -0400
Subject: [Linux-cluster] Clustat in user's profile
Message-ID:

It's easy to illustrate: just run "clustat -i 1" in one window, then fence another node from a second one. The clustat output will pause (hang) at one point. Not a very scientific test, and probably not a situation that would crop up much, but it can "pause" under normal use.

As far as hanging goes, well, it's software, which by its very nature can hang. It'll probably happen right when you're showing your boss the new whiz-bang cluster you've been working on :)

Things that can make _anything_ hang, of course, are slow name resolution, an interrupted authentication mechanism (NIS or LDAP down or slow), or maybe an NFS mount that is having issues and causes access checks to time out or fail, etc. In short, I'm personally not a fan of issuing commands during login scripts.

Cheers,

Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson
Sent: Wednesday, April 19, 2006 5:36 PM
To: linux clustering
Subject: [Linux-cluster] Clustat in user's profile

Hi All,

On all of my clusters, I have clustat run in the user's profile, so the status of the cluster is visible whenever someone logs in.

Someone has suggested to me that clustat could hang, and prevent user access. Is this a valid point? Under what (if any) circumstances would clustat hang?

S.
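One way to keep the status visible at login without running clustat synchronously is to snapshot it out of band and only read the snapshot from the profile. A minimal sketch, where the cache path and cron interval are arbitrary examples:

# e.g. in /etc/crontab: refresh a status snapshot every 5 minutes
*/5 * * * * root /usr/sbin/clustat > /var/run/clustat.snapshot 2>&1

# in the login profile: reading a local file cannot block on the cluster
[ -r /var/run/clustat.snapshot ] && cat /var/run/clustat.snapshot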
-- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cjk at techma.com Thu Apr 20 11:46:24 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Thu, 20 Apr 2006 07:46:24 -0400 Subject: [Linux-cluster] New features/architecture ? Message-ID: I've worked with GFS 6 and 6.1 quite a bit lately and in reading the posts over the last few months, I see a lot of references to gfs2. I'm not quite sure where it sits in the grand scheme of things other than it's the next big itteration of gfs as a whole and attepmpts are being made to mearge it into the kernel. This post has some good info, but not much in the way of specifics http://lwn.net/Articles/150652/ * GFS2 - an improved version of GFS, not on-disk compatible * DLM - an improved version of DLM * CMAN - a new version of CMAN, based on OpenAIS * CLVM - will allow more LVM2 features to be used in the cluster These seem to be all there is as far as a "roadmap" and the OpenAIS link doesn't seem all that descriptive unless one is a developer. Is there some point of reference which describes the changes between whats already released and what is planned? For instance, a post recently mentioned adding openais interfaces/functionality. Basically I guess I am looking for a roadmap of some sort? Cheers Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From skellogg at egginc.com Thu Apr 20 12:50:11 2006 From: skellogg at egginc.com (Scott Kellogg) Date: Thu, 20 Apr 2006 08:50:11 -0400 Subject: [Linux-cluster] Cluster Planning In-Reply-To: References: Message-ID: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> Hello, I was wondering if I could get some assistance in setting up a two node cluster. We have 2 Dell PowerEdge 850 machines running RHEL4. Our license for Cluster Suite is still in purchasing. The main thing in the Cluster Suite documentation which confuses me is the use of a SAN. RHEL4 docs claim that the need for a SAN has been eliminated, but I'm having trouble find more information. Most of the docs assume you are using a SAN. My customer could not afford the SAN, just the servers. I would like to set up a high-availablity environment. I understand that due to the hardware configuration (no SAN, no RAID) that there are still points of failure. I'm hoping to set up a simple active- passive configuration. We will be running LAMP applications. If the primary server cannot deliver services, I'd like to automatically cut over to the backup. Ideally, I'd like to set up active-active and load balancing, since the servers have DRAC4 fence devices for use with STONITH. However, since there is no SAN, I'm not sure how data will be mirrored across the two machines. Any help is appreciated! Thank you, Scott Kellogg From teigland at redhat.com Thu Apr 20 14:32:47 2006 From: teigland at redhat.com (David Teigland) Date: Thu, 20 Apr 2006 09:32:47 -0500 Subject: [Linux-cluster] GFS join hang In-Reply-To: <200604200756.k3K7uDo25619@cnes.fr> References: <200604200756.k3K7uDo25619@cnes.fr> Message-ID: <20060420143247.GA22326@redhat.com> On Thu, Apr 20, 2006 at 09:56:20AM +0200, Fernando Nino wrote: > I am running GFS 6.1 with dlm on a cluster (4 nodes + front-end) of > dual-headed Opterons and RHEL4U3. Because of some problems (kernel > panic...) I had to hard boot some nodes of the cluster. Now, some gfs > partitions won't mount. They will simply keep waiting forever for the > "join" of the GFS group: > > So... 
three questions: > > - What is the join exactly doing ? Cluster status is fine, everybody is > member ... >From all 5 nodes it would be good to see: - cman_tool services - /var/log/messages - /proc/cluster/lock_dlm/debug > - What does the status code mean in the cman_tool output ? > S-2,2,4 S-2: join event state is SEST_JOIN_ACKWAIT ,2: join event flag is SEFL_ALLOW_JOIN ,4: number of acks to our join request is 4 So, the node is waiting for acks to its join request. It needs 5 but has only got 4, someone hasn't sent a reply for some reason. We might be able to figure out who and why given all the info from the other nodes. Rebooting the node that's not replied might resolve things. Dave From david.n.lombard at intel.com Thu Apr 20 14:39:44 2006 From: david.n.lombard at intel.com (Lombard, David N) Date: Thu, 20 Apr 2006 07:39:44 -0700 Subject: [Linux-cluster] Clustat in user's profile Message-ID: <187D3A7CAB42A54DB61F1D05F0125722086AE16E@orsmsx402.amr.corp.intel.com> From: Steve Nelson on Wednesday, April 19, 2006 2:36 PM > Hi All, > > On all of my clusters, I have clustat run in the user's profile, so > the status of the cluster is visible whenever someone logs in. > > Someone has suggested to me that clustat could hang, and prevent user > access. Is this a valid point? Under what (if any) circumstances > would clustat hang? As another has pointed out, anything that can hang the login, will, at the most inopportune times. Why not have a cron job periodically report the status into some file and then just cat the file results during login? If the user then really wants an up-to-the-moment report, they can buy into running clustat. -- dnl From filipe.miranda at gmail.com Thu Apr 20 15:11:31 2006 From: filipe.miranda at gmail.com (Filipe Miranda) Date: Thu, 20 Apr 2006 12:11:31 -0300 Subject: [Linux-cluster] Cluster Planning In-Reply-To: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> References: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> Message-ID: Scott, The RHCS for RHEL4 does not requires a SAN like its antecessor RHCS for RHEL3. All quorum control and cluster management is done throught network, so you will be fine without a SAN when using RHCS for RHEL4. The only problem you will encounter is that without a SAN you will probably have to sincronize data between the servers if your application stores data on internar discs on the servers.... Also to use an active-active (same service active on both servers) configuration + loadbalancing you will need more than 2 servers; at least 4 servers, 2 for loadbalancing (1 active/1 backup) and 2 for the critical service active concurently on both servers (no high availability, no failover). But if the active-active is for the servers (hardware), which means not the same service on high availability then; 2 servers doing loadbalancing (1 active/1 backup), 2 servers providing 2 critical services, one actice on node A and the other one active on node B (failover activated) Well thats my understanding about Red Hat's Cluster Suite... Please correct me if I Am wrong... Att. Filipe Miranda On 4/20/06, Scott Kellogg wrote: > > Hello, > > I was wondering if I could get some assistance in setting up a two > node cluster. > > We have 2 Dell PowerEdge 850 machines running RHEL4. Our license for > Cluster Suite is still in purchasing. The main thing in the Cluster > Suite documentation which confuses me is the use of a SAN. RHEL4 > docs claim that the need for a SAN has been eliminated, but I'm > having trouble find more information. 
Most of the docs assume you > are using a SAN. My customer could not afford the SAN, just the servers. > > I would like to set up a high-availablity environment. I understand > that due to the hardware configuration (no SAN, no RAID) that there > are still points of failure. I'm hoping to set up a simple active- > passive configuration. We will be running LAMP applications. If the > primary server cannot deliver services, I'd like to automatically cut > over to the backup. > > Ideally, I'd like to set up active-active and load balancing, since > the servers have DRAC4 fence devices for use with STONITH. However, > since there is no SAN, I'm not sure how data will be mirrored across > the two machines. > > Any help is appreciated! > > Thank you, > Scott Kellogg > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Att. --- Filipe T Miranda RHCE - Red Hat Certified Engineer OCP8i - Oracle Certified Professional -------------- next part -------------- An HTML attachment was scrubbed... URL: From skellogg at egginc.com Thu Apr 20 15:22:42 2006 From: skellogg at egginc.com (Scott Kellogg) Date: Thu, 20 Apr 2006 11:22:42 -0400 Subject: [Linux-cluster] Cluster Planning In-Reply-To: References: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> Message-ID: > > > The only problem you will encounter is that without a SAN you will > probably have to sincronize data between the servers if your > application stores data on internar discs on the servers.... Yes, this is the issue that seems to have multiple solutions. I've looked at NFS, DRBD, rysnc, and Unison, but none of these technologies has jumped out at me as the best one. > > Also to use an active-active (same service active on both servers) > configuration + loadbalancing you will need more than 2 servers; at > least 4 servers, 2 for loadbalancing (1 active/1 backup) and 2 for > the critical service active concurently on both servers (no high > availability, no failover). You seem to be referring to LVS. Right, I can't implement that since I don't have enough hardware. I think that active-passive will be the way to go. When the active node dies, the passive node will take over. The data will only be as fresh as the last synchronization. That begs the question of what happens when the active node comes back up ... will the passive node (now active) sync its data to the new active node? This is where picking a synchronization method becomes vital. /Scott > > But if the active-active is for the servers (hardware), which means > not the same service on high availability then; 2 servers doing > loadbalancing (1 active/1 backup), 2 servers providing 2 critical > services, one actice on node A and the other one active on node B > (failover activated) > > Well thats my understanding about Red Hat's Cluster Suite... Please > correct me if I Am wrong... > > Att. > Filipe Miranda > > > On 4/20/06, Scott Kellogg wrote: > Hello, > > I was wondering if I could get some assistance in setting up a two > node cluster. > > We have 2 Dell PowerEdge 850 machines running RHEL4. Our license for > Cluster Suite is still in purchasing. The main thing in the Cluster > Suite documentation which confuses me is the use of a SAN. RHEL4 > docs claim that the need for a SAN has been eliminated, but I'm > having trouble find more information. Most of the docs assume you > are using a SAN. My customer could not afford the SAN, just the > servers. > > I would like to set up a high-availablity environment. 
I understand > that due to the hardware configuration (no SAN, no RAID) that there > are still points of failure. I'm hoping to set up a simple active- > passive configuration. We will be running LAMP applications. If the > primary server cannot deliver services, I'd like to automatically cut > over to the backup. > > Ideally, I'd like to set up active-active and load balancing, since > the servers have DRAC4 fence devices for use with STONITH. However, > since there is no SAN, I'm not sure how data will be mirrored across > the two machines. > > Any help is appreciated! > > Thank you, > Scott Kellogg > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Att. > --- > Filipe T Miranda > RHCE - Red Hat Certified Engineer > OCP8i - Oracle Certified Professional > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Scott Kellogg System Administrator EG&G Technical Services, Inc. (812) 854-7077 ext. 236 -------------- next part -------------- An HTML attachment was scrubbed... URL: From teigland at redhat.com Thu Apr 20 15:33:26 2006 From: teigland at redhat.com (David Teigland) Date: Thu, 20 Apr 2006 10:33:26 -0500 Subject: [Linux-cluster] New features/architecture ? In-Reply-To: References: Message-ID: <20060420153326.GB22326@redhat.com> On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote: > I've worked with GFS 6 and 6.1 quite a bit lately and in reading the posts > over the last few months, I see a lot > of references to gfs2. I'm not quite sure where it sits in the grand scheme > of things other than it's the next big > itteration of gfs as a whole and attepmpts are being made to mearge it into > the kernel. > > This post has some good info, but not much in the way of specifics > http://lwn.net/Articles/150652/ > > * GFS2 - an improved version of GFS, not on-disk compatible > * DLM - an improved version of DLM > * CMAN - a new version of CMAN, based on OpenAIS > > * CLVM - will allow more LVM2 features to be used in the cluster > > These seem to be all there is as far as a "roadmap" and the OpenAIS link > doesn't seem all that descriptive > unless one is a developer. > > Is there some point of reference which describes the changes between whats > already released and what is > planned? For instance, a post recently mentioned adding openais > interfaces/functionality. For GFS2 and DLM it's largely performance improvements. For clustering infrastructure a ton of stuff moved out of the kernel and now runs in user space, with openais at the center. The user isn't exposed to much of the infrastructure so there's not much user-visible change to speak about. Patrick recently sent this out: https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html Dave From jbrassow at redhat.com Thu Apr 20 15:37:40 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Thu, 20 Apr 2006 10:37:40 -0500 Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root In-Reply-To: <200604201003.23057.hlawatschek@atix.de> References: <200604201003.23057.hlawatschek@atix.de> Message-ID: <82f35fc0732c6b277c5e616915a98166@redhat.com> I've only heard in passing, but... I think that there has been community push back on the cdpn's. I don't think they want them in GFS 2. It may be tough to argue that GFS 1 needs more cdpn capability if it is completely going away in GFS 2. 
The reason sited against cdpn's was the fact that 'mount --bind' exists. If you could articulate why bind mounts are insufficient for your uses, it may give the community a reason to take a second look at cdpn's. brassow On Apr 20, 2006, at 3:03 AM, Mark Hlawatschek wrote: > Hi, > > as posted before, we are using GFS for our diskless shared root cluster > solutions. > > In this file system based ssi configurations all servers are > ?stateless? and > share the same root partition and boot device in the SAN. Server, > infrastructure and storage tier of the diskless shared root cluster > can be > scaled independently and incrementally. > > As we want to be independent from the servers hostnames at initrd > boottime, we > wrote a small GFS patch to use cmans nodeid parameter for a context > dependent > path name. > > I attached the patches to the mail. Note, that the GFS and cman > patches are > totally independent from each other and the cman patch is only for user > information. > > What do you think about nodeid cdpns ? > > The Readme: > 1. Reason for the patch > Create context dependent symbolic links (cdsl) dependent to cmans > nodeid > E.g. ln -s @nodeid mynode > > 2. Contents > - cman-kernel-nodeid.patch > Applies against cman-kernel-2.6.9-41 > - kernel-nodeid-symlink.patch > Applies against gfs-kernel-2.6.9-42 > > 3. Changes: > 3.1 cman-kernel > Added line to proc/cluster/status output. E.g: "Node ID: 4" > 3.2 gfs-kernel > Added new parameter for cdsl symlink: @nodeid > > Thanks, > > Mark > > > -- > Gruss / Regards, > > Dipl.-Ing. Mark Hlawatschek > Phone: +49-89 121 409-55 > http://www.atix.de/ > http://www.open-sharedroot.org/ > > ** > ATIX - Ges. fuer Informationstechnologie und Consulting mbH > Einsteinstr. 10 - 85716 Unterschleissheim - Germany > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From cjk at techma.com Thu Apr 20 15:45:59 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Thu, 20 Apr 2006 11:45:59 -0400 Subject: [Linux-cluster] New features/architecture ? Message-ID: David, thanks for the reply. I've seen the post below and in fact it is what prompted the question. Just seems like there is a lot going underneath that I was missing. I was hoping for a more nuts and bolts bag of information with respect to the changes being made across the board. This is a good start though and I'll take a look. Thanks Corey -----Original Message----- From: David Teigland [mailto:teigland at redhat.com] Sent: Thursday, April 20, 2006 11:33 AM To: Kovacs, Corey J. Cc: linux clustering Subject: Re: [Linux-cluster] New features/architecture ? On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote: > I've worked with GFS 6 and 6.1 quite a bit lately and in reading the > posts over the last few months, I see a lot of references to gfs2. I'm > not quite sure where it sits in the grand scheme of things other than > it's the next big itteration of gfs as a whole and attepmpts are being > made to mearge it into the kernel. > > This post has some good info, but not much in the way of specifics > http://lwn.net/Articles/150652/ > > * GFS2 - an improved version of GFS, not on-disk compatible > * DLM - an improved version of DLM > * CMAN - a new version of CMAN, based on OpenAIS > > * CLVM - will allow more LVM2 features to be used in the cluster > > These seem to be all there is as far as a "roadmap" and the OpenAIS > link doesn't seem all that descriptive unless one is a developer. 
> > Is there some point of reference which describes the changes between > whats already released and what is planned? For instance, a post > recently mentioned adding openais interfaces/functionality. For GFS2 and DLM it's largely performance improvements. For clustering infrastructure a ton of stuff moved out of the kernel and now runs in user space, with openais at the center. The user isn't exposed to much of the infrastructure so there's not much user-visible change to speak about. Patrick recently sent this out: https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html Dave From teigland at redhat.com Thu Apr 20 15:46:07 2006 From: teigland at redhat.com (David Teigland) Date: Thu, 20 Apr 2006 10:46:07 -0500 Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root In-Reply-To: <82f35fc0732c6b277c5e616915a98166@redhat.com> References: <200604201003.23057.hlawatschek@atix.de> <82f35fc0732c6b277c5e616915a98166@redhat.com> Message-ID: <20060420154607.GC22326@redhat.com> On Thu, Apr 20, 2006 at 10:37:40AM -0500, Jonathan E Brassow wrote: > I've only heard in passing, but... > > I think that there has been community push back on the cdpn's. I don't > think they want them in GFS 2. It may be tough to argue that GFS 1 > needs more cdpn capability if it is completely going away in GFS 2. CDPN's are already removed from GFS2, and probably don't have much chance of getting back in since the linux-kernel folks really have the say. Given that CDPN's are already in GFS1 and won't ever be removed from there, I don't see any reason not to add a nodeid option. GFS2 will need a different approach regardless of whether nodeid is added to GFS1 or not. Dave From rainer at ultra-secure.de Thu Apr 20 15:45:13 2006 From: rainer at ultra-secure.de (Rainer Duffner) Date: Thu, 20 Apr 2006 17:45:13 +0200 Subject: [Linux-cluster] Cluster Planning In-Reply-To: References: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> Message-ID: <4447AC89.1060005@ultra-secure.de> Scott Kellogg wrote: > >> >> Also to use an active-active (same service active on both servers) >> configuration + loadbalancing you will need more than 2 servers; at >> least 4 servers, 2 for loadbalancing (1 active/1 backup) and 2 for >> the critical service active concurently on both servers (no high >> availability, no failover). > > You seem to be referring to LVS. Right, I can't implement that since > I don't have enough hardware. I think that active-passive will be the > way to go. When the active node dies, the passive node will take > over. The data will only be as fresh as the last synchronization. > > That begs the question of what happens when the active node comes back > up ... will the passive node (now active) sync its data to the new > active node? This is where picking a synchronization method becomes > vital. That's where GFS comes in. I don't want to sound rude, but either you (or your customer) have the budget for a cluster or not. If you don't have the budget, it's better to just use the 2nd server as hot-spare and rsync the data over to the 2nd one and do the failover by hand (and even more so the re-activation of the primary server) Or you should have bought a more expensive, more reliable server instead of two low-end ones. You *can* have low-end servers, but you need a reliable storage-infrastructure (which will be a SAN in 7 out of 10 cases and iSCSI in the other), which has a big upfront-cost. This always reminds me of people who want to drive cars they cannot really afford. 
It's better to acknowledge that and accommodate to a cheaper car than sitting there one day without the money to have it repaired.... cheers, Rainer From pcaulfie at redhat.com Thu Apr 20 15:51:40 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 20 Apr 2006 16:51:40 +0100 Subject: [Linux-cluster] New features/architecture ? In-Reply-To: <20060420153326.GB22326@redhat.com> References: <20060420153326.GB22326@redhat.com> Message-ID: <4447AE0C.30000@redhat.com> David Teigland wrote: > On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote: >> I've worked with GFS 6 and 6.1 quite a bit lately and in reading the posts >> over the last few months, I see a lot >> of references to gfs2. I'm not quite sure where it sits in the grand scheme >> of things other than it's the next big >> itteration of gfs as a whole and attepmpts are being made to mearge it into >> the kernel. >> >> This post has some good info, but not much in the way of specifics >> http://lwn.net/Articles/150652/ >> >> * GFS2 - an improved version of GFS, not on-disk compatible >> * DLM - an improved version of DLM >> * CMAN - a new version of CMAN, based on OpenAIS >> >> * CLVM - will allow more LVM2 features to be used in the cluster >> >> These seem to be all there is as far as a "roadmap" and the OpenAIS link >> doesn't seem all that descriptive >> unless one is a developer. >> >> Is there some point of reference which describes the changes between whats >> already released and what is >> planned? For instance, a post recently mentioned adding openais >> interfaces/functionality. > > For GFS2 and DLM it's largely performance improvements. For clustering > infrastructure a ton of stuff moved out of the kernel and now runs in user > space, with openais at the center. The user isn't exposed to much of the > infrastructure so there's not much user-visible change to speak about. > > Patrick recently sent this out: > https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html That's really only about CCS changes. For a (slightly out-of-date ) higher level overview see: http://sources.redhat.com/cluster/events/summit2005/pjc2005.sxi It doesn't mention openais (at least not in a relevant context!) but it might give some more idea as to what is going on. -- patrick From filipe.miranda at gmail.com Thu Apr 20 15:53:14 2006 From: filipe.miranda at gmail.com (Filipe Miranda) Date: Thu, 20 Apr 2006 12:53:14 -0300 Subject: [Linux-cluster] Cluster Planning In-Reply-To: References: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> Message-ID: Scott, On 4/20/06, Scott Kellogg wrote: > > > > The only problem you will encounter is that without a SAN you will > probably have to sincronize data between the servers if your application > stores data on internar discs on the servers.... > > > Yes, this is the issue that seems to have multiple solutions. I've looked > at NFS, DRBD, rysnc, and Unison, but none of these technologies has jumped > out at me as the best one. > Correct Also to use an active-active (same service active on both servers) > configuration + loadbalancing you will need more than 2 servers; at least 4 > servers, 2 for loadbalancing (1 active/1 backup) and 2 for the critical > service active concurently on both servers (no high availability, no > failover). > > > You seem to be referring to LVS. Right, I can't implement that since I > don't have enough hardware. I think that active-passive will be the way to > go. When the active node dies, the passive node will take over. 
The data > will only be as fresh as the last synchronization. > > That begs the question of what happens when the active node comes back up > ... will the passive node (now active) sync its data to the new active > node? This is where picking a synchronization method becomes vital. > Excellent point here. That's the problem when you dont have a SAN --> sync! Let's suppose we have a 2 node on failover. NodeA active NodeB passive. NodeA should be in sync with NodeB If NodeA dies, NodeB takes over NodeB must then continue to sync its data to NodeA (when it becomes available again) This a tough job! About the sync technologies you mentioned: DRBD When you will need a special kernel with support for that. Or recompile a new kenel (be careful since Red Hat wont support any modified piece of software you use, specially the kernel) NFS: You will need a dedicated server to provide shares, right? Rsync: Must have pretty intelligent scripts to garantee what we discussed above, and still not satisfactory /Filipe /Scott > > > > But if the active-active is for the servers (hardware), which means not > the same service on high availability then; 2 servers doing loadbalancing > (1 active/1 backup), 2 servers providing 2 critical services, one actice on > node A and the other one active on node B (failover activated) > > Well thats my understanding about Red Hat's Cluster Suite... Please > correct me if I Am wrong... > > Att. > Filipe Miranda > > > On 4/20/06, Scott Kellogg wrote: > > > > Hello, > > > > I was wondering if I could get some assistance in setting up a two > > node cluster. > > > > We have 2 Dell PowerEdge 850 machines running RHEL4. Our license for > > Cluster Suite is still in purchasing. The main thing in the Cluster > > Suite documentation which confuses me is the use of a SAN. RHEL4 > > docs claim that the need for a SAN has been eliminated, but I'm > > having trouble find more information. Most of the docs assume you > > are using a SAN. My customer could not afford the SAN, just the servers. > > > > > > I would like to set up a high-availablity environment. I understand > > that due to the hardware configuration (no SAN, no RAID) that there > > are still points of failure. I'm hoping to set up a simple active- > > passive configuration. We will be running LAMP applications. If the > > primary server cannot deliver services, I'd like to automatically cut > > over to the backup. > > > > Ideally, I'd like to set up active-active and load balancing, since > > the servers have DRAC4 fence devices for use with STONITH. However, > > since there is no SAN, I'm not sure how data will be mirrored across > > the two machines. > > > > Any help is appreciated! > > > > Thank you, > > Scott Kellogg > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Att. > --- > Filipe T Miranda > RHCE - Red Hat Certified Engineer > OCP8i - Oracle Certified Professional-- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Scott Kellogg > System Administrator > EG&G Technical Services, Inc. > (812) 854-7077 ext. 236 > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwill at penguincomputing.com Thu Apr 20 16:06:21 2006 From: mwill at penguincomputing.com (Michael Will) Date: Thu, 20 Apr 2006 09:06:21 -0700 Subject: [Linux-cluster] New features/architecture ? Message-ID: <433093DF7AD7444DA65EFAFE3987879C0B84CD@jellyfish.highlyscyld.com> Whats y'alls take on OCFS2 which is in the 2.6 kernel tree? Michael -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. Sent: Thursday, April 20, 2006 8:46 AM To: David Teigland Cc: linux clustering Subject: RE: [Linux-cluster] New features/architecture ? David, thanks for the reply. I've seen the post below and in fact it is what prompted the question. Just seems like there is a lot going underneath that I was missing. I was hoping for a more nuts and bolts bag of information with respect to the changes being made across the board. This is a good start though and I'll take a look. Thanks Corey -----Original Message----- From: David Teigland [mailto:teigland at redhat.com] Sent: Thursday, April 20, 2006 11:33 AM To: Kovacs, Corey J. Cc: linux clustering Subject: Re: [Linux-cluster] New features/architecture ? On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote: > I've worked with GFS 6 and 6.1 quite a bit lately and in reading the > posts over the last few months, I see a lot of references to gfs2. I'm > not quite sure where it sits in the grand scheme of things other than > it's the next big itteration of gfs as a whole and attepmpts are being > made to mearge it into the kernel. > > This post has some good info, but not much in the way of specifics > http://lwn.net/Articles/150652/ > > * GFS2 - an improved version of GFS, not on-disk compatible > * DLM - an improved version of DLM > * CMAN - a new version of CMAN, based on OpenAIS > > * CLVM - will allow more LVM2 features to be used in the cluster > > These seem to be all there is as far as a "roadmap" and the OpenAIS > link doesn't seem all that descriptive unless one is a developer. > > Is there some point of reference which describes the changes between > whats already released and what is planned? For instance, a post > recently mentioned adding openais interfaces/functionality. For GFS2 and DLM it's largely performance improvements. For clustering infrastructure a ton of stuff moved out of the kernel and now runs in user space, with openais at the center. The user isn't exposed to much of the infrastructure so there's not much user-visible change to speak about. Patrick recently sent this out: https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html Dave -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From pheeh at nodeps.org Thu Apr 20 17:33:53 2006 From: pheeh at nodeps.org (pheeh at nodeps.org) Date: Thu, 20 Apr 2006 10:33:53 -0700 (MST) Subject: [Linux-cluster] Cluster Planning In-Reply-To: References: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> Message-ID: <54005.149.169.192.7.1145554433.squirrel@149.169.192.7> >From a newbie... I am having the same concerns. I have two boxes that are dedicated storage (iSCSI), and two dedicated GFS servers so it looks like: Storage Machines: storage1 storage2 GFS: gfs1 gfs2 Now, if I create the LVM on gfs1 and it encompases storage1 and storage2, I would be able to mount the LVM on gfs2 via something like nbd or use gfs1 as a iscsi target in itself. However, lets assume that gfs1 just dies. 
Then the LVM on gfs1 would no longer exist and gfs2 would not be able to write to the disks. So I guess my question is could I create two LVM instances one on gfs1 and one on gfs2 where each would have access to both devices such that: gfs1 /dev/cluster/web (storage1, storage2) gfs2 /dev/cluster/web (storage1, storage2) So that either GFS server could croak and a web server would still be able to access one of the boxes? Although I don't see how that would work. Now, I have figured out that with a single storage device its pretty simple since both machines just mount the iscsi with the initiator, although I just can't seem to figure out hwo to get it done with 2 storage devices such that they act like a RAID1 and failover is seemless. > Scott, > > > > On 4/20/06, Scott Kellogg wrote: >> >> >> >> The only problem you will encounter is that without a SAN you will >> probably have to sincronize data between the servers if your application >> stores data on internar discs on the servers.... >> >> >> Yes, this is the issue that seems to have multiple solutions. I've >> looked >> at NFS, DRBD, rysnc, and Unison, but none of these technologies has >> jumped >> out at me as the best one. >> > > Correct > > Also to use an active-active (same service active on both servers) >> configuration + loadbalancing you will need more than 2 servers; at >> least 4 >> servers, 2 for loadbalancing (1 active/1 backup) and 2 for the critical >> service active concurently on both servers (no high availability, no >> failover). >> >> >> You seem to be referring to LVS. Right, I can't implement that since I >> don't have enough hardware. I think that active-passive will be the way >> to >> go. When the active node dies, the passive node will take over. The >> data >> will only be as fresh as the last synchronization. >> >> That begs the question of what happens when the active node comes back >> up >> ... will the passive node (now active) sync its data to the new active >> node? This is where picking a synchronization method becomes vital. >> > > Excellent point here. > That's the problem when you dont have a SAN --> sync! > > Let's suppose we have a 2 node on failover. > NodeA active NodeB passive. > NodeA should be in sync with NodeB > If NodeA dies, NodeB takes over > NodeB must then continue to sync its data to NodeA (when it becomes > available again) > > This a tough job! > > About the sync technologies you mentioned: > > DRBD > When you will need a special kernel with support for that. Or recompile a > new kenel (be careful since Red Hat wont support any modified piece of > software you use, specially the kernel) > > NFS: > You will need a dedicated server to provide shares, right? > > Rsync: > Must have pretty intelligent scripts to garantee what we discussed above, > and still not satisfactory > > /Filipe > > > > /Scott >> >> >> >> But if the active-active is for the servers (hardware), which means not >> the same service on high availability then; 2 servers doing >> loadbalancing >> (1 active/1 backup), 2 servers providing 2 critical services, one actice >> on >> node A and the other one active on node B (failover activated) >> >> Well thats my understanding about Red Hat's Cluster Suite... Please >> correct me if I Am wrong... >> >> Att. >> Filipe Miranda >> >> >> On 4/20/06, Scott Kellogg wrote: >> > >> > Hello, >> > >> > I was wondering if I could get some assistance in setting up a two >> > node cluster. >> > >> > We have 2 Dell PowerEdge 850 machines running RHEL4. 
Our license for >> > Cluster Suite is still in purchasing. The main thing in the Cluster >> > Suite documentation which confuses me is the use of a SAN. RHEL4 >> > docs claim that the need for a SAN has been eliminated, but I'm >> > having trouble find more information. Most of the docs assume you >> > are using a SAN. My customer could not afford the SAN, just the >> servers. >> > >> > >> > I would like to set up a high-availablity environment. I understand >> > that due to the hardware configuration (no SAN, no RAID) that there >> > are still points of failure. I'm hoping to set up a simple active- >> > passive configuration. We will be running LAMP applications. If the >> > primary server cannot deliver services, I'd like to automatically cut >> > over to the backup. >> > >> > Ideally, I'd like to set up active-active and load balancing, since >> > the servers have DRAC4 fence devices for use with STONITH. However, >> > since there is no SAN, I'm not sure how data will be mirrored across >> > the two machines. >> > >> > Any help is appreciated! >> > >> > Thank you, >> > Scott Kellogg >> > >> > -- >> > Linux-cluster mailing list >> > Linux-cluster at redhat.com >> > https://www.redhat.com/mailman/listinfo/linux-cluster >> > >> >> >> >> -- >> Att. >> --- >> Filipe T Miranda >> RHCE - Red Hat Certified Engineer >> OCP8i - Oracle Certified Professional-- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> -- >> Scott Kellogg >> System Administrator >> EG&G Technical Services, Inc. >> (812) 854-7077 ext. 236 >> >> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From skellogg at egginc.com Thu Apr 20 17:48:05 2006 From: skellogg at egginc.com (Scott Kellogg) Date: Thu, 20 Apr 2006 13:48:05 -0400 Subject: [Linux-cluster] Cluster Planning In-Reply-To: <4447AC89.1060005@ultra-secure.de> References: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> <4447AC89.1060005@ultra-secure.de> Message-ID: > > I don't want to sound rude, but either you (or your customer) have > the budget for a cluster or not. My feelings exactly. Unfortunately, I do not have control over the level of reactive, crisis-based decision-making. > If you don't have the budget, it's better to just use the 2nd > server as hot-spare and rsync the data over to the 2nd one and do > the failover by hand (and even more so the re-activation of the > primary server) This is a satisfactory solution. Would you care to elaborate? I have read the book "Linux Enterprise Clusters" and it offered several approaches to this. The budget exists to buy Cluster Suite, which I was hoping would simplify configuration of Heartbeat and STONITH. I have been looking at Unison as an option for synchronizing the primary node and the hot spare. > > This always reminds me of people who want to drive cars they cannot > really afford. The real irony is that the customer IMO doesn't even *need* a failover solution. /Scott From skellogg at egginc.com Thu Apr 20 17:50:12 2006 From: skellogg at egginc.com (Scott Kellogg) Date: Thu, 20 Apr 2006 13:50:12 -0400 Subject: [Linux-cluster] Cluster Planning In-Reply-To: References: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com> Message-ID: <3E3DE461-2A6C-4F43-8045-0B0880D33173@egginc.com> > > > This a tough job! Sure is! 
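To be concrete about what I have in mind: nothing fancier than cron-driven rsync from the active node to the hot spare, plus a periodic database dump, very roughly like this (hostnames and paths are invented; treat it as an untested sketch, not a finished design):

    # crontab on the active node
    0 * * * *  rsync -az --delete -e ssh /var/www/ spare:/var/www/
    30 * * * * mysqldump --all-databases | ssh spare 'cat > /var/backups/lamp.sql'

Failover (and re-activation of the primary) would then be done by hand, as Rainer suggested.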
But this hot spare failover concept satisfies the goals of the project. I know that it's not fault-tolerant, but that's fine. /Scott From sanelson at gmail.com Thu Apr 20 17:55:31 2006 From: sanelson at gmail.com (Steve Nelson) Date: Thu, 20 Apr 2006 18:55:31 +0100 Subject: [Linux-cluster] Clustat in user's profile In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722086AE16E@orsmsx402.amr.corp.intel.com> References: <187D3A7CAB42A54DB61F1D05F0125722086AE16E@orsmsx402.amr.corp.intel.com> Message-ID: On 4/20/06, Lombard, David N wrote: > From: Steve Nelson on Wednesday, April 19, 2006 2:36 PM > > Hi All, > > > > On all of my clusters, I have clustat run in the user's profile, so > > the status of the cluster is visible whenever someone logs in. > > > > Someone has suggested to me that clustat could hang, and prevent user > > access. Is this a valid point? Under what (if any) circumstances > > would clustat hang? > > As another has pointed out, anything that can hang the login, will, at > the most inopportune times. > > Why not have a cron job periodically report the status into some file > and then just cat the file results during login? Yes, and indeed I discovered that I can export the info as xml too, which could be handy :) > If the user then > really wants an up-to-the-moment report, they can buy into running > clustat. Definitely. Thanks for all the advice :) S. From dist-list at LEXUM.UMontreal.CA Thu Apr 20 20:37:24 2006 From: dist-list at LEXUM.UMontreal.CA (FM) Date: Thu, 20 Apr 2006 16:37:24 -0400 Subject: [Linux-cluster] webfarm and redhat cluster ? Message-ID: <4447F104.8010003@lexum.umontreal.ca> Hello, I think I have a misunderstanding with Redhat cluster suite and webFarm. I'm testing this scenario : 2 server as load balancer (red hat + piranha) 2 WEB servers behind the balancer connected to a GFS file system. My goal is to increase the uptime of my websites and to be able to decrease the servers load by adding a new one if necessary. At first, I though that I need to create a cluster for the 2 web servers. But in this scenario, I cannot have load balancing between the 2 web servers. So, am I missing something, or my option here is to have 1 load balancer cluster active/passive (for fail over) and my 2 web servers with a connection to the GFS file system. So no cluster with those server ? The problem with that setup is that piranha will see if a server fails or if httpd fails but not if the GFS fails. Sorry if it's a newbie question From pcaulfie at redhat.com Fri Apr 21 12:33:54 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 21 Apr 2006 13:33:54 +0100 Subject: [Linux-cluster] New features/architecture ? In-Reply-To: <4447AE0C.30000@redhat.com> References: <20060420153326.GB22326@redhat.com> <4447AE0C.30000@redhat.com> Message-ID: <4448D132.2020906@redhat.com> Patrick Caulfield wrote: > David Teigland wrote: >> Patrick recently sent this out: >> https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html > > > That's really only about CCS changes. For a (slightly out-of-date ) higher > level overview see: > http://sources.redhat.com/cluster/events/summit2005/pjc2005.sxi > > It doesn't mention openais (at least not in a relevant context!) but it might > give some more idea as to what is going on. 
I've updated this (slightly): http://people.redhat.com/pcaulfie/cman2006.sxi -- patrick From Alain.Moulle at bull.net Fri Apr 21 14:04:27 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Fri, 21 Apr 2006 16:04:27 +0200 Subject: [Linux-cluster] CS4 Update 2 / GUI problem ? Message-ID: <4448E66B.4050200@bull.net> Hi I have some problems to configure a 3 nodes cluster with the GUI. My version of GUI is : system-config-cluster-1.0.16-1.0 After completion of the configuration (so : members, fence devices, failover domains, resources and services) I can see in cluster.conf : ... ... So as you can see , I have to add the good fence lines as following: so that it works. Knowing that fencedevices are: So is it a known bug of GUI ? Or did I miss something somewhere in the GUI, so that I miss these fence lines in cluster nodes records. Thanks for your help Alain Moull? From gstaltari at arnet.net.ar Fri Apr 21 14:15:00 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Fri, 21 Apr 2006 11:15:00 -0300 Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster In-Reply-To: <20060418133704.GA16121@redhat.com> References: <1145266165.27997.57.camel@nemanja.eunet.yu> <1145288499.6000.15.camel@nemanja.eunet.yu> <20060418133704.GA16121@redhat.com> Message-ID: <4448E8E4.3000400@arnet.net.ar> David Teigland wrote: > On Mon, Apr 17, 2006 at 05:41:39PM +0200, Nemanja Miletic wrote: > >> Hi, >> >> Does anyone think that turning on journaling on files could help us >> speed up the access to gfs partition? >> >> This would be difficult because journaling can be turned on only on >> files that are empty. We have a large number of empty files of active >> users that download all their mail from pop3 server, so turning on >> jurnaling for them should be possible. >> > > Data journaling might help, it will speed up fsync(), but will increase > the i/o going to your storage. > > >> What size should be the journals when file journaling is on? >> > > Continue to use the default. > > Another thing you might try is disabling the drop-locks callback, allowing > GFS to cache more locks. Do this before you mount: > echo "0" >> /proc/cluster/lock_dlm/drop_count > > Did you apply this changes? Could you share the results of this changes in your configuration? Do you recommend it? Thanks German Staltari From jparsons at redhat.com Fri Apr 21 15:30:58 2006 From: jparsons at redhat.com (James Parsons) Date: Fri, 21 Apr 2006 11:30:58 -0400 Subject: [Linux-cluster] CS4 Update 2 / GUI problem ? In-Reply-To: <4448E66B.4050200@bull.net> References: <4448E66B.4050200@bull.net> Message-ID: <4448FAB2.1030009@redhat.com> Alain Moulle wrote: >Hi > >I have some problems to configure a 3 nodes cluster >with the GUI. >My version of GUI is : >system-config-cluster-1.0.16-1.0 > >After completion of the configuration (so : members, >fence devices, failover domains, resources and services) >I can see in cluster.conf : > >... > > > > > > > > > > > >... > >So as you can see , I have to add the good fence lines as following: > > > > > > > > > > > > > > > > > > > > > >so that it works. > >Knowing that fencedevices are: > > login="xxxxxxx" name="yack10_fence" passwd="xxxxxxx"/> > login="xxxxxxx" name="yack23_fence" passwd="xxxxxxx"/> > login="xxxxxxx" name="yack21_fence" passwd="xxxxxxx"/> > >So is it a known bug of GUI ? >Or did I miss something somewhere in the GUI, so that >I miss these fence lines in cluster nodes records. > >Thanks for your help > >Alain Moull? 
> > > > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > Setting up fencing is done in two steps: 1) Configuring fence devices 2) Configuring fencing for individual nodes. You have your fence devices all ready to go for the first step. For the second step, in the GUI, you need to select a node, and then click "Manage Fencing for this Node". A pop-up will allow you to create fence levels and instances of your fence devices in the levels. Now for baseboard management fence types such as ipmi, rsa, iLO, Drac, etc., This dichotomy between fence device and fence instance is purely artificial. They map 1:1, device:instance. Shared fence devices are a different story, and the GUI is constructed to handle configuration of these types of fences as well. Things get even stickier when you support baseboard management methods like Drac/MC (a variant of Drac), which is a shared fence method. One way to present configuration for fences would be to separate fencing into two types: Shared and unshared, and then construct appropriate GUIs for each. Another way to go is to present a consistent model and config approach for all fence types with similar configuration steps no matter if the devices are shared or not. The latter is the approach we took for the latest cluster GUI. Anyway, Alain, for now, please keep in mind the need to config fence device AND fence instance for every type of fence, even if they are one in the same such as IPMI. In the meanwhile, the fence config GUI is begging to be refactored, and we hope to have a simpler method in place by next update. BTW, your opinions are always welcome. Thanks and Regards, -Jim The editing that you did by hand looks OK except for one important omission: You forgot to close off the tags ;-) From raycharles_man at yahoo.com Fri Apr 21 14:46:00 2006 From: raycharles_man at yahoo.com (Ray Charles) Date: Fri, 21 Apr 2006 07:46:00 -0700 (PDT) Subject: [Linux-cluster] GFS is for what and how it works ? Message-ID: <20060421144600.95327.qmail@web32105.mail.mud.yahoo.com> Hi, I read your post (below) from a couple of weeks ago. My question / comment is... I've read that GFS Volumes are limited to 2TB, redhat whitepaper says 1TB (dated i am sure). You're at 1.2TB today what if you need to be at 5TB in a year?? How will you seemlessly grow the space that exist on mount points beyond 2TB ? > 4. Could you give me example what is actually the GFS real usage in > real live ? I'm using it to share a 1.2 TB storage area between two systems that use it for processing and a third system that has direct access for making backups. > I'm absolutely confuse with this GFS on how they works. Yea. The documentation is not very extensive at this point. -- Bowie __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From Bowie_Bailey at BUC.com Fri Apr 21 16:25:17 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Fri, 21 Apr 2006 12:25:17 -0400 Subject: [Linux-cluster] GFS is for what and how it works ? Message-ID: <4766EEE585A6D311ADF500E018C154E30213393F@bnifex.cis.buc.com> Ray Charles wrote: > I read your post (below) from a couple of weeks ago. > My question / comment is... > I've read that GFS Volumes are limited to 2TB, redhat > whitepaper says 1TB (dated i am sure). You're at 1.2TB > today what if you need to be at 5TB in a year?? 
How > will you seemlessly grow the space that exist on mount > points beyond 2TB ? I have not found any conclusive answers on maximum filesystem sizes. I think the hard limit for a filesystem with the current kernel is 8TB, but some software may have problems with it when it goes over 2TB. I don't have a way to test a filesystem that large since I don't have the storage. I'll just have to see what happens when I get there. -- Bowie From Fernando.Nino at medias.cnes.fr Wed Apr 19 16:26:05 2006 From: Fernando.Nino at medias.cnes.fr (Fernando Nino) Date: Wed, 19 Apr 2006 18:26:05 +0200 Subject: [Linux-cluster] GFS join hang Message-ID: <200604191625.k3JGPxo14450@cnes.fr> Dear all, I am running GFS 6.1 with dlm on a cluster (4 nodes + front-end) of dual-headed Opterons and RHEL4U3. Because of some problems (kernel panic...) I had to hard boot some nodes of the cluster. Now, some gfs partitions simply won't mount. In some nodes, they will simply keep waiting forever for the join of the GFS group: So three questions: - What is the join exactly waiting for ? Cluster status is fine, everybody is member ... - What does the status code mean in the cman_tool output ? - What can I do to restart this cluster ? NB: Before testing this (below) I rebooted the complete cluster and gfs_fsck'ed /all nodes /with everything unmounted. ---------------------------------------------------------------------------------------------------- root # service clvmd start root #: service gfs start Mounting GFS filesystems: # forever ! in another console I get: root # dmesg | tail ... GFS: fsid=globcover:baieGC2b.0: jid=14: Done GFS: fsid=globcover:baieGC2b.0: jid=15: Trying to acquire journal lock... GFS: fsid=globcover:baieGC2b.0: jid=15: Looking at journal... GFS: fsid=globcover:baieGC2b.0: jid=15: Done GFS: Trying to join cluster "lock_dlm", "globcover:baieGC3a" root # cman_tool services Service Name GID LID State Code Fence Domain: "default" 11 2 run - [1 5 4 3 2] DLM Lock Space: "clvmd" 12 3 run - [1 5 4 3 2] DLM Lock Space: "baieGC2b" 13 4 run - [1 5] DLM Lock Space: "baieGC3a" 15 6 run - [1 5 2 4 3] GFS Mount Group: "baieGC2b" 14 5 run - [1 5] GFS Mount Group: "baieGC3a" 0 7 join S-2,2,4 [] ---------------------------------------------------------------------------------------------------- Thanks, -- ------------------------------------------------------------------------ Fernando NI?O CNES - BPi 2102 Medias-France/IRD 18, Av. Edouard Belin T?l: 05.61.27.40.74 31401 Toulouse Cedex 9 From Fernando.Nino at medias.cnes.fr Thu Apr 20 15:00:48 2006 From: Fernando.Nino at medias.cnes.fr (Fernando Nino) Date: Thu, 20 Apr 2006 17:00:48 +0200 Subject: [Linux-cluster] GFS join hang In-Reply-To: <20060420143247.GA22326@redhat.com> References: <200604200756.k3K7uDo25619@cnes.fr> <20060420143247.GA22326@redhat.com> Message-ID: <200604201501.k3KF19K14069@cnes.fr> An HTML attachment was scrubbed... URL: From sdake at redhat.com Thu Apr 20 20:02:29 2006 From: sdake at redhat.com (Steven Dake) Date: Thu, 20 Apr 2006 13:02:29 -0700 Subject: [Linux-cluster] type punned pointers breakage Message-ID: <1145563349.25648.37.camel@shih.broked.org> likely to cause problems with the optimizer - patch attached to fix. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: type-punned.patch Type: text/x-patch Size: 1496 bytes Desc: not available URL: From sdake at redhat.com Thu Apr 20 20:11:00 2006 From: sdake at redhat.com (Steven Dake) Date: Thu, 20 Apr 2006 13:11:00 -0700 Subject: [Linux-cluster] another type punned patch Message-ID: <1145563860.25648.39.camel@shih.broked.org> attached -------------- next part -------------- A non-text attachment was scrubbed... Name: type-punned-p2.patch Type: text/x-patch Size: 320 bytes Desc: not available URL: From sdake at redhat.com Thu Apr 20 20:24:01 2006 From: sdake at redhat.com (Steven Dake) Date: Thu, 20 Apr 2006 13:24:01 -0700 Subject: [Linux-cluster] member_list_to_id looks fishy Message-ID: <1145564641.25648.41.camel@shih.broked.org> patch attached (untested) to possibly fix -------------- next part -------------- A non-text attachment was scrubbed... Name: member_list.patch Type: text/x-patch Size: 2849 bytes Desc: not available URL: From jparsons at redhat.com Thu Apr 20 21:16:55 2006 From: jparsons at redhat.com (James Parsons) Date: Thu, 20 Apr 2006 17:16:55 -0400 Subject: [Linux-cluster] New APC agent Message-ID: <4447FA47.8090203@redhat.com> Hello all, This is an snmp based fence agent for APC power switches to be used with RHEL4 Red Hat Cluster Suite. The reasons to use this agent rather than the current fence_apc agent are: 1) This script has been tested successfully with EVERY powerswitch that APC currently makes. 2) It will work on many older models that are no longer supported by APC. I have been told that it even works with the AP9200 switch. Older switches usually don't do well with the fence_apc script. 3) This agent works with large power switches that have more than 8 outlets. The fence_apc script will also, in the next update -- this script will work for you now. If feedback on this beta version of the agent is good, and if ganged switches can be supported, then this agent may replace fence_apc. After unpacking the attached tar file, you will find 3 files: README fence_apc_snmp powernet369.mib In order to use this agent, you will need to have net-snmp-utils installed on every node in your cluster. net-snmp-utils is scheduled for inclusion in the base RHEL distribution for Update 4, and is yummable in FC5. After net-snmp-utils is installed, there will be a directory named: /usr/share/snmp/mibs/ Place the accompanying powernet369.mib file in this directory. To use the agent, cp the agent to the /sbin directory on every cluster node. The interface for the fence_apc_snmp agent is identical to the existing fence_apc agent, so if you are using APC for fencing in your cluster, you *could* backup your current fence_apc agent, and rename this agent from fence_apc_snmp to fence_apc, and it should just work. NOTE: The fence_apc_snmp agent does not yet support ganged or 'daisy-chained' APC switches. If you would rather not copy over your fence_apc agent, you can still use the fence_apc_snmp agent by dropping it into /sbin on every node, and then defining a in the cluster.conf file with agent="fence_apc_snmp" as an attribute, and use it that way. Note, please, that the GUI does not support this agent yet, and you will have to edit your cluster.conf by hand and then propagate it yourself. If you need help with this, email me on linux-cluster or at the address below. Big thanks to Nate Straz who laid the foundation for this agent. The text of this email can also be found inside the tar file as a README. Please let me know how this agent works. 
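For anyone doing the hand-edit, a rough sketch of the two pieces involved (every name, address and outlet number below is invented; substitute your own values and treat this as an untested example rather than a reference config). First the device definition:

    <fencedevice agent="fence_apc_snmp" ipaddr="10.1.1.50" login="apc" passwd="apc" name="apc1"/>

and then, inside each clusternode's fence section, an instance that points at it:

    <method name="1">
        <device name="apc1" port="3"/>
    </method>

Remember to bump config_version and propagate the edited file to every node as usual.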
Thanks and Regards, -Jim -------------- next part -------------- A non-text attachment was scrubbed... Name: fence_apc_snmp.tar.gz Type: application/x-gzip Size: 110511 bytes Desc: not available URL:
From aberoham at gmail.com Sat Apr 22 02:33:45 2006 From: aberoham at gmail.com (aberoham at gmail.com) Date: Fri, 21 Apr 2006 19:33:45 -0700 Subject: [Linux-cluster] Re: kernel noise, "Neighbour table overflow." ? In-Reply-To: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com> References: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com> Message-ID: <3bdb07840604211933l77e94c4dh4e10b27a09579d24@mail.gmail.com>
Now, the same nodes that give the Neighbour table overflow messages are unable to ping?! Chcek this out --
[root at gfs02 ~]# ping 192.168.60.188 connect: No buffer space available [root at gfs02 ~]# printk: 4 messages suppressed. Neighbour table overflow. printk: 6 messages suppressed. Neighbour table overflow. printk: 5 messages suppressed. Neighbour table overflow. printk: 1 messages suppressed.
[root at gfs02 ~]# uptime 19:32:32 up 4 days, 2:00, 4 users, load average: 0.03, 0.07, 0.08 [root at gfs02 ~]#
[root at gfs02 ~]# clustat Member Status: Quorate Member Name Status ------ ---- ------ gfs03 Online, rgmanager gfs02 Online, Local, rgmanager gfs01 Online, rgmanager Service Name Owner (Last) State ------- ---- ----- ------ ----- nfshome gfs03 started ip-test gfs03 started jukebox gfs02 started
[root at gfs02 ~]# cman_tool status Protocol version: 5.0.1 Config version: 93 Cluster name: gfscluster Cluster ID: 41396 Cluster Member: Yes Membership state: Cluster-Member Nodes: 3 Expected_votes: 3 Total_votes: 3 Quorum: 2 Active subsystems: 8 Node name: gfs02 Node addresses: 10.0.19.11
On 4/17/06, aberoham at gmail.com wrote: > > > I'm running a test three-node CS/GFS cluster. At random intervals I get > the following kernel messages streaming out to /dev/console on all three > nodes. > > --- > Neighbour table overflow. > printk: 166 messages suppressed. > Neighbour table overflow. > printk: 1 messages suppressed. > Neighbour table overflow. > printk: 1 messages suppressed. > Neighbour table overflow. > printk: 6 messages suppressed. > Neighbour table overflow. > printk: 5 messages suppressed. > Neighbour table overflow.
References: <444A32B9.9020809@leopard.us.udel.edu> Message-ID: On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote: > the problem is, in a 2-node cluster, node 1 by itself has > quorum, so if node 2 is powered off then it considers it dead and tries > to revive it by fencing Actually, I think what I want is: Fencing should always result in poweroff, not reboot. I wonder if there is a clean way to do that? Rationale: A node should never die. If it does, by definition, an undefined state has occurred, and I would rather not have such a server start without having a chance to look into log files, etc. > can you cut off communications to node 2's ILO from node 1 temporarily? Hmm, good point. The ILO communication happens through cross-over cables with endpoints in the ILO, and in a dedicated NIC. I guess that I may ifdown the interface corresponding to the NIC on Node 1 (the one performing the fencing). -- Greetings from Troels Arvin From gforte at leopard.us.udel.edu Sat Apr 22 14:08:03 2006 From: gforte at leopard.us.udel.edu (Greg Forte) Date: Sat, 22 Apr 2006 10:08:03 -0400 Subject: [Linux-cluster] Re: Preventing automatic poweron? In-Reply-To: References: <444A32B9.9020809@leopard.us.udel.edu> Message-ID: <444A38C3.8060208@leopard.us.udel.edu> Troels Arvin wrote: > On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote: >> the problem is, in a 2-node cluster, node 1 by itself has >> quorum, so if node 2 is powered off then it considers it dead and tries >> to revive it by fencing > > Actually, I think what I want is: Fencing should always result in > poweroff, not reboot. I wonder if there is a clean way to do that? > > Rationale: A node should never die. If it does, by definition, an > undefined state has occurred, and I would rather not have such a server > start without having a chance to look into log files, etc. Check your cluster.conf - it's probably already sending a "poweroff", then a "poweron", like this (for an APC power unit): in which case you can just drop the "on" part to achieve the desired result. Otherwise you may need to hack at the fence_ilo script a bit - it's just a perl script. -g From troels at arvin.dk Sat Apr 22 14:30:30 2006 From: troels at arvin.dk (Troels Arvin) Date: Sat, 22 Apr 2006 16:30:30 +0200 Subject: [Linux-cluster] -P argument to ccsd Message-ID: Hello, Setup: Two-node RHEL 4-based fail-over cluster. The nodes are multi-homed, i.e. they listen to several network interfaces for various purposes. Dedicated heartbeat ethernet (cross-over) cabling is used for cluster heartbeat. I don't like the fact that the cluster-related daemons listen on multiple network interfaces. One should be sufficient. However, it seems that the daemons (like ccsd) don't have a parameter to specify which interface to listen/communicate on. So I thought that I would use iptables to limit network access to the daemons. In the manual page for ccsd, the "-P" argument is described. The argument governs which ports are being used for inter-ccsd communication ("b"), cluster membership communication ("c"), and administrative programs ("f"). But how are multiple values specified? Like this?: -P b:xxx c:yyy f:zzz or like this?: -P "b:xxx c:yyy f:zzz" or like this?: -P b:xxx -P c:yyy -P f:zzz -- Greetings from Troels Arvin From troels at arvin.dk Sat Apr 22 14:44:48 2006 From: troels at arvin.dk (Troels Arvin) Date: Sat, 22 Apr 2006 16:44:48 +0200 Subject: [Linux-cluster] Meaning of "service" Message-ID: Hi, Setup: Two-node RHEL 4-based fail-over cluster. 
The cluster runs several daemons which are dependent on each other: httpd: serves static content, and acts as a front-end to Tomcat tomcat: handles servlets, etc; depends on postgresql postgresql: database for servlets run by tomcat The daemons all use a shared storage area (SCSI-box separate from the servers, connected by SCSI cables) for data and logging. The daemons have init-scripts which are specified in system-config-cluster. Should I set this up as a) one Cluster Service, b) as three different Cluster Services? If a: How do I specify that the postgresql script should be run before the tomcat script? If b: How do I specify that the postgresql service should be started before the tomcat service? -- Greetings from Troels Arvin From troels at arvin.dk Sat Apr 22 15:07:47 2006 From: troels at arvin.dk (Troels Arvin) Date: Sat, 22 Apr 2006 17:07:47 +0200 Subject: [Linux-cluster] Re: Re: Preventing automatic poweron? References: <444A32B9.9020809@leopard.us.udel.edu> <444A38C3.8060208@leopard.us.udel.edu> Message-ID: On Sat, 22 Apr 2006 10:08:03 -0400, Greg Forte wrote: > Check your cluster.conf - it's probably already sending a "poweroff", > then a "poweron", like this (for an APC power unit): > > > > > > > My cluster.conf actually didn't have any "option" attributes in its tags. But I added the following attributes to each of my two tags: action="off" And it works. Thanks; that way, I don't have to modify the "fence" RPM-package. -- Greetings from Troels Arvin From eric at bootseg.com Sat Apr 22 15:13:41 2006 From: eric at bootseg.com (Eric Kerin) Date: Sat, 22 Apr 2006 11:13:41 -0400 Subject: [Linux-cluster] Meaning of "service" In-Reply-To: References: Message-ID: <1145718821.3302.20.camel@auh5-0479.corp.jabil.org> On Sat, 2006-04-22 at 16:44 +0200, Troels Arvin wrote: > Hi, > > Setup: Two-node RHEL 4-based fail-over cluster. > > The cluster runs several daemons which are dependent on each other: > > httpd: serves static content, and acts as a front-end to Tomcat > tomcat: handles servlets, etc; depends on postgresql > postgresql: database for servlets run by tomcat > > The daemons all use a shared storage area (SCSI-box separate from the > servers, connected by SCSI cables) for data and logging. The daemons have > init-scripts which are specified in system-config-cluster. > > Should I set this up as > a) one Cluster Service, > b) as three different Cluster Services? > I have a very similar setup for my cluster. I recommend option b. That will allow you to balance the processor load from the different services onto the cluster nodes. In my setup, I keep my Tomcat and httpd processes in the same service, since they work from a single file system. The downsides: * You have to have IP addresses bound for each service to allow the other services to connect no matter what node it's running on. * You need to partition your shared storage into at least one partition for each different cluster service. (I use CLVM to dynamically partition mine, it's a wonderful thing) > If a: > How do I specify that the postgresql script should be run before the > tomcat script? > > If b: > How do I specify that the postgresql service should be started before the > tomcat service? > In my experience, this isn't too much of a problem. But if you list the PostgreSQL service first in your cluster.conf file (or first in the system-config-cluster) it "should" start first, there is no guarantee. In practice, Tomcat JNDI connection pooling will handle making connection to the database once it comes online. 
Normally Tomcat's startup time is much longer than PostgreSQL's so it should be up before Tomcat tries to connect. Connection pooling will also handle re-connection upon server failover. If you are not using connection pooling, and are creating a connection to the database with each page load, you still won't run into an issue. Thanks, Eric Kerin eric at bootseg.com From troels at arvin.dk Sat Apr 22 16:16:17 2006 From: troels at arvin.dk (Troels Arvin) Date: Sat, 22 Apr 2006 18:16:17 +0200 Subject: [Linux-cluster] Re: -P argument to ccsd References: Message-ID: Hello again, On Sat, 22 Apr 2006 16:30:30 +0200, I wrote: > or like this?: > -P b:xxx -P c:yyy -P f:zzz That was it. http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/ccs/daemon/ccsd.c?rev=1.14.2.5.4.1&content-type=text/x-cvsweb-markup&cvsroot=cluster gave a hint, and tests proved it to be true. -- Greetings from Troels Arvin From troels at arvin.dk Sat Apr 22 16:49:12 2006 From: troels at arvin.dk (Troels Arvin) Date: Sat, 22 Apr 2006 18:49:12 +0200 Subject: [Linux-cluster] Re: Preventing automatic poweron? References: <444A32B9.9020809@leopard.us.udel.edu> Message-ID: On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote: [...] > You either need to bring node 1 back up in > "non-cluster mode" (stop cman/rgmanager/etc) or take further steps to > prevent node 2 from being powered on. This reminds me: How about using runlevel 4 as a "network-connected, but outside-cluster" runlevel? (I don't recall seeing any specification of what runlevel 4 should be used for.) -- Greetings from Troels Arvin From gforte at leopard.us.udel.edu Sat Apr 22 17:13:41 2006 From: gforte at leopard.us.udel.edu (Greg Forte) Date: Sat, 22 Apr 2006 13:13:41 -0400 Subject: [Linux-cluster] Re: Preventing automatic poweron? In-Reply-To: References: <444A32B9.9020809@leopard.us.udel.edu> Message-ID: <444A6445.50402@leopard.us.udel.edu> Troels Arvin wrote: > On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote: > [...] >> You either need to bring node 1 back up in >> "non-cluster mode" (stop cman/rgmanager/etc) or take further steps to >> prevent node 2 from being powered on. > > This reminds me: How about using runlevel 4 as a "network-connected, but > outside-cluster" runlevel? (I don't recall seeing any specification of > what runlevel 4 should be used for.) Sure, makes sense to me. none of the runlevels are set in stone, it's just a matter of convention. I think RHEL+RHCS makes both 3 and 5 cluster-enabled (sans and with X services, respectively), so there's no reason why 2 and 4 couldn't be the same minus cluster services (2 is supposed to be "Multiuser, without NFS" according to the comments in /etc/inittab, but again, that's just convention; and 4 is "unused"). Runlevels 7-9 are also valid, according to the init man page, though I've never actually tried them. -g From johngw at comcast.net Fri Apr 21 20:25:11 2006 From: johngw at comcast.net (John Griffin-Wiesner) Date: Fri, 21 Apr 2006 15:25:11 -0500 Subject: [Linux-cluster] where are built GFS rpms?, and upgrade question Message-ID: <20060421202511.GA18697@rubicon.stillrunning.com> Two questions: 1. I can find src.rpm's but no built GFS rpm's for rhel 3U7. I believe that should be GFS-6.0.2.30-0. Can someone tell me where those are hiding? Or do we all have to build those ourselves now? 2. When doing a minor upgrade (6.0.2.20-2 to 6.0.2.30-0) of a group of GFS systems do those all have to be taken off-line and upgraded simultaneously? 
(The GFS admin guide I have talks only about upgrading from 5.2.1 to 6.0.) Or can they be upgraded individually and work with the other GFS servers that are still running the older rev? Thanks -- John Griffin-Wiesner johngw at comcast.net From nick at sqrt.co.uk Sun Apr 23 03:24:40 2006 From: nick at sqrt.co.uk (Nick Burrett) Date: Sat, 22 Apr 2006 20:24:40 -0700 Subject: [Linux-cluster] kernel noise, "Neighbour table overflow." ? In-Reply-To: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com> References: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com> Message-ID: <444AF378.5020106@sqrt.co.uk> aberoham at gmail.com wrote: > > I'm running a test three-node CS/GFS cluster. At random intervals I get > the following kernel messages streaming out to /dev/console on all three > nodes. > > --- > Neighbour table overflow. > printk: 166 messages suppressed. > Neighbour table overflow. > printk: 1 messages suppressed. It looks like your ARP table is overflowing. Try setting the gc_thresh[123] values in /proc/sys/net/ipv4/neigh/ See the manpage arp(7) for further details. Regards, Nick. From mykleb at no.ibm.com Sun Apr 23 10:55:42 2006 From: mykleb at no.ibm.com (Jan-Frode Myklebust) Date: Sun, 23 Apr 2006 12:55:42 +0200 Subject: [Linux-cluster] Re: Linux (qmail) clustering References: Message-ID: On 2006-04-11, Haydar Akpinar wrote: > > I would like to know if it is possible to do and also if any one has done > qmail clustering on a Linux box. Since qmail is Maildir based (no locking problems to worry about), I think this should be fairly easy to do. You'll just need to decide which directories needs to be shared, and which needs to be private to each node. It will probably be enough to have the home directories on a shared storage (GFS or simply just NFS), and just do load balancing by equal MX record priorities. -- Jan-Frode Myklebust, IT Specialist, IBM Global Services, ITS From troels at arvin.dk Sun Apr 23 11:04:25 2006 From: troels at arvin.dk (Troels Arvin) Date: Sun, 23 Apr 2006 13:04:25 +0200 Subject: [Linux-cluster] Re: Meaning of "service" References: <1145718821.3302.20.camel@auh5-0479.corp.jabil.org> Message-ID: Hello, On Sat, 22 Apr 2006 11:13:41 -0400, Eric Kerin wrote: >> Should I set this up as >> a) one Cluster Service, >> b) as three different Cluster Services? >> > I have a very similar setup for my cluster. I recommend option b. I ended up doing option a, because I couldn't get the other option working, for some strange reason. By the way: The manual is rather unclear about the difference between _adding_ a resource, and _attaching_ a resource. Can someone explain the difference? -- Greetings from Troels Arvin From jason at monsterjam.org Mon Apr 24 01:00:57 2006 From: jason at monsterjam.org (Jason) Date: Sun, 23 Apr 2006 21:00:57 -0400 Subject: [Linux-cluster] where are built GFS rpms?, and upgrade question In-Reply-To: <20060421202511.GA18697@rubicon.stillrunning.com> References: <20060421202511.GA18697@rubicon.stillrunning.com> Message-ID: <20060424010057.GA53613@monsterjam.org> On Fri, Apr 21, 2006 at 03:25:11PM -0500, John Griffin-Wiesner wrote: > Two questions: > > 1. I can find src.rpm's but no built GFS rpm's for rhel 3U7. > I believe that should be GFS-6.0.2.30-0. Can someone > tell me where those are hiding? Or do we all have to build > those ourselves now? 
http://www.gyrate.org/archives/9 From ookami at gmx.de Mon Apr 24 03:59:24 2006 From: ookami at gmx.de (wolfgang pauli) Date: Mon, 24 Apr 2006 05:59:24 +0200 (MEST) Subject: [Linux-cluster] different subnets/ manual fencing Message-ID: <30320.1145851164@www075.gmx.net> Hi, I spent the whole day (sunday) trying to get this working... I guess these two questions might solve the issue. 1. Can I have a cluster span over more than one subnet? 2. When I try to start the cluster software, I always have to start it on all nodes at the same time. If I don't do it, startup will hang while fenced is starting up. I am using manual fencing. Probably the default configuration. The problem is that some of the nodes produce kernel-panics (when starting cman). so i would like to start the nodes one by one and test what the problem is. ... ... -- "Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ... Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail From pcaulfie at redhat.com Mon Apr 24 07:24:20 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 24 Apr 2006 08:24:20 +0100 Subject: [Linux-cluster] different subnets/ manual fencing In-Reply-To: <30320.1145851164@www075.gmx.net> References: <30320.1145851164@www075.gmx.net> Message-ID: <444C7D24.4090504@redhat.com> wolfgang pauli wrote: > Hi, > > I spent the whole day (sunday) trying to get this working... > I guess these two questions might solve the issue. > > 1. Can I have a cluster span over more than one subnet? Yes, but you'll need to configure it for multicas rather than broadcast - and make sure that any intervening routers are good enough. > 2. When I try to start the cluster software, I always have to start it on > all nodes at the same time. If I don't do it, startup will hang while > fenced is starting up. I am using manual fencing. Probably the default > configuration. The problem is that some of the nodes produce > kernel-panics (when starting cman). I'd like to see those please. so i would like to start the nodes > one by one and test what the problem is. > > > ... > > > > > > > > ... > > > > -- patrick From Alain.Moulle at bull.net Mon Apr 24 11:26:11 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 24 Apr 2006 13:26:11 +0200 Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ? Message-ID: <444CB5D3.5010408@bull.net> Thanks Jim, it was effectively the problem : the second step about managing the fence for each node was missing (but there is nothing in documentation about this step and dialog boxes ...) Another problem/question: when you have finished the 3 nodes cluster configuration, and Save the file in /etc/cluster/cluster.conf on local node, the Icon "Send to Cluster" is not available because the cs4 is not active at the moment. But with three nodes, even if you try to start the cs4 on this local node (where I've done the configuration) , it can't start alone because the cluster is not quorate ... and you can't start cman on other nodes, it seems that the start is failed because of no cluster.conf currently on the node. So, do we have to do manually the mkdir /etc/cluster on both other nodes, and scp of cluster.conf towards both nodes ? Or is there another tip via GUI to start the cs4 on three nodes despite two nodes have not yet any cluster.conf available ? Thanks Alain Moull? From Alain.Moulle at bull.net Mon Apr 24 11:38:16 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 24 Apr 2006 13:38:16 +0200 Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ? 
In-Reply-To: <444CB5D3.5010408@bull.net> References: <444CB5D3.5010408@bull.net> Message-ID: <444CB8A8.3060502@bull.net> Alain Moulle wrote: > Thanks Jim, it was effectively the problem : the second step > about managing the fence for each node was missing (but there > is nothing in documentation about this step and dialog boxes ...) > > Another problem/question: > when you have finished the 3 nodes cluster configuration, > and Save the file in /etc/cluster/cluster.conf on local node, > the Icon "Send to Cluster" is not available because the > cs4 is not active at the moment. But with three nodes, > even if you try to start the cs4 on this local node (where > I've done the configuration) , it can't start alone because > the cluster is not quorate ... and you can't start cman > on other nodes, it seems that the start is failed because > of no cluster.conf currently on the node. > > So, do we have to do manually the mkdir /etc/cluster on > both other nodes, and scp of cluster.conf towards both nodes ? > Or is there another tip via GUI to start the cs4 on three > nodes despite two nodes have not yet any cluster.conf available ? > > Thanks > Alain Moull? More information : in fact, when I started cman on nodes without cluster/cluster.conf, they effectuvely (as expected) got a cluster.conf from another node connected, but they take the cluster.conf from another HA pair cluster, not from the third node of this current cluster !!!! So that's why the start fails ... Which is the algorythm of search when a node has no cluster.conf available ? Thanks Alain Moull? From jparsons at redhat.com Mon Apr 24 12:57:13 2006 From: jparsons at redhat.com (James Parsons) Date: Mon, 24 Apr 2006 08:57:13 -0400 Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ? In-Reply-To: <444CB5D3.5010408@bull.net> References: <444CB5D3.5010408@bull.net> Message-ID: <444CCB29.9060109@redhat.com> Alain Moulle wrote: >Thanks Jim, it was effectively the problem : the second step >about managing the fence for each node was missing (but there >is nothing in documentation about this step and dialog boxes ...) > >Another problem/question: >when you have finished the 3 nodes cluster configuration, >and Save the file in /etc/cluster/cluster.conf on local node, >the Icon "Send to Cluster" is not available because the >cs4 is not active at the moment. But with three nodes, >even if you try to start the cs4 on this local node (where >I've done the configuration) , it can't start alone because >the cluster is not quorate ... and you can't start cman >on other nodes, it seems that the start is failed because >of no cluster.conf currently on the node. > >So, do we have to do manually the mkdir /etc/cluster on >both other nodes, and scp of cluster.conf towards both nodes ? >Or is there another tip via GUI to start the cs4 on three >nodes despite two nodes have not yet any cluster.conf available ? > Yes, Alain. Currently, you must manually scp the cluster.conf to each node before starting the cluster. This requirement is considered unacceptable, however, and ease of cluster deployment is being aggressively pursued in two projects here. The first is an app called deploy-tool, which pulls down the necessary RPMs onto the machines desired as cluster nodes, and installs them AND copies a preliminary cluster.conf file to each node. Finally, it starts the cluster service daemons on each node. The second project with ease of cluster deployment as an important objective is the Conga project. 
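(In the meantime the copy itself is only a couple of commands from whichever node you edited the file on - something like the following, where the node names are made up and root ssh access between the nodes is assumed:

  for n in node2 node3; do
      ssh $n mkdir -p /etc/cluster
      scp /etc/cluster/cluster.conf $n:/etc/cluster/cluster.conf
  done

Once every node has the file you can start ccsd and cman on all of them and the cluster should go quorate.)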
It will provide a remote method for deploying clusters, configuring and monitoring them, and even adding and removing nodes. It will also offer remote storage management and a few other fun things as well. Conga and deploy-tool are both in active development, and deploy-tool is being beta tested now. Regards, -Jim > >Thanks >Alain Moull? > > > > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > From proftpd at rodriges.spb.ru Mon Apr 24 14:35:26 2006 From: proftpd at rodriges.spb.ru (proftpd at rodriges.spb.ru) Date: Mon, 24 Apr 2006 18:35:26 +0400 Subject: [Linux-cluster] GFS Message-ID: Hello. I'm use GFS only for share one iSCSI target between 2 initiators. I'm really don't need create a cluster among those two nodes. I'm install GFS,GFS-kernel,fence,ccs and others packages, install iSCSI-initiator and even create gfs-file system, but then i'm going to mount gfs i received [root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/ mount: Transport endpoint is not connected I haven't any idea about /etc/cluster/cluster.conf file - It's really necessary to create them. I'm simply want to share one iSCSI target between 2 hosts. Can i achive without creating a cluster??? From erwan at seanodes.com Mon Apr 24 16:17:18 2006 From: erwan at seanodes.com (Velu Erwan) Date: Mon, 24 Apr 2006 18:17:18 +0200 Subject: [Linux-cluster] Missing %if in GFS 6.0.2.30 specfile Message-ID: <444CFA0E.3080400@seanodes.com> If buildup = 0, rpm fails because the "modules" package doesn't exist for the following lines : The patch is simple but helps ;) +%if %{buildup} %post modules depmod -ae -F /boot/System.map-%{kernel_version} %{kernel_version} +%endif Erwan, From ookami at gmx.de Mon Apr 24 17:18:30 2006 From: ookami at gmx.de (Wolfgang Pauli) Date: Mon, 24 Apr 2006 11:18:30 -0600 Subject: [Linux-cluster] different subnets/ manual fencing In-Reply-To: <444C7D24.4090504@redhat.com> References: <30320.1145851164@www075.gmx.net> <444C7D24.4090504@redhat.com> Message-ID: <200604241118.30964.ookami@gmx.de> > Yes, but you'll need to configure it for multicas rather than broadcast - > and make sure that any intervening routers are good enough. That is good news. So we have the head node (dream) with two ethernet cards. We want it to serve a GFS partition to two different subnets. I guess this is than also doable with multicast, right? > I'd like to see those please. > Thanks! I attached a text file (myoops.txt). I hope it helps. I can't tell, because I never really had to deal with kernel panics before... Thanks again, wolfgang P.S.: Apr 23 16:15:10 node15 kernel: Linux version 2.6.15-1.1833_FC4smp (bhcompile at hs20-bc1-1.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 SMP Wed Mar 1 23:56:51 EST 2006 -------------- next part -------------- Apr 23 14:03:59 node15 ccsd[2367]: Starting ccsd 1.0.0: Apr 23 14:03:59 node15 ccsd[2367]: Built: Jun 16 2005 10:45:39 Apr 23 14:03:59 node15 ccsd[2367]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. Apr 23 14:04:00 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar 7 2006 15:36:41) installed Apr 23 14:04:00 node15 kernel: NET: Registered protocol family 30 Apr 23 14:04:00 node15 ccsd[2367]: cluster.conf (cluster name = oreilly_cluster, version = 33) found. 
Apr 23 14:04:03 node15 kernel: CMAN: Waiting to join or form a Linux-cluster Apr 23 14:04:03 node15 ccsd[2367]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2 Apr 23 14:04:03 node15 ccsd[2367]: Initial status:: Inquorate Apr 23 14:04:32 node15 kernel: CMAN: sending membership request Apr 23 14:04:53 node15 last message repeated 19 times Apr 23 14:04:54 node15 kernel: CMAN: got node node27 Apr 23 14:04:54 node15 kernel: CMAN: got node node17 Apr 23 14:04:54 node15 kernel: CMAN: got node node16 Apr 23 14:04:54 node15 kernel: CMAN: got node node24 Apr 23 14:04:54 node15 kernel: CMAN: got node node1 Apr 23 14:04:54 node15 kernel: CMAN: got node node2 Apr 23 14:04:54 node15 kernel: CMAN: got node node23 Apr 23 14:04:54 node15 kernel: CMAN: got node node6 Apr 23 14:04:54 node15 kernel: CMAN: got node node10 Apr 23 14:04:54 node15 kernel: CMAN: got node node9 Apr 23 14:05:01 node15 kernel: CMAN: quorum regained, resuming activity Apr 23 14:05:01 node15 ccsd[2367]: Cluster is quorate. Allowing connections. Apr 23 14:05:01 node15 kernel: dlm: no version for "struct_module" found: kernel tainted. Apr 23 14:05:01 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar 7 2006 15:42:37) installed Apr 23 14:07:28 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Apr 23 14:07:28 node15 kernel: printing eip: Apr 23 14:07:28 node15 kernel: f8acfa39 Apr 23 14:07:28 node15 kernel: *pde = 37d1f001 Apr 23 14:07:28 node15 kernel: Oops: 0000 [#1] Apr 23 14:07:28 node15 kernel: SMP Apr 23 14:07:28 node15 kernel: last sysfs file: /class/misc/dlm-control/dev Apr 23 14:07:28 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 nfs lockd nfs_acl sunrpc dm_mod eepro100 uhci_hcd hw_ra ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd Apr 23 14:07:28 node15 kernel: CPU: 0 Apr 23 14:07:28 node15 kernel: EIP: 0060:[] Tainted: GF VLI Apr 23 14:07:28 node15 kernel: EFLAGS: 00010207 (2.6.15-1.1833_FC4smp) Apr 23 14:07:28 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman] Apr 23 14:07:28 node15 kernel: eax: 00000046 ebx: c1d7aeb7 ecx: 00000011 edx: f68b0fa0 Apr 23 14:07:28 node15 kernel: esi: 00000000 edi: c1d7aeb7 ebp: 00000046 esp: f68b0ec8 Apr 23 14:07:28 node15 kernel: ds: 007b es: 007b ss: 0068 Apr 23 14:07:28 node15 kernel: Process cman_comms (pid: 2392, threadinfo=f68b0000 task=f7d85550) Apr 23 14:07:28 node15 kernel: Stack: badc0ded f6846380 00000000 f7386400 f68b0f74 f8acfbb3 00000100 00000002 Apr 23 14:07:28 node15 kernel: 00000040 f731e000 f7decb40 f731e001 00000001 00000001 f6cea440 f8ad002d Apr 23 14:07:28 node15 kernel: f68b0f90 00000001 000002fd c1b091e0 f68b0f90 f68b0f74 f7decb40 c1eb4940 Apr 23 14:07:28 node15 kernel: Call Trace: Apr 23 14:07:28 node15 kernel: [] send_to_user_port+0x159/0x3cc [cman] [] process_incoming_packet+0x207/0x26c [cman] Apr 23 14:07:29 node15 kernel: [] receive_message+0xb7/0xe0 [cman] [] cluster_kthread+0x18b/0x39f [cman] Apr 23 14:07:29 node15 kernel: [] default_wake_function+0x0/0xc [] cluster_kthread+0x0/0x39f [cman] Apr 23 14:07:29 node15 kernel: [] kernel_thread_helper+0x5/0xb Apr 23 14:07:29 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7 2 f8 89 df a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc Apr 23 14:07:29 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. 
^MContinui ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111 seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds. Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing in 100 seconds. ^M<4>hda: dma_timer_expiry: dma status == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing i n 95 seconds. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. hda: DMA interrupt recovery Apr 23 14:07:29 node15 kernel: hda: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing i n 85 seconds. ^MContinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 80 seconds. Continuing in 79 seconds. ^MContinuing in 78 seconds. ^MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MContinuing in 7 4 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^MContinuing in 69 seconds. ^M<4 >hda: dma_timer_expiry: dma status == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing i n 64 seconds. ^MContinuing in 63 seconds. ^MContinuing in 62 seconds. ^MContinuing in 61 seconds. ^M<6>ide-cd: cmd 0x3 timed out Apr 23 14:07:29 node15 kernel: hdc: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 60 seconds. ^MContinuing in 59 seconds. ^Mhda: DMA interrupt recovery Apr 23 14:07:29 node15 kernel: hda: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 58 seconds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. ^MContinuing in 55 seconds. ^MContinuing i n 54 seconds. ^MContinuing in 53 seconds. ^MContinuing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds. Continuing in 48 seconds. ^MContinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 4 3 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^M<4>hda: dma_timer_expiry: dma status == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 38 seconds. ^MContinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing i n 34 seconds. ^MContinuing in 33 seconds. ^MContinuing in 32 seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds. hda: DMA interrupt recovery Apr 23 14:07:29 node15 kernel: hda: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 28 seconds. ^MContinuing in 27 seconds. ^MContinuing in 26 seconds. ^MContinuing in 25 seconds. ^MContinuing i n 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 seconds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds. Continuing in 18 seconds. ^MContinuing in 17 seconds. ^MContinuing in 16 seconds. ^MContinuing in 15 seconds. 
^MContinuing in 14 seconds. ^MContinuing in 1 3 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds. ^MContinuing in 10 seconds. ^MContinuing in 9 seconds. ^M<4>hda: dma_timer_expiry: dma s tatus == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 seconds. ^MContinuing in 5 seconds. ^MContinuing in 4 seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds. Apr 23 14:07:29 node15 kernel: <0>Fatal exception: panic in 5 seconds # ----------------------------------------------- Apr 23 16:07:04 node15 ccsd[2373]: Starting ccsd 1.0.0: Apr 23 16:07:04 node15 ccsd[2373]: Built: Jun 16 2005 10:45:39 Apr 23 16:07:04 node15 ccsd[2373]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. Apr 23 16:07:05 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar 7 2006 15:36:41) installed Apr 23 16:07:05 node15 kernel: NET: Registered protocol family 30 Apr 23 16:07:05 node15 ccsd[2373]: cluster.conf (cluster name = oreilly_cluster, version = 33) found. Apr 23 16:07:14 node15 kernel: CMAN: Waiting to join or form a Linux-cluster Apr 23 16:07:14 node15 ccsd[2373]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2 Apr 23 16:07:14 node15 ccsd[2373]: Initial status:: Inquorate Apr 23 16:07:17 node15 kernel: CMAN: sending membership request Apr 23 16:07:37 node15 last message repeated 27 times Apr 23 16:07:38 node15 kernel: CMAN: got node node2 Apr 23 16:07:38 node15 kernel: CMAN: got node node26 Apr 23 16:07:38 node15 kernel: CMAN: got node node6 Apr 23 16:07:38 node15 kernel: CMAN: got node node27 Apr 23 16:07:38 node15 kernel: CMAN: got node node4 Apr 23 16:07:38 node15 kernel: CMAN: got node node5 Apr 23 16:07:38 node15 kernel: CMAN: got node node17 Apr 23 16:07:38 node15 kernel: CMAN: got node node3 Apr 23 16:07:38 node15 kernel: CMAN: got node node18 Apr 23 16:07:38 node15 kernel: CMAN: got node node16 Apr 23 16:07:38 node15 kernel: CMAN: got node node23 Apr 23 16:07:38 node15 kernel: CMAN: got node node12 Apr 23 16:07:38 node15 kernel: CMAN: got node node7 Apr 23 16:07:38 node15 ccsd[2373]: Cluster is quorate. Allowing connections. Apr 23 16:07:38 node15 kernel: CMAN: got node dream Apr 23 16:07:38 node15 kernel: CMAN: got node node20 Apr 23 16:07:38 node15 kernel: CMAN: quorum regained, resuming activity Apr 23 16:07:38 node15 kernel: dlm: no version for "struct_module" found: kernel tainted. 
Apr 23 16:07:38 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar 7 2006 15:42:37) installed Apr 23 16:10:06 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Apr 23 16:10:06 node15 kernel: printing eip: Apr 23 16:10:06 node15 kernel: f8a85a39 Apr 23 16:10:06 node15 kernel: *pde = 363b4001 Apr 23 16:10:06 node15 kernel: Oops: 0000 [#1] Apr 23 16:10:06 node15 kernel: SMP Apr 23 16:10:06 node15 kernel: last sysfs file: /class/misc/dlm-control/dev Apr 23 16:10:06 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_random i8xx_tco i2c_ i801 i2c_core e1000 e100 mii floppy ext3 jbd Apr 23 16:10:06 node15 kernel: CPU: 0 Apr 23 16:10:06 node15 kernel: EIP: 0060:[] Tainted: GF VLI Apr 23 16:10:06 node15 kernel: EFLAGS: 00010202 (2.6.15-1.1833_FC4smp) Apr 23 16:10:06 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman] Apr 23 16:10:06 node15 kernel: eax: 00000040 ebx: c1db9eba ecx: 00000010 edx: f6344fa0 Apr 23 16:10:06 node15 kernel: esi: 00000000 edi: c1db9eba ebp: 00000040 esp: f6344ec8 Apr 23 16:10:06 node15 kernel: ds: 007b es: 007b ss: 0068 Apr 23 16:10:06 node15 kernel: Process cman_comms (pid: 2399, threadinfo=f6344000 task=c1e73aa0) Apr 23 16:10:06 node15 kernel: Stack: badc0ded f63d5a80 00000000 f6000a00 f6344f74 f8a85bb3 00000100 00000002 Apr 23 16:10:06 node15 kernel: 00000040 f66a9800 f6319a40 f66a9801 00000001 00000001 f6383cc0 f8a8602d Apr 23 16:10:06 node15 kernel: f6344f90 00000001 000002fa c1b091e0 f6344f90 f6344f74 f6319a40 f7dc1b80 Apr 23 16:10:06 node15 kernel: Call Trace: Apr 23 16:10:06 node15 kernel: [] send_to_user_port+0x159/0x3cc [cman] [] process_incoming_packet+0x207/0x26c [cman] Apr 23 16:10:06 node15 kernel: [] receive_message+0xb7/0xe0 [cman] [] cluster_kthread+0x18b/0x39f [cman] Apr 23 16:10:06 node15 kernel: [] default_wake_function+0x0/0xc [] cluster_kthread+0x0/0x39f [cman] Apr 23 16:10:06 node15 kernel: [] kernel_thread_helper+0x5/0xb Apr 23 16:10:06 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7 2 f8 89 df a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc Apr 23 16:10:06 node15 kernel: <0>Fatal exception: panic in 5 seconds # ----------------------------------------------- Apr 23 18:05:33 node15 ccsd[3356]: Starting ccsd 1.0.0: Apr 23 18:05:33 node15 ccsd[3356]: Built: Jun 16 2005 10:45:39 Apr 23 18:05:33 node15 ccsd[3356]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. Apr 23 18:05:34 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar 7 2006 15:36:41) installed Apr 23 18:05:34 node15 kernel: NET: Registered protocol family 30 Apr 23 18:05:34 node15 ccsd[3356]: cluster.conf (cluster name = oreilly_cluster, version = 35) found. Apr 23 18:05:35 node15 kernel: CMAN: Waiting to join or form a Linux-cluster Apr 23 18:05:36 node15 ccsd[3356]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2 Apr 23 18:05:36 node15 ccsd[3356]: Initial status:: Inquorate Apr 23 18:05:36 node15 kernel: CMAN: sending membership request Apr 23 18:05:36 node15 kernel: CMAN: got node dream Apr 23 18:06:13 node15 kernel: CMAN: quorum regained, resuming activity Apr 23 18:06:13 node15 ccsd[3356]: Cluster is quorate. Allowing connections. Apr 23 18:06:13 node15 kernel: dlm: no version for "struct_module" found: kernel tainted. 
Apr 23 18:06:13 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar 7 2006 15:42:37) installed Apr 23 18:06:23 node15 kernel: CMAN: node node1 rejoining Apr 23 18:06:28 node15 last message repeated 3 times Apr 23 18:06:32 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Apr 23 18:06:32 node15 kernel: printing eip: Apr 23 18:06:32 node15 kernel: f8a85a39 Apr 23 18:06:32 node15 kernel: *pde = 37e89001 Apr 23 18:06:32 node15 kernel: Oops: 0000 [#1] Apr 23 18:06:32 node15 kernel: SMP Apr 23 18:06:32 node15 kernel: last sysfs file: /class/misc/dlm-control/dev Apr 23 18:06:32 node15 kernel: Modules linked in: dlm(U) cman(U) nfs lockd nfs_acl ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_ra ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd Apr 23 18:06:32 node15 kernel: CPU: 0 Apr 23 18:06:32 node15 kernel: EIP: 0060:[] Tainted: GF VLI Apr 23 18:06:32 node15 kernel: EFLAGS: 00010203 (2.6.15-1.1833_FC4smp) Apr 23 18:06:32 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman] Apr 23 18:06:32 node15 kernel: eax: 00000042 ebx: c1ffcab9 ecx: 00000010 edx: f5c22fa0 Apr 23 18:06:32 node15 kernel: esi: 00000000 edi: c1ffcab9 ebp: 00000042 esp: f5c22ec8 Apr 23 18:06:32 node15 kernel: ds: 007b es: 007b ss: 0068 Apr 23 18:06:32 node15 kernel: Process cman_comms (pid: 3387, threadinfo=f5c22000 task=c1e50000) Apr 23 18:06:33 node15 kernel: Stack: badc0ded f7fd6d80 00000000 f6462000 f5c22f74 f8a85bb3 00000100 00000002 Apr 23 18:06:33 node15 kernel: 00000040 f66b1000 f5caf9c0 f66b1001 00000001 00000001 f5caf840 f8a8602d Apr 23 18:06:33 node15 kernel: f5c22f90 00000001 000002fb c1b091e0 f5c22f90 f5c22f74 f5caf9c0 c1e6d100 Apr 23 18:06:33 node15 kernel: Call Trace: Apr 23 18:06:33 node15 kernel: [] send_to_user_port+0x159/0x3cc [cman] [] process_incoming_packet+0x207/0x26c [cman] Apr 23 18:06:33 node15 kernel: [] receive_message+0xb7/0xe0 [cman] [] cluster_kthread+0x18b/0x39f [cman] Apr 23 18:06:33 node15 kernel: [] default_wake_function+0x0/0xc [] cluster_kthread+0x0/0x39f [cman] Apr 23 18:06:33 node15 kernel: [] kernel_thread_helper+0x5/0xb Apr 23 18:06:33 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7 2 f8 89 df a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc Apr 23 18:06:33 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. ^MContinui ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111 seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds. Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds . ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^MContinuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo Apr 23 18:06:33 node15 kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. 
^MContinuing in 81 seconds. ^MContinuing in 8 ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111 seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds. Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds . ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^MContinuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo Apr 23 18:06:33 node15 kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 8 0 seconds. ^MContinuing in 79 seconds. ^MContinuing in 78 seconds. ^MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MCo ntinuing in 74 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^MContinuing in 69 seconds. ^MContinuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing in 64 seconds. ^MCont inuing in 63 seconds. ^MContinuing in 62 seconds. ^MContinuing in 61 seconds. ^MContinuing in 60 seconds. ^MContinuing in 59 seconds. ^MContinuing in 58 se conds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. ^MContinuing in 55 seconds. ^MContinuing in 54 seconds. ^MContinuing in 53 seconds. ^MContin uing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds. ^MContinuing in 48 seconds. Apr 23 18:06:33 node15 kernel: tinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 4 3 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^MContinuing in 38 seconds. ^MCo ntinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing in 34 seconds. ^MContinuing in 33 seconds. ^MContinuing in 32 seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds. ^MContinuing in 28 seconds. ^MContinuing in 27 seconds. ^MCont inuing in 26 seconds. ^MContinuing in 25 seconds. ^MContinuing in 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 se conds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds. ^MContinuing in 18 seconds. ^MContinuing in 17 seconds. ^MContinuing in 16 seconds. ^MContin uing in 15 seconds. ^MContinuing in 14 seconds. ^MContinuing in 13 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds. Apr 23 18:06:33 node15 kernel: tinuing in 10 seconds. ^MContinuing in 9 seconds. ^MContinuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 se conds. ^MContinuing in 5 seconds. ^MContinuing in 4 seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds. 
Apr 23 18:06:33 node15 kernel: <0>Fatal exception: panic in 5 seconds From pcaulfie at redhat.com Mon Apr 24 18:28:48 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 24 Apr 2006 19:28:48 +0100 Subject: [Linux-cluster] different subnets/ manual fencing In-Reply-To: <200604241118.30964.ookami@gmx.de> References: <30320.1145851164@www075.gmx.net> <444C7D24.4090504@redhat.com> <200604241118.30964.ookami@gmx.de> Message-ID: <444D18E0.8080609@redhat.com> Wolfgang Pauli wrote: >> Yes, but you'll need to configure it for multicas rather than broadcast - >> and make sure that any intervening routers are good enough. > > That is good news. So we have the head node (dream) with two ethernet cards. > We want it to serve a GFS partition to two different subnets. I guess this is > than also doable with multicast, right? Yes, if your router is up to it. >> I'd like to see those please. >> > Thanks! I attached a text file (myoops.txt). I hope it helps. I can't tell, > because I never really had to deal with kernel panics before... > > Thanks again, > > wolfgang > > P.S.: Apr 23 16:15:10 node15 kernel: Linux version 2.6.15-1.1833_FC4smp > (bhcompile at hs20-bc1-1.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat > 4.0.2-8)) #1 SMP Wed Mar 1 23:56:51 EST 2006 > > > ------------------------------------------------------------------------ > > Apr 23 14:03:59 node15 ccsd[2367]: Starting ccsd 1.0.0: > Apr 23 14:03:59 node15 ccsd[2367]: Built: Jun 16 2005 10:45:39 > Apr 23 14:03:59 node15 ccsd[2367]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. > Apr 23 14:04:00 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar 7 2006 15:36:41) installed That's a rather old version, I'm pretty sure that bug has been fixed since. Can you upgrade ? Patrick From Bowie_Bailey at BUC.com Mon Apr 24 19:03:35 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Mon, 24 Apr 2006 15:03:35 -0400 Subject: [Linux-cluster] GFS Message-ID: <4766EEE585A6D311ADF500E018C154E30213394B@bnifex.cis.buc.com> proftpd at rodriges.spb.ru wrote: > > I'm use GFS only for share one iSCSI target between 2 > initiators. I'm really don't need create a cluster among > those two nodes. I'm install GFS,GFS-kernel,fence,ccs and > others packages, install iSCSI-initiator and even create > gfs-file system, but then i'm going to mount gfs i received > > [root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/ > mount: Transport endpoint is not connected > > I haven't any idea about /etc/cluster/cluster.conf file - > It's really necessary to create them. I'm simply want to > share one iSCSI target between 2 hosts. Can i achive > without creating a cluster??? No, you can't use GFS without a cluster. You need the cluster services to manage access to the shared filesystem and to prevent misbehaving nodes from causing data corruption. -- Bowie From rohara at redhat.com Mon Apr 24 19:15:48 2006 From: rohara at redhat.com (Ryan O'Hara) Date: Mon, 24 Apr 2006 14:15:48 -0500 Subject: [Linux-cluster] GFS In-Reply-To: <4766EEE585A6D311ADF500E018C154E30213394B@bnifex.cis.buc.com> References: <4766EEE585A6D311ADF500E018C154E30213394B@bnifex.cis.buc.com> Message-ID: <444D23E4.8070104@redhat.com> Bowie Bailey wrote: > > proftpd at rodriges.spb.ru wrote: > >>I'm use GFS only for share one iSCSI target between 2 >>initiators. I'm really don't need create a cluster among >>those two nodes. 
I'm install GFS,GFS-kernel,fence,ccs and >>others packages, install iSCSI-initiator and even create >>gfs-file system, but then i'm going to mount gfs i received >> >>[root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/ >>mount: Transport endpoint is not connected >> >>I haven't any idea about /etc/cluster/cluster.conf file - >>It's really necessary to create them. I'm simply want to >>share one iSCSI target between 2 hosts. Can i achive >>without creating a cluster??? > > > No, you can't use GFS without a cluster. You need the cluster > services to manage access to the shared filesystem and to prevent > misbehaving nodes from causing data corruption. > You can use GFS without a cluster if you run as a standalone filesystem. When you use GFS as shared storage, as in this case, you do need to run GFS in a cluster. Ryan From rajeshkannna at gmail.com Tue Apr 25 05:47:46 2006 From: rajeshkannna at gmail.com (Rajesh Kanna) Date: Tue, 25 Apr 2006 11:17:46 +0530 Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 24, Issue 35 In-Reply-To: <20060424160007.155DA73461@hormel.redhat.com> References: <20060424160007.155DA73461@hormel.redhat.com> Message-ID: <1d301df90604242247j6a3ba03dn32fd920b54279347@mail.gmail.com> dear sir, I shall want to know about basic of linux-clustering . reg P.Rajeshkanna On 4/24/06, linux-cluster-request at redhat.com wrote: > Send Linux-cluster mailing list submissions to > linux-cluster at redhat.com > > To subscribe or unsubscribe via the World Wide Web, visit > https://www.redhat.com/mailman/listinfo/linux-cluster > or, via email, send a message with subject or body 'help' to > linux-cluster-request at redhat.com > > You can reach the person managing the list at > linux-cluster-owner at redhat.com > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Linux-cluster digest..." > > > Today's Topics: > > 1. Re: where are built GFS rpms?, and upgrade question (Jason) > 2. different subnets/ manual fencing (wolfgang pauli) > 3. Re: different subnets/ manual fencing (Patrick Caulfield) > 4. Re: CS4 Update 2 / GUI problem ? (Alain Moulle) > 5. Re: CS4 Update 2 / GUI problem ? (Alain Moulle) > 6. Re: Re: CS4 Update 2 / GUI problem ? (James Parsons) > 7. GFS (proftpd at rodriges.spb.ru) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 23 Apr 2006 21:00:57 -0400 > From: Jason > Subject: Re: [Linux-cluster] where are built GFS rpms?, and upgrade > question > To: linux clustering > Message-ID: <20060424010057.GA53613 at monsterjam.org> > Content-Type: text/plain; charset=us-ascii > > On Fri, Apr 21, 2006 at 03:25:11PM -0500, John Griffin-Wiesner wrote: > > Two questions: > > > > 1. I can find src.rpm's but no built GFS rpm's for rhel 3U7. > > I believe that should be GFS-6.0.2.30-0. Can someone > > tell me where those are hiding? Or do we all have to build > > those ourselves now? > > http://www.gyrate.org/archives/9 > > > > ------------------------------ > > Message: 2 > Date: Mon, 24 Apr 2006 05:59:24 +0200 (MEST) > From: "wolfgang pauli" > Subject: [Linux-cluster] different subnets/ manual fencing > To: linux-cluster at redhat.com > Message-ID: <30320.1145851164 at www075.gmx.net> > Content-Type: text/plain; charset="us-ascii" > > Hi, > > I spent the whole day (sunday) trying to get this working... > I guess these two questions might solve the issue. > > 1. Can I have a cluster span over more than one subnet? > > 2. 
When I try to start the cluster software, I always have to start it on
> all nodes at the same time. If I don't do it, startup will hang while
> fenced is starting up.
>
[rest of the quoted digest trimmed - messages 2 through 6 appear in full earlier in this archive; the quote resumes at the end of James Parsons's reply]
>
> Yes, Alain. Currently, you must manually scp the cluster.conf to each
> node before starting the cluster. This requirement is considered
> unacceptable, however, and ease of cluster deployment is being
> aggressively pursued in two projects here.
>
> The first is an app called deploy-tool, which pulls down the necessary
> RPMs onto the machines desired as cluster nodes, and installs them AND
> copies a preliminary cluster.conf file to each node. Finally, it starts
> the cluster service daemons on each node.
> > The second project with ease of cluster deployment as an important > objective is the Conga project. It will provide a remote method for > deploying clusters, configuring and monitoring them, and even adding and > removing nodes. It will also offer remote storage management and a few > other fun things as well. > > Conga and deploy-tool are both in active development, and deploy-tool is > being beta tested now. > > Regards, > > -Jim > > > > >Thanks > >Alain Moull? > > > > > > > > > >-- > >Linux-cluster mailing list > >Linux-cluster at redhat.com > >https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > ------------------------------ > > Message: 7 > Date: Mon, 24 Apr 2006 18:35:26 +0400 > From: > Subject: [Linux-cluster] GFS > To: linux-cluster at redhat.com > Message-ID: > Content-Type: text/plain; charset="KOI8-R" > > Hello. > > I'm use GFS only for share one iSCSI target between 2 > initiators. I'm really don't need create a cluster among > those two nodes. I'm install GFS,GFS-kernel,fence,ccs and > others packages, install iSCSI-initiator and even create > gfs-file system, but then i'm going to mount gfs i received > > [root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/ > mount: Transport endpoint is not connected > > I haven't any idea about /etc/cluster/cluster.conf file - > It's really necessary to create them. I'm simply want to > share one iSCSI target between 2 hosts. Can i achive > without creating a cluster??? > > > > ------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > End of Linux-cluster Digest, Vol 24, Issue 35 > ********************************************* > From alfeijoo at cesga.es Tue Apr 25 06:53:53 2006 From: alfeijoo at cesga.es (Alejandro Feijoo) Date: Tue, 25 Apr 2006 08:53:53 +0200 (CEST) Subject: [Linux-cluster] quotas on GFS Message-ID: <52100.193.144.44.59.1145948033.squirrel@webmail.cesga.es> Hi, there are any method to assgin global quotas on GFS, for example for all users assing 6Gb for home? or may i need edit all users? Tanks! ++-------------------------++ Alejandro Feij?o Fraga Tecnico de Sistemas. Centro de supercomputaci?n de Galicia Avda. de Vigo s/n. Campus Sur. 15705 - Santiago de Compostela. Spain Tlfn.: 981 56 98 10 Extension: 216 Fax: 981 59 46 16 From nemanja at yu.net Tue Apr 25 10:42:41 2006 From: nemanja at yu.net (Nemanja Miletic) Date: Tue, 25 Apr 2006 12:42:41 +0200 Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster In-Reply-To: <4448E8E4.3000400@arnet.net.ar> References: <1145266165.27997.57.camel@nemanja.eunet.yu> <1145288499.6000.15.camel@nemanja.eunet.yu> <20060418133704.GA16121@redhat.com> <4448E8E4.3000400@arnet.net.ar> Message-ID: <1145961761.30361.23.camel@nemanja.eunet.yu> Well we applied the 'echo "0" >> /proc/cluster/lock_dlm/drop_count' before mounting our GFS partitions. We also introduced another pop3 node in the cluster, installed imapproxy on our webmail machine and made connections for pop3 and imap persistant for 120 seconds on loadbalancers. We did not turn on data journaling because most of files on filesystem are not empty. At the moment the condition is stable. We will probably introduce another node soon. 
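For the record, the change itself is tiny: on each node we simply do the echo right before the GFS mounts, along these lines (the device and mount point here are made up - use your own; note the proc file only exists once the lock_dlm module has been loaded):

  echo "0" >> /proc/cluster/lock_dlm/drop_count
  mount -t gfs /dev/vg_mail/lv_spool /var/spool/mail
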
On Fri, 2006-04-21 at 11:15 -0300, German Staltari wrote: > David Teigland wrote: > > On Mon, Apr 17, 2006 at 05:41:39PM +0200, Nemanja Miletic wrote: > > > >> Hi, > >> > >> Does anyone think that turning on journaling on files could help us > >> speed up the access to gfs partition? > >> > >> This would be difficult because journaling can be turned on only on > >> files that are empty. We have a large number of empty files of active > >> users that download all their mail from pop3 server, so turning on > >> jurnaling for them should be possible. > >> > > > > Data journaling might help, it will speed up fsync(), but will increase > > the i/o going to your storage. > > > > > >> What size should be the journals when file journaling is on? > >> > > > > Continue to use the default. > > > > Another thing you might try is disabling the drop-locks callback, allowing > > GFS to cache more locks. Do this before you mount: > > echo "0" >> /proc/cluster/lock_dlm/drop_count > > > > > Did you apply this changes? Could you share the results of this changes > in your configuration? Do you recommend it? > Thanks > German Staltari > -- Nemanja Miletic, System Engineer ----- YUnet International http://www.EUnet.yu Dubrovacka 35/III, 11000 Belgrade Tel: +381 11 3305633; Fax: +381 11 3282760 ----- This e-mail is confidential and intended only for the recipient. Unauthorized distribution, modification or disclosure of its contents is prohibited. If you have received this e-mail in error, please notify the sender by telephone +381 11 3305633. From rajiv.vaidyanath at ccur.com Tue Apr 25 11:59:37 2006 From: rajiv.vaidyanath at ccur.com (Rajiv Vaidyanath) Date: Tue, 25 Apr 2006 07:59:37 -0400 Subject: [Linux-cluster] cluster suite / Opteron Message-ID: <1145966377.16894.30.camel@mouse> Hi, I get some compilation warnings on opteron (cluster-1.02.00) Eg: -------------------------------------------- drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1595: warning: long unsigned int format, uint64_t arg (arg 2) drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1596: warning: long unsigned int format, uint64_t arg (arg 2) drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1598: warning: long unsigned int format, uint64_t arg (arg 2) drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1599: warning: long unsigned int format, uint64_t arg (arg 2) -------------------------------------------- Can I safely ignore these warnings ? Thanks, Rajiv From Bowie_Bailey at BUC.com Tue Apr 25 14:11:09 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Tue, 25 Apr 2006 10:11:09 -0400 Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 24, Issue 35 Message-ID: <4766EEE585A6D311ADF500E018C154E302133958@bnifex.cis.buc.com> Rajesh Kanna wrote: > dear sir, > > I shall want to know about basic of linux-clustering . > > reg > > P.Rajeshkanna That's a rather open-ended question. Check out the manuals and then come back if you have some more specific questions. https://www.redhat.com/docs/manuals/csgfs/ Also, don't forget to search the list archives. Quite a bit of the "how does the cluster work" type questions have been asked and answered several times before on the list. -- Bowie From filipe.miranda at gmail.com Tue Apr 25 16:11:13 2006 From: filipe.miranda at gmail.com (Filipe Miranda) Date: Tue, 25 Apr 2006 13:11:13 -0300 Subject: [Linux-cluster] Postfix/Dovecot/GFS Message-ID: Hello, We are gathering as much information possible to build a mail cluster using RHEL4/GFS. Could you guys help us out witht some questions? 
1) Does Postfix/Dovecot is lock aware when using in conjuction with RH GFS on a RHEL4 ? 2) Will I have to setup Postfix to use Maildir? or mbox can handle it? Thank you, -- Att. --- Filipe Miranda -------------- next part -------------- An HTML attachment was scrubbed... URL: From ookami at gmx.de Tue Apr 25 18:38:41 2006 From: ookami at gmx.de (Wolfgang Pauli) Date: Tue, 25 Apr 2006 12:38:41 -0600 Subject: [Linux-cluster] multicast howto Message-ID: <200604251238.41472.ookami@gmx.de> Hi, I am trying to setup gfs on a cluster that spans over two subnets. dream is a node with to interefaces, one on each subnet. I thought the below setup should work (taken from http://gfs.wikidev.net/Installation ). But it does not. Can anybody tell me what is wrong with that? cheers, wolfgang From gforte at leopard.us.udel.edu Tue Apr 25 18:51:49 2006 From: gforte at leopard.us.udel.edu (Greg Forte) Date: Tue, 25 Apr 2006 14:51:49 -0400 Subject: [Linux-cluster] multicast howto In-Reply-To: <200604251238.41472.ookami@gmx.de> References: <200604251238.41472.ookami@gmx.de> Message-ID: <444E6FC5.6050202@leopard.us.udel.edu> well, for starters, you've got three nodes but are using the two_node mode ... I'm pretty sure that won't work. Also, I believe you need one multicast address that all the nodes communicate on - the multi-homed example given on that wiki page is intended for failover situations, not for "split-brain" networking ... I think. And the addresses given there are just examples, you're going to need to explicitly configure your router(s) to send packets addressed to some multicast address that you assign for the cluster to the ports that the cluster nodes are attached to. -g Wolfgang Pauli wrote: > Hi, > > I am trying to setup gfs on a cluster that spans over two subnets. dream is a > node with to interefaces, one on each subnet. I thought the below setup > should work (taken from http://gfs.wikidev.net/Installation ). But it does > not. Can anybody tell me what is wrong with that? > > cheers, > > wolfgang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Greg Forte gforte at udel.edu IT - User Services University of Delaware 302-831-1982 Newark, DE From rick at espresolutions.com Tue Apr 25 17:17:35 2006 From: rick at espresolutions.com (Rick Bansal) Date: Tue, 25 Apr 2006 12:17:35 -0500 Subject: [Linux-cluster] mysql and redhat cluster suite? Message-ID: <200604251912.k3PJCgMs014563@mx3.redhat.com> Did anyone successfully get multiple mysql daemons to run against a shared data store per Vladimir Grujic suggestion (post Mon, 19 Dec 2005)? I have not been able to as yet. I'm using mysql 4.1 and cannot turn on external-locking. It looks like the option has been complied out in the binaries I have. I'm currently trying to rebuild from source with the "skip-locking" option removed. I'll see if that helps. If anyone has successfully gotten multiple mysql daemons transacting against a shared data store, I'd greatly appreciate any advise. Thanks in advance. Regards, Rick Bansal From kjalleda at gmail.com Tue Apr 25 22:21:08 2006 From: kjalleda at gmail.com (Kishore Jalleda) Date: Tue, 25 Apr 2006 18:21:08 -0400 Subject: [Linux-cluster] mysql and redhat cluster suite? 
In-Reply-To: <200604251912.k3PJCgMs014563@mx3.redhat.com> References: <200604251912.k3PJCgMs014563@mx3.redhat.com> Message-ID: <78aaf6710604251521m51f9e53o3b6f96cfc4d0cc12@mail.gmail.com> You can only acheive this using Mysql Cluster, if you are talking about multiple mysql daemons using a shared data store, then I don't think you can acheive this using traditional mysql storage engines, Also I am just curious to know why do u need this kind of a setup ?? don't get confused with the Redhat Cluster suite's shared storage which uses GFS, and the locking is taken care of by the GFS, where multiple servers can read/write to a shared storage without worrying about conflicts/locking etc. Mysql Cluster suite is very analogous to Redhat Cluster suite in the sense/intention that multiple nodes/daemons/instances can write simultaneously to a shared data store, with the difference that Mysql cluster suite is based on a shared nothing architecture, which has many SQLD nodes (aka servers in redhat) with data on multiple NDBD nodes (aka shared storage in redhat) Hope this helps Kishore Jalleda http://kjalleda.googlepages.com/projects On 4/25/06, Rick Bansal wrote: > > Did anyone successfully get multiple mysql daemons to run against a shared > data store per Vladimir Grujic suggestion (post Mon, 19 Dec 2005)? I > have > not been able to as yet. > > I'm using mysql 4.1 and cannot turn on external-locking. It looks like > the > option has been complied out in the binaries I have. I'm currently trying > to rebuild from source with the "skip-locking" option removed. I'll see > if > that helps. > > If anyone has successfully gotten multiple mysql daemons transacting > against > a shared data store, I'd greatly appreciate any advise. Thanks in > advance. > > Regards, > Rick Bansal > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pcaulfie at redhat.com Wed Apr 26 07:39:05 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 26 Apr 2006 08:39:05 +0100 Subject: [Linux-cluster] multicast howto In-Reply-To: <200604251238.41472.ookami@gmx.de> References: <200604251238.41472.ookami@gmx.de> Message-ID: <444F2399.6020809@redhat.com> Wolfgang Pauli wrote: > Hi, > > I am trying to setup gfs on a cluster that spans over two subnets. dream is a > node with to interefaces, one on each subnet. I thought the below setup > should work (taken from http://gfs.wikidev.net/Installation ). But it does > not. Can anybody tell me what is wrong with that? Your multicast address entries are all to cock. They should be the same for /all/ nodes, /and/ for the cman entry. Anwyay, multi-home in CMAN isn't supported by the DLM so you must only specify one multicast address and use ethernet bonding to get multi-path. -- patrick From sander at elexis.nl Wed Apr 26 11:50:43 2006 From: sander at elexis.nl (Sander van Beek - Elexis) Date: Wed, 26 Apr 2006 13:50:43 +0200 Subject: [Linux-cluster] Which 2.6 kernel and cluster tarball will compile together? Message-ID: <7.0.1.0.0.20060426134654.00a2fec0@elexis.nl> Hi all, I'm trying to get GFS working on Slackware 10.2 Tried to compile the sources by hand, but I cannot find any working kernel/cluster combo that will compile. Tried 2.6.9 & cluster-1.02.00, 2.6.12.2 & cluster-1.02.00, the latest 2.6.16 & cluster-1.02.00 and the latest cluster CVS release. But none of these will compile together. 
Can anyone recommend me a working combination? With best regards, Sander van Beek --------------------------------------- Ing. S. van Beek Elexis Marketing 9 6921 RE Duiven The Netherlands Tel: +31 (0)26 7110329 Mob: +31 (0)6 28395109 Fax: +31 (0)318 611112 Email: sander at elexis.nl Web: http://www.elexis.nl From sander at elexis.nl Wed Apr 26 11:59:32 2006 From: sander at elexis.nl (Sander van Beek - Elexis) Date: Wed, 26 Apr 2006 13:59:32 +0200 Subject: [Linux-cluster] MySQL on GFS benchmarks Message-ID: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl> Hi all, We did a quick benchmark on our 2 node rhel4 testcluster with gfs and a gnbd storage server. The results were very sad. One of the nodes (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node GFS over GNBD setup and inserts on both nodes at the same time, we only could do 80 inserts per second. I'm very interested in the perfomance others got in a similar setup. Would the performance increase when we use software based iscsi instead of gnbd? Or should we simply buy SAN equipment? Does anyone have statistics to compare a standalone mysql setup to a small gfs cluster using a san? With best regards, Sander van Beek --------------------------------------- Ing. S. van Beek Elexis Marketing 9 6921 RE Duiven The Netherlands Tel: +31 (0)26 7110329 Mob: +31 (0)6 28395109 Fax: +31 (0)318 611112 Email: sander at elexis.nl Web: http://www.elexis.nl From pcaulfie at redhat.com Wed Apr 26 12:00:15 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 26 Apr 2006 13:00:15 +0100 Subject: [Linux-cluster] Which 2.6 kernel and cluster tarball will compile together? In-Reply-To: <7.0.1.0.0.20060426134654.00a2fec0@elexis.nl> References: <7.0.1.0.0.20060426134654.00a2fec0@elexis.nl> Message-ID: <444F60CF.3090409@redhat.com> Sander van Beek - Elexis wrote: > Hi all, > > I'm trying to get GFS working on Slackware 10.2 > Tried to compile the sources by hand, but I cannot find any working > kernel/cluster combo that will compile. > Tried 2.6.9 & cluster-1.02.00, 2.6.12.2 & cluster-1.02.00, the latest > 2.6.16 & cluster-1.02.00 and the latest cluster CVS release. But none of > these will compile together. Can anyone recommend me a working combination? > kernel 2.6.16 & cluster 1.02.00 (or CVS -rSTABLE) should compile. That's what I'm using here. -- patrick From marco.lusini at governo.it Wed Apr 26 12:07:11 2006 From: marco.lusini at governo.it (Marco Lusini) Date: Wed, 26 Apr 2006 14:07:11 +0200 Subject: R: [Linux-cluster] multicast howto In-Reply-To: <444F2399.6020809@redhat.com> Message-ID: <00d701c66929$f2bdf9f0$8ec9100a@nicchio> > > They should be the same for /all/ nodes, /and/ for the cman > entry. Anwyay, multi-home in CMAN isn't supported by the DLM > so you must only specify one multicast address and use > ethernet bonding to get multi-path. > Since I am not using GFS, but just CS4, is it safe to use to run heartbeat on multiple interfaces? 
TIA, Marco Lusini _______________________________________________________ Messaggio analizzato e protetto da tecnologia antivirus Servizio erogato dal sistema informativo della Presidenza del Consiglio dei Ministri From pcaulfie at redhat.com Wed Apr 26 12:36:09 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 26 Apr 2006 13:36:09 +0100 Subject: R: [Linux-cluster] multicast howto In-Reply-To: <00d701c66929$f2bdf9f0$8ec9100a@nicchio> References: <00d701c66929$f2bdf9f0$8ec9100a@nicchio> Message-ID: <444F6939.9000804@redhat.com> Marco Lusini wrote: >> They should be the same for /all/ nodes, /and/ for the cman >> entry. Anwyay, multi-home in CMAN isn't supported by the DLM >> so you must only specify one multicast address and use >> ethernet bonding to get multi-path. >> > > Since I am not using GFS, but just CS4, is it safe to use > to run heartbeat on multiple interfaces? It should be. Because of the DLM shortcomings it hasn't been tested for a while though. -- patrick From rajiv.vaidyanath at ccur.com Mon Apr 24 17:28:46 2006 From: rajiv.vaidyanath at ccur.com (Rajiv Vaidyanath) Date: Mon, 24 Apr 2006 13:28:46 -0400 Subject: [Linux-cluster] cluster suite / opteron Message-ID: <1145899726.16894.28.camel@mouse> Hi, I get some compilation warnings on opteron (cluster-1.02.00) Eg: -------------------------------------------- drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1595: warning: long unsigned int format, uint64_t arg (arg 2) drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1596: warning: long unsigned int format, uint64_t arg (arg 2) drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1598: warning: long unsigned int format, uint64_t arg (arg 2) drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1599: warning: long unsigned int format, uint64_t arg (arg 2) -------------------------------------------- Can I safely ignore these warnings ? Thanks, Rajiv From pauli at grey.colorado.edu Mon Apr 24 17:11:45 2006 From: pauli at grey.colorado.edu (Wolfgang Pauli) Date: Mon, 24 Apr 2006 11:11:45 -0600 Subject: [Linux-cluster] different subnets/ manual fencing In-Reply-To: <444C7D24.4090504@redhat.com> References: <30320.1145851164@www075.gmx.net> <444C7D24.4090504@redhat.com> Message-ID: <200604241111.45417.pauli@grey.colorado.edu> > Yes, but you'll need to configure it for multicas rather than broadcast - > and make sure that any intervening routers are good enough. That is good news. So we have the head node (dream) with two ethernet cards. We want it to serve a GFS partition to two different subnets. I guess this is than also doable with multicast, right? > I'd like to see those please. > Thanks! I attached a text file (myoops.txt). I hope it helps. I can't tell, because I never really had to deal with kernel panics before... Thanks again, wolfgang P.S.: Apr 23 16:15:10 node15 kernel: Linux version 2.6.15-1.1833_FC4smp (bhcompile at hs20-bc1-1.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 SMP Wed Mar 1 23:56:51 EST 2006 -------------- next part -------------- Apr 23 14:03:59 node15 ccsd[2367]: Starting ccsd 1.0.0: Apr 23 14:03:59 node15 ccsd[2367]: Built: Jun 16 2005 10:45:39 Apr 23 14:03:59 node15 ccsd[2367]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. Apr 23 14:04:00 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar 7 2006 15:36:41) installed Apr 23 14:04:00 node15 kernel: NET: Registered protocol family 30 Apr 23 14:04:00 node15 ccsd[2367]: cluster.conf (cluster name = oreilly_cluster, version = 33) found. 
Apr 23 14:04:03 node15 kernel: CMAN: Waiting to join or form a Linux-cluster Apr 23 14:04:03 node15 ccsd[2367]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2 Apr 23 14:04:03 node15 ccsd[2367]: Initial status:: Inquorate Apr 23 14:04:32 node15 kernel: CMAN: sending membership request Apr 23 14:04:53 node15 last message repeated 19 times Apr 23 14:04:54 node15 kernel: CMAN: got node node27 Apr 23 14:04:54 node15 kernel: CMAN: got node node17 Apr 23 14:04:54 node15 kernel: CMAN: got node node16 Apr 23 14:04:54 node15 kernel: CMAN: got node node24 Apr 23 14:04:54 node15 kernel: CMAN: got node node1 Apr 23 14:04:54 node15 kernel: CMAN: got node node2 Apr 23 14:04:54 node15 kernel: CMAN: got node node23 Apr 23 14:04:54 node15 kernel: CMAN: got node node6 Apr 23 14:04:54 node15 kernel: CMAN: got node node10 Apr 23 14:04:54 node15 kernel: CMAN: got node node9 Apr 23 14:05:01 node15 kernel: CMAN: quorum regained, resuming activity Apr 23 14:05:01 node15 ccsd[2367]: Cluster is quorate. Allowing connections. Apr 23 14:05:01 node15 kernel: dlm: no version for "struct_module" found: kernel tainted. Apr 23 14:05:01 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar 7 2006 15:42:37) installed Apr 23 14:07:28 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Apr 23 14:07:28 node15 kernel: printing eip: Apr 23 14:07:28 node15 kernel: f8acfa39 Apr 23 14:07:28 node15 kernel: *pde = 37d1f001 Apr 23 14:07:28 node15 kernel: Oops: 0000 [#1] Apr 23 14:07:28 node15 kernel: SMP Apr 23 14:07:28 node15 kernel: last sysfs file: /class/misc/dlm-control/dev Apr 23 14:07:28 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 nfs lockd nfs_acl sunrpc dm_mod eepro100 uhci_hcd hw_ra ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd Apr 23 14:07:28 node15 kernel: CPU: 0 Apr 23 14:07:28 node15 kernel: EIP: 0060:[] Tainted: GF VLI Apr 23 14:07:28 node15 kernel: EFLAGS: 00010207 (2.6.15-1.1833_FC4smp) Apr 23 14:07:28 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman] Apr 23 14:07:28 node15 kernel: eax: 00000046 ebx: c1d7aeb7 ecx: 00000011 edx: f68b0fa0 Apr 23 14:07:28 node15 kernel: esi: 00000000 edi: c1d7aeb7 ebp: 00000046 esp: f68b0ec8 Apr 23 14:07:28 node15 kernel: ds: 007b es: 007b ss: 0068 Apr 23 14:07:28 node15 kernel: Process cman_comms (pid: 2392, threadinfo=f68b0000 task=f7d85550) Apr 23 14:07:28 node15 kernel: Stack: badc0ded f6846380 00000000 f7386400 f68b0f74 f8acfbb3 00000100 00000002 Apr 23 14:07:28 node15 kernel: 00000040 f731e000 f7decb40 f731e001 00000001 00000001 f6cea440 f8ad002d Apr 23 14:07:28 node15 kernel: f68b0f90 00000001 000002fd c1b091e0 f68b0f90 f68b0f74 f7decb40 c1eb4940 Apr 23 14:07:28 node15 kernel: Call Trace: Apr 23 14:07:28 node15 kernel: [] send_to_user_port+0x159/0x3cc [cman] [] process_incoming_packet+0x207/0x26c [cman] Apr 23 14:07:29 node15 kernel: [] receive_message+0xb7/0xe0 [cman] [] cluster_kthread+0x18b/0x39f [cman] Apr 23 14:07:29 node15 kernel: [] default_wake_function+0x0/0xc [] cluster_kthread+0x0/0x39f [cman] Apr 23 14:07:29 node15 kernel: [] kernel_thread_helper+0x5/0xb Apr 23 14:07:29 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7 2 f8 89 df a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc Apr 23 14:07:29 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. 
^MContinui ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111 seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds. Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing in 100 seconds. ^M<4>hda: dma_timer_expiry: dma status == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing i n 95 seconds. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. hda: DMA interrupt recovery Apr 23 14:07:29 node15 kernel: hda: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing i n 85 seconds. ^MContinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 80 seconds. Continuing in 79 seconds. ^MContinuing in 78 seconds. ^MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MContinuing in 7 4 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^MContinuing in 69 seconds. ^M<4 >hda: dma_timer_expiry: dma status == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing i n 64 seconds. ^MContinuing in 63 seconds. ^MContinuing in 62 seconds. ^MContinuing in 61 seconds. ^M<6>ide-cd: cmd 0x3 timed out Apr 23 14:07:29 node15 kernel: hdc: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 60 seconds. ^MContinuing in 59 seconds. ^Mhda: DMA interrupt recovery Apr 23 14:07:29 node15 kernel: hda: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 58 seconds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. ^MContinuing in 55 seconds. ^MContinuing i n 54 seconds. ^MContinuing in 53 seconds. ^MContinuing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds. Continuing in 48 seconds. ^MContinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 4 3 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^M<4>hda: dma_timer_expiry: dma status == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 38 seconds. ^MContinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing i n 34 seconds. ^MContinuing in 33 seconds. ^MContinuing in 32 seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds. hda: DMA interrupt recovery Apr 23 14:07:29 node15 kernel: hda: lost interrupt Apr 23 14:07:29 node15 kernel: Continuing in 28 seconds. ^MContinuing in 27 seconds. ^MContinuing in 26 seconds. ^MContinuing in 25 seconds. ^MContinuing i n 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 seconds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds. Continuing in 18 seconds. ^MContinuing in 17 seconds. ^MContinuing in 16 seconds. ^MContinuing in 15 seconds. 
^MContinuing in 14 seconds. ^MContinuing in 1 3 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds. ^MContinuing in 10 seconds. ^MContinuing in 9 seconds. ^M<4>hda: dma_timer_expiry: dma s tatus == 0x24 Apr 23 14:07:29 node15 kernel: Continuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 seconds. ^MContinuing in 5 seconds. ^MContinuing in 4 seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds. Apr 23 14:07:29 node15 kernel: <0>Fatal exception: panic in 5 seconds # ----------------------------------------------- Apr 23 16:07:04 node15 ccsd[2373]: Starting ccsd 1.0.0: Apr 23 16:07:04 node15 ccsd[2373]: Built: Jun 16 2005 10:45:39 Apr 23 16:07:04 node15 ccsd[2373]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. Apr 23 16:07:05 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar 7 2006 15:36:41) installed Apr 23 16:07:05 node15 kernel: NET: Registered protocol family 30 Apr 23 16:07:05 node15 ccsd[2373]: cluster.conf (cluster name = oreilly_cluster, version = 33) found. Apr 23 16:07:14 node15 kernel: CMAN: Waiting to join or form a Linux-cluster Apr 23 16:07:14 node15 ccsd[2373]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2 Apr 23 16:07:14 node15 ccsd[2373]: Initial status:: Inquorate Apr 23 16:07:17 node15 kernel: CMAN: sending membership request Apr 23 16:07:37 node15 last message repeated 27 times Apr 23 16:07:38 node15 kernel: CMAN: got node node2 Apr 23 16:07:38 node15 kernel: CMAN: got node node26 Apr 23 16:07:38 node15 kernel: CMAN: got node node6 Apr 23 16:07:38 node15 kernel: CMAN: got node node27 Apr 23 16:07:38 node15 kernel: CMAN: got node node4 Apr 23 16:07:38 node15 kernel: CMAN: got node node5 Apr 23 16:07:38 node15 kernel: CMAN: got node node17 Apr 23 16:07:38 node15 kernel: CMAN: got node node3 Apr 23 16:07:38 node15 kernel: CMAN: got node node18 Apr 23 16:07:38 node15 kernel: CMAN: got node node16 Apr 23 16:07:38 node15 kernel: CMAN: got node node23 Apr 23 16:07:38 node15 kernel: CMAN: got node node12 Apr 23 16:07:38 node15 kernel: CMAN: got node node7 Apr 23 16:07:38 node15 ccsd[2373]: Cluster is quorate. Allowing connections. Apr 23 16:07:38 node15 kernel: CMAN: got node dream Apr 23 16:07:38 node15 kernel: CMAN: got node node20 Apr 23 16:07:38 node15 kernel: CMAN: quorum regained, resuming activity Apr 23 16:07:38 node15 kernel: dlm: no version for "struct_module" found: kernel tainted. 
Apr 23 16:07:38 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar 7 2006 15:42:37) installed Apr 23 16:10:06 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Apr 23 16:10:06 node15 kernel: printing eip: Apr 23 16:10:06 node15 kernel: f8a85a39 Apr 23 16:10:06 node15 kernel: *pde = 363b4001 Apr 23 16:10:06 node15 kernel: Oops: 0000 [#1] Apr 23 16:10:06 node15 kernel: SMP Apr 23 16:10:06 node15 kernel: last sysfs file: /class/misc/dlm-control/dev Apr 23 16:10:06 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_random i8xx_tco i2c_ i801 i2c_core e1000 e100 mii floppy ext3 jbd Apr 23 16:10:06 node15 kernel: CPU: 0 Apr 23 16:10:06 node15 kernel: EIP: 0060:[] Tainted: GF VLI Apr 23 16:10:06 node15 kernel: EFLAGS: 00010202 (2.6.15-1.1833_FC4smp) Apr 23 16:10:06 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman] Apr 23 16:10:06 node15 kernel: eax: 00000040 ebx: c1db9eba ecx: 00000010 edx: f6344fa0 Apr 23 16:10:06 node15 kernel: esi: 00000000 edi: c1db9eba ebp: 00000040 esp: f6344ec8 Apr 23 16:10:06 node15 kernel: ds: 007b es: 007b ss: 0068 Apr 23 16:10:06 node15 kernel: Process cman_comms (pid: 2399, threadinfo=f6344000 task=c1e73aa0) Apr 23 16:10:06 node15 kernel: Stack: badc0ded f63d5a80 00000000 f6000a00 f6344f74 f8a85bb3 00000100 00000002 Apr 23 16:10:06 node15 kernel: 00000040 f66a9800 f6319a40 f66a9801 00000001 00000001 f6383cc0 f8a8602d Apr 23 16:10:06 node15 kernel: f6344f90 00000001 000002fa c1b091e0 f6344f90 f6344f74 f6319a40 f7dc1b80 Apr 23 16:10:06 node15 kernel: Call Trace: Apr 23 16:10:06 node15 kernel: [] send_to_user_port+0x159/0x3cc [cman] [] process_incoming_packet+0x207/0x26c [cman] Apr 23 16:10:06 node15 kernel: [] receive_message+0xb7/0xe0 [cman] [] cluster_kthread+0x18b/0x39f [cman] Apr 23 16:10:06 node15 kernel: [] default_wake_function+0x0/0xc [] cluster_kthread+0x0/0x39f [cman] Apr 23 16:10:06 node15 kernel: [] kernel_thread_helper+0x5/0xb Apr 23 16:10:06 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7 2 f8 89 df a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc Apr 23 16:10:06 node15 kernel: <0>Fatal exception: panic in 5 seconds # ----------------------------------------------- Apr 23 18:05:33 node15 ccsd[3356]: Starting ccsd 1.0.0: Apr 23 18:05:33 node15 ccsd[3356]: Built: Jun 16 2005 10:45:39 Apr 23 18:05:33 node15 ccsd[3356]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. Apr 23 18:05:34 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar 7 2006 15:36:41) installed Apr 23 18:05:34 node15 kernel: NET: Registered protocol family 30 Apr 23 18:05:34 node15 ccsd[3356]: cluster.conf (cluster name = oreilly_cluster, version = 35) found. Apr 23 18:05:35 node15 kernel: CMAN: Waiting to join or form a Linux-cluster Apr 23 18:05:36 node15 ccsd[3356]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2 Apr 23 18:05:36 node15 ccsd[3356]: Initial status:: Inquorate Apr 23 18:05:36 node15 kernel: CMAN: sending membership request Apr 23 18:05:36 node15 kernel: CMAN: got node dream Apr 23 18:06:13 node15 kernel: CMAN: quorum regained, resuming activity Apr 23 18:06:13 node15 ccsd[3356]: Cluster is quorate. Allowing connections. Apr 23 18:06:13 node15 kernel: dlm: no version for "struct_module" found: kernel tainted. 
Apr 23 18:06:13 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar 7 2006 15:42:37) installed Apr 23 18:06:23 node15 kernel: CMAN: node node1 rejoining Apr 23 18:06:28 node15 last message repeated 3 times Apr 23 18:06:32 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Apr 23 18:06:32 node15 kernel: printing eip: Apr 23 18:06:32 node15 kernel: f8a85a39 Apr 23 18:06:32 node15 kernel: *pde = 37e89001 Apr 23 18:06:32 node15 kernel: Oops: 0000 [#1] Apr 23 18:06:32 node15 kernel: SMP Apr 23 18:06:32 node15 kernel: last sysfs file: /class/misc/dlm-control/dev Apr 23 18:06:32 node15 kernel: Modules linked in: dlm(U) cman(U) nfs lockd nfs_acl ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_ra ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd Apr 23 18:06:32 node15 kernel: CPU: 0 Apr 23 18:06:32 node15 kernel: EIP: 0060:[] Tainted: GF VLI Apr 23 18:06:32 node15 kernel: EFLAGS: 00010203 (2.6.15-1.1833_FC4smp) Apr 23 18:06:32 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman] Apr 23 18:06:32 node15 kernel: eax: 00000042 ebx: c1ffcab9 ecx: 00000010 edx: f5c22fa0 Apr 23 18:06:32 node15 kernel: esi: 00000000 edi: c1ffcab9 ebp: 00000042 esp: f5c22ec8 Apr 23 18:06:32 node15 kernel: ds: 007b es: 007b ss: 0068 Apr 23 18:06:32 node15 kernel: Process cman_comms (pid: 3387, threadinfo=f5c22000 task=c1e50000) Apr 23 18:06:33 node15 kernel: Stack: badc0ded f7fd6d80 00000000 f6462000 f5c22f74 f8a85bb3 00000100 00000002 Apr 23 18:06:33 node15 kernel: 00000040 f66b1000 f5caf9c0 f66b1001 00000001 00000001 f5caf840 f8a8602d Apr 23 18:06:33 node15 kernel: f5c22f90 00000001 000002fb c1b091e0 f5c22f90 f5c22f74 f5caf9c0 c1e6d100 Apr 23 18:06:33 node15 kernel: Call Trace: Apr 23 18:06:33 node15 kernel: [] send_to_user_port+0x159/0x3cc [cman] [] process_incoming_packet+0x207/0x26c [cman] Apr 23 18:06:33 node15 kernel: [] receive_message+0xb7/0xe0 [cman] [] cluster_kthread+0x18b/0x39f [cman] Apr 23 18:06:33 node15 kernel: [] default_wake_function+0x0/0xc [] cluster_kthread+0x0/0x39f [cman] Apr 23 18:06:33 node15 kernel: [] kernel_thread_helper+0x5/0xb Apr 23 18:06:33 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7 2 f8 89 df a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc Apr 23 18:06:33 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. ^MContinui ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111 seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds. Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds . ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^MContinuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo Apr 23 18:06:33 node15 kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. 
^MContinuing in 81 seconds. ^MContinuing in 80 seconds. [the same one-per-second console countdown, partly duplicated by syslog, continues] ^MContinuing in 2 seconds. ^MContinuing in 1 seconds. 
Apr 23 18:06:33 node15 kernel: <0>Fatal exception: panic in 5 seconds From darrenf at jammicron.com Tue Apr 25 18:02:51 2006 From: darrenf at jammicron.com (Darren Fraser) Date: Tue, 25 Apr 2006 11:02:51 -0700 Subject: [Linux-cluster] Re: Linux (qmail) clustering References: slrne4mn9e.ipo.mykleb@99RXZYP.ibm.com Message-ID: <444E644B.3010901@jammicron.com> If this is a high volume mail server, GFS and qmail are not going to work nicely together (at least they didn't in my experience). I had a qmail server running on a two node cluster with about 300 virtual domains. Load on each node would spiral out of control until I dropped one of the machines out of the cluster. I've had success with GFS and other services (i.e. ftp and web) but just not with qmail. After googling around some, it appears to be the "NFS safeness" in how qmail delivers mail (see http://www.redhat.com/archives/linux-cluster/2005-September/msg00220.html) that ruins performance on GFS. If this diagnosis is incorrect I'd love to be straightened out because my plan for a load balanced, fault tolerant qmail server had to be scrapped a couple of months back. Cheers, Darren On 2006-04-23 10:55, Jan-Frode Myklebust wrote: > On 2006-04-11, Haydar Akpinar wrote: > > > > I would like to know if it is possible to do and also if any one has done > > qmail clustering on a Linux box. > > Since qmail is Maildir based (no locking problems to worry about), I think > this should be fairly easy to do. You'll just need to decide which > directories needs to be shared, and which needs to be private to each node. > It will probably be enough to have the home directories on a shared storage > (GFS or simply just NFS), and just do load balancing by equal MX record > priorities. > > > > -- > Linux-cluster mailing list > Linux-cluster@??? > https://www.redhat.com/mailman/listinfo/linux-cluster > From sdake at redhat.com Tue Apr 25 18:42:29 2006 From: sdake at redhat.com (Steven Dake) Date: Tue, 25 Apr 2006 11:42:29 -0700 Subject: [Linux-cluster] multicast howto In-Reply-To: <200604251238.41472.ookami@gmx.de> References: <200604251238.41472.ookami@gmx.de> Message-ID: <1145990549.6075.119.camel@shih.broked.org> On Tue, 2006-04-25 at 12:38 -0600, Wolfgang Pauli wrote: > Hi, > > I am trying to setup gfs on a cluster that spans over two subnets. dream is a > node with to interefaces, one on each subnet. I thought the below setup > should work (taken from http://gfs.wikidev.net/Installation ). But it does > not. Can anybody tell me what is wrong with that? > > cheers, > > wolfgang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Wolfgang Do not use the multicast address 224.0.0.1. It is reserved for some various ipv4 operations. Try using 225.0.0.9. If you have a switch between the two subnets, I would expect RHCS to work. If you have a router, I'd expect it not to work as the TTL must be set for multicast packets to hop across routers. For IPV6 the hop count must be set. It appears you are using ipv4. If you have a switch and it doesn't work, try turning off IGMP filtering in the switch +if it is a smart switch. If it is a dumb switch it should just work with some additional latencies. 
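A quick way to tell whether the cluster traffic is actually making it across the subnet boundary is to watch for it on a node in each subnet. A rough sketch, assuming eth0 is the cluster interface, the 225.0.0.9 group suggested above, and the cman port of 6809/udp mentioned elsewhere in this thread (adjust all three to your setup):

    # watch for cman traffic arriving from the other subnet
    tcpdump -i eth0 -n host 225.0.0.9 and udp port 6809

    # make sure local firewall rules are not eating it
    iptables -I INPUT -p udp --dport 6809 -j ACCEPT
    iptables -I INPUT -d 225.0.0.9 -j ACCEPT

    # confirm the kernel has actually joined the group on that interface
    ip maddr show dev eth0

If both nodes show the group in ip maddr but tcpdump only ever sees locally generated packets, the switch or router in between is the likely culprit.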
Regards -steve > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From pcaulfie at redhat.com Wed Apr 26 13:17:17 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 26 Apr 2006 14:17:17 +0100 Subject: [Linux-cluster] multicast howto In-Reply-To: <1145990549.6075.119.camel@shih.broked.org> References: <200604251238.41472.ookami@gmx.de> <1145990549.6075.119.camel@shih.broked.org> Message-ID: <444F72DD.3000607@redhat.com> Steven Dake wrote: > On Tue, 2006-04-25 at 12:38 -0600, Wolfgang Pauli wrote: >> Hi, >> >> I am trying to setup gfs on a cluster that spans over two subnets. dream is a >> node with to interefaces, one on each subnet. I thought the below setup >> should work (taken from http://gfs.wikidev.net/Installation ). But it does >> not. Can anybody tell me what is wrong with that? >> >> cheers, >> >> wolfgang >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > > Wolfgang > Do not use the multicast address 224.0.0.1. It is reserved for some > various ipv4 operations. > > Try using 225.0.0.9. If you have a switch between the two subnets, I > would expect RHCS to work. If you have a router, I'd expect it not to > work as the TTL must be set for multicast packets to hop across routers. > For IPV6 the hop count must be set. It appears you are using ipv4. > > If you have a switch and it doesn't work, try turning off IGMP filtering > in the switch +if it is a smart switch. If it is a dumb switch it > should just work with some additional latencies. Good advice. I've fixed the Wiki page, so it reflects reality a little more. I don't know where that came from but it was confusing. -- patrick From lhh at redhat.com Wed Apr 26 21:36:22 2006 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 26 Apr 2006 17:36:22 -0400 Subject: [Linux-cluster] Re: Meaning of "service" In-Reply-To: References: <1145718821.3302.20.camel@auh5-0479.corp.jabil.org> Message-ID: <1146087382.2984.116.camel@ayanami.boston.redhat.com> On Sun, 2006-04-23 at 13:04 +0200, Troels Arvin wrote: > Hello, > > On Sat, 22 Apr 2006 11:13:41 -0400, Eric Kerin wrote: > >> Should I set this up as > >> a) one Cluster Service, > >> b) as three different Cluster Services? > >> > > I have a very similar setup for my cluster. I recommend option b. > > I ended up doing option a, because I couldn't get the other option > working, for some strange reason. > > By the way: The manual is rather unclear about the difference between > _adding_ a resource, and _attaching_ a resource. Can someone explain the > difference? It's like making a table leg. Just because you have a table leg doesn't mean you have to build a table; you could just have this leg sitting around doing nothing until you decide to use it later. Attach enough pieces together and you can make a table. ;) Unattached (but present) resources are not started by the cluster. Creating "global" resources separate from a service was primarily designed to allow for reuse of resources in some cases. E.g. GFS file systems, clients for cluster NFS services: create "Joe's Desktop" as an NFS client resource, and you can attach it to multiple NFS servers in the cluster. All instances get the same export options. Hmmm... 
I don't think this plays well in to my table-leg example, because it's really hard to share table legs between multiple tables which are in different rooms; I think you'd have to have to introduce a metaphysical redefinition of the world in order for it to work in which the table legs have built-in infinite improbability drives, but I think you get the idea. ;) -- Lon From pauli at grey.colorado.edu Thu Apr 27 00:45:35 2006 From: pauli at grey.colorado.edu (Wolfgang Pauli) Date: Wed, 26 Apr 2006 18:45:35 -0600 Subject: [Linux-cluster] multicast howto In-Reply-To: <1145990549.6075.119.camel@shih.broked.org> References: <200604251238.41472.ookami@gmx.de> <1145990549.6075.119.camel@shih.broked.org> Message-ID: <200604261845.35577.pauli@grey.colorado.edu> Thanks! First, I had to figure out how multicast works. Still don't fully understand it. I can ping 224.0.0.1 and I get responses from all hosts in the same subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't know whether it has to. I have changed the cluster.conf to have only two nodes. Just to get the basic understanding. I have node dream on subnet 210 and neo on 223. They still form their own clusters. Should I just try different addresses, or do the switches/routers have to be programmed for that? regards, wolfgang From ookami at gmx.de Thu Apr 27 00:47:29 2006 From: ookami at gmx.de (Wolfgang Pauli) Date: Wed, 26 Apr 2006 18:47:29 -0600 Subject: [Linux-cluster] multicast howto In-Reply-To: <1145990549.6075.119.camel@shih.broked.org> References: <200604251238.41472.ookami@gmx.de> <1145990549.6075.119.camel@shih.broked.org> Message-ID: <200604261847.29094.ookami@gmx.de> Thanks! First, I had to figure out how multicast works. Still don't fully understand it. I can ping 224.0.0.1 and I get responses from all hosts in the same subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't know whether it has to. I have changed the cluster.conf to have only two nodes. Just to get the basic understanding. I have node dream on subnet 210 and neo on 223. They still form their own clusters. Should I just try different addresses, or do the switches/routers have to be programmed for that? regards, wolfgang From jason at monsterjam.org Thu Apr 27 02:14:22 2006 From: jason at monsterjam.org (Jason) Date: Wed, 26 Apr 2006 22:14:22 -0400 Subject: [Linux-cluster] fencing? Message-ID: <20060427021422.GC37759@monsterjam.org> ok, so I have GFS installed and have the modules loaded, seems the next step is to set up "fencing" and looking at the list of agents at, http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-fence-methods.html im kinda confused. I thought that the idea behind fencing was to allow one node (i have 2) to take down/reboot the other node after it has detected some sort of fault in the other node. so I understand APC Network Power Switch and WTI Network Power Switch but since I dont have any of those installed, Im going down the list and see GNBD and xCAT as options. I dont understand what these software packages are supposed to be doing in relation to being a fencing agent. Can I use one of these options reliably without having a hardware power switch? I guess in short, the docs I have read dont quite explain how fencing is supposed to work with GFS. 
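To put it briefly: fencing is the cluster's guarantee that a node which has stopped responding can no longer touch the shared storage (by power cycling it, or by cutting its path to the disks) before GFS replays that node's journal; without it, a half-dead node could keep writing and corrupt the filesystem. As I understand it, the GNBD agent works on the storage-path side: rather than powering the node off, it tells the GNBD server to stop serving I/O from that node, which is why it only makes sense when your storage is exported over GNBD. If you have neither a power switch nor a fencable storage path, the usual stop-gap is manual fencing, where fenced simply waits for a human to confirm the node is really down. A rough sketch of that flow, assuming the fence_manual/fence_ack_manual agents shipped with the fence package (option names can differ between releases, so check the man pages), and definitely not something to rely on unattended in production:

    # on a surviving node, fenced will log something like:
    #   fenced[<pid>]: fencing node "node2"
    # and GFS activity stays frozen until the fence completes.

    # 1. physically verify node2 is powered off (or cut off from the storage)
    # 2. then acknowledge the fence so recovery can proceed:
    fence_ack_manual -n node2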
regards, Jason From pcaulfie at redhat.com Thu Apr 27 07:29:08 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 27 Apr 2006 08:29:08 +0100 Subject: [Linux-cluster] multicast howto In-Reply-To: <200604261845.35577.pauli@grey.colorado.edu> References: <200604251238.41472.ookami@gmx.de> <1145990549.6075.119.camel@shih.broked.org> <200604261845.35577.pauli@grey.colorado.edu> Message-ID: <445072C4.60208@redhat.com> Wolfgang Pauli wrote: > Thanks! > > First, I had to figure out how multicast works. Still don't fully understand > it. I can ping 224.0.0.1 and I get responses from all hosts in the same > subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't > know whether it has to. > I have changed the cluster.conf to have only two nodes. Just to get the basic > understanding. I have node dream on subnet 210 and neo on 223. They still > form their own clusters. Should I just try different addresses, or do the > switches/routers have to be programmed for that? That cluster.conf file looks a lot more sensible. If the nodes are still not seeing each other then you may have to fiddle with the routers to make sure that the multicast traffic is being passed. tcpdump will tell you whether the traffic is moving between subnets. It's also worth checking that there aren't any iptables rules preventing traffic from the cluster port (6809/udp) or the multicast address reaching the cluster manager. > regards, > > wolfgang > > > > > > > > > > nodename="dream"/> > > > > > > > > > > > > > > > > > > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- patrick From jerome.castang at adelpha-lan.org Thu Apr 27 08:08:08 2006 From: jerome.castang at adelpha-lan.org (Castang Jerome) Date: Thu, 27 Apr 2006 10:08:08 +0200 Subject: [Linux-cluster] iSCSI fence agent Message-ID: <44507BE8.20402@adelpha-lan.org> Hi, I found on the RC-list an email (sent in october 2004) about a script witch is a iscsi fence agent Here is the mail: http://www.redhat.com/archives/linux-cluster/2004-October/msg00105.html When I try to start this script, I get this error: "Could not start /usr/bin/ssh root at gfs5 No file or directory". But "/usr/bin/ssh" does exist and node gfs5 is running. Any idea on the problem? -- Jerome Castang mail: jcastang at adelpha-lan.org From cjk at techma.com Thu Apr 27 11:52:37 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Thu, 27 Apr 2006 07:52:37 -0400 Subject: [Linux-cluster] multicast howto Message-ID: Wolfgang, you can't arbitrarily ping multicast addresses as they don't really exist in the sense that an ethernet interface exists as a card. Multicast is a "subscription" based concept. You have to be listening for multicast traffic to receive it at all. To "listen", one "joins" a multicast group (bind to a mcast ip address) and the router, if configured properly will route mcast traffic to your interface. There are certain mcast addresses you should never use explicitly. As mentioned, 224.0.0.1 is one of them, there are others but I can't recall what they are. Your other config didn't work (with respet to multicast) because all of your nodes were listening to different multicast "groups" which is like trying to dial in on a party line, but everone using the wrong phone number. Important to reitterate, Multicast has to be configured by whoever runs your router(s). 
If Multicast is not enabled on a switch, even if it is on your network, it turns in to broadcast on that switch, which defeats the purpose of multicast in the first place. Hope this clears some multicast stuff up for you. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wolfgang Pauli Sent: Wednesday, April 26, 2006 8:46 PM To: sdake at redhat.com; linux clustering Subject: Re: [Linux-cluster] multicast howto Thanks! First, I had to figure out how multicast works. Still don't fully understand it. I can ping 224.0.0.1 and I get responses from all hosts in the same subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't know whether it has to. I have changed the cluster.conf to have only two nodes. Just to get the basic understanding. I have node dream on subnet 210 and neo on 223. They still form their own clusters. Should I just try different addresses, or do the switches/routers have to be programmed for that? regards, wolfgang -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lhh at redhat.com Thu Apr 27 13:36:31 2006 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 27 Apr 2006 09:36:31 -0400 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <44507BE8.20402@adelpha-lan.org> References: <44507BE8.20402@adelpha-lan.org> Message-ID: <1146144991.2984.127.camel@ayanami.boston.redhat.com> On Thu, 2006-04-27 at 10:08 +0200, Castang Jerome wrote: > Hi, > > I found on the RC-list an email (sent in october 2004) about a script > witch is a iscsi fence agent > Here is the mail: > > http://www.redhat.com/archives/linux-cluster/2004-October/msg00105.html > > When I try to start this script, I get this error: > "Could not start /usr/bin/ssh root at gfs5 No file or directory". > > But "/usr/bin/ssh" does exist and node gfs5 is running. It's probably trying to exec: /usr/bin/ssh\ root at gfs5 <-- one filename vs /usr/bin/ssh root at gfs5 for some reason; wrong quotation on the system / exec call(s) ? -- Lon From jerome.castang at adelpha-lan.org Thu Apr 27 13:43:21 2006 From: jerome.castang at adelpha-lan.org (Castang Jerome) Date: Thu, 27 Apr 2006 15:43:21 +0200 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <1146144991.2984.127.camel@ayanami.boston.redhat.com> References: <44507BE8.20402@adelpha-lan.org> <1146144991.2984.127.camel@ayanami.boston.redhat.com> Message-ID: <4450CA79.9020400@adelpha-lan.org> Lon Hohberger a ?crit : > >It's probably trying to exec: > > /usr/bin/ssh\ root at gfs5 <-- one filename > >vs > /usr/bin/ssh root at gfs5 > >for some reason; wrong quotation on the system / exec call(s) ? > >-- Lon > > > It's ok I found the probleme, I replaced the function "runcommand" by "system" and it works perfectly. Here is the modified perl script: /#!/usr/bin/perl ############################################################################### ############################################################################### ## ## Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. ## Copyright (C) 2004 Red Hat, Inc. All rights reserved. ## ## This copyrighted material is made available to anyone wishing to use, ## modify, copy, or redistribute it subject to the terms and conditions ## of the GNU General Public License v.2. 
## ############################################################################### ############################################################################### use Getopt::Std; # Get the program name from $0 and strip directory names $_=$0; s/.*\///; my $pname = $_; $opt_o = 'disable'; # Default fence action # WARNING!! Do not add code bewteen "#BEGIN_VERSION_GENERATION" and # "#END_VERSION_GENERATION" It is generated by the Makefile #BEGIN_VERSION_GENERATION $FENCE_RELEASE_NAME=""; $REDHAT_COPYRIGHT=""; $BUILD_DATE=""; #END_VERSION_GENERATION sub usage { print "Usage:\n"; print "\n"; print "$pname [options]\n"; print "\n"; print "Options:\n"; print " -a ISCSI target address\n"; print " -h usage\n"; # print " -l Login name\n"; print " -n IP of node to disable\n"; print " -o Action: disable (default) or enable\n"; # print " -p Password for login (not used)\n"; print " -q quiet mode\n"; print " -V version\n"; exit 0; } sub fail { ($msg) = @_; print $msg."\n" unless defined $opt_q; $t->close if defined $t; exit 1; } sub fail_usage { ($msg)= _; print STDERR $msg."\n" if $msg; print STDERR "Please use '-h' for usage.\n"; exit 1; } sub version { print "$pname $FENCE_RELEASE_NAME $BUILD_DATE\n"; print "$REDHAT_COPYRIGHT\n" if ( $REDHAT_COPYRIGHT ); exit 0; } if (@ARGV > 0) { #getopts("a:hl:n:o:p:qV") || fail_usage ; getopts("a:hn:o:qV") || fail_usage ; usage if defined $opt_h; version if defined $opt_V; fail_usage "Unknown parameter." if (@ARGV > 0); fail_usage "No '-a' flag specified." unless defined $opt_a; fail_usage "No '-n' flag specified." unless defined $opt_n; fail_usage "Unrecognised action '$opt_o' for '-o' flag" unless $opt_o =~ /^(disable|enable)$/i; } else { get_options_stdin(); fail "failed: no IP address" unless defined $opt_a; fail "failed: no plug number" unless defined $opt_n; #fail "failed: no login name" unless defined $opt_l; #fail "failed: no password" unless defined $opt_p; fail "failed: unrecognised action: $opt_o" unless $opt_o =~ /^(disable|enable)$/i; } # # Set up and log in # my $target_address=$opt_a; #The address of the iSCSI target my $command=$opt_o; #either enable or disable my $node=$opt_n; #the cluster member to lock out #use ssh to log into remote host and send over iptables commands: # iptables -D INPUT -s a.b.c.d -p all -j REJECT # iptables -A INPUT -s a.b.c.d -p all -j REJECT if ($command eq "enable") { #Enable $node on $target_address system("ssh ".' root@'.$target_address." /sbin/iptables -D INPUT -s " . $node . " -p all -j REJECT"); if ($out != 0) { fail "111Could not $command $node on $target_address\n$cmd\n"; } } elsif ($command eq "disable") { #Disable $node on $target_address system("ssh ".' root@'.$target_address." /sbin/iptables -A INPUT -s " . $node . " -p all -j REJECT"); if ($? != 0 ) { fail "Could not $command $node on $target_address\n$cmd\n"; } } else { #This should never happen: fail "Unknown command: $command\n"; } print "success: $command $node\n" unless defined $opt_q; exit 0; sub get_options_stdin { my $opt; my $line = 0; while( defined($in = <>) ) { $_ = $in; chomp; # strip leading and trailing whitespace s/^\s*//; s/\s*$//; # skip comments next if /^#/; $line+=1; $opt=$_; next unless $opt; ($name,$val)=split /\s*=\s*/, $opt; if ( $name eq "" ) { print STDERR "parse error: illegal name in option $line\n"; exit 2; } # DO NOTHING -- this field is used by fenced elsif ($name eq "agent" ) { } # FIXME -- depricated. use "port" instead. 
elsif ($name eq "fm" ) { (my $dummy,$opt_n) = split /\s+/,$val; print STDERR "Depricated \"fm\" entry detected. refer to man page.\n"; } elsif ($name eq "ipaddr" ) { $opt_a = $val; } elsif ($name eq "login" ) { $opt_l = $val; } # FIXME -- depreicated residue of old fencing system elsif ($name eq "name" ) { } elsif ($name eq "option" ) { $opt_o = $val; } elsif ($name eq "passwd" ) { $opt_p = $val; } elsif ($name eq "port" ) { $opt_n = $val; } # elsif ($name eq "test" ) # { # $opt_T = $val; # } # FIXME should we do more error checking? # Excess name/vals will be eaten for now else { fail "parse error: unknown option \"$opt\""; } } }/ thanks, -- Jerome Castang mail: jcastang at adelpha-lan.org From mbrookov at mines.edu Thu Apr 27 14:17:16 2006 From: mbrookov at mines.edu (Matthew B. Brookover) Date: Thu, 27 Apr 2006 08:17:16 -0600 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <4450CA79.9020400@adelpha-lan.org> References: <44507BE8.20402@adelpha-lan.org> <1146144991.2984.127.camel@ayanami.boston.redhat.com> <4450CA79.9020400@adelpha-lan.org> Message-ID: <1146147436.12841.13.camel@merlin.Mines.EDU> I have not used this tool in a while, but it did work on my system. I would not trust this version to fence properly. Using system does not allow the exit status of iptables to be checked for errors. System only reports the status of the ssh command, not the command that is called on the remote host. Matt On Thu, 2006-04-27 at 15:43 +0200, Castang Jerome wrote: > Lon Hohberger a ?crit : > > > > >It's probably trying to exec: > > > > /usr/bin/ssh\ root at gfs5 <-- one filename > > > >vs > > /usr/bin/ssh root at gfs5 > > > >for some reason; wrong quotation on the system / exec call(s) ? > > > >-- Lon > > > > > > > > > It's ok I found the probleme, > I replaced the function "runcommand" by "system" and it works perfectly. > Here is the modified perl script: > > /#!/usr/bin/perl > > ############################################################################### > ############################################################################### > ## > ## Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. > ## Copyright (C) 2004 Red Hat, Inc. All rights reserved. > ## > ## This copyrighted material is made available to anyone wishing to use, > ## modify, copy, or redistribute it subject to the terms and conditions > ## of the GNU General Public License v.2. > ## > ############################################################################### > ############################################################################### > > use Getopt::Std; > > # Get the program name from $0 and strip directory names > $_=$0; > s/.*\///; > my $pname = $_; > > $opt_o = 'disable'; # Default fence action > > # WARNING!! 
Do not add code bewteen "#BEGIN_VERSION_GENERATION" and > # "#END_VERSION_GENERATION" It is generated by the Makefile > > #BEGIN_VERSION_GENERATION > $FENCE_RELEASE_NAME=""; > $REDHAT_COPYRIGHT=""; > $BUILD_DATE=""; > #END_VERSION_GENERATION > > sub usage > { > print "Usage:\n"; > print "\n"; > print "$pname [options]\n"; > print "\n"; > print "Options:\n"; > print " -a ISCSI target address\n"; > print " -h usage\n"; > # print " -l Login name\n"; > print " -n IP of node to disable\n"; > print " -o Action: disable (default) or enable\n"; > # print " -p Password for login (not used)\n"; > print " -q quiet mode\n"; > print " -V version\n"; > > exit 0; > } > > sub fail > { > ($msg) = @_; > print $msg."\n" unless defined $opt_q; > $t->close if defined $t; > exit 1; > } > > sub fail_usage > { > ($msg)= _; > print STDERR $msg."\n" if $msg; > print STDERR "Please use '-h' for usage.\n"; > exit 1; > } > > sub version > { > print "$pname $FENCE_RELEASE_NAME $BUILD_DATE\n"; > print "$REDHAT_COPYRIGHT\n" if ( $REDHAT_COPYRIGHT ); > > exit 0; > } > > if (@ARGV > 0) > { > #getopts("a:hl:n:o:p:qV") || fail_usage ; > getopts("a:hn:o:qV") || fail_usage ; > > usage if defined $opt_h; > version if defined $opt_V; > > fail_usage "Unknown parameter." if (@ARGV > 0); > > fail_usage "No '-a' flag specified." unless defined $opt_a; > fail_usage "No '-n' flag specified." unless defined $opt_n; > fail_usage "Unrecognised action '$opt_o' for '-o' flag" > unless $opt_o =~ /^(disable|enable)$/i; > > } > else > { > get_options_stdin(); > > fail "failed: no IP address" unless defined $opt_a; > fail "failed: no plug number" unless defined $opt_n; > #fail "failed: no login name" unless defined $opt_l; > #fail "failed: no password" unless defined $opt_p; > fail "failed: unrecognised action: $opt_o" > unless $opt_o =~ /^(disable|enable)$/i; > } > > # > # Set up and log in > # > > my $target_address=$opt_a; #The address of the iSCSI target > my $command=$opt_o; #either enable or disable > my $node=$opt_n; #the cluster member to lock out > > #use ssh to log into remote host and send over iptables commands: > > # iptables -D INPUT -s a.b.c.d -p all -j REJECT > # iptables -A INPUT -s a.b.c.d -p all -j REJECT > > if ($command eq "enable") > { #Enable $node on $target_address > > system("ssh ".' root@'.$target_address." /sbin/iptables -D INPUT > -s " . $node . " -p all -j REJECT"); > > if ($out != 0) > { > fail "111Could not $command $node on > $target_address\n$cmd\n"; > } > } > elsif ($command eq "disable") > { #Disable $node on $target_address > > system("ssh ".' root@'.$target_address." /sbin/iptables -A INPUT > -s " . $node . " -p all -j REJECT"); > > if ($? != 0 ) > { > fail "Could not $command $node on $target_address\n$cmd\n"; > } > } > else > { #This should never happen: > fail "Unknown command: $command\n"; > } > > print "success: $command $node\n" unless defined $opt_q; > exit 0; > > sub get_options_stdin > { > my $opt; > my $line = 0; > while( defined($in = <>) ) > { > $_ = $in; > chomp; > > # strip leading and trailing whitespace > s/^\s*//; > s/\s*$//; > > # skip comments > next if /^#/; > > $line+=1; > $opt=$_; > next unless $opt; > > ($name,$val)=split /\s*=\s*/, $opt; > > if ( $name eq "" ) > { > print STDERR "parse error: illegal name in option $line\n"; > exit 2; > } > > # DO NOTHING -- this field is used by fenced > elsif ($name eq "agent" ) { } > > # FIXME -- depricated. use "port" instead. > elsif ($name eq "fm" ) > { > (my $dummy,$opt_n) = split /\s+/,$val; > print STDERR "Depricated \"fm\" entry detected. 
refer to > man page.\n"; > } > > elsif ($name eq "ipaddr" ) > { > $opt_a = $val; > } > elsif ($name eq "login" ) > { > $opt_l = $val; > } > > # FIXME -- depreicated residue of old fencing system > elsif ($name eq "name" ) { } > > elsif ($name eq "option" ) > { > $opt_o = $val; > } > elsif ($name eq "passwd" ) > { > $opt_p = $val; > } > elsif ($name eq "port" ) > { > $opt_n = $val; > } > # elsif ($name eq "test" ) > # { > # $opt_T = $val; > # } > > # FIXME should we do more error checking? > # Excess name/vals will be eaten for now > else > { > fail "parse error: unknown option \"$opt\""; > } > } > }/ > > > thanks, > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at bootseg.com Thu Apr 27 14:46:55 2006 From: eric at bootseg.com (Eric Kerin) Date: Thu, 27 Apr 2006 10:46:55 -0400 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <4450CA79.9020400@adelpha-lan.org> References: <44507BE8.20402@adelpha-lan.org> <1146144991.2984.127.camel@ayanami.boston.redhat.com> <4450CA79.9020400@adelpha-lan.org> Message-ID: <1146149215.3397.11.camel@auh5-0479.corp.jabil.org> On Thu, 2006-04-27 at 15:43 +0200, Castang Jerome wrote: > Lon Hohberger a ?crit : > > > > >It's probably trying to exec: > > > > /usr/bin/ssh\ root at gfs5 <-- one filename > > > >vs > > /usr/bin/ssh root at gfs5 > > Is the node gfs5 one of the systems mounting the GFS filesystem, or a single box sharing out a device using iscsi? If it's a node mounting the GFS filesystem, this fence method might not work in all failure conditions (kernel panics, intermittent network problems, sky high system load, etc). Since you can't trust a machine that is acting up to follow any of your commands via ssh. This fence script looks like it was meant to ssh into a linux box sharing out the iscsi device, and block the node's access to it. Not ssh into the node, and block it's access to the iscsi device. I figured I'd check. It'd be better to find out if it won't work now, than 3am when your cluster is down since it couldn't fence a node. Thanks, Eric Kerin eric at bootseg.com From jerome.castang at adelpha-lan.org Thu Apr 27 14:49:20 2006 From: jerome.castang at adelpha-lan.org (Castang Jerome) Date: Thu, 27 Apr 2006 16:49:20 +0200 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <1146149215.3397.11.camel@auh5-0479.corp.jabil.org> References: <44507BE8.20402@adelpha-lan.org> <1146144991.2984.127.camel@ayanami.boston.redhat.com> <4450CA79.9020400@adelpha-lan.org> <1146149215.3397.11.camel@auh5-0479.corp.jabil.org> Message-ID: <4450D9F0.9060901@adelpha-lan.org> Eric Kerin a ?crit : > >Is the node gfs5 one of the systems mounting the GFS filesystem, or a >single box sharing out a device using iscsi? > > It's a single box sharing out a device using iscsi. -- Jerome Castang mail: jcastang at adelpha-lan.org From guillermo.gomez at gmail.com Wed Apr 26 20:43:18 2006 From: guillermo.gomez at gmail.com (=?ISO-8859-1?Q?Guillermo_G=F3mez?=) Date: Wed, 26 Apr 2006 16:43:18 -0400 Subject: [Linux-cluster] using fiber channel Message-ID: <444FDB66.5080303@gmail.com> Hi, i would like if this mail list is the right one to discuss for using SAN with Fiber Channel HDAs. regards Guillermo G?mez S. 
From sdake at redhat.com  Thu Apr 27 00:56:05 2006
From: sdake at redhat.com (Steven Dake)
Date: Wed, 26 Apr 2006 17:56:05 -0700
Subject: [Linux-cluster] multicast howto
In-Reply-To: <200604261847.29094.ookami@gmx.de>
References: <200604251238.41472.ookami@gmx.de>
	<1145990549.6075.119.camel@shih.broked.org>
	<200604261847.29094.ookami@gmx.de>
Message-ID: <1146099365.11702.1.camel@shih.broked.org>

On Wed, 2006-04-26 at 18:47 -0600, Wolfgang Pauli wrote:
> Thanks!
>
> First, I had to figure out how multicast works.  I still don't fully
> understand it.  I can ping 224.0.0.1 and I get responses from all hosts
> in the same subnet.  I tried to ping 225.0.0.8 but that does not really
> work, though I don't know whether it has to.
> I have changed the cluster.conf to have only two nodes, just to get a
> basic understanding.  I have node dream on subnet 210 and neo on 223.
> They still form their own clusters.  Should I just try different
> addresses, or do the switches/routers have to be programmed for that?
>
> regards,
>
> wolfgang
>

You don't say whether you have routers or switches between the two
machines.

It probably won't work across routers.

It should work across switches, as long as they a) don't have IGMP
filtering enabled, or b) their IGMP implementation is good, which most
are not.

> [the quoted cluster.conf was stripped by the list archiver; only a
> fragment survives: nodename="dream"/>]
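A practical aside on checking multicast between two nodes (illustrative,
not from the thread): pinging 224.0.0.1 only exercises the link-local
all-hosts group; cluster traffic uses a specific group address and UDP
port, and IGMP snooping or routing can drop that group even when the
all-hosts ping works.  A small sender/receiver pair makes the check
direct.  A minimal sketch in Perl, assuming the IO::Socket::Multicast
module from CPAN is installed; the group and port here are placeholders,
not the cluster's real values:

#!/usr/bin/perl
# mcast-test.pl -- run "./mcast-test.pl recv" on one node and
# "./mcast-test.pl send" on the other.  If the receiver prints the test
# messages, multicast for this group/port is getting through.
use strict;
use warnings;
use Sys::Hostname;
use IO::Socket::Multicast;   # CPAN module, not part of core Perl

my $group = "239.192.75.66";   # placeholder group address
my $port  = 6809;              # placeholder UDP port
my $mode  = shift || "recv";

if ($mode eq "send") {
    my $s = IO::Socket::Multicast->new(Proto => 'udp')
        or die "socket: $!";
    $s->mcast_ttl(2);          # raise only if a router hop is really intended
    for my $i (1 .. 5) {
        $s->mcast_send("hello $i from " . hostname(), "$group:$port");
        sleep 1;
    }
} else {
    my $s = IO::Socket::Multicast->new(LocalPort => $port, ReuseAddr => 1)
        or die "socket: $!";
    $s->mcast_add($group) or die "mcast_add: $!";
    my $data;
    while (defined $s->recv($data, 1024)) {
        print "got: $data\n";
    }
}

If the sender and receiver sit on different subnets (as with dream and
neo above), no switch setting will help; the multicast traffic would have
to be routed, which is exactly the case the reply warns about.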
From gforte at leopard.us.udel.edu  Thu Apr 27 14:56:27 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Thu, 27 Apr 2006 10:56:27 -0400
Subject: [Linux-cluster] using fiber channel
In-Reply-To: <444FDB66.5080303@gmail.com>
References: <444FDB66.5080303@gmail.com>
Message-ID: <4450DB9B.5000904@leopard.us.udel.edu>

use them for what?  ;-)

Lots of people on this list use them, but they're not really the focus
of the list.  If you have questions about using a SAN with a cluster,
this is the place.  If you have general questions about how they work,
not so much, but someone can/will probably still answer them.

-g

Guillermo Gómez wrote:
> Hi, I would like to know if this mailing list is the right one to
> discuss using a SAN with Fibre Channel HBAs.
>
> regards
> Guillermo Gómez S.
> Caracas/Venezuela
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

--
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE

From carlopmart at gmail.com  Thu Apr 27 16:42:26 2006
From: carlopmart at gmail.com (carlopmart)
Date: Thu, 27 Apr 2006 18:42:26 +0200
Subject: [Linux-cluster] Recommended HP servers for cluster suite
Message-ID: <4450F472.8050205@gmail.com>

Hi all,

Can somebody recommend some HP servers to use with Red Hat Cluster
Suite for RHEL 4?  My requirements are:

- 4GB RAM
- SCSI disks
- Two CPUs
- iLO support for the RHCS fence agent.

I don't need shared storage.

Many thanks
--
CL Martinez
carlopmart {at} gmail {d0t} com

From lhh at redhat.com  Thu Apr 27 17:34:44 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 27 Apr 2006 13:34:44 -0400
Subject: [Linux-cluster] cluster suite / Opteron
In-Reply-To: <1145966377.16894.30.camel@mouse>
References: <1145966377.16894.30.camel@mouse>
Message-ID: <1146159284.2984.161.camel@ayanami.boston.redhat.com>

On Tue, 2006-04-25 at 07:59 -0400, Rajiv Vaidyanath wrote:
> Hi,
>
> I get some compilation warnings on Opteron (cluster-1.02.00)
>
> E.g.:
> --------------------------------------------
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1595: warning: long unsigned
> int format, uint64_t arg (arg 2)
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1596: warning: long unsigned
> int format, uint64_t arg (arg 2)
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1598: warning: long unsigned
> int format, uint64_t arg (arg 2)
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1599: warning: long unsigned
> int format, uint64_t arg (arg 2)
> --------------------------------------------
>
> Can I safely ignore these warnings?

Yes, but thanks for noting them; they should be fixed at some point
simply for cleanliness.

-- Lon

From Steve.Bagby at neartek.com  Thu Apr 27 18:00:04 2006
From: Steve.Bagby at neartek.com (Steve Bagby)
Date: Thu, 27 Apr 2006 14:00:04 -0400
Subject: [Linux-cluster] Fencing using Fibre Alliance MIB
Message-ID: 

Has anyone thought about or done a fence agent using the (more or less)
standard Fibre Alliance MIB?  Seems like this would work for a range of
switches ...

From rainer at ultra-secure.de  Thu Apr 27 19:57:14 2006
From: rainer at ultra-secure.de (rainer at ultra-secure.de)
Date: Thu, 27 Apr 2006 21:57:14 +0200
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <4450F472.8050205@gmail.com>
References: <4450F472.8050205@gmail.com>
Message-ID: <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>

Quoting carlopmart :

> Hi all,
>
> Can somebody recommend some HP servers to use with Red Hat Cluster
> Suite for RHEL 4?  My requirements are:
>
> - 4GB RAM
> - SCSI disks
> - Two CPUs
> - iLO support for the RHCS fence agent.
>
> I don't need shared storage.

Blades.
The bl20p can be had very cheaply nowadays and should be enough for
most tasks.
Downside: only two internal disks; the rest is via SAN (or iSCSI).

cheers,
Rainer

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

From jparsons at redhat.com  Thu Apr 27 20:03:58 2006
From: jparsons at redhat.com (James Parsons)
Date: Thu, 27 Apr 2006 16:03:58 -0400
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>
References: <4450F472.8050205@gmail.com>
	<20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>
Message-ID: <445123AE.4000204@redhat.com>

rainer at ultra-secure.de wrote:

> Quoting carlopmart :
>
>> Hi all,
>>
>> Can somebody recommend some HP servers to use with Red Hat Cluster
>> Suite for RHEL 4?  My requirements are:
>>
>> - 4GB RAM
>> - SCSI disks
>> - Two CPUs
>> - iLO support for the RHCS fence agent.
>>
>> I don't need shared storage.
>
> Blades.
> The bl20p can be had very cheaply nowadays and should be enough for
> most tasks.
> Downside: only two internal disks; the rest is via SAN (or iSCSI).

I want to add a vote for the ProLiant bl* series.  It uses iLO... not
the older RILOE cards, which have been problematic now and then.

-J
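Background for this thread: with RHCS the iLO board is driven by the
fence_ilo agent, and the wiring is done in /etc/cluster/cluster.conf with
a fencedevice entry plus a per-node fence method that refers to it.  The
fragment below is only an illustrative sketch: the node name, address and
credentials are invented, and attribute names have varied between
releases, so check the fence_ilo man page for the version in use rather
than copying this verbatim.

<!-- illustrative only; not a configuration taken from this thread -->
<clusternodes>
        <clusternode name="node1" votes="1">
                <fence>
                        <method name="1">
                                <device name="node1-ilo"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>
<fencedevices>
        <fencedevice agent="fence_ilo" name="node1-ilo"
                     hostname="10.0.0.101" login="Administrator" passwd="secret"/>
</fencedevices>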
From pcaulfie at redhat.com  Fri Apr 28 10:39:52 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 28 Apr 2006 11:39:52 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: <4448D132.2020906@redhat.com>
References: <20060420153326.GB22326@redhat.com>
	<4447AE0C.30000@redhat.com>
	<4448D132.2020906@redhat.com>
Message-ID: <4451F0F8.2060503@redhat.com>

OK, here's a whole new document on where cman is going and how it fits
in with OpenAIS and all that sort of stuff.

Comments welcome.

http://people.redhat.com/pcaulfie/docs/aiscman.odt

-- patrick

From Matthew.Patton.ctr at osd.mil  Fri Apr 28 14:51:48 2006
From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E)
Date: Fri, 28 Apr 2006 10:51:48 -0400
Subject: [Linux-cluster] New features/architecture ?
Message-ID: 

Classification: UNCLASSIFIED

Any chance of a vendor-neutral document format - say RTF?

> OK, here's a whole new document on where cman is going and how
> it fits in with OpenAIS and all that sort of stuff.
>
> Comments welcome.
>
> http://people.redhat.com/pcaulfie/docs/aiscman.odt
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pcaulfie at redhat.com  Fri Apr 28 15:09:30 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 28 Apr 2006 16:09:30 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: 
References: 
Message-ID: <4452302A.9030707@redhat.com>

Patton, Matthew F, CTR, OSD-PA&E wrote:
> Classification: UNCLASSIFIED
>
> Any chance of a vendor-neutral document format - say RTF?

!! RTF is a Microsoft format; ODT is the Open Document format.

>> OK, here's a whole new document on where cman is going and how
>> it fits in with OpenAIS and all that sort of stuff.
>>
>> Comments welcome.
>>
>> http://people.redhat.com/pcaulfie/docs/aiscman.odt
>

-- patrick

From pcaulfie at redhat.com  Fri Apr 28 15:16:10 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 28 Apr 2006 16:16:10 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: 
References: 
Message-ID: <445231BA.2090005@redhat.com>

For those who don't have the bandwidth to download OpenOffice.org,
here's a PDF:

http://people.redhat.com/pcaulfie/docs/aiscman.pdf

-- patrick

From jason at monsterjam.org  Sat Apr 29 01:58:51 2006
From: jason at monsterjam.org (Jason)
Date: Fri, 28 Apr 2006 21:58:51 -0400
Subject: [Linux-cluster] 2nd try: fencing?
Message-ID: <20060429015851.GB66106@monsterjam.org>

OK, so I have GFS installed and have the modules loaded; it seems the
next step is to set up "fencing".  Looking at the list of agents at
http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-fence-methods.html
I'm kind of confused.  I thought that the idea behind fencing was to
allow one node (I have 2) to take down/reboot the other node after it
has detected some sort of fault in the other node.  So I understand the
APC Network Power Switch and WTI Network Power Switch entries, but
since I don't have any of those installed, I went down the list and see
GNBD and xCAT as options.  I don't understand what these software
packages are supposed to be doing in relation to being a fencing agent.
Can I use one of these options reliably without having a hardware power
switch?  I guess in short, the docs I have read don't quite explain how
fencing is supposed to work with GFS.

regards,
Jason
From eric at bootseg.com  Sat Apr 29 02:36:02 2006
From: eric at bootseg.com (Eric Kerin)
Date: Fri, 28 Apr 2006 22:36:02 -0400
Subject: [Linux-cluster] 2nd try: fencing?
In-Reply-To: <20060429015851.GB66106@monsterjam.org>
References: <20060429015851.GB66106@monsterjam.org>
Message-ID: <1146278162.5933.12.camel@mechanism.localnet>

On Fri, 2006-04-28 at 21:58 -0400, Jason wrote:
> OK, so I have GFS installed and have the modules loaded; it seems the
> next step is to set up "fencing".  Looking at the list of agents at
> http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-fence-methods.html
> I'm kind of confused.  I thought that the idea behind fencing was to
> allow one node (I have 2) to take down/reboot the other node after it
> has detected some sort of fault in the other node.

Quick and dirty reason for needing fencing: in GFS you need to stop a
failed (or semi-failed) node from writing data to the shared
filesystem, otherwise corruption may occur.

One method is to power off a misbehaving node.  Another is to block
access to the shared disk by telling a SAN switch to disable its port.
Still another is to tell a firewall not to allow network traffic to an
iSCSI device from the offending node.  What you use for a fence method
all depends on your hardware.

If you give a quick explanation of your hardware setup, we might be
able to help you pick a fence device that will work with what you have
already.  Or if you don't have anything that could be used to block
access, you might have to buy some network power switches.

OR, if this isn't intended for production use and you're just testing,
you can use fence_manual.  This one has the unpleasant downside of
needing manual intervention to bring the cluster up after a node
failure.  But for testing GFS and Cluster Suite, it's nice and cheap.

Thanks,
Eric Kerin
eric at bootseg.com
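To make the fence_manual fallback above concrete: it is declared in
/etc/cluster/cluster.conf like any other fence device, and after a
failure an administrator confirms the fence by hand once the dead node
has been verified to be powered off.  A sketch follows, with an invented
node name and the caveat that attribute details should be checked against
the fence_manual and fence_ack_manual man pages for the release in use.

<!-- illustrative manual-fencing entries for a test cluster -->
<clusternode name="nodeA" votes="1">
        <fence>
                <method name="1">
                        <device name="human" nodename="nodeA"/>
                </method>
        </fence>
</clusternode>
<fencedevices>
        <fencedevice agent="fence_manual" name="human"/>
</fencedevices>

When nodeA fails, fenced waits until someone verifies the node is really
down and then runs fence_ack_manual -n nodeA on a surviving node, at
which point GFS recovery can proceed.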
From johannes.russek at io-consulting.net  Sat Apr 29 16:42:11 2006
From: johannes.russek at io-consulting.net (Johannes russek)
Date: Sat, 29 Apr 2006 18:42:11 +0200
Subject: [Linux-cluster] changes in include/linux/fs.h from 2.6.16 to 2.6.17
Message-ID: 

Hello everyone,
has anyone made a patch to use the new mutex mechanism in 2.6.17 in
struct block_device?  Or am I the first one to try and notice? :)

best regards,
johannes russek

From jason at monsterjam.org  Sun Apr 30 00:52:45 2006
From: jason at monsterjam.org (Jason)
Date: Sat, 29 Apr 2006 20:52:45 -0400
Subject: [Linux-cluster] 2nd try: fencing?
In-Reply-To: <1146278162.5933.12.camel@mechanism.localnet>
References: <20060429015851.GB66106@monsterjam.org>
	<1146278162.5933.12.camel@mechanism.localnet>
Message-ID: <20060430005245.GA76504@monsterjam.org>

> What you use for a fence method all depends on your hardware.  If you
> give a quick explanation of your hardware setup, we might be able to
> help you pick a fence device that will work with what you have already.
> Or if you don't have anything that could be used to block access, you
> might have to buy some network power switches.

Right now, all I have is 2 Dell servers in a rack with identical
configs (dual Ethernet controllers and 1 separate controller for the
heartbeat).  Both are running linux-ha and are both connected to a Dell
PowerVault 220S storage array which is configured so that both hosts
can access the drives concurrently (cluster mode).  I'm following the
instructions at http://www.gyrate.org/archives/9 and am at step 17,
which says to configure CCS.  I guess we could get an APC power switch,
but what would you folks suggest?  I.e. what model for just a 2-node
cluster (each server has 2 power supplies)?  Or is there a better way?

regards,
Jason

From filipe.miranda at gmail.com  Sun Apr 30 22:00:01 2006
From: filipe.miranda at gmail.com (Filipe Miranda)
Date: Sun, 30 Apr 2006 19:00:01 -0300
Subject: [Linux-cluster] MySQL on GFS benchmarks
In-Reply-To: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
Message-ID: 

Sander,

It depends.  If you are looking for performance, definitely go with a
SAN; iSCSI might also perform better than GNBD.

I found this on Google:
http://www.bwbug.org/docs/RedHat-GNBD-Ethernet-SAN.pdf
It has some details about GFS on SAN and on GNBD; it might help.

Good luck and keep us posted.

Att.
FTM

On 4/26/06, Sander van Beek - Elexis wrote:
>
> Hi all,
>
> We did a quick benchmark on our 2-node RHEL4 test cluster with GFS and
> a GNBD storage server.  The results were very sad.  One of the nodes
> (P3 1GHz, 512 MB) could run +/- 2400 insert queries per second when
> running mysqld-max 5.0.20 on a local ext3 filesystem.  With a 2-node
> GFS over GNBD setup and inserts on both nodes at the same time, we
> could only do 80 inserts per second.  I'm very interested in the
> performance others got in a similar setup.  Would the performance
> increase if we used software-based iSCSI instead of GNBD?
> Or should we simply buy SAN equipment?  Does anyone have statistics to
> compare a standalone MySQL setup to a small GFS cluster using a SAN?
>
>
> With best regards,
> Sander van Beek
>
> ---------------------------------------
>
> Ing. S. van Beek
> Elexis
> Marketing 9
> 6921 RE Duiven
> The Netherlands
>
> Tel: +31 (0)26 7110329
> Mob: +31 (0)6 28395109
> Fax: +31 (0)318 611112
> Email: sander at elexis.nl
> Web: http://www.elexis.nl
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
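For readers who want to reproduce numbers like the ones quoted above: an
insert-rate figure is usually just a timed loop of single-row INSERTs.  A
minimal sketch in Perl using DBI with DBD::mysql (both assumed to be
installed); the database name, credentials and table are placeholders,
and this is not the benchmark the poster actually ran:

#!/usr/bin/perl
# Rough single-threaded insert-rate check: rows inserted per second.
use strict;
use warnings;
use DBI;
use Time::HiRes qw(time);

my $dbh = DBI->connect("DBI:mysql:database=test;host=localhost",
                       "testuser", "testpass",
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do("DROP TABLE IF EXISTS bench");
$dbh->do("CREATE TABLE bench (id INT PRIMARY KEY, payload VARCHAR(64))");
my $sth = $dbh->prepare("INSERT INTO bench (id, payload) VALUES (?, ?)");

my $rows  = 5000;
my $start = time();
$sth->execute($_, "row $_") for 1 .. $rows;
my $elapsed = time() - $start;

printf "%d inserts in %.2fs = %.0f inserts/sec\n",
       $rows, $elapsed, $rows / $elapsed;
$dbh->disconnect;

Run against a datadir on local ext3 and again against one on the shared
GFS filesystem (from one node, then from both at once) to see the same
kind of gap the original poster describes.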