From sco at adviseo.fr Sat Apr 1 21:06:32 2006
From: sco at adviseo.fr (Sylvain Coutant)
Date: Sat, 1 Apr 2006 23:06:32 +0200
Subject: [Linux-cluster] gnbd server & cache
Message-ID: <003001c655d0$2706e680$6300000a@ELTON>
Hi,
Could someone help me understand why the gnbd server does not support non-caching exports when it is not coupled with the cluster suite? I wonder what the link is between the two...
BR,
--
Sylvain COUTANT
ADVISEO
http://www.adviseo.fr/
http://www.open-sp.fr/
From halomoan at powere2e.com Sun Apr 2 04:58:35 2006
From: halomoan at powere2e.com (Halomoan )
Date: Sun, 2 Apr 2006 12:58:35 +0800
Subject: [Linux-cluster] GFS is for what and how it works ?
Message-ID: <200604021258.AA403309094@mail.powere2e.com>
Sorry, I'm a newbie with GFS.
I followed Red Hat's GFS documentation.
To find out how GFS works, I have 2 nodes (node A and node B) for GFS and 1 node (node C) as a GNBD server. It runs with no errors, but I don't know how to use GFS.
I attached my /etc/cluster/cluster.conf below.
My question is:
1. How many nodes can have the GFS filesystem mounted at a time? What work does the cluster do in GFS?
2. How do I share the GFS filesystem with other servers? Do I need other software?
3. With this configuration, if node A fails, what happens to the GFS filesystem? Does it fail over to node B? What about the other server that is using the GFS filesystem on node A?
4. Could you give me an example of what GFS is actually used for in real life?
I'm absolutely confused about how GFS works.
Thanks
Regards,
Halomoan
--------------------- Cluster.conf ------------------------
[cluster.conf attachment not preserved in the archive]
Sent via the KillerWebMail system at mail.powere2e.com
From pcaulfie at redhat.com Mon Apr 3 09:04:11 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 03 Apr 2006 10:04:11 +0100
Subject: [Linux-cluster] standard mechanism to communicate between cluster
nodes from kernel
Message-ID: <4430E50B.9020104@redhat.com>
Aneesh Kumar wrote:
> Hi all,
>
> I was trying to understand whether there is a standard set of APIs we
> are working on for communicating between different nodes in a cluster
> inside the kernel. I looked at ocfs2, and the ocfs2 dlm code base seems to
> use tcp via o2net_send_tcp_msg, and the Red Hat dlm seems to use sctp. There
> is also tipc (net/tipc) code in the kernel now (I am not sure about
> the details of tipc). This confuses me a lot. If I want to use all
> these cluster components, what is the standard way? I am right now
> looking at clusterproc
> (http://www.openssi.org/cgi-bin/view?page=proc-hooks.html ) and
> wondering what the communication mechanism should be. clusterproc was
> earlier based on CI, which provided a simple, easy way to define
> different cluster services (more or less rpcgen style,
> http://ci-linux.sourceforge.net/ics.shtml ). Are we looking for a
> framework like that?
>
> NOTE: I am not trying to find out which one is the best. I am trying
> to find out if there is a standard way of doing this.
>
I'll repeat the reply I sent you when you asked me this via private email,
just for the record...
I think you've answered your own question: each cluster manager has its own
way of communicating between nodes.
As for which is best, that depends on what you mean by "best". There are
lots of variables in cluster comms. Do you want speed? Reliability?
Predictability? Ordering?
--
patrick
From thaidn at gmail.com Mon Apr 3 10:30:16 2006
From: thaidn at gmail.com (Thai Duong)
Date: Mon, 3 Apr 2006 17:30:16 +0700
Subject: [Linux-cluster] Manual fencing doesn't work
Hi all,
I have a 2-node GFS 6.1 cluster with the following configuration:
It turns out that manual fencing doesn't work as expected. When I force power
down a node, the other node cannot fence it and, worse, the whole GFS file
system freezes waiting for the downed node to come back up. I got something
like the below in the kernel log:
Apr 2 16:46:28 fcc1 fenced[3444]: fencing node "fcc4"
Apr 2 16:46:28 fcc1 fenced[3444]: fence "fcc4" failed
Some information about GFS and kernel:
[root at fcc1 ~]# rpm -qa | grep GFS
GFS-6.1.3-0
GFS-kernel-2.6.9-45.0.2
[root at fcc1 ~]# uname -a
Linux fcc1 2.6.9-22.0.2.EL #1 SMP Thu Jan 5 17:04:58 EST 2006 ia64 ia64 ia64
GNU/Linux
Please help.
TIA,
Thai Duong.
From sunjw at onewaveinc.com Mon Apr 3 09:51:36 2006
From: sunjw at onewaveinc.com (Sun Junwei)
Date: Mon, 3 Apr 2006 17:51:36 +0800
Subject: [Linux-cluster] kernel panic about lock_dlm
Hi, everyone
I use kernel 2.6.15-rc7 and the latest STABLE CVS branch of GFS
(from when the newest kernel was 2.6.15-rc7).
I've started a GFS cluster with 4 nodes, but after about 4 days
the cluster stopped working. I found the following in /var/log/messages:
<--
Mar 28 15:31:29 nd05 kernel: d 1 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 update remastered resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 updated 0 resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuild locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuilt 0 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 recover event 11 done
Mar 28 15:31:29 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 8,11,11
Mar 28 15:31:29 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 11 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 1,0,0 ids 11,11,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,1,0 ids 11,14,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move use event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 add node 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 total nodes 4
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuild resource directory
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuilt 1552 resources
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purge requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purged 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 mark waiting requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 marked 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 done
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 11,14,14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id 9190386 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id eab0065 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 unlock fb040350 no id
Mar 28 15:31:30 nd05 kernel: recovery_done jid 3 msg 309 a
Mar 28 15:31:30 nd05 kernel: 3961 recovery_done nodeid 4 flg 18
Mar 28 15:31:30 nd05 kernel: 3977 pr_start last_stop 3 last_start 4 last_finish 3
Mar 28 15:31:31 nd05 kernel: 3977 pr_start count 3 type 3 event 4 flags 21a
Mar 28 15:31:31 nd05 kernel: 3977 pr_start 4 done 1
Mar 28 15:31:31 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13415b4b id 163005c 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13425b42 id 180002f 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13435b39 id 1a00360 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13445b30 id 1760186 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13455b27 id 17a038b 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13465b1e id 15a01a8 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13475b15 id 1910380 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13485b0c id 1880309 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13495b03 id 17001e6 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134a5afa id 1940352 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134b5af1 id 1650349 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134c5ae8 id 167001d 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,134d5adf id 15c0083 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134e5ad6 id 1770155 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134f5acd id 16400cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13505ac4 id 1680102 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13515abb id 1920051 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13525ab2 id 1850182 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13535aa9 id 17301cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13545aa0 id 17803ed 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13555a97 id 18a0111 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13565a8e id 16d03c5 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13575a85 id 1870026 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13585a7c id 185030b 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13595a73 id 15d0190 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135a5a6a id 14b03f1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135b5a61 id 177025e 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135c5a58 id 198016f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135d5a4f id 1640163 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135e5a46 id 1730233 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135f5a3d id 1880130 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13605a34 id 16f00aa 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13615a2b id 17400e1 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13625a22 id 16b03c1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13635a19 id 16b03ad 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13645a10 id 17e03d4 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13655a07 id 18202c0 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136659fe id 170036c 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136759f5 id 155031c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136859ec id 1660212 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136959e3 id 15c0114 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136a59da id 15a038f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136b59d1 id 17600bb 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136c59c8 id 1a20336 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136d59bf id 171003c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136e59b6 id 1500008 3,0
Mar 28 15:31:32 nd05 kernel: 3976 pr_start last_stop 4 last_start 9 last_finish 4
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 4 type 2 event 9 flags 21a
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,136f59ad id 15e026f 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,137059a4 id 170017e 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1371599b id 16b01e3 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13725992 id 18000a2 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13735989 id 177017c 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13745980 id 16d035a 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13755977 id 18102d6 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1376596e id 1740020 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13775965 id 1780207 3,0
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 9 done 1
Mar 28 15:31:33 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start last_stop 9 last_start 10 last_finish 9
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 3 type 3 event 10 flags 21a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 10 done 1
Mar 28 15:31:33 nd05 kernel: 3977 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,370232 id 23a010e 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,380229 id 2630143 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,390220 id 29f0338 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3a0217 id 2850133 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3b020e id 268035b 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3c0205 id 2710344 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3d01fc id 27701f4 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3e01f3 id 28203f7 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3f01ea id 236011f 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4001e1 id 25e0387 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4101d8 id 2810157 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4201cf id 248035a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4301c6 id 24d0297 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4401bd id 2920280 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4501b4 id 267000b 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4601ab id 263012c 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4701a2 id 2930281 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,480199 id 28e028d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,490190 id 243031a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4a0187 id 259000d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4b017e id 2650370 3,0
Mar 28 15:31:35 nd05 kernel: 3976 pr_start last_stop 10 last_start 15 last_finish 10
Mar 28 15:31:35 nd05 kernel: 3976 pr_start count 4 type 2 event 15 flags 21a
Mar 28 15:31:35 nd05 kernel: 3976 pr_start 15 done 1
Mar 28 15:31:35 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:35 nd05 kernel:
Mar 28 15:31:35 nd05 kernel: lock_dlm: Assertion failed on line 357 of file /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c
Mar 28 15:31:35 nd05 kernel: lock_dlm: assertion: "!error"
Mar 28 15:31:35 nd05 kernel: lock_dlm: time = 79185725
Mar 28 15:31:35 nd05 kernel: gfs-sda1: error=-22 num=3,133b5b81 lkf=9 flags=84
Mar 28 15:31:35 nd05 kernel:
Mar 28 15:31:37 nd05 kernel: ------------[ cut here ]------------
Mar 28 15:31:37 nd05 kernel: kernel BUG at /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c:357!
Mar 28 15:31:37 nd05 kernel: invalid operand: 0000 [#1]
Mar 28 15:31:37 nd05 kernel: SMP
Mar 28 15:31:37 nd05 kernel: Modules linked in: lock_dlm dlm cman gfs lock_harness ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msgha
ndler binfmt_misc dm_mirror dm_round_robin dm_multipath dm_mod video thermal processor fan button battery ac uhci_hcd usbcore hw_random shpchp
pci_hotplug e1000 bonding qla2300 qla2xxx scsi_transport_fc sd_mod
Mar 28 15:31:37 nd05 kernel: CPU: 1
Mar 28 15:31:37 nd05 kernel: EIP: 0060:[] Not tainted VLI
Mar 28 15:31:37 nd05 kernel: EFLAGS: 00010282 (2.6.15-rc7smp)
Mar 28 15:31:37 nd05 kernel: EIP is at do_dlm_unlock+0x8f/0xa4 [lock_dlm]
Mar 28 15:31:37 nd05 kernel: eax: 00000004 ebx: f560c180 ecx: f5cf7f10 edx: f89edf11
Mar 28 15:31:37 nd05 kernel: esi: ffffffea edi: f8a7f000 ebp: f8a61580 esp: f5cf7f0c
Mar 28 15:31:37 nd05 kernel: ds: 007b es: 007b ss: 0068
Mar 28 15:31:37 nd05 kernel: Process gfs_glockd (pid: 3979, threadinfo=f5cf6000 task=f6735030)
Mar 28 15:31:37 nd05 kernel: Stack: f89edf11 f8a7f000 f55517b0 f89e97f0 f560c180 f8a3c64f f560c180 00000003
Mar 28 15:31:37 nd05 kernel: f55517d4 f8a329d8 f8a7f000 f560c180 00000003 f55517b0 f8a61580 f55517b0
Mar 28 15:31:37 nd05 kernel: f8a7f000 f8a31f28 f55517b0 f55517b0 00000001 f8a31fdc d82c34c0 f55517b0
Mar 28 15:31:37 nd05 kernel: Call Trace:
Mar 28 15:31:37 nd05 kernel: [] lm_dlm_unlock+0x19/0x20 [lock_dlm]
Mar 28 15:31:37 nd05 kernel: [] gfs_lm_unlock+0x2c/0x43 [gfs]
Mar 28 15:31:37 nd05 kernel: [] gfs_glock_drop_th+0xe8/0x122 [gfs]
Mar 28 15:31:37 nd05 kernel: [] rq_demote+0x76/0x92 [gfs]
Mar 28 15:31:37 nd05 kernel: [] run_queue+0x54/0xb5 [gfs]
Mar 28 15:31:37 nd05 kernel: [] unlock_on_glock+0x1d/0x24 [gfs]
Mar 28 15:31:37 nd05 kernel: [] gfs_reclaim_glock+0xbd/0x135 [gfs]
Mar 28 15:31:37 nd05 kernel: [] gfs_glockd+0x3a/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel: [] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel: [] ret_from_fork+0x6/0x14
Mar 28 15:31:37 nd05 kernel: [] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel: [] gfs_glockd+0x0/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel: [] kernel_thread_helper+0x5/0xb
Mar 28 15:31:37 nd05 kernel: Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 8b 03 ff 70 18 68 09 e0 9e f8 e8 ac 14 73 c7 83 c4 34 68 11 df
9e f8 e8 9f 14 73 c7 <0f> 0b 65 01 58 de 9e f8 68 13 df 9e f8 e8 23 0d 73 c7 5b 5e c3
-->
What might the problem be?
Thanks for any reply!
Luckey
From troels at arvin.dk Mon Apr 3 14:16:55 2006
From: troels at arvin.dk (Troels Arvin)
Date: Mon, 03 Apr 2006 16:16:55 +0200
Subject: [Linux-cluster] Using a null modem for heartbeat with CS4?
Hello,
I would like to have two heartbeat channels between my cluster nodes: a
cross-over ethernet cable and a null modem cable.
In the manual for Cluster Suite 3 (CS2), it's stated that a null modem
cable can be used for heartbeat.
The manual for CS4 doesn't mention null modem cables. Isn't it possible to
use null modem cables for heartbeat in CS4?
--
Greetings from Troels Arvin
From libregeek at gmail.com Mon Apr 3 14:20:03 2006
From: libregeek at gmail.com (Manilal K M)
Date: Mon, 3 Apr 2006 19:50:03 +0530
Subject: [Linux-cluster] Using a null modem for heartbeat with CS4?
Message-ID: <2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com>
On 03/04/06, Troels Arvin wrote:
> Hello,
>
> I would like to have two heartbeat channels between my cluster nodes: a
> cross-over ethernet cable and a null modem cable.
>
> In the manual for Cluster Suite 3 (CS2), it's stated that a null modem
> cable can be used for heartbeat.
>
> The manual for CS4 doesn't mention null modem cables. Isn't it possible to
> use null modem cables for heartbeat in CS4?
AFAIK, Null modems are not supported in CS4.
regards
Manilal
From Bowie_Bailey at BUC.com Mon Apr 3 14:30:36 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Mon, 3 Apr 2006 10:30:36 -0400
Subject: [Linux-cluster] GFS is for what and how it works ?
Message-ID: <4766EEE585A6D311ADF500E018C154E302133870@bnifex.cis.buc.com>
Halomoan wrote:
> Sorry, I'm a newbie with GFS.
>
> I followed Red Hat's GFS documentation.
> To find out how GFS works, I have 2 nodes (node A and node B) for
> GFS and 1 node (node C) as a GNBD server. It runs with no errors, but I
> don't know how to use GFS.
>
> I attached my /etc/cluster/cluster.conf below.
>
> My question is:
>
> 1. How many nodes can have the GFS filesystem mounted at a time? What
> work does the cluster do in GFS?
You can mount one node for each journal you created when you built the
GFS filesystem.
What the cluster does is manage access to the GFS filesystem and
(attempt to) ensure that if one node starts having problems, it can't
corrupt the filesystem.
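For example, the journal count is set at mkfs time; a sketch with made-up
cluster/fs/device names:
    gfs_mkfs -p lock_dlm -t mycluster:gfs1 -j 3 /dev/vg0/gfslv
That gives three journals, so up to three nodes can mount at once
(gfs_jadd can add journals later if the volume has room).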
> 2. How do I share the GFS filesystem with other servers? Do I need
> other software?
GFS is simply a filesystem which is capable of being used on multiple
nodes at the same time. How you mount it depends on what software or
hardware you are using to share the media. GNBD can be used by a
server to share its storage with the other nodes. You can also use
iSCSI, AoE, and others to connect each node directly to a separate
storage unit.
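For GNBD that boils down to something like this (export/device names made
up, and check the man pages since I'm going from memory):
    # on the storage server
    gnbd_export -e gfsdev -d /dev/vg0/gfslv
    # on each client node
    gnbd_import -i storageserver
after which the clients see /dev/gnbd/gfsdev and can mount GFS on it.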
> 3. With this configuration, if node A fails, what happens to the GFS
> filesystem? Does it fail over to node B? What about the other server
> that is using the GFS filesystem on node A?
There is no failover. Everything is always active. As long as the
storage itself doesn't fail, the failure of one node should not be a
problem. Unless, of course, it causes your cluster to lose quorum
(drop below the minimum number of servers necessary to maintain the
cluster).
> 4. Could you give me an example of what GFS is actually used for in
> real life?
I'm using it to share a 1.2 TB storage area between two systems that
use it for processing and a third system that has direct access for
making backups.
> I'm absolutely confused about how GFS works.
Yea. The documentation is not very extensive at this point.
--
Bowie
From JACOB_LIBERMAN at Dell.com Mon Apr 3 20:16:17 2006
From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com)
Date: Mon, 3 Apr 2006 15:16:17 -0500
Subject: [Linux-cluster] Order of execution
Hi cluster geniuses,
I have a quick question.
I am trying to write a custom startup script for an application called
adsi rms. The application comes with its own startup script that
requires the disk resource and network interface. Here is my question:
When I create a custom startup script for the service and place it in
/etc/init.d/, the cluster service can start the application successfully
but not all services come online because the shared disk and IP do not
appear to be available when the service starts.
Is there a way to set the order of execution for a service so that the
application will not start until AFTER the disk and network interface
are available?
Thanks again, Jacob
From eric at bootseg.com Mon Apr 3 20:26:44 2006
From: eric at bootseg.com (Eric Kerin)
Date: Mon, 03 Apr 2006 16:26:44 -0400
Subject: [Linux-cluster] Order of execution
Message-ID: <1144096004.4004.14.camel@auh5-0479.corp.jabil.org>
Jacob,
The start/stop orders are defined in /usr/share/cluster/service.sh.
Look under the special tag; there should be a child tag for each type of
child node of service.
Mine looks like so (current rgmanager rpm from RHN):
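Roughly like this, from memory, so check your own copy for the exact
start/stop levels:
    <special tag="rgmanager">
        <attributes root="1" maxinstances="1"/>
        <child type="fs" start="1" stop="8"/>
        <child type="clusterfs" start="2" stop="7"/>
        <child type="netfs" start="3" stop="6"/>
        <child type="nfsexport" start="4" stop="5"/>
        <child type="nfsclient" start="5" stop="4"/>
        <child type="ip" start="6" stop="2"/>
        <child type="smb" start="7" stop="3"/>
        <child type="script" start="8" stop="1"/>
    </special>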
For starting, fs should start first, then clusterfs, etc... finally smb
and script start.
For stopping, script would be stopped first, then ip, etc... finally fs.
Thanks,
Eric Kerin
eric at bootseg.com
On Mon, 2006-04-03 at 15:16 -0500, JACOB_LIBERMAN at Dell.com wrote:
> Hi cluster geniuses,
>
> I have a quick question.
>
> I am trying to write a custom startup script for an application called
> adsi rms. The application comes with its own startup script that
> requires the disk resource and network interface. Here is my question:
>
> When I create a custom startup script for the service and place it in
> /etc/init.d/, the cluster service can start the application successfully
> but not all services come online because the shared disk and IP do not
> appear to be available when the service starts.
>
> Is there a way to set the order of execution for a service so that the
> application will not start until AFTER the disk and network interface
> are available?
>
> Thanks again, Jacob
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From jbrassow at redhat.com Mon Apr 3 22:37:56 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Mon, 3 Apr 2006 17:37:56 -0500
Subject: [Linux-cluster] Manual fencing doesn't work
Message-ID: <6475746f533faa0d27117afbbcf54e7f@redhat.com>
The manual fencing setup simply waits until either
1) the user reboots the failed node _and_ uses fence_ack_manual to
notify the node asking for the fence that you have done so,
or
2) the node that "failed" comes back up.
In the steps you described, you never acknowledged the request for
fencing - hence, you have to wait for the machine to come back up.
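So after you've power-cycled fcc4 yourself, something along these lines on
the node whose fenced is waiting should release things (check the man page
for your version):
    fence_ack_manual -n fcc4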
brassow
BTW, I'd never use manual fencing in production.
On Apr 3, 2006, at 5:30 AM, Thai Duong wrote:
> Hi all,
>
> I have a 2 node GFS 6.1 cluster with the following configuration:
>
> [cluster.conf XML scrubbed by the archive]
>
> It turns out that manual fencing doesn't work as expected. When I force
> power down a node, the other node cannot fence it and, worse, the whole
> GFS file system freezes waiting for the downed node to come back up.
> I got something like the below in the kernel log:
>
> Apr 2 16:46:28 fcc1 fenced[3444]: fencing node "fcc4"
> Apr 2 16:46:28 fcc1 fenced[3444]: fence "fcc4" failed
>
> Some information about GFS and kernel:
>
> [root at fcc1 ~]# rpm -qa | grep GFS
> GFS-6.1.3-0
> GFS-kernel-2.6.9-45.0.2
>
> [root at fcc1 ~]# uname -a
> Linux fcc1 2.6.9-22.0.2.EL #1 SMP Thu Jan 5 17:04:58 EST 2006 ia64
> ia64 ia64 GNU/Linux
>
> Please help.
>
> TIA,
>
> Thai Duong.
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From teigland at redhat.com Tue Apr 4 03:08:53 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 3 Apr 2006 22:08:53 -0500
Subject: [Linux-cluster] Manual fencing doesn't work
Message-ID: <20060404030853.GA12817@redhat.com>
On Mon, Apr 03, 2006 at 05:30:16PM +0700, Thai Duong wrote:
> [quoted cluster.conf fence section scrubbed by the archive]
Try "fencedevices" and "fencedevice".
Dave
From halomoan at powere2e.com Tue Apr 4 06:11:18 2006
From: halomoan at powere2e.com (Halomoan Chow)
Date: Tue, 4 Apr 2006 14:11:18 +0800
Subject: [Linux-cluster] GFS is for what and how it works ?
In-Reply-To: <4766EEE585A6D311ADF500E018C154E302133870@bnifex.cis.buc.com>
Message-ID: <001c01c657ae$9d9595f0$100fcc0a@pc002>
Thank you Bowie
You gave me a little light in the GFS jungle :D
Regards,
Halomoan
-----Original Message-----
From: Bowie Bailey [mailto:Bowie_Bailey at BUC.com]
Sent: Monday, April 03, 2006 10:31 PM
To: halomoan at powere2e.com
Cc: linux clustering
Subject: RE: [Linux-cluster] GFS is for what and how it works ?
Halomoan wrote:
> Sorry, I'm a newbie with GFS.
>
> I followed Red Hat's GFS documentation.
> To find out how GFS works, I have 2 nodes (node A and node B) for
> GFS and 1 node (node C) as a GNBD server. It runs with no errors, but I
> don't know how to use GFS.
>
> I attached my /etc/cluster/cluster.conf below.
>
> My question is:
>
> 1. How many nodes can have the GFS filesystem mounted at a time? What
> work does the cluster do in GFS?
You can mount one node for each journal you created when you built the
GFS filesystem.
What the cluster does is manage access to the GFS filesystem and
(attempt to) ensure that if one node starts having problems, it can't
corrupt the filesystem.
> 2. How do I share the GFS filesystem with other servers? Do I need
> other software?
GFS is simply a filesystem which is capable of being used on multiple
nodes at the same time. How you mount it depends on what software or
hardware you are using to share the media. GNBD can be used by a
server to share its storage with the other nodes. You can also use
iSCSI, AoE, and others to connect each node directly to a separate
storage unit.
> 3. With this configuration, if node A fails, what happens to the GFS
> filesystem? Does it fail over to node B? What about the other server
> that is using the GFS filesystem on node A?
There is no failover. Everything is always active. As long as the
storage itself doesn't fail, the failure of one node should not be a
problem. Unless, of course, it causes your cluster to lose quorum
(drop below the minimum number of servers necessary to maintain the
cluster).
> 4. Could you give me an example of what GFS is actually used for in
> real life?
I'm using it to share a 1.2 TB storage area between two systems that
use it for processing and a third system that has direct access for
making backups.
> I'm absolutely confused about how GFS works.
Yea. The documentation is not very extensive at this point.
--
Bowie
From JACOB_LIBERMAN at Dell.com Tue Apr 4 12:55:44 2006
From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com)
Date: Tue, 4 Apr 2006 07:55:44 -0500
Subject: [Linux-cluster] Order of execution
Eric,
I am running RHEL3 U4 with clumanager 1.2.22. I do not have the options
listed below.
Does anyone have an example script for this version? Lon?
Thanks, Jacob
> -----Original Message-----
> From: Eric Kerin [mailto:eric at bootseg.com]
> Sent: Monday, April 03, 2006 3:27 PM
> To: Liberman, Jacob
> Cc: linux clustering
> Subject: Re: [Linux-cluster] Order of execution
>
> Jacob,
>
> The start/stop orders are defined in
> /usr/share/cluster/service.sh. Look under the special tag;
> there should be a child tag for each type of child node of service.
>
> Mine looks like so (current rgmanager rpm from RHN):
>
> [service.sh XML example scrubbed by the archive]
>
> For starting, fs should start first, then clusterfs, etc...
> finally smb and script start.
>
> For stopping, script would be stopped first, then ip, etc...
> finally fs.
>
> Thanks,
> Eric Kerin
> eric at bootseg.com
>
>
> On Mon, 2006-04-03 at 15:16 -0500, JACOB_LIBERMAN at Dell.com wrote:
> > Hi cluster geniuses,
> >
> > I have a quick question.
> >
> > I am trying to write a custom startup script for an
> application called
> > adsi rms. The application comes with its own startup script that
> > requires the disk resource and network interface. Here is
> my question:
> >
> > When I create a custom startup script for the service and
> place it in
> > /etc/init.d/, the cluster service can start the application
> successfully
> > but not all services come online because the shared disk
> and IP do not
> > appear to be available when the service starts.
> >
> > Is there a way to set the order of execution for a service
> so that the
> > application will not start until AFTER the disk and network
> interface
> > are available?
> >
> > Thanks again, Jacob
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
From pcaulfie at redhat.com Tue Apr 4 13:40:52 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 04 Apr 2006 14:40:52 +0100
Subject: [Linux-cluster] Using a null modem for heartbeat with CS4?
In-Reply-To: <2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com>
References:
<2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com>
Message-ID: <44327764.4080108@redhat.com>
Manilal K M wrote:
> On 03/04/06, Troels Arvin wrote:
>> Hello,
>>
>> I would like to have to heartbeat channels between my cluster nodes: A
>> cross-over ethernet cable and a null modem cable.
>>
>> In the manual for Cluster Suite 3 (CS2), it's stated that a null modem
>> cable can be used for heartbeat.
>>
>> The manual for CS4 doesn't mention null modem cables. Isn't it possible to
>> use null modem cables for heartbeat in CS4?
> AFAIK, Null modems are not supported in CS4.
>
If you're really desperate you could set up a serial PPP link between the two
machines and do the IP heartbeat over that.
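Something like this on one node, untested and with made-up addresses
(swap the IPs on the other end):
    pppd /dev/ttyS0 115200 10.99.0.1:10.99.0.2 local noauth persist nodetach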
Don't tell anyone I said that though ;-)
--
patrick
From Alain.Moulle at bull.net Wed Apr 5 08:51:33 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Wed, 05 Apr 2006 10:51:33 +0200
Subject: [Linux-cluster] CS4 Update2 / cman systematically FAILED on service
stop
Message-ID: <44338515.10200@bull.net>
Hi
I have a systematic problem with cman stop on my configuration:
knowing that there is no service with autostart in
cluster.conf, and that I have only one main service
to be started by: clusvcadm -e SERVICE -m
First test:
launch CS4 OK
stop CS4 OK
no problem
Second test:
launch CS4
clusvcadm -e SERVICE -m
then
clusvcadm -d SERVICE
stop CS4 ...
In this case, cman stop systematically FAILS ...
This is true in both cases, whether CS4 is started
on the peer node or stopped there.
Any clue or lead to identify the problem?
Thanks
Alain Moullé
From ben.yarwood at juno.co.uk Wed Apr 5 11:51:31 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Wed, 5 Apr 2006 12:51:31 +0100
Subject: [Linux-cluster] Monitoring Cluster Services
Message-ID: <089401c658a7$481d72b0$3964a8c0@WS076>
I have set up a monitoring tool to check that all the appropriate processes
are running on our cluster nodes. I am currently checking for the
following:
ccsd, 1 instance
cman_comms, 1 instance
cman_memb, 1 instance
cman_serviced, 1 instance
cman_hbeat, 1 instance
fenced, 1 instance
clvmd, 1 instance
gfs_inoded, 1 instance for each gfs mount
clurgmgrd, 1 instance
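(The tool is essentially one pgrep per name, along these lines - a sketch,
with the per-mount gfs_inoded counting left out:
    for p in ccsd cman_comms cman_memb cman_serviced cman_hbeat \
             fenced clvmd clurgmgrd; do
        pgrep -x "$p" >/dev/null || echo "$p not running"
    done
)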
Can anyone tell me if this is a correct and exhaustive list?
Regards
Ben
From ilya at cs.msu.su Wed Apr 5 15:27:57 2006
From: ilya at cs.msu.su (Ilya M. Slepnev)
Date: Wed, 05 Apr 2006 19:27:57 +0400
Subject: [Linux-cluster] Problems with compilation.
Message-ID: <1144250877.8183.19.camel@localhost.localdomain>
Hi,
I'm sorry for the inconvenience; has anybody faced such a problem with
configuring cluster-suite? It says that there is no directory named
"/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
there... Am I doing something wrong? Is there some FAQ about that?
Thanks, Ilya...
khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
cd dlm-kernel && make
make[1]: Entering directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
cd src2 && make all
make[2]: Entering directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
modules USING_KBUILD=yes
make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No
such file or directory. Stop.
make: Entering an unknown directorymake: Leaving an unknown
directorymake[2]: *** [all] Error 2
make[2]: Leaving directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make[1]: *** [all] Error 2
make[1]: Leaving directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
make: *** [all] Error 2
khext at hess:~/nigma/ext3/gfs/cvs/cluster$
From jbrassow at redhat.com Wed Apr 5 15:40:45 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 5 Apr 2006 10:40:45 -0500
Subject: [Linux-cluster] Problems with compilation.
In-Reply-To: <1144250877.8183.19.camel@localhost.localdomain>
References: <1144250877.8183.19.camel@localhost.localdomain>
Message-ID: <6e718842c9112d2f91e40fc31e3b29b9@redhat.com>
Might want to skip the 'make' by itself... try:
dir/cluster> make clean; make distclean
dir/cluster> ./configure --kernel_src=<path to your kernel source>
dir/cluster> make install
On Apr 5, 2006, at 10:27 AM, Ilya M. Slepnev wrote:
> Hi,
>
> I'm sorry for the inconvenience; has anybody faced such a problem with
> configuring cluster-suite? It says that there is no directory named
> "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
> there... Am I doing something wrong? Is there some FAQ about that?
>
> Thanks, Ilya...
>
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
> cd dlm-kernel && make
> make[1]: Entering directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> cd src2 && make all
> make[2]: Entering directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
> modules USING_KBUILD=yes
> make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No
> such file or directory. Stop.
> make: Entering an unknown directorymake: Leaving an unknown
> directorymake[2]: *** [all] Error 2
> make[2]: Leaving directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> make: *** [all] Error 2
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From ilya at cs.msu.su Wed Apr 5 16:16:25 2006
From: ilya at cs.msu.su (Ilya M. Slepnev)
Date: Wed, 05 Apr 2006 20:16:25 +0400
Subject: [Linux-cluster] Problems with compilation.
In-Reply-To: <6e718842c9112d2f91e40fc31e3b29b9@redhat.com>
References: <1144250877.8183.19.camel@localhost.localdomain>
<6e718842c9112d2f91e40fc31e3b29b9@redhat.com>
Message-ID: <1144253785.8185.27.camel@localhost.localdomain>
Sure, I tried that first... Here is a lot of output from configure and
"make install"... It seems no better than before :-)
khext at hess:~/nigma/ext3/gfs/cvs/cluster$ ./configure --kernel_src=/home/khext/nigma/ext3/linux-2.6.16.1
configure dlm-kernel
Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 101.
configure gnbd-kernel
Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95.
configure magma
Configuring Makefiles for your system...
Completed Makefile configuration
configure ccs
Configuring Makefiles for your system...
Completed Makefile configuration
configure cman
Configuring Makefiles for your system...
Completed Makefile configuration
configure dlm
Configuring Makefiles for your system...
Completed Makefile configuration
configure fence
Configuring Makefiles for your system...
Completed Makefile configuration
configure iddev
Configuring Makefiles for your system...
Completed Makefile configuration
configure gulm
Configuring Makefiles for your system...
Completed Makefile configuration
configure gfs-kernel
Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 107.
configure gfs2-kernel
Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95.
configure gfs
Configuring Makefiles for your system...
Completed Makefile configuration
configure gfs2
Configuring Makefiles for your system...
Completed Makefile configuration
configure gnbd
Configuring Makefiles for your system...
Completed Makefile configuration
configure magma-plugins
Configuring Makefiles for your system...
Completed Makefile configuration
configure rgmanager
Configuring Makefiles for your system...
Completed Makefile configuration
configure cmirror
Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95.
khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make install
cd dlm-kernel && make install
make[1]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
cd src2 && make install
make[2]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 modules USING_KBUILD=yes
make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No such file or directory. Stop.
make: Entering an unknown directorymake: Leaving an unknown directorymake[2]: *** [all] Error 2
make[2]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make[1]: *** [install] Error 2
make[1]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
make: *** [install] Error 2
khext at hess:~/nigma/ext3/gfs/cvs/cluster$
On Wed, 2006-04-05 at 10:40 -0500, Jonathan E Brassow wrote:
> Might want to skip the 'make' by itself... try:
>
> dir/cluster> make clean; make distclean
> dir/cluster> ./configure --kernel_src=<path to your kernel source>
> dir/cluster> make install
>
> brassow
> On Apr 5, 2006, at 10:27 AM, Ilya M. Slepnev wrote:
>
> > Hi,
> >
> > I'm sorry for the inconvenience; has anybody faced such a problem with
> > configuring cluster-suite? It says that there is no directory named
> > "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
> > there... Am I doing something wrong? Is there some FAQ about that?
> >
> > Thanks, Ilya...
> >
> > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
> > cd dlm-kernel && make
> > make[1]: Entering directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> > cd src2 && make all
> > make[2]: Entering directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> > make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
> > modules USING_KBUILD=yes
> > make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No
> > such file or directory. Stop.
> > make: Entering an unknown directorymake: Leaving an unknown
> > directorymake[2]: *** [all] Error 2
> > make[2]: Leaving directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> > make[1]: *** [all] Error 2
> > make[1]: Leaving directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> > make: *** [all] Error 2
> > khext at hess:~/nigma/ext3/gfs/cvs/cluster$
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From jbrassow at redhat.com Wed Apr 5 18:36:55 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 5 Apr 2006 13:36:55 -0500
Subject: [Linux-cluster] Problems with compilation.
In-Reply-To: <1144253785.8185.27.camel@localhost.localdomain>
References: <1144250877.8183.19.camel@localhost.localdomain>
<6e718842c9112d2f91e40fc31e3b29b9@redhat.com>
<1144253785.8185.27.camel@localhost.localdomain>
Message-ID: <23453f82d4985b73787dc15e364ee7aa@redhat.com>
Did you set up and do a 'make' in your kernel tree? Failing to do that
will give those errors.
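Roughly, assuming that tree is really the one you point --kernel_src at:
    cd /home/khext/nigma/ext3/linux-2.6.16.1
    make oldconfig   # or put a .config in place some other way
    make             # generates include/linux/version.h, among other things
configure is looking for include/linux/version.h, which only exists once
the kernel has been configured and built.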
brassow
On Apr 5, 2006, at 11:16 AM, Ilya M. Slepnev wrote:
> Sure, I tried that first... Here is a lot of output from configure and
> "make install"... It seems no better than before :-)
>
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ ./configure
> --kernel_src=/home/khext/nigma/ext3/linux-2.6.16.1
> configure dlm-kernel
>
> Configuring Makefiles for your system...
> Can't open
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at
> ./configure line 101.
> configure gnbd-kernel
>
> Configuring Makefiles for your system...
> Can't open
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at
> ./configure line 95.
> configure magma
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure ccs
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure cman
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure dlm
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure fence
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure iddev
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gulm
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gfs-kernel
>
> Configuring Makefiles for your system...
> Can't open
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at
> ./configure line 107.
> configure gfs2-kernel
>
> Configuring Makefiles for your system...
> Can't open
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at
> ./configure line 95.
> configure gfs
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gfs2
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gnbd
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure magma-plugins
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure rgmanager
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure cmirror
>
> Configuring Makefiles for your system...
> Can't open
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at
> ./configure line 95.
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make install
> cd dlm-kernel && make install
> make[1]: Entering directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> cd src2 && make install
> make[2]: Entering directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
> modules USING_KBUILD=yes
> make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No
> such file or directory. Stop.
> make: Entering an unknown directorymake: Leaving an unknown
> directorymake[2]: *** [all] Error 2
> make[2]: Leaving directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make[1]: *** [install] Error 2
> make[1]: Leaving directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> make: *** [install] Error 2
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$
>
>
>
>
> On Wed, 2006-04-05 at 10:40 -0500, Jonathan E Brassow wrote:
>> Might want to skip the 'make' by itself... try:
>>
>> dir/cluster> make clean; make distclean
>> dir/cluster> ./configure --kernel_src=<path to your kernel source>
>> dir/cluster> make install
>>
>> brassow
>> On Apr 5, 2006, at 10:27 AM, Ilya M. Slepnev wrote:
>>
>>> Hi,
>>>
>>> I'm sorry for the inconvenience; has anybody faced such a problem with
>>> configuring cluster-suite? It says that there is no directory
>>> named
>>> "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
>>> there... Am I doing something wrong? Is there some FAQ about that?
>>>
>>> Thanks, Ilya...
>>>
>>> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
>>> cd dlm-kernel && make
>>> make[1]: Entering directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
>>> cd src2 && make all
>>> make[2]: Entering directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
>>> make -C M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
>>> modules USING_KBUILD=yes
>>> make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2:
>>> No
>>> such file or directory. Stop.
>>> make: Entering an unknown directorymake: Leaving an unknown
>>> directorymake[2]: *** [all] Error 2
>>> make[2]: Leaving directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
>>> make[1]: *** [all] Error 2
>>> make[1]: Leaving directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
>>> make: *** [all] Error 2
>>> khext at hess:~/nigma/ext3/gfs/cvs/cluster$
>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From jeffbethke at aol.net Wed Apr 5 20:57:54 2006
From: jeffbethke at aol.net (Jeffrey Bethke)
Date: Wed, 05 Apr 2006 16:57:54 -0400
Subject: [Linux-cluster] speeding up df/statvfs( ) calls to large GFS
volumes?
Message-ID: <44342F52.3030608@aol.net>
Hi!
Is there a way to speed up the return values for df/statvfs( ) when
using large GFS volumes (e.g. 25TB+)? I'm currently working on a problem
where, as part of disk monitoring, we need to run a statvfs( ) every
few minutes. The problem is that we can't settle on an interval for
running the tool, as GFS can, on occasion, take a long time to return a
value!
So, is there any variable I can tweak with gfs_tool, or mount option I can
apply beyond 'noatime', that will help things like 'df -h' run
consistently faster?
Help?
Thanks!
.jeff
From mtp at tilted.com Thu Apr 6 01:22:08 2006
From: mtp at tilted.com (Mark Petersen)
Date: Wed, 05 Apr 2006 20:22:08 -0500
Subject: [Linux-cluster] GNBD, CLVM and snapshots
Message-ID: <7.0.1.0.2.20060405195416.02784ab0@tilted.com>
I'm wanting to use gnbd with clvm to export block devices for 3
(possibly more) hosts running Xen. Each host will have access to the
single gnbd export with LVM. Only a single host will ever actually
have the device mounted. GNBD can support live migrations with a
block device, which is the main attraction.
So a little info on Xen and what I want to do. There are dom0's
(privileged VM) that have full access to any running domU (VM
instances started by the dom0.) The dom0 will be running
clvm/CCS/gnbd-client/etc. The dom0 will start a domU that mounts the
lv; only the dom0 needs direct access to this resource. In this
configuration, would it be possible to take snapshots of the LV from
the dom0? What about from another dom0 in the cluster? What about
the gnbd-server?
Is work still being done on csnap? There isn't much documentation on
it, and it seems like it might be GFS-specific.
If this won't work with clvm and gnbd, is there an alternative that
would work? I really want to be able to do snapshots and live
migration with block devices. I'm not sure this is possible. I may
fallback to only live migrations with gnbd if I have to.
Finally, ideally this would be backed by DRBD, but can gnbd handle a
primary/secondary role instead of doing multipath (which won't work
with drbd)? Failover mode was mentioned in posts from over a year
ago, and it sounds promising.
From starstom at gmail.com Thu Apr 6 03:53:34 2006
From: starstom at gmail.com (Tom Stars)
Date: Thu, 6 Apr 2006 09:23:34 +0530
Subject: [Linux-cluster] About Linux Cluster
Message-ID: <551992020604052053m7bbc7f8cua7f20da14cf0d28f@mail.gmail.com>
Hi
I am a newbie to Linux clusters. I would like to set up a Linux cluster of 4
nodes, plus a DAS box for storage connected to the
Linux systems through optical fiber. All the Linux systems are running RHEL
4.0 AS.
Q1) Do I need GFS to be configured if I have to run Oracle on the
cluster nodes (Oracle 11i Application Server)?
Q2) When do I need GFS?
Q3) If the DAS is mounted on 1 node, which acts as an NFS server and provides
shares to the other nodes, does it affect performance?
Thanks.
Tom.
From Alain.Moulle at bull.net Thu Apr 6 07:13:28 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Thu, 06 Apr 2006 09:13:28 +0200
Subject: [Linux-cluster] RE: CS4 Update2 / cman systematically FAILED on
service stop /// New question ///
Message-ID: <4434BF98.8070002@bull.net>
I've identified the problem: in fact, it was due to
a process launched via the SERVICE script which
was not stopped on clusvcadm -s SERVICE (or -d).
Then, on service cman stop, the modprobe -r dlm was successful,
but at the end of this modprobe -r, lsmod
indicates one user left on cman:
cman 136480 1
but without user identification (such as "cman 136480 10 dlm" when CS4
is fully active).
So the modprobe -r cman was then impossible.
Could someone explain to me the link between a process
managed in the SERVICE script and the remaining 1 user
on cman?
Thanks
Alain Moullé
>> I have a systematic problem with cman stop on my configuration:
>> knowing that there is no service with autostart in
>> cluster.conf, and that I have only one main service
>> to be started by: clusvcadm -e SERVICE -m
>> First test:
>> launch CS4 OK
>> stop CS4 OK
>> no problem
>> Second test:
>> launch CS4
>> clusvcadm -e SERVICE -m
>> then
>> clusvcadm -d SERVICE
>> stop CS4 ...
>> in this case, cman stop systematically FAILS ...
>> This is true in both cases, whether CS4 is started
>> on the peer node or stopped there.
>> Any clue or lead to identify the problem?
From pcaulfie at redhat.com Thu Apr 6 07:25:53 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 06 Apr 2006 08:25:53 +0100
Subject: [Linux-cluster] RE: CS4 Update2 / cman systematically FAILED
on service stop /// New question ///
In-Reply-To: <4434BF98.8070002@bull.net>
References: <4434BF98.8070002@bull.net>
Message-ID: <4434C281.6010804@redhat.com>
Alain Moulle wrote:
> I've identified the problem: in fact, it was due to
> a process launched via the SERVICE script which
> was not stopped on clusvcadm -s SERVICE (or -d).
> Then, on service cman stop, the modprobe -r dlm was successful,
> but at the end of this modprobe -r, lsmod
> indicates one user left on cman:
> cman 136480 1
> but without user identification (such as "cman 136480 10 dlm" when CS4
> is fully active).
> So the modprobe -r cman was then impossible.
>
> Could someone explain to me the link between a process
> managed in the SERVICE script and the remaining 1 user
> on cman?
There's no direct link. The usage count on cman is simply the number of links
to it. They could be kernel or userspace users.
In this case it could be CCS. Even if the cluster isn't operating, ccs polls
the cluster manager to see if it has come back up.
--
patrick
From figaro at neo-info.net Thu Apr 6 09:44:27 2006
From: figaro at neo-info.net (Figaro Yang)
Date: Thu, 6 Apr 2006 17:44:27 +0800
Subject: [Linux-cluster] lock_gulm.ko needs unknown symbol tap_sig
Message-ID: <011701c6595e$b8837a60$c800a8c0@neooffice>
Hi, all:
I have a question about rebuilding the GFS kernel modules; I get these error messages:
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map
2.6.11.img;fi
WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol tap_sig
WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol watch_sig
WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol sig_watcher_init
WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol sig_watcher_lock_drop
How do I fix these errors?
Thanks for any help!
From ocrete at max-t.com Thu Apr 6 16:34:41 2006
From: ocrete at max-t.com (Olivier Crête)
Date: Thu, 06 Apr 2006 12:34:41 -0400
Subject: [Linux-cluster] cman kicking out nodes for no good reason
Message-ID: <1144341281.355.38.camel@cocagne.max-t.internal>
Hi,
I have a strange problem where cman suddenly starts kicking out members
of the cluster with "Inconsistent cluster view" when I join a new node
(sometimes). It takes a few minutes between each eviction. I'm using a
snapshot from March 12th of the STABLE branch on 2.6.16. The cluster is
in a transition state at that point and I can't stop/start services or do
anything else. It did not do this with a snapshot I took a few months
ago.
--
Olivier Crête
ocrete at max-t.com
Maximum Throughput Inc.
From charlie.sharkey at bustech.com Wed Apr 5 17:40:48 2006
From: charlie.sharkey at bustech.com (Charlie Sharkey)
Date: Wed, 5 Apr 2006 13:40:48 -0400
Subject: [Linux-cluster] two node cluster startup problem
Message-ID: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com>
Hi,
I'm having trouble with a two node cluster. The second node ("one")
gets the config from "zero" OK, but won't join the cluster. It instead
starts its own cluster (according to /proc/cluster/nodes). My config
file is below; any help would be appreciated. Thanks!
From lhh at redhat.com Thu Apr 6 20:34:25 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 06 Apr 2006 16:34:25 -0400
Subject: [Linux-cluster] Monitoring Cluster Services
In-Reply-To: <089401c658a7$481d72b0$3964a8c0@WS076>
References: <089401c658a7$481d72b0$3964a8c0@WS076>
Message-ID: <1144355665.3723.1.camel@ayanami.boston.redhat.com>
On Wed, 2006-04-05 at 12:51 +0100, Ben Yarwood wrote:
> I have set up a monitoring tool to check that all the appropriate processes
> are running on our cluster nodes. I am currently checking for the
> following:
>
> ccsd, 1 instance
> cman_comms, 1 instance
> cman_memb, 1 instance
> cman_serviced, 1 instance
> cman_hbeat, 1 instance
> fenced, 1 instance
> clvmd, 1 instance
> gfs_inoded, 1 instance for each gfs mount
> clurgmgrd, 1 instance
>
> Can anyone tell me if this is a correct and exhaustive list?
Looks like it's missing DLM threads.
-- Lon
From lhh at redhat.com Thu Apr 6 20:41:17 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 06 Apr 2006 16:41:17 -0400
Subject: [Linux-cluster] Order of execution
Message-ID: <1144356077.3723.10.camel@ayanami.boston.redhat.com>
On Tue, 2006-04-04 at 07:55 -0500, JACOB_LIBERMAN at Dell.com wrote:
> Eric,
>
> I am running RHEL3 U4 with clumanager 1.2.22. I do not have the options
> listed below.
>
> Does anyone have an example script for this version? Lon?
The linux-cluster / RHCS4 ordering is directly taken from RHCS3:
(a) mount file systems
(b) bring up IPs
(c) start user service (only can have one in RHCS3)
Is the cluster controlling all of the components, or is it only
controlling some of them? It sounds like it should work.
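In cluster.conf terms, that ordering corresponds to listing the resources
inside the service in the order they should come up. A minimal sketch, with
every name, device, and address invented purely for illustration:

  <service name="websvc" domain="webdomain">
    <fs name="webdata" device="/dev/sdb1" mountpoint="/data" fstype="ext3"/>
    <ip address="10.0.0.50" monitor_link="1"/>
    <script name="webapp" file="/etc/init.d/webapp"/>
  </service>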
-- Lon
From gstaltari at arnet.net.ar Thu Apr 6 21:19:47 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Thu, 06 Apr 2006 18:19:47 -0300
Subject: [Linux-cluster] GFS and CPU time
Message-ID: <443585F3.4090100@arnet.net.ar>
Hi, we've created a 6-node cluster with a GFS filesystem. The question is
why there is always one node where the CPU time of the GFS/lock-related
processes is a lot higher than on the others.
Node 1
root  3799 0.0 0.0 0 0 ? S< Mar31   0:00 [dlm_recoverd]
root  3806 0.1 0.0 0 0 ? S< Mar31  16:37 [lock_dlm1]
root  3807 0.1 0.0 0 0 ? S< Mar31  16:40 [lock_dlm2]
root  3808 1.0 0.0 0 0 ? S  Mar31 102:27 [gfs_scand]
root  3809 0.1 0.0 0 0 ? S  Mar31  18:05 [gfs_glockd]
root  3810 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_recoverd]
root  3811 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_logd]
root  3812 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_quotad]
root  3813 0.0 0.0 0 0 ? S  Mar31   0:18 [gfs_inoded]
Node 2
root  4230 0.0 0.0 0 0 ? S< Mar31   0:00 [dlm_recoverd]
root  4237 0.0 0.0 0 0 ? S< Mar31   4:16 [lock_dlm1]
root  4238 0.0 0.0 0 0 ? S< Mar31   4:13 [lock_dlm2]
root  4239 0.4 0.0 0 0 ? S  Mar31  38:01 [gfs_scand]
root  4240 0.0 0.0 0 0 ? S  Mar31   2:58 [gfs_glockd]
root  4241 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_recoverd]
root  4242 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_logd]
root  4243 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_quotad]
root  4244 0.0 0.0 0 0 ? S  Mar31   0:45 [gfs_inoded]
Node 3
root  4124 0.0 0.0 0 0 ? S< Mar31   0:00 [dlm_recoverd]
root  4131 0.0 0.0 0 0 ? S< Mar31   2:29 [lock_dlm1]
root  4132 0.0 0.0 0 0 ? S< Mar31   2:29 [lock_dlm2]
root  4133 0.9 0.0 0 0 ? S  Mar31  88:45 [gfs_scand]
root  4134 0.0 0.0 0 0 ? S  Mar31   2:35 [gfs_glockd]
root  4135 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_recoverd]
root  4136 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_logd]
root  4137 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_quotad]
root  4138 0.0 0.0 0 0 ? S  Mar31   0:06 [gfs_inoded]
Node 4
root 17576 0.0 0.0 0 0 ? S< Mar31   0:00 [dlm_recoverd]
root 17577 0.0 0.0 0 0 ? S< Mar31   0:00 [lock_dlm1]
root 17578 0.0 0.0 0 0 ? S< Mar31   0:00 [lock_dlm2]
root 17579 0.0 0.0 0 0 ? S  Mar31   0:01 [gfs_scand]
root 17580 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_glockd]
root 17581 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_recoverd]
root 17582 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_logd]
root 17583 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_quotad]
root 17584 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_inoded]
Node 5
root 30784 0.0 0.0 0 0 ? S< Mar31   0:00 [dlm_recoverd]
root 30785 0.0 0.0 0 0 ? S< Mar31   0:47 [lock_dlm1]
root 30786 0.0 0.0 0 0 ? S< Mar31   0:46 [lock_dlm2]
root 30787 0.2 0.0 0 0 ? S  Mar31  10:00 [gfs_scand]
root 30788 0.0 0.0 0 0 ? S  Mar31   0:50 [gfs_glockd]
root 30789 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_recoverd]
root 30790 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_logd]
root 30791 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_quotad]
root 30792 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_inoded]
Node 6
root  4273 0.0 0.0 0 0 ? S< Mar31   0:00 [dlm_recoverd]
root  4274 0.0 0.0 0 0 ? S< Mar31   0:18 [lock_dlm1]
root  4275 0.0 0.0 0 0 ? S< Mar31   0:17 [lock_dlm2]
root  4276 0.1 0.0 0 0 ? S  Mar31   5:36 [gfs_scand]
root  4277 0.0 0.0 0 0 ? S  Mar31   0:22 [gfs_glockd]
root  4278 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_recoverd]
root  4279 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_logd]
root  4280 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_quotad]
root  4281 0.0 0.0 0 0 ? S  Mar31   0:00 [gfs_inoded]
FC 4
kernel-smp-2.6.15-1.1831_FC4
dlm-kernel-smp-2.6.11.5-20050601.152643.FC4.21
GFS-kernel-smp-2.6.11.8-20050601.152643.FC4.24
cman-kernel-smp-2.6.11.5-20050601.152643.FC4.22
TIA
German Staltari
From ben.yarwood at juno.co.uk Thu Apr 6 22:45:59 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Thu, 6 Apr 2006 23:45:59 +0100
Subject: [Linux-cluster] Monitoring Cluster Services
In-Reply-To: <1144355665.3723.1.camel@ayanami.boston.redhat.com>
Message-ID: <093c01c659cb$df9bf150$3964a8c0@WS076>
Is there one instance of each of the following?
dlm_astd
dlm_recvd
dlm_sendd
Cheers
Ben
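For a quick check, something along these lines works on RHCS4-era systems
(the process list is taken from this thread; adjust it to your own setup):

  # report any expected cluster daemon or kernel thread that is not running
  for p in ccsd fenced clvmd clurgmgrd dlm_astd dlm_recvd dlm_sendd; do
      pgrep -x "$p" >/dev/null || echo "$p is not running"
  done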
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
> Sent: 06 April 2006 21:34
> To: linux clustering
> Subject: Re: [Linux-cluster] Monitoring Cluster Services
>
> On Wed, 2006-04-05 at 12:51 +0100, Ben Yarwood wrote:
> > I have set up a monitoring tool to check that all the appropriate
> > processes are running on our cluster nodes. I am currently
> checking
> > for the
> > following:
> >
> > ccsd , 1 instance
> > cman_comms, 1 instance
> > cman_memb , 1 instance
> > cman_serviced, 1 instance
> > cman_hbeat, 1 instance
> > fenced, 1 instance
> > clvmd, 1 instance
> > gfs_inoded, 1 instance for each gfs mount
> > clurgmgrd, 1 instance
> >
> > Can anyone tell me if this is a correct and exhaustive list?
>
> Looks like it's missing DLM threads.
>
> -- Lon
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From ookami at gmx.de Fri Apr 7 04:36:04 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Fri, 7 Apr 2006 06:36:04 +0200 (MEST)
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <5174.1144384564@www022.gmx.net>
Hi,
I installed gfs and all the cluster stuff on our systems, and I didn't have
the impression that I missed any of the steps in the manual. So I have two
nodes which both have a gfs partition mounted. I can also mount these
elsewhere if I export them with gnbd. But I don't see the big difference
from nfs yet (apart from maybe performance). I thought that if I named the
gfs-partitions the same (clustername:gfs1) they would be magically merged
or something like that. I thought this was what the docs meant by the
notion that GFS does not have a single point of failure, or that we could
have redundant file-servers. What did I get wrong about all that?
P.S.: I did the changes to /etc/lvm/lvm.conf regarding the locking
(locking_type=2).
Thanks for any help!!!
wolfgang
From pcaulfie at redhat.com Fri Apr 7 07:20:23 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 07 Apr 2006 08:20:23 +0100
Subject: [Linux-cluster] two node cluster startup problem
In-Reply-To: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com>
References: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com>
Message-ID: <443612B7.6010202@redhat.com>
Charlie Sharkey wrote:
> Hi,
>
> I'm having trouble with a two node cluster. The second node ("one")
> gets the config from "zero" ok, but won't join the cluster. It instead
> starts its own cluster (according to /proc/cluster/nodes). My config
> file is below, any help would be appreciated. thanks !
>
Check you don't have any firewalling enabled. It's most likely that the nodes
can't talk to each other. You'll need to open ports 6809/udp and 21064/tcp.
Also check that you can ping and/or ssh between the machines.
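For iptables, rules along these lines should do it (adjust to your own
chain layout before saving):

  # allow cman (6809/udp) and dlm (21064/tcp) between cluster nodes
  iptables -I INPUT -p udp --dport 6809 -j ACCEPT
  iptables -I INPUT -p tcp --dport 21064 -j ACCEPT
  service iptables save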
--
patrick
From Michael.Roethlein at ri-solution.com Fri Apr 7 08:51:29 2006
From: Michael.Roethlein at ri-solution.com (Röthlein Michael (RI-Solution))
Date: Fri, 7 Apr 2006 10:51:29 +0200
Subject: [Linux-cluster] GFS freezes without a trace
Message-ID: <992633B6A0E42B49BC5A41C10A8C841B01DB222B@MUCEX004.root.local>
Hi,
In the last few days it has happened several times that gfs froze, but I could not find any trace in any logfile I could think of.
We have a 4-node cluster, with each node attached to one storage array with one gfs partition.
Is there a gfs logfile I might not have found, or is it possible to enable debugging?
Thanks in Advance
Yours
Michael
From Bowie_Bailey at BUC.com Fri Apr 7 13:42:26 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Fri, 7 Apr 2006 09:42:26 -0400
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338A7@bnifex.cis.buc.com>
wolfgang pauli wrote:
>
> I installed gfs and all the cluster stuff on our systems and I didn't
> have the impression that I missed any of the steps in the manual. So
> I have two nodes which both have a gfs partition mounted. I can also
> mount these, if I exported them with gnbd. But I don't see the big
> difference to nfs yet (apart from maybe performance). I thought that
> if I name the gfs-partitions the same (clustername:gfs1) they would
> be magically merged or something like that. I thought this was meant
> by the notion in the docs that GFS does not have a single point of
> failure. Or that we could have redundant file-servers. What did I get
> wrong about all that?
It sounds like you are a bit confused about what GFS does. I replied
to someone within the last week or so on almost the same issue. Check
the archives.
GFS is a filesystem that allows multiple nodes to access and update it
at the same time. The cluster services manage the nodes and try to
prevent a misbehaving node from corrupting the filesystem.
If you have hard drives in all of your nodes, GFS and the cluster will
not help you make them into one big shared storage area -- at least not
yet, I believe there is a beta (alpha?) project out there somewhere.
If you have a big storage area, GFS and the cluster _will_ allow you
to connect all of your nodes to it.
The redundancy comes in the fact that you have multiple machines
running from the same storage area. If one of the machines goes down,
the others can continue working. In a load-balanced configuration,
the loss of one of the nodes will be transparent to the users. In
theory, of course... If the storage dies, that's another issue.
Hopefully, your storage is raid and can handle a disk failure.
--
Bowie
From charlie.sharkey at bustech.com Fri Apr 7 14:00:08 2006
From: charlie.sharkey at bustech.com (Charlie Sharkey)
Date: Fri, 7 Apr 2006 10:00:08 -0400
Subject: [Linux-cluster] two node cluster startup problem
Message-ID: <03FB5D708BE3C8448E8079186A56CDE67659B4@BTIBURMAIL.bustech.com>
That was it, problem solved. Ping worked ok, but not ssh. I stopped both
the portmap and iptables services and now it joins ok.
Thanks for your help !
charlie
From ookami at gmx.de Fri Apr 7 19:22:51 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Fri, 7 Apr 2006 21:22:51 +0200 (MEST)
Subject: [Linux-cluster] newbie: gfs merge
References: <4766EEE585A6D311ADF500E018C154E3021338A7@bnifex.cis.buc.com>
Message-ID: <20750.1144437771@www010.gmx.net>
> > I installed gfs and all the cluster stuff on our systems and I didn't
> > have the impression that I missed any of the steps in the manual. So
> > I have two nodes which both have a gfs partition mounted. I can also
> > mount these, if I exported them with gnbd. But I don't see the big
> > difference to nfs yet (apart from maybe performance). I thought that
> > if I name the gfs-partitions the same (clustername:gfs1) they would
> > be magically merged or something like that. I thought this was meant
> > by the notion in the docs that GFS does not have a single point of
> > failure. Or that we could have redundant file-servers. What did I get
> > wrong about all that?
>
> It sounds like you are a bit confused about what GFS does. I replied
> to someone within the last week or so on almost the same issue. Check
> the archives.
>
> GFS is a filesystem that allows multiple nodes to access and update it
> at the same time. The cluster services manage the nodes and try to
> prevent a misbehaving node from corrupting the filesystem.
>
> If you have hard drives in all of your nodes, GFS and the cluster will
> not help you make them into one big shared storage area -- at least not
> yet, I believe there is a beta (alpha?) project out there somewhere.
> If you have a big storage area, GFS and the cluster _will_ allow you
> to connect all of your nodes to it.
>
> The redundancy comes in the fact that you have multiple machines
> running from the same storage area. If one of the machines goes down,
> the others can continue working. In a load-balanced configuration,
> the loss of one of the nodes will be transparent to the users. In
> theory, of course... If the storage dies, that's another issue.
> Hopefully, your storage is raid and can handle a disk failure.
>
> --
> Bowie
Hm... Thanks for your answer! I am definitely still a bit confused, even
after reading your post of last week. I understand that I cannot merge the
file systems. Our setup is very basic. We have two linux machines that
could act as file servers, and we thought that we could have one (A)
working as an active backup of the other (B). Is that what the
documentation calls a failover domain, with (B) being the failover
"domain" for (A)? Until now, we were running rsync at night, so that if
the first of the two servers failed, clients could mount the NFS from the
other server. There is nothing fancy here, like a SAN I guess, just
machines connected via ethernet switches. So basically the question is
whether it is possible to keep the filesystems on the two servers in total
sync, so that it would not matter whether clients mount the remote share
from (A) or (B), and whether the clients would automatically be able to
mount the GFS from (B) if (A) fails.
From Bowie_Bailey at BUC.com Fri Apr 7 19:32:38 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Fri, 7 Apr 2006 15:32:38 -0400
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338B8@bnifex.cis.buc.com>
wolfgang pauli wrote:
> > > I installed gfs and all the cluster stuff on our systems and I
> > > didn't have the impression that I missed any of the steps in the
> > > manual. So I have two nodes which both have a gfs partition
> > > mounted. I can also mount these, if I exported them with gnbd.
> > > But I don't see the big difference to nfs yet (apart from maybe
> > > performance). I thought that if I name the gfs-partitions the
> > > same (clustername:gfs1) they would be magically merged or
> > > something like that. I thought this was meant by the notion in
> > > the docs that GFS does not have a single point of failure. Or
> > > that we could have redundant file-servers. What did I get wrong
> > > about all that?
> >
> > It sounds like you are a bit confused about what GFS does. I
> > replied to someone within the last week or so on almost the same
> > issue. Check the archives.
> >
> > GFS is a filesystem that allows multiple nodes to access and
> > update it at the same time. The cluster services manage the nodes
> > and try to prevent a misbehaving node from corrupting the
> > filesystem.
> >
> > If you have hard drives in all of your nodes, GFS and the cluster
> > will not help you make them into one big shared storage area -- at
> > least not yet, I believe there is a beta (alpha?) project out
> > there somewhere. If you have a big storage area, GFS and the
> > cluster _will_ allow you to connect all of your nodes to it.
> >
> > The redundancy comes in the fact that you have multiple machines
> > running from the same storage area. If one of the machines goes
> > down, the others can continue working. In a load-balanced
> > configuration, the loss of one of the nodes will be transparent to
> > the users. In theory, of course... If the storage dies, that's
> > another issue. Hopefully, your storage is raid and can handle a
> > disk failure.
>
> Hm... Thanks for your answer! I am definitely a bit confused, even
> after reading your post of last week. I understand that I cannot
> merge the file systems. Our setup is very basic. We have two linux
> machines that could act as file servers and we thought that we could
> have one (A) working as an active backup of the other (B). Is that
> what the documentation calls a failover domain, with (B) being the
> failover "domain" for (A)? Until now, we were running rsync at
> night, so that if the first of the two servers failed, clients could
> mount the NFS from the other server. There is nothing fancy here,
> like a SAN I guess, just machines connected via ethernet switches.
> So basically the question is, whether it is possible to keep the
> filesystems on the two servers in total sync, so that it would not
> matter whether clients mount the remote share from (A) or (B).
> Whether the clients would automatically be able to mount the GFS
> from (B), if (A) fails.
No, GFS doesn't work quite like that. What you have is something more
like this: Two machines, (A) and (B), are file servers. A third
machine, (C), is either a linux box exporting its filesystem via
GNBD, or a dedicated storage box running iSCSI, AoE, or something
similar that will allow multiple connections. (A) and (B) are both
connected to the GFS filesystem exported by (C). If either (A) or (B)
goes down, the other one can continue serving the data from (C). They
don't need to be synchronized because they are using the same physical
storage. And, if the application permits, you can even run them both
simultaneously.
You are looking for something different. There is a project out there
for that, but it is not production ready at this point. Maybe someone
else remembers the name.
--
Bowie
From ookami at gmx.de Fri Apr 7 21:01:06 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Fri, 7 Apr 2006 23:01:06 +0200 (MEST)
Subject: [Linux-cluster] newbie: gfs merge
References: <4766EEE585A6D311ADF500E018C154E3021338B8@bnifex.cis.buc.com>
Message-ID: <4720.1144443666@www010.gmx.net>
> > > > I installed gfs and all the cluster stuff on our systems and I
> > > > didn't have the impression that I missed any of the steps in the
> > > > manual. So I have two nodes which both have a gfs partition
> > > > mounted. I can also mount these, if I exported them with gnbd.
> > > > But I don't see the big difference to nfs yet (apart from maybe
> > > > performance). I thought that if I name the gfs-partitions the
> > > > same (clustername:gfs1) they would be magically merged or
> > > > something like that. I thought this was meant by the notion in
> > > > the docs that GFS does not have a single point of failure. Or
> > > > that we could have redundant file-servers. What did I get wrong
> > > > about all that?
> > >
> > > It sounds like you are a bit confused about what GFS does. I
> > > replied to someone within the last week or so on almost the same
> > > issue. Check the archives.
> > >
> > > GFS is a filesystem that allows multiple nodes to access and
> > > update it at the same time. The cluster services manage the nodes
> > > and try to prevent a misbehaving node from corrupting the
> > > filesystem.
> > >
> > > If you have hard drives in all of your nodes, GFS and the cluster
> > > will not help you make them into one big shared storage area -- at
> > > least not yet, I believe there is a beta (alpha?) project out
> > > there somewhere. If you have a big storage area, GFS and the
> > > cluster _will_ allow you to connect all of your nodes to it.
> > >
> > > The redundancy comes in the fact that you have multiple machines
> > > running from the same storage area. If one of the machines goes
> > > down, the others can continue working. In a load-balanced
> > > configuration, the loss of one of the nodes will be transparent to
> > > the users. In theory, of course... If the storage dies, that's
> > > another issue. Hopefully, your storage is raid and can handle a
> > > disk failure.
> >
> > Hm... Thanks for your answer! I am definitely a bit confused, even
> > after reading your post of last week. I understand that I cannot
> > merge the file systems. Our setup is very basic. We have two linux
> > machines that could act as file servers and we thought that we could
> > have one (A) working as an active backup of the other (B). Is that
> > what the documentation calls a failover domain, with (B) being the
> > failover "domain" for (A)? Until now, we were running rsync at
> > night, so that if the first of the two servers failed, clients could
> > mount the NFS from the other server. There is nothing fancy here,
> > like a SAN I guess, just machines connected via ethernet switches.
> > So basically the question is, whether it is possible to keep the
> > filesystems on the two servers in total sync, so that it would not
> > matter whether clients mount the remote share from (A) or (B).
> > Whether the clients would automatically be able to mount the GFS
> > from (B), if (A) fails.
>
> No, GFS doesn't work quite like that. What you have is something more
> like this: Two machines, (A) and (B), are file servers. A third
> machine, (C), is either a linux box exporting its filesystem via
> GNBD, or a dedicated storage box running iSCSI, AoE, or something
> similar that will allow multiple connections. (A) and (B) are both
> connected to the GFS filesystem exported by (C). If either (A) or (B)
> goes down, the other one can continue serving the data from (C). They
> don't need to be synchronized because they are using the same physical
> storage. And, if the application permits, you can even run them both
> simultaneously.
>
> You are looking for something different. There is a project out there
> for that, but it is not production ready at this point. Maybe someone
> else remembers the name.
>
> --
> Bowie
>
Oh, OK. This would make sense to me. But I still have some questions:
1. Would this reduce the load on (C)?
2. I know how to export the gfs from (C) and mount it on (A) and (B), but
how do the clients know whether they should connect to (A) or (B)? Is this
managed by clvmd?
Thanks for the great help so far!!
wolfgang
From kumaresh81 at yahoo.co.in Sat Apr 8 16:48:04 2006
From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy)
Date: Sat, 8 Apr 2006 17:48:04 +0100 (BST)
Subject: [Linux-cluster] issues with rhcs 4.2
Message-ID: <20060408164804.54434.qmail@web8319.mail.in.yahoo.com>
hi,
I recently migrated from rhcs 3 to rhcs 4.2, and since then I have been unable to bring up the clustered services.
Even though the services are getting started (the VIP, shared devices, etc.), the status in clustat and system-config-cluster still displays "failed", and because of this failover is not happening.
Any light on this will be much appreciated. The cluster is on RHEL AS 4U2 with two nodes.
Regards,
Kumaresh
From l.dardini at comune.prato.it Sat Apr 8 17:05:18 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Sat, 8 Apr 2006 19:05:18 +0200
Subject: [Linux-cluster] Cluster node not able to access all cluster resource
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFA92@exchange2.comune.prato.local>
The subject line is not a problem but what I want to do. I have lots of
services, each of which is now run by a two-node cluster. This is very bad,
because the nodes fence each other during network blackouts. I'd like to
create only one cluster, where each resource, such as the GFS filesystems,
is accessible only by a limited number of nodes.
For example, taking a Cluster "test" made of node A, node B, node C,
node D and with the following resources: GFS Filesystem alpha and GFS
Filesystem beta. I want that only node A and node B can access GFS
Filesystem alpha and only node C and node D can access GFS Filesystem
beta.
Is it possible?
Leandro
From ookami at gmx.de Sun Apr 9 00:44:15 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Sun, 9 Apr 2006 02:44:15 +0200 (MEST)
Subject: [Linux-cluster] hangs when copying with gnbd and gfs
Message-ID: <20347.1144543455@www012.gmx.net>
Hi,
I could successfully mount a gfs partition and export it with gnbd. It was
also very fast when I was moving a file from the client to the server,
but if I try a second operation, like copying the file back, it always
hangs. I cannot even copy files locally to the gfs partition anymore.
Unfortunately, there is no info at all in the syslog or any other logfile,
and "gnbd_import -vl" and "gnbd_export -vl" don't show any errors
either. I guess it has something to do with the locking or fencing, but I
don't understand that very well. Below is my config etc. Thanks for any
hints!!
I exported/imported the file system like this:
gnbd_export -d /dev/hdd1 -e testgfs
gnbd_import -i eon
mount -t gfs /dev/gnbd/testgfs /mnt/gfs1/
From ookami at gmx.de Sun Apr 9 02:54:55 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Sun, 9 Apr 2006 04:54:55 +0200 (MEST)
Subject: [Linux-cluster] hangs when copying with gnbd and gfs
References: <20347.1144543455@www012.gmx.net>
Message-ID: <22376.1144551295@www084.gmx.net>
Could this be related to automount? I just tried it again, copied some mpg
files back and forth, and everything worked fine. But then I copied another
file (230MB of /dev/zero) and the copying froze. The only thing I could
find in the log file was this:
Apr 8 20:44:26 echo automount[5176]: failed to mount /misc/.directory
Apr 8 20:44:26 echo automount[5177]: failed to mount /misc/.directory
Apr 8 20:44:26 echo automount[5178]: >> /usr/sbin/showmount: can't get
address for .directory
Apr 8 20:44:26 echo automount[5178]: lookup(program): lookup
for .directory failed
Apr 8 20:44:26 echo automount[5178]: failed to mount /net/.directory
Apr 8 20:44:26 echo automount[5183]: >> /usr/sbin/showmount: can't get
address for .directory
Apr 8 20:44:26 echo automount[5183]: lookup(program): lookup
for .directory failed
Apr 8 20:44:26 echo automount[5183]: failed to mount /net/.directory
Another question I have is whether it is possible to mount the gfs on the
server while it gnbd-exports the filesystem?
wolfgang
From Alain.Moulle at bull.net Mon Apr 10 11:02:08 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 10 Apr 2006 13:02:08 +0200
Subject: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <443A3B30.10307@bull.net>
Hi
I'm trying to configure a simple 3-node cluster
with simple test scripts.
But I can't start cman; it remains stalled with
this message in syslog:
Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005 16:04:34) installed
Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol family 30
Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster name = HA_METADATA_3N, version = 8) found.
Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or form a Linux-cluster
Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate
Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster
and nothing more.
The graphical tool does not detect any error in the configuration; I've
attached my cluster.conf for the three nodes. Note that
I want two nodes (yack10 and yack21) running their applications
and the 3rd one (yack23) as a backup for yack10 and/or yack21,
but I don't want any failover between yack10 and yack21.
PS: I've verified all ssh connections between the 3 nodes, and
all the fence paths as described in the cluster.conf.
Thanks again for your help.
Alain
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 1500 bytes
Desc: not available
URL:
From l.dardini at comune.prato.it Mon Apr 10 11:11:04 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Mon, 10 Apr 2006 13:11:04 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFACF@exchange2.comune.prato.local>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On behalf of Alain Moulle
> Sent: Monday, 10 April 2006 13:02
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] CS4 U2 / problem to configure a 3
> nodes cluster
>
> Hi
>
> I'm trying to configure a simple 3-node cluster with simple
> test scripts.
> But I can't start cman, it remains stalled with this message
> in syslog :
> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr 10
> 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
> 16:04:34) installed
> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered
> protocol family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]:
> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found.
> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join
> or form a Linux-cluster Apr 10 11:38:01 s_sys at yack21
> ccsd[25004]: Connected to cluster infrastruture
> via: CMAN/SM Plugin v1.1.2
> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status::
> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN:
> forming a new cluster
>
> and nothing more.
>
> The graphical tool does not detect any error in the configuration; I've
> attached my cluster.conf for the three nodes. Note
> that I want two nodes (yack10 and yack21) running their
> applications and the 3rd one (yack23) as a backup for yack10
> and/or yack21, but I don't want any failover between yack10
> and yack21.
>
> PS : I 've verified all ssh connections between the 3 nodes,
> and all the fence paths as described in the cluster.conf.
> Thanks again for your help.
>
> Alain
>
Are you starting cman on all three nodes at the same time? A node doesn't finish joining until the other nodes are starting too; timing is important during booting.
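For example, from an admin workstation you can kick all three off together;
a rough sketch using the host names from this thread:

  # start the cluster stack on every node at (roughly) the same time
  for n in yack10 yack21 yack23; do
      ssh root@$n "service ccsd start && service cman start" &
  done
  wait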
Leandro
From pcaulfie at redhat.com Mon Apr 10 12:02:58 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 10 Apr 2006 13:02:58 +0100
Subject: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
In-Reply-To: <443A3B30.10307@bull.net>
References: <443A3B30.10307@bull.net>
Message-ID: <443A4972.5030000@redhat.com>
Alain Moulle wrote:
> Hi
>
> I'm trying to configure a simple 3-node cluster
> with simple test scripts.
> But I can't start cman, it remains stalled with
> this message in syslog :
> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
> 16:04:34) installed
> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol family 30
> Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster name =
> HA_METADATA_3N, version = 8) found.
> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or form a
> Linux-cluster
> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster infrastruture
> via: CMAN/SM Plugin v1.1.2
> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate
> Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster
>
> and nothing more.
>
> The graphical tool does not detect any error in the configuration; I've
> attached my cluster.conf for the three nodes. Note that
> I want two nodes (yack10 and yack21) running their applications
> and the 3rd one (yack23) as a backup for yack10 and/or yack21,
> but I don't want any failover between yack10 and yack21.
>
> PS : I 've verified all ssh connections between the 3 nodes, and
> all the fence paths as described in the cluster.conf.
> Thanks again for your help.
Check that the cluster ports are not blocked by any firewalling. You'll
need 6809/udp & 21064/tcp opened.
--
patrick
From ugo.parsi at gmail.com Mon Apr 10 14:25:20 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 16:25:20 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
Message-ID:
Hello,
Do you know how to run GFS / linux-cluster suite under a 2.6.16 vanilla kernel ?
All I've got is :
/usr/src/cluster/dlm-kernel/src2/lockspace.c: In function `do_uevent':
/usr/src/cluster/dlm-kernel/src2/lockspace.c:160: error: too many
arguments to function `kobject_uevent'
/usr/src/cluster/dlm-kernel/src2/lockspace.c:162: error: too many
arguments to function `kobject_uevent'
make[4]: *** [/usr/src/cluster/dlm-kernel/src2/lockspace.o] Error 1
I've removed the last argument in the kobject_uevent call, which was
"NULL"; it does compile, but I don't really know if it's safe to do
it that way...
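For illustration, the change amounts to something like the following; the
variable and action names here are invented, not the actual lockspace.c code:

  /* built against older kernels, the call carried an extra attribute arg: */
  kobject_uevent(&ls->ls_kobj, KOBJ_ONLINE, NULL);

  /* on 2.6.16 the third argument is gone: */
  kobject_uevent(&ls->ls_kobj, KOBJ_ONLINE);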
Anyway, I'm stuck on another error which seems to be due to a missing
include file (dlm.h):
libdlm.c:44:17: dlm.h: No such file or directory
In file included from libdlm.c:46:
libdlm.h:142: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:142: warning: its scope is only this definition or
declaration, which is probably not what you want
libdlm.h:145: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:156: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:160: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:210: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:221: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:225: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:229: warning: `struct dlm_lksb' declared inside parameter list
libdlm.c:47:24: dlm_device.h: No such file or directory
libdlm.c:70: warning: `struct dlm_lock_result' declared inside parameter list
libdlm.c:71: warning: `struct dlm_lock_result' declared inside parameter list
libdlm.c:72: warning: `struct dlm_write_request' declared inside parameter list
libdlm.c:120: error: field `lksb' has incomplete type
libdlm.c: In function `unlock_resource':
libdlm.c:215: error: `DLM_EUNLOCK' undeclared (first use in this function)
libdlm.c:215: error: (Each undeclared identifier is reported only once
libdlm.c:215: error: for each function it appears in.)
libdlm.c: At top level:
libdlm.c:268: warning: `struct dlm_write_request' declared inside parameter list
libdlm.c: In function `set_version':
libdlm.c:270: error: dereferencing pointer to incomplete type
libdlm.c:270: error: `DLM_DEVICE_VERSION_MAJOR' undeclared (first use
in this function)
libdlm.c:271: error: dereferencing pointer to incomplete type
Any ideas ?
Thanks a lot,
Ugo PARSI
From jerome.castang at adelpha-lan.org Mon Apr 10 14:33:21 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 16:33:21 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References:
Message-ID: <443A6CB1.7010307@adelpha-lan.org>
Ugo PARSI wrote:
>Hello,
>
>Do you know how to run GFS / linux-cluster suite under a 2.6.16 vanilla kernel ?
>
>All I've got is :
>
>/usr/src/cluster/dlm-kernel/src2/lockspace.c: In function `do_uevent':
>/usr/src/cluster/dlm-kernel/src2/lockspace.c:160: error: too many
>arguments to function `kobject_uevent'
>/usr/src/cluster/dlm-kernel/src2/lockspace.c:162: error: too many
>arguments to function `kobject_uevent'
>make[4]: *** [/usr/src/cluster/dlm-kernel/src2/lockspace.o] Error 1
>
>I've removed the last argument in the kobject_uevent call, which was
>"NULL"; it does compile, but I don't really know if it's safe to do
>it that way...
>
>Anyway, I'm stuck on another error which seems to be due to a missing
>include file (dlm.h):
>
>libdlm.c:44:17: dlm.h: No such file or directory
>In file included from libdlm.c:46:
>libdlm.h:142: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:142: warning: its scope is only this definition or
>declaration, which is probably not what you want
>libdlm.h:145: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:156: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:160: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:210: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:221: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:225: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:229: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.c:47:24: dlm_device.h: No such file or directory
>libdlm.c:70: warning: `struct dlm_lock_result' declared inside parameter list
>libdlm.c:71: warning: `struct dlm_lock_result' declared inside parameter list
>libdlm.c:72: warning: `struct dlm_write_request' declared inside parameter list
>libdlm.c:120: error: field `lksb' has incomplete type
>libdlm.c: In function `unlock_resource':
>libdlm.c:215: error: `DLM_EUNLOCK' undeclared (first use in this function)
>libdlm.c:215: error: (Each undeclared identifier is reported only once
>libdlm.c:215: error: for each function it appears in.)
>libdlm.c: At top level:
>libdlm.c:268: warning: `struct dlm_write_request' declared inside parameter list
>libdlm.c: In function `set_version':
>libdlm.c:270: error: dereferencing pointer to incomplete type
>libdlm.c:270: error: `DLM_DEVICE_VERSION_MAJOR' undeclared (first use
>in this function)
>libdlm.c:271: error: dereferencing pointer to incomplete type
>
>Any ideas ?
>
>Thanks a lot,
>
>Ugo PARSI
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
For the problem with dlm.h, I found this:
http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html
It seems that dlm.h is provided by dlm-kernel-debuginfo.
--
Jerome Castang
mail: jcastang at adelpha-lan.org
From ugo.parsi at gmail.com Mon Apr 10 14:39:25 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 16:39:25 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <443A6CB1.7010307@adelpha-lan.org>
References:
<443A6CB1.7010307@adelpha-lan.org>
Message-ID:
>
> For the problem with dlm.h i found this:
> >http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html
The link is dead :(
>
> Seems that dlm.h is provided by dlm-kernel-debuginfo
> .
>
I've installed two packages on Debian
# apt-cache search dlm
libdlm-dev - Distributed lock manager - development files
libdlm0 - Distributed lock manager - library
Here's all I've got :
# locate dlm.h
/usr/include/libdlm.h
/usr/src/cluster/dlm-kernel/src2/dlm.h
/usr/src/cluster/dlm-kernel/src/dlm.h
/usr/src/cluster/dlm/lib/libdlm.h
/usr/src/cluster/gfs-kernel/src/dlm/lock_dlm.h
/usr/src/cluster/gfs/lock_dlm/daemon/lock_dlm.h
/usr/src/linux-2.6.16.1/fs/ocfs2/dlm/userdlm.h
I'm trying your package, but I suppose it's redhat-only...
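One workaround that may help, assuming the userland dlm build only needs the
headers already shipped in the dlm-kernel tree (paths taken from the locate
output above; it is assumed that dlm_device.h lives next to dlm.h):

  cp /usr/src/cluster/dlm-kernel/src2/dlm.h /usr/src/cluster/dlm/lib/
  cp /usr/src/cluster/dlm-kernel/src2/dlm_device.h /usr/src/cluster/dlm/lib/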
Thanks,
Ugo PARSI
From jerome.castang at adelpha-lan.org Mon Apr 10 14:51:26 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 16:51:26 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References: <443A6CB1.7010307@adelpha-lan.org>
Message-ID: <443A70EE.4070907@adelpha-lan.org>
Ugo PARSI wrote:
>>For the problem with dlm.h i found this:
>>
>>
>>>http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html
>>>
>>>
>
>The link is dead :(
>
>
The link is dead?
It works perfectly for me...
>
>
>>Seems that dlm.h is provided by dlm-kernel-debuginfo
>>.
>>
>>
>>
>
>I've installed two packages on Debian
>
># apt-cache search dlm
>libdlm-dev - Distributed lock manager - development files
>libdlm0 - Distributed lock manager - library
>
>
>Here's all I've got :
>
># locate dlm.h
>/usr/include/libdlm.h
>/usr/src/cluster/dlm-kernel/src2/dlm.h
>/usr/src/cluster/dlm-kernel/src/dlm.h
>/usr/src/cluster/dlm/lib/libdlm.h
>/usr/src/cluster/gfs-kernel/src/dlm/lock_dlm.h
>/usr/src/cluster/gfs/lock_dlm/daemon/lock_dlm.h
>/usr/src/linux-2.6.16.1/fs/ocfs2/dlm/userdlm.h
>
>I'm trying your package, but I suppose it's redhat-only...
>
>Thanks,
>
>Ugo PARSI
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
I suppose you can try to get this RH package, unpack it to get the files,
and put them where they should be...
--
Jerome Castang
mail: jcastang at adelpha-lan.org
From ugo.parsi at gmail.com Mon Apr 10 14:57:14 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 16:57:14 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <443A70EE.4070907@adelpha-lan.org>
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
Message-ID:
> I suppose you can try to get this RH package and unpack it to get files
> and put them where they should be...
>
Well, I just did, and it doesn't change much :(
Ugo PARSI
From jerome.castang at adelpha-lan.org Mon Apr 10 15:16:18 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 17:16:18 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
Message-ID: <443A76C2.8070900@adelpha-lan.org>
Ugo PARSI wrote:
>>I suppose you can try to get this RH package and unpack it to get files
>>and put them where they should be...
>>
>>
>>
>
>Well, I just did, and it doesn't change much :(
>
>Ugo PARSI
>
>
Have you tried starting from the CVS of the Cluster Project?
I think the CVS provides all you need.
--
Jerome Castang
mail: jcastang at adelpha-lan.org
From basv at sara.nl Mon Apr 10 15:26:06 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Mon, 10 Apr 2006 17:26:06 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org>
Message-ID: <443A790E.1040002@sara.nl>
Ugo PARSI wrote:
>> I suppose you can try to get this RH package and unpack it to get files
>> and put them where they should be...
>>
>
> Well, I just did, and it doesn't change much :(
>
> Ugo PARSI
Ugo,
Which version of GFS do you use, cvs STABLE or HEAD?
I have compiled deb-packages for kernel 2.6.16.2 using the CVS
STABLE branch.
Regards
--
--
********************************************************************
* *
* Bas van der Vlies e-mail: basv at sara.nl *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************
From carlopmart at gmail.com Mon Apr 10 15:52:20 2006
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 10 Apr 2006 17:52:20 +0200
Subject: [Linux-cluster] Question about manual fencing
Message-ID: <443A7F34.7000901@gmail.com>
Hi all,
I would like to test manual fencing on two nodes for testing
purposes. I have read Red Hat's docs about this but they aren't very
clear to me. If I set up manual fencing, when one node shuts down, will
the other node start up all the services that I have configured on that
node automatically?
Thanks.
--
CL Martinez
carlopmart {at} gmail {d0t} com
From jerome.castang at adelpha-lan.org Mon Apr 10 15:59:14 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 17:59:14 +0200
Subject: [Linux-cluster] Question about manual fencing
In-Reply-To: <443A7F34.7000901@gmail.com>
References: <443A7F34.7000901@gmail.com>
Message-ID: <443A80D2.6050806@adelpha-lan.org>
carlopmart wrote:
> Hi all,
>
> I would like to test manual fencing on two nodes for testing
> purposes. I have read Red Hat's docs about this but they aren't very
> clear to me. If I set up manual fencing, when one node shuts down, will
> the other node start up all the services that I have configured on that
> node automatically?
>
> Thanks.
>
I don't think so.
Fencing a node means stopping it, or making it leave the cluster (using
any method, like a shutdown...).
So if you use manual fencing, the other nodes will not automatically start
their services...
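One detail worth adding: with manual fencing the cluster blocks recovery
until an operator acknowledges that the failed node really is down, roughly:

  # run on a surviving node only after verifying the failed node is off
  fence_ack_manual -n <nodename>

Only after that acknowledgment will failover of the services proceed.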
--
Jerome Castang
mail: jcastang at adelpha-lan.org
From tf0054 at gmail.com Sat Apr 8 16:23:05 2006
From: tf0054 at gmail.com (=?ISO-2022-JP?B?GyRCQ2ZMbkxUGyhC?=)
Date: Sun, 9 Apr 2006 01:23:05 +0900
Subject: [Linux-cluster] Cisco fence agent
Message-ID:
Hi all.
Does anyone have a Cisco Catalyst fence agent?
If nobody has made one, I will write it.
Thanks.
From Bowie_Bailey at BUC.com Mon Apr 10 16:09:03 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Mon, 10 Apr 2006 12:09:03 -0400
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338C7@bnifex.cis.buc.com>
wolfgang pauli wrote:
> > >
> > > Hm... Thanks for your answer! I am definitely a bit confused, even
> > > after reading your post of last week. I understand that I cannot
> > > merge the file systems. Our setup is very basic. We have two linux
> > > machines that could act as file servers and we thought that we could
> > > have one (A) working as an active backup of the other (B). Is that
> > > what the documentation calls a failover domain, with (B) being the
> > > failover "domain" for (A)? Until now, we were running rsync at
> > > night, so that if the first of the two servers failed, clients
> > > could mount the NFS from the other server. There is nothing fancy
> > > here, like a SAN I guess, just machines connected via ethernet
> > > switches. So basically the question is, whether it is possible to
> > > keep the filesystems on the two servers in total sync, so that it
> > > would not matter whether clients mount the remote share from (A)
> > > or (B). Whether the clients would automatically be able to mount
> > > the GFS from (B), if (A) fails.
> >
> > No, GFS doesn't work quite like that. What you have is something
> > more like this: Two machines, (A) and (B), are file servers. A
> > third machine, (C), is either a linux box exporting its filesystem
> > via GNBD, or a dedicated storage box running iSCSI, AoE, or
> > something similar that will allow multiple connections. (A) and
> > (B) are both connected to the GFS filesystem exported by (C). If
> > either (A) or (B) goes down, the other one can continue serving the
> > data from (C). They don't need to be synchronized because they are
> > using the same physical storage. And, if the application permits,
> > you can even run them both simultaneously.
> >
> > You are looking for something different. There is a project out
> > there for that, but it is not production ready at this point.
> > Maybe someone else remembers the name.
>
> Oh, OK. This would make sense to me. But I still have some
> questions:
>
> 1. Would this reduce the load on (C)?
Reduce it from what? (C) would be a completely different type of
machine from (A) and (B). (A) and (B) are application systems, while
(C) is just a fileserver. (C) would not need to be quite as fast as
the others, just fast enough to keep up with the I/O requirements of
the storage and the GFS/Cluster overhead.
> 2. I know how to export the gfs from (C) and mount it on (A) and (B),
> but how do the clients know whether they should connect to (A) or
> (B)? Is this managed by clvmd?
No, this is managed by your network. If (A) and (B) are running the
same software, it doesn't matter which one they connect to. On my
system, I have a Foundry ServerIron that load-balances the two
machines. You can also do it using LVS software, such as the stuff in
the Linux HA project.
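As a rough sketch of the LVS approach (the VIP, port, and real-server
addresses below are placeholders, not from this thread):

  # round-robin a virtual IP across the two file servers
  ipvsadm -A -t 192.168.0.100:2049 -s rr
  ipvsadm -a -t 192.168.0.100:2049 -r 192.168.0.11 -g
  ipvsadm -a -t 192.168.0.100:2049 -r 192.168.0.12 -g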
--
Bowie
From schlegel at riege.com Mon Apr 10 16:20:20 2006
From: schlegel at riege.com (Gunther Schlegel)
Date: Tue, 11 Apr 2006 00:20:20 +0800
Subject: [Linux-cluster] gfs file locking
Message-ID: <443A85C4.2060608@riege.com>
Hi,
does GFS support the same kinds of file locking a local filesystem does?
I am evaluating putting an application on gfs that runs fine on
local filesystems but tends to have severe problems on NFS. I know NFS
is totally different from GFS, but from the application's point of view
both are just filesystems.
best regards, Gunther
From ugo.parsi at gmail.com Mon Apr 10 16:53:41 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 18:53:41 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A76C2.8070900@adelpha-lan.org>
Message-ID:
Reposting, sorry:
On 4/10/06, Ugo PARSI wrote:
> > Have you tried to start with the cvs of Cluster Project ?
> > I think cvs provides all you need.
> >
>
> Well, that's the only thing I did....I guess ?!
>
> I've followed that document indeed :
>
> http://sources.redhat.com/cluster/doc/usage.txt
>
> So I did a cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster
> checkout cluster
>
> Is that okay ?
>
> Thanks a lot,
>
> Ugo PARSI
>
From ugo.parsi at gmail.com Mon Apr 10 16:57:02 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 18:57:02 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <443A790E.1040002@sara.nl>
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
Message-ID:
> Which version for GFS do you use cvs STABLE or HEAD?
>
I don't know how to tell...
Is STABLE this thing? - "The 'cluster' cvs head can be unstable, so it's
recommended that you checkout from the RHEL4 branch -- 'checkout -r RHEL4 cluster'"
I've tried both with and without it anyway...
> I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS
> STABLE branch.
>
From a vanilla kernel?
Because basically, I've just tried all of this from a fresh vanilla
2.6.16.1 (I'm going to try 2.6.16.2) downloaded from kernel.org.
The system was running that kernel at the time of compilation, and I
provided the path of the kernel to the configure script.
Anything wrong? Any ideas? Have you made some fixes/patches?
Thanks a lot,
Ugo PARSI
From basv at sara.nl Mon Apr 10 19:24:58 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Mon, 10 Apr 2006 21:24:58 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
Message-ID:
>
>> I have compiled deb-packages for kernel 2.6.16.2 using the CVS
>> STABLE branch.
>>
>
>
You have to download the STABLE branch from cvs:
cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster checkout -r STABLE cluster
Some packages need header files that are provided by others, so you must
install them before compiling the rest. I have made Debian packaging
scripts for all the cluster packages; if I have some time I will put them
on our ftp-server.
I wrote a small document (originally in Dutch), but it is not that
difficult: you have to install each package before building the others,
which makes life easier than examining all the dependencies.
cd cluster/cman-kernel
dch -i (fill in the correct kernel version)
debian/rules clean
debian/rules build
debian/rules binary
dpkg -i ../cman-kernel_.deb
depmod -a
Now build the following parts in the same way:
dlm-kernel
cd to the correct path
dpkg -i ../dlm-kernel_.deb
gnbd-kernel
dpkg -i ../gnbd-kernel_.deb
gfs-kernel
dpkg -i ../gfs-kernel_.deb
Now build the following kernel-independent parts:
magma
dch -i (fill in the correct cvs version)
debian/rules clean
debian/rules binary
dpkg -i ../magma.deb
Likewise:
iddev
dpkg -i ../iddev.deb
ccs
dpkg -i ../ccs.deb
cman
dlm
dpkg -i ../dlm.deb
gnbd
gfs
fence
gulm
dpkg -i ../gulm.deb
magma-plugins
rgmanager
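One possible way to script the kernel-independent part of that list (dch -i
is interactive, so the version prompt still appears once per package):

  for p in magma iddev ccs cman dlm gnbd gfs fence gulm magma-plugins rgmanager; do
      ( cd "$p" &&
        dch -i &&
        debian/rules clean &&
        debian/rules binary &&
        dpkg -i ../"$p"_*.deb )
  done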
--
Bas van der Vlies
basv at sara.nl
From ocrete at max-t.com Mon Apr 10 21:01:47 2006
From: ocrete at max-t.com (Olivier Crête)
Date: Mon, 10 Apr 2006 17:01:47 -0400
Subject: [Linux-cluster] cman kicking out nodes for no good reason
In-Reply-To: <1144341281.355.38.camel@cocagne.max-t.internal>
References: <1144341281.355.38.camel@cocagne.max-t.internal>
Message-ID: <1144702908.21093.7.camel@cocagne.max-t.internal>
On Thu, 2006-04-06 at 12:34 -0400, Olivier Crête wrote:
> I have a strange problem where cman suddenly starts kicking out members
> of the cluster with "Inconsistent cluster view" when I join a new node
> (sometimes). It takes a few minutes between each kicking. I'm using a
> snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is
> in transition state at that point and I can't stop/start services or do
> anything else. It did not do that with a snapshot I took a few months
> ago.
It's still happening: the node that joins says "Transition master
unknown", while all of the other nodes know who the master is, and then
the master gets kicked out. Then a new master is selected; all of the
nodes seem to know who the master is, but refuse to act on it. After a
while, the new master is kicked out and the process restarts. I guess it's
related to the changes with the timestamps to prevent master desync; I
don't see any other recent change that could have caused it.
--
Olivier Crête
ocrete at max-t.com
Maximum Throughput Inc.
From ookami at gmx.de Mon Apr 10 23:07:48 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Tue, 11 Apr 2006 01:07:48 +0200 (MEST)
Subject: [Linux-cluster] hangs when copying with gnbd and gfs
References: <22376.1144551295@www084.gmx.net>
Message-ID: <28595.1144710468@www031.gmx.net>
> Could this be related to automount? I just tried it again, copied some mpg
> files back and forth, and everything worked fine. But then I copied another
> file (230MB of /dev/zero) and the copying froze. The only thing I could
> find in the log file was this:
> Apr 8 20:44:26 echo automount[5176]: failed to mount /misc/.directory
> Apr 8 20:44:26 echo automount[5177]: failed to mount /misc/.directory
> Apr 8 20:44:26 echo automount[5178]: >> /usr/sbin/showmount: can't get
> address for .directory
> Apr 8 20:44:26 echo automount[5178]: lookup(program): lookup
> for .directory failed
> Apr 8 20:44:26 echo automount[5178]: failed to mount /net/.directory
> Apr 8 20:44:26 echo automount[5183]: >> /usr/sbin/showmount: can't get
> address for .directory
> Apr 8 20:44:26 echo automount[5183]: lookup(program): lookup
> for .directory failed
> Apr 8 20:44:26 echo automount[5183]: failed to mount /net/.directory
>
> Another question I have is whether it is possible to mount the gfs on the
> server while it gnbd-exports the filesystem?
>
> wolfgang
>
OK, I think I solved it. I switched from GNBD to iSCSI. I have iscsitarget
running on the server and open-iscsi on the client. I had to export the
logical volume rather than the raw device to be able to mount it on the
client.
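For anyone trying the same route, the server side amounts to an ietd.conf
entry along these lines (the IQN and volume path are invented for
illustration):

  Target iqn.2006-04.net.example:storage.gfs1
      # export the logical volume, not the raw device underneath it
      Lun 0 Path=/dev/vg0/gfslv,Type=fileio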
From forigato at gmail.com Mon Apr 10 23:57:16 2006
From: forigato at gmail.com (ANDRE LUIS FORIGATO)
Date: Mon, 10 Apr 2006 20:57:16 -0300
Subject: [Linux-cluster] Help me, please
Message-ID: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com>
Linux xlx2 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005
i686 i686 i386 GNU/Linux
Redhat-config-cluster 1.0.3
clumanager 1.2.22
Apr 10 01:18:07 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 05:13:49 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung)
Apr 10 05:13:54 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP
Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:31:07 xlx2 clumembd[4493]: Membership View #5:0x00000002
Apr 10 11:31:08 xlx2 cluquorumd[4463]: Membership reports #0 as down, but disk reports as up: State uncertain!
Apr 10 11:31:08 xlx2 cluquorumd[4463]: --> Commencing STONITH <--
Apr 10 11:31:08 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung)
Apr 10 11:31:10 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP
Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #12 0x00000002
Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Member 200.254.254.171's state is uncertain: Some services may be unavailable!
Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #13 0x00000002
Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:31:34 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung)
Apr 10 11:31:38 xlx2 cluquorumd[4463]: --> Commencing STONITH <--
Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Falsely claiming that 200.254.254.171 has been fenced
Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Data integrity may be compromised!
Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Quorum Event: View #15 0x00000002
Apr 10 11:31:41 xlx2 clusvcmgrd[4671]: State change: 200.254.254.172 DOWN
Apr 10 11:34:08 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP
Apr 10 11:34:09 xlx2 clusvcmgrd[4671]: Quorum Event: View #16 0x00000002
Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host
Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host
Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host
Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: No route to host
Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
Apr 10 11:34:50 xlx2 clumembd[4493]: Member 200.254.254.171 UP
Apr 10 11:34:50 xlx2 clumembd[4493]: Membership View #6:0x00000003
Apr 10 11:34:50 xlx2 cluquorumd[4463]: __msg_send: Incomplete write to 13. Error: Connection reset by peer
Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: Quorum Event: View #17 0x00000003
Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: State change: Local UP
Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: State change: 200.254.254.171 UP
Apr 10 13:21:25 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 17:03:22 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
Apr 10 20:30:30 xlx2 clulockd[4498]: Denied 200.254.254.171: Broken pipe
Apr 10 20:30:30 xlx2 clulockd[4498]: select error: Broken pipe
Regards,
Forigas
From Alain.Moulle at bull.net Tue Apr 11 06:08:57 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 11 Apr 2006 08:08:57 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <443B47F9.6090506@bull.net>
> Hi
>
>> I'm trying to configure a simple 3 nodes cluster with simple test
>> scripts.
>> But I can't start cman, it remains stalled with this message
>> in syslog :
>> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
>> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20
>> 2005 16:04:34) installed
>> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol
>> family 30
>> Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster name
>> = HA_METADATA_3N, version = 8) found.
>> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or
>> form a Linux-cluster
>> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster
>> infrastruture via: CMAN/SM Plugin v1.1.2
>> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate
>> Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster
>>
>> and nothing more.
>>
>> The graphic tool does not detect any error in configuration; I've
>> attached my cluster.conf for the three nodes, knowing that I wanted
>> two nodes (yack10 and yack21) running their applications and the 3rd
>> one (yack23) as a backup for yack10 and/or yack21, but I don't want
>> any failover between yack10 and yack21.
>>
>> PS : I've verified all ssh connections between the 3 nodes, and all
>> the fence paths as described in the cluster.conf.
>> Thanks again for your help.
>>
>> Alain
>
> Are you starting cman on all three nodes at the same time? A node
> doesn't finish starting until the other nodes are starting too. Timing
> is important during booting.
> Leandro
Hi, no I wasn't ...
I've tried it now, and it is ok on yack21 and yack23, but not on yack10;
is there something wrong in the cluster.conf to explain this behavior ?
On yack10, cman is trying to:
CMAN: forming a new cluster
but fails with a timeout ...
??
Thanks
Alain
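
For what it's worth, one way to bring cman up on all nodes at more or less
the same time is to fire the init scripts in parallel (a sketch; the service
names assume the stock RHEL init scripts):

for n in yack10 yack21 yack23; do
    ssh $n 'service ccsd start && service cman start' &
done
wait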
--
mailto:Alain.Moulle at bull.net
+------------------------------+--------------------------------+
| Alain Moullé | from France : 04 76 29 75 99 |
| | FAX number : 04 76 29 72 49 |
| Bull SA | |
| 1, Rue de Provence | Adr : FREC B1-041 |
| B.P. 208 | |
| 38432 Echirolles - CEDEX | Email: Alain.Moulle at bull.net |
| France | BCOM : 229 7599 |
+-------------------------------+-------------------------------+
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 1500 bytes
Desc: not available
URL:
From l.dardini at comune.prato.it Tue Apr 11 06:59:13 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Tue, 11 Apr 2006 08:59:13 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFAEB@exchange2.comune.prato.local>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On behalf of Alain Moulle
> Sent: Tuesday, 11 April 2006 8:09
> To: linux-cluster at redhat.com
> Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3
> nodes cluster
>
> > Hi
> >
> >> I'm trying to configure a simple 3 nodes cluster with simple test
> >> scripts.
> >> But I can't start cman, it remains stalled with this message in
> >> syslog :
> >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
> >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep
> >> 20 2005 16:04:34) installed
> >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol
> >> family 30
> >> Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster
> >> name = HA_METADATA_3N, version = 8) found.
> >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or
> >> form a Linux-cluster
> >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster
> >> infrastruture via: CMAN/SM Plugin v1.1.2
> >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status::
> >> Inquorate
> >> Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster
> >>
> >> and nothing more.
> >>
> >> The graphic tool does not detect any error in configuration; I've
> >> attached my cluster.conf for the three nodes, knowing that I wanted
> >> two nodes (yack10 and yack21) running their applications and the
> >> 3rd one (yack23) as a backup for yack10 and/or yack21, but I don't
> >> want any failover between yack10 and yack21.
> >>
> >> PS : I've verified all ssh connections between the 3 nodes, and all
> >> the fence paths as described in the cluster.conf.
> >> Thanks again for your help.
> >>
> >> Alain
> >
> > Are you starting cman on all three nodes at the same time? A node
> > doesn't finish starting until the other nodes are starting too.
> > Timing is important during booting.
> >
> > Leandro
>
> Hi, no I wasn't ...
> I've tried it now, and it is ok on yack21 and yack23, but not on
> yack10; is there something wrong in the cluster.conf to explain this
> behavior ?
> On yack10, cman is trying to:
> CMAN: forming a new cluster
> but fails with a timeout ...
>
> ??
> Thanks
> Alain
> --
>
Maybe this timeout is due to a firewall setup, as already stated on the
list. A tcpdump from yack10 to the other nodes may help you catch the bug.
Leandro
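
For example, something like the following on yack10 (eth0 here is an
assumption; the cluster traffic is on port 6809/udp):

tcpdump -n -i eth0 udp port 6809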
From ugo.parsi at gmail.com Tue Apr 11 07:44:56 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Tue, 11 Apr 2006 09:44:56 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
Message-ID:
> You have to download, from cvs STABLE:
> cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r
> STABLE cluster
>
Ok I've tried it, thanks, it does seem to work better but I still have
issues...
This time there are no kernel issues... but another missing .h file:
[...]
make[2]: Entering directory `/usr/src/cluster/cman/lib'
gcc -Wall -g -O -I. -fPIC -I/usr/src/cluster/build/incdir/cluster -c
-o libcman.o libcman.c
libcman.c:31:35: cluster/cnxman-socket.h: No such file or directory
libcman.c:44: warning: `struct cl_cluster_node' declared inside parameter list
libcman.c:44: warning: its scope is only this definition or
declaration, which is probably not what you want
libcman.c: In function `copy_node':
libcman.c:46: error: dereferencing pointer to incomplete type
libcman.c:47: error: dereferencing pointer to incomplete type
[...]
> Some packages need header files that are provided by others. So you
> must install them
> before compiling the rest. I have made debian package scripts for
> all cluster packages.
True, but well, that's what the main Makefile is doing, right ?
[....]
cd cman-kernel && ${MAKE} install ${MAKELINE}
cd dlm-kernel && ${MAKE} install ${MAKELINE}
cd gfs-kernel && ${MAKE} install ${MAKELINE}
cd gnbd-kernel && ${MAKE} install ${MAKELINE}
cd magma && ${MAKE} install ${MAKELINE}
cd ccs && ${MAKE} install ${MAKELINE}
[....]
So I don't see what more you are doing... apart from the fact that you
are building Debian packages?
Thanks a lot,
Ugo PARSI
From pcaulfie at redhat.com Tue Apr 11 07:47:52 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 11 Apr 2006 08:47:52 +0100
Subject: [Linux-cluster] cman kicking out nodes for no good reason
In-Reply-To: <1144702908.21093.7.camel@cocagne.max-t.internal>
References: <1144341281.355.38.camel@cocagne.max-t.internal>
<1144702908.21093.7.camel@cocagne.max-t.internal>
Message-ID: <443B5F28.1060004@redhat.com>
Olivier Crête wrote:
> On Thu, 2006-06-04 at 12:34 -0400, Olivier Crête wrote:
>> I have a strange problem where cman suddenly starts kicking out members
>> of the cluster with "Inconsistent cluster view" when I join a new node
>> (sometimes). It takes a few minutes between each kicking. I'm using a
>> snapshot from March 12th of the STABLE branch on 2.6.16. The cluster is
>> in a transition state at that point and I can't stop/start services or
>> do anything else. It did not do that with a snapshot I took a few
>> months ago.
>
> It's still happening: the node that joins says "Transition master
> unknown", while all of the other nodes know who the master is, then the
> master gets kicked out. Then a new master is selected, all of the nodes
> seem to know who the master is, but refuse to act on it. After a while,
> the new master is kicked out and the process restarts. I guess it's
> related to the changes with the timestamps to prevent master desync; I
> don't see any other recent change that could have caused it.
>
That's very peculiar behaviour, and it's going to be hard to pin down. How
consistently does it happen?
It could be caused by extreme network packet loss, or something blocking the
progress of cman processes. Are the already joined nodes very busy when you
bring the new node into the cluster (if so, doing what?)
I think the best way to try and track this down is to get a tcpdump of the
cluster traffic (port 6809/udp) happening at the time of the join - make sure
that all nodes are included in the dump and that all of the packet is captured.
--
patrick
From pcaulfie at redhat.com Tue Apr 11 08:46:15 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 11 Apr 2006 09:46:15 +0100
Subject: [Linux-cluster] DLM messages
In-Reply-To: <4427CB55.2060203@sara.nl>
References: <20060327084643.GB27410@redhat.com> <4427AA3F.3040009@sara.nl>
<4427CB55.2060203@sara.nl>
Message-ID: <443B6CD7.8050704@redhat.com>
> === FS2 ==
> Mar 27 12:28:25 ifs2 kernel: ------------[ cut here ]------------
> Mar 27 12:28:25 ifs2 kernel: kernel BUG at
> /usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151!
> Mar 27 12:28:25 ifs2 kernel: invalid opcode: 0000 [#1]
> Mar 27 12:28:25 ifs2 kernel: SMP
> Mar 27 12:28:25 ifs2 kernel: Modules linked in: lock_dlm dlm cman
> dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix
> e1000 gfs lock_harness dm_mod
> Mar 27 12:28:25 ifs2 kernel: CPU: 0
> Mar 27 12:28:25 ifs2 kernel: EIP: 0060:[] Tainted: GF VLI
> Mar 27 12:28:25 ifs2 kernel: EFLAGS: 00010246 (2.6.16-rc5-sara3 #1)
> Mar 27 12:28:25 ifs2 kernel: EIP is at elect_master+0x34/0x41 [cman]
That cman crash looks nasty, though it may be related to "disabling the
heartbeat-network interface". Is this the node you are referring to?
--
patrick
From basv at sara.nl Tue Apr 11 10:13:33 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 12:13:33 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To:
References: <443A6CB1.7010307@adelpha-lan.org> <443A70EE.4070907@adelpha-lan.org> <443A790E.1040002@sara.nl>
Message-ID: <443B814D.6030706@sara.nl>
Ugo PARSI wrote:
>> You have to download, from cvs STABLE:
>> cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r
>> STABLE cluster
>>
>
> Ok I've tried it, thanks, it does seem to work better but I have still
> issues....
> This time there's no kernel issues....but another missing .h file :
>
> [...]
> make[2]: Entering directory `/usr/src/cluster/cman/lib'
> gcc -Wall -g -O -I. -fPIC -I/usr/src/cluster/build/incdir/cluster -c
> -o libcman.o libcman.c
> libcman.c:31:35: cluster/cnxman-socket.h: No such file or directory
> libcman.c:44: warning: `struct cl_cluster_node' declared inside parameter list
> libcman.c:44: warning: its scope is only this definition or
> declaration, which is probably not what you want
> libcman.c: In function `copy_node':
> libcman.c:46: error: dereferencing pointer to incomplete type
> libcman.c:47: error: dereferencing pointer to incomplete type
> [...]
>
This is a bug I reported to this list, but got no replies. I think I
removed the 'cluster/' prefix from the include cluster/cnxman-socket.h line.
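
If that is the fix, it amounts to something like this in the checkout (an
assumption about the exact include line, not a tested patch):

sed -i 's|cluster/cnxman-socket.h|cnxman-socket.h|' cman/lib/libcman.c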
Are you using Debian or not? I can put the kernel-independent deb
packages on our ftp server. No warranty that they include all the init.d
scripts and start at runlevel 3.
When a machine boots it starts at runlevel 2, not in cluster-enabled
mode. To enable cluster mode we do an 'init 3', and we can remove a node
from the cluster with the 'init 2' command.
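
In other words, taking a node in and out of the cluster is just a runlevel
switch (assuming the cluster init scripts are linked into runlevel 3 only):

init 3   # enable cluster mode / join the cluster
init 2   # leave the cluster again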
Regards
--
--
********************************************************************
* *
* Bas van der Vlies e-mail: basv at sara.nl *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************
From basv at sara.nl Tue Apr 11 10:19:45 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 12:19:45 +0200
Subject: [Linux-cluster] DLM messages
In-Reply-To: <443B6CD7.8050704@redhat.com>
References: <20060327084643.GB27410@redhat.com> <4427AA3F.3040009@sara.nl> <4427CB55.2060203@sara.nl>
<443B6CD7.8050704@redhat.com>
Message-ID: <443B82C1.7010603@sara.nl>
Patrick Caulfield wrote:
>> === FS2 ==
>> Mar 27 12:28:25 ifs2 kernel: ------------[ cut here ]------------
>> Mar 27 12:28:25 ifs2 kernel: kernel BUG at
>> /usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151!
>> Mar 27 12:28:25 ifs2 kernel: invalid opcode: 0000 [#1]
>> Mar 27 12:28:25 ifs2 kernel: SMP
>> Mar 27 12:28:25 ifs2 kernel: Modules linked in: lock_dlm dlm cman
>> dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix
>> e1000 gfs lock_harness dm_mod
>> Mar 27 12:28:25 ifs2 kernel: CPU: 0
>> Mar 27 12:28:25 ifs2 kernel: EIP: 0060:[] Tainted: GF VLI
>> Mar 27 12:28:25 ifs2 kernel: EFLAGS: 00010246 (2.6.16-rc5-sara3 #1)
>> Mar 27 12:28:25 ifs2 kernel: EIP is at elect_master+0x34/0x41 [cman]
>
> That cman crash looks nasty, though it may be related to "disabing the
> heartbeat-network interface". Is this the node you are referring to ?
>
As I read the thread, this must be the node on which I disabled the
heartbeat network. So the other nodes could fence this node, and they did,
but the other nodes also crashed.
Regards
--
--
********************************************************************
* *
* Bas van der Vlies e-mail: basv at sara.nl *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************
From Alain.Moulle at bull.net Tue Apr 11 10:58:30 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 11 Apr 2006 12:58:30 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <443B8BD6.80906@bull.net>
>>> Hi
>>>
>>>> I'm trying to configure a simple 3 nodes cluster with simple test
>>>> scripts.
>>>> But I can't start cman, it remains stalled with this message in
>>>> syslog :
>>>> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
>>>> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep
>>>> 20 2005 16:04:34) installed
>>>> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol
>>>> family 30
>>>> Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster
>>>> name = HA_METADATA_3N, version = 8) found.
>>>> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or
>>>> form a Linux-cluster
>>>> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster
>>>> infrastruture via: CMAN/SM Plugin v1.1.2
>>>> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate
>>>> Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster
>>>>
>>>> and nothing more.
>>>>
>>>> The graphic tool does not detect any error in configuration; I've
>>>> attached my cluster.conf for the three nodes, knowing that I wanted
>>>> two nodes (yack10 and yack21) running their applications and the 3rd
>>>> one (yack23) as a backup for yack10 and/or yack21, but I don't want
>>>> any failover between yack10 and yack21.
>>>>
>>>> PS : I've verified all ssh connections between the 3 nodes, and all
>>>> the fence paths as described in the cluster.conf.
>>>> Thanks again for your help.
>>>>
>>>> Alain
>>>
>>> Are you starting cman on all three nodes at the same time? A node
>>> doesn't finish starting until the other nodes are starting too. Timing
>>> is important during booting.
>>> Leandro
>>
>> Hi, no I wasn't ...
>> I've tried it now, and it is ok on yack21 and yack23, but not on
>> yack10; is there something wrong in the cluster.conf to explain this
>> behavior ?
>> On yack10, cman is trying to:
>> CMAN: forming a new cluster
>> but fails with a timeout ...
>>
>> ??
>> Thanks
>> Alain
>
> Maybe this timeout is due to a firewall setup, as already stated on the
> list. A tcpdump from yack10 to the other nodes may help you catch the
> bug.
> Leandro
No firewall setup on yack10, neither on yack21 nor on yack23. Besides,
the ssh connections are all valid between the three nodes in all
combinations without a passwd request. And still the problem ...
Any other idea ?
Is my cluster.conf correct ?
Besides, with regard to your first answer, I've tested on yack21 and yack23 :
if I start cman only on yack21, it does end in timeout.
And if I start cman at more or less the same time on yack21 and yack23, it
works on both nodes.
I haven't found any recommendation in the documentation about this point.
Besides, if one node has broken down, that means we will never be
able to reboot the other node and launch the CS4 again with all
applications ... sounds strange, doesn't it ?
Thanks
Alain Moullé
From pcaulfie at redhat.com Tue Apr 11 11:52:23 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 11 Apr 2006 12:52:23 +0100
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
In-Reply-To: <443B8BD6.80906@bull.net>
References: <443B8BD6.80906@bull.net>
Message-ID: <443B9877.2020505@redhat.com>
Alain Moulle wrote:
>> Maybe this timeout is due to a firewall setup, as already stated on the
>> list. A tcpdump from yack10 to the other nodes may help you catch the bug.
>> Leandro
>
> No firewall setup on yack10, neither on yack21 nor on yack23. Besides,
> the ssh connections are all valid between the three nodes in all
> combinations without a passwd request. And still the problem ...
> Any other idea ?
> Is my cluster.conf correct ?
>
> Besides, with regard to your first answer, I've tested on yack21 and yack23 :
> if I start cman only on yack21, it does end in timeout.
> And if I start cman at more or less the same time on yack21 and yack23, it
> works on both nodes.
> I haven't found any recommendation in the documentation about this point.
> Besides, if one node has broken down, that means we will never be
> able to reboot the other node and launch the CS4 again with all
> applications ... sounds strange, doesn't it ?
>
Can you be a little clearer about exactly what you mean by this, and post
some exact messages please? It's not clear to me now just what your problem is.
From your initial post it sounded like the nodes in the cluster were forming
separate clusters, but that last sentence makes it sound like you're seeing
something else.
--
patrick
From l.dardini at comune.prato.it Tue Apr 11 12:48:35 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Tue, 11 Apr 2006 14:48:35 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFAFC@exchange2.comune.prato.local>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On behalf of Alain Moulle
> Sent: Tuesday, 11 April 2006 12:59
> To: linux-cluster at redhat.com
> Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3
> nodes cluster
>
> >>> Hi
> >>>
> >>>> I'm trying to configure a simple 3 nodes cluster with simple test
> >>>> scripts.
> >>>> But I can't start cman, it remains stalled with this message in
> >>>> syslog :
> >>>> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
> >>>> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built
> >>>> Sep 20 2005 16:04:34) installed
> >>>> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol
> >>>> family 30
> >>>> Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster
> >>>> name = HA_METADATA_3N, version = 8) found.
> >>>> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or
> >>>> form a Linux-cluster
> >>>> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster
> >>>> infrastruture via: CMAN/SM Plugin v1.1.2
> >>>> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status::
> >>>> Inquorate
> >>>> Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new
> >>>> cluster
> >>>>
> >>>> and nothing more.
> >>>>
> >>>> The graphic tool does not detect any error in configuration; I've
> >>>> attached my cluster.conf for the three nodes, knowing that I
> >>>> wanted two nodes (yack10 and yack21) running their applications
> >>>> and the 3rd one (yack23) as a backup for yack10 and/or yack21, but
> >>>> I don't want any failover between yack10 and yack21.
> >>>>
> >>>> PS : I've verified all ssh connections between the 3 nodes, and
> >>>> all the fence paths as described in the cluster.conf.
> >>>> Thanks again for your help.
> >>>>
> >>>> Alain
> >>>
> >>> Are you starting cman on all three nodes at the same time? A node
> >>> doesn't finish starting until the other nodes are starting too.
> >>> Timing is important during booting.
> >>> Leandro
> >>
> >> Hi, no I wasn't ...
> >> I've tried it now, and it is ok on yack21 and yack23, but not on
> >> yack10; is there something wrong in the cluster.conf to explain this
> >> behavior ?
> >> On yack10, cman is trying to:
> >> CMAN: forming a new cluster
> >> but fails with a timeout ...
> >>
> >> ??
> >> Thanks
> >> Alain
> >
> > Maybe this timeout is due to a firewall setup, as already stated on
> > the list. A tcpdump from yack10 to the other nodes may help you catch
> > the bug.
> > Leandro
>
> No firewall setup on yack10, neither on yack21 nor on yack23.
> Besides, the ssh connections are all valid between the three
> nodes in all combinations without a passwd request. And still
> the problem ...
> Any other idea ?
> Is my cluster.conf correct ?
>
> Besides, with regard to your first answer, I've tested on
> yack21 and yack23 :
> if I start cman only on yack21, it does end in timeout.
> And if I start cman at more or less the same time on yack21 and
> yack23, it works on both nodes.
> I haven't found any recommendation in the documentation about this point.
> Besides, if one node has broken down, that means we will
> never be able to reboot the other node and launch the CS4
> again with all applications ... sounds strange, doesn't it ?
>
No, this doesn't sound strange. The cluster must be quorate to operate. Quorum can be reduced while a node is down, by fencing it or simply removing it, either via cman or by hand-editing cluster.conf. Try this: start all the nodes without cman, gfs and the other GFS suite packages. Then start by hand, one at a time on each node, the ccsd, cman, lock_gulm(?), fenced, clvmd and rgmanager init scripts. After each one, check the /var/log/messages output and the connectivity between nodes. Unfortunately your configuration is far different from the one I use, so I cannot help you further.
Leandro
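
Stepping through that order by hand could look roughly like this on each node
(a sketch; lock_gulm applies only to GuLM clusters, and the service names
assume the stock init scripts):

for svc in ccsd cman fenced clvmd rgmanager; do
    service $svc start || break
    tail -n 20 /var/log/messages   # check for errors after each step
done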
From akpinar_haydar at hotmail.com Tue Apr 11 05:17:53 2006
From: akpinar_haydar at hotmail.com (Haydar Akpinar)
Date: Tue, 11 Apr 2006 05:17:53 +0000
Subject: [Linux-cluster] Linux (qmail) clustering
Message-ID:
Hello everyone. I am a newbie, so I don't really know much about UNIX, nor
Linux for that matter.
I have been asked to set up high-availability clustering for qmail (non-LDAP),
which is running on Red Hat 9.
I would like to know if it is possible, and also if anyone has done
qmail clustering on a Linux box.
And if anyone can direct me to how-to information.
Thanks for your time.
From ocrete at max-t.com Tue Apr 11 14:06:40 2006
From: ocrete at max-t.com (Olivier Crête)
Date: Tue, 11 Apr 2006 10:06:40 -0400
Subject: [Linux-cluster] cman kicking out nodes for no good reason
In-Reply-To: <443B5F28.1060004@redhat.com>
References: <1144341281.355.38.camel@cocagne.max-t.internal>
<1144702908.21093.7.camel@cocagne.max-t.internal>
<443B5F28.1060004@redhat.com>
Message-ID: <1144764400.9106.3.camel@TesterBox.tester.ca>
On Tue, 2006-11-04 at 08:47 +0100, Patrick Caulfield wrote:
> Olivier Crête wrote:
> > On Thu, 2006-06-04 at 12:34 -0400, Olivier Crête wrote:
> >> I have a strange problem where cman suddenly starts kicking out members
> >> of the cluster with "Inconsistent cluster view" when I join a new node
> >> (sometimes). It takes a few minutes between each kicking. I'm using a
> >> snapshot from March 12th of the STABLE branch on 2.6.16. The cluster is
> >> in a transition state at that point and I can't stop/start services or
> >> do anything else. It did not do that with a snapshot I took a few
> >> months ago.
> >
> > It's still happening: the node that joins says "Transition master
> > unknown", while all of the other nodes know who the master is, then the
> > master gets kicked out. Then a new master is selected, all of the nodes
> > seem to know who the master is, but refuse to act on it. After a while,
> > the new master is kicked out and the process restarts. I guess it's
> > related to the changes with the timestamps to prevent master desync; I
> > don't see any other recent change that could have caused it.
> >
>
> That's very peculiar behaviour, and it's going to be hard to pin down. How
> consistently does it happen?
Often, but I haven't found the exact sequence to reproduce it.
> It could be caused by extreme network packet loss, or something blocking the
> progress of cman processes. Are the already joined nodes very busy when you
> bring the new node into the cluster (if so, doing what?)
I doubt it's packet loss, since cman is running over myrinet's ethernet/ip
layer and it's the only user of that port (so it shouldn't be affected by
the rest of the traffic over the myrinet). The other nodes may be busy,
but the CPU isn't at 100% on any of them, although the PCI-X bus may
be used a lot.
> I think the best way to try and track this down is to get a tcpdump of the
> cluster traffic (port 6809/udp) happening at the time of the join - make sure
> that all nodes are included in the dump and that all of the packet is captured.
I will try to get a tcpdump.
Thanks for your help,
--
Olivier Crête
ocrete at max-t.com
Maximum Throughput Inc.
From mbrookov at mines.edu Tue Apr 11 14:49:04 2006
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Tue, 11 Apr 2006 08:49:04 -0600
Subject: [Linux-cluster] Cisco fence agent
In-Reply-To:
References:
Message-ID: <1144766944.16956.10.camel@merlin.Mines.EDU>
I do not know if this will help, but here is what I put together.
We have 3 Cisco 3750 switches. I am currently using SNMP to turn off
the ports of a host that is being fenced. I wrote a perl script called
fence_cisco that works with GFS 6. I have attached a copy of
fence_cisco to this message and its config file. I do not have much in
the way of documentation for it, and it will probably take some hacking
to get it to work with a current version of GFS. If you know a little
perl, writing a fencing agent is not very difficult.
I have also included a copy of the config file for fence_cisco. The
first two lines specify the SNMP community string and the IP address of
the switch. The rest is a list of hosts and the ports they use. You
will have to talk to your local network guru to figure out Cisco
community strings and the numbers involved. It took some tinkering to
figure out how Cisco does this stuff, and even after writing the code, I
am still not sure that I understand it. I do know that it does work:
GFS does do the correct things during a crash.
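
Stripped of the Perl, the core of the trick is a single SNMP set per port; a
rough equivalent with the net-snmp command line tools (the community, switch
address and ifIndex below are placeholders, and mapping a port name to its
ifIndex is the part that takes tinkering):

# IF-MIB::ifAdminStatus.<ifIndex> = 2 means "administratively down"
snmpset -v 2c -c YOURSTRINGHERE 1.1.1.1 .1.3.6.1.2.1.2.2.1.7.10109 i 2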
Most people use one of the power supply switches. Redhat provides the
fence_apc agent that will turn off the power to a node that needs to be
fenced. I like the network option because the host that is having
problems will be able to write log entries after it has been fenced.
You will need to get the Net::SNMP module from cpan.org to use
fence_cisco.
Matt
On Sun, 2006-04-09 at 01:23 +0900, ??? wrote:
> Hi all.
> Do anyone have cisco catalyst fence agent?
> If nobody make that, I will make.
>
> Thanks.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence_cisco
Type: application/x-perl
Size: 10442 bytes
Desc: not available
URL:
-------------- next part --------------
community:YOURSTRINGHERE
switch:1.1.1.1
imagine:GigabitEthernet1/0/9:GigabitEthernet2/0/9:GigabitEthernet1/0/5
illuminate:GigabitEthernet2/0/10:GigabitEthernet3/0/9:GigabitEthernet2/0/6
illusion:GigabitEthernet1/0/10:GigabitEthernet3/0/10:GigabitEthernet1/0/6
inception:GigabitEthernet1/0/11:GigabitEthernet2/0/11:GigabitEthernet1/0/7
inspire:GigabitEthernet2/0/12:GigabitEthernet3/0/11:GigabitEthernet2/0/8
incantation:GigabitEthernet1/0/12:GigabitEthernet3/0/12:GigabitEthernet1/0/8
From carlopmart at gmail.com Tue Apr 11 15:01:16 2006
From: carlopmart at gmail.com (carlopmart)
Date: Tue, 11 Apr 2006 17:01:16 +0200
Subject: [Linux-cluster] Question about manual fencing
In-Reply-To: <443A80D2.6050806@adelpha-lan.org>
References: <443A7F34.7000901@gmail.com> <443A80D2.6050806@adelpha-lan.org>
Message-ID: <443BC4BC.1030405@gmail.com>
Thanks Jerome.
Castang Jerome wrote:
> carlopmart wrote:
>
>> Hi all,
>>
>> I would like to test manual fencing on two nodes, for testing
>> purposes. I have read RedHat's docs about this but it still isn't very
>> clear to me. If I set up manual fencing, when one node shuts down, will
>> the other node automatically start all the services that I have
>> configured on it?
>>
>> Thanks.
>>
>
> I don't think so.
> Fencing a node means stopping it, or making it leave the cluster (using
> any method, like a shutdown...).
> So if you use manual fencing, the other nodes will not automatically
> start their services...
>
>
--
CL Martinez
carlopmart {at} gmail {d0t} com
From basv at sara.nl Tue Apr 11 15:35:47 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 17:35:47 +0200
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
(2.6.16)
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
Message-ID:
On Apr 11, 2006, at 3:58 PM, Nate Carlson wrote:
> On Mon, 10 Apr 2006, Bas van der Vlies wrote:
>> I have compiled deb-packages for kernel 2.6.16.2, using the CVS
>> STABLE branch.
>
> Do you have the source packages? It'd be really handy to be able to
> build module packages. :)
>
>
I did not make source packages; it is a good suggestion, but I use
gfs from CVS and use different kinds of kernels, so I regularly
make new versions.
For every package I create a debian directory, and I made a global
script that compiles everything and makes the debian packages (roughly
like the sketch below):
- for the kernel modules, the kernel version is in the package
- for the user space tools I only update the version number.
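
Such a global wrapper could look roughly like this (the package list and
dpkg-buildpackage options are assumptions, not the actual script):

#!/bin/sh
# build a .deb for each cluster subpackage, in dependency order
for pkg in cman-kernel dlm-kernel gfs-kernel gnbd-kernel magma ccs; do
    (cd $pkg && dpkg-buildpackage -rfakeroot -us -uc) || exit 1
done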
Regards
--
Bas van der Vlies
basv at sara.nl
From natecars at natecarlson.com Tue Apr 11 15:37:58 2006
From: natecars at natecarlson.com (Nate Carlson)
Date: Tue, 11 Apr 2006 10:37:58 -0500 (CDT)
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
(2.6.16)
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
Message-ID:
On Tue, 11 Apr 2006, Bas van der Vlies wrote:
> I did not make source packages; it is a good suggestion, but I use gfs
> from CVS and use different kinds of kernels, so I regularly make new
> versions.
>
> For every package I create a debian directory, and I made a global script
> that compiles everything and makes the debian packages
> - for the kernel modules, the kernel version is in the package
> - for the user space tools I only update the version number.
Would you mind sharing the scripts? That'd make my life a bit easier when
packaging GFS for debian. :)
------------------------------------------------------------------------
| nate carlson | natecars at natecarlson.com | http://www.natecarlson.com |
| depriving some poor village of its idiot since 1981 |
------------------------------------------------------------------------
From basv at sara.nl Tue Apr 11 15:43:37 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 17:43:37 +0200
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
(2.6.16)
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
Message-ID: <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>
On Apr 11, 2006, at 5:37 PM, Nate Carlson wrote:
> On Tue, 11 Apr 2006, Bas van der Vlies wrote:
>> I did not make source packages; it is a good suggestion, but I
>> use gfs from CVS and use different kinds of kernels, so I
>> regularly make new versions.
>>
>> For every package I create a debian directory, and I made a global
>> script that compiles everything and makes the debian packages
>> - for the kernel modules, the kernel version is in the package
>> - for the user space tools I only update the version number.
>
> Would you mind sharing the scripts? That'd make my life a bit
> easier when packaging GFS for debian. :)
>
No problem, I have to package it and make it available on our ftp
server. If you find bugs or have improvements, mail them.
I will send an email to the list when I have made a release ;-)
Regards
--
Bas van der Vlies
basv at sara.nl
From natecars at natecarlson.com Tue Apr 11 15:43:59 2006
From: natecars at natecarlson.com (Nate Carlson)
Date: Tue, 11 Apr 2006 10:43:59 -0500 (CDT)
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
(2.6.16)
In-Reply-To: <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
<2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>
Message-ID:
On Tue, 11 Apr 2006, Bas van der Vlies wrote:
> No problem, I have to package it and make it available on our ftp server.
> If you find bugs or have improvements, mail them.
> I will send an email to the list when I have made a release ;-)
Great - thanks! :)
------------------------------------------------------------------------
| nate carlson | natecars at natecarlson.com | http://www.natecarlson.com |
| depriving some poor village of its idiot since 1981 |
------------------------------------------------------------------------
From jbrassow at redhat.com Tue Apr 11 15:48:25 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Tue, 11 Apr 2006 10:48:25 -0500
Subject: [Linux-cluster] Question about manual fencing
In-Reply-To: <443BC4BC.1030405@gmail.com>
References: <443A7F34.7000901@gmail.com> <443A80D2.6050806@adelpha-lan.org>
<443BC4BC.1030405@gmail.com>
Message-ID: <634f53a0e00f383b47d142f530b9dbf7@redhat.com>
manual fencing gets its name because it requires manual
intervention... that is, it is not automatic.
brassow
On Apr 11, 2006, at 10:01 AM, carlopmart wrote:
> Thanks Jerome.
>
> Castang Jerome wrote:
>> carlopmart wrote:
>>> Hi all,
>>>
>>> I would like to test manual fencing on two nodes, for testing
>>> purposes. I have read RedHat's docs about this but it still isn't
>>> very clear to me. If I set up manual fencing, when one node shuts
>>> down, will the other node automatically start all the services that
>>> I have configured on it?
>>>
>>> Thanks.
>>>
>> I don't think so.
>> Fencing a node means stopping it, or making it leave the cluster (using
>> any method, like a shutdown...).
>> So if you use manual fencing, the other nodes will not automatically
>> start their services...
>
> --
> CL Martinez
> carlopmart {at} gmail {d0t} com
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From Alain.Moulle at bull.net Tue Apr 11 15:56:02 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 11 Apr 2006 17:56:02 +0200
Subject: [Linux-cluster] CS4 Update 2 / cluster 3 noeuds question
Message-ID: <443BD192.1000407@bull.net>
Hi
Finally I've found the problem (a bad alias in /etc/hosts !).
But I've another question :
As told before, I have yack10 and yack21, each with one service
to run, and yack23 as backup for both nodes (see attached cluster.conf).
I've tested with a poweroff on yack10 and the service
fails over correctly to yack23. But then I tried
a poweroff on yack21, and it does not fail over
because of "missing too many heartbeats".
I suspect that this is normal because we have only
one node left among the three, and so there are
not enough votes ...
But I would like to have a confirmation ?
And if so, is there a way to configure things so that
yack23 could take over the services of both
other nodes if they are stopped at the same time ?
Thanks
Alain
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 2015 bytes
Desc: not available
URL:
From teigland at redhat.com Tue Apr 11 16:52:59 2006
From: teigland at redhat.com (David Teigland)
Date: Tue, 11 Apr 2006 11:52:59 -0500
Subject: [Linux-cluster] cluster-1.02.00
Message-ID: <20060411165259.GB5820@redhat.com>
A new source tarball from the STABLE branch has been released; it builds
and runs on 2.6.16:
ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
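
A plausible build sequence against a vanilla 2.6.16 tree (the --kernel_src
option below matches the CVS tree's top-level configure; check the tarball's
README if it differs):

wget ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
tar xzf cluster-1.02.00.tar.gz
cd cluster-1.02.00
./configure --kernel_src=/usr/src/linux-2.6.16
make && make install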
Version 1.02.00 - 10 April 2006
===============================
dlm-kernel: Allow DLM to start if the node gets a different nodeid.
dlm-kernel: Add WARNING printk when cman calls emergency_shutdown.
dlm-kernel: The in_recovery semaphore wasn't being released in corner case
where grant message is ignored for lock being unlocked.
dlm-kernel: Remove an assertion that triggers unnecessarily in rare
cases of overlapping and invalid master lookups.
dlm-kernel: Don't close existing connection if a double-connect is
attempted - just ignore the last one.
dlm-kernel: Fix a race where an attempt to unlock a lock in the completion
AST routine could crash on SMP.
dlm-kernel: Fix transient hangs that could be caused by incorrect handling
of locks granted due to ALTMODE. bz#178738
dlm-kernel: Allow any old user to create the default lockspace. You need Udev
running AND build dlm with ./configure --have_udev.
dlm-kernel: Only release a lockspace if all users have closed it. bz#177934
cman-kernel: Fix cman master confusion during recovery. bz#158592
cman-kernel: Add printk to assert failure when a nodeid lookup fails.
cman-kernel: Give an interface "max-retries" attempts to get fixed after
an error before we give up and shut down the cluster.
cman-kernel: IPv6 FF1x:: multicast addresses don't work. Always send out
of the locally bound address. bz#166752
cman-kernel: Ignore really badly delayed old duplicates that might get
sent via a bonded interface. bz#173621
cman-kernel: /proc/cluster/services seq_start needs to initialise the pointer,
we may not be starting from the beginning every time. bz#175372
cman-kernel: Fix memory leak when reading from /proc/cluster/nodes or
/proc/cluster/services. bz#178367
cman-kernel: Send a userspace notification when we are the last node in
a cluster. bz#182233
cman-kernel: add quorum device interface for userspace
cman-kernel: Add node ID to /proc/cluster/status
cman: Allow "cman_tool leave force" to cause cman to leave the cluster
even if it's in transition or joining.
cman: Look over more than 16 interfaces when searching for the broadcast
address.
cman: init script does 'cman_tool leave remove' on stop
cman: add cman_get/set_private to libcman
cman: add quorum device API to libcman
gfs-kernel: Fix performance with sync mount option; pages were not being
flushed when gfs_writepage is called. bz#173147
gfs-kernel: Flush pages into storage in case of DirectIO falling back to
BufferIO. DirectIO reads were sometimes getting stale data.
gfs-kernel: Make sendfile work with stuffed inodes; after a write on
stuffed inode, mark cached page as not uptodate. bz#142849
gfs-kernel: Fix spot where the quota_enforce setting is ignored.
gfs-kernel: Fix case of big allocation slowdown. The allocator could end
up failing its passive attempts to lock all recent rgrps because another
node had deallocated from them and was caching the locks. The allocator now
switches from passive to forceful requests after try_threshold failures.
gfs-kernel: Fix rare case of bad NFS file handles leading to stale file
handle errors. bz#178469
gfs-kernel: Properly handle error return code from verify_jhead().
gfs-kernel: Fix possible umount panic due to the ordering of log flushes
and log shutdown. bz#164331, bz#178469
gfs-kernel: Fix directory delete out of memory error. bz#182057
gfs-kernel: Return code was not being propagated while setting default
ACLs causing an EPERM every time. bz#182066
gulm: Fix bug that would cause lock_gulmd to not call waitpid unless
SIGCHLD was received from the child. bz#171246
gulm: Fix problems with host lookups. Now try to match the ip if we are
unable to match the name of a lock server as well as fixing the expiration
of locks if gulm somehow gets a FQDN. bz#169171
fence/fenced: Multiple devices in one method were not being translated
into multiple calls to an agent, but all the device data was lumped together
for one agent call. bz#172401
fence/fence_apc: Make agent work with 7900 series apc switches. bz#172441
fence/fence_ipmilan: fixes for bz#178314
fence/fence_drac: support for drac 4/I
fence/fence_drac: interface change in drac_mc firmware version 1.2
fence: Add support for IBM rsa fence agent
gnbd-kernel: gnbd_monitor wouldn't correctly reset after an uncached gnbd had
failed and been restored. bz#155304
gnbd-kernel: kill gnbd_monitor when all uncached gnbds have been removed.
bz#127042
gnbd: changes to let multipath run over gnbd.
gfs_fsck: Fix small window where another node can mount during a gfs_fsck.
bz#169087
gfs_fsck: gfs_fsck crashed on many types of extended attribute corruptions.
bz#173697
gfs_fsck: Check result code and handle failures in fsck rgrp read code.
bz#169340
gfs_fsck: fix errors checking large (multi-TB) filesystems. bz#186125
gfs_edit: new version with more options that uses ncurses.
ccs: Make ccs connection descriptors time out, fixing a problem where all
descriptors could be used up, even though none are in use.
ccs: Increase number of connection descriptors from 10 to 30.
ccs: Ignore SIGPIPE, don't catch SIGSEGV, allowing for core dumps.
ccs: endian fixes for clusters of machines with different endianness
ccs: Fix error printing. bz#178812
ccs: fix ccs_tool seg fault on upgrade. bz#186121
magma-plugins/sm: Fix reads of /proc/cluster/services. bz#175033
magma-plugins/gulm: Fix clu_lock() return value that resulted in
"Resource temporarily unavailable" messages at times. bz#171253
rgmanager: Add support for inheritance in the form "type%attribute"
instead of just attribute so as to avoid confusion.
rgmanager: Fix bz#150346 - Clustat usability problems
rgmanager: Fix bz#170859 - VIPs show up on multiple members.
rgmanager: Fix bz#171034 - Missing: Monitoring for local and cluster fs's
rgmanager: Fix bz#171036 - RFE: Log messages in resource agents
rgmanager: Fix bz#165447 - ip.sh fails when using VLAN on bonded interface
rgmanager: Fix bz#171153 - clustat withholds information if run on multiple
members simultaneously
rgmanager: Fix bz#171236 - ia64 alignment warnings
rgmanager: Fix bz#173526 - Samba Resource Agent
rgmanager: Fix bz#173916 - rgmanager log level change requires restart
rgmanager: Fix bz#174819 - clustat crashes if ccsd is not running
rgmanager: Fix bz#175106 - lsof -b blocks when using gethostbyname causing
slow force-unmount when DNS is broken
rgmanager: Fix bz#175108 - rgmanager storing extraneous info using VF
rgmanager: Fix bz#175114 - rgmanager uses wrong stop-order for unspecified
resource agents
rgmanager: Implement bz#175215: Inherit fsid for nfs exports
rgmanager: Fix bz#175229 - remove unneeded references to clurmtabd; it is no
longer a necessary piece for NFS failover
rgmanager: Fix bz#176343 - __builtin_return_address(x) for x>0 is never
guaranteed to work
rgmanager: Ensure rgmanager doesn't block SIGSEGV when debug is not enabled.
rgmanager: Fix bz#172177, bz#172178
rgmanager: Allow scripts to inherit the name attr of a parent in case the
script wants to know it. bz#172310
rgmanager: Fix #166109 - random segfault in clurgmgrd
rgmanager: Fix most of 177467 - clustat hang
From gstaltari at arnet.net.ar Tue Apr 11 19:25:20 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Tue, 11 Apr 2006 16:25:20 -0300
Subject: [Linux-cluster] cluster-1.02.00
In-Reply-To: <20060411165259.GB5820@redhat.com>
References: <20060411165259.GB5820@redhat.com>
Message-ID: <443C02A0.5010103@arnet.net.ar>
David Teigland wrote:
> A new source tarball from the STABLE branch has been released; it builds
> and runs on 2.6.16:
>
> ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
>
> [...]
It would be nice to have the rpm for FC4 from this new update.
TIA
German
From gregp at liveammo.com Wed Apr 12 03:13:31 2006
From: gregp at liveammo.com (Greg Perry)
Date: Tue, 11 Apr 2006 23:13:31 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <443C705B.6020606@liveammo.com>
Hello,
I have been researching GFS for a few days, and I have some questions
that hopefully some seasoned users of GFS may be able to answer.
I am working on the design of a Linux cluster that needs to be scalable;
it will be primarily an RDBMS-driven data warehouse used for data mining
and content indexing. In an ideal world, we would be able to start with
a small (say 4 node) cluster, then add machines (and storage) as the
various RDBMSes grow in size, as well as use virtual IPs for load
balancing across multiple lighttpd instances. All machines in the cluster
need to be able to talk to the same volume of information, and GFS (in
theory at least) would be used to aggregate the drives from each machine
into that huge shared logical volume.
With that being said, here are some questions:
1) What is the preference on the RDBMS; will MySQL 5.x work, and are
there any locking issues to consider? What would the best open source
RDBMS be (MySQL vs. PostgreSQL, etc.)?
2) If there was a 10 machine cluster, each with a 300GB SATA drive, can
you use GFS to aggregate all 10 drives into one big logical 3000GB
volume? Would that scenario work similar to a RAID array? If one or
two nodes fail, but the GFS quorum is maintained, can those nodes be
replaced and repopulated just like a RAID-5 array? If this scenario is
possible, how difficult is it to "grow" the shared logical volume by
adding additional nodes (say I had two more machines each with a 300GB
SATA drive)?
3) How stable is GFS currently, and is it used in many production
environments?
4) How stable is the FC5 version, and does it include all of the
configuration utilities in the RH Enterprise Cluster version? (the idea
would be to prove the point on FC5, then migrate to RH Enterprise).
5) Would CentOS be preferred over FC5 for the initial proof of concept
and early adoption?
6) Are there any restrictions or performance advantages of using all
drives with the same geometry, or can you mix and match different size
drives and just add to the aggregate volume size?
Thanks in advance,
Greg
From pcaulfie at redhat.com Wed Apr 12 07:06:17 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 12 Apr 2006 08:06:17 +0100
Subject: [Linux-cluster] CS4 Update 2 / cluster 3 noeuds question
In-Reply-To: <443BD192.1000407@bull.net>
References: <443BD192.1000407@bull.net>
Message-ID: <443CA6E9.9000402@redhat.com>
Alain Moulle wrote:
> Hi
> Finally I've found the problem (a bad alias in /etc/hosts !).
>
> But I've another question :
> As told before, I have yack10 and yack21, each with one service
> to run, and yack23 as backup for both nodes (see attached cluster.conf).
>
> I've tested with a poweroff on yack10 and the service
> fails over correctly to yack23. But then I tried a
> poweroff on yack21, and its service does not fail over
> because of "missing too many heartbeats".
> I suspect that this is normal because we have only
> one node left of the three, and so there are
> not enough votes ...
> But I would like to have a confirmation ?
Yes, that's correct. If you have a three-node cluster then there need to be
two active nodes for it to have quorum. Otherwise single nodes could split
off and form "clusters" on their own and corrupt the filesystem (in the case
of GFS).
> And if so, is there a way to configure so that
> yack23 could failover the services of both
> other nodes stopped at the same time ?
>
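As a rough sketch of the arithmetic (votes come from cluster.conf; the
fragment below assumes the default of one vote per node):

----------------------------
<clusternodes>
  <clusternode name="yack10" votes="1"/>
  <clusternode name="yack21" votes="1"/>
  <clusternode name="yack23" votes="1"/>
</clusternodes>
<!-- 3 votes total; quorum = floor(3/2) + 1 = 2, so a single
     surviving node is inquorate and will not run services -->
----------------------------

You could weight the votes so that one node stays quorate on its own, but
then that node can also form a "cluster" on its own, which defeats the point.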
--
patrick
From kumaresh81 at yahoo.co.in Wed Apr 12 08:12:24 2006
From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy)
Date: Wed, 12 Apr 2006 09:12:24 +0100 (BST)
Subject: [Linux-cluster] a doubt on quorums
Message-ID: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com>
Hi,
I have a problem with my cluster and quorum settings and any help will be appreciated.
I have a five-node cluster with a quorum vote of 1 for each of the 5 nodes. A shared GFS file system is mounted on all five nodes, and there are two failover domains and two services involving two of the nodes.
When I shut down the 3 nodes that don't participate in the two domains and clustered services, both services stop and then fail to start even when started manually.
I guess it is something to do with the quorum settings, but not sure on the way forward.
The environment is on RHEL AS 4U2 with GFS 6.1 and RHCS 4U2.
Regards,
Kumaresh
From placid at adelpha-lan.org Wed Apr 12 08:18:20 2006
From: placid at adelpha-lan.org (Castang Jerome)
Date: Wed, 12 Apr 2006 10:18:20 +0200
Subject: [Linux-cluster] a doubt on quorums
In-Reply-To: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com>
References: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com>
Message-ID: <443CB7CC.5@adelpha-lan.org>
Kumaresh Ponnuswamy wrote:
> Hi,
>
> I have a problem with my cluster and quorum settings and any help will
> be appreciated.
>
> I have a five node cluster with quorum vote of 1 for all the 5 nodes.
> They have a GFS shared file system on all the five nodes, and, two
> domains and two services involving two nodes.
>
> When I shut down the 3 nodes that don't participate in the two domains
> and clustered services, both the services stop and fail to start when
> tried manually also.
>
> I guess it is something to do with the quorum settings, but not sure
> on the way forward.
>
> The environment is on RHEL AS 4U2 with GFS 6.1 and RHCS 4U2.
>
> Regards,
> Kumaresh
>
> ------------------------------------------------------------------------
If 3 of your 5 nodes go down, your cluster becomes a two-node
cluster.
So, as it is written in the documentation, that is a "special cluster" and
it has to be declared explicitly (in cluster.conf, or with a command such
as "cman_tool join -2").
When you have a two-node cluster, it is possible for each node to end up
isolated on its own (this is "split brain").
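For the record, the usual way to declare that special case is in
cluster.conf rather than on the command line (syntax from memory; check
the documentation for your release):

----------------------------
<cman two_node="1" expected_votes="1"/>
<!-- lets the cluster stay quorate with a single node; only
     valid for exactly two nodes, and it makes working fencing
     absolutely mandatory -->
----------------------------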
--
Jerome CASTANG
Tel: 06.85.74.33.02
mail: jerome.castang at adelpha-lan.org
---------------------------------------------
As an old Chinese proverb says: RTFM!
From erwan at seanodes.com Wed Apr 12 08:18:48 2006
From: erwan at seanodes.com (Velu Erwan)
Date: Wed, 12 Apr 2006 10:18:48 +0200
Subject: [Linux-cluster] cluster-1.02.00
In-Reply-To: <20060411165259.GB5820@redhat.com>
References: <20060411165259.GB5820@redhat.com>
Message-ID: <443CB7E8.3020508@seanodes.com>
David Teigland wrote:
>A new source tarball from the STABLE branch has been released; it builds
>and runs on 2.6.16:
>
>
Is it possible to split the kernel part from the userland binaries in the
make process?
If so, it would help us produce a DKMS package that makes this release
much easier to use ;o)
My build host doesn't have the same kernel sources as my nodes, so I'd like
to build the binaries on it and then generate the DKMS package.
When you install the DKMS package on a new host, the kernel part of GFS
recompiles itself. This is very useful ;)
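Something like this minimal dkms.conf sketch is what I have in mind
(package name, version and paths are made up for the example):

----------------------------
PACKAGE_NAME="gfs"
PACKAGE_VERSION="1.02.00"
BUILT_MODULE_NAME[0]="gfs"
BUILT_MODULE_LOCATION[0]="gfs-kernel/src/gfs"
DEST_MODULE_LOCATION[0]="/kernel/fs/gfs"
MAKE[0]="make -C gfs-kernel/src/gfs KERNELDIR=/lib/modules/${kernelver}/build"
AUTOINSTALL="yes"
----------------------------

Then "dkms build -m gfs -v 1.02.00" followed by "dkms install -m gfs -v
1.02.00" would rebuild the kernel part against whatever kernel each node
actually runs.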
Erwan,
From basv at sara.nl Wed Apr 12 08:37:30 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Wed, 12 Apr 2006 10:37:30 +0200
Subject: [Linux-cluster] ANNOUNCE: gfs_2_deb utils initial version
In-Reply-To:
References:
<443A6CB1.7010307@adelpha-lan.org>
<443A70EE.4070907@adelpha-lan.org>
<443A790E.1040002@sara.nl>
<2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>
Message-ID: <443CBC4A.5080607@sara.nl>
= gfs_2_deb - utilities =
This is a release of the SARA package gfs_2_deb, which contains the
utilities we use to build Debian packages from the Red Hat Cluster
Software (GFS).
All init.d scripts in the Debian packages start at runlevel 3, in the
right order (the Debian default runlevel is 2); see the update-rc.d
sketch below. We chose this setup for these reasons:
1) When a node is fenced, the node is rebooted and is ready for
cluster mode.
2) We can easily switch between runlevels to join or leave the cluster.
See README for further info
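On Debian the runlevel registration is done with update-rc.d; the
packages do roughly the equivalent of this (sequence numbers are just
examples):

----------------------------
update-rc.d cman start 21 3 . stop 79 0 1 2 6 .
update-rc.d fenced start 22 3 . stop 78 0 1 2 6 .
----------------------------

so the cluster services only come up when a node switches to runlevel 3.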
The package can be downloaded at:
ftp://ftp.sara.nl/pub/outgoing/gfs_2_deb-0.1.tar.gz
Regards
--
********************************************************************
* *
* Bas van der Vlies e-mail: basv at sara.nl *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************
From deval.kulshrestha at progression.com Wed Apr 12 08:57:41 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Wed, 12 Apr 2006 14:27:41 +0530
Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
file system resources?
Message-ID: <004501c65e0f$2afde300$7600a8c0@PROGRESSION>
Hi
I am using one MSA 500 G2 and two HP DL360 G4P servers with HP's HBA 642,
installed with RHEL 4 ES U1 and RHCS 4 with DLM as the lock manager.
I have to run around 14 different services in HA mode, and I have broken
them up into two failover domains with different priorities.
Seven services run on node1 in HA mode with node2 as their failover host;
the remaining 7 services run on node2 in HA mode with node1 as their
failover host.
In my scenario simultaneous logical drive access is not required, so I am
not using GFS here.
Everything that is needed is configured properly and working fine.
But this cluster still suffers data inconsistency errors if somebody
manually mounts a partition that is already being accessed by the other
node.
I understand that this goes against the basics of a non-shared file
system, and it can be documented. But everybody knows that 2-3 years down
the line, when the support staff has been replaced by new people with
very limited understanding of the running setup, they can make exactly
this mount mistake. Everybody thinks mount is just a simple command that
does no harm; if I just want to read data, mounting looks fine. In our
case, though, we want to keep users from mounting a logical volume that
is already mounted on the other node.
So my question is: when a shared file system is not implemented, how can
we restrict manual mounting of cluster file system resources while they
are in use by cluster services?
Any help would be highly appreciated.
With regard
Deval K.
===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India
From kumaresh81 at yahoo.co.in Wed Apr 12 10:03:38 2006
From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy)
Date: Wed, 12 Apr 2006 11:03:38 +0100 (BST)
Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
file system resources?
In-Reply-To: <004501c65e0f$2afde300$7600a8c0@PROGRESSION>
Message-ID: <20060412100338.70764.qmail@web8327.mail.in.yahoo.com>
Hi,
In your case, I guess removing the SUID bit on mount for normal users is the best solution.
This will prevent non-root users from mounting the filesystem.
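Something along these lines (plain permission bits, so note it only
stops ordinary users, not root):

----------------------------
# remove the setuid bit so non-root users cannot run mount at all
chmod u-s /bin/mount /bin/umount
----------------------------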
Regards,
Kumaresh
Deval kulshrestha wrote:
Hi
I am using one MSA 500 G2 , two no. of HP DL360 G4P server with HP's HBA 642, Server installed with RHEL 4 ES U1 and RHCS4 with lock mgr as DLM
I have to run around 14 different services in HA mode, I have break them up in two different priority domain.
Now 7 services runs on node1 in HA mode, node2 is failover host for them,
Remaining 7 services runs on node2 in HA mode and node1 is failover domain for them.
In my scenario Simultaneous logical drive access is not required, thus I am not using GFS here
What ever is needed is configured properly and working fine.
But this cluster is still causes some data inconsistency error if somebody manually mounts the partitions which is already in access by other node.
I understand that this is against the basics of non-shared file system. This can be documented also, but everybody knows that after 2-3 yrs down the line when support staff replaced by new people, when they come in with very limited understanding about the running stuff they can do some mount mistake.(umount is a document screw up, but mount is here undocumented screw up) every body knows mount is just a simple command, it does not harm anything, if I just want to read data mount is ok. But in our case we wanted to restrict other users to use mount command when some logical volume is already mounted on one node.
I want some help on this, when shared file system is not implemented. How we can restrict manual mount of cluster file system resources when its being in use by some cluster services?
Any help would be highly appreciable here.
With regard
Deval K.
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From deval.kulshrestha at progression.com Wed Apr 12 10:59:13 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Wed, 12 Apr 2006 16:29:13 +0530
Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
file system resources?
In-Reply-To: <20060412100338.70764.qmail@web8327.mail.in.yahoo.com>
Message-ID: <005501c65e20$22437b10$7600a8c0@PROGRESSION>
Hi Kumaresh
Thanks for the reply/inputs
SAN LUNs are not defined in /etc/fstab; they don't have to be mounted while
the OS boots. The SAN volumes are part of the cluster resource groups and
are under the control of the cluster service manager, rgmanager.
I did not understand how your suggestion applies here; please suggest how
we can go ahead.
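For reference, the volumes are defined as fs resources inside the
service definitions, roughly like this (device and names changed):

----------------------------
<fs name="svc1-data" device="/dev/cciss/c1d0p1"
    mountpoint="/data/svc1" fstype="ext3" force_unmount="1"/>
----------------------------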
Regards
Deval
-----Original Message-----
From: Kumaresh Ponnuswamy [mailto:kumaresh81 at yahoo.co.in]
Sent: Wednesday, April 12, 2006 3:34 PM
To: Deval kulshrestha; linux clustering
Subject: Re: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
file system resources?
Hi,
In your case, I guess removing the SUID on mount for normal users is the
best solution.
This will prevent non-root users from mounting the filesystem.
Regards,
Kumaresh
Deval kulshrestha wrote:
Hi
I am using one MSA 500 G2 , two no. of HP DL360 G4P server with HP's HBA
642, Server installed with RHEL 4 ES U1 and RHCS4 with lock mgr as DLM
I have to run around 14 different services in HA mode, I have break them up
in two different priority domain.
Now 7 services runs on node1 in HA mode, node2 is failover host for them,
Remaining 7 services runs on node2 in HA mode and node1 is failover domain
for them.
In my scenario Simultaneous logical drive access is not required, thus I am
not using GFS here
What ever is needed is configured properly and working fine.
But this cluster is still causes some data inconsistency error if somebody
manually mounts the partitions which is already in access by other node.
I understand that this is against the basics of non-shared file system. This
can be documented also, but everybody knows that after 2-3 yrs down the line
when support staff replaced by new people, when they come in with very
limited understanding about the running stuff they can do some mount
mistake.(umount is a document screw up, but mount is here undocumented screw
up) every body knows mount is just a simple command, it does not harm
anything, if I just want to read data mount is ok. But in our case we wanted
to restrict other users to use mount command when some logical volume is
already mounted on one node.
I want some help on this, when shared file system is not implemented. How we
can restrict manual mount of cluster file system resources when its being in
use by some cluster services?
Any help would be highly appreciable here.
With regard
Deval K.
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From Bowie_Bailey at BUC.com Wed Apr 12 14:56:13 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 10:56:13 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
Greg Perry wrote:
>
> I have been researching GFS for a few days, and I have some questions
> that hopefully some seasoned users of GFS may be able to answer.
>
> I am working on the design of a linux cluster that needs to be
> scalable, it will be primarily an RDBMS-driven data warehouse used
> for data mining and content indexing. In an ideal world, we would be
> able to start with a small (say 4 node) cluster, then add machines
> (and storage) as the various RDBMS' grow in size (as well as the use
> virtual IPs for load balancing across multiple lighttpd instances.
> All machines on the node need to be able to talk to the same volume
> of information, and GFS (in theory at least) would be used to
> aggregate the drives from each machine into that huge shared logical
> volume).
>
> With that being said, here are some questions:
>
> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
> there any locking issues to consider? What would the best open source
> RDBMS be (MySQL vs. Postgresql etc)
Someone more qualified than me will have to answer that question.
> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
> can you use GFS to aggregate all 10 drives into one big logical 3000GB
> volume? Would that scenario work similar to a RAID array? If one or
> two nodes fail, but the GFS quorum is maintained, can those nodes be
> replaced and repopulated just like a RAID-5 array? If this scenario
> is possible, how difficult is it to "grow" the shared logical volume
> by adding additional nodes (say I had two more machines each with a
> 300GB SATA drive)?
GFS doesn't work that way. GFS is just a fancy filesystem. It takes
an already shared volume and allows all of the nodes to access it at
the same time.
> 3) How stable is GFS currently, and is it used in many production
> environments?
It seems to be stable for me, but we are still in testing mode at the
moment.
> 4) How stable is the FC5 version, and does it include all of the
> configuration utilities in the RH Enterprise Cluster version? (the
> idea would be to prove the point on FC5, then migrate to RH
> Enterprise).
Haven't used that one.
> 5) Would CentOS be preferred over FC5 for the initial
> proof of concept and early adoption?
If your eventual platform is RHEL, then CentOS would make more sense
for a testing platform since it is almost identical to RHEL. Fedora
can be less stable and may introduce some issues that you wouldn't have
with RHEL. On the other hand, RHEL may have some problems that don't
appear on Fedora because of updated packages.
If you want bleeding edge, use Fedora.
If you want stability, use CentOS or RHEL.
> 6) Are there any restrictions or performance advantages of using all
> drives with the same geometry, or can you mix and match different size
> drives and just add to the aggregate volume size?
As I said earlier, GFS does not do the aggregation.
What you get with GFS is the ability to share an already networked
storage volume. You can use iSCSI, AoE, GNBD, or others to connect
the storage to all of the cluster nodes. Then you format the volume
with GFS so that it can be used with all of the nodes.
I believe there is a project for the aggregate filesystem that you are
looking for, but as far as I know, it is still beta.
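To make the GFS part concrete: once every node sees the same shared
block device (for example an AoE device such as /dev/etherd/e0.0), the
sequence is roughly this (names are examples; on a cluster you would do
the LVM steps through CLVM):

----------------------------
# on one node: carve out a volume and put GFS on it
pvcreate /dev/etherd/e0.0
vgcreate vg0 /dev/etherd/e0.0
lvcreate -L 500G -n data vg0
gfs_mkfs -p lock_dlm -t mycluster:data -j 10 /dev/vg0/data

# on every node that should see the data
mount -t gfs /dev/vg0/data /mnt/data
----------------------------

The -j 10 creates ten journals, i.e. up to ten simultaneous mounters.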
--
Bowie
From gregp at liveammo.com Wed Apr 12 15:21:27 2006
From: gregp at liveammo.com (Greg Perry)
Date: Wed, 12 Apr 2006 11:21:27 -0400
Subject: [Linux-cluster] Questions about GFS
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
Message-ID: <443D1AF7.8090105@liveammo.com>
Thanks Bowie, I understand more now. So within this architecture, it
would make more sense to utilize a RAID-5/10 SAN, then add diskless
workstations as needed for performance...?
For said diskless workstations, does it make sense to run Stateless
Linux to keep the images the same across all of the workstations/client
machines?
Regards
Greg
Bowie Bailey wrote:
> Greg Perry wrote:
>> I have been researching GFS for a few days, and I have some questions
>> that hopefully some seasoned users of GFS may be able to answer.
>>
>> I am working on the design of a linux cluster that needs to be
>> scalable, it will be primarily an RDBMS-driven data warehouse used
>> for data mining and content indexing. In an ideal world, we would be
>> able to start with a small (say 4 node) cluster, then add machines
>> (and storage) as the various RDBMS' grow in size (as well as the use
>> virtual IPs for load balancing across multiple lighttpd instances.
>> All machines on the node need to be able to talk to the same volume
>> of information, and GFS (in theory at least) would be used to
>> aggregate the drives from each machine into that huge shared logical
>> volume).
>>
>> With that being said, here are some questions:
>>
>> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
>> there any locking issues to consider? What would the best open source
>> RDBMS be (MySQL vs. Postgresql etc)
>
> Someone more qualified than me will have to answer that question.
>
>> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
>> can you use GFS to aggregate all 10 drives into one big logical 3000GB
>> volume? Would that scenario work similar to a RAID array? If one or
>> two nodes fail, but the GFS quorum is maintained, can those nodes be
>> replaced and repopulated just like a RAID-5 array? If this scenario
>> is possible, how difficult is it to "grow" the shared logical volume
>> by adding additional nodes (say I had two more machines each with a
>> 300GB SATA drive)?
>
> GFS doesn't work that way. GFS is just a fancy filesystem. It takes
> an already shared volume and allows all of the nodes to access it at
> the same time.
>
>> 3) How stable is GFS currently, and is it used in many production
>> environments?
>
> It seems to be stable for me, but we are still in testing mode at the
> moment.
>
>> 4) How stable is the FC5 version, and does it include all of the
>> configuration utilities in the RH Enterprise Cluster version? (the
>> idea would be to prove the point on FC5, then migrate to RH
>> Enterprise).
>
> Haven't used that one.
>
>> 5) Would CentOS be preferred over FC5 for the initial
>> proof of concept and early adoption?
>
> If your eventual platform is RHEL, then CentOS would make more sense
> for a testing platform since it is almost identical to RHEL. Fedora
> can be less stable and may introduce some issues that you wouldn't have
> with RHEL. On the other hand, RHEL may have some problems that don't
> appear on Fedora because of updated packages.
>
> If you want bleeding edge, use Fedora.
> If you want stability, use CentOS or RHEL.
>
>> 6) Are there any restrictions or performance advantages of using all
>> drives with the same geometry, or can you mix and match different size
>> drives and just add to the aggregate volume size?
>
> As I said earlier, GFS does not do the aggregation.
>
> What you get with GFS is the ability to share an already networked
> storage volume. You can use iSCSI, AoE, GNBD, or others to connect
> the storage to all of the cluster nodes. Then you format the volume
> with GFS so that it can be used with all of the nodes.
>
> I believe there is a project for the aggregate filesystem that you are
> looking for, but as far as I know, it is still beta.
>
From gregp at liveammo.com Wed Apr 12 15:28:13 2006
From: gregp at liveammo.com (Greg Perry)
Date: Wed, 12 Apr 2006 11:28:13 -0400
Subject: [Linux-cluster] Questions about GFS
In-Reply-To: <443D1AF7.8090105@liveammo.com>
References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
<443D1AF7.8090105@liveammo.com>
Message-ID: <443D1C8D.5080503@liveammo.com>
Also, after reviewing the GFS architecture, it seems there would be
significant security issues to consider, i.e. if one client/member of the
GFS volume were compromised, that would lead to a full compromise of the
filesystem across all nodes (including the ability to create special
devices and modify the filesystem on any other GFS node member). Are there
any plans to include any form of discretionary or mandatory access controls
for GFS in the upcoming v2 release?
Greg
Greg Perry wrote:
> Thanks Bowie, I understand more now. So within this architecture, it
> would make more sense to utilize a RAID-5/10 SAN, then add diskless
> workstations as needed for performance...?
>
> For said diskless workstations, does it make sense to run Stateless
> Linux to keep the images the same across all of the workstations/client
> machines?
>
> Regards
>
> Greg
>
> Bowie Bailey wrote:
>> Greg Perry wrote:
>>> I have been researching GFS for a few days, and I have some questions
>>> that hopefully some seasoned users of GFS may be able to answer.
>>>
>>> I am working on the design of a linux cluster that needs to be
>>> scalable, it will be primarily an RDBMS-driven data warehouse used
>>> for data mining and content indexing. In an ideal world, we would be
>>> able to start with a small (say 4 node) cluster, then add machines
>>> (and storage) as the various RDBMS' grow in size (as well as the use
>>> virtual IPs for load balancing across multiple lighttpd instances.
>>> All machines on the node need to be able to talk to the same volume
>>> of information, and GFS (in theory at least) would be used to
>>> aggregate the drives from each machine into that huge shared logical
>>> volume).
>>> With that being said, here are some questions:
>>>
>>> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
>>> there any locking issues to consider? What would the best open source
>>> RDBMS be (MySQL vs. Postgresql etc)
>>
>> Someone more qualified than me will have to answer that question.
>>
>>> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
>>> can you use GFS to aggregate all 10 drives into one big logical 3000GB
>>> volume? Would that scenario work similar to a RAID array? If one or
>>> two nodes fail, but the GFS quorum is maintained, can those nodes be
>>> replaced and repopulated just like a RAID-5 array? If this scenario
>>> is possible, how difficult is it to "grow" the shared logical volume
>>> by adding additional nodes (say I had two more machines each with a
>>> 300GB SATA drive)?
>>
>> GFS doesn't work that way. GFS is just a fancy filesystem. It takes
>> an already shared volume and allows all of the nodes to access it at
>> the same time.
>>
>>> 3) How stable is GFS currently, and is it used in many production
>>> environments?
>>
>> It seems to be stable for me, but we are still in testing mode at the
>> moment.
>>
>>> 4) How stable is the FC5 version, and does it include all of the
>>> configuration utilities in the RH Enterprise Cluster version? (the
>>> idea would be to prove the point on FC5, then migrate to RH
>>> Enterprise).
>>
>> Haven't used that one.
>>
>>> 5) Would CentOS be preferred over FC5 for the initial
>>> proof of concept and early adoption?
>>
>> If your eventual platform is RHEL, then CentOS would make more sense
>> for a testing platform since it is almost identical to RHEL. Fedora
>> can be less stable and may introduce some issues that you wouldn't have
>> with RHEL. On the other hand, RHEL may have some problems that don't
>> appear on Fedora because of updated packages.
>>
>> If you want bleeding edge, use Fedora.
>> If you want stability, use CentOS or RHEL.
>>
>>> 6) Are there any restrictions or performance advantages of using all
>>> drives with the same geometry, or can you mix and match different size
>>> drives and just add to the aggregate volume size?
>>
>> As I said earlier, GFS does not do the aggregation.
>>
>> What you get with GFS is the ability to share an already networked
>> storage volume. You can use iSCSI, AoE, GNBD, or others to connect
>> the storage to all of the cluster nodes. Then you format the volume
>> with GFS so that it can be used with all of the nodes.
>>
>> I believe there is a project for the aggregate filesystem that you are
>> looking for, but as far as I know, it is still beta.
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From hlawatschek at atix.de Wed Apr 12 15:36:46 2006
From: hlawatschek at atix.de (Mark Hlawatschek)
Date: Wed, 12 Apr 2006 17:36:46 +0200
Subject: [Linux-cluster] Questions about GFS
In-Reply-To: <443D1AF7.8090105@liveammo.com>
References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
<443D1AF7.8090105@liveammo.com>
Message-ID: <200604121736.46956.hlawatschek@atix.de>
Greg,
you can use a diskless shared root configuration with gfs. This setup would
enable you to add cluster nodes as you need them.
Have a look at http://www.open-sharedroot.org/
Mark
On Wednesday 12 April 2006 17:21, Greg Perry wrote:
> Thanks Bowie, I understand more now. So within this architecture, it
> would make more sense to utilize a RAID-5/10 SAN, then add diskless
> workstations as needed for performance...?
>
> For said diskless workstations, does it make sense to run Stateless
> Linux to keep the images the same across all of the workstations/client
> machines?
>
> Regards
>
> Greg
>
> Bowie Bailey wrote:
> > Greg Perry wrote:
> >> I have been researching GFS for a few days, and I have some questions
> >> that hopefully some seasoned users of GFS may be able to answer.
> >>
> >> I am working on the design of a linux cluster that needs to be
> >> scalable, it will be primarily an RDBMS-driven data warehouse used
> >> for data mining and content indexing. In an ideal world, we would be
> >> able to start with a small (say 4 node) cluster, then add machines
> >> (and storage) as the various RDBMS' grow in size (as well as the use
> >> virtual IPs for load balancing across multiple lighttpd instances.
> >> All machines on the node need to be able to talk to the same volume
> >> of information, and GFS (in theory at least) would be used to
> >> aggregate the drives from each machine into that huge shared logical
> >> volume).
> >>
> >> With that being said, here are some questions:
> >>
> >> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
> >> there any locking issues to consider? What would the best open source
> >> RDBMS be (MySQL vs. Postgresql etc)
> >
> > Someone more qualified than me will have to answer that question.
> >
> >> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
> >> can you use GFS to aggregate all 10 drives into one big logical 3000GB
> >> volume? Would that scenario work similar to a RAID array? If one or
> >> two nodes fail, but the GFS quorum is maintained, can those nodes be
> >> replaced and repopulated just like a RAID-5 array? If this scenario
> >> is possible, how difficult is it to "grow" the shared logical volume
> >> by adding additional nodes (say I had two more machines each with a
> >> 300GB SATA drive)?
> >
> > GFS doesn't work that way. GFS is just a fancy filesystem. It takes
> > an already shared volume and allows all of the nodes to access it at
> > the same time.
> >
> >> 3) How stable is GFS currently, and is it used in many production
> >> environments?
> >
> > It seems to be stable for me, but we are still in testing mode at the
> > moment.
> >
> >> 4) How stable is the FC5 version, and does it include all of the
> >> configuration utilities in the RH Enterprise Cluster version? (the
> >> idea would be to prove the point on FC5, then migrate to RH
> >> Enterprise).
> >
> > Haven't used that one.
> >
> >> 5) Would CentOS be preferred over FC5 for the initial
> >> proof of concept and early adoption?
> >
> > If your eventual platform is RHEL, then CentOS would make more sense
> > for a testing platform since it is almost identical to RHEL. Fedora
> > can be less stable and may introduce some issues that you wouldn't have
> > with RHEL. On the other hand, RHEL may have some problems that don't
> > appear on Fedora because of updated packages.
> >
> > If you want bleeding edge, use Fedora.
> > If you want stability, use CentOS or RHEL.
> >
> >> 6) Are there any restrictions or performance advantages of using all
> >> drives with the same geometry, or can you mix and match different size
> >> drives and just add to the aggregate volume size?
> >
> > As I said earlier, GFS does not do the aggregation.
> >
> > What you get with GFS is the ability to share an already networked
> > storage volume. You can use iSCSI, AoE, GNBD, or others to connect
> > the storage to all of the cluster nodes. Then you format the volume
> > with GFS so that it can be used with all of the nodes.
> >
> > I believe there is a project for the aggregate filesystem that you are
> > looking for, but as far as I know, it is still beta.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
--
Gruss / Regards,
Dipl.-Ing. Mark Hlawatschek
Phone: +49-89 121 409-55
http://www.atix.de/
http://www.open-sharedroot.org/
**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
From Bowie_Bailey at BUC.com Wed Apr 12 15:45:19 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 11:45:19 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DB@bnifex.cis.buc.com>
As someone else pointed out, it is possible to run diskless
workstations with their root on the GFS. I haven't tried this
configuration, so I don't know what issues there may be. The security
issue is real, though: since they are all running from the same disk, a
compromise on one can corrupt the entire cluster.
On my systems, I just have a small hard drive to hold the OS and
applications and then mount the GFS as a data partition.
Bowie
Greg Perry wrote:
> Also, after reviewing the GFS architecture it seems there would be
> significant security issues to consider, ie if one client/member of
> the GFS volume were compromised, that would lead to a full compromise
> of the filesystem across all nodes (and the ability to create special
> devices and modify the filesystem on any other GFS node member). Are
> there any plans to include any form of discretionary or mandatory
> access controls for GFS in the upcoming v2 release?
>
> Greg
>
> Greg Perry wrote:
> > Thanks Bowie, I understand more now. So within this architecture,
> > it would make more sense to utilize a RAID-5/10 SAN, then add
> > diskless workstations as needed for performance...?
> >
> > For said diskless workstations, does it make sense to run Stateless
> > Linux to keep the images the same across all of the
> > workstations/client machines?
> >
> > Regards
> >
> > Greg
> >
> > Bowie Bailey wrote:
> > > Greg Perry wrote:
> > > > I have been researching GFS for a few days, and I have some
> > > > questions that hopefully some seasoned users of GFS may be able
> > > > to answer.
> > > >
> > > > I am working on the design of a linux cluster that needs to be
> > > > scalable, it will be primarily an RDBMS-driven data warehouse
> > > > used for data mining and content indexing. In an ideal world,
> > > > we would be able to start with a small (say 4 node) cluster,
> > > > then add machines (and storage) as the various RDBMS' grow in
> > > > size (as well as the use virtual IPs for load balancing across
> > > > multiple lighttpd instances. All machines on the node need to
> > > > be able to talk to the same volume of information, and GFS (in
> > > > theory at least) would be used to aggregate the drives from
> > > > each machine into that huge shared logical volume). With that
> > > > being said, here are some questions:
> > > >
> > > > 1) What is the preference on the RDBMS, will MySQL 5.x work and
> > > > are there any locking issues to consider? What would the best
> > > > open source RDBMS be (MySQL vs. Postgresql etc)
> > >
> > > Someone more qualified than me will have to answer that question.
> > >
> > > > 2) If there was a 10 machine cluster, each with a 300GB SATA
> > > > drive, can you use GFS to aggregate all 10 drives into one big
> > > > logical 3000GB volume? Would that scenario work similar to a
> > > > RAID array? If one or two nodes fail, but the GFS quorum is
> > > > maintained, can those nodes be replaced and repopulated just
> > > > like a RAID-5 array? If this scenario is possible, how
> > > > difficult is it to "grow" the shared logical volume by adding
> > > > additional nodes (say I had two more machines each with a 300GB
> > > > SATA drive)?
> > >
> > > GFS doesn't work that way. GFS is just a fancy filesystem. It
> > > takes an already shared volume and allows all of the nodes to
> > > access it at the same time.
> > >
> > > > 3) How stable is GFS currently, and is it used in many
> > > > production environments?
> > >
> > > It seems to be stable for me, but we are still in testing mode at
> > > the moment.
> > >
> > > > 4) How stable is the FC5 version, and does it include all of the
> > > > configuration utilities in the RH Enterprise Cluster version?
> > > > (the idea would be to prove the point on FC5, then migrate to RH
> > > > Enterprise).
> > >
> > > Haven't used that one.
> > >
> > > > 5) Would CentOS be preferred over FC5 for the initial
> > > > proof of concept and early adoption?
> > >
> > > If your eventual platform is RHEL, then CentOS would make more
> > > sense for a testing platform since it is almost identical to
> > > RHEL. Fedora can be less stable and may introduce some issues
> > > that you wouldn't have with RHEL. On the other hand, RHEL may
> > > have some problems that don't appear on Fedora because of updated
> > > packages.
> > >
> > > If you want bleeding edge, use Fedora.
> > > If you want stability, use CentOS or RHEL.
> > >
> > > > 6) Are there any restrictions or performance advantages of
> > > > using all drives with the same geometry, or can you mix and
> > > > match different size drives and just add to the aggregate
> > > > volume size?
> > >
> > > As I said earlier, GFS does not do the aggregation.
> > >
> > > What you get with GFS is the ability to share an already networked
> > > storage volume. You can use iSCSI, AoE, GNBD, or others to
> > > connect the storage to all of the cluster nodes. Then you format
> > > the volume with GFS so that it can be used with all of the nodes.
> > >
> > > I believe there is a project for the aggregate filesystem that
> > > you are looking for, but as far as I know, it is still beta.
> > >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
From Bowie_Bailey at BUC.com Wed Apr 12 15:48:19 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 11:48:19 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DC@bnifex.cis.buc.com>
Also, keep in mind that the number of nodes is limited by the number
of journals on your GFS filesystem. So when you create the
filesystem, you should add a few extra journals to accommodate
expansion. If you run out, you have to add disks to the GFS in order
to create more journals.
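For example (numbers are illustrative):

----------------------------
# create with headroom: 16 journals for a cluster starting at 10 nodes
gfs_mkfs -p lock_dlm -t mycluster:data -j 16 /dev/vg0/data

# later, if you run out and have grown the underlying volume:
gfs_jadd -j 4 /mnt/data
----------------------------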
Bowie
Mark Hlawatschek wrote:
> Greg,
>
> you can use a diskless shared root configuration with gfs. This setup
> would enable you to add cluster nodes as you need them.
> Have a look at http://www.open-sharedroot.org/
>
> Mark
>
> On Wednesday 12 April 2006 17:21, Greg Perry wrote:
> > Thanks Bowie, I understand more now. So within this architecture,
> > it would make more sense to utilize a RAID-5/10 SAN, then add
> > diskless workstations as needed for performance...?
> >
> > For said diskless workstations, does it make sense to run Stateless
> > Linux to keep the images the same across all of the
> > workstations/client machines?
From tf0054 at gmail.com Wed Apr 12 17:10:52 2006
From: tf0054 at gmail.com (Takeshi NAKANO)
Date: Thu, 13 Apr 2006 02:10:52 +0900
Subject: [Linux-cluster] Cisco fence agent
In-Reply-To: <1144766944.16956.10.camel@merlin.Mines.EDU>
References:
<1144766944.16956.10.camel@merlin.Mines.EDU>
Message-ID:
Hello Matthew.
Thanks for sharing your code!
That is exactly the agent I was going to write.
> I like the network option because the host that is having problems
> will be able to write log entries after it has been fenced.
I couldn't agree more.
Thanks a lot.
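For the archive: if I read Matt's approach right, the core of such an
agent is just an SNMP set of the port's ifAdminStatus (the interface
index below is made up; it depends on the switch):

----------------------------
# administratively down(2) the switch port of the node being fenced
snmpset -v1 -c <community> <switch-ip> IF-MIB::ifAdminStatus.10101 i 2
----------------------------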
Takeshi NAKANO.
2006/4/11, Matthew B. Brookover :
> I do not know if this will help, but here is what I put together.
>
> We have 3 Cisco 3750 switches. I am currently using SNMP to turn off the
> ports of a host that is being fenced. I wrote a perl script called
> fence_cisco that works with GFS 6. I have attached a copy of fence_cisco to
> this message and its config file. I do not have much in the way of
> documentation for it, and it will probably take some hacking to get it to
> work with a current version of GFS. If you know a little perl, writing a
> fencing agent is not very difficult.
>
> I have also included a copy for the config file for fence_cisco. The first
> two lines specify the SNMP community string and the IP address for the
> switch. The rest is a list of hosts and the ports they use. You will have
> to talk to your local network guru to figure out Cisco community strings and
> the numbers involved. It took some tinkering to figure out how Cisco does
> this stuff, and even after writing the code, I am still not sure that I
> understand it. I do know that it does work, GFS does do the correct things
> during a crash.
>
> Most people use one of the power supply switches. Redhat provides the
> fence_apc agent that will turn off the power to a node that needs to be
> fenced. I like the network option because the host that is having problems
> will be able to write log entries after it has been fenced.
>
> You will need to get the Net::SNMP module from cpan.org to use fence_cisco.
> Matt
>
>
>
> On Sun, 2006-04-09 at 01:23 +0900, ??? wrote:
>
> Hi all. Does anyone have a Cisco Catalyst fence agent? If nobody has made
> one, I will. Thanks.
> -- Linux-cluster mailing list Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From aaron at firebright.com Wed Apr 12 19:25:49 2006
From: aaron at firebright.com (Aaron Stewart)
Date: Wed, 12 Apr 2006 12:25:49 -0700
Subject: [Linux-cluster] CLVM and AoE
Message-ID: <443D543D.2030202@firebright.com>
Hey All,
I'm currently in process of setting up a Coraid ATA over Ethernet device
as a backend storage for multiple systems that export individual
partitions to Xen virtual servers. In our discussions with Coraid, they
suggested looking into CLVM in order to handle this.
Obviously, I have some questions.. :)
- Has anyone used this kind of setup? I have very little experience
with Redhat's cluster management, but have a fairly high level of
expertise overall in this arena.
- How does management of LVM logical volumes occur? Do we need to
maintain one server that administers the volume group?
- What kind of pitfalls should we be aware of?
Can anyone point to any experience or any HOWTO's that discuss setting
something like this up?
Here's the setup:
1. Coraid SR1520 configured in one lblade, exported via AoE on a
dedicated storage network as one LUN
2. Centos4.2 on all cluster nodes
3. logical volumes get masked when getting passed into Xen, so on the
Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which
shows up in the virtual as /dev/sda1)
4. only one host needs access to a given logical volume at any given
time. If migration needs to occur, the volume should be unmounted and
remounted on another physical system.
5. AoE is not an IP protocol (it runs directly over Ethernet), but it can
coexist with IP on the same network interface, so we can transport
cluster metadata over the same interface. Barring that, there is a
second (public) interface on each box.
6. We want to avoid a single point of failure (such as a second AoE
server that exports luns from lvm lv's)
Thanks in advance..
-=Aaron Stewart
From sanelson at gmail.com Wed Apr 12 20:10:42 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 12 Apr 2006 21:10:42 +0100
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID:
Hi All,
I'm assuming that most of us on this list have used HP MSA kit, so
excuse me a slightly off-topic question!
I've got a cluster connected to an MSA1000, but want to make some
changes on the MSA1000 itself.
I've got a dumb terminal that runs procom, but it's pretty horrid, so
I've connected the controller directly to the serial port of one of the
Linux machines to use minicom.
As per HP's documentation, I've set it up as:
pr port /dev/ttyS0
pu baudrate 19200
pu bits 8
pu parity N
pu stopbits 1
However, I get no response.
Any ideas on how to troubleshoot? Anyone got this working?
S.
From greg.freemyer at gmail.com Wed Apr 12 20:18:06 2006
From: greg.freemyer at gmail.com (Greg Freemyer)
Date: Wed, 12 Apr 2006 16:18:06 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To:
References:
Message-ID: <87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com>
Did you try 9600 baud?
Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200.
I don't know what the HP stuff uses that is not from the old Dec
storageworks line.
On 4/12/06, Steve Nelson wrote:
> Hi All,
>
> I'm assuming that most of us on this list have used HP MSA kit, so
> excuse me a slightly off-topic question!
>
> I've got a cluster connected to an MSA1000, but want to make some
> changes on the MSA1000 itself.
>
> I've got a dumb terminal that runs procom, but its pretty horrid, so
> I've connected the controller direct to the serial port of one of the
> linux machines to use minicom.
>
> As per HP's documentation, I've set it up as:
>
> pr port /dev/ttyS0
> pu baudrate 19200
> pu bits 8
> pu parity N
> pu stopbits 1
>
> However, I get no response.
>
> Any ideas on how to troubleshoot? Anyone got this working?
>
> S.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century
From cjk at techma.com Wed Apr 12 20:28:30 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 12 Apr 2006 16:28:30 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID:
Turn off flow control if it's on, save the config as default and restart
minicom.
Also, make sure you are using the HP supplied cable and not some one off
or general serial cable. In true HP form, it's a custom cable...
If that doesn't work, here are some things to check..
1. The HP cable is plugged into the _front_ of the MSA (the back is all
fibre)
2. Make sure your serial port is not being used by something else (serial
terminal)
3. umm, I dunno, these are pretty simple...
Good luck
Regards,
Corey
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson
Sent: Wednesday, April 12, 2006 4:11 PM
To: linux clustering
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Hi All,
I'm assuming that most of us on this list have used HP MSA kit, so excuse me
a slightly off-topic question!
I've got a cluster connected to an MSA1000, but want to make some changes on
the MSA1000 itself.
I've got a dumb terminal that runs procom, but its pretty horrid, so I've
connected the controller direct to the serial port of one of the linux
machines to use minicom.
As per HP's documentation, I've set it up as:
pr port /dev/ttyS0
pu baudrate 19200
pu bits 8
pu parity N
pu stopbits 1
However, I get no response.
Any ideas on how to troubleshoot? Anyone got this working?
S.
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From cjk at techma.com Wed Apr 12 20:29:05 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 12 Apr 2006 16:29:05 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID:
MSA1x00's use 19200... it's an oddball
Regards
Corey
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Freemyer
Sent: Wednesday, April 12, 2006 4:18 PM
To: linux clustering
Subject: Re: [Linux-cluster] [OT] Serial Connection to MSA1000
Did you try 9600 baud?
Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200.
I don't know what the HP stuff uses that is not from the old Dec storageworks
line.
On 4/12/06, Steve Nelson wrote:
> Hi All,
>
> I'm assuming that most of us on this list have used HP MSA kit, so
> excuse me a slightly off-topic question!
>
> I've got a cluster connected to an MSA1000, but want to make some
> changes on the MSA1000 itself.
>
> I've got a dumb terminal that runs procom, but its pretty horrid, so
> I've connected the controller direct to the serial port of one of the
> linux machines to use minicom.
>
> As per HP's documentation, I've set it up as:
>
> pr port /dev/ttyS0
> pu baudrate 19200
> pu bits 8
> pu parity N
> pu stopbits 1
>
> However, I get no response.
>
> Any ideas on how to troubleshoot? Anyone got this working?
>
> S.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From cjk at techma.com Wed Apr 12 20:30:42 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 12 Apr 2006 16:30:42 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID:
Could be that someone else changed the baud setting tho, so Greg has a good
point..
If someone used to 9600 worked on it, they might have changed it cuz the
default
wuz "wrong" :)
Corey
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Freemyer
Sent: Wednesday, April 12, 2006 4:18 PM
To: linux clustering
Subject: Re: [Linux-cluster] [OT] Serial Connection to MSA1000
Did you try 9600 baud?
Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200.
I don't know what the HP stuff uses that is not from the old Dec storageworks
line.
On 4/12/06, Steve Nelson wrote:
> Hi All,
>
> I'm assuming that most of us on this list have used HP MSA kit, so
> excuse me a slightly off-topic question!
>
> I've got a cluster connected to an MSA1000, but want to make some
> changes on the MSA1000 itself.
>
> I've got a dumb terminal that runs procom, but its pretty horrid, so
> I've connected the controller direct to the serial port of one of the
> linux machines to use minicom.
>
> As per HP's documentation, I've set it up as:
>
> pr port /dev/ttyS0
> pu baudrate 19200
> pu bits 8
> pu parity N
> pu stopbits 1
>
> However, I get no response.
>
> Any ideas on how to troubleshoot? Anyone got this working?
>
> S.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From sanelson at gmail.com Wed Apr 12 20:28:51 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 12 Apr 2006 21:28:51 +0100
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To: <87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com>
References:
<87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com>
Message-ID:
On 4/12/06, Greg Freemyer wrote:
> Did you try 9600 baud?
I did...
I am assuming /dev/ttyS0 is correct - it only has one serial port!
S.
From sanelson at gmail.com Wed Apr 12 20:40:49 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 12 Apr 2006 21:40:49 +0100
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To:
References:
Message-ID:
On 4/12/06, Kovacs, Corey J. wrote:
> Turn off flow control if it's on, save the config as default and restart
> minicom.
Thanks very much. I had turned off flow control, but saving the config as
default and restarting appeared to make the difference.
Welcome to minicom 2.00.0
OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n
Compiled on Sep 12 2003, 17:33:22.
Press CTRL-A Z for help on special keys
Invalid CLI command.
CLI> AT S7=45 S0=0 L1 V1 X4 &c1 E1 Q0
Invalid CLI command.
CLI>
Incidentally, how do I get it not to send that dialling stuff?
> Corey
S.
From Bowie_Bailey at BUC.com Wed Apr 12 20:59:26 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 16:59:26 -0400
Subject: [Linux-cluster] CLVM and AoE
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
Aaron Stewart wrote:
>
> I'm currently in process of setting up a Coraid ATA over Ethernet
> device as a backend storage for multiple systems that export
> individual partitions to Xen virtual servers. In our discussions
> with Coraid, they suggested looking into CLVM in order to handle this.
>
> Obviously, I have some questions.. :)
>
> - Has anyone used this kind of setup? I have very little experience
> with Redhat's cluster management, but have a fairly high level of
> expertise overall in this arena.
I don't know anything about Xen, but I am using this same basic setup
on my systems.
> - How does management of LVM logical volumes occur? Do we need to
> maintain one server that administers the volume group?
The management is distributed. You can manage the cluster and volume
groups from any node.
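For example, with clvmd running on every node you can do this from any
one of them and the others see the result immediately (names are
examples):

----------------------------
vgchange -cy vg0             # mark the volume group as clustered
lvcreate -L 20G -n xenvm01 vg0
lvs                          # on another node: xenvm01 shows up there too
----------------------------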
> - What kind of pitfalls should we be aware of?
Some people have complained about throughput issues with GFS. Our
application doesn't require high throughput, so I can't comment on
this. I haven't found any issues in my testing so far.
> Can anyone point to any experience or any HOWTO's that discuss setting
> something like this up?
There are a few documents, but most of the ones that I've seen are out
of date. If you have specific questions, you can ask here.
If you don't have it already, here is the yum config with the current
cluster RPMs for CentOS. Just drop it in a file in /etc/yum.repos.d/.
Note that the current cluster RPMs are for the new 2.6.9-34.EL kernel.
----------------------------
[csgfs]
name=CentOS-4 - CSGFS
baseurl=http://mirror.centos.org/centos/$releasever/csgfs/$basearch/
gpgcheck=1
enabled=1
----------------------------
The only thing you need to build from source is the AoE driver from
CoRaid.
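Building and loading it is quick; from memory it is something like this
(the tarball name depends on the driver version you download):

----------------------------
tar xzf aoe-<version>.tar.gz && cd aoe-<version>
make && make install      # builds aoe.ko against the running kernel
modprobe aoe
aoe-discover              # from aoetools
aoe-stat                  # lists devices such as /dev/etherd/e0.0
----------------------------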
> Here's the setup:
>
> 1. Coraid SR1520 configured in one lblade, exported via AoE on a
> dedicated storage network as one LUN
> 2. Centos4.2 on all cluster nodes
> 3. logical volumes get masked when getting passed into Xen, so on the
> Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which
> shows up in the virtual as /dev/sda1)
> 4. only one host need access to a given logical volume at any given
> time. If migration needs to occur, the volume should be unmounted and
> remounted on another physical system.
This can be done, but the cluster will not do it for you. Each
logical volume can be accessed by as many nodes as you need. Note
that you need one GFS journal per node that needs simultaneous access.
> 5. AoE is a layer 2 protocol (it runs directly over Ethernet rather
> than over IP), so it can coexist with IP on the same network
> interface and we can transport cluster metadata over the same
> interface. Barring that, there is a second (public) interface on
> each box.
> 6. We want to avoid a single point of failure (such as a second AoE
> server that exports LUNs from LVM LVs)
Now that DLM is the recommended locking manager, everything is
distributed. Your only single point of failure is the CoRaid box.
--
Bowie
From aaron at firebright.com Wed Apr 12 21:11:24 2006
From: aaron at firebright.com (Aaron Stewart)
Date: Wed, 12 Apr 2006 14:11:24 -0700
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
Message-ID: <443D6CFC.7000507@firebright.com>
Hey Bowie,
Wow.. That's perfect. Thanks for the response.
I have a question about whether GFS is a requirement: since each LV is
a separate partition mounted in a Xen guest, does GFS make sense, or
can we use ext3/xfs/etc.?
-=Aaron
Bowie Bailey wrote:
> [snip]
From mtp at tilted.com Wed Apr 12 21:29:00 2006
From: mtp at tilted.com (Mark Petersen)
Date: Wed, 12 Apr 2006 16:29:00 -0500
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <443D6CFC.7000507@firebright.com>
References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
<443D6CFC.7000507@firebright.com>
Message-ID: <7.0.1.0.2.20060412162416.028964f0@tilted.com>
At 04:11 PM 4/12/2006, you wrote:
>Hey Bowie,
>
>Wow.. That's perfect. Thanks for the response.
>
>I have a question about whether GFS is a requirement: since each LV
>is a separate partition mounted in a Xen guest, does GFS make sense,
>or can we use ext3/xfs/etc.?
So is every dom0 going to mount the CoRaid device directly using AoE,
with CLVM notifying the whole cluster whenever any single node makes
LVM changes? If not, then you'll need to use GNBD to export the LVs, I
guess. Either way, you can use whatever filesystem you have support
for in a xenU kernel; you shouldn't need to format anything as GFS at
all.
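(The non-GFS path would then look something like the sketch below --
hypothetical names, one LV per guest, each formatted with a plain local
filesystem and only ever mounted in one place at a time:)

----------------------------
# on any node, thanks to CLVM:
lvcreate -L 10G -n xenvm01 VolGroup00
mkfs.ext3 /dev/VolGroup00/xenvm01

# in the guest's Xen config -- the domU sees the LV as /dev/sda1:
disk = [ 'phy:/dev/VolGroup00/xenvm01,sda1,w' ]
----------------------------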
From lhh at redhat.com Wed Apr 12 22:07:05 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 12 Apr 2006 18:07:05 -0400
Subject: [Linux-cluster] Help-me, Please
In-Reply-To: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com>
References: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com>
Message-ID: <1144879625.15794.48.camel@ayanami.boston.redhat.com>
On Mon, 2006-04-10 at 20:57 -0300, ANDRE LUIS FORIGATO wrote:
> Linux xlx2 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005 i686 i686 i386 GNU/Linux
> Apr 10 01:18:07 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
> Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
> Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
> Apr 10 05:13:49 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung)
> Apr 10 05:13:54 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP
> Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
> Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
> Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
> Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
> Apr 10 11:31:07 xlx2 clumembd[4493]: Membership View #5:0x00000002
> Apr 10 11:31:08 xlx2 cluquorumd[4463]: Membership reports #0 as down, but disk reports as up: State uncertain!
> Apr 10 11:31:08 xlx2 cluquorumd[4463]: --> Commencing STONITH <--
> Apr 10 11:31:08 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung)
> Apr 10 11:31:10 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP
> Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #12 0x00000002
> Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Member 200.254.254.171's state is uncertain: Some services may be unavailable!
> Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: Quorum Event: View #13 0x00000002
> Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
> Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
> Apr 10 11:31:34 xlx2 cluquorumd[4463]: Disk-TB: Partner is DOWN (Dead/Hung)
> Apr 10 11:31:38 xlx2 cluquorumd[4463]: --> Commencing STONITH <--
> Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Falsely claiming that 200.254.254.171 has been fenced
> Apr 10 11:31:38 xlx2 cluquorumd[4463]: STONITH: Data integrity may be compromised!
> Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Couldn't connect to member #0: Connection timed out
> Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Unable to obtain cluster lock: No locks available
> Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: Quorum Event: View #15 0x00000002
> Apr 10 11:31:41 xlx2 clusvcmgrd[4671]: State change: 200.254.254.172 DOWN
> Apr 10 11:34:08 xlx2 cluquorumd[4463]: Disk-TB: State Change: Partner UP
> Apr 10 11:34:09 xlx2 clusvcmgrd[4671]: Quorum Event: View #16 0x00000002
> Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: