From sco at adviseo.fr  Sat Apr  1 21:06:32 2006
From: sco at adviseo.fr (Sylvain Coutant)
Date: Sat, 1 Apr 2006 23:06:32 +0200
Subject: [Linux-cluster] gnbd server & cache
Message-ID: <003001c655d0$2706e680$6300000a@ELTON>

Hi,

Could someone help me understand why gnbd server does not support non-caching exports when not coupled with the cluster suite ? I wonder what's the link between both ...


BR,

--
Sylvain COUTANT

ADVISEO
http://www.adviseo.fr/
http://www.open-sp.fr/





From halomoan at powere2e.com  Sun Apr  2 04:58:35 2006
From: halomoan at powere2e.com (Halomoan  )
Date: Sun,  2 Apr 2006 12:58:35 +0800
Subject: [Linux-cluster] GFS is for what and how it works ?
Message-ID: <200604021258.AA403309094@mail.powere2e.com>


Sorry, I'm newbie in GFS.

Followed Redhat's GFS documentation 
To find out how GFS works, I have 2 nodes (node A and node B) for GFS and 1 node (node C) for GNBD server. It runs with no error but i don't know how to use it (GFS)

I attached my /etc/cluster/cluster.conf below.

My question is:

1. At a time, how many nodes have GFS filesystem mounted ? Where is the cluster's work in GFS ?
2. How do I shared the GFS filesystem to other server ? Do I need other software ?
3. From this configuration, if node A failed, what happen to the GFS filesystem ? failover to node B ? How about with the other server that is using the GFS filesystem in node A ?
4. Could you give me example what is actually the GFS real usage in real live ? 

I'm absolutely confuse with this GFS on how they works.

Thanks

Regards,

Halomoan

--------------------- Cluster.conf ------------------------
<?xml version="1.0"?>
<cluster config_version="14" name="gfs_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="p3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="byhand" nodename="p3"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="p7" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="byhand" nodename="p7"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_gnbd" name="p5_gnbd" servers="p5.xxx.com"/>
                <fencedevice agent="fence_manual" name="byhand"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="failover" ordered="0" restricted="0">
                                <failoverdomainnode name="p3" priority="1"/>
                                <failoverdomainnode name="p7" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <clusterfs device="/dev/gnbd/global_disk" force_unmount="1" fsid="15674" fstype="gfs" mountpoint="/data" name="data" options=""/>
                </resources>
                <service autostart="1" domain="failover" name="test">
                        <clusterfs ref="data"/>
                </service>
        </rm>
</cluster>

 

 
______________ ______________ ______________ ______________
Sent via the KillerWebMail system at mail.powere2e.com


 
                   



From pcaulfie at redhat.com  Mon Apr  3 09:04:11 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 03 Apr 2006 10:04:11 +0100
Subject: [Linux-cluster] standard mechanism to communicate between cluster
	nodes from kernel
In-Reply-To: <cc723f590603170259v2c57ce5fi781cbe80d5a4c54b@mail.gmail.com>
References: <cc723f590603170259v2c57ce5fi781cbe80d5a4c54b@mail.gmail.com>
Message-ID: <4430E50B.9020104@redhat.com>

Aneesh Kumar wrote:
> Hi all,
> 
> I was trying to understand whether there is a standard set of API we
> are working on for communicating between different nodes in a cluster
> inside kernel. I looked at ocfs2 and the ocfs2 dlm code base seems to
> use tcp via o2net_send_tcp_msg and the redhat dlm seems to sctp. There
> is also tipc (net/tipc) code in the kernel now ( I am not sure about
> the details of tipc). This confuses me a lot. If i want to use all
> these cluster components what is the standard way. I am right now
> looking at clusterproc
> (http://www.openssi.org/cgi-bin/view?page=proc-hooks.html ) and
> wondering what should be the communication mechanism.  clusterproc was
> earlier based on CI which provided a simple easy way to define
> different cluster services( more or less like rpcgen style
> http://ci-linux.sourceforge.net/ics.shtml ). Does we are looking for a
> framework like that ?
> 
> NOTE: I am not trying to find out which one is the best. I am trying
> to find out if there is a standard way of doing this
> 

I'll repeat the reply I sent you you when you asked me this via private email,
just for the record...

   I think you've answered your own question. each cluster manager has its own
   way of communicating between nodes.

   As for which is best, That depends on what you mean by "best". There are
   lots of variables in cluster comms. Do you want speed? reliability?
   predictability? ordering?"



-- 

patrick



From thaidn at gmail.com  Mon Apr  3 10:30:16 2006
From: thaidn at gmail.com (Thai Duong)
Date: Mon, 3 Apr 2006 17:30:16 +0700
Subject: [Linux-cluster] Manual fencing doest work
Message-ID: <d4e2d9970604030330v7f32571cga1091fd6e2132f9d@mail.gmail.com>

Hi all,

I have a 2 node GFS 6.1 cluster with the following configuration:

<?xml version="1.0"?>
<cluster name="fccrac" config_version="5">

    <cman two_node="1" expected_votes="1">
    </cman>

    <clusternodes>
      <clusternode name="fcc1" votes="1">
       <fence>
        <method name="single">
         <device name="human" nodename="fcc1"/>
        </method>
       </fence>
      </clusternode>

      <clusternode name="fcc4" votes="1">
       <fence>
        <method name="single">
         <device name="human" nodename="fcc4"/>
        </method>
       </fence>
      </clusternode>
   </clusternodes>

  <fence_devices>
   <fence_device name="human" agent="fence_manual"/>
  </fence_devices>

 </cluster>

It turns out that manual fencing doest work as expected. When I force power
down a node, the other could not fence it and worse, the whole GFS file
system is freeze waiting for the downed node to be up again. I got something
like below in kernel log

Apr  2 16:46:28 fcc1 fenced[3444]: fencing node "fcc4"
Apr  2 16:46:28 fcc1 fenced[3444]: fence "fcc4" failed

Some information about GFS and kernel:

[root at fcc1 ~]# rpm -qa | grep GFS
GFS-6.1.3-0
GFS-kernel-2.6.9-45.0.2

[root at fcc1 ~]# uname -a
Linux fcc1 2.6.9-22.0.2.EL #1 SMP Thu Jan 5 17:04:58 EST 2006 ia64 ia64 ia64
GNU/Linux

Please help.

TIA,

Thai Duong.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060403/f16991f8/attachment.htm>

From sunjw at onewaveinc.com  Mon Apr  3 09:51:36 2006
From: sunjw at onewaveinc.com (=?GB2312?B?y++/oc6w?=)
Date: Mon, 3 Apr 2006 17:51:36 +0800
Subject: [Linux-cluster] kernel panic about lock_dlm
Message-ID: <SERVERgvhBh6ZHTSBpl0003b178@mail.onewaveinc.com>

Hi, everyone

I use kernel 2.6.15-rc7 and the latest STABLE cvs branch of GFS 
when the newest kernel is 2.6.15-rc7?
I've started a GFS cluster with 4 nodes, but after about 4 days, 
the cluster did not work.I found the /var/log/messages as follows:
<--
Mar 28 15:31:29 nd05 kernel: d 1 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 update remastered resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 updated 0 resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuild locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuilt 0 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 recover event 11 done
Mar 28 15:31:29 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 8,11,11
Mar 28 15:31:29 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 11 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 1,0,0 ids 11,11,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,1,0 ids 11,14,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move use event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 add node 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 total nodes 4
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuild resource directory
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuilt 1552 resources
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purge requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purged 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 mark waiting requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 marked 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 done
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 11,14,14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id 9190386 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id eab0065 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 unlock fb040350 no id
Mar 28 15:31:30 nd05 kernel: recovery_done jid 3 msg 309 a
Mar 28 15:31:30 nd05 kernel: 3961 recovery_done nodeid 4 flg 18
Mar 28 15:31:30 nd05 kernel: 3977 pr_start last_stop 3 last_start 4 last_finish 3
Mar 28 15:31:31 nd05 kernel: 3977 pr_start count 3 type 3 event 4 flags 21a
Mar 28 15:31:31 nd05 kernel: 3977 pr_start 4 done 1
Mar 28 15:31:31 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13415b4b id 163005c 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13425b42 id 180002f 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13435b39 id 1a00360 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13445b30 id 1760186 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13455b27 id 17a038b 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13465b1e id 15a01a8 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13475b15 id 1910380 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13485b0c id 1880309 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13495b03 id 17001e6 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134a5afa id 1940352 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134b5af1 id 1650349 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134c5ae8 id 167001d 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,134d5adf id 15c0083 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134e5ad6 id 1770155 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134f5acd id 16400cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13505ac4 id 1680102 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13515abb id 1920051 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13525ab2 id 1850182 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13535aa9 id 17301cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13545aa0 id 17803ed 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13555a97 id 18a0111 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13565a8e id 16d03c5 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13575a85 id 1870026 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13585a7c id 185030b 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13595a73 id 15d0190 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135a5a6a id 14b03f1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135b5a61 id 177025e 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135c5a58 id 198016f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135d5a4f id 1640163 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135e5a46 id 1730233 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135f5a3d id 1880130 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13495b03 id 17001e6 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134a5afa id 1940352 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134b5af1 id 1650349 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134c5ae8 id 167001d 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,134d5adf id 15c0083 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134e5ad6 id 1770155 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134f5acd id 16400cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13505ac4 id 1680102 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13515abb id 1920051 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13525ab2 id 1850182 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13535aa9 id 17301cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13545aa0 id 17803ed 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13555a97 id 18a0111 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13565a8e id 16d03c5 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13575a85 id 1870026 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13585a7c id 185030b 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13595a73 id 15d0190 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135a5a6a id 14b03f1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135b5a61 id 177025e 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135c5a58 id 198016f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135d5a4f id 1640163 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135e5a46 id 1730233 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135f5a3d id 1880130 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13605a34 id 16f00aa 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13615a2b id 17400e1 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13625a22 id 16b03c1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13635a19 id 16b03ad 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13645a10 id 17e03d4 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13655a07 id 18202c0 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136659fe id 170036c 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136759f5 id 155031c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136859ec id 1660212 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136959e3 id 15c0114 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136a59da id 15a038f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136b59d1 id 17600bb 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136c59c8 id 1a20336 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136d59bf id 171003c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136e59b6 id 1500008 3,0
Mar 28 15:31:32 nd05 kernel: 3976 pr_start last_stop 4 last_start 9 last_finish 4
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 4 type 2 event 9 flags 21a
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,136f59ad id 15e026f 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,137059a4 id 170017e 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1371599b id 16b01e3 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13725992 id 18000a2 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13735989 id 177017c 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13745980 id 16d035a 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13755977 id 18102d6 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1376596e id 1740020 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13775965 id 1780207 3,0
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 9 done 1
Mar 28 15:31:33 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start last_stop 9 last_start 10 last_finish 9
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 3 type 3 event 10 flags 21a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 10 done 1
Mar 28 15:31:33 nd05 kernel: 3977 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,370232 id 23a010e 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,380229 id 2630143 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,390220 id 29f0338 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3a0217 id 2850133 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3b020e id 268035b 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3c0205 id 2710344 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3d01fc id 27701f4 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3e01f3 id 28203f7 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3f01ea id 236011f 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4001e1 id 25e0387 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4101d8 id 2810157 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4201cf id 248035a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4301c6 id 24d0297 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4401bd id 2920280 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4501b4 id 267000b 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4601ab id 263012c 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4701a2 id 2930281 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,480199 id 28e028d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,490190 id 243031a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4a0187 id 259000d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4b017e id 2650370 3,0
Mar 28 15:31:35 nd05 kernel: 3976 pr_start last_stop 10 last_start 15 last_finish 10
Mar 28 15:31:35 nd05 kernel: 3976 pr_start count 4 type 2 event 15 flags 21a
Mar 28 15:31:35 nd05 kernel: 3976 pr_start 15 done 1
Mar 28 15:31:35 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:35 nd05 kernel: 
Mar 28 15:31:35 nd05 kernel: lock_dlm:  Assertion failed on line 357 of file /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c
Mar 28 15:31:35 nd05 kernel: lock_dlm:  assertion:  "!error"
Mar 28 15:31:35 nd05 kernel: lock_dlm:  time = 79185725
Mar 28 15:31:35 nd05 kernel: gfs-sda1: error=-22 num=3,133b5b81 lkf=9 flags=84
Mar 28 15:31:35 nd05 kernel: 
Mar 28 15:31:37 nd05 kernel: ------------[ cut here ]------------
Mar 28 15:31:37 nd05 kernel: kernel BUG at /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c:357!
Mar 28 15:31:37 nd05 kernel: invalid operand: 0000 [#1]
Mar 28 15:31:37 nd05 kernel: SMP 
Mar 28 15:31:37 nd05 kernel: Modules linked in: lock_dlm dlm cman gfs lock_harness ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msgha
ndler binfmt_misc dm_mirror dm_round_robin dm_multipath dm_mod video thermal processor fan button battery ac uhci_hcd usbcore hw_random shpchp
 pci_hotplug e1000 bonding qla2300 qla2xxx scsi_transport_fc sd_mod
Mar 28 15:31:37 nd05 kernel: CPU:    1
Mar 28 15:31:37 nd05 kernel: EIP:    0060:[<f89e9556>]    Not tainted VLI
Mar 28 15:31:37 nd05 kernel: EFLAGS: 00010282   (2.6.15-rc7smp) 
Mar 28 15:31:37 nd05 kernel: EIP is at do_dlm_unlock+0x8f/0xa4 [lock_dlm]
Mar 28 15:31:37 nd05 kernel: eax: 00000004   ebx: f560c180   ecx: f5cf7f10   edx: f89edf11
Mar 28 15:31:37 nd05 kernel: esi: ffffffea   edi: f8a7f000   ebp: f8a61580   esp: f5cf7f0c
Mar 28 15:31:37 nd05 kernel: ds: 007b   es: 007b   ss: 0068
Mar 28 15:31:37 nd05 kernel: Process gfs_glockd (pid: 3979, threadinfo=f5cf6000 task=f6735030)
Mar 28 15:31:37 nd05 kernel: Stack: f89edf11 f8a7f000 f55517b0 f89e97f0 f560c180 f8a3c64f f560c180 00000003 
Mar 28 15:31:37 nd05 kernel:        f55517d4 f8a329d8 f8a7f000 f560c180 00000003 f55517b0 f8a61580 f55517b0 
Mar 28 15:31:37 nd05 kernel:        f8a7f000 f8a31f28 f55517b0 f55517b0 00000001 f8a31fdc d82c34c0 f55517b0 
Mar 28 15:31:37 nd05 kernel: Call Trace:
Mar 28 15:31:37 nd05 kernel:  [<f89e97f0>] lm_dlm_unlock+0x19/0x20 [lock_dlm]
Mar 28 15:31:37 nd05 kernel:  [<f8a3c64f>] gfs_lm_unlock+0x2c/0x43 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a329d8>] gfs_glock_drop_th+0xe8/0x122 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a31f28>] rq_demote+0x76/0x92 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a31fdc>] run_queue+0x54/0xb5 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a320f4>] unlock_on_glock+0x1d/0x24 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a34013>] gfs_reclaim_glock+0xbd/0x135 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a28734>] gfs_glockd+0x3a/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<c0116f3d>] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel:  [<c010328a>] ret_from_fork+0x6/0x14
Mar 28 15:31:37 nd05 kernel:  [<c0116f3d>] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel:  [<f8a286fa>] gfs_glockd+0x0/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<c0101ab5>] kernel_thread_helper+0x5/0xb
Mar 28 15:31:37 nd05 kernel: Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 8b 03 ff 70 18 68 09 e0 9e f8 e8 ac 14 73 c7 83 c4 34 68 11 df
 9e f8 e8 9f 14 73 c7 <0f> 0b 65 01 58 de 9e f8 68 13 df 9e f8 e8 23 0d 73 c7 5b 5e c3 
-->

What problem may be there? 
Thanks for any reply!
Luckey




From troels at arvin.dk  Mon Apr  3 14:16:55 2006
From: troels at arvin.dk (Troels Arvin)
Date: Mon, 03 Apr 2006 16:16:55 +0200
Subject: [Linux-cluster] Using a null modem for heartbeat with CS4?
Message-ID: <pan.2006.04.03.14.16.54.328000@arvin.dk>

Hello,

I would like to have to heartbeat channels between my cluster nodes: A
cross-over ethernet cable and a null modem cable.

In the manual for Cluster Suite 3 (CS2), it's stated that a null modem
cable can be used for heartbeat.

The manual for CS4 doesn't mention null modem cables. Isn't it possible to
use null modem cables for heartbeat in CS4?

-- 
Greetings from Troels Arvin




From libregeek at gmail.com  Mon Apr  3 14:20:03 2006
From: libregeek at gmail.com (Manilal K M)
Date: Mon, 3 Apr 2006 19:50:03 +0530
Subject: [Linux-cluster] Using a null modem for heartbeat with CS4?
In-Reply-To: <pan.2006.04.03.14.16.54.328000@arvin.dk>
References: <pan.2006.04.03.14.16.54.328000@arvin.dk>
Message-ID: <2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com>

On 03/04/06, Troels Arvin <troels at arvin.dk> wrote:
> Hello,
>
> I would like to have to heartbeat channels between my cluster nodes: A
> cross-over ethernet cable and a null modem cable.
>
> In the manual for Cluster Suite 3 (CS2), it's stated that a null modem
> cable can be used for heartbeat.
>
> The manual for CS4 doesn't mention null modem cables. Isn't it possible to
> use null modem cables for heartbeat in CS4?
AFAIK, Null modems are not supported in CS4.

regards
Manilal



From Bowie_Bailey at BUC.com  Mon Apr  3 14:30:36 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Mon, 3 Apr 2006 10:30:36 -0400 
Subject: [Linux-cluster] GFS is for what and how it works ?
Message-ID: <4766EEE585A6D311ADF500E018C154E302133870@bnifex.cis.buc.com>

Halomoan wrote:
> Sorry, I'm newbie in GFS.
> 
> Followed Redhat's GFS documentation
> To find out how GFS works, I have 2 nodes (node A and node B) for
> GFS and 1 node (node C) for GNBD server. It runs with no error but i
> don't know how to use it (GFS)  
> 
> I attached my /etc/cluster/cluster.conf below.
> 
> My question is:
> 
> 1. At a time, how many nodes have GFS filesystem mounted ? Where is
> the cluster's work in GFS ? 

You can mount one node for each journal you created when you built the
GFS filesystem.

What the cluster does is manage access to the GFS filesystem and
(attempt to) ensure that if one node starts having problems, it can't
corrupt the filesystem.

> 2. How do I shared the GFS filesystem to other server ? Do I need
> other software ? 

GFS is simply a filesystem which is capable of being used on multiple
nodes at the same time.  How you mount it depends on what software or
hardware you are using to share the media.  GNBD can be used by a
server to share it's storage with the other nodes.  You can also use
iSCSI, aoe, and others to connect each node directly to a separate
storage unit.

> 3. From this configuration, if node A failed, what happen to the GFS
> filesystem ? failover to node B ? How about with the other server
> that is using the GFS filesystem in node A ?  

There is no failover.  Everything is always active.  As long as the
storage itself doesn't fail, the failure of one node should not be a
problem.  Unless, of course, it causes your cluster to lose quorum
(drop below the minimum number of servers necessary to maintain the
cluster).

> 4. Could you give me example what is actually the GFS real usage in
> real live ? 

I'm using it to share a 1.2 TB storage area between two systems that
use it for processing and a third system that has direct access for
making backups.

> I'm absolutely confuse with this GFS on how they works.

Yea.  The documentation is not very extensive at this point.

-- 
Bowie



From JACOB_LIBERMAN at Dell.com  Mon Apr  3 20:16:17 2006
From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com)
Date: Mon, 3 Apr 2006 15:16:17 -0500
Subject: [Linux-cluster] Order of execution
Message-ID: <BC430F453501174992B9D9E8AFB7519AD5310B@ausx3mps309.aus.amer.dell.com>

Hi cluster geniuses,

I have a quick question. 

I am trying to write a custom startup script for an application called
adsi rms.  The application comes with its own startup script that
requires the disk resource and network interface. Here is my question:

When I create a custom startup script for the service and place it in
/etc/init.d/, the cluster service can start the application successfully
but not all services come online because the shared disk and IP do not
appear to be available when the service starts.

Is there a way to set the order of execution for a service so that the
application will not start until AFTER the disk and network interface
are available?

Thanks again, Jacob



From eric at bootseg.com  Mon Apr  3 20:26:44 2006
From: eric at bootseg.com (Eric Kerin)
Date: Mon, 03 Apr 2006 16:26:44 -0400
Subject: [Linux-cluster] Order of execution
In-Reply-To: <BC430F453501174992B9D9E8AFB7519AD5310B@ausx3mps309.aus.amer.dell.com>
References: <BC430F453501174992B9D9E8AFB7519AD5310B@ausx3mps309.aus.amer.dell.com>
Message-ID: <1144096004.4004.14.camel@auh5-0479.corp.jabil.org>

Jacob, 

The start/stop orders are defined in /usr/share/cluster/service.sh
look under the special tag, there should be a child tag for each type of
child node of service.

Mine looks like so (current rgmanager rpm from RHN): 
        <child type="fs" start="1" stop="8"/>
        <child type="clusterfs" start="2" stop="7"/>
        <child type="netfs" start="3" stop="6"/>
        <child type="nfsexport" start="4" stop="5"/>

        <child type="nfsclient" start="5" stop=""/>

        <child type="ip" start="6" stop="2"/>
        <child type="smb" start="7" stop="3"/>
        <child type="script" start="7" stop="1"/>


For starting, fs should start first, then clusterfs, etc... finally smb
and script start.

For stopping, script would be stopped first, then ip, etc... finally fs.

Thanks, 
Eric Kerin
eric at bootseg.com


On Mon, 2006-04-03 at 15:16 -0500, JACOB_LIBERMAN at Dell.com wrote:
> Hi cluster geniuses,
> 
> I have a quick question. 
> 
> I am trying to write a custom startup script for an application called
> adsi rms.  The application comes with its own startup script that
> requires the disk resource and network interface. Here is my question:
> 
> When I create a custom startup script for the service and place it in
> /etc/init.d/, the cluster service can start the application successfully
> but not all services come online because the shared disk and IP do not
> appear to be available when the service starts.
> 
> Is there a way to set the order of execution for a service so that the
> application will not start until AFTER the disk and network interface
> are available?
> 
> Thanks again, Jacob
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From jbrassow at redhat.com  Mon Apr  3 22:37:56 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Mon, 3 Apr 2006 17:37:56 -0500
Subject: [Linux-cluster] Manual fencing doest work
In-Reply-To: <d4e2d9970604030330v7f32571cga1091fd6e2132f9d@mail.gmail.com>
References: <d4e2d9970604030330v7f32571cga1091fd6e2132f9d@mail.gmail.com>
Message-ID: <6475746f533faa0d27117afbbcf54e7f@redhat.com>

Fence manual setup simply waits until either
1) the user reboots the failed node _and_ uses fence_ack_manaul to 
notify the node asking for the fence that you have done so.
or
2) the node that "failed" comes back up

In the steps you described, you never acknowledged the request for 
fencing - hence, you have to wait for the machine to come back up.

  brassow

BTW, i'd never use manual fencing in production.

On Apr 3, 2006, at 5:30 AM, Thai Duong wrote:

> Hi all,
>
>  I have a 2 node GFS 6.1 cluster with the following configuration:
>
>  <?xml version="1.0"?>
>  <cluster name="fccrac" config_version="5">
>
>  ??? <cman two_node="1" expected_votes="1">
>  ??? </cman>
>
>  ??? <clusternodes>
>  ????? <clusternode name="fcc1" votes="1">
>  ?????? <fence>
>  ??????? <method name="single">
>  ???????? <device name="human" nodename="fcc1"/>
>  ??????? </method>
>  ?????? </fence>
>  ????? </clusternode>
>
>  ????? <clusternode name="fcc4" votes="1">
>  ?????? <fence>
>  ??????? <method name="single">
>  ???????? <device name="human" nodename="fcc4"/>
>  ??????? </method>
>  ?????? </fence>
>  ????? </clusternode>
>  ?? </clusternodes>
>
>  ? <fence_devices>
>  ?? <fence_device name="human" agent="fence_manual"/>
>  ? </fence_devices>
>
>  ?</cluster>
>
>  It turns out that manual fencing doest work as expected. When I force 
> power down a node, the other could not fence it and worse, the whole 
> GFS file system is freeze waiting for the downed node to be up again. 
> I got something like below in kernel log
>
>  Apr? 2 16:46:28 fcc1 fenced[3444]: fencing node "fcc4"
>  Apr? 2 16:46:28 fcc1 fenced[3444]: fence "fcc4" failed
>
>  Some information about GFS and kernel:
>
>  [root at fcc1 ~]# rpm -qa | grep GFS
>  GFS-6.1.3-0
>  GFS-kernel-2.6.9-45.0.2
>
>  [root at fcc1 ~]# uname -a
>  Linux fcc1 2.6.9-22.0.2.EL #1 SMP Thu Jan 5 17:04:58 EST 2006 ia64 
> ia64 ia64 GNU/Linux
>
>  Please help.
>
>  TIA,
>
>  Thai Duong.
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From teigland at redhat.com  Tue Apr  4 03:08:53 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 3 Apr 2006 22:08:53 -0500
Subject: [Linux-cluster] Manual fencing doest work
In-Reply-To: <d4e2d9970604030330v7f32571cga1091fd6e2132f9d@mail.gmail.com>
References: <d4e2d9970604030330v7f32571cga1091fd6e2132f9d@mail.gmail.com>
Message-ID: <20060404030853.GA12817@redhat.com>

On Mon, Apr 03, 2006 at 05:30:16PM +0700, Thai Duong wrote:
>   <fence_devices>
>    <fence_device name="human" agent="fence_manual"/>
>   </fence_devices>

Try "fencedevices" and "fencedevice".

Dave



From halomoan at powere2e.com  Tue Apr  4 06:11:18 2006
From: halomoan at powere2e.com (Halomoan Chow)
Date: Tue, 4 Apr 2006 14:11:18 +0800
Subject: [Linux-cluster] GFS is for what and how it works ?
In-Reply-To: <4766EEE585A6D311ADF500E018C154E302133870@bnifex.cis.buc.com>
Message-ID: <001c01c657ae$9d9595f0$100fcc0a@pc002>

Thank you Bowie
You gave me a little light in GFS jungle :D


Regards,

Halomoan


-----Original Message-----
From: Bowie Bailey [mailto:Bowie_Bailey at BUC.com] 
Sent: Monday, April 03, 2006 10:31 PM
To: halomoan at powere2e.com
Cc: linux clustering
Subject: RE: [Linux-cluster] GFS is for what and how it works ?

Halomoan wrote:
> Sorry, I'm newbie in GFS.
> 
> Followed Redhat's GFS documentation
> To find out how GFS works, I have 2 nodes (node A and node B) for
> GFS and 1 node (node C) for GNBD server. It runs with no error but i
> don't know how to use it (GFS)  
> 
> I attached my /etc/cluster/cluster.conf below.
> 
> My question is:
> 
> 1. At a time, how many nodes have GFS filesystem mounted ? Where is
> the cluster's work in GFS ? 

You can mount one node for each journal you created when you built the
GFS filesystem.

What the cluster does is manage access to the GFS filesystem and
(attempt to) ensure that if one node starts having problems, it can't
corrupt the filesystem.

> 2. How do I shared the GFS filesystem to other server ? Do I need
> other software ? 

GFS is simply a filesystem which is capable of being used on multiple
nodes at the same time.  How you mount it depends on what software or
hardware you are using to share the media.  GNBD can be used by a
server to share it's storage with the other nodes.  You can also use
iSCSI, aoe, and others to connect each node directly to a separate
storage unit.

> 3. From this configuration, if node A failed, what happen to the GFS
> filesystem ? failover to node B ? How about with the other server
> that is using the GFS filesystem in node A ?  

There is no failover.  Everything is always active.  As long as the
storage itself doesn't fail, the failure of one node should not be a
problem.  Unless, of course, it causes your cluster to lose quorum
(drop below the minimum number of servers necessary to maintain the
cluster).

> 4. Could you give me example what is actually the GFS real usage in
> real live ? 

I'm using it to share a 1.2 TB storage area between two systems that
use it for processing and a third system that has direct access for
making backups.

> I'm absolutely confuse with this GFS on how they works.

Yea.  The documentation is not very extensive at this point.

-- 
Bowie



From JACOB_LIBERMAN at Dell.com  Tue Apr  4 12:55:44 2006
From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com)
Date: Tue, 4 Apr 2006 07:55:44 -0500
Subject: [Linux-cluster] Order of execution
Message-ID: <BC430F453501174992B9D9E8AFB7519AD5310E@ausx3mps309.aus.amer.dell.com>

Eric,

I am running RHEL3 U4 with clumanager 1.2.22.  I do not have the options
listed below.

Does anyone have an example script for this version?  Lon?

Thanks, Jacob 

> -----Original Message-----
> From: Eric Kerin [mailto:eric at bootseg.com] 
> Sent: Monday, April 03, 2006 3:27 PM
> To: Liberman, Jacob
> Cc: linux clustering
> Subject: Re: [Linux-cluster] Order of execution
> 
> Jacob, 
> 
> The start/stop orders are defined in 
> /usr/share/cluster/service.sh look under the special tag, 
> there should be a child tag for each type of child node of service.
> 
> Mine looks like so (current rgmanager rpm from RHN): 
>         <child type="fs" start="1" stop="8"/>
>         <child type="clusterfs" start="2" stop="7"/>
>         <child type="netfs" start="3" stop="6"/>
>         <child type="nfsexport" start="4" stop="5"/>
> 
>         <child type="nfsclient" start="5" stop=""/>
> 
>         <child type="ip" start="6" stop="2"/>
>         <child type="smb" start="7" stop="3"/>
>         <child type="script" start="7" stop="1"/>
> 
> 
> For starting, fs should start first, then clusterfs, etc... 
> finally smb and script start.
> 
> For stopping, script would be stopped first, then ip, etc... 
> finally fs.
> 
> Thanks,
> Eric Kerin
> eric at bootseg.com
> 
> 
> On Mon, 2006-04-03 at 15:16 -0500, JACOB_LIBERMAN at Dell.com wrote:
> > Hi cluster geniuses,
> > 
> > I have a quick question. 
> > 
> > I am trying to write a custom startup script for an 
> application called
> > adsi rms.  The application comes with its own startup script that
> > requires the disk resource and network interface. Here is 
> my question:
> > 
> > When I create a custom startup script for the service and 
> place it in
> > /etc/init.d/, the cluster service can start the application 
> successfully
> > but not all services come online because the shared disk 
> and IP do not
> > appear to be available when the service starts.
> > 
> > Is there a way to set the order of execution for a service 
> so that the
> > application will not start until AFTER the disk and network 
> interface
> > are available?
> > 
> > Thanks again, Jacob
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From pcaulfie at redhat.com  Tue Apr  4 13:40:52 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 04 Apr 2006 14:40:52 +0100
Subject: [Linux-cluster] Using a null modem for heartbeat with CS4?
In-Reply-To: <2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com>
References: <pan.2006.04.03.14.16.54.328000@arvin.dk>
	<2315046d0604030720p1e2d4fc3n8b5f2708649e950f@mail.gmail.com>
Message-ID: <44327764.4080108@redhat.com>

Manilal K M wrote:
> On 03/04/06, Troels Arvin <troels at arvin.dk> wrote:
>> Hello,
>>
>> I would like to have to heartbeat channels between my cluster nodes: A
>> cross-over ethernet cable and a null modem cable.
>>
>> In the manual for Cluster Suite 3 (CS2), it's stated that a null modem
>> cable can be used for heartbeat.
>>
>> The manual for CS4 doesn't mention null modem cables. Isn't it possible to
>> use null modem cables for heartbeat in CS4?
> AFAIK, Null modems are not supported in CS4.
> 

If you're really desperate you could set up a serial PPP link between the two
machines and do the IP heartbeat over that.

Don't tell anyone I said that though ;-)


-- 

patrick



From Alain.Moulle at bull.net  Wed Apr  5 08:51:33 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Wed, 05 Apr 2006 10:51:33 +0200
Subject: [Linux-cluster] CS4 Update2 / cman systematically FAILED on service
	stop
Message-ID: <44338515.10200@bull.net>

Hi

I have a systematic problem with cman stop on my configuration :

knowing that there is no service with autostart in
the cluster.conf, and that I have only one main service
to be started by : clusvcadm -e SERVICE -m <node_name>

First test :
launch CS4 OK
stop CS4 OK
no problem

Second test :
launch CS4
clusvcadm -e SERVICE -m <node_name>
then
clusvcadm -d SERVICE
stop CS4 ...

 in this case, cman stop is systematically FAILED ...

This is true if both cases where CS4 is started
on peer node as well as where is it stopped.

Any clue or track to identify the problem ?

Thanks
Alain Moull?




From ben.yarwood at juno.co.uk  Wed Apr  5 11:51:31 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Wed, 5 Apr 2006 12:51:31 +0100
Subject: [Linux-cluster] Monitoring Cluster Services
Message-ID: <089401c658a7$481d72b0$3964a8c0@WS076>

I have set up a monitoring tool to check that all the appropriate processes
are running on our cluster nodes.  I am currently checking for the
following:

ccsd , 1 instance
cman_comms, 1 instance
cman_memb , 1 instance
cman_serviced, 1 instance
cman_hbeat, 1 instance
fenced, 1 instance
clvmd, 1 instance
gfs_inoded, 1 instance for each gfs mount
clurgmgrd, 1 instance

Can anyone tell me if this is a correct and exhaustive list.

Regards
Ben





From ilya at cs.msu.su  Wed Apr  5 15:27:57 2006
From: ilya at cs.msu.su (Ilya M. Slepnev)
Date: Wed, 05 Apr 2006 19:27:57 +0400
Subject: [Linux-cluster] Problems with compilation.
Message-ID: <1144250877.8183.19.camel@localhost.localdomain>

Hi,

I'm sorry for inconvenience, did anybody faced such problem with
configuring cluster-suite? It writes, that there is no directory named
"/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
there... Am I doing something wrong? Is there some FAQ about that?

Thanks, Ilya...

khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
cd dlm-kernel && make
make[1]: Entering directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
cd src2 && make all
make[2]: Entering directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make -C  M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
modules USING_KBUILD=yes
make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No
such file or directory.  Stop.
make: Entering an unknown directorymake: Leaving an unknown
directorymake[2]: *** [all] Error 2
make[2]: Leaving directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make[1]: *** [all] Error 2
make[1]: Leaving directory
`/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
make: *** [all] Error 2
khext at hess:~/nigma/ext3/gfs/cvs/cluster$




From jbrassow at redhat.com  Wed Apr  5 15:40:45 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 5 Apr 2006 10:40:45 -0500
Subject: [Linux-cluster] Problems with compilation.
In-Reply-To: <1144250877.8183.19.camel@localhost.localdomain>
References: <1144250877.8183.19.camel@localhost.localdomain>
Message-ID: <6e718842c9112d2f91e40fc31e3b29b9@redhat.com>

might want to skip the 'make' by itself... try:

dir/cluster> make clean; make distclean
dir/cluster> ./configure --kernel_src=<path>
dir/cluster> make install

  brassow
On Apr 5, 2006, at 10:27 AM, Ilya M. Slepnev wrote:

> Hi,
>
> I'm sorry for inconvenience, did anybody faced such problem with
> configuring cluster-suite? It writes, that there is no directory named
> "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
> there... Am I doing something wrong? Is there some FAQ about that?
>
> Thanks, Ilya...
>
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
> cd dlm-kernel && make
> make[1]: Entering directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> cd src2 && make all
> make[2]: Entering directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make -C  M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
> modules USING_KBUILD=yes
> make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No
> such file or directory.  Stop.
> make: Entering an unknown directorymake: Leaving an unknown
> directorymake[2]: *** [all] Error 2
> make[2]: Leaving directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> make: *** [all] Error 2
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From ilya at cs.msu.su  Wed Apr  5 16:16:25 2006
From: ilya at cs.msu.su (Ilya M. Slepnev)
Date: Wed, 05 Apr 2006 20:16:25 +0400
Subject: [Linux-cluster] Problems with compilation.
In-Reply-To: <6e718842c9112d2f91e40fc31e3b29b9@redhat.com>
References: <1144250877.8183.19.camel@localhost.localdomain>
	<6e718842c9112d2f91e40fc31e3b29b9@redhat.com>
Message-ID: <1144253785.8185.27.camel@localhost.localdomain>

Surely, I tried that first... Here is a lot of output of configure and
"make install"... It seems not better than previous!-)

khext at hess:~/nigma/ext3/gfs/cvs/cluster$ ./configure --kernel_src=/home/khext/nigma/ext3/linux-2.6.16.1
configure dlm-kernel

Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 101.
configure gnbd-kernel

Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95.
configure magma

Configuring Makefiles for your system...
Completed Makefile configuration

configure ccs

Configuring Makefiles for your system...
Completed Makefile configuration

configure cman

Configuring Makefiles for your system...
Completed Makefile configuration

configure dlm

Configuring Makefiles for your system...
Completed Makefile configuration

configure fence

Configuring Makefiles for your system...
Completed Makefile configuration

configure iddev

Configuring Makefiles for your system...
Completed Makefile configuration

configure gulm

Configuring Makefiles for your system...
Completed Makefile configuration

configure gfs-kernel

Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 107.
configure gfs2-kernel

Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95.
configure gfs

Configuring Makefiles for your system...
Completed Makefile configuration

configure gfs2

Configuring Makefiles for your system...
Completed Makefile configuration

configure gnbd

Configuring Makefiles for your system...
Completed Makefile configuration

configure magma-plugins

Configuring Makefiles for your system...
Completed Makefile configuration

configure rgmanager

Configuring Makefiles for your system...
Completed Makefile configuration

configure cmirror

Configuring Makefiles for your system...
Can't open /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at ./configure line 95.
khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make install
cd dlm-kernel && make install
make[1]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
cd src2 && make install
make[2]: Entering directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make -C  M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 modules USING_KBUILD=yes
make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No such file or directory.  Stop.
make: Entering an unknown directorymake: Leaving an unknown directorymake[2]: *** [all] Error 2
make[2]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
make[1]: *** [install] Error 2
make[1]: Leaving directory `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
make: *** [install] Error 2
khext at hess:~/nigma/ext3/gfs/cvs/cluster$




On Wed, 2006-04-05 at 10:40 -0500, Jonathan E Brassow wrote:
> might want to skip the 'make' by itself... try:
> 
> dir/cluster> make clean; make distclean
> dir/cluster> ./configure --kernel_src=<path>
> dir/cluster> make install
> 
>   brassow
> On Apr 5, 2006, at 10:27 AM, Ilya M. Slepnev wrote:
> 
> > Hi,
> >
> > I'm sorry for inconvenience, did anybody faced such problem with
> > configuring cluster-suite? It writes, that there is no directory named
> > "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
> > there... Am I doing something wrong? Is there some FAQ about that?
> >
> > Thanks, Ilya...
> >
> > khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
> > cd dlm-kernel && make
> > make[1]: Entering directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> > cd src2 && make all
> > make[2]: Entering directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> > make -C  M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
> > modules USING_KBUILD=yes
> > make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No
> > such file or directory.  Stop.
> > make: Entering an unknown directorymake: Leaving an unknown
> > directorymake[2]: *** [all] Error 2
> > make[2]: Leaving directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> > make[1]: *** [all] Error 2
> > make[1]: Leaving directory
> > `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> > make: *** [all] Error 2
> > khext at hess:~/nigma/ext3/gfs/cvs/cluster$
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From jbrassow at redhat.com  Wed Apr  5 18:36:55 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 5 Apr 2006 13:36:55 -0500
Subject: [Linux-cluster] Problems with compilation.
In-Reply-To: <1144253785.8185.27.camel@localhost.localdomain>
References: <1144250877.8183.19.camel@localhost.localdomain>
	<6e718842c9112d2f91e40fc31e3b29b9@redhat.com>
	<1144253785.8185.27.camel@localhost.localdomain>
Message-ID: <23453f82d4985b73787dc15e364ee7aa@redhat.com>

did you setup and do a 'make' in your kernel tree.  Failing to do that 
will give those errors.

  brassow

On Apr 5, 2006, at 11:16 AM, Ilya M. Slepnev wrote:

> Surely, I tried that first... Here is a lot of output of configure and
> "make install"... It seems not better than previous!-)
>
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ ./configure 
> --kernel_src=/home/khext/nigma/ext3/linux-2.6.16.1
> configure dlm-kernel
>
> Configuring Makefiles for your system...
> Can't open 
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at 
> ./configure line 101.
> configure gnbd-kernel
>
> Configuring Makefiles for your system...
> Can't open 
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at 
> ./configure line 95.
> configure magma
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure ccs
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure cman
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure dlm
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure fence
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure iddev
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gulm
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gfs-kernel
>
> Configuring Makefiles for your system...
> Can't open 
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at 
> ./configure line 107.
> configure gfs2-kernel
>
> Configuring Makefiles for your system...
> Can't open 
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at 
> ./configure line 95.
> configure gfs
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gfs2
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure gnbd
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure magma-plugins
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure rgmanager
>
> Configuring Makefiles for your system...
> Completed Makefile configuration
>
> configure cmirror
>
> Configuring Makefiles for your system...
> Can't open 
> /home/khext/nigma/ext3/linux-2.6.16.1/include/linux/version.h at 
> ./configure line 95.
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make install
> cd dlm-kernel && make install
> make[1]: Entering directory 
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> cd src2 && make install
> make[2]: Entering directory 
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make -C  M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2 
> modules USING_KBUILD=yes
> make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: No 
> such file or directory.  Stop.
> make: Entering an unknown directorymake: Leaving an unknown 
> directorymake[2]: *** [all] Error 2
> make[2]: Leaving directory 
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
> make[1]: *** [install] Error 2
> make[1]: Leaving directory 
> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
> make: *** [install] Error 2
> khext at hess:~/nigma/ext3/gfs/cvs/cluster$
>
>
>
>
> On Wed, 2006-04-05 at 10:40 -0500, Jonathan E Brassow wrote:
>> might want to skip the 'make' by itself... try:
>>
>> dir/cluster> make clean; make distclean
>> dir/cluster> ./configure --kernel_src=<path>
>> dir/cluster> make install
>>
>>   brassow
>> On Apr 5, 2006, at 10:27 AM, Ilya M. Slepnev wrote:
>>
>>> Hi,
>>>
>>> I'm sorry for inconvenience, did anybody faced such problem with
>>> configuring cluster-suite? It writes, that there is no directory 
>>> named
>>> "/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2", but it is
>>> there... Am I doing something wrong? Is there some FAQ about that?
>>>
>>> Thanks, Ilya...
>>>
>>> khext at hess:~/nigma/ext3/gfs/cvs/cluster$ make
>>> cd dlm-kernel && make
>>> make[1]: Entering directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
>>> cd src2 && make all
>>> make[2]: Entering directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
>>> make -C  M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2
>>> modules USING_KBUILD=yes
>>> make: *** M=/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2: 
>>> No
>>> such file or directory.  Stop.
>>> make: Entering an unknown directorymake: Leaving an unknown
>>> directorymake[2]: *** [all] Error 2
>>> make[2]: Leaving directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel/src2'
>>> make[1]: *** [all] Error 2
>>> make[1]: Leaving directory
>>> `/home/khext/nigma/ext3/gfs/cvs/cluster/dlm-kernel'
>>> make: *** [all] Error 2
>>> khext at hess:~/nigma/ext3/gfs/cvs/cluster$
>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From jeffbethke at aol.net  Wed Apr  5 20:57:54 2006
From: jeffbethke at aol.net (Jeffrey Bethke)
Date: Wed, 05 Apr 2006 16:57:54 -0400
Subject: [Linux-cluster] speeding up df/statvfs( ) calls to large GFS
	volumes?
Message-ID: <44342F52.3030608@aol.net>

Hi!  
   Is there a way to speed up the return vaules for df/statvfs( ) when 
using large GFS volume(e.g 25TB+)?  I'm currently working a problem 
where, as part of disk monitoring,  we need to run a statvfs( ) every 
few minutes. The problem is that we can't determine the interval of 
running the tool as GFS can, on occasion,  take a long time  to return a 
value!

So, is there any variable I can tweak w/ gfs_tool, or mount option I can 
apply outside of 'noatime',  that will help things like 'df -h' run 
consistently faster? 

Help?

Thanks!
.jeff




From mtp at tilted.com  Thu Apr  6 01:22:08 2006
From: mtp at tilted.com (Mark Petersen)
Date: Wed, 05 Apr 2006 20:22:08 -0500
Subject: [Linux-cluster] GNBD, CLVM and snapshots
Message-ID: <7.0.1.0.2.20060405195416.02784ab0@tilted.com>

I'm wanting to use gnbd with clvm to export block devices for 3 
(possibly more) hosts running Xen.  Each host will have access to the 
single gnbd export with LVM.  Only a single host will ever actually 
have the device mounted.  GNBD can support live migrations with a 
block device, which is the main attraction.

So a little info on Xen and what I want to do.  There are dom0's 
(privileged VM) that have full access to any running domU (VM 
instances started by the dom0.)  The dom0 will be running 
clvm/CCS/gnbd-Client/etc.  The dom0 will start a domU that mounts the 
lv, only the dom0 needs direct access to this resource.  In this 
configuration, would it be possible to take snapshots of the LV from 
the dom0?  What about from another dom0 in the cluster?  What about 
the gnbd-server?

Is work still be done on csnap?  There isn't much documentation on 
this, and it seems like it might be GFS specific.

If this won't work with clvm and gnbd, is there an alternative that 
would work?  I really want to be able to do snapshots and live 
migration with block devices.  I'm not sure this is possible.  I may 
fallback to only live migrations with gnbd if I have to.

Finally, ideally this would be backed by DRBD, but can gnbd handle a 
primary/secondary role instead of doing multipath (which won't work 
with drbd.)  Failover mode was mentioned in posts from over a year 
ago, and it sounds promising. 



From starstom at gmail.com  Thu Apr  6 03:53:34 2006
From: starstom at gmail.com (Tom Stars)
Date: Thu, 6 Apr 2006 09:23:34 +0530
Subject: [Linux-cluster] About Linux Cluster
Message-ID: <551992020604052053m7bbc7f8cua7f20da14cf0d28f@mail.gmail.com>

Hi

I am newbie to  linux clusters. i would like to setup a linux cluster of 4
nodes, and a DAS box for Storage connected to
linux systems.through an optical fiber. All linux systems are running RHEL
4.0. AS
Q1)Do i need GFS to be configured in case i have to run oracle on the
cluster nodes . (Oracle 11i Application Server)
Q2) when do i need GFS.
Q3) If the DAS is mounted on 1 node and create an NFS Server and provides
shares to other nodes, does it affect the performance.

Thanks.
Tom.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060406/37354ec3/attachment.htm>

From Alain.Moulle at bull.net  Thu Apr  6 07:13:28 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Thu, 06 Apr 2006 09:13:28 +0200
Subject: [Linux-cluster] RE: CS4 Update2 / cman systematically FAILED on
 service stop /// New question ///
Message-ID: <4434BF98.8070002@bull.net>

I've identified the problem : in fact, that was due to
a process launched via the SERVICE script, but which
was not stopped on clusvcadm -s SERVICE (or -d) .
Then, on service cman stop, the modprobe -r dlm was successful
but at the end of this modprobe -r, the lsmod
indicates one user left on cman :
   cman                  136480  1
but without user identification (such as "cman 136480  10 dlm" when cs4
is all active).
So the modprobe -r cman was then impossible.

Could someone explain to me the link between a process
managed in the SERVICE script and the remaining 1 user
on cman ?

Thanks
Alain Moull?
>> I have a systematic problem with cman stop on my configuration :
>> knowing that there is no service with autostart in
>> the cluster.conf, and that I have only one main service
>> to be started by : clusvcadm -e SERVICE -m <node_name>
>> First test :
>> launch CS4 OK
>> stop CS4 OK
>> no problem
>> Second test :
>> launch CS4
>> clusvcadm -e SERVICE -m <node_name>
>> then
>> clusvcadm -d SERVICE
>> stop CS4 ...
>>  in this case, cman stop is systematically FAILED ...
>> This is true if both cases where CS4 is started
>> on peer node as well as where is it stopped.
>> Any clue or track to identify the problem ?




From pcaulfie at redhat.com  Thu Apr  6 07:25:53 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 06 Apr 2006 08:25:53 +0100
Subject: [Linux-cluster] RE: CS4 Update2 / cman systematically FAILED
	on service stop /// New question ///
In-Reply-To: <4434BF98.8070002@bull.net>
References: <4434BF98.8070002@bull.net>
Message-ID: <4434C281.6010804@redhat.com>

Alain Moulle wrote:
> I've identified the problem : in fact, that was due to
> a process launched via the SERVICE script, but which
> was not stopped on clusvcadm -s SERVICE (or -d) .
> Then, on service cman stop, the modprobe -r dlm was successful
> but at the end of this modprobe -r, the lsmod
> indicates one user left on cman :
>    cman                  136480  1
> but without user identification (such as "cman 136480  10 dlm" when cs4
> is all active).
> So the modprobe -r cman was then impossible.
> 
> Could someone explain to me the link between a process
> managed in the SERVICE script and the remaining 1 user
> on cman ?

There's no direct link. The usage count on cman is simply the number of links
to it. They could be kernel or userspace users.

In this case it could be CCS. Even if the cluster isn't operating, ccs polls
the cluster manager to see if has come back up.

-- 

patrick



From figaro at neo-info.net  Thu Apr  6 09:44:27 2006
From: figaro at neo-info.net (Figaro Yang)
Date: Thu, 6 Apr 2006 17:44:27 +0800
Subject: [Linux-cluster] lock_gulm.ko needs unknown symbol tap_sig
Message-ID: <011701c6595e$b8837a60$c800a8c0@neooffice>

Hi ~ All?

 

I have some question for rebuild gfs kernel , that has some error messages :

if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map
2.6.11.img;fi

WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol tap_sig

WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol watch_sig

WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol sig_watcher_init

WARNING: /lib/modules/2.6.11/kernel/fs/gfs_locking/lock_gulm/lock_gulm.ko
needs unknown symbol sig_watcher_lock_drop

 

how to fix this error ? 

 

thanks all help !!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060406/d9521f6e/attachment.htm>

From ocrete at max-t.com  Thu Apr  6 16:34:41 2006
From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=)
Date: Thu, 06 Apr 2006 12:34:41 -0400
Subject: [Linux-cluster] cman kickout out nodes for no good reason
Message-ID: <1144341281.355.38.camel@cocagne.max-t.internal>

Hi,

I have a strange problem where cman suddenly starts kicking out members
of the cluster with "Inconsistent cluster view" when I join a new node
(sometimes).  It takes a few minutes between each kicking. I'm using a
snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is
in transition state at that point and I can't stop/start services or do
anything else. It did not do that with a snapshot I took a few months
ago.

-- 
Olivier Cr?te
ocrete at max-t.com
Maximum Throughput Inc.




From charlie.sharkey at bustech.com  Wed Apr  5 17:40:48 2006
From: charlie.sharkey at bustech.com (Charlie Sharkey)
Date: Wed, 5 Apr 2006 13:40:48 -0400
Subject: [Linux-cluster] two node cluster startup problem
Message-ID: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com>


Hi,

I'm having trouble with a two node cluster. The second node ("one")
gets the config from "zero" ok, but won't join the cluster. It instead
starts it's own cluster (according to /proc/cluster/nodes). My config
file is below, any help would be appreciated. thanks !


<?xml version="1.0"?>
<cluster name="alpha" config_version="4">
      
       <cman two_node="1" expected_votes="1">
        </cman>
                      
         <clusternodes>
           <clusternode name="zero" votes="1">
             <fence>
               <method name="single">
                 <device name="human" ipaddr="192.188.120.161"/>
               </method>
             </fence>
           </clusternode>
            <clusternode name="one" votes="1">
             <fence>
               <method name="single">
                 <device name="human" ipaddr="192.188.120.162"/>
               </method>
             </fence>
           </clusternode>
         </clusternodes>
         
         <fencedevices>
           <fencedevice name="human" agent="fence_manual"/>
         </fencedevices> 
         
 </cluster>











From lhh at redhat.com  Thu Apr  6 20:34:25 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 06 Apr 2006 16:34:25 -0400
Subject: [Linux-cluster] Monitoring Cluster Services
In-Reply-To: <089401c658a7$481d72b0$3964a8c0@WS076>
References: <089401c658a7$481d72b0$3964a8c0@WS076>
Message-ID: <1144355665.3723.1.camel@ayanami.boston.redhat.com>

On Wed, 2006-04-05 at 12:51 +0100, Ben Yarwood wrote:
> I have set up a monitoring tool to check that all the appropriate processes
> are running on our cluster nodes.  I am currently checking for the
> following:
> 
> ccsd , 1 instance
> cman_comms, 1 instance
> cman_memb , 1 instance
> cman_serviced, 1 instance
> cman_hbeat, 1 instance
> fenced, 1 instance
> clvmd, 1 instance
> gfs_inoded, 1 instance for each gfs mount
> clurgmgrd, 1 instance
> 
> Can anyone tell me if this is a correct and exhaustive list.

Looks like it's missing DLM threads.

-- Lon




From lhh at redhat.com  Thu Apr  6 20:41:17 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 06 Apr 2006 16:41:17 -0400
Subject: [Linux-cluster] Order of execution
In-Reply-To: <BC430F453501174992B9D9E8AFB7519AD5310E@ausx3mps309.aus.amer.dell.com>
References: <BC430F453501174992B9D9E8AFB7519AD5310E@ausx3mps309.aus.amer.dell.com>
Message-ID: <1144356077.3723.10.camel@ayanami.boston.redhat.com>

On Tue, 2006-04-04 at 07:55 -0500, JACOB_LIBERMAN at Dell.com wrote:
> Eric,
> 
> I am running RHEL3 U4 with clumanager 1.2.22.  I do not have the options
> listed below.
> 
> Does anyone have an example script for this version?  Lon?

The linux-cluster / RHCS4 ordering is directly taken from RHCS3:

(a) mount file systems
(b) bring up IPs
(c) start user service (only can have one in RHCS3)

Is the cluster controlling all of the components, or is it only
controlling some of them?  It sounds like it should work.

-- Lon



From gstaltari at arnet.net.ar  Thu Apr  6 21:19:47 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Thu, 06 Apr 2006 18:19:47 -0300
Subject: [Linux-cluster] GFS and CPU time
Message-ID: <443585F3.4090100@arnet.net.ar>

Hi, we've created a 6 node cluster with GFS filesystem. The question is 
why there's always one node that the CPU time of those GFS/lock related 
processes is a lot higher than the others.
Node 1
root      3799  0.0  0.0      0     0 ?        S<   Mar31   0:00 
[dlm_recoverd]
root      3806  0.1  0.0      0     0 ?        S<   Mar31  16:37 [lock_dlm1]
root      3807  0.1  0.0      0     0 ?        S<   Mar31  16:40 [lock_dlm2]
root      3808  1.0  0.0      0     0 ?        S    Mar31 102:27 [gfs_scand]
root      3809  0.1  0.0      0     0 ?        S    Mar31  18:05 
[gfs_glockd]
root      3810  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_recoverd]
root      3811  0.0  0.0      0     0 ?        S    Mar31   0:00 [gfs_logd]
root      3812  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_quotad]
root      3813  0.0  0.0      0     0 ?        S    Mar31   0:18 
[gfs_inoded]
Node 2
root      4230  0.0  0.0      0     0 ?        S<   Mar31   0:00 
[dlm_recoverd]
root      4237  0.0  0.0      0     0 ?        S<   Mar31   4:16 [lock_dlm1]
root      4238  0.0  0.0      0     0 ?        S<   Mar31   4:13 [lock_dlm2]
root      4239  0.4  0.0      0     0 ?        S    Mar31  38:01 [gfs_scand]
root      4240  0.0  0.0      0     0 ?        S    Mar31   2:58 
[gfs_glockd]
root      4241  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_recoverd]
root      4242  0.0  0.0      0     0 ?        S    Mar31   0:00 [gfs_logd]
root      4243  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_quotad]
root      4244  0.0  0.0      0     0 ?        S    Mar31   0:45 
[gfs_inoded]
Node 3
root      4124  0.0  0.0      0     0 ?        S<   Mar31   0:00 
[dlm_recoverd]
root      4131  0.0  0.0      0     0 ?        S<   Mar31   2:29 [lock_dlm1]
root      4132  0.0  0.0      0     0 ?        S<   Mar31   2:29 [lock_dlm2]
root      4133  0.9  0.0      0     0 ?        S    Mar31  88:45 [gfs_scand]
root      4134  0.0  0.0      0     0 ?        S    Mar31   2:35 
[gfs_glockd]
root      4135  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_recoverd]
root      4136  0.0  0.0      0     0 ?        S    Mar31   0:00 [gfs_logd]
root      4137  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_quotad]
root      4138  0.0  0.0      0     0 ?        S    Mar31   0:06 
[gfs_inoded]
Node 4
root     17576  0.0  0.0      0     0 ?        S<   Mar31   0:00 
[dlm_recoverd]
root     17577  0.0  0.0      0     0 ?        S<   Mar31   0:00 [lock_dlm1]
root     17578  0.0  0.0      0     0 ?        S<   Mar31   0:00 [lock_dlm2]
root     17579  0.0  0.0      0     0 ?        S    Mar31   0:01 [gfs_scand]
root     17580  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_glockd]
root     17581  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_recoverd]
root     17582  0.0  0.0      0     0 ?        S    Mar31   0:00 [gfs_logd]
root     17583  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_quotad]
root     17584  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_inoded]
Node 5
root     30784  0.0  0.0      0     0 ?        S<   Mar31   0:00 
[dlm_recoverd]
root     30785  0.0  0.0      0     0 ?        S<   Mar31   0:47 [lock_dlm1]
root     30786  0.0  0.0      0     0 ?        S<   Mar31   0:46 [lock_dlm2]
root     30787  0.2  0.0      0     0 ?        S     Mar31  10:00 
[gfs_scand]
root     30788  0.0  0.0      0     0 ?        S     Mar31   0:50 
[gfs_glockd]
root     30789  0.0  0.0      0     0 ?        S     Mar31  0:00 
[gfs_recoverd]
root     30790  0.0  0.0      0     0 ?        S     Mar31   0:00 [gfs_logd]
root     30791  0.0  0.0      0     0 ?        S     Mar31   0:00 
[gfs_quotad]
root     30792  0.0  0.0      0     0 ?        S     Mar31   0:00 
[gfs_inoded]
Node 6
root      4273  0.0  0.0      0     0 ?        S<   Mar31   0:00 
[dlm_recoverd]
root      4274  0.0  0.0      0     0 ?        S<   Mar31   0:18 [lock_dlm1]
root      4275  0.0  0.0      0     0 ?        S<   Mar31   0:17 [lock_dlm2]
root      4276  0.1  0.0      0     0 ?        S    Mar31   5:36 [gfs_scand]
root      4277  0.0  0.0      0     0 ?        S    Mar31   0:22 
[gfs_glockd]
root      4278  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_recoverd]
root      4279  0.0  0.0      0     0 ?        S    Mar31   0:00 [gfs_logd]
root      4280  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_quotad]
root      4281  0.0  0.0      0     0 ?        S    Mar31   0:00 
[gfs_inoded]

FC 4
kernel-smp-2.6.15-1.1831_FC4
dlm-kernel-smp-2.6.11.5-20050601.152643.FC4.21
GFS-kernel-smp-2.6.11.8-20050601.152643.FC4.24
cman-kernel-smp-2.6.11.5-20050601.152643.FC4.22

TIA
German Staltari



From ben.yarwood at juno.co.uk  Thu Apr  6 22:45:59 2006
From: ben.yarwood at juno.co.uk (Ben Yarwood)
Date: Thu, 6 Apr 2006 23:45:59 +0100
Subject: [Linux-cluster] Monitoring Cluster Services
In-Reply-To: <1144355665.3723.1.camel@ayanami.boston.redhat.com>
Message-ID: <093c01c659cb$df9bf150$3964a8c0@WS076>

Is there one instance of each of the following?

dlm_astd
dlm_recvd
dlm_sendd

Cheers
Ben 

 

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
> Sent: 06 April 2006 21:34
> To: linux clustering
> Subject: Re: [Linux-cluster] Monitoring Cluster Services
> 
> On Wed, 2006-04-05 at 12:51 +0100, Ben Yarwood wrote:
> > I have set up a monitoring tool to check that all the appropriate 
> > processes are running on our cluster nodes.  I am currently 
> checking 
> > for the
> > following:
> > 
> > ccsd , 1 instance
> > cman_comms, 1 instance
> > cman_memb , 1 instance
> > cman_serviced, 1 instance
> > cman_hbeat, 1 instance
> > fenced, 1 instance
> > clvmd, 1 instance
> > gfs_inoded, 1 instance for each gfs mount clurgmgrd, 1 instance
> > 
> > Can anyone tell me if this is a correct and exhaustive list.
> 
> Looks like it's missing DLM threads.
> 
> -- Lon
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 




From ookami at gmx.de  Fri Apr  7 04:36:04 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Fri, 7 Apr 2006 06:36:04 +0200 (MEST)
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <5174.1144384564@www022.gmx.net>

Hi,  
  
I installed gfs and all the cluster stuff on our systems and I didn't have  
the impression that I missed any of the steps in the manual. So I have to  
nodes which both have a gfs partition mounted. I can also mount these, if  
I exported them with gnbd. But I don't see the big difference to nfs yet  
(apart from maybe performance). I thought that if I name the  
gfs-partitions the same (clustername:gfs1) they would be magically merged  
or something like that. I thought this was meant by the notion in the docs  
that GFS does not have a single point of failure. Or that we could have  
redundant file-servers. What did I get wrong about all that?  
  
P.S.: I did the changes to /etc/lvm/lvm.conf regarding the locking 
(locking_type=2). 
 
Thanks for any help!!! 
 
wolfgang 

-- 
 

E-Mails und Internet immer und ?berall!
1&1 PocketWeb, perfekt mit GMX: http://www.gmx.net/de/go/pocketweb



From pcaulfie at redhat.com  Fri Apr  7 07:20:23 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 07 Apr 2006 08:20:23 +0100
Subject: [Linux-cluster] two node cluster startup problem
In-Reply-To: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com>
References: <03FB5D708BE3C8448E8079186A56CDE67658CD@BTIBURMAIL.bustech.com>
Message-ID: <443612B7.6010202@redhat.com>

Charlie Sharkey wrote:
> Hi,
> 
> I'm having trouble with a two node cluster. The second node ("one")
> gets the config from "zero" ok, but won't join the cluster. It instead
> starts it's own cluster (according to /proc/cluster/nodes). My config
> file is below, any help would be appreciated. thanks !
> 

Check you don't have any firewalling enabled. It's most likely that the nodes
can't talk to each other. You'll need to open ports 6809/udp and 21064/tcp.

Also check that you can ping and/or ssh between the machines.

-- 

patrick



From Michael.Roethlein at ri-solution.com  Fri Apr  7 08:51:29 2006
From: Michael.Roethlein at ri-solution.com (=?iso-8859-1?Q?R=F6thlein_Michael_=28RI-Solution=29?=)
Date: Fri, 7 Apr 2006 10:51:29 +0200
Subject: [Linux-cluster] GFS freezes without a trace
Message-ID: <992633B6A0E42B49BC5A41C10A8C841B01DB222B@MUCEX004.root.local>

Hi,

In the last days it occured several times that gfs got lost, but I could not find any trace in any logfile I could think of.

We have a 4 node cluster with each node attached to one storage with one gfs partition.

Is there a gfs or whatever logfile i might have not found or is it possible to enable debugging?

Thanks in Advance

Yours

Michael



From Bowie_Bailey at BUC.com  Fri Apr  7 13:42:26 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Fri, 7 Apr 2006 09:42:26 -0400 
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338A7@bnifex.cis.buc.com>

wolfgang pauli wrote:
> 
> I installed gfs and all the cluster stuff on our systems and I didn't
> have the impression that I missed any of the steps in the manual. So
> I have to nodes which both have a gfs partition mounted. I can also
> mount these, if I exported them with gnbd. But I don't see the big
> difference to nfs yet (apart from maybe performance). I thought that
> if I name the gfs-partitions the same (clustername:gfs1) they would
> be magically merged or something like that. I thought this was meant
> by the notion in the docs that GFS does not have a single point of
> failure. Or that we could have redundant file-servers. What did I get
> wrong about all that? 

It sounds like you are a bit confused about what GFS does.  I replied
to someone within the last week or so on almost the same issue.  Check
the archives.

GFS is a filesystem that allows multiple nodes to access and update it
at the same time.  The cluster services manage the nodes and try to
prevent a misbehaving node from corrupting the filesystem.

If you have hard drives in all of your nodes, GFS and the cluster will
not help you make them into one big shared storage area -- at least not
yet, I believe there is a beta (alpha?) project out there somewhere.
If you have a big storage area, GFS and the cluster _will_ allow you
to connect all of your nodes to it.

The redundancy comes in the fact that you have multiple machines
running from the same storage area.  If one of the machines goes down,
the others can continue working.  In a load-balanced configuration,
the loss of one of the nodes will be transparent to the users.  In
theory, of course...  If the storage dies, that's another issue.
Hopefully, your storage is raid and can handle a disk failure.

-- 
Bowie



From charlie.sharkey at bustech.com  Fri Apr  7 14:00:08 2006
From: charlie.sharkey at bustech.com (Charlie Sharkey)
Date: Fri, 7 Apr 2006 10:00:08 -0400
Subject: [Linux-cluster] two node cluster startup problem
Message-ID: <03FB5D708BE3C8448E8079186A56CDE67659B4@BTIBURMAIL.bustech.com>


That was it, problem solved. Ping worked ok, but not ssh. I stopped both

the portmap and iptables services and now it joins ok.

Thanks for your help !
 
charlie



From ookami at gmx.de  Fri Apr  7 19:22:51 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Fri, 7 Apr 2006 21:22:51 +0200 (MEST)
Subject: [Linux-cluster] newbie: gfs merge
References: <4766EEE585A6D311ADF500E018C154E3021338A7@bnifex.cis.buc.com>
Message-ID: <20750.1144437771@www010.gmx.net>

> > I installed gfs and all the cluster stuff on our systems and I didn't   
> > have the impression that I missed any of the steps in the manual. So   
> > I have to nodes which both have a gfs partition mounted. I can also   
> > mount these, if I exported them with gnbd. But I don't see the big   
> > difference to nfs yet (apart from maybe performance). I thought that   
> > if I name the gfs-partitions the same (clustername:gfs1) they would   
> > be magically merged or something like that. I thought this was meant   
> > by the notion in the docs that GFS does not have a single point of   
> > failure. Or that we could have redundant file-servers. What did I get   
> > wrong about all that?    
>    
> It sounds like you are a bit confused about what GFS does.  I replied   
> to someone within the last week or so on almost the same issue.  Check   
> the archives.   
>    
> GFS is a filesystem that allows multiple nodes to access and update it   
> at the same time.  The cluster services manage the nodes and try to   
> prevent a misbehaving node from corrupting the filesystem.   
>    
> If you have hard drives in all of your nodes, GFS and the cluster will   
> not help you make them into one big shared storage area -- at least not   
> yet, I believe there is a beta (alpha?) project out there somewhere.   
> If you have a big storage area, GFS and the cluster _will_ allow you   
> to connect all of your nodes to it.   
>    
> The redundancy comes in the fact that you have multiple machines   
> running from the same storage area.  If one of the machines goes down,   
> the others can continue working.  In a load-balanced configuration,   
> the loss of one of the nodes will be transparent to the users.  In   
> theory, of course...  If the storage dies, that's another issue.   
> Hopefully, your storage is raid and can handle a disk failure.   
>    
> --    
> Bowie   
 
Hm... Thanks for you answer! I am definetelly confused a bit. Even after 
reading you post of last week. I understand that i can not merge the file 
systems. Our setup is very basic. We have to linux machines who could act 
as file server and we thought that we could one (A) have working as an 
active backup of the other (B). Is that what the documentation calls a 
failover domain, with (B) being the failover "domain" for (A)? Until now, 
we were running rsync at night, so that if the first of the two servers 
failed, clients could mount the NFS from the other server. There is 
nothing fancy here, like a SAN I guess, just machines connected via 
ethernet switches. So basically the question is, whether it is possible to 
keep the filesystems on the two servers in total sync, so that it would 
not matter whether clients mount the remote share from (A) or (B). Whether 
the clients would automatically be able to mount the GFS from (B), if (A) 
fails.  
  
Wolfgang   

-- 
GMX Produkte empfehlen und ganz einfach Geld verdienen!
Satte Provisionen f?r GMX Partner: http://www.gmx.net/de/go/partner



From Bowie_Bailey at BUC.com  Fri Apr  7 19:32:38 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Fri, 7 Apr 2006 15:32:38 -0400 
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338B8@bnifex.cis.buc.com>

wolfgang pauli wrote:
> > > I installed gfs and all the cluster stuff on our systems and I
> > > didn't have the impression that I missed any of the steps in the
> > > manual. So I have to nodes which both have a gfs partition
> > > mounted. I can also mount these, if I exported them with gnbd.
> > > But I don't see the big difference to nfs yet (apart from maybe
> > > performance). I thought that if I name the gfs-partitions the
> > > same (clustername:gfs1) they would be magically merged or
> > > something like that. I thought this was meant by the notion in
> > > the docs that GFS does not have a single point of failure. Or
> > > that we could have redundant file-servers. What did I get wrong
> > > about all that? 
> > 
> > It sounds like you are a bit confused about what GFS does.  I
> > replied to someone within the last week or so on almost the same
> > issue.  Check the archives. 
> > 
> > GFS is a filesystem that allows multiple nodes to access and
> > update it at the same time.  The cluster services manage the nodes
> > and try to prevent a misbehaving node from corrupting the
> > filesystem.
> > 
> > If you have hard drives in all of your nodes, GFS and the cluster
> > will not help you make them into one big shared storage area -- at
> > least not yet, I believe there is a beta (alpha?) project out
> > there somewhere. If you have a big storage area, GFS and the
> > cluster _will_ allow you to connect all of your nodes to it.
> > 
> > The redundancy comes in the fact that you have multiple machines
> > running from the same storage area.  If one of the machines goes
> > down, the others can continue working.  In a load-balanced
> > configuration, the loss of one of the nodes will be transparent to
> > the users.  In theory, of course...  If the storage dies, that's
> > another issue. Hopefully, your storage is raid and can handle a
> > disk failure. 
> 
> Hm... Thanks for you answer! I am definetelly confused a bit. Even
> after reading you post of last week. I understand that i can not
> merge the file systems. Our setup is very basic. We have to linux
> machines who could act as file server and we thought that we could
> one (A) have working as an active backup of the other (B). Is that
> what the documentation calls a failover domain, with (B) being the
> failover "domain" for (A)? Until now, we were running rsync at
> night, so that if the first of the two servers failed, clients could
> mount the NFS from the other server. There is nothing fancy here,
> like a SAN I guess, just machines connected via ethernet switches.
> So basically the question is, whether it is possible to keep the
> filesystems on the two servers in total sync, so that it would not
> matter whether clients mount the remote share from (A) or (B).
> Whether the clients would automatically be able to mount the GFS
> from (B), if (A) fails. 

No, GFS doesn't work quite like that.  What you have is something more
like this:  Two machines, (A) and (B), are file servers.  A third
machine, (C), is either a linux box exporting it's filesystem via
GNBD, or a dedicated storage box running iSCSI, AoE, or something
similar that will allow multiple connections.  (A) and (B) are both
connected to the GFS filesystem exported by (C).  If either (A) or (B)
goes down, the other one can continue serving the data from (C).  They
don't need to be synchronized because they are using the same physical
storage.  And, if the application permits, you can even run them both
simultaneously.

You are looking for something different.  There is a project out there
for that, but it is not production ready at this point.  Maybe someone
else remembers the name.

-- 
Bowie



From ookami at gmx.de  Fri Apr  7 21:01:06 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Fri, 7 Apr 2006 23:01:06 +0200 (MEST)
Subject: [Linux-cluster] newbie: gfs merge
References: <4766EEE585A6D311ADF500E018C154E3021338B8@bnifex.cis.buc.com>
Message-ID: <4720.1144443666@www010.gmx.net>

> > > > I installed gfs and all the cluster stuff on our systems and I 
> > > > didn't have the impression that I missed any of the steps in the 
> > > > manual. So I have to nodes which both have a gfs partition 
> > > > mounted. I can also mount these, if I exported them with gnbd. 
> > > > But I don't see the big difference to nfs yet (apart from maybe 
> > > > performance). I thought that if I name the gfs-partitions the 
> > > > same (clustername:gfs1) they would be magically merged or 
> > > > something like that. I thought this was meant by the notion in 
> > > > the docs that GFS does not have a single point of failure. Or 
> > > > that we could have redundant file-servers. What did I get wrong 
> > > > about all that?  
> > >  
> > > It sounds like you are a bit confused about what GFS does.  I 
> > > replied to someone within the last week or so on almost the same 
> > > issue.  Check the archives.  
> > >  
> > > GFS is a filesystem that allows multiple nodes to access and 
> > > update it at the same time.  The cluster services manage the nodes 
> > > and try to prevent a misbehaving node from corrupting the 
> > > filesystem. 
> > >  
> > > If you have hard drives in all of your nodes, GFS and the cluster 
> > > will not help you make them into one big shared storage area -- at 
> > > least not yet, I believe there is a beta (alpha?) project out 
> > > there somewhere. If you have a big storage area, GFS and the 
> > > cluster _will_ allow you to connect all of your nodes to it. 
> > >  
> > > The redundancy comes in the fact that you have multiple machines 
> > > running from the same storage area.  If one of the machines goes 
> > > down, the others can continue working.  In a load-balanced 
> > > configuration, the loss of one of the nodes will be transparent to 
> > > the users.  In theory, of course...  If the storage dies, that's 
> > > another issue. Hopefully, your storage is raid and can handle a 
> > > disk failure.  
> >  
> > Hm... Thanks for you answer! I am definetelly confused a bit. Even 
> > after reading you post of last week. I understand that i can not 
> > merge the file systems. Our setup is very basic. We have to linux 
> > machines who could act as file server and we thought that we could 
> > one (A) have working as an active backup of the other (B). Is that 
> > what the documentation calls a failover domain, with (B) being the 
> > failover "domain" for (A)? Until now, we were running rsync at 
> > night, so that if the first of the two servers failed, clients could 
> > mount the NFS from the other server. There is nothing fancy here, 
> > like a SAN I guess, just machines connected via ethernet switches. 
> > So basically the question is, whether it is possible to keep the 
> > filesystems on the two servers in total sync, so that it would not 
> > matter whether clients mount the remote share from (A) or (B). 
> > Whether the clients would automatically be able to mount the GFS 
> > from (B), if (A) fails.  
>  
> No, GFS doesn't work quite like that.  What you have is something more 
> like this:  Two machines, (A) and (B), are file servers.  A third 
> machine, (C), is either a linux box exporting it's filesystem via 
> GNBD, or a dedicated storage box running iSCSI, AoE, or something 
> similar that will allow multiple connections.  (A) and (B) are both 
> connected to the GFS filesystem exported by (C).  If either (A) or (B) 
> goes down, the other one can continue serving the data from (C).  They 
> don't need to be synchronized because they are using the same physical 
> storage.  And, if the application permits, you can even run them both 
> simultaneously. 
>  
> You are looking for something different.  There is a project out there 
> for that, but it is not production ready at this point.  Maybe someone 
> else remembers the name. 
>  
> --  
> Bowie 
>  
 
Oh, OK. This would makes sense to me. But I still have some questions.. 
 
1. Would this reduce the load on (C)? 
 
2. I know how to export the gfs from (C) and mount it on (A) and (B), but 
how to the clients know whether they should connect to (A) or (B). Is this 
managed my clvmd?  
 
Thanks for the great help so far!! 
 
wolfgang 

-- 
Analog-/ISDN-Nutzer sparen mit GMX SmartSurfer bis zu 70%!
Kostenlos downloaden: http://www.gmx.net/de/go/smartsurfer



From kumaresh81 at yahoo.co.in  Sat Apr  8 16:48:04 2006
From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy)
Date: Sat, 8 Apr 2006 17:48:04 +0100 (BST)
Subject: [Linux-cluster] issues with rhcs 4.2
Message-ID: <20060408164804.54434.qmail@web8319.mail.in.yahoo.com>

hi,
   
  I recently migrated from rhcs 3 to rhcs 4.2 and since then I am unable to bring up the clustered services.
   
  Even though the services are getting executed (like the VIP, shared devices etc), the status in clustat and system-config-cluster still displays failed and because of this the failover is not happening. 
   
  Any light on this will be much appreciated. Cluster is on RHEL AS 4U2 with two nodes.
   
  Regards,
  Kumaresh

				
---------------------------------
 Jiyo cricket on Yahoo! India cricket
Yahoo! Messenger Mobile Stay in touch with your buddies all the time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060408/bfe29483/attachment.htm>

From l.dardini at comune.prato.it  Sat Apr  8 17:05:18 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Sat, 8 Apr 2006 19:05:18 +0200
Subject: [Linux-cluster] Cluster node not able to access all cluster resource
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFA92@exchange2.comune.prato.local>

The topic is not a problem, but what I want to do. I have a lots of
service, each on is now run by a two node cluster. This is very bad due
to each node fencing other one during network blackout. I'd like to
create only one cluster, but each resource, either GFS filesystems, must
be readable only by a limited number of nodes.

For example, taking a Cluster "test" made of node A, node B, node C,
node D and with the following resources: GFS Filesystem alpha and GFS
Filesystem beta. I want that only node A and node B can access GFS
Filesystem alpha and only node C and node D can access GFS Filesystem
beta.

Is it possible?

Leandro




From ookami at gmx.de  Sun Apr  9 00:44:15 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Sun, 9 Apr 2006 02:44:15 +0200 (MEST)
Subject: [Linux-cluster] hangs when copying with gnbd and gfs
Message-ID: <20347.1144543455@www012.gmx.net>

Hi, 
 
I could successfully mount a gfs partition and export with gnbd. It was 
also very fast, when I was moving a file from the client to the server, 
but if I try a second operation, like copying the file back, it always 
hangs. I can not even do copy files locally to the gfs partition anymore. 
Unfortunately, there is no info at all in the syslog or any other logfile. 
And the "gnbd_import -vl" and "gnbd_export -vl" don't show any error 
either. I guess it has something to do with the locking or fencing, but I 
don't understand that very well. Below it my config etc. Thanks for any 
hints!! 
 
<?xml version="1.0"?> 
<cluster config_version="29" name="oreilly_cluster"> 
        <fence_daemon post_fail_delay="0" post_join_delay="3"/> 
        <clusternodes> 
                <clusternode name="eon" votes="1"> 
                        <fence> 
                                <method name="1"> 
                                        <device name="manual_fence" 
nodename="eon"/> 
                                </method> 
                        </fence> 
                </clusternode> 
                <clusternode name="echo" votes="1"> 
                        <fence> 
                                <method name="1"> 
                                        <device name="gnbd" 
nodename="echo"/> 
                                        <device name="manual_fence" 
nodename="echo"/> 
                                </method> 
                        </fence> 
                </clusternode> 
        </clusternodes> 
        <cman expected_votes="1" two_node="1"/> 
        <fencedevices> 
                <fencedevice agent="fence_manual" name="manual_fence"/> 
                <fencedevice agent="fence_gnbd" name="gnbd" 
servers="eon"/> 
        </fencedevices> 
        <rm> 
                <failoverdomains> 
                        <failoverdomain name="oreilly_cluster_failover" 
ordered="0" restricted="0\ 
"> 
                                <failoverdomainnode name="eon" 
priority="1"/> 
                        </failoverdomain> 
                </failoverdomains> 
                <resources> 
                        <clusterfs device="/dev/volumeGroup1/gfs1" 
force_unmount="0" fsid="29490"\ 
 fstype="gfs" mountpoint="/mnt/gfs1" name="oreilly_cluster:gfs1" 
options="-j 50 -p lock_dlm"/> 
                </resources> 
                <service autostart="1" domain="oreilly_cluster_failover" 
name="gfs1"> 
                        <clusterfs ref="oreilly_cluster:gfs1"/> 
                </service> 
        </rm> 
</cluster> 
 
 
I exported/imported the file system like that: 
gnbd_export -d /dev/hdd1 -e testgfs  
gnbd_import -i eon 
mount -t gfs /dev/gnbd/testgfs /mnt/gfs1/ 
 

-- 
GMX Produkte empfehlen und ganz einfach Geld verdienen!
Satte Provisionen f?r GMX Partner: http://www.gmx.net/de/go/partner



From ookami at gmx.de  Sun Apr  9 02:54:55 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Sun, 9 Apr 2006 04:54:55 +0200 (MEST)
Subject: [Linux-cluster] hangs when copying with gnbd and gfs
References: <20347.1144543455@www012.gmx.net>
Message-ID: <22376.1144551295@www084.gmx.net>

Could this be related to automount? I just tried it again copied back a 
forth some mpg files and everything worked fine. But then I copied another 
file (230MB of /dev/zero) and the copying froze. The only think I could 
find in the log file was this: 
Apr  8 20:44:26 echo automount[5176]: failed to mount /misc/.directory 
Apr  8 20:44:26 echo automount[5177]: failed to mount /misc/.directory 
Apr  8 20:44:26 echo automount[5178]: >> /usr/sbin/showmount: can't get 
address for .directory 
Apr  8 20:44:26 echo automount[5178]: lookup(program): lookup 
for .directory failed 
Apr  8 20:44:26 echo automount[5178]: failed to mount /net/.directory 
Apr  8 20:44:26 echo automount[5183]: >> /usr/sbin/showmount: can't get 
address for .directory 
Apr  8 20:44:26 echo automount[5183]: lookup(program): lookup 
for .directory failed 
Apr  8 20:44:26 echo automount[5183]: failed to mount /net/.directory 
 
Another question I have is whether it is possible to mount the gfs on the 
server while it gnbd-exports the filesystem? 
 
wolfgang 

-- 
GMX Produkte empfehlen und ganz einfach Geld verdienen!
Satte Provisionen f?r GMX Partner: http://www.gmx.net/de/go/partner



From Alain.Moulle at bull.net  Mon Apr 10 11:02:08 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 10 Apr 2006 13:02:08 +0200
Subject: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <443A3B30.10307@bull.net>

Hi

I'm trying to configure a simple 3 nodes cluster
with simple tests scripts.
But I can't start cman, it remains stalled with
this message in syslog :
Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
16:04:34) installed
Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol family 30
Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster name =
HA_METADATA_3N, version = 8) found.
Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or form a
Linux-cluster
Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster infrastruture
via: CMAN/SM Plugin v1.1.2
Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate
Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster

and nothing more.

The graphic tool dos not detect any error in configuration; I 've
attached my cluster.conf for the three nodes, knowing that
I wanted two nodes (yack10 and yack21) running theirs applications
and the 3rd one (yack23) as a backup for yack10 and/or yack21,
but I don't want any failover between yack10 and yack21.

PS : I 've verified all ssh connections between the 3 nodes, and
all the fence paths as described in the cluster.conf.
Thanks again for your help.

Alain



-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 1500 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060410/a87f2d48/attachment.xml>

From l.dardini at comune.prato.it  Mon Apr 10 11:11:04 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Mon, 10 Apr 2006 13:11:04 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFACF@exchange2.comune.prato.local>

> -----Messaggio originale-----
> Da: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] Per conto di Alain Moulle
> Inviato: luned? 10 aprile 2006 13.02
> A: linux-cluster at redhat.com
> Oggetto: [Linux-cluster] CS4 U2 / problem to configure a 3 
> nodes cluster
> 
> Hi
> 
> I'm trying to configure a simple 3 nodes cluster with simple 
> tests scripts.
> But I can't start cman, it remains stalled with this message 
> in syslog :
> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr 10 
> 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
> 16:04:34) installed
> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered 
> protocol family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: 
> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found.
> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join 
> or form a Linux-cluster Apr 10 11:38:01 s_sys at yack21 
> ccsd[25004]: Connected to cluster infrastruture
> via: CMAN/SM Plugin v1.1.2
> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: 
> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: 
> forming a new cluster
> 
> and nothing more.
> 
> The graphic tool dos not detect any error in configuration; I 
> 've attached my cluster.conf for the three nodes, knowing 
> that I wanted two nodes (yack10 and yack21) running theirs 
> applications and the 3rd one (yack23) as a backup for yack10 
> and/or yack21, but I don't want any failover between yack10 
> and yack21.
> 
> PS : I 've verified all ssh connections between the 3 nodes, 
> and all the fence paths as described in the cluster.conf.
> Thanks again for your help.
> 
> Alain
> 

Are you starting the cman on all three nodes in the same time? A node doesn't start until each other node is starting. Timing is important during booting. 

Leandro



From pcaulfie at redhat.com  Mon Apr 10 12:02:58 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 10 Apr 2006 13:02:58 +0100
Subject: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
In-Reply-To: <443A3B30.10307@bull.net>
References: <443A3B30.10307@bull.net>
Message-ID: <443A4972.5030000@redhat.com>

Alain Moulle wrote:
> Hi
> 
> I'm trying to configure a simple 3 nodes cluster
> with simple tests scripts.
> But I can't start cman, it remains stalled with
> this message in syslog :
> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded
> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
> 16:04:34) installed
> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol family 30
> Apr 10 11:38:00 s_sys at yack21 ccsd[25004]: cluster.conf (cluster name =
> HA_METADATA_3N, version = 8) found.
> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join or form a
> Linux-cluster
> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Connected to cluster infrastruture
> via: CMAN/SM Plugin v1.1.2
> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status:: Inquorate
> Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN: forming a new cluster
> 
> and nothing more.
> 
> The graphic tool dos not detect any error in configuration; I 've
> attached my cluster.conf for the three nodes, knowing that
> I wanted two nodes (yack10 and yack21) running theirs applications
> and the 3rd one (yack23) as a backup for yack10 and/or yack21,
> but I don't want any failover between yack10 and yack21.
> 
> PS : I 've verified all ssh connections between the 3 nodes, and
> all the fence paths as described in the cluster.conf.
> Thanks again for your help.

Check that the cluster ports are not blocked by any firewalling. You'll
need 6809/udp & 21064/tcp opened.
-- 

patrick



From ugo.parsi at gmail.com  Mon Apr 10 14:25:20 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 16:25:20 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
Message-ID: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>

Hello,

Do you know how to run GFS / linux-cluster suite under a 2.6.16 vanilla kernel ?

All I've got is :

/usr/src/cluster/dlm-kernel/src2/lockspace.c: In function `do_uevent':
/usr/src/cluster/dlm-kernel/src2/lockspace.c:160: error: too many
arguments to function `kobject_uevent'
/usr/src/cluster/dlm-kernel/src2/lockspace.c:162: error: too many
arguments to function `kobject_uevent'
make[4]: *** [/usr/src/cluster/dlm-kernel/src2/lockspace.o] Error 1

I've removed the last argument in the kobject_uvent call wich was
"NULL", it does compile, but I don't really know if it's safe to do
this that way...

Anyway, I'm stuck with another error which seem due to a missing
include .h file (dlm.h) :

libdlm.c:44:17: dlm.h: No such file or directory
In file included from libdlm.c:46:
libdlm.h:142: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:142: warning: its scope is only this definition or
declaration, which is probably not what you want
libdlm.h:145: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:156: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:160: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:210: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:221: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:225: warning: `struct dlm_lksb' declared inside parameter list
libdlm.h:229: warning: `struct dlm_lksb' declared inside parameter list
libdlm.c:47:24: dlm_device.h: No such file or directory
libdlm.c:70: warning: `struct dlm_lock_result' declared inside parameter list
libdlm.c:71: warning: `struct dlm_lock_result' declared inside parameter list
libdlm.c:72: warning: `struct dlm_write_request' declared inside parameter list
libdlm.c:120: error: field `lksb' has incomplete type
libdlm.c: In function `unlock_resource':
libdlm.c:215: error: `DLM_EUNLOCK' undeclared (first use in this function)
libdlm.c:215: error: (Each undeclared identifier is reported only once
libdlm.c:215: error: for each function it appears in.)
libdlm.c: At top level:
libdlm.c:268: warning: `struct dlm_write_request' declared inside parameter list
libdlm.c: In function `set_version':
libdlm.c:270: error: dereferencing pointer to incomplete type
libdlm.c:270: error: `DLM_DEVICE_VERSION_MAJOR' undeclared (first use
in this function)
libdlm.c:271: error: dereferencing pointer to incomplete type

Any ideas ?

Thanks a lot,

Ugo PARSI



From jerome.castang at adelpha-lan.org  Mon Apr 10 14:33:21 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 16:33:21 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
Message-ID: <443A6CB1.7010307@adelpha-lan.org>

Ugo PARSI a ?crit :

>Hello,
>
>Do you know how to run GFS / linux-cluster suite under a 2.6.16 vanilla kernel ?
>
>All I've got is :
>
>/usr/src/cluster/dlm-kernel/src2/lockspace.c: In function `do_uevent':
>/usr/src/cluster/dlm-kernel/src2/lockspace.c:160: error: too many
>arguments to function `kobject_uevent'
>/usr/src/cluster/dlm-kernel/src2/lockspace.c:162: error: too many
>arguments to function `kobject_uevent'
>make[4]: *** [/usr/src/cluster/dlm-kernel/src2/lockspace.o] Error 1
>
>I've removed the last argument in the kobject_uvent call wich was
>"NULL", it does compile, but I don't really know if it's safe to do
>this that way...
>
>Anyway, I'm stuck with another error which seem due to a missing
>include .h file (dlm.h) :
>
>libdlm.c:44:17: dlm.h: No such file or directory
>In file included from libdlm.c:46:
>libdlm.h:142: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:142: warning: its scope is only this definition or
>declaration, which is probably not what you want
>libdlm.h:145: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:156: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:160: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:210: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:221: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:225: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.h:229: warning: `struct dlm_lksb' declared inside parameter list
>libdlm.c:47:24: dlm_device.h: No such file or directory
>libdlm.c:70: warning: `struct dlm_lock_result' declared inside parameter list
>libdlm.c:71: warning: `struct dlm_lock_result' declared inside parameter list
>libdlm.c:72: warning: `struct dlm_write_request' declared inside parameter list
>libdlm.c:120: error: field `lksb' has incomplete type
>libdlm.c: In function `unlock_resource':
>libdlm.c:215: error: `DLM_EUNLOCK' undeclared (first use in this function)
>libdlm.c:215: error: (Each undeclared identifier is reported only once
>libdlm.c:215: error: for each function it appears in.)
>libdlm.c: At top level:
>libdlm.c:268: warning: `struct dlm_write_request' declared inside parameter list
>libdlm.c: In function `set_version':
>libdlm.c:270: error: dereferencing pointer to incomplete type
>libdlm.c:270: error: `DLM_DEVICE_VERSION_MAJOR' undeclared (first use
>in this function)
>libdlm.c:271: error: dereferencing pointer to incomplete type
>
>Any ideas ?
>
>Thanks a lot,
>
>Ugo PARSI
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>  
>

For the problem with dlm.h i found this:
http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html

Seems that dlm.h is provided by dlm-kernel-debuginfo 
<http://rpmfind.net/linux/rpm2html/search.php?query=dlm-kernel-debuginfo>.


-- 
Jerome Castang
mail: jcastang at adelpha-lan.org




From ugo.parsi at gmail.com  Mon Apr 10 14:39:25 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 16:39:25 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <443A6CB1.7010307@adelpha-lan.org>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
Message-ID: <f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>

>
> For the problem with dlm.h i found this:
> >http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html

The link is dead :(

>
> Seems that dlm.h is provided by dlm-kernel-debuginfo
> <http://rpmfind.net/linux/rpm2html/search.php?query=dlm-kernel-debuginfo>.
>

I've installed two packages on Debian

# apt-cache search dlm
libdlm-dev - Distributed lock manager - development files
libdlm0 - Distributed lock manager - library


Here's all I've got :

# locate dlm.h
/usr/include/libdlm.h
/usr/src/cluster/dlm-kernel/src2/dlm.h
/usr/src/cluster/dlm-kernel/src/dlm.h
/usr/src/cluster/dlm/lib/libdlm.h
/usr/src/cluster/gfs-kernel/src/dlm/lock_dlm.h
/usr/src/cluster/gfs/lock_dlm/daemon/lock_dlm.h
/usr/src/linux-2.6.16.1/fs/ocfs2/dlm/userdlm.h

I'm trying your package, but I suppose it's redhat-only...

Thanks,

Ugo PARSI



From jerome.castang at adelpha-lan.org  Mon Apr 10 14:51:26 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 16:51:26 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
Message-ID: <443A70EE.4070907@adelpha-lan.org>

Ugo PARSI a ?crit :

>>For the problem with dlm.h i found this:
>>    
>>
>>>http://rpmfind.net/linux/RPM/fedora/updates/4/x86_64/debug/dlm-kernel-debuginfo-2.6.11.5-20050601.152643.FC4.21.x86_64.html
>>>      
>>>
>
>The link is dead :(
>  
>

Link is dead ?
It works perfectly for me...

>  
>
>>Seems that dlm.h is provided by dlm-kernel-debuginfo
>><http://rpmfind.net/linux/rpm2html/search.php?query=dlm-kernel-debuginfo>.
>>
>>    
>>
>
>I've installed two packages on Debian
>
># apt-cache search dlm
>libdlm-dev - Distributed lock manager - development files
>libdlm0 - Distributed lock manager - library
>
>
>Here's all I've got :
>
># locate dlm.h
>/usr/include/libdlm.h
>/usr/src/cluster/dlm-kernel/src2/dlm.h
>/usr/src/cluster/dlm-kernel/src/dlm.h
>/usr/src/cluster/dlm/lib/libdlm.h
>/usr/src/cluster/gfs-kernel/src/dlm/lock_dlm.h
>/usr/src/cluster/gfs/lock_dlm/daemon/lock_dlm.h
>/usr/src/linux-2.6.16.1/fs/ocfs2/dlm/userdlm.h
>
>I'm trying your package, but I suppose it's redhat-only...
>
>Thanks,
>
>Ugo PARSI
>
>--88
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>  
>

I suppose you can try to get this RH package and unpack it to get files 
and put them where they should be...


-- 
Jerome Castang
mail: jcastang at adelpha-lan.org




From ugo.parsi at gmail.com  Mon Apr 10 14:57:14 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 16:57:14 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <443A70EE.4070907@adelpha-lan.org>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
Message-ID: <f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>

> I suppose you can try to get this RH package and unpack it to get files
> and put them where they should be...
>

Well I've just did and it doesn't change pretty much :(

Ugo PARSI



From jerome.castang at adelpha-lan.org  Mon Apr 10 15:16:18 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 17:16:18 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>	
	<443A6CB1.7010307@adelpha-lan.org>	
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>	
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
Message-ID: <443A76C2.8070900@adelpha-lan.org>

Ugo PARSI a ?crit :

>>I suppose you can try to get this RH package and unpack it to get files
>>and put them where they should be...
>>
>>    
>>
>
>Well I've just did and it doesn't change pretty much :(
>
>Ugo PARSI
>  
>
Have you tried to start with the cvs of Cluster Project ?
I think cvs provides all you need.

-- 
Jerome Castang
mail: jcastang at adelpha-lan.org




From basv at sara.nl  Mon Apr 10 15:26:06 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Mon, 10 Apr 2006 17:26:06 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>	<443A6CB1.7010307@adelpha-lan.org>	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
Message-ID: <443A790E.1040002@sara.nl>

Ugo PARSI wrote:
>> I suppose you can try to get this RH package and unpack it to get files
>> and put them where they should be...
>>
> 
> Well I've just did and it doesn't change pretty much :(
> 
> Ugo PARSI

Ugo,

  Which version for GFS do you use cvs STABLE or HEAD?

  I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS
  STABLE branch.

Regards


-- 
--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



From carlopmart at gmail.com  Mon Apr 10 15:52:20 2006
From: carlopmart at gmail.com (carlopmart)
Date: Mon, 10 Apr 2006 17:52:20 +0200
Subject: [Linux-cluster] Question about manual fencing
Message-ID: <443A7F34.7000901@gmail.com>

Hi all,

  I would like to test manual fencing on two nodes for testing 
pourposes.  I have read RedHat's docs about this but I don't see very 
clear. If I setup manual fencing, when one node shutdowns, the other 
node startups all services that I have configured on the another node 
automatically?

Thanks.

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From jerome.castang at adelpha-lan.org  Mon Apr 10 15:59:14 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Mon, 10 Apr 2006 17:59:14 +0200
Subject: [Linux-cluster] Question about manual fencing
In-Reply-To: <443A7F34.7000901@gmail.com>
References: <443A7F34.7000901@gmail.com>
Message-ID: <443A80D2.6050806@adelpha-lan.org>

carlopmart a ?crit :

> Hi all,
>
>  I would like to test manual fencing on two nodes for testing 
> pourposes.  I have read RedHat's docs about this but I don't see very 
> clear. If I setup manual fencing, when one node shutdowns, the other 
> node startups all services that I have configured on the another node 
> automatically?
>
> Thanks.
>

I don't think so.
Fencing a node is to stop it, or make it leaving the cluster (using any 
method like shutdown...)
So if you use manual fencing, the other nodes will not start automaticly 
their services...


-- 
Jerome Castang
mail: jcastang at adelpha-lan.org




From tf0054 at gmail.com  Sat Apr  8 16:23:05 2006
From: tf0054 at gmail.com (=?ISO-2022-JP?B?GyRCQ2ZMbkxUGyhC?=)
Date: Sun, 9 Apr 2006 01:23:05 +0900
Subject: [Linux-cluster] Cisco fence agent
Message-ID: <a51e1ec20604080923m778e8fbbtd6ab207666d6ac3c@mail.gmail.com>

Hi all.
Do anyone have cisco catalyst fence agent?
If nobody make that, I will make.

Thanks.



From Bowie_Bailey at BUC.com  Mon Apr 10 16:09:03 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Mon, 10 Apr 2006 12:09:03 -0400
Subject: [Linux-cluster] newbie: gfs merge
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338C7@bnifex.cis.buc.com>

wolfgang pauli wrote:
> > > 
> > > Hm... Thanks for you answer! I am definetelly confused a bit. Even
> > > after reading you post of last week. I understand that i can not
> > > merge the file systems. Our setup is very basic. We have to linux
> > > machines who could act as file server and we thought that we could
> > > one (A) have working as an active backup of the other (B). Is that
> > > what the documentation calls a failover domain, with (B) being the
> > > failover "domain" for (A)? Until now, we were running rsync at
> > > night, so that if the first of the two servers failed, clients
> > > could mount the NFS from the other server. There is nothing fancy
> > > here, like a SAN I guess, just machines connected via ethernet
> > > switches. So basically the question is, whether it is possible to
> > > keep the filesystems on the two servers in total sync, so that it
> > > would not matter whether clients mount the remote share from (A)
> > > or (B). Whether the clients would automatically be able to mount
> > > the GFS from (B), if (A) fails.
> > 
> > No, GFS doesn't work quite like that.  What you have is something
> > more like this:  Two machines, (A) and (B), are file servers.  A
> > third machine, (C), is either a linux box exporting it's filesystem
> > via GNBD, or a dedicated storage box running iSCSI, AoE, or
> > something similar that will allow multiple connections.  (A) and
> > (B) are both connected to the GFS filesystem exported by (C).  If
> > either (A) or (B) goes down, the other one can continue serving the
> > data from (C).  They don't need to be synchronized because they are
> > using the same physical storage.  And, if the application permits,
> > you can even run them both simultaneously. 
> > 
> > You are looking for something different.  There is a project out
> > there for that, but it is not production ready at this point. 
> > Maybe someone else remembers the name. 
> 
> Oh, OK. This would makes sense to me. But I still have some
> questions.. 
> 
> 1. Would this reduce the load on (C)?

Reduce it from what?  (C) would be a completely different type of
machine from (A) and (B).  (A) and (B) are application systems, while
(C) is just a fileserver.  (C) would not need to be quite as fast as
the others, just fast enough to keep up with the I/O requirements of
the storage and the GFS/Cluster overhead.

> 2. I know how to export the gfs from (C) and mount it on (A) and (B),
> but how to the clients know whether they should connect to (A) or
> (B). Is this managed my clvmd?

No, this is managed by your network.  If (A) and (B) are running the
same software, it doesn't matter which one they connect to.  On my
system, I have a Foundry ServerIron that load-balances the two
machines.  You can also do it using LVS software, such as the stuff in
the Linux HA project.

-- 
Bowie



From schlegel at riege.com  Mon Apr 10 16:20:20 2006
From: schlegel at riege.com (Gunther Schlegel)
Date: Tue, 11 Apr 2006 00:20:20 +0800
Subject: [Linux-cluster] gfs file locking
Message-ID: <443A85C4.2060608@riege.com>

Hi,

does GFS support the same ways of file locking a local filesystem does? 
I am evaluating to put an application on gfs that runs pretty fine on 
local filesystems but tends to have severe problems on NFS. I know NFS 
is totally different from GFS, but from the applications point of view 
both are just filesystems.

best regards, Gunther
-------------- next part --------------
A non-text attachment was scrubbed...
Name: schlegel.vcf
Type: text/x-vcard
Size: 344 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060411/a6bfd663/attachment.vcf>

From ugo.parsi at gmail.com  Mon Apr 10 16:53:41 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 18:53:41 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <f29fd8170604100953y77aafa2q505a91d97be9bc90@mail.gmail.com>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A76C2.8070900@adelpha-lan.org>
	<f29fd8170604100953y77aafa2q505a91d97be9bc90@mail.gmail.com>
Message-ID: <f29fd8170604100953me355197sc3c74705e622f541@mail.gmail.com>

Reposting sorry :

On 4/10/06, Ugo PARSI <ugo.parsi at gmail.com> wrote:
> > Have you tried to start with the cvs of Cluster Project ?
> > I think cvs provides all you need.
> >
>
> Well, that's the only thing I did....I guess ?!
>
> I've followed that document indeed :
>
> http://sources.redhat.com/cluster/doc/usage.txt
>
> So I did a   cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster
> checkout cluster
>
> Is that okay ?
>
> Thanks a lot,
>
> Ugo PARSI
>



From ugo.parsi at gmail.com  Mon Apr 10 16:57:02 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Mon, 10 Apr 2006 18:57:02 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <443A790E.1040002@sara.nl>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
Message-ID: <f29fd8170604100957u5a78b7c9yfbd7295d72a2e1e0@mail.gmail.com>

>   Which version for GFS do you use cvs STABLE or HEAD?
>

I don't know how to tell...

Is stable this thing ?  - The 'cluster' cvs head can be unstable, so
it's recommended that you
    checkout from the RHEL4 branch -- 'checkout -r RHEL4 cluster'

I've tried both with or without anyway....

>   I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS
>   STABLE branch.
>

>From a vanilla kernel ?
Because basically, I've just tried all of this from a fresh vanilla
2.6.16.1 (I'm gonna try the 2.6.16.2) downloaded from kernel.org.
System was running that kernel at time of compilation, and I provided
the path of the kernel to the configure script.

Anything wrong ? Any ideas ? You've made some fixes/patches ?

Thanks a lot,

Ugo PARSI



From basv at sara.nl  Mon Apr 10 19:24:58 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Mon, 10 Apr 2006 21:24:58 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <f29fd8170604100957u5a78b7c9yfbd7295d72a2e1e0@mail.gmail.com>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
	<f29fd8170604100957u5a78b7c9yfbd7295d72a2e1e0@mail.gmail.com>
Message-ID: <D915D859-864E-4B48-A370-FB689AE220FB@sara.nl>

>
>>   I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS
>>   STABLE branch.
>>
>
>

You have to download, from cvs STABLE:
  cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r  
STABLE cluster

Some packages need header files that are provided by others. So you  
most install them
before compiling the rest.  I have made debian package scripts for  
all cluster packages.

If i have some time a will put them on our ftp-server.

I have made a small document it is in dutch, but it is not that  
difficult. You have to install each package before building the  
others. It make life for me easier then examine all the dependencies.

cd cluster/cman-kernel
dch -i (vullen met juiste kernel versie)
debian/rules clean
debian/rules build
debian/rules binary
dpkg -i ../cman-kernel_<kernel_version>.deb
depmod -a
Nu de volgende delen maken op de bovenstaande manier:

dlm-kernel
cd juiste pad
dpkg -i ../dlm-kernel_<kernel_version>.deb
gnbd-kernel
dpkg -i ../gnbd-kernel_<kernel_version>.deb
gfs-kernel
dpkg -i ../gfs-kernel_<kernel_version>.deb
Nu de volgende kernel onafhankelijke delen bouwen:

magma
dch -i (juiste cvs versie)
debian/rules clean
debian/rules binary
dpkg -i ../magma.deb
idem:

iddev
dpkg -i ../iddev.deb
ccs
dpkg -i ../ccs.deb
cman
dlm
dpkg -i ../dlm.deb
gnbd
gfs
fence
gulm
dpkg -i ../gulm.deb
magma-plugins
rgmanager




--
Bas van der Vlies
basv at sara.nl





From ocrete at max-t.com  Mon Apr 10 21:01:47 2006
From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=)
Date: Mon, 10 Apr 2006 17:01:47 -0400
Subject: [Linux-cluster] cman kickout out nodes for no good reason
In-Reply-To: <1144341281.355.38.camel@cocagne.max-t.internal>
References: <1144341281.355.38.camel@cocagne.max-t.internal>
Message-ID: <1144702908.21093.7.camel@cocagne.max-t.internal>

On Thu, 2006-06-04 at 12:34 -0400, Olivier Cr?te wrote:
> I have a strange problem where cman suddenly starts kicking out members
> of the cluster with "Inconsistent cluster view" when I join a new node
> (sometimes).  It takes a few minutes between each kicking. I'm using a
> snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is
> in transition state at that point and I can't stop/start services or do
> anything else. It did not do that with a snapshot I took a few months
> ago.

Its still happening, the node that joins says "Transition master
unknown", while all of the other nodes who the master is, then the
master gets kicked out. Then a new master is selected, all of the nodes
seem to know who the master is, but refuse to act on it. After a while,
the new master is kicked out and the process restarts. I guess its
related to the changes with the timestamps to prevent master desync, I
dont see any other recent change that could have caused it.

-- 
Olivier Cr?te
ocrete at max-t.com
Maximum Throughput Inc.




From ookami at gmx.de  Mon Apr 10 23:07:48 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Tue, 11 Apr 2006 01:07:48 +0200 (MEST)
Subject: [Linux-cluster] hangs when copying with gnbd and gfs
References: <22376.1144551295@www084.gmx.net>
Message-ID: <28595.1144710468@www031.gmx.net>

> Could this be related to automount? I just tried it again copied back a  
> forth some mpg files and everything worked fine. But then I copied 
another 
> file (230MB of /dev/zero) and the copying froze. The only think I could  
> find in the log file was this:  
> Apr  8 20:44:26 echo automount[5176]: failed to mount /misc/.directory  
> Apr  8 20:44:26 echo automount[5177]: failed to mount /misc/.directory  
> Apr  8 20:44:26 echo automount[5178]: >> /usr/sbin/showmount: can't get  
> address for .directory  
> Apr  8 20:44:26 echo automount[5178]: lookup(program): lookup  
> for .directory failed  
> Apr  8 20:44:26 echo automount[5178]: failed to mount /net/.directory  
> Apr  8 20:44:26 echo automount[5183]: >> /usr/sbin/showmount: can't get  
> address for .directory  
> Apr  8 20:44:26 echo automount[5183]: lookup(program): lookup  
> for .directory failed  
> Apr  8 20:44:26 echo automount[5183]: failed to mount /net/.directory  
>   
> Another question I have is whether it is possible to mount the gfs on 
the  
> server while it gnbd-exports the filesystem?  
>   
> wolfgang  
>  
 
OK, I think I solved it. I switched from GNBD to iSCSI. I have iscsitarget 
running on the server and open-iscsi on the client. I had to export the 
logical volume rather then the war device to be able to mount it on the 
client. 
 

-- 
"Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ...
Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail



From forigato at gmail.com  Mon Apr 10 23:57:16 2006
From: forigato at gmail.com (ANDRE LUIS FORIGATO)
Date: Mon, 10 Apr 2006 20:57:16 -0300
Subject: [Linux-cluster] Help-me, Please
Message-ID: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com>

Linux xlx2 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005
i686 i686 i386 GNU/Linux

Redhat-config-cluster 1.0.3
clumanager 1.2.22


Apr 10 01:18:07 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 05:13:49 xlx2 cluquorumd[4463]: <info> Disk-TB: Partner is DOWN
(Dead/Hung)
Apr 10 05:13:54 xlx2 cluquorumd[4463]: <info> Disk-TB: State Change: Partner UP
Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:31:07 xlx2 clumembd[4493]: <info> Membership View #5:0x00000002
Apr 10 11:31:08 xlx2 cluquorumd[4463]: <warning> Membership reports #0
as down, but disk reports as up: State uncertain!
Apr 10 11:31:08 xlx2 cluquorumd[4463]: <warning> --> Commencing STONITH <--
Apr 10 11:31:08 xlx2 cluquorumd[4463]: <info> Disk-TB: Partner is DOWN
(Dead/Hung)
Apr 10 11:31:10 xlx2 cluquorumd[4463]: <info> Disk-TB: State Change: Partner UP
Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #12 0x00000002
Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: <warning> Member
200.254.254.171's state is uncertain: Some services may be
unavailable!
Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #13 0x00000002
Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:31:34 xlx2 cluquorumd[4463]: <info> Disk-TB: Partner is DOWN
(Dead/Hung)
Apr 10 11:31:38 xlx2 cluquorumd[4463]: <warning> --> Commencing STONITH <--
Apr 10 11:31:38 xlx2 cluquorumd[4463]: <warning> STONITH: Falsely
claiming that 200.254.254.171 has been fenced
Apr 10 11:31:38 xlx2 cluquorumd[4463]: <crit> STONITH: Data integrity
may be compromised!
Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #15 0x00000002
Apr 10 11:31:41 xlx2 clusvcmgrd[4671]: <info> State change:
200.254.254.172 DOWN
Apr 10 11:34:08 xlx2 cluquorumd[4463]: <info> Disk-TB: State Change: Partner UP
Apr 10 11:34:09 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #16 0x00000002
Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: No route to host
Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: No route to host
Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: No route to host
Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: No route to host
Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
lock: No locks available
Apr 10 11:34:50 xlx2 clumembd[4493]: <notice> Member 200.254.254.171 UP
Apr 10 11:34:50 xlx2 clumembd[4493]: <info> Membership View #6:0x00000003
Apr 10 11:34:50 xlx2 cluquorumd[4463]: <err> __msg_send: Incomplete
write to 13. Error: Connection reset by peer
Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #17 0x00000003
Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: <info> State change: Local UP
Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: <info> State change: 200.254.254.171 UP
Apr 10 13:21:25 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 17:03:22 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
member #0: Connection timed out
Apr 10 20:30:30 xlx2 clulockd[4498]: <warning> Denied 200.254.254.171:
Broken pipe
Apr 10 20:30:30 xlx2 clulockd[4498]: <err> select error: Broken pipe

Att,
Forigas



From Alain.Moulle at bull.net  Tue Apr 11 06:08:57 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 11 Apr 2006 08:08:57 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <443B47F9.6090506@bull.net>

>Hi
>>
>> I'm trying to configure a simple 3 nodes cluster with simple
>> tests scripts.
>> But I can't start cman, it remains stalled with this message
>> in syslog :
>> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr 10
>> 11:38:00 s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
>> 16:04:34) installed
>> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered
>> protocol family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]:
>> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found.
>> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to join
>> or form a Linux-cluster Apr 10 11:38:01 s_sys at yack21
>> ccsd[25004]: Connected to cluster infrastruture
>> via: CMAN/SM Plugin v1.1.2
>> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status::
>> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN:
>> forming a new cluster
>>
>> and nothing more.
>>
>> The graphic tool dos not detect any error in configuration; I
>> 've attached my cluster.conf for the three nodes, knowing
>> that I wanted two nodes (yack10 and yack21) running theirs
>> applications and the 3rd one (yack23) as a backup for yack10
>> and/or yack21, but I don't want any failover between yack10
>> and yack21.
>>
>> PS : I 've verified all ssh connections between the 3 nodes,
>> and all the fence paths as described in the cluster.conf.
>> Thanks again for your help.
>>
>> Alain
>>


>Are you starting the cman on all three nodes in the same time? A node doesn't
>start until each other node is starting. Timing is important during booting.

>Leandro

Hi, no I wasn't ...
I've tried now, and this is ok on yack21 and yack23, but not on yack10,
is there something wrong in the cluster.conf to explain this behavior ?
On yack10 , cman is trying to :
CMAN: forming a new cluster
but fails with a timeout ...

??
Thanks
Alain
-- 



mailto:Alain.Moulle at bull.net
+------------------------------+--------------------------------+
|	Alain Moull?	       	| from France :	04 76 29 75 99  |
|                              	| FAX number  : 04 76 29 72 49  |
| Bull SA		       	|				|
| 1, Rue de Provence  		| Adr  : FREC B1-041            |
| B.P. 208			|				|
| 38432 Echirolles - CEDEX     	| Email: Alain.Moulle at bull.net  |
| France                       	| BCOM : 229 7599               |
+-------------------------------+-------------------------------+

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 1500 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060411/60a4cdc9/attachment.xml>

From l.dardini at comune.prato.it  Tue Apr 11 06:59:13 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Tue, 11 Apr 2006 08:59:13 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFAEB@exchange2.comune.prato.local>

 

> -----Messaggio originale-----
> Da: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] Per conto di Alain Moulle
> Inviato: marted? 11 aprile 2006 8.09
> A: linux-cluster at redhat.com
> Oggetto: R: [Linux-cluster] CS4 U2 / problem to configure a 3 
> nodes cluster
> 
> >Hi
> >>
> >> I'm trying to configure a simple 3 nodes cluster with simple tests 
> >> scripts.
> >> But I can't start cman, it remains stalled with this message in 
> >> syslog :
> >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr 
> 10 11:38:00 
> >> s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
> >> 16:04:34) installed
> >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol 
> >> family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]:
> >> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found.
> >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to 
> join or form 
> >> a Linux-cluster Apr 10 11:38:01 s_sys at yack21
> >> ccsd[25004]: Connected to cluster infrastruture
> >> via: CMAN/SM Plugin v1.1.2
> >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status::
> >> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN:
> >> forming a new cluster
> >>
> >> and nothing more.
> >>
> >> The graphic tool dos not detect any error in configuration; I 've 
> >> attached my cluster.conf for the three nodes, knowing that 
> I wanted 
> >> two nodes (yack10 and yack21) running theirs applications 
> and the 3rd 
> >> one (yack23) as a backup for yack10 and/or yack21, but I 
> don't want 
> >> any failover between yack10 and yack21.
> >>
> >> PS : I 've verified all ssh connections between the 3 
> nodes, and all 
> >> the fence paths as described in the cluster.conf.
> >> Thanks again for your help.
> >>
> >> Alain
> >>
> 
> 
> >Are you starting the cman on all three nodes in the same 
> time? A node 
> >doesn't start until each other node is starting. Timing is 
> important during booting.
> 
> >Leandro
> 
> Hi, no I wasn't ...
> I've tried now, and this is ok on yack21 and yack23, but not 
> on yack10, is there something wrong in the cluster.conf to 
> explain this behavior ?
> On yack10 , cman is trying to :
> CMAN: forming a new cluster
> but fails with a timeout ...
> 
> ??
> Thanks
> Alain
> -- 
> 

Maybe this time is due to a firewall setup, as already stated on the list. A tcpdump from yack10 to the other nodes may help you catch the bug.

Leandro



From ugo.parsi at gmail.com  Tue Apr 11 07:44:56 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Tue, 11 Apr 2006 09:44:56 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <D915D859-864E-4B48-A370-FB689AE220FB@sara.nl>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
	<f29fd8170604100957u5a78b7c9yfbd7295d72a2e1e0@mail.gmail.com>
	<D915D859-864E-4B48-A370-FB689AE220FB@sara.nl>
Message-ID: <f29fd8170604110044u3a60bbap7ca0ff1aa0954c33@mail.gmail.com>

> You have to download, from cvs STABLE:
>   cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r
> STABLE cluster
>

Ok I've tried it, thanks, it does seem to work better but I have still
issues....
This time there's no kernel issues....but another missing .h file :

[...]
make[2]: Entering directory `/usr/src/cluster/cman/lib'
gcc -Wall  -g -O -I. -fPIC -I/usr/src/cluster/build/incdir/cluster -c
-o libcman.o libcman.c
libcman.c:31:35: cluster/cnxman-socket.h: No such file or directory
libcman.c:44: warning: `struct cl_cluster_node' declared inside parameter list
libcman.c:44: warning: its scope is only this definition or
declaration, which is probably not what you want
libcman.c: In function `copy_node':
libcman.c:46: error: dereferencing pointer to incomplete type
libcman.c:47: error: dereferencing pointer to incomplete type
[...]

> Some packages need header files that are provided by others. So you
> most install them
> before compiling the rest.  I have made debian package scripts for
> all cluster packages.

True, but well, that's what the main Makefile is doing, right ?

[....]
        cd cman-kernel && ${MAKE} install ${MAKELINE}
        cd dlm-kernel && ${MAKE} install ${MAKELINE}
        cd gfs-kernel && ${MAKE} install ${MAKELINE}
        cd gnbd-kernel && ${MAKE} install ${MAKELINE}
        cd magma && ${MAKE} install ${MAKELINE}
        cd ccs && ${MAKE} install ${MAKELINE}
[....]

So I don't see what you are doing more.... except the fact you are
building Debian packages ?

Thanks a lot,

Ugo PARSI



From pcaulfie at redhat.com  Tue Apr 11 07:47:52 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 11 Apr 2006 08:47:52 +0100
Subject: [Linux-cluster] cman kickout out nodes for no good reason
In-Reply-To: <1144702908.21093.7.camel@cocagne.max-t.internal>
References: <1144341281.355.38.camel@cocagne.max-t.internal>
	<1144702908.21093.7.camel@cocagne.max-t.internal>
Message-ID: <443B5F28.1060004@redhat.com>

Olivier Cr?te wrote:
> On Thu, 2006-06-04 at 12:34 -0400, Olivier Cr?te wrote:
>> I have a strange problem where cman suddenly starts kicking out members
>> of the cluster with "Inconsistent cluster view" when I join a new node
>> (sometimes).  It takes a few minutes between each kicking. I'm using a
>> snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is
>> in transition state at that point and I can't stop/start services or do
>> anything else. It did not do that with a snapshot I took a few months
>> ago.
> 
> Its still happening, the node that joins says "Transition master
> unknown", while all of the other nodes who the master is, then the
> master gets kicked out. Then a new master is selected, all of the nodes
> seem to know who the master is, but refuse to act on it. After a while,
> the new master is kicked out and the process restarts. I guess its
> related to the changes with the timestamps to prevent master desync, I
> dont see any other recent change that could have caused it.
> 

That's very peculiar behaviour, and it's going to be hard to pin down. How
consistently does it happen ?

It could be caused by extreme network packet loss, or something blocking the
progress of cman processes. Are the already joined nodes very busy when you
bring the new node into the cluster (if so, doing what?)

I think the best way to try and track this down is to get a tcpdump of the
cluster traffic (port 6809/udp) happening at the time of the join - make sure
that all nodes are included in the dump and that all of the packet is captured.

-- 

patrick



From pcaulfie at redhat.com  Tue Apr 11 08:46:15 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 11 Apr 2006 09:46:15 +0100
Subject: [Linux-cluster] DLM messages
In-Reply-To: <4427CB55.2060203@sara.nl>
References: <C2EC4062-0671-4594-B1D0-8AB6B0106CFE@sara.nl>	<20060327084643.GB27410@redhat.com>	<4427AA3F.3040009@sara.nl>
	<4427CB55.2060203@sara.nl>
Message-ID: <443B6CD7.8050704@redhat.com>


> === FS2 ==
> Mar 27 12:28:25 ifs2 kernel: ------------[ cut here ]------------
> Mar 27 12:28:25 ifs2 kernel: kernel BUG at
> /usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151!
> Mar 27 12:28:25 ifs2 kernel: invalid opcode: 0000 [#1]
> Mar 27 12:28:25 ifs2 kernel: SMP
> Mar 27 12:28:25 ifs2 kernel: Modules linked in: lock_dlm dlm cman
> dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix
> e1000 gfs lock_harness dm_mod
> Mar 27 12:28:25 ifs2 kernel: CPU:    0
> Mar 27 12:28:25 ifs2 kernel: EIP:    0060:[<f8ac7825>]    Tainted: GF   VLI
> Mar 27 12:28:25 ifs2 kernel: EFLAGS: 00010246   (2.6.16-rc5-sara3 #1)
> Mar 27 12:28:25 ifs2 kernel: EIP is at elect_master+0x34/0x41 [cman]

That cman crash looks nasty, though it may be related to "disabing the
heartbeat-network interface". Is this the node you are referring to ?

-- 

patrick



From basv at sara.nl  Tue Apr 11 10:13:33 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 12:13:33 +0200
Subject: [Linux-cluster] Using GFS with vanilla kernel (2.6.16)
In-Reply-To: <f29fd8170604110044u3a60bbap7ca0ff1aa0954c33@mail.gmail.com>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>	<443A6CB1.7010307@adelpha-lan.org>	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>	<443A70EE.4070907@adelpha-lan.org>	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>	<443A790E.1040002@sara.nl>	<f29fd8170604100957u5a78b7c9yfbd7295d72a2e1e0@mail.gmail.com>	<D915D859-864E-4B48-A370-FB689AE220FB@sara.nl>
	<f29fd8170604110044u3a60bbap7ca0ff1aa0954c33@mail.gmail.com>
Message-ID: <443B814D.6030706@sara.nl>

Ugo PARSI wrote:
>> You have to download, from cvs STABLE:
>>   cvs -d :pserver:cvs at sources.redhat.com:/cvs/cluster checkout -r
>> STABLE cluster
>>
> 
> Ok I've tried it, thanks, it does seem to work better but I have still
> issues....
> This time there's no kernel issues....but another missing .h file :
> 
> [...]
> make[2]: Entering directory `/usr/src/cluster/cman/lib'
> gcc -Wall  -g -O -I. -fPIC -I/usr/src/cluster/build/incdir/cluster -c
> -o libcman.o libcman.c
> libcman.c:31:35: cluster/cnxman-socket.h: No such file or directory
> libcman.c:44: warning: `struct cl_cluster_node' declared inside parameter list
> libcman.c:44: warning: its scope is only this definition or
> declaration, which is probably not what you want
> libcman.c: In function `copy_node':
> libcman.c:46: error: dereferencing pointer to incomplete type
> libcman.c:47: error: dereferencing pointer to incomplete type
> [...]
> 
This a bug i reported it to his list, but no replies. I think i removed 
the cluster from the include cluster/cnxman-socket.h line.

Your are using debian or not. I can put the deb-packages that kernel 
independed on our ftp-server. No warranty they include all init.d
script and start at runlevel 3.

When i machine start in starts at runlevel 2, not in cluster enabled 
mode. To enable cluster mode we do a init 3, to can remove a node from a 
cluster with the init 2 command.

Regards
-- 
--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



From basv at sara.nl  Tue Apr 11 10:19:45 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 12:19:45 +0200
Subject: [Linux-cluster] DLM messages
In-Reply-To: <443B6CD7.8050704@redhat.com>
References: <C2EC4062-0671-4594-B1D0-8AB6B0106CFE@sara.nl>	<20060327084643.GB27410@redhat.com>	<4427AA3F.3040009@sara.nl>	<4427CB55.2060203@sara.nl>
	<443B6CD7.8050704@redhat.com>
Message-ID: <443B82C1.7010603@sara.nl>

Patrick Caulfield wrote:
>> === FS2 ==
>> Mar 27 12:28:25 ifs2 kernel: ------------[ cut here ]------------
>> Mar 27 12:28:25 ifs2 kernel: kernel BUG at
>> /usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151!
>> Mar 27 12:28:25 ifs2 kernel: invalid opcode: 0000 [#1]
>> Mar 27 12:28:25 ifs2 kernel: SMP
>> Mar 27 12:28:25 ifs2 kernel: Modules linked in: lock_dlm dlm cman
>> dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix
>> e1000 gfs lock_harness dm_mod
>> Mar 27 12:28:25 ifs2 kernel: CPU:    0
>> Mar 27 12:28:25 ifs2 kernel: EIP:    0060:[<f8ac7825>]    Tainted: GF   VLI
>> Mar 27 12:28:25 ifs2 kernel: EFLAGS: 00010246   (2.6.16-rc5-sara3 #1)
>> Mar 27 12:28:25 ifs2 kernel: EIP is at elect_master+0x34/0x41 [cman]
> 
> That cman crash looks nasty, though it may be related to "disabing the
> heartbeat-network interface". Is this the node you are referring to ?
> 

As i read the thread this must be the node that i disabled the 
heartbeat-network. So the other nodes could fence this node and they did
but the other nodes also crashed.

Regards


-- 
--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



From Alain.Moulle at bull.net  Tue Apr 11 10:58:30 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 11 Apr 2006 12:58:30 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <443B8BD6.80906@bull.net>

>>Hi
>>
>>>> >>
>>>> >> I'm trying to configure a simple 3 nodes cluster with simple tests
>>>> >> scripts.
>>>> >> But I can't start cman, it remains stalled with this message in
>>>> >> syslog :
>>>> >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr
>
>> 10 11:38:00
>
>>>> >> s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
>>>> >> 16:04:34) installed
>>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: Registered protocol
>>>> >> family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]:
>>>> >> cluster.conf (cluster name = HA_METADATA_3N, version = 8) found.
>>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to
>
>> join or form
>
>>>> >> a Linux-cluster Apr 10 11:38:01 s_sys at yack21
>>>> >> ccsd[25004]: Connected to cluster infrastruture
>>>> >> via: CMAN/SM Plugin v1.1.2
>>>> >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status::
>>>> >> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN:
>>>> >> forming a new cluster
>>>> >>
>>>> >> and nothing more.
>>>> >>
>>>> >> The graphic tool dos not detect any error in configuration; I 've
>>>> >> attached my cluster.conf for the three nodes, knowing that
>
>> I wanted
>
>>>> >> two nodes (yack10 and yack21) running theirs applications
>
>> and the 3rd
>
>>>> >> one (yack23) as a backup for yack10 and/or yack21, but I
>
>> don't want
>
>>>> >> any failover between yack10 and yack21.
>>>> >>
>>>> >> PS : I 've verified all ssh connections between the 3
>
>> nodes, and all
>
>>>> >> the fence paths as described in the cluster.conf.
>>>> >> Thanks again for your help.
>>>> >>
>>>> >> Alain
>>>> >>
>
>>
>>
>
>>> >Are you starting the cman on all three nodes in the same
>
>> time? A node
>
>>> >doesn't start until each other node is starting. Timing is
>
>> important during booting.
>>
>
>>> >Leandro
>
>>
>> Hi, no I wasn't ...
>> I've tried now, and this is ok on yack21 and yack23, but not
>> on yack10, is there something wrong in the cluster.conf to
>> explain this behavior ?
>> On yack10 , cman is trying to :
>> CMAN: forming a new cluster
>> but fails with a timeout ...
>>
>> ??
>> Thanks
>> Alain
>> --
>>


>Maybe this time is due to a firewall setup, as already stated on the list. A
>tcpdump from yack10 to the other nodes may help you catch the bug.
>Leandro

No firewall setup on yack10, neither on yack21 nor yack23. Besides
the ssh connections are all valid between the three nodes in all
combinations without passwd request. And still the problem ...
Any other idea ?
Is my cluster.conf correct ?

Besides, with regard to you first answer, I've tested on yack21 and yack23 :
if I start cman only on yack21, it does end in timeout.
And if I start cman quite at the same time on yack21 and yack23, it
works on both nodes.
I haven't found in documentation any recommandation about this point.
Besides, if one node is breakdowned, that mean that we will never be
able to reboot the other node and launch the CS4 again with all
applications ... sounds strange, doesn't it ?

Thanks
Alain Moull?





From pcaulfie at redhat.com  Tue Apr 11 11:52:23 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 11 Apr 2006 12:52:23 +0100
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
In-Reply-To: <443B8BD6.80906@bull.net>
References: <443B8BD6.80906@bull.net>
Message-ID: <443B9877.2020505@redhat.com>

Alain Moulle wrote:
>> Maybe this time is due to a firewall setup, as already stated on the list. A
>> tcpdump from yack10 to the other nodes may help you catch the bug.
>> Leandro
> 
> No firewall setup on yack10, neither on yack21 nor yack23. Besides
> the ssh connections are all valid between the three nodes in all
> combinations without passwd request. And still the problem ...
> Any other idea ?
> Is my cluster.conf correct ?
> 
> Besides, with regard to you first answer, I've tested on yack21 and yack23 :
> if I start cman only on yack21, it does end in timeout.
> And if I start cman quite at the same time on yack21 and yack23, it
> works on both nodes.
> I haven't found in documentation any recommandation about this point.
> Besides, if one node is breakdowned, that mean that we will never be
> able to reboot the other node and launch the CS4 again with all
> applications ... sounds strange, doesn't it ?
>

Can you be a little clearer exactly what you mean by this? and post some exact
messages please. It's not clear to me now just what your problem is.

>From your initial post it sounded like the nodes in the cluster were forming
separate clusters, but that last sentence makes it sound like you're seeing
something else.

-- 

patrick



From l.dardini at comune.prato.it  Tue Apr 11 12:48:35 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Tue, 11 Apr 2006 14:48:35 +0200
Subject: R: [Linux-cluster] CS4 U2 / problem to configure a 3 nodes cluster
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFAFC@exchange2.comune.prato.local>

 

> -----Messaggio originale-----
> Da: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] Per conto di Alain Moulle
> Inviato: marted? 11 aprile 2006 12.59
> A: linux-cluster at redhat.com
> Oggetto: R: [Linux-cluster] CS4 U2 / problem to configure a 3 
> nodes cluster
> 
> >>Hi
> >>
> >>>> >>
> >>>> >> I'm trying to configure a simple 3 nodes cluster with simple 
> >>>> >> tests scripts.
> >>>> >> But I can't start cman, it remains stalled with this 
> message in 
> >>>> >> syslog :
> >>>> >> Apr 10 11:37:44 s_sys at yack21 ccsd: startup succeeded Apr
> >
> >> 10 11:38:00
> >
> >>>> >> s_kernel at yack21 kernel: CMAN 2.6.9-39.5 (built Sep 20 2005
> >>>> >> 16:04:34) installed
> >>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: NET: 
> Registered protocol 
> >>>> >> family 30 Apr 10 11:38:00 s_sys at yack21 ccsd[25004]:
> >>>> >> cluster.conf (cluster name = HA_METADATA_3N, version 
> = 8) found.
> >>>> >> Apr 10 11:38:00 s_kernel at yack21 kernel: CMAN: Waiting to
> >
> >> join or form
> >
> >>>> >> a Linux-cluster Apr 10 11:38:01 s_sys at yack21
> >>>> >> ccsd[25004]: Connected to cluster infrastruture
> >>>> >> via: CMAN/SM Plugin v1.1.2
> >>>> >> Apr 10 11:38:01 s_sys at yack21 ccsd[25004]: Initial status::
> >>>> >> Inquorate Apr 10 11:38:32 s_kernel at yack21 kernel: CMAN:
> >>>> >> forming a new cluster
> >>>> >>
> >>>> >> and nothing more.
> >>>> >>
> >>>> >> The graphic tool dos not detect any error in configuration; I 
> >>>> >> 've attached my cluster.conf for the three nodes, knowing that
> >
> >> I wanted
> >
> >>>> >> two nodes (yack10 and yack21) running theirs applications
> >
> >> and the 3rd
> >
> >>>> >> one (yack23) as a backup for yack10 and/or yack21, but I
> >
> >> don't want
> >
> >>>> >> any failover between yack10 and yack21.
> >>>> >>
> >>>> >> PS : I 've verified all ssh connections between the 3
> >
> >> nodes, and all
> >
> >>>> >> the fence paths as described in the cluster.conf.
> >>>> >> Thanks again for your help.
> >>>> >>
> >>>> >> Alain
> >>>> >>
> >
> >>
> >>
> >
> >>> >Are you starting the cman on all three nodes in the same
> >
> >> time? A node
> >
> >>> >doesn't start until each other node is starting. Timing is
> >
> >> important during booting.
> >>
> >
> >>> >Leandro
> >
> >>
> >> Hi, no I wasn't ...
> >> I've tried now, and this is ok on yack21 and yack23, but not on 
> >> yack10, is there something wrong in the cluster.conf to 
> explain this 
> >> behavior ?
> >> On yack10 , cman is trying to :
> >> CMAN: forming a new cluster
> >> but fails with a timeout ...
> >>
> >> ??
> >> Thanks
> >> Alain
> >> --
> >>
> 
> 
> >Maybe this time is due to a firewall setup, as already stated on the 
> >list. A tcpdump from yack10 to the other nodes may help you 
> catch the bug.
> >Leandro
> 
> No firewall setup on yack10, neither on yack21 nor yack23. 
> Besides the ssh connections are all valid between the three 
> nodes in all combinations without passwd request. And still 
> the problem ...
> Any other idea ?
> Is my cluster.conf correct ?
> 
> Besides, with regard to you first answer, I've tested on 
> yack21 and yack23 :
> if I start cman only on yack21, it does end in timeout.
> And if I start cman quite at the same time on yack21 and 
> yack23, it works on both nodes.
> I haven't found in documentation any recommandation about this point.
> Besides, if one node is breakdowned, that mean that we will 
> never be able to reboot the other node and launch the CS4 
> again with all applications ... sounds strange, doesn't it ?
> 

No, this doesn't sound strange. Cluster must be quorate to operate. Quorum can be reduced while a node is down, fencing it or simply removing it, by cman or by hand editing cluster.conf. Try this: start all the node without cman, gfs and other GFS suite packages. Then start by hand, one a time on each node, ccsd, cman, lock_gulm(?), fenced, clvmd and rgmanager init scripts. After each run, check the /var/log/messages output and connectivity between nodes. Unfortunately the configuration is far different from the one I use, so I cannot help you.

Leandro



From akpinar_haydar at hotmail.com  Tue Apr 11 05:17:53 2006
From: akpinar_haydar at hotmail.com (Haydar Akpinar)
Date: Tue, 11 Apr 2006 05:17:53 +0000
Subject: [Linux-cluster] Linux (qmail) clustering
Message-ID: <BAY12-F26339F96CEBFB8663D7FCF9CD0@phx.gbl>

Hello every one.  I am a newbe so don't really know much about UNIX nor 
Linux for that matter

I have been asked to do a high availability qmail(non LDAP) clustering which 
is running on Redhat 9.

I would like to know if it is possible to do and also if any one has done 
qmail clustering on a Linux box.

And if any one can direct me with finding the information on How To

Thanks for your time.

_________________________________________________________________
Hava durumunu bizden ?grenin ve evden ?yle ?ikin! 
http://www.msn.com.tr/havadurumu/



From ocrete at max-t.com  Tue Apr 11 14:06:40 2006
From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=)
Date: Tue, 11 Apr 2006 10:06:40 -0400
Subject: [Linux-cluster] cman kickout out nodes for no good reason
In-Reply-To: <443B5F28.1060004@redhat.com>
References: <1144341281.355.38.camel@cocagne.max-t.internal>
	<1144702908.21093.7.camel@cocagne.max-t.internal>
	<443B5F28.1060004@redhat.com>
Message-ID: <1144764400.9106.3.camel@TesterBox.tester.ca>

On Tue, 2006-11-04 at 08:47 +0100, Patrick Caulfield wrote:
> Olivier Cr?te wrote:
> > On Thu, 2006-06-04 at 12:34 -0400, Olivier Cr?te wrote:
> >> I have a strange problem where cman suddenly starts kicking out members
> >> of the cluster with "Inconsistent cluster view" when I join a new node
> >> (sometimes).  It takes a few minutes between each kicking. I'm using a
> >> snapshot for March 12th of the STABLE branch on 2.6.16. The cluster is
> >> in transition state at that point and I can't stop/start services or do
> >> anything else. It did not do that with a snapshot I took a few months
> >> ago.
> > 
> > Its still happening, the node that joins says "Transition master
> > unknown", while all of the other nodes who the master is, then the
> > master gets kicked out. Then a new master is selected, all of the nodes
> > seem to know who the master is, but refuse to act on it. After a while,
> > the new master is kicked out and the process restarts. I guess its
> > related to the changes with the timestamps to prevent master desync, I
> > dont see any other recent change that could have caused it.
> > 
> 
> That's very peculiar behaviour, and it's going to be hard to pin down. How
> consistently does it happen ?

Often, but I haven't found the exact sequence to reproduce it.

> It could be caused by extreme network packet loss, or something blocking the
> progress of cman processes. Are the already joined nodes very busy when you
> bring the new node into the cluster (if so, doing what?)

I doubt its packet loss since cman is running over myrinet's ethernet/ip
layer and its the only user of that port (so it shouldn't be affected by
the rest of the traffic over the myrinet). The other nodes may be busy,
but the CPU isn't at 100% us on any of them, although the PCIX bus may
be used a lot.

> I think the best way to try and track this down is to get a tcpdump of the
> cluster traffic (port 6809/udp) happening at the time of the join - make sure
> that all nodes are included in the dump and that all of the packet is captured.

I will try to get a tcpdump.

Thanks for you help,

-- 
Olivier Cr?te
ocrete at max-t.com
Maximum Throughput Inc.



From mbrookov at mines.edu  Tue Apr 11 14:49:04 2006
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Tue, 11 Apr 2006 08:49:04 -0600
Subject: [Linux-cluster] Cisco fence agent
In-Reply-To: <a51e1ec20604080923m778e8fbbtd6ab207666d6ac3c@mail.gmail.com>
References: <a51e1ec20604080923m778e8fbbtd6ab207666d6ac3c@mail.gmail.com>
Message-ID: <1144766944.16956.10.camel@merlin.Mines.EDU>

I do not know if this will help, but here is what I put together.

We have 3 Cisco 3750 switches.  I am currently using SNMP to turn off
the ports of a host that is being fenced.  I wrote a perl script called
fence_cisco that works with GFS 6.  I have attached a copy of
fence_cisco to this message and its config file.  I do not have much in
the way of documentation for it, and it will probably take some hacking
to get it to work with a current version of GFS.  If you know a little
perl, writing a fencing agent is not very difficult.

I have also included a copy for the config file for fence_cisco.  The
first two lines specify the SNMP community string and the IP address for
the switch.  The rest is a list of hosts and the ports they use.  You
will have to talk to your local network guru to figure out Cisco
community strings and the numbers involved.  It took some tinkering to
figure out how Cisco does this stuff, and even after writing the code, I
am still not sure that I understand it.  I do know that it does work,
GFS does do the correct things during a crash.

Most people use one of the power supply switches.  Redhat provides the
fence_apc agent that will turn off the power to a node that needs to be
fenced.  I like the network option because the host that is having
problems will be able to write log entries after it has been fenced.

You will need to get the Net::SNMP module from cpan.org to use
fence_cisco.
Matt


On Sun, 2006-04-09 at 01:23 +0900, ??? wrote:

> Hi all.
> Do anyone have cisco catalyst fence agent?
> If nobody make that, I will make.
> 
> Thanks.
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060411/f8bc2ac3/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence_cisco
Type: application/x-perl
Size: 10442 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060411/f8bc2ac3/attachment.pl>
-------------- next part --------------
community:YOURSTRINGHERE
switch:1.1.1.1
imagine:GigabitEthernet1/0/9:GigabitEthernet2/0/9:GigabitEthernet1/0/5
illuminate:GigabitEthernet2/0/10:GigabitEthernet3/0/9:GigabitEthernet2/0/6
illusion:GigabitEthernet1/0/10:GigabitEthernet3/0/10:GigabitEthernet1/0/6
inception:GigabitEthernet1/0/11:GigabitEthernet2/0/11:GigabitEthernet1/0/7
inspire:GigabitEthernet2/0/12:GigabitEthernet3/0/11:GigabitEthernet2/0/8
incantation:GigabitEthernet1/0/12:GigabitEthernet3/0/12:GigabitEthernet1/0/8

From carlopmart at gmail.com  Tue Apr 11 15:01:16 2006
From: carlopmart at gmail.com (carlopmart)
Date: Tue, 11 Apr 2006 17:01:16 +0200
Subject: [Linux-cluster] Question about manual fencing
In-Reply-To: <443A80D2.6050806@adelpha-lan.org>
References: <443A7F34.7000901@gmail.com> <443A80D2.6050806@adelpha-lan.org>
Message-ID: <443BC4BC.1030405@gmail.com>

Thanks Jerome.

Castang Jerome wrote:
> carlopmart a ?crit :
> 
>> Hi all,
>>
>>  I would like to test manual fencing on two nodes for testing 
>> pourposes.  I have read RedHat's docs about this but I don't see very 
>> clear. If I setup manual fencing, when one node shutdowns, the other 
>> node startups all services that I have configured on the another node 
>> automatically?
>>
>> Thanks.
>>
> 
> I don't think so.
> Fencing a node is to stop it, or make it leaving the cluster (using any 
> method like shutdown...)
> So if you use manual fencing, the other nodes will not start automaticly 
> their services...
> 
> 

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From basv at sara.nl  Tue Apr 11 15:35:47 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 17:35:47 +0200
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
	(2.6.16)
In-Reply-To: <Pine.LNX.4.63.0604110858320.15939@tungsten.msp.technicality.org>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
	<Pine.LNX.4.63.0604110858320.15939@tungsten.msp.technicality.org>
Message-ID: <C0459316-33A1-4B3D-9005-1F74AD42EFF4@sara.nl>


On Apr 11, 2006, at 3:58 PM, Nate Carlson wrote:

> On Mon, 10 Apr 2006, Bas van der Vlies wrote:
>> I have compiled deb-packages for kernel 2.6.16.2 and uses the CVS  
>> STABLE branch.
>
> Do you have the source packages? It'd be really handy to be able to  
> build module packages. :)
>
>
I did not make source packages,  its is a good suggestion, but i use  
gfs from CVS and use different kind of kernels.  So i regularly
make new versions.

For every package i creates a debian directory and i made i global  
script that compiles everything and make debian packages
  - for the kernel modules, the kernel version is in the package
- for the user space tools i only update the version number.

Regards

--
Bas van der Vlies
basv at sara.nl





From natecars at natecarlson.com  Tue Apr 11 15:37:58 2006
From: natecars at natecarlson.com (Nate Carlson)
Date: Tue, 11 Apr 2006 10:37:58 -0500 (CDT)
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
	(2.6.16)
In-Reply-To: <C0459316-33A1-4B3D-9005-1F74AD42EFF4@sara.nl>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
	<Pine.LNX.4.63.0604110858320.15939@tungsten.msp.technicality.org>
	<C0459316-33A1-4B3D-9005-1F74AD42EFF4@sara.nl>
Message-ID: <Pine.LNX.4.63.0604111036430.15939@tungsten.msp.technicality.org>

On Tue, 11 Apr 2006, Bas van der Vlies wrote:
> I did not make source packages, its is a good suggestion, but i use gfs 
> from CVS and use different kind of kernels.  So i regularly make new 
> versions.
>
> For every package i creates a debian directory and i made i global script 
> that compiles everything and make debian packages
> - for the kernel modules, the kernel version is in the package
> - for the user space tools i only update the version number.

Would you mind sharing the scripts? That'd make my life a bit easier when 
packaging GFS for debian.  :)

------------------------------------------------------------------------
| nate carlson | natecars at natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------



From basv at sara.nl  Tue Apr 11 15:43:37 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 11 Apr 2006 17:43:37 +0200
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
	(2.6.16)
In-Reply-To: <Pine.LNX.4.63.0604111036430.15939@tungsten.msp.technicality.org>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
	<Pine.LNX.4.63.0604110858320.15939@tungsten.msp.technicality.org>
	<C0459316-33A1-4B3D-9005-1F74AD42EFF4@sara.nl>
	<Pine.LNX.4.63.0604111036430.15939@tungsten.msp.technicality.org>
Message-ID: <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>


On Apr 11, 2006, at 5:37 PM, Nate Carlson wrote:

> On Tue, 11 Apr 2006, Bas van der Vlies wrote:
>> I did not make source packages, its is a good suggestion, but i  
>> use gfs from CVS and use different kind of kernels.  So i  
>> regularly make new versions.
>>
>> For every package i creates a debian directory and i made i global  
>> script that compiles everything and make debian packages
>> - for the kernel modules, the kernel version is in the package
>> - for the user space tools i only update the version number.
>
> Would you mind sharing the scripts? That'd make my life a bit  
> easier when packaging GFS for debian.  :)
>

No problem, I have to package it and make it available on our ftp- 
server.   If find bug or have improvements mail them.
I will send an email to list if i have made release ;-)

Regards


--
Bas van der Vlies
basv at sara.nl





From natecars at natecarlson.com  Tue Apr 11 15:43:59 2006
From: natecars at natecarlson.com (Nate Carlson)
Date: Tue, 11 Apr 2006 10:43:59 -0500 (CDT)
Subject: [Offlist] Re: [Linux-cluster] Using GFS with vanilla kernel
	(2.6.16)
In-Reply-To: <2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
	<Pine.LNX.4.63.0604110858320.15939@tungsten.msp.technicality.org>
	<C0459316-33A1-4B3D-9005-1F74AD42EFF4@sara.nl>
	<Pine.LNX.4.63.0604111036430.15939@tungsten.msp.technicality.org>
	<2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>
Message-ID: <Pine.LNX.4.63.0604111043530.19238@tungsten.msp.technicality.org>

On Tue, 11 Apr 2006, Bas van der Vlies wrote:
> No problem, I have to package it and make it available on our ftp-server. 
> If find bug or have improvements mail them.
> I will send an email to list if i have made release ;-)

Great - thanks! :)

------------------------------------------------------------------------
| nate carlson | natecars at natecarlson.com | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------



From jbrassow at redhat.com  Tue Apr 11 15:48:25 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Tue, 11 Apr 2006 10:48:25 -0500
Subject: [Linux-cluster] Question about manual fencing
In-Reply-To: <443BC4BC.1030405@gmail.com>
References: <443A7F34.7000901@gmail.com> <443A80D2.6050806@adelpha-lan.org>
	<443BC4BC.1030405@gmail.com>
Message-ID: <634f53a0e00f383b47d142f530b9dbf7@redhat.com>

manual fencing gets it's name because it requires manual 
intervention... that is, it is not automatic.

  brassow

On Apr 11, 2006, at 10:01 AM, carlopmart wrote:

> Thanks Jerome.
>
> Castang Jerome wrote:
>> carlopmart a ?crit :
>>> Hi all,
>>>
>>>  I would like to test manual fencing on two nodes for testing 
>>> pourposes.  I have read RedHat's docs about this but I don't see 
>>> very clear. If I setup manual fencing, when one node shutdowns, the 
>>> other node startups all services that I have configured on the 
>>> another node automatically?
>>>
>>> Thanks.
>>>
>> I don't think so.
>> Fencing a node is to stop it, or make it leaving the cluster (using 
>> any method like shutdown...)
>> So if you use manual fencing, the other nodes will not start 
>> automaticly their services...
>
> -- 
> CL Martinez
> carlopmart {at} gmail {d0t} com
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>




From Alain.Moulle at bull.net  Tue Apr 11 15:56:02 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 11 Apr 2006 17:56:02 +0200
Subject: [Linux-cluster] CS4 Update 2 / cluster 3 noeuds question
Message-ID: <443BD192.1000407@bull.net>

Hi
Finally I've found the problem (a bad alias in /etc/hosts !).

But I've another question :
As told before, I have yack10 and yack23 with each one a service
to run, and yack23 as backup for both nodes (see attached cluster.conf)

I've tested with a poweroff on yack10 and the service
is well failoverd on yack23. But then I tried to
do poweroff on yack21, but it does not failover
because "missing two many heart beats".
I suspect that it is normal because we have only
one node left among the three, and so there is
not enough votes ...
But I would like to have a confirmation ?

And if so, is there a way to configure so that
yack23 could failover the services of both
other nodes stopped at the same time ?

Thanks
Alain

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: text/xml
Size: 2015 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060411/558a4dd3/attachment.xml>

From teigland at redhat.com  Tue Apr 11 16:52:59 2006
From: teigland at redhat.com (David Teigland)
Date: Tue, 11 Apr 2006 11:52:59 -0500
Subject: [Linux-cluster] cluster-1.02.00
Message-ID: <20060411165259.GB5820@redhat.com>

A new source tarball from the STABLE branch has been released; it builds
and runs on 2.6.16:

  ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz

Version 1.02.00 - 10 April 2006
===============================
  dlm-kernel: Allow DLM to start if the node gets a different nodeid.
  dlm-kernel: Add WARNING printk when cman calls emergency_shutdown.
  dlm-kernel: The in_recovery semaphore wasn't being released in corner case
  where grant message is ignored for lock being unlocked.
  dlm-kernel: Remove an assertion that triggers unnecessarily in rare
  cases of overlapping and invalid master lookups.
  dlm-kernel: Don't close existing connection if a double-connect is
  attempted - just ignore the last one.
  dlm-kernel: Fix a race where an attempt to unlock a lock in the completion
  AST routine could crash on SMP.
  dlm-kernel: Fix transient hangs that could be caused by incorrect handling
  of locks granted due to ALTMODE. bz#178738
  dlm-kernel: Allow any old user to create the default lockspace.  You need Udev
  running AND build dlm with ./configure --have_udev.
  dlm-kernel: Only release a lockspace if all users have closed it. bz#177934
  cman-kernel: Fix cman master confusion during recovery. bz#158592
  cman-kernel: Add printk to assert failure when a nodeid lookup fails.
  cman-kernel: Give an interface "max-retries" attempts to get fixed after
  an error before we give up and shut down the cluster.
  cman-kernel: IPv6 FF1x:: multicast addresses don't work.  Always send out
  of the locally bound address. bz#166752
  cman-kernel: Ignore really badly delayed old duplicates that might get
  sent via a bonded interface. bz#173621
  cman-kernel: /proc/cluster/services seq_start needs to initialise the pointer,
  we may not be starting from the beginning every time. bz#175372
  cman-kernel: Fix memory leak when reading from /proc/cluster/nodes or
  /proc/cluster/services. bz#178367
  cman-kernel: Send a userspace notification when we are the last node in
  a cluster. bz#182233
  cman-kernel: add quorum device interface for userspace
  cman-kernel: Add node ID to /proc/cluster/status
  cman: Allow "cman_tool leave force" to cause cman to leave the cluster
  even if it's in transition or joining.
  cman: Look over more than 16 interfaces when searching for the broadcast
  address.
  cman: init script does 'cman_tool leave remove' on stop
  cman: add cman_get/set_private to libcman
  cman: add quorum device API to libcman
  gfs-kernel: Fix performance with sync mount option; pages were not being
  flushed when gfs_writepage is called. bz#173147
  gfs-kernel: Flush pages into storage in case of DirectIO falling back to
  BufferIO.  DirectIO reads were sometimes getting stale data.
  gfs-kernel: Make sendfile work with stuffed inodes; after a write on
  stuffed inode, mark cached page as not uptodate.  bz#142849
  gfs-kernel: Fix spot where the quota_enforce setting is ignored.
  gfs-kernel: Fix case of big allocation slowdown.  The allocator could end
  up failing its passive attempts to lock all recent rgrps because another
  node had deallocated from them and was caching the locks.  The allocator now
  switches from passive to forceful requests after try_threshold failures.
  gfs-kernel: Fix rare case of bad NFS file handles leading to stale file
  handle errors. bz#178469
  gfs-kernel: Properly handle error return code from verify_jhead().
  gfs-kernel: Fix possible umount panic due to the ordering of log flushes
  and log shutdown.  bz#164331, bz#178469
  gfs-kernel: Fix directory delete out of memory error.  bz#182057
  gfs-kernel: Return code was not being propagated while setting default
  ACLs causing an EPERM everytime. bz#182066
  gulm: Fix bug that would cause luck_gulmd to not call waitpid unless
  SIGCHLD was received from the child. bz#171246
  gulm:	Fix problems with host lookups.  Now try to match the ip if we are
  unable to match the name of a lock server as well as fixing the expiration
  of locks if gulm somehow gets a FQDN. bz#169171
  fence/fenced: Multiple devices in one method were not being translated
  into multiple calls to an agent, but all the device data was lumped together
  for one agent call. bz#172401
  fence/fence_apc: Make agent work with 7900 series apc switches. bz#172441
  fence/fence_ipmilan: fixes for bz#178314
  fence/fence_drac: support for drac 4/I
  fence/fence_drac: interface change in drac_mc firmware version 1.2
  fence: Add support for IBM rsa fence agent
  gnbd-kernel: gnbd_monitor wouldn't correctly reset after an uncached gnbd had
  failed and been restored. bz#155304
  gnbd-kernel: kill gnbd_monitor when all uncached gnbds have been removed.
  bz#127042
  gnbd: changes to let multipath run over gnbd.
  gfs_fsck: Fix small window where another node can mount during a gfs_fsck.
  bz#169087
  gfs_fsck: gfs_fsck crashed on many types of extended attribute corruptions.
  bz#173697
  gfs_fsck: Check result code and handle failure's in fsck rgrp read code.
  bz#169340
  gfs_fsck: fix errors checking large (multi-TB) filesystems. bz#186125
  gfs_edit: new version with more options that uses ncurses.
  ccs: Make ccs connection descriptors time out, fixing a problem where all
  descriptors could be used up, even though none are in use.
  ccs: Increase number of connection descriptors from 10 to 30.
  ccs: Ignore SIGPIPE, don't catch SIGSEV, allowing for core dumps.
  ccs: endian fixes for clusters of machines with different endianness
  ccs: Fix error printing. bz#178812
  ccs: fix ccs_tool seg fault on upgrade. bz#186121
  magma-plugins/sm: Fix reads of /proc/cluster/services. bz#175033
  magma-plugins/gulm: Fix clu_lock() return value that resulted in
  "Resource temporarily unavailable" messages at times. bz#171253
  rgmanager: Add support for inheritance in the form "type%attribute"
  instead of just attribute so as to avoid confusion.
  rgmanager: Fix bz#150346 - Clustat usability problems
  rgmanager: Fix bz#170859 - VIPs show up on multiple members.
  rgmanager: Fix bz#171034 - Missing: Monitoring for local and cluster fs's
  rgmanager: Fix bz#171036 - RFE: Log messages in resource agents
  rgmanager: Fix bz#165447 - ip.sh fails when using VLAN on bonded interface
  rgmanager: Fix bz#171153 - clustat withholds information if run on multiple
  members simultaneously
  rgmanager: Fix bz#171236 - ia64 alignment warnings
  rgmanager: Fix bz#173526 - Samba Resource Agent
  rgmanager: Fix bz#173916 - rgmanager log level change requires restart
  rgmanager: Fix bz#174819 - clustat crashes if ccsd is not running
  rgmanager: Fix bz#175106 - lsof -b blocks when using gethostbyname causing
  slow force-unmount when DNS is broken
  rgmanager: Fix bz#175108 - rgmanager storing extraneous info using VF
  rgmanager: Fix bz#175114 - rgmanager uses wrong stop-order for unspecified
  resource agents
  rgmanager: Implement bz#175215: Inherit fsid for nfs exports
  rgmanager: Fix bz#175229 - remove unneeded references to clurmtabd; it is no
  longer a necessary piece for NFS failover
  rgmanager: Fix bz#176343 - __builtin_return_address(x) for x>0 is never
  guaranteed to work
  rgmanager: Ensure rgmanager doesn't block SIGSEGV when debug is not enabled.
  rgmanager: Fix bz#172177, bz#172178
  rgmanager: Allow scripts to inherit the name attr of a parent in case the
  script wants to know it. bz#172310
  rgmanager: Fix #166109 - random segfault in clurgmgrd
  rgmanager: Fix most of 177467 - clustat hang



From gstaltari at arnet.net.ar  Tue Apr 11 19:25:20 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Tue, 11 Apr 2006 16:25:20 -0300
Subject: [Linux-cluster] cluster-1.02.00
In-Reply-To: <20060411165259.GB5820@redhat.com>
References: <20060411165259.GB5820@redhat.com>
Message-ID: <443C02A0.5010103@arnet.net.ar>

David Teigland wrote:
> A new source tarball from the STABLE branch has been released; it builds
> and runs on 2.6.16:
>
>   ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
>
> Version 1.02.00 - 10 April 2006
> ===============================
>   dlm-kernel: Allow DLM to start if the node gets a different nodeid.
>   dlm-kernel: Add WARNING printk when cman calls emergency_shutdown.
>   dlm-kernel: The in_recovery semaphore wasn't being released in corner case
>   where grant message is ignored for lock being unlocked.
>   dlm-kernel: Remove an assertion that triggers unnecessarily in rare
>   cases of overlapping and invalid master lookups.
>   dlm-kernel: Don't close existing connection if a double-connect is
>   attempted - just ignore the last one.
>   dlm-kernel: Fix a race where an attempt to unlock a lock in the completion
>   AST routine could crash on SMP.
>   dlm-kernel: Fix transient hangs that could be caused by incorrect handling
>   of locks granted due to ALTMODE. bz#178738
>   dlm-kernel: Allow any old user to create the default lockspace.  You need Udev
>   running AND build dlm with ./configure --have_udev.
>   dlm-kernel: Only release a lockspace if all users have closed it. bz#177934
>   cman-kernel: Fix cman master confusion during recovery. bz#158592
>   cman-kernel: Add printk to assert failure when a nodeid lookup fails.
>   cman-kernel: Give an interface "max-retries" attempts to get fixed after
>   an error before we give up and shut down the cluster.
>   cman-kernel: IPv6 FF1x:: multicast addresses don't work.  Always send out
>   of the locally bound address. bz#166752
>   cman-kernel: Ignore really badly delayed old duplicates that might get
>   sent via a bonded interface. bz#173621
>   cman-kernel: /proc/cluster/services seq_start needs to initialise the pointer,
>   we may not be starting from the beginning every time. bz#175372
>   cman-kernel: Fix memory leak when reading from /proc/cluster/nodes or
>   /proc/cluster/services. bz#178367
>   cman-kernel: Send a userspace notification when we are the last node in
>   a cluster. bz#182233
>   cman-kernel: add quorum device interface for userspace
>   cman-kernel: Add node ID to /proc/cluster/status
>   cman: Allow "cman_tool leave force" to cause cman to leave the cluster
>   even if it's in transition or joining.
>   cman: Look over more than 16 interfaces when searching for the broadcast
>   address.
>   cman: init script does 'cman_tool leave remove' on stop
>   cman: add cman_get/set_private to libcman
>   cman: add quorum device API to libcman
>   gfs-kernel: Fix performance with sync mount option; pages were not being
>   flushed when gfs_writepage is called. bz#173147
>   gfs-kernel: Flush pages into storage in case of DirectIO falling back to
>   BufferIO.  DirectIO reads were sometimes getting stale data.
>   gfs-kernel: Make sendfile work with stuffed inodes; after a write on
>   stuffed inode, mark cached page as not uptodate.  bz#142849
>   gfs-kernel: Fix spot where the quota_enforce setting is ignored.
>   gfs-kernel: Fix case of big allocation slowdown.  The allocator could end
>   up failing its passive attempts to lock all recent rgrps because another
>   node had deallocated from them and was caching the locks.  The allocator now
>   switches from passive to forceful requests after try_threshold failures.
>   gfs-kernel: Fix rare case of bad NFS file handles leading to stale file
>   handle errors. bz#178469
>   gfs-kernel: Properly handle error return code from verify_jhead().
>   gfs-kernel: Fix possible umount panic due to the ordering of log flushes
>   and log shutdown.  bz#164331, bz#178469
>   gfs-kernel: Fix directory delete out of memory error.  bz#182057
>   gfs-kernel: Return code was not being propagated while setting default
>   ACLs causing an EPERM everytime. bz#182066
>   gulm: Fix bug that would cause luck_gulmd to not call waitpid unless
>   SIGCHLD was received from the child. bz#171246
>   gulm:	Fix problems with host lookups.  Now try to match the ip if we are
>   unable to match the name of a lock server as well as fixing the expiration
>   of locks if gulm somehow gets a FQDN. bz#169171
>   fence/fenced: Multiple devices in one method were not being translated
>   into multiple calls to an agent, but all the device data was lumped together
>   for one agent call. bz#172401
>   fence/fence_apc: Make agent work with 7900 series apc switches. bz#172441
>   fence/fence_ipmilan: fixes for bz#178314
>   fence/fence_drac: support for drac 4/I
>   fence/fence_drac: interface change in drac_mc firmware version 1.2
>   fence: Add support for IBM rsa fence agent
>   gnbd-kernel: gnbd_monitor wouldn't correctly reset after an uncached gnbd had
>   failed and been restored. bz#155304
>   gnbd-kernel: kill gnbd_monitor when all uncached gnbds have been removed.
>   bz#127042
>   gnbd: changes to let multipath run over gnbd.
>   gfs_fsck: Fix small window where another node can mount during a gfs_fsck.
>   bz#169087
>   gfs_fsck: gfs_fsck crashed on many types of extended attribute corruptions.
>   bz#173697
>   gfs_fsck: Check result code and handle failure's in fsck rgrp read code.
>   bz#169340
>   gfs_fsck: fix errors checking large (multi-TB) filesystems. bz#186125
>   gfs_edit: new version with more options that uses ncurses.
>   ccs: Make ccs connection descriptors time out, fixing a problem where all
>   descriptors could be used up, even though none are in use.
>   ccs: Increase number of connection descriptors from 10 to 30.
>   ccs: Ignore SIGPIPE, don't catch SIGSEV, allowing for core dumps.
>   ccs: endian fixes for clusters of machines with different endianness
>   ccs: Fix error printing. bz#178812
>   ccs: fix ccs_tool seg fault on upgrade. bz#186121
>   magma-plugins/sm: Fix reads of /proc/cluster/services. bz#175033
>   magma-plugins/gulm: Fix clu_lock() return value that resulted in
>   "Resource temporarily unavailable" messages at times. bz#171253
>   rgmanager: Add support for inheritance in the form "type%attribute"
>   instead of just attribute so as to avoid confusion.
>   rgmanager: Fix bz#150346 - Clustat usability problems
>   rgmanager: Fix bz#170859 - VIPs show up on multiple members.
>   rgmanager: Fix bz#171034 - Missing: Monitoring for local and cluster fs's
>   rgmanager: Fix bz#171036 - RFE: Log messages in resource agents
>   rgmanager: Fix bz#165447 - ip.sh fails when using VLAN on bonded interface
>   rgmanager: Fix bz#171153 - clustat withholds information if run on multiple
>   members simultaneously
>   rgmanager: Fix bz#171236 - ia64 alignment warnings
>   rgmanager: Fix bz#173526 - Samba Resource Agent
>   rgmanager: Fix bz#173916 - rgmanager log level change requires restart
>   rgmanager: Fix bz#174819 - clustat crashes if ccsd is not running
>   rgmanager: Fix bz#175106 - lsof -b blocks when using gethostbyname causing
>   slow force-unmount when DNS is broken
>   rgmanager: Fix bz#175108 - rgmanager storing extraneous info using VF
>   rgmanager: Fix bz#175114 - rgmanager uses wrong stop-order for unspecified
>   resource agents
>   rgmanager: Implement bz#175215: Inherit fsid for nfs exports
>   rgmanager: Fix bz#175229 - remove unneeded references to clurmtabd; it is no
>   longer a necessary piece for NFS failover
>   rgmanager: Fix bz#176343 - __builtin_return_address(x) for x>0 is never
>   guaranteed to work
>   rgmanager: Ensure rgmanager doesn't block SIGSEGV when debug is not enabled.
>   rgmanager: Fix bz#172177, bz#172178
>   rgmanager: Allow scripts to inherit the name attr of a parent in case the
>   script wants to know it. bz#172310
>   rgmanager: Fix #166109 - random segfault in clurgmgrd
>   rgmanager: Fix most of 177467 - clustat hang
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>   
It would be nice to have the rpm for FC4 from this new update.
TIA
German



From gregp at liveammo.com  Wed Apr 12 03:13:31 2006
From: gregp at liveammo.com (Greg Perry)
Date: Tue, 11 Apr 2006 23:13:31 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <443C705B.6020606@liveammo.com>

Hello,

I have been researching GFS for a few days, and I have some questions 
that hopefully some seasoned users of GFS may be able to answer.

I am working on the design of a linux cluster that needs to be scalable, 
it will be primarily an RDBMS-driven data warehouse used for data mining 
and content indexing.  In an ideal world, we would be able to start with 
a small (say 4 node) cluster, then add machines (and storage) as the 
various RDBMS' grow in size (as well as the use virtual IPs for load 
balancing across multiple lighttpd instances.  All machines on the node 
need to be able to talk to the same volume of information, and GFS (in 
theory at least) would be used to aggregate the drives from each machine 
into that huge shared logical volume).

With that being said, here are some questions:

1) What is the preference on the RDBMS, will MySQL 5.x work and are 
there any locking issues to consider?  What would the best open source 
RDBMS be (MySQL vs. Postgresql etc)
2) If there was a 10 machine cluster, each with a 300GB SATA drive, can 
you use GFS to aggregate all 10 drives into one big logical 3000GB 
volume?  Would that scenario work similar to a RAID array?  If one or 
two nodes fail, but the GFS quorum is maintained, can those nodes be 
replaced and repopulated just like a RAID-5 array?  If this scenario is 
possible, how difficult is it to "grow" the shared logical volume by 
adding additional nodes (say I had two more machines each with a 300GB 
SATA drive)?
3) How stable is GFS currently, and is it used in many production 
environments?
4) How stable is the FC5 version, and does it include all of the 
configuration utilities in the RH Enterprise Cluster version?  (the idea 
would be to prove the point on FC5, then migrate to RH Enterprise).
5) Would CentOS be preferred over FC5 for the initial proof of concept 
and early adoption?
6) Are there any restrictions or performance advantages of using all 
drives with the same geometry, or can you mix and match different size 
drives and just add to the aggregate volume size?

Thanks in advance,

Greg



From pcaulfie at redhat.com  Wed Apr 12 07:06:17 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 12 Apr 2006 08:06:17 +0100
Subject: [Linux-cluster] CS4 Update 2 / cluster 3 noeuds question
In-Reply-To: <443BD192.1000407@bull.net>
References: <443BD192.1000407@bull.net>
Message-ID: <443CA6E9.9000402@redhat.com>

Alain Moulle wrote:
> Hi
> Finally I've found the problem (a bad alias in /etc/hosts !).
> 
> But I've another question :
> As told before, I have yack10 and yack23 with each one a service
> to run, and yack23 as backup for both nodes (see attached cluster.conf)
> 
> I've tested with a poweroff on yack10 and the service
> is well failoverd on yack23. But then I tried to
> do poweroff on yack21, but it does not failover
> because "missing two many heart beats".
> I suspect that it is normal because we have only
> one node left among the three, and so there is
> not enough votes ...
> But I would like to have a confirmation ?

Yes, that's correct. If you have a three-node cluster then there needs to be
two active nodes for it to have quorum. Otherwise single nodes could split
form "clusters" on their own and corrupt the filesystem (in the case of GFS)

> And if so, is there a way to configure so that
> yack23 could failover the services of both
> other nodes stopped at the same time ?
> 


-- 

patrick



From kumaresh81 at yahoo.co.in  Wed Apr 12 08:12:24 2006
From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy)
Date: Wed, 12 Apr 2006 09:12:24 +0100 (BST)
Subject: [Linux-cluster] a doubt on quorums
Message-ID: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com>

Hi,
   
  I have a problem with my cluster and quorum settings and any help will be appreciated.
   
  I have a five node cluster with quorum vote of 1 for all the 5 nodes. They have a GFS shared file system on all the five nodes, and, two domains and two services involving two nodes.
   
  When I shut down the 3 nodes that don't participate in the two domains and clustered services, both the services stop and fail to start when tried manually also.
   
  I guess it is something to do with the quorum settings, but not sure on the way forward.
   
  The environment is on RHEL AS 4U2 with GFS 6.1 and RHCS 4U2.
   
  Regards,
  Kumaresh

				
---------------------------------
 Jiyo cricket on Yahoo! India cricket
Yahoo! Messenger Mobile Stay in touch with your buddies all the time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060412/4e543e85/attachment.htm>

From placid at adelpha-lan.org  Wed Apr 12 08:18:20 2006
From: placid at adelpha-lan.org (Castang Jerome)
Date: Wed, 12 Apr 2006 10:18:20 +0200
Subject: [Linux-cluster] a doubt on quorums
In-Reply-To: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com>
References: <20060412081224.71455.qmail@web8318.mail.in.yahoo.com>
Message-ID: <443CB7CC.5@adelpha-lan.org>

Kumaresh Ponnuswamy a ?crit :
> Hi,
>  
> I have a problem with my cluster and quorum settings and any help will 
> be appreciated.
>  
> I have a five node cluster with quorum vote of 1 for all the 5 nodes. 
> They have a GFS shared file system on all the five nodes, and, two 
> domains and two services involving two nodes.
>  
> When I shut down the 3 nodes that don't participate in the two domains 
> and clustered services, both the services stop and fail to start when 
> tried manually also.
>  
> I guess it is something to do with the quorum settings, but not sure 
> on the way forward.
>  
> The environment is on RHEL AS 4U2 with GFS 6.1 and RHCS 4U2.
>  
> Regards,
> Kumaresh
>
> ------------------------------------------------------------------------
If you have 3 nodes of 5 falling down, your cluster becomes a two node 
cluster.
So, as it is written in documentation, it's a "special cluster" and it 
has to be specified (in cluster.conf or by this command "can_tool join -2")
When you have a two node cluster, it is possible that  each node  is  
isolated (this is the "splitbrain" ).


-- 
Jerome CASTANG
Tel: 06.85.74.33.02
mail: jerome.castang at adelpha-lan.org

---------------------------------------------
Comme le dit un vieu proverbe chinois: RTFM !



From erwan at seanodes.com  Wed Apr 12 08:18:48 2006
From: erwan at seanodes.com (Velu Erwan)
Date: Wed, 12 Apr 2006 10:18:48 +0200
Subject: [Linux-cluster] cluster-1.02.00
In-Reply-To: <20060411165259.GB5820@redhat.com>
References: <20060411165259.GB5820@redhat.com>
Message-ID: <443CB7E8.3020508@seanodes.com>

David Teigland a ?crit :

>A new source tarball from the STABLE branch has been released; it builds
>and runs on 2.6.16:
>  
>
Is it possible to split the kernel part from the binaries part in the 
make process ?

If yes, it could helps to have a dkms package that help us to use this 
release in an easiest way ;o)
My build host don't have the same kernel source as my nodes, so I'd like 
to build the binaries on it and then generate the dkms package.

When you install this dkms package on a new host, the kernel part of gfs 
recompiles itself.. This is very usefull ;)
Erwan,



From basv at sara.nl  Wed Apr 12 08:37:30 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Wed, 12 Apr 2006 10:37:30 +0200
Subject: [Linux-cluster] ANNOUNCE: gfs_2_deb utils initial version
In-Reply-To: <Pine.LNX.4.63.0604111043530.19238@tungsten.msp.technicality.org>
References: <f29fd8170604100725j4008656btaa0f6880ffa2f583@mail.gmail.com>
	<443A6CB1.7010307@adelpha-lan.org>
	<f29fd8170604100739n42ddb178kc33e793ac9bd334f@mail.gmail.com>
	<443A70EE.4070907@adelpha-lan.org>
	<f29fd8170604100757y5983d943w70784f8cfa0e8269@mail.gmail.com>
	<443A790E.1040002@sara.nl>
	<Pine.LNX.4.63.0604110858320.15939@tungsten.msp.technicality.org>
	<C0459316-33A1-4B3D-9005-1F74AD42EFF4@sara.nl>
	<Pine.LNX.4.63.0604111036430.15939@tungsten.msp.technicality.org>
	<2553AC38-C5BC-4C10-95CE-8CFB0F85E0A6@sara.nl>
	<Pine.LNX.4.63.0604111043530.19238@tungsten.msp.technicality.org>
Message-ID: <443CBC4A.5080607@sara.nl>

= gfs_2_deb - utilities =

This is a release of the SARA package gfs_2_deb that contains utilities that
we use to make debian packages from the RedHat Cluster Software (GFS).

All init.d scripts in the debian package start at runlevel 3 and the scripts
start in the right order. We have choosen this setup for these reasons, 
default
runlevel is 2:
  1) When a node is fenced, the node is rebooted and is ready for 
cluster mode.
  2) We can easily switch from runlevels, join or leave the cluster

See README for further info

The package can be downloaded at:
	ftp://ftp.sara.nl/pub/outgoing/gfs_2_deb-0.1.tar.gz

Regards




-- 
--
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



From deval.kulshrestha at progression.com  Wed Apr 12 08:57:41 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Wed, 12 Apr 2006 14:27:41 +0530
Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
	file system resources?
Message-ID: <004501c65e0f$2afde300$7600a8c0@PROGRESSION>

Hi

 

I am using one MSA 500 G2 , two no. of HP DL360 G4P server with HP's HBA
642, Server installed with RHEL 4 ES U1 and RHCS4 with lock mgr as DLM

I have to run around 14 different services in HA mode, I have break them up
in two different priority domain. 

Now 7 services runs on node1 in HA mode, node2 is failover host for them, 

Remaining  7 services runs on node2 in HA mode and node1 is failover domain
for them.

In my scenario Simultaneous logical drive access is not required, thus I am
not using GFS here

What ever is needed is configured properly and working fine.

 

But this cluster is still causes some data inconsistency error if somebody
manually mounts the partitions which is already in access by other node.

I understand that this is against the basics of non-shared file system. This
can be documented also, but everybody knows that after 2-3 yrs down the line
when support staff replaced by new people, when they come in with very
limited understanding about the running stuff they can do some mount
mistake.(umount is a document screw up, but mount is here undocumented screw
up) every body knows mount is just a simple command, it does not harm
anything, if I just want to read data mount is ok. But in our case we wanted
to restrict other users to use mount command when some logical volume is
already mounted on one node.

 

I want some help on this, when shared file system is not implemented. How we
can restrict manual mount of cluster file system resources when its being in
use by some cluster services?

 

Any help would be highly appreciable here.

 

 

With regard

Deval K.

 

 

 

 

 

===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060412/8535115b/attachment.htm>

From kumaresh81 at yahoo.co.in  Wed Apr 12 10:03:38 2006
From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy)
Date: Wed, 12 Apr 2006 11:03:38 +0100 (BST)
Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
	file system resources?
In-Reply-To: <004501c65e0f$2afde300$7600a8c0@PROGRESSION>
Message-ID: <20060412100338.70764.qmail@web8327.mail.in.yahoo.com>

Hi,
   
  In your case, I guess removing the SUID on mount for normal users is the best solution.
   
  This is will prevent non root members from mounting the file systesm.
   
  Regards,
  Kumaresh

Deval kulshrestha <deval.kulshrestha at progression.com> wrote:
                Hi
   
  I am using one MSA 500 G2 , two no. of HP DL360 G4P server with HP?s HBA 642, Server installed with RHEL 4 ES U1 and RHCS4 with lock mgr as DLM
  I have to run around 14 different services in HA mode, I have break them up in two different priority domain. 
  Now 7 services runs on node1 in HA mode, node2 is failover host for them, 
  Remaining  7 services runs on node2 in HA mode and node1 is failover domain for them.
  In my scenario Simultaneous logical drive access is not required, thus I am not using GFS here
  What ever is needed is configured properly and working fine.
   
  But this cluster is still causes some data inconsistency error if somebody manually mounts the partitions which is already in access by other node.
  I understand that this is against the basics of non-shared file system. This can be documented also, but everybody knows that after 2-3 yrs down the line when support staff replaced by new people, when they come in with very limited understanding about the running stuff they can do some mount mistake.(umount is a document screw up, but mount is here undocumented screw up) every body knows mount is just a simple command, it does not harm anything, if I just want to read data mount is ok. But in our case we wanted to restrict other users to use mount command when some logical volume is already mounted on one node.
   
  I want some help on this, when shared file system is not implemented. How we can restrict manual mount of cluster file system resources when its being in use by some cluster services?
   
  Any help would be highly appreciable here.
   
   
  With regard
  Deval K.
   
   
   
   
   


===========================================================  Privileged or confidential information may be contained  in this message. If you are not the addressee indicated  in this message (or responsible for delivery of the   message to such person), please delete this message and  kindly notify the sender by an emailed reply. Opinions,  conclusions and other information in this message that  do not relate to the official business of Progression  and its associate entities shall be understood as neither  given nor endorsed by them.        -------------------------------------------------------------  Progression Infonet Private Limited, Gurgaon (Haryana), India  
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

				
---------------------------------
 Jiyo cricket on Yahoo! India cricket
Yahoo! Messenger Mobile Stay in touch with your buddies all the time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060412/2ee5c1ec/attachment.htm>

From deval.kulshrestha at progression.com  Wed Apr 12 10:59:13 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Wed, 12 Apr 2006 16:29:13 +0530
Subject: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
	file system resources?
In-Reply-To: <20060412100338.70764.qmail@web8327.mail.in.yahoo.com>
Message-ID: <005501c65e20$22437b10$7600a8c0@PROGRESSION>

Hi Kumaresh

Thanks for the reply/inputs

SAN LUN's are not defined in /etc/fstab. They don't have to be mounted while
OS boots. SAN volumes are the part of Cluster resources groups, they are in
control of Cluster services rgmanager.

 

I did not understand how we can make it work, please suggest how we can go
ahead.

 

Regards

Deval

 

 

-----Original Message-----
From: Kumaresh Ponnuswamy [mailto:kumaresh81 at yahoo.co.in] 
Sent: Wednesday, April 12, 2006 3:34 PM
To: Deval kulshrestha; linux clustering
Subject: Re: [Linux-cluster] RE: how to dis-allow manual mounting of cluster
file system resources?

 

Hi,

 

In your case, I guess removing the SUID on mount for normal users is the
best solution.

 

This is will prevent non root members from mounting the file systesm.

 

Regards,

Kumaresh

Deval kulshrestha <deval.kulshrestha at progression.com> wrote:

Hi

 

I am using one MSA 500 G2 , two no. of HP DL360 G4P server with HP's HBA
642, Server installed with RHEL 4 ES U1 and RHCS4 with lock mgr as DLM

I have to run around 14 different services in HA mode, I have break them up
in two different priority domain. 

Now 7 services runs on node1 in HA mode, node2 is failover host for them, 

Remaining  7 services runs on node2 in HA mode and node1 is failover domain
for them.

In my scenario Simultaneous logical drive access is not required, thus I am
not using GFS here

What ever is needed is configured properly and working fine.

 

But this cluster is still causes some data inconsistency error if somebody
manually mounts the partitions which is already in access by other node.

I understand that this is against the basics of non-shared file system. This
can be documented also, but everybody knows that after 2-3 yrs down the line
when support staff replaced by new people, when they come in with very
limited understanding about the running stuff they can do some mount
mistake.(umount is a document screw up, but mount is here undocumented screw
up) every body knows mount is just a simple command, it does not harm
anything, if I just want to read data mount is ok. But in our case we wanted
to restrict other users to use mount command when some logical volume is
already mounted on one node.

 

I want some help on this, when shared file system is not implemented. How we
can restrict manual mount of cluster file system resources when its being in
use by some cluster services?

 

Any help would be highly appreciable here.

 

 

With regard

Deval K.

 

 

 

 

 

===========================================================  Privileged or
confidential information may be contained  in this message. If you are not
the addressee indicated  in this message (or responsible for delivery of the
message to such person), please delete this message and  kindly notify the
sender by an emailed reply. Opinions,  conclusions and other information in
this message that  do not relate to the official business of Progression
and its associate entities shall be understood as neither  given nor
endorsed by them.
-------------------------------------------------------------  Progression
Infonet Private Limited,
 Gurgaon (Haryana), India  

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

 

  _____  

Jiyo cricket on Yahoo!
<http://us.rd.yahoo.com/mail/in/mailcricket/*http:/in.sports.yahoo.com/crick
et/>  India cricket
Yahoo!
<http://us.rd.yahoo.com/mail/in/mailmobilemessenger/*http:/in.mobile.yahoo.c
om/new/messenger/>  Messenger Mobile Stay in touch with your buddies all the
time.

===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060412/14971b7a/attachment.htm>

From Bowie_Bailey at BUC.com  Wed Apr 12 14:56:13 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 10:56:13 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>

Greg Perry wrote:
> 
> I have been researching GFS for a few days, and I have some questions
> that hopefully some seasoned users of GFS may be able to answer.
> 
> I am working on the design of a linux cluster that needs to be
> scalable, it will be primarily an RDBMS-driven data warehouse used
> for data mining and content indexing.  In an ideal world, we would be
> able to start with a small (say 4 node) cluster, then add machines
> (and storage) as the various RDBMS' grow in size (as well as the use
> virtual IPs for load balancing across multiple lighttpd instances. 
> All machines on the node need to be able to talk to the same volume
> of information, and GFS (in theory at least) would be used to
> aggregate the drives from each machine into that huge shared logical
> volume). 
> 
> With that being said, here are some questions:
> 
> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
> there any locking issues to consider?  What would the best open source
> RDBMS be (MySQL vs. Postgresql etc)

Someone more qualified than me will have to answer that question.

> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
> can you use GFS to aggregate all 10 drives into one big logical 3000GB
> volume?  Would that scenario work similar to a RAID array?  If one or
> two nodes fail, but the GFS quorum is maintained, can those nodes be
> replaced and repopulated just like a RAID-5 array?  If this scenario
> is possible, how difficult is it to "grow" the shared logical volume
> by adding additional nodes (say I had two more machines each with a
> 300GB SATA drive)?

GFS doesn't work that way.  GFS is just a fancy filesystem.  It takes
an already shared volume and allows all of the nodes to access it at
the same time.

> 3) How stable is GFS currently, and is it used in many production
> environments?

It seems to be stable for me, but we are still in testing mode at the
moment.

> 4) How stable is the FC5 version, and does it include all of the
> configuration utilities in the RH Enterprise Cluster version?  (the
> idea would be to prove the point on FC5, then migrate to RH
> Enterprise).

Haven't used that one.

> 5) Would CentOS be preferred over FC5 for the initial
> proof of concept and early adoption?

If your eventual platform is RHEL, then CentOS would make more sense
for a testing platform since it is almost identical to RHEL.  Fedora
can be less stable and may introduce some issues that you wouldn't have
with RHEL.  On the other hand, RHEL may have some problems that don't
appear on Fedora because of updated packages.

If you want bleeding edge, use Fedora.
If you want stability, use CentOS or RHEL.

> 6) Are there any restrictions or performance advantages of using all
> drives with the same geometry, or can you mix and match different size
> drives and just add to the aggregate volume size?

As I said earlier, GFS does not do the aggregation.

What you get with GFS is the ability to share an already networked
storage volume.  You can use iSCSI, AoE, GNBD, or others to connect
the storage to all of the cluster nodes.  Then you format the volume
with GFS so that it can be used with all of the nodes.

I believe there is a project for the aggregate filesystem that you are
looking for, but as far as I know, it is still beta.

-- 
Bowie



From gregp at liveammo.com  Wed Apr 12 15:21:27 2006
From: gregp at liveammo.com (Greg Perry)
Date: Wed, 12 Apr 2006 11:21:27 -0400
Subject: [Linux-cluster] Questions about GFS
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
Message-ID: <443D1AF7.8090105@liveammo.com>

Thanks Bowie, I understand more now.  So within this architecture, it 
would make more sense to utilize a RAID-5/10 SAN, then add diskless 
workstations as needed for performance...?

For said diskless workstations, does it make sense to run Stateless 
Linux to keep the images the same across all of the workstations/client 
machines?

Regards

Greg

Bowie Bailey wrote:
> Greg Perry wrote:
>> I have been researching GFS for a few days, and I have some questions
>> that hopefully some seasoned users of GFS may be able to answer.
>>
>> I am working on the design of a linux cluster that needs to be
>> scalable, it will be primarily an RDBMS-driven data warehouse used
>> for data mining and content indexing.  In an ideal world, we would be
>> able to start with a small (say 4 node) cluster, then add machines
>> (and storage) as the various RDBMS' grow in size (as well as the use
>> virtual IPs for load balancing across multiple lighttpd instances. 
>> All machines on the node need to be able to talk to the same volume
>> of information, and GFS (in theory at least) would be used to
>> aggregate the drives from each machine into that huge shared logical
>> volume). 
>>
>> With that being said, here are some questions:
>>
>> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
>> there any locking issues to consider?  What would the best open source
>> RDBMS be (MySQL vs. Postgresql etc)
> 
> Someone more qualified than me will have to answer that question.
> 
>> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
>> can you use GFS to aggregate all 10 drives into one big logical 3000GB
>> volume?  Would that scenario work similar to a RAID array?  If one or
>> two nodes fail, but the GFS quorum is maintained, can those nodes be
>> replaced and repopulated just like a RAID-5 array?  If this scenario
>> is possible, how difficult is it to "grow" the shared logical volume
>> by adding additional nodes (say I had two more machines each with a
>> 300GB SATA drive)?
> 
> GFS doesn't work that way.  GFS is just a fancy filesystem.  It takes
> an already shared volume and allows all of the nodes to access it at
> the same time.
> 
>> 3) How stable is GFS currently, and is it used in many production
>> environments?
> 
> It seems to be stable for me, but we are still in testing mode at the
> moment.
> 
>> 4) How stable is the FC5 version, and does it include all of the
>> configuration utilities in the RH Enterprise Cluster version?  (the
>> idea would be to prove the point on FC5, then migrate to RH
>> Enterprise).
> 
> Haven't used that one.
> 
>> 5) Would CentOS be preferred over FC5 for the initial
>> proof of concept and early adoption?
> 
> If your eventual platform is RHEL, then CentOS would make more sense
> for a testing platform since it is almost identical to RHEL.  Fedora
> can be less stable and may introduce some issues that you wouldn't have
> with RHEL.  On the other hand, RHEL may have some problems that don't
> appear on Fedora because of updated packages.
> 
> If you want bleeding edge, use Fedora.
> If you want stability, use CentOS or RHEL.
> 
>> 6) Are there any restrictions or performance advantages of using all
>> drives with the same geometry, or can you mix and match different size
>> drives and just add to the aggregate volume size?
> 
> As I said earlier, GFS does not do the aggregation.
> 
> What you get with GFS is the ability to share an already networked
> storage volume.  You can use iSCSI, AoE, GNBD, or others to connect
> the storage to all of the cluster nodes.  Then you format the volume
> with GFS so that it can be used with all of the nodes.
> 
> I believe there is a project for the aggregate filesystem that you are
> looking for, but as far as I know, it is still beta.
> 



From gregp at liveammo.com  Wed Apr 12 15:28:13 2006
From: gregp at liveammo.com (Greg Perry)
Date: Wed, 12 Apr 2006 11:28:13 -0400
Subject: [Linux-cluster] Questions about GFS
In-Reply-To: <443D1AF7.8090105@liveammo.com>
References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
	<443D1AF7.8090105@liveammo.com>
Message-ID: <443D1C8D.5080503@liveammo.com>

Also, after reviewing the GFS architecture it seems there would be 
significant security issues to consider, ie if one client/member of the 
GFS volume were compromised, that would lead to a full compromise of the 
filesystem across all nodes (and the ability to create special devices 
and modify the filesystem on any other GFS node member).  Are there any 
plans to include any form of discretionary or mandatory access controls 
for GFS in the upcoming v2 release?

Greg

Greg Perry wrote:
> Thanks Bowie, I understand more now.  So within this architecture, it 
> would make more sense to utilize a RAID-5/10 SAN, then add diskless 
> workstations as needed for performance...?
> 
> For said diskless workstations, does it make sense to run Stateless 
> Linux to keep the images the same across all of the workstations/client 
> machines?
> 
> Regards
> 
> Greg
> 
> Bowie Bailey wrote:
>> Greg Perry wrote:
>>> I have been researching GFS for a few days, and I have some questions
>>> that hopefully some seasoned users of GFS may be able to answer.
>>>
>>> I am working on the design of a linux cluster that needs to be
>>> scalable, it will be primarily an RDBMS-driven data warehouse used
>>> for data mining and content indexing.  In an ideal world, we would be
>>> able to start with a small (say 4 node) cluster, then add machines
>>> (and storage) as the various RDBMS' grow in size (as well as the use
>>> virtual IPs for load balancing across multiple lighttpd instances. 
>>> All machines on the node need to be able to talk to the same volume
>>> of information, and GFS (in theory at least) would be used to
>>> aggregate the drives from each machine into that huge shared logical
>>> volume).
>>> With that being said, here are some questions:
>>>
>>> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
>>> there any locking issues to consider?  What would the best open source
>>> RDBMS be (MySQL vs. Postgresql etc)
>>
>> Someone more qualified than me will have to answer that question.
>>
>>> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
>>> can you use GFS to aggregate all 10 drives into one big logical 3000GB
>>> volume?  Would that scenario work similar to a RAID array?  If one or
>>> two nodes fail, but the GFS quorum is maintained, can those nodes be
>>> replaced and repopulated just like a RAID-5 array?  If this scenario
>>> is possible, how difficult is it to "grow" the shared logical volume
>>> by adding additional nodes (say I had two more machines each with a
>>> 300GB SATA drive)?
>>
>> GFS doesn't work that way.  GFS is just a fancy filesystem.  It takes
>> an already shared volume and allows all of the nodes to access it at
>> the same time.
>>
>>> 3) How stable is GFS currently, and is it used in many production
>>> environments?
>>
>> It seems to be stable for me, but we are still in testing mode at the
>> moment.
>>
>>> 4) How stable is the FC5 version, and does it include all of the
>>> configuration utilities in the RH Enterprise Cluster version?  (the
>>> idea would be to prove the point on FC5, then migrate to RH
>>> Enterprise).
>>
>> Haven't used that one.
>>
>>> 5) Would CentOS be preferred over FC5 for the initial
>>> proof of concept and early adoption?
>>
>> If your eventual platform is RHEL, then CentOS would make more sense
>> for a testing platform since it is almost identical to RHEL.  Fedora
>> can be less stable and may introduce some issues that you wouldn't have
>> with RHEL.  On the other hand, RHEL may have some problems that don't
>> appear on Fedora because of updated packages.
>>
>> If you want bleeding edge, use Fedora.
>> If you want stability, use CentOS or RHEL.
>>
>>> 6) Are there any restrictions or performance advantages of using all
>>> drives with the same geometry, or can you mix and match different size
>>> drives and just add to the aggregate volume size?
>>
>> As I said earlier, GFS does not do the aggregation.
>>
>> What you get with GFS is the ability to share an already networked
>> storage volume.  You can use iSCSI, AoE, GNBD, or others to connect
>> the storage to all of the cluster nodes.  Then you format the volume
>> with GFS so that it can be used with all of the nodes.
>>
>> I believe there is a project for the aggregate filesystem that you are
>> looking for, but as far as I know, it is still beta.
>>
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From hlawatschek at atix.de  Wed Apr 12 15:36:46 2006
From: hlawatschek at atix.de (Mark Hlawatschek)
Date: Wed, 12 Apr 2006 17:36:46 +0200
Subject: [Linux-cluster] Questions about GFS
In-Reply-To: <443D1AF7.8090105@liveammo.com>
References: <4766EEE585A6D311ADF500E018C154E3021338DA@bnifex.cis.buc.com>
	<443D1AF7.8090105@liveammo.com>
Message-ID: <200604121736.46956.hlawatschek@atix.de>

Greg,

you can use a diskless shared root configuration with gfs. This setup would 
enable you to add cluster nodes as you need them.
Have a look at  http://www.open-sharedroot.org/ 

Mark

On Wednesday 12 April 2006 17:21, Greg Perry wrote:
> Thanks Bowie, I understand more now.  So within this architecture, it
> would make more sense to utilize a RAID-5/10 SAN, then add diskless
> workstations as needed for performance...?
>
> For said diskless workstations, does it make sense to run Stateless
> Linux to keep the images the same across all of the workstations/client
> machines?
>
> Regards
>
> Greg
>
> Bowie Bailey wrote:
> > Greg Perry wrote:
> >> I have been researching GFS for a few days, and I have some questions
> >> that hopefully some seasoned users of GFS may be able to answer.
> >>
> >> I am working on the design of a linux cluster that needs to be
> >> scalable, it will be primarily an RDBMS-driven data warehouse used
> >> for data mining and content indexing.  In an ideal world, we would be
> >> able to start with a small (say 4 node) cluster, then add machines
> >> (and storage) as the various RDBMS' grow in size (as well as the use
> >> virtual IPs for load balancing across multiple lighttpd instances.
> >> All machines on the node need to be able to talk to the same volume
> >> of information, and GFS (in theory at least) would be used to
> >> aggregate the drives from each machine into that huge shared logical
> >> volume).
> >>
> >> With that being said, here are some questions:
> >>
> >> 1) What is the preference on the RDBMS, will MySQL 5.x work and are
> >> there any locking issues to consider?  What would the best open source
> >> RDBMS be (MySQL vs. Postgresql etc)
> >
> > Someone more qualified than me will have to answer that question.
> >
> >> 2) If there was a 10 machine cluster, each with a 300GB SATA drive,
> >> can you use GFS to aggregate all 10 drives into one big logical 3000GB
> >> volume?  Would that scenario work similar to a RAID array?  If one or
> >> two nodes fail, but the GFS quorum is maintained, can those nodes be
> >> replaced and repopulated just like a RAID-5 array?  If this scenario
> >> is possible, how difficult is it to "grow" the shared logical volume
> >> by adding additional nodes (say I had two more machines each with a
> >> 300GB SATA drive)?
> >
> > GFS doesn't work that way.  GFS is just a fancy filesystem.  It takes
> > an already shared volume and allows all of the nodes to access it at
> > the same time.
> >
> >> 3) How stable is GFS currently, and is it used in many production
> >> environments?
> >
> > It seems to be stable for me, but we are still in testing mode at the
> > moment.
> >
> >> 4) How stable is the FC5 version, and does it include all of the
> >> configuration utilities in the RH Enterprise Cluster version?  (the
> >> idea would be to prove the point on FC5, then migrate to RH
> >> Enterprise).
> >
> > Haven't used that one.
> >
> >> 5) Would CentOS be preferred over FC5 for the initial
> >> proof of concept and early adoption?
> >
> > If your eventual platform is RHEL, then CentOS would make more sense
> > for a testing platform since it is almost identical to RHEL.  Fedora
> > can be less stable and may introduce some issues that you wouldn't have
> > with RHEL.  On the other hand, RHEL may have some problems that don't
> > appear on Fedora because of updated packages.
> >
> > If you want bleeding edge, use Fedora.
> > If you want stability, use CentOS or RHEL.
> >
> >> 6) Are there any restrictions or performance advantages of using all
> >> drives with the same geometry, or can you mix and match different size
> >> drives and just add to the aggregate volume size?
> >
> > As I said earlier, GFS does not do the aggregation.
> >
> > What you get with GFS is the ability to share an already networked
> > storage volume.  You can use iSCSI, AoE, GNBD, or others to connect
> > the storage to all of the cluster nodes.  Then you format the volume
> > with GFS so that it can be used with all of the nodes.
> >
> > I believe there is a project for the aggregate filesystem that you are
> > looking for, but as far as I know, it is still beta.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Gruss / Regards,

Dipl.-Ing. Mark Hlawatschek
Phone: +49-89 121 409-55
http://www.atix.de/
http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany



From Bowie_Bailey at BUC.com  Wed Apr 12 15:45:19 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 11:45:19 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DB@bnifex.cis.buc.com>

As someone else pointed out, it is possible to run diskless
workstations with their root on the GFS.  I haven't tried this
configuration, so I don't know what issues their may be.  The security
issue is there.  Since they are all running from the same disk, a
compromise on one can corrupt the entire cluster.

On my systems, I just have a small hard drive to hold the OS and
applications and then mount the GFS as a data partition.

Bowie

Greg Perry wrote:
> Also, after reviewing the GFS architecture it seems there would be
> significant security issues to consider, ie if one client/member of
> the GFS volume were compromised, that would lead to a full compromise
> of the filesystem across all nodes (and the ability to create special
> devices and modify the filesystem on any other GFS node member).  Are
> there any plans to include any form of discretionary or mandatory
> access controls for GFS in the upcoming v2 release?
> 
> Greg
> 
> Greg Perry wrote:
> > Thanks Bowie, I understand more now.  So within this architecture,
> > it would make more sense to utilize a RAID-5/10 SAN, then add
> > diskless workstations as needed for performance...?
> > 
> > For said diskless workstations, does it make sense to run Stateless
> > Linux to keep the images the same across all of the
> > workstations/client machines? 
> > 
> > Regards
> > 
> > Greg
> > 
> > Bowie Bailey wrote:
> > > Greg Perry wrote:
> > > > I have been researching GFS for a few days, and I have some
> > > > questions that hopefully some seasoned users of GFS may be able
> > > > to answer. 
> > > > 
> > > > I am working on the design of a linux cluster that needs to be
> > > > scalable, it will be primarily an RDBMS-driven data warehouse
> > > > used for data mining and content indexing.  In an ideal world,
> > > > we would be able to start with a small (say 4 node) cluster,
> > > > then add machines (and storage) as the various RDBMS' grow in
> > > > size (as well as the use virtual IPs for load balancing across
> > > > multiple lighttpd instances. All machines on the node need to
> > > > be able to talk to the same volume of information, and GFS (in
> > > > theory at least) would be used to aggregate the drives from
> > > > each machine into that huge shared logical volume). With that
> > > > being said, here are some questions: 
> > > > 
> > > > 1) What is the preference on the RDBMS, will MySQL 5.x work and
> > > > are there any locking issues to consider?  What would the best
> > > > open source RDBMS be (MySQL vs. Postgresql etc)
> > > 
> > > Someone more qualified than me will have to answer that question.
> > > 
> > > > 2) If there was a 10 machine cluster, each with a 300GB SATA
> > > > drive, can you use GFS to aggregate all 10 drives into one big
> > > > logical 3000GB volume?  Would that scenario work similar to a
> > > > RAID array?  If one or two nodes fail, but the GFS quorum is
> > > > maintained, can those nodes be replaced and repopulated just
> > > > like a RAID-5 array?  If this scenario is possible, how
> > > > difficult is it to "grow" the shared logical volume by adding
> > > > additional nodes (say I had two more machines each with a 300GB
> > > > SATA drive)? 
> > > 
> > > GFS doesn't work that way.  GFS is just a fancy filesystem.  It
> > > takes an already shared volume and allows all of the nodes to
> > > access it at the same time. 
> > > 
> > > > 3) How stable is GFS currently, and is it used in many
> > > > production environments?
> > > 
> > > It seems to be stable for me, but we are still in testing mode at
> > > the moment. 
> > > 
> > > > 4) How stable is the FC5 version, and does it include all of the
> > > > configuration utilities in the RH Enterprise Cluster version? 
> > > > (the idea would be to prove the point on FC5, then migrate to RH
> > > > Enterprise).
> > > 
> > > Haven't used that one.
> > > 
> > > > 5) Would CentOS be preferred over FC5 for the initial
> > > > proof of concept and early adoption?
> > > 
> > > If your eventual platform is RHEL, then CentOS would make more
> > > sense for a testing platform since it is almost identical to
> > > RHEL.  Fedora can be less stable and may introduce some issues
> > > that you wouldn't have with RHEL.  On the other hand, RHEL may
> > > have some problems that don't appear on Fedora because of updated
> > > packages. 
> > > 
> > > If you want bleeding edge, use Fedora.
> > > If you want stability, use CentOS or RHEL.
> > > 
> > > > 6) Are there any restrictions or performance advantages of
> > > > using all drives with the same geometry, or can you mix and
> > > > match different size drives and just add to the aggregate
> > > > volume size? 
> > > 
> > > As I said earlier, GFS does not do the aggregation.
> > > 
> > > What you get with GFS is the ability to share an already networked
> > > storage volume.  You can use iSCSI, AoE, GNBD, or others to
> > > connect the storage to all of the cluster nodes.  Then you format
> > > the volume with GFS so that it can be used with all of the nodes.
> > > 
> > > I believe there is a project for the aggregate filesystem that
> > > you are looking for, but as far as I know, it is still beta.
> > > 
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster



From Bowie_Bailey at BUC.com  Wed Apr 12 15:48:19 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 11:48:19 -0400
Subject: [Linux-cluster] Questions about GFS
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338DC@bnifex.cis.buc.com>

Also, keep in mind that the number of nodes is limited by the number
of journals on your GFS filesystem.  So when you create the
filesystem, you should add a few extra journals to accommodate
expansion.  If you run out, you have to add disks to the GFS in order
to create more journals.
Bowie

Mark Hlawatschek wrote:
> Greg,
> 
> you can use a diskless shared root configuration with gfs. This setup
> would enable you to add cluster nodes as you need them.
> Have a look at  http://www.open-sharedroot.org/
> 
> Mark
> 
> On Wednesday 12 April 2006 17:21, Greg Perry wrote:
> > Thanks Bowie, I understand more now.  So within this architecture,
> > it would make more sense to utilize a RAID-5/10 SAN, then add
> > diskless workstations as needed for performance...?
> > 
> > For said diskless workstations, does it make sense to run Stateless
> > Linux to keep the images the same across all of the
> > workstations/client machines? 



From tf0054 at gmail.com  Wed Apr 12 17:10:52 2006
From: tf0054 at gmail.com (Takeshi NAKANO)
Date: Thu, 13 Apr 2006 02:10:52 +0900
Subject: [Linux-cluster] Cisco fence agent
In-Reply-To: <1144766944.16956.10.camel@merlin.Mines.EDU>
References: <a51e1ec20604080923m778e8fbbtd6ab207666d6ac3c@mail.gmail.com>
	<1144766944.16956.10.camel@merlin.Mines.EDU>
Message-ID: <a51e1ec20604121010j3d95c5a9n1314a39c51b9fe40@mail.gmail.com>

Hello Matthew.

Thank for showing your code!
That is exactly same one which I will make.

> I like the network option because the host that is having problems
> will be able to write log entries after it has been fenced.

I can not agree more.

Thanks a lot.
Takeshi NAKANO.

2006/4/11, Matthew B. Brookover <mbrookov at mines.edu>:
>  I do not know if this will help, but here is what I put together.
>
>  We have 3 Cisco 3750 switches.  I am currently using SNMP to turn off the
> ports of a host that is being fenced.  I wrote a perl script called
> fence_cisco that works with GFS 6.  I have attached a copy of fence_cisco to
> this message and its config file.  I do not have much in the way of
> documentation for it, and it will probably take some hacking to get it to
> work with a current version of GFS.  If you know a little perl, writing a
> fencing agent is not very difficult.
>
>  I have also included a copy for the config file for fence_cisco.  The first
> two lines specify the SNMP community string and the IP address for the
> switch.  The rest is a list of hosts and the ports they use.  You will have
> to talk to your local network guru to figure out Cisco community strings and
> the numbers involved.  It took some tinkering to figure out how Cisco does
> this stuff, and even after writing the code, I am still not sure that I
> understand it.  I do know that it does work, GFS does do the correct things
> during a crash.
>
>  Most people use one of the power supply switches.  Redhat provides the
> fence_apc agent that will turn off the power to a node that needs to be
> fenced.  I like the network option because the host that is having problems
> will be able to write log entries after it has been fenced.
>
>  You will need to get the Net::SNMP module from cpan.org to use fence_cisco.
>  Matt
>
>
>
>  On Sun, 2006-04-09 at 01:23 +0900, ??? wrote:
>
>  Hi all. Do anyone have cisco catalyst fence agent? If nobody make that, I
> will make. Thanks.
> -- Linux-cluster mailing list Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From aaron at firebright.com  Wed Apr 12 19:25:49 2006
From: aaron at firebright.com (Aaron Stewart)
Date: Wed, 12 Apr 2006 12:25:49 -0700
Subject: [Linux-cluster] CLVM and AoE
Message-ID: <443D543D.2030202@firebright.com>

Hey All,

I'm currently in process of setting up a Coraid ATA over Ethernet device 
as a backend storage for multiple systems that export individual 
partitions to Xen virtual servers.  In our discussions with Coraid, they 
suggested looking into CLVM in order to handle this.

Obviously, I have some questions.. :)

- Has anyone used this kind of setup?  I have very little experience 
with Redhat's cluster management, but have a fairly high level of 
expertise overall in this arena.  
- How does management of LVM logical volumes occur?  Do we need to 
maintain one server that administers the volume group?
- What kind of pitfalls should we be aware of?

Can anyone point to any experience or any HOWTO's that discuss setting 
something like this up?

Here's the setup:

1. Coraid SR1520 configured in one lblade, exported via AoE on a 
dedicated storage network as one LUN
2. Centos4.2 on all cluster nodes
3. logical volumes get masked when getting passed into Xen, so on the 
Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which 
shows up in the virtual as /dev/sda1)
4. only one host need access to a given logical volume at any given 
time.  If migration needs to occur, the volume should be unmounted and 
remounted on another physical system.
5. Despite the fact that AoE is a layer 4 protocol, apparently it can 
coexist with IP on the same network interface, so we can transport 
cluster metadata over the same interface.  Barring that, there is a 
second (public) interface on each box.
6. We want to avoid a single point of failure (such as a second AoE 
server that exports luns from lvm lv's)

Thanks in advance..

-=Aaron Stewart
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aaron.vcf
Type: text/x-vcard
Size: 289 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060412/f69219b4/attachment.vcf>

From sanelson at gmail.com  Wed Apr 12 20:10:42 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 12 Apr 2006 21:10:42 +0100
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID: <b6131fdc0604121310n5f1ca01bk4d7a0eab0ed8faf1@mail.gmail.com>

Hi All,

I'm assuming that most of us on this list have used HP MSA kit, so
excuse me a slightly off-topic question!

I've got a cluster connected to an MSA1000, but want to make some
changes on the MSA1000 itself.

I've got a dumb terminal that runs procom, but its pretty horrid, so
I've connected the controller direct to the serial port of one of the
linux machines to use minicom.

As per HP's documentation, I've set it up as:

pr port             /dev/ttyS0
pu baudrate         19200
pu bits             8
pu parity           N
pu stopbits         1

However, I get no response.

Any ideas on how to troubleshoot?  Anyone got this working?

S.



From greg.freemyer at gmail.com  Wed Apr 12 20:18:06 2006
From: greg.freemyer at gmail.com (Greg Freemyer)
Date: Wed, 12 Apr 2006 16:18:06 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To: <b6131fdc0604121310n5f1ca01bk4d7a0eab0ed8faf1@mail.gmail.com>
References: <b6131fdc0604121310n5f1ca01bk4d7a0eab0ed8faf1@mail.gmail.com>
Message-ID: <87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com>

Did you try 9600 baud?

Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200.

I don't know what the HP stuff uses that is not from the old Dec
storageworks line.

On 4/12/06, Steve Nelson <sanelson at gmail.com> wrote:
> Hi All,
>
> I'm assuming that most of us on this list have used HP MSA kit, so
> excuse me a slightly off-topic question!
>
> I've got a cluster connected to an MSA1000, but want to make some
> changes on the MSA1000 itself.
>
> I've got a dumb terminal that runs procom, but its pretty horrid, so
> I've connected the controller direct to the serial port of one of the
> linux machines to use minicom.
>
> As per HP's documentation, I've set it up as:
>
> pr port             /dev/ttyS0
> pu baudrate         19200
> pu bits             8
> pu parity           N
> pu stopbits         1
>
> However, I get no response.
>
> Any ideas on how to troubleshoot?  Anyone got this working?
>
> S.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century



From cjk at techma.com  Wed Apr 12 20:28:30 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 12 Apr 2006 16:28:30 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E54@tmaemail.techma.com>

Turn off flow control if it's on, save the config as default and restart
minicom.

Also, make sure you are using the HP supplied cable and not some one off
or general serial cable. In true HP form, it's a custom cable... 

If that doesn't work, here are some things to check..

1. The HP cable is plugged into the _front_ of the MSA (the back is all
fibre)
2. Make sure your serial port is not being used by something else (serial
terminal)
3. umm, I dunno, these are pretty simple...


Good luck

Regards,



Corey 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson
Sent: Wednesday, April 12, 2006 4:11 PM
To: linux clustering
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000

Hi All,

I'm assuming that most of us on this list have used HP MSA kit, so excuse me
a slightly off-topic question!

I've got a cluster connected to an MSA1000, but want to make some changes on
the MSA1000 itself.

I've got a dumb terminal that runs procom, but its pretty horrid, so I've
connected the controller direct to the serial port of one of the linux
machines to use minicom.

As per HP's documentation, I've set it up as:

pr port             /dev/ttyS0
pu baudrate         19200
pu bits             8
pu parity           N
pu stopbits         1

However, I get no response.

Any ideas on how to troubleshoot?  Anyone got this working?

S.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cjk at techma.com  Wed Apr 12 20:29:05 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 12 Apr 2006 16:29:05 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E55@tmaemail.techma.com>

MSA1x00's use 19200...  it's an oddball


Regards


Corey 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Freemyer
Sent: Wednesday, April 12, 2006 4:18 PM
To: linux clustering
Subject: Re: [Linux-cluster] [OT] Serial Connection to MSA1000

Did you try 9600 baud?

Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200.

I don't know what the HP stuff uses that is not from the old Dec storageworks
line.

On 4/12/06, Steve Nelson <sanelson at gmail.com> wrote:
> Hi All,
>
> I'm assuming that most of us on this list have used HP MSA kit, so 
> excuse me a slightly off-topic question!
>
> I've got a cluster connected to an MSA1000, but want to make some 
> changes on the MSA1000 itself.
>
> I've got a dumb terminal that runs procom, but its pretty horrid, so 
> I've connected the controller direct to the serial port of one of the 
> linux machines to use minicom.
>
> As per HP's documentation, I've set it up as:
>
> pr port             /dev/ttyS0
> pu baudrate         19200
> pu bits             8
> pu parity           N
> pu stopbits         1
>
> However, I get no response.
>
> Any ideas on how to troubleshoot?  Anyone got this working?
>
> S.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cjk at techma.com  Wed Apr 12 20:30:42 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Wed, 12 Apr 2006 16:30:42 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E56@tmaemail.techma.com>

Could be that someone else changed the baud setting tho, so Greg has a good
point..

If someone used to 9600 worked on it, they might have changed it cuz the
default
wuz "wrong"  :)


Corey 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Greg Freemyer
Sent: Wednesday, April 12, 2006 4:18 PM
To: linux clustering
Subject: Re: [Linux-cluster] [OT] Serial Connection to MSA1000

Did you try 9600 baud?

Seems like all the Dec, I mean Compaq, I mean HP storage uses 9600 not 19200.

I don't know what the HP stuff uses that is not from the old Dec storageworks
line.

On 4/12/06, Steve Nelson <sanelson at gmail.com> wrote:
> Hi All,
>
> I'm assuming that most of us on this list have used HP MSA kit, so 
> excuse me a slightly off-topic question!
>
> I've got a cluster connected to an MSA1000, but want to make some 
> changes on the MSA1000 itself.
>
> I've got a dumb terminal that runs procom, but its pretty horrid, so 
> I've connected the controller direct to the serial port of one of the 
> linux machines to use minicom.
>
> As per HP's documentation, I've set it up as:
>
> pr port             /dev/ttyS0
> pu baudrate         19200
> pu bits             8
> pu parity           N
> pu stopbits         1
>
> However, I get no response.
>
> Any ideas on how to troubleshoot?  Anyone got this working?
>
> S.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From sanelson at gmail.com  Wed Apr 12 20:28:51 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 12 Apr 2006 21:28:51 +0100
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To: <87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com>
References: <b6131fdc0604121310n5f1ca01bk4d7a0eab0ed8faf1@mail.gmail.com>
	<87f94c370604121318y179cdd1as8bd8fc62d988ad99@mail.gmail.com>
Message-ID: <b6131fdc0604121328x3b097ae3hec816f7bb986a2ce@mail.gmail.com>

On 4/12/06, Greg Freemyer <greg.freemyer at gmail.com> wrote:
> Did you try 9600 baud?

I did...

I am assuming /dev/ttyS0 is correct - it only has one serial port!

S.



From sanelson at gmail.com  Wed Apr 12 20:40:49 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 12 Apr 2006 21:40:49 +0100
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E54@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E54@tmaemail.techma.com>
Message-ID: <b6131fdc0604121340g445a8692pdd2a857c93c0e634@mail.gmail.com>

On 4/12/06, Kovacs, Corey J. <cjk at techma.com> wrote:
> Turn off flow control if it's on, save the config as default and restart
> minicom.

Thanks very much.  I had turned off flow control, but saving as
default, and restarting appeared to make the difference.

Welcome to minicom 2.00.0

OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n
Compiled on Sep 12 2003, 17:33:22.

Press CTRL-A Z for help on special keys


Invalid CLI command.

CLI> AT S7=45 S0=0 L1 V1 X4 &c1 E1 Q0
Invalid CLI command.

CLI>

Incidentally, how do I get it not to send that dialling stuff?

> Corey

S.



From Bowie_Bailey at BUC.com  Wed Apr 12 20:59:26 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Wed, 12 Apr 2006 16:59:26 -0400
Subject: [Linux-cluster] CLVM and AoE
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>

Aaron Stewart wrote:
> 
> I'm currently in process of setting up a Coraid ATA over Ethernet
> device as a backend storage for multiple systems that export
> individual partitions to Xen virtual servers.  In our discussions
> with Coraid, they suggested looking into CLVM in order to handle this.
> 
> Obviously, I have some questions.. :)
> 
> - Has anyone used this kind of setup?  I have very little experience
> with Redhat's cluster management, but have a fairly high level of
> expertise overall in this arena.

I don't know anything about Xen, but I am using this same basic setup
on my systems.

> - How does management of LVM logical volumes occur?  Do we need to
> maintain one server that administers the volume group?

The management is distributed.  You can manage the cluster and volume
groups from any node.

> - What kind of pitfalls should we be aware of?

Some people have complained about throughput issues with GFS.  Our
application doesn't require high throughput, so I can't comment on
this.  I haven't found any issues in my testing so far.

> Can anyone point to any experience or any HOWTO's that discuss setting
> something like this up?

There are a few documents, but most of the ones that I've seen are out
of date.  If you have specific questions, you can ask here.

If you don't have it already, here is the yum config with the current
cluster RPMs for CentOS.  Just drop it in a file in /etc/yum.repos.d/.
Note that the current cluster RPMs are for the new 2.6.9-34.EL kernel.

----------------------------
[csgfs]
name=CentOS-4 - CSGFS
baseurl=http://mirror.centos.org/centos/$releasever/csgfs/$basearch/
gpgcheck=1
enabled=1
----------------------------

The only thing you need to build from source is the AoE driver from
CoRaid.

> Here's the setup:
> 
> 1. Coraid SR1520 configured in one lblade, exported via AoE on a
> dedicated storage network as one LUN
> 2. Centos4.2 on all cluster nodes
> 3. logical volumes get masked when getting passed into Xen, so on the
> Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which
> shows up in the virtual as /dev/sda1)
> 4. only one host need access to a given logical volume at any given
> time.  If migration needs to occur, the volume should be unmounted and
> remounted on another physical system.

This can be done, but the cluster will not do it for you.  Each
logical volume can be accessed by as many nodes as you need.  Note
that you need one GFS journal per node that needs simultaneous access.

> 5. Despite the fact that AoE is a layer 4 protocol, apparently it can
> coexist with IP on the same network interface, so we can transport
> cluster metadata over the same interface.  Barring that, there is a
> second (public) interface on each box.
> 6. We want to avoid a single point of failure (such as a second AoE
> server that exports luns from lvm lv's)

Now that DLM is the recommended locking manager, everything is
distributed.  Your only single point of failure is the CoRaid box.

-- 
Bowie



From aaron at firebright.com  Wed Apr 12 21:11:24 2006
From: aaron at firebright.com (Aaron Stewart)
Date: Wed, 12 Apr 2006 14:11:24 -0700
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
Message-ID: <443D6CFC.7000507@firebright.com>

Hey Bowie,

Wow.. That's perfect.  Thanks for the response.

I have a question about whether GFS is a requirement.. Since each lv is 
a separate partition mounted on xen, does GFS make sense, or can we use 
ext3/xfs/etc.?

-=Aaron

Bowie Bailey wrote:
> Aaron Stewart wrote:
>   
>> I'm currently in process of setting up a Coraid ATA over Ethernet
>> device as a backend storage for multiple systems that export
>> individual partitions to Xen virtual servers.  In our discussions
>> with Coraid, they suggested looking into CLVM in order to handle this.
>>
>> Obviously, I have some questions.. :)
>>
>> - Has anyone used this kind of setup?  I have very little experience
>> with Redhat's cluster management, but have a fairly high level of
>> expertise overall in this arena.
>>     
>
> I don't know anything about Xen, but I am using this same basic setup
> on my systems.
>
>   
>> - How does management of LVM logical volumes occur?  Do we need to
>> maintain one server that administers the volume group?
>>     
>
> The management is distributed.  You can manage the cluster and volume
> groups from any node.
>
>   
>> - What kind of pitfalls should we be aware of?
>>     
>
> Some people have complained about throughput issues with GFS.  Our
> application doesn't require high throughput, so I can't comment on
> this.  I haven't found any issues in my testing so far.
>
>   
>> Can anyone point to any experience or any HOWTO's that discuss setting
>> something like this up?
>>     
>
> There are a few documents, but most of the ones that I've seen are out
> of date.  If you have specific questions, you can ask here.
>
> If you don't have it already, here is the yum config with the current
> cluster RPMs for CentOS.  Just drop it in a file in /etc/yum.repos.d/.
> Note that the current cluster RPMs are for the new 2.6.9-34.EL kernel.
>
> ----------------------------
> [csgfs]
> name=CentOS-4 - CSGFS
> baseurl=http://mirror.centos.org/centos/$releasever/csgfs/$basearch/
> gpgcheck=1
> enabled=1
> ----------------------------
>
> The only thing you need to build from source is the AoE driver from
> CoRaid.
>
>   
>> Here's the setup:
>>
>> 1. Coraid SR1520 configured in one lblade, exported via AoE on a
>> dedicated storage network as one LUN
>> 2. Centos4.2 on all cluster nodes
>> 3. logical volumes get masked when getting passed into Xen, so on the
>> Dom0 controller it should look like /dev/VolGroup00/{xenvmID} (which
>> shows up in the virtual as /dev/sda1)
>> 4. only one host need access to a given logical volume at any given
>> time.  If migration needs to occur, the volume should be unmounted and
>> remounted on another physical system.
>>     
>
> This can be done, but the cluster will not do it for you.  Each
> logical volume can be accessed by as many nodes as you need.  Note
> that you need one GFS journal per node that needs simultaneous access.
>
>   
>> 5. Despite the fact that AoE is a layer 4 protocol, apparently it can
>> coexist with IP on the same network interface, so we can transport
>> cluster metadata over the same interface.  Barring that, there is a
>> second (public) interface on each box.
>> 6. We want to avoid a single point of failure (such as a second AoE
>> server that exports luns from lvm lv's)
>>     
>
> Now that DLM is the recommended locking manager, everything is
> distributed.  Your only single point of failure is the CoRaid box.
>
>   

-------------- next part --------------
A non-text attachment was scrubbed...
Name: aaron.vcf
Type: text/x-vcard
Size: 289 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060412/7961a6c6/attachment.vcf>

From mtp at tilted.com  Wed Apr 12 21:29:00 2006
From: mtp at tilted.com (Mark Petersen)
Date: Wed, 12 Apr 2006 16:29:00 -0500
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <443D6CFC.7000507@firebright.com>
References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
	<443D6CFC.7000507@firebright.com>
Message-ID: <7.0.1.0.2.20060412162416.028964f0@tilted.com>

At 04:11 PM 4/12/2006, you wrote:
>Hey Bowie,
>
>Wow.. That's perfect.  Thanks for the response.
>
>I have a question about whether GFS is a requirement.. Since each lv 
>is a separate partition mounted on xen, does GFS make sense, or can 
>we use ext3/xfs/etc.?

So is every dom0 going to mount the CoRaid device directly using 
AoE?  And CLVM will notify the whole cluster when any single node 
makes LVM changes?

If not, then you'll need to use GNBD to export the lv's I guess.

Either way you can use whatever fs you have support for in a xenU 
kernel.  You shouldn't need to format anything GFS at all. 



From lhh at redhat.com  Wed Apr 12 22:07:05 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 12 Apr 2006 18:07:05 -0400
Subject: [Linux-cluster] Help-me, Please
In-Reply-To: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com>
References: <9e7b71460604101657n1eebc099jfaabb5a08ebbc630@mail.gmail.com>
Message-ID: <1144879625.15794.48.camel@ayanami.boston.redhat.com>

On Mon, 2006-04-10 at 20:57 -0300, ANDRE LUIS FORIGATO wrote:
> Linux xlx2 2.4.21-27.0.2.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005
> i686 i686 i386 GNU/Linux

> Apr 10 01:18:07 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 05:13:43 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 05:13:49 xlx2 cluquorumd[4463]: <info> Disk-TB: Partner is DOWN
> (Dead/Hung)
> Apr 10 05:13:54 xlx2 cluquorumd[4463]: <info> Disk-TB: State Change: Partner UP
> Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 10:47:08 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 11:30:59 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:31:07 xlx2 clumembd[4493]: <info> Membership View #5:0x00000002
> Apr 10 11:31:08 xlx2 cluquorumd[4463]: <warning> Membership reports #0
> as down, but disk reports as up: State uncertain!
> Apr 10 11:31:08 xlx2 cluquorumd[4463]: <warning> --> Commencing STONITH <--
> Apr 10 11:31:08 xlx2 cluquorumd[4463]: <info> Disk-TB: Partner is DOWN
> (Dead/Hung)
> Apr 10 11:31:10 xlx2 cluquorumd[4463]: <info> Disk-TB: State Change: Partner UP
> Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #12 0x00000002
> Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: <warning> Member
> 200.254.254.171's state is uncertain: Some services may be
> unavailable!
> Apr 10 11:31:18 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #13 0x00000002
> Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 11:31:29 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:31:34 xlx2 cluquorumd[4463]: <info> Disk-TB: Partner is DOWN
> (Dead/Hung)
> Apr 10 11:31:38 xlx2 cluquorumd[4463]: <warning> --> Commencing STONITH <--
> Apr 10 11:31:38 xlx2 cluquorumd[4463]: <warning> STONITH: Falsely
> claiming that 200.254.254.171 has been fenced
> Apr 10 11:31:38 xlx2 cluquorumd[4463]: <crit> STONITH: Data integrity
> may be compromised!
> Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:31:40 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #15 0x00000002
> Apr 10 11:31:41 xlx2 clusvcmgrd[4671]: <info> State change:
> 200.254.254.172 DOWN
> Apr 10 11:34:08 xlx2 cluquorumd[4463]: <info> Disk-TB: State Change: Partner UP
> Apr 10 11:34:09 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #16 0x00000002
> Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: No route to host
> Apr 10 11:34:16 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: No route to host
> Apr 10 11:34:25 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: No route to host
> Apr 10 11:34:34 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: No route to host
> Apr 10 11:34:43 xlx2 clusvcmgrd[4671]: <err> Unable to obtain cluster
> lock: No locks available
> Apr 10 11:34:50 xlx2 clumembd[4493]: <notice> Member 200.254.254.171 UP
> Apr 10 11:34:50 xlx2 clumembd[4493]: <info> Membership View #6:0x00000003
> Apr 10 11:34:50 xlx2 cluquorumd[4463]: <err> __msg_send: Incomplete
> write to 13. Error: Connection reset by peer
> Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: <info> Quorum Event: View #17 0x00000003
> Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: <info> State change: Local UP
> Apr 10 11:34:51 xlx2 clusvcmgrd[4671]: <info> State change: 200.254.254.171 UP
> Apr 10 13:21:25 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 17:03:22 xlx2 clusvcmgrd[4671]: <crit> Couldn't connect to
> member #0: Connection timed out
> Apr 10 20:30:30 xlx2 clulockd[4498]: <warning> Denied 200.254.254.171:
> Broken pipe
> Apr 10 20:30:30 xlx2 clulockd[4498]: <err> select error: Broken pipe

What were you doing when this happened?

-- Lon



From lhh at redhat.com  Wed Apr 12 22:13:25 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 12 Apr 2006 18:13:25 -0400
Subject: [Linux-cluster] Cluster node not able to access all cluster
	resource
In-Reply-To: <0C5C8B118420264EBB94D7D7050150011EFA92@exchange2.comune.prato.local>
References: <0C5C8B118420264EBB94D7D7050150011EFA92@exchange2.comune.prato.local>
Message-ID: <1144880005.15794.54.camel@ayanami.boston.redhat.com>

On Sat, 2006-04-08 at 19:05 +0200, Leandro Dardini wrote:
> The topic is not a problem, but what I want to do. I have a lots of
> service, each on is now run by a two node cluster. This is very bad due
> to each node fencing other one during network blackout. I'd like to
> create only one cluster, but each resource, either GFS filesystems, must
> be readable only by a limited number of nodes.
> 
> For example, taking a Cluster "test" made of node A, node B, node C,
> node D and with the following resources: GFS Filesystem alpha and GFS
> Filesystem beta. I want that only node A and node B can access GFS
> Filesystem alpha and only node C and node D can access GFS Filesystem
> beta.
> 
> Is it possible?

You can just mount alpha on {A B} and beta on {C D}, but I don't think
there is an easy way to forcefully prevent mounting alpha on {C D}
currently; someone else might know better.

-- Lon



From lhh at redhat.com  Wed Apr 12 22:36:27 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 12 Apr 2006 18:36:27 -0400
Subject: [Linux-cluster] issues with rhcs 4.2
In-Reply-To: <20060408164804.54434.qmail@web8319.mail.in.yahoo.com>
References: <20060408164804.54434.qmail@web8319.mail.in.yahoo.com>
Message-ID: <1144881387.15794.66.camel@ayanami.boston.redhat.com>

On Sat, 2006-04-08 at 17:48 +0100, Kumaresh Ponnuswamy wrote:
> hi,
>  
> I recently migrated from rhcs 3 to rhcs 4.2 and since then I am unable
> to bring up the clustered services.
>  
> Even though the services are getting executed (like the VIP, shared
> devices etc), the status in clustat and system-config-cluster still
> displays failed and because of this the failover is not happening. 
 
> Any light on this will be much appreciated. Cluster is on RHEL AS 4U2
> with two nodes.

The part that fails should be in the log.  My guess is that it is the
script.  Rgmanager expects LSB behavior - i.e. "stop after stop" should
return 0, not 1.  If we have a '1' return code, rgmanager thinks the
service has failed to stop - so the service can not fail over (resources
might still be allocated!).

See:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173991

-- Lon





From cjkovacs at verizon.net  Thu Apr 13 01:42:55 2006
From: cjkovacs at verizon.net (Corey Kovacs)
Date: Wed, 12 Apr 2006 21:42:55 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To: <b6131fdc0604121340g445a8692pdd2a857c93c0e634@mail.gmail.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E54@tmaemail.techma.com>
	<b6131fdc0604121340g445a8692pdd2a857c93c0e634@mail.gmail.com>
Message-ID: <1144892575.8357.1.camel@ronin.home.net>

Good to hear it's working...

You can get rid of the modem stuff by pressing CTRL+O, then select the
modem settings option. Just clean everything that you can out and save
again.

Have fun...


Corey



On Wed, 2006-04-12 at 21:40 +0100, Steve Nelson wrote:
> On 4/12/06, Kovacs, Corey J. <cjk at techma.com> wrote:
> > Turn off flow control if it's on, save the config as default and restart
> > minicom.
> 
> Thanks very much.  I had turned off flow control, but saving as
> default, and restarting appeared to make the difference.
> 
> Welcome to minicom 2.00.0
> 
> OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n
> Compiled on Sep 12 2003, 17:33:22.
> 
> Press CTRL-A Z for help on special keys
> 
> 
> Invalid CLI command.
> 
> CLI> AT S7=45 S0=0 L1 V1 X4 &c1 E1 Q0
> Invalid CLI command.
> 
> CLI>
> 
> Incidentally, how do I get it not to send that dialling stuff?
> 
> > Corey
> 
> S.
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From cjkovacs at verizon.net  Thu Apr 13 04:07:53 2006
From: cjkovacs at verizon.net (Corey Kovacs)
Date: Thu, 13 Apr 2006 00:07:53 -0400
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To: <1144892575.8357.1.camel@ronin.home.net>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E54@tmaemail.techma.com>
	<b6131fdc0604121340g445a8692pdd2a857c93c0e634@mail.gmail.com>
	<1144892575.8357.1.camel@ronin.home.net>
Message-ID: <1144901273.8357.3.camel@ronin.home.net>

Sorry, that should be "CTRL+A, then o"

On Wed, 2006-04-12 at 21:42 -0400, Corey Kovacs wrote:
> Good to hear it's working...
> 
> You can get rid of the modem stuff by pressing CTRL+O, then select the
> modem settings option. Just clean everything that you can out and save
> again.
> 
> Have fun...
> 
> 
> Corey
> 
> 
> 
> On Wed, 2006-04-12 at 21:40 +0100, Steve Nelson wrote:
> > On 4/12/06, Kovacs, Corey J. <cjk at techma.com> wrote:
> > > Turn off flow control if it's on, save the config as default and restart
> > > minicom.
> > 
> > Thanks very much.  I had turned off flow control, but saving as
> > default, and restarting appeared to make the difference.
> > 
> > Welcome to minicom 2.00.0
> > 
> > OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n
> > Compiled on Sep 12 2003, 17:33:22.
> > 
> > Press CTRL-A Z for help on special keys
> > 
> > 
> > Invalid CLI command.
> > 
> > CLI> AT S7=45 S0=0 L1 V1 X4 &c1 E1 Q0
> > Invalid CLI command.
> > 
> > CLI>
> > 
> > Incidentally, how do I get it not to send that dialling stuff?
> > 
> > > Corey
> > 
> > S.
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From kumaresh81 at yahoo.co.in  Thu Apr 13 07:06:53 2006
From: kumaresh81 at yahoo.co.in (Kumaresh Ponnuswamy)
Date: Thu, 13 Apr 2006 08:06:53 +0100 (BST)
Subject: [Linux-cluster] issues with rhcs 4.2
In-Reply-To: <1144881387.15794.66.camel@ayanami.boston.redhat.com>
Message-ID: <20060413070653.37951.qmail@web8326.mail.in.yahoo.com>

Hi,
   
  thanks for the mail.
   
  the issue is that RHCS 4 expects an RC script rather than a normal script. After making it an RC script, the cluster is working.
   
  Regards,
  Kumaresh

Lon Hohberger <lhh at redhat.com> wrote:
  On Sat, 2006-04-08 at 17:48 +0100, Kumaresh Ponnuswamy wrote:
> hi,
> 
> I recently migrated from rhcs 3 to rhcs 4.2 and since then I am unable
> to bring up the clustered services.
> 
> Even though the services are getting executed (like the VIP, shared
> devices etc), the status in clustat and system-config-cluster still
> displays failed and because of this the failover is not happening. 

> Any light on this will be much appreciated. Cluster is on RHEL AS 4U2
> with two nodes.

The part that fails should be in the log. My guess is that it is the
script. Rgmanager expects LSB behavior - i.e. "stop after stop" should
return 0, not 1. If we have a '1' return code, rgmanager thinks the
service has failed to stop - so the service can not fail over (resources
might still be allocated!).

See:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173991

-- Lon



--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


				
---------------------------------
 Jiyo cricket on Yahoo! India cricket
Yahoo! Messenger Mobile Stay in touch with your buddies all the time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060413/2931d213/attachment.htm>

From sanelson at gmail.com  Thu Apr 13 07:54:32 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Thu, 13 Apr 2006 08:54:32 +0100
Subject: [Linux-cluster] [OT] Serial Connection to MSA1000
In-Reply-To: <1144901273.8357.3.camel@ronin.home.net>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E54@tmaemail.techma.com>
	<b6131fdc0604121340g445a8692pdd2a857c93c0e634@mail.gmail.com>
	<1144892575.8357.1.camel@ronin.home.net>
	<1144901273.8357.3.camel@ronin.home.net>
Message-ID: <b6131fdc0604130054g90c81e7o5d4255c418b875ce@mail.gmail.com>

On 4/13/06, Corey Kovacs <cjkovacs at verizon.net> wrote:

> Sorry, that should be "CTRL+A, then o"

Yeah, I quickly got the hang of the minicom interface as it seems to
be just the same as screen and ratpoison (and thus I suppose emacs?).

Thanks for your help!

S.



From Alain.Moulle at bull.net  Thu Apr 13 09:04:09 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Thu, 13 Apr 2006 11:04:09 +0200
Subject: [Linux-cluster] CS4 Update 2 / question about quorum
Message-ID: <443E1409.4050700@bull.net>

Hi

A question a little bit theoretical for my understanding :
for a cluster with 8 nodes, I understand that each node
has by default a Quorum Votes value = 1 , so does that
mean that until 3 nodes are failed , services are failovered
by others, and that at the 4th failed one, the cluster
is stalled in the current state ?
And in which cases would it be judicious to set the
Quorum Vote for some nodes at 2 or more ?
Or is there a way to modify the % to define if
the cluster is quorate or note ?
For example, let's suppose that on the 8 nodes cluster,
whatever 2 nodes are able (in term of capacity/perf ...)
to run all HA services of the 8 nodes, is-it possible
to configure the cs4 such as the failover will be possible
even if 6 nodes are failed ?

Thanks
Alain Moull?




From pcaulfie at redhat.com  Thu Apr 13 09:16:07 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 13 Apr 2006 10:16:07 +0100
Subject: [Linux-cluster] CS4 Update 2 / question about quorum
In-Reply-To: <443E1409.4050700@bull.net>
References: <443E1409.4050700@bull.net>
Message-ID: <443E16D7.5000606@redhat.com>

Alain Moulle wrote:
> Hi
> 
> A question a little bit theoretical for my understanding :
> for a cluster with 8 nodes, I understand that each node
> has by default a Quorum Votes value = 1 , so does that
> mean that until 3 nodes are failed , services are failovered
> by others, and that at the 4th failed one, the cluster
> is stalled in the current state ?
> And in which cases would it be judicious to set the
> Quorum Vote for some nodes at 2 or more ?
> Or is there a way to modify the % to define if
> the cluster is quorate or note ?
> For example, let's suppose that on the 8 nodes cluster,
> whatever 2 nodes are able (in term of capacity/perf ...)
> to run all HA services of the 8 nodes, is-it possible
> to configure the cs4 such as the failover will be possible
> even if 6 nodes are failed ?

Quorum is not really related to failover, it's to prevent "split-brain" so
that (eg) a service doesn't end up running on two nodes that can't talk to
each other or (more importantly) that a GFS filesystem doesn't get corrupted
by two non-cooperating systems.

Yes, it's possible to set votes on some machines higher than others but you
need to be very careful that you do your calculations correctly such that you
can't get into a split brain situation if two higher-rated nodes split off
into two separate clusters.

Fiddling with the node votes is most useful where you have server (perhaps
gnbd) nodes in the cluster without which the satellites can't work.

patrick



From carlopmart at gmail.com  Thu Apr 13 10:39:50 2006
From: carlopmart at gmail.com (carlopmart)
Date: Thu, 13 Apr 2006 12:39:50 +0200
Subject: [Linux-cluster] OT: Tomcat with RHEL on CS
Message-ID: <443E2A76.8070705@gmail.com>

Hi all,

  I need some help to accomplish the following task: I need to setup a 
high availability cluster for Tomcat+Apache. I can not use a shared 
storage (content pages html, jsp and so on are static). Which can be the 
best form: use Cluster Suite with RHEL, heartbeat from linx-ha.org, 
keepalived or another one??

Many thanks.


-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From dgolden at cp.dias.ie  Thu Apr 13 11:13:43 2006
From: dgolden at cp.dias.ie (David Golden)
Date: Thu, 13 Apr 2006 12:13:43 +0100
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E3021338E6@bnifex.cis.buc.com>
Message-ID: <20060413111342.GA3168@ariadne.cp.dias.ie>

On 2006-04-12 16:59:26 -0400, Bowie Bailey wrote:
> > - What kind of pitfalls should we be aware of?
> 

> Some people have complained about throughput issues with GFS.  Our
> application doesn't require high throughput, so I can't comment on
> this.  I haven't found any issues in my testing so far.
>

Well, a thing I _think_ we've seen a few times is that the case of many
simultaneous  writes to different files in different directories is MUCH faster
than many simultaneous writes to different files in the same directory. 
I think this may have been mentioned before on-list, IIRC it's a design trade-off,
something to do  with GFS's efforts to preserve strict unix-like
consistency (generally regarded as a major advantage of GFS over
the horrors of NFS), directory metadata about the files needs to be 
updated an awful lot in the same  directory case, and the
directory therefore needs to be locked for update an awful lot, 
which can lead to much slowdown.

I don't have hard numbers, nor available facilities to generate them
right now, so feel free to regard this as FUD. 



From pcaulfie at redhat.com  Thu Apr 13 13:27:10 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 13 Apr 2006 14:27:10 +0100
Subject: [Linux-cluster] New cman & ccs
Message-ID: <443E51AE.5090007@redhat.com>

I've written a short web page on the differences between the 'old' in-kernel
cman (in the RHEL4 & STABLE branches) and the 'new' userspace openAIS-based cman.

http://people.redhat.com/pcaulfie/cmanccs.html

This isn't a tutorial on CCS or cluster.conf, it just outlines what is
different between the two. The only non-forwards compatible bit is that the
userland version needs nodeids assigning. ccs_tool now has a subcommand to do
this for you.
-- 

patrick



From Bowie_Bailey at BUC.com  Thu Apr 13 14:40:22 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Thu, 13 Apr 2006 10:40:22 -0400
Subject: [Linux-cluster] CLVM and AoE
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com>

Aaron Stewart wrote:
> Hey Bowie,
> 
> Wow.. That's perfect.  Thanks for the response.
> 
> I have a question about whether GFS is a requirement.. Since each lv
> is a separate partition mounted on xen, does GFS make sense, or can
> we use ext3/xfs/etc.?

What you get from GFS is the ability for multiple nodes to mount the
filesystem simultaneously.  If you are never going to do this, then
you can use any filesystem you want.  CLVM can handle management of
the lv's across the nodes.

If you don't use GFS, just make absolutely sure that there is no way
that two nodes could mount the same lv.  As far as I know, there is
nothing in the cluster that will prevent an ext3 or xfs filesystem
from being mounted by multiple nodes.  And if it happens, you have
almost guaranteed data corruption.

-- 
Bowie



From Bowie_Bailey at BUC.com  Thu Apr 13 15:33:07 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Thu, 13 Apr 2006 11:33:07 -0400
Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf after
	update
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338EF@bnifex.cis.buc.com>

This is an x86_64 system that I just updated to the newest Cluster rpms.

When I watch the bootup on the console, I see an error:

lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so:
   undefined symbol: lvm_snprintf

This error comes immediately after the "Activating VGs" line, so it
appears to be triggered by the vgchange command in the clvmd startup
file.  I have another, identically configured, server which I have not
updated yet.  This server does not give the error.

Everything seems to be working fine, so is this something I need to
worry about?

--
Bowie



From sanelson at gmail.com  Thu Apr 13 16:27:54 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Thu, 13 Apr 2006 17:27:54 +0100
Subject: [Linux-cluster] Order to Power Up
Message-ID: <b6131fdc0604130927k375a923gf0793bcdd036ba96@mail.gmail.com>

Hi All,

I've had to power down all the machines in a GFS 6.0 cluster - 2 nodes
and a lock_gulmd qurum server.

If I bring them up one at a time, the first server will hang waiting
to start lock_gulmd.  What's the best way to do this, and the  best
order?

Should I bring them up in single user mode first and then start the
services manually?

S.



From mtp at tilted.com  Thu Apr 13 18:03:13 2006
From: mtp at tilted.com (Mark Petersen)
Date: Thu, 13 Apr 2006 13:03:13 -0500
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.co
 m>
References: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com>
Message-ID: <7.0.1.0.2.20060413125753.0281dc00@tilted.com>

At 09:40 AM 4/13/2006, you wrote:
>If you don't use GFS, just make absolutely sure that there is no way
>that two nodes could mount the same lv.  As far as I know, there is
>nothing in the cluster that will prevent an ext3 or xfs filesystem
>from being mounted by multiple nodes.  And if it happens, you have
>almost guaranteed data corruption.

The thing with Xen is, if you use GFS on the dom0 then you'll be 
using loopback filesystems for the domUs, so data corruption could 
still happen.  xfs has protection against being mounted twice, you 
may want to consider using xfs if you're concerned about a domU being 
run from two different dom0's causing data corruption on the fs.  I'm 
not aware of any other fs that provides this feature.




From Britt.Treece at savvis.net  Thu Apr 13 18:33:35 2006
From: Britt.Treece at savvis.net (Treece, Britt)
Date: Thu, 13 Apr 2006 13:33:35 -0500
Subject: [Linux-cluster] Order to Power Up
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B90152A2F4@s228130hz1ew08.apptix-01.savvis.net>

Steve,

In what order are you currently bringing them up?  The client (non-lock
master) servers will wait for 600s (default timeout) until a master lock
server is available to handle the locking.  If one becomes available in
that time frame lock_gulmd will start.  If one does not become available
the lock_gulmd process will time out based on the aforementioned value
and the cluster won't be able to start.

If all 3 servers are down you should likely power on the lock server and
then a moment later power on the client (GFS mounting) servers.

Regards,

Britt

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson
Sent: Thursday, April 13, 2006 11:28 AM
To: linux clustering
Subject: [Linux-cluster] Order to Power Up

Hi All,

I've had to power down all the machines in a GFS 6.0 cluster - 2 nodes
and a lock_gulmd qurum server.

If I bring them up one at a time, the first server will hang waiting
to start lock_gulmd.  What's the best way to do this, and the  best
order?

Should I bring them up in single user mode first and then start the
services manually?

S.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From pegasus at nerv.eu.org  Thu Apr 13 18:40:30 2006
From: pegasus at nerv.eu.org (Jure =?UTF-8?Q?Pe=C4=8Dar?=)
Date: Thu, 13 Apr 2006 20:40:30 +0200
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com>
Message-ID: <20060413204030.dddf419a.pegasus@nerv.eu.org>

On Thu, 13 Apr 2006 10:40:22 -0400
Bowie Bailey <Bowie_Bailey at BUC.com> wrote:

> If you don't use GFS, just make absolutely sure that there is no way
> that two nodes could mount the same lv.  As far as I know, there is
> nothing in the cluster that will prevent an ext3 or xfs filesystem
> from being mounted by multiple nodes.  And if it happens, you have
> almost guaranteed data corruption.

If the underlying storage is scsi3, one can use persistent scsi reservations, which can be set with some tool from the sg3_utils package. In case of AoE, this is of course not possible. 

-- 

Jure Pe?ar
http://jure.pecar.org/



From aaron at firebright.com  Thu Apr 13 19:01:27 2006
From: aaron at firebright.com (Aaron Stewart)
Date: Thu, 13 Apr 2006 12:01:27 -0700
Subject: [Linux-cluster] CLVM and AoE
In-Reply-To: <20060413204030.dddf419a.pegasus@nerv.eu.org>
References: <4766EEE585A6D311ADF500E018C154E3021338EC@bnifex.cis.buc.com>
	<20060413204030.dddf419a.pegasus@nerv.eu.org>
Message-ID: <443EA007.2090701@firebright.com>

Hey Jure,

We're already committed to the AoE route unfortunately, but we're 
setting up next week, and I'll keep everyone posted on any performance 
benchmarks we glean.

-=Aaron

Jure Pe?ar wrote:
> On Thu, 13 Apr 2006 10:40:22 -0400
> Bowie Bailey <Bowie_Bailey at BUC.com> wrote:
>
>   
>> If you don't use GFS, just make absolutely sure that there is no way
>> that two nodes could mount the same lv.  As far as I know, there is
>> nothing in the cluster that will prevent an ext3 or xfs filesystem
>> from being mounted by multiple nodes.  And if it happens, you have
>> almost guaranteed data corruption.
>>     
>
> If the underlying storage is scsi3, one can use persistent scsi reservations, which can be set with some tool from the sg3_utils package. In case of AoE, this is of course not possible. 
>
>   

-------------- next part --------------
A non-text attachment was scrubbed...
Name: aaron.vcf
Type: text/x-vcard
Size: 302 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060413/d30a7bde/attachment.vcf>

From sanelson at gmail.com  Thu Apr 13 20:34:23 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Thu, 13 Apr 2006 21:34:23 +0100
Subject: [Linux-cluster] Order to Power Up
In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B90152A2F4@s228130hz1ew08.apptix-01.savvis.net>
References: <9A6FE0FCC2B29846824C5CD81C6647B90152A2F4@s228130hz1ew08.apptix-01.savvis.net>
Message-ID: <b6131fdc0604131334v5c584cb4mffcaef0828c9d2f0@mail.gmail.com>

On 4/13/06, Treece, Britt <Britt.Treece at savvis.net> wrote:
> Steve,
>
> In what order are you currently bringing them up?  The client (non-lock
> master) servers will wait for 600s (default timeout) until a master lock
> server is available to handle the locking.

Yes, I discovered this, and realised if I bring up a client server,
and then a master server, quorum is formed.

> If all 3 servers are down you should likely power on the lock server and
> then a moment later power on the client (GFS mounting) servers.

Thank you - this is what I did, and it worked fine.

> Regards,
>
> Britt

S.



From robert at deakin.edu.au  Fri Apr 14 03:39:33 2006
From: robert at deakin.edu.au (Robert Ruge)
Date: Fri, 14 Apr 2006 13:39:33 +1000
Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf
	afterupdate
In-Reply-To: <4766EEE585A6D311ADF500E018C154E3021338EF@bnifex.cis.buc.com>
Message-ID: <002c01c65f75$0bd9e700$0132a8c0@eit.deakin.edu.au>

I have jist experienced a similar thing but with a different undefined
symbol.

In my case I have installed clvm from a self compiled directory and
when the system updates the lvm2 package it has conflicted with my
self installed software. The simple answer for me was to reinstall
clvm and edit /etc/init.d/clvmd to change the paths from /usr/sbin to
/sbin.

Robert

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bowie Bailey
> Sent: Friday, 14 April 2006 1:33
> To: Linux-Cluster Mailing List (E-mail)
> Subject: [Linux-cluster] bootup error - undefined symbol: 
> lvm_snprintf afterupdate
> 
> This is an x86_64 system that I just updated to the newest 
> Cluster rpms.
> 
> When I watch the bootup on the console, I see an error:
> 
> lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so:
>    undefined symbol: lvm_snprintf
> 
> This error comes immediately after the "Activating VGs" line, so it
> appears to be triggered by the vgchange command in the clvmd startup
> file.  I have another, identically configured, server which I have
not
> updated yet.  This server does not give the error.
> 
> Everything seems to be working fine, so is this something I need to
> worry about?
> 
> --
> Bowie
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From Bowie_Bailey at BUC.com  Fri Apr 14 13:41:45 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Fri, 14 Apr 2006 09:41:45 -0400
Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf
	afterupdate
Message-ID: <4766EEE585A6D311ADF500E018C154E3021338FA@bnifex.cis.buc.com>

Robert Ruge wrote:
> Bowie Bailey wrote:
> > 
> > This is an x86_64 system that I just updated to the newest
> > Cluster rpms.
> > 
> > When I watch the bootup on the console, I see an error:
> > 
> > lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so:
> >    undefined symbol: lvm_snprintf
> > 
> > This error comes immediately after the "Activating VGs" line, so it
> > appears to be triggered by the vgchange command in the clvmd startup
> > file.  I have another, identically configured, server which I have
> > not updated yet.  This server does not give the error.
> > 
> > Everything seems to be working fine, so is this something I need to
> > worry about? 
> 
> I have jist experienced a similar thing but with a different undefined
> symbol.
> 
> In my case I have installed clvm from a self compiled directory and
> when the system updates the lvm2 package it has conflicted with my
> self installed software. The simple answer for me was to reinstall
> clvm and edit /etc/init.d/clvmd to change the paths from /usr/sbin to
> /sbin.

Interesting, but in my case, there are no self-compiled pieces.
Everything was pre-packaged rpms for both the original install and the
upgrade.

-- 
Bowie



From ugo.parsi at gmail.com  Fri Apr 14 15:01:41 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Fri, 14 Apr 2006 17:01:41 +0200
Subject: [Linux-cluster] Aggregating filesystem
Message-ID: <f29fd8170604140801g3582107dvf2603c5ab0818fd6@mail.gmail.com>

Hello,

I would like to aggregate multiple hard drives (on multiple computers)
inside a big filesystem with RAID / failure tolerant capatibilities.

I thought GFS could do that part, but it seems it does not...

Any ideas on how I could that ?

Thanks a lot,

Ugo PARSI



From deval.kulshrestha at progression.com  Thu Apr 13 09:50:26 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Thu, 13 Apr 2006 15:20:26 +0530
Subject: *SPAM* Re: [Linux-cluster] Cluster node not able to access all
	cluster	resource
In-Reply-To: <1144880005.15794.54.camel@ayanami.boston.redhat.com>
Message-ID: <003a01c65edf$b0e36b90$7600a8c0@PROGRESSION>

Hi 
if you are using fibre based storage solution , you can configure either
zoning on switch level or 
Lun Masking at HBA-> logical Volume level. That can restrict the access path
for nodes. It's a kind of LUN access security mechanism.

Regards,
Deval K.

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Thursday, April 13, 2006 3:43 AM
To: linux clustering
Subject: *SPAM* Re: [Linux-cluster] Cluster node not able to access all
cluster resource

On Sat, 2006-04-08 at 19:05 +0200, Leandro Dardini wrote:
> The topic is not a problem, but what I want to do. I have a lots of
> service, each on is now run by a two node cluster. This is very bad due
> to each node fencing other one during network blackout. I'd like to
> create only one cluster, but each resource, either GFS filesystems, must
> be readable only by a limited number of nodes.
> 
> For example, taking a Cluster "test" made of node A, node B, node C,
> node D and with the following resources: GFS Filesystem alpha and GFS
> Filesystem beta. I want that only node A and node B can access GFS
> Filesystem alpha and only node C and node D can access GFS Filesystem
> beta.
> 
> Is it possible?

You can just mount alpha on {A B} and beta on {C D}, but I don't think
there is an easy way to forcefully prevent mounting alpha on {C D}
currently; someone else might know better.

-- Lon

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India




From jason at monsterjam.org  Sat Apr 15 17:41:04 2006
From: jason at monsterjam.org (Jason)
Date: Sat, 15 Apr 2006 13:41:04 -0400
Subject: [Linux-cluster] newbie questions..
Message-ID: <20060415174104.GE41043@monsterjam.org>

hey folks, I just downloaded ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
and am planning to run it on a Red Hat Enterprise Linux AS release 3 box, with 
Linux tf1.localdomain 2.4.21-40.ELsmp #1 SMP Thu Feb 2 22:22:39 EST 2006 i686 i686 i386 GNU/Linux
kernel.. I check the INSTALL file and BAM!
./configure --kernel_src=/path/to/linux-2.6.x
                                  ^^^^^^^^^^^

so do I HAVE to be running 2.6 kernel to use this software?
If so, what are my options? (i checked the older cluster-1.0x releases and they all say 2.6 
kernel too).

regards,
Jason




From sgray at bluestarinc.com  Sun Apr 16 19:42:55 2006
From: sgray at bluestarinc.com (Sean Gray)
Date: Sun, 16 Apr 2006 15:42:55 -0400
Subject: [Linux-cluster] RE: RHEL+RAC+GFS
In-Reply-To: <55D425252A666646B456CFF8E3248DCCAA9339@ILEX5.IL.NDS.COM>
Message-ID: <006001c6618d$f88003e0$6500000a@BLS105>

Udi,

I never did receive an answer on this. Metalink was no help either. What I
believe is that RedHat sells GFS and RHCS to Oracle customers so they can
get the 2k-3k US per node income, I guess I would as well if I was them : ),
they need to eat too. 

WARNING! A couple days ago I found out that RHEL+RAC+GFS is NOT covered
under Oracle?s, ?Unbreakable Support? and they will NOT assist with ANY GFS
issues! 

Here is what I have done thus far as a proof of concept for our 11i
implementation conference room pilots: 

Part A
? Purchased RHEL subscription
? Downloaded GFS and RHCS SRPMs, compiled and installed
? Made a 4 node cluster with ILO fencing and 2 CLVM2/GFS volumes from an EMC
cx300
? Made my staging area for the 11i install

Part A Results
? At first look things seemed fine
? Did basic testing with tools like dd, touch, ls, etc.
? Installed Stage11i, install seemed slow
? Under heavy IO (simultaneous 1G file creation using dd) received kernel
panics, added numa=off to boot string fixed this
? Installed CRP1 on a single node
? CRP1 is operational, but seems sluggish
? Destroyed cluster, and moved CRP1 to a single node cluster, same result
operational but sluggish

Part B
? Made 3 CLVM2/GFS volumes DB/APPS/BACKUP
? Mounted all three volumes on both nodes
? Installed CRP2 with node1 as DB and node 2 as APPS

Part B Results
? Install was slow
? CRP2 was sluggish and after a few hours dlm_sendd became a giant CPU hog,
if the db and apps were bounced (no reboot) things would be OK for a while
? Switched over to the older lock mechanism GULM, but had exact same results
? At this point great disappointment sets in : ( and I reach out to this
mailing list for help, no response(!)

Part C
? I reformatted the db and apps volumes as ext3 but left them managed by
CLVM2 (I never thought to do otherwise)
? I removed the backup volume as a cluster resource, but since I still had
CLVM2 in play I found that I had to have cman, ccsd, and clvm enabled so
everything would work.
? Now, only apps was mounted on the apps node(CLVM+EXT3), db was only
mounted on the db node(CLVM+EXT3), and backup was not mounted.

Part C Results
? Same, CRP2 was sluggish and after a few hours dlm_sendd became a giant CPU
hog, if the db and apps were bounced (no reboot) things would be OK for a
while
? If the backup volume (CLVM+GFS) was mounted it got even worse

Part D
? Destroyed the CLVM setup on the backup volume
? Formatted the entire device (backup volume) as EXT3 without any
partitioning 
? NOTE* the db and apps volumes are RAID 1+0 arrays on fibre channel(fast)
and the backup volume is a RAID 5 ATA array (slow).
? So now the setup is as follows:
	o db node, mounts db volume - fibre channel+CLVM+EXT3
	o apps node, mounts apps volume - fibre channel+CLVM+EXT3
	o apps node shares apps volome to db node via NFS read-only
	o db node, mounts backup volume, - ATA+EXT3

Part D Results
? Same, CRP2 was sluggish and after a few hours dlm_sendd became a giant CPU
hog, if the db and apps were bounced (no reboot) things would be OK for a
while
? Throughput on backup volume is fantastic in comparison!!!

Conclusion
RHEL+RAC+GFS may be possible. However, I have not been able to put together
the recipe, have had no real assistance from outside resources, and think
there is a possible bug in dlm_sendd. Until a true recipe is developed I
cannot personally recommend this configuration regardless of what
http://www.redhat.com/whitepapers/rha/gfs/Oracle9i_RAC_GFS.pdf says. I do
not intend to slight any company or product; it is entirely possible my
results are due to my own ignorance.

Final Note (off-topic and off-list)
Some may be wondering what I plan to do next. I am currently pursuing OCFS2
as a file system and clustering solution. Here is why:
? It is GPL?d and free (as in beer)
? It has freely available binaries for stock RedHat kernels
? It has much in common with EXT3
? It is included in the newer versions lf Linus?s kernel tree
? It will qualify for ?Unbreakable Support?
? It appears to have applications totally outside of the Oracle world, as in
creating a shared root (/) volume and still being able to maintain node
specific configuration files. Cool stuff. 

Happy clustering, I hope some of my months of frustrations are useful to
someone.

-- 
Sean N. Gray
Director of Information Technology
United Radio Incorporated, DBA BlueStar
24 Spiral Drive
Florence, Kentucky 41042
office: 859.371.4423 x3263
toll free: 800.371.4423 x3263
fax: 859.371.4425
mobile: 513.616.3379
________________________________________
From: Yaffe, Udi 
Sent: Sunday, April 16, 2006 5:49 AM
To: Sean Gray
Subject: RHEL+RAC+GFS

Sean,
?
I read your message in the RedHat forum, 14 Mar 2006?(about Oracle Rac on
RedHat, using GFS from the) and curious to know whether you got an answer ? 
I spend the last three weeks looking for a document or any other article on
the web,?explaining how to install RAC on GFS, but couldn't find any. if you
do have an answer, can you please give me an advice how to start with ?
?
Regards,
?
????? Udi 
??????Senior System Engineer - Project delivery
?




From deval.kulshrestha at progression.com  Mon Apr 17 07:17:35 2006
From: deval.kulshrestha at progression.com (Deval kulshrestha)
Date: Mon, 17 Apr 2006 12:47:35 +0530
Subject: [Linux-cluster] cluster suit 4.2
In-Reply-To: <004c01c649e7$52cd7f30$4ee17bcb@golie>
Message-ID: <001001c661ef$00507a30$6800a8c0@PROGRESSION>

Hi Paul
I am also using HP Servers with ILO port. I have had used hp_ilo as an
fencing agent. Basic ILO functionality comes by default with all HP servers,
Advance ILO is Licensed feature(Except Blade Server).

You can configure IP address and User name and Password for ILO port. 
In fencing devices you have to configure HP_ilo as fencing agent. But before
that ensure ILO is accessible over LAN.

It's working fine with updated fence package

Regards
Deval 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Paul
Sent: Friday, March 17, 2006 10:52 PM
To: Lon Hohberger
Cc: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] cluster suit 4.2

Mr Lon,

I still face the same problem. If network cable of two NIC of one node are 
disconnected, the cluster system stall/hang.
For your input, I use manual fence for the time being and using two network 
interfaces for each node as a bonding port.
I have no power switch for fencing but my servers have ILO port ( Proliant 
DL380G4 ).
If manual fence is not recommended for this case, can I use ILO port for 
fencing.
Can you tell me how to set ILO fenced and what things are needed for the 
setting.
Is ILO need license. Can you give me the solution for this problem.
Thanks in advance.

Rgds,
paul


would you like give us the solution
----- Original Message ----- 
From: "Lon Hohberger" <lhh at redhat.com>
To: "Paul" <paul at tkd.co.id>
Cc: <linux-cluster at redhat.com>
Sent: Thursday, March 16, 2006 11:27 PM
Subject: Re: [Linux-cluster] cluster suit 4.2


> On Thu, 2006-03-16 at 23:13 +0700, Paul wrote:
>> manual fence, because we have redundance PS, thx
>>
>
> You need to run fence_ack_manual on the surviving node.  Note that
> running manual fencing in production environments is not supported.
>
> There is plenty of adequate remote power fencing hardware available
> which will handle multiple power supplies.
>
> -- Lon
> 

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



===========================================================
Privileged or confidential information may be contained
in this message. If you are not the addressee indicated
in this message (or responsible for delivery of the 
message to such person), please delete this message and
kindly notify the sender by an emailed reply. Opinions,
conclusions and other information in this message that
do not relate to the official business of Progression
and its associate entities shall be understood as neither
given nor endorsed by them.
  

-------------------------------------------------------------
Progression Infonet Private Limited, Gurgaon (Haryana), India




From nemanja at yu.net  Mon Apr 17 09:29:24 2006
From: nemanja at yu.net (Nemanja Miletic)
Date: Mon, 17 Apr 2006 11:29:24 +0200
Subject: [Linux-cluster] problems with 8 node production gfs cluster
Message-ID: <1145266165.27997.57.camel@nemanja.eunet.yu>

Hello,

I am working for major ISP and we have gfs cluster deployed for our mail
system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade
servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on
the same gigabit subnet. Partition that holds mailboxes is shared. We
are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and
1.02 linux-cluster.  

We have a problem that in busy hours reading and writing from gfs
partition gets very slow. This causes many processes that need to use
the disk to go in D state. This further causes great load (over 200) on
the machines in the cluster. When we cut the pop3 and smtp access on the
firewall load slowly decreases. At the moment we have a limit on syn
connection on 110 port on our load balancers (LVS based) in order to
control the load.



Thank You,
-- 
Nemanja Miletic, System Engineer
-----
YUnet International  http://www.EUnet.yu
Dubrovacka 35/III, 11000 Belgrade
Tel: +381 11 3305633;  Fax: +381 11 3282760
-----
This e-mail is confidential and intended only for the recipient.
Unauthorized distribution, modification or disclosure of its
contents is prohibited. If you have received this e-mail in error,
please notify the sender by telephone +381 11 3305633.




From rainer at ultra-secure.de  Mon Apr 17 10:31:00 2006
From: rainer at ultra-secure.de (Rainer Duffner)
Date: Mon, 17 Apr 2006 12:31:00 +0200
Subject: [Linux-cluster] problems with 8 node production gfs cluster
In-Reply-To: <1145266165.27997.57.camel@nemanja.eunet.yu>
References: <1145266165.27997.57.camel@nemanja.eunet.yu>
Message-ID: <44436E64.4080407@ultra-secure.de>

Nemanja Miletic wrote:
> Hello,
>
> I am working for major ISP and we have gfs cluster deployed for our mail
> system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade
> servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on
> the same gigabit subnet. Partition that holds mailboxes is shared. We
> are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and
> 1.02 linux-cluster.  
>
> We have a problem that in busy hours reading and writing from gfs
> partition gets very slow. This causes many processes that need to use
> the disk to go in D state. This further causes great load (over 200) on
> the machines in the cluster. When we cut the pop3 and smtp access on the
> firewall load slowly decreases. At the moment we have a limit on syn
> connection on 110 port on our load balancers (LVS based) in order to
> control the load.
>
>
>
> Thank You,
>   



Out of personal interest: what MTA/MDA are you running?



cheers,
Rainer



From nemanja at yu.net  Mon Apr 17 11:42:24 2006
From: nemanja at yu.net (nemanja at yu.net)
Date: Mon, 17 Apr 2006 13:42:24 +0200
Subject: [Linux-cluster] problems with 8 node production gfs cluster
In-Reply-To: <44436E64.4080407@ultra-secure.de>
References: <1145266165.27997.57.camel@nemanja.eunet.yu>
	<44436E64.4080407@ultra-secure.de>
Message-ID: <20060417134224.bgcspg8kckk4wo8c@mail.yu.net>

We are using sendmail/procmail and popa3d for pop3.


Quoting Rainer Duffner <rainer at ultra-secure.de>:

> Nemanja Miletic wrote:
>> Hello,
>>
>> I am working for major ISP and we have gfs cluster deployed for our mail
>> system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade
>> servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on
>> the same gigabit subnet. Partition that holds mailboxes is shared. We
>> are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and
>> 1.02 linux-cluster.  We have a problem that in busy hours reading 
>> and writing from gfs
>> partition gets very slow. This causes many processes that need to use
>> the disk to go in D state. This further causes great load (over 200) on
>> the machines in the cluster. When we cut the pop3 and smtp access on the
>> firewall load slowly decreases. At the moment we have a limit on syn
>> connection on 110 port on our load balancers (LVS based) in order to
>> control the load.
>>
>>
>>
>> Thank You,
>>
>
>
>
> Out of personal interest: what MTA/MDA are you running?
>
>
>
> cheers,
> Rainer
>




From lhh at redhat.com  Mon Apr 17 14:04:22 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 17 Apr 2006 10:04:22 -0400
Subject: [Linux-cluster] newbie questions..
In-Reply-To: <20060415174104.GE41043@monsterjam.org>
References: <20060415174104.GE41043@monsterjam.org>
Message-ID: <1145282662.15794.90.camel@ayanami.boston.redhat.com>

On Sat, 2006-04-15 at 13:41 -0400, Jason wrote:
> hey folks, I just downloaded ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
> and am planning to run it on a Red Hat Enterprise Linux AS release 3 box, with 
> Linux tf1.localdomain 2.4.21-40.ELsmp #1 SMP Thu Feb 2 22:22:39 EST 2006 i686 i686 i386 GNU/Linux
> kernel.. I check the INSTALL file and BAM!
> ./configure --kernel_src=/path/to/linux-2.6.x
>                                   ^^^^^^^^^^^
> 
> so do I HAVE to be running 2.6 kernel to use this software?

Yes.

> If so, what are my options? (i checked the older cluster-1.0x releases and they all say 2.6 
> kernel too).

You can run clumanager 1.2.x + GFS 6.0

-- Lon




From nemanja at yu.net  Mon Apr 17 15:41:39 2006
From: nemanja at yu.net (Nemanja Miletic)
Date: Mon, 17 Apr 2006 17:41:39 +0200
Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster
In-Reply-To: <1145266165.27997.57.camel@nemanja.eunet.yu>
References: <1145266165.27997.57.camel@nemanja.eunet.yu>
Message-ID: <1145288499.6000.15.camel@nemanja.eunet.yu>

Hi,

Does anyone think that turning on journaling on files could help us
speed up the access to gfs partition?

This would be difficult because journaling can be turned on only on
files that are empty. We have a large number of empty files of active
users that download all their mail from pop3 server, so turning on
jurnaling for them should be possible. 

What size should be the journals when file journaling is on?

Thank You


On Mon, 2006-04-17 at 11:29 +0200, Nemanja Miletic wrote:
> Hello,
> 
> I am working for major ISP and we have gfs cluster deployed for our mail
> system. Cluster includes six smtp and two pop3 nodes. Nodes are on blade
> servers, and are accessing FC HP EVA 5000 storage. Whole cluster is on
> the same gigabit subnet. Partition that holds mailboxes is shared. We
> are using gentoo linux distribution with 2.6.16-gentoo-r1 kernel and
> 1.02 linux-cluster.  
> 
> We have a problem that in busy hours reading and writing from gfs
> partition gets very slow. This causes many processes that need to use
> the disk to go in D state. This further causes great load (over 200) on
> the machines in the cluster. When we cut the pop3 and smtp access on the
> firewall load slowly decreases. At the moment we have a limit on syn
> connection on 110 port on our load balancers (LVS based) in order to
> control the load.
> 
> 
> 
> Thank You,
-- 
Nemanja Miletic, System Engineer
-----
YUnet International  http://www.EUnet.yu
Dubrovacka 35/III, 11000 Belgrade
Tel: +381 11 3305633;  Fax: +381 11 3282760
-----
This e-mail is confidential and intended only for the recipient.
Unauthorized distribution, modification or disclosure of its
contents is prohibited. If you have received this e-mail in error,
please notify the sender by telephone +381 11 3305633.




From bole at yu.net  Mon Apr 17 15:43:33 2006
From: bole at yu.net (Bosko Radivojevic)
Date: Mon, 17 Apr 2006 17:43:33 +0200
Subject: [Linux-cluster] Aggregating filesystem
In-Reply-To: <f29fd8170604140801g3582107dvf2603c5ab0818fd6@mail.gmail.com>
References: <f29fd8170604140801g3582107dvf2603c5ab0818fd6@mail.gmail.com>
Message-ID: <200604171743.33987.bole@yu.net>

Hi,

You need a parallel file system. It seems that IBM's GPFS is good choice.

On Friday 14 April 2006 17:01, Ugo PARSI wrote:
> Hello,
> 
> I would like to aggregate multiple hard drives (on multiple computers)
> inside a big filesystem with RAID / failure tolerant capatibilities.
> 
> I thought GFS could do that part, but it seems it does not...
> 
> Any ideas on how I could that ?
> 
> Thanks a lot,
> 
> Ugo PARSI
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From jason at monsterjam.org  Tue Apr 18 00:47:29 2006
From: jason at monsterjam.org (Jason)
Date: Mon, 17 Apr 2006 20:47:29 -0400
Subject: [Linux-cluster] newbie questions..
In-Reply-To: <1145282662.15794.90.camel@ayanami.boston.redhat.com>
References: <20060415174104.GE41043@monsterjam.org>
	<1145282662.15794.90.camel@ayanami.boston.redhat.com>
Message-ID: <20060418004729.GA13973@monsterjam.org>

ok, so I guess Ill shoot for the GFS 6.0. Im trying to figure out where the source can be found 
from the ftp://sources.redhat.com/pub/cluster/releases/
and not having much luck.. Is it possible to get the source and use GFS 6.0 standalone?

regards,
Jason



On Mon, Apr 17, 2006 at 10:04:22AM -0400, Lon Hohberger wrote:
> On Sat, 2006-04-15 at 13:41 -0400, Jason wrote:
> > hey folks, I just downloaded ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
> > and am planning to run it on a Red Hat Enterprise Linux AS release 3 box, with 
> > Linux tf1.localdomain 2.4.21-40.ELsmp #1 SMP Thu Feb 2 22:22:39 EST 2006 i686 i686 i386 GNU/Linux
> > kernel.. I check the INSTALL file and BAM!
> > ./configure --kernel_src=/path/to/linux-2.6.x
> >                                   ^^^^^^^^^^^
> > 
> > so do I HAVE to be running 2.6 kernel to use this software?
> 
> Yes.
> 
> > If so, what are my options? (i checked the older cluster-1.0x releases and they all say 2.6 
> > kernel too).
> 
> You can run clumanager 1.2.x + GFS 6.0
> 
> -- Lon
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================



From aberoham at gmail.com  Tue Apr 18 02:38:21 2006
From: aberoham at gmail.com (aberoham at gmail.com)
Date: Mon, 17 Apr 2006 19:38:21 -0700
Subject: [Linux-cluster] kernel noise, "Neighbour table overflow." ?
Message-ID: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com>

I'm running a test three-node CS/GFS cluster. At random intervals I get the
following kernel messages streaming out to /dev/console on all three nodes.

---
Neighbour table overflow.
printk: 166 messages suppressed.
Neighbour table overflow.
printk: 1 messages suppressed.
Neighbour table overflow.
printk: 1 messages suppressed.
Neighbour table overflow.
printk: 6 messages suppressed.
Neighbour table overflow.
printk: 5 messages suppressed.
Neighbour table overflow.
printk: 15 messages suppressed.
Neighbour table overflow.
printk: 7 messages suppressed.
Neighbour table overflow.
printk: 11 messages suppressed.
---

Are these messages related to CS/GFS? What triggers 'em? And should I worry
about it?

I'm running Linux 2.6.9-34.ELsmp, GFS-kernel-smp-2.6.9-45, GFS-6.1.5-0 and
dlm-kernel-smp-2.6.9-41.7.

[root at gfs02 ~]# service cman status
Protocol version: 5.0.1
Config version: 73
Cluster name: gfscluster
Cluster ID: 41396
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 3
Expected_votes: 3
Total_votes: 3
Quorum: 2
Active subsystems: 8
Node name: gfs02
Node addresses: 10.0.19.11

[root at gfs02 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]

DLM Lock Space:  "clvmd"                             2   3 run       -
[1 2 3]

DLM Lock Space:  "Magma"                             4   5 run       -
[1 2 3]

DLM Lock Space:  "gfstest"                           5   6 run       -
[1 2]

GFS Mount Group: "gfstest"                           6   7 run       -
[1 2]

User:            "usrm::manager"                     3   4 run       -
[1 2 3]


Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060417/28bcc19b/attachment.htm>

From l.dardini at comune.prato.it  Tue Apr 18 08:56:55 2006
From: l.dardini at comune.prato.it (Leandro Dardini)
Date: Tue, 18 Apr 2006 10:56:55 +0200
Subject: R: Re: [Linux-cluster] Cluster node not able to access
	allcluster	resource
Message-ID: <0C5C8B118420264EBB94D7D7050150011EFBB7@exchange2.comune.prato.local>

 

> -----Messaggio originale-----
> Da: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] Per conto di Deval 
> kulshrestha
> Inviato: gioved? 13 aprile 2006 11.50
> A: 'linux clustering'
> Oggetto: RE: *SPAM* Re: [Linux-cluster] Cluster node not able 
> to access allcluster resource
> 
> Hi
> if you are using fibre based storage solution , you can 
> configure either zoning on switch level or Lun Masking at 
> HBA-> logical Volume level. That can restrict the access path 
> for nodes. It's a kind of LUN access security mechanism.
> 
> Regards,
> Deval K.

Unfortunately this approach was already tested and cannot be followed: If the service was owned by a node who is not allowed to access the GFS filesystem, an error occurs.

Leandro

> 
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
> Sent: Thursday, April 13, 2006 3:43 AM
> To: linux clustering
> Subject: *SPAM* Re: [Linux-cluster] Cluster node not able to 
> access all cluster resource
> 
> On Sat, 2006-04-08 at 19:05 +0200, Leandro Dardini wrote:
> > The topic is not a problem, but what I want to do. I have a lots of 
> > service, each on is now run by a two node cluster. This is very bad 
> > due to each node fencing other one during network blackout. 
> I'd like 
> > to create only one cluster, but each resource, either GFS 
> filesystems, 
> > must be readable only by a limited number of nodes.
> > 
> > For example, taking a Cluster "test" made of node A, node 
> B, node C, 
> > node D and with the following resources: GFS Filesystem 
> alpha and GFS 
> > Filesystem beta. I want that only node A and node B can access GFS 
> > Filesystem alpha and only node C and node D can access GFS 
> Filesystem 
> > beta.
> > 
> > Is it possible?
> 
> You can just mount alpha on {A B} and beta on {C D}, but I 
> don't think there is an easy way to forcefully prevent 
> mounting alpha on {C D} currently; someone else might know better.
> 
> -- Lon
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> ===========================================================
> Privileged or confidential information may be contained in 
> this message. If you are not the addressee indicated in this 
> message (or responsible for delivery of the message to such 
> person), please delete this message and kindly notify the 
> sender by an emailed reply. Opinions, conclusions and other 
> information in this message that do not relate to the 
> official business of Progression and its associate entities 
> shall be understood as neither given nor endorsed by them.
>   
> 
> -------------------------------------------------------------
> Progression Infonet Private Limited, Gurgaon (Haryana), India
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From sanelson at gmail.com  Tue Apr 18 13:11:01 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Tue, 18 Apr 2006 14:11:01 +0100
Subject: [Linux-cluster] Clumanager and Chkconfig
Message-ID: <b6131fdc0604180611u1635b475vf20257b0e1596685@mail.gmail.com>

Hi All,

Should clumanager be set to automatically start on all nodes?  I have
a 2 node cluster (+ quorum) were if I kill an interface, the cluster
fails over and the failed node reboots.  However, the node rejoins the
cluster automatically - should this happen?

# chkconfig --list clumanager
clumanager      0:off   1:off   2:on    3:on    4:on    5:on    6:off

This is in chkconfig because I ran chkconfig --add clumanager.

On another cluster, I have not run this, but this is currently in
production so I can't test failover.

My feeling was that Oracle should transfer to the other node, and
clustat should shown one node is inactive, and should be started
manually.

Does this seem right?

S.



From cjk at techma.com  Tue Apr 18 13:28:24 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Tue, 18 Apr 2006 09:28:24 -0400
Subject: [Linux-cluster] Clumanager and Chkconfig
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E65@tmaemail.techma.com>

It's basically a policy issue on your part. Some folks like to have problem
nodes
boot up "dumb" to avoid the system taking a beating due to a major problem.
It's
possible that the cluster would ride this sort of thing out, but if you have
a node
go down, you'd be investigating anyway so booting "dumb" is not a bad idea
anyway.


Corey 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson
Sent: Tuesday, April 18, 2006 9:11 AM
To: linux clustering
Subject: [Linux-cluster] Clumanager and Chkconfig

Hi All,

Should clumanager be set to automatically start on all nodes?  I have a 2
node cluster (+ quorum) were if I kill an interface, the cluster fails over
and the failed node reboots.  However, the node rejoins the cluster
automatically - should this happen?

# chkconfig --list clumanager
clumanager      0:off   1:off   2:on    3:on    4:on    5:on    6:off

This is in chkconfig because I ran chkconfig --add clumanager.

On another cluster, I have not run this, but this is currently in production
so I can't test failover.

My feeling was that Oracle should transfer to the other node, and clustat
should shown one node is inactive, and should be started manually.

Does this seem right?

S.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From teigland at redhat.com  Tue Apr 18 13:37:04 2006
From: teigland at redhat.com (David Teigland)
Date: Tue, 18 Apr 2006 08:37:04 -0500
Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster
In-Reply-To: <1145288499.6000.15.camel@nemanja.eunet.yu>
References: <1145266165.27997.57.camel@nemanja.eunet.yu>
	<1145288499.6000.15.camel@nemanja.eunet.yu>
Message-ID: <20060418133704.GA16121@redhat.com>

On Mon, Apr 17, 2006 at 05:41:39PM +0200, Nemanja Miletic wrote:
> Hi,
> 
> Does anyone think that turning on journaling on files could help us
> speed up the access to gfs partition?
> 
> This would be difficult because journaling can be turned on only on
> files that are empty. We have a large number of empty files of active
> users that download all their mail from pop3 server, so turning on
> jurnaling for them should be possible. 

Data journaling might help, it will speed up fsync(), but will increase
the i/o going to your storage.

> What size should be the journals when file journaling is on?

Continue to use the default.

Another thing you might try is disabling the drop-locks callback, allowing
GFS to cache more locks.  Do this before you mount:
  echo "0" >> /proc/cluster/lock_dlm/drop_count

Dave



From 14117614 at sun.ac.za  Tue Apr 18 20:14:44 2006
From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>)
Date: Tue, 18 Apr 2006 22:14:44 +0200
Subject: [Linux-cluster] < cluster.conf problem >
Message-ID: <2C04D2F14FD8254386851063BC2B6706574C40@STBEVS01.stb.sun.ac.za>

  

Hi...

 

I have 5 nodes and 1 head node. I want to setup gfs so that I can bunch
together the 5 nodes, each have lvm's.

 

I'm having trouble setting up cluster.conf. I follow the manuals example
for gfs, not gfs2, and it says that it cant connect to css.

 

I'm running FC5 on all my machines.

 

Lee

 

He who has a why to live can bear with almost any how.
Friedrich Nietzsche  

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060418/99fd7707/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 862 bytes
Desc: image001.gif
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060418/99fd7707/attachment.gif>

From lhh at redhat.com  Tue Apr 18 20:27:03 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 18 Apr 2006 16:27:03 -0400
Subject: [Linux-cluster] Clumanager and Chkconfig
In-Reply-To: <b6131fdc0604180611u1635b475vf20257b0e1596685@mail.gmail.com>
References: <b6131fdc0604180611u1635b475vf20257b0e1596685@mail.gmail.com>
Message-ID: <1145392023.24818.36.camel@ayanami.boston.redhat.com>

On Tue, 2006-04-18 at 14:11 +0100, Steve Nelson wrote:
> Hi All,
> 
> Should clumanager be set to automatically start on all nodes?  I have
> a 2 node cluster (+ quorum) were if I kill an interface, the cluster
> fails over and the failed node reboots.  However, the node rejoins the
> cluster automatically - should this happen?

If you don't want clumanager to attempt to rejoin the cluster, chkconfig
--del it.  If you want it to attempt to rejoin, chkconfig --add it.

It's your choice.

Many failures which cause a node to be kicked out are temporary (ex:
kernel panic), and can be recovered from after a power-cycle.

Many hardware failures (ex: motherboard catching fire, hard disk crash)
generally (but *not* always) prevent the node from ever getting far
enough to rejoin the cluster.

-- Lon




From Bowie_Bailey at BUC.com  Tue Apr 18 20:27:15 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Tue, 18 Apr 2006 16:27:15 -0400
Subject: [Linux-cluster] bootup error - undefined symbol: lvm_snprintf
	after update
Message-ID: <4766EEE585A6D311ADF500E018C154E302133922@bnifex.cis.buc.com>

Hmm... only one response with an apparently unrelated cause.  Has
anyone else seen this error?  I'm reluctant to move toward production
with this server if I can't find out something about this error.

Bowie

Bowie wrote:
> This is an x86_64 system that I just updated to the newest Cluster
> rpms. 
> 
> When I watch the bootup on the console, I see an error:
> 
> lvm.static: symbol lookup error: /usr/lib64/liblvm2clusterlock.so:
>    undefined symbol: lvm_snprintf
> 
> This error comes immediately after the "Activating VGs" line, so it
> appears to be triggered by the vgchange command in the clvmd startup
> file.  I have another, identically configured, server which I have not
> updated yet.  This server does not give the error.
> 
> Everything seems to be working fine, so is this something I need to
> worry about?



From filipe.miranda at gmail.com  Tue Apr 18 20:33:13 2006
From: filipe.miranda at gmail.com (Filipe Miranda)
Date: Tue, 18 Apr 2006 17:33:13 -0300
Subject: [Linux-cluster] RHCS for RHEL3 Resource temporarily unavailable
Message-ID: <a6d13c780604181333v6331ddbctfebe6d466536eb62@mail.gmail.com>

 Hello,

We have a Red Hat Cluster Suite for Red Hat Enteprise Linux 3. After one
month, the master host cluster (node1) service was interrupted.
This is what shows in the file /var/log/messages:


Apr 13 16:05:06 Node1 clumembd[3222]: <warning> sending broadcast
message failed Resource temporarily unavailable
Apr 13 16:05:29 Node1 last message repeated 32 times
Apr 13 16:06:31 Node1 clumembd[3222]: <warning> sending broadcast
message failed Resource temporarily unavailable

This is the cluster config file:
cluster.xml

<?xml version="1.0" ?>
 - <#> <cluconfig version="*3.0*">
   <clumembd broadcast="*yes*" interval="*750000*" loglevel="*5*" multicast
="*no*" multicast_ipaddress="" thread="*yes*" tko_count="*20*" />
   <cluquorumd loglevel="*5*" pinginterval="*4*" tiebreaker_ip="" />
   <clurmtabd loglevel="*5*" pollinterval="*4*" />
   <clusvcmgrd loglevel="*7*" />
   <clulockd loglevel="*7*" />
   <cluster config_viewnumber="*1*" key="*814109fd98a521c9e33618045d42b97o*"name
="Cluster_A" />
   <sharedstate driver="*libsharedraw.so*" rawprimary="*/dev/raw/raw1*"rawshadow
="*/dev/raw/raw2*" type="*raw*" />
 - <#> <members>
   <member id="*0*" name="*node1*" watchdog="*no*" />
   <member id="*1*" name="node2" watchdog="*no*" />
  </members>
 - <#> <services>
 - <#> <service checkinterval="*30*" failoverdomain="*Local*"
id="*0*"maxfalsestarts
="*0*" maxrestarts="*15*" name="*script01*" userscript="">
 - <#> <service_ipaddresses>
   <service_ipaddress broadcast="*101.168.192.255*" id="*0*" ipaddress="*
101.168.192.162*" netmask="*255.255.255.0*" />
  </service_ipaddresses>
 - <#> <device id="*0*" name="*/dev/vol01/lv01*" sharename="">
   <mount forceunmount="*yes*" fstype="*ext3*" mountpoint="*/opt/*share"options
="" />
  </device>
  </service>
  </services>
 - <#> <failoverdomains>
 - <#> <failoverdomain id="*0*" name="*Local*" ordered="*no*" restricted="*
no*">
   <failoverdomainnode id="*0*" name="*node1*" />
   <failoverdomainnode id="*1*" name="node2" />
  </failoverdomain>
  </failoverdomains>
  </cluconfig>


Any ideas what is going on? what seems to be the problem?
The RHEL3 is on U4, so is the RHCS.
Did we do anything wrong in this configuration? Suggestions?
I would appreciate any help.


Regards,

FTM
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060418/b6390a1d/attachment.htm>

From lhh at redhat.com  Tue Apr 18 20:35:14 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 18 Apr 2006 16:35:14 -0400
Subject: [Linux-cluster] newbie questions..
In-Reply-To: <20060418004729.GA13973@monsterjam.org>
References: <20060415174104.GE41043@monsterjam.org>
	<1145282662.15794.90.camel@ayanami.boston.redhat.com>
	<20060418004729.GA13973@monsterjam.org>
Message-ID: <1145392514.24818.43.camel@ayanami.boston.redhat.com>

On Mon, 2006-04-17 at 20:47 -0400, Jason wrote:
> ok, so I guess Ill shoot for the GFS 6.0. Im trying to figure out where the source can be found 
> from the ftp://sources.redhat.com/pub/cluster/releases/
> and not having much luck.. Is it possible to get the source and use GFS 6.0 standalone?
> 

ftp://updates.redhat.com/enterprise/3AS/en/RHCS/SRPMS
ftp://updates.redhat.com/enterprise/3AS/en/RHGFS/SRPMS

Good luck!  If it breaks, you get to keep the pieces.

Red Hat has evaluation programs available, you know...

-- Lon

Red Hat's going to Nashville!
http://www.redhat.com/promo/summit/




From 14117614 at sun.ac.za  Tue Apr 18 20:44:54 2006
From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>)
Date: Tue, 18 Apr 2006 22:44:54 +0200
Subject: [Linux-cluster] < newbie question >
Message-ID: <2C04D2F14FD8254386851063BC2B6706574C41@STBEVS01.stb.sun.ac.za>

  

What does waiting for cluster quorum mean? 

 

Lee

 

He who has a why to live can bear with almost any how.
Friedrich Nietzsche  

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060418/4fcd1e69/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 862 bytes
Desc: image001.gif
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060418/4fcd1e69/attachment.gif>

From brentonr at dorm.org  Tue Apr 18 21:36:54 2006
From: brentonr at dorm.org (Brenton Rothchild)
Date: Tue, 18 Apr 2006 16:36:54 -0500
Subject: [Linux-cluster] [Fwd: Re: [Iscsitarget-devel] iSCSI on Fedora Core
	5 or custom]
Message-ID: <44455BF6.7070101@dorm.org>

Hello!

There has been some minor discussion on the iscsitarget-devel list
regarding the fact that the current iscsitarget code doesn't fully
implement the SCSI commands Release and Reserve, nor does it implement
persistent reserve in/out.

It was also mentioned that some cluster software, such as
MSCS (Microsoft Cluster), will possibly corrupt data.

It was then asked if GFS would depend on these unimplemented
SCSI commands, and the response from Ming Zhang (below) wasn't 100%
sure.

So, would an iSCSI target lacking Release, Reserve, and persistent
reserve in/out cause problems with GFS?

Thanks!
-Brenton

-------- Original Message --------
Subject: Re: [Iscsitarget-devel] iSCSI on Fedora Core 5 or custom
Date: Tue, 18 Apr 2006 17:16:39 -0400
From: Ming Zhang <mingz at ele.uri.edu>
Reply-To: mingz at ele.uri.edu
To: Jos Vos <jos at xos.nl>
CC: iscsitarget-devel at lists.sourceforge.net

On Tue, 2006-04-18 at 22:50 +0200, Jos Vos wrote:
> On Tue, Apr 18, 2006 at 04:21:29PM -0400, Ming Zhang wrote:
> 
> > cluster like MSCS, cluster file system mostly depends on scsi command
> > like reserve/release, persistent reserve in/out. But IET does not really
> > implement them. So there will be chance that you fail the cluster and
> > corrupt u data.
> 
> Does this also apply to the GFS and OCFS2 filesystems?

i am not 100% sure, but i remember that once a GFS guy told me that GFS
will run ok even with persistent reserve in/out support. not sure about
OCFS2.

Ming




-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Iscsitarget-devel mailing list
Iscsitarget-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/iscsitarget-devel



From erling.nygaard at gmail.com  Wed Apr 19 07:21:46 2006
From: erling.nygaard at gmail.com (Erling Nygaard)
Date: Wed, 19 Apr 2006 09:21:46 +0200
Subject: [Linux-cluster] [Fwd: Re: [Iscsitarget-devel] iSCSI on Fedora
	Core 5 or custom]
In-Reply-To: <44455BF6.7070101@dorm.org>
References: <44455BF6.7070101@dorm.org>
Message-ID: <adb721b40604190021m789defbycfc3d8871efa05a7@mail.gmail.com>

Brenton

GFS does not depend on any of the SCSI Release, Reserve or persistent
reserve commands.
GFS does not really depend on SCSI at all, just so happens that most
shared storage devices are SCSI based :-)


Once upon a time, in a galaxy far, far away GFS did attempt to use
special SCSI commands for locking. The attempt was discontinued....

Erling

On 4/18/06, Brenton Rothchild <brentonr at dorm.org> wrote:
> Hello!
>
> There has been some minor discussion on the iscsitarget-devel list
> regarding the fact that the current iscsitarget code doesn't fully
> implement the SCSI commands Release and Reserve, nor does it implement
> persistent reserve in/out.
>
> It was also mentioned that some cluster software, such as
> MSCS (Microsoft Cluster), will possibly corrupt data.
>
> It was then asked if GFS would depend on these unimplemented
> SCSI commands, and the response from Ming Zhang (below) wasn't 100%
> sure.
>
> So, would an iSCSI target lacking Release, Reserve, and persistent
> reserve in/out cause problems with GFS?
>
> Thanks!
> -Brenton
>
> -------- Original Message --------
> Subject: Re: [Iscsitarget-devel] iSCSI on Fedora Core 5 or custom
> Date: Tue, 18 Apr 2006 17:16:39 -0400
> From: Ming Zhang <mingz at ele.uri.edu>
> Reply-To: mingz at ele.uri.edu
> To: Jos Vos <jos at xos.nl>
> CC: iscsitarget-devel at lists.sourceforge.net
>
> On Tue, 2006-04-18 at 22:50 +0200, Jos Vos wrote:
> > On Tue, Apr 18, 2006 at 04:21:29PM -0400, Ming Zhang wrote:
> >
> > > cluster like MSCS, cluster file system mostly depends on scsi command
> > > like reserve/release, persistent reserve in/out. But IET does not really
> > > implement them. So there will be chance that you fail the cluster and
> > > corrupt u data.
> >
> > Does this also apply to the GFS and OCFS2 filesystems?
>
> i am not 100% sure, but i remember that once a GFS guy told me that GFS
> will run ok even with persistent reserve in/out support. not sure about
> OCFS2.
>
> Ming
>
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Iscsitarget-devel mailing list
> Iscsitarget-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/iscsitarget-devel
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>


--
-
Mac OS X. Because making Unix user-friendly is easier than debugging Windows



From saju8 at rediffmail.com  Wed Apr 19 10:47:47 2006
From: saju8 at rediffmail.com (saju  john)
Date: 19 Apr 2006 10:47:47 -0000
Subject: [Linux-cluster] clumanager sending broadcast packets ?
Message-ID: <20060419104747.6117.qmail@webmail57.rediffmail.com>

  
  
Dear All,

It seems that redhat clumanager is sending continous broadcast packet while it is running. Can any one confirm.

Thank You,
Saju John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060419/a8dd116b/attachment.htm>

From pcaulfie at redhat.com  Wed Apr 19 11:56:21 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 19 Apr 2006 12:56:21 +0100
Subject: [Linux-cluster] clumanager sending broadcast packets ?
In-Reply-To: <20060419104747.6117.qmail@webmail57.rediffmail.com>
References: <20060419104747.6117.qmail@webmail57.rediffmail.com>
Message-ID: <44462565.3020908@redhat.com>

saju john wrote:
>  
>  
> Dear All,
> 
> It seems that redhat clumanager is sending continous broadcast packet
> while it is running. Can any one confirm.
> 

cman will send out a broadcast packet every 5 seconds or so, on port 6809.
It's so that other nodes can check that the cluster is still connected.
-- 

patrick



From placid at adelpha-lan.org  Wed Apr 19 13:02:56 2006
From: placid at adelpha-lan.org (Castang Jerome)
Date: Wed, 19 Apr 2006 15:02:56 +0200
Subject: [Linux-cluster] clumanager sending broadcast packets ?
In-Reply-To: <20060419104747.6117.qmail@webmail57.rediffmail.com>
References: <20060419104747.6117.qmail@webmail57.rediffmail.com>
Message-ID: <44463500.9080809@adelpha-lan.org>

saju john a ?crit :
>
>  
>  
> Dear All,
>
> It seems that redhat clumanager is sending continous broadcast packet 
> while it is running. Can any one confirm.
>
> Thank You,
> Saju John
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
I can confirm :)
cman sends broadcast packets to check if all nodes are present.


-- 
Jerome CASTANG
Tel: 06.85.74.33.02
mail: jerome.castang at adelpha-lan.org

---------------------------------------------
RTFM !



From teigland at redhat.com  Wed Apr 19 13:20:46 2006
From: teigland at redhat.com (David Teigland)
Date: Wed, 19 Apr 2006 08:20:46 -0500
Subject: [Linux-cluster] [Fwd: Re: [Iscsitarget-devel] iSCSI on Fedora
	Core 5 or custom]
In-Reply-To: <44455BF6.7070101@dorm.org>
References: <44455BF6.7070101@dorm.org>
Message-ID: <20060419132046.GA2683@redhat.com>

On Tue, Apr 18, 2006 at 04:36:54PM -0500, Brenton Rothchild wrote:
> Hello!
> 
> There has been some minor discussion on the iscsitarget-devel list
> regarding the fact that the current iscsitarget code doesn't fully
> implement the SCSI commands Release and Reserve, nor does it implement
> persistent reserve in/out.
> 
> It was also mentioned that some cluster software, such as
> MSCS (Microsoft Cluster), will possibly corrupt data.
> 
> It was then asked if GFS would depend on these unimplemented
> SCSI commands, and the response from Ming Zhang (below) wasn't 100%
> sure.
> 
> So, would an iSCSI target lacking Release, Reserve, and persistent
> reserve in/out cause problems with GFS?

GFS requires the clustering software to do i/o fencing and persistent
reservations would be a good way to do fencing.

Dave



From jbrassow at redhat.com  Wed Apr 19 14:13:46 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 19 Apr 2006 09:13:46 -0500
Subject: [Linux-cluster] < cluster.conf problem >
In-Reply-To: <2C04D2F14FD8254386851063BC2B6706574C40@STBEVS01.stb.sun.ac.za>
References: <2C04D2F14FD8254386851063BC2B6706574C40@STBEVS01.stb.sun.ac.za>
Message-ID: <23f9ac66cf0e830ed3672b0a370ac54d@redhat.com>

Have you started the CCS daemon (ccsd)?  The init script should start 
this on bootup.  You really just need to create the cluster.conf file 
(by hand or by GUI) and copy it to all your nodes.  Then reboot.

  brassow


On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote:

> Hi?
> ?
> I have 5 nodes and 1 head node. I want to setup gfs so that I can 
> bunch together the 5 nodes, each have lvm?s.
> ?
> I?m having trouble setting up cluster.conf. I follow the manuals 
> example for gfs, not gfs2, and it says that it cant connect to css.
> ?
> I?m running FC5 on all my machines.
> ?
> Lee
> ?
>
> He who has a why to live can bear with almost any how.
> Friedrich Nietzsche?
> ?
> <image001.gif>--
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 1980 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060419/512b234a/attachment.bin>

From 14117614 at sun.ac.za  Wed Apr 19 14:28:18 2006
From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>)
Date: Wed, 19 Apr 2006 16:28:18 +0200
Subject: [Linux-cluster] < cluster.conf problem >
Message-ID: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za>

Hi...

 

I did that, thanks.

 

Now I have trouble getting fence_tool join to work. I did cman_tool join
and I'm able to see the nodes under /proc/cluster/nodes...

 

But fence seems to wait for something called  "quorum"..

In the cluster.conf file just said fence manually...

 

What would be the problem?

 

Lee

 

He who has a why to live can bear with almost any how.
Friedrich Nietzsche

________________________________

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jonathan E
Brassow
Sent: 19 April 2006 04:14 PM
To: linux clustering
Subject: Re: [Linux-cluster] < cluster.conf problem >

 

Have you started the CCS daemon (ccsd)? The init script should start
this on bootup. You really just need to create the cluster.conf file (by
hand or by GUI) and copy it to all your nodes. Then reboot. 

 

brassow 

 

On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: 

	 

	Hi... 

	  

	I have 5 nodes and 1 head node. I want to setup gfs so that I
can bunch together the 5 nodes, each have lvm's. 

	  

	I'm having trouble setting up cluster.conf. I follow the manuals
example for gfs, not gfs2, and it says that it cant connect to css. 

	  

	I'm running FC5 on all my machines. 

	  

	Lee 

	  

	 

	He who has a why to live can bear with almost any how. 

	Friedrich Nietzsche  

	  

	<image001.gif>-- 

	Linux-cluster mailing list 

	Linux-cluster at redhat.com 

	https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060419/f1dc0133/attachment.htm>

From jbrassow at redhat.com  Wed Apr 19 16:45:04 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 19 Apr 2006 11:45:04 -0500
Subject: [Linux-cluster] < cluster.conf problem >
In-Reply-To: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za>
References: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za>
Message-ID: <540618d44815ff3d4897d82dd26f9538@redhat.com>

Quorum is a number of machines greater than or equal to (n/2 +1) of the 
total cluster machines.
There is no way for two groups of machines in a cluster to have 
"quorum".  Only one group (which may be all the machines) can have this 
status.

When a group of machines has quorum, they can perform cluster 
operations.  The idea of quorum prevents "split-brain" or two separate 
groups of machines from thinking they are in control of the cluster - 
and thus potentially corrupting resources because they do not 
acknowledge the existence of the other group.  (Think multiple writer 
problem.)

You should reboot all your machines at the same time.  (Or at least do 
cman_tool join on all the machines at close to the same time.)  This 
allows the machines to form a quorate group and start performing 
cluster operations - like starting and performing fencing.

  brassow

P.S. Manual fencing sucks for anything more than simple evaluation.  My 
guess is that you will encounter more problems/questions because of 
manual fencing.

On Apr 19, 2006, at 9:28 AM, Pool Lee, Mr <14117614 at sun.ac.za> wrote:

> Hi?
> ?
> I did that, thanks.
> ?
> Now I have trouble getting fence_tool join to work. I did cman_tool 
> join and I?m able to see the nodes under /proc/cluster/nodes?
> ?
> But fence seems to wait for something called ??quorum?..
> In the cluster.conf file just said fence manually?
> ?
> What would be the problem?
> ?
> Lee
> ?
>
> He who has a why to live can bear with almost any how.
> Friedrich Nietzsche
>
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jonathan E 
> Brassow
> Sent: 19 April 2006 04:14 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] < cluster.conf problem >
> ?
> Have you started the CCS daemon (ccsd)? The init script should start 
> this on bootup. You really just need to create the cluster.conf file 
> (by hand or by GUI) and copy it to all your nodes. Then reboot.
> ?
> brassow
>
> ?
> On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote:
>> ?
>> Hi?
>> ?
>> I have 5 nodes and 1 head node. I want to setup gfs so that I can 
>> bunch together the 5 nodes, each have lvm?s.
>> ?
>> I?m having trouble setting up cluster.conf. I follow the manuals 
>> example for gfs, not gfs2, and it says that it cant connect to css.
>> ?
>> I?m running FC5 on all my machines.
>> ?
>> Lee
>> ?
>> ?
>> He who has a why to live can bear with almost any how.
>> Friedrich Nietzsche?
>> ?
>> <image001.gif>--
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 8555 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060419/21bc3da9/attachment.bin>

From ehimmel at burlingtontelecom.com  Wed Apr 19 17:23:18 2006
From: ehimmel at burlingtontelecom.com (Evan Himmel)
Date: Wed, 19 Apr 2006 13:23:18 -0400
Subject: [Linux-cluster] Cluster Help
Message-ID: <44467206.8060802@burlingtontelecom.com>

I installed all the necessary rpms to run RH Cluster Suite and GFS on 
Fedora Core 5.  I am running the 64-bit version with the xen kernel.  I 
can't seem to get cman to start.  It gives me the following error:

can't open cluster socket: Address family not supported by protocol
cman_tool: The cman kernel module may not be loaded

I also noticed there is no support for the SMP kernels via 64-bit. 
(cman-kernel-smp) and that lvm2-cluster is also not available.  Any help 
would be great.

Evan





From 14117614 at sun.ac.za  Wed Apr 19 17:32:00 2006
From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>)
Date: Wed, 19 Apr 2006 19:32:00 +0200
Subject: [Linux-cluster] < cluster.conf problem >
References: <2C04D2F14FD8254386851063BC2B6706574C4C@STBEVS01.stb.sun.ac.za>
	<540618d44815ff3d4897d82dd26f9538@redhat.com>
Message-ID: <2C04D2F14FD8254386851063BC2B67065E08A7@STBEVS01.stb.sun.ac.za>

Hi...
 
Thanks for the answer..
 
Lee
 

He who has a why to live can bear with almost any how.
Friedrich Nietzsche



________________________________

From: linux-cluster-bounces at redhat.com on behalf of Jonathan E Brassow
Sent: Wed 2006/04/19 06:45 PM
To: linux clustering
Subject: Re: [Linux-cluster] < cluster.conf problem >


Quorum is a number of machines greater than or equal to (n/2 +1) of the total cluster machines. 
There is no way for two groups of machines in a cluster to have "quorum". Only one group (which may be all the machines) can have this status. 

When a group of machines has quorum, they can perform cluster operations. The idea of quorum prevents "split-brain" or two separate groups of machines from thinking they are in control of the cluster - and thus potentially corrupting resources because they do not acknowledge the existence of the other group. (Think multiple writer problem.) 

You should reboot all your machines at the same time. (Or at least do cman_tool join on all the machines at close to the same time.) This allows the machines to form a quorate group and start performing cluster operations - like starting and performing fencing. 

brassow 

P.S. Manual fencing sucks for anything more than simple evaluation. My guess is that you will encounter more problems/questions because of manual fencing. 

On Apr 19, 2006, at 9:28 AM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: 


	Hi... 
	
	I did that, thanks. 
	
	Now I have trouble getting fence_tool join to work. I did cman_tool join and I'm able to see the nodes under /proc/cluster/nodes... 
	
	But fence seems to wait for something called  "quorum".. 
	In the cluster.conf file just said fence manually... 
	
	What would be the problem? 
	
	Lee 
	

	He who has a why to live can bear with almost any how. 
	Friedrich Nietzsche 

	From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jonathan E Brassow 
	Sent: 19 April 2006 04:14 PM 
	To: linux clustering 
	Subject: Re: [Linux-cluster] < cluster.conf problem > 
	
	Have you started the CCS daemon (ccsd)? The init script should start this on bootup. You really just need to create the cluster.conf file (by hand or by GUI) and copy it to all your nodes. Then reboot. 
	
	brassow 

	
	On Apr 18, 2006, at 3:14 PM, Pool Lee, Mr <14117614 at sun.ac.za> wrote: 

		
		Hi... 
		  
		I have 5 nodes and 1 head node. I want to setup gfs so that I can bunch together the 5 nodes, each have lvm's. 
		  
		I'm having trouble setting up cluster.conf. I follow the manuals example for gfs, not gfs2, and it says that it cant connect to css. 
		  
		I'm running FC5 on all my machines. 
		  
		Lee 
		  
		
		He who has a why to live can bear with almost any how. 
		Friedrich Nietzsche  
		  
		<image001.gif>-- 
		Linux-cluster mailing list 
		Linux-cluster at redhat.com 
		https://www.redhat.com/mailman/listinfo/linux-cluster 

	-- 
	Linux-cluster mailing list 
	Linux-cluster at redhat.com 
	https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 10701 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060419/840c6da2/attachment.bin>

From jbrassow at redhat.com  Wed Apr 19 18:33:05 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Wed, 19 Apr 2006 13:33:05 -0500
Subject: [Linux-cluster] Cluster Help
In-Reply-To: <44467206.8060802@burlingtontelecom.com>
References: <44467206.8060802@burlingtontelecom.com>
Message-ID: <814fa887df569f69d883689d0e4fc100@redhat.com>


On Apr 19, 2006, at 12:23 PM, Evan Himmel wrote:

> I installed all the necessary rpms to run RH Cluster Suite and GFS on 
> Fedora Core 5.  I am running the 64-bit version with the xen kernel.  
> I can't seem to get cman to start.  It gives me the following error:
>
> can't open cluster socket: Address family not supported by protocol
> cman_tool: The cman kernel module may not be loaded

This error occurs when the 'cman' module is not loaded in the kernel.  
You can do 'modprobe cman' to load it.  If that doesn't work, it likely 
means that you don't have a cman-kernel*rpm for the particular kernel 
you are running. (I'm not sure we are currently building for xen... 
anyone?)

  brassow



From cfeist at redhat.com  Wed Apr 19 20:27:52 2006
From: cfeist at redhat.com (Chris Feist)
Date: Wed, 19 Apr 2006 15:27:52 -0500
Subject: [Linux-cluster] Cluster Help
In-Reply-To: <814fa887df569f69d883689d0e4fc100@redhat.com>
References: <44467206.8060802@burlingtontelecom.com>
	<814fa887df569f69d883689d0e4fc100@redhat.com>
Message-ID: <44469D48.3010105@redhat.com>

There are xen packages available for both x86_64 & i686 archs.  In the orignal 
FC5 release there weren't any x86_64 xen rpms for GFS and Cluster Suite.  You 
need to make sure you run 'yum update' to get the latest updates.

Thanks,
Chris

Jonathan E Brassow wrote:
> 
> On Apr 19, 2006, at 12:23 PM, Evan Himmel wrote:
> 
>> I installed all the necessary rpms to run RH Cluster Suite and GFS on 
>> Fedora Core 5.  I am running the 64-bit version with the xen kernel.  
>> I can't seem to get cman to start.  It gives me the following error:
>>
>> can't open cluster socket: Address family not supported by protocol
>> cman_tool: The cman kernel module may not be loaded
> 
> This error occurs when the 'cman' module is not loaded in the kernel.  
> You can do 'modprobe cman' to load it.  If that doesn't work, it likely 
> means that you don't have a cman-kernel*rpm for the particular kernel 
> you are running. (I'm not sure we are currently building for xen... 
> anyone?)
> 
>  brassow
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From sanelson at gmail.com  Wed Apr 19 21:35:57 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Wed, 19 Apr 2006 22:35:57 +0100
Subject: [Linux-cluster] Clustat in user's profile
Message-ID: <b6131fdc0604191435o7028bbe7w1325a9edfecba688@mail.gmail.com>

Hi All,

On all of my clusters, I have clustat run in the user's profile, so
the status of the cluster is visible whenever someone logs in.

Someone has suggested to me that clustat could hang, and prevent user
access.  Is this a valid point?  Under what (if any) circumstances
would clustat hang?

S.



From ehimmel at burlingtontelecom.com  Thu Apr 20 01:53:03 2006
From: ehimmel at burlingtontelecom.com (Evan Himmel)
Date: Wed, 19 Apr 2006 21:53:03 -0400
Subject: [Linux-cluster] Cluster Help
In-Reply-To: <44469D48.3010105@redhat.com>
References: <44467206.8060802@burlingtontelecom.com>	<814fa887df569f69d883689d0e4fc100@redhat.com>
	<44469D48.3010105@redhat.com>
Message-ID: <4446E97F.7050603@burlingtontelecom.com>

What about lvm2-cluster? I got kernel-xen for the rest.  Thanks!

Chris Feist wrote:
> There are xen packages available for both x86_64 & i686 archs.  In the 
> orignal FC5 release there weren't any x86_64 xen rpms for GFS and 
> Cluster Suite.  You need to make sure you run 'yum update' to get the 
> latest updates.
>
> Thanks,
> Chris
>
> Jonathan E Brassow wrote:
>>
>> On Apr 19, 2006, at 12:23 PM, Evan Himmel wrote:
>>
>>> I installed all the necessary rpms to run RH Cluster Suite and GFS 
>>> on Fedora Core 5.  I am running the 64-bit version with the xen 
>>> kernel.  I can't seem to get cman to start.  It gives me the 
>>> following error:
>>>
>>> can't open cluster socket: Address family not supported by protocol
>>> cman_tool: The cman kernel module may not be loaded
>>
>> This error occurs when the 'cman' module is not loaded in the 
>> kernel.  You can do 'modprobe cman' to load it.  If that doesn't 
>> work, it likely means that you don't have a cman-kernel*rpm for the 
>> particular kernel you are running. (I'm not sure we are currently 
>> building for xen... anyone?)
>>
>>  brassow
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

-- 

Evan Himmel
Burlington Telecom
http://www.burlingtontelecom.com

__________________________________________________________________________________________________________________________________________________
Attention! This electronic message contains information that may be legally confidential and/or privileged. The information is intended solely for the individual or entity named above and access by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it.




From Fernando.Nino at medias.cnes.fr  Thu Apr 20 07:56:20 2006
From: Fernando.Nino at medias.cnes.fr (Fernando Nino)
Date: Thu, 20 Apr 2006 09:56:20 +0200
Subject: [Linux-cluster] GFS join hang
Message-ID: <200604200756.k3K7uDo25619@cnes.fr>

Dear all,


 I am running GFS 6.1 with dlm on a cluster (4 nodes + front-end) of 
dual-headed Opterons and RHEL4U3. Because of some problems (kernel 
panic...) I had to hard boot some nodes of the cluster.  Now, some gfs 
partitions won't mount.  They will simply keep waiting forever for the 
"join" of the GFS group:

So... three questions:

 - What is the join exactly doing ? Cluster status is fine, everybody is 
member ...
 - What does the status code mean in the cman_tool output ?
 - What can I do to restart this cluster ?

NB: Before testing this (below) I rebooted the complete cluster and 
gfs_fsck'ed /all nodes /with everything unmounted.

---------------------------------------------------------------------------------------------------- 

root # service clvmd start

root #: service gfs start
Mounting GFS filesystems:    # forever !

in another console I get:
root # dmesg | tail
...
GFS: fsid=globcover:baieGC2b.0: jid=14: Done
GFS: fsid=globcover:baieGC2b.0: jid=15: Trying to acquire journal lock...
GFS: fsid=globcover:baieGC2b.0: jid=15: Looking at journal...
GFS: fsid=globcover:baieGC2b.0: jid=15: Done
GFS: Trying to join cluster "lock_dlm", "globcover:baieGC3a"


root #  cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                          11   2 run       -
[1 5 4 3 2]

DLM Lock Space:  "clvmd"                            12   3 run       -
[1 5 4 3 2]

DLM Lock Space:  "baieGC2b"                         13   4 run       -
[1 5]

DLM Lock Space:  "baieGC3a"                         15   6 run       -
[1 5 2 4 3]

GFS Mount Group: "baieGC2b"                         14   5 run       -
[1 5]

GFS Mount Group: "baieGC3a"                          0   7 join      
S-2,2,4
[]


root # cman_tool status
Protocol version: 5.0.1
Config version: 8
Cluster name: globcover
Cluster ID: 53692
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 5
Expected_votes: 5
Total_votes: 5
Quorum: 3
Active subsystems: 9
Node name: globcover-fe
Node addresses: 10.1.1.1

root # cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    5   M   globcover-fe
   2    1    5   M   compute-0-3
   3    1    5   M   compute-0-2
   4    1    5   M   compute-0-1
   5    1    5   M   compute-0-0

---------------------------------------------------------------------------------------------------- 




 Thanks,
-- 
------------------------------------------------------------------------
Fernando NI?O 	CNES - BPi 2102
Medias-France/IRD 	18, Av. Edouard Belin
T?l: 05.61.27.40.74 	31401 Toulouse Cedex 9





From hlawatschek at atix.de  Thu Apr 20 08:03:22 2006
From: hlawatschek at atix.de (Mark Hlawatschek)
Date: Thu, 20 Apr 2006 10:03:22 +0200
Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root
Message-ID: <200604201003.23057.hlawatschek@atix.de>

Hi,

as posted before, we are using GFS for our diskless shared root cluster 
solutions. 

In this file system based ssi configurations all servers are ?stateless? and 
share the same root partition and boot device in the SAN. Server, 
infrastructure and storage tier of the diskless shared root cluster can be 
scaled independently and incrementally. 

As we want to be independent from the servers hostnames at initrd boottime, we 
wrote a small GFS patch to use cmans nodeid parameter for a context dependent 
path name. 

I attached the patches to the mail. Note, that the GFS and cman patches are 
totally independent from each other and the cman patch is only for user 
information.

What do you think about nodeid cdpns ?

The Readme:
1. Reason for the patch
Create context dependent symbolic links (cdsl) dependent to cmans nodeid
E.g. ln -s @nodeid mynode

2. Contents
- cman-kernel-nodeid.patch
Applies against cman-kernel-2.6.9-41
- kernel-nodeid-symlink.patch
Applies against gfs-kernel-2.6.9-42

3. Changes:
3.1 cman-kernel
Added line to proc/cluster/status output. E.g: "Node ID: 4"
3.2 gfs-kernel
Added new parameter for cdsl symlink: @nodeid

Thanks,

Mark


-- 
Gruss / Regards,

Dipl.-Ing. Mark Hlawatschek
Phone: +49-89 121 409-55
http://www.atix.de/
http://www.open-sharedroot.org/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gfs-kernel-nodeid-symlink.patch
Type: text/x-diff
Size: 2473 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/824ab2b3/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cman-kernel-nodeid.patch
Type: text/x-diff
Size: 500 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/824ab2b3/attachment-0001.bin>

From pcaulfie at redhat.com  Thu Apr 20 08:08:08 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 20 Apr 2006 09:08:08 +0100
Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root
In-Reply-To: <200604201003.23057.hlawatschek@atix.de>
References: <200604201003.23057.hlawatschek@atix.de>
Message-ID: <44474168.8030703@redhat.com>

Mark Hlawatschek wrote:
> Hi,


> diff -Naur cman-kernel-2.6.9-41.orig/src/proc.c cman-kernel-2.6.9-41/src/proc.c
> --- cman-kernel-2.6.9-41.orig/src/proc.c	2005-11-28 17:20:39.000000000 +0100
> +++ cman-kernel-2.6.9-41/src/proc.c	2006-01-23 23:20:15.000000000 +0100
> @@ -149,6 +149,8 @@
>  		 atomic_read(&use_count));
>  
>      c += sprintf(b+c, "Node name: %s\n", nodename);
> +    
> +    c += sprintf(b+c, "Node ID: %i\n", us->node_id);
>  
>      c += sprintf(b+c, "Node addresses: ");
>      list_for_each_entry(node_addr, &us->addr_list, list) {
> 
> 

This patch is already in CVS for cman. I can't comment on the GFS parts.

-- 

patrick



From cjk at techma.com  Thu Apr 20 11:16:52 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 20 Apr 2006 07:16:52 -0400
Subject: [Linux-cluster] Clustat in user's profile
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E6F@tmaemail.techma.com>

It's easy to illustrate, just do a "clustsat -i 1" in one window, then fence
another node from another. The clustat will pause (hang) at one point. Not a
very 
scientific test and probably a situation that would crop up much, but it can 
"pause" under normal use. As far as hanging, well, it's software, which by
it's 
very nature can hang. It'll probaly happen right when your showing your boss
the 
new whiz-bang cluster you've been working on :)

Things that can make _anything_ hang of course are slow resolves, interrupted

authentication mechanism (nis or ldap puking or slow) or maybe an NFS mount
which
is having issues which causes access checks to timeout/fail etc. 

In short, I'm personally not a fan of issueing commands during login scripts.


Cheers,


Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Steve Nelson
Sent: Wednesday, April 19, 2006 5:36 PM
To: linux clustering
Subject: [Linux-cluster] Clustat in user's profile

Hi All,

On all of my clusters, I have clustat run in the user's profile, so the
status of the cluster is visible whenever someone logs in.

Someone has suggested to me that clustat could hang, and prevent user access.
Is this a valid point?  Under what (if any) circumstances would clustat hang?

S.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From cjk at techma.com  Thu Apr 20 11:46:24 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 20 Apr 2006 07:46:24 -0400
Subject: [Linux-cluster] New features/architecture ?
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>

I've worked with GFS 6 and 6.1 quite a bit lately and in reading the posts
over the last few months, I see a lot
of references to gfs2. I'm not quite sure where it sits in the grand scheme
of things other than it's the next big
itteration of gfs as a whole and attepmpts are being made to mearge it into
the kernel. 

This post has some good info, but not much in the way of specifics
http://lwn.net/Articles/150652/

*	GFS2 - an improved version of GFS, not on-disk compatible 
*	DLM - an improved version of DLM 
*	CMAN - a new version of CMAN, based on OpenAIS
<http://developer.osdl.org/dev/openais/>  
*	CLVM - will allow more LVM2 features to be used in the cluster

These seem to be all there is as far as a "roadmap" and the OpenAIS link
doesn't seem all that descriptive
unless one is a developer.

Is there some point of reference which describes the changes between whats
already released and what is 
planned? For instance, a post recently mentioned adding openais
interfaces/functionality. 

Basically I guess I am looking for a roadmap of some sort? 


Cheers


Corey
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/f97ca751/attachment.htm>

From skellogg at egginc.com  Thu Apr 20 12:50:11 2006
From: skellogg at egginc.com (Scott Kellogg)
Date: Thu, 20 Apr 2006 08:50:11 -0400
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
Message-ID: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>

Hello,

I was wondering if I could get some assistance in setting up a two  
node cluster.

We have 2 Dell PowerEdge 850 machines running RHEL4.  Our license for  
Cluster Suite is still in purchasing.  The main thing in the Cluster  
Suite documentation which confuses me is the use of a SAN.  RHEL4  
docs claim that the need for a SAN has been eliminated, but I'm  
having trouble find more information.  Most of the docs assume you  
are using a SAN. My customer could not afford the SAN, just the servers.

I would like to set up a high-availablity environment.  I understand  
that due to the hardware configuration (no SAN, no RAID) that there  
are still points of failure.  I'm hoping to set up a simple active- 
passive configuration.  We will be running LAMP applications.  If the  
primary server cannot deliver services, I'd like to automatically cut  
over to the backup.

Ideally, I'd like to set up active-active and load balancing, since  
the servers have DRAC4 fence devices for use with STONITH.  However,   
since there is no SAN, I'm not sure how data will be mirrored across  
the two machines.

Any help is appreciated!

Thank you,
Scott Kellogg 



From teigland at redhat.com  Thu Apr 20 14:32:47 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 20 Apr 2006 09:32:47 -0500
Subject: [Linux-cluster] GFS join hang
In-Reply-To: <200604200756.k3K7uDo25619@cnes.fr>
References: <200604200756.k3K7uDo25619@cnes.fr>
Message-ID: <20060420143247.GA22326@redhat.com>

On Thu, Apr 20, 2006 at 09:56:20AM +0200, Fernando Nino wrote:
> I am running GFS 6.1 with dlm on a cluster (4 nodes + front-end) of 
> dual-headed Opterons and RHEL4U3. Because of some problems (kernel 
> panic...) I had to hard boot some nodes of the cluster.  Now, some gfs 
> partitions won't mount.  They will simply keep waiting forever for the 
> "join" of the GFS group:
> 
> So... three questions:
> 
> - What is the join exactly doing ? Cluster status is fine, everybody is 
> member ...

>From all 5 nodes it would be good to see:
- cman_tool services
- /var/log/messages
- /proc/cluster/lock_dlm/debug

> - What does the status code mean in the cman_tool output ?
> S-2,2,4

S-2: join event state is SEST_JOIN_ACKWAIT
,2: join event flag is SEFL_ALLOW_JOIN
,4: number of acks to our join request is 4

So, the node is waiting for acks to its join request.  It needs 5 but has
only got 4, someone hasn't sent a reply for some reason.  We might be able
to figure out who and why given all the info from the other nodes.
Rebooting the node that's not replied might resolve things.

Dave



From david.n.lombard at intel.com  Thu Apr 20 14:39:44 2006
From: david.n.lombard at intel.com (Lombard, David N)
Date: Thu, 20 Apr 2006 07:39:44 -0700
Subject: [Linux-cluster] Clustat in user's profile
Message-ID: <187D3A7CAB42A54DB61F1D05F0125722086AE16E@orsmsx402.amr.corp.intel.com>

From: Steve Nelson on Wednesday, April 19, 2006 2:36 PM
> Hi All,
> 
> On all of my clusters, I have clustat run in the user's profile, so
> the status of the cluster is visible whenever someone logs in.
> 
> Someone has suggested to me that clustat could hang, and prevent user
> access.  Is this a valid point?  Under what (if any) circumstances
> would clustat hang?

As another has pointed out, anything that can hang the login, will, at
the most inopportune times.

Why not have a cron job periodically report the status into some file
and then just cat the file results during login?  If the user then
really wants an up-to-the-moment report, they can buy into running
clustat.

-- 
dnl



From filipe.miranda at gmail.com  Thu Apr 20 15:11:31 2006
From: filipe.miranda at gmail.com (Filipe Miranda)
Date: Thu, 20 Apr 2006 12:11:31 -0300
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
	<8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>
Message-ID: <a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>

Scott,

The RHCS for RHEL4 does not requires a SAN like its antecessor RHCS for
RHEL3.
All quorum control and cluster management is done throught network, so you
will be fine without a SAN when using RHCS for RHEL4.

The only problem you will encounter is that without a SAN you will probably
have to sincronize data between the servers if your application stores data
on internar discs on the servers....

Also to use an active-active (same service active on both servers)
configuration + loadbalancing you will need more than 2 servers; at least 4
servers, 2 for loadbalancing (1 active/1 backup) and 2 for the critical
service active concurently on both servers (no high availability, no
failover).

But if the active-active is for the servers (hardware), which means not the
same service on high availability  then; 2 servers doing loadbalancing (1
active/1 backup), 2 servers providing 2 critical services, one actice on
node A and the other one active on node B (failover activated)

Well thats my understanding about Red Hat's Cluster Suite... Please correct
me if I Am wrong...

Att.
Filipe Miranda


On 4/20/06, Scott Kellogg <skellogg at egginc.com> wrote:
>
> Hello,
>
> I was wondering if I could get some assistance in setting up a two
> node cluster.
>
> We have 2 Dell PowerEdge 850 machines running RHEL4.  Our license for
> Cluster Suite is still in purchasing.  The main thing in the Cluster
> Suite documentation which confuses me is the use of a SAN.  RHEL4
> docs claim that the need for a SAN has been eliminated, but I'm
> having trouble find more information.  Most of the docs assume you
> are using a SAN. My customer could not afford the SAN, just the servers.
>
> I would like to set up a high-availablity environment.  I understand
> that due to the hardware configuration (no SAN, no RAID) that there
> are still points of failure.  I'm hoping to set up a simple active-
> passive configuration.  We will be running LAMP applications.  If the
> primary server cannot deliver services, I'd like to automatically cut
> over to the backup.
>
> Ideally, I'd like to set up active-active and load balancing, since
> the servers have DRAC4 fence devices for use with STONITH.  However,
> since there is no SAN, I'm not sure how data will be mirrored across
> the two machines.
>
> Any help is appreciated!
>
> Thank you,
> Scott Kellogg
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



--
Att.
---
Filipe T Miranda
RHCE - Red Hat Certified Engineer
OCP8i - Oracle Certified Professional
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/1136be27/attachment.htm>

From skellogg at egginc.com  Thu Apr 20 15:22:42 2006
From: skellogg at egginc.com (Scott Kellogg)
Date: Thu, 20 Apr 2006 11:22:42 -0400
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
	<8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>
	<a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>
Message-ID: <B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>

>
>
> The only problem you will encounter is that without a SAN you will  
> probably have to sincronize data between the servers if your  
> application stores data on internar discs on the servers....

Yes, this is the issue that seems to have multiple solutions.  I've  
looked at NFS, DRBD, rysnc, and Unison, but none of these  
technologies has jumped out at me as the best one.

>
> Also to use an active-active (same service active on both servers)  
> configuration + loadbalancing you will need more than 2 servers; at  
> least 4 servers, 2 for loadbalancing (1 active/1 backup) and 2 for  
> the critical service active concurently on both servers (no high  
> availability, no failover).

You seem to be referring to LVS.  Right, I can't implement that since  
I don't have enough hardware.  I think that active-passive will be  
the way to go.  When the active node dies, the passive node will take  
over.  The data will only be as fresh as the last synchronization.

That begs the question of what happens when the active node comes  
back up ... will the passive node (now active) sync its data to the  
new active node?  This is where picking a synchronization method  
becomes vital.

/Scott


>
> But if the active-active is for the servers (hardware), which means  
> not the same service on high availability  then; 2 servers doing  
> loadbalancing (1 active/1 backup), 2 servers providing 2 critical  
> services, one actice on node A and the other one active on node B  
> (failover activated)
>
> Well thats my understanding about Red Hat's Cluster Suite... Please  
> correct me if I Am wrong...
>
> Att.
> Filipe Miranda
>
>
> On 4/20/06, Scott Kellogg <skellogg at egginc.com> wrote:
> Hello,
>
> I was wondering if I could get some assistance in setting up a two
> node cluster.
>
> We have 2 Dell PowerEdge 850 machines running RHEL4.  Our license for
> Cluster Suite is still in purchasing.  The main thing in the Cluster
> Suite documentation which confuses me is the use of a SAN.  RHEL4
> docs claim that the need for a SAN has been eliminated, but I'm
> having trouble find more information.  Most of the docs assume you
> are using a SAN. My customer could not afford the SAN, just the  
> servers.
>
> I would like to set up a high-availablity environment.  I understand
> that due to the hardware configuration (no SAN, no RAID) that there
> are still points of failure.  I'm hoping to set up a simple active-
> passive configuration.  We will be running LAMP applications.  If the
> primary server cannot deliver services, I'd like to automatically cut
> over to the backup.
>
> Ideally, I'd like to set up active-active and load balancing, since
> the servers have DRAC4 fence devices for use with STONITH.  However,
> since there is no SAN, I'm not sure how data will be mirrored across
> the two machines.
>
> Any help is appreciated!
>
> Thank you,
> Scott Kellogg
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
> -- 
> Att.
> ---
> Filipe T Miranda
> RHCE - Red Hat Certified Engineer
> OCP8i - Oracle Certified Professional
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Scott Kellogg
System Administrator
EG&G Technical Services, Inc.
(812) 854-7077 ext. 236



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/4b95c7df/attachment.htm>

From teigland at redhat.com  Thu Apr 20 15:33:26 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 20 Apr 2006 10:33:26 -0500
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
Message-ID: <20060420153326.GB22326@redhat.com>

On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote:
> I've worked with GFS 6 and 6.1 quite a bit lately and in reading the posts
> over the last few months, I see a lot
> of references to gfs2. I'm not quite sure where it sits in the grand scheme
> of things other than it's the next big
> itteration of gfs as a whole and attepmpts are being made to mearge it into
> the kernel. 
> 
> This post has some good info, but not much in the way of specifics
> http://lwn.net/Articles/150652/
> 
> *	GFS2 - an improved version of GFS, not on-disk compatible 
> *	DLM - an improved version of DLM 
> *	CMAN - a new version of CMAN, based on OpenAIS
> <http://developer.osdl.org/dev/openais/>  
> *	CLVM - will allow more LVM2 features to be used in the cluster
> 
> These seem to be all there is as far as a "roadmap" and the OpenAIS link
> doesn't seem all that descriptive
> unless one is a developer.
> 
> Is there some point of reference which describes the changes between whats
> already released and what is 
> planned? For instance, a post recently mentioned adding openais
> interfaces/functionality. 

For GFS2 and DLM it's largely performance improvements.  For clustering
infrastructure a ton of stuff moved out of the kernel and now runs in user
space, with openais at the center.  The user isn't exposed to much of the
infrastructure so there's not much user-visible change to speak about.

Patrick recently sent this out:
https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html

Dave



From jbrassow at redhat.com  Thu Apr 20 15:37:40 2006
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Thu, 20 Apr 2006 10:37:40 -0500
Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root
In-Reply-To: <200604201003.23057.hlawatschek@atix.de>
References: <200604201003.23057.hlawatschek@atix.de>
Message-ID: <82f35fc0732c6b277c5e616915a98166@redhat.com>

I've only heard in passing, but...

I think that there has been community push back on the cdpn's.  I don't 
think they want them in GFS 2.  It may be tough to argue that GFS 1 
needs more cdpn capability if it is completely going away in GFS 2.  
The reason sited against cdpn's was the fact that 'mount --bind' 
exists.  If you could articulate why bind mounts are insufficient for 
your uses, it may give the community a reason to take a second look at 
cdpn's.

  brassow

On Apr 20, 2006, at 3:03 AM, Mark Hlawatschek wrote:

> Hi,
>
> as posted before, we are using GFS for our diskless shared root cluster
> solutions.
>
> In this file system based ssi configurations all servers are 
> ?stateless? and
> share the same root partition and boot device in the SAN. Server,
> infrastructure and storage tier of the diskless shared root cluster 
> can be
> scaled independently and incrementally.
>
> As we want to be independent from the servers hostnames at initrd 
> boottime, we
> wrote a small GFS patch to use cmans nodeid parameter for a context 
> dependent
> path name.
>
> I attached the patches to the mail. Note, that the GFS and cman 
> patches are
> totally independent from each other and the cman patch is only for user
> information.
>
> What do you think about nodeid cdpns ?
>
> The Readme:
> 1. Reason for the patch
> Create context dependent symbolic links (cdsl) dependent to cmans 
> nodeid
> E.g. ln -s @nodeid mynode
>
> 2. Contents
> - cman-kernel-nodeid.patch
> Applies against cman-kernel-2.6.9-41
> - kernel-nodeid-symlink.patch
> Applies against gfs-kernel-2.6.9-42
>
> 3. Changes:
> 3.1 cman-kernel
> Added line to proc/cluster/status output. E.g: "Node ID: 4"
> 3.2 gfs-kernel
> Added new parameter for cdsl symlink: @nodeid
>
> Thanks,
>
> Mark
>
>
> -- 
> Gruss / Regards,
>
> Dipl.-Ing. Mark Hlawatschek
> Phone: +49-89 121 409-55
> http://www.atix.de/
> http://www.open-sharedroot.org/
>
> **
> ATIX - Ges. fuer Informationstechnologie und Consulting mbH
> Einsteinstr. 10 - 85716 Unterschleissheim - Germany
> <gfs-kernel-nodeid-symlink.patch><cman-kernel-nodeid.patch>--
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From cjk at techma.com  Thu Apr 20 15:45:59 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 20 Apr 2006 11:45:59 -0400
Subject: [Linux-cluster] New features/architecture ?
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E71@tmaemail.techma.com>

David, thanks for the reply. I've seen the post below and in fact it is what
prompted the question. Just seems like there is a lot going underneath that
I was missing. I was hoping for a more nuts and bolts bag of information
with respect to the changes being made across the board.

This is a good start though and I'll take a look.


Thanks


Corey 

-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com] 
Sent: Thursday, April 20, 2006 11:33 AM
To: Kovacs, Corey J.
Cc: linux clustering
Subject: Re: [Linux-cluster] New features/architecture ?

On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote:
> I've worked with GFS 6 and 6.1 quite a bit lately and in reading the 
> posts over the last few months, I see a lot of references to gfs2. I'm 
> not quite sure where it sits in the grand scheme of things other than 
> it's the next big itteration of gfs as a whole and attepmpts are being 
> made to mearge it into the kernel.
> 
> This post has some good info, but not much in the way of specifics 
> http://lwn.net/Articles/150652/
> 
> *	GFS2 - an improved version of GFS, not on-disk compatible 
> *	DLM - an improved version of DLM 
> *	CMAN - a new version of CMAN, based on OpenAIS
> <http://developer.osdl.org/dev/openais/>  
> *	CLVM - will allow more LVM2 features to be used in the cluster
> 
> These seem to be all there is as far as a "roadmap" and the OpenAIS 
> link doesn't seem all that descriptive unless one is a developer.
> 
> Is there some point of reference which describes the changes between 
> whats already released and what is planned? For instance, a post 
> recently mentioned adding openais interfaces/functionality.

For GFS2 and DLM it's largely performance improvements.  For clustering
infrastructure a ton of stuff moved out of the kernel and now runs in user
space, with openais at the center.  The user isn't exposed to much of the
infrastructure so there's not much user-visible change to speak about.

Patrick recently sent this out:
https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html

Dave




From teigland at redhat.com  Thu Apr 20 15:46:07 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 20 Apr 2006 10:46:07 -0500
Subject: [Linux-cluster] Patch: nodeid based cdpn for GFS shared root
In-Reply-To: <82f35fc0732c6b277c5e616915a98166@redhat.com>
References: <200604201003.23057.hlawatschek@atix.de>
	<82f35fc0732c6b277c5e616915a98166@redhat.com>
Message-ID: <20060420154607.GC22326@redhat.com>

On Thu, Apr 20, 2006 at 10:37:40AM -0500, Jonathan E Brassow wrote:
> I've only heard in passing, but...
>
> I think that there has been community push back on the cdpn's.  I don't
> think they want them in GFS 2.  It may be tough to argue that GFS 1
> needs more cdpn capability if it is completely going away in GFS 2.

CDPN's are already removed from GFS2, and probably don't have much chance
of getting back in since the linux-kernel folks really have the say.

Given that CDPN's are already in GFS1 and won't ever be removed from
there, I don't see any reason not to add a nodeid option.  GFS2 will need
a different approach regardless of whether nodeid is added to GFS1 or not.

Dave



From rainer at ultra-secure.de  Thu Apr 20 15:45:13 2006
From: rainer at ultra-secure.de (Rainer Duffner)
Date: Thu, 20 Apr 2006 17:45:13 +0200
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>	<8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>	<a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>
	<B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>
Message-ID: <4447AC89.1060005@ultra-secure.de>

Scott Kellogg wrote:

>
>>
>> Also to use an active-active (same service active on both servers) 
>> configuration + loadbalancing you will need more than 2 servers; at 
>> least 4 servers, 2 for loadbalancing (1 active/1 backup) and 2 for 
>> the critical service active concurently on both servers (no high 
>> availability, no failover).
>
> You seem to be referring to LVS.  Right, I can't implement that since 
> I don't have enough hardware.  I think that active-passive will be the 
> way to go.  When the active node dies, the passive node will take 
> over.  The data will only be as fresh as the last synchronization.  
>
> That begs the question of what happens when the active node comes back 
> up ... will the passive node (now active) sync its data to the new 
> active node?  This is where picking a synchronization method becomes 
> vital.


That's where GFS comes in.

I don't want to sound rude, but either you (or your customer) have the 
budget for a cluster or not.
If you don't have the budget, it's better to just use the 2nd server as 
hot-spare and rsync the data over to the 2nd one and do the failover by 
hand (and even more so the re-activation of the primary server)
Or you should have bought a more expensive, more reliable server instead 
of two low-end ones.
You *can* have low-end servers, but you need a reliable 
storage-infrastructure (which will be a SAN in 7 out of 10 cases and 
iSCSI in the other), which has a big upfront-cost.

This always reminds me of people who want to drive cars they cannot 
really afford.
It's better to acknowledge that and accommodate to a cheaper car than 
sitting there one day without the money to have it repaired....



cheers,
Rainer





From pcaulfie at redhat.com  Thu Apr 20 15:51:40 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 20 Apr 2006 16:51:40 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: <20060420153326.GB22326@redhat.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
	<20060420153326.GB22326@redhat.com>
Message-ID: <4447AE0C.30000@redhat.com>

David Teigland wrote:
> On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote:
>> I've worked with GFS 6 and 6.1 quite a bit lately and in reading the posts
>> over the last few months, I see a lot
>> of references to gfs2. I'm not quite sure where it sits in the grand scheme
>> of things other than it's the next big
>> itteration of gfs as a whole and attepmpts are being made to mearge it into
>> the kernel. 
>>
>> This post has some good info, but not much in the way of specifics
>> http://lwn.net/Articles/150652/
>>
>> *	GFS2 - an improved version of GFS, not on-disk compatible 
>> *	DLM - an improved version of DLM 
>> *	CMAN - a new version of CMAN, based on OpenAIS
>> <http://developer.osdl.org/dev/openais/>  
>> *	CLVM - will allow more LVM2 features to be used in the cluster
>>
>> These seem to be all there is as far as a "roadmap" and the OpenAIS link
>> doesn't seem all that descriptive
>> unless one is a developer.
>>
>> Is there some point of reference which describes the changes between whats
>> already released and what is 
>> planned? For instance, a post recently mentioned adding openais
>> interfaces/functionality. 
> 
> For GFS2 and DLM it's largely performance improvements.  For clustering
> infrastructure a ton of stuff moved out of the kernel and now runs in user
> space, with openais at the center.  The user isn't exposed to much of the
> infrastructure so there's not much user-visible change to speak about.
> 
> Patrick recently sent this out:
> https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html


That's really only about CCS changes. For a (slightly out-of-date ) higher
level overview see:
http://sources.redhat.com/cluster/events/summit2005/pjc2005.sxi

It doesn't mention openais (at least not in a relevant context!) but it might
give some more idea as to what is going on.
-- 

patrick



From filipe.miranda at gmail.com  Thu Apr 20 15:53:14 2006
From: filipe.miranda at gmail.com (Filipe Miranda)
Date: Thu, 20 Apr 2006 12:53:14 -0300
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
	<8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>
	<a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>
	<B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>
Message-ID: <a6d13c780604200853u4b0ce603refee41f16513a3c7@mail.gmail.com>

Scott,



On 4/20/06, Scott Kellogg <skellogg at egginc.com> wrote:
>
>
>
> The only problem you will encounter is that without a SAN you will
> probably have to sincronize data between the servers if your application
> stores data on internar discs on the servers....
>
>
> Yes, this is the issue that seems to have multiple solutions.  I've looked
> at NFS, DRBD, rysnc, and Unison, but none of these technologies has jumped
> out at me as the best one.
>

Correct

Also to use an active-active (same service active on both servers)
> configuration + loadbalancing you will need more than 2 servers; at least 4
> servers, 2 for loadbalancing (1 active/1 backup) and 2 for the critical
> service active concurently on both servers (no high availability, no
> failover).
>
>
> You seem to be referring to LVS.  Right, I can't implement that since I
> don't have enough hardware.  I think that active-passive will be the way to
> go.  When the active node dies, the passive node will take over.  The data
> will only be as fresh as the last synchronization.
>
> That begs the question of what happens when the active node comes back up
> ... will the passive node (now active) sync its data to the new active
> node?  This is where picking a synchronization method becomes vital.
>

Excellent point here.
That's the problem when you dont have a SAN --> sync!

Let's suppose we have a 2 node on failover.
NodeA active NodeB passive.
NodeA should be in sync with NodeB
If NodeA dies, NodeB takes over
NodeB must then continue to sync its data to NodeA (when it becomes
available again)

This a tough job!

About the sync technologies you mentioned:

DRBD
When you will need a special kernel with support for that. Or recompile a
new kenel (be careful since Red Hat wont support any modified piece of
software you use, specially the kernel)

NFS:
You will need a dedicated server to provide shares, right?

Rsync:
Must have pretty intelligent scripts to garantee what we discussed above,
and still not satisfactory

/Filipe



/Scott
>
>
>
> But if the active-active is for the servers (hardware), which means not
> the same service on high availability  then; 2 servers doing loadbalancing
> (1 active/1 backup), 2 servers providing 2 critical services, one actice on
> node A and the other one active on node B (failover activated)
>
> Well thats my understanding about Red Hat's Cluster Suite... Please
> correct me if I Am wrong...
>
> Att.
> Filipe Miranda
>
>
> On 4/20/06, Scott Kellogg <skellogg at egginc.com> wrote:
> >
> > Hello,
> >
> > I was wondering if I could get some assistance in setting up a two
> > node cluster.
> >
> > We have 2 Dell PowerEdge 850 machines running RHEL4.  Our license for
> > Cluster Suite is still in purchasing.  The main thing in the Cluster
> > Suite documentation which confuses me is the use of a SAN.  RHEL4
> > docs claim that the need for a SAN has been eliminated, but I'm
> > having trouble find more information.  Most of the docs assume you
> > are using a SAN. My customer could not afford the SAN, just the servers.
> >
> >
> > I would like to set up a high-availablity environment.  I understand
> > that due to the hardware configuration (no SAN, no RAID) that there
> > are still points of failure.  I'm hoping to set up a simple active-
> > passive configuration.  We will be running LAMP applications.  If the
> > primary server cannot deliver services, I'd like to automatically cut
> > over to the backup.
> >
> > Ideally, I'd like to set up active-active and load balancing, since
> > the servers have DRAC4 fence devices for use with STONITH.  However,
> > since there is no SAN, I'm not sure how data will be mirrored across
> > the two machines.
> >
> > Any help is appreciated!
> >
> > Thank you,
> > Scott Kellogg
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
>
>
> --
> Att.
> ---
> Filipe T Miranda
> RHCE - Red Hat Certified Engineer
> OCP8i - Oracle Certified Professional--
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
> --
> Scott Kellogg
> System Administrator
> EG&G Technical Services, Inc.
> (812) 854-7077 ext. 236
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/d3ffb90c/attachment.htm>

From mwill at penguincomputing.com  Thu Apr 20 16:06:21 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Thu, 20 Apr 2006 09:06:21 -0700
Subject: [Linux-cluster] New features/architecture ?
Message-ID: <433093DF7AD7444DA65EFAFE3987879C0B84CD@jellyfish.highlyscyld.com>

Whats y'alls take on OCFS2 which is in the 2.6 kernel tree? 

Michael 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J.
Sent: Thursday, April 20, 2006 8:46 AM
To: David Teigland
Cc: linux clustering
Subject: RE: [Linux-cluster] New features/architecture ?

David, thanks for the reply. I've seen the post below and in fact it is
what prompted the question. Just seems like there is a lot going
underneath that I was missing. I was hoping for a more nuts and bolts
bag of information with respect to the changes being made across the
board.

This is a good start though and I'll take a look.


Thanks


Corey 

-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com]
Sent: Thursday, April 20, 2006 11:33 AM
To: Kovacs, Corey J.
Cc: linux clustering
Subject: Re: [Linux-cluster] New features/architecture ?

On Thu, Apr 20, 2006 at 07:46:24AM -0400, Kovacs, Corey J. wrote:
> I've worked with GFS 6 and 6.1 quite a bit lately and in reading the 
> posts over the last few months, I see a lot of references to gfs2. I'm

> not quite sure where it sits in the grand scheme of things other than 
> it's the next big itteration of gfs as a whole and attepmpts are being

> made to mearge it into the kernel.
> 
> This post has some good info, but not much in the way of specifics 
> http://lwn.net/Articles/150652/
> 
> *	GFS2 - an improved version of GFS, not on-disk compatible 
> *	DLM - an improved version of DLM 
> *	CMAN - a new version of CMAN, based on OpenAIS
> <http://developer.osdl.org/dev/openais/>  
> *	CLVM - will allow more LVM2 features to be used in the cluster
> 
> These seem to be all there is as far as a "roadmap" and the OpenAIS 
> link doesn't seem all that descriptive unless one is a developer.
> 
> Is there some point of reference which describes the changes between 
> whats already released and what is planned? For instance, a post 
> recently mentioned adding openais interfaces/functionality.

For GFS2 and DLM it's largely performance improvements.  For clustering
infrastructure a ton of stuff moved out of the kernel and now runs in
user space, with openais at the center.  The user isn't exposed to much
of the infrastructure so there's not much user-visible change to speak
about.

Patrick recently sent this out:
https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html

Dave


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From pheeh at nodeps.org  Thu Apr 20 17:33:53 2006
From: pheeh at nodeps.org (pheeh at nodeps.org)
Date: Thu, 20 Apr 2006 10:33:53 -0700 (MST)
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <a6d13c780604200853u4b0ce603refee41f16513a3c7@mail.gmail.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
	<8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>
	<a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>
	<B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>
	<a6d13c780604200853u4b0ce603refee41f16513a3c7@mail.gmail.com>
Message-ID: <54005.149.169.192.7.1145554433.squirrel@149.169.192.7>

>From a newbie...
I am having the same concerns.  I have two boxes that are dedicated
storage (iSCSI), and two dedicated GFS servers so it looks like:
Storage Machines:
storage1
storage2

GFS:
gfs1
gfs2

Now, if I create the LVM on gfs1 and it encompases storage1 and storage2,
I would be able to mount the LVM on gfs2 via something like nbd or use
gfs1 as a iscsi target in itself.  However, lets assume that gfs1 just
dies.  Then the LVM on gfs1 would no longer exist and gfs2 would not be
able to write to the disks.
So I guess my question is could I create two LVM instances one on gfs1 and
one on gfs2 where each would have access to both devices such that:

gfs1 /dev/cluster/web (storage1, storage2)
gfs2 /dev/cluster/web (storage1, storage2)

So that either GFS server could croak and a web server would still be able
to access one of the boxes?  Although I don't see how that would work.

Now, I have figured out that with a single storage device its pretty
simple since both machines just mount the iscsi with the initiator,
although I just can't seem to figure out hwo to get it done with 2 storage
devices such that they act like a RAID1 and failover is seemless.

> Scott,
>
>
>
> On 4/20/06, Scott Kellogg <skellogg at egginc.com> wrote:
>>
>>
>>
>> The only problem you will encounter is that without a SAN you will
>> probably have to sincronize data between the servers if your application
>> stores data on internar discs on the servers....
>>
>>
>> Yes, this is the issue that seems to have multiple solutions.  I've
>> looked
>> at NFS, DRBD, rysnc, and Unison, but none of these technologies has
>> jumped
>> out at me as the best one.
>>
>
> Correct
>
> Also to use an active-active (same service active on both servers)
>> configuration + loadbalancing you will need more than 2 servers; at
>> least 4
>> servers, 2 for loadbalancing (1 active/1 backup) and 2 for the critical
>> service active concurently on both servers (no high availability, no
>> failover).
>>
>>
>> You seem to be referring to LVS.  Right, I can't implement that since I
>> don't have enough hardware.  I think that active-passive will be the way
>> to
>> go.  When the active node dies, the passive node will take over.  The
>> data
>> will only be as fresh as the last synchronization.
>>
>> That begs the question of what happens when the active node comes back
>> up
>> ... will the passive node (now active) sync its data to the new active
>> node?  This is where picking a synchronization method becomes vital.
>>
>
> Excellent point here.
> That's the problem when you dont have a SAN --> sync!
>
> Let's suppose we have a 2 node on failover.
> NodeA active NodeB passive.
> NodeA should be in sync with NodeB
> If NodeA dies, NodeB takes over
> NodeB must then continue to sync its data to NodeA (when it becomes
> available again)
>
> This a tough job!
>
> About the sync technologies you mentioned:
>
> DRBD
> When you will need a special kernel with support for that. Or recompile a
> new kenel (be careful since Red Hat wont support any modified piece of
> software you use, specially the kernel)
>
> NFS:
> You will need a dedicated server to provide shares, right?
>
> Rsync:
> Must have pretty intelligent scripts to garantee what we discussed above,
> and still not satisfactory
>
> /Filipe
>
>
>
> /Scott
>>
>>
>>
>> But if the active-active is for the servers (hardware), which means not
>> the same service on high availability  then; 2 servers doing
>> loadbalancing
>> (1 active/1 backup), 2 servers providing 2 critical services, one actice
>> on
>> node A and the other one active on node B (failover activated)
>>
>> Well thats my understanding about Red Hat's Cluster Suite... Please
>> correct me if I Am wrong...
>>
>> Att.
>> Filipe Miranda
>>
>>
>> On 4/20/06, Scott Kellogg <skellogg at egginc.com> wrote:
>> >
>> > Hello,
>> >
>> > I was wondering if I could get some assistance in setting up a two
>> > node cluster.
>> >
>> > We have 2 Dell PowerEdge 850 machines running RHEL4.  Our license for
>> > Cluster Suite is still in purchasing.  The main thing in the Cluster
>> > Suite documentation which confuses me is the use of a SAN.  RHEL4
>> > docs claim that the need for a SAN has been eliminated, but I'm
>> > having trouble find more information.  Most of the docs assume you
>> > are using a SAN. My customer could not afford the SAN, just the
>> servers.
>> >
>> >
>> > I would like to set up a high-availablity environment.  I understand
>> > that due to the hardware configuration (no SAN, no RAID) that there
>> > are still points of failure.  I'm hoping to set up a simple active-
>> > passive configuration.  We will be running LAMP applications.  If the
>> > primary server cannot deliver services, I'd like to automatically cut
>> > over to the backup.
>> >
>> > Ideally, I'd like to set up active-active and load balancing, since
>> > the servers have DRAC4 fence devices for use with STONITH.  However,
>> > since there is no SAN, I'm not sure how data will be mirrored across
>> > the two machines.
>> >
>> > Any help is appreciated!
>> >
>> > Thank you,
>> > Scott Kellogg
>> >
>> > --
>> > Linux-cluster mailing list
>> > Linux-cluster at redhat.com
>> > https://www.redhat.com/mailman/listinfo/linux-cluster
>> >
>>
>>
>>
>> --
>> Att.
>> ---
>> Filipe T Miranda
>> RHCE - Red Hat Certified Engineer
>> OCP8i - Oracle Certified Professional--
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
>> --
>> Scott Kellogg
>> System Administrator
>> EG&G Technical Services, Inc.
>> (812) 854-7077 ext. 236
>>
>>
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From skellogg at egginc.com  Thu Apr 20 17:48:05 2006
From: skellogg at egginc.com (Scott Kellogg)
Date: Thu, 20 Apr 2006 13:48:05 -0400
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <4447AC89.1060005@ultra-secure.de>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>	<8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>	<a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>
	<B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>
	<4447AC89.1060005@ultra-secure.de>
Message-ID: <AD706D97-8AC2-46FE-88C7-1DA68D0D6392@egginc.com>

>
> I don't want to sound rude, but either you (or your customer) have  
> the budget for a cluster or not.

My feelings exactly.  Unfortunately, I do not have control over the  
level of reactive, crisis-based decision-making.

> If you don't have the budget, it's better to just use the 2nd  
> server as hot-spare and rsync the data over to the 2nd one and do  
> the failover by hand (and even more so the re-activation of the  
> primary server)

This is a satisfactory solution.  Would you care to elaborate?  I  
have read the book "Linux Enterprise Clusters" and it offered several  
approaches to this.  The budget exists to buy Cluster Suite, which I  
was hoping would simplify configuration of Heartbeat and STONITH.

I have been looking at Unison as an option for synchronizing the  
primary node and the hot spare.

>
> This always reminds me of people who want to drive cars they cannot  
> really afford.

The real irony is that the customer IMO doesn't even *need* a  
failover solution.

/Scott





From skellogg at egginc.com  Thu Apr 20 17:50:12 2006
From: skellogg at egginc.com (Scott Kellogg)
Date: Thu, 20 Apr 2006 13:50:12 -0400
Subject: [Linux-cluster] Cluster Planning
In-Reply-To: <a6d13c780604200853u4b0ce603refee41f16513a3c7@mail.gmail.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>
	<8E77A76D-C04D-4CA9-AD40-0B508A7310E5@egginc.com>
	<a6d13c780604200811q16bef60amc55a145d94fccd42@mail.gmail.com>
	<B0DCAD51-FDEA-498E-8375-F7826C03929A@egginc.com>
	<a6d13c780604200853u4b0ce603refee41f16513a3c7@mail.gmail.com>
Message-ID: <3E3DE461-2A6C-4F43-8045-0B0880D33173@egginc.com>

>
>
> This a tough job!

Sure is!  But this hot spare failover concept satisfies the goals of  
the project.  I know that it's not fault-tolerant, but that's fine.

/Scott




From sanelson at gmail.com  Thu Apr 20 17:55:31 2006
From: sanelson at gmail.com (Steve Nelson)
Date: Thu, 20 Apr 2006 18:55:31 +0100
Subject: [Linux-cluster] Clustat in user's profile
In-Reply-To: <187D3A7CAB42A54DB61F1D05F0125722086AE16E@orsmsx402.amr.corp.intel.com>
References: <187D3A7CAB42A54DB61F1D05F0125722086AE16E@orsmsx402.amr.corp.intel.com>
Message-ID: <b6131fdc0604201055p7268f097v6934367e18452bfc@mail.gmail.com>

On 4/20/06, Lombard, David N <david.n.lombard at intel.com> wrote:
> From: Steve Nelson on Wednesday, April 19, 2006 2:36 PM
> > Hi All,
> >
> > On all of my clusters, I have clustat run in the user's profile, so
> > the status of the cluster is visible whenever someone logs in.
> >
> > Someone has suggested to me that clustat could hang, and prevent user
> > access.  Is this a valid point?  Under what (if any) circumstances
> > would clustat hang?
>
> As another has pointed out, anything that can hang the login, will, at
> the most inopportune times.
>
> Why not have a cron job periodically report the status into some file
> and then just cat the file results during login?

Yes, and indeed I discovered that I can export the info as xml too,
which could be handy :)

> If the user then
> really wants an up-to-the-moment report, they can buy into running
> clustat.

Definitely.  Thanks for all the advice :)

S.



From dist-list at LEXUM.UMontreal.CA  Thu Apr 20 20:37:24 2006
From: dist-list at LEXUM.UMontreal.CA (FM)
Date: Thu, 20 Apr 2006 16:37:24 -0400
Subject: [Linux-cluster] webfarm and redhat cluster ?
Message-ID: <4447F104.8010003@lexum.umontreal.ca>

Hello,
I think I have a misunderstanding  with Redhat cluster suite and webFarm.

I'm testing this scenario :
2 server as load balancer (red hat + piranha)
2 WEB servers behind the balancer connected to a GFS file system.
My goal is to increase the uptime of my websites and to be able to
decrease the servers load by adding a new one if necessary.

At first,  I though that I need to create a cluster for the 2 web
servers. But in this scenario, I cannot have load balancing between the
2 web servers.

So, am I missing something, or my option here is to have 1 load balancer
cluster active/passive (for fail over) and my 2 web servers with a
connection to the GFS file system. So no cluster with those server ?

The problem with that setup is that piranha will see if a server fails
or if httpd fails but not if the GFS fails.

Sorry if it's a newbie question



From pcaulfie at redhat.com  Fri Apr 21 12:33:54 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 21 Apr 2006 13:33:54 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: <4447AE0C.30000@redhat.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>	<20060420153326.GB22326@redhat.com>
	<4447AE0C.30000@redhat.com>
Message-ID: <4448D132.2020906@redhat.com>

Patrick Caulfield wrote:
> David Teigland wrote:


>> Patrick recently sent this out:
>> https://www.redhat.com/archives/linux-cluster/2006-April/msg00126.html
> 
> 
> That's really only about CCS changes. For a (slightly out-of-date ) higher
> level overview see:
> http://sources.redhat.com/cluster/events/summit2005/pjc2005.sxi
> 
> It doesn't mention openais (at least not in a relevant context!) but it might
> give some more idea as to what is going on.

I've updated this (slightly):

http://people.redhat.com/pcaulfie/cman2006.sxi

-- 

patrick



From Alain.Moulle at bull.net  Fri Apr 21 14:04:27 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Fri, 21 Apr 2006 16:04:27 +0200
Subject: [Linux-cluster] CS4 Update 2 / GUI problem ?
Message-ID: <4448E66B.4050200@bull.net>

Hi

I have some problems to configure a 3 nodes cluster
with the GUI.
My version of GUI is :
system-config-cluster-1.0.16-1.0

After completion of the configuration (so : members,
fence devices, failover domains, resources and services)
I can see in cluster.conf :

...
        <clusternodes>
                <clusternode name="yack10" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="yack21" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="yack23" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
...

So as you can see , I have to add the good fence lines as following:
         <clusternodes>
                <clusternode name="yack10" votes="1">
                        <fence>
                                <method name="1">
                                <device name="yack10_fence"/>
                        </fence>
                </clusternode>
                <clusternode name="yack21" votes="1">
                        <fence>
                                <method name="1">
                                <device name="yack21_fence"/>
                        </fence>
                </clusternode>
                <clusternode name="yack23" votes="1">
                        <fence>
                                <method name="1">
                                <device name="yack23_fence"/>
                        </fence>
                </clusternode>
        </clusternodes>

so that it works.

Knowing that fencedevices are:

                <fencedevice agent="fence_ipmilan" ipaddr="12.8.11.11"
login="xxxxxxx" name="yack10_fence" passwd="xxxxxxx"/>
                <fencedevice agent="fence_ipmilan" ipaddr="12.8.11.24"
login="xxxxxxx" name="yack23_fence" passwd="xxxxxxx"/>
                <fencedevice agent="fence_ipmilan" ipaddr="12.8.11.22"
login="xxxxxxx" name="yack21_fence" passwd="xxxxxxx"/>

So is it a known bug of GUI ?
Or did I miss something somewhere in the GUI, so that
I miss these fence lines in cluster nodes records.

Thanks for your help

Alain Moull?






From gstaltari at arnet.net.ar  Fri Apr 21 14:15:00 2006
From: gstaltari at arnet.net.ar (German Staltari)
Date: Fri, 21 Apr 2006 11:15:00 -0300
Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster
In-Reply-To: <20060418133704.GA16121@redhat.com>
References: <1145266165.27997.57.camel@nemanja.eunet.yu>	<1145288499.6000.15.camel@nemanja.eunet.yu>
	<20060418133704.GA16121@redhat.com>
Message-ID: <4448E8E4.3000400@arnet.net.ar>

David Teigland wrote:
> On Mon, Apr 17, 2006 at 05:41:39PM +0200, Nemanja Miletic wrote:
>   
>> Hi,
>>
>> Does anyone think that turning on journaling on files could help us
>> speed up the access to gfs partition?
>>
>> This would be difficult because journaling can be turned on only on
>> files that are empty. We have a large number of empty files of active
>> users that download all their mail from pop3 server, so turning on
>> jurnaling for them should be possible. 
>>     
>
> Data journaling might help, it will speed up fsync(), but will increase
> the i/o going to your storage.
>
>   
>> What size should be the journals when file journaling is on?
>>     
>
> Continue to use the default.
>
> Another thing you might try is disabling the drop-locks callback, allowing
> GFS to cache more locks.  Do this before you mount:
>   echo "0" >> /proc/cluster/lock_dlm/drop_count
>
>   
Did you apply this changes? Could you share the results of this changes 
in your configuration? Do you recommend it?
Thanks
German Staltari



From jparsons at redhat.com  Fri Apr 21 15:30:58 2006
From: jparsons at redhat.com (James Parsons)
Date: Fri, 21 Apr 2006 11:30:58 -0400
Subject: [Linux-cluster] CS4 Update 2 / GUI problem ?
In-Reply-To: <4448E66B.4050200@bull.net>
References: <4448E66B.4050200@bull.net>
Message-ID: <4448FAB2.1030009@redhat.com>

Alain Moulle wrote:

>Hi
>
>I have some problems to configure a 3 nodes cluster
>with the GUI.
>My version of GUI is :
>system-config-cluster-1.0.16-1.0
>
>After completion of the configuration (so : members,
>fence devices, failover domains, resources and services)
>I can see in cluster.conf :
>
>...
>        <clusternodes>
>                <clusternode name="yack10" votes="1">
>                        <fence/>
>                </clusternode>
>                <clusternode name="yack21" votes="1">
>                        <fence/>
>                </clusternode>
>                <clusternode name="yack23" votes="1">
>                        <fence/>
>                </clusternode>
>        </clusternodes>
>...
>
>So as you can see , I have to add the good fence lines as following:
>         <clusternodes>
>                <clusternode name="yack10" votes="1">
>                        <fence>
>                                <method name="1">
>                                <device name="yack10_fence"/>
>                        </fence>
>                </clusternode>
>                <clusternode name="yack21" votes="1">
>                        <fence>
>                                <method name="1">
>                                <device name="yack21_fence"/>
>                        </fence>
>                </clusternode>
>                <clusternode name="yack23" votes="1">
>                        <fence>
>                                <method name="1">
>                                <device name="yack23_fence"/>
>                        </fence>
>                </clusternode>
>        </clusternodes>
>
>so that it works.
>
>Knowing that fencedevices are:
>
>                <fencedevice agent="fence_ipmilan" ipaddr="12.8.11.11"
>login="xxxxxxx" name="yack10_fence" passwd="xxxxxxx"/>
>                <fencedevice agent="fence_ipmilan" ipaddr="12.8.11.24"
>login="xxxxxxx" name="yack23_fence" passwd="xxxxxxx"/>
>                <fencedevice agent="fence_ipmilan" ipaddr="12.8.11.22"
>login="xxxxxxx" name="yack21_fence" passwd="xxxxxxx"/>
>
>So is it a known bug of GUI ?
>Or did I miss something somewhere in the GUI, so that
>I miss these fence lines in cluster nodes records.
>
>Thanks for your help
>
>Alain Moull?
>
>
>
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>  
>
Setting up fencing is done in two steps:
1) Configuring fence devices
2) Configuring fencing for individual nodes.
You have your fence devices all ready to go for the first step. For the 
second step,  in the GUI, you need to select a node, and then click 
"Manage Fencing for this Node". A pop-up will allow you to create fence 
levels and instances of your fence devices in the levels.

Now for baseboard management fence types such as ipmi, rsa, iLO, Drac, 
etc., This dichotomy between fence device and fence instance is purely 
artificial. They map 1:1, device:instance. Shared fence devices are a 
different story, and the GUI is constructed to handle configuration of 
these types of fences as well. Things get even stickier when you support 
baseboard management methods like Drac/MC (a variant of Drac), which is 
a shared fence method. One way to present configuration for fences would 
be to separate fencing into two types: Shared and unshared, and then 
construct appropriate GUIs for each. Another way to go is to present a 
consistent model and config approach for all fence types with similar 
configuration steps no matter if the devices are shared or not. The 
latter is the approach we took for the latest cluster GUI.

Anyway, Alain, for now, please keep in mind the need to config fence 
device AND fence instance for every type of fence, even if they are one 
in the same such as IPMI. In the meanwhile, the fence config GUI is 
begging to be refactored, and we hope to have a simpler method in place 
by next update. BTW, your opinions are always welcome.

Thanks and Regards,

-Jim

The editing that you did by hand looks OK except for one important 
omission: You forgot to close off
the <method> tags ;-)



From raycharles_man at yahoo.com  Fri Apr 21 14:46:00 2006
From: raycharles_man at yahoo.com (Ray Charles)
Date: Fri, 21 Apr 2006 07:46:00 -0700 (PDT)
Subject: [Linux-cluster] GFS is for what and how it works ?
Message-ID: <20060421144600.95327.qmail@web32105.mail.mud.yahoo.com>



Hi,
I read your post (below) from a couple of weeks ago.
My  question / comment is... 
I've read that GFS Volumes are limited to 2TB, redhat
whitepaper says 1TB (dated i am sure). You're at 1.2TB
today what if you need to be at 5TB in a year?? How
will you seemlessly grow the space that exist on mount
points beyond 2TB ? 

> 4. Could you give me example what is actually the
GFS real usage in
> real live ? 

I'm using it to share a 1.2 TB storage area between
two systems that
use it for processing and a third system that has
direct access for
making backups.

> I'm absolutely confuse with this GFS on how they
works.

Yea.  The documentation is not very extensive at this
point.

-- 
Bowie

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 



From Bowie_Bailey at BUC.com  Fri Apr 21 16:25:17 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Fri, 21 Apr 2006 12:25:17 -0400
Subject: [Linux-cluster] GFS is for what and how it works ?
Message-ID: <4766EEE585A6D311ADF500E018C154E30213393F@bnifex.cis.buc.com>

Ray Charles wrote:
> I read your post (below) from a couple of weeks ago.
> My  question / comment is...
> I've read that GFS Volumes are limited to 2TB, redhat
> whitepaper says 1TB (dated i am sure). You're at 1.2TB
> today what if you need to be at 5TB in a year?? How
> will you seemlessly grow the space that exist on mount
> points beyond 2TB ?

I have not found any conclusive answers on maximum filesystem sizes.
I think the hard limit for a filesystem with the current kernel is
8TB, but some software may have problems with it when it goes over
2TB.  

I don't have a way to test a filesystem that large since I don't have
the storage.  I'll just have to see what happens when I get there.

-- 
Bowie



From Fernando.Nino at medias.cnes.fr  Wed Apr 19 16:26:05 2006
From: Fernando.Nino at medias.cnes.fr (Fernando Nino)
Date: Wed, 19 Apr 2006 18:26:05 +0200
Subject: [Linux-cluster] GFS join hang
Message-ID: <200604191625.k3JGPxo14450@cnes.fr>

Dear all,


  I am running GFS 6.1 with dlm on a cluster (4 nodes + front-end) of 
dual-headed Opterons and RHEL4U3. Because of some problems (kernel 
panic...) I had to hard boot some nodes of the cluster.  Now, some gfs 
partitions simply won't mount.  In some nodes, they will simply keep 
waiting forever for the join of the GFS group:

So three questions:

  - What is the join exactly waiting for ? Cluster status is fine, 
everybody is member ...
  - What does the status code mean in the cman_tool output ?
  - What can I do to restart this cluster ?

NB: Before testing this (below) I rebooted the complete cluster and 
gfs_fsck'ed /all nodes /with everything unmounted.

----------------------------------------------------------------------------------------------------
root # service clvmd start

root #: service gfs start
Mounting GFS filesystems:    # forever !

in another console I get:
root # dmesg | tail
...
GFS: fsid=globcover:baieGC2b.0: jid=14: Done
GFS: fsid=globcover:baieGC2b.0: jid=15: Trying to acquire journal lock...
GFS: fsid=globcover:baieGC2b.0: jid=15: Looking at journal...
GFS: fsid=globcover:baieGC2b.0: jid=15: Done
GFS: Trying to join cluster "lock_dlm", "globcover:baieGC3a"


root #  cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                          11   2 run       -
[1 5 4 3 2]

DLM Lock Space:  "clvmd"                            12   3 run       -
[1 5 4 3 2]

DLM Lock Space:  "baieGC2b"                         13   4 run       -
[1 5]

DLM Lock Space:  "baieGC3a"                         15   6 run       -
[1 5 2 4 3]

GFS Mount Group: "baieGC2b"                         14   5 run       -
[1 5]

GFS Mount Group: "baieGC3a"                          0   7 join      S-2,2,4
[]
----------------------------------------------------------------------------------------------------



  Thanks,
-- 
------------------------------------------------------------------------
Fernando NI?O 	CNES - BPi 2102
Medias-France/IRD 	18, Av. Edouard Belin
T?l: 05.61.27.40.74 	31401 Toulouse Cedex 9





From Fernando.Nino at medias.cnes.fr  Thu Apr 20 15:00:48 2006
From: Fernando.Nino at medias.cnes.fr (Fernando Nino)
Date: Thu, 20 Apr 2006 17:00:48 +0200
Subject: [Linux-cluster] GFS join hang
In-Reply-To: <20060420143247.GA22326@redhat.com>
References: <200604200756.k3K7uDo25619@cnes.fr>
	<20060420143247.GA22326@redhat.com>
Message-ID: <200604201501.k3KF19K14069@cnes.fr>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/cd30b2dd/attachment.htm>

From sdake at redhat.com  Thu Apr 20 20:02:29 2006
From: sdake at redhat.com (Steven Dake)
Date: Thu, 20 Apr 2006 13:02:29 -0700
Subject: [Linux-cluster] type punned pointers breakage
Message-ID: <1145563349.25648.37.camel@shih.broked.org>

likely to cause problems with the optimizer - patch attached to fix.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: type-punned.patch
Type: text/x-patch
Size: 1496 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/7ae07d9b/attachment.bin>

From sdake at redhat.com  Thu Apr 20 20:11:00 2006
From: sdake at redhat.com (Steven Dake)
Date: Thu, 20 Apr 2006 13:11:00 -0700
Subject: [Linux-cluster] another type punned patch
Message-ID: <1145563860.25648.39.camel@shih.broked.org>

attached
-------------- next part --------------
A non-text attachment was scrubbed...
Name: type-punned-p2.patch
Type: text/x-patch
Size: 320 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/6a7dd97c/attachment.bin>

From sdake at redhat.com  Thu Apr 20 20:24:01 2006
From: sdake at redhat.com (Steven Dake)
Date: Thu, 20 Apr 2006 13:24:01 -0700
Subject: [Linux-cluster] member_list_to_id looks fishy
Message-ID: <1145564641.25648.41.camel@shih.broked.org>

patch attached (untested) to possibly fix
-------------- next part --------------
A non-text attachment was scrubbed...
Name: member_list.patch
Type: text/x-patch
Size: 2849 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/950e47a9/attachment.bin>

From jparsons at redhat.com  Thu Apr 20 21:16:55 2006
From: jparsons at redhat.com (James Parsons)
Date: Thu, 20 Apr 2006 17:16:55 -0400
Subject: [Linux-cluster] New APC agent
Message-ID: <4447FA47.8090203@redhat.com>

Hello all,

This is an snmp based fence agent for APC power switches to be used
with RHEL4 Red Hat Cluster Suite.
                                                                                

The reasons to use this agent rather than the current fence_apc agent are:
1) This script has been tested successfully with EVERY powerswitch that 
APC currently
makes.
2) It will work on many older models that are no longer supported by APC.
I have been told that it even works with the AP9200 switch. Older switches
usually don't do well with the fence_apc script.
3) This agent works with large power switches that have more than 8 outlets.
The fence_apc script will also, in the next update -- this script will 
work for you now.
                                                                                

If feedback on this beta version of the agent is good, and if ganged 
switches
can be supported, then this agent may replace fence_apc.
 
After unpacking the attached tar file, you will find 3 files:
README
fence_apc_snmp
powernet369.mib
                                                                               

In order to use this agent, you will need to have net-snmp-utils installed
on every node in your cluster. net-snmp-utils is scheduled for inclusion
in the base RHEL distribution for Update 4, and is yummable in FC5.
                                                                                

After net-snmp-utils is installed, there will be a directory named:
/usr/share/snmp/mibs/
                                                                                

Place the accompanying powernet369.mib file in this directory.
                                                                                

To use the agent, cp the agent to the /sbin directory on every
cluster node. The interface for the fence_apc_snmp agent is identical to
the existing fence_apc agent, so if you are using APC for fencing in
your cluster, you *could* backup your current fence_apc agent, and
rename this agent from fence_apc_snmp to fence_apc, and it should just work.
                                                                                

NOTE: The fence_apc_snmp agent does not yet support ganged or 
'daisy-chained'
APC switches.
                                                                                

If you would rather not copy over your fence_apc agent, you can still use
the fence_apc_snmp agent by dropping it into /sbin on every node, and then
defining a <fencedevice> in the cluster.conf file with 
agent="fence_apc_snmp"
as an attribute, and use it that way. Note, please, that the GUI does
not support this agent yet, and you will have to edit your cluster.conf
by hand and then propagate it yourself. If you need help with this, email
me on linux-cluster or at the address below.
                                                                                

Big thanks to Nate Straz who laid the foundation for this agent.
                                                                                

The text of this email can also be found inside the tar file as a README.

Please let me know how this agent works.

Thanks and Regards,

-Jim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence_apc_snmp.tar.gz
Type: application/x-gzip
Size: 110511 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060420/20dd21ab/attachment.bin>

From jparsons at redhat.com  Fri Apr 21 19:08:33 2006
From: jparsons at redhat.com (James Parsons)
Date: Fri, 21 Apr 2006 15:08:33 -0400
Subject: [Linux-cluster] New APC agent
Message-ID: <44492DB1.5010105@redhat.com>

Hello all,

This is an snmp based fence agent for APC power switches to be used
with RHEL4 Red Hat Cluster Suite.
 


The reasons to use this agent rather than the current fence_apc agent are:
1) This script has been tested successfully with EVERY powerswitch that
APC currently
makes.
2) It will work on many older models that are no longer supported by APC.
I have been told that it even works with the AP9200 switch. Older switches
usually don't do well with the fence_apc script.
3) This agent works with large power switches that have more than 8 outlets.
The fence_apc script will also, in the next update -- this script will
work for you now.
 


If feedback on this beta version of the agent is good, and if ganged
switches
can be supported, then this agent may replace fence_apc.

After unpacking the attached tar file, you will find 3 files:
README
fence_apc_snmp
powernet369.mib
 


In order to use this agent, you will need to have net-snmp-utils installed
on every node in your cluster. net-snmp-utils is scheduled for inclusion
in the base RHEL distribution for Update 4, and is yummable in FC5.
 


After net-snmp-utils is installed, there will be a directory named:
/usr/share/snmp/mibs/
 


Place the accompanying powernet369.mib file in this directory.
 


To use the agent, cp the agent to the /sbin directory on every
cluster node. The interface for the fence_apc_snmp agent is identical to
the existing fence_apc agent, so if you are using APC for fencing in
your cluster, you *could* backup your current fence_apc agent, and
rename this agent from fence_apc_snmp to fence_apc, and it should just work.
 


NOTE: The fence_apc_snmp agent does not yet support ganged or
'daisy-chained'
APC switches.
 


If you would rather not copy over your fence_apc agent, you can still use
the fence_apc_snmp agent by dropping it into /sbin on every node, and then
defining a <fencedevice> in the cluster.conf file with
agent="fence_apc_snmp"
as an attribute, and use it that way. Note, please, that the GUI does
not support this agent yet, and you will have to edit your cluster.conf
by hand and then propagate it yourself. If you need help with this, email
me on linux-cluster or at the address below.
 


Big thanks to Nate Straz who laid the foundation for this agent.
 


The text of this email can also be found inside the tar file as a README.

Please let me know how this agent works.

Thanks and Regards,

-Jim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence_apc_snmp.tar.gz
Type: application/x-gzip
Size: 110511 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060421/3eb9f9c3/attachment.bin>

From jparsons at redhat.com  Fri Apr 21 19:14:01 2006
From: jparsons at redhat.com (James Parsons)
Date: Fri, 21 Apr 2006 15:14:01 -0400
Subject: [Linux-cluster] New APC agent
Message-ID: <44492EF9.5010809@redhat.com>

Hello all,

This is an snmp based fence agent for APC power switches to be used
with RHEL4 Red Hat Cluster Suite.



The reasons to use this agent rather than the current fence_apc agent are:
1) This script has been tested successfully with EVERY powerswitch that
APC currently makes.
2) It will work on many older models that are no longer supported by 
APC. I have been told that it even works with the AP9200 switch. Older 
switches usually don't do well with the fence_apc script.
3) This agent works with large power switches that have more than 8 
outlets. The fence_apc script will also, in the next update -- this 
script will work for you now.



If feedback on this beta version of the agent is good, and if ganged
switches can be supported, then this agent may replace fence_apc.

After unpacking the attached tar file, you will find 3 files:
README
fence_apc_snmp
powernet369.mib



In order to use this agent, you will need to have net-snmp-utils 
installed on every node in your cluster. net-snmp-utils is scheduled for 
inclusion in the base RHEL distribution for Update 4, and is yummable in 
FC5.



After net-snmp-utils is installed, there will be a directory named:
/usr/share/snmp/mibs/



Place the accompanying powernet369.mib file in this directory.



To use the agent, cp the agent to the /sbin directory on every
cluster node. The interface for the fence_apc_snmp agent is identical to
the existing fence_apc agent, so if you are using APC for fencing in
your cluster, you *could* backup your current fence_apc agent, and
rename this agent from fence_apc_snmp to fence_apc, and it should just work.



NOTE: The fence_apc_snmp agent does not yet support ganged or
'daisy-chained' APC switches.



If you would rather not copy over your fence_apc agent, you can still 
use the fence_apc_snmp agent by dropping it into /sbin on every node, 
and then defining a <fencedevice> in the cluster.conf file with
agent="fence_apc_snmp"
as an attribute, and use it that way. Note, please, that the GUI does
not support this agent yet, and you will have to edit your cluster.conf
by hand and then propagate it yourself. If you need help with this, 
email me on linux-cluster or at the address below.



Big thanks to Nate Straz who laid the foundation for this agent.



The text of this email can also be found inside the tar file as a README.

Please let me know how this agent works.

Thanks and Regards,

-Jim


-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence_apc_snmp.tar.gz
Type: application/x-gzip
Size: 110511 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060421/466a7485/attachment.bin>

From jparsons at redhat.com  Fri Apr 21 19:29:21 2006
From: jparsons at redhat.com (James Parsons)
Date: Fri, 21 Apr 2006 15:29:21 -0400
Subject: [Linux-cluster] New APC agent
Message-ID: <44493291.3070101@redhat.com>

Hello all,

This is an snmp based fence agent for APC power switches to be used
with RHEL4 Red Hat Cluster Suite.



The reasons to use this agent rather than the current fence_apc agent are:
1) This script has been tested successfully with EVERY powerswitch that
APC currently makes.
2) It will work on many older models that are no longer supported by
APC. I have been told that it even works with the AP9200 switch. Older
switches usually don't do well with the fence_apc script.
3) This agent works with large power switches that have more than 8
outlets. The fence_apc script will also, in the next update -- this
script will work for you now.



If feedback on this beta version of the agent is good, and if ganged
switches can be supported, then this agent may replace fence_apc.

After unpacking the attached tar file, you will find 3 files:
README
fence_apc_snmp
powernet369.mib



In order to use this agent, you will need to have net-snmp-utils
installed on every node in your cluster. net-snmp-utils is scheduled for
inclusion in the base RHEL distribution for Update 4, and is yummable in
FC5.



After net-snmp-utils is installed, there will be a directory named:
/usr/share/snmp/mibs/



Place the accompanying powernet369.mib file in this directory.



To use the agent, cp the agent to the /sbin directory on every
cluster node. The interface for the fence_apc_snmp agent is identical to
the existing fence_apc agent, so if you are using APC for fencing in
your cluster, you *could* backup your current fence_apc agent, and
rename this agent from fence_apc_snmp to fence_apc, and it should just work.



NOTE: The fence_apc_snmp agent does not yet support ganged or
'daisy-chained' APC switches.



If you would rather not copy over your fence_apc agent, you can still
use the fence_apc_snmp agent by dropping it into /sbin on every node,
and then defining a <fencedevice> in the cluster.conf file with
agent="fence_apc_snmp"
as an attribute, and use it that way. Note, please, that the GUI does
not support this agent yet, and you will have to edit your cluster.conf
by hand and then propagate it yourself. If you need help with this,
email me on linux-cluster or at the address below.



Big thanks to Nate Straz who laid the foundation for this agent.



The text of this email can also be found inside the tar file as a README.

Please let me know how this agent works.

Thanks and Regards,

-Jim



-------------- next part --------------
A non-text attachment was scrubbed...
Name: fence_apc_snmp.tar.gz
Type: application/x-gzip
Size: 110511 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060421/df4fc832/attachment.bin>

From aberoham at gmail.com  Sat Apr 22 02:33:45 2006
From: aberoham at gmail.com (aberoham at gmail.com)
Date: Fri, 21 Apr 2006 19:33:45 -0700
Subject: [Linux-cluster] Re: kernel noise, "Neighbour table overflow." ?
In-Reply-To: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com>
References: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com>
Message-ID: <3bdb07840604211933l77e94c4dh4e10b27a09579d24@mail.gmail.com>

Now, the same nodes that give the Neighbour table overflow messages are
unable to ping?! Chcek this out --

[root at gfs02 ~]# ping 192.168.60.188
connect: No buffer space available
[root at gfs02 ~]# printk: 4 messages suppressed.
Neighbour table overflow.
printk: 6 messages suppressed.
Neighbour table overflow.
printk: 5 messages suppressed.
Neighbour table overflow.
printk: 1 messages suppressed.

[root at gfs02 ~]# uptime
 19:32:32 up 4 days,  2:00,  4 users,  load average: 0.03, 0.07, 0.08
[root at gfs02 ~]#

[root at gfs02 ~]# clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  gfs03                                    Online, rgmanager
  gfs02                                    Online, Local, rgmanager
  gfs01                                    Online, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  nfshome              gfs03                          started
  ip-test              gfs03                          started
  jukebox              gfs02                          started


[root at gfs02 ~]# cman_tool status
Protocol version: 5.0.1
Config version: 93
Cluster name: gfscluster
Cluster ID: 41396
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 3
Expected_votes: 3
Total_votes: 3
Quorum: 2
Active subsystems: 8
Node name: gfs02
Node addresses: 10.0.19.11




On 4/17/06, aberoham at gmail.com <aberoham at gmail.com> wrote:
>
>
> I'm running a test three-node CS/GFS cluster. At random intervals I get
> the following kernel messages streaming out to /dev/console on all three
> nodes.
>
> ---
> Neighbour table overflow.
> printk: 166 messages suppressed.
> Neighbour table overflow.
> printk: 1 messages suppressed.
> Neighbour table overflow.
> printk: 1 messages suppressed.
> Neighbour table overflow.
> printk: 6 messages suppressed.
> Neighbour table overflow.
> printk: 5 messages suppressed.
> Neighbour table overflow.
> printk: 15 messages suppressed.
> Neighbour table overflow.
> printk: 7 messages suppressed.
> Neighbour table overflow.
> printk: 11 messages suppressed.
> ---
>
> Are these messages related to CS/GFS? What triggers 'em? And should I
> worry about it?
>
> I'm running Linux 2.6.9-34.ELsmp, GFS-kernel-smp-2.6.9-45, GFS-6.1.5-0 and
> dlm-kernel-smp-2.6.9-41.7.
>
> [root at gfs02 ~]# service cman status
> Protocol version: 5.0.1
> Config version: 73
> Cluster name: gfscluster
> Cluster ID: 41396
> Cluster Member: Yes
> Membership state: Cluster-Member
> Nodes: 3
> Expected_votes: 3
> Total_votes: 3
> Quorum: 2
> Active subsystems: 8
> Node name: gfs02
> Node addresses: 10.0.19.11
>
> [root at gfs02 ~]# cat /proc/cluster/services
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           1   2 run       -
> [1 2 3]
>
> DLM Lock Space:  "clvmd"                             2   3 run       -
> [1 2 3]
>
> DLM Lock Space:  "Magma"                             4   5 run       -
> [1 2 3]
>
> DLM Lock Space:  "gfstest"                           5   6 run       -
> [1 2]
>
> GFS Mount Group: "gfstest"                           6   7 run       -
> [1 2]
>
> User:            "usrm::manager"                     3   4 run       -
> [1 2 3]
>
>
> Thanks.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060421/da486d30/attachment.htm>

From troels at arvin.dk  Sat Apr 22 12:08:12 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sat, 22 Apr 2006 14:08:12 +0200
Subject: [Linux-cluster] Preventing automatic poweron?
Message-ID: <pan.2006.04.22.12.08.11.703000@arvin.dk>

With a Red Hat Cluster Suite for RHEL 4, consisting of two cluster nodes
(for fail-over), using HP ILOs for fencing:

1) Both nodes are shut down, using "poweroff".

2) Node 1 is started by pressing the power button.
   I don't want Node 2 to start yet. <------

3) After a little while, Node 1 fences Node 2, so that Node 2 starts.

How can I prevent this automatic power-on? I mean: Node 1 should be able
to see that Node 2 is currently powered off, so there is no need to fence
it(?).

-- 
Greetings from Troels Arvin




From gforte at leopard.us.udel.edu  Sat Apr 22 13:42:17 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Sat, 22 Apr 2006 09:42:17 -0400
Subject: [Linux-cluster] Preventing automatic poweron?
In-Reply-To: <pan.2006.04.22.12.08.11.703000@arvin.dk>
References: <pan.2006.04.22.12.08.11.703000@arvin.dk>
Message-ID: <444A32B9.9020809@leopard.us.udel.edu>

Troels Arvin wrote:
> With a Red Hat Cluster Suite for RHEL 4, consisting of two cluster nodes
> (for fail-over), using HP ILOs for fencing:
> 
> 1) Both nodes are shut down, using "poweroff".
> 
> 2) Node 1 is started by pressing the power button.
>    I don't want Node 2 to start yet. <------
> 
> 3) After a little while, Node 1 fences Node 2, so that Node 2 starts.
> 
> How can I prevent this automatic power-on? I mean: Node 1 should be able
> to see that Node 2 is currently powered off, so there is no need to fence
> it(?).

Unplug it?  the problem is, in a 2-node cluster, node 1 by itself has 
quorum, so if node 2 is powered off then it considers it dead and tries 
to revive it by fencing.  You either need to bring node 1 back up in 
"non-cluster mode" (stop cman/rgmanager/etc) or take further steps to 
prevent node 2 from being powered on.

This is why I prefer external fencing agents - if I manually turn off a 
node, it stays off ;-)

can you cut off communications to node 2's ILO from node 1 temporarily?
(and programatically?  I assume you can physically yank the 
communications cable, but that's no better than unplugging the node)

-g



From troels at arvin.dk  Sat Apr 22 13:51:32 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sat, 22 Apr 2006 15:51:32 +0200
Subject: [Linux-cluster] Re: Preventing automatic poweron?
References: <pan.2006.04.22.12.08.11.703000@arvin.dk>
	<444A32B9.9020809@leopard.us.udel.edu>
Message-ID: <pan.2006.04.22.13.51.31.141000@arvin.dk>

On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote:
> the problem is, in a 2-node cluster, node 1 by itself has 
> quorum, so if node 2 is powered off then it considers it dead and tries 
> to revive it by fencing

Actually, I think what I want is: Fencing should always result in
poweroff, not reboot. I wonder if there is a clean way to do that?

Rationale: A node should never die. If it does, by definition, an
undefined state has occurred, and I would rather not have such a server
start without having a chance to look into log files, etc.


> can you cut off communications to node 2's ILO from node 1 temporarily?

Hmm, good point. The ILO communication happens through cross-over cables
with endpoints in the ILO, and in a dedicated NIC. I guess that I may
ifdown the interface corresponding to the NIC on Node 1 (the one
performing the fencing).

-- 
Greetings from Troels Arvin




From gforte at leopard.us.udel.edu  Sat Apr 22 14:08:03 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Sat, 22 Apr 2006 10:08:03 -0400
Subject: [Linux-cluster] Re: Preventing automatic poweron?
In-Reply-To: <pan.2006.04.22.13.51.31.141000@arvin.dk>
References: <pan.2006.04.22.12.08.11.703000@arvin.dk>	<444A32B9.9020809@leopard.us.udel.edu>
	<pan.2006.04.22.13.51.31.141000@arvin.dk>
Message-ID: <444A38C3.8060208@leopard.us.udel.edu>

Troels Arvin wrote:
> On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote:
>> the problem is, in a 2-node cluster, node 1 by itself has 
>> quorum, so if node 2 is powered off then it considers it dead and tries 
>> to revive it by fencing
> 
> Actually, I think what I want is: Fencing should always result in
> poweroff, not reboot. I wonder if there is a clean way to do that?
> 
> Rationale: A node should never die. If it does, by definition, an
> undefined state has occurred, and I would rather not have such a server
> start without having a chance to look into log files, etc.

Check your cluster.conf - it's probably already sending a "poweroff", 
then a "poweron", like this (for an APC power unit):

<fence>
    <method name="1">
      <device name="FENCE1" option="off" port="1"/>
      <device name="FENCE1" option="on" port="1"/>
    </method>
</fence>

in which case you can just drop the "on" part to achieve the desired 
result.  Otherwise you may need to hack at the fence_ilo script a bit - 
it's just a perl script.

-g



From troels at arvin.dk  Sat Apr 22 14:30:30 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sat, 22 Apr 2006 16:30:30 +0200
Subject: [Linux-cluster] -P argument to ccsd
Message-ID: <pan.2006.04.22.14.30.33.313000@arvin.dk>

Hello,

Setup: Two-node RHEL 4-based fail-over cluster. The nodes are multi-homed,
i.e. they listen to several network interfaces for various purposes.
Dedicated heartbeat ethernet (cross-over) cabling is used for cluster
heartbeat.

I don't like the fact that the cluster-related daemons listen on multiple
network interfaces. One should be sufficient. However, it seems that the
daemons (like ccsd) don't have a parameter to specify which interface to
listen/communicate on. So I thought that I would use iptables to limit
network access to the daemons.

In the manual page for ccsd, the "-P" argument is described. The argument
governs which ports are being used for inter-ccsd communication ("b"),
cluster membership communication ("c"), and administrative programs ("f").
But how are multiple values specified?

Like this?:
 -P b:xxx c:yyy f:zzz

or like this?:
 -P "b:xxx c:yyy f:zzz"

or like this?:
 -P b:xxx -P c:yyy -P f:zzz

-- 
Greetings from Troels Arvin




From troels at arvin.dk  Sat Apr 22 14:44:48 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sat, 22 Apr 2006 16:44:48 +0200
Subject: [Linux-cluster] Meaning of "service"
Message-ID: <pan.2006.04.22.14.44.52.172000@arvin.dk>

Hi,

Setup: Two-node RHEL 4-based fail-over cluster.

The cluster runs several daemons which are dependent on each other:

httpd:      serves static content, and acts as a front-end to Tomcat
tomcat:     handles servlets, etc; depends on postgresql
postgresql: database for servlets run by tomcat

The daemons all use a shared storage area (SCSI-box separate from the
servers, connected by SCSI cables) for data and logging. The daemons have
init-scripts which are specified in system-config-cluster.

Should I set this up as
a) one Cluster Service,
b) as three different Cluster Services?

If a:
How do I specify that the postgresql script should be run before the
tomcat script?

If b:
How do I specify that the postgresql service should be started before the
tomcat service?

-- 
Greetings from Troels Arvin




From troels at arvin.dk  Sat Apr 22 15:07:47 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sat, 22 Apr 2006 17:07:47 +0200
Subject: [Linux-cluster] Re: Re: Preventing automatic poweron?
References: <pan.2006.04.22.12.08.11.703000@arvin.dk>
	<444A32B9.9020809@leopard.us.udel.edu>
	<pan.2006.04.22.13.51.31.141000@arvin.dk>
	<444A38C3.8060208@leopard.us.udel.edu>
Message-ID: <pan.2006.04.22.15.07.53.94000@arvin.dk>

On Sat, 22 Apr 2006 10:08:03 -0400, Greg Forte wrote:
> Check your cluster.conf - it's probably already sending a "poweroff", 
> then a "poweron", like this (for an APC power unit):
> 
> <fence>
>     <method name="1">
>       <device name="FENCE1" option="off" port="1"/>
>       <device name="FENCE1" option="on" port="1"/>
>     </method>
> </fence>

My cluster.conf actually didn't have any "option" attributes in its
<method name> tags. But I added the following attributes to each of my two
<fencedevice ...> tags:
action="off"

And it works. Thanks; that way, I don't have to modify the "fence"
RPM-package.

-- 
Greetings from Troels Arvin




From eric at bootseg.com  Sat Apr 22 15:13:41 2006
From: eric at bootseg.com (Eric Kerin)
Date: Sat, 22 Apr 2006 11:13:41 -0400
Subject: [Linux-cluster] Meaning of "service"
In-Reply-To: <pan.2006.04.22.14.44.52.172000@arvin.dk>
References: <pan.2006.04.22.14.44.52.172000@arvin.dk>
Message-ID: <1145718821.3302.20.camel@auh5-0479.corp.jabil.org>

On Sat, 2006-04-22 at 16:44 +0200, Troels Arvin wrote:
> Hi,
> 
> Setup: Two-node RHEL 4-based fail-over cluster.
> 
> The cluster runs several daemons which are dependent on each other:
> 
> httpd:      serves static content, and acts as a front-end to Tomcat
> tomcat:     handles servlets, etc; depends on postgresql
> postgresql: database for servlets run by tomcat
> 
> The daemons all use a shared storage area (SCSI-box separate from the
> servers, connected by SCSI cables) for data and logging. The daemons have
> init-scripts which are specified in system-config-cluster.
>
> Should I set this up as
> a) one Cluster Service,
> b) as three different Cluster Services?
> 
I have a very similar setup for my cluster.  I recommend option b.  That
will allow you to balance the processor load from the different services
onto the cluster nodes.  In my setup, I keep my Tomcat and httpd
processes in the same service, since they work from a single file
system.

The downsides:
* You have to have IP addresses bound for each service to allow the
other services to connect no matter what node it's running on. 
* You need to partition your shared storage into at least one partition
for each different cluster service.  (I use CLVM to dynamically
partition mine, it's a wonderful thing)

> If a:
> How do I specify that the postgresql script should be run before the
> tomcat script?
> 
> If b:
> How do I specify that the postgresql service should be started before the
> tomcat service?
> 
In my experience, this isn't too much of a problem.  But if you list the
PostgreSQL service first in your cluster.conf file (or first in the
system-config-cluster) it "should" start first, there is no guarantee.

In practice, Tomcat JNDI connection pooling will handle making
connection to the database once it comes online.  Normally Tomcat's
startup time is much longer than PostgreSQL's so it should be up before
Tomcat tries to connect.  Connection pooling will also handle
re-connection upon server failover.  If you are not using connection
pooling, and are creating a connection to the database with each page
load, you still won't run into an issue.


Thanks, 
Eric Kerin
eric at bootseg.com



From troels at arvin.dk  Sat Apr 22 16:16:17 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sat, 22 Apr 2006 18:16:17 +0200
Subject: [Linux-cluster] Re: -P argument to ccsd
References: <pan.2006.04.22.14.30.33.313000@arvin.dk>
Message-ID: <pan.2006.04.22.16.16.24.172000@arvin.dk>

Hello again,

On Sat, 22 Apr 2006 16:30:30 +0200, I wrote:
> or like this?:
>  -P b:xxx -P c:yyy -P f:zzz

That was it.

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/ccs/daemon/ccsd.c?rev=1.14.2.5.4.1&content-type=text/x-cvsweb-markup&cvsroot=cluster
gave a hint, and tests proved it to be true.

-- 
Greetings from Troels Arvin




From troels at arvin.dk  Sat Apr 22 16:49:12 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sat, 22 Apr 2006 18:49:12 +0200
Subject: [Linux-cluster] Re: Preventing automatic poweron?
References: <pan.2006.04.22.12.08.11.703000@arvin.dk>
	<444A32B9.9020809@leopard.us.udel.edu>
Message-ID: <pan.2006.04.22.16.49.27.797000@arvin.dk>

On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote:
[...]
> You either need to bring node 1 back up in 
> "non-cluster mode" (stop cman/rgmanager/etc) or take further steps to 
> prevent node 2 from being powered on.

This reminds me: How about using runlevel 4 as a "network-connected, but
outside-cluster" runlevel? (I don't recall seeing any specification of
what runlevel 4 should be used for.)

-- 
Greetings from Troels Arvin




From gforte at leopard.us.udel.edu  Sat Apr 22 17:13:41 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Sat, 22 Apr 2006 13:13:41 -0400
Subject: [Linux-cluster] Re: Preventing automatic poweron?
In-Reply-To: <pan.2006.04.22.16.49.27.797000@arvin.dk>
References: <pan.2006.04.22.12.08.11.703000@arvin.dk>	<444A32B9.9020809@leopard.us.udel.edu>
	<pan.2006.04.22.16.49.27.797000@arvin.dk>
Message-ID: <444A6445.50402@leopard.us.udel.edu>

Troels Arvin wrote:
> On Sat, 22 Apr 2006 09:42:17 -0400, Greg Forte wrote:
> [...]
>> You either need to bring node 1 back up in 
>> "non-cluster mode" (stop cman/rgmanager/etc) or take further steps to 
>> prevent node 2 from being powered on.
> 
> This reminds me: How about using runlevel 4 as a "network-connected, but
> outside-cluster" runlevel? (I don't recall seeing any specification of
> what runlevel 4 should be used for.)

Sure, makes sense to me.  none of the runlevels are set in stone, it's 
just a matter of convention.  I think RHEL+RHCS makes both 3 and 5 
cluster-enabled (sans and with X services, respectively), so there's no 
reason why 2 and 4 couldn't be the same minus cluster services (2 is 
supposed to be "Multiuser, without NFS" according to the comments in 
/etc/inittab, but again, that's just convention; and 4 is "unused"). 
Runlevels 7-9 are also valid, according to the init man page, though 
I've never actually tried them.

-g



From johngw at comcast.net  Fri Apr 21 20:25:11 2006
From: johngw at comcast.net (John Griffin-Wiesner)
Date: Fri, 21 Apr 2006 15:25:11 -0500
Subject: [Linux-cluster] where are built GFS rpms?, and upgrade question
Message-ID: <20060421202511.GA18697@rubicon.stillrunning.com>

Two questions:

1.  I can find src.rpm's but no built GFS rpm's for rhel 3U7.
I believe that should be GFS-6.0.2.30-0.  Can someone
tell me where those are hiding?  Or do we all have to build
those ourselves now?


2. When doing a minor upgrade (6.0.2.20-2 to 6.0.2.30-0) of a
group of GFS systems do those all have to be taken off-line
and upgraded simultaneously?  (The GFS admin guide I have
talks only about upgrading from 5.2.1 to 6.0.)  Or can they
be upgraded individually and work with the other GFS servers
that are still running the older rev?

Thanks

-- 
John Griffin-Wiesner
johngw at comcast.net



From nick at sqrt.co.uk  Sun Apr 23 03:24:40 2006
From: nick at sqrt.co.uk (Nick Burrett)
Date: Sat, 22 Apr 2006 20:24:40 -0700
Subject: [Linux-cluster] kernel noise, "Neighbour table overflow." ?
In-Reply-To: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com>
References: <3bdb07840604171938s723f28b0p4f0a195845bc521a@mail.gmail.com>
Message-ID: <444AF378.5020106@sqrt.co.uk>

aberoham at gmail.com wrote:
> 
> I'm running a test three-node CS/GFS cluster. At random intervals I get 
> the following kernel messages streaming out to /dev/console on all three 
> nodes.
> 
> ---
> Neighbour table overflow.
> printk: 166 messages suppressed.
> Neighbour table overflow.
> printk: 1 messages suppressed.

It looks like your ARP table is overflowing.  Try setting the 
gc_thresh[123] values in /proc/sys/net/ipv4/neigh/

See the manpage arp(7) for further details.

Regards,


Nick.



From mykleb at no.ibm.com  Sun Apr 23 10:55:42 2006
From: mykleb at no.ibm.com (Jan-Frode Myklebust)
Date: Sun, 23 Apr 2006 12:55:42 +0200
Subject: [Linux-cluster] Re: Linux (qmail) clustering
References: <BAY12-F26339F96CEBFB8663D7FCF9CD0@phx.gbl>
Message-ID: <slrne4mn9e.ipo.mykleb@99RXZYP.ibm.com>

On 2006-04-11, Haydar Akpinar <akpinar_haydar at hotmail.com> wrote:
>
> I would like to know if it is possible to do and also if any one has done 
> qmail clustering on a Linux box.

Since qmail is Maildir based (no locking problems to worry about), I think 
this should be fairly easy to do. You'll just need to decide which 
directories needs to be shared, and which needs to be private to each node.
It will probably be enough to have the home directories on a shared storage
(GFS or simply just NFS), and just do load balancing by equal MX record 
priorities. 


-- 
Jan-Frode Myklebust,  IT Specialist,  IBM Global Services,  ITS



From troels at arvin.dk  Sun Apr 23 11:04:25 2006
From: troels at arvin.dk (Troels Arvin)
Date: Sun, 23 Apr 2006 13:04:25 +0200
Subject: [Linux-cluster] Re: Meaning of "service"
References: <pan.2006.04.22.14.44.52.172000@arvin.dk>
	<1145718821.3302.20.camel@auh5-0479.corp.jabil.org>
Message-ID: <pan.2006.04.23.11.04.21.687000@arvin.dk>

Hello,

On Sat, 22 Apr 2006 11:13:41 -0400, Eric Kerin wrote:
>> Should I set this up as
>> a) one Cluster Service,
>> b) as three different Cluster Services?
>> 
> I have a very similar setup for my cluster.  I recommend option b.

I ended up doing option a, because I couldn't get the other option
working, for some strange reason.

By the way: The manual is rather unclear about the difference between
_adding_ a resource, and _attaching_ a resource. Can someone explain the
difference?

-- 
Greetings from Troels Arvin




From jason at monsterjam.org  Mon Apr 24 01:00:57 2006
From: jason at monsterjam.org (Jason)
Date: Sun, 23 Apr 2006 21:00:57 -0400
Subject: [Linux-cluster] where are built GFS rpms?, and upgrade question
In-Reply-To: <20060421202511.GA18697@rubicon.stillrunning.com>
References: <20060421202511.GA18697@rubicon.stillrunning.com>
Message-ID: <20060424010057.GA53613@monsterjam.org>

On Fri, Apr 21, 2006 at 03:25:11PM -0500, John Griffin-Wiesner wrote:
> Two questions:
> 
> 1.  I can find src.rpm's but no built GFS rpm's for rhel 3U7.
> I believe that should be GFS-6.0.2.30-0.  Can someone
> tell me where those are hiding?  Or do we all have to build
> those ourselves now?

http://www.gyrate.org/archives/9



From ookami at gmx.de  Mon Apr 24 03:59:24 2006
From: ookami at gmx.de (wolfgang pauli)
Date: Mon, 24 Apr 2006 05:59:24 +0200 (MEST)
Subject: [Linux-cluster] different subnets/ manual fencing
Message-ID: <30320.1145851164@www075.gmx.net>

Hi,  
  
I spent the whole day (sunday) trying to get this working...  
I guess these two questions might solve the issue.  
  
1. Can I have a cluster span over more than one subnet?  
  
2. When I try to start the cluster software, I always have to start it on  
all nodes at the same time. If I don't do it, startup will hang while  
fenced is starting up. I am using manual fencing. Probably the default  
configuration. The problem is that some of the nodes produce 
kernel-panics (when starting cman). so i would like to start the nodes 
one by one and test what the problem is. 
  
<fence_daemon post_fail_delay="5" post_join_delay="20"/>  
...  
<clusternode name="eon" votes="1">  
 <fence> 
  <method name="1">  
  <device name="human" nodename="eon"/>  
  </method>  
 </fence>  
</clusternode>  
... 
<fencedevices>  
 <fencedevice agent="fence_manual" name="human"/>  
</fencedevices>  

-- 
 

"Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ...
Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail



From pcaulfie at redhat.com  Mon Apr 24 07:24:20 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 24 Apr 2006 08:24:20 +0100
Subject: [Linux-cluster] different subnets/ manual fencing
In-Reply-To: <30320.1145851164@www075.gmx.net>
References: <30320.1145851164@www075.gmx.net>
Message-ID: <444C7D24.4090504@redhat.com>

wolfgang pauli wrote:
> Hi,  
>   
> I spent the whole day (sunday) trying to get this working...  
> I guess these two questions might solve the issue.  
>   
> 1. Can I have a cluster span over more than one subnet?  

Yes, but you'll need to configure it for multicas rather than broadcast - and
make sure that any intervening routers are good enough.

> 2. When I try to start the cluster software, I always have to start it on  
> all nodes at the same time. If I don't do it, startup will hang while  
> fenced is starting up. I am using manual fencing. Probably the default  
> configuration. The problem is that some of the nodes produce 
> kernel-panics (when starting cman). 

I'd like to see those please.

so i would like to start the nodes
> one by one and test what the problem is. 
>   
> <fence_daemon post_fail_delay="5" post_join_delay="20"/>  
> ...  
> <clusternode name="eon" votes="1">  
>  <fence> 
>   <method name="1">  
>   <device name="human" nodename="eon"/>  
>   </method>  
>  </fence>  
> </clusternode>  
> ... 
> <fencedevices>  
>  <fencedevice agent="fence_manual" name="human"/>  
> </fencedevices>  
> 


-- 

patrick



From Alain.Moulle at bull.net  Mon Apr 24 11:26:11 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 24 Apr 2006 13:26:11 +0200
Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ?
Message-ID: <444CB5D3.5010408@bull.net>

Thanks Jim, it was effectively the problem : the second step
about managing the fence for each node was missing (but there
is nothing in documentation about this step and dialog boxes ...)

Another problem/question:
when you have finished the 3 nodes cluster configuration,
and Save the file in /etc/cluster/cluster.conf on local node,
the Icon "Send to Cluster" is not available because the
cs4 is not active at the moment. But with three nodes,
even if you try to start the cs4 on this local node (where
I've done the configuration) , it can't start alone because
the cluster is not quorate ... and you can't start cman
on other nodes, it seems that the start is failed because
of no cluster.conf currently on the node.

So, do we have to do manually the mkdir /etc/cluster on
both other nodes, and scp of cluster.conf towards both nodes ?
Or is there another tip via GUI to start the cs4 on three
nodes despite two nodes have not yet any cluster.conf available ?

Thanks
Alain Moull?






From Alain.Moulle at bull.net  Mon Apr 24 11:38:16 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 24 Apr 2006 13:38:16 +0200
Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ?
In-Reply-To: <444CB5D3.5010408@bull.net>
References: <444CB5D3.5010408@bull.net>
Message-ID: <444CB8A8.3060502@bull.net>

Alain Moulle wrote:
> Thanks Jim, it was effectively the problem : the second step
> about managing the fence for each node was missing (but there
> is nothing in documentation about this step and dialog boxes ...)
> 
> Another problem/question:
> when you have finished the 3 nodes cluster configuration,
> and Save the file in /etc/cluster/cluster.conf on local node,
> the Icon "Send to Cluster" is not available because the
> cs4 is not active at the moment. But with three nodes,
> even if you try to start the cs4 on this local node (where
> I've done the configuration) , it can't start alone because
> the cluster is not quorate ... and you can't start cman
> on other nodes, it seems that the start is failed because
> of no cluster.conf currently on the node.
> 
> So, do we have to do manually the mkdir /etc/cluster on
> both other nodes, and scp of cluster.conf towards both nodes ?
> Or is there another tip via GUI to start the cs4 on three
> nodes despite two nodes have not yet any cluster.conf available ?
> 
> Thanks
> Alain Moull?

More information :

in fact, when I started cman on nodes without cluster/cluster.conf,
they effectuvely (as expected) got a cluster.conf from another node
connected, but they take the cluster.conf from another HA pair cluster,
not from the third node of this current cluster !!!!
So that's why the start fails ...
Which is the algorythm of search when a node has no cluster.conf available ?

Thanks
Alain Moull?






From jparsons at redhat.com  Mon Apr 24 12:57:13 2006
From: jparsons at redhat.com (James Parsons)
Date: Mon, 24 Apr 2006 08:57:13 -0400
Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ?
In-Reply-To: <444CB5D3.5010408@bull.net>
References: <444CB5D3.5010408@bull.net>
Message-ID: <444CCB29.9060109@redhat.com>

Alain Moulle wrote:

>Thanks Jim, it was effectively the problem : the second step
>about managing the fence for each node was missing (but there
>is nothing in documentation about this step and dialog boxes ...)
>
>Another problem/question:
>when you have finished the 3 nodes cluster configuration,
>and Save the file in /etc/cluster/cluster.conf on local node,
>the Icon "Send to Cluster" is not available because the
>cs4 is not active at the moment. But with three nodes,
>even if you try to start the cs4 on this local node (where
>I've done the configuration) , it can't start alone because
>the cluster is not quorate ... and you can't start cman
>on other nodes, it seems that the start is failed because
>of no cluster.conf currently on the node.
>
>So, do we have to do manually the mkdir /etc/cluster on
>both other nodes, and scp of cluster.conf towards both nodes ?
>Or is there another tip via GUI to start the cs4 on three
>nodes despite two nodes have not yet any cluster.conf available ?
>
Yes, Alain. Currently, you must manually scp the cluster.conf to each 
node before starting the cluster. This requirement is considered 
unacceptable, however, and ease of cluster deployment is being 
aggressively pursued in two projects here.

The first is an app called deploy-tool, which pulls down the necessary 
RPMs onto the machines desired as cluster nodes, and installs them AND 
copies a preliminary cluster.conf file to each node. Finally, it starts 
the cluster service daemons on each node.

The second project with ease of cluster deployment as an important 
objective is the Conga project. It will provide a remote method for 
deploying clusters, configuring and monitoring them, and even adding and 
removing nodes. It will also offer remote storage management and a few 
other fun things as well.

Conga and deploy-tool are both in active development, and deploy-tool is 
being beta tested now.

Regards,

-Jim

>
>Thanks
>Alain Moull?
>
>
>
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>  
>




From proftpd at rodriges.spb.ru  Mon Apr 24 14:35:26 2006
From: proftpd at rodriges.spb.ru (proftpd at rodriges.spb.ru)
Date: Mon, 24 Apr 2006 18:35:26 +0400
Subject: [Linux-cluster] GFS
Message-ID: <web-16582530@eltel.net>

Hello.

I'm use GFS only for share one iSCSI target between 2
initiators. I'm really don't need create a cluster among
those two nodes. I'm install GFS,GFS-kernel,fence,ccs and
others packages, install iSCSI-initiator and even create
gfs-file system, but then i'm going to mount gfs i received

[root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/
mount: Transport endpoint is not connected

I haven't any idea about /etc/cluster/cluster.conf file -
It's really necessary to create them. I'm simply want to
share one iSCSI target between 2 hosts. Can i achive
without creating a cluster???



From erwan at seanodes.com  Mon Apr 24 16:17:18 2006
From: erwan at seanodes.com (Velu Erwan)
Date: Mon, 24 Apr 2006 18:17:18 +0200
Subject: [Linux-cluster] Missing %if in GFS 6.0.2.30 specfile
Message-ID: <444CFA0E.3080400@seanodes.com>

If buildup = 0, rpm fails because the "modules" package doesn't exist 
for the following lines :
The patch is simple but helps ;)

+%if %{buildup}
  %post modules
   depmod -ae -F /boot/System.map-%{kernel_version} %{kernel_version}
+%endif

Erwan,



From ookami at gmx.de  Mon Apr 24 17:18:30 2006
From: ookami at gmx.de (Wolfgang Pauli)
Date: Mon, 24 Apr 2006 11:18:30 -0600
Subject: [Linux-cluster] different subnets/ manual fencing
In-Reply-To: <444C7D24.4090504@redhat.com>
References: <30320.1145851164@www075.gmx.net> <444C7D24.4090504@redhat.com>
Message-ID: <200604241118.30964.ookami@gmx.de>

> Yes, but you'll need to configure it for multicas rather than broadcast -
> and make sure that any intervening routers are good enough.

That is good news. So we have the head node (dream) with two ethernet cards. 
We want it to serve a GFS partition to two different subnets. I guess this is 
than also doable with multicast, right?

> I'd like to see those please.
>
Thanks! I attached a text file (myoops.txt). I hope it helps. I can't tell, 
because I never really had to deal with kernel panics before...

Thanks again,

wolfgang

P.S.: Apr 23 16:15:10 node15 kernel: Linux version 2.6.15-1.1833_FC4smp 
(bhcompile at hs20-bc1-1.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat 
4.0.2-8)) #1 SMP Wed Mar 1 23:56:51 EST 2006
-------------- next part --------------
Apr 23 14:03:59 node15 ccsd[2367]: Starting ccsd 1.0.0:
Apr 23 14:03:59 node15 ccsd[2367]:  Built: Jun 16 2005 10:45:39
Apr 23 14:03:59 node15 ccsd[2367]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Apr 23 14:04:00 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar  7 2006 15:36:41) installed
Apr 23 14:04:00 node15 kernel: NET: Registered protocol family 30
Apr 23 14:04:00 node15 ccsd[2367]: cluster.conf (cluster name = oreilly_cluster, version = 33) found.
Apr 23 14:04:03 node15 kernel: CMAN: Waiting to join or form a Linux-cluster
Apr 23 14:04:03 node15 ccsd[2367]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Apr 23 14:04:03 node15 ccsd[2367]: Initial status:: Inquorate
Apr 23 14:04:32 node15 kernel: CMAN: sending membership request
Apr 23 14:04:53 node15 last message repeated 19 times
Apr 23 14:04:54 node15 kernel: CMAN: got node node27
Apr 23 14:04:54 node15 kernel: CMAN: got node node17
Apr 23 14:04:54 node15 kernel: CMAN: got node node16
Apr 23 14:04:54 node15 kernel: CMAN: got node node24
Apr 23 14:04:54 node15 kernel: CMAN: got node node1
Apr 23 14:04:54 node15 kernel: CMAN: got node node2
Apr 23 14:04:54 node15 kernel: CMAN: got node node23
Apr 23 14:04:54 node15 kernel: CMAN: got node node6
Apr 23 14:04:54 node15 kernel: CMAN: got node node10
Apr 23 14:04:54 node15 kernel: CMAN: got node node9
Apr 23 14:05:01 node15 kernel: CMAN: quorum regained, resuming activity
Apr 23 14:05:01 node15 ccsd[2367]: Cluster is quorate.  Allowing connections.
Apr 23 14:05:01 node15 kernel: dlm: no version for "struct_module" found: kernel tainted.
Apr 23 14:05:01 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar  7 2006 15:42:37) installed
Apr 23 14:07:28 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 23 14:07:28 node15 kernel:  printing eip:
Apr 23 14:07:28 node15 kernel: f8acfa39
Apr 23 14:07:28 node15 kernel: *pde = 37d1f001
Apr 23 14:07:28 node15 kernel: Oops: 0000 [#1]
Apr 23 14:07:28 node15 kernel: SMP
Apr 23 14:07:28 node15 kernel: last sysfs file: /class/misc/dlm-control/dev
Apr 23 14:07:28 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 nfs lockd nfs_acl sunrpc dm_mod eepro100 uhci_hcd hw_ra
ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd
Apr 23 14:07:28 node15 kernel: CPU:    0
Apr 23 14:07:28 node15 kernel: EIP:    0060:[<f8acfa39>]    Tainted: GF     VLI
Apr 23 14:07:28 node15 kernel: EFLAGS: 00010207   (2.6.15-1.1833_FC4smp)
Apr 23 14:07:28 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman]
Apr 23 14:07:28 node15 kernel: eax: 00000046   ebx: c1d7aeb7   ecx: 00000011   edx: f68b0fa0
Apr 23 14:07:28 node15 kernel: esi: 00000000   edi: c1d7aeb7   ebp: 00000046   esp: f68b0ec8
Apr 23 14:07:28 node15 kernel: ds: 007b   es: 007b   ss: 0068
Apr 23 14:07:28 node15 kernel: Process cman_comms (pid: 2392, threadinfo=f68b0000 task=f7d85550)
Apr 23 14:07:28 node15 kernel: Stack: badc0ded f6846380 00000000 f7386400 f68b0f74 f8acfbb3 00000100 00000002
Apr 23 14:07:28 node15 kernel:        00000040 f731e000 f7decb40 f731e001 00000001 00000001 f6cea440 f8ad002d
Apr 23 14:07:28 node15 kernel:        f68b0f90 00000001 000002fd c1b091e0 f68b0f90 f68b0f74 f7decb40 c1eb4940
Apr 23 14:07:28 node15 kernel: Call Trace:
Apr 23 14:07:28 node15 kernel:  [<f8acfbb3>] send_to_user_port+0x159/0x3cc [cman]     [<f8ad002d>] process_incoming_packet+0x207/0x26c [cman]
Apr 23 14:07:29 node15 kernel:  [<f8acf188>] receive_message+0xb7/0xe0 [cman]     [<f8acf33c>] cluster_kthread+0x18b/0x39f [cman]
Apr 23 14:07:29 node15 kernel:  [<c01205ee>] default_wake_function+0x0/0xc     [<f8acf1b1>] cluster_kthread+0x0/0x39f [cman]
Apr 23 14:07:29 node15 kernel:  [<c010243d>] kernel_thread_helper+0x5/0xb
Apr 23 14:07:29 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7
2 f8 89 df <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc
Apr 23 14:07:29 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. ^MContinui
ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111
seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds.
Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing
 in 100 seconds. ^M<4>hda: dma_timer_expiry: dma status == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing i
n 95 seconds. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds.
hda: DMA interrupt recovery
Apr 23 14:07:29 node15 kernel: hda: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing i
n 85 seconds. ^MContinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 80 seconds.
Continuing in 79 seconds. ^MContinuing in 78 seconds. ^MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MContinuing in 7
4 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^MContinuing in 69 seconds. ^M<4
>hda: dma_timer_expiry: dma status == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing i
n 64 seconds. ^MContinuing in 63 seconds. ^MContinuing in 62 seconds. ^MContinuing in 61 seconds. ^M<6>ide-cd: cmd 0x3 timed out
Apr 23 14:07:29 node15 kernel: hdc: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 60 seconds. ^MContinuing in 59 seconds. ^Mhda: DMA interrupt recovery
Apr 23 14:07:29 node15 kernel: hda: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 58 seconds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. ^MContinuing in 55 seconds. ^MContinuing i
n 54 seconds. ^MContinuing in 53 seconds. ^MContinuing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds.
Continuing in 48 seconds. ^MContinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 4
3 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^M<4>hda: dma_timer_expiry: dma
status == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 38 seconds. ^MContinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing i
n 34 seconds. ^MContinuing in 33 seconds. ^MContinuing in 32 seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds.
hda: DMA interrupt recovery
Apr 23 14:07:29 node15 kernel: hda: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 28 seconds. ^MContinuing in 27 seconds. ^MContinuing in 26 seconds. ^MContinuing in 25 seconds. ^MContinuing i
n 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 seconds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds.
Continuing in 18 seconds. ^MContinuing in 17 seconds. ^MContinuing in 16 seconds. ^MContinuing in 15 seconds. ^MContinuing in 14 seconds. ^MContinuing in 1
3 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds. ^MContinuing in 10 seconds. ^MContinuing in 9 seconds. ^M<4>hda: dma_timer_expiry: dma s
tatus == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 seconds. ^MContinuing in 5 seconds. ^MContinuing in 4
seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds.
Apr 23 14:07:29 node15 kernel:  <0>Fatal exception: panic in 5 seconds


# -----------------------------------------------

Apr 23 16:07:04 node15 ccsd[2373]: Starting ccsd 1.0.0:
Apr 23 16:07:04 node15 ccsd[2373]:  Built: Jun 16 2005 10:45:39
Apr 23 16:07:04 node15 ccsd[2373]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Apr 23 16:07:05 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar  7 2006 15:36:41) installed
Apr 23 16:07:05 node15 kernel: NET: Registered protocol family 30
Apr 23 16:07:05 node15 ccsd[2373]: cluster.conf (cluster name = oreilly_cluster, version = 33) found.
Apr 23 16:07:14 node15 kernel: CMAN: Waiting to join or form a Linux-cluster
Apr 23 16:07:14 node15 ccsd[2373]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Apr 23 16:07:14 node15 ccsd[2373]: Initial status:: Inquorate
Apr 23 16:07:17 node15 kernel: CMAN: sending membership request
Apr 23 16:07:37 node15 last message repeated 27 times
Apr 23 16:07:38 node15 kernel: CMAN: got node node2
Apr 23 16:07:38 node15 kernel: CMAN: got node node26
Apr 23 16:07:38 node15 kernel: CMAN: got node node6
Apr 23 16:07:38 node15 kernel: CMAN: got node node27
Apr 23 16:07:38 node15 kernel: CMAN: got node node4
Apr 23 16:07:38 node15 kernel: CMAN: got node node5
Apr 23 16:07:38 node15 kernel: CMAN: got node node17
Apr 23 16:07:38 node15 kernel: CMAN: got node node3
Apr 23 16:07:38 node15 kernel: CMAN: got node node18
Apr 23 16:07:38 node15 kernel: CMAN: got node node16
Apr 23 16:07:38 node15 kernel: CMAN: got node node23
Apr 23 16:07:38 node15 kernel: CMAN: got node node12
Apr 23 16:07:38 node15 kernel: CMAN: got node node7
Apr 23 16:07:38 node15 ccsd[2373]: Cluster is quorate.  Allowing connections.
Apr 23 16:07:38 node15 kernel: CMAN: got node dream
Apr 23 16:07:38 node15 kernel: CMAN: got node node20
Apr 23 16:07:38 node15 kernel: CMAN: quorum regained, resuming activity
Apr 23 16:07:38 node15 kernel: dlm: no version for "struct_module" found: kernel tainted.
Apr 23 16:07:38 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar  7 2006 15:42:37) installed
Apr 23 16:10:06 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 23 16:10:06 node15 kernel:  printing eip:
Apr 23 16:10:06 node15 kernel: f8a85a39
Apr 23 16:10:06 node15 kernel: *pde = 363b4001
Apr 23 16:10:06 node15 kernel: Oops: 0000 [#1]
Apr 23 16:10:06 node15 kernel: SMP
Apr 23 16:10:06 node15 kernel: last sysfs file: /class/misc/dlm-control/dev
Apr 23 16:10:06 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_random i8xx_tco i2c_
i801 i2c_core e1000 e100 mii floppy ext3 jbd
Apr 23 16:10:06 node15 kernel: CPU:    0
Apr 23 16:10:06 node15 kernel: EIP:    0060:[<f8a85a39>]    Tainted: GF     VLI
Apr 23 16:10:06 node15 kernel: EFLAGS: 00010202   (2.6.15-1.1833_FC4smp)
Apr 23 16:10:06 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman]
Apr 23 16:10:06 node15 kernel: eax: 00000040   ebx: c1db9eba   ecx: 00000010   edx: f6344fa0
Apr 23 16:10:06 node15 kernel: esi: 00000000   edi: c1db9eba   ebp: 00000040   esp: f6344ec8
Apr 23 16:10:06 node15 kernel: ds: 007b   es: 007b   ss: 0068
Apr 23 16:10:06 node15 kernel: Process cman_comms (pid: 2399, threadinfo=f6344000 task=c1e73aa0)
Apr 23 16:10:06 node15 kernel: Stack: badc0ded f63d5a80 00000000 f6000a00 f6344f74 f8a85bb3 00000100 00000002
Apr 23 16:10:06 node15 kernel:        00000040 f66a9800 f6319a40 f66a9801 00000001 00000001 f6383cc0 f8a8602d
Apr 23 16:10:06 node15 kernel:        f6344f90 00000001 000002fa c1b091e0 f6344f90 f6344f74 f6319a40 f7dc1b80
Apr 23 16:10:06 node15 kernel: Call Trace:
Apr 23 16:10:06 node15 kernel:  [<f8a85bb3>] send_to_user_port+0x159/0x3cc [cman]     [<f8a8602d>] process_incoming_packet+0x207/0x26c [cman]
Apr 23 16:10:06 node15 kernel:  [<f8a85188>] receive_message+0xb7/0xe0 [cman]     [<f8a8533c>] cluster_kthread+0x18b/0x39f [cman]
Apr 23 16:10:06 node15 kernel:  [<c01205ee>] default_wake_function+0x0/0xc     [<f8a851b1>] cluster_kthread+0x0/0x39f [cman]
Apr 23 16:10:06 node15 kernel:  [<c010243d>] kernel_thread_helper+0x5/0xb
Apr 23 16:10:06 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7
2 f8 89 df <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc
Apr 23 16:10:06 node15 kernel:  <0>Fatal exception: panic in 5 seconds





# -----------------------------------------------

Apr 23 18:05:33 node15 ccsd[3356]: Starting ccsd 1.0.0:
Apr 23 18:05:33 node15 ccsd[3356]:  Built: Jun 16 2005 10:45:39
Apr 23 18:05:33 node15 ccsd[3356]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Apr 23 18:05:34 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar  7 2006 15:36:41) installed
Apr 23 18:05:34 node15 kernel: NET: Registered protocol family 30
Apr 23 18:05:34 node15 ccsd[3356]: cluster.conf (cluster name = oreilly_cluster, version = 35) found.
Apr 23 18:05:35 node15 kernel: CMAN: Waiting to join or form a Linux-cluster
Apr 23 18:05:36 node15 ccsd[3356]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Apr 23 18:05:36 node15 ccsd[3356]: Initial status:: Inquorate
Apr 23 18:05:36 node15 kernel: CMAN: sending membership request
Apr 23 18:05:36 node15 kernel: CMAN: got node dream
Apr 23 18:06:13 node15 kernel: CMAN: quorum regained, resuming activity
Apr 23 18:06:13 node15 ccsd[3356]: Cluster is quorate.  Allowing connections.
Apr 23 18:06:13 node15 kernel: dlm: no version for "struct_module" found: kernel tainted.
Apr 23 18:06:13 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar  7 2006 15:42:37) installed
Apr 23 18:06:23 node15 kernel: CMAN: node node1 rejoining
Apr 23 18:06:28 node15 last message repeated 3 times
Apr 23 18:06:32 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 23 18:06:32 node15 kernel:  printing eip:
Apr 23 18:06:32 node15 kernel: f8a85a39
Apr 23 18:06:32 node15 kernel: *pde = 37e89001
Apr 23 18:06:32 node15 kernel: Oops: 0000 [#1]
Apr 23 18:06:32 node15 kernel: SMP
Apr 23 18:06:32 node15 kernel: last sysfs file: /class/misc/dlm-control/dev
Apr 23 18:06:32 node15 kernel: Modules linked in: dlm(U) cman(U) nfs lockd nfs_acl ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_ra
ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd
Apr 23 18:06:32 node15 kernel: CPU:    0
Apr 23 18:06:32 node15 kernel: EIP:    0060:[<f8a85a39>]    Tainted: GF     VLI
Apr 23 18:06:32 node15 kernel: EFLAGS: 00010203   (2.6.15-1.1833_FC4smp)
Apr 23 18:06:32 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman]
Apr 23 18:06:32 node15 kernel: eax: 00000042   ebx: c1ffcab9   ecx: 00000010   edx: f5c22fa0
Apr 23 18:06:32 node15 kernel: esi: 00000000   edi: c1ffcab9   ebp: 00000042   esp: f5c22ec8
Apr 23 18:06:32 node15 kernel: ds: 007b   es: 007b   ss: 0068
Apr 23 18:06:32 node15 kernel: Process cman_comms (pid: 3387, threadinfo=f5c22000 task=c1e50000)
Apr 23 18:06:33 node15 kernel: Stack: badc0ded f7fd6d80 00000000 f6462000 f5c22f74 f8a85bb3 00000100 00000002
Apr 23 18:06:33 node15 kernel:        00000040 f66b1000 f5caf9c0 f66b1001 00000001 00000001 f5caf840 f8a8602d
Apr 23 18:06:33 node15 kernel:        f5c22f90 00000001 000002fb c1b091e0 f5c22f90 f5c22f74 f5caf9c0 c1e6d100
Apr 23 18:06:33 node15 kernel: Call Trace:
Apr 23 18:06:33 node15 kernel:  [<f8a85bb3>] send_to_user_port+0x159/0x3cc [cman]     [<f8a8602d>] process_incoming_packet+0x207/0x26c [cman]
Apr 23 18:06:33 node15 kernel:  [<f8a85188>] receive_message+0xb7/0xe0 [cman]     [<f8a8533c>] cluster_kthread+0x18b/0x39f [cman]
Apr 23 18:06:33 node15 kernel:  [<c01205ee>] default_wake_function+0x0/0xc     [<f8a851b1>] cluster_kthread+0x0/0x39f [cman]
Apr 23 18:06:33 node15 kernel:  [<c010243d>] kernel_thread_helper+0x5/0xb
Apr 23 18:06:33 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7
2 f8 89 df <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc
Apr 23 18:06:33 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. ^MContinui
ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111
seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds.
Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing
 in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds
. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^MContinuing
in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo
Apr 23 18:06:33 node15 kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 8
ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111
seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds.
Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing
 in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds
. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^MContinuing
in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo
Apr 23 18:06:33 node15 kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 8
0 seconds. ^MContinuing in 79 seconds. ^MContinuing in 78 seconds. ^MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MCo
ntinuing in 74 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^MContinuing in 69
seconds. ^MContinuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing in 64 seconds. ^MCont
inuing in 63 seconds. ^MContinuing in 62 seconds. ^MContinuing in 61 seconds. ^MContinuing in 60 seconds. ^MContinuing in 59 seconds. ^MContinuing in 58 se
conds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. ^MContinuing in 55 seconds. ^MContinuing in 54 seconds. ^MContinuing in 53 seconds. ^MContin
uing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds. ^MContinuing in 48 seconds.
Apr 23 18:06:33 node15 kernel: tinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 4
3 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^MContinuing in 38 seconds. ^MCo
ntinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing in 34 seconds. ^MContinuing in 33 seconds. ^MContinuing in 32
seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds. ^MContinuing in 28 seconds. ^MContinuing in 27 seconds. ^MCont
inuing in 26 seconds. ^MContinuing in 25 seconds. ^MContinuing in 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 se
conds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds. ^MContinuing in 18 seconds. ^MContinuing in 17 seconds. ^MContinuing in 16 seconds. ^MContin
uing in 15 seconds. ^MContinuing in 14 seconds. ^MContinuing in 13 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds.
Apr 23 18:06:33 node15 kernel: tinuing in 10 seconds. ^MContinuing in 9 seconds. ^MContinuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 se
conds. ^MContinuing in 5 seconds. ^MContinuing in 4 seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds.
Apr 23 18:06:33 node15 kernel:  <0>Fatal exception: panic in 5 seconds

From pcaulfie at redhat.com  Mon Apr 24 18:28:48 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Mon, 24 Apr 2006 19:28:48 +0100
Subject: [Linux-cluster] different subnets/ manual fencing
In-Reply-To: <200604241118.30964.ookami@gmx.de>
References: <30320.1145851164@www075.gmx.net> <444C7D24.4090504@redhat.com>
	<200604241118.30964.ookami@gmx.de>
Message-ID: <444D18E0.8080609@redhat.com>

Wolfgang Pauli wrote:
>> Yes, but you'll need to configure it for multicas rather than broadcast -
>> and make sure that any intervening routers are good enough.
> 
> That is good news. So we have the head node (dream) with two ethernet cards. 
> We want it to serve a GFS partition to two different subnets. I guess this is 
> than also doable with multicast, right?

Yes, if your router is up to it.

>> I'd like to see those please.
>>
> Thanks! I attached a text file (myoops.txt). I hope it helps. I can't tell, 
> because I never really had to deal with kernel panics before...
> 
> Thanks again,
> 
> wolfgang
> 
> P.S.: Apr 23 16:15:10 node15 kernel: Linux version 2.6.15-1.1833_FC4smp 
> (bhcompile at hs20-bc1-1.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat 
> 4.0.2-8)) #1 SMP Wed Mar 1 23:56:51 EST 2006
> 
> 
> ------------------------------------------------------------------------
> 
> Apr 23 14:03:59 node15 ccsd[2367]: Starting ccsd 1.0.0:
> Apr 23 14:03:59 node15 ccsd[2367]:  Built: Jun 16 2005 10:45:39
> Apr 23 14:03:59 node15 ccsd[2367]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
> Apr 23 14:04:00 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar  7 2006 15:36:41) installed

That's a rather old version, I'm pretty sure that bug has been fixed 
since. Can you upgrade ?


Patrick



From Bowie_Bailey at BUC.com  Mon Apr 24 19:03:35 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Mon, 24 Apr 2006 15:03:35 -0400
Subject: [Linux-cluster] GFS
Message-ID: <4766EEE585A6D311ADF500E018C154E30213394B@bnifex.cis.buc.com>

proftpd at rodriges.spb.ru wrote:
> 
> I'm use GFS only for share one iSCSI target between 2
> initiators. I'm really don't need create a cluster among
> those two nodes. I'm install GFS,GFS-kernel,fence,ccs and
> others packages, install iSCSI-initiator and even create
> gfs-file system, but then i'm going to mount gfs i received
> 
> [root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/
> mount: Transport endpoint is not connected
> 
> I haven't any idea about /etc/cluster/cluster.conf file -
> It's really necessary to create them. I'm simply want to
> share one iSCSI target between 2 hosts. Can i achive
> without creating a cluster???

No, you can't use GFS without a cluster.  You need the cluster
services to manage access to the shared filesystem and to prevent
misbehaving nodes from causing data corruption.

-- 
Bowie



From rohara at redhat.com  Mon Apr 24 19:15:48 2006
From: rohara at redhat.com (Ryan O'Hara)
Date: Mon, 24 Apr 2006 14:15:48 -0500
Subject: [Linux-cluster] GFS
In-Reply-To: <4766EEE585A6D311ADF500E018C154E30213394B@bnifex.cis.buc.com>
References: <4766EEE585A6D311ADF500E018C154E30213394B@bnifex.cis.buc.com>
Message-ID: <444D23E4.8070104@redhat.com>

Bowie Bailey wrote:
 >
> proftpd at rodriges.spb.ru wrote:
> 
>>I'm use GFS only for share one iSCSI target between 2
>>initiators. I'm really don't need create a cluster among
>>those two nodes. I'm install GFS,GFS-kernel,fence,ccs and
>>others packages, install iSCSI-initiator and even create
>>gfs-file system, but then i'm going to mount gfs i received
>>
>>[root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/
>>mount: Transport endpoint is not connected
>>
>>I haven't any idea about /etc/cluster/cluster.conf file -
>>It's really necessary to create them. I'm simply want to
>>share one iSCSI target between 2 hosts. Can i achive
>>without creating a cluster???
> 
> 
> No, you can't use GFS without a cluster.  You need the cluster
> services to manage access to the shared filesystem and to prevent
> misbehaving nodes from causing data corruption.
> 

You can use GFS without a cluster if you run as a standalone filesystem.

When you use GFS as shared storage, as in this case, you do need to run 
GFS in a cluster.

Ryan



From rajeshkannna at gmail.com  Tue Apr 25 05:47:46 2006
From: rajeshkannna at gmail.com (Rajesh Kanna)
Date: Tue, 25 Apr 2006 11:17:46 +0530
Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 24, Issue 35
In-Reply-To: <20060424160007.155DA73461@hormel.redhat.com>
References: <20060424160007.155DA73461@hormel.redhat.com>
Message-ID: <1d301df90604242247j6a3ba03dn32fd920b54279347@mail.gmail.com>

dear sir,

I shall want to know about basic of linux-clustering  .

reg

P.Rajeshkanna



On 4/24/06, linux-cluster-request at redhat.com
<linux-cluster-request at redhat.com> wrote:
> Send Linux-cluster mailing list submissions to
>        linux-cluster at redhat.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to
>        linux-cluster-request at redhat.com
>
> You can reach the person managing the list at
>        linux-cluster-owner at redhat.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
>
>
> Today's Topics:
>
>   1. Re: where are built GFS rpms?, and upgrade question (Jason)
>   2. different subnets/ manual fencing (wolfgang pauli)
>   3. Re: different subnets/ manual fencing (Patrick Caulfield)
>   4. Re: CS4 Update 2 / GUI problem ? (Alain Moulle)
>   5. Re: CS4 Update 2 / GUI problem ? (Alain Moulle)
>   6. Re: Re: CS4 Update 2 / GUI problem ? (James Parsons)
>   7. GFS (proftpd at rodriges.spb.ru)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 23 Apr 2006 21:00:57 -0400
> From: Jason <jason at monsterjam.org>
> Subject: Re: [Linux-cluster] where are built GFS rpms?, and upgrade
>        question
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <20060424010057.GA53613 at monsterjam.org>
> Content-Type: text/plain; charset=us-ascii
>
> On Fri, Apr 21, 2006 at 03:25:11PM -0500, John Griffin-Wiesner wrote:
> > Two questions:
> >
> > 1.  I can find src.rpm's but no built GFS rpm's for rhel 3U7.
> > I believe that should be GFS-6.0.2.30-0.  Can someone
> > tell me where those are hiding?  Or do we all have to build
> > those ourselves now?
>
> http://www.gyrate.org/archives/9
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 24 Apr 2006 05:59:24 +0200 (MEST)
> From: "wolfgang pauli" <ookami at gmx.de>
> Subject: [Linux-cluster] different subnets/ manual fencing
> To: linux-cluster at redhat.com
> Message-ID: <30320.1145851164 at www075.gmx.net>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I spent the whole day (sunday) trying to get this working...
> I guess these two questions might solve the issue.
>
> 1. Can I have a cluster span over more than one subnet?
>
> 2. When I try to start the cluster software, I always have to start it on
> all nodes at the same time. If I don't do it, startup will hang while
> fenced is starting up. I am using manual fencing. Probably the default
> configuration. The problem is that some of the nodes produce
> kernel-panics (when starting cman). so i would like to start the nodes
> one by one and test what the problem is.
>
> <fence_daemon post_fail_delay="5" post_join_delay="20"/>
> ...
> <clusternode name="eon" votes="1">
>  <fence>
>  <method name="1">
>  <device name="human" nodename="eon"/>
>  </method>
>  </fence>
> </clusternode>
> ...
> <fencedevices>
>  <fencedevice agent="fence_manual" name="human"/>
> </fencedevices>
>
> --
>
>
> "Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ...
> Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 24 Apr 2006 08:24:20 +0100
> From: Patrick Caulfield <pcaulfie at redhat.com>
> Subject: Re: [Linux-cluster] different subnets/ manual fencing
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <444C7D24.4090504 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> wolfgang pauli wrote:
> > Hi,
> >
> > I spent the whole day (sunday) trying to get this working...
> > I guess these two questions might solve the issue.
> >
> > 1. Can I have a cluster span over more than one subnet?
>
> Yes, but you'll need to configure it for multicas rather than broadcast - and
> make sure that any intervening routers are good enough.
>
> > 2. When I try to start the cluster software, I always have to start it on
> > all nodes at the same time. If I don't do it, startup will hang while
> > fenced is starting up. I am using manual fencing. Probably the default
> > configuration. The problem is that some of the nodes produce
> > kernel-panics (when starting cman).
>
> I'd like to see those please.
>
> so i would like to start the nodes
> > one by one and test what the problem is.
> >
> > <fence_daemon post_fail_delay="5" post_join_delay="20"/>
> > ...
> > <clusternode name="eon" votes="1">
> >  <fence>
> >   <method name="1">
> >   <device name="human" nodename="eon"/>
> >   </method>
> >  </fence>
> > </clusternode>
> > ...
> > <fencedevices>
> >  <fencedevice agent="fence_manual" name="human"/>
> > </fencedevices>
> >
>
>
> --
>
> patrick
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 24 Apr 2006 13:26:11 +0200
> From: Alain Moulle <Alain.Moulle at bull.net>
> Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ?
> To: linux-cluster at redhat.com
> Message-ID: <444CB5D3.5010408 at bull.net>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Thanks Jim, it was effectively the problem : the second step
> about managing the fence for each node was missing (but there
> is nothing in documentation about this step and dialog boxes ...)
>
> Another problem/question:
> when you have finished the 3 nodes cluster configuration,
> and Save the file in /etc/cluster/cluster.conf on local node,
> the Icon "Send to Cluster" is not available because the
> cs4 is not active at the moment. But with three nodes,
> even if you try to start the cs4 on this local node (where
> I've done the configuration) , it can't start alone because
> the cluster is not quorate ... and you can't start cman
> on other nodes, it seems that the start is failed because
> of no cluster.conf currently on the node.
>
> So, do we have to do manually the mkdir /etc/cluster on
> both other nodes, and scp of cluster.conf towards both nodes ?
> Or is there another tip via GUI to start the cs4 on three
> nodes despite two nodes have not yet any cluster.conf available ?
>
> Thanks
> Alain Moull?
>
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 24 Apr 2006 13:38:16 +0200
> From: Alain Moulle <Alain.Moulle at bull.net>
> Subject: [Linux-cluster] Re: CS4 Update 2 / GUI problem ?
> To: linux-cluster at redhat.com
> Message-ID: <444CB8A8.3060502 at bull.net>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Alain Moulle wrote:
> > Thanks Jim, it was effectively the problem : the second step
> > about managing the fence for each node was missing (but there
> > is nothing in documentation about this step and dialog boxes ...)
> >
> > Another problem/question:
> > when you have finished the 3 nodes cluster configuration,
> > and Save the file in /etc/cluster/cluster.conf on local node,
> > the Icon "Send to Cluster" is not available because the
> > cs4 is not active at the moment. But with three nodes,
> > even if you try to start the cs4 on this local node (where
> > I've done the configuration) , it can't start alone because
> > the cluster is not quorate ... and you can't start cman
> > on other nodes, it seems that the start is failed because
> > of no cluster.conf currently on the node.
> >
> > So, do we have to do manually the mkdir /etc/cluster on
> > both other nodes, and scp of cluster.conf towards both nodes ?
> > Or is there another tip via GUI to start the cs4 on three
> > nodes despite two nodes have not yet any cluster.conf available ?
> >
> > Thanks
> > Alain Moull?
>
> More information :
>
> in fact, when I started cman on nodes without cluster/cluster.conf,
> they effectuvely (as expected) got a cluster.conf from another node
> connected, but they take the cluster.conf from another HA pair cluster,
> not from the third node of this current cluster !!!!
> So that's why the start fails ...
> Which is the algorythm of search when a node has no cluster.conf available ?
>
> Thanks
> Alain Moull?
>
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 24 Apr 2006 08:57:13 -0400
> From: James Parsons <jparsons at redhat.com>
> Subject: Re: [Linux-cluster] Re: CS4 Update 2 / GUI problem ?
> To: linux clustering <linux-cluster at redhat.com>
> Message-ID: <444CCB29.9060109 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Alain Moulle wrote:
>
> >Thanks Jim, it was effectively the problem : the second step
> >about managing the fence for each node was missing (but there
> >is nothing in documentation about this step and dialog boxes ...)
> >
> >Another problem/question:
> >when you have finished the 3 nodes cluster configuration,
> >and Save the file in /etc/cluster/cluster.conf on local node,
> >the Icon "Send to Cluster" is not available because the
> >cs4 is not active at the moment. But with three nodes,
> >even if you try to start the cs4 on this local node (where
> >I've done the configuration) , it can't start alone because
> >the cluster is not quorate ... and you can't start cman
> >on other nodes, it seems that the start is failed because
> >of no cluster.conf currently on the node.
> >
> >So, do we have to do manually the mkdir /etc/cluster on
> >both other nodes, and scp of cluster.conf towards both nodes ?
> >Or is there another tip via GUI to start the cs4 on three
> >nodes despite two nodes have not yet any cluster.conf available ?
> >
> Yes, Alain. Currently, you must manually scp the cluster.conf to each
> node before starting the cluster. This requirement is considered
> unacceptable, however, and ease of cluster deployment is being
> aggressively pursued in two projects here.
>
> The first is an app called deploy-tool, which pulls down the necessary
> RPMs onto the machines desired as cluster nodes, and installs them AND
> copies a preliminary cluster.conf file to each node. Finally, it starts
> the cluster service daemons on each node.
>
> The second project with ease of cluster deployment as an important
> objective is the Conga project. It will provide a remote method for
> deploying clusters, configuring and monitoring them, and even adding and
> removing nodes. It will also offer remote storage management and a few
> other fun things as well.
>
> Conga and deploy-tool are both in active development, and deploy-tool is
> being beta tested now.
>
> Regards,
>
> -Jim
>
> >
> >Thanks
> >Alain Moull?
> >
> >
> >
> >
> >--
> >Linux-cluster mailing list
> >Linux-cluster at redhat.com
> >https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 24 Apr 2006 18:35:26 +0400
> From: <proftpd at rodriges.spb.ru>
> Subject: [Linux-cluster] GFS
> To: linux-cluster at redhat.com
> Message-ID: <web-16582530 at eltel.net>
> Content-Type: text/plain; charset="KOI8-R"
>
> Hello.
>
> I'm use GFS only for share one iSCSI target between 2
> initiators. I'm really don't need create a cluster among
> those two nodes. I'm install GFS,GFS-kernel,fence,ccs and
> others packages, install iSCSI-initiator and even create
> gfs-file system, but then i'm going to mount gfs i received
>
> [root at initiator src]# mount -t gfs /dev/sda1 /mnt/w/
> mount: Transport endpoint is not connected
>
> I haven't any idea about /etc/cluster/cluster.conf file -
> It's really necessary to create them. I'm simply want to
> share one iSCSI target between 2 hosts. Can i achive
> without creating a cluster???
>
>
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 24, Issue 35
> *********************************************
>



From alfeijoo at cesga.es  Tue Apr 25 06:53:53 2006
From: alfeijoo at cesga.es (Alejandro Feijoo)
Date: Tue, 25 Apr 2006 08:53:53 +0200 (CEST)
Subject: [Linux-cluster] quotas on GFS
Message-ID: <52100.193.144.44.59.1145948033.squirrel@webmail.cesga.es>

Hi,

there are any method to assgin global quotas on GFS, for example for all
users assing 6Gb for home?

or may i need edit all users?

Tanks!


++-------------------------++
Alejandro Feij?o Fraga
Tecnico de Sistemas.
Centro de supercomputaci?n de Galicia
Avda. de Vigo s/n. Campus Sur.
15705 - Santiago de Compostela. Spain
Tlfn.: 981 56 98 10 Extension: 216
Fax: 981 59 46 16




From nemanja at yu.net  Tue Apr 25 10:42:41 2006
From: nemanja at yu.net (Nemanja Miletic)
Date: Tue, 25 Apr 2006 12:42:41 +0200
Subject: [Linux-cluster] Re: problems with 8 node production gfs cluster
In-Reply-To: <4448E8E4.3000400@arnet.net.ar>
References: <1145266165.27997.57.camel@nemanja.eunet.yu>
	<1145288499.6000.15.camel@nemanja.eunet.yu>
	<20060418133704.GA16121@redhat.com>  <4448E8E4.3000400@arnet.net.ar>
Message-ID: <1145961761.30361.23.camel@nemanja.eunet.yu>

Well we applied the 'echo "0" >> /proc/cluster/lock_dlm/drop_count'
before mounting our GFS partitions. 

We also introduced another pop3 node in the cluster, installed imapproxy
on our webmail machine and made connections for pop3 and imap persistant
for 120 seconds on loadbalancers.

We did not turn on data journaling because most of files on filesystem
are not empty. 

At the moment the condition is stable. We will probably introduce
another node soon.

On Fri, 2006-04-21 at 11:15 -0300, German Staltari wrote:
> David Teigland wrote:
> > On Mon, Apr 17, 2006 at 05:41:39PM +0200, Nemanja Miletic wrote:
> >   
> >> Hi,
> >>
> >> Does anyone think that turning on journaling on files could help us
> >> speed up the access to gfs partition?
> >>
> >> This would be difficult because journaling can be turned on only on
> >> files that are empty. We have a large number of empty files of active
> >> users that download all their mail from pop3 server, so turning on
> >> jurnaling for them should be possible. 
> >>     
> >
> > Data journaling might help, it will speed up fsync(), but will increase
> > the i/o going to your storage.
> >
> >   
> >> What size should be the journals when file journaling is on?
> >>     
> >
> > Continue to use the default.
> >
> > Another thing you might try is disabling the drop-locks callback, allowing
> > GFS to cache more locks.  Do this before you mount:
> >   echo "0" >> /proc/cluster/lock_dlm/drop_count
> >
> >   
> Did you apply this changes? Could you share the results of this changes 
> in your configuration? Do you recommend it?
> Thanks
> German Staltari
> 
-- 
Nemanja Miletic, System Engineer
-----
YUnet International  http://www.EUnet.yu
Dubrovacka 35/III, 11000 Belgrade
Tel: +381 11 3305633;  Fax: +381 11 3282760
-----
This e-mail is confidential and intended only for the recipient.
Unauthorized distribution, modification or disclosure of its
contents is prohibited. If you have received this e-mail in error,
please notify the sender by telephone +381 11 3305633.




From rajiv.vaidyanath at ccur.com  Tue Apr 25 11:59:37 2006
From: rajiv.vaidyanath at ccur.com (Rajiv Vaidyanath)
Date: Tue, 25 Apr 2006 07:59:37 -0400
Subject: [Linux-cluster] cluster suite / Opteron
Message-ID: <1145966377.16894.30.camel@mouse>

Hi,

I get some compilation warnings on opteron (cluster-1.02.00)

Eg:
--------------------------------------------
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1595: warning: long unsigned
int format, uint64_t arg (arg 2)
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1596: warning: long unsigned
int format, uint64_t arg (arg 2)
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1598: warning: long unsigned
int format, uint64_t arg (arg 2)
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1599: warning: long unsigned
int format, uint64_t arg (arg 2)
--------------------------------------------

Can I safely ignore these warnings ?

Thanks,
Rajiv



From Bowie_Bailey at BUC.com  Tue Apr 25 14:11:09 2006
From: Bowie_Bailey at BUC.com (Bowie Bailey)
Date: Tue, 25 Apr 2006 10:11:09 -0400
Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 24, Issue 35
Message-ID: <4766EEE585A6D311ADF500E018C154E302133958@bnifex.cis.buc.com>

Rajesh Kanna wrote:
> dear sir,
> 
> I shall want to know about basic of linux-clustering  .
> 
> reg
> 
> P.Rajeshkanna

That's a rather open-ended question.  Check out the manuals and then
come back if you have some more specific questions.

https://www.redhat.com/docs/manuals/csgfs/

Also, don't forget to search the list archives.  Quite a bit of the
"how does the cluster work" type questions have been asked and
answered several times before on the list.

-- 
Bowie



From filipe.miranda at gmail.com  Tue Apr 25 16:11:13 2006
From: filipe.miranda at gmail.com (Filipe Miranda)
Date: Tue, 25 Apr 2006 13:11:13 -0300
Subject: [Linux-cluster] Postfix/Dovecot/GFS
Message-ID: <a6d13c780604250911s3971298bg561b29286d72c5b9@mail.gmail.com>

Hello,

We are gathering as much information possible to build a mail cluster using
RHEL4/GFS.

Could you guys help us out witht some questions?

1) Does Postfix/Dovecot is lock aware when using in conjuction with RH GFS
on a RHEL4 ?
2) Will I have to setup Postfix to use Maildir? or mbox can handle it?

Thank you,

--
Att.
---
Filipe Miranda
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060425/79ff9201/attachment.htm>

From ookami at gmx.de  Tue Apr 25 18:38:41 2006
From: ookami at gmx.de (Wolfgang Pauli)
Date: Tue, 25 Apr 2006 12:38:41 -0600
Subject: [Linux-cluster] multicast howto
Message-ID: <200604251238.41472.ookami@gmx.de>

Hi,

I am trying to setup gfs on a cluster that spans over two subnets. dream is a 
node with to interefaces, one on each subnet. I thought the below setup 
should work (taken from http://gfs.wikidev.net/Installation ). But it does 
not. Can anybody tell me what is wrong with that?

cheers,

wolfgang

<?xml version="1.0" ?>
<cluster config_version="2" name="alpha_cluster">
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="dream" votes="1">
			<altname name"dream-e1">
			<multicast addr="224.0.0.1" interface="eth0"/>
			<multicast addr="224.0.0.9" interface="eth1"/>
			<fence>
				<method name="1">
					<device name="human" nodename="dream"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="neo" votes="1">
			<multicast addr="224.0.0.1" interface="eth0"/>
			<fence>
				<method name="1">
					<device name="human" nodename="neo"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="node1" votes="1">
			<multicast addr="224.0.0.9" interface="eth0"/>
			<fence>
				<method name="1">
					<device name="human" nodename="node1"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1">
		<multicast addr="224.0.0.1"/>
		<multicast addr="224.0.0.9"/>
	</cman>
	<fencedevices>
		<fencedevice agent="fence_manual" name="human"/>
	</fencedevices>
	<rm>
		<failoverdomains/>
		<resources/>
	</rm>
</cluster>



From gforte at leopard.us.udel.edu  Tue Apr 25 18:51:49 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Tue, 25 Apr 2006 14:51:49 -0400
Subject: [Linux-cluster] multicast howto
In-Reply-To: <200604251238.41472.ookami@gmx.de>
References: <200604251238.41472.ookami@gmx.de>
Message-ID: <444E6FC5.6050202@leopard.us.udel.edu>

well, for starters, you've got three nodes but are using the two_node 
mode ... I'm pretty sure that won't work.

Also, I believe you need one multicast address that all the nodes 
communicate on - the multi-homed example given on that wiki page is 
intended for failover situations, not for "split-brain" networking ... I 
think.  And the addresses given there are just examples, you're going to 
need to explicitly configure your router(s) to send packets addressed to 
some multicast address that you assign for the cluster to the ports that 
the cluster nodes are attached to.

-g

Wolfgang Pauli wrote:
> Hi,
> 
> I am trying to setup gfs on a cluster that spans over two subnets. dream is a 
> node with to interefaces, one on each subnet. I thought the below setup 
> should work (taken from http://gfs.wikidev.net/Installation ). But it does 
> not. Can anybody tell me what is wrong with that?
> 
> cheers,
> 
> wolfgang
> 
> <?xml version="1.0" ?>
> <cluster config_version="2" name="alpha_cluster">
> 	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
> 	<clusternodes>
> 		<clusternode name="dream" votes="1">
> 			<altname name"dream-e1">
> 			<multicast addr="224.0.0.1" interface="eth0"/>
> 			<multicast addr="224.0.0.9" interface="eth1"/>
> 			<fence>
> 				<method name="1">
> 					<device name="human" nodename="dream"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 		<clusternode name="neo" votes="1">
> 			<multicast addr="224.0.0.1" interface="eth0"/>
> 			<fence>
> 				<method name="1">
> 					<device name="human" nodename="neo"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 		<clusternode name="node1" votes="1">
> 			<multicast addr="224.0.0.9" interface="eth0"/>
> 			<fence>
> 				<method name="1">
> 					<device name="human" nodename="node1"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 	</clusternodes>
> 	<cman expected_votes="1" two_node="1">
> 		<multicast addr="224.0.0.1"/>
> 		<multicast addr="224.0.0.9"/>
> 	</cman>
> 	<fencedevices>
> 		<fencedevice agent="fence_manual" name="human"/>
> 	</fencedevices>
> 	<rm>
> 		<failoverdomains/>
> 		<resources/>
> 	</rm>
> </cluster>
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



From rick at espresolutions.com  Tue Apr 25 17:17:35 2006
From: rick at espresolutions.com (Rick Bansal)
Date: Tue, 25 Apr 2006 12:17:35 -0500
Subject: [Linux-cluster] mysql and redhat cluster suite?
Message-ID: <200604251912.k3PJCgMs014563@mx3.redhat.com>

Did anyone successfully get multiple mysql daemons to run against a shared
data store per Vladimir Grujic suggestion (post Mon,  19 Dec 2005)?  I have
not been able to as yet.

I'm using mysql 4.1 and cannot turn on external-locking.  It looks like the
option has been complied out in the binaries I have.  I'm currently trying
to rebuild from source with the "skip-locking" option removed.  I'll see if
that helps.

If anyone has successfully gotten multiple mysql daemons transacting against
a shared data store, I'd greatly appreciate any advise.  Thanks in advance.

Regards,
Rick Bansal




From kjalleda at gmail.com  Tue Apr 25 22:21:08 2006
From: kjalleda at gmail.com (Kishore Jalleda)
Date: Tue, 25 Apr 2006 18:21:08 -0400
Subject: [Linux-cluster] mysql and redhat cluster suite?
In-Reply-To: <200604251912.k3PJCgMs014563@mx3.redhat.com>
References: <200604251912.k3PJCgMs014563@mx3.redhat.com>
Message-ID: <78aaf6710604251521m51f9e53o3b6f96cfc4d0cc12@mail.gmail.com>

You can only acheive this using Mysql Cluster, if you are talking about
multiple mysql daemons using a shared data store, then I don't think you can
acheive this using traditional mysql storage engines,

Also I am just curious to know why do u need this kind of a setup ?? don't
get confused with the Redhat Cluster suite's shared storage which uses GFS,
and the locking is taken care of by the GFS, where multiple servers can
read/write to a shared storage without worrying about conflicts/locking
etc.

Mysql Cluster suite is very analogous  to Redhat Cluster suite in the
sense/intention that multiple nodes/daemons/instances can
write simultaneously to a shared data store, with the difference that Mysql
cluster suite is based on a shared nothing architecture, which has many SQLD
nodes (aka servers in redhat) with data on multiple NDBD nodes (aka shared
storage in redhat)

Hope this helps

Kishore Jalleda
http://kjalleda.googlepages.com/projects

On 4/25/06, Rick Bansal <rick at espresolutions.com> wrote:
>
> Did anyone successfully get multiple mysql daemons to run against a shared
> data store per Vladimir Grujic suggestion (post Mon,  19 Dec 2005)?  I
> have
> not been able to as yet.
>
> I'm using mysql 4.1 and cannot turn on external-locking.  It looks like
> the
> option has been complied out in the binaries I have.  I'm currently trying
> to rebuild from source with the "skip-locking" option removed.  I'll see
> if
> that helps.
>
> If anyone has successfully gotten multiple mysql daemons transacting
> against
> a shared data store, I'd greatly appreciate any advise.  Thanks in
> advance.
>
> Regards,
> Rick Bansal
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060425/53a158d9/attachment.htm>

From pcaulfie at redhat.com  Wed Apr 26 07:39:05 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 26 Apr 2006 08:39:05 +0100
Subject: [Linux-cluster] multicast howto
In-Reply-To: <200604251238.41472.ookami@gmx.de>
References: <200604251238.41472.ookami@gmx.de>
Message-ID: <444F2399.6020809@redhat.com>

Wolfgang Pauli wrote:
> Hi,
> 
> I am trying to setup gfs on a cluster that spans over two subnets. dream is a 
> node with to interefaces, one on each subnet. I thought the below setup 
> should work (taken from http://gfs.wikidev.net/Installation ). But it does 
> not. Can anybody tell me what is wrong with that?

Your multicast address entries are all to cock.

They should be the same for /all/ nodes, /and/ for the cman entry. Anwyay,
multi-home in CMAN isn't supported by the DLM so you must only specify one
multicast address and use ethernet bonding to get multi-path.

-- 

patrick



From sander at elexis.nl  Wed Apr 26 11:50:43 2006
From: sander at elexis.nl (Sander van Beek - Elexis)
Date: Wed, 26 Apr 2006 13:50:43 +0200
Subject: [Linux-cluster] Which 2.6 kernel and cluster tarball will compile
	together?
Message-ID: <7.0.1.0.0.20060426134654.00a2fec0@elexis.nl>

Hi all,

I'm trying to get GFS working on Slackware 10.2
Tried to compile the sources by hand, but I cannot find any working 
kernel/cluster combo that will compile.
Tried 2.6.9 & cluster-1.02.00, 2.6.12.2 & cluster-1.02.00, the latest 
2.6.16 & cluster-1.02.00 and the latest cluster CVS release. But none 
of these will compile together. Can anyone recommend me a working combination?


With best regards,
Sander van Beek

---------------------------------------

Ing. S. van Beek
Elexis
Marketing 9
6921 RE Duiven
The Netherlands

Tel:    +31 (0)26 7110329
Mob:    +31 (0)6 28395109
Fax:    +31 (0)318 611112
Email: sander at elexis.nl
Web:    http://www.elexis.nl



From sander at elexis.nl  Wed Apr 26 11:59:32 2006
From: sander at elexis.nl (Sander van Beek - Elexis)
Date: Wed, 26 Apr 2006 13:59:32 +0200
Subject: [Linux-cluster] MySQL on GFS benchmarks
Message-ID: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>

Hi all,

We did a quick benchmark on our 2 node rhel4 testcluster with gfs and 
a gnbd storage server. The results were very sad. One of the nodes 
(p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when 
running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node 
GFS over GNBD setup and inserts on both nodes at the same time, we 
only could do 80 inserts per second. I'm very interested in the 
perfomance others got in a similar setup. Would the performance 
increase when we use software based iscsi instead of gnbd?
Or should we simply buy SAN equipment? Does anyone have statistics to 
compare a standalone mysql setup to a small gfs cluster using a san?


With best regards,
Sander van Beek

---------------------------------------

Ing. S. van Beek
Elexis
Marketing 9
6921 RE Duiven
The Netherlands

Tel:    +31 (0)26 7110329
Mob:    +31 (0)6 28395109
Fax:    +31 (0)318 611112
Email: sander at elexis.nl
Web:    http://www.elexis.nl



From pcaulfie at redhat.com  Wed Apr 26 12:00:15 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 26 Apr 2006 13:00:15 +0100
Subject: [Linux-cluster] Which 2.6 kernel and cluster tarball will compile
	together?
In-Reply-To: <7.0.1.0.0.20060426134654.00a2fec0@elexis.nl>
References: <7.0.1.0.0.20060426134654.00a2fec0@elexis.nl>
Message-ID: <444F60CF.3090409@redhat.com>

Sander van Beek - Elexis wrote:
> Hi all,
> 
> I'm trying to get GFS working on Slackware 10.2
> Tried to compile the sources by hand, but I cannot find any working
> kernel/cluster combo that will compile.
> Tried 2.6.9 & cluster-1.02.00, 2.6.12.2 & cluster-1.02.00, the latest
> 2.6.16 & cluster-1.02.00 and the latest cluster CVS release. But none of
> these will compile together. Can anyone recommend me a working combination?
> 

kernel 2.6.16 & cluster 1.02.00 (or CVS -rSTABLE) should compile. That's what
I'm using here.

-- 

patrick



From marco.lusini at governo.it  Wed Apr 26 12:07:11 2006
From: marco.lusini at governo.it (Marco Lusini)
Date: Wed, 26 Apr 2006 14:07:11 +0200
Subject: R: [Linux-cluster] multicast howto
In-Reply-To: <444F2399.6020809@redhat.com>
Message-ID: <00d701c66929$f2bdf9f0$8ec9100a@nicchio>


> 
> They should be the same for /all/ nodes, /and/ for the cman 
> entry. Anwyay, multi-home in CMAN isn't supported by the DLM 
> so you must only specify one multicast address and use 
> ethernet bonding to get multi-path.
> 

Since I am not using GFS, but just CS4, is it safe to use 
<altname> to run heartbeat on multiple interfaces?

TIA,

Marco Lusini


_______________________________________________________
Messaggio analizzato e protetto da tecnologia antivirus

Servizio erogato dal sistema informativo della 
Presidenza del Consiglio dei Ministri



From pcaulfie at redhat.com  Wed Apr 26 12:36:09 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 26 Apr 2006 13:36:09 +0100
Subject: R: [Linux-cluster] multicast howto
In-Reply-To: <00d701c66929$f2bdf9f0$8ec9100a@nicchio>
References: <00d701c66929$f2bdf9f0$8ec9100a@nicchio>
Message-ID: <444F6939.9000804@redhat.com>

Marco Lusini wrote:
>> They should be the same for /all/ nodes, /and/ for the cman 
>> entry. Anwyay, multi-home in CMAN isn't supported by the DLM 
>> so you must only specify one multicast address and use 
>> ethernet bonding to get multi-path.
>>
> 
> Since I am not using GFS, but just CS4, is it safe to use 
> <altname> to run heartbeat on multiple interfaces?

It should be. Because of the DLM shortcomings it hasn't been tested for a
while though.

-- 

patrick



From rajiv.vaidyanath at ccur.com  Mon Apr 24 17:28:46 2006
From: rajiv.vaidyanath at ccur.com (Rajiv Vaidyanath)
Date: Mon, 24 Apr 2006 13:28:46 -0400
Subject: [Linux-cluster] cluster suite / opteron
Message-ID: <1145899726.16894.28.camel@mouse>

Hi,

I get some compilation warnings on opteron (cluster-1.02.00)

Eg:
--------------------------------------------
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1595: warning: long unsigned
int format, uint64_t arg (arg 2)
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1596: warning: long unsigned
int format, uint64_t arg (arg 2)
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1598: warning: long unsigned
int format, uint64_t arg (arg 2)
drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1599: warning: long unsigned
int format, uint64_t arg (arg 2)
--------------------------------------------

Can I safely ignore these warnings ?

Thanks,
Rajiv



From pauli at grey.colorado.edu  Mon Apr 24 17:11:45 2006
From: pauli at grey.colorado.edu (Wolfgang Pauli)
Date: Mon, 24 Apr 2006 11:11:45 -0600
Subject: [Linux-cluster] different subnets/ manual fencing
In-Reply-To: <444C7D24.4090504@redhat.com>
References: <30320.1145851164@www075.gmx.net> <444C7D24.4090504@redhat.com>
Message-ID: <200604241111.45417.pauli@grey.colorado.edu>

> Yes, but you'll need to configure it for multicas rather than broadcast -
> and make sure that any intervening routers are good enough.

That is good news. So we have the head node (dream) with two ethernet cards. 
We want it to serve a GFS partition to two different subnets. I guess this is 
than also doable with multicast, right?

> I'd like to see those please.
>
Thanks! I attached a text file (myoops.txt). I hope it helps. I can't tell, 
because I never really had to deal with kernel panics before...

Thanks again,

wolfgang

P.S.: Apr 23 16:15:10 node15 kernel: Linux version 2.6.15-1.1833_FC4smp 
(bhcompile at hs20-bc1-1.build.redhat.com) (gcc version 4.0.2 20051125 (Red Hat 
4.0.2-8)) #1 SMP Wed Mar 1 23:56:51 EST 2006
-------------- next part --------------
Apr 23 14:03:59 node15 ccsd[2367]: Starting ccsd 1.0.0:
Apr 23 14:03:59 node15 ccsd[2367]:  Built: Jun 16 2005 10:45:39
Apr 23 14:03:59 node15 ccsd[2367]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Apr 23 14:04:00 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar  7 2006 15:36:41) installed
Apr 23 14:04:00 node15 kernel: NET: Registered protocol family 30
Apr 23 14:04:00 node15 ccsd[2367]: cluster.conf (cluster name = oreilly_cluster, version = 33) found.
Apr 23 14:04:03 node15 kernel: CMAN: Waiting to join or form a Linux-cluster
Apr 23 14:04:03 node15 ccsd[2367]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Apr 23 14:04:03 node15 ccsd[2367]: Initial status:: Inquorate
Apr 23 14:04:32 node15 kernel: CMAN: sending membership request
Apr 23 14:04:53 node15 last message repeated 19 times
Apr 23 14:04:54 node15 kernel: CMAN: got node node27
Apr 23 14:04:54 node15 kernel: CMAN: got node node17
Apr 23 14:04:54 node15 kernel: CMAN: got node node16
Apr 23 14:04:54 node15 kernel: CMAN: got node node24
Apr 23 14:04:54 node15 kernel: CMAN: got node node1
Apr 23 14:04:54 node15 kernel: CMAN: got node node2
Apr 23 14:04:54 node15 kernel: CMAN: got node node23
Apr 23 14:04:54 node15 kernel: CMAN: got node node6
Apr 23 14:04:54 node15 kernel: CMAN: got node node10
Apr 23 14:04:54 node15 kernel: CMAN: got node node9
Apr 23 14:05:01 node15 kernel: CMAN: quorum regained, resuming activity
Apr 23 14:05:01 node15 ccsd[2367]: Cluster is quorate.  Allowing connections.
Apr 23 14:05:01 node15 kernel: dlm: no version for "struct_module" found: kernel tainted.
Apr 23 14:05:01 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar  7 2006 15:42:37) installed
Apr 23 14:07:28 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 23 14:07:28 node15 kernel:  printing eip:
Apr 23 14:07:28 node15 kernel: f8acfa39
Apr 23 14:07:28 node15 kernel: *pde = 37d1f001
Apr 23 14:07:28 node15 kernel: Oops: 0000 [#1]
Apr 23 14:07:28 node15 kernel: SMP
Apr 23 14:07:28 node15 kernel: last sysfs file: /class/misc/dlm-control/dev
Apr 23 14:07:28 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 nfs lockd nfs_acl sunrpc dm_mod eepro100 uhci_hcd hw_ra
ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd
Apr 23 14:07:28 node15 kernel: CPU:    0
Apr 23 14:07:28 node15 kernel: EIP:    0060:[<f8acfa39>]    Tainted: GF     VLI
Apr 23 14:07:28 node15 kernel: EFLAGS: 00010207   (2.6.15-1.1833_FC4smp)
Apr 23 14:07:28 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman]
Apr 23 14:07:28 node15 kernel: eax: 00000046   ebx: c1d7aeb7   ecx: 00000011   edx: f68b0fa0
Apr 23 14:07:28 node15 kernel: esi: 00000000   edi: c1d7aeb7   ebp: 00000046   esp: f68b0ec8
Apr 23 14:07:28 node15 kernel: ds: 007b   es: 007b   ss: 0068
Apr 23 14:07:28 node15 kernel: Process cman_comms (pid: 2392, threadinfo=f68b0000 task=f7d85550)
Apr 23 14:07:28 node15 kernel: Stack: badc0ded f6846380 00000000 f7386400 f68b0f74 f8acfbb3 00000100 00000002
Apr 23 14:07:28 node15 kernel:        00000040 f731e000 f7decb40 f731e001 00000001 00000001 f6cea440 f8ad002d
Apr 23 14:07:28 node15 kernel:        f68b0f90 00000001 000002fd c1b091e0 f68b0f90 f68b0f74 f7decb40 c1eb4940
Apr 23 14:07:28 node15 kernel: Call Trace:
Apr 23 14:07:28 node15 kernel:  [<f8acfbb3>] send_to_user_port+0x159/0x3cc [cman]     [<f8ad002d>] process_incoming_packet+0x207/0x26c [cman]
Apr 23 14:07:29 node15 kernel:  [<f8acf188>] receive_message+0xb7/0xe0 [cman]     [<f8acf33c>] cluster_kthread+0x18b/0x39f [cman]
Apr 23 14:07:29 node15 kernel:  [<c01205ee>] default_wake_function+0x0/0xc     [<f8acf1b1>] cluster_kthread+0x0/0x39f [cman]
Apr 23 14:07:29 node15 kernel:  [<c010243d>] kernel_thread_helper+0x5/0xb
Apr 23 14:07:29 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7
2 f8 89 df <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc
Apr 23 14:07:29 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. ^MContinui
ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111
seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds.
Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing
 in 100 seconds. ^M<4>hda: dma_timer_expiry: dma status == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing i
n 95 seconds. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds.
hda: DMA interrupt recovery
Apr 23 14:07:29 node15 kernel: hda: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing i
n 85 seconds. ^MContinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 80 seconds.
Continuing in 79 seconds. ^MContinuing in 78 seconds. ^MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MContinuing in 7
4 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^MContinuing in 69 seconds. ^M<4
>hda: dma_timer_expiry: dma status == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing i
n 64 seconds. ^MContinuing in 63 seconds. ^MContinuing in 62 seconds. ^MContinuing in 61 seconds. ^M<6>ide-cd: cmd 0x3 timed out
Apr 23 14:07:29 node15 kernel: hdc: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 60 seconds. ^MContinuing in 59 seconds. ^Mhda: DMA interrupt recovery
Apr 23 14:07:29 node15 kernel: hda: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 58 seconds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. ^MContinuing in 55 seconds. ^MContinuing i
n 54 seconds. ^MContinuing in 53 seconds. ^MContinuing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds.
Continuing in 48 seconds. ^MContinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 4
3 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^M<4>hda: dma_timer_expiry: dma
status == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 38 seconds. ^MContinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing i
n 34 seconds. ^MContinuing in 33 seconds. ^MContinuing in 32 seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds.
hda: DMA interrupt recovery
Apr 23 14:07:29 node15 kernel: hda: lost interrupt
Apr 23 14:07:29 node15 kernel: Continuing in 28 seconds. ^MContinuing in 27 seconds. ^MContinuing in 26 seconds. ^MContinuing in 25 seconds. ^MContinuing i
n 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 seconds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds.
Continuing in 18 seconds. ^MContinuing in 17 seconds. ^MContinuing in 16 seconds. ^MContinuing in 15 seconds. ^MContinuing in 14 seconds. ^MContinuing in 1
3 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds. ^MContinuing in 10 seconds. ^MContinuing in 9 seconds. ^M<4>hda: dma_timer_expiry: dma s
tatus == 0x24
Apr 23 14:07:29 node15 kernel: Continuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 seconds. ^MContinuing in 5 seconds. ^MContinuing in 4
seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds.
Apr 23 14:07:29 node15 kernel:  <0>Fatal exception: panic in 5 seconds


# -----------------------------------------------

Apr 23 16:07:04 node15 ccsd[2373]: Starting ccsd 1.0.0:
Apr 23 16:07:04 node15 ccsd[2373]:  Built: Jun 16 2005 10:45:39
Apr 23 16:07:04 node15 ccsd[2373]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Apr 23 16:07:05 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar  7 2006 15:36:41) installed
Apr 23 16:07:05 node15 kernel: NET: Registered protocol family 30
Apr 23 16:07:05 node15 ccsd[2373]: cluster.conf (cluster name = oreilly_cluster, version = 33) found.
Apr 23 16:07:14 node15 kernel: CMAN: Waiting to join or form a Linux-cluster
Apr 23 16:07:14 node15 ccsd[2373]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Apr 23 16:07:14 node15 ccsd[2373]: Initial status:: Inquorate
Apr 23 16:07:17 node15 kernel: CMAN: sending membership request
Apr 23 16:07:37 node15 last message repeated 27 times
Apr 23 16:07:38 node15 kernel: CMAN: got node node2
Apr 23 16:07:38 node15 kernel: CMAN: got node node26
Apr 23 16:07:38 node15 kernel: CMAN: got node node6
Apr 23 16:07:38 node15 kernel: CMAN: got node node27
Apr 23 16:07:38 node15 kernel: CMAN: got node node4
Apr 23 16:07:38 node15 kernel: CMAN: got node node5
Apr 23 16:07:38 node15 kernel: CMAN: got node node17
Apr 23 16:07:38 node15 kernel: CMAN: got node node3
Apr 23 16:07:38 node15 kernel: CMAN: got node node18
Apr 23 16:07:38 node15 kernel: CMAN: got node node16
Apr 23 16:07:38 node15 kernel: CMAN: got node node23
Apr 23 16:07:38 node15 kernel: CMAN: got node node12
Apr 23 16:07:38 node15 kernel: CMAN: got node node7
Apr 23 16:07:38 node15 ccsd[2373]: Cluster is quorate.  Allowing connections.
Apr 23 16:07:38 node15 kernel: CMAN: got node dream
Apr 23 16:07:38 node15 kernel: CMAN: got node node20
Apr 23 16:07:38 node15 kernel: CMAN: quorum regained, resuming activity
Apr 23 16:07:38 node15 kernel: dlm: no version for "struct_module" found: kernel tainted.
Apr 23 16:07:38 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar  7 2006 15:42:37) installed
Apr 23 16:10:06 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 23 16:10:06 node15 kernel:  printing eip:
Apr 23 16:10:06 node15 kernel: f8a85a39
Apr 23 16:10:06 node15 kernel: *pde = 363b4001
Apr 23 16:10:06 node15 kernel: Oops: 0000 [#1]
Apr 23 16:10:06 node15 kernel: SMP
Apr 23 16:10:06 node15 kernel: last sysfs file: /class/misc/dlm-control/dev
Apr 23 16:10:06 node15 kernel: Modules linked in: dlm(U) cman(U) ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_random i8xx_tco i2c_
i801 i2c_core e1000 e100 mii floppy ext3 jbd
Apr 23 16:10:06 node15 kernel: CPU:    0
Apr 23 16:10:06 node15 kernel: EIP:    0060:[<f8a85a39>]    Tainted: GF     VLI
Apr 23 16:10:06 node15 kernel: EFLAGS: 00010202   (2.6.15-1.1833_FC4smp)
Apr 23 16:10:06 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman]
Apr 23 16:10:06 node15 kernel: eax: 00000040   ebx: c1db9eba   ecx: 00000010   edx: f6344fa0
Apr 23 16:10:06 node15 kernel: esi: 00000000   edi: c1db9eba   ebp: 00000040   esp: f6344ec8
Apr 23 16:10:06 node15 kernel: ds: 007b   es: 007b   ss: 0068
Apr 23 16:10:06 node15 kernel: Process cman_comms (pid: 2399, threadinfo=f6344000 task=c1e73aa0)
Apr 23 16:10:06 node15 kernel: Stack: badc0ded f63d5a80 00000000 f6000a00 f6344f74 f8a85bb3 00000100 00000002
Apr 23 16:10:06 node15 kernel:        00000040 f66a9800 f6319a40 f66a9801 00000001 00000001 f6383cc0 f8a8602d
Apr 23 16:10:06 node15 kernel:        f6344f90 00000001 000002fa c1b091e0 f6344f90 f6344f74 f6319a40 f7dc1b80
Apr 23 16:10:06 node15 kernel: Call Trace:
Apr 23 16:10:06 node15 kernel:  [<f8a85bb3>] send_to_user_port+0x159/0x3cc [cman]     [<f8a8602d>] process_incoming_packet+0x207/0x26c [cman]
Apr 23 16:10:06 node15 kernel:  [<f8a85188>] receive_message+0xb7/0xe0 [cman]     [<f8a8533c>] cluster_kthread+0x18b/0x39f [cman]
Apr 23 16:10:06 node15 kernel:  [<c01205ee>] default_wake_function+0x0/0xc     [<f8a851b1>] cluster_kthread+0x0/0x39f [cman]
Apr 23 16:10:06 node15 kernel:  [<c010243d>] kernel_thread_helper+0x5/0xb
Apr 23 16:10:06 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7
2 f8 89 df <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc
Apr 23 16:10:06 node15 kernel:  <0>Fatal exception: panic in 5 seconds





# -----------------------------------------------

Apr 23 18:05:33 node15 ccsd[3356]: Starting ccsd 1.0.0:
Apr 23 18:05:33 node15 ccsd[3356]:  Built: Jun 16 2005 10:45:39
Apr 23 18:05:33 node15 ccsd[3356]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Apr 23 18:05:34 node15 kernel: CMAN 2.6.11.5-20050601.152643.FC4.23 (built Mar  7 2006 15:36:41) installed
Apr 23 18:05:34 node15 kernel: NET: Registered protocol family 30
Apr 23 18:05:34 node15 ccsd[3356]: cluster.conf (cluster name = oreilly_cluster, version = 35) found.
Apr 23 18:05:35 node15 kernel: CMAN: Waiting to join or form a Linux-cluster
Apr 23 18:05:36 node15 ccsd[3356]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.2
Apr 23 18:05:36 node15 ccsd[3356]: Initial status:: Inquorate
Apr 23 18:05:36 node15 kernel: CMAN: sending membership request
Apr 23 18:05:36 node15 kernel: CMAN: got node dream
Apr 23 18:06:13 node15 kernel: CMAN: quorum regained, resuming activity
Apr 23 18:06:13 node15 ccsd[3356]: Cluster is quorate.  Allowing connections.
Apr 23 18:06:13 node15 kernel: dlm: no version for "struct_module" found: kernel tainted.
Apr 23 18:06:13 node15 kernel: DLM 2.6.11.5-20050601.152643.FC4.22 (built Mar  7 2006 15:42:37) installed
Apr 23 18:06:23 node15 kernel: CMAN: node node1 rejoining
Apr 23 18:06:28 node15 last message repeated 3 times
Apr 23 18:06:32 node15 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Apr 23 18:06:32 node15 kernel:  printing eip:
Apr 23 18:06:32 node15 kernel: f8a85a39
Apr 23 18:06:32 node15 kernel: *pde = 37e89001
Apr 23 18:06:32 node15 kernel: Oops: 0000 [#1]
Apr 23 18:06:32 node15 kernel: SMP
Apr 23 18:06:32 node15 kernel: last sysfs file: /class/misc/dlm-control/dev
Apr 23 18:06:32 node15 kernel: Modules linked in: dlm(U) cman(U) nfs lockd nfs_acl ipv6 parport_pc lp parport autofs4 sunrpc dm_mod eepro100 uhci_hcd hw_ra
ndom i8xx_tco i2c_i801 i2c_core e1000 e100 mii floppy ext3 jbd
Apr 23 18:06:32 node15 kernel: CPU:    0
Apr 23 18:06:32 node15 kernel: EIP:    0060:[<f8a85a39>]    Tainted: GF     VLI
Apr 23 18:06:32 node15 kernel: EFLAGS: 00010203   (2.6.15-1.1833_FC4smp)
Apr 23 18:06:32 node15 kernel: EIP is at memcpy_fromkvec+0x2e/0x4f [cman]
Apr 23 18:06:32 node15 kernel: eax: 00000042   ebx: c1ffcab9   ecx: 00000010   edx: f5c22fa0
Apr 23 18:06:32 node15 kernel: esi: 00000000   edi: c1ffcab9   ebp: 00000042   esp: f5c22ec8
Apr 23 18:06:32 node15 kernel: ds: 007b   es: 007b   ss: 0068
Apr 23 18:06:32 node15 kernel: Process cman_comms (pid: 3387, threadinfo=f5c22000 task=c1e50000)
Apr 23 18:06:33 node15 kernel: Stack: badc0ded f7fd6d80 00000000 f6462000 f5c22f74 f8a85bb3 00000100 00000002
Apr 23 18:06:33 node15 kernel:        00000040 f66b1000 f5caf9c0 f66b1001 00000001 00000001 f5caf840 f8a8602d
Apr 23 18:06:33 node15 kernel:        f5c22f90 00000001 000002fb c1b091e0 f5c22f90 f5c22f74 f5caf9c0 c1e6d100
Apr 23 18:06:33 node15 kernel: Call Trace:
Apr 23 18:06:33 node15 kernel:  [<f8a85bb3>] send_to_user_port+0x159/0x3cc [cman]     [<f8a8602d>] process_incoming_packet+0x207/0x26c [cman]
Apr 23 18:06:33 node15 kernel:  [<f8a85188>] receive_message+0xb7/0xe0 [cman]     [<f8a8533c>] cluster_kthread+0x18b/0x39f [cman]
Apr 23 18:06:33 node15 kernel:  [<c01205ee>] default_wake_function+0x0/0xc     [<f8a851b1>] cluster_kthread+0x0/0x39f [cman]
Apr 23 18:06:33 node15 kernel:  [<c010243d>] kernel_thread_helper+0x5/0xb
Apr 23 18:06:33 node15 kernel: Code: 53 89 c3 89 cd 85 c9 7e 3e 83 c2 08 eb 07 83 c2 08 85 ed 7e 32 8b 42 fc 85 c0 74 f2 39 e8 0f 47 c5 89 c1 c1 e9 02 8b 7
2 f8 89 df <f3> a5 89 c1 83 e1 03 74 02 f3 a4 29 c5 01 c3 01 42 f8 29 42 fc
Apr 23 18:06:33 node15 kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. ^MContinui
ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111
seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds.
Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing
 in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds
. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^MContinuing
in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo
Apr 23 18:06:33 node15 kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 8
ng in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111
seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds.
Continuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing
 in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds
. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^MContinuing
in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo
Apr 23 18:06:33 node15 kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 8
0 seconds. ^MContinuing in 79 seconds. ^MContinuing in 78 seconds. ^MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MCo
ntinuing in 74 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^MContinuing in 69
seconds. ^MContinuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing in 64 seconds. ^MCont
inuing in 63 seconds. ^MContinuing in 62 seconds. ^MContinuing in 61 seconds. ^MContinuing in 60 seconds. ^MContinuing in 59 seconds. ^MContinuing in 58 se
conds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. ^MContinuing in 55 seconds. ^MContinuing in 54 seconds. ^MContinuing in 53 seconds. ^MContin
uing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds. ^MContinuing in 48 seconds.
Apr 23 18:06:33 node15 kernel: tinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 4
3 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^MContinuing in 38 seconds. ^MCo
ntinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing in 34 seconds. ^MContinuing in 33 seconds. ^MContinuing in 32
seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds. ^MContinuing in 28 seconds. ^MContinuing in 27 seconds. ^MCont
inuing in 26 seconds. ^MContinuing in 25 seconds. ^MContinuing in 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 se
conds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds. ^MContinuing in 18 seconds. ^MContinuing in 17 seconds. ^MContinuing in 16 seconds. ^MContin
uing in 15 seconds. ^MContinuing in 14 seconds. ^MContinuing in 13 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds.
Apr 23 18:06:33 node15 kernel: tinuing in 10 seconds. ^MContinuing in 9 seconds. ^MContinuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 se
conds. ^MContinuing in 5 seconds. ^MContinuing in 4 seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds.
Apr 23 18:06:33 node15 kernel:  <0>Fatal exception: panic in 5 seconds

From darrenf at jammicron.com  Tue Apr 25 18:02:51 2006
From: darrenf at jammicron.com (Darren Fraser)
Date: Tue, 25 Apr 2006 11:02:51 -0700
Subject: [Linux-cluster] Re: Linux (qmail) clustering
References: slrne4mn9e.ipo.mykleb@99RXZYP.ibm.com
Message-ID: <444E644B.3010901@jammicron.com>

If this is a high volume mail server, GFS and qmail are not going to 
work nicely together (at least they didn't in my experience). I had a 
qmail server running on a two node cluster with about 300 virtual 
domains. Load on each node would spiral out of control until I dropped 
one of the machines out of the cluster. I've had success with GFS and 
other services (i.e. ftp and web) but just not with qmail.

After googling around some, it appears to be the "NFS safeness" in how 
qmail delivers mail (see 
http://www.redhat.com/archives/linux-cluster/2005-September/msg00220.html) 
that ruins performance on GFS.

If this diagnosis is incorrect I'd love to be straightened out because 
my plan for a load balanced, fault tolerant qmail server had to be 
scrapped a couple of months back.

Cheers,
Darren



On 2006-04-23 10:55, Jan-Frode Myklebust wrote:
 > On 2006-04-11, Haydar Akpinar wrote:
 > >
 > > I would like to know if it is possible to do and also if any one 
has done
 > > qmail clustering on a Linux box.
 >
 > Since qmail is Maildir based (no locking problems to worry about), I 
think
 > this should be fairly easy to do. You'll just need to decide which
 > directories needs to be shared, and which needs to be private to each 
node.
 > It will probably be enough to have the home directories on a shared 
storage
 > (GFS or simply just NFS), and just do load balancing by equal MX record
 > priorities.
 >
 >
 >
 > --
 > Linux-cluster mailing list
 > Linux-cluster@???
 > https://www.redhat.com/mailman/listinfo/linux-cluster
 >



From sdake at redhat.com  Tue Apr 25 18:42:29 2006
From: sdake at redhat.com (Steven Dake)
Date: Tue, 25 Apr 2006 11:42:29 -0700
Subject: [Linux-cluster] multicast howto
In-Reply-To: <200604251238.41472.ookami@gmx.de>
References: <200604251238.41472.ookami@gmx.de>
Message-ID: <1145990549.6075.119.camel@shih.broked.org>


On Tue, 2006-04-25 at 12:38 -0600, Wolfgang Pauli wrote:
> Hi,
> 
> I am trying to setup gfs on a cluster that spans over two subnets. dream is a 
> node with to interefaces, one on each subnet. I thought the below setup 
> should work (taken from http://gfs.wikidev.net/Installation ). But it does 
> not. Can anybody tell me what is wrong with that?
> 
> cheers,
> 
> wolfgang
> 
> <?xml version="1.0" ?>
> <cluster config_version="2" name="alpha_cluster">
> 	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
> 	<clusternodes>
> 		<clusternode name="dream" votes="1">
> 			<altname name"dream-e1">
> 			<multicast addr="224.0.0.1" interface="eth0"/>
> 			<multicast addr="224.0.0.9" interface="eth1"/>
> 			<fence>
> 				<method name="1">
> 					<device name="human" nodename="dream"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 		<clusternode name="neo" votes="1">
> 			<multicast addr="224.0.0.1" interface="eth0"/>
> 			<fence>
> 				<method name="1">
> 					<device name="human" nodename="neo"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 		<clusternode name="node1" votes="1">
> 			<multicast addr="224.0.0.9" interface="eth0"/>
> 			<fence>
> 				<method name="1">
> 					<device name="human" nodename="node1"/>
> 				</method>
> 			</fence>
> 		</clusternode>
> 	</clusternodes>
> 	<cman expected_votes="1" two_node="1">
> 		<multicast addr="224.0.0.1"/>
> 		<multicast addr="224.0.0.9"/>
> 	</cman>
> 	<fencedevices>
> 		<fencedevice agent="fence_manual" name="human"/>
> 	</fencedevices>
> 	<rm>
> 		<failoverdomains/>
> 		<resources/>
> 	</rm>
> </cluster>
> 

Wolfgang
Do not use the multicast address 224.0.0.1.  It is reserved for some
various ipv4 operations.

Try using 225.0.0.9.  If you have a switch between the two subnets, I
would expect RHCS to work.  If you have a router, I'd expect it not to
work as the TTL must be set for multicast packets to hop across routers.
For IPV6 the hop count must be set.  It appears you are using ipv4.

If you have a switch and it doesn't work, try turning off IGMP filtering
in the switch +if it is a smart switch.  If it is a dumb switch it
should just work with some additional latencies.

Regards
-steve


> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From pcaulfie at redhat.com  Wed Apr 26 13:17:17 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Wed, 26 Apr 2006 14:17:17 +0100
Subject: [Linux-cluster] multicast howto
In-Reply-To: <1145990549.6075.119.camel@shih.broked.org>
References: <200604251238.41472.ookami@gmx.de>
	<1145990549.6075.119.camel@shih.broked.org>
Message-ID: <444F72DD.3000607@redhat.com>

Steven Dake wrote:
> On Tue, 2006-04-25 at 12:38 -0600, Wolfgang Pauli wrote:
>> Hi,
>>
>> I am trying to setup gfs on a cluster that spans over two subnets. dream is a 
>> node with to interefaces, one on each subnet. I thought the below setup 
>> should work (taken from http://gfs.wikidev.net/Installation ). But it does 
>> not. Can anybody tell me what is wrong with that?
>>
>> cheers,
>>
>> wolfgang
>>
>> <?xml version="1.0" ?>
>> <cluster config_version="2" name="alpha_cluster">
>> 	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
>> 	<clusternodes>
>> 		<clusternode name="dream" votes="1">
>> 			<altname name"dream-e1">
>> 			<multicast addr="224.0.0.1" interface="eth0"/>
>> 			<multicast addr="224.0.0.9" interface="eth1"/>
>> 			<fence>
>> 				<method name="1">
>> 					<device name="human" nodename="dream"/>
>> 				</method>
>> 			</fence>
>> 		</clusternode>
>> 		<clusternode name="neo" votes="1">
>> 			<multicast addr="224.0.0.1" interface="eth0"/>
>> 			<fence>
>> 				<method name="1">
>> 					<device name="human" nodename="neo"/>
>> 				</method>
>> 			</fence>
>> 		</clusternode>
>> 		<clusternode name="node1" votes="1">
>> 			<multicast addr="224.0.0.9" interface="eth0"/>
>> 			<fence>
>> 				<method name="1">
>> 					<device name="human" nodename="node1"/>
>> 				</method>
>> 			</fence>
>> 		</clusternode>
>> 	</clusternodes>
>> 	<cman expected_votes="1" two_node="1">
>> 		<multicast addr="224.0.0.1"/>
>> 		<multicast addr="224.0.0.9"/>
>> 	</cman>
>> 	<fencedevices>
>> 		<fencedevice agent="fence_manual" name="human"/>
>> 	</fencedevices>
>> 	<rm>
>> 		<failoverdomains/>
>> 		<resources/>
>> 	</rm>
>> </cluster>
>>
> 
> Wolfgang
> Do not use the multicast address 224.0.0.1.  It is reserved for some
> various ipv4 operations.
> 
> Try using 225.0.0.9.  If you have a switch between the two subnets, I
> would expect RHCS to work.  If you have a router, I'd expect it not to
> work as the TTL must be set for multicast packets to hop across routers.
> For IPV6 the hop count must be set.  It appears you are using ipv4.
> 
> If you have a switch and it doesn't work, try turning off IGMP filtering
> in the switch +if it is a smart switch.  If it is a dumb switch it
> should just work with some additional latencies.

Good advice.

I've fixed the Wiki page, so it reflects reality a little more. I don't know
where that came from but it was confusing.


-- 

patrick



From lhh at redhat.com  Wed Apr 26 21:36:22 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 26 Apr 2006 17:36:22 -0400
Subject: [Linux-cluster] Re: Meaning of "service"
In-Reply-To: <pan.2006.04.23.11.04.21.687000@arvin.dk>
References: <pan.2006.04.22.14.44.52.172000@arvin.dk>
	<1145718821.3302.20.camel@auh5-0479.corp.jabil.org>
	<pan.2006.04.23.11.04.21.687000@arvin.dk>
Message-ID: <1146087382.2984.116.camel@ayanami.boston.redhat.com>

On Sun, 2006-04-23 at 13:04 +0200, Troels Arvin wrote:
> Hello,
> 
> On Sat, 22 Apr 2006 11:13:41 -0400, Eric Kerin wrote:
> >> Should I set this up as
> >> a) one Cluster Service,
> >> b) as three different Cluster Services?
> >> 
> > I have a very similar setup for my cluster.  I recommend option b.
> 
> I ended up doing option a, because I couldn't get the other option
> working, for some strange reason.
> 
> By the way: The manual is rather unclear about the difference between
> _adding_ a resource, and _attaching_ a resource. Can someone explain the
> difference?

It's like making a table leg.  Just because you have a table leg doesn't
mean you have to build a table; you could just have this leg sitting
around doing nothing until you decide to use it later.

Attach enough pieces together and you can make a table.  ;)

Unattached (but present) resources are not started by the cluster.

Creating "global" resources separate from a service was primarily
designed to allow for reuse of resources in some cases.  E.g. GFS file
systems, clients for cluster NFS services: create "Joe's Desktop" as an
NFS client resource, and you can attach it to multiple NFS servers in
the cluster.  All instances get the same export options.


Hmmm... I don't think this plays well in to my table-leg example,
because it's really hard to share table legs between multiple tables
which are in different rooms; I think you'd have to have to introduce a
metaphysical redefinition of the world in order for it to work in which
the table legs have built-in infinite improbability drives, but I think
you get the idea. ;)

-- Lon



From pauli at grey.colorado.edu  Thu Apr 27 00:45:35 2006
From: pauli at grey.colorado.edu (Wolfgang Pauli)
Date: Wed, 26 Apr 2006 18:45:35 -0600
Subject: [Linux-cluster] multicast howto
In-Reply-To: <1145990549.6075.119.camel@shih.broked.org>
References: <200604251238.41472.ookami@gmx.de>
	<1145990549.6075.119.camel@shih.broked.org>
Message-ID: <200604261845.35577.pauli@grey.colorado.edu>

Thanks!

First, I had to figure out how multicast works. Still don't fully understand 
it. I can ping 224.0.0.1 and I get responses from all hosts in the same 
subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't 
know whether it has to.
I have changed the cluster.conf to have only two nodes. Just to get the basic 
understanding. I have node dream on subnet 210 and neo on 223. They still 
form their own clusters. Should I just try different addresses, or do the 
switches/routers have to be programmed for that?

regards,

wolfgang

<?xml version="1.0" ?>
<cluster config_version="2" name="alpha_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="dream" votes="1">
                        <multicast addr="225.0.0.9" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="human" 
nodename="dream"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="neo" votes="1">
                        <multicast addr="225.0.0.9" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="human" nodename="neo"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="225.0.0.9"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_manual" name="human"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>



From ookami at gmx.de  Thu Apr 27 00:47:29 2006
From: ookami at gmx.de (Wolfgang Pauli)
Date: Wed, 26 Apr 2006 18:47:29 -0600
Subject: [Linux-cluster] multicast howto
In-Reply-To: <1145990549.6075.119.camel@shih.broked.org>
References: <200604251238.41472.ookami@gmx.de>
	<1145990549.6075.119.camel@shih.broked.org>
Message-ID: <200604261847.29094.ookami@gmx.de>

Thanks!

First, I had to figure out how multicast works. Still don't fully understand 
it. I can ping 224.0.0.1 and I get responses from all hosts in the same 
subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't 
know whether it has to.
I have changed the cluster.conf to have only two nodes. Just to get the basic 
understanding. I have node dream on subnet 210 and neo on 223. They still 
form their own clusters. Should I just try different addresses, or do the 
switches/routers have to be programmed for that?

regards,

wolfgang

<?xml version="1.0" ?>
<cluster config_version="2" name="alpha_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="dream" votes="1">
                        <multicast addr="225.0.0.9" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="human" 
nodename="dream"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="neo" votes="1">
                        <multicast addr="225.0.0.9" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="human" nodename="neo"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="225.0.0.9"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_manual" name="human"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>



From jason at monsterjam.org  Thu Apr 27 02:14:22 2006
From: jason at monsterjam.org (Jason)
Date: Wed, 26 Apr 2006 22:14:22 -0400
Subject: [Linux-cluster] fencing?
Message-ID: <20060427021422.GC37759@monsterjam.org>

ok, so I have GFS installed and have the modules loaded, seems the next step is to set up "fencing"
and looking at the list of agents at,
http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-fence-methods.html
im kinda confused. I thought that the idea behind fencing was to allow one node (i have 2) to take 
down/reboot the other node after it has detected some sort of fault in the other node.

so I understand APC Network Power Switch and WTI Network Power Switch
but since I dont have any of those installed, Im going down the list and see
GNBD and xCAT as options. I dont understand what these software packages are supposed to be doing 
in relation to being a fencing agent. Can I use one of these options reliably without having a 
hardware power switch? I guess in short, the docs I have read dont quite explain how fencing is 
supposed to work with GFS.

regards,
Jason



From pcaulfie at redhat.com  Thu Apr 27 07:29:08 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Thu, 27 Apr 2006 08:29:08 +0100
Subject: [Linux-cluster] multicast howto
In-Reply-To: <200604261845.35577.pauli@grey.colorado.edu>
References: <200604251238.41472.ookami@gmx.de>	<1145990549.6075.119.camel@shih.broked.org>
	<200604261845.35577.pauli@grey.colorado.edu>
Message-ID: <445072C4.60208@redhat.com>

Wolfgang Pauli wrote:
> Thanks!
> 
> First, I had to figure out how multicast works. Still don't fully understand 
> it. I can ping 224.0.0.1 and I get responses from all hosts in the same 
> subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't 
> know whether it has to.
> I have changed the cluster.conf to have only two nodes. Just to get the basic 
> understanding. I have node dream on subnet 210 and neo on 223. They still 
> form their own clusters. Should I just try different addresses, or do the 
> switches/routers have to be programmed for that?


That cluster.conf file looks a lot more sensible. If the nodes are still not
seeing each other then you may have to fiddle with the routers to make sure
that the multicast traffic is being passed. tcpdump will tell you whether the
traffic is moving between subnets.

It's also worth checking that there aren't any iptables rules preventing
traffic from the cluster port (6809/udp) or the multicast address reaching the
cluster manager.

> regards,
> 
> wolfgang
> 
> <?xml version="1.0" ?>
> <cluster config_version="2" name="alpha_cluster">
>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="dream" votes="1">
>                         <multicast addr="225.0.0.9" interface="eth0"/>
>                         <fence>
>                                 <method name="1">
>                                         <device name="human" 
> nodename="dream"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="neo" votes="1">
>                         <multicast addr="225.0.0.9" interface="eth0"/>
>                         <fence>
>                                 <method name="1">
>                                         <device name="human" nodename="neo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1">
>                 <multicast addr="225.0.0.9"/>
>         </cman>
>         <fencedevices>
>                 <fencedevice agent="fence_manual" name="human"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains/>
>                 <resources/>
>         </rm>
> </cluster>
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 

patrick



From jerome.castang at adelpha-lan.org  Thu Apr 27 08:08:08 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Thu, 27 Apr 2006 10:08:08 +0200
Subject: [Linux-cluster] iSCSI fence agent 
Message-ID: <44507BE8.20402@adelpha-lan.org>

Hi,

I found on the RC-list an email (sent in october 2004) about a script 
witch is a iscsi fence agent
Here is the mail:

http://www.redhat.com/archives/linux-cluster/2004-October/msg00105.html

When I try to start this script, I get this error:
"Could not start /usr/bin/ssh root at gfs5 No file or directory".

But "/usr/bin/ssh" does exist and node gfs5 is running.

Any idea on the problem?

-- 
Jerome Castang
mail: jcastang at adelpha-lan.org



From cjk at techma.com  Thu Apr 27 11:52:37 2006
From: cjk at techma.com (Kovacs, Corey J.)
Date: Thu, 27 Apr 2006 07:52:37 -0400
Subject: [Linux-cluster] multicast howto
Message-ID: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E75@tmaemail.techma.com>

Wolfgang, you can't arbitrarily ping multicast addresses as they don't really
exist in 
the sense that an ethernet interface exists as a card. Multicast is a
"subscription" 
based concept. You have to be listening for multicast traffic to receive it
at all. To
"listen", one "joins" a multicast group (bind to a mcast ip address) and the
router, if
configured properly will route mcast traffic to your interface. There are
certain mcast
addresses you should never use explicitly. As mentioned, 224.0.0.1 is one of
them, there
are others but I can't recall what they are. Your other config didn't work
(with respet 
to multicast) because all of your nodes were listening to different multicast
"groups"
which is like trying to dial in on a party line, but everone using the wrong
phone number.

Important to reitterate, Multicast has to be configured by whoever runs your
router(s).
If Multicast is not enabled on a switch, even if it is on your network, it
turns in to 
broadcast on that switch, which defeats the purpose of multicast in the first
place.


Hope this clears some multicast stuff up for you.

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wolfgang Pauli
Sent: Wednesday, April 26, 2006 8:46 PM
To: sdake at redhat.com; linux clustering
Subject: Re: [Linux-cluster] multicast howto

Thanks!

First, I had to figure out how multicast works. Still don't fully understand
it. I can ping 224.0.0.1 and I get responses from all hosts in the same
subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't
know whether it has to.
I have changed the cluster.conf to have only two nodes. Just to get the basic
understanding. I have node dream on subnet 210 and neo on 223. They still
form their own clusters. Should I just try different addresses, or do the
switches/routers have to be programmed for that?

regards,

wolfgang

<?xml version="1.0" ?>
<cluster config_version="2" name="alpha_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="dream" votes="1">
                        <multicast addr="225.0.0.9" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="human" 
nodename="dream"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="neo" votes="1">
                        <multicast addr="225.0.0.9" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device name="human" nodename="neo"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="225.0.0.9"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_manual" name="human"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



From lhh at redhat.com  Thu Apr 27 13:36:31 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 27 Apr 2006 09:36:31 -0400
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <44507BE8.20402@adelpha-lan.org>
References: <44507BE8.20402@adelpha-lan.org>
Message-ID: <1146144991.2984.127.camel@ayanami.boston.redhat.com>

On Thu, 2006-04-27 at 10:08 +0200, Castang Jerome wrote:
> Hi,
> 
> I found on the RC-list an email (sent in october 2004) about a script 
> witch is a iscsi fence agent
> Here is the mail:
> 
> http://www.redhat.com/archives/linux-cluster/2004-October/msg00105.html
> 
> When I try to start this script, I get this error:
> "Could not start /usr/bin/ssh root at gfs5 No file or directory".
> 
> But "/usr/bin/ssh" does exist and node gfs5 is running.

It's probably trying to exec:

    /usr/bin/ssh\ root at gfs5  <-- one filename

vs
    /usr/bin/ssh root at gfs5

for some reason; wrong quotation on the system / exec call(s) ?

-- Lon



From jerome.castang at adelpha-lan.org  Thu Apr 27 13:43:21 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Thu, 27 Apr 2006 15:43:21 +0200
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <1146144991.2984.127.camel@ayanami.boston.redhat.com>
References: <44507BE8.20402@adelpha-lan.org>
	<1146144991.2984.127.camel@ayanami.boston.redhat.com>
Message-ID: <4450CA79.9020400@adelpha-lan.org>

Lon Hohberger a ?crit :

>
>It's probably trying to exec:
>
>    /usr/bin/ssh\ root at gfs5  <-- one filename
>
>vs
>    /usr/bin/ssh root at gfs5
>
>for some reason; wrong quotation on the system / exec call(s) ?
>
>-- Lon
>
>  
>


It's ok I found the probleme,
I replaced the function "runcommand" by "system" and it works perfectly.
Here is the modified perl script:

/#!/usr/bin/perl

###############################################################################
###############################################################################
##
##  Copyright (C) Sistina Software, Inc.  1997-2003  All rights reserved.
##  Copyright (C) 2004 Red Hat, Inc.  All rights reserved.
##
##  This copyrighted material is made available to anyone wishing to use,
##  modify, copy, or redistribute it subject to the terms and conditions
##  of the GNU General Public License v.2.
##
###############################################################################
###############################################################################

use Getopt::Std;

# Get the program name from $0 and strip directory names
$_=$0;
s/.*\///;
my $pname = $_;

$opt_o = 'disable';        # Default fence action

# WARNING!! Do not add code bewteen "#BEGIN_VERSION_GENERATION" and
# "#END_VERSION_GENERATION"  It is generated by the Makefile

#BEGIN_VERSION_GENERATION
$FENCE_RELEASE_NAME="";
$REDHAT_COPYRIGHT="";
$BUILD_DATE="";
#END_VERSION_GENERATION

sub usage
{
    print "Usage:\n";
    print "\n";
    print "$pname [options]\n";
    print "\n";
    print "Options:\n";
    print "  -a <ip>          ISCSI target address\n";
    print "  -h               usage\n";
#    print "  -l <name>        Login name\n";
    print "  -n <num>         IP of node to disable\n";
    print "  -o <string>      Action:  disable (default) or enable\n";
#    print "  -p <string>      Password for login (not used)\n";
    print "  -q               quiet mode\n";
    print "  -V               version\n";

    exit 0;
}

sub fail
{
  ($msg) = @_;
  print $msg."\n" unless defined $opt_q;
  $t->close if defined $t;
  exit 1;
}

sub fail_usage
{
  ($msg)= _;
  print STDERR $msg."\n" if $msg;
  print STDERR "Please use '-h' for usage.\n";
  exit 1;
}

sub version
{
  print "$pname $FENCE_RELEASE_NAME $BUILD_DATE\n";
  print "$REDHAT_COPYRIGHT\n" if ( $REDHAT_COPYRIGHT );

  exit 0;
}

if (@ARGV > 0)
{
   #getopts("a:hl:n:o:p:qV") || fail_usage ;
   getopts("a:hn:o:qV") || fail_usage ;

   usage if defined $opt_h;
   version if defined $opt_V;

   fail_usage "Unknown parameter." if (@ARGV > 0);

   fail_usage "No '-a' flag specified." unless defined $opt_a;
   fail_usage "No '-n' flag specified." unless defined $opt_n;
   fail_usage "Unrecognised action '$opt_o' for '-o' flag"
      unless $opt_o =~ /^(disable|enable)$/i;

}
else
{
   get_options_stdin();

   fail "failed: no IP address" unless defined $opt_a;
   fail "failed: no plug number" unless defined $opt_n;
   #fail "failed: no login name" unless defined $opt_l;
   #fail "failed: no password" unless defined $opt_p;
   fail "failed: unrecognised action: $opt_o"
      unless $opt_o =~ /^(disable|enable)$/i;
}

#
# Set up and log in
#

my $target_address=$opt_a; #The address of the iSCSI target
my $command=$opt_o;     #either enable or disable
my $node=$opt_n;        #the cluster member to lock out

#use ssh to log into remote host and send over iptables commands:

# iptables -D INPUT -s a.b.c.d -p all -j REJECT
# iptables -A INPUT -s a.b.c.d -p all -j REJECT

if ($command eq "enable")
{       #Enable $node on $target_address

        system("ssh ".' root@'.$target_address." /sbin/iptables -D INPUT 
-s " . $node . " -p all -j REJECT");

        if ($out != 0)
        {
                fail "111Could not $command $node on 
$target_address\n$cmd\n";
        }
}
elsif ($command eq "disable")
{       #Disable $node on $target_address

        system("ssh ".' root@'.$target_address." /sbin/iptables -A INPUT 
-s " . $node . " -p all -j REJECT");

        if ($? != 0 )
        {
                fail "Could not $command $node on $target_address\n$cmd\n";
        }
}
else
{       #This should never happen:
        fail "Unknown command: $command\n";
}

print "success: $command $node\n" unless defined $opt_q;
exit 0;

sub get_options_stdin
{
    my $opt;
    my $line = 0;
    while( defined($in = <>) )
    {
        $_ = $in;
        chomp;

        # strip leading and trailing whitespace
        s/^\s*//;
        s/\s*$//;

        # skip comments
        next if /^#/;

        $line+=1;
        $opt=$_;
        next unless $opt;

        ($name,$val)=split /\s*=\s*/, $opt;

        if ( $name eq "" )
        {
           print STDERR "parse error: illegal name in option $line\n";
           exit 2;
        }

        # DO NOTHING -- this field is used by fenced
        elsif ($name eq "agent" ) { }

        # FIXME -- depricated.  use "port" instead.
        elsif ($name eq "fm" )
        {
            (my $dummy,$opt_n) = split /\s+/,$val;
            print STDERR "Depricated \"fm\" entry detected.  refer to 
man page.\n";
        }

        elsif ($name eq "ipaddr" )
        {
            $opt_a = $val;
        }
        elsif ($name eq "login" )
        {
            $opt_l = $val;
        }

        # FIXME -- depreicated residue of old fencing system
        elsif ($name eq "name" ) { }

        elsif ($name eq "option" )
        {
            $opt_o = $val;
        }
        elsif ($name eq "passwd" )
        {
            $opt_p = $val;
        }
        elsif ($name eq "port" )
        {
            $opt_n = $val;
        }
        # elsif ($name eq "test" )
        # {
        #    $opt_T = $val;
        # }

        # FIXME should we do more error checking?
        # Excess name/vals will be eaten for now
        else
        {
           fail "parse error: unknown option \"$opt\"";
        }
    }
}/


thanks,

-- 
Jerome Castang
mail: jcastang at adelpha-lan.org




From mbrookov at mines.edu  Thu Apr 27 14:17:16 2006
From: mbrookov at mines.edu (Matthew B. Brookover)
Date: Thu, 27 Apr 2006 08:17:16 -0600
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <4450CA79.9020400@adelpha-lan.org>
References: <44507BE8.20402@adelpha-lan.org>
	<1146144991.2984.127.camel@ayanami.boston.redhat.com>
	<4450CA79.9020400@adelpha-lan.org>
Message-ID: <1146147436.12841.13.camel@merlin.Mines.EDU>

I have not used this tool in a while, but it did work on my system.

I would not trust this version to fence properly.  Using system does not
allow the exit status of iptables to be checked for errors.  System only
reports the status of the ssh command, not the command that is called on
the remote host.

Matt

On Thu, 2006-04-27 at 15:43 +0200, Castang Jerome wrote:

> Lon Hohberger a ?crit :
> 
> >
> >It's probably trying to exec:
> >
> >    /usr/bin/ssh\ root at gfs5  <-- one filename
> >
> >vs
> >    /usr/bin/ssh root at gfs5
> >
> >for some reason; wrong quotation on the system / exec call(s) ?
> >
> >-- Lon
> >
> >  
> >
> 
> 
> It's ok I found the probleme,
> I replaced the function "runcommand" by "system" and it works perfectly.
> Here is the modified perl script:
> 
> /#!/usr/bin/perl
> 
> ###############################################################################
> ###############################################################################
> ##
> ##  Copyright (C) Sistina Software, Inc.  1997-2003  All rights reserved.
> ##  Copyright (C) 2004 Red Hat, Inc.  All rights reserved.
> ##
> ##  This copyrighted material is made available to anyone wishing to use,
> ##  modify, copy, or redistribute it subject to the terms and conditions
> ##  of the GNU General Public License v.2.
> ##
> ###############################################################################
> ###############################################################################
> 
> use Getopt::Std;
> 
> # Get the program name from $0 and strip directory names
> $_=$0;
> s/.*\///;
> my $pname = $_;
> 
> $opt_o = 'disable';        # Default fence action
> 
> # WARNING!! Do not add code bewteen "#BEGIN_VERSION_GENERATION" and
> # "#END_VERSION_GENERATION"  It is generated by the Makefile
> 
> #BEGIN_VERSION_GENERATION
> $FENCE_RELEASE_NAME="";
> $REDHAT_COPYRIGHT="";
> $BUILD_DATE="";
> #END_VERSION_GENERATION
> 
> sub usage
> {
>     print "Usage:\n";
>     print "\n";
>     print "$pname [options]\n";
>     print "\n";
>     print "Options:\n";
>     print "  -a <ip>          ISCSI target address\n";
>     print "  -h               usage\n";
> #    print "  -l <name>        Login name\n";
>     print "  -n <num>         IP of node to disable\n";
>     print "  -o <string>      Action:  disable (default) or enable\n";
> #    print "  -p <string>      Password for login (not used)\n";
>     print "  -q               quiet mode\n";
>     print "  -V               version\n";
> 
>     exit 0;
> }
> 
> sub fail
> {
>   ($msg) = @_;
>   print $msg."\n" unless defined $opt_q;
>   $t->close if defined $t;
>   exit 1;
> }
> 
> sub fail_usage
> {
>   ($msg)= _;
>   print STDERR $msg."\n" if $msg;
>   print STDERR "Please use '-h' for usage.\n";
>   exit 1;
> }
> 
> sub version
> {
>   print "$pname $FENCE_RELEASE_NAME $BUILD_DATE\n";
>   print "$REDHAT_COPYRIGHT\n" if ( $REDHAT_COPYRIGHT );
> 
>   exit 0;
> }
> 
> if (@ARGV > 0)
> {
>    #getopts("a:hl:n:o:p:qV") || fail_usage ;
>    getopts("a:hn:o:qV") || fail_usage ;
> 
>    usage if defined $opt_h;
>    version if defined $opt_V;
> 
>    fail_usage "Unknown parameter." if (@ARGV > 0);
> 
>    fail_usage "No '-a' flag specified." unless defined $opt_a;
>    fail_usage "No '-n' flag specified." unless defined $opt_n;
>    fail_usage "Unrecognised action '$opt_o' for '-o' flag"
>       unless $opt_o =~ /^(disable|enable)$/i;
> 
> }
> else
> {
>    get_options_stdin();
> 
>    fail "failed: no IP address" unless defined $opt_a;
>    fail "failed: no plug number" unless defined $opt_n;
>    #fail "failed: no login name" unless defined $opt_l;
>    #fail "failed: no password" unless defined $opt_p;
>    fail "failed: unrecognised action: $opt_o"
>       unless $opt_o =~ /^(disable|enable)$/i;
> }
> 
> #
> # Set up and log in
> #
> 
> my $target_address=$opt_a; #The address of the iSCSI target
> my $command=$opt_o;     #either enable or disable
> my $node=$opt_n;        #the cluster member to lock out
> 
> #use ssh to log into remote host and send over iptables commands:
> 
> # iptables -D INPUT -s a.b.c.d -p all -j REJECT
> # iptables -A INPUT -s a.b.c.d -p all -j REJECT
> 
> if ($command eq "enable")
> {       #Enable $node on $target_address
> 
>         system("ssh ".' root@'.$target_address." /sbin/iptables -D INPUT 
> -s " . $node . " -p all -j REJECT");
> 
>         if ($out != 0)
>         {
>                 fail "111Could not $command $node on 
> $target_address\n$cmd\n";
>         }
> }
> elsif ($command eq "disable")
> {       #Disable $node on $target_address
> 
>         system("ssh ".' root@'.$target_address." /sbin/iptables -A INPUT 
> -s " . $node . " -p all -j REJECT");
> 
>         if ($? != 0 )
>         {
>                 fail "Could not $command $node on $target_address\n$cmd\n";
>         }
> }
> else
> {       #This should never happen:
>         fail "Unknown command: $command\n";
> }
> 
> print "success: $command $node\n" unless defined $opt_q;
> exit 0;
> 
> sub get_options_stdin
> {
>     my $opt;
>     my $line = 0;
>     while( defined($in = <>) )
>     {
>         $_ = $in;
>         chomp;
> 
>         # strip leading and trailing whitespace
>         s/^\s*//;
>         s/\s*$//;
> 
>         # skip comments
>         next if /^#/;
> 
>         $line+=1;
>         $opt=$_;
>         next unless $opt;
> 
>         ($name,$val)=split /\s*=\s*/, $opt;
> 
>         if ( $name eq "" )
>         {
>            print STDERR "parse error: illegal name in option $line\n";
>            exit 2;
>         }
> 
>         # DO NOTHING -- this field is used by fenced
>         elsif ($name eq "agent" ) { }
> 
>         # FIXME -- depricated.  use "port" instead.
>         elsif ($name eq "fm" )
>         {
>             (my $dummy,$opt_n) = split /\s+/,$val;
>             print STDERR "Depricated \"fm\" entry detected.  refer to 
> man page.\n";
>         }
> 
>         elsif ($name eq "ipaddr" )
>         {
>             $opt_a = $val;
>         }
>         elsif ($name eq "login" )
>         {
>             $opt_l = $val;
>         }
> 
>         # FIXME -- depreicated residue of old fencing system
>         elsif ($name eq "name" ) { }
> 
>         elsif ($name eq "option" )
>         {
>             $opt_o = $val;
>         }
>         elsif ($name eq "passwd" )
>         {
>             $opt_p = $val;
>         }
>         elsif ($name eq "port" )
>         {
>             $opt_n = $val;
>         }
>         # elsif ($name eq "test" )
>         # {
>         #    $opt_T = $val;
>         # }
> 
>         # FIXME should we do more error checking?
>         # Excess name/vals will be eaten for now
>         else
>         {
>            fail "parse error: unknown option \"$opt\"";
>         }
>     }
> }/
> 
> 
> thanks,
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060427/cfe87483/attachment.htm>

From eric at bootseg.com  Thu Apr 27 14:46:55 2006
From: eric at bootseg.com (Eric Kerin)
Date: Thu, 27 Apr 2006 10:46:55 -0400
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <4450CA79.9020400@adelpha-lan.org>
References: <44507BE8.20402@adelpha-lan.org>
	<1146144991.2984.127.camel@ayanami.boston.redhat.com>
	<4450CA79.9020400@adelpha-lan.org>
Message-ID: <1146149215.3397.11.camel@auh5-0479.corp.jabil.org>

On Thu, 2006-04-27 at 15:43 +0200, Castang Jerome wrote:
> Lon Hohberger a ?crit :
> 
> >
> >It's probably trying to exec:
> >
> >    /usr/bin/ssh\ root at gfs5  <-- one filename
> >
> >vs
> >    /usr/bin/ssh root at gfs5
> >


Is the node gfs5 one of the systems mounting the GFS filesystem, or a
single box sharing out a device using iscsi?

If it's a node mounting the GFS filesystem, this fence method might not
work in all failure conditions (kernel panics, intermittent network
problems, sky high system load, etc).  Since you can't trust a machine
that is acting up to follow any of your commands via ssh.


This fence script looks like it was meant to ssh into a linux box
sharing out the iscsi device, and block the node's access to it.  Not
ssh into the node, and block it's access to the iscsi device.  


I figured I'd check. It'd be better to find out if it won't work now,
than 3am when your cluster is down since it couldn't fence a node.

Thanks, 
Eric Kerin
eric at bootseg.com



From jerome.castang at adelpha-lan.org  Thu Apr 27 14:49:20 2006
From: jerome.castang at adelpha-lan.org (Castang Jerome)
Date: Thu, 27 Apr 2006 16:49:20 +0200
Subject: [Linux-cluster] iSCSI fence agent
In-Reply-To: <1146149215.3397.11.camel@auh5-0479.corp.jabil.org>
References: <44507BE8.20402@adelpha-lan.org>	
	<1146144991.2984.127.camel@ayanami.boston.redhat.com>	
	<4450CA79.9020400@adelpha-lan.org>
	<1146149215.3397.11.camel@auh5-0479.corp.jabil.org>
Message-ID: <4450D9F0.9060901@adelpha-lan.org>

Eric Kerin a ?crit :

>
>Is the node gfs5 one of the systems mounting the GFS filesystem, or a
>single box sharing out a device using iscsi?
>  
>
It's a single box sharing out a device  using iscsi.



-- 
Jerome Castang
mail: jcastang at adelpha-lan.org




From guillermo.gomez at gmail.com  Wed Apr 26 20:43:18 2006
From: guillermo.gomez at gmail.com (=?ISO-8859-1?Q?Guillermo_G=F3mez?=)
Date: Wed, 26 Apr 2006 16:43:18 -0400
Subject: [Linux-cluster] using fiber channel
Message-ID: <444FDB66.5080303@gmail.com>

Hi, i would like if this mail list is the right one to discuss for using 
SAN with Fiber Channel HDAs.

regards
Guillermo G?mez S.
Caracas/Venezuela



From sdake at redhat.com  Thu Apr 27 00:56:05 2006
From: sdake at redhat.com (Steven Dake)
Date: Wed, 26 Apr 2006 17:56:05 -0700
Subject: [Linux-cluster] multicast howto
In-Reply-To: <200604261847.29094.ookami@gmx.de>
References: <200604251238.41472.ookami@gmx.de>
	<1145990549.6075.119.camel@shih.broked.org>
	<200604261847.29094.ookami@gmx.de>
Message-ID: <1146099365.11702.1.camel@shih.broked.org>

On Wed, 2006-04-26 at 18:47 -0600, Wolfgang Pauli wrote:
> Thanks!
> 
> First, I had to figure out how multicast works. Still don't fully understand 
> it. I can ping 224.0.0.1 and I get responses from all hosts in the same 
> subnet. It tried to ping 225.0.0.8 but that does not really work. But I don't 
> know whether it has to.
> I have changed the cluster.conf to have only two nodes. Just to get the basic 
> understanding. I have node dream on subnet 210 and neo on 223. They still 
> form their own clusters. Should I just try different addresses, or do the 
> switches/routers have to be programmed for that?
> 
> regards,
> 
> wolfgang
> 

you dont say whether you have routers or switches between the two
machines

it probably wont work across routers
it should work across switches as long as they a) dont have IGMP
filtering enabled or b) their IGMP implementation is good which most are
not
> <?xml version="1.0" ?>
> <cluster config_version="2" name="alpha_cluster">
>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="dream" votes="1">
>                         <multicast addr="225.0.0.9" interface="eth0"/>
>                         <fence>
>                                 <method name="1">
>                                         <device name="human" 
> nodename="dream"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="neo" votes="1">
>                         <multicast addr="225.0.0.9" interface="eth0"/>
>                         <fence>
>                                 <method name="1">
>                                         <device name="human" nodename="neo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1">
>                 <multicast addr="225.0.0.9"/>
>         </cman>
>         <fencedevices>
>                 <fencedevice agent="fence_manual" name="human"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains/>
>                 <resources/>
>         </rm>
> </cluster>



From gforte at leopard.us.udel.edu  Thu Apr 27 14:56:27 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Thu, 27 Apr 2006 10:56:27 -0400
Subject: [Linux-cluster] using fiber channel
In-Reply-To: <444FDB66.5080303@gmail.com>
References: <444FDB66.5080303@gmail.com>
Message-ID: <4450DB9B.5000904@leopard.us.udel.edu>

use them for what?  ;-)  lots of people on this list use them, but 
they're not really the focus of the list.  If you have questions about 
using a SAN with a cluster, this is the place.  If you have general 
questions about how they work, not so much, but someone can/will 
probably still answer them.

-g

Guillermo G?mez wrote:
> Hi, i would like if this mail list is the right one to discuss for using 
> SAN with Fiber Channel HDAs.
> 
> regards
> Guillermo G?mez S.
> Caracas/Venezuela
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 


-- 
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



From carlopmart at gmail.com  Thu Apr 27 16:42:26 2006
From: carlopmart at gmail.com (carlopmart)
Date: Thu, 27 Apr 2006 18:42:26 +0200
Subject: [Linux-cluster] Recommended HP servers for cluster suite
Message-ID: <4450F472.8050205@gmail.com>

Hi all,

  Somebody can recommends me some HP servers to use with Redhat Cluster 
Suite for RHEL 4?? My requeriments are:

  - 4GB RAM
  - Scsi disks
  - Two CPUs
  - iLO support for RHCS fence agent.

I don't need shred storage.

Many thanks

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



From lhh at redhat.com  Thu Apr 27 17:34:44 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 27 Apr 2006 13:34:44 -0400
Subject: [Linux-cluster] cluster suite / Opteron
In-Reply-To: <1145966377.16894.30.camel@mouse>
References: <1145966377.16894.30.camel@mouse>
Message-ID: <1146159284.2984.161.camel@ayanami.boston.redhat.com>

On Tue, 2006-04-25 at 07:59 -0400, Rajiv Vaidyanath wrote:
> Hi,
> 
> I get some compilation warnings on opteron (cluster-1.02.00)
> 
> Eg:
> --------------------------------------------
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1595: warning: long unsigned
> int format, uint64_t arg (arg 2)
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1596: warning: long unsigned
> int format, uint64_t arg (arg 2)
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1598: warning: long unsigned
> int format, uint64_t arg (arg 2)
> drivers/gfs/gfs-kernel/src/gfs/gfs_ondisk.h:1599: warning: long unsigned
> int format, uint64_t arg (arg 2)
> --------------------------------------------
> 
> Can I safely ignore these warnings ?

Yes, but thanks for noting them; they should be fixed at some point
simply for cleanliness.

-- Lon




From Steve.Bagby at neartek.com  Thu Apr 27 18:00:04 2006
From: Steve.Bagby at neartek.com (Steve Bagby)
Date: Thu, 27 Apr 2006 14:00:04 -0400
Subject: [Linux-cluster] Fencing using Fibre Alliance MIB
Message-ID: <C0767EE4.158A1%Steve.Bagby@neartek.com>

Has anyone thought about or done a fence agent using the (more or less)
standard Fibre Alliance MIB ?

Seems like this would work for a range of switches ...




From rainer at ultra-secure.de  Thu Apr 27 19:57:14 2006
From: rainer at ultra-secure.de (rainer at ultra-secure.de)
Date: Thu, 27 Apr 2006 21:57:14 +0200
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <4450F472.8050205@gmail.com>
References: <4450F472.8050205@gmail.com>
Message-ID: <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>

Quoting carlopmart <carlopmart at gmail.com>:

> Hi all,
>
>  Somebody can recommends me some HP servers to use with Redhat 
> Cluster Suite for RHEL 4?? My requeriments are:
>
>  - 4GB RAM
>  - Scsi disks
>  - Two CPUs
>  - iLO support for RHCS fence agent.
>
> I don't need shred storage.

Blades.
bl20p can be had very cheap nowadays, but should be enough for most tasks.
Downside: only two internal disks, the rest is via SAN (or iSCSI).



cheers,
Rainer

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.



From jparsons at redhat.com  Thu Apr 27 20:03:58 2006
From: jparsons at redhat.com (James Parsons)
Date: Thu, 27 Apr 2006 16:03:58 -0400
Subject: [Linux-cluster] Recommended HP servers for cluster suite
In-Reply-To: <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>
References: <4450F472.8050205@gmail.com>
	<20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de>
Message-ID: <445123AE.4000204@redhat.com>

rainer at ultra-secure.de wrote:

> Quoting carlopmart <carlopmart at gmail.com>:
>
>> Hi all,
>>
>>  Somebody can recommends me some HP servers to use with Redhat 
>> Cluster Suite for RHEL 4?? My requeriments are:
>>
>>  - 4GB RAM
>>  - Scsi disks
>>  - Two CPUs
>>  - iLO support for RHCS fence agent.
>>
>> I don't need shred storage.
>
>
> Blades.
> bl20p can be had very cheap nowadays, but should be enough for most 
> tasks.
> Downside: only two internal disks, the rest is via SAN (or iSCSI). 

I want to add a vote for the proliant bl* series. It uses iLO...not the 
older Riloe cards, which have been problematic now and then.

-J



From pcaulfie at redhat.com  Fri Apr 28 10:39:52 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 28 Apr 2006 11:39:52 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: <4448D132.2020906@redhat.com>
References: <FF2CE0D593AEE34B955FEC77BD5AFBE0079E70@tmaemail.techma.com>	<20060420153326.GB22326@redhat.com>	<4447AE0C.30000@redhat.com>
	<4448D132.2020906@redhat.com>
Message-ID: <4451F0F8.2060503@redhat.com>

OK, here's whole new document on where cman is going and how it fits in with
Openais and all that sort of stuff.

Comments welcome.

http://people.redhat.com/pcaulfie/docs/aiscman.odt

-- 

patrick



From Matthew.Patton.ctr at osd.mil  Fri Apr 28 14:51:48 2006
From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E)
Date: Fri, 28 Apr 2006 10:51:48 -0400
Subject: [Linux-cluster] New features/architecture ?
Message-ID: <D8063DF686D10247B0A49D01271285690CE91D81@osdn06.osd.mil>

Classification: UNCLASSIFIED

any chance for a vendor neutral document format - say RTF?

> OK, here's whole new document on where cman is going and how 
> it fits in with
> Openais and all that sort of stuff.
> 
> Comments welcome.
> 
> http://people.redhat.com/pcaulfie/docs/aiscman.odt
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060428/ade0e3ce/attachment.htm>

From pcaulfie at redhat.com  Fri Apr 28 15:09:30 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 28 Apr 2006 16:09:30 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: <D8063DF686D10247B0A49D01271285690CE91D81@osdn06.osd.mil>
References: <D8063DF686D10247B0A49D01271285690CE91D81@osdn06.osd.mil>
Message-ID: <4452302A.9030707@redhat.com>

Patton, Matthew F, CTR, OSD-PA&E wrote:
> Classification: UNCLASSIFIED
> 
> any chance for a vendor neutral document format - say RTF?

!!

RTF is a microsoft format

ODT is Open Document format

>> OK, here's whole new document on where cman is going and how
>> it fits in with
>> Openais and all that sort of stuff.
>>
>> Comments welcome.
>>
>> http://people.redhat.com/pcaulfie/docs/aiscman.odt
> 



-- 

patrick



From pcaulfie at redhat.com  Fri Apr 28 15:16:10 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Fri, 28 Apr 2006 16:16:10 +0100
Subject: [Linux-cluster] New features/architecture ?
In-Reply-To: <D8063DF686D10247B0A49D01271285690CE91D81@osdn06.osd.mil>
References: <D8063DF686D10247B0A49D01271285690CE91D81@osdn06.osd.mil>
Message-ID: <445231BA.2090005@redhat.com>

For those that don't have the bandwidth to download OpenOffice.org, here's a PDF:

http://people.redhat.com/pcaulfie/docs/aiscman.pdf
-- 

patrick



From jason at monsterjam.org  Sat Apr 29 01:58:51 2006
From: jason at monsterjam.org (Jason)
Date: Fri, 28 Apr 2006 21:58:51 -0400
Subject: [Linux-cluster] 2nd try: fencing?
Message-ID: <20060429015851.GB66106@monsterjam.org>

ok, so I have GFS installed and have the modules loaded, seems the next step is to set up "fencing"
and looking at the list of agents at,
http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-fence-methods.html
im kinda confused. I thought that the idea behind fencing was to allow one node (i have 2) to take 
down/reboot the other node after it has detected some sort of fault in the other node.

so I understand APC Network Power Switch and WTI Network Power Switch
but since I dont have any of those installed, Im going down the list and see
GNBD and xCAT as options. I dont understand what these software packages are supposed to be doing 
in relation to being a fencing agent. Can I use one of these options reliably without having a 
hardware power switch? I guess in short, the docs I have read dont quite explain how fencing is 
supposed to work with GFS.

regards,
Jason




From eric at bootseg.com  Sat Apr 29 02:36:02 2006
From: eric at bootseg.com (Eric Kerin)
Date: Fri, 28 Apr 2006 22:36:02 -0400
Subject: [Linux-cluster] 2nd try: fencing?
In-Reply-To: <20060429015851.GB66106@monsterjam.org>
References: <20060429015851.GB66106@monsterjam.org>
Message-ID: <1146278162.5933.12.camel@mechanism.localnet>

On Fri, 2006-04-28 at 21:58 -0400, Jason wrote:
> ok, so I have GFS installed and have the modules loaded, seems the next step is to set up "fencing"
> and looking at the list of agents at,
> http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-fence-methods.html
> im kinda confused. I thought that the idea behind fencing was to allow one node (i have 2) to take 
> down/reboot the other node after it has detected some sort of fault in the other node.

Quick and dirty reason for needing fencing: In GFS you need to stop a
failed (or semi-failed) node from writing data to the shared filesystem,
otherwise corruption may occur.  

One method is to power off a misbehaving node.  Another is to block
access to the shared disk by telling a SAN switch to disable it's port.
Still another is to tell a firewall to not allow network traffic to an
iSCSI device from the offending node.

What you use for a fence method all depends on your hardware.  If you
give a quick explanation of your hardware setup, we might be able to
help you pick a fence device that will work with what you have already.
Or if you don't have anything that could be used to block access, you
might have to buy some network power switches. 

OR, if this isn't intended for production use, and you're just testing,
you can use fence_manual.  This one has the unpleasant downside of
needing manual intervention to bring the cluster up after a node
failure.  But for testing GFS and Cluster Suite, it's nice and cheap.

Thanks, 
Eric Kerin
eric at bootseg.com






From johannes.russek at io-consulting.net  Sat Apr 29 16:42:11 2006
From: johannes.russek at io-consulting.net (Johannes russek)
Date: Sat, 29 Apr 2006 18:42:11 +0200
Subject: [Linux-cluster] changes in include/linux/fs.h from 2.6.16 to 2.6.17
Message-ID: <LGEOIPCNDMCLNENHDMKPMEEMCPAA.johannes.russek@io-consulting.net>

hello everyone,
has anyone made a patch to use the new mutex mechanism in 2.6.17 in struct
block_device?
or am i the first one to try and notice? :)
best regards, johannes russek



From jason at monsterjam.org  Sun Apr 30 00:52:45 2006
From: jason at monsterjam.org (Jason)
Date: Sat, 29 Apr 2006 20:52:45 -0400
Subject: [Linux-cluster] 2nd try: fencing?
In-Reply-To: <1146278162.5933.12.camel@mechanism.localnet>
References: <20060429015851.GB66106@monsterjam.org>
	<1146278162.5933.12.camel@mechanism.localnet>
Message-ID: <20060430005245.GA76504@monsterjam.org>

> What you use for a fence method all depends on your hardware.  If you
> give a quick explanation of your hardware setup, we might be able to
> help you pick a fence device that will work with what you have already.
> Or if you don't have anything that could be used to block access, you
> might have to buy some network power switches. 

right now, all I have is 2 dell servers in a rack with identical configs. (dual ethernet 
controllers and 1 separate controller for the heartbeat). Both are running 
linux-ha and are both connected to a dell powervault 220S storage array which is configured so 
that both hosts can access the drives concurrently (cluster mode). Im following the instructions 
at 
http://www.gyrate.org/archives/9  and am at step 17.. which says to configure CCS.

I guess we could get an APC power switch, but what would you folks suggest? i.e. what model for 
just a 2 cluster node (each server has 2 power supplies). Or is there a better way?

regards,
Jason



From filipe.miranda at gmail.com  Sun Apr 30 22:00:01 2006
From: filipe.miranda at gmail.com (Filipe Miranda)
Date: Sun, 30 Apr 2006 19:00:01 -0300
Subject: [Linux-cluster] MySQL on GFS benchmarks
In-Reply-To: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl>
Message-ID: <a6d13c780604301500h68be8c1dpc07fce59876770d0@mail.gmail.com>

Sander,

It depends, if you are looking for performance, definately SAN.
iSCSI  might have a better performance over GNBD.
I found this on google
http://www.bwbug.org/docs/RedHat-GNBD-Ethernet-SAN.pdf
It has some detais about GFS on SAN and on GNBD, It might help though.3
Good Luck and keep us posted.

Att.
FTM


On 4/26/06, Sander van Beek - Elexis <sander at elexis.nl> wrote:
>
> Hi all,
>
> We did a quick benchmark on our 2 node rhel4 testcluster with gfs and
> a gnbd storage server. The results were very sad. One of the nodes
> (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when
> running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node
> GFS over GNBD setup and inserts on both nodes at the same time, we
> only could do 80 inserts per second. I'm very interested in the
> perfomance others got in a similar setup. Would the performance
> increase when we use software based iscsi instead of gnbd?
> Or should we simply buy SAN equipment? Does anyone have statistics to
> compare a standalone mysql setup to a small gfs cluster using a san?
>
>
> With best regards,
> Sander van Beek
>
> ---------------------------------------
>
> Ing. S. van Beek
> Elexis
> Marketing 9
> 6921 RE Duiven
> The Netherlands
>
> Tel:    +31 (0)26 7110329
> Mob:    +31 (0)6 28395109
> Fax:    +31 (0)318 611112
> Email: sander at elexis.nl
> Web:    http://www.elexis.nl
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20060430/d7740feb/attachment.htm>