From raju.rajsand at gmail.com Fri Jan 2 05:49:38 2009
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Fri, 2 Jan 2009 11:19:38 +0530
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com>
Message-ID: <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com>

Greetings,

On Wed, Dec 31, 2008 at 10:30 PM, Paras pradhan wrote:
> Pulled the heartbeat network cable from node1. Nothing happens. BUT
> when I plug the cable back in, node1 restarted. What am I missing here.

The heartbeat network cable should be out for at least 20-30 seconds.

If you have connected the data and heartbeat cables to the same switch, you may need to pull out both.

Incidentally, you will have to enable multicasting for the heartbeat network in the switch if it is a managed switch, and assign a separate VLAN for it. There have been cases in the recent past where some of the switches...

> Also I don't see anything interesting in /var/log/messages in
> node1 after I disconnect the cable.

Have you checked node2?

HTH

With warm regards

Rajagopal

From ccaulfie at redhat.com Fri Jan 2 08:34:38 2009
From: ccaulfie at redhat.com (Chrissie Caulfield)
Date: Fri, 02 Jan 2009 08:34:38 +0000
Subject: [Linux-cluster] i rpmbuild the cman on linux as4 IBM power, it does not work.
In-Reply-To: <4957B2AA.096868.02362@m50-132.163.com>
References: <4957B2AA.096868.02362@m50-132.163.com>
Message-ID: <495DD19E.4020801@redhat.com>

victory.xu wrote:
> when I run "service cman start"
> the error in /var/log/messages is:
>
> kernel: ioctl32(cman_tool:5382): Unknown cmd fd(3) cmd(2000780b){' '} arg(42000422) on socket:[17147]

At a very quick guess, that looks like the tools have been built as 32-bit while the kernel is 64-bit. There is no 32/64 compatibility layer in cman for RHEL4; they must be the same word size.

> the ccsd has been started
>
> I don't know why
>
> victory.xu
> july_snow at 163.com
> 2008-12-29
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Chrissie
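For what it's worth, that kind of 32/64-bit mismatch can be confirmed quickly by comparing the kernel architecture with the word size of the installed tools, for example (the install path of cman_tool may vary, so the shell is asked to find it):

    uname -m                  # kernel architecture, e.g. ppc64
    file $(which cman_tool)   # reports whether the binary is a 32-bit or 64-bit ELF executable

If the first command reports a 64-bit architecture and the second a 32-bit executable, the userspace packages were built for the wrong word size.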
From pradhanparas at gmail.com Fri Jan 2 22:48:51 2009
From: pradhanparas at gmail.com (Paras pradhan)
Date: Fri, 2 Jan 2009 16:48:51 -0600
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com>
Message-ID: <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com>

On Thu, Jan 1, 2009 at 11:49 PM, Rajagopal Swaminathan wrote:
> Greetings,

Thanks for following up with your replies. I really appreciate it.

> On Wed, Dec 31, 2008 at 10:30 PM, Paras pradhan wrote:
>>
>> Pulled the heartbeat network cable from node1. Nothing happens. BUT
>> when I plug the cable back in, node1 restarted. What am I missing here.
>
> The heartbeat network cable should be out for at least 20-30 seconds.

Yes, I waited more than 20-30 seconds (around 2-3 minutes). It didn't reboot. But as I said, when I plug the cable back into the network port, the node reboots.

> If you have connected the data and heartbeat cables to the same
> switch, you may need to pull out both.

Each of my nodes has only one network interface card, so my heartbeat and data traffic share the same single cable, if I understand you correctly.

> Incidentally, you will have to enable multicasting for the heartbeat
> network in the switch if it is a managed switch, and assign a separate
> VLAN for it. There have been cases in the recent past where some of the
> switches...

Here I am using 4 nodes.

Node 1) Runs luci
Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
Node 3) First node in my two node cluster
Node 4) Second node in my two node cluster

All of them are connected simply to an unmanaged 16 port switch.

>> Also I don't see anything interesting in /var/log/messages in
>> node1 after I disconnect the cable.
>
> Have you checked node2?

Nothing in the node2 log either.

> HTH
>
> With warm regards
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

Thanks!

Paras.

From jngarratt at gmail.com Mon Jan 5 06:10:13 2009
From: jngarratt at gmail.com (James Garratt)
Date: Mon, 5 Jan 2009 17:10:13 +1100
Subject: [Linux-cluster] clvm running with redundant gnbd servers
Message-ID: <314e34340901042210t8aa4162je75ca81d82f66be4@mail.gmail.com>

I'm setting up a GNBD cluster with clvmd on the clients for the purpose of running a Xen cluster. I've been playing with this for a few months now and I've almost got everything working. However, I still have one outstanding issue that I can't find an answer to, even after extensive searches of the documentation and Google.

My setup:

2 GNBD servers (running RHEL5)
5 GNBD clients (running CentOS5)

The GNBD servers are connected to a SAN via redundant paths. The servers export multiple GNBDs with different names but with matching UIDs for each device they export. The clients import all GNBDs from each server. multipath.conf has been configured on the clients to see the GNBDs, and lvm.conf has been configured on the clients to filter everything except the local disks and /dev/mpath/*.

My problem is that if I put the two GNBD servers in the same cluster as the GNBD clients, I get warnings because the servers can't see the volume groups being used by the clients. If I put the servers in a separate cluster, fencing cannot work properly in the event of a server crash, and multipath locks up until the server is running again.

Is there a way to tell clvm to ignore some of the cluster nodes, or is there another solution to this problem?

Any advice or pointers to relevant documentation would be appreciated.

Regards,

James Garratt

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
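For reference, the kind of lvm.conf filter described above (accept the local disks and the multipath devices, reject everything else) usually looks something like the line below; /dev/sda is only an example here, and the patterns have to match the actual local disk and multipath naming on each client:

    filter = [ "a|^/dev/sda|", "a|^/dev/mpath/|", "r|.*|" ]

With clvmd running on the clients, locking_type = 3 is also normally set in the same file.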
From raju.rajsand at gmail.com Mon Jan 5 14:23:36 2009
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Mon, 5 Jan 2009 19:53:36 +0530
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com>
Message-ID: <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com>

Greetings,

On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan wrote:
>
> Here I am using 4 nodes.
>
> Node 1) Runs luci
> Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
> Node 3) First node in my two node cluster
> Node 4) Second node in my two node cluster
>
> All of them are connected simply to an unmanaged 16 port switch.

Luci does not require a separate node to run. It can run on one of the member nodes (node 3 | 4).

What does clustat say?

Can you post your cluster.conf here?

When you pull out the network cable *and* plug it back in on, say, node 3, what messages appear in /var/log/messages on node 4 (if any)? (Sorry for the repetition, but messages are necessary here to make any sense of the situation.)

HTH

With warm regards

Rajagopal

From pradhanparas at gmail.com Mon Jan 5 18:11:24 2009
From: pradhanparas at gmail.com (Paras pradhan)
Date: Mon, 5 Jan 2009 12:11:24 -0600
Subject: [Linux-cluster] Re: Fencing test
In-Reply-To: <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com>
References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com>
Message-ID: <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com>

Hi,

On Mon, Jan 5, 2009 at 8:23 AM, Rajagopal Swaminathan wrote:
> Greetings,
>
> On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan wrote:
>>
>> Here I am using 4 nodes.
>>
>> Node 1) Runs luci
>> Node 2) This is my iSCSI shared storage, where my virtual machine(s) reside
>> Node 3) First node in my two node cluster
>> Node 4) Second node in my two node cluster
>>
>> All of them are connected simply to an unmanaged 16 port switch.
>
> Luci does not require a separate node to run. It can run on one of the
> member nodes (node 3 | 4).

OK.

>
> What does clustat say?

Here is my clustat o/p:

-----------
[root at ha1lx ~]# clustat
Cluster Status for ipmicluster @ Mon Jan 5 12:00:10 2009
Member Status: Quorate

Member Name                 ID   Status
------ ----                 ---- ------
10.42.21.29                  1   Online, rgmanager
10.42.21.27                  2   Online, Local, rgmanager

Service Name                Owner (Last)        State
------- ----                ----- ------        -----
vm:linux64                  10.42.21.27         started
[root at ha1lx ~]#
------------------------

10.42.21.27 is node3 and 10.42.21.29 is node4

>
> Can you post your cluster.conf here?
Here is my cluster.conf -- [root at ha1lx cluster]# more cluster.conf ------ Here: 10.42.21.28 is IPMI interface in node3 10.42.21.30 is IPMI interface in node4 > > When you pull out the network cable *and* plug it back in say node 3, > , what messages appear in the /var/log/messages if Node 4 (if any)? > (sorry for the repitition, but messages are necessary here to make any > sense of the situation) > Ok here is the log in node 4 after i disconnect the network cable in node3. ----------- Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] The token was lost in the OPERATIONAL state. Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes). Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] entering GATHER state from 2. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering GATHER state from 0. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Creating commit token because I am the rep. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Saving state aru 76 high seq received 76 Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Storing new sequence id for ring ac Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering COMMIT state. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.29: Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] previous ring seq 168 rep 10.42.21.27 Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] aru 76 high delivered 76 received flag 1 Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Did not need to originate any messages in recovery. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Sending initial ORF token Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:05:28 ha2lx kernel: dlm: closing connection to node 2 Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:05:28 ha2lx fenced[5004]: 10.42.21.27 not a cluster member after 0 sec post_fail_delay Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Trying to acquire journal lock... Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:05:28 ha2lx openais[4988]: [SYNC ] This node is within the primary component and will provide service. Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] got nodejoin message 10.42.21.29 Jan 5 12:05:28 ha2lx openais[4988]: [CPG ] got joinlist message from node 1 Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Looking at journal... Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Acquiring the transaction lock... Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Replaying journal... 
Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Replayed 0 of 0 blocks Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Found 0 revoke tags Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Journal replayed in 1s Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Done ------------------ Now when I plug back my cable to node3, node 4 reboots and here is the quickly grabbed log in node4 -- Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering GATHER state from 11. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Saving state aru 1d high seq received 1d Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Storing new sequence id for ring b0 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering COMMIT state. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.27: Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep 10.42.21.27 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 16 high delivered 16 received flag 1 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [1] member 10.42.21.29: Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep 10.42.21.29 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 1d high delivered 1d received flag 1 Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Did not need to originate any messages in recovery. Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) Jan 5 12:07:12 ha2lx openais[4988]: [SYNC ] This node is within the primary component and will provide service. Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27 because it has rejoined the cluster with existing state Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd_dispatch error -1 errno 11 Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd connection died Jan 5 12:07:12 ha2lx gfs_controld[5016]: cluster is down, exiting Jan 5 12:07:12 ha2lx dlm_controld[5010]: cluster is down, exiting Jan 5 12:07:12 ha2lx kernel: dlm: closing connection to node 1 Jan 5 12:07:12 ha2lx fenced[5004]: cluster is down, exiting ------- Also here is the log of node3: -- [root at ha1lx ~]# tail -f /var/log/messages Jan 5 12:07:24 ha1lx openais[26029]: [TOTEM] entering OPERATIONAL state. Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 Jan 5 12:07:24 ha1lx openais[26029]: [CPG ] got joinlist message from node 2 Jan 5 12:07:27 ha1lx ccsd[26019]: Attempt to close an unopened CCS descriptor (4520670). 
Jan 5 12:07:27 ha1lx ccsd[26019]: Error while processing disconnect: Invalid request descriptor Jan 5 12:07:27 ha1lx fenced[26045]: fence "10.42.21.29" success Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Trying to acquire journal lock... Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Looking at journal... Jan 5 12:07:28 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Done ---------------- > HTH > > With warm regards > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Thanks a lot Paras. From Joseph.Greenseid at ngc.com Mon Jan 5 20:18:10 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Mon, 5 Jan 2009 14:18:10 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster Message-ID: hi all, i am trying to add a new node to an existing 3 node GFS cluster. i followed the steps in the online docs for this, so i went onto the 1st node in my existing cluster, run system-config-cluster, added a new node and fence for it, then propagated that out to the existing nodes, and scp'd the cluster.conf file to the new node. at that point, i confirmed that multipath and mdadm config files were synced with my other nodes, the new node can properly see the SAN that they're all sharing, etc. i then started cman, which seemed to start without any trouble. i tried to start clvmd, but it says: Activating VGs: Skipping clustered volume group san01 my VG is named "san01," so it can see the volume group, it just won't activate it for some reason. any ideas what i'm doing wrong? thanks, --Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Mon Jan 5 20:25:36 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 5 Jan 2009 15:25:36 -0500 (EST) Subject: [Linux-cluster] problem adding new node to an existing cluster In-Reply-To: Message-ID: <868569604.2835591231187135219.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "Joseph M. Greenseid" wrote: | hi all, | | i am trying to add a new node to an existing 3 node GFS cluster. | | i followed the steps in the online docs for this, so i went onto the | 1st node in my existing cluster, run system-config-cluster, added a | new node and fence for it, then propagated that out to the existing | nodes, and scp'd the cluster.conf file to the new node. | | at that point, i confirmed that multipath and mdadm config files were | synced with my other nodes, the new node can properly see the SAN that | they're all sharing, etc. | | i then started cman, which seemed to start without any trouble. i | tried to start clvmd, but it says: | | Activating VGs: Skipping clustered volume group san01 | | my VG is named "san01," so it can see the volume group, it just won't | activate it for some reason. any ideas what i'm doing wrong? | | thanks, | --Joe Hi Joe, Make sure that you have clvmd service running on the new node ("chkconfig clvmd on" and/or "service clvmd start" as necessary). Also, make sure the lock_type is 2 (RHEL4/similar) or 3 (RHEL5/similar) in the /etc/lvm/lvm.conf file. Regards, Bob Peterson Red Hat GFS From Joseph.Greenseid at ngc.com Mon Jan 5 20:28:12 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) 
Date: Mon, 5 Jan 2009 14:28:12 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: <868569604.2835591231187135219.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: ---- "Joseph M. Greenseid" wrote: | hi all, | | i am trying to add a new node to an existing 3 node GFS cluster. | | i followed the steps in the online docs for this, so i went onto the | 1st node in my existing cluster, run system-config-cluster, added a | new node and fence for it, then propagated that out to the existing | nodes, and scp'd the cluster.conf file to the new node. | | at that point, i confirmed that multipath and mdadm config files were | synced with my other nodes, the new node can properly see the SAN that | they're all sharing, etc. | | i then started cman, which seemed to start without any trouble. i | tried to start clvmd, but it says: | | Activating VGs: Skipping clustered volume group san01 | | my VG is named "san01," so it can see the volume group, it just won't | activate it for some reason. any ideas what i'm doing wrong? | | thanks, | --Joe > Hi Joe, > Make sure that you have clvmd service running on the new node > ("chkconfig clvmd on" and/or "service clvmd start" as necessary). Hi Bob, Yes, this problem started when I tried to start clvmd (/sbin/service clvmd start). > Also, make sure the lock_type is 2 (RHEL4/similar) or 3 (RHEL5/similar) > in the /etc/lvm/lvm.conf file. Ah, Ok, I believe this may be the trouble. My lock_type was 1. I'll change it and try again. Thanks. --Joe > Regards, > Bob Peterson > Red Hat GFS -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4399 bytes Desc: not available URL: From Joseph.Greenseid at ngc.com Mon Jan 5 21:10:29 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Mon, 5 Jan 2009 15:10:29 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: Message-ID: > Also, make sure the lock_type is 2 (RHEL4/similar) or 3 (RHEL5/similar) > in the /etc/lvm/lvm.conf file. This fixed it. Thanks. --Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From Joseph.Greenseid at ngc.com Mon Jan 5 22:01:45 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Mon, 5 Jan 2009 16:01:45 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: Message-ID: Hi, I have a new question. When I created this file system a year ago, I didn't anticipate needing any additional nodes other than the original 3 I set up. Consequently, I have 3 journals. Now that I've been told to add a fourth node, is there a way to add a journal to an existing file system that resides on a volume that has not been expanded (the docs appear to read that you can only do it to an expanded volume because the additional journal(s) take up additional space). My file system isn't full, though my volume is fully used by the formatted GFS file system. Is there anything I can do that won't involve destroying my existing file system? Thanks, --Joe -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 3699 bytes Desc: not available URL: From rpeterso at redhat.com Mon Jan 5 23:09:18 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 5 Jan 2009 18:09:18 -0500 (EST) Subject: [Linux-cluster] problem adding new node to an existing cluster In-Reply-To: <1380566121.21231196900140.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: <291064814.51231196957732.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "Joseph M. Greenseid" wrote: | Hi, | | I have a new question. When I created this file system a year ago, I | didn't anticipate needing any additional nodes other than the original | 3 I set up. Consequently, I have 3 journals. Now that I've been told | to add a fourth node, is there a way to add a journal to an existing | file system that resides on a volume that has not been expanded (the | docs appear to read that you can only do it to an expanded volume | because the additional journal(s) take up additional space). My file | system isn't full, though my volume is fully used by the formatted GFS | file system. | | Is there anything I can do that won't involve destroying my existing | file system? | | Thanks, | --Joe Hi Joe, Journals for gfs file systems are carved out during mkfs. The rest of the space is used for data and metadata. So there are only two ways to make journals: (1) Do another mkfs which will destroy your file system or (2) if you're using lvm, add more storage with something like lvresize or lvextend, then use gfs_jadd to add the new journal to the new chunk of storage. We realize that's a pain, and that's why we took away that restriction in gfs2. In gfs2, journals are kept as a hidden part of the file system, so they can be added painlessly to an existing file system without adding storage. So I guess a third option would be to convert the file system to gfs2 using gfs2_convert, add the journal with gfs2_jadd, then use it as gfs2 from then on. But please be aware that gfs2_convert had some serious problems until the 5.3 version that was committed to the cluster git tree in December, (i.e. the very latest and greatest "RHEL5", "RHEL53", "master", "STABLE2" or "STABLE3" versions in the cluster git (source code) tree.) Make ABSOLUTELY CERTAIN that you have a working & recent backup and restore option before you try this. Also, the GFS2 kernel code prior to 5.3 is considered tech preview as well, so not ready for production use. So if you're not building from source code, you should wait until RHEL5.3 or Centos5.3 (or similar) before even considering this option. Regards, Bob Peterson Red Hat GFS From Joseph.Greenseid at ngc.com Tue Jan 6 13:57:21 2009 From: Joseph.Greenseid at ngc.com (Greenseid, Joseph M.) Date: Tue, 6 Jan 2009 07:57:21 -0600 Subject: [Linux-cluster] problem adding new node to an existing cluster References: <291064814.51231196957732.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: ---- "Joseph M. Greenseid" wrote: | Hi, | | I have a new question. When I created this file system a year ago, I | didn't anticipate needing any additional nodes other than the original | 3 I set up. Consequently, I have 3 journals. Now that I've been told | to add a fourth node, is there a way to add a journal to an existing | file system that resides on a volume that has not been expanded (the | docs appear to read that you can only do it to an expanded volume | because the additional journal(s) take up additional space). 
My file | system isn't full, though my volume is fully used by the formatted GFS | file system. | | Is there anything I can do that won't involve destroying my existing | file system? | | Thanks, | --Joe > Hi Joe, > Journals for gfs file systems are carved out during mkfs. The rest of the > space is used for data and metadata. So there are only two ways to > make journals: (1) Do another mkfs which will destroy your file system > or (2) if you're using lvm, add more storage with something like > lvresize or lvextend, then use gfs_jadd to add the new journal to the > new chunk of storage. > Ok, so I did understand correctly. That's at least something positive. :) > We realize that's a pain, and that's why we took away that restriction > in gfs2. In gfs2, journals are kept as a hidden part of the file system, > so they can be added painlessly to an existing file system without > adding storage. So I guess a third option would be to convert the file > system to gfs2 using gfs2_convert, add the journal with gfs2_jadd, then > use it as gfs2 from then on. But please be aware that gfs2_convert had some > serious problems until the 5.3 version that was committed to the cluster > git tree in December, (i.e. the very latest and greatest "RHEL5", "RHEL53", > "master", "STABLE2" or "STABLE3" versions in the cluster git (source code) > tree.) Make ABSOLUTELY CERTAIN that you have a working & recent backup and > restore option before you try this. Also, the GFS2 kernel code prior to > 5.3 is considered tech preview as well, so not ready for production use. > So if you're not building from source code, you should wait until RHEL5.3 > or Centos5.3 (or similar) before even considering this option. > Ok, I have an earlier version of GFS2, so I guess I'm going to need to sit down and figure out a better strategy for what I've been asked to do. I appreciate the help with my questions, though. Thanks again. --Joe > Regards, > > Bob Peterson > Red Hat GFS

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From duplessis.jacques at gmail.com Tue Jan 6 23:56:56 2009
From: duplessis.jacques at gmail.com (Jacques Duplessis)
Date: Tue, 6 Jan 2009 18:56:56 -0500
Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 57, Issue 5
In-Reply-To: <20090106170010.7AFD58E00FA@hormel.redhat.com>
References: <20090106170010.7AFD58E00FA@hormel.redhat.com>
Message-ID: <6d89d2a30901061556t4e6d66b6x7a4dd48a50e2dd80@mail.gmail.com>

# Add these lines to the syslog.conf file & restart syslog
# ========================================================
# vi /etc/syslog.conf
# rgmanager log
local4.* /var/log/rgmanager

# Create the log file before restarting syslog
# ========================================================
# touch /var/log/rgmanager
# chmod 644 /var/log/rgmanager
# chown root.root /var/log/rgmanager
# service syslog restart
Shutting down kernel logger: [ OK ]
Shutting down system logger: [ OK ]
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]

# Change the cluster config file to log rgmanager info
# ========================================================
# vi /etc/cluster/cluster.conf
change the line to

# Push the changes to all cluster nodes
# ========================================================
# ccs_tool update /etc/cluster/cluster.conf

Unplug and plug back the network cable on the node and look at the /var/log/rgmanager file. It may contain useful info for us.
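The cluster.conf attribute that the "change the line to" step refers to was lost with the scrubbed HTML part of this message, so purely as an illustration: rgmanager logging on RHEL5 is normally controlled by log_level and log_facility attributes on the rm tag, along the lines of

    <rm log_facility="local4" log_level="7">

where log_level follows the usual syslog levels (7 being debug) and log_facility has to match the facility used in syslog.conf above. Remember to increment config_version at the top of cluster.conf before running ccs_tool update, otherwise the new configuration will not be pushed to the other nodes.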
From garromo at us.ibm.com Wed Jan 7 20:39:42 2009
From: garromo at us.ibm.com (Gary Romo)
Date: Wed, 7 Jan 2009 13:39:42 -0700
Subject: [Linux-cluster] system-config-cluster Error
Message-ID:

When I opened system-config-cluster today, I got this error;

Poorly Formed XML Error

A problem was encountered while reading configuration file /etc/cluster/cluster.conf
Details or the error appear below. Click the `New` button to create a new configuration file.
To continue anyway (Not recommended), click the `Ok` button Relax-NG validity error : Extra element rm in interleave /etc/cluster/cluster.conf:2: element cluster: Relax-NG validity error : Element cluster failed to validate content /etc/cluster/cluster.conf fails to validate Can anyone tell me what this is and how to correct? Thanks! Gary Romo -------------- next part -------------- An HTML attachment was scrubbed... URL: From jumanjiman at gmail.com Wed Jan 7 21:06:33 2009 From: jumanjiman at gmail.com (Paul Morgan) Date: Wed, 7 Jan 2009 15:06:33 -0600 Subject: [Linux-cluster] system-config-cluster Error In-Reply-To: References: Message-ID: <07646F01-ED12-430D-97AF-5CDCD33CDC7D@gmail.com> On Jan 7, 2009, at 14:39, Gary Romo wrote: > When I opened system-config-cluster today, I got this error; > > Poorly Formed XML Error > > A problem was encountered while reading configuration file /etc/ > cluster/cluster.conf > Details or the error appear below. Click the `New` button to create > a new configuration file. > To continue anyway (Not recommended), click the `Ok` button > > Relax-NG validity error : Extra element rm in interleave > /etc/cluster/cluster.conf:2: element cluster: Relax-NG validity > error : Element cluster failed to validate content > /etc/cluster/cluster.conf fails to validate > > Can anyone tell me what this is and how to correct? Thanks! > > Gary Romo > Assuming you have a functional cluster: Somebody-maybe you or another admin-used luci to modify the cluster. s-c-cluster uses an older XML or doesn't perfectly validate luci's version. I ignore the validation error and have yet to see any fallout. hth, -paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From Gary_Hunt at gallup.com Wed Jan 7 21:46:52 2009 From: Gary_Hunt at gallup.com (Hunt, Gary) Date: Wed, 7 Jan 2009 15:46:52 -0600 Subject: [Linux-cluster] DELL M600 fencing Message-ID: Hello New to this list and am trying to get a cluster up and running. I noticed someone added support to the fence_drac agent to support the Dell CMC. Could I get a link to the repository where the patched agent is at? Thanks Gary Hunt -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Wed Jan 7 21:57:16 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 7 Jan 2009 16:57:16 -0500 (EST) Subject: [Linux-cluster] system-config-cluster Error In-Reply-To: Message-ID: <1034294076.526991231365436502.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "Gary Romo" wrote: | When I opened system-config-cluster today, I got this error; | | Poorly Formed XML Error | | A problem was encountered while reading configuration file | /etc/cluster/cluster.conf | Details or the error appear below. Click the `New` button to create a | new configuration file. | To continue anyway (Not recommended), click the `Ok` button | | Relax-NG validity error : Extra element rm in interleave | /etc/cluster/cluster.conf:2: element cluster: Relax-NG validity error | : Element cluster failed to validate content | /etc/cluster/cluster.conf fails to validate | | Can anyone tell me what this is and how to correct? Thanks! | | Gary Romo Hi Gary, Could it be: http://sources.redhat.com/cluster/wiki/FAQ/GUI#gui_validityerror Without seeing your cluster.conf it's hard to tell if it's a "real" error. 
Regards,

Bob Peterson
Red Hat GFS

From garromo at us.ibm.com Wed Jan 7 23:48:37 2009
From: garromo at us.ibm.com (Gary Romo)
Date: Wed, 7 Jan 2009 16:48:37 -0700
Subject: [Linux-cluster] system-config-cluster Error
In-Reply-To: <07646F01-ED12-430D-97AF-5CDCD33CDC7D@gmail.com>
Message-ID:

Paul Morgan (sent by linux-cluster-bounces at redhat.com) wrote to linux
clustering on 01/07/2009 02:06 PM, Subject: Re: [Linux-cluster]
system-config-cluster Error (please respond to linux clustering):

On Jan 7, 2009, at 14:39, Gary Romo wrote:

When I opened system-config-cluster today, I got this error;

Poorly Formed XML Error

A problem was encountered while reading configuration file
/etc/cluster/cluster.conf
Details or the error appear below. Click the `New` button to create
a new configuration file.
To continue anyway (Not recommended), click the `Ok` button

Relax-NG validity error : Extra element rm in interleave
/etc/cluster/cluster.conf:2: element cluster: Relax-NG validity error :
Element cluster failed to validate content
/etc/cluster/cluster.conf fails to validate

Can anyone tell me what this is and how to correct? Thanks!

Gary Romo

Assuming you have a functional cluster:

Somebody (maybe you or another admin) used luci to modify the cluster.
s-c-cluster uses an older XML or doesn't perfectly validate luci's
version. I ignore the validation error and have yet to see any fallout.

hth,
-paul

--

We do have a functional cluster. luci was used. As long as there is no
fallout. Thank you for your explanation!

Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stewart at epits.com.au Thu Jan 8 02:42:29 2009
From: stewart at epits.com.au (Stewart Walters)
Date: Thu, 08 Jan 2009 11:42:29 +0900
Subject: [Linux-cluster] cman_tool nodes shows different Inc numbers; should I be concerned?
Message-ID: <49656815.6070000@epits.com.au>

Hello List Members,

I've just joined, so please forgive me in advance if I break some list
etiquette :-)

I have a two node cluster (RHEL5) whereby running "cman_tool nodes" on
each node nets the following results:

[root at node01 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    512   2009-01-08 10:59:53  node01.example.com
   2   M    516   2009-01-08 10:59:54  node02.example.com

[root at node02 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    516   2009-01-08 10:59:53  node01.example.com
   2   M    504   2009-01-08 10:35:59  node02.example.com

As you can see the "Inc" numbers are seen as different from both nodes.

First off, should I be concerned that they are different?

And secondly, what does the Inc number signify anyway? The man page for
cman_tool doesn't directly describe what an Inc number is for. I think
in my travels in trying to answer this question I found a vague
reference to the fact that it's something to do with openais, but I
wouldn't mind if someone could confirm this and/or hit me over the head
with the clue stick.
From the manpage for cman_tool:

Example:

In this example we have a five node cluster that has experienced a
network partition. Here is the output of cman_tool nodes from all
systems:

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   d   2372   2007-11-05 02:58:55  node-01.example.com
   2   M   2376   2007-11-05 02:58:56  node-02.example.com
   3   M   2376   2007-11-05 02:58:56  node-03.example.com
   4   d   2376   2007-11-05 02:58:56  node-04.example.com
   5   d   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   d   2372   2007-11-05 02:58:55  node-01.example.com
   2   M   2376   2007-11-05 02:58:56  node-02.example.com
   3   M   2376   2007-11-05 02:58:56  node-03.example.com
   4   d   2376   2007-11-05 02:58:56  node-04.example.com
   5   d   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

At least in the man page example, node-01 consistently has Inc number
2372, as seen consistently from all nodes. But as you can see in my
cluster, both nodes register a different Inc number for themselves and
the other.

Thanks in advance for any information you can provide me regarding this.

Kind Regards,

Stewart

From ccaulfie at redhat.com Thu Jan 8 08:25:56 2009
From: ccaulfie at redhat.com (Chrissie Caulfield)
Date: Thu, 08 Jan 2009 08:25:56 +0000
Subject: [Linux-cluster] cman_tool nodes shows different Inc numbers; should I be concerned?
In-Reply-To: <49656815.6070000@epits.com.au>
References: <49656815.6070000@epits.com.au>
Message-ID: <4965B894.7080807@redhat.com>

Stewart Walters wrote:
> Hello List Members,
>
> I've just joined, so please forgive me in advance if I break some list
> etiquette :-)
>
> I have a two node cluster (RHEL5) whereby running "cman_tool nodes" on
> each node nets the following results:
>
> [root at node01 ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    512   2009-01-08 10:59:53  node01.example.com
>    2   M    516   2009-01-08 10:59:54  node02.example.com
>
> [root at node02 ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M    516   2009-01-08 10:59:53  node01.example.com
>    2   M    504   2009-01-08 10:35:59  node02.example.com
>
> As you can see the "Inc" numbers are seen as different from both nodes.
>
> First off, should I be concerned that they are different?

No, it's perfectly normal that they are different.

> And secondly, what does the Inc number signify anyway? The man page for
> cman_tool doesn't directly describe what an Inc number is for. I think
> in my travels in trying to answer this question I found a vague
> reference to the fact that it's something to do with openais, but I
> wouldn't mind if someone could confirm this and/or hit me over the head
> with the clue stick.
>

Inc is the cluster incarnation number at the time the node joined.
It's a totally pointless piece of information that I think we'll remove in future releases ;-) Chrissie From Brett.Dellegrazie at intact-is.com Thu Jan 8 11:48:49 2009 From: Brett.Dellegrazie at intact-is.com (Brett Delle Grazie) Date: Thu, 8 Jan 2009 11:48:49 -0000 Subject: [Linux-cluster] Load share http servers using clusterip with failover Message-ID: Hi, I have a configured two-node cluster with some GFS file systems on them. Those servers also run http servers and I'd like to load-share the HTTP servers without putting a hardware load balancer in front of them. I read about clusterIP: http://www.linux-ha.org/ClusterIP and was wondering if anyone has managed to use this iptables capability to get a service running in load-shared fashion across multiple nodes with the failover of a node handled by rgmanager? Is there an example of this anywhere that someone could point me to? Has anyone got a resource script of this type they would be willing to share? Thanks in advance, Best regards, Brett ______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email ______________________________________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From td3201 at gmail.com Thu Jan 8 12:07:13 2009 From: td3201 at gmail.com (Terry) Date: Thu, 8 Jan 2009 06:07:13 -0600 Subject: [Linux-cluster] failover domain not working as expected Message-ID: <8ee061010901080407j3b4162e5r308f965da80cf62a@mail.gmail.com> Hello, I have an NFS cluster that isn't quite working as expected. I intend to distribute several volumes between both nodes of my cluster and in the event one node goes down, the other picks up the full load. I had a situation where I had to reboot one of the nodes. I did so and all the services were restarted on the other node, which is great. Then, after a minute or so, some of the services stopped and stayed stopped. Here are some relevant parts of my config, anyone see anything unusual:? From jeff.jansen at kkoncepts.net Thu Jan 8 13:22:59 2009 From: jeff.jansen at kkoncepts.net (Jeff Jansen) Date: Thu, 08 Jan 2009 21:22:59 +0800 Subject: [Linux-cluster] Qdisk in initial quorum Message-ID: <4965FE33.10509@kkoncepts.net> Is it possible to use a qdisk to ATTAIN quorum or does it only SUSTAIN quorum? I have a STABLE2 version 2 node cluster that is set up with 'expected_votes="3"'. There are two physical nodes and a qdisk, which at the moment is simply a ping heuristic. But on start-up qdiskd can't run unless the cluster already has a quorum. I see this in the logs when qdiskd is started: qdiskd[2624]: Connection to CCSD failed; cannot start qdiskd[2624]: Configuration failed ccsd[3258]: Cluster is not quorate. Refusing connection. ccsd[3258]: Error while processing connect: Connection refused Once the two nodes join together and form a quorum, then qdiskd (if it's restarted) will start correctly on both nodes and becomes part of the quorum. >From then everything happens as expected and one node can maintain quorum as long as it can "see" the qdisk. But I'd like the qdisk to be used to ATTAIN quorum at start up if necessary. If the whole cluster gets shut down (which actually happened a while ago when our data center had a "power incident") :-) and only one node boots back up for some reason, then I'd like it to form a quorum with the qdisk. 
But at the moment it doesn't seem possible since qdiskd refuses to start without a pre-existing quorum. TIA Jeff Jansen From pradhanparas at gmail.com Thu Jan 8 18:39:10 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Thu, 8 Jan 2009 12:39:10 -0600 Subject: [Linux-cluster] Re: Fencing test In-Reply-To: <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com> <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> Message-ID: <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> On Mon, Jan 5, 2009 at 12:11 PM, Paras pradhan wrote: > hi, > > On Mon, Jan 5, 2009 at 8:23 AM, Rajagopal Swaminathan > wrote: >> Greetings, >> >> On Sat, Jan 3, 2009 at 4:18 AM, Paras pradhan wrote: >>> >>> Here I am using 4 nodes. >>> >>> Node 1) That runs luci >>> Node 2) This is my iscsi shared storage where my virutal machine(s) resides >>> Node 3) First node in my two node cluster >>> Node 4) Second node in my two node cluster >>> >>> All of them are connected simply to an unmanaged 16 port switch. >> >> Luci need not require a separate node to run. it can run on one of the >> member nodes (node 3 | 4). > > OK. > >> >> what does clustat say? > > Here is my clustat o/p: > > ----------- > > [root at ha1lx ~]# clustat > Cluster Status for ipmicluster @ Mon Jan 5 12:00:10 2009 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > 10.42.21.29 1 > Online, rgmanager > 10.42.21.27 2 > Online, Local, rgmanager > > Service Name > Owner (Last) State > ------- ---- > ----- ------ ----- > vm:linux64 > 10.42.21.27 > started > [root at ha1lx ~]# > ------------------------ > > > 10.42.21.27 is node3 and 10.42.21.29 is node4 > > > >> >> Can you post your cluster.conf here? > > Here is my cluster.conf > > -- > [root at ha1lx cluster]# more cluster.conf > > > > > > > > > > > > > > > > > > > > > > login="admin" name="fence1" passwd="admin"/> > login="admin" name="fence2" passwd="admin"/> > > > > > > > > > > name="linux64" path="/guest_roots" recovery="restart"/> > > > ------ > > > Here: > > 10.42.21.28 is IPMI interface in node3 > 10.42.21.30 is IPMI interface in node4 > > > > > > > > >> >> When you pull out the network cable *and* plug it back in say node 3, >> , what messages appear in the /var/log/messages if Node 4 (if any)? >> (sorry for the repitition, but messages are necessary here to make any >> sense of the situation) >> > > Ok here is the log in node 4 after i disconnect the network cable in node3. > > ----------- > > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] The token was lost in the > OPERATIONAL state. > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Receive multicast socket > recv buffer size (288000 bytes). > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] Transmit multicast socket > send buffer size (262142 bytes). > Jan 5 12:05:24 ha2lx openais[4988]: [TOTEM] entering GATHER state from 2. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering GATHER state from 0. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Creating commit token > because I am the rep. 
> Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Saving state aru 76 high > seq received 76 > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Storing new sequence id > for ring ac > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering COMMIT state. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.29: > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] previous ring seq 168 rep > 10.42.21.27 > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] aru 76 high delivered 76 > received flag 1 > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Did not need to originate > any messages in recovery. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] Sending initial ORF token > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:05:28 ha2lx kernel: dlm: closing connection to node 2 > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:05:28 ha2lx fenced[5004]: 10.42.21.27 not a cluster member > after 0 sec post_fail_delay > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Trying to acquire journal lock... > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:05:28 ha2lx openais[4988]: [SYNC ] This node is within the > primary component and will provide service. > Jan 5 12:05:28 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. > Jan 5 12:05:28 ha2lx openais[4988]: [CLM ] got nodejoin message 10.42.21.29 > Jan 5 12:05:28 ha2lx openais[4988]: [CPG ] got joinlist message from node 1 > Jan 5 12:05:28 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Looking at journal... > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Acquiring the transaction lock... > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Replaying journal... > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Replayed 0 of 0 blocks > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Found 0 revoke tags > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: > jid=1: Journal replayed in 1s > Jan 5 12:05:29 ha2lx kernel: GFS2: fsid=ipmicluster:guest_roots.0: jid=1: Done > ------------------ > > Now when I plug back my cable to node3, node 4 reboots and here is the > quickly grabbed log in node4 > > > -- > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering GATHER state from 11. > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Saving state aru 1d high > seq received 1d > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Storing new sequence id > for ring b0 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering COMMIT state. > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering RECOVERY state. 
> Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [0] member 10.42.21.27: > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep > 10.42.21.27 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 16 high delivered 16 > received flag 1 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] position [1] member 10.42.21.29: > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] previous ring seq 172 rep > 10.42.21.29 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] aru 1d high delivered 1d > received flag 1 > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] Did not need to originate > any messages in recovery. > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] CLM CONFIGURATION CHANGE > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] New Configuration: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.29) > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Left: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] Members Joined: > Jan 5 12:07:12 ha2lx openais[4988]: [CLM ] r(0) ip(10.42.21.27) > Jan 5 12:07:12 ha2lx openais[4988]: [SYNC ] This node is within the > primary component and will provide service. > Jan 5 12:07:12 ha2lx openais[4988]: [TOTEM] entering OPERATIONAL state. > Jan 5 12:07:12 ha2lx openais[4988]: [MAIN ] Killing node 10.42.21.27 > because it has rejoined the cluster with existing state > Jan 5 12:07:12 ha2lx openais[4988]: [CMAN ] cman killed by node 2 > because we rejoined the cluster without a full restart > Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd_dispatch error -1 errno 11 > Jan 5 12:07:12 ha2lx gfs_controld[5016]: groupd connection died > Jan 5 12:07:12 ha2lx gfs_controld[5016]: cluster is down, exiting > Jan 5 12:07:12 ha2lx dlm_controld[5010]: cluster is down, exiting > Jan 5 12:07:12 ha2lx kernel: dlm: closing connection to node 1 > Jan 5 12:07:12 ha2lx fenced[5004]: cluster is down, exiting > ------- > > > Also here is the log of node3: > > -- > [root at ha1lx ~]# tail -f /var/log/messages > Jan 5 12:07:24 ha1lx openais[26029]: [TOTEM] entering OPERATIONAL state. > Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 > Jan 5 12:07:24 ha1lx openais[26029]: [CLM ] got nodejoin message 10.42.21.27 > Jan 5 12:07:24 ha1lx openais[26029]: [CPG ] got joinlist message from node 2 > Jan 5 12:07:27 ha1lx ccsd[26019]: Attempt to close an unopened CCS > descriptor (4520670). > Jan 5 12:07:27 ha1lx ccsd[26019]: Error while processing disconnect: > Invalid request descriptor > Jan 5 12:07:27 ha1lx fenced[26045]: fence "10.42.21.29" success > Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: > jid=0: Trying to acquire journal lock... > Jan 5 12:07:27 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: > jid=0: Looking at journal... > Jan 5 12:07:28 ha1lx kernel: GFS2: fsid=ipmicluster:guest_roots.1: jid=0: Done > ---------------- > > > > > > > > > > > > >> HTH >> >> With warm regards >> >> Rajagopal >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > Thanks a lot > > Paras. 
> In an act to solve my fencing issue in my 2 node cluster, i tried to run fence_ipmi to check if fencing is working or not. I need to know what is my problem - [root at ha1lx ~]# fence_ipmilan -a 10.42.21.28 -o off -l admin -p admin Powering off machine @ IPMI:10.42.21.28...ipmilan: Failed to connect after 30 seconds Failed [root at ha1lx ~]# --------------- Here 10.42.21.28 is an IP address assigned to IPMI interface and I am running this command in the same host. Thanks Paras. From Bevan.Broun at ardec.com.au Thu Jan 8 22:32:53 2009 From: Bevan.Broun at ardec.com.au (Bevan Broun) Date: Fri, 9 Jan 2009 09:32:53 +1100 Subject: [Linux-cluster] Qdisk in initial quorum In-Reply-To: <4965FE33.10509@kkoncepts.net> References: <4965FE33.10509@kkoncepts.net> Message-ID: <6008E5CED89FD44A86D3C376519E1DB2102553963B@megatron.ms.a2end.com> Hi Jeff I set up a 2 node cluster with qdisk and had the behavior you are expecting. At least I get a running cluster with 2 votes when only 1 node is booted up. So it should work. I have And This is on RH-5.1. Bevan Broun Solutions Architect Ardec International http://www.ardec.com.au http://www.lisasoft.com http://www.terrapages.com Sydney ----------------------- Suite 112,The Lower Deck 19-21 Jones Bay Wharf Pirrama Road, Pyrmont 2009 Ph: +61 2 8570 5000 Fax: +61 2 8570 5099 -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Jansen Sent: Friday, 9 January 2009 12:23 AM To: linux clustering Subject: [Linux-cluster] Qdisk in initial quorum Is it possible to use a qdisk to ATTAIN quorum or does it only SUSTAIN quorum? I have a STABLE2 version 2 node cluster that is set up with 'expected_votes="3"'. There are two physical nodes and a qdisk, which at the moment is simply a ping heuristic. But on start-up qdiskd can't run unless the cluster already has a quorum. I see this in the logs when qdiskd is started: qdiskd[2624]: Connection to CCSD failed; cannot start qdiskd[2624]: Configuration failed ccsd[3258]: Cluster is not quorate. Refusing connection. ccsd[3258]: Error while processing connect: Connection refused Once the two nodes join together and form a quorum, then qdiskd (if it's restarted) will start correctly on both nodes and becomes part of the quorum. >From then everything happens as expected and one node can maintain quorum as long as it can "see" the qdisk. But I'd like the qdisk to be used to ATTAIN quorum at start up if necessary. If the whole cluster gets shut down (which actually happened a while ago when our data center had a "power incident") :-) and only one node boots back up for some reason, then I'd like it to form a quorum with the qdisk. But at the moment it doesn't seem possible since qdiskd refuses to start without a pre-existing quorum. TIA Jeff Jansen -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence. 
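The <cman> and <quorumd> lines in the message above did not survive the
list's HTML scrubbing, so the fragment below is only an illustration of a
typical two-node qdisk arrangement along those lines; the label, timings
and the ping target are made-up values, not Bevan's actual settings:

  <cman expected_votes="3" two_node="0"/>
  <quorumd interval="1" tko="10" votes="1" label="qdisk">
    <heuristic program="ping -c1 -w1 192.168.1.254" score="1" interval="2" tko="3"/>
  </quorumd>

With expected_votes="3" and the quorum disk contributing one vote, a single
node that can still reach the qdisk and pass its heuristic holds two of the
three expected votes, which matches the start-up behavior Bevan describes.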
From raju.rajsand at gmail.com Fri Jan 9 04:57:51 2009 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Fri, 9 Jan 2009 10:27:51 +0530 Subject: [Linux-cluster] Re: Fencing test In-Reply-To: <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com> <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> Message-ID: <8786b91c0901082057t63abc80ct6ae041873a859bf@mail.gmail.com> Greetings, On Fri, Jan 9, 2009 at 12:09 AM, Paras pradhan wrote: > > > In an act to solve my fencing issue in my 2 node cluster, i tried to > run fence_ipmi to check if fencing is working or not. I need to know > what is my problem > > - > [root at ha1lx ~]# fence_ipmilan -a 10.42.21.28 -o off -l admin -p admin > Powering off machine @ IPMI:10.42.21.28...ipmilan: Failed to connect > after 30 seconds > Failed > [root at ha1lx ~]# > --------------- > > > Here 10.42.21.28 is an IP address assigned to IPMI interface and I am > running this command in the same host. > Sorry couldn't respond earlier as I do this on personal time (which as useual limited for us IT guys and gals ;-) ) and not during work per se.. Do not run fence script from the node that you want to fence. Let us say you want to fence node 3. 1. Try pinging the node 3's IPMI from node 4. I should be successful 2. Issue the fence command from Node 4 with IP of Node 3 IPMI as argument . HTH With warm regards Rajagopal From pradhanparas at gmail.com Fri Jan 9 05:22:34 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Thu, 8 Jan 2009 23:22:34 -0600 Subject: [Linux-cluster] Re: Fencing test In-Reply-To: <8786b91c0901082057t63abc80ct6ae041873a859bf@mail.gmail.com> References: <8b711df40812301514u3ff824f0wcc16e293fdc581fd@mail.gmail.com> <8b711df40812301526ne581071xd322f6c869955de9@mail.gmail.com> <8786b91c0812302229x115fcb1fse7f3ffe14bb8bbb3@mail.gmail.com> <8b711df40812310900m708256c7n1052df04b1cf0826@mail.gmail.com> <8786b91c0901012149x11805301v8ccf47346cc83b70@mail.gmail.com> <8b711df40901021448s7bfa3693kafb7f5082c30871e@mail.gmail.com> <8786b91c0901050623m46c79628i795e18dda28474c9@mail.gmail.com> <8b711df40901051011x79066243g38108439ffb1075f@mail.gmail.com> <8b711df40901081039m4351f8b9te7d3a2a10e118328@mail.gmail.com> <8786b91c0901082057t63abc80ct6ae041873a859bf@mail.gmail.com> Message-ID: <8b711df40901082122r5de5b6candd56b61090fdc53a@mail.gmail.com> On Thu, Jan 8, 2009 at 10:57 PM, Rajagopal Swaminathan wrote: > Greetings, > > On Fri, Jan 9, 2009 at 12:09 AM, Paras pradhan wrote: >> >> >> In an act to solve my fencing issue in my 2 node cluster, i tried to >> run fence_ipmi to check if fencing is working or not. I need to know >> what is my problem >> >> - >> [root at ha1lx ~]# fence_ipmilan -a 10.42.21.28 -o off -l admin -p admin >> Powering off machine @ IPMI:10.42.21.28...ipmilan: Failed to connect >> after 30 seconds >> Failed >> [root at ha1lx ~]# >> --------------- >> >> >> Here 10.42.21.28 is an IP address assigned to IPMI interface and I am >> running this command in the same host. 
>> > > Sorry couldn't respond earlier as I do this on personal time (which as > useual limited for us IT guys and gals ;-) ) and not during work per > se.. > > Do not run fence script from the node that you want to fence. > > Let us say you want to fence node 3. > 1. Try pinging the node 3's IPMI from node 4. I should be successful > 2. Issue the fence command from Node 4 with IP of Node 3 IPMI as argument . > > > HTH > > With warm regards > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Thanks will try that. Did u get a chance to see my cluster.conf file? Paras. From chattygk at gmail.com Fri Jan 9 09:40:04 2009 From: chattygk at gmail.com (Chaitanya Kulkarni) Date: Fri, 9 Jan 2009 15:10:04 +0530 Subject: [Linux-cluster] About the ccs_test tool Message-ID: <1ad236320901090140t294c2468w3b42cbfa7bfb7347@mail.gmail.com> Hi, I am new to the RHEL cluster. I would like to know how we can write queries for the ccs_test tool and how they actually fetch the information from the cluster. Any help would be much appreciated. Thanks, Chaitanya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chattygk at gmail.com Fri Jan 9 09:43:33 2009 From: chattygk at gmail.com (Chaitanya Kulkarni) Date: Fri, 9 Jan 2009 15:13:33 +0530 Subject: [Linux-cluster] Resource State Message-ID: <1ad236320901090143g23043925n52e5ca7855a95149@mail.gmail.com> Hi, When we use the clustat command, we get to know about the Status of the cluster Service (or resource group). In similar way, is there any CLI command using which we can get to know about the Status of the Resources (ip, fs, nfsexport, script, etc) of the Service? Any help will be much appreciated. Thanks, Chaitanya -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Fri Jan 9 10:47:02 2009 From: Alain.Moulle at bull.net (Alain.Moulle) Date: Fri, 09 Jan 2009 11:47:02 +0100 Subject: [Linux-cluster] cman-2.0.95-1.el5 / question about a problem when launching cman Message-ID: <49672B26.4020306@bull.net> Hi Release : cman-2.0.95-1.el5 (but same problem with 2.0.98) I face a problem when launching cman on a two-node cluster : 1. Launching cman on node 1 : OK 2. When launching cman on node 2, the log on node1 gives : cman killed by node 2 because we rejoined the cluster without a full restart Any idea ? knowing that my cluster.conf is likewise (note the use of gfs if it could be linked to ...) :