From mcollins at flmnh.ufl.edu Tue Dec 1 21:38:23 2009 From: mcollins at flmnh.ufl.edu (Matthew Collins) Date: Tue, 01 Dec 2009 16:38:23 -0500 Subject: [Linux-cluster] Limiting the number of VMs that start at once Message-ID: <4B158CCF.3040801@flmnh.ufl.edu> Is there a structure for staggering the starting of resources when failing over to another node? The problem I'm having is that when one node fails and its Xen VMs start on another node in the failover domain, that second node's load is so high it can't respond to tokens or qdisk requests in a timely fashion and it gets fenced. This is kind of specific to VM resources which have high startup costs so I was going to hack the vm.sh script. Does anyone have a better idea? Would anyone want my hacks when I'm done? -- Matt Collins Systems Administrator Florida Museum of Natural History From rmicmirregs at gmail.com Tue Dec 1 23:33:00 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Wed, 02 Dec 2009 00:33:00 +0100 Subject: [Linux-cluster] Qdisk with multiple heuristics? Message-ID: <1259710380.6571.14.camel@mecatol> Hi all, As it can be found in qdiskd man page, it is allowed to use up to 10 different heuristics in one cluster. How is this specified into cluster.conf? I'm trying to make it work with the following piece of cluster.conf file: My objective is to have 2 (or more) different heuristics which keep this node alive even if only one heuristic is OK. The cluster.conf file was created with system-config-cluster and later was edited by hand. The qdisk and heuristics are not working: 1.- system-config-cluster shows me a warning about an error related to some options not allowed into quorumd. I'm sorry i cannot be more specific right now, I could attach the exact message tomorrow. 2.- The cluster is operational, but using "clustat" i don't see the qdisk with its votes in the node list. The qdisk process is neither shown in the process list on the system. Is there somethin wrong? I'm using RHEL5.3 with: cman-2.0.98-1.el5.x86_64 openais-0.80.3-22.el5.x86_64 rgmanager-2.0.46-1.el5.x86_64 Thanks in advance. Cheers, Rafael -- Rafael Mic? Miranda From maniac.nl at gmail.com Wed Dec 2 10:09:48 2009 From: maniac.nl at gmail.com (Mark Janssen) Date: Wed, 2 Dec 2009 11:09:48 +0100 Subject: [Linux-cluster] GFS - Small files - Performance In-Reply-To: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> Message-ID: <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com> 2009/11/30 Leonardo D'Angelo Gon?alves : > Hi > > I have a GFS cluster on RHEL4.8 which one filesystem (10G) with? various > directories and sub-directories and small files about 5Kb. When I run the > command "du-sh" in the directory it generates about 1500 IOPS on the disks, > for GFS it takes time about 5 minutes and 2 second for ext3 filesyem. Could > someone help me with this problem. follows below the output of gfs_tool > Why for GFS it takes 5 minutes and ext3 2 seconds ? Is there any relation ? Try setting statfs_fast to '1'. This should speed up commands like 'df'. gfs_tool settune statfs_fast 1 Do note that when you resize your filesystem you have to turn it back off, and then back on again to update the size of your filesystem. -- Mark Janssen -- maniac(at)maniac.nl -- pgp: 0x357D2178 | ,''`. | Unix / Linux Open-Source and Internet Consultant @ Snow.nl | : :' : | Maniac.nl MarkJanssen.nl NerdNet.nl Unix.nl | `. 
`' | Skype: markmjanssen ICQ: 129696007 irc: FooBar on undernet | `- | From brem.belguebli at gmail.com Wed Dec 2 10:49:30 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Wed, 2 Dec 2009 11:49:30 +0100 Subject: [Linux-cluster] Qdisk with multiple heuristics? In-Reply-To: <1259710380.6571.14.camel@mecatol> References: <1259710380.6571.14.camel@mecatol> Message-ID: <29ae894c0912020249q198d8410sc9007a6398757ea4@mail.gmail.com> Hi Rafael, Concerning your second point, have you initialized your /dev/mpath/quorum device with mkqdisk? Also, the qdisk daemon must be running if you want it to be operational in your cluster. In my setup, everything is started manually, no automatic boot-time cluster start (safest option IMHO), and I use the following stepping:

1) start qdisk (service qdiskd start)
2) start cman (service cman start)
3) start rgmanager (service rgmanager start)
4) wait until the cluster is quorate (a shell loop) before starting clvmd
5) start clvmd

Output of clustat:

Cluster Status for rhcl1 @ Wed Dec 2 11:20:20 2009
Member Status: Quorate

Member Name                  ID    Status
------ ----                  ----  ------
node1.mydom                   1    Online, Local, rgmanager
node2.mydom                   2    Online, rgmanager
node3.mydom                   3    Online, rgmanager
/dev/iscsi/storage.quorum     0    Online, Quorum Disk   <-- Quorum disk started...

Service Name                 Owner (Last)                 State
....

[root at node1 ~]# ps -edf | grep qdisk
root      4409     1  0 Nov26 ?        00:04:00 qdiskd -Q

Concerning your point 1, you may address this by giving a different score to each heuristic, but I clearly don't know if this is what it intends to. Brem Regards 2009/12/2 Rafael Micó Miranda : > Hi all, > > As it can be found in qdiskd man page, it is allowed to use up to 10 > different heuristics in one cluster. > > How is this specified into cluster.conf? I'm trying to make it work with > the following piece of cluster.conf file: > > votes="3"> > > score="1"/> > > > My objective is to have 2 (or more) different heuristics which keep this > node alive even if only one heuristic is OK. The cluster.conf file was > created with system-config-cluster and later was edited by hand. > > The qdisk and heuristics are not working: > 1.- system-config-cluster shows me a warning about an error related to > some options not allowed into quorumd. I'm sorry i cannot be more > specific right now, I could attach the exact message tomorrow. > > 2.- The cluster is operational, but using "clustat" i don't see the > qdisk with its votes in the node list. The qdisk process is neither > shown in the process list on the system. > > Is there somethin wrong? > > I'm using RHEL5.3 with: > cman-2.0.98-1.el5.x86_64 > openais-0.80.3-22.el5.x86_64 > rgmanager-2.0.46-1.el5.x86_64 > > > Thanks in advance. Cheers, > > Rafael > > > -- > Rafael Micó Miranda > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From leonardodg2084 at gmail.com Wed Dec 2 10:50:56 2009 From: leonardodg2084 at gmail.com (=?ISO-8859-1?Q?Leonardo_D=27Angelo_Gon=E7alves?=) Date: Wed, 2 Dec 2009 08:50:56 -0200 Subject: [Linux-cluster] GFS - Small files - Performance In-Reply-To: <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com> References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com> Message-ID: <3170ac020912020250l56177bd4p420e3e714756c3dd@mail.gmail.com> Hi.. So.. I set up this configuration, but it didn't resolve my problem.
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 100
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 3
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
glock_purge = 50
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000 (1, 1)
quota_enforce = 1
quota_account = 1
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
statfs_fast = 1
seq_readahead = 0

2009/12/2 Mark Janssen > 2009/11/30 Leonardo D'Angelo Gonçalves : > > Hi > > > > I have a GFS cluster on RHEL4.8 which one filesystem (10G) with various > > directories and sub-directories and small files about 5Kb. When I run the > > command "du-sh" in the directory it generates about 1500 IOPS on the > disks, > > for GFS it takes time about 5 minutes and 2 second for ext3 filesyem. > Could > > someone help me with this problem. follows below the output of gfs_tool > > Why for GFS it takes 5 minutes and ext3 2 seconds ? Is there any relation > ? > > Try setting statfs_fast to '1'. This should speed up commands like 'df'. > > gfs_tool settune statfs_fast 1 > > Do note that when you resize your filesystem you have to turn it back > off, and then back on again to update the size of your filesystem. > > -- > Mark Janssen -- maniac(at)maniac.nl -- pgp: 0x357D2178 | ,''`. | > Unix / Linux Open-Source and Internet Consultant @ Snow.nl | : :' : | > Maniac.nl MarkJanssen.nl NerdNet.nl Unix.nl | `. `' | > Skype: markmjanssen ICQ: 129696007 irc: FooBar on undernet | `- | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From frank at si.ct.upc.edu Wed Dec 2 11:53:46 2009 From: frank at si.ct.upc.edu (frank) Date: Wed, 02 Dec 2009 12:53:46 +0100 Subject: [Linux-cluster] GFS performance test Message-ID: <4B16554A.50002@si.ct.upc.edu> Hi, after seeing some posts related to GFS performance, we have decided to test our two-node GFS filesystem with ping_pong program. We are worried about the results. Running the program in only one node, without parameters, we get between 800000 locks/sec and 900000 locks/sec Running the program in both nodes over the same file on the shared filesystem, the lock rate did not drop and it is the same in both nodes! What does this mean? Is there any problem with locks ? Just for you info, GFS filesystem is /mnt/gfs and what I run in both nodes is: ./ping_pong /mnt/gfs/tmp/test.dat 3 Thanks for your help. Frank -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net.
For all your IT requirements visit: http://www.transtec.co.uk From dan at quah.ro Wed Dec 2 12:09:20 2009 From: dan at quah.ro (Dan Candea) Date: Wed, 2 Dec 2009 14:09:20 +0200 Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync In-Reply-To: <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com> References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com> Message-ID: <200912021409.21001.dan@quah.ro> hello randomly , during a nightly backup with rsync I receive the error below on a 3 node setup with cluster2. because of the withdraw I can't unmount without a reboot. does someone have a clue? GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed GFS2: fsid=data:FSdata.0: function = gfs2_meta_inval, file = fs/gfs2/meta_io.c, line = 110 GFS2: fsid=data:FSdata.0: about to withdraw this file system GFS2: fsid=data:FSdata.0: telling LM to withdraw GFS2: fsid=data:FSdata.0: withdrawn Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1 Call Trace: [] 0xffffffffa008e4ea [] 0xffffffff8025ecee [] 0xffffffffa0091307 [] 0xffffffffa008f640 [] 0xffffffffa000fc18 [] 0xffffffffa000bfe8 [] 0xffffffff8022605c [] 0xffffffffa008f060 [] 0xffffffffa008e5cb [] 0xffffffffa00912f3 [] 0xffffffffa0077a9b [] 0xffffffffa0076a03 [] 0xffffffffa00771f7 [] 0xffffffff8023b43e [] 0xffffffff8023b571 [] 0xffffffff8023eee5 [] 0xffffffff8023eee5 [] 0xffffffff8023b4d8 [] 0xffffffff8023e794 [] 0xffffffff802035e9 [] 0xffffffff8023e72b [] 0xffffffff802035df regards -- Dan C?ndea Does God Play Dice? From dan at quah.ro Wed Dec 2 12:15:01 2009 From: dan at quah.ro (Dan Candea) Date: Wed, 2 Dec 2009 14:15:01 +0200 Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync Message-ID: <200912021415.01701.dan@quah.ro> hello randomly , during a nightly backup with rsync I receive the error below on a 3 node setup with cluster2. because of the withdraw I can't unmount without a reboot. does someone have a clue? GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed GFS2: fsid=data:FSdata.0: function = gfs2_meta_inval, file = fs/gfs2/meta_io.c, line = 110 GFS2: fsid=data:FSdata.0: about to withdraw this file system GFS2: fsid=data:FSdata.0: telling LM to withdraw GFS2: fsid=data:FSdata.0: withdrawn Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1 Call Trace: [] 0xffffffffa008e4ea [] 0xffffffff8025ecee [] 0xffffffffa0091307 [] 0xffffffffa008f640 [] 0xffffffffa000fc18 [] 0xffffffffa000bfe8 [] 0xffffffff8022605c [] 0xffffffffa008f060 [] 0xffffffffa008e5cb [] 0xffffffffa00912f3 [] 0xffffffffa0077a9b [] 0xffffffffa0076a03 [] 0xffffffffa00771f7 [] 0xffffffff8023b43e [] 0xffffffff8023b571 [] 0xffffffff8023eee5 [] 0xffffffff8023eee5 [] 0xffffffff8023b4d8 [] 0xffffffff8023e794 [] 0xffffffff802035e9 [] 0xffffffff8023e72b [] 0xffffffff802035df regards -- Dan C?ndea Does God Play Dice? 
From swhiteho at redhat.com Wed Dec 2 12:48:06 2009 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 02 Dec 2009 12:48:06 +0000 Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync In-Reply-To: <200912021409.21001.dan@quah.ro> References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com> <200912021409.21001.dan@quah.ro> Message-ID: <1259758086.6052.959.camel@localhost.localdomain> Hi, On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote: > hello > > randomly , during a nightly backup with rsync I receive the error below on a 3 > node setup with cluster2. because of the withdraw I can't unmount without a > reboot. > > does someone have a clue? > > > GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed > GFS2: fsid=data:FSdata.0: function = gfs2_meta_inval, file = > fs/gfs2/meta_io.c, line = 110 > GFS2: fsid=data:FSdata.0: about to withdraw this file system > GFS2: fsid=data:FSdata.0: telling LM to withdraw > GFS2: fsid=data:FSdata.0: withdrawn > Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1 I don't recognise this kernel version, which distro is it from? Can you reproduce this issue? I've heard of an issue involving rsync, but having now tried various different rsync commands, I've not been able to reproduce anything that fails. > Call Trace: > [] 0xffffffffa008e4ea > [] 0xffffffff8025ecee > [] 0xffffffffa0091307 > [] 0xffffffffa008f640 > [] 0xffffffffa000fc18 > [] 0xffffffffa000bfe8 > [] 0xffffffff8022605c > [] 0xffffffffa008f060 > [] 0xffffffffa008e5cb > [] 0xffffffffa00912f3 > [] 0xffffffffa0077a9b > [] 0xffffffffa0076a03 > [] 0xffffffffa00771f7 > [] 0xffffffff8023b43e > [] 0xffffffff8023b571 > [] 0xffffffff8023eee5 > [] 0xffffffff8023eee5 > [] 0xffffffff8023b4d8 > [] 0xffffffff8023e794 > [] 0xffffffff802035e9 > [] 0xffffffff8023e72b > [] 0xffffffff802035df > This set of numbers is pretty useless without being translated into symbols. On the other hand the assertion which you've hit is GFS2 complaining that its requested that the pages relating to an inode to be invalidated, but there are some that have not been removed after that invalidation. So in this particular case it doesn't matter, Steve. > > regards From mm at yuhu.biz Wed Dec 2 12:54:45 2009 From: mm at yuhu.biz (Marian Marinov) Date: Wed, 2 Dec 2009 14:54:45 +0200 Subject: [Linux-cluster] Searching for speakers Message-ID: <200912021454.53872.mm@yuhu.biz> Hello, sorry for the off topic e-mail, but I'm organizing the biggest FOSS conference in Bulgaria - OpenFest. And I'm curious if any one of you guys is interested in coming to Bulgaria next year and speaking about CLVM or the Cluster project as a whole? Next year's OpenFest will be held in Sofia, Bulgaria at 6-7 of November. If you are interested, please contact me. Again sorry for the off-topic mail. -- Best regards, Marian Marinov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. 
URL: From rvandolson at esri.com Wed Dec 2 14:58:43 2009 From: rvandolson at esri.com (Ray Van Dolson) Date: Wed, 2 Dec 2009 06:58:43 -0800 Subject: [Linux-cluster] GFS performance test In-Reply-To: <4B16554A.50002@si.ct.upc.edu> References: <4B16554A.50002@si.ct.upc.edu> Message-ID: <20091202145842.GA16292@esri.com> On Wed, Dec 02, 2009 at 03:53:46AM -0800, frank wrote: > Hi, > after seeing some posts related to GFS performance, we have decided to > test our two-node GFS filesystem with ping_pong program. > We are worried about the results. > > Running the program in only one node, without parameters, we get between > 800000 locks/sec and 900000 locks/sec > Running the program in both nodes over the same file on the shared > filesystem, the lock rate did not drop and it is the same in both nodes! > What does this mean? Is there any problem with locks ? > > Just for you info, GFS filesystem is /mnt/gfs and what I run in both > nodes is: > > ./ping_pong /mnt/gfs/tmp/test.dat 3 > > Thanks for your help. > Wow, that doesn't sound right at all (or at least not consistent with results I've gotten :) Can you provide details of your setup, and perhaps your cluster.conf file? Have you done any other GFS tuning? Are we talking GFS1 or GFS2? I get in the 3000-5000 locks/sec range with my GFS2 filesystem (using nodiratime,noatime and reducing the lock limit to 0 from 100 in my cluster.conf file). The numbers you provide I'd expect to see on a local filesystem. Ray From swhiteho at redhat.com Wed Dec 2 15:14:21 2009 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 02 Dec 2009 15:14:21 +0000 Subject: [Linux-cluster] GFS performance test In-Reply-To: <20091202145842.GA16292@esri.com> References: <4B16554A.50002@si.ct.upc.edu> <20091202145842.GA16292@esri.com> Message-ID: <1259766861.6052.963.camel@localhost.localdomain> Hi, On Wed, 2009-12-02 at 06:58 -0800, Ray Van Dolson wrote: > On Wed, Dec 02, 2009 at 03:53:46AM -0800, frank wrote: > > Hi, > > after seeing some posts related to GFS performance, we have decided to > > test our two-node GFS filesystem with ping_pong program. > > We are worried about the results. > > > > Running the program in only one node, without parameters, we get between > > 800000 locks/sec and 900000 locks/sec > > Running the program in both nodes over the same file on the shared > > filesystem, the lock rate did not drop and it is the same in both nodes! > > What does this mean? Is there any problem with locks ? > > > > Just for you info, GFS filesystem is /mnt/gfs and what I run in both > > nodes is: > > > > ./ping_pong /mnt/gfs/tmp/test.dat 3 > > > > Thanks for your help. > > > > Wow, that doesn't sound right at all (or at least not consistent with > results I've gotten :) > > Can you provide details of your setup, and perhaps your cluster.conf > file? Have you done any other GFS tuning? Are we talking GFS1 or > GFS2? > > I get in the 3000-5000 locks/sec range with my GFS2 filesystem (using > nodiratime,noatime and reducing the lock limit to 0 from 100 in my > cluster.conf file). > > The numbers you provide I'd expect to see on a local filesystem. > > Ray > If you are mounting with lock_nolock, then the locks are the same as for any other local filesystem, so you'll see it works much faster than any clustered arrangement. If the lock rate appears to be that high in the cluster, maybe the localflocks mount parameter has been specified which means that the locking will be done locally on each node, and is not being done across the cluster. 
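A quick way to check what is actually in effect is to look at the superblock lock protocol and at the live mount options. Something along these lines should be enough (a sketch only, assuming the mount point is /mnt/gfs as in your example; option names can differ slightly between GFS and GFS2):

    gfs_tool getsb /mnt/gfs | grep lockproto   # expect lock_dlm here, not lock_nolock
    grep /mnt/gfs /proc/mounts                 # localflocks should not appear in the options

If either of these shows lock_nolock or localflocks, then ping_pong is only exercising node-local fcntl() locking, which would explain rates in the hundreds of thousands per second.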
Ray's figures sound much more reasonable, Steve. From marcos.david at efacec.com Wed Dec 2 16:09:12 2009 From: marcos.david at efacec.com (Marcos David) Date: Wed, 02 Dec 2009 16:09:12 +0000 Subject: [Linux-cluster] Random clurgmrgd crashes In-Reply-To: <4B1656EF.50301@efacec.com> References: <1259710380.6571.14.camel@mecatol> <4B16544A.5060408@efacec.com> <4B1656EF.50301@efacec.com> Message-ID: <4B169128.3040903@efacec.com> Hi, I'm experiencing random crashes on clurgmgrd on a 4 node RHEL5.3 cluster. This is a big problem since it is happening on the production cluster.... The corefile backtrace gives: Core was generated by `clurgmgrd -d'. Program terminated with signal 6, Aborted. [New process 2495] #0 0x0068b402 in __kernel_vsyscall () (gdb) bt #0 0x0068b402 in __kernel_vsyscall () #1 0x001da211 in select () from /lib/libc.so.6 #2 0x08051f6a in event_loop () #3 0x08052d10 in main () (gdb) Can anyone help me out with this? Thanks in advance. From marcos.david at efacec.com Wed Dec 2 16:14:31 2009 From: marcos.david at efacec.com (Marcos David) Date: Wed, 02 Dec 2009 16:14:31 +0000 Subject: [Linux-cluster] Random clurgmrgd crashes Message-ID: <4B169267.7080000@efacec.com> (Previous message went into the wrong thread... sorry). Hi, I'm experiencing random crashes on clurgmgrd on a 4 node RHEL5.3 cluster. This is a big problem since it is happening on the production cluster.... The corefile backtrace gives: Core was generated by `clurgmgrd -d'. Program terminated with signal 6, Aborted. [New process 2495] #0 0x0068b402 in __kernel_vsyscall () (gdb) bt #0 0x0068b402 in __kernel_vsyscall () #1 0x001da211 in select () from /lib/libc.so.6 #2 0x08051f6a in event_loop () #3 0x08052d10 in main () (gdb) Can anyone help me out with this? Thanks in advance. From dan at quah.ro Wed Dec 2 16:25:12 2009 From: dan at quah.ro (Dan Candea) Date: Wed, 2 Dec 2009 18:25:12 +0200 Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync In-Reply-To: <1259758086.6052.959.camel@localhost.localdomain> References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> <200912021409.21001.dan@quah.ro> <1259758086.6052.959.camel@localhost.localdomain> Message-ID: <200912021825.12575.dan@quah.ro> On Wednesday 02 December 2009 14:48, Whitehouse Steven wrote: -- Hi, On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote: > hello > > randomly , during a nightly backup with rsync I receive the error below on a 3 > node setup with cluster2. because of the withdraw I can't unmount without a > reboot. > > does someone have a clue? > > > GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed > GFS2: fsid=data:FSdata.0: function = gfs2_meta_inval, file = > fs/gfs2/meta_io.c, line = 110 > GFS2: fsid=data:FSdata.0: about to withdraw this file system > GFS2: fsid=data:FSdata.0: telling LM to withdraw > GFS2: fsid=data:FSdata.0: withdrawn > Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1 I don't recognise this kernel version, which distro is it from? its a kernel with grsecurity applied from gentoo Can you reproduce this issue? I've heard of an issue involving rsync, but having now tried various different rsync commands, I've not been able to reproduce anything that fails. I'll try to reproduce it after the reboot, which I have to do it by night, but I'm not sure I'll make something of it, cause the error is spontaneous, while the rsync is ran each day. 
> Call Trace: > [] 0xffffffffa008e4ea > [] 0xffffffff8025ecee > [] 0xffffffffa0091307 > [] 0xffffffffa008f640 > [] 0xffffffffa000fc18 > [] 0xffffffffa000bfe8 > [] 0xffffffff8022605c > [] 0xffffffffa008f060 > [] 0xffffffffa008e5cb > [] 0xffffffffa00912f3 > [] 0xffffffffa0077a9b > [] 0xffffffffa0076a03 > [] 0xffffffffa00771f7 > [] 0xffffffff8023b43e > [] 0xffffffff8023b571 > [] 0xffffffff8023eee5 > [] 0xffffffff8023eee5 > [] 0xffffffff8023b4d8 > [] 0xffffffff8023e794 > [] 0xffffffff802035e9 > [] 0xffffffff8023e72b > [] 0xffffffff802035df > This set of numbers is pretty useless without being translated into symbols. On the other hand the assertion which you've hit is GFS2 complaining that its requested that the pages relating to an inode to be invalidated, but there are some that have not been removed after that invalidation. So in this particular case it doesn't matter, Here are you saying that it could be an inconsistency in the FS? Steve. > > regards thank you -- Dan C?ndea Does God Play Dice? From swhiteho at redhat.com Wed Dec 2 16:46:08 2009 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 02 Dec 2009 16:46:08 +0000 Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync In-Reply-To: <200912021825.12575.dan@quah.ro> References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> <200912021409.21001.dan@quah.ro> <1259758086.6052.959.camel@localhost.localdomain> <200912021825.12575.dan@quah.ro> Message-ID: <1259772368.6052.968.camel@localhost.localdomain> Hi, On Wed, 2009-12-02 at 18:25 +0200, Dan Candea wrote: > On Wednesday 02 December 2009 14:48, Whitehouse Steven wrote: > -- > Hi, > > On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote: > > hello > > > > randomly , during a nightly backup with rsync I receive the error below on a > 3 > > node setup with cluster2. because of the withdraw I can't unmount without a > > reboot. > > > > does someone have a clue? > > > > > > GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed > > GFS2: fsid=data:FSdata.0: function = gfs2_meta_inval, file = > > fs/gfs2/meta_io.c, line = 110 > > GFS2: fsid=data:FSdata.0: about to withdraw this file system > > GFS2: fsid=data:FSdata.0: telling LM to withdraw > > GFS2: fsid=data:FSdata.0: withdrawn > > Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1 > I don't recognise this kernel version, which distro is it from? > > its a kernel with grsecurity applied from gentoo > > > Can you reproduce this issue? I've heard of an issue involving rsync, > but having now tried various different rsync commands, I've not been > able to reproduce anything that fails. > > > I'll try to reproduce it after the reboot, which I have to do it by night, but > I'm not sure I'll make something of it, cause the error is spontaneous, while > the rsync is ran each day. > Ok. I suspect though that whatever the issue, it has probably been fixed in more recent kernels, .28 is pretty old now so I'd suggest upgrading your kernel as one possible solution. I'd be surprised if that doesn't fix your issue. [various number removed for brevity] > > [] 0xffffffff802035df > > > This set of numbers is pretty useless without being translated into > symbols. On the other hand the assertion which you've hit is GFS2 > complaining that its requested that the pages relating to an inode to be > invalidated, but there are some that have not been removed after that > invalidation. 
So in this particular case it doesn't matter, > > > > Here are you saying that it could be an inconsistency in the FS? > No, its more likely to be an issue in the code. It doesn't look like the fs is damaged at all, in fact that bug trap is there to prevent damage to the fs in this particular case, Steve. From dan at quah.ro Wed Dec 2 16:44:54 2009 From: dan at quah.ro (Dan Candea) Date: Wed, 2 Dec 2009 18:44:54 +0200 Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync In-Reply-To: <1259772368.6052.968.camel@localhost.localdomain> References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com> <200912021825.12575.dan@quah.ro> <1259772368.6052.968.camel@localhost.localdomain> Message-ID: <200912021844.54205.dan@quah.ro> On Wednesday 02 December 2009 18:46, Whitehouse Steven wrote: -- Hi, On Wed, 2009-12-02 at 18:25 +0200, Dan Candea wrote: > On Wednesday 02 December 2009 14:48, Whitehouse Steven wrote: > -- > Hi, > > On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote: > > hello > > > > randomly , during a nightly backup with rsync I receive the error below on a > 3 > > node setup with cluster2. because of the withdraw I can't unmount without a > > reboot. > > > > does someone have a clue? > > > > > > GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed > > GFS2: fsid=data:FSdata.0: function = gfs2_meta_inval, file = > > fs/gfs2/meta_io.c, line = 110 > > GFS2: fsid=data:FSdata.0: about to withdraw this file system > > GFS2: fsid=data:FSdata.0: telling LM to withdraw > > GFS2: fsid=data:FSdata.0: withdrawn > > Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1 > I don't recognise this kernel version, which distro is it from? > > its a kernel with grsecurity applied from gentoo > > > Can you reproduce this issue? I've heard of an issue involving rsync, > but having now tried various different rsync commands, I've not been > able to reproduce anything that fails. > > > I'll try to reproduce it after the reboot, which I have to do it by night, but > I'm not sure I'll make something of it, cause the error is spontaneous, while > the rsync is ran each day. > Ok. I suspect though that whatever the issue, it has probably been fixed in more recent kernels, .28 is pretty old now so I'd suggest upgrading your kernel as one possible solution. I'd be surprised if that doesn't fix your issue. ok, thank you. I'll try a kernel upgrade. [various number removed for brevity] > > [] 0xffffffff802035df > > > This set of numbers is pretty useless without being translated into > symbols. On the other hand the assertion which you've hit is GFS2 > complaining that its requested that the pages relating to an inode to be > invalidated, but there are some that have not been removed after that > invalidation. So in this particular case it doesn't matter, > > > > Here are you saying that it could be an inconsistency in the FS? > No, its more likely to be an issue in the code. It doesn't look like the fs is damaged at all, in fact that bug trap is there to prevent damage to the fs in this particular case, Steve. -- Dan C?ndea Does God Play Dice? From rmicmirregs at gmail.com Wed Dec 2 16:52:35 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Wed, 02 Dec 2009 17:52:35 +0100 Subject: [Linux-cluster] Qdisk with multiple heuristics? 
In-Reply-To: <29ae894c0912020249q198d8410sc9007a6398757ea4@mail.gmail.com> References: <1259710380.6571.14.camel@mecatol> <29ae894c0912020249q198d8410sc9007a6398757ea4@mail.gmail.com> Message-ID: <1259772755.6568.5.camel@mecatol> Hi Brem, Thanks for you answer. The problem was the qdiskd service not being started by CMAN. In my previous configuration, it was started by the CMAN startup script (located in init.d, it would start qdiskd if necessary) but this time the qdiskd service was configured to not start in system start-up (with chkconfig) so CMAN did not start it either. A strange behaviour/design decision, in my opinion. Now everything is solved and the multiple heuristic is working (I see the 2 ping processes working). I only need to check the "score" configuration to see if it is working properly. I plan to do it tomorrow. Cheers, Rafael El mi?, 02-12-2009 a las 11:49 +0100, brem belguebli escribi?: > Hi Rafael, > > Concerning your second point, have you initialized your > /dev/mpath/quorum device with mkqdisk ? > > Also, the qdisk daemon must be running if you want it to be > operationnal in your cluster. > > In my setup, everything is started manually, no automatic boot time > cluster start (safest option IMHO), and I use the following stepping: > > 1) start qdisk (service qdiskd start) > 2) start cman (service cman start) > 3) start rgmanager (service rgmanager start) > 4) wait untill the cluster is quorate (a shell loop) before starting clvmd > 5) start clvmd > > Output of clustat: > > > Cluster Status for rhcl1 @ Wed Dec 2 11:20:20 2009 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > node1.mydom 1 Online, Local, rgmanager > node2.mydom 2 Online, rgmanager > node3.mydom 3 Online, rgmanager > /dev/iscsi/storage.quorum 0 Online, Quorum > Disk <-- Qorum disk started... > > Service Name Owner (Last) > State > .... > > > [root at node1 ~]# ps -edf | grep qdisk > root 4409 1 0 Nov26 ? 00:04:00 qdiskd -Q > > > Concerning your point 1, you may address this by giving a different > score to each heuristic, but I clearly don't know if this is what it > intends to. > > Brem > Regards > > > 2009/12/2 Rafael Mic? Miranda : > > Hi all, > > > > As it can be found in qdiskd man page, it is allowed to use up to 10 > > different heuristics in one cluster. > > > > How is this specified into cluster.conf? I'm trying to make it work with > > the following piece of cluster.conf file: > > > > > votes="3"> > > > > > score="1"/> > > > > > > My objective is to have 2 (or more) different heuristics which keep this > > node alive even if only one heuristic is OK. The cluster.conf file was > > created with system-config-cluster and later was edited by hand. > > > > The qdisk and heuristics are not working: > > 1.- system-config-cluster shows me a warning about an error related to > > some options not allowed into quorumd. I'm sorry i cannot be more > > specific right now, I could attach the exact message tomorrow. > > > > 2.- The cluster is operational, but using "clustat" i don't see the > > qdisk with its votes in the node list. The qdisk process is neither > > shown in the process list on the system. > > > > Is there somethin wrong? > > > > I'm using RHEL5.3 with: > > cman-2.0.98-1.el5.x86_64 > > openais-0.80.3-22.el5.x86_64 > > rgmanager-2.0.46-1.el5.x86_64 > > > > > > Thanks in advance. Cheers, > > > > Rafael > > > > > > -- > > Rafael Mic? 
Miranda > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Rafael Mic? Miranda From rvandolson at esri.com Thu Dec 3 20:42:57 2009 From: rvandolson at esri.com (Ray Van Dolson) Date: Thu, 3 Dec 2009 12:42:57 -0800 Subject: [Linux-cluster] GFS2 and backups (performance tuning) Message-ID: <20091203204257.GA15314@esri.com> We have a two node cluster primarily acting as an NFS serving environment. Our backup infrastructure here uses NetBackup and, unfortunately, NetBackup has no PPC client (we're running on IBM JS20 blades) so we're approaching the backup strategy in two different ways: - Run netbackup client from another machine and point it to NFS share on one of our two cluster nodes - Run rsyncd on our cluster nodes and rsync from a remote machine. NetBackup then backs up that machine. The GFS2 filesystem in our cluster only is storing about 90GB of data, but has about one million files (inodes used reported via df -i) on it. (For the curious, this is a home directory server and we do break thinsg up under a top level hierarchy of a folder for each first letter of a username). The NetBackup over NFS route is extremely slow and spikes the load up on whichever server is being backed up from. We made the following adjustments to try and improve performance: - Set the following in our cluster.conf file: ping_pong will give me about 3-5k locks/sec now. - Mounted filesystem with noatime,nodiratime,quota=off This seems to have helped a bit, but things are still taking a long time. I should note here that I tried running ping_pong to one of our cluster nodes via one of its NFS exports of the GFS2 filesystem. While I can get 3000-5000 locks/sec locally, over NFS it was about... 2 or 3 (not thousand, literally 2 or 3). tcpdump of the NLM port shows the NFS lock manager on the node responding NLM_BLOCK most of the time. I'm not sure if GFS2 or our NFS daemon is to blame... in any case... .. I've set up rsyncd on the cluster nodes and am sync'ing from a remote server now (all of this via Gigabit ethernet). I'm over an hour in and the client is still generatin the file list. strace confirms that rsync --daemon is still trolling through, generating a list of files on the filesystem... I've done a blktrace dump on my GFS2 filesystem's block device and can clearly see glock_workqueue showing up the most by far. However, I don't know what else I can glean from these results. Anyone have any tips or suggestions on improving either our NFS locking or rsync --daemon performance beyond what I've already tried? 
It might almost be quicker for us to do a full backup each time than to spend hours building file lists for differential backups :) Details of our setup: - IBM DS4300 Storage (12 drive RAID5 + 2 spares) - Exposed as two LUNs (one per controller) - Don't believe this array does hardware snapshots :( - Two (2) IBM JS20 Blades (PPC) - QLogic ISP2312 2Gb HBA's - RHEL 5.4 Advanced Platform PPC - multipathd - clvm aggregates two LUNs - GFS2 on top of clvm - Configured with quotas originally, but disabled later by mounting quota=off - Mounted with noatime,nodiratime,quota=off # gfs2_tool gettune /domus1 new_files_directio = 0 new_files_jdata = 0 quota_scale = 1.0000 (1, 1) logd_secs = 1 recoverd_secs = 60 statfs_quantum = 30 stall_secs = 600 quota_cache_secs = 300 quota_simul_sync = 64 statfs_slow = 0 complain_secs = 10 max_readahead = 262144 quota_quantum = 60 quota_warn_period = 10 jindex_refresh_secs = 60 log_flush_secs = 60 incore_log_blocks = 1024 # gfs2_tool getargs /domus1 data 2 suiddir 0 quota 0 posix_acl 1 upgrade 0 debug 0 localflocks 0 localcaching 0 ignore_local_fs 0 spectator 0 hostdata jid=1:id=196610:first=0 locktable lockproto Thanks in advance for any advice. Ray From allen at isye.gatech.edu Thu Dec 3 22:30:29 2009 From: allen at isye.gatech.edu (Allen Belletti) Date: Thu, 03 Dec 2009 17:30:29 -0500 Subject: [Linux-cluster] GFS2: processes stuck in "just schedule" In-Reply-To: <20091203204257.GA15314@esri.com> References: <20091203204257.GA15314@esri.com> Message-ID: <4B183C05.1060101@isye.gatech.edu> Hi All, After Steve and the RedHat guys dug into my nasty crashdump (thanks all!), I believe I'm down to the last GFS2 problem on our mail cluster, but it's a common one. I've always had trouble with processes getting stuck on GFS2 access and queuing up. Since the 5.4 upgrade and moving the proper GFS2 kernel module, it's changed but not gone away. Ever few days now, I'm seeing processes getting stuck with WCHAN=just_schedule. Once this starts happening, both cluster nodes will accumulate them rapidly which eventually brings IO to a halt. The only way I've found to escape is via a reboot, sometimes of one, sometimes of both nodes. Since there's no crash, I don't get any useful debug information. Outside of this one repeating glitch, performance is great and all is well. If anyone can suggest ways of gathering more data about the problem, or possible solutions, I would be grateful. Thanks, Allen From no-reply at dropbox.com Fri Dec 4 01:41:14 2009 From: no-reply at dropbox.com (Dropbox) Date: Fri, 04 Dec 2009 01:41:14 +0000 Subject: [Linux-cluster] Jorge Palma has invited you to Dropbox Message-ID: <20091204014114.C4E7E46180B@mailman.dropbox.com> We're excited to let you know that Jorge Palma has invited you to Dropbox! Jorge Palma has been using Dropbox to sync and share files online and across computers, and thought you might want it too. Visit http://www.dropbox.com/link/20.yzjZ2HAsSs/NjYwMDc0ODg3 to get started. - The Dropbox Team ____________________________________________________ To stop receiving invites from Dropbox, please go to http://www.dropbox.com/bl/180e8afc7eea/linux-cluster%40redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From baishuwei at gmail.com Fri Dec 4 03:06:05 2009 From: baishuwei at gmail.com (Bai Shuwei) Date: Fri, 4 Dec 2009 11:06:05 +0800 Subject: [Linux-cluster] LUN/LUN Masking Message-ID: HI, everyone: I am a begginer on FC-SAN. On my machine i have installed HBAs(QLogic2342) and SLCI tools. 
How I can setup the lun masking to forbidden/allow hosts to access special LUN/Disk? Do I need some other speccial tools to do it? Thanks all. Best Regards Bai SHuwei -- Love other people, as same as love yourself! Don't think all the time, do it by your hands! Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/ E-Mail: baishuwei at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fajar at fajar.net Fri Dec 4 04:26:31 2009 From: fajar at fajar.net (Fajar A. Nugraha) Date: Fri, 4 Dec 2009 11:26:31 +0700 Subject: [Linux-cluster] LUN/LUN Masking In-Reply-To: References: Message-ID: <7207d96f0912032026n7ef04c7ahc86b24ca482e3326@mail.gmail.com> On Fri, Dec 4, 2009 at 10:06 AM, Bai Shuwei wrote: > HI, everyone: > ?? I am a begginer on FC-SAN. On my machine i have installed > HBAs(QLogic2342) and SLCI tools. How I can setup the lun masking to > forbidden/allow hosts to access special LUN/Disk? Do I need some other > speccial tools to do it? Thanks all. AFAIK LUN masking is done on storage side, not client side. -- Fajar From cthulhucalling at gmail.com Fri Dec 4 05:43:26 2009 From: cthulhucalling at gmail.com (Ian Hayes) Date: Thu, 3 Dec 2009 21:43:26 -0800 Subject: [Linux-cluster] LUN/LUN Masking In-Reply-To: References: Message-ID: <36df569a0912032143u6bbf9e0fh5f01496738b51e33@mail.gmail.com> It depends on who your san vendor is, but its done on the storage side usually through the management console. all the ones I've used filter by the wwn of the host bus adapters. You may also want to consider zoning your hba's at the switch level. On Dec 3, 2009 7:06 PM, "Bai Shuwei" wrote: HI, everyone: I am a begginer on FC-SAN. On my machine i have installed HBAs(QLogic2342) and SLCI tools. How I can setup the lun masking to forbidden/allow hosts to access special LUN/Disk? Do I need some other speccial tools to do it? Thanks all. Best Regards Bai SHuwei -- Love other people, as same as love yourself! Don't think all the time, do it by your hands! Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/ E-Mail: baishuwei at gmail.com -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Fri Dec 4 09:39:01 2009 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 04 Dec 2009 09:39:01 +0000 Subject: [Linux-cluster] GFS2: processes stuck in "just schedule" In-Reply-To: <4B183C05.1060101@isye.gatech.edu> References: <20091203204257.GA15314@esri.com> <4B183C05.1060101@isye.gatech.edu> Message-ID: <1259919541.2489.8.camel@localhost> Hi, On Thu, 2009-12-03 at 17:30 -0500, Allen Belletti wrote: > Hi All, > > After Steve and the RedHat guys dug into my nasty crashdump (thanks > all!), I believe I'm down to the last GFS2 problem on our mail cluster, > but it's a common one. > > I've always had trouble with processes getting stuck on GFS2 access and > queuing up. Since the 5.4 upgrade and moving the proper GFS2 kernel > module, it's changed but not gone away. Ever few days now, I'm seeing > processes getting stuck with WCHAN=just_schedule. Once this starts > happening, both cluster nodes will accumulate them rapidly which > eventually brings IO to a halt. The only way I've found to escape is > via a reboot, sometimes of one, sometimes of both nodes. > > Since there's no crash, I don't get any useful debug information. 
> Outside of this one repeating glitch, performance is great and all is > well. If anyone can suggest ways of gathering more data about the > problem, or possible solutions, I would be grateful. > > Thanks, > Allen > > This would be typical for what happens when there is contention on a glock between two (or more) nodes. There is a mechanism which is supposed to try and mitigate the issue (by allowing each node to hold on to a glock for a minimum period of time which is designed to ensure that some work is done each time a node acquires a glock) but if your storage is particularly slow, and/or possibly depending upon the exact I/O pattern, it may not always be 100% effective. In the first instance though, see if you can find an inode which is being contended from both nodes as that will most likely be the culprit, Steve. From swhiteho at redhat.com Fri Dec 4 09:44:30 2009 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 04 Dec 2009 09:44:30 +0000 Subject: [Linux-cluster] GFS2 and backups (performance tuning) In-Reply-To: <20091203204257.GA15314@esri.com> References: <20091203204257.GA15314@esri.com> Message-ID: <1259919870.2489.10.camel@localhost> Hi, I'd suggest filing a bug in the first instance. I can't see anything obviously wrong with what you are doing. The fcntl() locks go via the dlm and dlm_controld not via the glock_workqueues, so I don't think that is likely to be the issue, Steve. On Thu, 2009-12-03 at 12:42 -0800, Ray Van Dolson wrote: > We have a two node cluster primarily acting as an NFS serving > environment. Our backup infrastructure here uses NetBackup and, > unfortunately, NetBackup has no PPC client (we're running on IBM JS20 > blades) so we're approaching the backup strategy in two different ways: > > - Run netbackup client from another machine and point it to NFS share > on one of our two cluster nodes > - Run rsyncd on our cluster nodes and rsync from a remote machine. > NetBackup then backs up that machine. > > The GFS2 filesystem in our cluster only is storing about 90GB of data, > but has about one million files (inodes used reported via df -i) on it. > > (For the curious, this is a home directory server and we do break > thinsg up under a top level hierarchy of a folder for each first letter > of a username). > > The NetBackup over NFS route is extremely slow and spikes the load up > on whichever server is being backed up from. We made the following > adjustments to try and improve performance: > > - Set the following in our cluster.conf file: > > > > > ping_pong will give me about 3-5k locks/sec now. > > - Mounted filesystem with noatime,nodiratime,quota=off > > This seems to have helped a bit, but things are still taking a long > time. I should note here that I tried running ping_pong to one of our > cluster nodes via one of its NFS exports of the GFS2 filesystem. While > I can get 3000-5000 locks/sec locally, over NFS it was about... 2 or 3 > (not thousand, literally 2 or 3). tcpdump of the NLM port shows the > NFS lock manager on the node responding NLM_BLOCK most of the time. > I'm not sure if GFS2 or our NFS daemon is to blame... in any case... > > .. I've set up rsyncd on the cluster nodes and am sync'ing from a > remote server now (all of this via Gigabit ethernet). I'm over an hour > in and the client is still generatin the file list. strace confirms > that rsync --daemon is still trolling through, generating a list of > files on the filesystem... 
> > I've done a blktrace dump on my GFS2 filesystem's block device and can > clearly see glock_workqueue showing up the most by far. However, I > don't know what else I can glean from these results. > > Anyone have any tips or suggestions on improving either our NFS locking > or rsync --daemon performance beyond what I've already tried? It might > almost be quicker for us to do a full backup each time than to spend > hours building file lists for differential backups :) > > Details of our setup: > > - IBM DS4300 Storage (12 drive RAID5 + 2 spares) > - Exposed as two LUNs (one per controller) > - Don't believe this array does hardware snapshots :( > - Two (2) IBM JS20 Blades (PPC) > - QLogic ISP2312 2Gb HBA's > - RHEL 5.4 Advanced Platform PPC > - multipathd > - clvm aggregates two LUNs > - GFS2 on top of clvm > - Configured with quotas originally, but disabled later by > mounting quota=off > - Mounted with noatime,nodiratime,quota=off > > # gfs2_tool gettune /domus1 > new_files_directio = 0 > new_files_jdata = 0 > quota_scale = 1.0000 (1, 1) > logd_secs = 1 > recoverd_secs = 60 > statfs_quantum = 30 > stall_secs = 600 > quota_cache_secs = 300 > quota_simul_sync = 64 > statfs_slow = 0 > complain_secs = 10 > max_readahead = 262144 > quota_quantum = 60 > quota_warn_period = 10 > jindex_refresh_secs = 60 > log_flush_secs = 60 > incore_log_blocks = 1024 > > # gfs2_tool getargs /domus1 > data 2 > suiddir 0 > quota 0 > posix_acl 1 > upgrade 0 > debug 0 > localflocks 0 > localcaching 0 > ignore_local_fs 0 > spectator 0 > hostdata jid=1:id=196610:first=0 > locktable > lockproto > > Thanks in advance for any advice. > > Ray > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From frank at si.ct.upc.edu Fri Dec 4 12:06:52 2009 From: frank at si.ct.upc.edu (frank) Date: Fri, 04 Dec 2009 13:06:52 +0100 Subject: [Linux-cluster] GFS performance test In-Reply-To: <20091202163200.DCB0A8E14CA@hormel.redhat.com> References: <20091202163200.DCB0A8E14CA@hormel.redhat.com> Message-ID: <4B18FB5C.3090500@si.ct.upc.edu> Hi Ray, thank for your answer. We are using GFS1 on a Red Hat 5.4 cluster. GFS filesystem is mounted on /mnt/gfs, and when we created such filesystem we used parameter "-p lock_dlm". Anyway, look at this output : [root at parmenides ~]# gfs_tool getsb /mnt/gfs ......................... no_addr = 26 sb_lockproto = lock_dlm sb_locktable = hr-pm:gfs01 no_formal_ino = 24 no_addr = 24 ............... For you information my cluster.conf file is: ------------------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------- Shared disk is a LUN on a fibber channel SAN. The most surprising thing is that we have another similar cluster, and there we get "98 locks/sec" always, starting the ping_pong in one or in both nodes. Buf! I'm lost! What is happening? Frank > Date: Wed, 2 Dec 2009 06:58:43 -0800 From: Ray Van Dolson > Subject: Re: [Linux-cluster] GFS performance > test To: linux-cluster at redhat.com Message-ID: > <20091202145842.GA16292 at esri.com> Content-Type: text/plain; > charset=us-ascii On Wed, Dec 02, 2009 at 03:53:46AM -0800, frank wrote: >> > Hi, >> > after seeing some posts related to GFS performance, we have decided to >> > test our two-node GFS filesystem with ping_pong program. >> > We are worried about the results. 
>> > >> > Running the program in only one node, without parameters, we get between >> > 800000 locks/sec and 900000 locks/sec >> > Running the program in both nodes over the same file on the shared >> > filesystem, the lock rate did not drop and it is the same in both nodes! >> > What does this mean? Is there any problem with locks ? >> > >> > Just for you info, GFS filesystem is /mnt/gfs and what I run in both >> > nodes is: >> > >> > ./ping_pong /mnt/gfs/tmp/test.dat 3 >> > >> > Thanks for your help. >> > >> > Wow, that doesn't sound right at all (or at least not consistent with > results I've gotten:) > > Can you provide details of your setup, and perhaps your cluster.conf > file? Have you done any other GFS tuning? Are we talking GFS1 or > GFS2? > > I get in the 3000-5000 locks/sec range with my GFS2 filesystem (using > nodiratime,noatime and reducing the lock limit to 0 from 100 in my > cluster.conf file). > > The numbers you provide I'd expect to see on a local filesystem. > > Ray > -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que est? net. For all your IT requirements visit: http://www.transtec.co.uk From rvandolson at esri.com Fri Dec 4 15:19:16 2009 From: rvandolson at esri.com (Ray Van Dolson) Date: Fri, 4 Dec 2009 07:19:16 -0800 Subject: [Linux-cluster] GFS2 and backups (performance tuning) In-Reply-To: <1259919870.2489.10.camel@localhost> References: <20091203204257.GA15314@esri.com> <1259919870.2489.10.camel@localhost> Message-ID: <20091204151916.GA899@esri.com> On Fri, Dec 04, 2009 at 01:44:30AM -0800, Steven Whitehouse wrote: > Hi, > > I'd suggest filing a bug in the first instance. I can't see anything > obviously wrong with what you are doing. The fcntl() locks go via the > dlm and dlm_controld not via the glock_workqueues, so I don't think that > is likely to be the issue, > > Steve. Thanks Steve. I'll go the bug + SR route. Ray From allen at isye.gatech.edu Fri Dec 4 19:26:39 2009 From: allen at isye.gatech.edu (Allen Belletti) Date: Fri, 04 Dec 2009 14:26:39 -0500 Subject: [Linux-cluster] GFS2: processes stuck in "just schedule" In-Reply-To: <1259919541.2489.8.camel@localhost> References: <20091203204257.GA15314@esri.com> <4B183C05.1060101@isye.gatech.edu> <1259919541.2489.8.camel@localhost> Message-ID: <4B19626F.3060405@isye.gatech.edu> On 12/04/2009 04:39 AM, Steven Whitehouse wrote: > Hi, > > On Thu, 2009-12-03 at 17:30 -0500, Allen Belletti wrote: > >> Hi All, >> >> After Steve and the RedHat guys dug into my nasty crashdump (thanks >> all!), I believe I'm down to the last GFS2 problem on our mail cluster, >> but it's a common one. >> >> I've always had trouble with processes getting stuck on GFS2 access and >> queuing up. Since the 5.4 upgrade and moving the proper GFS2 kernel >> module, it's changed but not gone away. Ever few days now, I'm seeing >> processes getting stuck with WCHAN=just_schedule. Once this starts >> happening, both cluster nodes will accumulate them rapidly which >> eventually brings IO to a halt. The only way I've found to escape is >> via a reboot, sometimes of one, sometimes of both nodes. >> >> Since there's no crash, I don't get any useful debug information. >> Outside of this one repeating glitch, performance is great and all is >> well. If anyone can suggest ways of gathering more data about the >> problem, or possible solutions, I would be grateful. 
>> >> Thanks, >> Allen >> >> >> > This would be typical for what happens when there is contention on a > glock between two (or more) nodes. There is a mechanism which is > supposed to try and mitigate the issue (by allowing each node to hold on > to a glock for a minimum period of time which is designed to ensure that > some work is done each time a node acquires a glock) but if your storage > is particularly slow, and/or possibly depending upon the exact I/O > pattern, it may not always be 100% effective. > > In the first instance though, see if you can find an inode which is > being contended from both nodes as that will most likely be the culprit, > We've got a 3-4 year old Sun 3510 FC array shared between the two nodes. The utilization on it is generally quite reasonable, so I doubt that this would qualify as "particularly slow". Also, the very busiest times for the mail system are usually during the night rsync backups and it rarely if ever gets wedged during those times. Can you give me some hints as to how I might go about finding a inode that's being contended for by both nodes? I assume that would be useful to confirm what the problem is at least. Thanks, Allen -- Allen Belletti allen at isye.gatech.edu 404-894-6221 Phone Industrial and Systems Engineering 404-385-2988 Fax Georgia Institute of Technology From gbmiglia at yahoo.it Mon Dec 7 17:41:01 2009 From: gbmiglia at yahoo.it (gilberto migliavacca) Date: Mon, 07 Dec 2009 18:41:01 +0100 Subject: [Linux-cluster] redhat cluster and resource agent Message-ID: <4B1D3E2D.8040109@yahoo.it> Hi I'm a newbie in the red hat cluster configuration and I don't know if this is the right mailing list for my question. I have to use my own resource agent script and I have to say to the cluster that the related service must be run just on single server. I other words I want to drive 2 nodes with 4 instances of the same application (2 instances per node). the infostructure is somehting like: node_1 /opt/myapp_11/bin/myapp.sh /opt/myapp_12/bin/myapp.sh node_2 /opt/myapp_21/bin/myapp.sh /opt/myapp_22/bin/myapp.sh My idea is to create 4 services in the /etc/cluster/cluster.conf but I don't know how to related the service with a given machine and a related path on the given machine for my understanding I think I cannot use the Conga GUI (neither the system-config-cluster) and I have to edit manually the /etc/cluster/cluster.conf could anyone help to write the XML section in the tag? something like As you can see I don't know how to specify the node thanks in advance gilberto From rmicmirregs at gmail.com Mon Dec 7 23:16:01 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Tue, 08 Dec 2009 00:16:01 +0100 Subject: [Linux-cluster] redhat cluster and resource agent In-Reply-To: <4B1D3E2D.8040109@yahoo.it> References: <4B1D3E2D.8040109@yahoo.it> Message-ID: <1260227761.6606.9.camel@mecatol> Hi Gilberto, What you need to specify where to run each service is the Failover Domain of each service. Some info: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Cluster_Administration/s1-config-failover-domain-CA.html http://sources.redhat.com/cluster/wiki/FailoverDomains You have 4 different services, so I would use 2 or 4 different Failover Domains to achieve your objective, depending on the availability of running each of your services in your cluster nodes. First you will need to define de Failover Domains: Failover Domain X Restricted domain: yes Ordered: yes Node A - Priority 1 Node B - Priority 2 And so on. 
Then you'll need to set the Failover Domain for each of the services, for example: Service 1 -> FailoverDomain1 Service 2 -> FailoverDomain2 Service 3 -> FailoverDomain3 Service 4 -> FailoverDomain4 This can be all done with system-config-cluster, but using a resource made by yourself into cluster.conf will give you some errors. It should be similar to this: [I think you need your definition of your myapp resources here] [...and so on] [and then start the definition of your services] [... and so on] Another question is: is your script usable by CMAN? I hope this helps. Cheers, Rafael El lun, 07-12-2009 a las 18:41 +0100, gilberto migliavacca escribi?: > Hi > > I'm a newbie in the red hat cluster configuration and > I don't know if this is the right mailing list for my > question. > > I have to use my own resource agent script and I have > to say to the cluster that the related service must be > run just on single server. > > I other words I want to drive 2 nodes with 4 instances > of the same application (2 instances per node). > > the infostructure is somehting like: > > node_1 > /opt/myapp_11/bin/myapp.sh > /opt/myapp_12/bin/myapp.sh > node_2 > /opt/myapp_21/bin/myapp.sh > /opt/myapp_22/bin/myapp.sh > > > My idea is to create 4 services in the /etc/cluster/cluster.conf > but I don't know how to related the service with a > given machine and a related path on the given machine > > > for my understanding I think I cannot use the Conga GUI (neither > the system-config-cluster) and I have to edit manually the > /etc/cluster/cluster.conf > > could anyone help to write the XML section in the tag? > > something like > > > > myapp_home="/opt/myapp_11" > shutdown_wait="0"/> > > > myapp_home="/opt/myapp_12" > shutdown_wait="0"/> > > > myapp_home="/opt/myapp_21" > shutdown_wait="0"/> > > > myapp_home="/opt/myapp_22" > shutdown_wait="0"/> > > > > > As you can see I don't know how to specify the node > > thanks in advance > > gilberto > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Rafael Mic? Miranda From fdinitto at redhat.com Tue Dec 8 00:09:30 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 08 Dec 2009 01:09:30 +0100 Subject: [Linux-cluster] Cluster 3.0.6 stable release Message-ID: <4B1D993A.5010402@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 3.0.6 stable release from the STABLE3 branch. This release contains a few major bug fixes. We strongly recommend people to update your clusters. IMPORTANT NOTE: - - fence_xvm has now been obsoleted. fence_xvmd is provided as backward compatibility tool. The new replacement can be downloaded here: http://fence-virt.sourceforge.net/ and it also includes a fence_xvm compatibility mode. In order to build the 3.0.6 release you will need: - - corosync 1.1.2 - - openais 1.1.1 - - linux kernel 2.6.31 The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-3.0.6.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-3.0.6.tar.gz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Thanks/congratulations to all people that contributed to achieve this great milestone. 
Happy clustering, Fabio Under the hood (from 3.0.5): Abhijith Das (3): Revert "gfs2_convert: Fix rgrp conversion to allow re-converts" gfs2_convert: Fix rgrp conversion to allow re-converts gfs2_convert: Fix conversion of inodes with different heights on gfs1 and gfs2 Bob Peterson (2): GFS2: fsck.gfs2 should fix the system statfs file GFS kernel panic, suid + nfsd with posix ACLs enabled Christine Caulfield (3): cman: Look for group_tool in SBINDIR rather than PATH cman: Make consensus twice token timeout Revert "cman: Look for group_tool in SBINDIR rather than PATH" David Teigland (3): group_tool: remove "groupd not running" dlm_controld: set rmem for sctp cman: remove set_networking_params Fabio M. Di Nitto (7): rgmanager: make init script LSB compliant cman init: make init script LSB compliant rgmanager: init script should create lock file cman init: update help text rgmanager init: update help text rgmanage init: no need to re-init variables around fence_xvm: obsole in favour of fence_virt Federico Simoncelli (1): resource-agents: Fix vm.sh return codes Lon Hohberger (4): resource-agents: Add "path" support to virsh mode resource-agents: Fix some path support bugs in vm.sh resource-agents: Fix vm.sh migration failure handling config: Update Schemas for new fence_scsi Marek 'marx' Grac (1): fence: RSB fence agents changed interface a bit Ryan O'Hara (4): Remove fence_scsi_test.pl and update Makefile. New fence_scsi with config options. Change location of key file to /var/lib/cluster/fence_scsi.key Update fence_scsi man page. cman/daemon/cman-preconfig.c | 8 +- cman/init.d/cman.in | 74 ++--- config/plugins/ldap/99cluster.ldif | 30 +- config/plugins/ldap/ldap-base.csv | 5 +- config/tools/xml/cluster.rng.in | 25 ++- fence/agents/rsb/fence_rsb.py | 17 +- fence/agents/scsi/Makefile | 4 +- fence/agents/scsi/fence_scsi.pl | 712 ++++++++++++++++++++++------------ fence/agents/scsi/fence_scsi_test.pl | 236 ----------- fence/agents/xvm/Makefile | 51 +-- fence/agents/xvm/fence_xvm.c | 380 ------------------ fence/agents/xvm/ip_lookup.c | 307 --------------- fence/agents/xvm/ip_lookup.h | 22 - fence/man/fence_scsi.8 | 148 ++++---- gfs-kernel/src/gfs/eattr.c | 107 +++--- gfs-kernel/src/gfs/ops_file.c | 5 +- gfs2/convert/gfs2_convert.c | 3 +- gfs2/fsck/main.c | 70 ++++- gfs2/libgfs2/libgfs2.h | 3 +- gfs2/libgfs2/structures.c | 60 ++-- gfs2/mkfs/main_mkfs.c | 3 +- group/dlm_controld/action.c | 88 +++++ group/tool/main.c | 1 - rgmanager/init.d/rgmanager.in | 14 +- rgmanager/src/resources/vm.sh | 164 +++++++-- 25 files changed, 1044 insertions(+), 1493 deletions(-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAksdmTgACgkQhCzbekR3nhiyRgCfdBL4GpYG48HZaULWaaP6EvrG s+YAoJ2OLEKHjkHBAO+AkJs264y8kyUe =vdD4 -----END PGP SIGNATURE----- From avi at myphonebook.co.in Wed Dec 9 07:03:59 2009 From: avi at myphonebook.co.in (avi at myphonebook.co.in) Date: Wed, 09 Dec 2009 12:33:59 +0530 (IST) Subject: [Linux-cluster] Cluster configuration enquiry Message-ID: <1260342239.14007@myphonebook.co.in> Hi I am a newbie to clustering in Linux. Just wanted some advice. My requirements are as under: I am hosting several domains and dynamic/static websites. I need load balancing and redundancy. Hardware : 3 systems ( one public IP address ). 
outside world internal lan ---------------------> node A ------------------> node B public IP address | | internal lan | v node C The cluster will use LVS-NAT and mysql clustering on gigabit ethernet. node A: two interfaces with a public ip and an internal lan IP. It will host the mysql management node and LVS. node B and node C: apache + mysql storage nodes. connected to node A on internal IP. LVS with persistence will make sure that user sessions are honored. Mysql cluster will make sure that the databases are up to date, on both nodes B and C. I do not plan to use GFS, because I do not want to invest in a SAN right now. Any ideas or comments? From pradhanparas at gmail.com Wed Dec 9 18:54:10 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Wed, 9 Dec 2009 12:54:10 -0600 Subject: [Linux-cluster] changing heartbeat interface Message-ID: <8b711df40912091054m3a8c5d7ax42d7cd0143898fde@mail.gmail.com> hi, I believe its not recommend but just curious to know about the consequences of changing the heartbeat of the cluster to the 2nd interface of the cluster nodes. In this case if the network switch fails , then cluster will still be quorate since they will be connected each other with the 2nd interfaces of the nodes and will not be fenced. Thanks Paras. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rvandolson at esri.com Wed Dec 9 19:08:28 2009 From: rvandolson at esri.com (Ray Van Dolson) Date: Wed, 9 Dec 2009 11:08:28 -0800 Subject: [Linux-cluster] Backup strategies for large-ish GFS2 filesystems. Message-ID: <20091209190828.GA8880@esri.com> How do those of you with large-ish GFS2 filesystems (and multiple nodes) handle backups? I'm specifically thinking of people running mailspools and such with many files. I'd be interested in hearing your space usage, inode usage and how long it takes you to do a full and diff backup to see if the numbers we're seeing are reasonable. Thanks! Ray From johannes.russek at io-consulting.net Thu Dec 10 11:03:48 2009 From: johannes.russek at io-consulting.net (jr) Date: Thu, 10 Dec 2009 12:03:48 +0100 Subject: [Linux-cluster] Backup strategies for large-ish GFS2 filesystems. In-Reply-To: <20091209190828.GA8880@esri.com> References: <20091209190828.GA8880@esri.com> Message-ID: <1260443028.15239.2.camel@dell-jr.intern.win-rar.com> Hello Ray, unfortunately we only have a very small gfs volume running, but how are you doing backups? Are you doing snapshots and mounting them with lockproto=lock_nolock? regards, Johannes Am Mittwoch, den 09.12.2009, 11:08 -0800 schrieb Ray Van Dolson: > How do those of you with large-ish GFS2 filesystems (and multiple > nodes) handle backups? I'm specifically thinking of people running > mailspools and such with many files. > > I'd be interested in hearing your space usage, inode usage and how long > it takes you to do a full and diff backup to see if the numbers we're > seeing are reasonable. > > Thanks! > Ray > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gbmiglia at yahoo.it Thu Dec 10 15:15:32 2009 From: gbmiglia at yahoo.it (gilberto migliavacca) Date: Thu, 10 Dec 2009 16:15:32 +0100 Subject: [Linux-cluster] redhat cluster and resource agent In-Reply-To: <1260227761.6606.9.camel@mecatol> References: <4B1D3E2D.8040109@yahoo.it> <1260227761.6606.9.camel@mecatol> Message-ID: <4B211094.5010501@yahoo.it> Thanks for helping me. 
now the configuration seems ok; but I have another problem, I'll open a new thred gilberto Rafael Mic? Miranda wrote: > Hi Gilberto, > > What you need to specify where to run each service is the Failover > Domain of each service. > > Some info: > > http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Cluster_Administration/s1-config-failover-domain-CA.html > http://sources.redhat.com/cluster/wiki/FailoverDomains > > You have 4 different services, so I would use 2 or 4 different Failover > Domains to achieve your objective, depending on the availability of > running each of your services in your cluster nodes. > > First you will need to define de Failover Domains: > > Failover Domain X > Restricted domain: yes > Ordered: yes > Node A - Priority 1 > Node B - Priority 2 > > And so on. > > Then you'll need to set the Failover Domain for each of the services, > for example: > Service 1 -> FailoverDomain1 > Service 2 -> FailoverDomain2 > Service 3 -> FailoverDomain3 > Service 4 -> FailoverDomain4 > > This can be all done with system-config-cluster, but using a resource > made by yourself into cluster.conf will give you some errors. > > It should be similar to this: > > > > restricted="1"> priority="1"/> priority="2"/> > restricted="1"> priority="1"/> priority="2"/> > > > [I think you need your definition of your myapp resources here] > > [...and so on] > > [and then start the definition of your services] > > > > [... and so on] > > > Another question is: is your script usable by CMAN? > > I hope this helps. Cheers, > > Rafael > > El lun, 07-12-2009 a las 18:41 +0100, gilberto migliavacca escribi?: >> Hi >> >> I'm a newbie in the red hat cluster configuration and >> I don't know if this is the right mailing list for my >> question. >> >> I have to use my own resource agent script and I have >> to say to the cluster that the related service must be >> run just on single server. >> >> I other words I want to drive 2 nodes with 4 instances >> of the same application (2 instances per node). >> >> the infostructure is somehting like: >> >> node_1 >> /opt/myapp_11/bin/myapp.sh >> /opt/myapp_12/bin/myapp.sh >> node_2 >> /opt/myapp_21/bin/myapp.sh >> /opt/myapp_22/bin/myapp.sh >> >> >> My idea is to create 4 services in the /etc/cluster/cluster.conf >> but I don't know how to related the service with a >> given machine and a related path on the given machine >> >> >> for my understanding I think I cannot use the Conga GUI (neither >> the system-config-cluster) and I have to edit manually the >> /etc/cluster/cluster.conf >> >> could anyone help to write the XML section in the tag? >> >> something like >> >> >> >> > myapp_home="/opt/myapp_11" >> shutdown_wait="0"/> >> >> >> > myapp_home="/opt/myapp_12" >> shutdown_wait="0"/> >> >> >> > myapp_home="/opt/myapp_21" >> shutdown_wait="0"/> >> >> >> > myapp_home="/opt/myapp_22" >> shutdown_wait="0"/> >> >> >> >> >> As you can see I don't know how to specify the node >> >> thanks in advance >> >> gilberto >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From gbmiglia at yahoo.it Thu Dec 10 15:26:30 2009 From: gbmiglia at yahoo.it (gilberto migliavacca) Date: Thu, 10 Dec 2009 16:26:30 +0100 Subject: [Linux-cluster] how to start/stop a service Message-ID: <4B211326.3060102@yahoo.it> Hi I have the following configuration: 2 nodes with the same application "fun". 
This is the /etc/cluster/cluster.conf Now I'd like to manage (start/stop) manually both instances from the node redhat02.fun.uk; I'm using the command line tool but when I run clusvcadm -e fun11 -m redhat02.fun.uk the application starts correctly when I run clusvcadm -e fun22 -m redhat03.fun.uk the output says: Member redhat03.fun.uk trying to enable service:fun22...Success service:fun22 is now running on redhat03.fun.uk but the service is not up and running on the redhat03.fun.uk can anybody help me with this issue? thanks in advance gilberto From kkovachev at varna.net Thu Dec 10 15:27:44 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Thu, 10 Dec 2009 17:27:44 +0200 Subject: [Linux-cluster] validity error Message-ID: <20091210151736.M20864@varna.net> Hello, after upgrading to 3.0.6 i get: Starting cman... Relax-NG validity error : Extra element cman in interleave but cluster.conf should be correct and was working so far without problems. The coresponding section in is: how should i change it to pass the validity check? From ccaulfie at redhat.com Thu Dec 10 16:12:33 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Thu, 10 Dec 2009 16:12:33 +0000 Subject: [Linux-cluster] validity error In-Reply-To: <20091210151736.M20864@varna.net> References: <20091210151736.M20864@varna.net> Message-ID: <4B211DF1.9030305@redhat.com> On 10/12/09 15:27, Kaloyan Kovachev wrote: > Hello, > after upgrading to 3.0.6 i get: > > Starting cman... Relax-NG validity error : Extra element cman in interleave > > but cluster.conf should be correct and was working so far without problems. > The coresponding section in is: > > > > > > how should i change it to pass the validity check? Remove the keyfile="" attribute. cman ignores it anyway :-) If you need to specify an encrpytion key it should go into the part of cluster.conf. Chrissie From rvandolson at esri.com Thu Dec 10 16:25:09 2009 From: rvandolson at esri.com (Ray Van Dolson) Date: Thu, 10 Dec 2009 08:25:09 -0800 Subject: [Linux-cluster] Backup strategies for large-ish GFS2 filesystems. In-Reply-To: <1260443028.15239.2.camel@dell-jr.intern.win-rar.com> References: <20091209190828.GA8880@esri.com> <1260443028.15239.2.camel@dell-jr.intern.win-rar.com> Message-ID: <20091210162508.GA24895@esri.com> On Thu, Dec 10, 2009 at 03:03:48AM -0800, jr wrote: > Hello Ray, > unfortunately we only have a very small gfs volume running, but how are > you doing backups? Are you doing snapshots and mounting them with > lockproto=lock_nolock? > regards, > Johannes That would be ideal -- unfortunately our underlying storage hardware (IBM DS4300/FASt600) does not support snapshots. If cLVM supported snapshots I'd jump on going that route in a millisecond... :) We've tried three methods (1) NetBackup to exposed NFS export of GFS2 filesystem; (2) rsync from remote machine to rsyncd on GFS2 node; (3) rsync from remote machine to NFS export of GFS2 filesystem. Option 1 is the slowest (6+ hours), 2 is somewhat better (3 hours) and 3 has been our best bet so far (82 minutes). This is using the --size-only argument to rsycn in an effort to avoid reading mtime on an inode. Probably not much gain though as it appears stat() is called anyways. I'm kind of surprised that rsync to NFS is faster than rsync --daemon. I have been testing with our GFS2 filesystem mounted in spectator mode on the passive node, but I don't think it's really making much difference. 
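For reference, option 3 boils down to something like the following on the
backup host (host names and paths here are made up, and --size-only only skips
files whose size is unchanged, so it trades accuracy for fewer reads):

  # mount the NFS export of the GFS2 filesystem read-only
  mount -o ro gfs2node:/export/mailspool /mnt/mailspool

  # pull the tree into the local backup area, comparing by size only
  rsync -a --size-only /mnt/mailspool/ /backup/mailspool/

  umount /mnt/mailspool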
It would be nice if GFS2 had some backup-friendly type options for caching some of this information about all our inodes. I mean, obviously it does -- but some knobs we could easily turn on a node we intend to run backups from that, given ample amount of memory, cache all the stat() information for 24+ hour periods... Or maybe some cluster filesystem friendly backup tools as I see these problems exist on OCFS2 and Lustre as well... Thanks for the reply. > > Am Mittwoch, den 09.12.2009, 11:08 -0800 schrieb Ray Van Dolson: > > How do those of you with large-ish GFS2 filesystems (and multiple > > nodes) handle backups? I'm specifically thinking of people running > > mailspools and such with many files. > > > > I'd be interested in hearing your space usage, inode usage and how long > > it takes you to do a full and diff backup to see if the numbers we're > > seeing are reasonable. > > > > Thanks! > > Ray > > Ray From gbmiglia at yahoo.it Thu Dec 10 16:45:02 2009 From: gbmiglia at yahoo.it (gilberto migliavacca) Date: Thu, 10 Dec 2009 17:45:02 +0100 Subject: [Linux-cluster] SOLVED - how to start/stop a service In-Reply-To: <4B211326.3060102@yahoo.it> References: <4B211326.3060102@yahoo.it> Message-ID: <4B21258E.5050107@yahoo.it> there was an error in the log due a incorrect settings for a given property in the "metadafile". in that case the configuration was not applied correctly. Now I fixed the problem in the "metadafile" and the clusvcadm command works properly gilberto gilberto migliavacca wrote: > Hi > > I have the following configuration: 2 nodes with the same > application "fun". > This is the /etc/cluster/cluster.conf > > > > > > > > > > > > > > > > > > > > > restricted="1"> > > > restricted="1"> > > > > > > > > > > > > > > > > > > > > > Now I'd like to manage (start/stop) manually both instances > from the node redhat02.fun.uk; > > I'm using the command line tool but when I run > > clusvcadm -e fun11 -m redhat02.fun.uk > > the application starts correctly > > when I run > > clusvcadm -e fun22 -m redhat03.fun.uk > > the output says: > > Member redhat03.fun.uk trying to enable service:fun22...Success > service:fun22 is now running on redhat03.fun.uk > > but the service is not up and running on the redhat03.fun.uk > > > can anybody help me with this issue? > > thanks in advance > > gilberto > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From kkovachev at varna.net Fri Dec 11 09:48:02 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Fri, 11 Dec 2009 11:48:02 +0200 Subject: [Linux-cluster] validity error In-Reply-To: <4B211DF1.9030305@redhat.com> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> Message-ID: <20091211093427.M4078@varna.net> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote > On 10/12/09 15:27, Kaloyan Kovachev wrote: > > Hello, > > after upgrading to 3.0.6 i get: > > > > Starting cman... Relax-NG validity error : Extra element cman in interleave > > > > but cluster.conf should be correct and was working so far without problems. > > The coresponding section in is: > > > > > > > > > > > > how should i change it to pass the validity check? > > Remove the keyfile="" attribute. 
cman ignores it anyway :-) > I am sure it was working with RHCM v2, so it seems i will need to rewrite the config for V3, as i get another error now about specifying multicast interface for clusternode and there will be others for sure > If you need to specify an encrpytion key it should go into the > part of cluster.conf. > looking at cluster.rng keyfile is valid for the cman block. May i just move it there or i should create > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ccaulfie at redhat.com Fri Dec 11 09:58:38 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 11 Dec 2009 09:58:38 +0000 Subject: [Linux-cluster] validity error In-Reply-To: <20091211093427.M4078@varna.net> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> <20091211093427.M4078@varna.net> Message-ID: <4B2217CE.3060007@redhat.com> On 11/12/09 09:48, Kaloyan Kovachev wrote: > On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote >> On 10/12/09 15:27, Kaloyan Kovachev wrote: >>> Hello, >>> after upgrading to 3.0.6 i get: >>> >>> Starting cman... Relax-NG validity error : Extra element cman in interleave >>> >>> but cluster.conf should be correct and was working so far without problems. >>> The coresponding section in is: >>> >>> >>> >>> >>> >>> how should i change it to pass the validity check? >> >> Remove the keyfile="" attribute. cman ignores it anyway :-) >> > > I am sure it was working with RHCM v2, so it seems i will need to rewrite the > config for V3, as i get another error now about specifying multicast interface > for clusternode and there will be others for sure Yes, it would work fine under v2. In fact it's working now - you're just getting a warning message (I hope!). We have added a lot more checks to the configuration to try and help invalid configurations from being run and causing trouble. >> If you need to specify an encrpytion key it should go into the >> part of cluster.conf. >> > > looking at cluster.rng keyfile is valid for the cman block. May i just move it > there or i should create I would just remove it. It's not doing anything, so if you move it to you will change the encryption key used by the cluster and have to reboot all your nodes to get them communicating again. Chrissie From kkovachev at varna.net Fri Dec 11 10:21:41 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Fri, 11 Dec 2009 12:21:41 +0200 Subject: [Linux-cluster] validity error In-Reply-To: <4B2217CE.3060007@redhat.com> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> <20091211093427.M4078@varna.net> <4B2217CE.3060007@redhat.com> Message-ID: <20091211100852.M47131@varna.net> On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote > On 11/12/09 09:48, Kaloyan Kovachev wrote: > > On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote > >> On 10/12/09 15:27, Kaloyan Kovachev wrote: > >>> Hello, > >>> after upgrading to 3.0.6 i get: > >>> > >>> Starting cman... Relax-NG validity error : Extra element cman in interleave > >>> > >>> but cluster.conf should be correct and was working so far without problems. > >>> The coresponding section in is: > >>> > >>> > >>> > >>> > >>> > >>> how should i change it to pass the validity check? > >> > >> Remove the keyfile="" attribute. 
cman ignores it anyway :-) > >> > > > > I am sure it was working with RHCM v2, so it seems i will need to rewrite the > > config for V3, as i get another error now about specifying multicast interface > > for clusternode and there will be others for sure > > Yes, it would work fine under v2. In fact it's working now - you're just > getting a warning message (I hope!). We have added a lot more checks to > the configuration to try and help invalid configurations from being run > and causing trouble. when starting the cluster i get just warnings, but updating the config and using cman_tool version -r cman doesn't reload it, so i am forced to fix my errors :) > > >> If you need to specify an encrpytion key it should go into the > >> part of cluster.conf. > >> > > > > looking at cluster.rng keyfile is valid for the cman block. May i just move it > > there or i should create > > I would just remove it. It's not doing anything, so if you move it to > you will change the encryption key used by the cluster and have > to reboot all your nodes to get them communicating again. > The cluster is not a production one, so it is OK and am looking for the correct end result. My question was actually 'Is encription key valid/used only from section or in too as described in cluster.rng file'. Multicast and keyfile are present in both and ... i guess is the preferred one for future compatibility? > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ccaulfie at redhat.com Fri Dec 11 10:24:49 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 11 Dec 2009 10:24:49 +0000 Subject: [Linux-cluster] validity error In-Reply-To: <20091211100852.M47131@varna.net> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> <20091211093427.M4078@varna.net> <4B2217CE.3060007@redhat.com> <20091211100852.M47131@varna.net> Message-ID: <4B221DF1.2030300@redhat.com> On 11/12/09 10:21, Kaloyan Kovachev wrote: > On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote >> On 11/12/09 09:48, Kaloyan Kovachev wrote: >>> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote >>>> On 10/12/09 15:27, Kaloyan Kovachev wrote: >>>>> Hello, >>>>> after upgrading to 3.0.6 i get: >>>>> >>>>> Starting cman... Relax-NG validity error : Extra element cman in interleave >>>>> >>>>> but cluster.conf should be correct and was working so far without problems. >>>>> The coresponding section in is: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> how should i change it to pass the validity check? >>>> >>>> Remove the keyfile="" attribute. cman ignores it anyway :-) >>>> >>> >>> I am sure it was working with RHCM v2, so it seems i will need to rewrite the >>> config for V3, as i get another error now about specifying multicast interface >>> for clusternode and there will be others for sure >> >> Yes, it would work fine under v2. In fact it's working now - you're just >> getting a warning message (I hope!). We have added a lot more checks to >> the configuration to try and help invalid configurations from being run >> and causing trouble. > > when starting the cluster i get just warnings, but updating the config and > using cman_tool version -r cman doesn't reload it, so i am forced to fix my > errors :) > >> >>>> If you need to specify an encrpytion key it should go into the >>>> part of cluster.conf. >>>> >>> >>> looking at cluster.rng keyfile is valid for the cman block. 
May i just move it >>> there or i should create >> >> I would just remove it. It's not doing anything, so if you move it to >> you will change the encryption key used by the cluster and have >> to reboot all your nodes to get them communicating again. >> > > The cluster is not a production one, so it is OK and am looking for the > correct end result. My question was actually 'Is encription key valid/used > only from section or in too as described in cluster.rng file'. > > Multicast and keyfile are present in both and ... i guess > is the preferred one for future compatibility? > Confusingly, multicast must be part of and keyfile should be part of . That's just how it is, sorry ;-) Chrissie From kkovachev at varna.net Fri Dec 11 11:21:36 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Fri, 11 Dec 2009 13:21:36 +0200 Subject: [Linux-cluster] validity error In-Reply-To: <4B221DF1.2030300@redhat.com> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> <20091211093427.M4078@varna.net> <4B2217CE.3060007@redhat.com> <20091211100852.M47131@varna.net> <4B221DF1.2030300@redhat.com> Message-ID: <20091211105109.M90980@varna.net> On Fri, 11 Dec 2009 10:24:49 +0000, Christine Caulfield wrote > On 11/12/09 10:21, Kaloyan Kovachev wrote: > > On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote > >> On 11/12/09 09:48, Kaloyan Kovachev wrote: > >>> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote > >>>> On 10/12/09 15:27, Kaloyan Kovachev wrote: > >>>>> Hello, > >>>>> after upgrading to 3.0.6 i get: > >>>>> > >>>>> Starting cman... Relax-NG validity error : Extra element cman in interleave > >>>>> > >>>>> but cluster.conf should be correct and was working so far without problems. > >>>>> The coresponding section in is: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> how should i change it to pass the validity check? > >>>> > >>>> Remove the keyfile="" attribute. cman ignores it anyway :-) > >>>> > >>> > >>> I am sure it was working with RHCM v2, so it seems i will need to rewrite the > >>> config for V3, as i get another error now about specifying multicast interface > >>> for clusternode and there will be others for sure > >> > >> Yes, it would work fine under v2. In fact it's working now - you're just > >> getting a warning message (I hope!). We have added a lot more checks to > >> the configuration to try and help invalid configurations from being run > >> and causing trouble. > > > > when starting the cluster i get just warnings, but updating the config and > > using cman_tool version -r cman doesn't reload it, so i am forced to fix my > > errors :) > > > >> > >>>> If you need to specify an encrpytion key it should go into the > >>>> part of cluster.conf. > >>>> > >>> > >>> looking at cluster.rng keyfile is valid for the cman block. May i just move it > >>> there or i should create > >> > >> I would just remove it. It's not doing anything, so if you move it to > >> you will change the encryption key used by the cluster and have > >> to reboot all your nodes to get them communicating again. > >> > > > > The cluster is not a production one, so it is OK and am looking for the > > correct end result. My question was actually 'Is encription key valid/used > > only from section or in too as described in cluster.rng file'. > > > > Multicast and keyfile are present in both and ... i guess > > is the preferred one for future compatibility? > > > > Confusingly, multicast must be part of and keyfile should be part > of . 
> > That's just how it is, sorry ;-) > Thanks. The validation schema should be updated then, as it allows keyfile in cman too (fixed in my copy and could provide a patch). I still can't find how do i specify per node interface. There is interface allowed only in section while most of the nodes have (and use) bond0, for one of them i need to specify different interface for cluster communication. Is per node interface attribute missed from the validation schema or is removed completely? > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ccaulfie at redhat.com Fri Dec 11 11:36:53 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 11 Dec 2009 11:36:53 +0000 Subject: [Linux-cluster] validity error In-Reply-To: <20091211105109.M90980@varna.net> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> <20091211093427.M4078@varna.net> <4B2217CE.3060007@redhat.com> <20091211100852.M47131@varna.net> <4B221DF1.2030300@redhat.com> <20091211105109.M90980@varna.net> Message-ID: <4B222ED5.4040805@redhat.com> On 11/12/09 11:21, Kaloyan Kovachev wrote: > On Fri, 11 Dec 2009 10:24:49 +0000, Christine Caulfield wrote >> On 11/12/09 10:21, Kaloyan Kovachev wrote: >>> On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote >>>> On 11/12/09 09:48, Kaloyan Kovachev wrote: >>>>> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote >>>>>> On 10/12/09 15:27, Kaloyan Kovachev wrote: >>>>>>> Hello, >>>>>>> after upgrading to 3.0.6 i get: >>>>>>> >>>>>>> Starting cman... Relax-NG validity error : Extra element cman in > interleave >>>>>>> >>>>>>> but cluster.conf should be correct and was working so far without > problems. >>>>>>> The coresponding section in is: >>>>>>> >>>>>>> >>>>>>> keyfile="/etc/cluster/cman_authkey"/> >>>>>>> >>>>>>> >>>>>>> how should i change it to pass the validity check? >>>>>> >>>>>> Remove the keyfile="" attribute. cman ignores it anyway :-) >>>>>> >>>>> >>>>> I am sure it was working with RHCM v2, so it seems i will need to > rewrite the >>>>> config for V3, as i get another error now about specifying multicast > interface >>>>> for clusternode and there will be others for sure >>>> >>>> Yes, it would work fine under v2. In fact it's working now - you're just >>>> getting a warning message (I hope!). We have added a lot more checks to >>>> the configuration to try and help invalid configurations from being run >>>> and causing trouble. >>> >>> when starting the cluster i get just warnings, but updating the config and >>> using cman_tool version -r cman doesn't reload it, so i am forced to fix my >>> errors :) >>> >>>> >>>>>> If you need to specify an encrpytion key it should go into the >>>>>> part of cluster.conf. >>>>>> >>>>> >>>>> looking at cluster.rng keyfile is valid for the cman block. May i just > move it >>>>> there or i should create >>>> >>>> I would just remove it. It's not doing anything, so if you move it to >>>> you will change the encryption key used by the cluster and have >>>> to reboot all your nodes to get them communicating again. >>>> >>> >>> The cluster is not a production one, so it is OK and am looking for the >>> correct end result. My question was actually 'Is encription key valid/used >>> only from section or in too as described in cluster.rng file'. >>> >>> Multicast and keyfile are present in both and ... i guess >>> is the preferred one for future compatibility? 
>>> >> >> Confusingly, multicast must be part of and keyfile should be part >> of. >> >> That's just how it is, sorry ;-) >> > > Thanks. The validation schema should be updated then, as it allows keyfile in > cman too (fixed in my copy and could provide a patch). Hmm. I am totally wrong. Very sorry. keyfile IS allowed in cluster3, it overrides the one assigned in totem. In which case I'm not sure why it's failing to validate on your system. The schema is a bit of a work-in-progress at the moment, which is why it warns rather than fails if it finds an error. Did it work when you removed keyfile ? > I still can't find how do i specify per node interface. There is interface > allowed only in section while most of the nodes have (and use) bond0, > for one of them i need to specify different interface for cluster > communication. Is per node interface attribute missed from the validation > schema or is removed completely? > cman always binds to the address of the host given as a node name. see http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_heartbeat_nic Chrissie From kkovachev at varna.net Fri Dec 11 12:32:39 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Fri, 11 Dec 2009 14:32:39 +0200 Subject: [Linux-cluster] validity error In-Reply-To: <4B222ED5.4040805@redhat.com> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> <20091211093427.M4078@varna.net> <4B2217CE.3060007@redhat.com> <20091211100852.M47131@varna.net> <4B221DF1.2030300@redhat.com> <20091211105109.M90980@varna.net> <4B222ED5.4040805@redhat.com> Message-ID: <20091211115838.M23490@varna.net> On Fri, 11 Dec 2009 11:36:53 +0000, Christine Caulfield wrote > > Hmm. I am totally wrong. Very sorry. keyfile IS allowed in cluster3, it > overrides the one assigned in totem. In which case I'm not sure why it's > failing to validate on your system. > according to the validation file (cluster.rng) it should be an attribute of cman, while in my case it was attribute of multicast subelement and is not allowed there > The schema is a bit of a work-in-progress at the moment, which is why it > warns rather than fails if it finds an error. Did it work when you > removed keyfile ? > Yes i know and trying to help with one more config case (hence my email here in the first place). I have replaced it with and passed this warning, but still can't pass the validation because of rm ... still looking for the reason. > > I still can't find how do i specify per node interface. There is interface > > allowed only in section while most of the nodes have (and use) bond0, > > for one of them i need to specify different interface for cluster > > communication. Is per node interface attribute missed from the validation > > schema or is removed completely? > > > > cman always binds to the address of the host given as a node name. see > > http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_heartbeat_nic > I have used the answer of "How can I configure my RHEL4 cluster to use multicast rather than broadcast?" few lines below (with V2 initially), so it is safe to just remove this in my conf now and that step was passed. 
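In other words, since cman binds to whatever address the cluster node name
resolves to, a sketch of forcing it onto a dedicated interface (host names and
addresses below are hypothetical) is simply:

  # /etc/hosts on every member: point the node names at the
  # addresses of the interface cman should use
  10.0.0.1   node1-clu
  10.0.0.2   node2-clu
  10.0.0.3   node3-clu

and then use those names in cluster.conf:

  <clusternode name="node1-clu" nodeid="1" votes="1"/>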
> Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kkovachev at varna.net Fri Dec 11 13:34:26 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Fri, 11 Dec 2009 15:34:26 +0200 Subject: [Linux-cluster] validity error In-Reply-To: <20091211115838.M23490@varna.net> References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com> <20091211093427.M4078@varna.net> <4B2217CE.3060007@redhat.com> <20091211100852.M47131@varna.net> <4B221DF1.2030300@redhat.com> <20091211105109.M90980@varna.net> <4B222ED5.4040805@redhat.com> <20091211115838.M23490@varna.net> Message-ID: <20091211130955.M1600@varna.net> Update On Fri, 11 Dec 2009 14:32:39 +0200, Kaloyan Kovachev wrote > On Fri, 11 Dec 2009 11:36:53 +0000, Christine Caulfield wrote > > > > > > > Hmm. I am totally wrong. Very sorry. keyfile IS allowed in cluster3, it > > overrides the one assigned in totem. In which case I'm not sure why it's > > failing to validate on your system. > > > > according to the validation file (cluster.rng) it should be an attribute of > cman, while in my case it was attribute of multicast subelement and is not > allowed there > > > The schema is a bit of a work-in-progress at the moment, which is why it > > warns rather than fails if it finds an error. Did it work when you > > removed keyfile ? > > > > Yes i know and trying to help with one more config case (hence my email here > in the first place). I have replaced it with and passed > this warning, but still can't pass the validation because of rm ... still > looking for the reason. > After i have added validation section for our custom service resource and reloaded the config the nodes were still communicating with each other. Then i have restarted one of the nodes and it didn't join the cluster, but created a new one as do other restarted nodes did later. It seems the keyfile was not active with the old config and not activated on just reload > > > I still can't find how do i specify per node interface. There is interface > > > allowed only in section while most of the nodes have (and use) bond0, > > > for one of them i need to specify different interface for cluster > > > communication. Is per node interface attribute missed from the validation > > > schema or is removed completely? > > > > > > > cman always binds to the address of the host given as a node name. see > > > > http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_heartbeat_nic > > > > I have used the answer of "How can I configure my RHEL4 cluster to use > multicast rather than broadcast?" few lines below (with V2 initially), so it > is safe to just remove this in my conf now and that step was passed. > > > Chrissie > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From baishuwei at gmail.com Sun Dec 13 04:46:48 2009 From: baishuwei at gmail.com (Bai Shuwei) Date: Sun, 13 Dec 2009 12:46:48 +0800 Subject: [Linux-cluster] How to set lun masking Message-ID: Hi, All: The bellow is my system architecture HOST0 ---| HOST1 ---| --switch --| server (disk0, disk1, disk2 or LUN0, LUN1, LUN2). HOST2 ---| I want to assign LUN0 to HOST0, LUN1 to HOST1, and LUN2 to HOST2. There is only one QLogic HBA on the server. In my zone i build, all HOST can see all LUNs. 
So I want to know how to make this one-to-one assignment possible. I have
installed CSCT and the scli tools on my server. How do I configure my server
to make the network work?

Thanks all!

Bai Shuwei

--
Love other people, as same as love yourself! Don't think all the time, do it
by your hands! Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/
E-Mail: baishuwei at gmail.com

From baishuwei at gmail.com  Sun Dec 13 07:05:47 2009
From: baishuwei at gmail.com (Bai Shuwei)
Date: Sun, 13 Dec 2009 15:05:47 +0800
Subject: [Linux-cluster] LUN/LUN Masking
In-Reply-To: <7207d96f0912032026n7ef04c7ahc86b24ca482e3326@mail.gmail.com>
References: <7207d96f0912032026n7ef04c7ahc86b24ca482e3326@mail.gmail.com>
Message-ID:

On Fri, Dec 4, 2009 at 12:26 PM, Fajar A. Nugraha wrote:
> On Fri, Dec 4, 2009 at 10:06 AM, Bai Shuwei wrote:
> > Hi, everyone:
> > I am a beginner on FC-SAN. On my machine i have installed
> > HBAs (QLogic 2342) and the scli tools. How can I set up LUN masking to
> > forbid/allow hosts to access a specific LUN/disk? Do I need some other
> > special tools to do it? Thanks all.
>
> AFAIK LUN masking is done on storage side, not client side.
>
How do I configure my storage to meet this requirement? Or which tool can help
me? Thanks!

> --
> Fajar
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

--
Love other people, as same as love yourself! Don't think all the time, do it
by your hands! Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/
E-Mail: baishuwei at gmail.com

From arturogf at gmail.com  Mon Dec 14 08:49:04 2009
From: arturogf at gmail.com (Arturo Gonzalez Ferrer)
Date: Mon, 14 Dec 2009 09:49:04 +0100
Subject: [Linux-cluster] Adding a new node to rh cluster + GFS2
Message-ID:

Dear all,

I'm in trouble with adding a new node to an existing cluster of three nodes
(so I want to have four), because it somehow doesn't let me access the cluster
infrastructure.

These 3 nodes were set up as http servers, sharing a GFS2 volume (physical:
vg_cluster, logical: lv_cluster) where data is stored.

I want to set up the new node to access the same GFS2 volume, with the idea of
exporting the data via NFS, so that a remote backup library can be configured
to back up the data nightly by connecting to the new node.

I've tried a lot of things, always getting the same kind of errors.
Running "cman_tool status" on any of the 3 nodes i get: Version: 6.2.0 Config Version: 70 Cluster Name: campusvirtual Cluster Id: 45794 Cluster Member: Yes Cluster Generation: 1136 Membership state: Cluster-Member Nodes: 3 Expected votes: 4 Total votes: 3 Quorum: 3 Active subsystems: 9 Flags: Dirty Ports Bound: 0 11 177 Node name: cev01 Node ID: 2 Multicast addresses: 239.192.178.149 Node addresses: 150.214.243.20 while running "cman_tool status" on the new node: Version: 6.2.0 Config Version: 70 Cluster Name: campusvirtual Cluster Id: 45794 Cluster Member: Yes Cluster Generation: 1124 Membership state: Cluster-Member Nodes: 1 Expected votes: 4 Total votes: 1 Quorum: 3 Activity blocked Active subsystems: 2 Flags: Ports Bound: 0 Node name: cevstream.ugr.es Node ID: 4 Multicast addresses: 239.192.178.149 Node addresses: 150.214.243.19 Running "fence_tool_dump" on the three nodes: [root at cev01 ~]# fence_tool dump dump read: Success 1260778939 our_nodeid 2 our_name cev01.ugr.es 1260778939 listen 4 member 5 groupd 7 1260778964 client 3: join default 1260778964 delay post_join 3s post_fail 0s 1260778964 added 4 nodes from ccs 1260778964 setid default 65538 1260778964 start default 1 members 2 1260778964 do_recovery stop 0 start 1 finish 0 1260778964 node "cevstream.ugr.es" not a cman member, cn 1 1260778964 add first victim cevstream.ugr.es 1260778965 node "cevstream.ugr.es" not a cman member, cn 1 1260778966 node "cevstream.ugr.es" not a cman member, cn 1 1260778967 node "cevstream.ugr.es" not a cman member, cn 1 1260778967 delay of 3s leaves 1 victims 1260778967 node "cevstream.ugr.es" not a cman member, cn 1 1260778967 node "cevstream.ugr.es" has not been fenced 1260778967 fencing node cevstream.ugr.es 1260778971 finish default 1 1260778971 stop default 1260778971 start default 2 members 3 2 1260778971 do_recovery stop 1 start 2 finish 1 1260778971 finish default 2 1260778971 stop default 1260778971 start default 3 members 1 3 2 1260778971 do_recovery stop 2 start 3 finish 2 1260778971 finish default 3 1260779876 client 3: dump while running it in the new node: [root at cevstream ~]# fence_tool dump fence_tool: can't communicate with fenced I get a lot of errors telling me that cluster is not quorate: Dec 14 09:39:20 cevstream ccsd[3668]: Cluster is not quorate. Refusing connection. Dec 14 09:39:20 cevstream ccsd[3668]: Error while processing connect: Connection refused Printing the superblock on any of the three nodes: [root at cev01 ~]# gfs2_tool sb /dev/vg_cluster/lv_cluster all mh_magic = 0x01161970 mh_type = 1 mh_format = 100 sb_fs_format = 1801 sb_multihost_format = 1900 sb_bsize = 4096 sb_bsize_shift = 12 no_formal_ino = 2 no_addr = 23 no_formal_ino = 1 no_addr = 22 sb_lockproto = lock_dlm sb_locktable = campusvirtual:gfs_cluster01 uuid = C6A9FBB4-A881-2128-2AB8-1AB8547C7F30 I've tried something i saw in some forums, deactivating and even removing the logical volume (with lvremove), because supposedly the new node could need this operation in order to access the gfs2 volume. Running lvcreate on the new node, with all the other nodes deactivates and removed, i still get the error: [root at cevstream ~]# lvcreate -l 100%FREE -n lv_cluster vg_cluster connect() failed on local socket: Conexi?n rehusada WARNING: Falling back to local file-based locking. Volume Groups with the clustered attribute will be inaccessible. Skipping clustered volume group vg_cluster Find attached the configuration of cluster.conf. 
I'm pretty desperate with this situation, i really don't know how to deal with the adition of a new node. Best regards, Arturo. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: application/octet-stream Size: 3463 bytes Desc: not available URL: From ccaulfie at redhat.com Mon Dec 14 09:03:58 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 14 Dec 2009 09:03:58 +0000 Subject: [Linux-cluster] Adding a new node to rh cluster + GFS2 In-Reply-To: References: Message-ID: <4B25FF7E.5000203@redhat.com> On 14/12/09 08:49, Arturo Gonzalez Ferrer wrote: > Dear all, > > I'm in trouble with adding a new node to an existing cluster of three > nodes (so I want to have four), because it somehow doesn't let me access > the cluster infrastructure. > > These 3 nodes were set up as http servers, sharing a GFS2 volume > (physical: vg_cluster, logical: lv_cluster) where data is stored. > > I want to set up the new node to access the same GFS2 volume, with the > idea of exporting the data via NFS, so that a remote backup library can > be configured to backup nightly the data, by connecting to the new node. > > I've tried a lot of things, always getting same kind of errors. > > Running "cman_tool status" on any of the 3 nodes i get: > > Version: 6.2.0 > Config Version: 70 > Cluster Name: campusvirtual > Cluster Id: 45794 > Cluster Member: Yes > Cluster Generation: 1136 > Membership state: Cluster-Member > Nodes: 3 > Expected votes: 4 > Total votes: 3 > Quorum: 3 > Active subsystems: 9 > Flags: Dirty > Ports Bound: 0 11 177 > Node name: cev01 > Node ID: 2 > Multicast addresses: 239.192.178.149 > Node addresses: 150.214.243.20 > > > while running "cman_tool status" on the new node: > > Version: 6.2.0 > Config Version: 70 > Cluster Name: campusvirtual > Cluster Id: 45794 > Cluster Member: Yes > Cluster Generation: 1124 > Membership state: Cluster-Member > Nodes: 1 This is the key. The new node can't see the network traffic of the other three. The most likely explanation for this is iptables blocking the traffic. But check other network connections and settings too - It's almost certainly a network configuration problem. The multicast and node addresses look fine to me. Chrissie From arturogf at gmail.com Mon Dec 14 09:09:21 2009 From: arturogf at gmail.com (Arturo Gonzalez Ferrer) Date: Mon, 14 Dec 2009 10:09:21 +0100 Subject: [Linux-cluster] Adding a new node to rh cluster + GFS2 In-Reply-To: <4B25FF7E.5000203@redhat.com> References: <4B25FF7E.5000203@redhat.com> Message-ID: 2009/12/14 Christine Caulfield > On 14/12/09 08:49, Arturo Gonzalez Ferrer wrote: > >> Dear all, >> >> I'm in trouble with adding a new node to an existing cluster of three >> nodes (so I want to have four), because it somehow doesn't let me access >> the cluster infrastructure. >> >> These 3 nodes were set up as http servers, sharing a GFS2 volume >> (physical: vg_cluster, logical: lv_cluster) where data is stored. >> >> I want to set up the new node to access the same GFS2 volume, with the >> idea of exporting the data via NFS, so that a remote backup library can >> be configured to backup nightly the data, by connecting to the new node. >> >> I've tried a lot of things, always getting same kind of errors. 
>> >> Running "cman_tool status" on any of the 3 nodes i get: >> >> Version: 6.2.0 >> Config Version: 70 >> Cluster Name: campusvirtual >> Cluster Id: 45794 >> Cluster Member: Yes >> Cluster Generation: 1136 >> Membership state: Cluster-Member >> Nodes: 3 >> Expected votes: 4 >> Total votes: 3 >> Quorum: 3 >> Active subsystems: 9 >> Flags: Dirty >> Ports Bound: 0 11 177 >> Node name: cev01 >> Node ID: 2 >> Multicast addresses: 239.192.178.149 >> Node addresses: 150.214.243.20 >> >> >> while running "cman_tool status" on the new node: >> >> Version: 6.2.0 >> Config Version: 70 >> Cluster Name: campusvirtual >> Cluster Id: 45794 >> Cluster Member: Yes >> Cluster Generation: 1124 >> Membership state: Cluster-Member >> Nodes: 1 >> > > This is the key. The new node can't see the network traffic of the other > three. The most likely explanation for this is iptables blocking the > traffic. > > But check other network connections and settings too - It's almost > certainly a network configuration problem. The multicast and node addresses > look fine to me. > The iptables is deactivated in the new node: [root at cevstream ~]# iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination as well as the selinux, is deactivated also. Any other idea? I don't see the flag "dirty" in the new node. It has no service associated, i don't know if this can means anything... Cheers, Arturo. > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brettcave at gmail.com Mon Dec 14 09:32:11 2009 From: brettcave at gmail.com (Brett Cave) Date: Mon, 14 Dec 2009 11:32:11 +0200 Subject: [Linux-cluster] how to specify a fence method with "ccs_tool addnode" Message-ID: how would I go about specifying a fence method with ccs_tool? Can find much documentation on fencing methods. ccs_tool addfence myfence .... ccs_tool addnode mynode -n X -f myfence the above uses "single" fencing method, whereas I would like to specify "fabric". What impact does specifying different method's have? -------------- next part -------------- An HTML attachment was scrubbed... URL: From brettcave at gmail.com Mon Dec 14 09:36:09 2009 From: brettcave at gmail.com (Brett Cave) Date: Mon, 14 Dec 2009 11:36:09 +0200 Subject: [Linux-cluster] more info on fencing Message-ID: I am using ilo fencing to reset servers, so this is more power fencing than fabric fencing? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmicmirregs at gmail.com Mon Dec 14 22:15:09 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Mon, 14 Dec 2009 23:15:09 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device Message-ID: <1260828909.6558.24.camel@mecatol> Hi all, I was wondering if there is a way to achieve a "quorum disk over a RAID software device" working CMAN cluster. Explanation: A) Environment - 6 x different servers used as cluster nodes, with dual FC HBA - 2 x different fabrics, each build with 3 FC SAN switches - 2 x storage arrays, with 23 270GB LUNs of data each. - 1x Qdisk: a 24th LUN located in one of the storage arrays B) Objectives - All the 6 nodes must be able to mount and use any of the 2x23 LUNs of data in the final configuration. Already done. 
- Usage of a Qdisk for a last-man-standing configuration. Already done (1 vote
per node and 5 votes for the Qdisk device)

C) Flaws

- The Qdisk is located in ONE storage array. If there is a failure in that
storage array, 5 votes are lost, so a single additional node failure is enough
to lose quorum. In other words, with 5 nodes and one storage array still
operative I will lose quorum.

D) Possible Fixes

- Using 2 quorum disks: Not implemented yet
http://sources.redhat.com/cluster/wiki/MultiQuorumDisk

- Using an LVM-Mirror device as a Qdisk and creating additional LUNs for
mirror and log in both storage arrays: if the Qdisk is a Clustered Logical
Volume, it won't be available in the CMAN start phase, because CLVMD (and
CMIRROR) is needed to access clustered logical volumes and CLVMD won't be
running if CMAN is not running yet. Question: is it really necessary to use a
Clustered Logical Volume for the Qdisk? Is there any problem in NOT using a
clustered volume?

- Using a software RAID (MDRAID) device as a Qdisk and creating an additional
LUN in the second storage array: each cluster node will use the MD device as
the Qdisk. Do you see any problem in this proposal?

E) Possible Flaws

- With LVM-Mirror: what would happen if one of the underlying disks of the
Qdisk fails on only a part of the cluster nodes? Imagine a LUN-masking problem
on the storage array controller, or an admin making a mistake, which results
in some nodes losing access to one of the disks. And what would happen when
the disk is fully online again?

- With MDRAID: same questions.

Of course, any idea or proposal is welcome.

Thanks in advance. Cheers,

Rafael

--
Rafael Micó Miranda

From raju.rajsand at gmail.com  Tue Dec 15 03:27:05 2009
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Tue, 15 Dec 2009 08:57:05 +0530
Subject: [Linux-cluster] more info on fencing
In-Reply-To:
References:
Message-ID: <8786b91c0912141927m48f244bdr3d613bc284292c9a@mail.gmail.com>

Greetings,

I am not an expert in clusters.

On Mon, Dec 14, 2009 at 3:06 PM, Brett Cave wrote:
> I am using ilo fencing to reset servers, so this is more power fencing than
> fabric fencing?

Following is my understanding of fencing:

There are three types of fencing (excluding manual):
1. Power fencing -- using IP enabled power strips
2. In-band fencing -- using RSA, ILO, IPMI and the like -- sort of power fencing
3. Storage fencing -- using the SAN fabric switch

Please correct me if I am wrong

Regards

Rajagopal

From brettcave at gmail.com  Tue Dec 15 09:30:41 2009
From: brettcave at gmail.com (Brett Cave)
Date: Tue, 15 Dec 2009 11:30:41 +0200
Subject: [Linux-cluster] more info on fencing
In-Reply-To: <8786b91c0912141927m48f244bdr3d613bc284292c9a@mail.gmail.com>
References: <8786b91c0912141927m48f244bdr3d613bc284292c9a@mail.gmail.com>
Message-ID:

On Tue, Dec 15, 2009 at 5:27 AM, Rajagopal Swaminathan <
raju.rajsand at gmail.com> wrote:

> Greetings,
>
> I am not an expert in clusters.
>
> On Mon, Dec 14, 2009 at 3:06 PM, Brett Cave wrote:
> > I am using ilo fencing to reset servers, so this is more power fencing
> than
> > fabric fencing?
>
> Following is my understanding of fencing:
>
> There are three types of fencing (excluding manual):
> 1. Power fencing -- using IP enabled power strips
> 2. In-band fencing -- using RSA, ILO, IPMI and the like -- sort of power
> fencing
> 3. Storage fencing -- using the SAN fabric switch
>
Ah, ok - in-band fencing it is, thanks.
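For reference, the kind of cluster.conf block in question looks roughly like
this when written by hand (the iLO address and credentials are placeholders;
"mynode" and "myfence" are the names from the earlier ccs_tool example):

  <clusternode name="mynode" nodeid="1">
    <fence>
      <method name="inband">
        <device name="myfence"/>
      </method>
    </fence>
  </clusternode>
  ...
  <fencedevices>
    <fencedevice agent="fence_ilo" name="myfence" ipaddr="10.0.0.50"
                 login="fenceuser" passwd="secret"/>
  </fencedevices>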
With regards to my previous post, I assumed that the name applied to the fencing method of a node had some sort of impact, but from what i can see, it seems to just be for reference. I used to maintain consistency in the config file (which i editted manually): .... the method name was originally fabric as i was going to use san fabric switching, but this did not work - ILO fencing works well for us however, so "inband" would be a more relevant name. we are making changes that now result in configuration being updated via ccs_tool, which doesn't seem to provide an parameter to configure the method name, and defaults to "single" - so this is inconsistent with naming, but with no real functional affects on the config. thanks again. > > Please correct me if I am wrong > > Regards > > Rajagopal > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakov.sosic at srce.hr Tue Dec 15 10:58:34 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Tue, 15 Dec 2009 11:58:34 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260828909.6558.24.camel@mecatol> References: <1260828909.6558.24.camel@mecatol> Message-ID: <1260874714.9719.1.camel@localhost> On Mon, 2009-12-14 at 23:15 +0100, Rafael Mic? Miranda wrote: > - Using an LVM-Mirror device as a Qdisk and creating additional LUNs for > mirror and log in both storage arrays: if the Qdisk is a Clustered > Logical Volume, But is it possible to have clustered LVM-mirror? And if so, how? I would be very interested in something like that... Sorry that I haven't helped you out with this one, but if there is possibility to have mirrored volumes I would be very interested... Because it would solve lot of my problems... -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | | From apfaffeneder at pfaffeneder.org Tue Dec 15 14:31:24 2009 From: apfaffeneder at pfaffeneder.org (Andreas Pfaffeneder) Date: Tue, 15 Dec 2009 15:31:24 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260828909.6558.24.camel@mecatol> References: <1260828909.6558.24.camel@mecatol> Message-ID: <4B279DBC.4090102@pfaffeneder.org> Hi Rafael, Am 14.12.2009 23:15, schrieb Rafael Mic? Miranda: > Hi all, > > I was wondering if there is a way to achieve a "quorum disk over a RAID > software device" working CMAN cluster. > > in a similar situation I am using a raid-1 device (built with mdadm prior to the startup of cman/rgmanager) which consists of two luns, one in each location. This works pretty well as quorum-device. Andreas From brem.belguebli at gmail.com Tue Dec 15 16:21:34 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Tue, 15 Dec 2009 17:21:34 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <4B279DBC.4090102@pfaffeneder.org> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> Message-ID: <29ae894c0912150821w607b9edcv75443186035b0c5c@mail.gmail.com> Hi, The problem you could encounter is the network and storage split brain. If your Qdsik LUNs were hosted by 2 arrays located in 2 different rooms or site, each room hosting half the nodes of your cluster, in case a SAN and network partition occurs between the 2 rooms, you'll find yourself in a perfect storage and network split brain. 
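One mitigation sometimes used for the symmetric situation described here (each half of the cluster still seeing a "healthy" qdisk leg, as the next sentence spells out) is to give qdiskd a heuristic that only passes while a node can reach an agreed tie-breaker outside its own room, for example a router at a third location. A rough sketch, with a made-up address and scores:

    <quorumd interval="2" tko="10" votes="5" label="myqdisk" min_score="1">
            <heuristic program="ping -c1 -w1 192.168.100.254" score="1" interval="2"/>
    </quorumd>

The intent is that, in a room-level partition, only the side that can still see the tie-breaker keeps advertising the qdisk votes, so the two halves no longer look symmetric; it does not, by itself, fix the storage split of the mirrored qdisk legs.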
Each room having the same number of nodes and accessing one leg of your qdisk, each qdisk leg being seen "alive" by the nodes in the room. Brem 2009/12/15 Andreas Pfaffeneder : > Hi Rafael, > > Am 14.12.2009 23:15, schrieb Rafael Mic? Miranda: >> >> Hi all, >> >> I was wondering if there is a way to achieve a "quorum disk over a RAID >> software device" working CMAN cluster. >> >> > > in a similar situation I am using a raid-1 device (built with mdadm prior to > the startup of cman/rgmanager) which consists of two luns, one in each > location. This works pretty well as quorum-device. > > Andreas > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jakov.sosic at srce.hr Tue Dec 15 16:26:54 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Tue, 15 Dec 2009 17:26:54 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <4B279DBC.4090102@pfaffeneder.org> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> Message-ID: <1260894414.1878.1.camel@localhost> On Tue, 2009-12-15 at 15:31 +0100, Andreas Pfaffeneder wrote: > in a similar situation I am using a raid-1 device (built with mdadm > prior to the startup of cman/rgmanager) which consists of two luns, one > in each location. This works pretty well as quorum-device. So you have to create mdraid on every node of the cluster? But, is that legitimate way of doing things - because mdraid isn't cluster aware? It's like having a LVM without using clustered volumes... It's ok as long as you don't change metadata... What about mdraid? -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | | From rmicmirregs at gmail.com Tue Dec 15 18:51:16 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Tue, 15 Dec 2009 19:51:16 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260874714.9719.1.camel@localhost> References: <1260828909.6558.24.camel@mecatol> <1260874714.9719.1.camel@localhost> Message-ID: <1260903076.7153.1.camel@mecatol> Hi Jakov, El mar, 15-12-2009 a las 11:58 +0100, Jakov Sosic escribi?: > On Mon, 2009-12-14 at 23:15 +0100, Rafael Mic? Miranda wrote: > > > - Using an LVM-Mirror device as a Qdisk and creating additional LUNs for > > mirror and log in both storage arrays: if the Qdisk is a Clustered > > Logical Volume, > > But is it possible to have clustered LVM-mirror? And if so, how? I would > be very interested in something like that... > > > Sorry that I haven't helped you out with this one, but if there is > possibility to have mirrored volumes I would be very interested... > Because it would solve lot of my problems... > > > Maybe this helps: http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirrored_volumes.html http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirvol_create_ex.html The point is, as I said, I cannot use a clustered logical volume (I mean a logical volume over a clustered volume group) because it wont be available as CMAN starts. Cheers, Rafael -- Rafael Mic? 
Miranda From rmicmirregs at gmail.com Tue Dec 15 19:01:00 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Tue, 15 Dec 2009 20:01:00 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <4B279DBC.4090102@pfaffeneder.org> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> Message-ID: <1260903660.7153.12.camel@mecatol> Hi Andreas El mar, 15-12-2009 a las 15:31 +0100, Andreas Pfaffeneder escribi?: > Hi Rafael, > > Am 14.12.2009 23:15, schrieb Rafael Mic? Miranda: > > Hi all, > > > > I was wondering if there is a way to achieve a "quorum disk over a RAID > > software device" working CMAN cluster. > > > > > in a similar situation I am using a raid-1 device (built with mdadm > prior to the startup of cman/rgmanager) which consists of two luns, one > in each location. This works pretty well as quorum-device. > > Andreas > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Today I tried this approach, but I have no previous experience with MDADM. The problem I found was, how do you manage multipath on the different LUNs without device-mapper-multipath? As I see in the system logs the MD driver loads before device-mapper-multipath is working, and it uses the devices (/dev/sdX) to assemble the RAID devices (/dev/mdX) before the devices are available through device-mapper-multipath (/dev/mapper/quorumdiskX) at system boot time. This even makes the device-mapper-multipath devices not getting built when multipathd starts after that. Does this happen to you? So, I tried to configure a multipath device with MDADM and after that use the built devices to assemble a RAID1 MD device. This was the config in mdadm.conf I used: DEVICE /dev/sd* ARRAY /dev/md1 metadata=1.1 level=multipath num-devices=2 name=multipath01 ARRAY /dev/md2 metadata=1.1 level=multipath num-devices=2 name=multipath02 ARRAY /dev/md3 metadata=1.2 level=raid1 num-devices=2 name=quorum devices=/dev/md1,/dev/md2 Devices where all with the "Linux raid auto" partition mark. When I built the MDs first everything seemed to work, but after a system boot the /dev/md3 device was not built, so the quorum disk would not be available for CMAN. How do you implement this with MDADM? Thanks in advance, Rafael -- Rafael Mic? Miranda From rmicmirregs at gmail.com Tue Dec 15 19:10:16 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Tue, 15 Dec 2009 20:10:16 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <29ae894c0912150821w607b9edcv75443186035b0c5c@mail.gmail.com> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> <29ae894c0912150821w607b9edcv75443186035b0c5c@mail.gmail.com> Message-ID: <1260904216.7153.22.camel@mecatol> Hi Brem El mar, 15-12-2009 a las 17:21 +0100, brem belguebli escribi?: > Hi, > > The problem you could encounter is the network and storage split brain. > > If your Qdsik LUNs were hosted by 2 arrays located in 2 different > rooms or site, each room hosting half the nodes of your cluster, in > case a SAN and network partition occurs between the 2 rooms, you'll > find yourself in a perfect storage and network split brain. > > Each room having the same number of nodes and accessing one leg of > your qdisk, each qdisk leg being seen "alive" by the nodes in the > room. 
> > Brem > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster I thought about this. In my situation: - All the nodes are in the same site. - All the nodes are connected to the two storage arrays via the same FC switches in a symmetric way. - All the nodes have their network interfaces connected to the same couple of Ethernet Switches in a symmetric way via bonding. I think the probability of failing exactly the devices that should fail (5 exact FC ports in one FC switch, another 5 on the another FC switch and a "split" in the Ethernet switches themselves exactly dividing the nodes in groups of 3) is pretty small. I see you exposed your point with the idea of a multi-site cluster with the 2 qdisk LUNs placed in different sites and cluster nodes in both of them, but this is not the case. But that is, in fact, a really interesting scenario :) Thanks for your interest. Cheers, Rafael -- Rafael Mic? Miranda From rmicmirregs at gmail.com Tue Dec 15 19:23:23 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Tue, 15 Dec 2009 20:23:23 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260894414.1878.1.camel@localhost> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost> Message-ID: <1260905003.7153.34.camel@mecatol> Hi Jacov El mar, 15-12-2009 a las 17:26 +0100, Jakov Sosic escribi?: > On Tue, 2009-12-15 at 15:31 +0100, Andreas Pfaffeneder wrote: > > > in a similar situation I am using a raid-1 device (built with mdadm > > prior to the startup of cman/rgmanager) which consists of two luns, one > > in each location. This works pretty well as quorum-device. > > So you have to create mdraid on every node of the cluster? But, is that > legitimate way of doing things - because mdraid isn't cluster aware? > It's like having a LVM without using clustered volumes... It's ok as > long as you don't change metadata... > > What about mdraid? > > > As I see, in this situation of the usage of the shared storage volume as a Qdisk there is no problem of the system being "not cluster aware". I mean: a usual qdisk is a LUN with a "clustered" filesystem, to say it in some way, in which all the cluster nodes can write an read at the same time. If you don't plan to change the LVM metadata of the qdisk (I don't) I think this will be feasible. The same should happen with the MDADM variant. Today I configured a not-clustered volume group and then I built a mirrored logical volume over it and configured it as a Qdisk. Then I started CMAN and it worked OK using the LVM-mirror qdisk. Tomorrow (I hope) I'll do some tests to see what happens if only one of the nodes loses one of the LUNs which build the LVM-mirror volume, and what happens when the LUN is back. Thanks for your interest. Cheers, Rafael -- Rafael Mic? Miranda From apfaffeneder at pfaffeneder.org Tue Dec 15 19:38:26 2009 From: apfaffeneder at pfaffeneder.org (Andreas Pfaffeneder) Date: Tue, 15 Dec 2009 20:38:26 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260903660.7153.12.camel@mecatol> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> <1260903660.7153.12.camel@mecatol> Message-ID: <4B27E5B2.20005@pfaffeneder.org> Am 15.12.2009 20:01, schrieb Rafael Mic? 
Miranda: > in a similar situation I am using a raid-1 device (built with mdadm >> prior to the startup of cman/rgmanager) which consists of two luns, one >> in each location. This works pretty well as quorum-device. >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > Today I tried this approach, but I have no previous experience with > MDADM. The problem I found was, how do you manage multipath on the > different LUNs without device-mapper-multipath? > [ > [...] > How do you implement this with MDADM? > > with a custom init-script which runs after multipathd but before cman/rgmanager. Andreas From brem.belguebli at gmail.com Tue Dec 15 20:15:30 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Tue, 15 Dec 2009 21:15:30 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260905003.7153.34.camel@mecatol> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost> <1260905003.7153.34.camel@mecatol> Message-ID: <29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com> Hi Rafael, I can already predict what is going to happen during your test I one of your nodes looses only 1 leg of your mirrored qdisk (either with mdadm or lvm), the qdisk will still be active from the point of view of this particular node, so nothing will happen. What you should consider is 1) reducing the scsi timeout of the lun which is by default around 60 seconds (see udev rules) 2) if your qdisk lun is configured to multipath, don't configure it with queue_if_no_path or mdadm will never see if one of the legs came to be unavail. Brem 2009/12/15 Rafael Mic? Miranda : > Hi Jacov > > El mar, 15-12-2009 a las 17:26 +0100, Jakov Sosic escribi?: >> On Tue, 2009-12-15 at 15:31 +0100, Andreas Pfaffeneder wrote: >> >> > in a similar situation I am using a raid-1 device (built with mdadm >> > prior to the startup of cman/rgmanager) which consists of two luns, one >> > in each location. This works pretty well as quorum-device. >> >> So you have to create mdraid on every node of the cluster? But, is that >> legitimate way of doing things - because mdraid isn't cluster aware? >> It's like having a LVM without using clustered volumes... It's ok as >> long as you don't change metadata... >> >> What about mdraid? >> >> >> > > As I see, in this situation of the usage of the shared storage volume as > a Qdisk there is no problem of the system being "not cluster aware". I > mean: a usual qdisk is a LUN with a "clustered" filesystem, to say it in > some way, in which all the cluster nodes can write an read at the same > time. > > If you don't plan to change the LVM metadata of the qdisk (I don't) I > think this will be feasible. The same should happen with the MDADM > variant. > > Today I configured a not-clustered volume group and then I built a > mirrored logical volume over it and configured it as a Qdisk. Then I > started CMAN and it worked OK using the LVM-mirror qdisk. > > Tomorrow (I hope) I'll do some tests to see what happens if only one of > the nodes loses one of the LUNs which build the LVM-mirror volume, and > what happens when the LUN is back. > > Thanks for your interest. Cheers, > > Rafael > > -- > Rafael Mic? 
Miranda > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jakov.sosic at srce.hr Wed Dec 16 00:02:19 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Wed, 16 Dec 2009 01:02:19 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260903076.7153.1.camel@mecatol> References: <1260828909.6558.24.camel@mecatol> <1260874714.9719.1.camel@localhost> <1260903076.7153.1.camel@mecatol> Message-ID: <1260921739.2754.1.camel@localhost> On Tue, 2009-12-15 at 19:51 +0100, Rafael Mic? Miranda wrote: > http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirrored_volumes.html > http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirvol_create_ex.html Thank you... This seems as a good replacement for DRBD, except that after one side of the mirror failse, whole logical volume would be synced from the start (because I presume there is no wfbitmap like in drbd)? -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | | From kkovachev at varna.net Wed Dec 16 11:41:11 2009 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Wed, 16 Dec 2009 13:41:11 +0200 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260921739.2754.1.camel@localhost> References: <1260828909.6558.24.camel@mecatol> <1260874714.9719.1.camel@localhost> <1260903076.7153.1.camel@mecatol> <1260921739.2754.1.camel@localhost> Message-ID: <20091216104820.M24883@varna.net> On Wed, 16 Dec 2009 01:02:19 +0100, Jakov Sosic wrote > On Tue, 2009-12-15 at 19:51 +0100, Rafael [UTF-8?]Mic?? Miranda wrote: > > > [1] http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirrored_volumes.html > > http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirvol_create_ex.html > > Thank you... > > This seems as a good replacement for DRBD, except that after one side of > the mirror failse, whole logical volume would be synced from the start > (because I presume there is no wfbitmap like in drbd)? > from [1] "An LVM mirror divides the device being copied into regions that are typically 512KB in size. LVM maintains a small log which it uses to keep track of which regions are in sync with the mirror or mirrors. This log can be kept on disk, which will keep it persistent across reboots, or it can be maintained in memory." - so they shouldn't be synced from start About the 6 node cluster - do you really need to have it operational with just a single node? If this is not mandatory it might be better to use different votes for the nodes to break the tie instead of mirrored qdisk (one more place for split brain) ... 
like 3 nodes with 2 votes and the others with 3 votes or a combination with non mirrored qdisk (with 4 votes) > -- > | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | > ================================================================= > | | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From Alain.Moulle at bull.net Wed Dec 16 13:53:27 2009 From: Alain.Moulle at bull.net (Alain.Moulle) Date: Wed, 16 Dec 2009 14:53:27 +0100 Subject: [Linux-cluster] Question about openais rrp_mode Message-ID: <4B28E657.3090400@bull.net> Hi, > man openais.conf : rrp_mode > This specifies the mode of redundant ring, which may > be none, > active, or passive. Active replication offers > slightly lower > latency from transmit to delivery in faulty network > environ- > ments but with less performance. Passive > replication may > nearly double the speed of the totem protocol if the > protocol > doesn't become cpu bound. Not completely clear for me: does that mean that "active mode" makes it send the totems systematically on both networks, and "passive mode" makes it send on the first interface ringnumber (in openais.conf) and only on the second interface rignnumber if the first is broken ? Could someone give more precise information ? or where can I find more information about this ? And by the way, is there any issue to use to set a first interface ringnumber on Ethernet (eth0) and a second on IP/Infiniband ? Thanks for your response. Alain Moull? From gianluca.cecchi at gmail.com Wed Dec 16 14:51:06 2009 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Wed, 16 Dec 2009 15:51:06 +0100 Subject: [Linux-cluster] actions to be taken when changing fence devices ip address Message-ID: <561c252c0912160651y79ca70fk618173542a249464@mail.gmail.com> Hello, I'm using RHEL 5.4 based cluster. I'm using fence_ilo fence device and I'm going to change ip address for the iLO of one node of the cluster. Is this action supposed to be made near-online, in the sense that I have not to shutdown all the cluster nodes? Idea would be: 1) services remains on node where iLO ip doesn't change 2) shutdown and change iLO ip of the other node (actually it is a server swap maintaining its disks) 3) on first node change cluster.conf and issue of ccs_tool update /etc/cluster/cluster.conf cman_tool version -r 4) power on of second node will action 4) give an automatic join with new config of node 2 to the cluster? Or do I have to make anything with fenced to reload its config? My question arises from past experience with need of changing qdisk parameters in cluster.conf: this requires a qdiskd restart, with steps in 3) not being sufficient... Do I have to restart fenced? In this case does this produce any problem/relocation? Thanks in advance, Gianluca From jakov.sosic at srce.hr Wed Dec 16 17:50:09 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Wed, 16 Dec 2009 18:50:09 +0100 Subject: [Linux-cluster] 2 pptp links on two hosts Message-ID: <1260985809.2168.6.camel@localhost> Hi. I have two pptp links on two hosts. Hosts are frontends (gateways, firewalls, NAT) for some network. Hosts also must be gateways for all the VLANs. Now, two things in this case can fail - one host, or for example it's pptp route, in which case again gateway and static routes should be transferred to the secondary node. Is there a way to solve this with RHCS, or is there any more appropriate software for this kind of failover? 
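A rough sketch of how that kind of failover can be wired into rgmanager — essentially the script-resource idea spelled out in the next paragraph — might look like this; the interface name, test address and routes are invented for illustration:

    #!/bin/bash
    # pptp-routes: toy LSB-style script resource. "start" installs the static
    # routes over the tunnel, "stop" removes them, and "status" fails when the
    # far end no longer answers pings, so rgmanager can relocate the service.
    TEST_IP=10.99.0.1            # an address reachable only through the tunnel
    ROUTES="10.99.0.0/24"        # networks routed via the tunnel
    DEV=ppp0

    case "$1" in
        start)
            for net in $ROUTES; do
                ip route replace "$net" dev "$DEV" || exit 1
            done
            ;;
        stop)
            for net in $ROUTES; do
                ip route del "$net" dev "$DEV" 2>/dev/null
            done
            ;;
        status)
            ping -c 1 -W 2 "$TEST_IP" >/dev/null 2>&1 || exit 1
            ;;
        *)
            echo "usage: $0 {start|stop|status}"
            exit 2
            ;;
    esac
    exit 0

Hooked into a service as a <script> resource next to the floating <ip> resource, rgmanager calls status periodically and will restart or relocate the whole service when the ping check fails.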
My initial idea when I heard the problem was to write something like init script, which in status part pings some address behind PPTP link, and if ping is OK, than service is considered OK. Now, if for some reason ping fails, status wouldn't be 0, and RHCS would apply relocate policy, stop the script on primary and start it on secondary. stop function would delete all the routes, and start would set appropriate static routes. RHCS itself would take care of the floating address of the gateway. I wonder if you have any experience with this kind of setup, or any ideas if this could be done in any better way? Thank you. -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | | From rmicmirregs at gmail.com Wed Dec 16 18:50:26 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Wed, 16 Dec 2009 19:50:26 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <20091216104820.M24883@varna.net> References: <1260828909.6558.24.camel@mecatol> <1260874714.9719.1.camel@localhost> <1260903076.7153.1.camel@mecatol> <1260921739.2754.1.camel@localhost> <20091216104820.M24883@varna.net> Message-ID: <1260989426.6687.3.camel@mecatol> Hi Kaloyan El mi?, 16-12-2009 a las 13:41 +0200, Kaloyan Kovachev escribi?: > About the 6 node cluster - do you really need to have it operational with just > a single node? If this is not mandatory it might be better to use different > votes for the nodes to break the tie instead of mirrored qdisk (one more place > for split brain) ... like 3 nodes with 2 votes and the others with 3 votes or > a combination with non mirrored qdisk (with 4 votes) > > > Well, this is a thing I have to think about. Maybe only one node cannot give the full service due to load and performance reasons, but I think the Qdisk is a must in the service for availability reasons. I'll take note on your recommendation and maybe i change the votes to make the minimal number of nodes higher, possibly 2. Thanks! Rafael -- Rafael Mic? Miranda From rmicmirregs at gmail.com Wed Dec 16 19:09:04 2009 From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda) Date: Wed, 16 Dec 2009 20:09:04 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost> <1260905003.7153.34.camel@mecatol> <29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com> Message-ID: <1260990544.6687.23.camel@mecatol> Hi Brem El mar, 15-12-2009 a las 21:15 +0100, brem belguebli escribi?: > Hi Rafael, > > I can already predict what is going to happen during your test > > I one of your nodes looses only 1 leg of your mirrored qdisk (either > with mdadm or lvm), the qdisk will still be active from the point of > view of this particular node, so nothing will happen. > > What you should consider is > > 1) reducing the scsi timeout of the lun which is by default around 60 > seconds (see udev rules) > 2) if your qdisk lun is configured to multipath, don't configure it > with queue_if_no_path or mdadm will never see if one of the legs came > to be unavail. > > Brem > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster I made some tests today. A) With MDADM mirrored LUNs: I built the MD device over the multipathd devices and used it as a quorum disk. 
It seemed to work, but in a test during the intentioned failure of a LUN on a single machine the node failed to access the quorum device, so it was evicted by the rest of the nodes. I have to take a closer look to this because in other attempts it didn't happen, I think this is realated with the device timeouts, retries and queues. B) With non-clustered LVM-Mirrored LUNs: Seems to work too, but there are some strange behaviours. During the intentioned failure of a LUN on a single machine the node did not see the failure at the LVM layer of one device not being reachable, but the multipath daemon was marking the device as failed. In other attempts it worked right. Also I have to check, as you commented, the values at the udev rules and multipath.conf file: device { vendor "HP" product "MSA VOLUME" path_grouping_policy group_by_prio getuid_callout "/sbin/scsi_id -g -u -s /block/%n" path_checket tur patch_selector "round_robin 0" prio_callout "/sbin/mpath_prio_alua /dev/%n" rr_weight uniform failback immediate hardware_handler "0" no_path_retry 12 rr_min_io 100 } Note: this is my testing scenario. The production environment is not using MSA storage arrays. I'm thinking in reducing the "no_path_retry" to a smaller value or even to "fail". With the current value (equivalent to "queue_if_no_path" of 12 regarding RHEL docs) MDADM saw the failure of the device, so this is more or less working. I'm interested too in the "flush_on_last_del" parameter, have you ever tried it? Thanks in advance. Cheers, Rafael -- Rafael Mic? Miranda From brem.belguebli at gmail.com Wed Dec 16 19:13:02 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Wed, 16 Dec 2009 20:13:02 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260989426.6687.3.camel@mecatol> References: <1260828909.6558.24.camel@mecatol> <1260874714.9719.1.camel@localhost> <1260903076.7153.1.camel@mecatol> <1260921739.2754.1.camel@localhost> <20091216104820.M24883@varna.net> <1260989426.6687.3.camel@mecatol> Message-ID: <29ae894c0912161113u76528a3em5532005a6407b177@mail.gmail.com> Rafael, What ou have to take care about is the following. Imagine your SAN admin modifies the wrong zoning while doing his job, making the qdisk (both legs) unavailable for your nodes, and at this time you have one node off because of maintenance operation, your whole cluster would go down. Brem 2009/12/16 Rafael Mic? Miranda : > Hi Kaloyan > > El mi?, 16-12-2009 a las 13:41 +0200, Kaloyan Kovachev escribi?: > >> About the 6 node cluster - do you really need to have it operational with just >> a single node? If this is not mandatory it might be better to use different >> votes for the nodes to break the tie instead of mirrored qdisk (one more place >> for split brain) ... like 3 nodes with 2 votes and the others with 3 votes or >> a combination with non mirrored qdisk (with 4 votes) >> >> > > > Well, this is a thing I have to think about. Maybe only one node cannot > give the full service due to load and performance reasons, but I think > the Qdisk is a must in the service for availability reasons. I'll take > note on your recommendation and maybe i change the votes to make the > minimal number of nodes higher, possibly 2. > > Thanks! > > Rafael > > -- > Rafael Mic? 
Miranda > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From brem.belguebli at gmail.com Wed Dec 16 19:41:26 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Wed, 16 Dec 2009 20:41:26 +0100 Subject: [Linux-cluster] Quorum disk over RAID software device In-Reply-To: <1260990544.6687.23.camel@mecatol> References: <1260828909.6558.24.camel@mecatol> <4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost> <1260905003.7153.34.camel@mecatol> <29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com> <1260990544.6687.23.camel@mecatol> Message-ID: <29ae894c0912161141t1085baf7t6bbba32a82820bc1@mail.gmail.com> In my multipath setup I use the following : polling_interval 3 (checks the storage every 3 seconds) no_path_retry 5 (will check 5 times the path if failure happens on it, making it last scsi_timer (/sys/block/sdXX/device/timeout) + 5*3 secondes ) path_grouping_policy multibus (to load-balance accross all paths, group_by_prio may be recommended with MSA if it is an active/passive array?) >From my experience, no_path_retry, when using mirror (md or LVM) could be put to fail instead of 5 in my case. Concerning the flush_on_last_del, it just means that for a given LUN, when there is only one path remaining, if it comes to fail, what behaviour to adopt. Same consideration, if using mirror, just fail. The thing to take into account is the interval at which your qdisk process accesses the qdisk lun, if configured to a high value (let's imagine every 65 seconds) it'll take (worst case) 60 seconds of scsi timeout (default) + 12 times default polling interval (30 seconds if I'm not wrong) + 5 seconds= 425 seconds..... Brem 2009/12/16 Rafael Mic? Miranda : > Hi Brem > > El mar, 15-12-2009 a las 21:15 +0100, brem belguebli escribi?: >> Hi Rafael, >> >> I can already predict what is going to happen during your test >> >> I one of your nodes looses only 1 leg of your mirrored qdisk (either >> with mdadm or lvm), the qdisk will still be active from the point of >> view of this particular node, so nothing will happen. >> >> What you should consider is >> >> 1) reducing the scsi timeout of the lun which is by default around 60 >> seconds (see udev rules) >> 2) if your qdisk lun is configured to multipath, don't configure it >> with queue_if_no_path or mdadm will never see if one of the legs came >> to be unavail. >> >> Brem > >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > I made some tests today. > > A) With MDADM mirrored LUNs: > I built the MD device over the multipathd devices and used it as a > quorum disk. It seemed to work, but in a test during the intentioned > failure of a LUN on a single machine the node failed to access the > quorum device, so it was evicted by the rest of the nodes. I have to > take a closer look to this because in other attempts it didn't happen, I > think this is realated with the device timeouts, retries and queues. > > B) With non-clustered LVM-Mirrored LUNs: > Seems to work too, but there are some strange behaviours. During the > intentioned failure of a LUN on a single machine the node did not see > the failure at the LVM layer of one device not being reachable, but the > multipath daemon was marking the device as failed. In other attempts it > worked right. > > Also I have to check, as you commented, the values at the udev rules and > multipath.conf file: > > device { > vendor ? ? ? ? ? ? ? ? 
?"HP" > product ? ? ? ? ? ? ? ? "MSA VOLUME" > path_grouping_policy ? ?group_by_prio > getuid_callout ? ? ? ? ?"/sbin/scsi_id -g -u -s /block/%n" > path_checket ? ? ? ? ? ?tur > patch_selector ? ? ? ? ?"round_robin 0" > prio_callout ? ? ? ? ? ?"/sbin/mpath_prio_alua /dev/%n" > rr_weight ? ? ? ? ? ? ? uniform > failback ? ? ? ? ? ? ? ?immediate > hardware_handler ? ? ? ?"0" > no_path_retry ? ? ? ? ? 12 > rr_min_io ? ? ? ? ? ? ? 100 > } > > Note: this is my testing scenario. The production environment is not > using MSA storage arrays. > > I'm thinking in reducing the "no_path_retry" to a smaller value or even > to "fail". With the current value (equivalent to "queue_if_no_path" of > 12 regarding RHEL docs) MDADM saw the failure of the device, so this is > more or less working. > I'm interested too in the "flush_on_last_del" parameter, have you ever > tried it? > > Thanks in advance. Cheers, > > Rafael > > -- > Rafael Mic? Miranda > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From siedler at hrd-asia.com Thu Dec 17 07:41:13 2009 From: siedler at hrd-asia.com (Wolf Siedler) Date: Thu, 17 Dec 2009 15:41:13 +0800 Subject: [Linux-cluster] Cluster config. advice for sought Message-ID: <4B29E099.6090105@hrd-asia.com> Dear all: I am new to this list and cluster technology. Anyway, I managed to get a cluster set up based on CentOS 5 with two nodes which worked very well for several months. Even several CentOS update rounds (all within version 5) worked flawlessly. The cluster contains three paravirtualized Xen-based virtual machines in an iSCSI storage vault. Even failover and failback worked perfectly. Cluster control/management was handled by a separate standalone PC running Conga. Both cluster nodes and the adminpc are running CentOS5. After another CentOS upgrade round in October, the cluster wouldn't start anymore. We got that solved (cman would't start, but a newer openais package - 0.80.6 - let us overcome that by manual update), but now the virtual machines always get started on all nodes simultaneously. Furthermore, something in Conga setup also seems to have broken: The Conga webinterface at the separate adminpc can still be accessed, but fails when probing storage (broken ricci/luci communication?) This never happened before the upgrade and we had changed neither hardware nor software configuration during the update. Unfortunately, I don't have access to the testing system anymore (but we *did* a lot of testing before putting the system in production use). I would appreciate if more experienced persons could review our configuration and point out any errors or improvements: The cluster has two nodes (station1, station2) and one standalone PC for administration running Conga (adminpc). The nodes are standard Dell 1950 servers. Main storage location is a Dell storage vault which is accessed via iSCSI and mounted on both nodes as /rootfs/. The file system is GFS2. Furthermore, it provides a quorum partition. Fencing is handled via the included DRAC remote access boards. There are three paravirtualized Xen-based virtual machines (vm_mailserver, vm_ldapserver, vm_adminserver). Their container files are located at /rootfs/vmadminserver etc. The VMs are supposed to start distributed on station1 (vm_mailserver) and station2 (vm_ldapserver, vm_adminserver). 
Software versions (identical on both nodes): kernel 2.6.18-164.el5xen openais-0.80.6-8.el5 cman-2.0.115-1.el5 rgmanager-2.0.52-1.el5.centos xen-3.0.3-80.el5-3.3 xen-libs-3.0.3-80.el5-3.3 luci-0.12.1-7.3.el5.centos.1 ricci-0.12.1-7.3.el5.centos.1 gfs2-utils-0.1.62-1.el5 Before the CentOS update, the working cluster.conf was: ===quote nonworking cluster.conf=== ===unquote nonworking cluster.conf=== A explained, this configuration worked flawlessly for 10 months. Only after the CentOS update, it started the virtual machines simultaneously on both station1 *and* station2 and not distributed as per the directive. We temporarily worked arounf this problem by changing the autostart parameter to . At least this brought our cluster back to running, but we lost the desired automatic restart should a system hang. And failover also doesn't seem to work anymore. I read several messages on this list where users seem to have had a similar problem. It seems to me as if I had missed the use_virsh="0" statement. Hence my question: Is the following a valid cluster.conf for such a setup (distributed VMs, automatic start, failover/failback): ===quote=== ===unquote=== I am open to further updates/testing and will gladly provide additional details should if needed. But as this setup also contains production systems, I want to avoid any fundamental mistakes/oversights. Needless to say, I would appreciate any feedback/suggestions! Regards, Wolf From siedler at hrd-asia.com Thu Dec 17 10:14:43 2009 From: siedler at hrd-asia.com (Wolf Siedler) Date: Thu, 17 Dec 2009 18:14:43 +0800 Subject: [Linux-cluster] Re: Cluster config. advice for sought (2) In-Reply-To: <4B29E099.6090105@hrd-asia.com> References: <4B29E099.6090105@hrd-asia.com> Message-ID: <4B2A0493.6040409@hrd-asia.com> As follow up to my earlier question: === Station1 - output clustat: Cluster Status for example_cluster_1 @ Thu Dec 17 18:07:44 2009 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ station1.example.com 1 Online, Local, rgmanager station2.example.com 2 Online, rgmanager /dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1 0 Online, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- vm:vm_adminserver (none) disabled vm:vm_ldapserver (none) disabled vm:vm_mailserver (none) disabled === Station1 - output xm li: Name ID Mem(MiB) VCPUs State Time(s) Domain-0 0 768 1 r----- 410641.5 vm_mailserver 3 2047 4 -b---- 833206.1 === Station2 - output clustat: Cluster Status for example_cluster_1 @ Thu Dec 17 17:37:15 2009 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ station1.example.com 1 Online, rgmanager station2.example.com 2 Online, Local, rgmanager /dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1 0 Online, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- vm:vm_adminserver (none) disabled vm:vm_ldapserver (none) disabled vm:vm_mailserver (none) disabled === Station2 - output xm li: Name ID Mem(MiB) VCPUs State Time(s) Domain-0 0 768 1 r----- 384845.0 vm_adminserver 6 1023 1 -b---- 76745.5 vm_ldapserver 4 1023 1 -b---- 22685.6 === Hope this provides better insight. Regards, Wolf From brem.belguebli at gmail.com Thu Dec 17 10:22:50 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Thu, 17 Dec 2009 11:22:50 +0100 Subject: [Linux-cluster] Re: Cluster config. 
advice for sought (2) In-Reply-To: <4B2A0493.6040409@hrd-asia.com> References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com> Message-ID: <29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com> I think it is better if you post your cluster.conf. Try to look in linux-cluster archive, your problem looks similar to some others that were posted around October/November. There were things to check with use_virsh, path etc... in the cluster.conf... 2009/12/17 Wolf Siedler : > As follow up to my earlier question: > > === > Station1 - output clustat: > Cluster Status for example_cluster_1 @ Thu Dec 17 18:07:44 2009 > Member Status: Quorate > > ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status > ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------ > ?station1.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 > Online, Local, rgmanager > ?station2.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2 > Online, rgmanager > ?/dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1 ? ? ? ?0 > Online, Quorum Disk > > ?Service Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Owner > (Last) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? State > ?------- ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ----- > ------ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ----- > ?vm:vm_adminserver > (none) > disabled > ?vm:vm_ldapserver > (none) > disabled > ?vm:vm_mailserver > (none) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? disabled > === > Station1 - output xm li: > Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ID Mem(MiB) VCPUs State ? Time(s) > Domain-0 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0 ? ? ?768 ? ? 1 r----- 410641.5 > vm_mailserver ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?3 ? ? 2047 ? ? 4 -b---- 833206.1 > === > Station2 - output clustat: > Cluster Status for example_cluster_1 @ Thu Dec 17 17:37:15 2009 > Member Status: Quorate > > ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status > ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------ > ?station1.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 > Online, rgmanager > ?station2.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2 > Online, Local, rgmanager > ?/dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1 ? ? ? ?0 > Online, Quorum Disk > > ?Service Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Owner > (Last) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? State > ?------- ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ----- > ------ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ----- > ?vm:vm_adminserver > (none) > disabled > ?vm:vm_ldapserver > (none) > disabled > ?vm:vm_mailserver > (none) > disabled > === > Station2 - output xm li: > Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ID Mem(MiB) VCPUs State ? Time(s) > Domain-0 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0 ? ? ?768 ? ? 1 r----- 384845.0 > vm_adminserver ? ? ? ? ? ? ? ? ? ? ? ? ? ? 6 ? ? 1023 ? ? 1 -b---- ?76745.5 > vm_ldapserver ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?4 ? ? 1023 ? ? 1 -b---- ?22685.6 > === > > Hope this provides better insight. > > Regards, > Wolf > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From siedler at hrd-asia.com Thu Dec 17 11:22:19 2009 From: siedler at hrd-asia.com (Wolf Siedler) Date: Thu, 17 Dec 2009 19:22:19 +0800 Subject: [Linux-cluster] Re: cluster.conf, was: Cluster config. 
advice sought In-Reply-To: <29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com> References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com> <29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com> Message-ID: <4B2A146B.5050705@hrd-asia.com> Dear Brem, Thanks for taking time to look at my problem. > Try to look in linux-cluster archive, your problem looks similar to > some others that were posted around October/November. > There were things to check with use_virsh, path etc... in the > cluster.conf... I did and this was actually the reason for my original question (I am definitely open for testing, but there is one production VM running in the cluster. Which in turn limits my access for configuration changes and restarts.): After studying the thread you described, I came up with this cluster.conf: ===quote=== ===unquote=== You will notice that I already included use_virsh. Does this cluster.conf look OK? As said before, I would highly appreciate any advice/suggestion you would be willing to give. Regards, Wolf From brem.belguebli at gmail.com Thu Dec 17 12:42:04 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Thu, 17 Dec 2009 13:42:04 +0100 Subject: [Linux-cluster] Re: cluster.conf, was: Cluster config. advice sought In-Reply-To: <4B2A146B.5050705@hrd-asia.com> References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com> <29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com> <4B2A146B.5050705@hrd-asia.com> Message-ID: <29ae894c0912170442y22ff7076ob61bcfbc0960f6ee@mail.gmail.com> Hi Wolf, I have no xen setup to tell you exactly if the cluster.conf you posted should be fine. I do understand that this cluster.conf comes from what you think it should be after reading the different posts, and it is not the one you have in production right now, right ? To test it without disturbing your prod setup, as the use_virsh, path parameters are VM based, you may create a test VM with these parameters and see if you get the same behaviour. Brem 2009/12/17 Wolf Siedler : > Dear Brem, > > Thanks for taking time to look at my problem. > >> Try to look in linux-cluster archive, your problem looks similar to >> some others that were posted around October/November. >> There were things to check with use_virsh, path etc... in the >> cluster.conf... > > I did and this was actually the reason for my original question (I am > definitely open for testing, but there is one production VM running in > the cluster. Which in turn limits my access for configuration changes > and restarts.): > After studying the thread you described, I came up with this cluster.conf: > ===quote=== > > name="example_cluster_1"> > ? ? > ? ? > ? ? ? ? > ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? > ? ? ? ? > ? ? ? ? > ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? > ? ? ? ? > ? ? > ? ? > ? ? > ? ? ? ? login="ipmi_admin" name="station1_fenced" operation="off" passwd="secret"/> > ? ? ? ? login="ipmi_admin" name="station2_fenced" operation="off" passwd="secret"/> > ? ? > ? ? > ? ? ? ? > ? ? ? ? ? ? ordered="0" restricted="0"> > ? ? ? ? ? ? ? ? priority="1"/> > ? ? ? ? ? ? > ? ? ? ? ? ? ordered="0" restricted="0"> > ? ? ? ? ? ? ? ? priority="1"/> > ? ? ? ? ? ? > ? ? ? ? > ? ? ? ? > ? ? ? ? exclusive="0" migrate="live" name="vm_mailserver" path="/rootfs" > recovery="restart"/> > ? ? ? ? exclusive="0" migrate="live" name="vm_ldapserver" path="/rootfs" > recovery="restart"/> > ? ? ? ? 
exclusive="0" migrate="live" name="vm_adminserver" path="/rootfs" > recovery="restart"/> > ? ? > ? ? votes="1"/> > > ===unquote=== > > You will notice that I already included use_virsh. > Does this cluster.conf look OK? > > As said before, I would highly appreciate any advice/suggestion you > would be willing to give. > > Regards, > Wolf > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From pradhanparas at gmail.com Thu Dec 17 18:33:49 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Thu, 17 Dec 2009 12:33:49 -0600 Subject: [Linux-cluster] conga issue? Message-ID: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com> I am trying to configure a cluster using conga in RH5.4. Luci version is 0.12.2-6.el5_4.1. It is responding really really slow. When I log on inside the congo and click the tabs, it takes ages to show me the page/link that I want to. Sometimes it reports it is unable to communicate with cluster nodes. What might be this issue? Thanks Paras. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brem.belguebli at gmail.com Thu Dec 17 18:49:09 2009 From: brem.belguebli at gmail.com (brem belguebli) Date: Thu, 17 Dec 2009 19:49:09 +0100 Subject: [Linux-cluster] conga issue? In-Reply-To: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com> References: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com> Message-ID: <29ae894c0912171049q6d7a865ja97fae23efd3f753@mail.gmail.com> I personnaly gave up trying to use it, as it is very slow Particularly the storage tab is completely unusable if you have mutipath devices or more than a few disks. There was something about the /etc/hosts entries that was supposed to resolve the overall slowlyness (I can't find back the thread it was about) but it didn't have any kind of effects in my setup. Brem 2009/12/17 Paras pradhan : > I am trying to configure a cluster using conga in RH5.4. Luci version is > 0.12.2-6.el5_4.1. It is responding really really slow. When I log on inside > the congo and click the tabs, it takes ages to show me the page/link that I > want to. Sometimes it reports it is unable to communicate with cluster > nodes. What might be this issue? > > Thanks > Paras. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From pradhanparas at gmail.com Thu Dec 17 18:53:44 2009 From: pradhanparas at gmail.com (Paras pradhan) Date: Thu, 17 Dec 2009 12:53:44 -0600 Subject: [Linux-cluster] conga issue? In-Reply-To: <29ae894c0912171049q6d7a865ja97fae23efd3f753@mail.gmail.com> References: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com> <29ae894c0912171049q6d7a865ja97fae23efd3f753@mail.gmail.com> Message-ID: <8b711df40912171053v44ab326atef5ec43addafc8b5@mail.gmail.com> On Thu, Dec 17, 2009 at 12:49 PM, brem belguebli wrote: > I personnaly gave up trying to use it, as it is very slow > > Particularly the storage tab is completely unusable if you have > mutipath devices or more than a few disks. > Yes you are correct. I have multipath device mapper. I created the storage using conga. But now I the storage tab is completely unsuable. > > There was something about the /etc/hosts entries that was supposed to > resolve the overall slowlyness (I can't find back the thread it was > about) but it didn't have any kind of effects in my setup. > I have played a bit with /etc/hosts but no luck to me as well. 
> > Brem > > 2009/12/17 Paras pradhan : > > I am trying to configure a cluster using conga in RH5.4. Luci version is > > 0.12.2-6.el5_4.1. It is responding really really slow. When I log on > inside > > the congo and click the tabs, it takes ages to show me the page/link that > I > > want to. Sometimes it reports it is unable to communicate with cluster > > nodes. What might be this issue? > > > > Thanks > > Paras. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Thanks Paras. -------------- next part -------------- An HTML attachment was scrubbed... URL: From siedler at hrd-asia.com Fri Dec 18 01:24:25 2009 From: siedler at hrd-asia.com (Wolf Siedler) Date: Fri, 18 Dec 2009 09:24:25 +0800 Subject: [Linux-cluster] Re: cluster.conf In-Reply-To: <29ae894c0912170442y22ff7076ob61bcfbc0960f6ee@mail.gmail.com> References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com> <29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com> <4B2A146B.5050705@hrd-asia.com> <29ae894c0912170442y22ff7076ob61bcfbc0960f6ee@mail.gmail.com> Message-ID: <4B2AD9C9.3080006@hrd-asia.com> Hi Brem, > I do understand that this cluster.conf comes from what you think it > should be after reading the different posts, and it is not the one you > have in production right now, right ? Yes. However, except for use_virsh="0" it is exactly the one we used in production until the problematic CentOS update. > I have no xen setup to tell you exactly if the cluster.conf you posted > should be fine I had noticed that. But anyway, if you don't spot any major misconfigurations in the original cluster.conf (as quoted below), then I'll give it a try with the included use_virsh parameter. Thanks for your feedback and regards, Wolf PS: Just to clarify, this is the exact cluster.conf we used until the update-related problem: ===quote=== ===unquote=== From baishuwei at gmail.com Fri Dec 18 05:17:38 2009 From: baishuwei at gmail.com (Bai Shuwei) Date: Thu, 17 Dec 2009 21:17:38 -0800 (PST) Subject: [Linux-cluster] Invitation to connect on LinkedIn Message-ID: <1756103043.258655.1261113458016.JavaMail.app@ech3-cdn05.prod> LinkedIn ------------ Bai Shuwei requested to add you as a connection on LinkedIn: ------------------------------------------ Marian, I'd like to add you to my professional network on LinkedIn. - Bai Accept invitation from Bai Shuwei http://www.linkedin.com/e/ulDuieLaAX544oVCOYcgj_GaXIys4TuLMXGmOx/blk/I1669366669_2/pmpxnSRJrSdvj4R5fnhv9ClRsDgZp6lQs6lzoQ5AomZIpn8_cBYVdzoSdzcVdzoNiiYUc31xu5pBuiYUdzwVdjwUcPALrCBxbOYWrSlI/EML_comm_afe/ View invitation from Bai Shuwei http://www.linkedin.com/e/ulDuieLaAX544oVCOYcgj_GaXIys4TuLMXGmOx/blk/I1669366669_2/39vejoSdzoPejoSckALqnpPbOYWrSlI/svi/ ------------------------------------------ Why might connecting with Bai Shuwei be a good idea? People Bai Shuwei knows can discover your profile: Connecting to Bai Shuwei will attract the attention of LinkedIn users. See who's been viewing your profile: http://www.linkedin.com/e/wvp/inv18_wvmp/ ------ (c) 2009, LinkedIn Corporation -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakov.sosic at srce.hr Fri Dec 18 16:17:55 2009 From: jakov.sosic at srce.hr (Jakov Sosic) Date: Fri, 18 Dec 2009 17:17:55 +0100 Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast? 
Message-ID: <1261153075.1918.8.camel@localhost> Hi. How can I force openais on RHEL 5.4 to use broadcast? I've found this in documentation: OpenAIS now provides broadcast network communication in addition to multicast. This functionality is considered Technology Preview for standalone usage of OpenAIS and for usage with the Cluster Suite. Note, however, that the functionality for configuring OpenAIS to use broadcast is not integrated into the cluster management tools and must be configured manually. I've found in cman(5) that openais settings from /etc/ais/openais.conf are ignored if openais is started by ccs_tool, and that I have to set properties for totem in cluster.conf. But how could I do that? Beacause there is no example in the man page :( -- | Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D | ================================================================= | | From ccook at pandora.com Fri Dec 18 17:35:52 2009 From: ccook at pandora.com (Christopher Strider Cook) Date: Fri, 18 Dec 2009 09:35:52 -0800 Subject: [Linux-cluster] cluster3 - service fails, doesn't failover/fence Message-ID: <4B2BBD78.5000900@pandora.com> I've got an otherwise working fine two node + qdisk cluster3 (3.0.0) setup running under Debian with 2.6.30 kern. In the past it has fenced and failed over properly to recover from a failed node. But, yesterday one of the status checks returned a 1 and the subsequent automatic start/stop of the service also returned non-good. This set my cluster service into a 'failed' state and all related components were stopped. Everything was resolved with a manual service disable and enable. Should the secondary have fenced in this case or is that reserved for only when communications in the cluster fail? I would have thought that it would have tried to start the service at least. A clustat on either machine showed the service "failed' and nothing was logged on the non-active node. Since a failover (rather then a give up) would be the proper thing, I'm assuming a config issue. Any pointers? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdiesburg at gmail.com Tue Dec 22 17:27:29 2009 From: mdiesburg at gmail.com (Marty Diesburg) Date: Tue, 22 Dec 2009 11:27:29 -0600 Subject: [Linux-cluster] Mysql.sh error. Message-ID: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com> Sorry, for the double-post, ---great way to start on the list :). Below has the error message as well "Failed - Invalid Name Of Service". Hi all, I am new to the list and have an issue with the Mysql service. It is running, but when I run the commands /usr/share/cluster/mysql.sh restart, or /usr/share/cluster/mysql.sh status I get the following errors. I am using mysql as a database for an email server with Dovecot, Qmail, and Vpopmail. Verifying Configuration Of default Verifying Configuration Of default > Failed - Invalid Name Of Service Monitoring Service default Monitoring Service default > Service Is Running Thanks and Happy Holidays! Marty Diesburg Adv. Tech Independence Telcom -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Tue Dec 22 18:38:13 2009 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 22 Dec 2009 13:38:13 -0500 Subject: [Linux-cluster] Mysql.sh error. 
In-Reply-To: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com> References: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com> Message-ID: <1261507093.26419.83.camel@localhost.localdomain> On Tue, 2009-12-22 at 11:27 -0600, Marty Diesburg wrote: > Sorry, for the double-post, ---great way to start on the list :). > Below has the error message as well "Failed - Invalid Name Of > Service". > I am new to the list and have an issue with the Mysql service. It is > running, but when I run the commands /usr/share/cluster/mysql.sh > restart, or /usr/share/cluster/mysql.sh status I get > the following errors. I am using mysql as a database for an email > server with Dovecot, Qmail, and Vpopmail. > > > Verifying Configuration Of default > Verifying Configuration Of default > Failed - Invalid Name Of > Service > Monitoring Service default > Monitoring Service default > Service Is Running Try: rg_test test /etc/cluster/cluster.conf status mysql productionsql For restarting, use 'clusvcadm -R mailcluster". If you need to work on your mysql instance while the rest of your application is running, you need to do: clusvcadm -Z mailcluster rg_test test /etc/cluster/cluster.conf stop mysql productionsql [do stuff] rg_test test /etc/cluster/cluster.conf start mysql productionsql clusvcadm -U mailcluster Your service can be simplified a lot, as well: > > > > > marx, -------------- next part -------------- An HTML attachment was scrubbed... URL: From td3201 at gmail.com Tue Dec 29 19:30:15 2009 From: td3201 at gmail.com (Terry) Date: Tue, 29 Dec 2009 13:30:15 -0600 Subject: [Linux-cluster] cannot add 3rd node to running cluster Message-ID: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com> Hello, I have a working 2 node cluster that I am trying to add a third node to. I am trying to use Red Hat's conga (luci) to add the node in but I have also tried command line as well with no luck. I cannot start cman. cman_tool does not give any errors when I try to join the cluster either, even with -d. I am not sure where to take this at this point. Here are my package versions: cman-2.0.115-1.el5_4.9 rgmanager-2.0.52-1.el5_4.3 modcluster-0.12.1-2.el5 luci-0.12.2-6.el5_4.1 ricci-0.12.2-6.el5_4.1 I would really appreciate some help. Thanks, Terry From jwellband at gmail.com Tue Dec 29 23:20:42 2009 From: jwellband at gmail.com (Jason W.) Date: Tue, 29 Dec 2009 18:20:42 -0500 Subject: [Linux-cluster] cannot add 3rd node to running cluster In-Reply-To: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com> References: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com> Message-ID: <74e9d01e0912291520l3bc36ac4yc7a17b1f96fa123d@mail.gmail.com> On Tue, Dec 29, 2009 at 2:30 PM, Terry wrote: > Hello, > > I have a working 2 node cluster that I am trying to add a third node > to. ? I am trying to use Red Hat's conga (luci) to add the node in but If you have two node cluster with two_node=1 in cluster.conf - such as two nodes with no quorum device to break a tie - you'll need to bring the cluster down, change two_node to 0 on both nodes (and rev the cluster version at the top of cluster.conf), bring the cluster up and then add the third node. For troubleshooting any cluster issue, take a look at syslog (/var/log/messages by default). It can help to watch it on a centralized syslog server that all of your nodes forward logs to. -- HTH, YMMV, HANW :) Jason The path to enlightenment is /usr/bin/enlightenment. 
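P.S. Purely from memory and untested, so adjust the names and numbers to your own setup, but the cluster.conf change Terry needs looks roughly like this. Before (two nodes, no qdisk):

  <cluster name="mycluster" config_version="41">
    <cman two_node="1" expected_votes="1"/>
    ...
  </cluster>

After adding the third node (two_node off, votes back to normal):

  <cluster name="mycluster" config_version="42">
    <cman two_node="0" expected_votes="3"/>
    ...
  </cluster>

"mycluster" and the config_version numbers above are placeholders; the bits that matter are two_node, expected_votes, and bumping config_version every time you edit the file.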
From xishipan at gmail.com Wed Dec 30 01:44:08 2009 From: xishipan at gmail.com (Xishi PAN) Date: Wed, 30 Dec 2009 09:44:08 +0800 Subject: [Linux-cluster] changing heartbeat interface In-Reply-To: <8b711df40912091054m3a8c5d7ax42d7cd0143898fde@mail.gmail.com> References: <8b711df40912091054m3a8c5d7ax42d7cd0143898fde@mail.gmail.com> Message-ID: Hi, Would you like to try channel bonding? Thanks. G.P On Thu, Dec 10, 2009 at 2:54 AM, Paras pradhan wrote: > hi, > > I believe its not recommend but just curious to know about the consequences > of changing the heartbeat of the cluster to the 2nd interface of the > cluster nodes. In this case if the network switch fails , then cluster will > still be quorate since they will be connected each other with the 2nd > interfaces of the nodes and will not be fenced. > > Thanks > Paras. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Stay Fabulous, Xishi PAN -------------- next part -------------- An HTML attachment was scrubbed... URL: From diamondiona at gmail.com Wed Dec 30 03:36:02 2009 From: diamondiona at gmail.com (Diamond Li) Date: Wed, 30 Dec 2009 11:36:02 +0800 Subject: [Linux-cluster] can not mount GFS, "no such device" Message-ID: Hello, everyone: I failed to mount GFS with error message "no such device". But I have confirmed that the device exists and all relevant kernel modules have been loaded. I am using RH5.4 and no any customization at all. Would someone kindly help? [root at wplccdlvm445 ~]# mount -t gfs -v /dev/vg100/lvol0 /gfs /sbin/mount.gfs: mount /dev/mapper/vg100-lvol0 /gfs /sbin/mount.gfs: parse_opts: opts = "rw" /sbin/mount.gfs: clear flag 1 for "rw", flags = 0 /sbin/mount.gfs: parse_opts: flags = 0 /sbin/mount.gfs: parse_opts: extra = "" /sbin/mount.gfs: parse_opts: hostdata = "" /sbin/mount.gfs: parse_opts: lockproto = "" /sbin/mount.gfs: parse_opts: locktable = "" /sbin/mount.gfs: message to gfs_controld: asking to join mountgroup: /sbin/mount.gfs: write "join /gfs gfs lock_dlm clearcase:gfs rw /dev/mapper/vg100-lvol0" /sbin/mount.gfs: message from gfs_controld: response to join request: /sbin/mount.gfs: lock_dlm_join: read "0" /sbin/mount.gfs: message from gfs_controld: mount options: /sbin/mount.gfs: lock_dlm_join: read "hostdata=jid=0:id=327681:first=1" /sbin/mount.gfs: lock_dlm_join: hostdata: "hostdata=jid=0:id=327681:first=1" /sbin/mount.gfs: lock_dlm_join: extra_plus: "hostdata=jid=0:id=327681:first=1" /sbin/mount.gfs: mount(2) failed error -1 errno 19 /sbin/mount.gfs: lock_dlm_mount_result: write "mount_result /gfs gfs -1" /sbin/mount.gfs: message to gfs_controld: asking to leave mountgroup: /sbin/mount.gfs: lock_dlm_leave: write "leave /gfs gfs 19" /sbin/mount.gfs: message from gfs_controld: response to leave request: /sbin/mount.gfs: lock_dlm_leave: read "0" /sbin/mount.gfs: error mounting /dev/mapper/vg100-lvol0 on /gfs: No such device [root at wplccdlvm445 ~]# ls /dev/mapper/vg100-lvol0 /dev/mapper/vg100-lvol0 [root at wplccdlvm445 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.4 (Tikanga) [root at wplccdlvm445 ~]# uname -r 2.6.18-164.el5 [root at wplccdlvm445 ~]# lsmod |grep gfs gfs2 347529 1 lock_dlm configfs 28753 2 dlm [root at wplccdlvm445 ~]# lsmod |grep dl lock_dlm 20193 0 gfs2 347529 1 lock_dlm dlm 113749 11 lock_dlm configfs 28753 2 dlm From diamondiona at gmail.com Wed Dec 30 06:11:52 2009 From: diamondiona at gmail.com (Diamond Li) Date: Wed, 30 Dec 2009 14:11:52 +0800 Subject: [Linux-cluster] can not 
mount GFS, "no such device" In-Reply-To: References: Message-ID: after I use mkfs.gfs2, it works. However, I did not see any document to mention this command, always gfs_mkfs. in my humble opnion, redhat has a log way to provide real enterprise solution, both from software quality and documentation. On Wed, Dec 30, 2009 at 11:36 AM, Diamond Li wrote: > Hello, everyone: > > I failed to mount GFS with error message "no such device". But I have > confirmed that the device exists and all relevant kernel modules have > been loaded. > > I am using RH5.4 and no any customization at all. > > Would someone kindly help? > > [root at wplccdlvm445 ~]# mount -t gfs -v /dev/vg100/lvol0 /gfs > /sbin/mount.gfs: mount /dev/mapper/vg100-lvol0 /gfs > /sbin/mount.gfs: parse_opts: opts = "rw" > /sbin/mount.gfs: ? clear flag 1 for "rw", flags = 0 > /sbin/mount.gfs: parse_opts: flags = 0 > /sbin/mount.gfs: parse_opts: extra = "" > /sbin/mount.gfs: parse_opts: hostdata = "" > /sbin/mount.gfs: parse_opts: lockproto = "" > /sbin/mount.gfs: parse_opts: locktable = "" > /sbin/mount.gfs: message to gfs_controld: asking to join mountgroup: > /sbin/mount.gfs: write "join /gfs gfs lock_dlm clearcase:gfs rw > /dev/mapper/vg100-lvol0" > /sbin/mount.gfs: message from gfs_controld: response to join request: > /sbin/mount.gfs: lock_dlm_join: read "0" > /sbin/mount.gfs: message from gfs_controld: mount options: > /sbin/mount.gfs: lock_dlm_join: read "hostdata=jid=0:id=327681:first=1" > /sbin/mount.gfs: lock_dlm_join: hostdata: "hostdata=jid=0:id=327681:first=1" > /sbin/mount.gfs: lock_dlm_join: extra_plus: "hostdata=jid=0:id=327681:first=1" > /sbin/mount.gfs: mount(2) failed error -1 errno 19 > /sbin/mount.gfs: lock_dlm_mount_result: write "mount_result /gfs gfs -1" > /sbin/mount.gfs: message to gfs_controld: asking to leave mountgroup: > /sbin/mount.gfs: lock_dlm_leave: write "leave /gfs gfs 19" > /sbin/mount.gfs: message from gfs_controld: response to leave request: > /sbin/mount.gfs: lock_dlm_leave: read "0" > /sbin/mount.gfs: error mounting /dev/mapper/vg100-lvol0 on /gfs: No such device > > [root at wplccdlvm445 ~]# ls /dev/mapper/vg100-lvol0 > /dev/mapper/vg100-lvol0 > > [root at wplccdlvm445 ~]# cat /etc/redhat-release > Red Hat Enterprise Linux Server release 5.4 (Tikanga) > [root at wplccdlvm445 ~]# uname -r > 2.6.18-164.el5 > > [root at wplccdlvm445 ~]# lsmod |grep gfs > gfs2 ? ? ? ? ? ? ? ? ?347529 ?1 lock_dlm > configfs ? ? ? ? ? ? ? 28753 ?2 dlm > [root at wplccdlvm445 ~]# lsmod |grep dl > lock_dlm ? ? ? ? ? ? ? 20193 ?0 > gfs2 ? ? ? ? ? ? ? ? ?347529 ?1 lock_dlm > dlm ? ? ? ? ? ? ? ? ? 113749 ?11 lock_dlm > configfs ? ? ? ? ? ? ? 28753 ?2 dlm > From gordan at bobich.net Wed Dec 30 06:44:23 2009 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 30 Dec 2009 06:44:23 +0000 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: References: Message-ID: <4B3AF6C7.2080203@bobich.net> Diamond Li wrote: > after I use mkfs.gfs2, it works. However, I did not see any document > to mention this command, always gfs_mkfs. I'm not sure what you're doing differntly (you omitted the FS creation command in your previous email), but this works just fine for me: gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb mount /mnt/gfs The fstab line is: /dev/hdb /mnt/gfs gfs defaults,noatime,nodiratime 0 0 Just tested it on a scratch VM. I'm assuming you have your cluster.conf configured right and the cman service (which provides fenced, groupd, etc.) has started without any errors? 
Again, you haven't posted your cluster.conf so it's impossible to tell. You also haven't specified whether your intention is to use gfs or gfs2. They are not the same. > in my humble opnion, redhat has a log way to provide real enterprise > solution, both from software quality and documentation. There doesn't seem to be enough in this thread to persuade me that the cause of problems isn't user error. :) Gordan From diamondiona at gmail.com Wed Dec 30 06:49:40 2009 From: diamondiona at gmail.com (Diamond Li) Date: Wed, 30 Dec 2009 14:49:40 +0800 Subject: [Linux-cluster] lvextend hangs up Message-ID: hello, everyone, it is frustrated to see lvextend hanging up when I am trying to extend a mirror logical volume: no error message, log, can't exit using CTRL+C. :-( anyone has similar experience? [root at wplccdlvm446 gfs]# lvextend -d -L +1G -m1 /dev/vg100/lvol0 [root at wplccdlvm446 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.4 (Tikanga) [root at wplccdlvm446 ~]# uname -r 2.6.18-164.el5 From diamondiona at gmail.com Wed Dec 30 06:53:54 2009 From: diamondiona at gmail.com (Diamond Li) Date: Wed, 30 Dec 2009 14:53:54 +0800 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: <4B3AF6C7.2080203@bobich.net> References: <4B3AF6C7.2080203@bobich.net> Message-ID: thanks Gordan, looks like we are in the same timezone, here is the command, same as previous one except for using mkfs.gfs2 instead of gfs_mkfs mkfs.gfs2 -t clearcase:gfs -p lock_dlm -j 6 /dev/vg100/lvol0 [root at wplccdlvm445 gfs]# clustat Cluster Status for clearcase @ Wed Dec 30 14:56:37 2009 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ wplccdlvm445.cn.ibm.com 1 Online, Local wplccdlvm446.cn.ibm.com 2 Online On Wed, Dec 30, 2009 at 2:44 PM, Gordan Bobic wrote: > Diamond Li wrote: >> >> after I use mkfs.gfs2, it works. However, I did not see any document >> to mention this command, ?always gfs_mkfs. > > I'm not sure what you're doing differntly (you omitted the FS creation > command in your previous email), but this works just fine for me: > > gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb > mount /mnt/gfs > > The fstab line is: > /dev/hdb ? /mnt/gfs ? gfs ? defaults,noatime,nodiratime ? 0 0 > > Just tested it on a scratch VM. > > I'm assuming you have your cluster.conf configured right and the cman > service (which provides fenced, groupd, etc.) has started without any > errors? Again, you haven't posted your cluster.conf so it's impossible to > tell. > > You also haven't specified whether your intention is to use gfs or gfs2. > They are not the same. > >> in my humble opnion, redhat has a log way to provide real enterprise >> solution, both from software quality and documentation. > > There doesn't seem to be enough in this thread to persuade me that the cause > of problems isn't user error. :) > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From cthulhucalling at gmail.com Wed Dec 30 07:01:03 2009 From: cthulhucalling at gmail.com (Ian Hayes) Date: Tue, 29 Dec 2009 23:01:03 -0800 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: <4B3AF6C7.2080203@bobich.net> References: <4B3AF6C7.2080203@bobich.net> Message-ID: <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> On Tue, Dec 29, 2009 at 10:44 PM, Gordan Bobic wrote: > Diamond Li wrote: > >> after I use mkfs.gfs2, it works. 
However, I did not see any document >> to mention this command, always gfs_mkfs. >> > > I'm not sure what you're doing differntly (you omitted the FS creation > command in your previous email), but this works just fine for me: > > gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb > mount /mnt/gfs > > The fstab line is: > /dev/hdb /mnt/gfs gfs defaults,noatime,nodiratime 0 0 > I had a similar problem in my Redhat Clustering and Storage Management class the other week. I believe the problem was with a couple of mistakes I made while playing around in one of the labs. I know once it was because I was trying to mount the block device instead of the logical volume. in my humble opnion, redhat has a log way to provide real enterprise > solution, both from software quality and documentation. > > There doesn't seem to be enough in this thread to persuade me that the > cause of problems isn't user error. :) > IIRC, gfs2 is still under development and considered experimental. There's tons of documentation for production-quality GFS and I imagine once gfs2 gets more mainlined, this will be the case also. -------------- next part -------------- An HTML attachment was scrubbed... URL: From diamondiona at gmail.com Wed Dec 30 07:14:59 2009 From: diamondiona at gmail.com (Diamond Li) Date: Wed, 30 Dec 2009 15:14:59 +0800 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> References: <4B3AF6C7.2080203@bobich.net> <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> Message-ID: Since I have no idea about you guys OS version. From my version, RH5.4, system is using gfs2 kernel module, so I guess I have to use mkfs.gfs2 to create gfs2 file system. However, I didn't see any RH5.4 document pointing this(or I missed it out). If you guys have the same configuration, that means GFS tools is unstable because the only change I did is using different command. Same as the problem I encountered using lvcreate, on the first day it always hangs up, but in next morning, it executed successfully without any changes. It sounds impossible but this is the truth. [root at wplccdlvm445 gfs]# lsmod |grep -i gfs gfs2 347529 2 lock_dlm configfs 28753 2 dlm On Wed, Dec 30, 2009 at 3:01 PM, Ian Hayes wrote: > On Tue, Dec 29, 2009 at 10:44 PM, Gordan Bobic wrote: >> >> Diamond Li wrote: >>> >>> after I use mkfs.gfs2, it works. However, I did not see any document >>> to mention this command, ?always gfs_mkfs. >> >> I'm not sure what you're doing differntly (you omitted the FS creation >> command in your previous email), but this works just fine for me: >> >> gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb >> mount /mnt/gfs >> >> The fstab line is: >> /dev/hdb ? /mnt/gfs ? gfs ? defaults,noatime,nodiratime ? 0 0 > > > I had a similar problem in my Redhat Clustering and Storage Management class > the other week. I believe the problem was with a couple of mistakes I made > while playing around in one of the labs. I know once it was because I was > trying to mount the block device instead of the logical volume. > >> in my humble opnion, redhat has a log way to provide real enterprise >> solution, both from software quality and documentation. > >> >> There doesn't seem to be enough in this thread to persuade me that the >> cause of problems isn't user error. :) > > IIRC, gfs2 is still under development and considered experimental. 
There's > tons of documentation for production-quality GFS and I imagine once gfs2 > gets more mainlined, this will be the case also. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster >
From gordan at bobich.net Wed Dec 30 07:23:40 2009 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 30 Dec 2009 07:23:40 +0000 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> References: <4B3AF6C7.2080203@bobich.net> <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> Message-ID: <4B3AFFFC.2000903@bobich.net> Ian Hayes wrote: > I had a similar problem in my Redhat Clustering and Storage Management > class the other week. I believe the problem was with a couple of > mistakes I made while playing around in one of the labs. I know once it > was because I was trying to mount the block device instead of the > logical volume. I'm assuming you mean that you were mkfs-ing one and then trying to mount the other. I'm vehemently against putting everything on lvm just for the sake of it, but I've never had a problem with mkfs-ing or mount-ing either, as long as it's consistent. I tend not to partition iSCSI and DRBD volumes, so I know that working directly with the whole block device works just fine. > in my humble opnion, redhat has a log way to provide real enterprise > solution, both from software quality and documentation. > > > > There doesn't seem to be enough in this thread to persuade me that > the cause of problems isn't user error. :) > > > IIRC, gfs2 is still under development and considered experimental. > There's tons of documentation for production-quality GFS and I imagine > once gfs2 gets more mainlined, this will be the case also. Don't quote me on this, but I'm pretty sure GFS2 is deemed stable as of RHEL 5.4 (or was it 5.3?). Having said that, I haven't yet deployed any GFS2 volumes in production, and don't plan on doing so imminently, so draw whatever conclusions you see fit from that. ;) Gordan
From gordan at bobich.net Wed Dec 30 07:31:03 2009 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 30 Dec 2009 07:31:03 +0000 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: References: <4B3AF6C7.2080203@bobich.net> <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> Message-ID: <4B3B01B7.10907@bobich.net> Diamond Li wrote: > Since I have no idea about you guys OS version. From my version, > RH5.4, system is using > gfs2 kernel module, so I guess I have to use mkfs.gfs2 to create gfs2 > file system. However, I didn't see any RH5.4 document pointing > this(or I missed it out). I suspect most documentation still doesn't mention GFS2 since it is still quite new and not far on its maturity curve. GFS1, OTOH, has been around for a long time and is what is expected to be in production at the moment. FYI, I use RHEL/CentOS 5.x on my systems, most are now updated to 5.4 (as was the example I ran to test what you reported). There are two separate kernel modules: gfs and gfs2. gfs requires gfs2 (some of the low level dependencies were moved there a long time ago), but working with GFS1 requires the gfs kernel module. If you haven't got gfs loaded (but do have gfs2 loaded) that would explain why you were having difficulties mounting a GFS1 file system (but GFS2 worked fine). Your lsmod information seems consistent with this theory.
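If you do want GFS1 rather than GFS2, make sure the gfs module is actually there before you mkfs/mount - something along these lines (package names from memory, so treat them as an assumption and check what your repo actually calls them; on CentOS 5 it should be gfs-utils plus a kmod-gfs matching your kernel):

  modprobe gfs
  lsmod | grep -w gfs      # you want to see gfs as well as gfs2 listed
  gfs_mkfs -t clearcase:gfs -p lock_dlm -j 6 /dev/vg100/lvol0
  mount -t gfs /dev/vg100/lvol0 /gfs

If modprobe can't find the module at all, the GFS1 pieces simply aren't installed on that box, which would also explain the "no such device" you got from mount -t gfs.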
> If you guys have the same configuration, that means GFS tools is > unstable because the only change I did is using different command. See previous paragraph for gfs vs. gfs2. > Same as the problem I encountered using lvcreate, on the first day it > always hangs up, but in next morning, it executed successfully without > any changes. It sounds impossible but this is the truth. Just to make sure - I take it you are aware that lvm (the non-cluster version) is different to clvm (the cluster-aware version)? You aren't using the non-cluster lvm for a cluster volume, are you? Gordan From cthulhucalling at gmail.com Wed Dec 30 08:07:18 2009 From: cthulhucalling at gmail.com (Ian Hayes) Date: Wed, 30 Dec 2009 00:07:18 -0800 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: <4B3AFFFC.2000903@bobich.net> References: <4B3AF6C7.2080203@bobich.net> <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> <4B3AFFFC.2000903@bobich.net> Message-ID: <36df569a0912300007v3f7ff0b9p87ffe405987099dc@mail.gmail.com> On Tue, Dec 29, 2009 at 11:23 PM, Gordan Bobic wrote: > Ian Hayes wrote: > > I had a similar problem in my Redhat Clustering and Storage Management >> class the other week. I believe the problem was with a couple of mistakes I >> made while playing around in one of the labs. I know once it was because I >> was trying to mount the block device instead of the logical volume. >> > > I'm assuming you mean that you were mkfs-ing one and then trying to mount > the other. I'm vehemently against putting everything on lvm just for the > sake of it, but I've never had a problem with mkfs-ing or mount-ing either, > as long as it's consistent. I tend not to partition iSCSI and DRBD volumes, > so I know that working direct with the whole block device works just fine. > Well, the good thing about being in a RH class is that you can do all kinds of sick, twisted evil things just to see what happens. I've also made the mistake of doing things like not changing the locking_type in lvm.conf to 3 and forgetting to start clvmd. Any of those can lead to strange and exciting times with GFS. IIRC, gfs2 is still under development and considered experimental. There's >> tons of documentation for production-quality GFS and I imagine once gfs2 >> gets more mainlined, this will be the case also. >> > > Don't quite me on this, but I'm pretty sure GFS2 is deemed stable as of > RHEL 5.4 (or was it 5.3?). Having said that, I haven't yet deployed any GFS2 > volumes in production, and don't plan on doing so imminently, so draw > whatever conclusions you see fit from that. ;) We're fine with GFS where we are. I've done some benchmarking on GFS2 and it's performance didn't come anywhere near what we could do with GFS. -------------- next part -------------- An HTML attachment was scrubbed... URL: From diamondiona at gmail.com Wed Dec 30 08:20:50 2009 From: diamondiona at gmail.com (Diamond Li) Date: Wed, 30 Dec 2009 16:20:50 +0800 Subject: [Linux-cluster] can not mount GFS, "no such device" In-Reply-To: <36df569a0912300007v3f7ff0b9p87ffe405987099dc@mail.gmail.com> References: <4B3AF6C7.2080203@bobich.net> <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com> <4B3AFFFC.2000903@bobich.net> <36df569a0912300007v3f7ff0b9p87ffe405987099dc@mail.gmail.com> Message-ID: I have started clvmd on all nodes, and changed locking_type. anyway, I will keep an eye on this random error. 
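For the record, this is roughly what I checked on both nodes (paths and output from memory, trimmed):

  grep locking_type /etc/lvm/lvm.conf
      locking_type = 3
  service clvmd status
      clvmd (pid xxxx) is running...

so unless I have missed something, the clustered LVM side should be set up correctly.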
On Wed, Dec 30, 2009 at 4:07 PM, Ian Hayes wrote: > > > On Tue, Dec 29, 2009 at 11:23 PM, Gordan Bobic wrote: >> >> Ian Hayes wrote: >> >>> I had a similar problem in my Redhat Clustering and Storage Management >>> class the other week. I believe the problem was with a couple of mistakes I >>> made while playing around in one of the labs. I know once it was because I >>> was trying to mount the block device instead of the logical volume. >> >> I'm assuming you mean that you were mkfs-ing one and then trying to mount >> the other. I'm vehemently against putting everything on lvm just for the >> sake of it, but I've never had a problem with mkfs-ing or mount-ing either, >> as long as it's consistent. I tend not to partition iSCSI and DRBD volumes, >> so I know that working direct with the whole block device works just fine. > > Well, the good thing about being in a RH class is that you can do all kinds > of sick, twisted evil things just to see what happens. I've also made the > mistake of doing things like not changing the locking_type in lvm.conf to 3 > and forgetting to start clvmd. Any of those can lead to strange and exciting > times with GFS. > > >>> IIRC, gfs2 is still under development and considered experimental. >>> There's tons of documentation for production-quality GFS and I imagine once >>> gfs2 gets more mainlined, this will be the case also. >> >> Don't quite me on this, but I'm pretty sure GFS2 is deemed stable as of >> RHEL 5.4 (or was it 5.3?). Having said that, I haven't yet deployed any GFS2 >> volumes in production, and don't plan on doing so imminently, so draw >> whatever conclusions you see fit from that. ;) > > We're fine with GFS where we are. I've done some benchmarking on GFS2 and > it's performance didn't come anywhere near what we could do with GFS. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From diamondiona at gmail.com Wed Dec 30 09:41:42 2009 From: diamondiona at gmail.com (Diamond Li) Date: Wed, 30 Dec 2009 17:41:42 +0800 Subject: [Linux-cluster] CTDB configuration files are missing Message-ID: Hello, everyone, it may not the right group to ask CTDB question, but if someone happens to know the answer, I would appreciate. after I compiled and installed ctdb, I did not see configure file /etc/sysconfig/ctdb, but there is no errors during installation. It should be created during installation, right? one more question, is there easy to build CTDB rpm package? [root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb ls: /etc/sysconfig/ctdb: No such file or directory installation steps: cd ctdb ./autogen.sh ./configure make make install [root at wplccdlvm445 ctdb]# make install |less ctdb will be compiled with flags: CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I. 
-I./lib/talloc -Ilib/tdb/include -I./lib/re place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\" -DLOGDIR=\"/usr/local/var/log\" -DUSE_ MMAP=1 -I./lib/replace -Wall -Wshadow -Wstrict-prototypes -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings LIBS = mkdir -p //usr/local/lib/pkgconfig mkdir -p //usr/local/bin mkdir -p //usr/local/sbin mkdir -p //usr/local/include mkdir -p //usr/local/etc/ctdb mkdir -p //usr/local/etc/ctdb/events.d mkdir -p //usr/share/doc/ctdb /usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig /usr/bin/install -c -m 755 bin/ctdb //usr/local/bin /usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin /usr/bin/install -c -m 755 bin/smnotify //usr/local/bin /usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin /usr/bin/install -c -m 644 include/ctdb.h //usr/local/include /usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include # for samba3 /usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb /usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb /usr/bin/install -c -m 644 config/events.d/README //usr/share/doc/ctdb/README.eventscripts /usr/bin/install -c -m 644 doc/recovery-process.txt //usr/share/doc/ctdb/recovery-process.txt /usr/bin/install -c -m 755 config/events.d/00.ctdb //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/01.reclock //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/10.interface //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/11.natgw //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/11.routing //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 644 config/events.d/20.multipathd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 644 config/events.d/31.clamd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/40.vsftpd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/41.httpd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/50.samba //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/60.nfs //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/61.nfstickle //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/70.iscsi //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/91.lvs //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin /usr/bin/install -c -m 755 tools/onnode //usr/local/bin if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1 //usr/local/man/man1; fi if [ -f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1 //usr/local/man/man1; fi if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1 //usr/local/man/man1; fi if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m 755 config/notify.sh //usr/local/etc/ctdb; fi From gordan at bobich.net Wed Dec 30 09:59:30 2009 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 30 Dec 2009 09:59:30 +0000 Subject: [Linux-cluster] CTDB configuration files are missing In-Reply-To: References: Message-ID: <4B3B2482.9080906@bobich.net> Never used CTDB myself, but as far as RPMs go, they are available in the epel yum repository. rpm -Uvh \ http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-3.noarch.rpm yum install ctdb should to get you going without building your own. 
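If you'd rather keep your source build, note that a plain 'make install' doesn't touch /etc/sysconfig at all - the sample config ships in the source tree and the RPM normally copies it into place for you. From memory the files live under config/ (treat the exact paths as an assumption and check your tree):

  cp config/ctdb.sysconfig /etc/sysconfig/ctdb
  cp config/ctdb.init /etc/init.d/ctdb
  chkconfig --add ctdb

That would explain why the file never appeared even though the install ran cleanly.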
Gordan Diamond Li wrote: > Hello, everyone, > it may not the right group to ask CTDB question, but if someone > happens to know the answer, I would appreciate. after I compiled and > installed ctdb, I did not see configure file /etc/sysconfig/ctdb, but > there is no errors during installation. > > It should be created during installation, right? > > one more question, is there easy to build CTDB rpm package? > > [root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb > ls: /etc/sysconfig/ctdb: No such file or directory > > installation steps: > cd ctdb > ./autogen.sh > ./configure > make > make install > > > > [root at wplccdlvm445 ctdb]# make install |less > ctdb will be compiled with flags: > CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I. > -I./lib/talloc -Ilib/tdb/include -I./lib/re > place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\" > -DLOGDIR=\"/usr/local/var/log\" -DUSE_ > MMAP=1 -I./lib/replace -Wall -Wshadow -Wstrict-prototypes > -Wpointer-arith -Wcast-qual -Wcast-align > -Wwrite-strings > LIBS = > mkdir -p //usr/local/lib/pkgconfig > mkdir -p //usr/local/bin > mkdir -p //usr/local/sbin > mkdir -p //usr/local/include > mkdir -p //usr/local/etc/ctdb > mkdir -p //usr/local/etc/ctdb/events.d > mkdir -p //usr/share/doc/ctdb > /usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig > /usr/bin/install -c -m 755 bin/ctdb //usr/local/bin > /usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin > /usr/bin/install -c -m 755 bin/smnotify //usr/local/bin > /usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin > /usr/bin/install -c -m 644 include/ctdb.h //usr/local/include > /usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include > # for samba3 > /usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb > /usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb > /usr/bin/install -c -m 644 config/events.d/README > //usr/share/doc/ctdb/README.eventscripts > /usr/bin/install -c -m 644 doc/recovery-process.txt > //usr/share/doc/ctdb/recovery-process.txt > /usr/bin/install -c -m 755 config/events.d/00.ctdb //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/01.reclock > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/10.interface > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/11.natgw > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/11.routing > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 644 config/events.d/20.multipathd > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 644 config/events.d/31.clamd > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/40.vsftpd > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/41.httpd > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/50.samba > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/60.nfs //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/61.nfstickle > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/70.iscsi > //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 config/events.d/91.lvs //usr/local/etc/ctdb/events.d > /usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin > /usr/bin/install -c -m 755 tools/onnode //usr/local/bin > if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi > if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1 > //usr/local/man/man1; fi > if [ 
-f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1 > //usr/local/man/man1; fi > if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1 > //usr/local/man/man1; fi > if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m > 755 config/notify.sh //usr/local/etc/ctdb; fi > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From crosa at redhat.com Wed Dec 30 10:08:45 2009 From: crosa at redhat.com (Cleber Rosa) Date: Wed, 30 Dec 2009 05:08:45 -0500 (EST) Subject: [Linux-cluster] CTDB configuration files are missing In-Reply-To: Message-ID: <1581333230.47421262167725041.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> Hi Li, AFAIK there are ctdb packages on RHEL's supplementary channel (to support the samba 3x package). CR. --- Cleber Rodrigues < crosa at redhat.com > Solutions Architect - Red Hat, Inc. Mobile: +55 61 9185.3454 ----- Mensagem original ----- De: "Diamond Li" Para: "linux clustering" Enviadas: Quarta-feira, 30 de Dezembro de 2009 7:41:42 (GMT-0300) Auto-Detected Assunto: [Linux-cluster] CTDB configuration files are missing Hello, everyone, it may not the right group to ask CTDB question, but if someone happens to know the answer, I would appreciate. after I compiled and installed ctdb, I did not see configure file /etc/sysconfig/ctdb, but there is no errors during installation. It should be created during installation, right? one more question, is there easy to build CTDB rpm package? [root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb ls: /etc/sysconfig/ctdb: No such file or directory installation steps: cd ctdb ./autogen.sh ./configure make make install [root at wplccdlvm445 ctdb]# make install |less ctdb will be compiled with flags: CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I. 
-I./lib/talloc -Ilib/tdb/include -I./lib/re place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\" -DLOGDIR=\"/usr/local/var/log\" -DUSE_ MMAP=1 -I./lib/replace -Wall -Wshadow -Wstrict-prototypes -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings LIBS = mkdir -p //usr/local/lib/pkgconfig mkdir -p //usr/local/bin mkdir -p //usr/local/sbin mkdir -p //usr/local/include mkdir -p //usr/local/etc/ctdb mkdir -p //usr/local/etc/ctdb/events.d mkdir -p //usr/share/doc/ctdb /usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig /usr/bin/install -c -m 755 bin/ctdb //usr/local/bin /usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin /usr/bin/install -c -m 755 bin/smnotify //usr/local/bin /usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin /usr/bin/install -c -m 644 include/ctdb.h //usr/local/include /usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include # for samba3 /usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb /usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb /usr/bin/install -c -m 644 config/events.d/README //usr/share/doc/ctdb/README.eventscripts /usr/bin/install -c -m 644 doc/recovery-process.txt //usr/share/doc/ctdb/recovery-process.txt /usr/bin/install -c -m 755 config/events.d/00.ctdb //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/01.reclock //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/10.interface //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/11.natgw //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/11.routing //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 644 config/events.d/20.multipathd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 644 config/events.d/31.clamd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/40.vsftpd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/41.httpd //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/50.samba //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/60.nfs //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/61.nfstickle //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/70.iscsi //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 config/events.d/91.lvs //usr/local/etc/ctdb/events.d /usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin /usr/bin/install -c -m 755 tools/onnode //usr/local/bin if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1 //usr/local/man/man1; fi if [ -f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1 //usr/local/man/man1; fi if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1 //usr/local/man/man1; fi if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m 755 config/notify.sh //usr/local/etc/ctdb; fi -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.lense at convergys.com Wed Dec 30 13:32:56 2009 From: michael.lense at convergys.com (michael.lense at convergys.com) Date: Wed, 30 Dec 2009 08:32:56 -0500 Subject: [Linux-cluster] Network Bonding in Clustered Environment ?? 
Message-ID: <1F33592152DAAB43A67411276617D3C5F53EB3A931@CDCMW10E.na.convergys.com> Red Hat Linux-Clustering I am currently setting up a two node cluster for a Database Environment... I have Network Bonding setup on the two nodes and was reading in one document that Red Hat uses eth0 as the default heartbeat... Is there something I need to do to have it setup to us a certain bond0.xxx VLan ?? and if so how would I do this ?? bond0 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:205491095 errors:0 dropped:0 overruns:0 frame:0 TX packets:213210619 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:95394558805 (88.8 GiB) TX bytes:173234191604 (161.3 GiB) bond0.211 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet addr:10.195.27.5 Bcast:10.195.27.31 Mask:255.255.255.224 inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:200489472 errors:0 dropped:0 overruns:0 frame:0 TX packets:213200258 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:90408105703 (84.1 GiB) TX bytes:171522012810 (159.7 GiB) bond0.211:1 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet addr:10.195.27.16 Bcast:10.195.27.31 Mask:255.255.255.224 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 bond0.212 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet addr:135.108.71.5 Bcast:135.108.71.31 Mask:255.255.255.224 inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:523160 errors:0 dropped:0 overruns:0 frame:0 TX packets:10850 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:79853140 (76.1 MiB) TX bytes:2559699 (2.4 MiB) bond0.212:1 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet addr:135.108.71.16 Bcast:135.108.71.31 Mask:255.255.255.224 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 bond0.213 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet addr:192.168.65.5 Bcast:192.168.65.31 Mask:255.255.255.224 inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:367458 errors:0 dropped:0 overruns:0 frame:0 TX packets:938 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:63975254 (61.0 MiB) TX bytes:39788 (38.8 KiB) bond0.213:1 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet addr:192.168.65.16 Bcast:192.168.65.31 Mask:255.255.255.224 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 bond0.215 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B inet addr:192.168.15.5 Bcast:192.168.15.255 Mask:255.255.255.0 inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:39936 errors:0 dropped:0 overruns:0 frame:0 TX packets:17 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:7984970 (7.6 MiB) TX bytes:1106 (1.0 KiB) eth0 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:106617306 errors:0 dropped:0 overruns:0 frame:0 TX packets:106605309 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:49644091276 (46.2 GiB) TX bytes:86616008153 (80.6 GiB) Interrupt:225 Memory:d6000000-d6012100 eth1 Link encap:Ethernet HWaddr 00:26:B9:34:46:0B UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:98873789 errors:0 dropped:0 overruns:0 frame:0 TX packets:106605310 
errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:45750467529 (42.6 GiB) TX bytes:86618183451 (80.6 GiB) Interrupt:233 Memory:d8000000-d8012100 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:63014709 errors:0 dropped:0 overruns:0 frame:0 TX packets:63014709 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:4440888070 (4.1 GiB) TX bytes:4440888070 (4.1 GiB) # cat /etc/cluster/cluster.conf
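For what it's worth, my current understanding (please correct me if this is wrong) is that cman/openais simply bind to whichever interface the clusternode name in cluster.conf resolves to, rather than defaulting to eth0 as such. So my plan was to steer the cluster traffic onto the bond0.211 VLAN with /etc/hosts entries on both nodes along these lines (the node names and the second node's address are placeholders):

  10.195.27.5     dbnode1-hb
  10.195.27.x     dbnode2-hb     # the other node's bond0.211 address

and then reference those names in cluster.conf:

  <clusternode name="dbnode1-hb" nodeid="1" votes="1"/>
  <clusternode name="dbnode2-hb" nodeid="2" votes="1"/>

Is that the right way to pin the heartbeat to a particular bond0.xxx VLAN, or is there a cleaner method?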