From aneesh.kumar at gmail.com Mon Jan 1 15:57:25 2007
From: aneesh.kumar at gmail.com (Aneesh Kumar K.V)
Date: Mon, 01 Jan 2007 21:27:25 +0530
Subject: [Linux-cluster] Re: RedHat SSI cluster
In-Reply-To: <45950A9D.7040607@interstudio.homeunix.net>
References: <45950A9D.7040607@interstudio.homeunix.net>
Message-ID: <45992F65.206@gmail.com>

Bob Marcan wrote:
> Hi.
> Are there any plans to enhance RHCS to become
> full SSI (Single System Image) cluster?
> Will http://www.open-sharedroot.org/ become officialy included and
> supported?
> Isn't time to unite force with the http://www.openssi.org ?
>

If you look at the openssi.org code, you can consider it to contain multiple components:

a) ICS
b) VPROC
c) CFS
d) Clusterwide SYSVIPC
e) Clusterwide PID
f) Clusterwide remote file operations

I am right now done with getting ICS cleaned up for the 2.6.20-rc1 kernel. It provides a transport-independent cluster framework for writing kernel cluster services. You can find the code at http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=summary

So what could be done, which will help GFS and OCFS2, is to make sure they can work on top of ICS. That also brings in the advantage that GFS and OCFS2 can work using TCP/Infiniband/SCTP/TIPC, whatever the transport layer protocol is. Once that is done, the next step would be to get Clusterwide SYSVIPC from OpenSSI and merge it with the latest kernel. Clusterwide PID and clusterwide remote file operations are easy to get working. What is most difficult is VPROC, which brings in the clusterwide proc model.
Bruce Walker have a paper written on a generic framework at http://www.openssi.org/cgi-bin/view?page=proc-hooks.html -aneesh -aneesh From cbay at excellency.fr Tue Jan 2 10:47:38 2007 From: cbay at excellency.fr (Cyril) Date: Tue, 2 Jan 2007 11:47:38 +0100 Subject: [Linux-cluster] GFS2: kernel oops on mount with lock_nolock Message-ID: <37213c1e67b467e8a709429a465d89a1@mail.excellency.fr> Hello, First of all, happy new year to everyone :-) I compiled a 2.6.19.1 kernel with GFS2 and lock_nolock. When trying to mount the newly created GFS2 partition, I get 2 successive identical kernel oops (see [1], at the end of this mail). The second oops appears about 15 seconds after the first one. Nevertheless, the FS is mounted and I can make basic file operations on it. However, extracting kernel sources triggers another oops and tar exits with a segmentation fault: dbx5:/mnt# tar xjvf /root/linux-2.6.19.tar.bz2 [...] linux-2.6.19/include/asm-h8300/shmbuf.h Segmentation fault See the oops in [2]. Now the FS seems stuck, and trying to remove a file hangs forever. These errors happen on a freshly installed Debian stable with my custom kernel. I get the same oops with a 2.6.20-rc2 kernel. Steps to reproduce: - install Debian stable - install the kernel compiled with my .config (can be found on http://dev.excellency.fr/cbay/config) - install mkfs.gfs2 (from latest cluster CVS, compiled myself) and libvolume_id.so (from udev 0.94, compiled myself) - # mkfs -t gfs2 -p lock_nolock -t test:test /dev/sda3 - # mount -t gfs2 /dev/sda3 /mnt/ /dev/sda3 is 10GB. Any idea? Thanks! [1] : GFS2: fsid=: Trying to join cluster "lock_nolock", "test:test" GFS2: fsid=test:test.0: Joined cluster. Now mounting FS... GFS2: fsid=test:test.0: jid=0, already locked for use GFS2: fsid=test:test.0: jid=0: Looking at journal... GFS2: fsid=test:test.0: jid=0: Done ------------[ cut here ]------------ kernel BUG at fs/gfs2/glock.c:738! invalid opcode: 0000 [#1] Modules linked in: CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010286 (2.6.19.1 #1) EIP is at gfs2_glmutex_unlock+0x26/0x30 eax: f788dbbc ebx: f788dbec ecx: 00000001 edx: f788dbbc esi: f788db78 edi: f5022388 ebp: f5415f94 esp: f5415f48 ds: 007b es: 007b ss: 0068 Process gfs2_glockd (pid: 1892, ti=f5414000 task=f7e6e030 task.ti=f5414000) Stack: f788db78 c023ee95 f788db78 00000283 f5022000 f5415f88 c0234a28 f5022000 00000000 f7e6e030 c012d760 f5415f94 f5415f94 c052b7a0 00000000 00000000 00000000 f7e6e030 c012d760 f5415f94 f5415f94 f7014fe8 000000cc c1bc8550 Call Trace: [] gfs2_reclaim_glock+0x85/0xb0 [] gfs2_glockd+0xe8/0x110 [] autoremove_wake_function+0x0/0x60 [] autoremove_wake_function+0x0/0x60 [] gfs2_glockd+0x0/0x110 [] kthread+0xb7/0xc0 [] kthread+0x0/0xc0 [] kernel_thread_helper+0x7/0x10 ======================= Code: bf 00 00 00 00 83 ec 04 b8 01 00 00 00 8b 54 24 08 0f b3 42 08 c7 42 24 00 00 00 00 c7 42 28 00 00 00 00 89 14 24 e8 2a fe ff ff <0f> 0b e2 02 81 d4 3b c0 58 c3 55 57 56 31 f6 53 83 ec 10 8b 7c EIP: [] gfs2_glmutex_unlock+0x26/0x30 SS:ESP 0068:f5415f48 <0>------------[ cut here ]------------ kernel BUG at fs/gfs2/glock.c:738! 
invalid opcode: 0000 [#2] Modules linked in: CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010286 (2.6.19.1 #1) EIP is at gfs2_glmutex_unlock+0x26/0x30 eax: f5757e2c ebx: f5757de8 ecx: 00000001 edx: f5757e2c esi: f5757de8 edi: f5022000 ebp: 00000001 esp: f55b3f78 ds: 007b es: 007b ss: 0068 Process gfs2_scand (pid: 1891, ti=f55b2000 task=c19a8030 task.ti=f55b2000) Stack: f5757de8 c023ef23 f5757de8 0000090e f5022000 c0234900 fffffffc c023efc5 c023ef30 f5022000 0000090d f5022000 f5022000 c0234921 f5022000 f7813d7c c012d3a7 f5022000 f55b3fcc 00000000 00000001 ffffffff ffffffff c012d2f0 Call Trace: [] examine_bucket+0x63/0x70 [] gfs2_scand+0x0/0x40 [] gfs2_scand_internal+0x25/0x40 [] scan_glock+0x0/0x70 [] gfs2_scand+0x21/0x40 [] kthread+0xb7/0xc0 [] kthread+0x0/0xc0 [] kernel_thread_helper+0x7/0x10 ======================= Code: bf 00 00 00 00 83 ec 04 b8 01 00 00 00 8b 54 24 08 0f b3 42 08 c7 42 24 00 00 00 00 c7 42 28 00 00 00 00 89 14 24 e8 2a fe ff ff <0f> 0b e2 02 81 d4 3b c0 58 c3 55 57 56 31 f6 53 83 ec 10 8b 7c EIP: [] gfs2_glmutex_unlock+0x26/0x30 SS:ESP 0068:f55b3f78 [2] : <0>------------[ cut here ]------------ kernel BUG at fs/gfs2/log.c:74! invalid opcode: 0000 [#3] Modules linked in: CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010292 (2.6.19.1-alwaysdata #1) EIP is at gfs2_ail1_start_one+0xb/0x150 eax: f5532380 ebx: f5022000 ecx: f5022000 edx: 00000000 esi: f5532380 edi: 00000000 ebp: f502269c esp: f70cfcc4 ds: 007b es: 007b ss: 0068 Process tar (pid: 2032, ti=f70ce000 task=f7e6e030 task.ti=f70ce000) Stack: 000001f6 000001f7 f502263c 00000000 f70cfcd4 f70cfcd4 00000004 f5022000 00000000 00000000 f502269c c0243378 f5022000 f5532380 f5022000 00000000 f5532380 f502267c f5022000 00000002 00000125 f5022658 c0243732 f5022000 Call Trace: [] gfs2_ail1_start+0x68/0x120 [] gfs2_log_reserve+0x92/0x110 [] gfs2_glock_nq+0x4a/0xa0 [] gfs2_trans_begin+0xff/0x160 [] link_dinode+0xe1/0x230 [] gfs2_createi+0x268/0x300 [] gfs2_create+0x66/0x130 [] gfs2_createi+0x71/0x300 [] gfs2_glock_nq_num+0x78/0xa0 [] gfs2_create+0x0/0x130 [] vfs_create+0xa9/0x190 [] open_namei_create+0x60/0xb0 [] open_namei+0x64d/0x680 [] default_wake_function+0x0/0x20 [] do_filp_open+0x40/0x60 [] get_unused_fd+0x66/0xc0 [] do_sys_open+0x57/0xf0 [] sys_open+0x27/0x30 [] syscall_call+0x7/0xb ======================= Code: 89 c3 8d 44 08 ff f7 f3 8d 68 01 89 e8 8b 1c 24 8b 74 24 04 8b 7c 24 08 8b 6c 24 0c 83 c4 10 c3 55 57 56 53 83 ec 1c 8b 74 24 34 <0f> 0b 4a 00 29 d8 3b c0 8d 6e 0c 8d 76 00 8d bc 27 00 00 00 00 EIP: [] gfs2_ail1_start_one+0xb/0x150 SS:ESP 0068:f70cfcc4 -- Cyril B. excelleNCy From lhh at redhat.com Tue Jan 2 15:45:42 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 10:45:42 -0500 Subject: [Linux-cluster] Lazy umount - NFS HA In-Reply-To: <45897FCA.2080204@cesca.es> References: <45897FCA.2080204@cesca.es> Message-ID: <1167752742.26770.107.camel@rei.boston.devel.redhat.com> On Wed, 2006-12-20 at 19:24 +0100, Jordi Prats wrote: > Hi all, > It's normal that I must use a script to do a lazy umount (umount -l > /mountpoint) of a ext3 partition (not GFS) in a HA NFS cluster? > > Thanks, Don't do that. If umount is failing, there are things you can enable in the current code (RHCS4/5) to make it try to clean up the mountpoint harder. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 2 15:48:38 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 10:48:38 -0500 Subject: [Linux-cluster] Lazy umount - NFS HA In-Reply-To: <4589D02A.5010805@arts.usyd.edu.au> References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> Message-ID: <1167752918.26770.110.camel@rei.boston.devel.redhat.com> On Thu, 2006-12-21 at 11:07 +1100, Matthew Geier wrote: > Jordi Prats wrote: > > Hi all, > > It's normal that I must use a script to do a lazy umount (umount -l > > /mountpoint) of a ext3 partition (not GFS) in a HA NFS cluster? > > I'm having the same problem - the service won't shutdown cleanly as it > can't unmount the file systems - which it can't unmount due to some one > logging in with SSH and their home directory is on that volume. Try adding nfslock="1" to the tag. > I've also had an issue with the cluster manager not shutting down > nfsd, so a volume won't unmount 'cause an NFS client is still attached. > I think I might have nailed that one, but there is little I can do about > people with interactive logins. Not shutting down nfsd is expected behavior. RHCS can manage multiple NFS services at the same time - killing nfsd would kill all NFS services at the same time. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 2 15:49:07 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 10:49:07 -0500 Subject: [Linux-cluster] Lazy umount - NFS HA In-Reply-To: <4589D02A.5010805@arts.usyd.edu.au> References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> Message-ID: <1167752947.26770.112.camel@rei.boston.devel.redhat.com> On Thu, 2006-12-21 at 11:07 +1100, Matthew Geier wrote: > Jordi Prats wrote: > > Hi all, > > It's normal that I must use a script to do a lazy umount (umount -l > > /mountpoint) of a ext3 partition (not GFS) in a HA NFS cluster? > > I'm having the same problem - the service won't shutdown cleanly as it > can't unmount the file systems - which it can't unmount due to some one > logging in with SSH and their home directory is on that volume. Oh, and, don't forget to enable force-unmount of the file system. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 2 15:49:54 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 10:49:54 -0500 Subject: [Linux-cluster] Re: Lazy umount - NFS HA In-Reply-To: References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> Message-ID: <1167752994.26770.114.camel@rei.boston.devel.redhat.com> On Thu, 2006-12-21 at 09:44 -0800, Jonathan Biggar wrote: > > We got around this by writing a custom script that uses fuser to > identify and kill all processes that had open files on the filesystem. > The fs script does this if you enable force unmount. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 2 15:50:18 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 10:50:18 -0500 Subject: [Linux-cluster] Re: Lazy umount - NFS HA In-Reply-To: <458ACC39.8010005@cesca.es> References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> <458ACC39.8010005@cesca.es> Message-ID: <1167753019.26770.116.camel@rei.boston.devel.redhat.com> On Thu, 2006-12-21 at 19:02 +0100, Jordi Prats wrote: > Hi, > fuser (or lsof) does not show any process because we export the > filesystem with NFS (NFS is inside the kernel) It's probably a lock, then; try adding nfslock="1" to the tag. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 2 15:57:10 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 10:57:10 -0500 Subject: [Linux-cluster] Cluster issue In-Reply-To: <20061229155200.29844.qmail@web33203.mail.mud.yahoo.com> References: <20061229155200.29844.qmail@web33203.mail.mud.yahoo.com> Message-ID: <1167753430.26770.123.camel@rei.boston.devel.redhat.com> On Fri, 2006-12-29 at 07:52 -0800, Brian Pontz wrote: > > I ended up having to reboot both nodes. Any ideas on > what would cause this error? Are you up to date? I thought this was fixed in U4... -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From axehind007 at yahoo.com Tue Jan 2 16:14:29 2007 From: axehind007 at yahoo.com (Brian Pontz) Date: Tue, 2 Jan 2007 08:14:29 -0800 (PST) Subject: [Linux-cluster] Cluster issue In-Reply-To: <1167753430.26770.123.camel@rei.boston.devel.redhat.com> Message-ID: <960889.25622.qm@web33207.mail.mud.yahoo.com> > On Fri, 2006-12-29 at 07:52 -0800, Brian Pontz > wrote: > > > > I ended up having to reboot both nodes. Any ideas > on > > what would cause this error? > > Are you up to date? I thought this was fixed in > U4... At this point the 2 machines are running CentOS release 4.2. I guess we need to upgrade/update though. Do you have a bug id as to what this is related to so I can make sure it's been fixed in the latest release? If not, then no big deal... Thanks, Brian From jprats at cesca.es Tue Jan 2 16:31:14 2007 From: jprats at cesca.es (Jordi Prats) Date: Tue, 02 Jan 2007 17:31:14 +0100 Subject: [Linux-cluster] Re: Lazy umount - NFS HA In-Reply-To: <1167752994.26770.114.camel@rei.boston.devel.redhat.com> References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> <1167752994.26770.114.camel@rei.boston.devel.redhat.com> Message-ID: <459A88D2.7080204@cesca.es> Hi, I have this already enabled with the option force_unmount="1" on fs's tags, but it's still failing: There is any other option ? Thanks, Jordi -- ...................................................................... __ / / Jordi Prats C E / S / C A Dept. de Sistemes /_/ Centre de Supercomputaci? de Catalunya Gran Capit?, 2-4 (Edifici Nexus) ? 08034 Barcelona T. 93 205 6464 ? F. 93 205 6979 ? jprats at cesca.es ...................................................................... 
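For reference, both suggestions in this thread map to attributes of the fs resource in cluster.conf. A minimal illustrative fragment (the resource name, device and mountpoint below are invented placeholders, not taken from the poster's configuration) that enables both would look like:

    <fs name="nfsdata" device="/dev/vg_cluster/lv_nfsdata"
        mountpoint="/export/data" fstype="ext3"
        force_unmount="1" nfslock="1"/>

force_unmount tells the fs agent to kill processes holding the mountpoint open before unmounting, while nfslock is aimed at the case described above, where the blocker is a kernel NFS lock that fuser/lsof cannot see.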
Lon Hohberger wrote: > On Thu, 2006-12-21 at 09:44 -0800, Jonathan Biggar wrote: > >> We got around this by writing a custom script that uses fuser to >> identify and kill all processes that had open files on the filesystem. >> >> > > The fs script does this if you enable force unmount. > > -- Lon > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ...................................................................... __ / / Jordi Prats C E / S / C A Dept. de Sistemes /_/ Centre de Supercomputaci? de Catalunya Gran Capit?, 2-4 (Edifici Nexus) ? 08034 Barcelona T. 93 205 6464 ? F. 93 205 6979 ? jprats at cesca.es ...................................................................... From jprats at cesca.es Tue Jan 2 16:37:56 2007 From: jprats at cesca.es (Jordi Prats) Date: Tue, 02 Jan 2007 17:37:56 +0100 Subject: [Linux-cluster] Re: Lazy umount - NFS HA In-Reply-To: <1167753019.26770.116.camel@rei.boston.devel.redhat.com> References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> <458ACC39.8010005@cesca.es> <1167753019.26770.116.camel@rei.boston.devel.redhat.com> Message-ID: <459A8A64.8030507@cesca.es> Ok, I'll try this. Thank you! Jordi Lon Hohberger wrote: > On Thu, 2006-12-21 at 19:02 +0100, Jordi Prats wrote: > >> Hi, >> fuser (or lsof) does not show any process because we export the >> filesystem with NFS (NFS is inside the kernel) >> > > It's probably a lock, then; try adding nfslock="1" to the tag. > > -- Lon > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ...................................................................... __ / / Jordi Prats C E / S / C A Dept. de Sistemes /_/ Centre de Supercomputaci? de Catalunya Gran Capit?, 2-4 (Edifici Nexus) ? 08034 Barcelona T. 93 205 6464 ? F. 93 205 6979 ? jprats at cesca.es ...................................................................... From isplist at logicore.net Tue Jan 2 16:49:03 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 2 Jan 2007 10:49:03 -0600 Subject: [Linux-cluster] Logging with cluster In-Reply-To: <296568.98665.qm@web50601.mail.yahoo.com> Message-ID: <20071210493.264895@leena> Solution: Log Merging > to merge the log I used to use the 'mergerlog' command: > About: > mergelog is a small and fast C program, which merges > HTTP log files by date in 'Common Log Format' (Apache > default log format) from Web servers, behind > round-robin DNS. It has been designed to easily > process huge logs from highly stressed servers, and > can manage gzipped files. Thanks, this is one option. Mike From isplist at logicore.net Tue Jan 2 17:12:57 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 2 Jan 2007 11:12:57 -0600 Subject: [Linux-cluster] Logging with cluster In-Reply-To: <459567DD.7090205@dorm.org> Message-ID: <200712111257.852962@leena> Solution: preprocessing logs Great ideas, I'll look into some of these, thanks very much. > We have a cluster of 6 machines, some running Apache, some running MySQL. > We use shared logging successfully along with stats and post-processing > scripts. We also use plain-ol' logrotate with our shared logs. 
> > We use network-enabled syslog to capture logging on every node to a single, > master logging node (with fail-over, of course!) > > For Apache, we use custom ErrorLog, CustomLog, and RewriteLog directives > per vhost to pipe output to a custom script which greps a few undesirable > statements out prior to logging. > > Apache is sent to the local1 facility on the target syslog > machine that holds all of our logs, where it's configured > with something like: > > /etc/syslog.conf: > # Cluster Apache Logging > local1.err /var/log/shared-apache-err.log > local1.notice /var/log/shared-apache-access.log > local1.debug /var/log/shared-apache-rewrite.log > > > And, for example, all Apache nodes use the same config akin to: > > /path/to/http-vhost.conf: > > ErrorLog "|/path/to/logger.pl err some_string_ID" > CustomLog "|/path/to/logger.pl notice some_string_ID" > RewriteLog "|/path/to/logger.pl debug some_string_ID" > > > where logger.pl continually reads input, runs some filters > to determine if it should indeed log the particular message, > and then calls Sys::Syslog's "syslog()" function, and > "some_string_ID" is a tag to identify each message in > the shared log files. > > You could really use any line-by-line filtering program > here, but be aware that Apache executes the first argument > after the pipe symbol directly - it doesn't run a shell or > anything, so you don't have any expansion, piping of other > commands, etc. > > You can also use /usr/bin/logger (see "man logger") to > send output to various facilities (localN) and informational > levels (err, notice, debug, etc.). This does the same > thing as "logger.pl" above, but doesn't provide any > filtering. > > Also, we've seen syslog drop some messages under > heavy load (hence why we filter some Apache logging > prior to syslogging it). I don't know the exact > cause - maybe someone else can shed light on that for me! > > > Hope this helps - it's what we do and it seems to work > well enough for what we need. > > Regards, > -Brenton Rothchild From jon at levanta.com Tue Jan 2 17:12:48 2007 From: jon at levanta.com (Jonathan Biggar) Date: Tue, 02 Jan 2007 09:12:48 -0800 Subject: [Linux-cluster] Re: Lazy umount - NFS HA In-Reply-To: <1167752994.26770.114.camel@rei.boston.devel.redhat.com> References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> <1167752994.26770.114.camel@rei.boston.devel.redhat.com> Message-ID: Lon Hohberger wrote: > On Thu, 2006-12-21 at 09:44 -0800, Jonathan Biggar wrote: >> We got around this by writing a custom script that uses fuser to >> identify and kill all processes that had open files on the filesystem. >> > > The fs script does this if you enable force unmount. Thanks for the tip, but we don't use the fs service directly because our application dynamically mounts & unmounts filesystems, so we need finer control over the mounting & unmounting. 
-- Jonathan Biggar jon at levanta.com From lhh at redhat.com Tue Jan 2 21:46:01 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 16:46:01 -0500 Subject: [Linux-cluster] Re: Lazy umount - NFS HA In-Reply-To: References: <45897FCA.2080204@cesca.es> <4589D02A.5010805@arts.usyd.edu.au> <1167752994.26770.114.camel@rei.boston.devel.redhat.com> Message-ID: <1167774361.26770.146.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-02 at 09:12 -0800, Jonathan Biggar wrote: > Lon Hohberger wrote: > > On Thu, 2006-12-21 at 09:44 -0800, Jonathan Biggar wrote: > >> We got around this by writing a custom script that uses fuser to > >> identify and kill all processes that had open files on the filesystem. > >> > > > > The fs script does this if you enable force unmount. > > Thanks for the tip, but we don't use the fs service directly because our > application dynamically mounts & unmounts filesystems, so we need finer > control over the mounting & unmounting. > Good reason not to use it, then :) -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 2 21:48:46 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 02 Jan 2007 16:48:46 -0500 Subject: [Linux-cluster] Cluster issue In-Reply-To: <960889.25622.qm@web33207.mail.mud.yahoo.com> References: <960889.25622.qm@web33207.mail.mud.yahoo.com> Message-ID: <1167774526.26770.148.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-02 at 08:14 -0800, Brian Pontz wrote: > > On Fri, 2006-12-29 at 07:52 -0800, Brian Pontz > > wrote: > > > > > > I ended up having to reboot both nodes. Any ideas > > on > > > what would cause this error? > > > > Are you up to date? I thought this was fixed in > > U4... > > At this point the 2 machines are running CentOS > release 4.2. I guess we need to upgrade/update though. > Do you have a bug id as to what this is related to so > I can make sure it's been fixed in the latest release? > If not, then no big deal... I can find them if you want -- there's one fairly recent one which cropped up which isn't in any release (yet) but has been fixed in CVS. There are several lock-related fixes between 4.2 and 4.4; this could be one of several. Here's one of them: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=193128 You'll need to update magma, magma-plugins, and rgmanager to the latest. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From otakurx at gmail.com Tue Jan 2 22:28:23 2007 From: otakurx at gmail.com (Michael Mitchell) Date: Tue, 2 Jan 2007 17:28:23 -0500 Subject: [Linux-cluster] Kernel issue when trying to mount Message-ID: <96fe693d0701021428r1682a993kc95d06454d4adb22@mail.gmail.com> When I go to mount the GFS drive I get the following Kernel error: Jan 2 16:51:51 perforce2 kernel: GFS: Trying to join cluster "lock_dlm", "media:perforce_gfs" Jan 2 16:51:53 perforce2 kernel: GFS: fsid=media:perforce_gfs.0: Joined cluster. Now mounting FS... Jan 2 16:51:53 perforce2 kernel: GFS: fsid=media:perforce_gfs.0: jid=0: Trying to acquire journal lock... Jan 2 16:51:53 perforce2 kernel: GFS: fsid=media:perforce_gfs.0: jid=0: Looking at journal... 
Jan 2 16:51:53 perforce2 kernel: GFS: fsid=media:perforce_gfs.0: jid=0: Done Jan 2 16:51:53 perforce2 kernel: GFS: fsid=media:perforce_gfs.0: jid=1: Trying to acquire journal lock... Jan 2 16:51:53 perforce2 kernel: GFS: fsid=media:perforce_gfs.0: jid=1: Looking at journal... Jan 2 16:51:53 perforce2 kernel: GFS: fsid=media:perforce_gfs.0: jid=1: Done Jan 2 16:51:53 perforce2 kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024 Jan 2 16:51:53 perforce2 kernel: printing eip: Jan 2 16:51:53 perforce2 kernel: c0172db7 Jan 2 16:51:53 perforce2 kernel: *pde = 2e32b001 Jan 2 16:51:53 perforce2 kernel: Oops: 0000 [#1] Jan 2 16:51:53 perforce2 kernel: SMP Jan 2 16:51:53 perforce2 kernel: Modules linked in: iscsi_tcp libiscsi scsi_transport_iscsi crc32c libcrc32c lock_dlm dlm gfs lock_harness cman gnbd mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler dell_rbu parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery asus_acpi ac ipv6 uhci_hcd ehci_hcd e1000 floppy sg megaraid_mbox megaraid_mm sd_mod Jan 2 16:51:53 perforce2 kernel: CPU: 2 Jan 2 16:51:53 perforce2 kernel: EIP: 0060:[] Tainted: GF VLI Jan 2 16:51:53 perforce2 kernel: EFLAGS: 00010293 (2.6.18.3.mitchell #2) Jan 2 16:51:53 perforce2 kernel: EIP is at do_add_mount+0x67/0xea Jan 2 16:51:53 perforce2 kernel: eax: 0000000c ebx: d87aba00 ecx: 00000000 edx: c676d440 Jan 2 16:51:53 perforce2 kernel: esi: d8697f38 edi: ffffffea ebp: 00000000 esp: d8697f08 Jan 2 16:51:53 perforce2 kernel: ds: 007b es: 007b ss: 0068 Jan 2 16:51:53 perforce2 kernel: Process mount (pid: 15304, ti=d8697000 task=f5390df0 task.ti=d8697000) Jan 2 16:51:53 perforce2 kernel: Stack: 00000000 00000000 00000000 d8697f38 00000000 c017334b 00000000 d3a04000 Jan 2 16:51:53 perforce2 kernel: 00000000 00000000 e92df000 d3a04000 d15933e8 c676d440 00000000 000200d0 Jan 2 16:51:53 perforce2 kernel: c0362780 00000001 00000001 00000000 c0142da5 00000044 00001000 f5390df0 Jan 2 16:51:53 perforce2 kernel: Call Trace: Jan 2 16:51:53 perforce2 kernel: [] do_mount+0x1af/0x1c7 Jan 2 16:51:53 perforce2 kernel: [] __alloc_pages+0x5e/0x284 Jan 2 16:51:53 perforce2 kernel: [] sys_mount+0x6f/0xa8 Jan 2 16:51:53 perforce2 kernel: [] sysenter_past_esp+0x56/0x79 Jan 2 16:51:53 perforce2 kernel: Code: f0 ff ff 8b 00 8b 80 58 04 00 00 39 42 64 75 7a 8b 43 14 66 bf f0 ff 39 42 14 75 07 8b 06 39 42 10 74 67 8b 43 10 bf ea ff ff ff <8b> 40 18 0f b7 40 28 25 00 f0 00 00 3d 00 a0 00 00 74 4c 8b 04 Jan 2 16:51:53 perforce2 kernel: EIP: [] do_add_mount+0x67/0xea SS:ESP 0068:d8697f08 [root at perforce2 ~]# Is there a patch for this? I am running kernel 2.6.18.3 -- Mike Mitchell otakurx at gmail.com (603) 706-0026 www.otaku-wired.net (offline) zatoichi.is-a-geek.org otakurx.blogspot.com (my Blog) -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.lusini at governo.it Wed Jan 3 11:35:57 2007 From: marco.lusini at governo.it (Marco Lusini) Date: Wed, 3 Jan 2007 12:35:57 +0100 Subject: [Linux-cluster] High system CPU usage in one of a two node cluster In-Reply-To: <96fe693d0701021428r1682a993kc95d06454d4adb22@mail.gmail.com> Message-ID: <016001c72f2b$57f3c080$8ec9100a@nicchio> Hi all, I have 3 2-node clusters, running just cluster suite, without gfs, each one updated with the latest packages released by RHN. In each cluster one of the two nodes has a steadily growing system CPU usage, which seems to be consumed by clurgmgrd and dlm_recvd. 
As an example here is the running time accumulated on one cluster since 20 december when oit was rebooted: [root at estestest ~]# ps axo pid,start,time,args PID STARTED TIME COMMAND ... 10221 Dec 20 10:37:05 clurgmgrd 11169 Dec 20 06:48:24 [dlm_recvd] ... [root at frascati ~]# ps axo pid,start,time,args PID STARTED TIME COMMAND ... 6226 Dec 20 00:04:17 clurgmgrd 8249 Dec 20 00:00:19 [dlm_recvd] ... I attach two graphs made with RRD which show that the system CPU usage is steadily growing: note how the trend changed after the reboot on 20 december. Of course as the system usage increases so does the system load and I am afraid of what will happen after 1-2 months of uptime... Does anybody else see this behaviour? Any hint on a solution? TIA, Marco Lusini _______________________________________________________ Messaggio analizzato e protetto da tecnologia antivirus Servizio erogato dal sistema informativo della Presidenza del Consiglio dei Ministri -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: estestest_monthly.jpg Type: image/jpeg Size: 35833 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: frascati_monthly.jpg Type: image/jpeg Size: 31823 bytes Desc: not available URL: From lhh at redhat.com Wed Jan 3 16:39:31 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 03 Jan 2007 11:39:31 -0500 Subject: [Linux-cluster] High system CPU usage in one of a two node cluster In-Reply-To: <016001c72f2b$57f3c080$8ec9100a@nicchio> References: <016001c72f2b$57f3c080$8ec9100a@nicchio> Message-ID: <1167842371.26770.162.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-03 at 12:35 +0100, Marco Lusini wrote: > Hi all, > > I have 3 2-node clusters, running just cluster suite, without gfs, > each one updated with the latest > packages released by RHN. > > In each cluster one of the two nodes has a steadily growing system CPU > usage, which seems > to be consumed by clurgmgrd and dlm_recvd. > As an example here is the running time accumulated on one cluster > since 20 december when > oit was rebooted: > > [root at estestest ~]# ps axo pid,start,time,args > PID STARTED TIME COMMAND > ... > 10221 Dec 20 10:37:05 clurgmgrd > 11169 Dec 20 06:48:24 [dlm_recvd] > ... > > [root at frascati ~]# ps axo pid,start,time,args > PID STARTED TIME COMMAND > ... > 6226 Dec 20 00:04:17 clurgmgrd > 8249 Dec 20 00:00:19 [dlm_recvd] > ... > > I attach two graphs made with RRD which show that the system CPU usage > is steadily growing: > note how the trend changed after the reboot on 20 december. > Of course as the system usage increases so does the system load and I > am afraid of what will > happen after 1-2 months of uptime... System load averages are the average of the number of processes on the run queue over the past 1, 5, and 15 minutes. It doesn't generally trend upwards over time; if that were the case, I'd be in trouble: ... 28204 15:11:11 01:04:19 /usr/lib/firefox-1.5.0.9/firefox-bin -UILocale en-US ... However, it is a little odd that you had 10 hours of runtime for clurgmgrd and over 6 for dlm_recvd. Just taking a wild guess, but it looks like the locks were all mastered on frascati. How many services are you running? Also, take a look at: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634 The RPMs there might solve the problem with dlm_recvd. Rgmanager in some situations causes a strange leak of NL locks in the DLM. 
If dlm_recvd has to traverse lock lists and that list is ever-growing (total speculation here), it could explain the amount of consumed system time. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From danwest at comcast.net Thu Jan 4 04:12:28 2007 From: danwest at comcast.net (danwest) Date: Wed, 03 Jan 2007 23:12:28 -0500 Subject: [Linux-cluster] qdiskd eviction on missed writes Message-ID: <1167883949.6614.24.camel@belmont.site> It seems that we can get into situations where certain spike conditions will cause a node to evict another node based on missed writes to the qdisk. The problem is that during these spikes application access to the same storage back end does not seem to be impacted. The SAN in this case is a high end EMC DMX, multipathed, etc... Currently our clusters are set to interval="1" and tko="15" which should allow for at least 15 seconds (a very long time for this type of storage) In looking at ~/cluster/cman/qdisk/main.c it seems like the following is taking place: In quroum_loop {} 1) read everybody else's status (not sure if this includes yourself 2) check for node transitions (write eviction notice if number of heartbeats missed > tko) 3) check local heuristic (if we do not meet requirement remove from qdisk partition and possibly reboot) 4) Find master and/or determine new master, etc... 5) write out our status to qdisk 6) write out our local status (heuristics) 7) cycle ( sleep for defined interval). sleep() measured in seconds so complete cycle = interval + time for steps (1) through (6) Do you think that any delay in steps (1) through (4) could be the problem? From an architectural standpoint wouldn't it be better to have (6) and (7) as a separate thread or daemon? A kernel thread like cman_hbeat for example? Further in the check_transitions procedure case #2 it might be more helpful to clulog what actually caused this to trigger. The current logging is a bit generic. Am I totally off base or does this seem plausible? Thanks, Dan From simone.gotti at email.it Thu Jan 4 14:13:12 2007 From: simone.gotti at email.it (Simone Gotti) Date: Thu, 04 Jan 2007 15:13:12 +0100 Subject: [Linux-cluster] [PATCH] Fix cman_get_node_id in qdisk. Message-ID: <1167919992.11659.14.camel@localhost> Hi all, testing the qdiskd provided by the new openais cman-2.0.35-2 (on RHEL5 Beta 2) I found that it would no start with the following error: Could not determine local node ID; cannot start Looking at the code of the other programs that connects to cman I noticed that before the call libcman function: cman_get_node(cman_handle_t handle, int nodeid, cman_node_t *node), they are inizialing with all zeros the third argument, so I did the same with qdiskd and it worked. Looking at the cvs repository I didn't find a fix for it. A patch is attached. Thanks! Bye! -- Simone Gotti -- Email.it, the professional e-mail, gratis per te: http://www.email.it/f Sponsor: Refill srl il paradiso della tua stampante - cartucce e toner compatibili, inchiostri e accessori per la ricarica, carta speciale. Tutto a prezzi scontatissimi! Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=5187&d=4-1 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cman-2.0.35-qdisk-cman_get_node-fix.patch Type: text/x-patch Size: 452 bytes Desc: not available URL: From pcaulfie at redhat.com Thu Jan 4 14:23:56 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 04 Jan 2007 14:23:56 +0000 Subject: [Linux-cluster] [PATCH] Fix cman_get_node_id in qdisk. In-Reply-To: <1167919992.11659.14.camel@localhost> References: <1167919992.11659.14.camel@localhost> Message-ID: <459D0DFC.1080609@redhat.com> Simone Gotti wrote: > Hi all, > > > testing the qdiskd provided by the new openais cman-2.0.35-2 (on RHEL5 > Beta 2) I found that it would no start with the following error: > > Could not determine local node ID; cannot start > > > Looking at the code of the other programs that connects to cman I > noticed that before the call libcman function: > cman_get_node(cman_handle_t handle, int nodeid, cman_node_t *node), > they are inizialing with all zeros the third argument, so I did the same > with qdiskd and it worked. > > Looking at the cvs repository I didn't find a fix for it. > A patch is attached. > Yes, that patch look correct to me. Thanks -- patrick From francisco_javier.pena at roche.com Thu Jan 4 14:42:05 2007 From: francisco_javier.pena at roche.com (Pena, Francisco Javier) Date: Thu, 4 Jan 2007 15:42:05 +0100 Subject: [Linux-cluster] Removing a node from a running cluster Message-ID: Hello, I am finding a strange cman behavior when removing a node from a running cluster. The starting point is: - 3 nodes running RHEL 4 U4, GFS 6.1 (1 vote per node) - Quorum disk (4 votes) I stop all cluster services on node 3, then modify the cluster.conf file to remove the node (and adjust the quorum disk votes to 3), and then "ccs_tool update" and "cman_tool version -r ". The cluster services keep running, however it looks like cman is not completely in sync with ccsd: # ccs_tool lsnode Cluster name: TestCluster, config_version: 9 Nodename Votes Nodeid Iface Fencetype gfsnode1 1 1 iLO_NODE1 gfsnode2 1 2 iLO_NODE2 # cman_tool nodes Node Votes Exp Sts Name 0 4 0 M /dev/emcpowera1 1 1 3 M gfsnode1 2 1 3 M gfsnode2 3 1 3 X gfsnode3 # cman_tool status Protocol version: 5.0.1 Config version: 9 Cluster name: TestCluster Cluster ID: 62260 Cluster Member: Yes Membership state: Cluster-Member Nodes: 2 Expected_votes: 3 Total_votes: 6 Quorum: 4 Active subsystems: 9 Node name: gfsnode1 Node ID: 1 Node addresses: A.B.C.D CMAN still thinks the third node is part of the cluster, but has just stopped working. In addition to that, it is not updating the number of votes for the quorum disk. If I completely restart the cluster services on all nodes, I get the right information: - Correct votes for the quorum disk - Third node dissappears - The Expected_votes value is now 2 I know from a previous post that two node clusters are a special case, even with quorum disk, but I am pretty sure the same problem will happen with higher node counts (I just do not have enough hardware to test it). So, is this considered as a bug or is it expected that the information from removed nodes is still there until the whole cluster is restarted? Thanks in advance, Javier Pe?a From Alain.Moulle at bull.net Thu Jan 4 14:48:43 2007 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 04 Jan 2007 15:48:43 +0100 Subject: [Linux-cluster] Re: CS4 U4 / problem when fencing (Marcos David) Message-ID: <459D13CB.3000909@bull.net> Hi Thanks for the patch. I've applied the patch, it seems that fencing is successful despite I always have the message Error with ccsd, is it normal ? 
Extract of syslog : ... Jan 4 16:17:32 s_sys at nova6 ccsd[16197]: Attempt to close an unopened CCS descriptor (870). Jan 4 16:17:32 s_sys at nova6 ccsd[16197]: Error while processing disconnect: Invalid request descriptor Jan 4 16:17:32 s_sys at nova6 fenced[16341]: fence "nova10" success Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: Magma Event: Membership Change Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: State change: nova10 DOWN Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: Taking over service testHA from down member (null) Jan 4 16:17:39 s_sys at nova6 clurgmgrd: [16355]: Executing /tmp/testHAmanage start Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: Service testHA started ... Alain Moull? > Hello, > I've had the same problem, the fence agent times out while trying to > fence a node. > There is a patch to solve this problem. > follow this link: > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=219633 > It explains the problem and has a fix. > Hope it helps. > Marcos David From jwhiter at redhat.com Thu Jan 4 14:59:07 2007 From: jwhiter at redhat.com (Josef Whiter) Date: Thu, 4 Jan 2007 09:59:07 -0500 Subject: [Linux-cluster] Re: CS4 U4 / problem when fencing (Marcos David) In-Reply-To: <459D13CB.3000909@bull.net> References: <459D13CB.3000909@bull.net> Message-ID: <20070104145906.GB5282@korben.rdu.redhat.com> Yes thats normal. What happens is the connection to magma is dropped because the fencing takes too long and then fenced goes to reuse the connection, and then that error pops up, and then fenced will re-create the connection and try again. So it is doing what its supposed to. Josef On Thu, Jan 04, 2007 at 03:48:43PM +0100, Alain Moulle wrote: > Hi > > Thanks for the patch. > I've applied the patch, it seems that fencing is successful despite > I always have the message Error with ccsd, is it normal ? > > Extract of syslog : > ... > Jan 4 16:17:32 s_sys at nova6 ccsd[16197]: Attempt to close an unopened CCS > descriptor (870). > Jan 4 16:17:32 s_sys at nova6 ccsd[16197]: Error while processing disconnect: > Invalid request descriptor > Jan 4 16:17:32 s_sys at nova6 fenced[16341]: fence "nova10" success > Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: Magma Event: Membership Change > Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: State change: nova10 DOWN > Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: Taking over service > testHA from down member (null) > Jan 4 16:17:39 s_sys at nova6 clurgmgrd: [16355]: Executing > /tmp/testHAmanage start > Jan 4 16:17:39 s_sys at nova6 clurgmgrd[16355]: Service testHA started > ... > > Alain Moull? > > > Hello, > > I've had the same problem, the fence agent times out while trying to > > fence a node. > > There is a patch to solve this problem. > > follow this link: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=219633 > > It explains the problem and has a fix. > > Hope it helps. > > Marcos David > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From pcaulfie at redhat.com Thu Jan 4 15:00:43 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 04 Jan 2007 15:00:43 +0000 Subject: [Linux-cluster] Removing a node from a running cluster In-Reply-To: References: Message-ID: <459D169B.90201@redhat.com> Pena, Francisco Javier wrote: > Hello, > > I am finding a strange cman behavior when removing a node from a running cluster. 
The starting point is: > > - 3 nodes running RHEL 4 U4, GFS 6.1 (1 vote per node) > - Quorum disk (4 votes) > > I stop all cluster services on node 3, then modify the cluster.conf file to remove the node (and adjust the quorum disk votes to 3), and then "ccs_tool update" and "cman_tool version -r ". The cluster services keep running, however it looks like cman is not completely in sync with ccsd: > > # ccs_tool lsnode > > Cluster name: TestCluster, config_version: 9 > > Nodename Votes Nodeid Iface Fencetype > gfsnode1 1 1 iLO_NODE1 > gfsnode2 1 2 iLO_NODE2 > > > # cman_tool nodes > > Node Votes Exp Sts Name > 0 4 0 M /dev/emcpowera1 > 1 1 3 M gfsnode1 > 2 1 3 M gfsnode2 > 3 1 3 X gfsnode3 > > # cman_tool status > > Protocol version: 5.0.1 > Config version: 9 > Cluster name: TestCluster > Cluster ID: 62260 > Cluster Member: Yes > Membership state: Cluster-Member > Nodes: 2 > Expected_votes: 3 > Total_votes: 6 > Quorum: 4 > Active subsystems: 9 > Node name: gfsnode1 > Node ID: 1 > Node addresses: A.B.C.D > > CMAN still thinks the third node is part of the cluster, but has just stopped working. In addition to that, it is not updating the number of votes for the quorum disk. If I completely restart the cluster services on all nodes, I get the right information: > > - Correct votes for the quorum disk > - Third node dissappears > - The Expected_votes value is now 2 > I can't comment on the behaviour of the quorum disk, but cman is behaving as expected. A node is NEVER removed from the internal lists of cman while any node of the cluster is till active. It is completely harmless in that state, the node simply remains permanently dead and expected votes is adjusted accordingly. -- patrick From jparsons at redhat.com Thu Jan 4 15:55:14 2007 From: jparsons at redhat.com (Jim Parsons) Date: Thu, 04 Jan 2007 10:55:14 -0500 Subject: [Linux-cluster] Removing a node from a running cluster References: <459D169B.90201@redhat.com> Message-ID: <459D2362.40204@redhat.com> Patrick Caulfield wrote: >Pena, Francisco Javier wrote: > >>Hello, >> >>I am finding a strange cman behavior when removing a node from a running cluster. The starting point is: >> >>- 3 nodes running RHEL 4 U4, GFS 6.1 (1 vote per node) >>- Quorum disk (4 votes) >> >>I stop all cluster services on node 3, then modify the cluster.conf file to remove the node (and adjust the quorum disk votes to 3), and then "ccs_tool update" and "cman_tool version -r ". The cluster services keep running, however it looks like cman is not completely in sync with ccsd: >> >># ccs_tool lsnode >> >>Cluster name: TestCluster, config_version: 9 >> >>Nodename Votes Nodeid Iface Fencetype >>gfsnode1 1 1 iLO_NODE1 >>gfsnode2 1 2 iLO_NODE2 >> >> >># cman_tool nodes >> >>Node Votes Exp Sts Name >> 0 4 0 M /dev/emcpowera1 >> 1 1 3 M gfsnode1 >> 2 1 3 M gfsnode2 >> 3 1 3 X gfsnode3 >> >># cman_tool status >> >>Protocol version: 5.0.1 >>Config version: 9 >>Cluster name: TestCluster >>Cluster ID: 62260 >>Cluster Member: Yes >>Membership state: Cluster-Member >>Nodes: 2 >>Expected_votes: 3 >>Total_votes: 6 >>Quorum: 4 >>Active subsystems: 9 >>Node name: gfsnode1 >>Node ID: 1 >>Node addresses: A.B.C.D >> >>CMAN still thinks the third node is part of the cluster, but has just stopped working. In addition to that, it is not updating the number of votes for the quorum disk. 
If I completely restart the cluster services on all nodes, I get the right information: >> >>- Correct votes for the quorum disk >>- Third node dissappears >>- The Expected_votes value is now 2 >> > >I can't comment on the behaviour of the quorum disk, but cman is behaving as expected. A node is NEVER removed from the internal >lists of cman while any node of the cluster is till active. It is completely harmless in that state, the node simply remains >permanently dead and expected votes is adjusted accordingly. > > Patrick - isn't it also necessary to set a cman attribute for two-node='1' in the conf file? In order for cman to see this attribute, the entire cluster would need to be restarted. Regards, -Jim From pcaulfie at redhat.com Thu Jan 4 15:34:16 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 04 Jan 2007 15:34:16 +0000 Subject: [Linux-cluster] Removing a node from a running cluster In-Reply-To: <459D2362.40204@redhat.com> References: <459D169B.90201@redhat.com> <459D2362.40204@redhat.com> Message-ID: <459D1E78.6070602@redhat.com> Jim Parsons wrote: > Patrick Caulfield wrote: > >> Pena, Francisco Javier wrote: >> >>> Hello, >>> >>> I am finding a strange cman behavior when removing a node from a >>> running cluster. The starting point is: >>> >>> - 3 nodes running RHEL 4 U4, GFS 6.1 (1 vote per node) >>> - Quorum disk (4 votes) >>> >>> I stop all cluster services on node 3, then modify the cluster.conf >>> file to remove the node (and adjust the quorum disk votes to 3), and >>> then "ccs_tool update" and "cman_tool version -r ". The >>> cluster services keep running, however it looks like cman is not >>> completely in sync with ccsd: >>> >>> # ccs_tool lsnode >>> >>> Cluster name: TestCluster, config_version: 9 >>> >>> Nodename Votes Nodeid Iface Fencetype >>> gfsnode1 1 1 iLO_NODE1 >>> gfsnode2 1 2 iLO_NODE2 >>> >>> >>> # cman_tool nodes >>> >>> Node Votes Exp Sts Name >>> 0 4 0 M /dev/emcpowera1 >>> 1 1 3 M gfsnode1 >>> 2 1 3 M gfsnode2 >>> 3 1 3 X gfsnode3 >>> >>> # cman_tool status >>> >>> Protocol version: 5.0.1 >>> Config version: 9 >>> Cluster name: TestCluster >>> Cluster ID: 62260 >>> Cluster Member: Yes >>> Membership state: Cluster-Member >>> Nodes: 2 >>> Expected_votes: 3 >>> Total_votes: 6 >>> Quorum: 4 >>> Active subsystems: 9 >>> Node name: gfsnode1 >>> Node ID: 1 >>> Node addresses: A.B.C.D >>> >>> CMAN still thinks the third node is part of the cluster, but has just >>> stopped working. In addition to that, it is not updating the number >>> of votes for the quorum disk. If I completely restart the cluster >>> services on all nodes, I get the right information: >>> >>> - Correct votes for the quorum disk >>> - Third node dissappears >>> - The Expected_votes value is now 2 >>> >> >> I can't comment on the behaviour of the quorum disk, but cman is >> behaving as expected. A node is NEVER removed from the internal >> lists of cman while any node of the cluster is till active. It is >> completely harmless in that state, the node simply remains >> permanently dead and expected votes is adjusted accordingly. >> >> > Patrick - isn't it also necessary to set a cman attribute for > two-node='1' in the conf file? In order for cman to see this attribute, > the entire cluster would need to be restarted. > No, not if they're using a quorum disk. That flag is only needed for a two-node cluster where the quorum is set to one and the surviving node is determined by a fencing race. 
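For reference, the flag in question is the two_node attribute of the cman tag in cluster.conf. An illustrative sketch of the two configurations being contrasted (vote counts and device name are examples only, not a recommended setup):

    <!-- two-node cluster without a quorum disk: quorum drops to one vote,
         and the surviving node is decided by the fencing race -->
    <cman two_node="1" expected_votes="1"/>

    <!-- two-node cluster with a quorum disk: leave two_node unset and let
         the quorum disk vote break the tie -->
    <cman expected_votes="3"/>
    <quorumd interval="1" tko="15" votes="1" device="/dev/emcpowera1"/>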
-- patrick From jparsons at redhat.com Thu Jan 4 16:23:13 2007 From: jparsons at redhat.com (Jim Parsons) Date: Thu, 04 Jan 2007 11:23:13 -0500 Subject: [Linux-cluster] Removing a node from a running cluster References: <459D169B.90201@redhat.com> <459D2362.40204@redhat.com> <459D1E78.6070602@redhat.com> Message-ID: <459D29F1.2070205@redhat.com> Patrick Caulfield wrote: > >>> >>Patrick - isn't it also necessary to set a cman attribute for >>two-node='1' in the conf file? In order for cman to see this attribute, >>the entire cluster would need to be restarted. >> > >No, not if they're using a quorum disk. > >That flag is only needed for a two-node cluster where the quorum is set to one and the surviving node is determined by a fencing race. > Oh my - what are the implications of having that attr set when using a quorum disk? Nothing, I hope... -J From lhh at redhat.com Thu Jan 4 16:35:42 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 04 Jan 2007 11:35:42 -0500 Subject: [Linux-cluster] qdiskd eviction on missed writes In-Reply-To: <1167883949.6614.24.camel@belmont.site> References: <1167883949.6614.24.camel@belmont.site> Message-ID: <1167928542.26770.190.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-03 at 23:12 -0500, danwest wrote: > The SAN in this > case is a high end EMC DMX, multipathed, etc... Currently our clusters > are set to interval="1" and tko="15" which should allow for at least 15 > seconds (a very long time for this type of storage) "at max" 15 seconds. > In looking at ~/cluster/cman/qdisk/main.c it seems like the following is > taking place: > > In quroum_loop {} > > 1) read everybody else's status (not sure if this includes > yourself > 2) check for node transitions (write eviction notice if number > of heartbeats missed > tko) > 3) check local heuristic (if we do not meet requirement remove > from qdisk partition and possibly reboot) > 4) Find master and/or determine new master, etc... > 5) write out our status to qdisk > 6) write out our local status (heuristics) > 7) cycle ( sleep for defined interval). sleep() measured in > seconds so complete cycle = interval + time for steps (1) through (6) > > Do you think that any delay in steps (1) through (4) could be the > problem? From an architectural standpoint wouldn't it be better to have > (6) and (7) as a separate thread or daemon? A kernel thread like > cman_hbeat for example? The heuristics are checked in the background in a separate thread; the only thing that is checked is their states. Step 1 will take awhile (most of any part of qdiskd). However, steps 2-4 shouldn't. Making the read/write separate probably will (probably) not change much - it's all direct I/O. You basically said it yourself: on high end storage, this just shouldn't be a problem. We're doing a maddening 8k of reads and 0.5k of writes during a normal cycle every (in your case) 1 second. So, I suspect it's a scheduling problem. That is, it would probably be a whole lot more effective to just increase the priority of qdiskd so that it gets scheduled even during load spikes (E.g. use a realtime queue; SCHED_RR?). I don't think the I/O path is the bottleneck. > Further in the check_transitions procedure case #2 it might be more > helpful to clulog what actually caused this to trigger. The current > logging is a bit generic. You're totally right here; the logging isn't very great at the moment. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From pcaulfie at redhat.com Thu Jan 4 17:05:04 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 04 Jan 2007 17:05:04 +0000 Subject: [Linux-cluster] Removing a node from a running cluster In-Reply-To: <459D29F1.2070205@redhat.com> References: <459D169B.90201@redhat.com> <459D2362.40204@redhat.com> <459D1E78.6070602@redhat.com> <459D29F1.2070205@redhat.com> Message-ID: <459D33C0.9010504@redhat.com> Jim Parsons wrote: > Patrick Caulfield wrote: > >> >>>> >>> Patrick - isn't it also necessary to set a cman attribute for >>> two-node='1' in the conf file? In order for cman to see this attribute, >>> the entire cluster would need to be restarted. >>> >> >> No, not if they're using a quorum disk. >> >> That flag is only needed for a two-node cluster where the quorum is >> set to one and the surviving node is determined by a fencing race. >> > Oh my - what are the implications of having that attr set when using a > quorum disk? Nothing, I hope... Well, basically that flag allows a cluster to continue with a single vote. So it could be quite dangerous I suppose if the cluster splits and one node has the quorum disk and one doesn't. I'd need to check specific configurations but I wouldn't really recommend it... -- patrick From axehind007 at yahoo.com Thu Jan 4 19:46:54 2007 From: axehind007 at yahoo.com (Brian Pontz) Date: Thu, 4 Jan 2007 11:46:54 -0800 (PST) Subject: [Linux-cluster] Cluster issue In-Reply-To: <1167774526.26770.148.camel@rei.boston.devel.redhat.com> Message-ID: <20070104194654.36383.qmail@web33210.mail.mud.yahoo.com> So I tried upgrading a node in the cluster from CentOS 4.2. I did yum update yum sqlite python-sqlite yum upgrade reboot And now the node starts to come up and hangs after complaining about "lvm.static segfault" on line 504 in rc.sysinit Any ideas? Brian > I can find them if you want -- there's one fairly > recent one which > cropped up which isn't in any release (yet) but has > been fixed in CVS. > > There are several lock-related fixes between 4.2 and > 4.4; this could be > one of several. > > Here's one of them: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=193128 > > You'll need to update magma, magma-plugins, and > rgmanager to the latest. > From lhh at redhat.com Thu Jan 4 22:19:21 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 04 Jan 2007 17:19:21 -0500 Subject: [Linux-cluster] qdiskd eviction on missed writes In-Reply-To: <1167928542.26770.190.camel@rei.boston.devel.redhat.com> References: <1167883949.6614.24.camel@belmont.site> <1167928542.26770.190.camel@rei.boston.devel.redhat.com> Message-ID: <1167949161.10215.5.camel@rei.boston.devel.redhat.com> On Thu, 2007-01-04 at 11:35 -0500, Lon Hohberger wrote: > > Making the read/write separate probably will (probably) not change much > - it's all direct I/O. You basically said it yourself: on high end > storage, this just shouldn't be a problem. We're doing a maddening 8k > of reads and 0.5k of writes during a normal cycle every (in your case) 1 > second. > > So, I suspect it's a scheduling problem. That is, it would probably be > a whole lot more effective to just increase the priority of qdiskd so > that it gets scheduled even during load spikes (E.g. use a realtime > queue; SCHED_RR?). I don't think the I/O path is the bottleneck. 
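A minimal sketch of the kind of change being suggested, not taken from the actual qdiskd sources: a daemon can put itself in the SCHED_RR real-time class (and lock its memory) so that ordinary load spikes do not keep it off the CPU long enough to miss its interval.

/* Illustrative only -- not the real qdiskd implementation. */
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Move the calling process into the SCHED_RR real-time class and pin its
 * pages in RAM, so neither CPU contention nor paging delays its I/O cycle.
 * Returns 0 on success, -1 on failure. */
static int make_realtime(int rr_priority)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = rr_priority;	/* 1..99 on Linux */

	if (sched_setscheduler(0, SCHED_RR, &param) < 0) {
		perror("sched_setscheduler");
		return -1;
	}
	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
		perror("mlockall");
		return -1;
	}
	return 0;
}

int main(void)
{
	/* Needs root (or CAP_SYS_NICE and CAP_IPC_LOCK) to succeed. */
	if (make_realtime(1) == 0)
		printf("running with SCHED_RR priority 1\n");
	/* ... heartbeat loop would go here ... */
	return 0;
}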
I'll be working on a patch to allow you to turn on/off RT scheduling for qdiskd from the configuration file (as well as other qdisk-related bits) tomorrow and early next week -- would you like to test it when I get it ready? -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Thu Jan 4 22:25:11 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 04 Jan 2007 17:25:11 -0500 Subject: [Linux-cluster] Cluster issue In-Reply-To: <20070104194654.36383.qmail@web33210.mail.mud.yahoo.com> References: <20070104194654.36383.qmail@web33210.mail.mud.yahoo.com> Message-ID: <1167949511.10215.11.camel@rei.boston.devel.redhat.com> On Thu, 2007-01-04 at 11:46 -0800, Brian Pontz wrote: > So I tried upgrading a node in the cluster from CentOS > 4.2. > > I did > yum update yum sqlite python-sqlite > yum upgrade > reboot > > And now the node starts to come up and hangs after > complaining about "lvm.static segfault" on line 504 in > rc.sysinit > > Any ideas? Is it before or after it starts init? -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From axehind007 at yahoo.com Thu Jan 4 22:52:22 2007 From: axehind007 at yahoo.com (Brian Pontz) Date: Thu, 4 Jan 2007 14:52:22 -0800 (PST) Subject: [Linux-cluster] Cluster issue In-Reply-To: <1167949511.10215.11.camel@rei.boston.devel.redhat.com> Message-ID: <20070104225222.9881.qmail@web33206.mail.mud.yahoo.com> Nevermind about this. It finally passes this after hanging for a little bit. It's basically the same error as this person posted. http://lists.centos.org/pipermail/centos/2006-November/072155.html Brian --- Lon Hohberger wrote: > On Thu, 2007-01-04 at 11:46 -0800, Brian Pontz > wrote: > > So I tried upgrading a node in the cluster from > CentOS > > 4.2. > > > > I did > > yum update yum sqlite python-sqlite > > yum upgrade > > reboot > > > > And now the node starts to come up and hangs after > > complaining about "lvm.static segfault" on line > 504 in > > rc.sysinit > > > > Any ideas? > > Is it before or after it starts init? > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From marco.lusini at governo.it Fri Jan 5 09:38:41 2007 From: marco.lusini at governo.it (Marco Lusini) Date: Fri, 5 Jan 2007 10:38:41 +0100 Subject: R: [Linux-cluster] High system CPU usage in one of a two node cluster In-Reply-To: <1167842371.26770.162.camel@rei.boston.devel.redhat.com> Message-ID: <002e01c730ad$4c6e4900$8ec9100a@nicchio> > > System load averages are the average of the number of > processes on the run queue over the past 1, 5, and 15 > minutes. It doesn't generally trend upwards over time; if > that were the case, I'd be in trouble: > I am in trouble, then :-) As I told in the first mail, as system (i.e. kernel) CPU usage grows so does the system load (1, 5, and 15 mins average). In order to better show what I see in my clusters, I am sending more graphs (on a yearly time base) that illustrate how system load trends upwards as kernel usage grows. Graphs were produced by CACTI probing the snmpd daemon running on the nodes. Again note how the trend swap from node to node on reboots. 
> > However, it is a little odd that you had 10 hours of runtime > for clurgmgrd and over 6 for dlm_recvd. Just taking a wild > guess, but it looks like the locks were all mastered on frascati. > How can I get more info on this? I checked /proc/cluster/dlm_locks on both nodes and it is empty. Here is the output of cat /proc/cluster/dlm_stats: [root at estestest ~]# cat /proc/cluster/dlm_stats DLM stats (HZ=1000) Lock operations: 1688738 Unlock operations: 838064 Convert operations: 0 Completion ASTs: 2526802 Blocking ASTs: 0 Lockqueue num waittime ave [root at frascati ~]# cat /proc/cluster/dlm_stats DLM stats (HZ=1000) Lock operations: 1122141 Unlock operations: 556623 Convert operations: 0 Completion ASTs: 1678764 Blocking ASTs: 0 Lockqueue num waittime ave WAIT_RSB 6 3 0 WAIT_GRANT 1122141 32507056 28 WAIT_UNLOCK 556623 316924 0 Total 1678770 32823983 19 > > How many services are you running? > At the moment I have 3 services on estestest (Sybase SQL server, a tomcat5 application and an apache web site) and 2 services on frascati (another tomcat5 application and Postgres SQL server). > Also, take a look at: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634 > > The RPMs there might solve the problem with dlm_recvd. > Rgmanager in some situations causes a strange leak of NL > locks in the DLM. If dlm_recvd has to traverse lock lists > and that list is ever-growing (total speculation here), it > could explain the amount of consumed system time. > If I use those RPMs, will the patches be included in RHCS 4.5 (I think so, but just to be sure...)? Thanks, Marco _______________________________________________________ Messaggio analizzato e protetto da tecnologia antivirus Servizio erogato dal sistema informativo della Presidenza del Consiglio dei Ministri -------------- next part -------------- A non-text attachment was scrubbed... Name: frascati_yearly_CPU_Usage.jpg Type: image/jpeg Size: 29334 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: frascati_yearly_System_Load.jpg Type: image/jpeg Size: 27710 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: estestest_yearly_CPU_Usage.jpg Type: image/jpeg Size: 30518 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: estestest_yearly_System_Load.jpg Type: image/jpeg Size: 26103 bytes Desc: not available URL: From simone.gotti at email.it Fri Jan 5 10:11:27 2007 From: simone.gotti at email.it (Simone Gotti) Date: Fri, 05 Jan 2007 11:11:27 +0100 Subject: [Linux-cluster] [PATCH] wrong strings in quorum disk registration. Message-ID: <1167991887.3079.13.camel@localhost> Hi all, on the openais based cman-2.0.35-2.el5 the output of "cman_tool nodes" or "clustat" provides a wrong quorum device name: [root at nodo01 ~]# cman_tool nodes Node Sts Inc Joined Name 0 X 0 /dev/sdb1?????? 1 M 4 2007-01-05 13:03:18 nodo01 2 X 0 nodo02 [root at nodo01 ~]# clustat /dev/sdb1?????? not found realloc 924 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ nodo01 1 Online, Local nodo02 2 Offline /dev/sdb1?????? 0 Online, Estranged Looking at the code look like the call to info_call in cman_register_quorum_device passed a too small by one "inlen" argument missing the ending \0 of the device name string. I attached a patch the should fix this, I hope it's correct. Thanks! Bye! 
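To make the off-by-one concrete (illustrative code only, with made-up names -- this is not the libcman source): whenever the receiving side is going to treat the buffer as a C string, the length handed to the transport has to include the terminating '\0', i.e. strlen()+1 rather than strlen().

#include <stdio.h>
#include <string.h>

/* stand-in for a "send inlen bytes of buf" style call */
static void send_payload(const char *buf, size_t inlen)
{
        printf("sending %zu bytes of \"%s\"\n", inlen, buf);
}

int main(void)
{
        const char *name = "/dev/sdb1";

        send_payload(name, strlen(name));      /* 9 bytes: terminator never sent, reader walks into garbage */
        send_payload(name, strlen(name) + 1);  /* 10 bytes: reader sees a proper C string                   */
        return 0;
}

That single missing byte is what shows up as the trailing garbage characters in the cman_tool and clustat output above.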
-- Simone Gotti -- Email.it, the professional e-mail, gratis per te: http://www.email.it/f Sponsor: Cassine di Pietra: una variet? completa di vini del Veneto, * in pi? un regalo per il primo ordine! Clicca subito qui * Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=3925&d=5-1 -------------- next part -------------- A non-text attachment was scrubbed... Name: cman-2.0.35-libcman-cman_register_quorum_device-info_call.patch Type: text/x-patch Size: 600 bytes Desc: not available URL: From pcaulfie at redhat.com Fri Jan 5 10:13:16 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 05 Jan 2007 10:13:16 +0000 Subject: [Linux-cluster] High system CPU usage in one of a two node cluster In-Reply-To: <1167842371.26770.162.camel@rei.boston.devel.redhat.com> References: <016001c72f2b$57f3c080$8ec9100a@nicchio> <1167842371.26770.162.camel@rei.boston.devel.redhat.com> Message-ID: <459E24BC.7070301@redhat.com> Lon Hohberger wrote: > On Wed, 2007-01-03 at 12:35 +0100, Marco Lusini wrote: >> Hi all, >> >> I have 3 2-node clusters, running just cluster suite, without gfs, >> each one updated with the latest >> packages released by RHN. >> >> In each cluster one of the two nodes has a steadily growing system CPU >> usage, which seems >> to be consumed by clurgmgrd and dlm_recvd. >> As an example here is the running time accumulated on one cluster >> since 20 december when >> oit was rebooted: >> >> [root at estestest ~]# ps axo pid,start,time,args >> PID STARTED TIME COMMAND >> ... >> 10221 Dec 20 10:37:05 clurgmgrd >> 11169 Dec 20 06:48:24 [dlm_recvd] >> ... >> >> [root at frascati ~]# ps axo pid,start,time,args >> PID STARTED TIME COMMAND >> ... >> 6226 Dec 20 00:04:17 clurgmgrd >> 8249 Dec 20 00:00:19 [dlm_recvd] >> ... I suspect these two being at the top are related. If clurgmgrd is taking out locks then dlm_recvd will also be busy >> I attach two graphs made with RRD which show that the system CPU usage >> is steadily growing: >> note how the trend changed after the reboot on 20 december. > >> Of course as the system usage increases so does the system load and I >> am afraid of what will >> happen after 1-2 months of uptime... > > System load averages are the average of the number of processes on the > run queue over the past 1, 5, and 15 minutes. It doesn't generally > trend upwards over time; if that were the case, I'd be in trouble: > > ... > 28204 15:11:11 01:04:19 /usr/lib/firefox-1.5.0.9/firefox-bin -UILocale > en-US > ... > > However, it is a little odd that you had 10 hours of runtime for > clurgmgrd and over 6 for dlm_recvd. Just taking a wild guess, but it > looks like the locks were all mastered on frascati. > > How many services are you running? > > Also, take a look at: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634 > > The RPMs there might solve the problem with dlm_recvd. Rgmanager in > some situations causes a strange leak of NL locks in the DLM. If > dlm_recvd has to traverse lock lists and that list is ever-growing > (total speculation here), it could explain the amount of consumed system > time. > Yes, DLM will do a lot of traversing lock lists if there are a lot of similar locks on one resource. VMS has an optimisation on this known as the group grant and concversion grant modes that we don't currently implement. > How can I get more info on this? I checked /proc/cluster/dlm_locks > on both nodes and it is empty. /proc/cluster/dlm_locks needs to be told which lockspace to use. Just catting that file after bootup will show nothing. 
What you need to do is to echo the lockspace name into that file, then look a it. You can get the lockspace names with the "cman_tool services" command so (eg) # cman_tool services Service Name GID LID State Code Fence Domain: "default" 1 2 run - [1 2] DLM Lock Space: "clvmd" 2 3 run - [1 2] # echo "clvmd" > /proc/cluster/dlm_locks # cat /proc/cluster/dlm_locks This shows locks held by clvmd. If you want to look at another lockspace just echo the other name into the /proc file. -- patrick From pcaulfie at redhat.com Fri Jan 5 10:31:55 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 05 Jan 2007 10:31:55 +0000 Subject: [Linux-cluster] [PATCH] wrong strings in quorum disk registration. In-Reply-To: <1167991887.3079.13.camel@localhost> References: <1167991887.3079.13.camel@localhost> Message-ID: <459E291B.5040101@redhat.com> Simone Gotti wrote: > Hi all, > > on the openais based cman-2.0.35-2.el5 the output of "cman_tool nodes" > or "clustat" provides a wrong quorum device name: > > [root at nodo01 ~]# cman_tool nodes > Node Sts Inc Joined Name > 0 X 0 /dev/sdb1?? > 1 M 4 2007-01-05 13:03:18 nodo01 > 2 X 0 nodo02 > > [root at nodo01 ~]# clustat > /dev/sdb1?? not found > realloc 924 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > nodo01 1 Online, Local > nodo02 2 Offline > /dev/sdb1?? 0 Online, Estranged > > > Looking at the code look like the call to info_call in > cman_register_quorum_device passed a too small by one "inlen" argument > missing the ending \0 of the device name string. > I attached a patch the should fix this, I hope it's correct. > Now committed to CVS Thank very much. -- patrick From pcaulfie at redhat.com Fri Jan 5 10:35:22 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 05 Jan 2007 10:35:22 +0000 Subject: [Linux-cluster] [PATCH] Fix cman_get_node_id in qdisk. In-Reply-To: <459D0DFC.1080609@redhat.com> References: <1167919992.11659.14.camel@localhost> <459D0DFC.1080609@redhat.com> Message-ID: <459E29EA.4060105@redhat.com> Patrick Caulfield wrote: > Simone Gotti wrote: >> Hi all, >> >> >> testing the qdiskd provided by the new openais cman-2.0.35-2 (on RHEL5 >> Beta 2) I found that it would no start with the following error: >> >> Could not determine local node ID; cannot start >> >> >> Looking at the code of the other programs that connects to cman I >> noticed that before the call libcman function: >> cman_get_node(cman_handle_t handle, int nodeid, cman_node_t *node), >> they are inizialing with all zeros the third argument, so I did the same >> with qdiskd and it worked. >> >> Looking at the cvs repository I didn't find a fix for it. >> A patch is attached. >> > > Yes, that patch look correct to me. > > Thanks Now in CVS. -- patrick From marco.lusini at governo.it Fri Jan 5 10:49:46 2007 From: marco.lusini at governo.it (Marco Lusini) Date: Fri, 5 Jan 2007 11:49:46 +0100 Subject: R: [Linux-cluster] High system CPU usage in one of a two node cluster In-Reply-To: <459E24BC.7070301@redhat.com> Message-ID: <004801c730b7$466cc3b0$8ec9100a@nicchio> Thanks Patrick, I have tried to get the locks for Magma on both nodes, and I get the same error of https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634: cat: /proc/cluster/dlm_locks: Cannot allocate memory I will try to install the RPMs from Lon if I can and see if it solve the problem... Marco > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di > Patrick Caulfield > Inviato: venerd? 
5 gennaio 2007 11.13 > A: linux clustering > Oggetto: Re: [Linux-cluster] High system CPU usage in one of > a two node cluster > > > Lon Hohberger wrote: > > On Wed, 2007-01-03 at 12:35 +0100, Marco Lusini wrote: > >> Hi all, > >> > >> I have 3 2-node clusters, running just cluster suite, without gfs, > >> each one updated with the latest packages released by RHN. > >> > >> In each cluster one of the two nodes has a steadily growing system > >> CPU usage, which seems to be consumed by clurgmgrd and dlm_recvd. > >> As an example here is the running time accumulated on one cluster > >> since 20 december when oit was rebooted: > >> > >> [root at estestest ~]# ps axo pid,start,time,args > >> PID STARTED TIME COMMAND > >> ... > >> 10221 Dec 20 10:37:05 clurgmgrd > >> 11169 Dec 20 06:48:24 [dlm_recvd] > >> ... > >> > >> [root at frascati ~]# ps axo pid,start,time,args > >> PID STARTED TIME COMMAND > >> ... > >> 6226 Dec 20 00:04:17 clurgmgrd > >> 8249 Dec 20 00:00:19 [dlm_recvd] > >> ... > > I suspect these two being at the top are related. If > clurgmgrd is taking out locks then dlm_recvd will also be busy > > >> I attach two graphs made with RRD which show that the system CPU > >> usage is steadily growing: > >> note how the trend changed after the reboot on 20 december. > > > >> Of course as the system usage increases so does the system > load and I > >> am afraid of what will happen after 1-2 months of uptime... > > > > System load averages are the average of the number of > processes on the > > run queue over the past 1, 5, and 15 minutes. It doesn't generally > > trend upwards over time; if that were the case, I'd be in trouble: > > > > ... > > 28204 15:11:11 01:04:19 > /usr/lib/firefox-1.5.0.9/firefox-bin -UILocale > > en-US ... > > > > However, it is a little odd that you had 10 hours of runtime for > > clurgmgrd and over 6 for dlm_recvd. Just taking a wild > guess, but it > > looks like the locks were all mastered on frascati. > > > > How many services are you running? > > > > Also, take a look at: > > > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634 > > > > The RPMs there might solve the problem with dlm_recvd. > Rgmanager in > > some situations causes a strange leak of NL locks in the DLM. If > > dlm_recvd has to traverse lock lists and that list is ever-growing > > (total speculation here), it could explain the amount of consumed > > system time. > > > > > Yes, DLM will do a lot of traversing lock lists if there are > a lot of similar locks on one resource. VMS has an > optimisation on this known as the group grant and concversion > grant modes that we don't currently implement. > > > > How can I get more info on this? I checked > /proc/cluster/dlm_locks on > > both nodes and it is empty. > > /proc/cluster/dlm_locks needs to be told which lockspace to > use. Just catting that file after bootup will show nothing. > What you need to do is to echo the lockspace name into that > file, then look a it. You can get the lockspace names with > the "cman_tool services" command so (eg) > > # cman_tool services > > Service Name GID LID > State Code > Fence Domain: "default" 1 2 run - > [1 2] > > DLM Lock Space: "clvmd" 2 3 run - > [1 2] > > # echo "clvmd" > /proc/cluster/dlm_locks # cat /proc/cluster/dlm_locks > > This shows locks held by clvmd. If you want to look at > another lockspace just echo the other name into the /proc file. 
> -- > > patrick > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > _______________________________________________________ > Messaggio analizzato e protetto da tecnologia antivirus > > Servizio erogato dal sistema informativo della Presidenza del > Consiglio dei Ministri From pcaulfie at redhat.com Fri Jan 5 10:54:18 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 05 Jan 2007 10:54:18 +0000 Subject: R: [Linux-cluster] High system CPU usage in one of a two node cluster In-Reply-To: <004801c730b7$466cc3b0$8ec9100a@nicchio> References: <004801c730b7$466cc3b0$8ec9100a@nicchio> Message-ID: <459E2E5A.1060304@redhat.com> Marco Lusini wrote: > Thanks Patrick, > > I have tried to get the locks for Magma on both nodes, > and I get the same error of > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634: > > cat: /proc/cluster/dlm_locks: Cannot allocate memory That shows that there is definitely a problem of too many locks there! > I will try to install the RPMs from Lon if I can and > see if it solve the problem... > I think it will, AFAIK Magma should not be allocating many locks, certainly not enough to cause a allocation overflow! -- patrick From marco.lusini at governo.it Fri Jan 5 11:04:31 2007 From: marco.lusini at governo.it (Marco Lusini) Date: Fri, 5 Jan 2007 12:04:31 +0100 Subject: R: R: [Linux-cluster] High system CPU usage in one of a two nodecluster In-Reply-To: <459E2E5A.1060304@redhat.com> Message-ID: <004901c730b9$5112d460$8ec9100a@nicchio> I was looking at Lon's RPMs, and they are (apparently) based on rgmanager 1.9.53-1, while the last released package is 1.9.54-1... Would it be possible to have fixed RPMs compiled wrt the last version? TIA, Marco > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di > Patrick Caulfield > Inviato: venerd? 5 gennaio 2007 11.54 > A: linux clustering > Oggetto: Re: R: [Linux-cluster] High system CPU usage in one > of a two nodecluster > > > Marco Lusini wrote: > > Thanks Patrick, > > > > I have tried to get the locks for Magma on both nodes, and > I get the > > same error of > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634: > > > > cat: /proc/cluster/dlm_locks: Cannot allocate memory > > That shows that there is definitely a problem of too many locks there! > > > > I will try to install the RPMs from Lon if I can and see if > it solve > > the problem... > > > > I think it will, AFAIK Magma should not be allocating many > locks, certainly not enough to cause a allocation overflow! > > -- > > patrick > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > _______________________________________________________ > Messaggio analizzato e protetto da tecnologia antivirus > > Servizio erogato dal sistema informativo della Presidenza del > Consiglio dei Ministri From pcaulfie at redhat.com Fri Jan 5 11:22:33 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 05 Jan 2007 11:22:33 +0000 Subject: R: R: [Linux-cluster] High system CPU usage in one of a two nodecluster In-Reply-To: <004901c730b9$5112d460$8ec9100a@nicchio> References: <004901c730b9$5112d460$8ec9100a@nicchio> Message-ID: <459E34F9.1060604@redhat.com> Marco Lusini wrote: > I was looking at Lon's RPMs, and they are (apparently) > based on rgmanager 1.9.53-1, while the last released > package is 1.9.54-1... 
> Would it be possible to have fixed RPMs compiled wrt the > last version? > I'm not up on rgmanager versions & RPMs so I'll leave that for Lon, but I reckon it's still worth your while trying that package or building from source with the patch if you can. -- patrick From marco.lusini at governo.it Fri Jan 5 11:39:47 2007 From: marco.lusini at governo.it (Marco Lusini) Date: Fri, 5 Jan 2007 12:39:47 +0100 Subject: R: R: R: [Linux-cluster] High system CPU usage in one of a two nodecluster In-Reply-To: <459E34F9.1060604@redhat.com> Message-ID: <004b01c730be$3d8da140$8ec9100a@nicchio> In the mean time I diff-ed rel 53 and rel 54, and the olny difference is a kill related to NFS locks (which I don't use), so I'll try to rebuild updated RPMs and will give them a try... I'll let you know of the results (it will take at least a week to be sure that the CPU kernel usage is not growing again...) Marco > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di > Patrick Caulfield > Inviato: venerd? 5 gennaio 2007 12.23 > A: linux clustering > Oggetto: Re: R: R: [Linux-cluster] High system CPU usage in > one of a two nodecluster > > > Marco Lusini wrote: > > I was looking at Lon's RPMs, and they are (apparently) based on > > rgmanager 1.9.53-1, while the last released package is 1.9.54-1... > > Would it be possible to have fixed RPMs compiled wrt the > last version? > > > > I'm not up on rgmanager versions & RPMs so I'll leave that > for Lon, but I reckon it's still worth your while trying that > package or building from source with the patch if you can. > > -- > > patrick > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > _______________________________________________________ > Messaggio analizzato e protetto da tecnologia antivirus > > Servizio erogato dal sistema informativo della Presidenza del > Consiglio dei Ministri From simone.gotti at email.it Fri Jan 5 14:02:17 2007 From: simone.gotti at email.it (Simone Gotti) Date: Fri, 05 Jan 2007 15:02:17 +0100 Subject: [Linux-cluster] [PATCH] Fix fence_agent string not correctly sent over the cluster. Message-ID: <1168005737.6322.11.camel@localhost> Hi all, on the openais based cman-2.0.35-2.el5 I noticed that the output of "cman_tool nodes -f" provided a not correctly terminated fence agent name: [root at nodo01 ~]# cman_tool nodes -f Node Sts Inc Joined Name 1 M 4 2007-01-05 17:39:27 nodo01 2 X 0 nodo02 Last fenced: 2007-01-05 17:39:41 by fence-node02!? ^^ I think the problem is in the function do_cmd_update_fence_info in cman/daemon/commands.c that calculate the bytes needed by the message to send without counting the \0 terminating the fence_agent string. I found also another similar problem in another point of the file and I changed also it, but without testing. I made a little patch and I hope it's correct. Thanks! Bye! -- Simone Gotti -- Email.it, the professional e-mail, gratis per te: http://www.email.it/f Sponsor: Acquista i tuoi gioielli in tutta sicurezza ed a prezzi veramente imbattibili. Sfoglia il nostro catalogo on-line! Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=5634&d=5-1 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cman-2.0.35-cman-do_cmd_update_fence_info-msg_size.patch Type: text/x-patch Size: 1038 bytes Desc: not available URL: From baesso at ksolutions.it Fri Jan 5 15:33:24 2007 From: baesso at ksolutions.it (Baesso Mirko) Date: Fri, 5 Jan 2007 16:33:24 +0100 Subject: [Linux-cluster] Kernel Bug Message-ID: <10DBC6018C67E94C961A7334501A2E6F4041B6@kmail.ksolutions.it> Hi, we received this error on our cluster node, could you please tell me how to debug? Thanks Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ Jan 5 14:35:02 clnfs2 kernel: kernel BUG at include/asm/spinlock.h:109! Jan 5 14:35:02 clnfs2 kernel: invalid operand: 0000 [#1] Jan 5 14:35:02 clnfs2 kernel: SMP Jan 5 14:35:02 clnfs2 kernel: Modules linked in: dlm(U) cman(U) nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2 c_core sunrpc dm_round_robin md5 ipv6 dm_multipath button battery ac uhci_hcd ehci_hcd hw_random shpchp e1000 tg3 bonding(U) sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata mptscsih mptbase sd_mod scsi_mod Jan 5 14:35:02 clnfs2 kernel: CPU: 1 Jan 5 14:35:02 clnfs2 kernel: EIP: 0060:[] Not tainted VLI Jan 5 14:35:02 clnfs2 kernel: EFLAGS: 00010002 (2.6.9-22.ELsmp) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pcaulfie at redhat.com Fri Jan 5 15:44:23 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 05 Jan 2007 15:44:23 +0000 Subject: [Linux-cluster] Kernel Bug In-Reply-To: <10DBC6018C67E94C961A7334501A2E6F4041B6@kmail.ksolutions.it> References: <10DBC6018C67E94C961A7334501A2E6F4041B6@kmail.ksolutions.it> Message-ID: <459E7257.5090009@redhat.com> Baesso Mirko wrote: > Hi, > > we received this error on our cluster node, > > could you please tell me how to debug? > > Thanks > > > > Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ > > Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ > > Jan 5 14:35:02 clnfs2 kernel: kernel BUG at include/asm/spinlock.h:109! > > Jan 5 14:35:02 clnfs2 kernel: invalid operand: 0000 [#1] > > Jan 5 14:35:02 clnfs2 kernel: SMP > > Jan 5 14:35:02 clnfs2 kernel: Modules linked in: dlm(U) cman(U) nfsd > exportfs lockd parport_pc lp parport autofs4 i2c_dev i2 > > c_core sunrpc dm_round_robin md5 ipv6 dm_multipath button battery ac > uhci_hcd ehci_hcd hw_random shpchp e1000 tg3 bonding(U) > > sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata > mptscsih mptbase sd_mod scsi_mod > > Jan 5 14:35:02 clnfs2 kernel: CPU: 1 > > Jan 5 14:35:02 clnfs2 kernel: EIP: 0060:[] Not tainted VLI > > Jan 5 14:35:02 clnfs2 kernel: EFLAGS: 00010002 (2.6.9-22.ELsmp) > providing us with the whole of the kernel traceback would be a start ;-) -- patrick From baesso at ksolutions.it Fri Jan 5 15:56:08 2007 From: baesso at ksolutions.it (Baesso Mirko) Date: Fri, 5 Jan 2007 16:56:08 +0100 Subject: R: [Linux-cluster] Kernel Bug Message-ID: <10DBC6018C67E94C961A7334501A2E6F4041B7@kmail.ksolutions.it> Thanks for reply This is all kernel log I found on messages before server restarting ---------------------- Jan 5 14:35:02 clnfs2 kernel: Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || clean up_ret != 0" Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ Jan 5 14:35:02 clnfs2 kernel: kernel BUG at include/asm/spinlock.h:109! 
Jan 5 14:35:02 clnfs2 kernel: invalid operand: 0000 [#1] Jan 5 14:35:02 clnfs2 kernel: SMP Jan 5 14:35:02 clnfs2 kernel: Modules linked in: dlm(U) cman(U) nfsd exportfs lockd parport_pc lp parport autofs4 i2c_dev i2 c_core sunrpc dm_round_robin md5 ipv6 dm_multipath button battery ac uhci_hcd ehci_hcd hw_random shpchp e1000 tg3 bonding(U) sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata mptscsih mptbase sd_mod scsi_mod Jan 5 14:35:02 clnfs2 kernel: CPU: 1 Jan 5 14:35:02 clnfs2 kernel: EIP: 0060:[] Not tainted VLI Jan 5 14:35:02 clnfs2 kernel: EFLAGS: 00010002 (2.6.9-22.ELsmp) Jan 5 14:39:46 clnfs2 syslogd 1.4.1: restart. Jan 5 14:39:46 clnfs2 syslog: syslogd startup succeeded Jan 5 14:39:46 clnfs2 kernel: klogd 1.4.1, log source = /proc/kmsg started. Jan 5 14:39:46 clnfs2 kernel: Linux version 2.6.9-22.ELsmp (bhcompile at porky.build.redhat.com) (gcc version 3.4.4 20050721 (R ed Hat 3.4.4-2)) #1 SMP Mon Sep 19 18:32:14 EDT 2005 ------------------------------------- -----Messaggio originale----- Da: Patrick Caulfield [mailto:pcaulfie at redhat.com] Inviato: venerd? 5 gennaio 2007 16.44 A: linux clustering Oggetto: Re: [Linux-cluster] Kernel Bug Baesso Mirko wrote: > Hi, > > we received this error on our cluster node, > > could you please tell me how to debug? > > Thanks > > > > Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ > > Jan 5 14:35:02 clnfs2 kernel: ------------[ cut here ]------------ > > Jan 5 14:35:02 clnfs2 kernel: kernel BUG at include/asm/spinlock.h:109! > > Jan 5 14:35:02 clnfs2 kernel: invalid operand: 0000 [#1] > > Jan 5 14:35:02 clnfs2 kernel: SMP > > Jan 5 14:35:02 clnfs2 kernel: Modules linked in: dlm(U) cman(U) nfsd > exportfs lockd parport_pc lp parport autofs4 i2c_dev i2 > > c_core sunrpc dm_round_robin md5 ipv6 dm_multipath button battery ac > uhci_hcd ehci_hcd hw_random shpchp e1000 tg3 bonding(U) > > sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata > mptscsih mptbase sd_mod scsi_mod > > Jan 5 14:35:02 clnfs2 kernel: CPU: 1 > > Jan 5 14:35:02 clnfs2 kernel: EIP: 0060:[] Not tainted VLI > > Jan 5 14:35:02 clnfs2 kernel: EFLAGS: 00010002 (2.6.9-22.ELsmp) > providing us with the whole of the kernel traceback would be a start ;-) -- patrick -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From Darrell.Frazier at crc.army.mil Fri Jan 5 16:13:59 2007 From: Darrell.Frazier at crc.army.mil (Frazier, Darrell USA CRC (Contractor)) Date: Fri, 5 Jan 2007 10:13:59 -0600 Subject: [Linux-cluster] How to turn off the cluster attribute of a local volume Message-ID: Hi, I have an issue I havent yet been able to find an answer to. I created a local volume on a cluster node to give it more swap space using command-line tools (pvcreate, vgcreate, lvcreate). Unbeknownst to me, at the time I created the volume, the clvmd subsystem was dead but locked. Anyway, now the system I have created the filesystem on thinks that the partition created is a clustered partition. I found this out using the vgs command I found in the cluster FAQ (you da man Bob Peterson) VG #PV #LV #SN Attr VSize VFree homevg 1 1 0 wz--n- 3.12G 1.12G optvg 1 1 0 wz--n- 7.84G 2.84G rootvg 1 1 0 wz--n- 3.12G 1.12G swapvg00 1 1 0 wz--n- 3.12G 1.12G swapvg01 1 1 0 wz--nc 9.32G 324.00M tmpvg 1 1 0 wz--n- 4.72G 1.69G u01vg 1 1 0 wz--n- 33.00G 12.00G u02vg 1 1 0 wz--nc 399.61G 0 usrvg 1 1 0 wz--n- 6.28G 2.28G varvg 1 1 0 wz--n- 6.28G 2.28G Though I would love to know how this happened. 
It is more important to me right now to know how to disable the clustering attribute on this partition. Thanx much in advance. Darrell J. Frazier Unix System Administrator US Army Combat Readiness Center CAUTION: This electronic transmission may contain information protected by deliberative process or other privilege, which is protected from disclosure under the Freedom of Information Act, 5 U.S.C. ? 552. The information is intended for the use of the individual or agency to which it was sent. If you are not the intended recipient, be aware that any disclosure, distribution or use of the contents of this information is prohibited. Do not release outside of DoD channels without prior authorization from the sender. The sender provides no assurance as to the integrity of the content of this electronic transmission after it has been sent and received by the intended email recipient. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Fri Jan 5 16:45:36 2007 From: rpeterso at redhat.com (Robert Peterson) Date: Fri, 05 Jan 2007 10:45:36 -0600 Subject: [Linux-cluster] How to turn off the cluster attribute of a local volume In-Reply-To: References: Message-ID: <459E80B0.3030808@redhat.com> Frazier, Darrell USA CRC (Contractor) wrote: > > Hi, > > I have an issue I havent yet been able to find an answer to. I created > a local volume on a cluster node to give it more swap space using > command-line tools (pvcreate, vgcreate, lvcreate). Unbeknownst to me, > at the time I created the volume, the clvmd subsystem was dead but locked. > > Anyway, now the system I have created the filesystem on thinks that > the partition created is a clustered partition. I found this out using > the vgs command I found in the cluster FAQ (you da man Bob Peterson) > > VG #PV #LV #SN Attr VSize VFree > homevg 1 1 0 wz--n- 3.12G 1.12G > optvg 1 1 0 wz--n- 7.84G 2.84G > rootvg 1 1 0 wz--n- 3.12G 1.12G > swapvg00 1 1 0 wz--n- 3.12G 1.12G > / //swapvg01 1 1 0 wz--nc 9.32G 324.00M/ > tmpvg 1 1 0 wz--n- 4.72G 1.69G > u01vg 1 1 0 wz--n- 33.00G 12.00G > u02vg 1 1 0 wz--nc 399.61G 0 > usrvg 1 1 0 wz--n- 6.28G 2.28G > varvg 1 1 0 wz--n- 6.28G 2.28G > > Though I would love to know how this happened. It is more important to > me right now to know how to disable the clustering attribute on this > partition. Thanx much in advance. > > *Darrell J. Frazier* > Unix System Administrator > US Army Combat Readiness Center > *//* > Hi Darrell, Glad to be of service! What you want to disable the clustering bit is: vgchange -cn The answer isn't "exactly" in the faq, but you can find something close here: http://sources.redhat.com/cluster/faq.html#clvmd_clustered Regards, Bob Peterson Red Hat Cluster Suite From Darrell.Frazier at crc.army.mil Fri Jan 5 19:46:04 2007 From: Darrell.Frazier at crc.army.mil (Frazier, Darrell USA CRC (Contractor)) Date: Fri, 5 Jan 2007 13:46:04 -0600 Subject: [Linux-cluster] How to turn off the cluster attribute of a lo cal volume Message-ID: Thanks Bob, I will try that. I wonder how that bit got turned on using standard local lvm commands? Interesting. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson Sent: Friday, January 05, 2007 10:46 AM To: linux clustering Subject: Re: [Linux-cluster] How to turn off the cluster attribute of a local volume Frazier, Darrell USA CRC (Contractor) wrote: > > Hi, > > I have an issue I havent yet been able to find an answer to. 
I created > a local volume on a cluster node to give it more swap space using > command-line tools (pvcreate, vgcreate, lvcreate). Unbeknownst to me, > at the time I created the volume, the clvmd subsystem was dead but locked. > > Anyway, now the system I have created the filesystem on thinks that > the partition created is a clustered partition. I found this out using > the vgs command I found in the cluster FAQ (you da man Bob Peterson) > > VG #PV #LV #SN Attr VSize VFree > homevg 1 1 0 wz--n- 3.12G 1.12G > optvg 1 1 0 wz--n- 7.84G 2.84G > rootvg 1 1 0 wz--n- 3.12G 1.12G > swapvg00 1 1 0 wz--n- 3.12G 1.12G > / //swapvg01 1 1 0 wz--nc 9.32G 324.00M/ > tmpvg 1 1 0 wz--n- 4.72G 1.69G > u01vg 1 1 0 wz--n- 33.00G 12.00G > u02vg 1 1 0 wz--nc 399.61G 0 > usrvg 1 1 0 wz--n- 6.28G 2.28G > varvg 1 1 0 wz--n- 6.28G 2.28G > > Though I would love to know how this happened. It is more important to > me right now to know how to disable the clustering attribute on this > partition. Thanx much in advance. > > *Darrell J. Frazier* > Unix System Administrator > US Army Combat Readiness Center > *//* > Hi Darrell, Glad to be of service! What you want to disable the clustering bit is: vgchange -cn The answer isn't "exactly" in the faq, but you can find something close here: http://sources.redhat.com/cluster/faq.html#clvmd_clustered Regards, Bob Peterson Red Hat Cluster Suite -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From Darrell.Frazier at crc.army.mil Fri Jan 5 20:16:20 2007 From: Darrell.Frazier at crc.army.mil (Frazier, Darrell USA CRC (Contractor)) Date: Fri, 5 Jan 2007 14:16:20 -0600 Subject: [Linux-cluster] FC Fabric fencing Message-ID: Hi, I am looking into adding another fence level to my two two-node clusters. Since our APC setup doesn't support powering off and on of specific outlets, I thought I would look to our Fiber channel fabric. Here are the details: 2 sets of 2 HP DL380 systems with dual port qlogic isp2422 FC cards 2 qlogic SANBox 5600 Fiber Channel switches 1 Compellent enclosure (12 TB) The way we currently have it set up is each FC card has one port going to one switch (We plan to do redundancy with the other port to the second switch) We are currently using HP ilo to fence the cluster. Since reading the interesting posts on this board regarding HP ilo, I thought I would add a fabric fence for even more redundancy. My questions are: Will the SANBox 2 fence device in RHCS support my switches? Also, how does fabric fencing work, since I have other systems besides the four systems connected to these switches, I am hoping the fence device can stop access to the SAN on a per-port basis. My thanx in advance for your replies... Darrell J. Frazier Unix System Administrator US Army Combat Readiness Center CAUTION: This electronic transmission may contain information protected by deliberative process or other privilege, which is protected from disclosure under the Freedom of Information Act, 5 U.S.C. ? 552. The information is intended for the use of the individual or agency to which it was sent. If you are not the intended recipient, be aware that any disclosure, distribution or use of the contents of this information is prohibited. Do not release outside of DoD channels without prior authorization from the sender. 
The sender provides no assurance as to the integrity of the content of this electronic transmission after it has been sent and received by the intended email recipient. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Fri Jan 5 22:03:23 2007 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 05 Jan 2007 17:03:23 -0500 Subject: [Linux-cluster] [PATCH] wrong strings in quorum disk registration. In-Reply-To: <1167991887.3079.13.camel@localhost> References: <1167991887.3079.13.camel@localhost> Message-ID: <1168034603.5634.0.camel@rei.boston.devel.redhat.com> On Fri, 2007-01-05 at 11:11 +0100, Simone Gotti wrote: > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > nodo01 1 Online, Local > nodo02 2 Offline > /dev/sdb1?? 0 Online, Estranged > Nice! Heh, the fact that clustat reports /dev/sdb1 is ... a little weird, and actually a bug, but I'll leave it for now ;) -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Fri Jan 5 22:04:14 2007 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 05 Jan 2007 17:04:14 -0500 Subject: R: [Linux-cluster] High system CPU usage in one of a two node cluster In-Reply-To: <004801c730b7$466cc3b0$8ec9100a@nicchio> References: <004801c730b7$466cc3b0$8ec9100a@nicchio> Message-ID: <1168034654.5634.2.camel@rei.boston.devel.redhat.com> On Fri, 2007-01-05 at 11:49 +0100, Marco Lusini wrote: > Thanks Patrick, > > I have tried to get the locks for Magma on both nodes, > and I get the same error of > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634: > > cat: /proc/cluster/dlm_locks: Cannot allocate memory > > I will try to install the RPMs from Lon if I can and > see if it solve the problem... The RPMs in 212634 have solved that problem for several people :) -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Fri Jan 5 22:04:58 2007 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 05 Jan 2007 17:04:58 -0500 Subject: R: R: [Linux-cluster] High system CPU usage in one of a two nodecluster In-Reply-To: <004901c730b9$5112d460$8ec9100a@nicchio> References: <004901c730b9$5112d460$8ec9100a@nicchio> Message-ID: <1168034698.5634.4.camel@rei.boston.devel.redhat.com> On Fri, 2007-01-05 at 12:04 +0100, Marco Lusini wrote: > I was looking at Lon's RPMs, and they are (apparently) > based on rgmanager 1.9.53-1, while the last released > package is 1.9.54-1... > Would it be possible to have fixed RPMs compiled wrt the > last version? use .53 for now; I'll build new ones on .54 on Monday. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From danwest at comcast.net Fri Jan 5 23:19:49 2007 From: danwest at comcast.net (danwest) Date: Fri, 05 Jan 2007 18:19:49 -0500 Subject: [Linux-cluster] qdiskd eviction on missed writes Message-ID: <1168039189.3646.2.camel@belmont.site> >On Thu, 2007-01-04 at 11:35 -0500, Lon Hohberger wrote: >I'll be working on a patch to allow you to turn on/off RT scheduling for >qdiskd from the configuration file (as well as other qdisk-related bits) >tomorrow and early next week -- would you like to test it when I get it >ready? >-- Lon sure, ready and willing to test as early as possible. Thanks, Dan From isplist at logicore.net Sun Jan 7 16:53:44 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Sun, 7 Jan 2007 10:53:44 -0600 Subject: [Linux-cluster] Multiple clusters Message-ID: <200717105344.765824@leena> In my case, I have separate GFS storage for web, mysql, mail and other parts of our network. All of the servers are in one single cluster but share their own areas. For example, web servers only share web storage areas, etc. My question is... is there value in spilling these things up? I have enough fencing hardware to split up each section and can see only one benefit in that if the cluster crashes, it does not take everything down, only that section. Yet, all sections function as one site so, not sure about the value in that. Any thoughts on this? Mike From isplist at logicore.net Sun Jan 7 17:36:30 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Sun, 7 Jan 2007 11:36:30 -0600 Subject: [Linux-cluster] Cluster.conf In-Reply-To: <1160576999.11134.45.camel@rei.boston.devel.redhat.com> Message-ID: <200717113630.820770@leena> >>> http://sources.redhat.com/cluster/doc/cluster_schema.html I've been looking at this and searching the net high and low and just can't seem to find enough information to build a proper cluster.conf file. I'm almost sure that it is the cause of some of the problems I am still suffering months into this cluster learning. For example, I've seen all sorts of uses for "method name" but have not found ONE single document showing/explaining all of the possible choices or why on each one. That goes for MANY other areas of this file. Is there not any documentation for building this file other than the above? It's just not enough for me at least. Mike From simone.gotti at email.it Sun Jan 7 19:29:54 2007 From: simone.gotti at email.it (Simone Gotti) Date: Sun, 07 Jan 2007 20:29:54 +0100 Subject: [Linux-cluster] qdiskd + cman: trying to fix the use of quorumdev_poll. Message-ID: <1168198194.4309.19.camel@localhost> Hi all, I'm using the openais based cman-2.0.35.el5 and I'm trying to understand how the quorum disk concept is implemented in rhcs, after various experiments I think that I found at least 2 problems: Problem 1) Little bug in the quorum disk polling mechanism: looking at the code in cman/daemon/commands.c the variable quorumdev_poll = 10000 is expressed in milliseconds and used to call "quorum_device_timer_fn" every quorumdev_poll interval to check if qdiskd is informing cman that the node can use the quorum votes. The same variable is then used in quorum_device_timer_fn, but here it's used as seconds: if (quorum_device->last_hello.tv_sec + quorumdev_poll < now.tv_sec) { so, when the qdisks dies, or the access to the quorum disk is lost it will take more than 2 hours to notify this and recalculate the quorum. 
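(For the record, the 2 hour figure falls straight out of the unit mix-up: a value meant to be 10000 ms ends up compared against seconds, so the timeout effectively becomes 10000 s = 10000/3600 h, roughly 2 hours 47 minutes before the lost hello is noticed.)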
After changing the line: ======================================================================== --- cman-2.0.35.orig/cman/daemon/commands.c 2007-01-07 21:01:30.000000000 +0100 +++ cman-2.0.35.patched/cman/daemon/commands.c 2007-01-05 18:12:33.000000000 +0100 @@ -1038,15 +1037,12 @@ static void ccsd_timer_fn(void *arg) static void quorum_device_timer_fn(void *arg) { struct timeval now; if (!quorum_device || quorum_device->state == NODESTATE_DEAD) return; gettimeofday(&now, NULL); - if (quorum_device->last_hello.tv_sec + quorumdev_poll < now.tv_sec) { + if (quorum_device->last_hello.tv_sec + quorumdev_poll/1000 < now.tv_sec) { quorum_device->state = NODESTATE_DEAD; log_msg(LOG_INFO, "lost contact with quorum device\n"); recalculate_quorum(0); ======================================================================== it worked. A more precise fix should be the use if tv_usec/1000 instead of tv_sec. Problem 2) After fixing Problem 1, if I set in the quorumd tag of cluster.conf an interval > quorumdev_poll/1000*2 the quorum is lost then regained over and over as the polling frequency of qdiskd is less than the polling one of cman. Probably the right thing to do is to calculate the value of quorumdev_poll from the ccs return value of "/cluster/quorumd/@interval" and quorumdev_poll=interval*1000*2 should be ok. What do you think about these problems? I'll be happy to fix them providing a full patch. Thanks. Bye! -- Simone Gotti -- Email.it, the professional e-mail, gratis per te: http://www.email.it/f Sponsor: Cerchi un gioiello per te o da regalare? Sfoglia il nostro catalogo on-line e non lasciarti sfuggire le numerose occasioni presenti! Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=5631&d=7-1 From pcaulfie at redhat.com Mon Jan 8 10:17:11 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 08 Jan 2007 10:17:11 +0000 Subject: [Linux-cluster] [PATCH] Fix fence_agent string not correctly sent over the cluster. In-Reply-To: <1168005737.6322.11.camel@localhost> References: <1168005737.6322.11.camel@localhost> Message-ID: <45A21A27.50005@redhat.com> Simone Gotti wrote: > Hi all, > > on the openais based cman-2.0.35-2.el5 I noticed that the output of > "cman_tool nodes -f" provided a not correctly terminated fence agent > name: > > [root at nodo01 ~]# cman_tool nodes -f > Node Sts Inc Joined Name > 1 M 4 2007-01-05 17:39:27 nodo01 > 2 X 0 nodo02 > Last fenced: 2007-01-05 17:39:41 by fence-node02!? > ^^ > > I think the problem is in the function do_cmd_update_fence_info in > cman/daemon/commands.c that calculate the bytes needed by the message to > send without counting the \0 terminating the fence_agent string. > > I found also another similar problem in another point of the file and I > changed also it, but without testing. > > I made a little patch and I hope it's correct. > Another good patch, now in CVS :) Thank you very much. -- patrick From jparsons at redhat.com Mon Jan 8 13:31:57 2007 From: jparsons at redhat.com (James Parsons) Date: Mon, 08 Jan 2007 08:31:57 -0500 Subject: [Linux-cluster] Cluster.conf In-Reply-To: <200717113630.820770@leena> References: <200717113630.820770@leena> Message-ID: <45A247CD.4050708@redhat.com> isplist at logicore.net wrote: >>>>http://sources.redhat.com/cluster/doc/cluster_schema.html >>>> >>>> > >I've been looking at this and searching the net high and low and just can't >seem to find enough information to build a proper cluster.conf file. 
I'm >almost sure that it is the cause of some of the problems I am still suffering >months into this cluster learning. > >For example, I've seen all sorts of uses for "method name" but have not found >ONE single document showing/explaining all of the possible choices or why on >each one. > >That goes for MANY other areas of this file. Is there not any documentation >for building this file other than the above? It's just not enough for me at >least. > >Mike > > > Man cluster.conf. The method tag block denotes a fence group level. The only attribute that can be specified for a method tag block is a name attr. This just needs to be distinctive from all other method block names below a specific clusternode fence block. The name can be any string - it just does not matter. system-config-cluster generates unique interger values for each method block and converts them to strings in order to set method name attributes. A method block allows you to group one or more fence types at a specific level of fencing; for example, if you wish to employ power fencing as a first measure for a node, you would insert an initial method block within a clusternode's fence block, referencing power fence devices with instance specific attributes such as port numbers. Let's say that clusternode 'A' has dual power supplies...this is what the xml would look like: While the defauly action for every fence agent is to reboot, the 'option' atr is used above in the case of dual power supply nodes to insure both power supplies are off at the same time - making certain that the node is fenced. Now, let's say that you are paranoid, and that you do not trust your power switch 100%. You can add a second level of fencing (as additional insurance) like so: The above sets up a primary fence method in he first method block. If that block fails to fence the node, then the next block will be attempted. There is no limit that I know of for how many method blocks you wish to employ -- but 1 or 2 is the norm...anymore tends to suggest paranoid tendencies ;-) As additional information on this subject, the following is from the schema doc that is mentioned above: A Note On Fencing Fencing is specified within the cluster.conf file in two places. The first place is within the tag. Any device used for fencing a node must be defined here as a first. This applies to power switches (APC, WTI, etc.) with multiple ports that are able to fence multiple cluster nodes, as well as fabric switches and baseboard management fence strategies (iLO, RSA, IPMI, Drac, etc.) that are usually 1 to 1 in nature; that is, one specified fence device is able to fence only one node. After defining the fence devices to be used in the cluster, it is necessary to associate the fence device listings with specific cluster nodes. The second place that fencing is specified within cluster.conf is within the tag. Beneath the tag, is a tag. Beneath the tag is one or more tag sets. Within a tag set, is a tag set. This is where the actual association between and node takes place. A tag has a required "name" attribute that refers to the name of one of the 's specified in the section of cluster.conf. More about blocks: A method block is like a fence level. If a primary fence method is selected, yet the user wants to define a backup method in case the first fence method fails, this is done by defining two blocks for a cluster node, each with a unique name parameter. The fence daemon will call each fence method in the order they are specified under the tag set. 
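To put some flesh on that (the literal XML in the inline examples above did not survive the list archiver's scrubbing), here is an illustrative reconstruction of a two-level setup. It is a sketch, not a copy of the examples Jim posted: device names, addresses, logins and port numbers are made up and will differ for real hardware.

<fencedevices>
        <fencedevice name="apc1" agent="fence_apc" ipaddr="10.0.0.50" login="apc" passwd="apc"/>
        <fencedevice name="sanswitch1" agent="fence_sanbox2" ipaddr="10.0.0.60" login="admin" passwd="admin"/>
</fencedevices>

<clusternode name="nodeA" votes="1">
        <fence>
                <!-- level 1: power fencing, node has two power supplies on ports 1 and 2 -->
                <method name="1">
                        <device name="apc1" port="1" option="off"/>
                        <device name="apc1" port="2" option="off"/>
                        <device name="apc1" port="1" option="on"/>
                        <device name="apc1" port="2" option="on"/>
                </method>
                <!-- level 2: only tried if level 1 fails; cut the node's SAN port instead -->
                <method name="2">
                        <device name="sanswitch1" port="5"/>
                </method>
        </fence>
</clusternode>

fenced works through the <method> blocks in order; the dual-supply block lists both ports "off" before turning either back "on" so that the node is never left running on one live supply.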
Fence specification within cluster.conf offers one other feature for customizing fence action. Within a block, it is allowable to list more than one . This is useful when fencing a node with redundant power supplies, for example. The fence daemon will run the agent for each device listed within a block before determining success or failure. I hope this sets you up with all that you need. -J > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > From leo.pleiman at raba.com Mon Jan 8 14:07:01 2007 From: leo.pleiman at raba.com (Leo J Pleiman) Date: Mon, 8 Jan 2007 09:07:01 -0500 Subject: R: R: [Linux-cluster] High system CPU usage in one of a two nodecluster In-Reply-To: <1168034698.5634.4.camel@rei.boston.devel.redhat.com> References: <004901c730b9$5112d460$8ec9100a@nicchio> <1168034698.5634.4.camel@rei.boston.devel.redhat.com> Message-ID: <20070108090701.xy6dau05og488ss4@www.raba.com> I have a 2 node cluster where the nodes seemed to pause for several second every couple minutes, paused in that an interactive session would freeze. I found that 2 of the 8 cpus on each node were 25-50% wait or system all the time. After applying this patch the problem appears to be gone. Thanks! I'm awaiting your 1.9.54 build. -- Leo J Pleiman, RHCE Principal Consultant RABA Technologies 301.763.3527 (office) 410.688.3873 (cell) Quoting Lon Hohberger : > On Fri, 2007-01-05 at 12:04 +0100, Marco Lusini wrote: >> I was looking at Lon's RPMs, and they are (apparently) >> based on rgmanager 1.9.53-1, while the last released >> package is 1.9.54-1... >> Would it be possible to have fixed RPMs compiled wrt the >> last version? > > use .53 for now; I'll build new ones on .54 on Monday. > > -- Lon > > From isplist at logicore.net Mon Jan 8 15:34:26 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Mon, 8 Jan 2007 09:34:26 -0600 Subject: [Linux-cluster] Cluster.conf In-Reply-To: <45A247CD.4050708@redhat.com> Message-ID: <20071893426.356753@leena> Not sure if thanks does it but this is very helpful and will now be part of my own documentation. Thank you! Mike >> I've been looking at this and searching the net high and low and just >> can't seem to find enough information to build a proper cluster.conf file. > Man cluster.conf. > > The method tag block denotes a fence group level. The only attribute > that can be specified for a method tag block is a name attr. This just > needs to be distinctive from all other method block names below a > specific clusternode fence block. The name can be any string - it just > does not matter. system-config-cluster generates unique interger values > for each method block and converts them to strings in order to set > method name attributes. > > A method block allows you to group one or more fence types at a specific > level of fencing; for example, if you wish to employ power fencing as a > first measure for a node, you would insert an initial method block > within a clusternode's fence block, referencing power fence devices with > instance specific attributes such as port numbers. Let's say that > clusternode 'A' has dual power supplies...this is what the xml would > look like: > > > > > > > > > > > > > While the defauly action for every fence agent is to reboot, the > 'option' atr is used above in the case of dual power supply nodes to > insure both power supplies are off at the same time - making certain > that the node is fenced. 
> > Now, let's say that you are paranoid, and that you do not trust your > power switch 100%. You can add a second level of fencing (as additional > insurance) like so: > > > > > > > > > > > > > > > > The above sets up a primary fence method in he first method block. If > that block fails to fence the node, then the next block will be > attempted. There is no limit that I know of for how many method blocks > you wish to employ -- but 1 or 2 is the norm...anymore tends to suggest > paranoid tendencies ;-) > > As additional information on this subject, the following is from the > schema doc that is mentioned above: > > > A Note On Fencing > > > Fencing is specified within the cluster.conf file in two places. The first > place is within the tag. Any device used for fencing a node > must be defined here as a first. This applies to power > switches (APC, WTI, etc.) with multiple ports that are able to fence > multiple > cluster nodes, as well as fabric switches and baseboard management fence > strategies (iLO, RSA, IPMI, Drac, etc.) that are usually 1 to 1 in nature; > that is, one specified fence device is able to fence only one node. > > After defining the fence devices to be used in the cluster, it is necessary > to > associate the fence device listings with specific cluster nodes. The second > place that fencing is specified within cluster.conf is within the > > tag. Beneath the tag, is a tag. Beneath the > tag is > one or more tag sets. Within a tag set, is a tag > set. > This is where the actual association between and node takes > place. > A tag has a required "name" attribute that refers to the name of > one > of the 's specified in the section of > cluster.conf. > > More about blocks: A method block is like a fence level. If a > primary fence method is selected, yet the user wants to define a backup > method > in case the first fence method fails, this is done by defining two > blocks for a cluster node, each with a unique name parameter. The fence > daemon > will call each fence method in the order they are specified under the > tag set. > > Fence specification within cluster.conf offers one other feature for > customizing fence action. Within a block, it is allowable to list > more than one . This is useful when fencing a node with redundant > power supplies, for example. The fence daemon will run the agent for each > device listed within a block before determining success or failure. > > I hope this sets you up with all that you need. > > -J > >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From lhh at redhat.com Mon Jan 8 15:43:26 2007 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 08 Jan 2007 10:43:26 -0500 Subject: [Linux-cluster] qdiskd + cman: trying to fix the use of quorumdev_poll. In-Reply-To: <1168198194.4309.19.camel@localhost> References: <1168198194.4309.19.camel@localhost> Message-ID: <1168271006.15369.22.camel@rei.boston.devel.redhat.com> On Sun, 2007-01-07 at 20:29 +0100, Simone Gotti wrote: > Problem 2) > > After fixing Problem 1, if I set in the quorumd tag of cluster.conf an > interval > quorumdev_poll/1000*2 the quorum is lost then regained over > and over as the polling frequency of qdiskd is less than the polling one > of cman. > Probably the right thing to do is to calculate the value of > quorumdev_poll from the ccs return value of "/cluster/quorumd/@interval" > and quorumdev_poll=interval*1000*2 should be ok. 
I think the poll rate should be closer to (interval * tko * 1000) [10 seconds by default] - and not a function of just the quorum disk interval. This is because after (interval*tko*1000), the master node of the cluster will write an eviction message to a hung node - and that's when qdiskd will either reboot the node or tell CMAN that its votes are no longer valid. I do not think it will cause any problems per se, but dropping qdiskd's votes after ~2 seconds when the qdisk master won't write an eviction notice for another ~8 seconds seems a bit odd. Normal node failure delay should be >= 2*(i*t*1000). There's a parameter in the tag (which defaults to 5,000ms) - which should be 2 * interval * tko * 1000, but I don't recall what it is right now. qdiskd needs to time out before CMAN does. While it doesn't have to be "half or less", it's a good paranoia factor that's easy to remember, and it gives the node plenty of time. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From graeme.crawford at gmail.com Mon Jan 8 18:20:36 2007 From: graeme.crawford at gmail.com (Graeme Crawford) Date: Mon, 8 Jan 2007 20:20:36 +0200 Subject: [Linux-cluster] Removing a node from a running cluster In-Reply-To: References: Message-ID: <326f0a380701081020j16c6366r7e970cc1b76fca04@mail.gmail.com> Next time, run "cman_tool leave" it has a few pre-req's so check the man page. Then a "cman_tool expected vote_num" should sort out your quorum/votes. graeme. On 1/4/07, Pena, Francisco Javier wrote: > Hello, > > I am finding a strange cman behavior when removing a node from a running cluster. The starting point is: > > - 3 nodes running RHEL 4 U4, GFS 6.1 (1 vote per node) > - Quorum disk (4 votes) > > I stop all cluster services on node 3, then modify the cluster.conf file to remove the node (and adjust the quorum disk votes to 3), and then "ccs_tool update" and "cman_tool version -r ". The cluster services keep running, however it looks like cman is not completely in sync with ccsd: > > # ccs_tool lsnode > > Cluster name: TestCluster, config_version: 9 > > Nodename Votes Nodeid Iface Fencetype > gfsnode1 1 1 iLO_NODE1 > gfsnode2 1 2 iLO_NODE2 > > > # cman_tool nodes > > Node Votes Exp Sts Name > 0 4 0 M /dev/emcpowera1 > 1 1 3 M gfsnode1 > 2 1 3 M gfsnode2 > 3 1 3 X gfsnode3 > > # cman_tool status > > Protocol version: 5.0.1 > Config version: 9 > Cluster name: TestCluster > Cluster ID: 62260 > Cluster Member: Yes > Membership state: Cluster-Member > Nodes: 2 > Expected_votes: 3 > Total_votes: 6 > Quorum: 4 > Active subsystems: 9 > Node name: gfsnode1 > Node ID: 1 > Node addresses: A.B.C.D > > CMAN still thinks the third node is part of the cluster, but has just stopped working. In addition to that, it is not updating the number of votes for the quorum disk. If I completely restart the cluster services on all nodes, I get the right information: > > - Correct votes for the quorum disk > - Third node dissappears > - The Expected_votes value is now 2 > > I know from a previous post that two node clusters are a special case, even with quorum disk, but I am pretty sure the same problem will happen with higher node counts (I just do not have enough hardware to test it). > > So, is this considered as a bug or is it expected that the information from removed nodes is still there until the whole cluster is restarted? 
> > Thanks in advance, > > Javier Pe?a > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From isplist at logicore.net Mon Jan 8 18:31:01 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Mon, 8 Jan 2007 12:31:01 -0600 Subject: [Linux-cluster] Removing a node from a running cluster In-Reply-To: <326f0a380701081020j16c6366r7e970cc1b76fca04@mail.gmail.com> Message-ID: <20071812311.543597@leena> > Next time, run "cman_tool leave" it has a few pre-req's so check the man > page. I have these problems also, trying to shut down the cluster, I get; cman_tool leave; cman_tool: Can't leave cluster while there are 6 active subsystems Mike From lshen at cisco.com Mon Jan 8 18:39:07 2007 From: lshen at cisco.com (Lin Shen (lshen)) Date: Mon, 8 Jan 2007 10:39:07 -0800 Subject: [Linux-cluster] Remove the clusterness from GFS Message-ID: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> All we need is a cluster file system to aggregate local disks attached to different nodes into a shared storage pool. GFS+GNBD fits in our requirement nicely except the cluster suite that comes with it. We really don't need/want to turn our system into a cluster by using GFS since we're not very clear about what are the side effects that would bring in. Would it slow down the system more, take up more memory and affect the system bootup and shutdown sequencies etc? How easy is it to remove some or all of the clusterness from GFS such as fencing, cman and ccsd stuff? I understand that things like dlm must stay for GFS to work. Thanks lin From lhh at redhat.com Mon Jan 8 18:50:03 2007 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 08 Jan 2007 13:50:03 -0500 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> References: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> Message-ID: <1168282204.15369.43.camel@rei.boston.devel.redhat.com> On Mon, 2007-01-08 at 10:39 -0800, Lin Shen (lshen) wrote: > How easy is it to > remove some or all of the clusterness from GFS such as fencing, cman and > ccsd stuff? I understand that things like dlm must stay for GFS to work. I would think it is very difficult. You can use GFS on *one* node without a cluster. In order to use a clustered file system, you need a cluster. The cluster acts as the control mechanism for accessing the file system. Without it, each computer accessing GFS will have no knowledge of when it is safe to write to or read from the file system. This will lead to file system corruption very quickly. If you absolutely can not have a bit of "cluster software running", you'll probably need to use a client/server approach like NFS instead of a cluster file system like GFS. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From isplist at logicore.net Mon Jan 8 18:52:59 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Mon, 8 Jan 2007 12:52:59 -0600 Subject: [Linux-cluster] Removing a node from a running cluster In-Reply-To: <326f0a380701081020j16c6366r7e970cc1b76fca04@mail.gmail.com> Message-ID: <200718125259.606965@leena> Fixed my shut down problems so anyone else having issues, here's how it works. Man for cman_tool says; // leave; Tells CMAN to leave the cluster. 
You cannot do this if there are subsystems (eg DLM, GFS) active. You should dismount all GFS filesystems, shutdown CLVM, fenced and anything else using the cluster manager before using cman_tool leave. Look at cman_tool status|services to see how many (and which) services are running. \\ Answers all :). Mike From jparsons at redhat.com Mon Jan 8 19:04:45 2007 From: jparsons at redhat.com (James Parsons) Date: Mon, 08 Jan 2007 14:04:45 -0500 Subject: [Linux-cluster] Removing a node from a running cluster In-Reply-To: <200718125259.606965@leena> References: <200718125259.606965@leena> Message-ID: <45A295CD.5030009@redhat.com> isplist at logicore.net wrote: >Fixed my shut down problems so anyone else having issues, here's how it works. > >Man for cman_tool says; > >// >leave; > >Tells CMAN to leave the cluster. You cannot do this if there are subsystems >(eg DLM, GFS) active. >You should dismount all GFS filesystems, shutdown CLVM, fenced and anything >else using the cluster manager before using cman_tool leave. Look at >cman_tool status|services to see how many (and which) services are running. >\\ > >Answers all :). > WARNING: Shameless Promotion -- Conga does all of these things for you in a browser window...there is a dropdown menu on the node page that offers the user the option to have a node leave or join a cluster, completely delete a node, reboot a node, or use the fence subsystem to fence a node. With one mouse click and a confirmation dialog, all necessary services are checked and shut down for you and the node is removed/deleted/etc. When you add a new node, you enter the ipaddr/hostname for the new node, and then all necessary packages are yummed and installed, all necessary services started, and a new configuration file reflecting the new node is propagated. What if you add a node to a two-node cluster that does not use quorum disk, you ask? Conga removes the two_node=1 attr from the <cman> tag and reminds you that the cluster needs to be restarted...and provides a link to the appropriate cluster page where one mouse click and a confirmation dialog will restart the whole cluster. -J From lshen at cisco.com Mon Jan 8 19:26:11 2007 From: lshen at cisco.com (Lin Shen (lshen)) Date: Mon, 8 Jan 2007 11:26:11 -0800 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <1168282204.15369.43.camel@rei.boston.devel.redhat.com> Message-ID: <08A9A3213527A6428774900A80DBD8D80336076E@xmb-sjc-222.amer.cisco.com> > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger > Sent: Monday, January 08, 2007 10:50 AM > To: linux clustering > Subject: Re: [Linux-cluster] Remove the clusterness from GFS > > On Mon, 2007-01-08 at 10:39 -0800, Lin Shen (lshen) wrote: > > How easy is it to > > remove some or all of the clusterness from GFS such as > fencing, cman > > and ccsd stuff? I understand that things like dlm must stay > for GFS to work. > > I would think it is very difficult. > > You can use GFS on *one* node without a cluster. > > In order to use a clustered file system, you need a cluster. > The cluster acts as the control mechanism for accessing the > file system. > Without it, each computer accessing GFS will have no > knowledge of when it is safe to write to or read from the > file system. This will lead to file system corruption very quickly. I thought that's the duty of DLM.
> > If you absolutely can not have a bit of "cluster software > running", you'll probably need to use a client/server > approach like NFS instead of a cluster file system like GFS. How about Luster? It's a cluster file system, but seems to me it doesn't require much extra cluster software. Thanks Lin > > -- Lon > From isplist at logicore.net Mon Jan 8 19:35:53 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Mon, 8 Jan 2007 13:35:53 -0600 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <326f0a380701081020j16c6366r7e970cc1b76fca04@mail.gmail.com> Message-ID: <200718133553.024528@leena> Ok, confusion again... why does this work on one node but not another. They are identical nodes in every way. # more stop_gfs /etc/init.d/httpd stop umount /var/www vgchange -aln /etc/init.d/clvmd stop fence_tool leave /etc/init.d/fenced stop cman_tool leave killall ccsd On some nodes, I'm still getting; cman_tool: Can't leave cluster while there are 1 active subsystems Mike >Fixed my shut down problems so anyone else having issues, here's how >it works. >Man for cman_tool says; >// >leave; >Tells CMAN to leave the cluster. You cannot do this if there are subsystems >(eg DLM, GFS) active. >You should dismount all GFS filesystems, shutdown CLVM, fenced and anything >else using the cluster manager before using cman_tool leave. Look at >cman_tool status|services to see how many (and which) services are >running. >\\ >Answers all :). >Mike From lhh at redhat.com Mon Jan 8 22:45:45 2007 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 08 Jan 2007 17:45:45 -0500 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <08A9A3213527A6428774900A80DBD8D80336076E@xmb-sjc-222.amer.cisco.com> References: <08A9A3213527A6428774900A80DBD8D80336076E@xmb-sjc-222.amer.cisco.com> Message-ID: <1168296345.15369.144.camel@rei.boston.devel.redhat.com> On Mon, 2007-01-08 at 11:26 -0800, Lin Shen (lshen) wrote: > > > > If you absolutely can not have a bit of "cluster software > > running", you'll probably need to use a client/server > > approach like NFS instead of a cluster file system like GFS. > > How about Luster? It's a cluster file system, but seems to me it doesn't > require much extra cluster software. Lustre clients do not need to be cluster aware. (Neither do NFS clients.) If you are willing to sacrifice fault tolerance, you can run Lustre without a cluster stack. If you want fault tolerance, you have to go get a third-party cluster stack, like heartbeat (or linux-cluster; but no one's done it AFAIK), to provide the failover. OSS/OST locations are stored in a replicated LDAP database, which you must set up as well. As a side note, I think HP was working on building a (non-Free) metadata server cluster product for Lustre: http://h20311.www2.hp.com/HPC/cache/276636-0-0-0-121.html GFS has no concept of "client" and "server". If you mount a GFS volume, you need to be part of that file system's cluster. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From jvantuyl at engineyard.com Tue Jan 9 07:25:36 2007 From: jvantuyl at engineyard.com (Jayson Vantuyl) Date: Tue, 9 Jan 2007 01:25:36 -0600 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> References: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> Message-ID: <325D4B40-0CC0-42C8-B96D-D5EAE5BBBC8C@engineyard.com> On Jan 8, 2007, at 12:39 PM, Lin Shen (lshen) wrote: > All we need is a cluster file system to aggregate local disks attached > to different nodes into a shared storage pool. GFS+GNBD fits in our > requirement nicely except the cluster suite that comes with it. We > really don't need/want to turn our system into a cluster by using GFS > since we're not very clear about what are the side effects that would > bring in. Would it slow down the system more, take up more memory and > affect the system bootup and shutdown sequencies etc? How easy is > it to > remove some or all of the clusterness from GFS such as fencing, > cman and > ccsd stuff? I understand that things like dlm must stay for GFS to > work. Dlm must know the nodes in the cluster. It most know when they are there. That's CMAN. It also must have all of the configuration to support knowing that. That's CCSd. GFS must be able to handle a node failure of any kind. That's fencing. Asking to run GFS without CMAN, fencing, and CCSd is like asking to run PHPmyadmin without Apache, PHP, or MySQL. If you aren't sharing the data between two hosts simultaneously, you might try ReiserFS/XFS with CLVM. CLVM still requires the CMAN stack but it doesn't introduce some of the more exciting failure behavior that GFS can. -- Jayson Vantuyl Systems Architect Engine Yard jvantuyl at engineyard.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaap at sara.nl Tue Jan 9 08:42:40 2007 From: jaap at sara.nl (Jaap Dijkshoorn) Date: Tue, 9 Jan 2007 09:42:40 +0100 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <200718133553.024528@leena> Message-ID: <339554D0FE9DD94A8E5ACE4403676CEB01DA2541@douwes.ka.sara.nl> MIke, > Ok, confusion again... why does this work on one node but not > another. They > are identical nodes in every way. > > # more stop_gfs > > /etc/init.d/httpd stop > umount /var/www > vgchange -aln > /etc/init.d/clvmd stop > fence_tool leave > /etc/init.d/fenced stop > cman_tool leave > killall ccsd > > On some nodes, I'm still getting; > > cman_tool: Can't leave cluster while there are 1 active subsystems > > Mike You should check with cman_tool services(said below), which services are still running/updating/joining etc. It sometimes happen that a service cant be shutdown nicely. You can try to kill daemons by hand with a soft/hard kill. Met vriendelijke groet, Kind Regards, Jaap P. Dijkshoorn Group Leader Cluster Computing Systems Programmer mailto:jaap at sara.nl http://home.sara.nl/~jaapd SARA Computing & Networking Services Kruislaan 415 1098 SJ Amsterdam Tel: +31-(0)20-5923000 Fax: +31-(0)20-6683167 http://www.sara.nl > > > > >Fixed my shut down problems so anyone else having issues, here's how > >it works. > > >Man for cman_tool says; > > >// > >leave; > > >Tells CMAN to leave the cluster. You cannot do this if there > are subsystems > >(eg DLM, GFS) active. 
> >You should dismount all GFS filesystems, shutdown CLVM, > fenced and anything > >else using the cluster manager before using cman_tool leave. > Look at > >cman_tool status|services to see how many (and which) > services are > >running. > >\\ > > >Answers all :). > > >Mike > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 3199 bytes Desc: not available URL: From ramon at vanalteren.nl Tue Jan 9 12:32:07 2007 From: ramon at vanalteren.nl (Ramon van Alteren) Date: Tue, 09 Jan 2007 13:32:07 +0100 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <200718133553.024528@leena> References: <200718133553.024528@leena> Message-ID: <45A38B47.1060407@vanalteren.nl> isplist at logicore.net wrote: > Ok, confusion again... why does this work on one node but not another. They > are identical nodes in every way. > > # more stop_gfs > > /etc/init.d/httpd stop > umount /var/www > vgchange -aln > /etc/init.d/clvmd stop > fence_tool leave > /etc/init.d/fenced stop > cman_tool leave > killall ccsd > > On some nodes, I'm still getting; > > cman_tool: Can't leave cluster while there are 1 active subsystems > > Check with cman_tool services One that bit me before is *thinking* I unmounted a gfs system but it failed because I was still running nfs which exported one of the gfs filesystems. Ramon From pcaulfie at redhat.com Tue Jan 9 13:34:47 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 09 Jan 2007 13:34:47 +0000 Subject: [Linux-cluster] qdiskd + cman: trying to fix the use of quorumdev_poll. In-Reply-To: <1168198194.4309.19.camel@localhost> References: <1168198194.4309.19.camel@localhost> Message-ID: <45A399F7.8070008@redhat.com> Simone Gotti wrote: > Hi all, > > I'm using the openais based cman-2.0.35.el5 and I'm trying to understand > how the quorum disk concept is implemented in rhcs, after various > experiments I think that I found at least 2 problems: > > Problem 1) > > Little bug in the quorum disk polling mechanism: > > looking at the code in cman/daemon/commands.c the variable > quorumdev_poll = 10000 is expressed in milliseconds and used to call > "quorum_device_timer_fn" every quorumdev_poll interval to check if > qdiskd is informing cman that the node can use the quorum votes. > > The same variable is then used in quorum_device_timer_fn, but here it's > used as seconds: > > if (quorum_device->last_hello.tv_sec + quorumdev_poll < now.tv_sec) { > > so, when the qdisks dies, or the access to the quorum disk is lost it > will take more than 2 hours to notify this and recalculate the quorum. 
> > After changing the line: > ======================================================================== > --- cman-2.0.35.orig/cman/daemon/commands.c 2007-01-07 > 21:01:30.000000000 +0100 > +++ cman-2.0.35.patched/cman/daemon/commands.c 2007-01-05 > 18:12:33.000000000 +0100 > @@ -1038,15 +1037,12 @@ static void ccsd_timer_fn(void *arg) > > static void quorum_device_timer_fn(void *arg) > { > struct timeval now; > if (!quorum_device || quorum_device->state == NODESTATE_DEAD) > return; > > gettimeofday(&now, NULL); > - if (quorum_device->last_hello.tv_sec + quorumdev_poll < > now.tv_sec) { > + if (quorum_device->last_hello.tv_sec + quorumdev_poll/1000 < > now.tv_sec) { > quorum_device->state = NODESTATE_DEAD; > log_msg(LOG_INFO, "lost contact with quorum device\n"); > recalculate_quorum(0); > ======================================================================== > Thanks. I've committed that version for now. > it worked. A more precise fix should be the use if tv_usec/1000 instead > of tv_sec. True, it needs to take both into account. For the sake of time I've left the granularity at seconds. -- patrick From pcaulfie at redhat.com Tue Jan 9 13:35:59 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 09 Jan 2007 13:35:59 +0000 Subject: [Linux-cluster] qdiskd + cman: trying to fix the use of quorumdev_poll. In-Reply-To: <1168271006.15369.22.camel@rei.boston.devel.redhat.com> References: <1168198194.4309.19.camel@localhost> <1168271006.15369.22.camel@rei.boston.devel.redhat.com> Message-ID: <45A39A3F.7060208@redhat.com> Lon Hohberger wrote: > On Sun, 2007-01-07 at 20:29 +0100, Simone Gotti wrote: >> Problem 2) >> >> After fixing Problem 1, if I set in the quorumd tag of cluster.conf an >> interval > quorumdev_poll/1000*2 the quorum is lost then regained over >> and over as the polling frequency of qdiskd is less than the polling one >> of cman. >> Probably the right thing to do is to calculate the value of >> quorumdev_poll from the ccs return value of "/cluster/quorumd/@interval" >> and quorumdev_poll=interval*1000*2 should be ok. > > I think the poll rate should be closer to (interval * tko * 1000) [10 > seconds by default] - and not a function of just the quorum disk > interval. > > This is because after (interval*tko*1000), the master node of the > cluster will write an eviction message to a hung node - and that's when > qdiskd will either reboot the node or tell CMAN that its votes are no > longer valid. > > I do not think it will cause any problems per se, but dropping qdiskd's > votes after ~2 seconds when the qdisk master won't write an eviction > notice for another ~8 seconds seems a bit odd. > > Normal node failure delay should be >= 2*(i*t*1000). There's a > parameter in the tag (which defaults to 5,000ms) - which should > be 2 * interval * tko * 1000, but I don't recall what it is right now. > > qdiskd needs to time out before CMAN does. While it doesn't have to be > "half or less", it's a good paranoia factor that's easy to remember, and > it gives the node plenty of time. lon: do you reckon we need a blocker bug for "problem 1)" ? -- patrick From isplist at logicore.net Tue Jan 9 15:45:43 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 9 Jan 2007 09:45:43 -0600 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <339554D0FE9DD94A8E5ACE4403676CEB01DA2541@douwes.ka.sara.nl> Message-ID: <20071994543.553454@leena> Hi there, This is my current shutdown script which works on some servers, not on others. 
/etc/init.d/httpd stop umount /var/www vgchange -aln /etc/init.d/clvmd stop fence_tool leave /etc/init.d/fenced stop cman_tool leave killall ccsd I run it... Deactivating VG VolGroup01: [ OK ] Deactivating VG VolGroup02: [ OK ] Deactivating VG VolGroup03: [ OK ] Deactivating VG VolGroup04: [ OK ] Stopping clvm: [ OK ] Stopping fence domain: [ OK ] cman_tool: Can't leave cluster while there are 4 active subsystems # cman_tool services Service Name GID LID State Code User: "usrm::manager" 13 6 run - [2 3 4 6 5 7 8] What's usrm::manager? I can't seem to find anything on the redhat site and online searches lead to endless 'stuff'. I'm guessing what ever this is, it's the problem? Mike > You should check with cman_tool services(said below), which services are > still running/updating/joining etc. It sometimes happen that a service > cant be shutdown nicely. You can try to kill daemons by hand with a > soft/hard kill. > > > Met vriendelijke groet, Kind Regards, > > Jaap P. Dijkshoorn > Group Leader Cluster Computing > Systems Programmer > mailto:jaap at sara.nl http://home.sara.nl/~jaapd > > SARA Computing & Networking Services > Kruislaan 415 1098 SJ Amsterdam > Tel: +31-(0)20-5923000 > Fax: +31-(0)20-6683167 > http://www.sara.nl From chawkins at bplinux.com Tue Jan 9 15:55:51 2007 From: chawkins at bplinux.com (Christopher Hawkins) Date: Tue, 9 Jan 2007 10:55:51 -0500 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <20071994543.553454@leena> Message-ID: <200701091527.l09FRFAA002785@mail2.ontariocreditcorp.com> Are you unmounting the GFS filesystem first? That should be the first thing in your script... > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > isplist at logicore.net > Sent: Tuesday, January 09, 2007 10:46 AM > To: linux clustering > Subject: RE: [Linux-cluster] Can't leave cluster > > Hi there, > > This is my current shutdown script which works on some > servers, not on others. > > /etc/init.d/httpd stop > umount /var/www > vgchange -aln > /etc/init.d/clvmd stop > fence_tool leave > /etc/init.d/fenced stop > cman_tool leave > killall ccsd > > I run it... > > Deactivating VG VolGroup01: [ OK ] > Deactivating VG VolGroup02: [ OK ] > Deactivating VG VolGroup03: [ OK ] > Deactivating VG VolGroup04: [ OK ] > Stopping clvm: [ OK ] > Stopping fence domain: [ OK ] > cman_tool: Can't leave cluster while there are 4 active subsystems > > # cman_tool services > Service Name GID LID > State Code > User: "usrm::manager" 13 6 run - > [2 3 4 6 5 7 8] > > What's usrm::manager? I can't seem to find anything on the > redhat site and online searches lead to endless 'stuff'. I'm > guessing what ever this is, it's the problem? > > Mike > > > You should check with cman_tool services(said below), which > services > > are still running/updating/joining etc. It sometimes happen that a > > service cant be shutdown nicely. You can try to kill > daemons by hand > > with a soft/hard kill. > > > > > > Met vriendelijke groet, Kind Regards, > > > > Jaap P. 
Dijkshoorn > > Group Leader Cluster Computing > > Systems Programmer > > mailto:jaap at sara.nl http://home.sara.nl/~jaapd > > > > SARA Computing & Networking Services > > Kruislaan 415 1098 SJ Amsterdam > > Tel: +31-(0)20-5923000 > > Fax: +31-(0)20-6683167 > > http://www.sara.nl > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From isplist at logicore.net Tue Jan 9 16:03:33 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 9 Jan 2007 10:03:33 -0600 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <200701091527.l09FRFAA002785@mail2.ontariocreditcorp.com> Message-ID: <20071910333.437232@leena> Yup, it's the second item in my script. On Tue, 9 Jan 2007 10:55:51 -0500, Christopher Hawkins wrote: > Are you unmounting the GFS filesystem first? That should be the first thing > > in your script... > >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of >> isplist at logicore.net >> Sent: Tuesday, January 09, 2007 10:46 AM >> To: linux clustering >> Subject: RE: [Linux-cluster] Can't leave cluster >> >> Hi there, >> >> This is my current shutdown script which works on some >> servers, not on others. >> >> /etc/init.d/httpd stop >> umount /var/www >> vgchange -aln >> /etc/init.d/clvmd stop >> fence_tool leave >> /etc/init.d/fenced stop >> cman_tool leave >> killall ccsd >> >> I run it... >> >> Deactivating VG VolGroup01: [ OK ] >> Deactivating VG VolGroup02: [ OK ] >> Deactivating VG VolGroup03: [ OK ] >> Deactivating VG VolGroup04: [ OK ] >> Stopping clvm: [ OK ] >> Stopping fence domain: [ OK ] >> cman_tool: Can't leave cluster while there are 4 active subsystems >> >> # cman_tool services >> Service Name GID LID >> State Code >> User: "usrm::manager" 13 6 run - >> [2 3 4 6 5 7 8] >> >> What's usrm::manager? I can't seem to find anything on the >> redhat site and online searches lead to endless 'stuff'. I'm >> guessing what ever this is, it's the problem? >> >> Mike >> >>> You should check with cman_tool services(said below), which >> services >>> are still running/updating/joining etc. It sometimes happen that a >>> service cant be shutdown nicely. You can try to kill >> daemons by hand >>> with a soft/hard kill. >>> >>> >>> Met vriendelijke groet, Kind Regards, >>> >>> Jaap P. Dijkshoorn >>> Group Leader Cluster Computing >>> Systems Programmer >>> mailto:jaap at sara.nl http://home.sara.nl/~jaapd >>> >>> SARA Computing & Networking Services >>> Kruislaan 415 1098 SJ Amsterdam >>> Tel: +31-(0)20-5923000 >>> Fax: +31-(0)20-6683167 >>> http://www.sara.nl >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Tue Jan 9 16:07:54 2007 From: rpeterso at redhat.com (Robert Peterson) Date: Tue, 09 Jan 2007 10:07:54 -0600 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <20071994543.553454@leena> References: <20071994543.553454@leena> Message-ID: <45A3BDDA.6070807@redhat.com> isplist at logicore.net wrote: > # cman_tool services > Service Name GID LID State Code > User: "usrm::manager" 13 6 run - > [2 3 4 6 5 7 8] > > What's usrm::manager? I can't seem to find anything on the redhat site and > online searches lead to endless 'stuff'. I'm guessing what ever this is, it's > the problem? > > Mike > Hi Mike, That's for rgmanager I think. 
Perhaps your script should also do: service rgmanager stop Regards, Bob Peterson Red Hat Cluster Suite From chawkins at bplinux.com Tue Jan 9 16:13:44 2007 From: chawkins at bplinux.com (Christopher Hawkins) Date: Tue, 9 Jan 2007 11:13:44 -0500 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <20071910333.437232@leena> Message-ID: <200701091545.l09Fj7AA003313@mail2.ontariocreditcorp.com> > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > isplist at logicore.net > Sent: Tuesday, January 09, 2007 11:04 AM > To: linux-cluster > Subject: RE: [Linux-cluster] Can't leave cluster > > Yup, it's the second item in my script. Wow, a serious blonde moment. I have had the same issue from time to time (with starting as well as stopping) if the scripts go too fast. I don't recall which component was being sensitive, but you might try adding a sleep 5 here and there or running the commands manually, but with a good pause between them, and see if that changes anything. From lhh at redhat.com Tue Jan 9 17:01:15 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 09 Jan 2007 12:01:15 -0500 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <325D4B40-0CC0-42C8-B96D-D5EAE5BBBC8C@engineyard.com> References: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> <325D4B40-0CC0-42C8-B96D-D5EAE5BBBC8C@engineyard.com> Message-ID: <1168362075.15369.190.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-09 at 01:25 -0600, Jayson Vantuyl wrote: > > If you aren't sharing the data between two hosts simultaneously, you > might try ReiserFS/XFS with CLVM. CLVM still requires the CMAN stack > but it doesn't introduce some of the more exciting failure behavior > that GFS can. If you mount a raw disk partition on one node at a time, you don't even need CLVM :) -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From pcaulfie at redhat.com Tue Jan 9 17:10:32 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 09 Jan 2007 17:10:32 +0000 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <1168362075.15369.190.camel@rei.boston.devel.redhat.com> References: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> <325D4B40-0CC0-42C8-B96D-D5EAE5BBBC8C@engineyard.com> <1168362075.15369.190.camel@rei.boston.devel.redhat.com> Message-ID: <45A3CC88.8030707@redhat.com> Lon Hohberger wrote: > On Tue, 2007-01-09 at 01:25 -0600, Jayson Vantuyl wrote: > >> If you aren't sharing the data between two hosts simultaneously, you >> might try ReiserFS/XFS with CLVM. CLVM still requires the CMAN stack >> but it doesn't introduce some of the more exciting failure behavior >> that GFS can. > > If you mount a raw disk partition on one node at a time, you don't even > need CLVM :) > ...but you do need a /lot/ of care ... -- patrick From Bowie_Bailey at BUC.com Tue Jan 9 17:17:04 2007 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Tue, 9 Jan 2007 12:17:04 -0500 Subject: [Linux-cluster] Can't leave cluster Message-ID: <4766EEE585A6D311ADF500E018C154E302685071@bnifex.cis.buc.com> Christopher Hawkins wrote: > > Wow, a serious blonde moment. I have had the same issue from time to > time (with starting as well as stopping) if the scripts go too fast. 
> I don't recall which component was being sensitive, but you might try > adding a sleep 5 here and there or running the commands manually, but > with a good pause between them, and see if that changes anything. I had the same issue with CMAN failing to stop. I found that adding a "sleep 5" before the call to cman_tool in the init script fixed it. -- Bowie From kitgerrits at gmail.com Tue Jan 9 17:23:56 2007 From: kitgerrits at gmail.com (Kit Gerrits) Date: Tue, 9 Jan 2007 18:23:56 +0100 Subject: [Linux-cluster] Cluster software won't start at boot Message-ID: <005501c73412$f1b16920$4c4b3291@kagtqp> Ladies and Gentlemen, I have an interesting issue: I hafe a pair of RHEL 2.1 machines in a cluster. The cluster service is enabled for runlevels 2 - 5 Entering 'service cluster start' as root works fine Oddly enough, the cluster service will -not start- at bootup Any ideas? [root at nzcs1 root]# chkconfig --list |grep cluster cluster 0:off 1:off 2:on 3:on 4:on 5:on 6:off # after manually starting the cluster service: [root at nzcs1 root]# service cluster status cluhbd (pid 4997) is running. clusvcmgrd (pid 4993) is running. cluquorumd (pid 4989) is running. clupowerd (pid 4995) is running. clumibd (pid 4999) is running. cluscand (pid 5003) is running. clurmtabd (pid 5001) is running. [root at nzcs1 root]# grep cluster /var/log/messages Jan 9 12:11:27 nzcs1 cluster[6946]: Shutting down Cluster Manager services Jan 9 12:11:27 nzcs1 cluster[7007]: Completed shutdown of Cluster Manager Jan 9 12:31:26 nzcs1 bigbrother: ^I^IStarting external script bb-rh-cluster.sh Jan 9 12:36:15 nzcs1 cluster[4980]: Starting cluster manager services Any hetlp would be appreciated! Thanks, Kit From hosting at sylconia.nl Tue Jan 9 17:59:39 2007 From: hosting at sylconia.nl (Support @ Sylconia) Date: Tue, 09 Jan 2007 18:59:39 +0100 Subject: [Linux-cluster] performance on a 4 node cluster after 6/7 days Message-ID: <19869.1168365584@sylconia.nl> Dear reader, short version: we are experiencing performance problems after 6/7 days of running time on the non lock master(s). long version: We have the following setup: a 4 node RHCS cluster where node 4 (backend) exports 3 raid disks via gnbd to the other nodes (1-3) The other 3 nodes (frontend) import those exports via gnbd. We have created 4 LV's via clvmd on top of those imported disks. logical volumes /dev/mapper/vg0-tmp 9.7G 1.1M 9.7G 1% /phpsessions /dev/mapper/vg0-config 9.7G 152K 9.7G 1% /config /dev/mapper/vg0-logging 100G 126M 100G 1% /var/log/httpd /dev/mapper/vg0-www 500G 222M 500G 1% /www as lock manager we use lock_dlm with following rpm's installed dlm-kernel-2.6.9-44.3 dlm-1.0.1-1 gfs version gfs_tool -V gfs_tool 6.1.6 (built Aug 25 2006 15:17:50) gnbd version Copyright (C) Red Hat, Inc. 2004-2005 All rights reserved. gnbd_import 1.0.8. (built Nov 14 2006 02:18:52) Copyright (C) Red Hat, Inc. 2004 All rights reserved. cman_tool status Protocol version: 5.0.1 os version centos 4.4 on all nodes all rpm's are from the centos.org website. all nodes are connected via a seperate NIC (GB) and private gigabit VLAN no other network traffic is on this VLAN. Now this is all running fine till 6 or 7 days running time than the nodes which are not lock master are becoming very slow in for example a df command. 
while df runs the cpu load rises to 4 or 5 and the node is not very responsive (it seems the os hangs for a few seconds) Running the top command at the same time shows 18524 root 15 -10 0 0 0 R 97.4 0.0 1:08.99 dlm_sendd 12959 root 18 0 4184 592 528 R 1.9 0.1 0:00.17 df so i think the problem is in dlm but i do not know how to debug this can someone give me some pointers? I checked /proc/cluster/dlm* but honestly do not know what to look for. regards Constan Sylconia.nl ---- This message was sent via a demo version of - http://atmail.com/ From rhcluster at natecarlson.com Tue Jan 9 18:32:40 2007 From: rhcluster at natecarlson.com (Nate Carlson) Date: Tue, 9 Jan 2007 12:32:40 -0600 (CST) Subject: [Linux-cluster] Upgrading filesystem from gfs -> gfs2 Message-ID: Hello, Just curious - how hard is it to upgrade a filesystem from gfs to gfs2? I'm not finding a FAQ for this anywhere.. :( ------------------------------------------------------------------------ | nate carlson | natecars at natecarlson.com | http://www.natecarlson.com | | depriving some poor village of its idiot since 1981 | ------------------------------------------------------------------------ From jon at levanta.com Tue Jan 9 18:50:53 2007 From: jon at levanta.com (Jonathan Biggar) Date: Tue, 09 Jan 2007 10:50:53 -0800 Subject: [Linux-cluster] Power based fencing in cluster causes single point of failure that can take down a cluster Message-ID: If we set up a cluster and use network power switches for fencing, won't the failure of the power switch attached to a cluster member cause all services that were running on that node to fail to migrate to other cluster members? This seems to happen to us in practice, because fencing the offline member fails due to the power switch being unavailable, so rgmanager never migrates the failed service(s) to another member. Is there a general solution to this problem that I'm missing? -- Jon Biggar Levanta jon at levanta.com 650-403-7252 From jwhiter at redhat.com Tue Jan 9 19:00:58 2007 From: jwhiter at redhat.com (Josef Whiter) Date: Tue, 9 Jan 2007 14:00:58 -0500 Subject: [Linux-cluster] Power based fencing in cluster causes single point of failure that can take down a cluster In-Reply-To: References: Message-ID: <20070109190056.GG21486@korben.rdu.redhat.com> You can either have redundant fence devices, or look into qdisk. Josef On Tue, Jan 09, 2007 at 10:50:53AM -0800, Jonathan Biggar wrote: > If we set up a cluster and use network power switches for fencing, won't > the failure of the power switch attached to a cluster member cause all > services that were running on that node to fail to migrate to other > cluster members? > > This seems to happen to us in practice, because fencing the offline > member fails due to the power switch being unavailable, so rgmanager > never migrates the failed service(s) to another member. > > Is there a general solution to this problem that I'm missing? 
> > -- > Jon Biggar > Levanta > jon at levanta.com > 650-403-7252 > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jon at levanta.com Tue Jan 9 19:22:10 2007 From: jon at levanta.com (Jonathan Biggar) Date: Tue, 09 Jan 2007 11:22:10 -0800 Subject: [Linux-cluster] Re: Power based fencing in cluster causes single point of failure that can take down a cluster In-Reply-To: <20070109190056.GG21486@korben.rdu.redhat.com> References: <20070109190056.GG21486@korben.rdu.redhat.com> Message-ID: Josef Whiter wrote: > You can either have redundant fence devices, or look into qdisk. Thanks for the reply. Can you explain how qdisk would solve the problem? It seems to me that the fencing device failing which simultaneously causes the cluster member to fail wouldn't be affected by qdisk. Does qdisk have some feedback mechanism that tells the cluster that it's ok to restart the failed services on another node without fencing being successful? I can't see how that can work reliably and still prevent split brain problems. > On Tue, Jan 09, 2007 at 10:50:53AM -0800, Jonathan Biggar wrote: >> If we set up a cluster and use network power switches for fencing, won't >> the failure of the power switch attached to a cluster member cause all >> services that were running on that node to fail to migrate to other >> cluster members? >> >> This seems to happen to us in practice, because fencing the offline >> member fails due to the power switch being unavailable, so rgmanager >> never migrates the failed service(s) to another member. >> >> Is there a general solution to this problem that I'm missing? -- Jon Biggar Levanta jon at levanta.com 650-403-7252 From rpeterso at redhat.com Tue Jan 9 19:23:20 2007 From: rpeterso at redhat.com (Robert Peterson) Date: Tue, 09 Jan 2007 13:23:20 -0600 Subject: [Linux-cluster] Upgrading filesystem from gfs -> gfs2 In-Reply-To: References: Message-ID: <45A3EBA8.9000907@redhat.com> Nate Carlson wrote: > Hello, > > Just curious - how hard is it to upgrade a filesystem from gfs to gfs2? > > I'm not finding a FAQ for this anywhere.. :( Hi Nate, I wrote a little tool called gfs2_convert whose job is to convert a file system from gfs1 to gfs2. You just do something like: gfs2_convert /your/file/system And after it gives you some warnings and asks you the all-important "are you sure" question, it converts it to gfs2. Pretty simple, really. But bear in mind that gfs2 is still being worked on, so you should not use it for a production box yet. And always--ALWAYS--back up your gfs1 file system before running the tool, because it's a brand new app and who knows; it might have bugs. I tested it, even under conditions where I would interrupt it during critical phases and restart it, etc., so hopefully it won't have problems. And if you do have problems, you know who to open the bugzilla up against. ;) I also recommend that you run gfs_fsck on your file system first, just in case there's some kind of weird fs corruption that might confuse the tool. 
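Putting those steps together, the offline conversion would look roughly like this (a sketch only -- the device path is a placeholder, and the file system must be unmounted on every node before you start):

   umount /mnt/gfs                    # on every node that has it mounted
   gfs_fsck /dev/myvg/mygfs           # check the gfs1 file system first
   gfs2_convert /dev/myvg/mygfs       # convert in place -- back it up beforehand

After the conversion the file system is mounted with -t gfs2 instead of -t gfs.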
Regards, Bob Peterson Red Hat Cluster Suite From natecars at natecarlson.com Tue Jan 9 19:54:04 2007 From: natecars at natecarlson.com (Nate Carlson) Date: Tue, 9 Jan 2007 13:54:04 -0600 (CST) Subject: [Linux-cluster] Upgrading filesystem from gfs -> gfs2 In-Reply-To: <45A3EBA8.9000907@redhat.com> References: <45A3EBA8.9000907@redhat.com> Message-ID: On Tue, 9 Jan 2007, Robert Peterson wrote: > I wrote a little tool called gfs2_convert whose job is to convert a file > system from gfs1 to gfs2. You just do something like: > > gfs2_convert /your/file/system > > And after it gives you some warnings and asks you the all-important "are > you sure" question, it converts it to gfs2. Pretty simple, really. > But bear in mind that gfs2 is still being worked on, so you should not > use it for a production box yet. Nifty! That's why I asked - I'm rolling out a new cluster, and wanted to go GFS1 since GFS2 is still "in the works", but wanted to make sure there was an easy upgrade path.. :) > And always--ALWAYS--back up your gfs1 file system before running the > tool, because it's a brand new app and who knows; it might have bugs. > I tested it, even under conditions where I would interrupt it during > critical phases and restart it, etc., so hopefully it won't have > problems. And if you do have problems, you know who to open the > bugzilla up against. ;) *grin* > I also recommend that you run gfs_fsck on your file system first, just > in case there's some kind of weird fs corruption that might confuse the > tool. So I guess it's fairly obvious that the FS needs to be offline? ------------------------------------------------------------------------ | nate carlson | natecars at natecarlson.com | http://www.natecarlson.com | | depriving some poor village of its idiot since 1981 | ------------------------------------------------------------------------ From rpeterso at redhat.com Tue Jan 9 20:21:34 2007 From: rpeterso at redhat.com (Robert Peterson) Date: Tue, 09 Jan 2007 14:21:34 -0600 Subject: [Linux-cluster] Upgrading filesystem from gfs -> gfs2 In-Reply-To: References: <45A3EBA8.9000907@redhat.com> Message-ID: <45A3F94E.2090100@redhat.com> Nate Carlson wrote: > Nifty! That's why I asked - I'm rolling out a new cluster, and wanted > to go GFS1 since GFS2 is still "in the works", but wanted to make sure > there was an easy upgrade path.. :) > > So I guess it's fairly obvious that the FS needs to be offline? Yes, the file system definitely needs to be offline and not mounted by any node. BTW, I should also mention that the gfs2_convert tool won't convert file systems unless they have the default (4K) block size. If you created your file system with different block size, then you can't convert it. This was done purposely because of known GFS2 block size issues. Regards, Bob Peterson Red Hat Cluster Suite From lhh at redhat.com Tue Jan 9 20:42:39 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 09 Jan 2007 15:42:39 -0500 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <45A3CC88.8030707@redhat.com> References: <08A9A3213527A6428774900A80DBD8D803360714@xmb-sjc-222.amer.cisco.com> <325D4B40-0CC0-42C8-B96D-D5EAE5BBBC8C@engineyard.com> <1168362075.15369.190.camel@rei.boston.devel.redhat.com> <45A3CC88.8030707@redhat.com> Message-ID: <1168375359.15369.194.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-09 at 17:10 +0000, Patrick Caulfield wrote: > ...but you do need a /lot/ of care ... That is absolutely correct. 
-- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 9 20:45:40 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 09 Jan 2007 15:45:40 -0500 Subject: [Linux-cluster] Cluster software won't start at boot In-Reply-To: <005501c73412$f1b16920$4c4b3291@kagtqp> References: <005501c73412$f1b16920$4c4b3291@kagtqp> Message-ID: <1168375540.15369.198.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-09 at 18:23 +0100, Kit Gerrits wrote: > Ladies and Gentlemen, I have an interesting issue: > > I hafe a pair of RHEL 2.1 machines in a cluster. > The cluster service is enabled for runlevels 2 - 5 > Entering 'service cluster start' as root works fine > Oddly enough, the cluster service will -not start- at bootup > > Any ideas? > > [root at nzcs1 root]# chkconfig --list |grep cluster > cluster 0:off 1:off 2:on 3:on 4:on 5:on 6:off > > # after manually starting the cluster service: > [root at nzcs1 root]# service cluster status > cluhbd (pid 4997) is running. > clusvcmgrd (pid 4993) is running. > cluquorumd (pid 4989) is running. > clupowerd (pid 4995) is running. > clumibd (pid 4999) is running. > cluscand (pid 5003) is running. > clurmtabd (pid 5001) is running. > > [root at nzcs1 root]# grep cluster /var/log/messages > Jan 9 12:11:27 nzcs1 cluster[6946]: Shutting down Cluster Manager > services > Jan 9 12:11:27 nzcs1 cluster[7007]: Completed shutdown of Cluster > Manager > Jan 9 12:31:26 nzcs1 bigbrother: ^I^IStarting external script > bb-rh-cluster.sh > Jan 9 12:36:15 nzcs1 cluster[4980]: Starting cluster manager > services I've not seen this before; usually that "just works". Do you have any more information that could help, i.e.: rpm -q clumanager rpm -qV clumanager -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Tue Jan 9 20:49:03 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 09 Jan 2007 15:49:03 -0500 Subject: [Linux-cluster] Power based fencing in cluster causes single point of failure that can take down a cluster In-Reply-To: <20070109190056.GG21486@korben.rdu.redhat.com> References: <20070109190056.GG21486@korben.rdu.redhat.com> Message-ID: <1168375743.15369.202.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-09 at 14:00 -0500, Josef Whiter wrote: > You can either have redundant fence devices, or look into qdisk. Qdisk doesn't obviate fencing confirmation, I'm afraid; in fact, it uses fencing to kill nodes :( I'd check out fence_scsi as a backup fencing device. You're certainly not the first to ask this question. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From srramasw at cisco.com Tue Jan 9 21:04:55 2007 From: srramasw at cisco.com (Sridharan Ramaswamy (srramasw)) Date: Tue, 9 Jan 2007 13:04:55 -0800 Subject: [Linux-cluster] GFS can be used as root filesystem? Message-ID: As anyone attempted to use GFS client on diskless Linux node to act as its root file system? Thinking about the dependencies of GFS, the likes of CMAN, clvmd, gnbd (if needed) should start before GFS during the boot up process. 
But the concern is CMAN would need to read /etc/cluster/cluster.conf file which won't be available. Other components might need something from filesystem too, like CLVM might look for lvm.conf. Sounds like a chicken & egg problem. As anyone got around these aspects and able to use GFS mount as a root filesystem? Appreciate any ideas on this. thanks, Sridhar -------------- next part -------------- An HTML attachment was scrubbed... URL: From grimme at atix.de Tue Jan 9 21:07:26 2007 From: grimme at atix.de (Marc Grimme) Date: Tue, 9 Jan 2007 22:07:26 +0100 Subject: [Linux-cluster] GFS can be used as root filesystem? In-Reply-To: References: Message-ID: <200701092207.26902.grimme@atix.de> On Tuesday 09 January 2007 22:04, Sridharan Ramaswamy (srramasw) wrote: > As anyone attempted to use GFS client on diskless Linux node to act as > its root file system? > > Thinking about the dependencies of GFS, the likes of CMAN, clvmd, gnbd > (if needed) should start before GFS during the boot up process. But the > concern is CMAN would need to read /etc/cluster/cluster.conf file which > won't be available. Other components might need something from > filesystem too, like CLVM might look for lvm.conf. Sounds like a chicken > & egg problem. > > As anyone got around these aspects and able to use GFS mount as a root > filesystem? > > Appreciate any ideas on this. have a look at www.open-sharedroot.org. Works like a charm. There should also be a HOWTO. Regards Marc. > > thanks, > Sridhar -- Gruss / Regards, Marc Grimme Phone: +49-89 452 3538-14 http://www.atix.de/ http://www.open-sharedroot.org/ ** ATIX - Ges. fuer Informationstechnologie und Consulting mbH Einsteinstr. 10 - 85716 Unterschleissheim - Germany From isplist at logicore.net Tue Jan 9 21:12:25 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 9 Jan 2007 15:12:25 -0600 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <45A3BDDA.6070807@redhat.com> Message-ID: <200719151225.638211@leena> Thanks Bob, Can't recall if I replied to this but have one other question. >What's usrm::manager? I can't seem to find anything on the redhat site and >online searches lead to endless 'stuff'. I'm guessing what ever this is, >it's the problem? > That's for rgmanager I think. Perhaps your script should also do: > service rgmanager stop That was indeed what it was. Here is my final shutdown script; service httpd stop umount /var/www vgchange -aln service clvmd stop fence_tool leave service fenced stop service rgmanager stop cman_tool leave killall ccsd Two questions; 1: I probably don't need the last line in there correct? 2: Can I create a new service so that I can run this script to shut things down cleanly when I want to reboot the node? If so, what is the process? Mike From lhh at redhat.com Tue Jan 9 21:15:15 2007 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 09 Jan 2007 16:15:15 -0500 Subject: R: R: [Linux-cluster] High system CPU usage in one of a two nodecluster In-Reply-To: <1168034698.5634.4.camel@rei.boston.devel.redhat.com> References: <004901c730b9$5112d460$8ec9100a@nicchio> <1168034698.5634.4.camel@rei.boston.devel.redhat.com> Message-ID: <1168377315.15369.207.camel@rei.boston.devel.redhat.com> On Fri, 2007-01-05 at 17:04 -0500, Lon Hohberger wrote: > On Fri, 2007-01-05 at 12:04 +0100, Marco Lusini wrote: > > I was looking at Lon's RPMs, and they are (apparently) > > based on rgmanager 1.9.53-1, while the last released > > package is 1.9.54-1... 
> > Would it be possible to have fixed RPMs compiled wrt the > > last version? > > use .53 for now; I'll build new ones on .54 on Monday. Or.. Tuesday, for those of you who were keeping track. https://bugzilla.redhat.com/bugzilla/process_bug.cgi#c18 -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From isplist at logicore.net Tue Jan 9 21:39:50 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 9 Jan 2007 15:39:50 -0600 Subject: [Linux-cluster] Can't leave cluster In-Reply-To: <45A3BDDA.6070807@redhat.com> Message-ID: <200719153950.361989@leena> Same problem again on another node. stop-cluster-script service httpd stop umount /var/www vgchange -aln service clvmd stop fence_tool leave service fenced stop service rgmanager stop cman_tool leave killall ccsd I run it and; ]# ./stop_gfs Stopping httpd: [ OK ] Found duplicate PV y6nVM03KVVWs0v68yQVmiGruP5hOSv1z: using /dev/sdd not /dev/sda Found duplicate PV wv0qVlspVX11RBlVI5IKyXLAVoH0eiZ3: using /dev/sde not /dev/sdb Found duplicate PV t9Fwnx7n6vrPpCZ8d3XKyO6V6cIvqeWR: using /dev/sdf not /dev/sdc 0 logical volume(s) in volume group "VolGroup01" now active 0 logical volume(s) in volume group "VolGroup04" now active 0 logical volume(s) in volume group "VolGroup03" now active 0 logical volume(s) in volume group "VolGroup02" now active Found duplicate PV y6nVM03KVVWs0v68yQVmiGruP5hOSv1z: using /dev/sdd not /dev/sda Found duplicate PV wv0qVlspVX11RBlVI5IKyXLAVoH0eiZ3: using /dev/sde not /dev/sdb Found duplicate PV t9Fwnx7n6vrPpCZ8d3XKyO6V6cIvqeWR: using /dev/sdf not /dev/sdc Found duplicate PV y6nVM03KVVWs0v68yQVmiGruP5hOSv1z: using /dev/sdd not /dev/sda Found duplicate PV wv0qVlspVX11RBlVI5IKyXLAVoH0eiZ3: using /dev/sde not /dev/sdb Found duplicate PV t9Fwnx7n6vrPpCZ8d3XKyO6V6cIvqeWR: using /dev/sdf not /dev/sdc Deactivating VG VolGroup01: [ OK ] Deactivating VG VolGroup02: [ OK ] Deactivating VG VolGroup03: [ OK ] Deactivating VG VolGroup04: [ OK ] Stopping clvm: [ OK ] Stopping fence domain: [ OK ] Cluster Service Manager is stopped. cman_tool: Can't leave cluster while there are 2 active subsystems # cman_tool services Service Name GID LID State Code From bfilipek at crscold.com Tue Jan 9 23:11:19 2007 From: bfilipek at crscold.com (Brad Filipek) Date: Tue, 9 Jan 2007 17:11:19 -0600 Subject: [Linux-cluster] General 2-node cluster questions Message-ID: <9C01E18EF3BC2448A3B1A4812EB87D0232837D@SRVEDI.upark.crscold.com> I am in the process of setting up a 2-node cluster with a SAN for data storage. I have a few general questions as this is my first time using RHEL CS. I have two boxes with RHEL4U4 and one application. Should the app be installed locally on both nodes, and have the data on the SAN? Or should the app and the data both be on the SAN? This will be an active/passive config. Also, does the app and data both need to sit on a GFS? Thank you, Brad Filipek Confidentiality Notice: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient or the employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. 
If you have received this communication in error, please notify us immediately by email reply or by telephone and immediately delete this message and any attachments. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mparadis at logicore.net Tue Jan 9 23:40:49 2007 From: mparadis at logicore.net (Mike Paradis) Date: Tue, 9 Jan 2007 17:40:49 -0600 Subject: [Linux-cluster] Quick off topic question Message-ID: <200719174049.935516@leena> Can one remotely log bash_history files such as one does with syslog.conf and @192.168.x.x for example? I want to consolidate all of my bash history files onto one of the GFS servers. Mike From isplist at logicore.net Tue Jan 9 23:41:15 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 9 Jan 2007 17:41:15 -0600 Subject: [Linux-cluster] Quick off topic question Message-ID: <200719174115.718127@leena> Can one remotely log bash_history files such as one does with syslog.conf and @192.168.x.x for example? I want to consolidate all of my bash history files onto one of the GFS servers. Mike From riaan at obsidian.co.za Wed Jan 10 08:52:50 2007 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Wed, 10 Jan 2007 10:52:50 +0200 Subject: [Linux-cluster] General 2-node cluster questions In-Reply-To: <9C01E18EF3BC2448A3B1A4812EB87D0232837D@SRVEDI.upark.crscold.com> References: <9C01E18EF3BC2448A3B1A4812EB87D0232837D@SRVEDI.upark.crscold.com> Message-ID: <45A4A962.5080800@obsidian.co.za> Brad Filipek wrote: > I am in the process of setting up a 2-node cluster with a SAN for data > storage. I have a few general questions as this is my first time using > RHEL CS. > > > > I have two boxes with RHEL4U4 and one application. Should the app be > installed locally on both nodes, and have the data on the SAN? Or should > the app and the data both be on the SAN? This will be an active/passive > config. > It is up to you to decide if you want the app on the SAN or not. App locally installed If the App is simple and/or part of the OS FS hierarchy (e.g. apache, or not in /opt), you can install/configure it on node1 and copy the configuration across (keeping in mind that you need to manually keep the configs in sync) App installed to shared storage (for illustration purposes I will use Oracle as the clustered app) If the App is complex and goes into a distinct directory you can partition off (e.g. ORACLE_HOME somewhere in /opt/oracle), it might make more sense to have the whole /opt/oracle on the SAN aswell. Any files that belong to or are required by the application (e.g /etc/oratab, initscripts) would have to be copied across manually from one node to the other. If you don't have an easy way of determining which files are located outside of the shared partition, you might have to install the app twice (once on each side), but it might get confused on the second node since it may get confused by the install done on node1. > > > Also, does the app and data both need to sit on a GFS? > For active-passive: they don't need to, but they can be. If you are using an active-passive cluster and only one node at a time will have the application running (and writing to the partition with the data on), you can use ext3. Ext3 is a lot faster than GFS since it does not require the overhead/complexity of a clustered file system. 
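For illustration, a minimal active/passive service definition in the rm section of cluster.conf might look something like the sketch below; every name, device path, IP address and script location here is a placeholder, so check system-config-cluster or the cluster schema for the exact attributes your version expects:

   <rm>
     <failoverdomains>
       <failoverdomain name="appdomain" ordered="1" restricted="1">
         <failoverdomainnode name="node1" priority="1"/>
         <failoverdomainnode name="node2" priority="2"/>
       </failoverdomain>
     </failoverdomains>
     <resources>
       <fs name="appfs" device="/dev/sdb1" mountpoint="/opt/app" fstype="ext3" force_unmount="1"/>
       <ip address="192.168.1.10" monitor_link="1"/>
       <script name="appinit" file="/etc/init.d/myapp"/>
     </resources>
     <service name="appsvc" domain="appdomain" autostart="1">
       <fs ref="appfs"/>
       <ip ref="192.168.1.10"/>
       <script ref="appinit"/>
     </service>
   </rm>

rgmanager then mounts the ext3 file system, brings up the service IP and runs the init script on whichever node currently owns the service, which is the one-node-at-a-time behaviour described above.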
also - a tip - before you configure the app as a clustered service in rgmanager, make sure that it starts up flawlessly on both sides (after manually moving the VIP and filesystem resources from one node to the other). Otherwise, after you configure the app in rgmanager and things dont work, you may have to troubleshoot both the app startup and rgmanager. greetings Riaan -------------- next part -------------- A non-text attachment was scrubbed... Name: riaan.vcf Type: text/x-vcard Size: 310 bytes Desc: not available URL: From kitgerrits at gmail.com Wed Jan 10 11:44:06 2007 From: kitgerrits at gmail.com (Kit Gerrits) Date: Wed, 10 Jan 2007 12:44:06 +0100 Subject: [Linux-cluster] Cluster software won't start at boot Message-ID: <001801c734ac$a26229f0$4c4b3291@kagtqp> From: Lon Hohberger > I've not seen this before; usually that "just works". Do you have any more information that could help, i.e.: > > rpm -q clumanager > rpm -qV clumanager Well, they people that set that system up are a bit strange. They have runlevel 5 as initdefault, but the system does not show a graphical login at boot. (startx works, though) Fyi: [root at nzcs1 root]# rpm -qi clumanager Name : clumanager Relocations: (not relocateable) Version : 1.0.19 Vendor: Red Hat, Inc. Release : 2 Build Date: Mon 23 Dec 2002 10:08:02 PM MET Install date: Thu 03 Jun 2004 02:55:23 PM MET Build Host: bugs.devel.redhat.com ...yadda yadda yadda... We're carefully considering upgrading some parts of the system... The owners have (now) realised the O/S is actually too old to handle the LTO-1 drives correctly. (The fact that the 2.1 install CD wouldn't recognise the drive wasn't enough of a hint) From lhh at redhat.com Wed Jan 10 14:40:01 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 10 Jan 2007 09:40:01 -0500 Subject: [Linux-cluster] General 2-node cluster questions In-Reply-To: <9C01E18EF3BC2448A3B1A4812EB87D0232837D@SRVEDI.upark.crscold.com> References: <9C01E18EF3BC2448A3B1A4812EB87D0232837D@SRVEDI.upark.crscold.com> Message-ID: <1168440001.15369.220.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-09 at 17:11 -0600, Brad Filipek wrote: > I have two boxes with RHEL4U4 and one application. Should the app be > installed locally on both nodes, and have the data on the SAN? Or > should the app and the data both be on the SAN? This will be an > active/passive config. That's a matter of "what works better for you". Either one works. For Oracle 10g, I installed everything on to the SAN. For something like Apache, you could make a new httpd.conf that points the docroot to the SAN mount point (since httpd was already installed on both nodes anyway). > Also, does the app and data both need to sit on a GFS? Not at all. You can use a non-cluster FS if you want (e.g. ext3). Remember that the service only runs on one node at a time; the file system is only mounted in one place at a time, etc. The key is that the file system may only be mounted in one place at a time. RHCS is designed to manage this, but you have to manually mount all this stuff up and bring up the service IP address during configuration / installation yourself. Basically: (a) partition disks (b) set up clvm (if you're going to use LVM in the cluster) (c) mkfs -t ext3 /dev/foo1 (d) mkdir /cluster/service0 (e) mount -t ext3 /dev/foo1 /cluster/service0 (f) ip addr add 192.168.1.2 dev eth0 (g) Install app. Make it use 192.168.1.2 for all traffic, and /cluster/service0 for data. (h) Start app; give it a test from your clients. (g) Stop app. 
(h) ip addr del 192.168.1.2 dev eth0 (j) umount /cluster/service0 (k) [configure RHCS service] -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Wed Jan 10 14:43:46 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 10 Jan 2007 09:43:46 -0500 Subject: [Linux-cluster] Cluster software won't start at boot In-Reply-To: <001801c734ac$a26229f0$4c4b3291@kagtqp> References: <001801c734ac$a26229f0$4c4b3291@kagtqp> Message-ID: <1168440226.15369.224.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-10 at 12:44 +0100, Kit Gerrits wrote: > From: Lon Hohberger > > I've not seen this before; usually that "just works". Do you have any > more information that could help, i.e.: > > > > rpm -q clumanager > > rpm -qV clumanager > > Well, they people that set that system up are a bit strange. > They have runlevel 5 as initdefault, but the system does not show a > graphical login at boot. > (startx works, though) That's weird, but certainly not the problem. Although, it might be related somehow... clumanager doesn't start, and X doesn't start, but both *should*. > Fyi: > [root at nzcs1 root]# rpm -qi clumanager > Name : clumanager Relocations: (not relocateable) > Version : 1.0.19 Vendor: Red Hat, Inc. > Release : 2 Build Date: Mon 23 Dec 2002 > 10:08:02 PM MET > Install date: Thu 03 Jun 2004 02:55:23 PM MET Build Host: > bugs.devel.redhat.com :o That's kind of old; there's an updated 1.0.28 package on RHN. However, the init script didn't change much ... if at all. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Wed Jan 10 14:45:14 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 10 Jan 2007 09:45:14 -0500 Subject: [Linux-cluster] Quick off topic question In-Reply-To: <200719174115.718127@leena> References: <200719174115.718127@leena> Message-ID: <1168440314.15369.226.camel@rei.boston.devel.redhat.com> On Tue, 2007-01-09 at 17:41 -0600, isplist at logicore.net wrote: > Can one remotely log bash_history files such as one does with syslog.conf and > @192.168.x.x for example? > > I want to consolidate all of my bash history files onto one of the GFS > servers. > > Mike Nope - not that I'm aware of. The easiest way to do this is to just put home directories for each user NFS or GFS (the latter requires the client to be on the SAN and part of the cluster, of course). -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From fedele at fis.unical.it Wed Jan 10 15:13:06 2007 From: fedele at fis.unical.it (Fedele Stabile) Date: Wed, 10 Jan 2007 16:13:06 +0100 Subject: [Linux-cluster] Cluster for number-crunching purposes Message-ID: <45A50282.7060205@fis.unical.it> I have a new 35-nodes cluster with a SAN for data storage, my SAN is connected via SCSI with two nodes. OS is CentOS4 with ClusterSuite Cluster purpose is numer-crinching: SAN disks are GFS and exported via gnbd to the other 33 nodes in the cluster. Configuration file cluster.conf is below. This is my first cluster configured with the ClusterSuite, can anyone help me to understand if i made any mistake? 
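A rough sketch of the GNBD export/import flow described above, assuming (as placeholders) a SAN-attached server named server1, a clustered LV /dev/vg_san/lv_data and a GFS mount point /data:

# On a SAN-attached node (server1): serve the device over GNBD.
gnbd_serv                                    # start the GNBD server daemon
gnbd_export -d /dev/vg_san/lv_data -e data   # export the LV under the name "data"

# On each of the other 33 nodes: import it and mount the GFS on top.
modprobe gnbd
gnbd_import -i server1                       # imports all exports from server1
mount -t gfs /dev/gnbd/data /data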
Thank you Fedele STABILE /etc/cluster/cluster.conf ..... ..... ..... ..... From bfilipek at crscold.com Wed Jan 10 15:24:51 2007 From: bfilipek at crscold.com (Brad Filipek) Date: Wed, 10 Jan 2007 09:24:51 -0600 Subject: [Linux-cluster] General 2-node cluster questions Message-ID: <9C01E18EF3BC2448A3B1A4812EB87D0232840A@SRVEDI.upark.crscold.com> Hi Riaan and Lon, Thanks for your replies. The App we use is called PRO5 and uses SSH to run. The users SSH into the RHEL4 box, and their .bash_profile fires up the PRO5 app which is located at /basis/pro5/pro5. Their .bash_profile files look like this: ============================ # .bash_profile # Get the aliases and functions if [ -f ~/.bashrc ]; then . ~/.bashrc fi # User specific environment and startup programs PATH=$PATH:$HOME/bin export PATH unset USERNAME umask 0000 TERM=vt220;export TERM TERMCAP=/basis/pro5/termcap;export TERMCAP cd /basis/pro5 ./pro5 -tT001 /live/cf.src/PGMSYS9999 exit ============================ Since I will be using an active/passive config in this scenario, would I be able to install both PRO5 and it's data on an ext3 partition located on the SAN? Would I even need to have a GFS partition at all? Obviously SSH would run locally on each node. Thanks again, Brad Filipek -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Riaan van Niekerk Sent: Wednesday, January 10, 2007 2:53 AM To: linux clustering Subject: [] - Re: [Linux-cluster] General 2-node cluster questions - Email found in subject Brad Filipek wrote: > I am in the process of setting up a 2-node cluster with a SAN for data > storage. I have a few general questions as this is my first time using > RHEL CS. > > > > I have two boxes with RHEL4U4 and one application. Should the app be > installed locally on both nodes, and have the data on the SAN? Or should > the app and the data both be on the SAN? This will be an active/passive > config. > It is up to you to decide if you want the app on the SAN or not. App locally installed If the App is simple and/or part of the OS FS hierarchy (e.g. apache, or not in /opt), you can install/configure it on node1 and copy the configuration across (keeping in mind that you need to manually keep the configs in sync) App installed to shared storage (for illustration purposes I will use Oracle as the clustered app) If the App is complex and goes into a distinct directory you can partition off (e.g. ORACLE_HOME somewhere in /opt/oracle), it might make more sense to have the whole /opt/oracle on the SAN aswell. Any files that belong to or are required by the application (e.g /etc/oratab, initscripts) would have to be copied across manually from one node to the other. If you don't have an easy way of determining which files are located outside of the shared partition, you might have to install the app twice (once on each side), but it might get confused on the second node since it may get confused by the install done on node1. > > > Also, does the app and data both need to sit on a GFS? > For active-passive: they don't need to, but they can be. If you are using an active-passive cluster and only one node at a time will have the application running (and writing to the partition with the data on), you can use ext3. Ext3 is a lot faster than GFS since it does not require the overhead/complexity of a clustered file system. 
also - a tip - before you configure the app as a clustered service in rgmanager, make sure that it starts up flawlessly on both sides (after manually moving the VIP and filesystem resources from one node to the other). Otherwise, after you configure the app in rgmanager and things dont work, you may have to troubleshoot both the app startup and rgmanager. greetings Riaan Confidentiality Notice: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient or the employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by email reply or by telephone and immediately delete this message and any attachments. From breeves at redhat.com Wed Jan 10 15:48:45 2007 From: breeves at redhat.com (Bryn M. Reeves) Date: Wed, 10 Jan 2007 15:48:45 +0000 Subject: [Linux-cluster] Cluster software won't start at boot In-Reply-To: <1168440226.15369.224.camel@rei.boston.devel.redhat.com> References: <001801c734ac$a26229f0$4c4b3291@kagtqp> <1168440226.15369.224.camel@rei.boston.devel.redhat.com> Message-ID: <45A50ADD.5030008@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Lon Hohberger wrote: >> Well, they people that set that system up are a bit strange. >> They have runlevel 5 as initdefault, but the system does not show a >> graphical login at boot. >> (startx works, though) > > That's weird, but certainly not the problem. Although, it might be > related somehow... clumanager doesn't start, and X doesn't start, but > both *should*. Sounds like inittab weirdness - I saw symptoms like this a few times while teaching class when students would do stuff like: id:5:initdefault: # System initialization. si::sysinit:/etc/rc.d/rc.sysinit l0:0:wait:/etc/rc.d/rc 0 l1:1:wait:/etc/rc.d/rc 1 l2:2:wait:/etc/rc.d/rc 2 l3:3:wait:/etc/rc.d/rc 3 l4:4:wait:/etc/rc.d/rc 4 l5:5:wait:/etc/rc.d/rc 3 <------ l6:6:wait:/etc/rc.d/rc 6 ... When attempting to change their default runlevel. It makes life kinda exciting if things are disabled (K links) in rc3.d but enabled (S links) in rc5.d - runlevel/who -r etc. report one thing, but the services started are those belonging to the other runlevel. It's also worth checking grub.conf incase they've overridden initdefault from the kernel command line. Kind regards, Bryn. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFFpQrd6YSQoMYUY94RAmX9AKCS1jCPfc6nGiawlmCbed0Uy/oFOwCePqYZ 0n1dGAgZcJZy4AdwGrG2Uuc= =gM4S -----END PGP SIGNATURE----- From isplist at logicore.net Wed Jan 10 15:55:07 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Wed, 10 Jan 2007 09:55:07 -0600 Subject: [Linux-cluster] Quick off topic question In-Reply-To: <1168440314.15369.226.camel@rei.boston.devel.redhat.com> Message-ID: <20071109557.814030@leena> > Nope - not that I'm aware of. The easiest way to do this is to just put > home directories for each user NFS or GFS (the latter requires the > client to be on the SAN and part of the cluster, of course). Unfortunately, that would defeat the purpose behind my wanting to remotely log the activity. 
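For reference, one commonly used approach to the question above is to have bash hand each command line to syslog and let syslogd forward it with the usual @192.168.x.x rule -- only a sketch, and (as the replies below point out) trivially bypassed by anyone who unsets it, so not a real audit trail:

# e.g. in /etc/profile.d/histlog.sh (placeholder path), for bash users:
export PROMPT_COMMAND='logger -p local6.notice -t bash_hist "$USER: $(history 1 | sed "s/^ *[0-9]* *//")"'

# and in /etc/syslog.conf on every node:
#   local6.*        @192.168.x.x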
Mike From axehind007 at yahoo.com Wed Jan 10 16:00:03 2007 From: axehind007 at yahoo.com (Brian Pontz) Date: Wed, 10 Jan 2007 08:00:03 -0800 (PST) Subject: [Linux-cluster] Quick off topic question In-Reply-To: <20071109557.814030@leena> Message-ID: <140844.91703.qm@web33215.mail.mud.yahoo.com> --- "isplist at logicore.net" wrote: > > Nope - not that I'm aware of. The easiest way to > do this is to just put > > home directories for each user NFS or GFS (the > latter requires the > > client to be on the SAN and part of the cluster, > of course). > > Unfortunately, that would defeat the purpose behind > my wanting to remotely log > the activity. You can do this through syslog but it would require you to modify the kernel code and recompile it. You would basically printk() all exec's in the kernel. Otherwise the honeynet project would probably be the best people to ask about this. Brian From maarten.boot at mbu.hr Wed Jan 10 16:03:17 2007 From: maarten.boot at mbu.hr (Maarten Boot) Date: Wed, 10 Jan 2007 17:03:17 +0100 Subject: [Linux-cluster] Quick off topic question Message-ID: Or recompiling bash to use syslog next to bash history on exec -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Brian Pontz Sent: Wednesday, January 10, 2007 5:00 PM To: linux-cluster at redhat.com Subject: Re: [Linux-cluster] Quick off topic question --- "isplist at logicore.net" wrote: > > Nope - not that I'm aware of. The easiest way to > do this is to just put > > home directories for each user NFS or GFS (the > latter requires the > > client to be on the SAN and part of the cluster, > of course). > > Unfortunately, that would defeat the purpose behind > my wanting to remotely log > the activity. You can do this through syslog but it would require you to modify the kernel code and recompile it. You would basically printk() all exec's in the kernel. Otherwise the honeynet project would probably be the best people to ask about this. Brian -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From isplist at logicore.net Wed Jan 10 16:06:44 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Wed, 10 Jan 2007 10:06:44 -0600 Subject: [Linux-cluster] Quick off topic question In-Reply-To: <140844.91703.qm@web33215.mail.mud.yahoo.com> Message-ID: <200711010644.372396@leena> Interesting, if a user can modify the history file, you can't see what's been done. I honestly thought this was something pretty basic for security and wanted to add this in all my nodes for a single history file. Mike > You can do this through syslog but it would require > you to modify the kernel code and recompile it. You > would basically printk() all exec's in the kernel. > Otherwise the honeynet project would probably be the > best people to ask about this. From simone.gotti at email.it Wed Jan 10 16:08:10 2007 From: simone.gotti at email.it (Simone Gotti) Date: Wed, 10 Jan 2007 17:08:10 +0100 Subject: [Linux-cluster] [PATCH] qdisk: fix crash or wrong behavior if "qdisk_read" returns an error. Message-ID: <1168445290.4361.24.camel@localhost> In qdisk/main.c:read_node_blocks if the call to "qdisk_read" returns an error, the cycle isn't interrupted and the call swab_status_block_t will make qdiskd crash or report bad node id, master status etc.... This will probably (not reproduced) cause strange behavior like this node trying to kill the others that are working correctly. 
I putted a "continue" to skip the cycle after the error. As if nothing about the node can be read it's better to not change the current informations. I hope the patch is correct. Thanks. Bye! ============================================================================= [2725] warning: Error reading node ID block 1 [2725] warning: Error reading node ID block 2 [2725] warning: Error reading node ID block 3 [2725] warning: Error reading node ID block 4 [2725] warning: Error reading node ID block 5 [2725] warning: Error reading node ID block 6 [2725] warning: Error reading node ID block 7 [2725] warning: Error reading node ID block 8 [2725] warning: Error reading node ID block 9 [2725] warning: Error reading node ID block 10 [2725] warning: Error reading node ID block 11 [2725] warning: Error reading node ID block 12 [2725] warning: Error reading node ID block 13 [2725] warning: Error reading node ID block 14 [2725] warning: Error reading node ID block 15 [2725] warning: Error reading node ID block 16 [2725] debug: Node 16777216 is UP [2725] crit: A master exists, but it's not me?! diskRawWriteShadow: Input/output error diskRawWriteShadow: aligned write returned -1, not 512 diskRawWriteShadow: Input/output error Error writing node ID block 1 [2725] err: Error writing to quorum disk Node ID: 1 Score (current / min req. / max allowed): 1 / 1 / 1 Current state: Master Current disk state: None Visible Set: { 16777216 } Master Node ID: 16777216 Quorate Set: { 16777216 33554432 50331648 67108864 83886080 100663296 117440512 134217728 150994944 167772160 184549376 201326592 218103808 234881024 251658240 268435456 } [2725] warning: Error reading node ID block 1 [2725] warning: Error reading node ID block 2 [2725] warning: Error reading node ID block 3 [2725] warning: Error reading node ID block 4 [2725] warning: Error reading node ID block 5 [2725] warning: Error reading node ID block 6 [2725] warning: Error reading node ID block 7 [2725] warning: Error reading node ID block 8 [2725] warning: Error reading node ID block 9 [2725] warning: Error reading node ID block 10 [2725] warning: Error reading node ID block 11 [2725] warning: Error reading node ID block 12 [2725] warning: Error reading node ID block 13 [2725] warning: Error reading node ID block 14 [2725] warning: Error reading node ID block 15 [2725] warning: Error reading node ID block 16 [2725] info: Node 1 is the master [2725] crit: Critical Error: More than one master found! diskRawWriteShadow: Input/output error diskRawWriteShadow: aligned write returned -1, not 512 diskRawWriteShadow: Input/output error Error writing node ID block 1 [2725] err: Error writing to quorum disk Node ID: 1 Score (current / min req. 
/ max allowed): 1 / 1 / 1 Current state: Master Current disk state: None Visible Set: { 1 } Master Node ID: 1 Quorate Set: { 1 } [2725] warning: Error reading node ID block 1 [2725] warning: Error reading node ID block 2 [2725] warning: Error reading node ID block 3 [2725] warning: Error reading node ID block 4 [2725] warning: Error reading node ID block 5 [2725] warning: Error reading node ID block 6 [2725] warning: Error reading node ID block 7 [2725] warning: Error reading node ID block 8 [2725] warning: Error reading node ID block 9 [2725] warning: Error reading node ID block 10 [2725] warning: Error reading node ID block 11 [2725] warning: Error reading node ID block 12 [2725] warning: Error reading node ID block 13 [2725] warning: Error reading node ID block 14 [2725] warning: Error reading node ID block 15 [2725] warning: Error reading node ID block 16 [2725] crit: A master exists, but it's not me?! diskRawWriteShadow: Input/output error diskRawWriteShadow: aligned write returned -1, not 512 diskRawWriteShadow: Input/output error Error writing node ID block 1 [2725] err: Error writing to quorum disk Node ID: 1 Score (current / min req. / max allowed): 1 / 1 / 1 Current state: Master Current disk state: None Visible Set: { 16777216 } Master Node ID: 16777216 Quorate Set: { 16777216 33554432 50331648 67108864 83886080 100663296 117440512 134217728 150994944 167772160 184549376 201326592 218103808 234881024 251658240 268435456 } [2725] warning: Error reading node ID block 1 [2725] warning: Error reading node ID block 2 [2725] warning: Error reading node ID block 3 [2725] warning: Error reading node ID block 4 [2725] warning: Error reading node ID block 5 [2725] warning: Error reading node ID block 6 [2725] warning: Error reading node ID block 7 [2725] warning: Error reading node ID block 8 [2725] warning: Error reading node ID block 9 [2725] warning: Error reading node ID block 10 [2725] warning: Error reading node ID block 11 [2725] warning: Error reading node ID block 12 [2725] warning: Error reading node ID block 13 [2725] warning: Error reading node ID block 14 [2725] warning: Error reading node ID block 15 [2725] warning: Error reading node ID block 16 [2725] crit: Critical Error: More than one master found! diskRawWriteShadow: Input/output error diskRawWriteShadow: aligned write returned -1, not 512 diskRawWriteShadow: Input/output error Error writing node ID block 1 [2725] err: Error writing to quorum disk Node ID: 1 Score (current / min req. / max allowed): 1 / 1 / 1 Current state: Master Current disk state: None Visible Set: { 1 } Master Node ID: 1 Quorate Set: { 1 } [2725] warning: Error reading node ID block 1 [2725] warning: Error reading node ID block 2 [2725] warning: Error reading node ID block 3 [2725] warning: Error reading node ID block 4 [2725] warning: Error reading node ID block 5 [2725] warning: Error reading node ID block 6 [2725] warning: Error reading node ID block 7 [2725] warning: Error reading node ID block 8 [2725] warning: Error reading node ID block 9 [2725] warning: Error reading node ID block 10 [2725] warning: Error reading node ID block 11 [2725] warning: Error reading node ID block 12 [2725] warning: Error reading node ID block 13 [2725] warning: Error reading node ID block 14 [2725] warning: Error reading node ID block 15 [2725] warning: Error reading node ID block 16 [2725] crit: A master exists, but it's not me?! 
diskRawWriteShadow: Input/output error diskRawWriteShadow: aligned write returned -1, not 512 diskRawWriteShadow: Input/output error Error writing node ID block 1 [2725] err: Error writing to quorum disk Node ID: 1 Score (current / min req. / max allowed): 1 / 1 / 1 Current state: Master Current disk state: None Visible Set: { 16777216 } Master Node ID: 16777216 Quorate Set: { 16777216 33554432 50331648 67108864 83886080 100663296 117440512 134217728 150994944 167772160 184549376 201326592 218103808 234881024 251658240 268435456 } -- Simone Gotti -- Email.it, the professional e-mail, gratis per te: http://www.email.it/f Sponsor: Refill s.r.l. - Prodotti per TUTTE le stampanti sul mercato a prezzi sempre convenienti. Dal 1993, leader nel compatibile di qualit? in Italia. Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=5188&d=10-1 -------------- next part -------------- A non-text attachment was scrubbed... Name: cman-2.0.35-qdisk-read_node_blocks-continue-on-disk-error.patch Type: text/x-patch Size: 461 bytes Desc: not available URL: From breeves at redhat.com Wed Jan 10 16:34:15 2007 From: breeves at redhat.com (Bryn M. Reeves) Date: Wed, 10 Jan 2007 16:34:15 +0000 Subject: [Linux-cluster] Quick off topic question In-Reply-To: <140844.91703.qm@web33215.mail.mud.yahoo.com> References: <140844.91703.qm@web33215.mail.mud.yahoo.com> Message-ID: <45A51587.1020807@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Brian Pontz wrote: >> Unfortunately, that would defeat the purpose behind >> my wanting to remotely log >> the activity. > > You can do this through syslog but it would require > you to modify the kernel code and recompile it. You > would basically printk() all exec's in the kernel. > Otherwise the honeynet project would probably be the > best people to ask about this. Isn't that what the audit subsystem is designed for? No need for custom kernels - just set some audit rules to monitor execs and parse the auditd output. This still won't be a perfect replacement for bash_history though as it will loose some detail of the arguments. That said, if this is purely for security monitoring and not to have a list of commands and their arguments for re-play purposes (that's the goal of a shell history), I think audit would be the most straightforward solution. You need to set up two files to configure auditing, auditd.conf and audit.rules. The first governs the daemon itself, the second tells it what to audit. There's currently no direct support for syslog-style @host remote logging, but there is a "dispatcher" directive in auditd.conf that will run an external command when audit starts and pipe each message to that program's stdin - a simple wrapper would then be able to squirt the messages to a remote server if needed. Alternately, make /var/log/audit a separate filesystem on GFS and write the logs here. That will probably need some twiddling as I think auditd normally starts before GFS filesystems are mounted but shouldn't be impossible. A simple audit rule to get started could look like this: - -a exit,always -S exec You can be more specific about what to log, filter by uid, pid and other attributes - see the auditctl man page for the details as well as the sample rule files under /usr/share/doc/audit-*/. One word of warning - it's possible to DoS yourself in a couple of ways with audit. The default behavior when audit cannot create its logs is to panic - this is for high security environments where no service is better than insecure service. 
Disable it by setting "-f 0" or "-f 1" in the rules file (silent/printk on error respectively). Also, the volume of messages can be huge with a very broad ruleset - be sure to allow enough space for the logs and to configure rotation if needed. More info here: http://people.redhat.com/sgrubb/audit/ Cheers, Bryn. > Brian > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFFpRWH6YSQoMYUY94RAtxrAKCfBPDO2dcLLx8lWy/7gQbagM5KDACfYfX/ WOWcQ/oJTQ/JA8z7Uitx8lA= =Dey9 -----END PGP SIGNATURE----- From kagato at souja.net Wed Jan 10 17:15:09 2007 From: kagato at souja.net (Jayson Vantuyl) Date: Wed, 10 Jan 2007 11:15:09 -0600 Subject: [Linux-cluster] Cluster for number-crunching purposes In-Reply-To: <45A50282.7060205@fis.unical.it> References: <45A50282.7060205@fis.unical.it> Message-ID: <83A2D7D7-125F-4586-A4AE-A0BB37F78ADD@souja.net> I think having 20 votes per node with cman expecting 1 vote could completely break quorum calculation (although it would appear to work just fine until you had a network failure). On Jan 10, 2007, at 9:13 AM, Fedele Stabile wrote: > I have a new 35-nodes cluster with a SAN for data storage, my SAN > is connected via SCSI with two nodes. > OS is CentOS4 with ClusterSuite > Cluster purpose is numer-crinching: > SAN disks are GFS and exported via gnbd to the other 33 nodes in > the cluster. > Configuration file cluster.conf is below. > > This is my first cluster configured with the ClusterSuite, can > anyone help me to understand if i made any mistake? > > Thank you > > Fedele STABILE > > > /etc/cluster/cluster.conf > > > > > > > > > > > > > > > > > > > > > > > nodename="pc0"/> > > > > ..... > ..... > > > > > servers="server1 server2"/> > > > > ..... > ..... > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From fedele at fis.unical.it Wed Jan 10 17:50:14 2007 From: fedele at fis.unical.it (Fedele Stabile) Date: Wed, 10 Jan 2007 18:50:14 +0100 Subject: [Linux-cluster] Cluster for number-crunching purposes In-Reply-To: <83A2D7D7-125F-4586-A4AE-A0BB37F78ADD@souja.net> References: <45A50282.7060205@fis.unical.it> <83A2D7D7-125F-4586-A4AE-A0BB37F78ADD@souja.net> Message-ID: <45A52756.8000302@fis.unical.it> I experienced that using vote=1 for all members gives the same quorum votes as result of command cman_tool status Instead i woulk create a quorum disk on SAN storage. Can you help me? Jayson Vantuyl wrote: > I think having 20 votes per node with cman expecting 1 vote could > completely break quorum calculation (although it would appear to work > just fine until you had a network failure). > > On Jan 10, 2007, at 9:13 AM, Fedele Stabile wrote: > >> I have a new 35-nodes cluster with a SAN for data storage, my SAN is >> connected via SCSI with two nodes. >> OS is CentOS4 with ClusterSuite >> Cluster purpose is numer-crinching: >> SAN disks are GFS and exported via gnbd to the other 33 nodes in the >> cluster. >> Configuration file cluster.conf is below. >> >> This is my first cluster configured with the ClusterSuite, can anyone >> help me to understand if i made any mistake? >> >> Thank you >> >> Fedele STABILE >> >> >> /etc/cluster/cluster.conf >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > nodename="pc0"/> >> >> >> >> ..... >> ..... 
>> >> >> >> >> > servers="server1 server2"/> >> >> >> >> ..... >> ..... >> >> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > From lhh at redhat.com Wed Jan 10 17:56:57 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 10 Jan 2007 12:56:57 -0500 Subject: [Linux-cluster] [PATCH] qdisk: fix crash or wrong behavior if "qdisk_read" returns an error. In-Reply-To: <1168445290.4361.24.camel@localhost> References: <1168445290.4361.24.camel@localhost> Message-ID: <1168451817.15369.228.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-10 at 17:08 +0100, Simone Gotti wrote: > In qdisk/main.c:read_node_blocks if the call to "qdisk_read" returns an > error, the cycle isn't interrupted and the call swab_status_block_t will > make qdiskd crash or report bad node id, master status etc.... This will > probably (not reproduced) cause strange behavior like this node trying > to kill the others that are working correctly. > > I putted a "continue" to skip the cycle after the error. As if nothing > about the node can be read it's better to not change the current > informations. > > I hope the patch is correct. It looks right. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Andre at hudat.com Wed Jan 10 18:12:00 2007 From: Andre at hudat.com (Andre Henry) Date: Wed, 10 Jan 2007 13:12:00 -0500 Subject: [Linux-cluster] ccsd problems Message-ID: I have a two node cluster that has been humming along without issues for over a year. Reboots crashes no problems. Restart and all is well. I had a SCSI error yesterday now node 2 will not even start ccsd. All seems ok with packages, nics, kernel, modules. The system has been rebooted in the past. No info other than "Unable to connect to cluster infrastructure" printed in logs. An strace seems to show its using IPv6 to connect to the other node. I have tried passing the -I and -4 option with no luck. -- Andre From kitgerrits at gmail.com Wed Jan 10 18:14:58 2007 From: kitgerrits at gmail.com (Kit Gerrits) Date: Wed, 10 Jan 2007 19:14:58 +0100 Subject: [Linux-cluster] Quick off topic question Message-ID: <005d01c734e3$3cf79eb0$4c4b3291@kagtqp> Keep in mind, that Bash does some interesting tricks with its bash_history. (like maintaining a single history per session and fusing them afterwards). It might be a good idea to mail&wipe the .bash_history file upon logout. If you want to use the .bash_history file for autiding: Some O/S'es / filesystems allow write-only access to files. This would make sure the user cannot 'edit' the file to remove any traces. (This is usually limited to /var/log, so I don't know if it can be applied to a single file) Regards, Kit Gerrits From kitgerrits at gmail.com Wed Jan 10 18:18:37 2007 From: kitgerrits at gmail.com (Kit Gerrits) Date: Wed, 10 Jan 2007 19:18:37 +0100 Subject: [Linux-cluster] Cluster software won't start at boot Message-ID: <005e01c734e3$bf7ba340$4c4b3291@kagtqp> Lon Hohberger wrote: >>> Well, they people that set that system up are a bit strange. >>> They have runlevel 5 as initdefault, but the system does not show a >>> graphical login at boot. >>> (startx works, though) >> >> That's weird, but certainly not the problem. Although, it might be >> related somehow... clumanager doesn't start, and X doesn't start, but >> both *should*. 
> >Sounds like inittab weirdness - I saw symptoms like this a few times while teaching class when students would do stuff like: > >id:5:initdefault: > ># System initialization. >si::sysinit:/etc/rc.d/rc.sysinit > >l0:0:wait:/etc/rc.d/rc 0 >l1:1:wait:/etc/rc.d/rc 1 >l2:2:wait:/etc/rc.d/rc 2 >l3:3:wait:/etc/rc.d/rc 3 >l4:4:wait:/etc/rc.d/rc 4 >l5:5:wait:/etc/rc.d/rc 3 <------ >l6:6:wait:/etc/rc.d/rc 6 >... > >When attempting to change their default runlevel. > That trick sounds horribly familiar... ( Sorry, NDA ;-) ) Checked, but this is not the case (It's much more fun to redirect runlevels to 6 or 0 :-D [root at nzcs1 etc]# grep :5 /etc/inittab id:5:initdefault: l5:5:wait:/etc/rc.d/rc 5 x:5:respawn:/etc/X11/prefdm -nodaemon >It makes life kinda exciting if things are disabled (K links) in rc3.d but enabled (S links) in rc5.d - runlevel/who -r etc. report >one thing, but the services started are those belonging to the other runlevel. That's cute, checked and passed: [root at nzcs1 etc]# find . -type l -ls |grep cluster 2796355 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 ./rc.d/rc0.d/K01cluster -> ../init.d/cluster 2976480 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 ./rc.d/rc1.d/K01cluster -> ../init.d/cluster 3009017 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 ./rc.d/rc2.d/S99cluster -> ../init.d/cluster 3026727 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 ./rc.d/rc3.d/S99cluster -> ../init.d/cluster 3042076 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 ./rc.d/rc4.d/S99cluster -> ../init.d/cluster 3058185 0 lrwxrwxrwx 1 root root 17 Jun 16 2005 ./rc.d/rc5.d/S99cluster -> ../init.d/cluster 3090980 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 ./rc.d/rc6.d/K01cluster -> ../init.d/cluster >It's also worth checking grub.conf incase they've overridden initdefault from the kernel command line. Foiled again! >From /boot/grub/grub.conf: default=13 fallback=10 # This entry (no. 10) added by Proliant HBA install script title HP-2.4.9-e.24enterprise-1 root (hd0,0) kernel /vmlinuz-2.4.9-e.24enterprise ro root=/dev/cciss/c0d0p5 hda=ide-scsi initrd /HP-initrd-2.4.9-e.24enterprise.img # This entry (no. 13) added by Proliant HBA install script title HP-2.4.9-e.24enterprise-2 root (hd0,0) kernel /vmlinuz-2.4.9-e.24enterprise ro root=/dev/cciss/c0d0p5 hda=ide-scsi initrd /HP-initrd-2.4.9-e.24enterprise.img-0 (I'm still trying to figure out WTF HP did with my grub.conf) I -DO- appreciate all the help I have received so far. This is an interesting little trick they must have pulled... Regards, Kit From lhh at redhat.com Wed Jan 10 18:41:16 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 10 Jan 2007 13:41:16 -0500 Subject: [Linux-cluster] ccsd problems In-Reply-To: References: Message-ID: <1168454476.15369.230.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-10 at 13:12 -0500, Andre Henry wrote: > I have a two node cluster that has been humming along without issues > for over a year. Reboots crashes no problems. Restart and all is well. > I had a SCSI error yesterday now node 2 will not even start ccsd. All > seems ok with packages, nics, kernel, modules. The system has been > rebooted in the past. > > No info other than "Unable to connect to cluster infrastructure" > printed in logs. An strace seems to show its using IPv6 to connect to > the other node. I have tried passing the -I and -4 option with no luck. Is it RHEL3 or RHEL4 ? -- Lon -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Wed Jan 10 19:00:10 2007 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 10 Jan 2007 14:00:10 -0500 Subject: [Linux-cluster] Cluster software won't start at boot In-Reply-To: <005e01c734e3$bf7ba340$4c4b3291@kagtqp> References: <005e01c734e3$bf7ba340$4c4b3291@kagtqp> Message-ID: <1168455610.15369.232.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-10 at 19:18 +0100, Kit Gerrits wrote: > That's cute, checked and passed: > [root at nzcs1 etc]# find . -type l -ls |grep cluster > 2796355 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 > ./rc.d/rc0.d/K01cluster -> ../init.d/cluster > 2976480 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 > ./rc.d/rc1.d/K01cluster -> ../init.d/cluster > 3009017 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 > ./rc.d/rc2.d/S99cluster -> ../init.d/cluster > 3026727 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 > ./rc.d/rc3.d/S99cluster -> ../init.d/cluster > 3042076 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 > ./rc.d/rc4.d/S99cluster -> ../init.d/cluster > 3058185 0 lrwxrwxrwx 1 root root 17 Jun 16 2005 > ./rc.d/rc5.d/S99cluster -> ../init.d/cluster > 3090980 0 lrwxrwxrwx 1 root root 17 Jun 3 2004 > ./rc.d/rc6.d/K01cluster -> ../init.d/cluster Maybe /etc/init.d/cluster isn't mode 755 for some reason ? *shrug* -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mvz+rhcluster at nimium.hr Wed Jan 10 19:47:09 2007 From: mvz+rhcluster at nimium.hr (Miroslav Zubcic) Date: Wed, 10 Jan 2007 20:47:09 +0100 Subject: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests - bug in fenced? Message-ID: <45A542BD.9070107@nimium.hr> Hi people. I had this problem in spring last year while configuring one RH cluster for local telco. RH tehnical support was not very useful. They told me this is not a bug and so on ... So I will like to ask this here on RH cluster list, in hope for better advice. When I have 2-node cluster with RSA II management cards (fence_rsa agent) configured to have 1 oracle database in failover together with VIP adress and 5 luns shared from EMC storage, how can I pass one simple test with pooling out main data ethernet cables from active node? Let's say that I have interface bond0 (data subnet/vlan) and bond1 (fence subnet/vlan) on each node. Our customers (and we also, it is logical) are expecting if we pull out all two data cables from bond0 that inactive node will kill/fence active node and take over it's services. Unfortunately, what we see almost every time on acceptance test is that two nodes are killing each other no matter if they have or does not have a link. Here is fragment from /var/adm/messages on the active node when I disable bond0 (by pooling out cables): --------------------------------------------------------------------- Jan 9 14:05:43 north clurgmgrd: [4593]: Link for bond0: Not detected Jan 9 14:05:43 north clurgmgrd: [4593]: No link on bond0... 
Jan 9 14:05:43 north clurgmgrd[4593]: status on ip "10.156.10.32/26" returned 1 (generic error) Jan 9 14:05:43 north clurgmgrd[4593]: Stopping service ora_PROD Jan 9 14:05:53 north kernel: CMAN: removing node south from the cluster : Missed too many heartbeats Jan 9 14:05:53 north fenced[4063]: north not a cluster member after 0 sec post_fail_delay Jan 9 14:05:53 north fenced[4063]: fencing node "south" Jan 9 14:05:55 north shutdown: shutting down for system halt Jan 9 14:05:55 north init: Switching to runlevel: 0 Jan 9 14:05:55 north login(pam_unix)[4599]: session closed for user root Jan 9 14:05:56 north rgmanager: [4270]: Shutting down Cluster Service Manager... Jan 9 14:05:56 north clurgmgrd[4593]: Shutting down Jan 9 14:05:56 north fenced[4063]: fence "south" success [...] Jan 9 14:11:19 north syslogd 1.4.1: restart. ---------------------------------------------------------- As we see here, clurgmgrd(8) on node "north" has DETECTED that there is no link, it began to stop service "ora_PROD", system goes in shutdown. So far, so good. But then, fenced(8) daemon decides to fence "south" node (healthy node which has data link and all presupositions to take over ora_PROD service (oracle + IP + 5 ext3 FS's from EMC storage)! Why? Of course, south also is fenceing north, and I then have tragicomic situation where both nodes are beeing rebooted by eacs other. How can I prevent this? This looks like a bug. I don't want fenced to fence other node south if it already "knows" that it is the one without link. What to do? We cannot pass acceptance tests with such cluster state. :-( Thanks for any advice ... -- Miroslav Zubcic, Nimium d.o.o., email: Tel: +385 01 4852 639, Fax: +385 01 4852 640, Mobile: +385 098 942 8672 Mrazoviceva 12, 10000 Zagreb, Hrvatska From Andre at hudat.com Wed Jan 10 19:49:40 2007 From: Andre at hudat.com (Andre Henry) Date: Wed, 10 Jan 2007 14:49:40 -0500 Subject: [Linux-cluster] ccsd problems In-Reply-To: <1168454476.15369.230.camel@rei.boston.devel.redhat.com> References: <1168454476.15369.230.camel@rei.boston.devel.redhat.com> Message-ID: <1b5633ce2dba2364638f2b786860162e@hudat.com> RHEL4 -- Thanks Andre On Jan 10, 2007, at 1:41 PM, Lon Hohberger wrote: > On Wed, 2007-01-10 at 13:12 -0500, Andre Henry wrote: >> I have a two node cluster that has been humming along without issues >> for over a year. Reboots crashes no problems. Restart and all is well. >> I had a SCSI error yesterday now node 2 will not even start ccsd. All >> seems ok with packages, nics, kernel, modules. The system has been >> rebooted in the past. >> >> No info other than "Unable to connect to cluster infrastructure" >> printed in logs. An strace seems to show its using IPv6 to connect to >> the other node. I have tried passing the -I and -4 option with no >> luck. > > Is it RHEL3 or RHEL4 ? > > -- Lon > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From breeves at redhat.com Wed Jan 10 19:59:29 2007 From: breeves at redhat.com (Bryn M. Reeves) Date: Wed, 10 Jan 2007 19:59:29 +0000 Subject: [Linux-cluster] Quick off topic question In-Reply-To: <005d01c734e3$3cf79eb0$4c4b3291@kagtqp> References: <005d01c734e3$3cf79eb0$4c4b3291@kagtqp> Message-ID: <45A545A1.9090808@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Kit Gerrits wrote: > Keep in mind, that Bash does some interesting tricks with its bash_history. > (like maintaining a single history per session and fusing them afterwards). 
> > It might be a good idea to mail&wipe the .bash_history file upon logout. > > > If you want to use the .bash_history file for autiding: > Some O/S'es / filesystems allow write-only access to files. > This would make sure the user cannot 'edit' the file to remove any traces. > (This is usually limited to /var/log, so I don't know if it can be applied > to a single file) > Ext3 allows something close to this. Using its extended attributes you can mark a file as append only (chattr +a ). Only the root account can add/remove this attr. It doesn't seem to play to well when the history fills up though - if I set HISTFILESIZE and HISTSIZE both to 10, after 10 history items have accumulated it ceases to record anything. I don't think trying to use the shell history as a security audit is really going to fly. Kind regards, Bryn. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iD8DBQFFpUWg6YSQoMYUY94RAodyAJwPqvhL6kjsuNtk+41fjCTTm42WCQCfePBG Ej02a3O1mY8reqbN/8KqRDM= =mSYq -----END PGP SIGNATURE----- From jwhiter at redhat.com Wed Jan 10 20:16:49 2007 From: jwhiter at redhat.com (Josef Whiter) Date: Wed, 10 Jan 2007 15:16:49 -0500 Subject: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests - bug in fenced? In-Reply-To: <45A542BD.9070107@nimium.hr> References: <45A542BD.9070107@nimium.hr> Message-ID: <20070110201648.GE24326@korben.rdu.redhat.com> On Wed, Jan 10, 2007 at 08:47:09PM +0100, Miroslav Zubcic wrote: > Hi people. > > I had this problem in spring last year while configuring one RH cluster > for local telco. RH tehnical support was not very useful. They told me > this is not a bug and so on ... So I will like to ask this here on RH > cluster list, in hope for better advice. > > When I have 2-node cluster with RSA II management cards (fence_rsa agent) > configured to have 1 oracle database in failover together with VIP adress > and 5 luns shared from EMC storage, how can I pass one simple test with > pooling out main data ethernet cables from active node? > > Let's say that I have interface bond0 (data subnet/vlan) and bond1 (fence > subnet/vlan) on each node. Our customers (and we also, it is logical) are > expecting if we pull out all two data cables from bond0 that inactive node > will kill/fence active node and take over it's services. > > Unfortunately, what we see almost every time on acceptance test is that > two nodes are killing each other no matter if they have or does not have a > link. > > Here is fragment from /var/adm/messages on the active node when I disable > bond0 (by pooling out cables): > > --------------------------------------------------------------------- > Jan 9 14:05:43 north clurgmgrd: [4593]: Link for bond0: Not > detected > Jan 9 14:05:43 north clurgmgrd: [4593]: No link on bond0... > > Jan 9 14:05:43 north clurgmgrd[4593]: status on ip > "10.156.10.32/26" returned 1 (generic error) > Jan 9 14:05:43 north clurgmgrd[4593]: Stopping service ora_PROD > Jan 9 14:05:53 north kernel: CMAN: removing node south from the cluster : > Missed too many heartbeats > Jan 9 14:05:53 north fenced[4063]: north not a cluster member after 0 sec > post_fail_delay > Jan 9 14:05:53 north fenced[4063]: fencing node "south" > Jan 9 14:05:55 north shutdown: shutting down for system halt > Jan 9 14:05:55 north init: Switching to runlevel: 0 > Jan 9 14:05:55 north login(pam_unix)[4599]: session closed for user root > Jan 9 14:05:56 north rgmanager: [4270]: Shutting down Cluster > Service Manager... 
> Jan 9 14:05:56 north clurgmgrd[4593]: Shutting down > Jan 9 14:05:56 north fenced[4063]: fence "south" success > > [...] > > Jan 9 14:11:19 north syslogd 1.4.1: restart. > ---------------------------------------------------------- > > As we see here, clurgmgrd(8) on node "north" has DETECTED that there is no > link, it began to stop service "ora_PROD", system goes in shutdown. So > far, so good. But then, fenced(8) daemon decides to fence "south" node > (healthy node which has data link and all presupositions to take over > ora_PROD service (oracle + IP + 5 ext3 FS's from EMC storage)! Why? > > Of course, south also is fenceing north, and I then have tragicomic > situation where both nodes are beeing rebooted by eacs other. > > How can I prevent this? This looks like a bug. I don't want fenced to > fence other node south if it already "knows" that it is the one without link. > > What to do? We cannot pass acceptance tests with such cluster state. :-( > > Thanks for any advice ... > This isn't a bug, its working as expected. What you need in qdisk, set it up with the proper hueristics and it will force the shutdown of the bad node before the bad node has a chance to fence off the working node. Josef From jvantuyl at engineyard.com Wed Jan 10 21:17:02 2007 From: jvantuyl at engineyard.com (Jayson Vantuyl) Date: Wed, 10 Jan 2007 15:17:02 -0600 Subject: [Linux-cluster] Quick off topic question In-Reply-To: <45A545A1.9090808@redhat.com> References: <005d01c734e3$3cf79eb0$4c4b3291@kagtqp> <45A545A1.9090808@redhat.com> Message-ID: In bash, shell history can be disabled with the command: unset HISTFILE It wasn't intended to be and isn't suitable for any form of security tracking. Not to mention that at any point the intruder could manually execute a non-interactive shell which wouldn't log either. I'd really recommend the auditing infrastructure. On Jan 10, 2007, at 1:59 PM, Bryn M. Reeves wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Kit Gerrits wrote: >> Keep in mind, that Bash does some interesting tricks with its >> bash_history. >> (like maintaining a single history per session and fusing them >> afterwards). >> >> It might be a good idea to mail&wipe the .bash_history file upon >> logout. >> >> >> If you want to use the .bash_history file for autiding: >> Some O/S'es / filesystems allow write-only access to files. >> This would make sure the user cannot 'edit' the file to remove any >> traces. >> (This is usually limited to /var/log, so I don't know if it can be >> applied >> to a single file) >> > > Ext3 allows something close to this. Using its extended attributes you > can mark a file as append only (chattr +a ). Only the root > account > can add/remove this attr. > > It doesn't seem to play to well when the history fills up though - > if I > set HISTFILESIZE and HISTSIZE both to 10, after 10 history items have > accumulated it ceases to record anything. > > I don't think trying to use the shell history as a security audit is > really going to fly. > > Kind regards, > > Bryn. 
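A rough sketch of the qdisk-plus-heuristics setup suggested above (and asked about earlier in the number-crunching thread); the shared device, label and gateway address are all placeholders:

# Initialise a small shared LUN as the quorum disk:
mkqdisk -c /dev/sdc1 -l myqdisk

# A heuristic is just a command whose exit status says "this node still
# has its data uplink", typically a ping of the default gateway:
ping -c1 -w1 10.156.10.1

# cluster.conf then gets a <quorumd label="myqdisk" ...> block containing
# <heuristic program="ping -c1 -w1 10.156.10.1" .../>, so a node that
# loses its data link loses its score and is evicted before it can fence
# the healthy node.  Finally start the daemon on every node:
service qdiskd start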
> > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.5 (GNU/Linux) > Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org > > iD8DBQFFpUWg6YSQoMYUY94RAodyAJwPqvhL6kjsuNtk+41fjCTTm42WCQCfePBG > Ej02a3O1mY8reqbN/8KqRDM= > =mSYq > -----END PGP SIGNATURE----- > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Jayson Vantuyl Systems Architect Engine Yard jvantuyl at engineyard.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jvantuyl at engineyard.com Wed Jan 10 21:22:35 2007 From: jvantuyl at engineyard.com (Jayson Vantuyl) Date: Wed, 10 Jan 2007 15:22:35 -0600 Subject: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests - bug in fenced? In-Reply-To: <45A542BD.9070107@nimium.hr> References: <45A542BD.9070107@nimium.hr> Message-ID: <79A61925-4F6E-4947-863B-DCA0BD4C3A74@engineyard.com> On Jan 10, 2007, at 1:47 PM, Miroslav Zubcic wrote: > How can I prevent this? This looks like a bug. I don't want fenced to > fence other node south if it already "knows" that it is the one > without link. It is very possible to write a wrapper script for your fencing agent that simply checks the link and refuses to fence when the link is down. However, this wouldn't really be good in any non-network related fencing situation. The recommendation to use a qdiskd would be a good one and you could even use a link detection script as an arbitrary heuristic in this case. -- Jayson Vantuyl Systems Architect Engine Yard jvantuyl at engineyard.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lshen at cisco.com Wed Jan 10 23:00:54 2007 From: lshen at cisco.com (Lin Shen (lshen)) Date: Wed, 10 Jan 2007 15:00:54 -0800 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <1168282204.15369.43.camel@rei.boston.devel.redhat.com> Message-ID: <08A9A3213527A6428774900A80DBD8D8033E77FA@xmb-sjc-222.amer.cisco.com> > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger > Sent: Monday, January 08, 2007 10:50 AM > To: linux clustering > Subject: Re: [Linux-cluster] Remove the clusterness from GFS > > On Mon, 2007-01-08 at 10:39 -0800, Lin Shen (lshen) wrote: > > How easy is it to > > remove some or all of the clusterness from GFS such as > fencing, cman > > and ccsd stuff? I understand that things like dlm must stay > for GFS to work. > > I would think it is very difficult. > > You can use GFS on *one* node without a cluster. > > In order to use a clustered file system, you need a cluster. > The cluster acts as the control mechanism for accessing the > file system. > Without it, each computer accessing GFS will have no > knowledge of when it is safe to write to or read from the > file system. This will lead to file system corruption very quickly. > > If you absolutely can not have a bit of "cluster software > running", you'll probably need to use a client/server > approach like NFS instead of a cluster file system like GFS. > > It's not that we discriminate against cluster software :). We just have some worries about the potential impact the cluster suite could bring to the system. Extra CPU and memory cost is ok, we can consider that's part of running GFS. The part that gets us wonder is any potential behavioral changes and instability to the system. After all, the system is effectively tunrned into a cluster. 
I read some of the emails in the alias about cluster issues aside from GFS. For instance, we support hot removal/insertion of nodes in the system, I'm not clear how fencing will get in the way. We're not planning to add any fencing hardware, and most likely will set fencing mechanism as manual. Ideally, we'd like to disable fencing except the part that is needed for running GFS. lin From irwan at magnifix.com.my Thu Jan 11 04:10:05 2007 From: irwan at magnifix.com.my (Mohd Irwan Jamaluddin) Date: Thu, 11 Jan 2007 12:10:05 +0800 Subject: [Linux-cluster] Fencing Problem On APC 7950 Message-ID: <1168488605.26513.28.camel@kuli.magnifix.com.my> Good day, I'm running Red Hat Cluster Suite on RHEL 4 U3 with APC 7950 ( http://www.apc.com/resource/include/techspec_index.cfm?base_sku=AP7950 ) as the fencing device. The version of my fence package is 1.32.18-0. I try to execute some fence_apc commands but I've got errors. Below are the details: [root at orarac01 ~]# fence_apc -a 10.0.6.150 -l apc -p apc -n 02 \ > -o Reboot -T -v failed: unrecognised Reboot response Same error occured for On/Off option. Also, I found a weird problem if I put "2" instead of "02" for the Outlet Number option (-n option). Below are the error message: [root at orarac01 ~]# fence_apc -a 10.0.6.150 -l apc -p apc -n 2 \ > -o Reboot -T -v failed: unrecognised menu response H Anyone have ever faced similar problem with me? Here I attached the apclog for reference. Thanks in advanced for your response. -- Regards, +--------------------------------+ | Mohd Irwan Jamaluddin | | ## System Engineer, | | (o_ Magnifix Sdn. Bhd. | | //\ Tel: +60 3 42705073 | | V_/_ Fax: +60 3 42701960 | | http://www.magnifix.com/ | +--------------------------------+ | "Every successful side needs | | unsung heroes" - fcbayern.de | +--------------------------------+ -------------- next part -------------- User Name : apc Password : *** American Power Conversion Network Management Card AOS v2.2.7 (c) Copyright 2002 All Rights Reserved Rack PDU APP v2.2.0 ------------------------------------------------------------------------------- Name : APC for TMNet Date : 06/30/2002 Contact : Mohd Irwan Jamaluddin Time : 10:31:03 Location : Bilik Server Magnifix User : Administrator Up Time : 0 Days 15 Hours 22 Minutes Stat : P+ N+ A+ Switched Rack PDU: Communication Established ------- Control Console ------------------------------------------------------- 1- Device Manager 2- Network 3- System 4- Logout - Main Menu, - Refresh, - Event Log > ------- Control Console ------------------------------------------------------- 1- Device Manager 2- Network 3- System 4- Logout - Main Menu, - Refresh, - Event Log > American Power Conversion Network Management Card AOS v2.2.7 (c) Copyright 2002 All Rights Reserved Rack PDU APP v2.2.0 ------------------------------------------------------------------------------- Name : APC for TMNet Date : 06/30/2002 Contact : Mohd Irwan Jamaluddin Time : 10:31:03 Location : Bilik Server Magnifix User : Administrator Up Time : 0 Days 15 Hours 22 Minutes Stat : P+ N+ A+ Switched Rack PDU: Communication Established ------- Control Console ------------------------------------------------------- 1- Device Manager 2- Network 3- System 4- Logout - Main Menu, - Refresh, - Event Log > ------- Control Console ------------------------------------------------------- 1- Device Manager 2- Network 3- System 4- Logout - Main Menu, - Refresh, - Event Log > 1 ------- Device Manager -------------------------------------------------------- 1- Phase 
Monitor/Configuration 2- Outlet Restriction Configuration 3- Outlet Control/Configuration 4- Power Supply Status - Back, - Refresh, - Event Log > 3 ------- Outlet Control/Configuration ------------------------------------------ 1- Outlet 01: Outlet 1 ON 2- Outlet 02: 2 ON 3- Outlet 03: Outlet 3 ON 4- Outlet 04: Outlet 4 ON 5- Outlet 05: Outlet 5 ON 6- Outlet 06: Outlet 6 ON 7- Outlet 07: Outlet 7 ON 8- Outlet 08: Outlet 8 ON 9- Outlet 09: Outlet 9 ON 10- Outlet 10: Outlet 10 ON 11- Outlet 11: Outlet 11 ON 12- Outlet 12: Outlet 12 ON 13- Outlet 13: Outlet 13 ON 14- Outlet 14: Outlet 14 ON 15- Outlet 15: Outlet 15 ON 16- Outlet 16: Outlet 16 ON 17- Master Control/Configuration - Back, - Refresh, - Event Log > 2 ------- Outlet 02: 2 ---------------------------------------------------------- Name : 2 Outlet : 2 State : ON 1- Control Outlet 2 2- Configure Outlet 2 ?- Help, - Back, - Refresh, - Event Log > 1 ------- Control Outlet -------------------------------------------------------- Name : 2 Outlet : 2 State : ON 1- Immediate On 2- Immediate Off 3- Immediate Reboot 4- Delayed On 5- Delayed Off 6- Delayed Reboot 7- Cancel ?- Help, - Back, - Refresh, - Event Log > 3 ----------------------------------------------------------------------- Immediate Reboot This command will immediately shutdown outlet 2 named '2', delay for 5 seconds, and then restart. Enter 'YES' to continue or to cancel : Press to continue... ------- Control Outlet -------------------------------------------------------- Name : 2 Outlet : 2 State : ON 1- Immediate On 2- Immediate Off 3- Immediate Reboot 4- Delayed On 5- Delayed Off 6- Delayed Reboot 7- Cancel ?- Help, - Back, - Refresh, - Event Log > ------- Outlet 02: 2 ---------------------------------------------------------- Name : 2 Outlet : 2 State : ON 1- Control Outlet 2 2- Configure Outlet 2 ?- Help, - Back, - Refresh, - Event Log > ------- Outlet Control/Configuration ------------------------------------------ 1- Outlet 01: Outlet 1 ON 2- Outlet 02: 2 ON 3- Outlet 03: Outlet 3 ON 4- Outlet 04: Outlet 4 ON 5- Outlet 05: Outlet 5 ON 6- Outlet 06: Outlet 6 ON 7- Outlet 07: Outlet 7 ON 8- Outlet 08: Outlet 8 ON 9- Outlet 09: Outlet 9 ON 10- Outlet 10: Outlet 10 ON 11- Outlet 11: Outlet 11 ON 12- Outlet 12: Outlet 12 ON 13- Outlet 13: Outlet 13 ON 14- Outlet 14: Outlet 14 ON 15- Outlet 15: Outlet 15 ON 16- Outlet 16: Outlet 16 ON 17- Master Control/Configuration - Back, - Refresh, - Event Log > ------- Device Manager -------------------------------------------------------- 1- Phase Monitor/Configuration 2- Outlet Restriction Configuration 3- Outlet Control/Configuration 4- Power Supply Status - Back, - Refresh, - Event Log > ------- Control Console ------------------------------------------------------- 1- Device Manager 2- Network 3- System 4- Logout - Main Menu, - Refresh, - Event Log > From jvantuyl at engineyard.com Thu Jan 11 06:09:53 2007 From: jvantuyl at engineyard.com (Jayson Vantuyl) Date: Thu, 11 Jan 2007 00:09:53 -0600 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <08A9A3213527A6428774900A80DBD8D8033E77FA@xmb-sjc-222.amer.cisco.com> References: <08A9A3213527A6428774900A80DBD8D8033E77FA@xmb-sjc-222.amer.cisco.com> Message-ID: > It's not that we discriminate against cluster software :). We just > have > some worries about the potential impact the cluster suite could > bring to > the system. Extra CPU and memory cost is ok, we can consider that's > part > of running GFS. 
The part that gets us wonder is any potential > behavioral > changes and instability to the system. After all, the system is > effectively tunrned into a cluster. I read some of the emails in the > alias about cluster issues aside from GFS. Behavior aside, with full understanding of how it works, clustering is neither complex nor particularly troublesome. Understand that the instability you read about comes not from the clustering but rather is the nature of sharing these resources between multiple machines. I operate over 40 clusters with a total of well over 100 nodes and I can assure you that the day I implemented comprehensive fencing (i.e. removed fence_manual and wrote a fencing agent for our platform) was very likely the best day of my life. Fencing is what makes a GFS cluster reliable. > For instance, we support hot removal/insertion of nodes in the system, > I'm not clear how fencing will get in the way. We're not planning > to add > any fencing hardware, and most likely will set fencing mechanism as > manual. Ideally, we'd like to disable fencing except the part that is > needed for running GFS. There are issues. As long as you don't change to a two-node cluster at some point (going from 1 node to 2 nodes or 3 nodes to 2 nodes) you should be able to achieve this. In my personal opinion, I would avoid running GFS on less than 3 nodes anyways (again, 2-node clusters exhibit behavior that is easily avoidable with a third box, even if it doesn't use the GFS). In a controlled manner it is possible to unmount the FS, leave the cluster, then change the cluster composition from a still running node. Adding isn't too much trouble. In either case I suggest a quorum disk (qdisk). As for uncontrolled crashes fencing is absolutely necessary to recover state of the FS. Complete fencing is absolutely necessary for running GFS. Suggesting that you don't need (indeed want) fencing is an indication that you don't understand how GFS will share your data. Relying on manual fencing is a sign that you will likely lose a great deal of data someday. Redhat won't even support that configuration due to liability concerns. Fencing only makes sure that a machine that has lost contact with the cluster does not trash your data. Without fencing, a node that is out of control can (and will) trash your GFS. This will result in the downtime required to shut down the cluster, fsck the filesystem, and then bring it back up. It will also still likely trash some data. Make no mistake, when fencing occurs, the system is already behaving badly. It fixes it, albeit brutally. With fence_manual, when you have any sort of outage whatsoever, one node will be hosed and the entire cluster will halt. At this point you will do one of three things: 1. You may just restart the entire cluster. or 2. You may correctly make sure the dead machine is truly dead. YOU WILL NOT BE ABLE TO DO THIS REMOTELY WITHOUT HARDWARE SUITABLE FOR FENCING. At that point you will call fence_ack_manual (manually) to free up the cluster. or 3. You may, in your haste, run fence_ack_manual to free up the cluster. If at any point the other node is not completely dead, your data may be forfeit. Worse, it may not be visible immediately, only after it is widely corrupted. At that point you will probably get the downed node running without realizing what damage you may have done. In the meantime, everyone mounting your GFS will be hung. A single hardware failure can freeze your cluster. Totally. 
Note that to take the only path that saves your data (#2) you will have to have remote power switches or the like to reset a toasted node in all cases. So you will NOT save yourself any money and yet you WILL create trouble. Also, have you considered fencing at your network switch (for networked storage) or at your storage device itself? It is not always necessary to purchase remote power switches to fence your data. If you are not able to abide fencing, you probably should farm this out to someone who can. Fencing is the way to avoid the bad behavior you have read about. It is not the cause of trouble--it's the solution. GFS absolutely must have it in its entirety or no dice. If you would like a more official, professional explanation as to why this is absolutely, unequivocally necessary, contact me by e-mail. I'll call you. I could fly out. I can even give you a report with a letterhead and everything. However, removing fencing from GFS is not a possibility. It's not even really hard. -- Jayson Vantuyl Systems Architect Engine Yard jvantuyl at engineyard.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From krikler_samuel at diligent.com Thu Jan 11 06:34:48 2007 From: krikler_samuel at diligent.com (Krikler, Samuel) Date: Thu, 11 Jan 2007 08:34:48 +0200 Subject: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests -bug in fenced? Message-ID: <453D02254A9EBC45866DBF28FECEA46F0DFA88@ILEX01.corp.diligent.com> Hi, We got the same problem and tried to used qdisk without success. I managed to create the qdisk but didn't manage to get both nodes registered into it. Could someone please point me to the documentation / description of how to properly set up the qdisk for a 2 nodes-cluster? Thanks a lot, Samuel. ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jayson Vantuyl Sent: Wednesday, January 10, 2007 11:23 PM To: linux clustering Subject: Re: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests -bug in fenced? On Jan 10, 2007, at 1:47 PM, Miroslav Zubcic wrote: How can I prevent this? This looks like a bug. I don't want fenced to fence other node south if it already "knows" that it is the one without link. It is very possible to write a wrapper script for your fencing agent that simply checks the link and refuses to fence when the link is down. However, this wouldn't really be good in any non-network related fencing situation. The recommendation to use a qdiskd would be a good one and you could even use a link detection script as an arbitrary heuristic in this case. -- Jayson Vantuyl Systems Architect Engine Yard jvantuyl at engineyard.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jvantuyl at engineyard.com Thu Jan 11 06:54:06 2007 From: jvantuyl at engineyard.com (Jayson Vantuyl) Date: Thu, 11 Jan 2007 00:54:06 -0600 Subject: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests -bug in fenced? In-Reply-To: <453D02254A9EBC45866DBF28FECEA46F0DFA88@ILEX01.corp.diligent.com> References: <453D02254A9EBC45866DBF28FECEA46F0DFA88@ILEX01.corp.diligent.com> Message-ID: <58679925-D017-49E9-AB3F-2ECBFDF09939@engineyard.com> Samuel, On Jan 11, 2007, at 12:34 AM, Krikler, Samuel wrote: > Could someone please point me to the documentation / description of > how to properly set up the qdisk for a 2 nodes-cluster? Now that is an interesting question. I'm not so sure how the two- node setup handles a quorum node. 
That said, I think the solution is actually to set up the qdisk to have a vote *AND* not configure the system as a two-node cluster. Basically, take off the two-node flag for CMAN, set CMAN's expected_votes to 2, give each node 1 vote and the qdisk 1 vote. That way, two running nodes give you quorum, either node + qdisk gives you quorum, and either node - qdisk is inquorate. Can any of the cluster gods comment on this? I usually have 3 or more nodes. -- Jayson Vantuyl Systems Architect Engine Yard jvantuyl at engineyard.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From shailesh at verismonetworks.com Thu Jan 11 08:45:54 2007 From: shailesh at verismonetworks.com (Shailesh) Date: Thu, 11 Jan 2007 14:15:54 +0530 Subject: [Linux-cluster] Not able to build cluster tools 1.03.00 Message-ID: <1168505154.6593.46.camel@shailesh> Hi, I am attempting to build the tools in cluster-1.03.00.tar.gz on my Redhat workstation RHEL-V4 which is having the kernel 2.6.9-5. I see a lot compile errors in the build. Can anybody suggest me if I am missing something, Do I have to upgrade the kernel,if so which version will it be? Thanks & Regards Shailesh From raj4linux at gmail.com Thu Jan 11 08:52:27 2007 From: raj4linux at gmail.com (rajesh mishra) Date: Thu, 11 Jan 2007 14:22:27 +0530 Subject: [Linux-cluster] Not able to build cluster tools 1.03.00 In-Reply-To: <1168505154.6593.46.camel@shailesh> References: <1168505154.6593.46.camel@shailesh> Message-ID: <5a8d914c0701110052h24c04a0cva699226a35a346c8@mail.gmail.com> First did u take latest source code from the Red Hat repository..? U need to specify what kind of error u r getting while compilation. I strongly feel if u peep into the code u even can make out. There might be minor problems. With Regards Rajesh. On 1/11/07, Shailesh wrote: > Hi, > I am attempting to build the tools in cluster-1.03.00.tar.gz on my > Redhat workstation RHEL-V4 which is having the kernel 2.6.9-5. > I see a lot compile errors in the build. > > Can anybody suggest me if I am missing something, > > Do I have to upgrade the kernel,if so which version will it be? > > Thanks & Regards > Shailesh > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From pcaulfie at redhat.com Thu Jan 11 09:04:30 2007 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 11 Jan 2007 09:04:30 +0000 Subject: [Linux-cluster] Not able to build cluster tools 1.03.00 In-Reply-To: <1168505154.6593.46.camel@shailesh> References: <1168505154.6593.46.camel@shailesh> Message-ID: <45A5FD9E.6090706@redhat.com> Shailesh wrote: > Hi, > I am attempting to build the tools in cluster-1.03.00.tar.gz on my > Redhat workstation RHEL-V4 which is having the kernel 2.6.9-5. > I see a lot compile errors in the build. > > Can anybody suggest me if I am missing something, > > Do I have to upgrade the kernel,if so which version will it be? > If you want to use a Red Hat kernel then you should check out the RHEL4 branch from CVS or use the SRPMS. If you want to use cluster-1.03 then you'll need a recent (but not too recent) kernel.org kernel. (yes I know that's very vague, I can't remember which kernel it compiles against now, sorry!) 
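(In case it helps Shailesh, checking out the branch Patrick mentions goes roughly like this. The pserver path, the anonymous password and the build step below are from memory only, so verify them against the project pages before relying on them.)

cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster login      # anonymous password is usually "cvs"
cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster checkout -r RHEL4 cluster
# then build against the installed Red Hat kernel headers as described in the
# tree's usage notes, roughly:
#   cd cluster && ./configure --kernel_src=/lib/modules/`uname -r`/build && make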
-- patrick From fedele at fis.unical.it Thu Jan 11 09:08:26 2007 From: fedele at fis.unical.it (Fedele Stabile) Date: Thu, 11 Jan 2007 10:08:26 +0100 Subject: [Linux-cluster] Problems updating cluster.conf Message-ID: <45A5FE8A.5020403@fis.unical.it> Good day, I'm running Cluster Suite on CentOS4 on a 35 PC cluster and a SAN, when i try to update the cluster.conf on my cluster I have this message: [root at linuxlab1 cluster]# ccs_tool update /etc/cluster/cluster.conf Failed to receive COMM_UPDATE_NOTICE_ACK from pc10. Hint: Check the log on pc10 for reason. Failed to update config file. [root at linuxlab1 cluster]# but in pc10 log files I can't see anything, also if I try to run ccsd on pc10 with the command ccsd -n I can't see the error Versions of cman, ccsd amd rgmanager are: ccs-1.0.7-0 cman-1.0.11-0 rgmanager-1.9.53-0 I have 18 PC on a 1Gbit LAN and 17 PC on 100Mbit Fedele STABILE From fedele at fis.unical.it Thu Jan 11 09:38:31 2007 From: fedele at fis.unical.it (Fedele Stabile) Date: Thu, 11 Jan 2007 10:38:31 +0100 Subject: [Linux-cluster] Problems updating cluster.conf SOLVED but I have a question In-Reply-To: <45A5FE8A.5020403@fis.unical.it> References: <45A5FE8A.5020403@fis.unical.it> Message-ID: <45A60597.2070300@fis.unical.it> I solved my problem running ccs_tool update from a node on the slow network, can you help me to explain the reason of this behaviour? Fedele STABILE Fedele Stabile wrote: > Good day, > > I'm running Cluster Suite on CentOS4 on a 35 PC cluster and a SAN, > when i try to update the cluster.conf on my cluster I have this message: > > [root at linuxlab1 cluster]# ccs_tool update /etc/cluster/cluster.conf > Failed to receive COMM_UPDATE_NOTICE_ACK from pc10. > Hint: Check the log on pc10 for reason. > > Failed to update config file. > [root at linuxlab1 cluster]# > > but in pc10 log files I can't see anything, also if I try to run ccsd on > pc10 with the command > ccsd -n > I can't see the error > > Versions of cman, ccsd amd rgmanager are: > > ccs-1.0.7-0 > cman-1.0.11-0 > rgmanager-1.9.53-0 > > I have 18 PC on a 1Gbit LAN and 17 PC on 100Mbit > > Fedele STABILE > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From kri_thi at yahoo.com Thu Jan 11 11:34:54 2007 From: kri_thi at yahoo.com (krishnamurthi G) Date: Thu, 11 Jan 2007 03:34:54 -0800 (PST) Subject: [Linux-cluster] cluster version identification:how? Message-ID: <20070111113454.28556.qmail@web90413.mail.mud.yahoo.com> Hi Frieds, Is there any ways to find cluster version (if it is !). Problem: The cluster specific commands/paths/command outputs have been changed/changing completely from RHEL 2.1 to 2.4 to 2.6. I am working on a project where I need support for all version/releases, some how I need to find cluster version if available so that I can parse accordingly. Temporary Work around: The cluster config file is unique for different RHEL releases. e.g RHEL 2.1 "/etc/cluster.xml" RHEL 2.4 "/etc/cluster.conf" RHEL 2.6 "/etc/cluster/cluster.conf" Check this config file and identify cluster type. I appreciate if any of you help me for efficient solution? Thanks in advance - Krishna ____________________________________________________________________________________ Want to start your own business? Learn how on Yahoo! Small Business. http://smallbusiness.yahoo.com/r-index -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lhh at redhat.com Thu Jan 11 14:30:08 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 11 Jan 2007 09:30:08 -0500 Subject: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests - bug in fenced? In-Reply-To: <20070110201648.GE24326@korben.rdu.redhat.com> References: <45A542BD.9070107@nimium.hr> <20070110201648.GE24326@korben.rdu.redhat.com> Message-ID: <1168525808.15369.273.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-10 at 15:16 -0500, Josef Whiter wrote: > > Thanks for any advice ... > > > > This isn't a bug, its working as expected. What you need in qdisk, set it up > with the proper hueristics and it will force the shutdown of the bad node before > the bad node has a chance to fence off the working node. What he said. With qdisk, you can have the node declare itself unfit for cluster operation when bond0 or bond1 loses link; something like: You could use more complex link monitoring (like the stuff in /usr/share/cluster/ip.sh) if you wanted, but this gives you the basic idea. The idea here is that if bond0 *or* bond1 loses link, qdiskd declares the node unfit (min_score = 2, and each route is 1 point, so loss of either => fatal). A feature was added after the initial release of qdiskd to reboot the node on loss of required score (previously, it would cause the node to become inquorate and block activity). -- Lon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Thu Jan 11 14:47:27 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 11 Jan 2007 09:47:27 -0500 Subject: [Linux-cluster] Remove the clusterness from GFS In-Reply-To: <08A9A3213527A6428774900A80DBD8D8033E77FA@xmb-sjc-222.amer.cisco.com> References: <08A9A3213527A6428774900A80DBD8D8033E77FA@xmb-sjc-222.amer.cisco.com> Message-ID: <1168526847.15369.288.camel@rei.boston.devel.redhat.com> On Wed, 2007-01-10 at 15:00 -0800, Lin Shen (lshen) wrote: > For instance, we support hot removal/insertion of nodes in the system, > I'm not clear how fencing will get in the way. We're not planning to add > any fencing hardware, and most likely will set fencing mechanism as > manual. Ideally, we'd like to disable fencing except the part that is > needed for running GFS. Hmm, well, GFS requires every node mounting a volume directly to have fencing. You can use NFS to export the same GFS volume from multiple servers. The idea here is that with more than one NFS server exporting the same file system, you can achieve very high data parallel data throughput - near the maximum the SAN allows - because the network bandwidth and server bottlenecks are, in theory, eliminated. This solution requires building a GFS cluster, say, 3 or 5 nodes + a SAN. Make one or more GFS volumes on the SAN, and mount on all nodes. Export from all nodes. Adding more clients is simple. Just mount the NFS export. Fencing is needed for the GFS cluster, but not the NFS clients. You could do the same thing with Lustre (sort of). Build a server cluster, and mount over the network. You'd only need fencing hardware for the metadata server (I *think*; never tried it). Adding a client is easy: set up Lustre on the client and mount the file system. There's some "waste" in the sense that to build either of these solutions, you need several machines that act as a "storage farm" for the best possible reliability. 
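(The cluster.conf fragment Lon introduces with "something like:" a little earlier appears to have been stripped by the list archiver. From his description -- two heuristics worth one point each and min_score = 2, so losing either path is fatal -- it would have looked roughly like the following. The gateway addresses, device path and timing values are placeholders, not his actual ones, and the ping commands could just as well be link-state checks along the lines of ip.sh.)

<quorumd interval="1" tko="10" votes="1" min_score="2" device="/dev/sdb1">
    <!-- one point per monitored path; losing either drops the score below min_score -->
    <heuristic program="ping -c1 -w1 192.168.0.1" score="1" interval="2"/>
    <heuristic program="ping -c1 -w1 192.168.1.1" score="1" interval="2"/>
</quorumd>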
-- Lon From lhh at redhat.com Thu Jan 11 14:58:03 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 11 Jan 2007 09:58:03 -0500 Subject: [Linux-cluster] RH Cluster doesn't pass basic acceptance tests -bug in fenced? In-Reply-To: <58679925-D017-49E9-AB3F-2ECBFDF09939@engineyard.com> References: <453D02254A9EBC45866DBF28FECEA46F0DFA88@ILEX01.corp.diligent.com> <58679925-D017-49E9-AB3F-2ECBFDF09939@engineyard.com> Message-ID: <1168527483.15369.299.camel@rei.boston.devel.redhat.com> On Thu, 2007-01-11 at 00:54 -0600, Jayson Vantuyl wrote: > That said, I think the solution is actually to set up the qdisk to > have a vote *AND* not configure the system as a two-node cluster. > Basically, take off the two-node flag for CMAN, set CMAN's > expected_votes to 2, give each node 1 vote and the qdisk 1 vote. That's correct, if you're using a single heuristic to implement a tiebreaker. > That way, two running nodes give you quorum, either node + qdisk gives > you quorum, and either node - qdisk is inquorate. With a multi-point qdisk setup, you want qdisk to be required (generally) - i.e., when monitoring multiple network paths. However, for a 2-node + tiebreaker setup, yours looks right. > Can any of the cluster gods comment on this? I usually have 3 or more > nodes. I hadn't considered the implications of doing 1 vote for 3+ node clusters, but I don't think there are any; it should work, but it wouldn't be particularly useful. The man pages talk about the general setup for making N->1 failure recovery work using qdisk, but it's missing the 2-node+tiebreaker case. I'll have to add that (since it's a *very* interesting use case). -- Lon From lhh at redhat.com Thu Jan 11 15:05:24 2007 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 11 Jan 2007 10:05:24 -0500 Subject: [Linux-cluster] cluster version identification:how? In-Reply-To: <20070111113454.28556.qmail@web90413.mail.mud.yahoo.com> References: <20070111113454.28556.qmail@web90413.mail.mud.yahoo.com> Message-ID: <1168527924.15369.308.camel@rei.boston.devel.redhat.com> On Thu, 2007-01-11 at 03:34 -0800, krishnamurthi G wrote: > Hi Frieds, > > Is there any ways to find cluster version (if it is !). > Problem: The cluster specific commands/paths/command outputs have been > changed/changing completely from RHEL 2.1 to 2.4 to 2.6. > I am working on a project where I need support for all > version/releases, some how I need to find cluster version if available > so that I can parse accordingly. > Temporary Work around: The cluster config file is unique for different > RHEL releases. > e.g RHEL 2.1 "/etc/cluster.xml" > RHEL 2.4 "/etc/cluster.conf" > RHEL 2.6 "/etc/cluster/cluster.conf" > > Check this config file and identify cluster type. RHEL 2.1: /etc/cluster.conf RHEL3: /etc/cluster.xml RHEL4: /etc/cluster/cluster.conf RHEL5: /etc/cluster/cluster.conf Why not just do 'rpm -q redhat-release'? I'm curious; why does the cluster version matter: are you manipulating cluster.[xml|conf] directly? If so, you'll need to do a few extra things. 
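(Putting Lon's path mapping and the rpm check together, a detection snippet along these lines should cover Krishna's case; treat it as a sketch rather than anything official.)

#!/bin/sh
# Guess the cluster generation from the config file location, per Lon's list,
# and print the distribution release as a cross-check.
if [ -f /etc/cluster/cluster.conf ]; then
    echo "RHEL4/RHEL5-style cluster (/etc/cluster/cluster.conf)"
elif [ -f /etc/cluster.xml ]; then
    echo "RHEL3-style cluster (/etc/cluster.xml)"
elif [ -f /etc/cluster.conf ]; then
    echo "RHEL2.1-style cluster (/etc/cluster.conf)"
else
    echo "no cluster configuration found"
fi
rpm -q redhat-release 2>/dev/null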
-- Lon From ramon at vanalteren.nl Thu Jan 11 16:04:34 2007 From: ramon at vanalteren.nl (Ramon van Alteren) Date: Thu, 11 Jan 2007 17:04:34 +0100 Subject: [Linux-cluster] Fencing Problem On APC 7950 In-Reply-To: <1168488605.26513.28.camel@kuli.magnifix.com.my> References: <1168488605.26513.28.camel@kuli.magnifix.com.my> Message-ID: <45A66012.1090006@vanalteren.nl> Hi, Mohd Irwan Jamaluddin wrote: > I'm running Red Hat Cluster Suite on RHEL 4 U3 with APC 7950 > ( http://www.apc.com/resource/include/techspec_index.cfm?base_sku=AP7950 ) > as the fencing device. The version of my fence package is 1.32.18-0. > I try to execute some fence_apc commands but I've got errors. Below are the details: > > [root at orarac01 ~]# fence_apc -a 10.0.6.150 -l apc -p apc -n 02 \ > >> -o Reboot -T -v >> > failed: unrecognised Reboot response > > Same error occured for On/Off option. > > Also, I found a weird problem if I put "2" instead of "02" for the Outlet Number option (-n option). > Below are the error message: > [root at orarac01 ~]# fence_apc -a 10.0.6.150 -l apc -p apc -n 2 \ > >> -o Reboot -T -v >> > failed: unrecognised menu response > > > H > Anyone have ever faced similar problem with me? Here I attached the apclog for reference. > we're running fencing through an apc 7920 which I assume is similar. I've hacked up the fence_apc until it worked, it accepts outlet names and numbers if I remember correctly. I've attached ours to the mail, it's originally from cluster-1.02 but works for me on cluster-1.03 as well. No guarantees implied ;-) WFM, YMMV > Thanks in advanced for your response. > Hope it helps Ramon -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fence_apc URL: From erickson.jon at gmail.com Thu Jan 11 16:08:31 2007 From: erickson.jon at gmail.com (Jon Erickson) Date: Thu, 11 Jan 2007 11:08:31 -0500 Subject: [Linux-cluster] multipathd/ lvm.static error Message-ID: <6a90e4da0701110808k281cef1ej260de4ba6a06496@mail.gmail.com> All, I'm receiving these two errors on all of the systems I have setup in my GFS cluster. Even though I receive these errors my systems appear to be functioning fine. The first error makes sense because my local SCSI disk is not multipath-ed, however, I do not know how to get rid of it. The second error is confusing to me especially because the system still comes up with all of my multipath-ed devices in working order. multipathd: error calling out /sbin/scsi_id -g -u -s /block/sda lvm.static[6938]: segfault at 0000000000000000 rip 0000000000000000 rsp 0000007fbfff9388 error 14 ------------------------------- uname -r = 2.6.9-42.0.3.ELsmp Packages Installed: system-config-lvm-1.0.16-1.0 lvm2-cluster-2.02.06-7.0.RHEL4 lvm2-2.02.06-6.0.RHEL4 device-mapper-multipath-0.4.5-16.1.RHEL4 GFS-6.1.6-1 GFS-kernel-smp-2.6.9-60.3 GFS-kernheaders-2.6.9-60.3 kernel-smp-2.6.9-42.0.3.EL Thanks, Jon From jamesm at xandros.com Thu Jan 11 16:19:25 2007 From: jamesm at xandros.com (James McOrmond) Date: Thu, 11 Jan 2007 11:19:25 -0500 Subject: [Linux-cluster] Fencing Problem On APC 79x0 In-Reply-To: <45A66012.1090006@vanalteren.nl> References: <1168488605.26513.28.camel@kuli.magnifix.com.my> Message-ID: <45A6638D.8020501@xandros.com> Ramon van Alteren wrote: >>Anyone have ever faced similar problem with me? Here I attached the apclog for reference. >> >> >> >we're running fencing through an apc 7920 which I assume is similar. >I've hacked up the fence_apc until it worked, it accepts outlet names >and numbers if I remember correctly. 
>I've attached ours to the mail, it's originally from cluster-1.02 but >works for me on cluster-1.03 as well. > Do you know if this is a general fence_apc issue? I've got a 7900 on order to do some work with and i'm wondering if I should expect to have to make these modifications? -- James A. McOrmond (jamesm at xandros.com) Hardware QA Lead & Network Administrator Xandros Corporation, Ottawa, Canada. Morpheus: ...after a century of war I remember that which matters most: *We are still HERE!* -------------- next part -------------- An HTML attachment was scrubbed... URL: From jparsons at redhat.com Thu Jan 11 16:50:18 2007 From: jparsons at redhat.com (Jim Parsons) Date: Thu, 11 Jan 2007 11:50:18 -0500 Subject: [Linux-cluster] Fencing Problem On APC 7950 References: <1168488605.26513.28.camel@kuli.magnifix.com.my> <45A66012.1090006@vanalteren.nl> Message-ID: <45A66ACA.6060607@redhat.com> Thanks Ramon and Mohd. Attached is our latest, heavily refactored version of this agent. outlet naming and grouping should all work now - on every 7900 series switch. Please try it. Just rename it fence_apc when you drop it into /sbin and make certain the executable bits are set. -J Ramon van Alteren wrote: >Hi, > >Mohd Irwan Jamaluddin wrote: > >>I'm running Red Hat Cluster Suite on RHEL 4 U3 with APC 7950 >>( http://www.apc.com/resource/include/techspec_index.cfm?base_sku=AP7950 ) >>as the fencing device. The version of my fence package is 1.32.18-0. >>I try to execute some fence_apc commands but I've got errors. Below are the details: >> >>[root at orarac01 ~]# fence_apc -a 10.0.6.150 -l apc -p apc -n 02 \ >> >> >>>-o Reboot -T -v >>> >>> >>failed: unrecognised Reboot response >> >>Same error occured for On/Off option. >> >>Also, I found a weird problem if I put "2" instead of "02" for the Outlet Number option (-n option). >>Below are the error message: >>[root at orarac01 ~]# fence_apc -a 10.0.6.150 -l apc -p apc -n 2 \ >> >> >>>-o Reboot -T -v >>> >>> >>failed: unrecognised menu response >> >> >>H >>Anyone have ever faced similar problem with me? Here I attached the apclog for reference. >> >> >we're running fencing through an apc 7920 which I assume is similar. >I've hacked up the fence_apc until it worked, it accepts outlet names >and numbers if I remember correctly. >I've attached ours to the mail, it's originally from cluster-1.02 but >works for me on cluster-1.03 as well. > >No guarantees implied ;-) WFM, YMMV > >>Thanks in advanced for your response. >> >> >Hope it helps > >Ramon > > >------------------------------------------------------------------------ > >#!/usr/bin/perl > >############################################################################### >############################################################################### >## >## Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. >## Copyright (C) 2004 Red Hat, Inc. All rights reserved. >## >## This copyrighted material is made available to anyone wishing to use, >## modify, copy, or redistribute it subject to the terms and conditions >## of the GNU General Public License v.2. >## >############################################################################### >############################################################################### > >use Getopt::Std; >use Net::Telnet (); > ># Get the program name from $0 and strip directory names >$_=$0; >s/.*\///; >my $pname = $_; > ># Change these if the text returned by your equipment is different. 
># Test by running script with options -t -v and checking /tmp/apclog > >my $immediate = 'immediate'; # # Or 'delayed' - action string prefix on menu >my $masterswitch = 'masterswitch plus '; # 'Device Manager' option to choose >my $login_prompt = '/: /'; >my $cmd_prompt = '/> /'; > >my $max_open_tries = 3; # How many telnet attempts to make. Because the > # APC can fail repeated login attempts, this number > # should be more than 1 >my $open_wait = 5; # Seconds to wait between each telnet attempt >my $telnet_timeout = 2; # Seconds to wait for matching telent response >my $debuglog = '/tmp/apclog';# Location of debugging log when in verbose mode >$opt_o = 'reboot'; # Default fence action. > >my $logged_in = 0; > >my $t = new Net::Telnet; > > > ># WARNING!! Do not add code bewteen "#BEGIN_VERSION_GENERATION" and ># "#END_VERSION_GENERATION" It is generated by the Makefile > >#BEGIN_VERSION_GENERATION >$FENCE_RELEASE_NAME="1.02.00"; >$REDHAT_COPYRIGHT=("Copyright (C) Red Hat, Inc. 2004 All rights reserved."); >$BUILD_DATE="(built Wed Jun 28 13:17:53 CEST 2006)"; >#END_VERSION_GENERATION > >sub usage >{ > print "Usage:\n"; > print "\n"; > print "$pname [options]\n"; > print "\n"; > print "Options:\n"; > print " -a IP address or hostname of MasterSwitch\n"; > print " -h usage\n"; > print " -l Login name\n"; > print " -n Outlet number to change: [:] \n"; > print " -o Action: Reboot (default), Off or On\n"; > print " -p Login password\n"; > print " -q quiet mode\n"; > print " -T Test mode (cancels action)\n"; > print " -V version\n"; > print " -v Log to file /tmp/apclog\n"; > > exit 0; >} > >sub fail >{ > ($msg)=@_; > print $msg."\n" unless defined $opt_q; > > if (defined $t) > { > # make sure we don't get stuck in a loop due to errors > $t->errmode('return'); > > logout() if $logged_in; > $t->close > } > exit 1; >} > >sub fail_usage >{ > ($msg)=@_; > print STDERR $msg."\n" if $msg; > print STDERR "Please use '-h' for usage.\n"; > exit 1; >} > >sub version >{ > print "$pname $FENCE_RELEASE_NAME $BUILD_DATE\n"; > print "$SISTINA_COPYRIGHT\n" if ( $SISTINA_COPYRIGHT ); > exit 0; >} > > >sub login >{ > for (my $i=0; $i<$max_open_tries; $i++) > { > $t->open($opt_a); > ($_) = $t->waitfor($login_prompt); > > # Expect 'User Name : ' > if (! /name/i) { > $t->close; > sleep($open_wait); > next; > } > > $t->print($opt_l); > ($_) = $t->waitfor($login_prompt); > > # Expect 'Password : ' > if (! /password/i ) { > $t->close; > sleep($open_wait); > next; > } > > # Send password > $t->print($opt_p); > > (my $dummy, $_) = $t->waitfor('/(>|(?i:user name|password)\s*:) /'); > if (/> /) > { > $logged_in = 1; > > # send newline to flush prompt > $t->print(""); > > return; > } > else > { > fail "invalid username or password"; > } > } > fail "failed: telnet failed: ". $t->errmsg."\n" >} > ># print_escape_char() -- utility subroutine for sending the 'Esc' character >sub print_escape_char >{ > # The APC menu uses "" to go 'up' menues. We must set > # the output_record_separator to "" so that "\n" is not printed > # after the "" character > > $ors=$t->output_record_separator; > $t->output_record_separator(""); > $t->print("\x1b"); # send escape > $t->output_record_separator("$ors"); >} > > ># Determine if the switch is a working state. Also check to make sure that ># the switch has been specified in the case that there are slave switches ># present. This assumes that we are at the main menu. 
>sub identify_switch >{ > > ($_) = $t->waitfor($cmd_prompt); > print_escape_char(); > > # determine what type of switch we are dealling with > ($_) = $t->waitfor($cmd_prompt); > if ( /Switched Rack PDU: Communication Established/i) > { > # No further test needed > } > elsif ( /MS plus 1 : Serial Communication Established/i ) > { > if ( defined $switchnum ) > { > $masterswitch = $masterswitch . $switchnum; > } > elsif ( /MS plus [^1] : Serial Communication Established/i ) > { > fail "multiple switches detected. 'switch' must be defined."; > } > else > { > $switchnum = 1; > } > } > else > { > fail "APC is in undetermined state" > } > > # send a newline to cause APC to reprint the menu > $t->print(""); >} > > ># Navigate through menus to the appropriate outlet control menu of the apc ># MasterSwitch and 79xx series switches. Uses multi-line (mostly) ># case-insensitive matches to recognise menus and works out what option number ># to select from each menu. >sub navigate >{ > # Limit the ammount of menu depths to 20. We should never be this deep > for(my $i=20; $i ; $i--) > { > # Get the new text from the menu > ($_) = $t->waitfor($cmd_prompt); > # Identify next option > if ( > # "Control Console", "1- Device Manager" > /--\s*control console.*(\d+)\s*-\s*device manager/is || > > # "Device Manager", "2- Outlet Control" > /--\s*device manager.*(\d+)\s*-\s*outlet control/is || > > # > # APC MasterSwitch Menus > # > # "Device Manager", "1- MasterSwitch plus 1" > /--\s*device manager.*(\d+)\s*-\s*$masterswitch/is || > > # "Device Manager", "1- Cluster Node 0 ON" > /--\s*(?:device manager|$masterswitch).*(\d+)\s*-\s+Outlet\s+$switchnum:$opt_n\D[^\n]*\s(?-i:ON|OFF)\*?\s/ism || > > # "MasterSwitch plus 1", "1- Outlet 1:1 Outlet #1 ON" > /--\s*$masterswitch.*(\d+)\s*-\s*Outlet\s+$switchnum:$opt_n\s[^\n]*\s(?-i:ON|OFF)\*?\s/ism || > > # Administrator outlet control menu > /--\s*Outlet $switchnum:$opt_n\D.*(\d+)\s*-\s*outlet control\s*$switchnum:?$opt_n\D/ism || > > > # > # APC 79XX Menus > # > # "3- Outlet Control/Configuration" > /--\s*device manager.*(\d+)\s*-\s*Outlet Control/is || > > # "Device Manager", "1- Cluster Node 0 ON" > /--\s*Outlet Control.*(\d+)\s*-\s+Outlet\s+$opt_n\D[^\n]*\s(?-i:ON|OFF)\*?\s/ism || > > # "Device Manager", "1- ON" > /--\s*Outlet Control.*(\d+)\s*-\s+$opt_n\D[^\n]*\s(?-i:ON|OFF)\*?\s/ism || > > # Administrator Outlet Control menu > /--\s*Outlet $opt_n\D.*(\d+)\s*-\s*control\s*outlet\s+$opt_n\D/ism || > /--\s*Outlet $opt_n\D.*(\d+)\s*-\s*control\s*outlet/ism > > ) { > $t->print($1); > next; > } > > # "Outlet Control X:N", "4- Immediate Reboot" > if ( > /(\d+)\s*-\s*immediate $opt_o/is || > /--\s*$opt_n.*(\d+)\s*-\s*immediate\s*$opt_o/is || > /--\s*Control Outlet\D.*(\d+)\s*-\s*Immediate\s*$opt_o/is > ) { > $t->print($1); > last; > } > > fail "failed: unrecognised menu response\n"; > } >} > > >sub logout >{ > # send a newline to make sure that we refresh the menus > # ($t->waitfor() can hang otherwise) > $t->print(""); > > # Limit the ammount of menu depths to 20. We should never be this deep > for(my $i=20; $i ; $i--) > { > > # Get the new text from the menu > ($_) = $t->waitfor($cmd_prompt); > > if ( > # "Control Console", "4- Logout" > /--\s*control console.*(\d+)\s*-\s*Logout/is > ) { > $t->print($1); > last; > } > else > { > print_escape_char(); > next; > } > } >} > > >sub action >{ > # "Enter 'YES' to continue or to cancel : " > ($_) = $t->waitfor('/: /'); > if (! 
/.*immediate $opt_o.*YES.*to continue/si ) { > fail "failed: unrecognised $opt_o response\n"; > } > > # Test mode? > $t->print($opt_T?'NO':'YES'); > > # "Success", "Press to continue..." > ($_) = $t->waitfor('/continue/'); > $t->print(''); > > if (defined $opt_T) { > logout(); > print "success: test outlet $opt_n $opt_o\n" unless defined $opt_q; > $t->close; > > # Allow the APC some time to clean connection > # before next login. > sleep 1; > > exit 0; > } elsif ( /Success/i ) { > logout(); > print "success: outlet $opt_n $opt_o\n" unless defined $opt_q; > $t->close; > > # Allow the APC some time to clean connection > # before next login. > sleep 1; > > exit 0; > } > > fail "failed: unrecognised action response\n"; >} > > >sub get_options_stdin >{ > my $opt; > my $line = 0; > while( defined($in = <>) ) > { > $_ = $in; > chomp; > > # strip leading and trailing whitespace > s/^\s*//; > s/\s*$//; > > # skip comments > next if /^#/; > > $line+=1; > $opt=$_; > next unless $opt; > > ($name,$val)=split /\s*=\s*/, $opt; > > if ( $name eq "" ) > { > print STDERR "parse error: illegal name in option $line\n"; > exit 2; > } > # DO NOTHING -- this field is used by fenced > elsif ($name eq "agent" ) > { > } > elsif ($name eq "ipaddr" ) > { > $opt_a = $val; > } > elsif ($name eq "login" ) > { > $opt_l = $val; > } > elsif ($name eq "option" ) > { > $opt_o = $val; > } > elsif ($name eq "passwd" ) > { > $opt_p = $val; > } > elsif ($name eq "port" ) > { > $opt_n = $val; > } > elsif ($name eq "switch" ) > { > $switchnum = $val; > } > elsif ($name eq "test" ) > { > $opt_T = $val; > } > elsif ($name eq "verbose" ) > { > $opt_v = $val; > } > # Excess name/vals will fail > else > { > fail "parse error: unknown option \"$opt\""; > } > } >} > > >sub telnet_error >{ > fail "failed: telnet returned: ".$t->errmsg."\n"; >} > > >### MAIN ####################################################### > >if (@ARGV > 0) { > getopts("a:hl:n:o:p:qTvV") || fail_usage ; > > usage if defined $opt_h; > version if defined $opt_V; > > fail_usage "Unkown parameter." if (@ARGV > 0); > > fail_usage "No '-a' flag specified." unless defined $opt_a; > fail_usage "No '-n' flag specified." unless defined $opt_n; > fail_usage "No '-l' flag specified." unless defined $opt_l; > fail_usage "No '-p' flag specified." unless defined $opt_p; > fail_usage "Unrecognised action '$opt_o' for '-o' flag" > unless $opt_o =~ /^(Off|On|Reboot)$/i; > > if ( $opt_n =~ /(\d+):(\d+)/ ) { > $switchnum=($1); > $opt_n = ($2); > } >} else { > get_options_stdin(); > > fail "failed: no IP address" unless defined $opt_a; > fail "failed: no plug number" unless defined $opt_n; > fail "failed: no login name" unless defined $opt_l; > fail "failed: no password" unless defined $opt_p; > fail "failed: unrecognised action: $opt_o" > unless $opt_o =~ /^(Off|On|Reboot)$/i; >} > >$t->timeout($telnet_timeout); >$t->input_log($debuglog) if $opt_v; >$t->errmode('return'); > >&login; > >&identify_switch; > ># Abort on failure beyond here >$t->errmode(\&telnet_error); > >&navigate; >&action; > >exit 0; > > > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fence_apc_master.py_done URL: From jparsons at redhat.com Thu Jan 11 16:54:34 2007 From: jparsons at redhat.com (Jim Parsons) Date: Thu, 11 Jan 2007 11:54:34 -0500 Subject: [Linux-cluster] Fencing Problem On APC 79x0 References: <1168488605.26513.28.camel@kuli.magnifix.com.my> <45A6638D.8020501@xandros.com> Message-ID: <45A66BCA.9050408@redhat.com> James McOrmond wrote: > > > Ramon van Alteren wrote: > >>>Anyone have ever faced similar problem with me? Here I attached the apclog for reference. >>> >>> >>> >>we're running fencing through an apc 7920 which I assume is similar. >>I've hacked up the fence_apc until it worked, it accepts outlet names >>and numbers if I remember correctly. >>I've attached ours to the mail, it's originally from cluster-1.02 but >>works for me on cluster-1.03 as well. >> > Do you know if this is a general fence_apc issue? I've got a 7900 on > order to do some work with and i'm wondering if I should expect to > have to make these modifications? I maintain fence agents, and apc is one of out most stable agents. We just refactored it to allow for port aliasing and grouping, as well as support for master switch plus series. I have several apc switches of different flavors that I use daily in development clusters in our lab. I posted this latest version of the agent to this list about 20 minutes ago. Please try it out, if you can. -J From ramon at vanalteren.nl Thu Jan 11 16:40:25 2007 From: ramon at vanalteren.nl (Ramon van Alteren) Date: Thu, 11 Jan 2007 17:40:25 +0100 Subject: [Linux-cluster] Fencing Problem On APC 79x0 In-Reply-To: <45A6638D.8020501@xandros.com> References: <1168488605.26513.28.camel@kuli.magnifix.com.my> <45A6638D.8020501@xandros.com> Message-ID: <45A66879.9020305@vanalteren.nl> James McOrmond wrote: > > > Ramon van Alteren wrote: >>> Anyone have ever faced similar problem with me? Here I attached the apclog for reference. >>> >>> >> we're running fencing through an apc 7920 which I assume is similar. >> I've hacked up the fence_apc until it worked, it accepts outlet names >> and numbers if I remember correctly. >> I've attached ours to the mail, it's originally from cluster-1.02 but >> works for me on cluster-1.03 as well. > Do you know if this is a general fence_apc issue? I've got a 7900 on > order to do some work with and i'm wondering if I should expect to > have to make these modifications? Nope sorry, just own 7920's Regards, Ramon From erickson.jon at gmail.com Thu Jan 11 16:41:04 2007 From: erickson.jon at gmail.com (Jon Erickson) Date: Thu, 11 Jan 2007 11:41:04 -0500 Subject: [Linux-cluster] Cluster Project FAQ - GFS tuning section Message-ID: <6a90e4da0701110841x27694fa3l8e5df13550ea0792@mail.gmail.com> I have a couple of question regarding the Cluster Project FAQ ? GFS tuning section (http://sources.redhat.com/cluster/faq.html#gfs_tuning). First: - Use ?r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems. I noticed that when I used the ?r 2048 switch while creating my file system it ended up creating the file system with the 256MB resource group size. When I omitted the ?r flag the file system was created with 2048Mb resource group size. Is there a problem with the ?r flag, and does gfs_mkfs dynamically come up with the best resource group size based on your file system size? Another thing I did which ended up in a problem was executing the gfs_mkfs command while my current GFS file system was mounted. The command completed successfully but when I went into the mount point all the old files and directories still showed up. 
When I attempted to remove files bad things happened?I believe I received invalid metadata blocks error and the cluster went into an infinite loop trying to restart the service. I ended up fixing this problem by un-mounting my file system re-creating the GFS file system and re-mounting. This problem was caused by my user error, but maybe there should be some sort of check that determines whether the file system is currently mounted. Second: - Break file systems up when huge numbers of file are involved. This FAQ states that there is an amount of overhead when dealing with lots (millions) of files. What is a recommended limit of files in a file system? The theoretical limit of 8 exabytes for a file system does not seem at all realistic if you can't have (millions) of files in a file system. I just curious to see what everyone thinks about this. Thanks -- Jon From rpeterso at redhat.com Thu Jan 11 17:56:09 2007 From: rpeterso at redhat.com (Robert Peterson) Date: Thu, 11 Jan 2007 11:56:09 -0600 Subject: [Linux-cluster] Cluster Project FAQ - GFS tuning section In-Reply-To: <6a90e4da0701110841x27694fa3l8e5df13550ea0792@mail.gmail.com> References: <6a90e4da0701110841x27694fa3l8e5df13550ea0792@mail.gmail.com> Message-ID: <45A67A39.3090802@redhat.com> Jon Erickson wrote: > I have a couple of question regarding the Cluster Project FAQ ? GFS > tuning section (http://sources.redhat.com/cluster/faq.html#gfs_tuning). > > First: > - Use ?r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems. > I noticed that when I used the ?r 2048 switch while creating my file > system it ended up creating the file system with the 256MB resource > group size. When I omitted the ?r flag the file system was created > with 2048Mb resource group size. Is there a problem with the ?r flag, > and does gfs_mkfs dynamically come up with the best resource group > size based on your file system size? Another thing I did which ended > up in a problem was executing the gfs_mkfs command while my current > GFS file system was mounted. The command completed successfully but > when I went into the mount point all the old files and directories > still showed up. When I attempted to remove files bad things > happened?I believe I received invalid metadata blocks error and the > cluster went into an infinite loop trying to restart the service. I > ended up fixing this problem by un-mounting my file system re-creating > the GFS file system and re-mounting. This problem was caused by my > user error, but maybe there should be some sort of check that > determines whether the file system is currently mounted. > > Second: > - Break file systems up when huge numbers of file are involved. > This FAQ states that there is an amount of overhead when dealing with > lots (millions) of files. What is a recommended limit of files in a > file system? The theoretical limit of 8 exabytes for a file system > does not seem at all realistic if you can't have (millions) of files > in a file system. > > I just curious to see what everyone thinks about this. Thanks > > Hi Jon, The newer gfs_mkfs (gfs1) and mkfs.gfs2 (gfs2) in the CVS HEAD will create the RG size based on the size of the file system if "-r" is not specified, so that would explain why it used 2048 in the case where you didn't specify -r. The previous versions just always assumed 256MB unless -r was specified. If you specified -r 2048 and it used 256 for its rg size, that would be a bug. What version of the gfs_mkfs code were you running to get this? 
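(For reference, this is how the flag in question is normally passed; the cluster name, journal count and device below are made up for the example.)

# -r sets the resource group size in megabytes, -j the number of journals;
# "mycluster:scratch" and the device path are illustrative only.
gfs_mkfs -p lock_dlm -t mycluster:scratch -j 5 -r 2048 /dev/mapper/vg0-scratch

# and the version string Bob asks about:
gfs_mkfs -V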
I agree that it would be very nice if all the userspace GFS-related tools could make sure the file system is not mounted anywhere first before running. We even have a bugzilla from long ago about this regarding gfs_fsck: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156012 It's easy enough to check if the local node (the one running mkfs or fsck) has it mounted, but it's harder to figure out whether other nodes do because the userland tools can't assume access to the cluster infrastructure like the kernel code can. So I guess we haven't thought of an elegant solution to this yet; we almost need to query every node and check its cman_tool services output to see if it is using resources pertaining to the file system, but that would require some kind of socket or connection, (e.g. ssh) but what should it do when it can't contact a node that's powered off, etc? Regarding the number of files in a GFS file system: I don't have any kind of recommendations because I haven't studied the exact performance impact based on the number of inodes. It would be cool if someone could do some tests and see where the performance starts to degrade. The cluster team at Red Hat can work toward improving the performance of GFS (in fact, we are; hence the change to gfs_mkfs for the rg size), but many of the performance issues are already addressed with GFS2, and since GFS2 was accepted by the upstream linux kernel, in a way I think it makes more sense to focus more of our efforts there. One thing I thought about doing was trying to use btrees instead of linked lists for some of our more critical resources, like the RGs and the glocks. We'd have to figure out the impact of doing that; the overhead to do that might also impact performance. Just my $0.02. Regards, Bob Peterson Red Hat Cluster Suite From erickson.jon at gmail.com Thu Jan 11 18:48:38 2007 From: erickson.jon at gmail.com (Jon Erickson) Date: Thu, 11 Jan 2007 13:48:38 -0500 Subject: [Linux-cluster] Cluster Project FAQ - GFS tuning section In-Reply-To: <45A67A39.3090802@redhat.com> References: <6a90e4da0701110841x27694fa3l8e5df13550ea0792@mail.gmail.com> <45A67A39.3090802@redhat.com> Message-ID: <6a90e4da0701111048y82806b6qfbd472bd39e5debf@mail.gmail.com> Robert, > What version of the gfs_mkfs code were you running to get this? gfs_mkfs -V produced the following results: gfs_mkfs 6.1.6 (built May 9 2006 17:48:45) Copyright (C) Red Hat, Inc. 2004-2005 All rights reserved Thanks, Jon On 1/11/07, Robert Peterson wrote: > Jon Erickson wrote: > > I have a couple of question regarding the Cluster Project FAQ ? GFS > > tuning section (http://sources.redhat.com/cluster/faq.html#gfs_tuning). > > > > First: > > - Use ?r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems. > > I noticed that when I used the ?r 2048 switch while creating my file > > system it ended up creating the file system with the 256MB resource > > group size. When I omitted the ?r flag the file system was created > > with 2048Mb resource group size. Is there a problem with the ?r flag, > > and does gfs_mkfs dynamically come up with the best resource group > > size based on your file system size? Another thing I did which ended > > up in a problem was executing the gfs_mkfs command while my current > > GFS file system was mounted. The command completed successfully but > > when I went into the mount point all the old files and directories > > still showed up. 
When I attempted to remove files bad things > > happened?I believe I received invalid metadata blocks error and the > > cluster went into an infinite loop trying to restart the service. I > > ended up fixing this problem by un-mounting my file system re-creating > > the GFS file system and re-mounting. This problem was caused by my > > user error, but maybe there should be some sort of check that > > determines whether the file system is currently mounted. > > > > Second: > > - Break file systems up when huge numbers of file are involved. > > This FAQ states that there is an amount of overhead when dealing with > > lots (millions) of files. What is a recommended limit of files in a > > file system? The theoretical limit of 8 exabytes for a file system > > does not seem at all realistic if you can't have (millions) of files > > in a file system. > > > > I just curious to see what everyone thinks about this. Thanks > > > > > Hi Jon, > > The newer gfs_mkfs (gfs1) and mkfs.gfs2 (gfs2) in the CVS HEAD will > create the RG size based on the size of the file system if "-r" is not > specified, > so that would explain why it used 2048 in the case where you didn't > specify -r. > The previous versions just always assumed 256MB unless -r was specified. > > If you specified -r 2048 and it used 256 for its rg size, that would be > a bug. > What version of the gfs_mkfs code were you running to get this? > > I agree that it would be very nice if all the userspace GFS-related > tools could > make sure the file system is not mounted anywhere first before running. > We even have a bugzilla from long ago about this regarding gfs_fsck: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156012 > > It's easy enough to check if the local node (the one running mkfs or fsck) > has it mounted, but it's harder to figure out whether other nodes do because > the userland tools can't assume access to the cluster infrastructure > like the > kernel code can. So I guess we haven't thought of an elegant solution to > this yet; we almost need to query every node and check its cman_tool > services output to see if it is using resources pertaining to the file > system, > but that would require some kind of socket or connection, > (e.g. ssh) but what should it do when it can't contact a node that's powered > off, etc? > > Regarding the number of files in a GFS file system: I don't have any kind > of recommendations because I haven't studied the exact performance impact > based on the number of inodes. It would be cool if someone could do some > tests and see where the performance starts to degrade. > > The cluster team at Red Hat can work toward improving the performance > of GFS (in fact, we are; hence the change to gfs_mkfs for the rg size), > but many of the performance issues are already addressed with GFS2, > and since GFS2 was accepted by the upstream linux kernel, in a way > I think it makes more sense to focus more of our efforts there. > > One thing I thought about doing was trying to use btrees instead of > linked lists for some of our more critical resources, like the RGs and > the glocks. We'd have to figure out the impact of doing that; the overhead > to do that might also impact performance. Just my $0.02. > > Regards, > > Bob Peterson > Red Hat Cluster Suite > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Jon From cjk at techma.com Thu Jan 11 18:53:34 2007 From: cjk at techma.com (Kovacs, Corey J.) 
Date: Thu, 11 Jan 2007 13:53:34 -0500 Subject: [Linux-cluster] GFS+EXT3 via NFS? Message-ID: <7DCE72B3C36E2A45B7580F887EE4948C18FD84@tmaemail.techma.com> We have a 5 node cluster that is exporting several GFS and EXT3 filesystems distributed among 5 individual services each with it's own failover group etc. For the most part, things work fine. However, sometimes when we move these services around, the node that recieves the service doesn't re-export the filesystems and clients get stale handles until we manually execute "exportfs -ra" to clear this up. Right now each NFS service exports both GFS and EXT3 filesystems concurrently. There is some thought about seperating the filesystems so that a service only exports GFS OR EXT3, buit not both. We'd like some input though to see if this might really be the problem, or maybe something along these lines etc. My "gut" feeling is that since a service is exporting a GFS filesystem, there may be a built in assumption that the filesystem is exported via /etc/exports and that the only thing transitioning is the IP address as per the unofficial NFS cookbook. The cluster is RHEL4.4.2 and the associated cluster/gfs packages. We have a patched kernel based on the patches for the bnx2 driver which are the only changes to the kernel which also happens to keep it from crashing :) Our hardware is HP DL360-G5 machines connected to an EVA8000 SAN via qlogic FC cards. Any input is appreciated. Thanks Corey Kovacs Senior Systems Engineer Technology Management Associates 703.279.6168 (B) 855-6168 (R) -------------- next part -------------- An HTML attachment was scrubbed... URL: From isplist at logicore.net Thu Jan 11 20:21:04 2007 From: isplist at logicore.net (isplist at logicore.net) Date: Thu, 11 Jan 2007 14:21:04 -0600 Subject: [Linux-cluster] Joomla MySQL cluster? Message-ID: <200711114214.669265@leena> Anyone working with Joomla and a GFS based MySQL cluster? Mike From lgodoy at atichile.com Thu Jan 11 20:38:02 2007 From: lgodoy at atichile.com (Luis Godoy Gonzalez) Date: Thu, 11 Jan 2007 17:38:02 -0300 Subject: [Linux-cluster] Unexpected service restart In-Reply-To: <83A2D7D7-125F-4586-A4AE-A0BB37F78ADD@souja.net> References: <45A50282.7060205@fis.unical.it> <83A2D7D7-125F-4586-A4AE-A0BB37F78ADD@souja.net> Message-ID: <45A6A02A.10709@atichile.com> Hi I have a problem with a service (oracle ) , this service is restarted by clurgmgrd without a error message. The "message" log show: ============================================================= Jan 8 04:50:50 eir-db1 clurgmgrd: [2922]: Executing /opt/oracle/OraHome10g/bin/oracle_mgr.sh status ... Jan 8 04:51:20 eir-db1 clurgmgrd: [2922]: Executing /opt/oracle/OraHome10g/bin/oracle_mgr.sh status ... Jan 8 04:51:40 eir-db1 clurgmgrd[2922]: Stopping service Oracle Jan 8 04:51:40 eir-db1 clurgmgrd: [2922]: Removing IPv4 address xxx.xxx.xxx.xxx from bond1 ... Jan 8 04:51:50 eir-db1 clurgmgrd: [2922]: Executing /opt/oracle/OraHome10g/bin/oracle_mgr.sh stop ... Jan 8 04:52:27 eir-db1 clurgmgrd: [2922]: Removing IPv4 address yyy.yyy.yyy.yyy from bond0 Jan 8 04:52:37 eir-db1 clurgmgrd: [2922]: unmounting /data Jan 8 04:52:38 eir-db1 clurgmgrd[2922]: Service Oracle is recovering Jan 8 04:52:38 eir-db1 clurgmgrd[2922]: Recovering failed service Oracle Jan 8 04:52:38 eir-db1 clurgmgrd: [2922]: mounting /dev/cciss/c1d0p1 on /data Jan 8 04:52:38 eir-db1 kernel: kjournald starting. 
Commit interval 5 seconds Jan 8 04:52:38 eir-db1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended Jan 8 04:52:38 eir-db1 kernel: EXT3 FS on cciss/c1d0p1, internal journal Jan 8 04:52:38 eir-db1 kernel: EXT3-fs: mounted filesystem with ordered data mode. Jan 8 04:52:38 eir-db1 clurgmgrd: [2922]: Adding IPv4 address xxx.xxx.xxx.xxx to bond0 Jan 8 04:52:39 eir-db1 clurgmgrd: [2922]: Executing /opt/oracle/OraHome10g/bin/oracle_mgr.sh start ... Jan 8 04:52:47 eir-db1 su(pam_unix)[27069]: session closed for user oracle Jan 8 04:52:47 eir-db1 clurgmgrd: [2922]: Adding IPv4 address yyy.yyy.yyy.yyy to bond1 Jan 8 04:52:48 eir-db1 clurgmgrd[2922]: Service Oracle started Jan 8 04:52:50 eir-db1 clurgmgrd: [2922]: Executing /opt/oracle/OraHome10g/bin/oracle_mgr.sh status ... Jan 8 04:53:20 eir-db1 clurgmgrd: [2922]: Executing /opt/oracle/OraHome10g/bin/oracle_mgr.sh status ============================================================================ the "/opt/oracle/OraHome10g/bin/oracle_mgr.sh status" NOT fail ( this is a basic test), so I don't understand the reason for the restart of services. The service configuration is: ==============================================================================================