From wcheng at redhat.com Tue Jan 1 17:00:32 2008
From: wcheng at redhat.com (Wendy Cheng)
Date: Tue, 01 Jan 2008 12:00:32 -0500
Subject: [Linux-cluster] GFS performance
In-Reply-To:
References: <1198770380.4932.23.camel@WSBID06223>
Message-ID: <477A71B0.1080804@redhat.com>

Kamal Jain wrote:
> A challenge we're dealing with is a massive number of small files, so
> there is a lot of file-level overhead, and as you saw in the
> charts...the random reads and writes were not friends of GFS.
>
It is expected that GFS2 would do better in this area but this does
*not* imply GFS(1) is not fixable. One thing would be helpful is sending
us the benchmark (or test program that can reasonably represent your
application IO patterns) you used to generate the performance data. Then
we'll see what can be done from there ....

-- Wendy

From jos at xos.nl Tue Jan 1 17:34:18 2008
From: jos at xos.nl (Jos Vos)
Date: Tue, 1 Jan 2008 18:34:18 +0100
Subject: [Linux-cluster] GFS performance
In-Reply-To: <477A71B0.1080804@redhat.com>
References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com>
Message-ID: <20080101173418.GB27030@jasmine.xos.nl>

On Tue, Jan 01, 2008 at 12:00:32PM -0500, Wendy Cheng wrote:
> It is expected that GFS2 would do better in this area but this does
> *not* imply GFS(1) is not fixable. One thing would be helpful is sending
> us the benchmark (or test program that can reasonably represent your
> application IO patterns) you used to generate the performance data. Then
> we'll see what can be done from there ....

Take a typical public mirror tree (like Fedora, but FreeBSD gives you
even more fun, as it has *huge* directories), start the rsync service
and let a bunch of clients rsync some trees.

--
-- Jos Vos
-- X/OS Experts in Open Systems BV | Phone: +31 20 6938364
-- Amsterdam, The Netherlands | Fax: +31 20 6948204

From wcheng at redhat.com Tue Jan 1 17:56:12 2008
From: wcheng at redhat.com (Wendy Cheng)
Date: Tue, 01 Jan 2008 12:56:12 -0500
Subject: [Linux-cluster] gfs2 hang
In-Reply-To: <20071228085734.GA23405@jasmine.xos.nl>
References: <477419CF.1040002@cgl.ucsf.edu> <20071227214025.GB16736@jasmine.xos.nl> <47741D14.7020109@cgl.ucsf.edu> <20071228085734.GA23405@jasmine.xos.nl>
Message-ID: <477A7EBC.7060201@redhat.com>

Jos Vos wrote:
>
>The one thing that's horribly wrong in some applications is performance.
>If you need to have large amounts of files and frequent directory scans
>(i.e. rsync etc.), you're lost.
>
>
>
On GFS(1) part, the glock trimming patch
(http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4)
was developed for customers with rsync issues. Field data have shown
positive results. It is released on RHEL 5.1, as well on RHEL 4.6. Check
out the usage part of above write-up.

-- Wendy

From jos at xos.nl Tue Jan 1 17:44:22 2008
From: jos at xos.nl (Jos Vos)
Date: Tue, 1 Jan 2008 18:44:22 +0100
Subject: [Linux-cluster] gfs2 hang
In-Reply-To: <477A7EBC.7060201@redhat.com>
References: <477419CF.1040002@cgl.ucsf.edu> <20071227214025.GB16736@jasmine.xos.nl> <47741D14.7020109@cgl.ucsf.edu> <20071228085734.GA23405@jasmine.xos.nl> <477A7EBC.7060201@redhat.com>
Message-ID: <20080101174422.GD27030@jasmine.xos.nl>

On Tue, Jan 01, 2008 at 12:56:12PM -0500, Wendy Cheng wrote:
> On GFS(1) part, the glock trimming patch
> (http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4)
> was developed for customers with rsync issues. Field data have shown
> positive results. It is released on RHEL 5.1, as well on RHEL 4.6. Check
> out the usage part of above write-up.

Checking the 5.1 behavior is on my todo-list... will post the results
afterwards. Current experiences are based on 5.0, yes.

--
-- Jos Vos
-- X/OS Experts in Open Systems BV | Phone: +31 20 6938364
-- Amsterdam, The Netherlands | Fax: +31 20 6948204

From raycharles_man at yahoo.com Wed Jan 2 03:09:21 2008
From: raycharles_man at yahoo.com (Ray Charles)
Date: Tue, 1 Jan 2008 19:09:21 -0800 (PST)
Subject: [Linux-cluster] clvmd fails to start on second node.
Message-ID: <580534.86509.qm@web32111.mail.mud.yahoo.com>

Hi,

I am following the directions of the VMClusterCookbook and have the
early makings of a virtualized two node cluster. On both guest-nodes I
can initialize cman and achieve quorate. However, things go sideways
when I try to initialize clvmd on both nodes. The first node brings up
clvmd cleanly but the second reports FAILED upon coming up and the
volumes I wish to mount become inaccessible. Further, I've no
firewalling on the guests. I've read a similar post but that was from
many months ago, on older version not much use. Logs and output below..

Thanks in advance for any responses!

From the consoles..

[root at vsp07 ~]# /etc/init.d/clvmd start
Starting clvmd: dlm: Using TCP for communications [ OK ]
Activating VGs: 2 logical volume(s) in volume group "VolGroup00" now active [ OK ]
[root at vsp07 ~]#

[root at vsp08 ~]# /etc/init.d/clvmd start
Starting clvmd: clvmd startup timed out [FAILED]

From the /var/log/messages...

[root at vsp07 ~]# tail -f /var/log/messages
Jan 1 19:28:13 vsp07 kernel: dlm: Using TCP for communications
Jan 1 19:28:14 vsp07 clvmd: Cluster LVM daemon started - connected to CMAN
Jan 1 19:28:27 vsp07 kernel: dlm: connecting to 2
Jan 1 19:28:27 vsp07 kernel: dlm: connect from non cluster node

[root at vsp08 ~]# tail -f /var/log/messages
Jan 1 19:28:27 vsp08 kernel: dlm: Using TCP for communications
Jan 1 19:28:27 vsp08 kernel: dlm: connect from non cluster node
Jan 1 19:28:27 vsp08 kernel: dlm: connecting to 1

Here is my cluster.conf file for the guests CentOS-5.1..

____________________________________________________________________________________
Never miss a thing. Make Yahoo your home page.
http://www.yahoo.com/r/hs

From pcaulfie at redhat.com Wed Jan 2 07:55:39 2008
From: pcaulfie at redhat.com (Patrick Caulfeld)
Date: Wed, 02 Jan 2008 07:55:39 +0000
Subject: [Linux-cluster] clvmd fails to start on second node.
In-Reply-To: <580534.86509.qm@web32111.mail.mud.yahoo.com>
References: <580534.86509.qm@web32111.mail.mud.yahoo.com>
Message-ID: <477B437B.6020503@redhat.com>

Ray Charles wrote:
>
> Hi,
>
> I am following the directions of the VMClusterCookbook
> and have the early makings of a virtualized two node
> cluster. On both guest-nodes I can initialize cman and
> achieve quorate. However, things go sideways when I
> try to initialize clvmd on both nodes. The first node
> brings up clvmd cleanly but the second reports FAILED
> upon coming up and the volumes I wish to mount become
> inaccessible. Further, I've no firewalling on the
> guests. I've read a similar post but that was from
> many months ago, on older version not much use. Logs
> and output below..
>
> Thanks in advance for any responses!
>
> From the consoles..
> > [root at vsp07 ~]# /etc/init.d/clvmd start > Starting clvmd: dlm: Using TCP for communications > [ OK ] > Activating VGs: 2 logical volume(s) in volume group > "VolGroup00" now active > [ OK ] > [root at vsp07 ~]# > > > [root at vsp08 ~]# /etc/init.d/clvmd start > Starting clvmd: clvmd startup timed out > > [FAILED] > >>From the /var/log/messages... > > [root at vsp07 ~]# tail -f /var/log/messages > Jan 1 19:28:13 vsp07 kernel: dlm: Using TCP for > communications > Jan 1 19:28:14 vsp07 clvmd: Cluster LVM daemon > started - connected to CMAN > Jan 1 19:28:27 vsp07 kernel: dlm: connecting to 2 > Jan 1 19:28:27 vsp07 kernel: dlm: connect from non > cluster node > > [root at vsp08 ~]# tail -f /var/log/messages > Jan 1 19:28:27 vsp08 kernel: dlm: Using TCP for > communications > Jan 1 19:28:27 vsp08 kernel: dlm: connect from non > cluster node > Jan 1 19:28:27 vsp08 kernel: dlm: connecting to 1 > That looks like an old and buggy dlm kernel module. I don't know off-hand what the version numbers are, but see if you can find an updated version. Patrick From swhiteho at redhat.com Wed Jan 2 09:21:22 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 02 Jan 2008 09:21:22 +0000 Subject: [Linux-cluster] gfs2 hang In-Reply-To: <477419CF.1040002@cgl.ucsf.edu> References: <477419CF.1040002@cgl.ucsf.edu> Message-ID: <1199265682.22038.15.camel@quoit> Hi, On Thu, 2007-12-27 at 13:31 -0800, Scooter Morris wrote: > Greetings, > We've got a two-node cluster running RHEL 5.1 that we've been > experimenting with and have discovered a problem with gfs2. As part of > our build environment, we have some find scripts that walk a directory tree: > > #! /bin/sh > for html in `/usr/bin/find curGenerated -name \*.html -print` ; do \ > cat $html > tmpCR.html ; \ > /bin/mv tmpCR.html $html ; \ > done > > The curGenerated directory has about 141 subdirectories, each of which > has from 2-10 subdirectories. What we find is that this find script > will hang the operating system when it is executed within a gfs2 > partition that is shared between the two nodes. Fencing is configured > and detects the hung node and restarts it, but that's not much of a > consolation. The gfs2 partition lives on a fibreChannel array (HP > EVA5000), and quotas are not turned on. The gfs2 filesystem continues > to operate normally on the other node. > > Is this a known bug in gfs2? Is there something we could do to help > find this problem? > > Thanks! > > -- scooter > I think this is probably a known bug, bz #404711 which is fixed in upstream and also for 5.2. It triggers when rename is called in the situation where it needs to allocate an extra block for the directory and also there is a target file being unlinked, and also where both of these operations happen to occur in the same resource group. If this doesn't turn out to be the case, then please file a bugzilla, Steve. From janne.peltonen at helsinki.fi Wed Jan 2 11:37:35 2008 From: janne.peltonen at helsinki.fi (Janne Peltonen) Date: Wed, 2 Jan 2008 13:37:35 +0200 Subject: [Linux-cluster] #48: Unable to obtain cluster lock: Invalid argument Message-ID: <20080102113734.GV19197@helsinki.fi> Hi. 
After running a cluster node in a production cluster since July, I got the folllowing error: #48: Unable to obtain cluster lock: Invalid argument Which resulted in a reboot: --clip-- Dec 27 02:50:31 pcn1 clurgmgrd[6217]: #48: Unable to obtain cluster lock: Invalid argument Dec 27 02:50:31 pcn1 clurgmgrd[6217]: Stopping service service:p01 Dec 27 02:50:34 pcn1 in.rdiscd[30325]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use Dec 27 02:50:34 pcn1 in.rdiscd[30325]: Failed joining addresses Dec 27 02:50:38 pcn1 snmpd[15929]: error on subcontainer 'ia_addr' insert (-1) Dec 27 02:50:38 pcn1 snmpd[15929]: error on subcontainer 'ia_addr' insert (-1) Dec 27 02:50:38 pcn1 snmpd[15929]: error on subcontainer '' insert (-1) Dec 27 02:50:38 pcn1 snmpd[15929]: error on subcontainer '' insert (-1) Dec 27 02:50:45 pcn1 clurgmgrd[6217]: Service service:p01 is recovering Dec 27 02:50:45 pcn1 clurgmgrd[6217]: Recovering failed service service:p01 Dec 27 02:50:45 pcn1 kernel: dlm: add_to_waiters error 1 Dec 27 02:50:45 pcn1 kernel: dlm: remove_from_waiters error Dec 27 02:50:45 pcn1 kernel: dlm: rgmanager: receive_unlock_reply not on waiters Dec 27 02:50:45 pcn1 clurgmgrd[6216]: Watchdog: Daemon died, rebooting... Dec 27 02:50:45 pcn1 kernel: md: stopping all md devices. Dec 27 02:55:23 pcn1 syslogd 1.4.1: restart. --clip-- Other members of the cluster noticed the missing member, fenced it, failed services over, and back (when the missing node had rejoined): --clip-- Dec 27 02:50:56 pcn2 openais[4588]: [TOTEM] The token was lost in the OPERATIONAL state. Dec 27 02:50:56 pcn2 openais[4588]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes). Dec 27 02:50:56 pcn2 openais[4588]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Dec 27 02:50:56 pcn2 openais[4588]: [TOTEM] entering GATHER state from 2. Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] entering GATHER state from 11. Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] Saving state aru 6a4 high seq received 6a4 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] entering COMMIT state. Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] entering RECOVERY state. 
Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] position [0] member 10.3.0.10: Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] previous ring seq 324 rep 10.3.0.10 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] aru 6a4 high delivered 6a4 received flag 0 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] position [1] member 10.3.0.12: Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] previous ring seq 324 rep 10.3.0.10 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] aru 6a4 high delivered 6a4 received flag 0 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] position [2] member 10.3.0.13: Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] previous ring seq 324 rep 10.3.0.10 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] aru 6a4 high delivered 6a4 received flag 0 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] position [3] member 10.3.0.14: Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] previous ring seq 324 rep 10.3.0.10 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] aru 6a4 high delivered 6a4 received flag 0 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] position [4] member 10.3.0.15: Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] previous ring seq 324 rep 10.3.0.10 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] aru 6a4 high delivered 6a4 received flag 1 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] position [5] member 10.3.0.16: Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] previous ring seq 324 rep 10.3.0.10 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] aru 6a4 high delivered 6a4 received flag 1 Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] Did not need to originate any messages in recovery. Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] Storing new sequence id for ring 14c Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] New Configuration: Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.10) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.12) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.13) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.14) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.15) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.16) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] Members Left: Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.11) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] Members Joined: Dec 27 02:51:01 pcn2 openais[4588]: [SYNC ] This node is within the primary component and will provide se rvice. Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] New Configuration: Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.10) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.12) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.13) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.14) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.15) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.16) Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] Members Left: Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] Members Joined: Dec 27 02:51:01 pcn2 openais[4588]: [SYNC ] This node is within the primary component and will provide se rvice. Dec 27 02:51:01 pcn2 openais[4588]: [TOTEM] entering OPERATIONAL state. 
Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.10 Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.12 Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.13 Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.14 Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.15 Dec 27 02:51:01 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.16 Dec 27 02:51:01 pcn2 openais[4588]: [CPG ] got joinlist message from node 3 Dec 27 02:51:01 pcn2 openais[4588]: [CPG ] got joinlist message from node 4 Dec 27 02:51:01 pcn2 openais[4588]: [CPG ] got joinlist message from node 5 Dec 27 02:51:01 pcn2 openais[4588]: [CPG ] got joinlist message from node 6 Dec 27 02:51:01 pcn2 openais[4588]: [CPG ] got joinlist message from node 100 Dec 27 02:51:01 pcn2 openais[4588]: [CPG ] got joinlist message from node 2 Dec 27 02:51:01 pcn2 kernel: dlm: closing connection to node 1 Dec 27 02:51:01 pcn2 fenced[4614]: pcn1-hb not a cluster member after 0 sec post_fail_delay Dec 27 02:51:01 pcn2 fenced[4614]: fencing node "pcn1-hb" Dec 27 02:52:13 pcn2 fenced[4614]: fence "pcn1-hb" success Dec 27 02:52:18 pcn2 ccsd[4541]: Attempt to close an unopened CCS descriptor (799075500). Dec 27 02:52:18 pcn2 ccsd[4541]: Error while processing disconnect: Invalid request descriptor Dec 27 02:52:20 pcn2 clurgmgrd[6262]: Taking over service service:p01 from down member pcn1-hb Dec 27 02:52:20 pcn2 clurgmgrd[6262]: Taking over service service:i01 from down member pcn1-hb Dec 27 02:52:20 pcn2 kernel: kjournald starting. Commit interval 5 seconds Dec 27 02:52:20 pcn2 kernel: EXT3 FS on dm-65, internal journal Dec 27 02:52:20 pcn2 kernel: EXT3-fs: mounted filesystem with ordered data mode. Dec 27 02:52:21 pcn2 clurgmgrd[6262]: Taking over service service:i13 from down member pcn1-hb Dec 27 02:52:21 pcn2 in.rdiscd[2158]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use Dec 27 02:52:21 pcn2 in.rdiscd[2158]: Failed joining addresses Dec 27 02:52:22 pcn2 kernel: kjournald starting. Commit interval 5 seconds Dec 27 02:52:22 pcn2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended Dec 27 02:52:22 pcn2 kernel: EXT3 FS on dm-14, internal journal Dec 27 02:52:22 pcn2 kernel: EXT3-fs: recovery complete. Dec 27 02:52:22 pcn2 kernel: EXT3-fs: mounted filesystem with ordered data mode. Dec 27 02:52:24 pcn2 clurgmgrd[6262]: Service service:p01 started Dec 27 02:52:25 pcn2 last message repeated 2 times Dec 27 02:52:27 pcn2 kernel: kjournald starting. Commit interval 5 seconds Dec 27 02:52:27 pcn2 kernel: EXT3 FS on dm-2, internal journal Dec 27 02:52:27 pcn2 kernel: EXT3-fs: dm-2: 3 orphan inodes deleted Dec 27 02:52:27 pcn2 kernel: EXT3-fs: recovery complete. Dec 27 02:52:27 pcn2 kernel: EXT3-fs: mounted filesystem with ordered data mode. Dec 27 02:52:29 pcn2 kernel: kjournald starting. Commit interval 5 seconds Dec 27 02:52:29 pcn2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended Dec 27 02:52:29 pcn2 kernel: EXT3 FS on dm-38, internal journal Dec 27 02:52:29 pcn2 kernel: EXT3-fs: recovery complete. Dec 27 02:52:29 pcn2 kernel: EXT3-fs: mounted filesystem with ordered data mode. Dec 27 02:52:30 pcn2 in.rdiscd[3313]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use Dec 27 02:52:30 pcn2 in.rdiscd[3313]: Failed joining addresses Dec 27 02:52:32 pcn2 clurgmgrd[6262]: Service service:i13 started Dec 27 02:52:35 pcn2 kernel: kjournald starting. 
Commit interval 5 seconds Dec 27 02:52:35 pcn2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended Dec 27 02:52:35 pcn2 kernel: EXT3 FS on dm-26, internal journal Dec 27 02:52:35 pcn2 kernel: EXT3-fs: recovery complete. Dec 27 02:52:35 pcn2 kernel: EXT3-fs: mounted filesystem with ordered data mode. Dec 27 02:52:37 pcn2 in.rdiscd[3833]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use Dec 27 02:52:37 pcn2 in.rdiscd[3833]: Failed joining addresses Dec 27 02:52:38 pcn2 clurgmgrd[6262]: Service service:i01 started Dec 27 02:53:25 pcn2 last message repeated 2 times Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] entering GATHER state from 11. Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] Saving state aru c8 high seq received c8 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] entering COMMIT state. Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] entering RECOVERY state. Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] position [0] member 10.3.0.10: Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] position [1] member 10.3.0.11: Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] previous ring seq 288 rep 10.3.0.11 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] aru 9 high delivered 9 received flag 0 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] position [2] member 10.3.0.12: Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] position [3] member 10.3.0.13: Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] position [4] member 10.3.0.14: Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] position [5] member 10.3.0.15: Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] aru c8 high delivered c8 received flag 1 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] position [6] member 10.3.0.16: Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] aru c8 high delivered c8 received flag 1 Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] Did not need to originate any messages in recovery. Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] Storing new sequence id for ring 150 Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] New Configuration: Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.10) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.12) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.13) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.14) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.15) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] Members Left: Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] Members Joined: Dec 27 02:55:26 pcn2 openais[4588]: [SYNC ] This node is within the primary component and will provide se rvice. 
Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] New Configuration: Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.10) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.11) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.12) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.13) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.14) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.15) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.16) Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] Members Left: Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] Members Joined: Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] r(0) ip(10.3.0.11) Dec 27 02:55:26 pcn2 openais[4588]: [SYNC ] This node is within the primary component and will provide se rvice. Dec 27 02:55:26 pcn2 openais[4588]: [TOTEM] entering OPERATIONAL state. Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.10 Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.11 Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.12 Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.13 Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.14 Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.15 Dec 27 02:55:26 pcn2 openais[4588]: [CLM ] got nodejoin message 10.3.0.16 Dec 27 02:55:26 pcn2 openais[4588]: [CPG ] got joinlist message from node 100 Dec 27 02:55:26 pcn2 openais[4588]: [CPG ] got joinlist message from node 2 Dec 27 02:55:26 pcn2 openais[4588]: [CPG ] got joinlist message from node 3 Dec 27 02:55:26 pcn2 openais[4588]: [CPG ] got joinlist message from node 4 Dec 27 02:55:26 pcn2 openais[4588]: [CPG ] got joinlist message from node 5 Dec 27 02:55:26 pcn2 openais[4588]: [CPG ] got joinlist message from node 6 Dec 27 02:55:35 pcn2 kernel: dlm: connecting to 1 --clip-- --clip-- Dec 27 02:55:24 pcn1 ccsd[4132]: Starting ccsd 2.0.69: Dec 27 02:55:24 pcn1 ccsd[4132]: Built: Jun 27 2007 15:21:32 Dec 27 02:55:24 pcn1 ccsd[4132]: Copyright (C) Red Hat, Inc. 2004 All rights reserved. Dec 27 02:55:24 pcn1 ccsd[4132]: cluster.conf (cluster name = mappi-primary, version = 109) found. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] AIS Executive Service RELEASE 'subrev 1324 version 0.80.2' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contribu tors. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Copyright (C) 2006 Red Hat, Inc. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] AIS Executive Service: started and ready to provide service. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Using default multicast address of 239.192.46.199 Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_cpg loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais cluster closed process g roup service v1.01' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_cfg loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais configuration service' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_msg loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais message service B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_lck loaded. 
Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais distributed locking serv ice B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_evt loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais event service B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_ckpt loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais checkpoint service B.01. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_amf loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais availability management framework B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_clm loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais cluster membership servi ce B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_evs loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais extended virtual synchro ny service' Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] openais component openais_cman loaded. Dec 27 02:55:26 pcn1 openais[4143]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01' Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] window size per rotation (50 messages) maximum messages per r otation (17 messages) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] send threads (0 threads) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] RRP token expired timeout (495 ms) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] RRP token problem counter (2000 ms) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] RRP threshold (10 problem count) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] RRP mode set to none. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] heartbeat_failures_allowed (0) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] max_network_delay (50 ms) Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allow ed > 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes). Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] The network interface [10.3.0.11] is now up. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Created or loaded sequence id 284.10.3.0.11 for this ring. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering GATHER state from 15. 
Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais extended virtual synchr ony service' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais cluster membership serv ice B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais availability management framework B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais checkpoint service B.01 .01' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais event service B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais distributed locking ser vice B.01.01' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais message service B.01.01 ' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais configuration service' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01' Dec 27 02:55:26 pcn1 openais[4143]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01' Dec 27 02:55:26 pcn1 openais[4143]: [CMAN ] CMAN 2.0.69 (built Jun 27 2007 15:21:36) started Dec 27 02:55:26 pcn1 openais[4143]: [SYNC ] Not using a virtual synchrony filter. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Creating commit token because I am the rep. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Saving state aru 0 high seq received 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering COMMIT state. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering RECOVERY state. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [0] member 10.3.0.11: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 284 rep 10.3.0.11 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru 0 high delivered 0 received flag 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Did not need to originate any messages in recovery. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Storing new sequence id for ring 120 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Sending initial ORF token Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] New Configuration: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Left: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Joined: Dec 27 02:55:26 pcn1 openais[4143]: [SYNC ] This node is within the primary component and will provide se rvice. Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] New Configuration: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.11) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Left: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Joined: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.11) Dec 27 02:55:26 pcn1 openais[4143]: [SYNC ] This node is within the primary component and will provide se rvice. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering OPERATIONAL state. Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.11 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering GATHER state from 11. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Saving state aru 9 high seq received 9 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering COMMIT state. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering RECOVERY state. 
Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [0] member 10.3.0.10: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [1] member 10.3.0.11: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 288 rep 10.3.0.11 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru 9 high delivered 9 received flag 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [2] member 10.3.0.12: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [3] member 10.3.0.13: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [4] member 10.3.0.14: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru c8 high delivered c8 received flag 0 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [5] member 10.3.0.15: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru c8 high delivered c8 received flag 1 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] position [6] member 10.3.0.16: Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] previous ring seq 332 rep 10.3.0.10 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] aru c8 high delivered c8 received flag 1 Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Did not need to originate any messages in recovery. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] Storing new sequence id for ring 150 Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] New Configuration: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.11) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Left: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Joined: Dec 27 02:55:26 pcn1 openais[4143]: [SYNC ] This node is within the primary component and will provide se rvice. Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] CLM CONFIGURATION CHANGE Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] New Configuration: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.10) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.11) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.12) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.13) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.14) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.15) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.16) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Left: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] Members Joined: Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.10) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.12) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.13) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.14) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.15) Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] r(0) ip(10.3.0.16) Dec 27 02:55:26 pcn1 openais[4143]: [SYNC ] This node is within the primary component and will provide se rvice. Dec 27 02:55:26 pcn1 openais[4143]: [TOTEM] entering OPERATIONAL state. 
Dec 27 02:55:26 pcn1 openais[4143]: [CMAN ] quorum regained, resuming activity Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.10 Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.11 Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.12 Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.13 Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.14 Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.15 Dec 27 02:55:26 pcn1 openais[4143]: [CLM ] got nodejoin message 10.3.0.16 Dec 27 02:55:26 pcn1 openais[4143]: [CPG ] got joinlist message from node 100 Dec 27 02:55:26 pcn1 openais[4143]: [CPG ] got joinlist message from node 2 Dec 27 02:55:26 pcn1 openais[4143]: [CPG ] got joinlist message from node 3 Dec 27 02:55:26 pcn1 openais[4143]: [CPG ] got joinlist message from node 4 Dec 27 02:55:26 pcn1 openais[4143]: [CPG ] got joinlist message from node 5 Dec 27 02:55:26 pcn1 openais[4143]: [CPG ] got joinlist message from node 6 Dec 27 02:55:26 pcn1 ccsd[4132]: Initial status:: Quorate Dec 27 02:55:35 pcn1 kernel: dlm: got connection from 100 Dec 27 02:55:35 pcn1 kernel: dlm: got connection from 2 Dec 27 02:55:35 pcn1 kernel: dlm: got connection from 3 Dec 27 02:55:35 pcn1 kernel: dlm: got connection from 5 Dec 27 02:55:35 pcn1 kernel: dlm: got connection from 6 Dec 27 02:55:35 pcn1 kernel: dlm: got connection from 4 Dec 27 02:55:35 pcn1 clvmd: Cluster LVM daemon started - connected to CMAN Dec 27 03:01:04 pcn1 clurgmgrd[5515]: Starting stopped service service:i03 Dec 27 03:01:04 pcn1 clurgmgrd[5515]: Starting stopped service service:i15 [etc] --clip-- Now I tried googling around for the mysterious error message #48, and couldn't find any info. What might've been up? --Janne -- Janne Peltonen From kjain at aurarianetworks.com Wed Jan 2 14:31:13 2008 From: kjain at aurarianetworks.com (Kamal Jain) Date: Wed, 2 Jan 2008 09:31:13 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: <477A71B0.1080804@redhat.com> References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> Message-ID: Hi Wendy, IOZONE v3.283 was used to generate the results I posted. An example invocation line [for the IOPS result]: ./iozone -O -l 1 -u 8 -T -b /root/iozone_IOPS_1_TO_8_THREAD_1_DISK_ISCSI_DIRECT.xls -F /mnt/iscsi_direct1/iozone/iozone1.tmp ... It's for 1 to 8 threads, and I provided 8 file names through I'm only showing one in the line above. The file destinations were on the same disk for a single disk test, and on alternating disks for a 2-disk test. I believe IOZONE uses a simple random string, repeated in certain default record sizes, when performing its various operations. - K -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wendy Cheng Sent: Tuesday, January 01, 2008 12:01 PM To: linux clustering Subject: Re: [Linux-cluster] GFS performance Kamal Jain wrote: > A challenge we're dealing with is a massive number of small files, so > there is a lot of file-level overhead, and as you saw in the > charts...the random reads and writes were not friends of GFS. > It is expected that GFS2 would do better in this area butt this does *not* imply GFS(1) is not fixable. One thing would be helpful is sending us the benchmark (or test program that can reasonably represent your application IO patterns) you used to generate the performance data. Then we'll see what can be done from there .... 
-- Wendy -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From lhh at redhat.com Wed Jan 2 16:08:57 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 02 Jan 2008 11:08:57 -0500 Subject: [Linux-cluster] #48: Unable to obtain cluster lock: Invalid argument In-Reply-To: <20080102113734.GV19197@helsinki.fi> References: <20080102113734.GV19197@helsinki.fi> Message-ID: <1199290137.5980.24.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-01-02 at 13:37 +0200, Janne Peltonen wrote: > Hi. > > After running a cluster node in a production cluster since July, I got > the folllowing error: > > #48: Unable to obtain cluster lock: Invalid argument What version of rgmanager was it? -- Lon From lhh at redhat.com Wed Jan 2 16:12:28 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 02 Jan 2008 11:12:28 -0500 Subject: [Linux-cluster] RH Cluster issue-Network Failover not happening In-Reply-To: <002901c8451a$95738020$4e030196@mhd.co.om> References: <002901c8451a$95738020$4e030196@mhd.co.om> Message-ID: <1199290348.5980.29.camel@ayanami.boston.devel.redhat.com> On Sun, 2007-12-23 at 08:16 +0400, Harun wrote: > Issue: When network cable is disconnected from the Primary, primary restart > unclean and the failover to secondary do not happens. The shared drives > don't get mounted automatically for secondary neither gets it mounted on > primary, after the primary restarts. I have to then manually shut down both > Primary and Secondary, and start primary first and then secondary for the > setup to work fine again. > I want to test a live production setup... a Linux Cluster with 2 nodes, in > Linux Advanced Server (Linux DB-Primary 2.4.21-37.ELsmp #1 SMP Wed Sep 7 > 13:28:55 EDT 2005 i686 i686 i386 GNU/Linux ), Oracle data base is running on > this setup. > > The clumanager version is 1.2.28 and redhat-config-cluster version is 1.0.8 > on both primary and secondary.I want to resolve the issue with out any > upgradations. Do you think that updating can resolve the issue? If > upgradation is required please guide how to go ahead. I am trying to resolve > this issue with out any patch update. > Is this a configuration problem or some knows issue with the version used. > > Cluster.xml looks like this. > > > - > multicast_ipaddress="225.0.0.11" thread="yes" tko_count="25" /> > Set a tiebreaker_ip if you don't want it to survive network splits. This IP needs to be on the same network as the IPs which map to the hosts "DB-Primary" and "DB-Secondary", but must not reside on the hosts themselves (use a switch IP, another host, or gateway) Also, set monitor_link to 1 in the service_ipaddress ... -- Lon From lhh at redhat.com Wed Jan 2 16:14:48 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 02 Jan 2008 11:14:48 -0500 Subject: [Linux-cluster] Channel Bonding issue in Cluster Suite Setup In-Reply-To: <839293.38853.qm@web50606.mail.re2.yahoo.com> References: <839293.38853.qm@web50606.mail.re2.yahoo.com> Message-ID: <1199290488.5980.31.camel@ayanami.boston.devel.redhat.com> On Thu, 2007-12-27 at 06:29 -0800, Roger Pe?a wrote: > --- Balaji wrote: > both servers thinks that the other one are death, so I > guess you have a problem with comunication between the > nodes after the bonding is set > > What I wonder now is why fencing do not work :-( > it could be dangerus to have both nodes accessing the > same storage without know it :-( Right. Why did both become quorate? Is fencing "not there"? 
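(A minimal sketch, not from the original exchange, assuming the stock cman
tools are installed on both nodes: comparing what each node believes about
membership while the bond is up usually shows whether this is a network
split rather than a real node failure.)

   cman_tool status   # quorum state and vote counts as this node sees them
   cman_tool nodes    # which members this node can actually reach
   clustat            # whether the peer is still listed as online

If each node reports itself quorate while showing the other as down, the
heartbeat traffic is not crossing the bonded interface, and the next thing
to check is why the fence operation never completed.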
-- Lon From lhh at redhat.com Wed Jan 2 16:17:31 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 02 Jan 2008 11:17:31 -0500 Subject: [Linux-cluster] GFS without RHCM but with Heartbeat V2 and drbd ? In-Reply-To: <4774D07E.6030701@arcor.de> References: <20071227170006.AE68F733D5@hormel.redhat.com> <4774D07E.6030701@arcor.de> Message-ID: <1199290651.5980.34.camel@ayanami.boston.devel.redhat.com> On Fri, 2007-12-28 at 11:31 +0100, Holger Woehle wrote: > Hi, > at the moment i am evaluating RHCM and Heartbeat V2 to refine our > Heartbeat V1 Cluster. > My question as in the subject: > Is it possible to use GFS without the RHCM ? Not sure. You don't need rgmanager, but you need fencing/membership/etc. Nothing precludes you from running HBv2 in addition to RHCM+GFS. > I want to build a 2 node cluster with drbd activ/activ and RHCM/GFS or > Heartbeat V2 with filesystem OCFS or GFS. Here's how to get a 2-node DRBD setup going w/ RHCM (without HBv2): http://sources.redhat.com/cluster/wiki/DRBD_Cookbook -- Lon From janne.peltonen at helsinki.fi Wed Jan 2 16:25:02 2008 From: janne.peltonen at helsinki.fi (Janne Peltonen) Date: Wed, 2 Jan 2008 18:25:02 +0200 Subject: [Linux-cluster] #48: Unable to obtain cluster lock: Invalid argument In-Reply-To: <1199290137.5980.24.camel@ayanami.boston.devel.redhat.com> References: <20080102113734.GV19197@helsinki.fi> <1199290137.5980.24.camel@ayanami.boston.devel.redhat.com> Message-ID: <20080102162502.GA4504@helsinki.fi> On Wed, Jan 02, 2008 at 11:08:57AM -0500, Lon Hohberger wrote: > On Wed, 2008-01-02 at 13:37 +0200, Janne Peltonen wrote: > > Hi. > > > > After running a cluster node in a production cluster since July, I got > > the folllowing error: > > > > #48: Unable to obtain cluster lock: Invalid argument > > What version of rgmanager was it? [jmmpelto at pcn1 log]$ rpm -q rgmanager rgmanager-2.0.27-2.1lhh.el5 There were also a couple nodes with a newer rgmanager in the same cluster: [jmmpelto at pcn5 mappi2]$ rpm -q rgmanager rgmanager-2.0.31-1.el5.centos --Janne -- Janne Peltonen From wcheng at redhat.com Wed Jan 2 17:27:05 2008 From: wcheng at redhat.com (Wendy Cheng) Date: Wed, 02 Jan 2008 12:27:05 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> Message-ID: <477BC969.6050506@redhat.com> Kamal Jain wrote: > Hi Wendy, > > IOZONE v3.283 was used to generate the results I posted. > > An example invocation line [for the IOPS result]: > > > ./iozone -O -l 1 -u 8 -T -b /root/iozone_IOPS_1_TO_8_THREAD_1_DISK_ISCSI_DIRECT.xls -F /mnt/iscsi_direct1/iozone/iozone1.tmp ... > > > It's for 1 to 8 threads, and I provided 8 file names through I'm only showing one in the line above. The file destinations were on the same disk for a single disk test, and on alternating disks for a 2-disk test. I believe IOZONE uses a simple random string, repeated in certain default record sizes, when performing its various operations. > > Intuitively (by reading your iozone command), this is a locking issue. There are lots to say on your setup, mostly because all data and lock traffic are funneling thru the same network. Remember locking is mostly to do with *latency*, not bandwidth. So even your network is not saturated, the performance can go down. It is different from the rsync issue (as described by Jos Vos) so the glock trimming patch is not helpful in this case. However, I won't know for sure until we get the data analyzed. Thanks for the input. 
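(As a rough illustration of the latency point, not taken from the thread:
assuming a GFS1 mount at /mnt/gfs, a peer node reachable as node2 over the
same network that carries the DLM traffic, and iperf installed as an extra
package, the following compares the round trip each remote lock request
pays against the raw bandwidth a small-file workload never gets close to
using.)

   ping -c 100 -q node2        # round-trip latency; roughly what a remote DLM lock request costs
   iperf -c node2 -t 30        # raw TCP bandwidth, largely irrelevant to small-file locking
   gfs_tool counters /mnt/gfs  # GFS glock/lock counters; rerun while iozone runs to watch lock churn

Even sub-millisecond round trips add up when a benchmark opens, locks and
unlocks thousands of small files per second.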
-- Wendy From jparsons at redhat.com Wed Jan 2 20:34:39 2008 From: jparsons at redhat.com (James Parsons) Date: Wed, 02 Jan 2008 15:34:39 -0500 Subject: [Linux-cluster] Error messages during Fence operation In-Reply-To: <4779380A.6000706@noaa.gov> References: <47793613.9000304@noaa.gov> <4779380A.6000706@noaa.gov> Message-ID: <477BF55F.5000900@redhat.com> Randy Brown wrote: > I forgot....I'm using Centos 5 with latest patches and kernel. > > Randy Brown wrote: > >> I am using an APC Masterswitch Plus as my fencing device. I am >> seeing this in my logs now when fencing occurs: >> >> Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" reports: >> Traceback (most recent call last): File "/sbin/fence_apc", line >> 829, in ? main() File "/sbin/fence_apc", line 289, in main >> do_login(sock) File "/sbin/fence_apc", line 444, in do_login i, >> mo, txt = sock.expect(regex_list, TELNET_TIMEOUT) >> Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" >> reports: File "/usr/lib/python2.4/telnetlib.py", line 620, in >> expect text = self.read_very_lazy() File >> "/usr/lib/python2.4/telnetlib.py", line 400, in read_very_lazy >> raise EOFError, 'telnet connection closed' EOFError: telnet >> connection closed >> Dec 31 11:36:26 nfs1-cluster fenced[3848]: fence >> "nfs2-cluster.nws.noaa.gov" failed >> >> This used to work just fine. If I run `fence_apc -a 192.168.42.30 -l >> cluster -n 1:7 -o Reboot -p ` from the command line, >> fencing works as expected. The relevant lines from my cluster.conf >> file are below. I will gladly provide more information as necessary. > Is it possible that you are already telnet'ed into the switch from a terminal or somesuch when the fence attempt takes place? APC switches allow only one login at a time. I should/will add a log comment that mentions this as a possible reason. If this is not the issue, well, we can keep digging... -J From jamesc at exa.com Wed Jan 2 22:35:23 2008 From: jamesc at exa.com (James Chamberlain) Date: Wed, 2 Jan 2008 17:35:23 -0500 (EST) Subject: [Linux-cluster] Instability troubles Message-ID: Hi all, I'm having some major stability problems with my three-node CS/GFS cluster. Every two or three days, one of the nodes fences another, and I have to hard-reboot the entire cluster to recover. I have had this happen twice today. I don't know what's triggering the fencing, since all the nodes appear to me to be up and running when it happens. In fact, I was logged on to node3 just now, running 'top', when node2 fenced it. When they come up, they don't automatically mount their GFS filesystems, even with "_netdev" specified as a mount option; however, the node which comes up first mounts them all as part of bringing all the services up. I did notice a couple of disconcerting things earlier today. First, I was running "watch clustat". (I prefer to see the time updating, where I can't with "clustat -i") At one point, "clustat" crashed as follows: Jan 2 15:19:54 node2 kernel: clustat[17720]: segfault at 0000000000000024 rip 0000003629e75bc0 rsp 00007fff18827178 error 4 Fairly shortly thereafter, clustat reported that node3 as "Online, Estranged, rgmanager". Can anyone shed light on what that means? Google's not telling me much. At the moment, all three nodes are running CentOS 5.1, with kernel 2.6.18-53.1.4.el5. Can anyone point me in the right direction to resolve these problems? I wasn't having trouble like this when I was running a CentOS 4 CS/GFS cluster. 
Is it possible to downgrade, likely via a full rebuild of all the nodes,
from CentOS 5 CS/GFS to 4? Should I instead consider setting up a single
node to mount the GFS filesystems and serve them out, to get around these
fencing issues?

Thanks,

James

From williamottley at gmail.com Thu Jan 3 00:58:41 2008
From: williamottley at gmail.com (William Ottley)
Date: Wed, 2 Jan 2008 19:58:41 -0500
Subject: [Linux-cluster] Lars' method???
Message-ID: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com>

Hello all,
I'm hoping that this layout here will make it easy for anyone to figure
out what booboo I've done. I'm attempting to use Lars' method, since I
really don't know how to set up with only 1 (gateways have confused me).
I just can't get anything working...

client: 192.168.2.10 -> 192.168.2.1 via crossover cable (ping OK)
web browser is pointed to 192.168.2.100

LVS (centos 5.1, pulse, piranha):
eth0: 192.168.2.1/gw {none}
eth1: 192.168.0.111/gw 192.168.0.1
eth0:1 - 192.168.2.100 {VIP}

echo 1 > /proc/sys/net/ipv4/ip_forward

no iptables is running, httpd default port is 8080, and piranha_GUI is
listening at 3636

RIP1:
eth0: 192.168.0.15 / gw 192.168.0.1
/etc/sysconfig/network-scripts/lo:0
lo:0 - 192.168.2.100
ifup lo:0
ping 192.168.2.100 and 192.168.0.111 {OK}
echo 1 > /proc/sys/net/ipv4/ip_forward

RIP2:
eth0: 192.168.0.11 / gw 192.168.0.1
/etc/sysconfig/network-scripts/lo:0
lo:0 - 192.168.2.100
ifup lo:0
ping 192.168.2.100 and 192.168.0.111 {OK}
echo 1 > /proc/sys/net/ipv4/ip_forward

/etc/sysconfig/ha/lvs.cf:

serial_no = 19
primary = 192.168.2.1
service = lvs
backup = 0.0.0.0
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = nat
nat_router = 192.168.0.111 eth1
nat_nmask = 255.255.255.0
debug_level = NONE
virtual all-web {
    active = 1
    address = 192.168.2.100 eth0:1
    vip_nmask = 255.255.255.0
    port = 80
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP"
    use_regex = 0
    load_monitor = none
    scheduler = rr
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 0
    server offsite1 {
        address = 192.168.0.11
        active = 1
        weight = 1
    }
    server offsite2 {
        address = 192.168.0.15
        active = 1
        weight = 1
    }
}

[root at localhost ha]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
/etc/init.d/pulse start
TCP 192.168.2.100:http rr
localhost pulse[4676]: STARTING PULSE AS MASTER
  -> 192.168.0.15:http Masq 1 0 0
localhost pulse[4676]: partner dead: activating lvs
  -> 192.168.0.11:http Masq 1 0 0
localhost lvs[4678]: starting virtual service all-web active: 80
localhost nanny[4684]: starting LVS client monitor for 192.168.2.100:80
localhost lvs[4678]: create_monitor for all-web/offsite1 running as pid 4684
localhost nanny[4685]: starting LVS client monitor for 192.168.2.100:80
localhost lvs[4678]: create_monitor for all-web/offsite2 running as pid 4685
localhost kernel: eth1: setting full-duplex.
localhost pulse[4681]: gratuitous lvs arps finished
localhost nanny[4684]: making 192.168.0.11:80 available
localhost nanny[4685]: making 192.168.0.15:80 available

PIRANHA CONFIGURATION TOOL
CURRENT LVS ROUTING TABLE

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.2.100:80 rr
  -> 192.168.0.15:80 Masq 1 0 0
  -> 192.168.0.11:80 Masq 1 0 0

CURRENT LVS PROCESSES

root 4676 0.0 0.0 1868 448 ? Ss 14:42 0:00 pulse
root 4678 0.0 0.1 1848 620 ?
Ss 14:42 0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf root 4684 0.0 0.1 1836 668 ? Ss 14:42 0:00 /usr/sbin/nanny -c -h 192.168.0.11 -p 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 192.168.2.100 -M m -U none --lvs root 4685 0.0 0.1 1832 664 ? Ss 14:42 0:00 /usr/sbin/nanny -c -h 192.168.0.15 -p 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 192.168.2.100 -M m -U none --lvs -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From randy.brown at noaa.gov Thu Jan 3 13:17:00 2008 From: randy.brown at noaa.gov (Randy Brown) Date: Thu, 03 Jan 2008 08:17:00 -0500 Subject: [Linux-cluster] Error messages during Fence operation In-Reply-To: <477BF55F.5000900@redhat.com> References: <47793613.9000304@noaa.gov> <4779380A.6000706@noaa.gov> <477BF55F.5000900@redhat.com> Message-ID: <477CE04C.4000002@noaa.gov> Thanks. That makes sense and I hadn't thought of that. I don't see any other connections. However, it appears to have properly fenced one of the nodes last night and I don't believe I've changed anything in the config. Maybe I did have another connection and something I did cleared it without me realizing it. As long as it's working. :) I'm still pretty "green" when it comes to clustering and SANS and sincerely appreciate the quality responses and willingness to help on this list. Randy James Parsons wrote: > Randy Brown wrote: > >> I forgot....I'm using Centos 5 with latest patches and kernel. >> >> Randy Brown wrote: >> >>> I am using an APC Masterswitch Plus as my fencing device. I am >>> seeing this in my logs now when fencing occurs: >>> >>> Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" >>> reports: Traceback (most recent call last): File >>> "/sbin/fence_apc", line 829, in ? main() File >>> "/sbin/fence_apc", line 289, in main do_login(sock) File >>> "/sbin/fence_apc", line 444, in do_login i, mo, txt = >>> sock.expect(regex_list, TELNET_TIMEOUT) >>> Dec 31 11:36:26 nfs1-cluster fenced[3848]: agent "fence_apc" >>> reports: File "/usr/lib/python2.4/telnetlib.py", line 620, in >>> expect text = self.read_very_lazy() File >>> "/usr/lib/python2.4/telnetlib.py", line 400, in read_very_lazy >>> raise EOFError, 'telnet connection closed' EOFError: telnet >>> connection closed >>> Dec 31 11:36:26 nfs1-cluster fenced[3848]: fence >>> "nfs2-cluster.nws.noaa.gov" failed >>> >>> This used to work just fine. If I run `fence_apc -a 192.168.42.30 >>> -l cluster -n 1:7 -o Reboot -p ` from the command line, >>> fencing works as expected. The relevant lines from my cluster.conf >>> file are below. I will gladly provide more information as necessary. >> > Is it possible that you are already telnet'ed into the switch from a > terminal or somesuch when the fence attempt takes place? APC switches > allow only one login at a time. I should/will add a log comment that > mentions this as a possible reason. > > If this is not the issue, well, we can keep digging... > > -J > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... 
Name: randy_brown.vcf Type: text/x-vcard Size: 313 bytes Desc: not available URL: From beekhof at gmail.com Thu Jan 3 13:17:45 2008 From: beekhof at gmail.com (Andrew Beekhof) Date: Thu, 3 Jan 2008 14:17:45 +0100 Subject: [Linux-cluster] GFS without RHCM but with Heartbeat V2 and drbd ? In-Reply-To: <1199290651.5980.34.camel@ayanami.boston.devel.redhat.com> References: <20071227170006.AE68F733D5@hormel.redhat.com> <4774D07E.6030701@arcor.de> <1199290651.5980.34.camel@ayanami.boston.devel.redhat.com> Message-ID: <5CD2757B-950A-4C20-86C9-EDA4331B95E6@gmail.com> On Jan 2, 2008, at 5:17 PM, Lon Hohberger wrote: > On Fri, 2007-12-28 at 11:31 +0100, Holger Woehle wrote: >> Hi, >> at the moment i am evaluating RHCM and Heartbeat V2 to refine our >> Heartbeat V1 Cluster. >> My question as in the subject: >> Is it possible to use GFS without the RHCM ? > > Not sure. You don't need rgmanager, but you need > fencing/membership/etc. > > Nothing precludes you from running HBv2 in addition to RHCM+GFS. Providing you arrange for "non-overlapping areas of concern" - otherwise you could get the two cluster managers trying to pull the cluster in opposite directions. A third option is to run the CRM^ (the part that is new with Heartbeat v2) on top of OpenAIS so that the CRM and GFS are sharing the same "membership/etc" infrastructure. ^ Now its own project called Pacemaker and with support for both cluster stacks. For more details, see: http://clusterlabs.org >> I want to build a 2 node cluster with drbd activ/activ and RHCM/GFS >> or >> Heartbeat V2 with filesystem OCFS or GFS. > > Here's how to get a 2-node DRBD setup going w/ RHCM (without HBv2): > > http://sources.redhat.com/cluster/wiki/DRBD_Cookbook > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kjain at aurarianetworks.com Thu Jan 3 14:40:25 2008 From: kjain at aurarianetworks.com (Kamal Jain) Date: Thu, 3 Jan 2008 09:40:25 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: <477BC969.6050506@redhat.com> References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> Message-ID: Hi Wendy, Thanks for looking into this, and for your preliminary feedback. I am surprised that handling locking for 8 files might cause major performance degradation with GFS versus iSCSI-direct. As for latency, all the devices are directly connected to a Cisco 3560G switch and on the same VLAN, so I expect Ethernet/layer-2 latencies to be sub-millisecond. Also, note that the much faster iSCSI performance was on the same GbE connections between the same devices and systems, so network throughput and latency are the same. GFS overhead, in handling locking (most likely) and any GFS filesystem overhead are the likely causes IMO. Looking forward to any analysis and guidance you may be able to provide on getting GFS performance closer to iSCSI-direct. - K -----Original Message----- Intuitively (by reading your iozone command), this is a locking issue. There are lots to say on your setup, mostly because all data and lock traffic are funneling thru the same network. Remember locking is mostly to do with *latency*, not bandwidth. So even your network is not saturated, the performance can go down. It is different from the rsync issue (as described by Jos Vos) so the glock trimming patch is not helpful in this case. However, I won't know for sure until we get the data analyzed. Thanks for the input. 
-- Wendy From lhh at redhat.com Thu Jan 3 15:38:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 03 Jan 2008 10:38:40 -0500 Subject: [Linux-cluster] Instability troubles In-Reply-To: References: Message-ID: <1199374720.9564.20.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-01-02 at 17:35 -0500, James Chamberlain wrote: > Hi all, > > I'm having some major stability problems with my three-node CS/GFS cluster. > Every two or three days, one of the nodes fences another, and I have to > hard-reboot the entire cluster to recover. I have had this happen twice > today. I don't know what's triggering the fencing, since all the nodes > appear to me to be up and running when it happens. In fact, I was logged > on to node3 just now, running 'top', when node2 fenced it. > > When they come up, they don't automatically mount their GFS filesystems, > even with "_netdev" specified as a mount option; however, the node which > comes up first mounts them all as part of bringing all the services up. > > I did notice a couple of disconcerting things earlier today. First, I was > running "watch clustat". (I prefer to see the time updating, where I > can't with "clustat -i") The time is displayed in RHEL5 CVS version, and will go out with 5.2. > At one point, "clustat" crashed as follows: > > Jan 2 15:19:54 node2 kernel: clustat[17720]: segfault at 0000000000000024 > rip 0000003629e75bc0 rsp 00007fff18827178 error 4 A clustat crash is not a cause for a fence operation. That is, this might be related, but is definitely not the cause of a node being evicted. > Fairly shortly thereafter, clustat reported that node3 as "Online, > Estranged, rgmanager". Can anyone shed light on what that means? > Google's not telling me much. Ordinarily, this happens when you have a node join the cluster manually w/o giving it the configuration file. CMAN would assign it a node ID - but the node is not in the cluster configuration - so clustat would display the node as 'Estranged'. In your case, I'm not sure what the problem would be. > At the moment, all three nodes are running CentOS 5.1, with kernel > 2.6.18-53.1.4.el5. Can anyone point me in the right direction to resolve > these problems? I wasn't having trouble like this when I was running a > CentOS 4 CS/GFS cluster. Is it possible to downgrade, likely via a full > rebuild of all the nodes, from CentOS 5 CS/GFS to 4? Should I instead > consider setting up a single node to mount the GFS filesystems and serve > them out, to get around these fencing issues? I'd be interested a core file. Try to reproduce your clustat crash with 'ulimit -c unlimited' set before running clustat. I haven't seen clustat crash in a very long time, so I'm interested in the cause. (Also, after the crash, check to see if ccsd is running...) Maybe it will uncover some other hints as to the cause of the behavior you saw. If ccsd indeed failed for some reason, it would cause fencing to fail as well because the fence daemon would be unable to read fencing actions. Even given all of this, this doesn't explain why the node needed to be fenced in the first place. Were there any log messages indicating why the node needed to be fenced? The RHEL5 / CentOS5 release of Cluster Suite has a fairly aggressive node death timeout (5 seconds); maybe increasing it would help. ... -- Lon From lhh at redhat.com Thu Jan 3 15:49:09 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 03 Jan 2008 10:49:09 -0500 Subject: [Linux-cluster] Lars' method??? 
In-Reply-To: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com> References: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com> Message-ID: <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-01-02 at 19:58 -0500, William Ottley wrote: > Hello all, > I'm hoping that this layout here will make it easy for anyone to > figure out what booboo i've done? > I'm attempting to use Lars' method, since i really don't know how to > setup with only 1 (gateways have confused me) > > i just can't get anything working... > > client: 192.168.2.10 -> 192.168.2.1 via crossover cable (ping OK) > web browser is pointed to 192.168.2.100 > > LVS (centos 5.1, pulse, piranha): > eth0: 192.168.2.1/gw {none} > eth1: 192.168.0.111/gw 192.168.0.1 > eth0:1 - 192.168.2.100 {VIP} > > echo 1 > /proc/sys/net/ipv4/ip_forward > > no iptables is running, httpd default port is 8080, and piranha_GUI is > listening at 3636 When using NAT, do not do the lo:0 hack. That's for direct routing only. You should not be using a nat_router ip/device unless you have two LVS directors. https://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/cluster-suite/s1-piranha-globalset.html -- Lon From williamottley at gmail.com Thu Jan 3 15:54:04 2008 From: williamottley at gmail.com (William Ottley) Date: Thu, 3 Jan 2008 10:54:04 -0500 Subject: [Linux-cluster] Lars' method??? In-Reply-To: <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> References: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com> <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> Message-ID: <8108f4850801030754u2419b466i74262e70e4b89804@mail.gmail.com> thanks Lon for the pointers. I've tried so many different methods just to get this to work: lvs-dr, lvs-nat with 2 nics, no go.. I can't seem to get anything to work. so i'm doing serious google searches, and there's sooo many how-tos that tell you to do this, or do that, etc. and i'm like well which one? I'm on one now, that says for nat, create a copy of the eth1 (private IP nic) eth1:1 and assign 192.168.0.254, and use that as the default gateway for all the RIP... is this true? Thing is, i have to manually copy the eth1 file to eth1:1 and assign the IP, yet with the virtual IP, eth0:1 is created automatically... so this makes me believe something is wrong. I use pulse / piranha: what tools can I use to test and see if web traffic IS going to the RIP or not??? thanks! Will On Jan 3, 2008 10:49 AM, Lon Hohberger wrote: > On Wed, 2008-01-02 at 19:58 -0500, William Ottley wrote: > > Hello all, > > I'm hoping that this layout here will make it easy for anyone to > > figure out what booboo i've done? > > I'm attempting to use Lars' method, since i really don't know how to > > setup with only 1 (gateways have confused me) > > > > i just can't get anything working... > > > > client: 192.168.2.10 -> 192.168.2.1 via crossover cable (ping OK) > > web browser is pointed to 192.168.2.100 > > > > LVS (centos 5.1, pulse, piranha): > > eth0: 192.168.2.1/gw {none} > > eth1: 192.168.0.111/gw 192.168.0.1 > > eth0:1 - 192.168.2.100 {VIP} > > > > echo 1 > /proc/sys/net/ipv4/ip_forward > > > > no iptables is running, httpd default port is 8080, and piranha_GUI is > > listening at 3636 > > When using NAT, do not do the lo:0 hack. That's for direct routing > only. > > You should not be using a nat_router ip/device unless you have two LVS > directors. 
> > https://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/cluster-suite/s1-piranha-globalset.html > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From williamottley at gmail.com Thu Jan 3 16:54:20 2008 From: williamottley at gmail.com (William Ottley) Date: Thu, 3 Jan 2008 11:54:20 -0500 Subject: [Linux-cluster] Lars' method??? In-Reply-To: <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> References: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com> <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> Message-ID: <8108f4850801030854q73bff5c9y6bcb0feace6867cc@mail.gmail.com> Hey Lon! woohoo: that's what was causing the problem: lo:1 which was my VIP! i removed it and now my test site is working! thanks so much. now if anyone knows of good diagnostic tools? because I now have to figure out how to do LVS-TUN..... william On Jan 3, 2008 10:49 AM, Lon Hohberger wrote: > On Wed, 2008-01-02 at 19:58 -0500, William Ottley wrote: > > Hello all, > > I'm hoping that this layout here will make it easy for anyone to > > figure out what booboo i've done? > > I'm attempting to use Lars' method, since i really don't know how to > > setup with only 1 (gateways have confused me) > > > > i just can't get anything working... > > > > client: 192.168.2.10 -> 192.168.2.1 via crossover cable (ping OK) > > web browser is pointed to 192.168.2.100 > > > > LVS (centos 5.1, pulse, piranha): > > eth0: 192.168.2.1/gw {none} > > eth1: 192.168.0.111/gw 192.168.0.1 > > eth0:1 - 192.168.2.100 {VIP} > > > > echo 1 > /proc/sys/net/ipv4/ip_forward > > > > no iptables is running, httpd default port is 8080, and piranha_GUI is > > listening at 3636 > > When using NAT, do not do the lo:0 hack. That's for direct routing > only. > > You should not be using a nat_router ip/device unless you have two LVS > directors. > > https://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/cluster-suite/s1-piranha-globalset.html > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From lhh at redhat.com Thu Jan 3 17:40:48 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 03 Jan 2008 12:40:48 -0500 Subject: [Linux-cluster] Lars' method??? In-Reply-To: <8108f4850801030754u2419b466i74262e70e4b89804@mail.gmail.com> References: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com> <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> <8108f4850801030754u2419b466i74262e70e4b89804@mail.gmail.com> Message-ID: <1199382048.9564.32.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-01-03 at 10:54 -0500, William Ottley wrote: > thanks Lon for the pointers. I've tried so many different methods just > to get this to work: lvs-dr, lvs-nat with 2 nics, no go.. I can't seem > to get anything to work. 
> > so i'm doing serious google searches, and there's sooo many how-tos > that tell you to do this, or do that, etc. and i'm like well which > one? I'm on one now, that says for nat, create a copy of the eth1 > (private IP nic) eth1:1 and assign 192.168.0.254, and use that as the > default gateway for all the RIP... is this true? > > Thing is, i have to manually copy the eth1 file to eth1:1 and assign > the IP, yet with the virtual IP, eth0:1 is created automatically... so > this makes me believe something is wrong. You could just change the 'nat router device' to eth1:1 in the piranha-gui. > I use pulse / piranha: what tools can I use to test and see if web > traffic IS going to the RIP or not??? The web browser should work... With NAT, the real servers need no special configuration apart from the gateway being a NAT-side IP on the LVS director. That's why it should be easy to set up. -- Lon From lhh at redhat.com Thu Jan 3 17:41:47 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 03 Jan 2008 12:41:47 -0500 Subject: [Linux-cluster] Lars' method??? In-Reply-To: <8108f4850801030854q73bff5c9y6bcb0feace6867cc@mail.gmail.com> References: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com> <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> <8108f4850801030854q73bff5c9y6bcb0feace6867cc@mail.gmail.com> Message-ID: <1199382107.9564.34.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-01-03 at 11:54 -0500, William Ottley wrote: > Hey Lon! > woohoo: that's what was causing the problem: lo:1 which was my VIP! > i removed it and now my test site is working! thanks so much. > > now if anyone knows of good diagnostic tools? > > because I now have to figure out how to do LVS-TUN..... Piranha can do DR and NAT, but doesn't correctly set up tunneling. -- Lon From williamottley at gmail.com Thu Jan 3 17:46:02 2008 From: williamottley at gmail.com (William Ottley) Date: Thu, 3 Jan 2008 12:46:02 -0500 Subject: [Linux-cluster] Lars' method??? In-Reply-To: <1199382048.9564.32.camel@ayanami.boston.devel.redhat.com> References: <8108f4850801021658q5d7a8065i8a3d68be5cba57b5@mail.gmail.com> <1199375349.9564.25.camel@ayanami.boston.devel.redhat.com> <8108f4850801030754u2419b466i74262e70e4b89804@mail.gmail.com> <1199382048.9564.32.camel@ayanami.boston.devel.redhat.com> Message-ID: <8108f4850801030946vb12fd53p91c5d6d063620ad2@mail.gmail.com> Hey Lon, thanks for taking the time to respond. The funny thing about the piranha-gui, is that I did point the gateway IP, and I can see in the config (lvs.cf) that the nat gateway is pointing to eth1:1, BUT, no eth1:1 exists at boot up, or anything: like how the VIP is.. I had to manually copy the ifcfg-eth1 to ifcfg-eth1:1 and start it that way.... and what tools do I use to troubleshoot? my end goal, is to create a lvs-tun... can this be done, with lvs-nat (2 nics)?? I suspect so.... On Jan 3, 2008 12:40 PM, Lon Hohberger wrote: > On Thu, 2008-01-03 at 10:54 -0500, William Ottley wrote: > > thanks Lon for the pointers. I've tried so many different methods just > > to get this to work: lvs-dr, lvs-nat with 2 nics, no go.. I can't seem > > to get anything to work. > > > > so i'm doing serious google searches, and there's sooo many how-tos > > that tell you to do this, or do that, etc. and i'm like well which > > one? I'm on one now, that says for nat, create a copy of the eth1 > > (private IP nic) eth1:1 and assign 192.168.0.254, and use that as the > > default gateway for all the RIP... is this true? 
> > > > Thing is, i have to manually copy the eth1 file to eth1:1 and assign > > the IP, yet with the virtual IP, eth0:1 is created automatically... so > > this makes me believe something is wrong. > > You could just change the 'nat router device' to eth1:1 in the > piranha-gui. > > > I use pulse / piranha: what tools can I use to test and see if web > > traffic IS going to the RIP or not??? > > The web browser should work... > > With NAT, the real servers need no special configuration apart from the > gateway being a NAT-side IP on the LVS director. That's why it should > be easy to set up. > > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From williamottley at gmail.com Thu Jan 3 19:51:15 2008 From: williamottley at gmail.com (William Ottley) Date: Thu, 3 Jan 2008 14:51:15 -0500 Subject: [Linux-cluster] had it working.... lvs-nat need help, $50 out of my own pocket? Message-ID: <8108f4850801031151w6e8933d7x9b77637f373d7274@mail.gmail.com> Hey all, I really am stuck with this test environment. I have followed examples from the howto's, and everything worked for a split second, and than it stopped working. There are sooo many different things that need to be done, that are conflicting from different howto's. Is there anyone willing to take the time and help, if I fork out $50 out of my own pocket? (i'm poor, but I need to get this working). I can give all the configs, etc... Thank you William -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From chrisp at tangent.co.za Thu Jan 3 21:06:38 2008 From: chrisp at tangent.co.za (Chris Picton) Date: Thu, 03 Jan 2008 23:06:38 +0200 Subject: [Linux-cluster] GNBD/GFS/cluster questions Message-ID: <477D4E5E.5050903@tangent.co.za> Hi all I have a question regarding gnbd and clustering. I currently have two servers (store1 and store2) sharing a block device (/dev/sdc) via drbd. The 'Primary' server exports this device via gnbd, and the export fails over along with the drbd primary node. A third server (gfs1) imports the gnbd device (which is part of an lvm) and mounts a gfs2 filesystem on it. I currently do not want to run drbd in primary/primary mode, as I have read that there are potentially some performance issues with this. I have written two custom scripts to handle the drbd and gnbd resource failover. If I am not going to mount the gfs2 filesystem on store1, or store2, can I create a 'private' cluster between only those two machines, with their own dlm, or do all of the other machines (only gfs1 in this example) which will be mounting the gfs filesystem have to use the same dlm, and be part of the same cluster? If I were to share the drbd device via iscsi, then I see no need for the importing devices to even be aware that the device is being exported by a cluster of machines - does this hold true for gnbd as well? Is there any benefit of have *all* my servers in the same cluster, or can I split them up into smaller logically separated clusters. 
Eg, if I add another two servers, with a failover gnbd export, do they also have to be part of the same global cluster, if they will be sharing the gnbd device into the same clvm? Or can they have their own 'private' cluster between themselves as well? Regards Chris From christopher.barry at qlogic.com Thu Jan 3 21:27:28 2008 From: christopher.barry at qlogic.com (Christopher Barry) Date: Thu, 3 Jan 2008 15:27:28 -0600 Subject: [Linux-cluster] GNBD/GFS/cluster questions References: <477D4E5E.5050903@tangent.co.za> Message-ID: -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Chris Picton Sent: Thu 1/3/2008 4:06 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] GNBD/GFS/cluster questions Hi all I have a question regarding gnbd and clustering. I currently have two servers (store1 and store2) sharing a block device (/dev/sdc) via drbd. The 'Primary' server exports this device via gnbd, and the export fails over along with the drbd primary node. A third server (gfs1) imports the gnbd device (which is part of an lvm) and mounts a gfs2 filesystem on it. I currently do not want to run drbd in primary/primary mode, as I have read that there are potentially some performance issues with this. I have written two custom scripts to handle the drbd and gnbd resource failover. If I am not going to mount the gfs2 filesystem on store1, or store2, can I create a 'private' cluster between only those two machines, with their own dlm, or do all of the other machines (only gfs1 in this example) which will be mounting the gfs filesystem have to use the same dlm, and be part of the same cluster? If I were to share the drbd device via iscsi, then I see no need for the importing devices to even be aware that the device is being exported by a cluster of machines - does this hold true for gnbd as well? Is there any benefit of have *all* my servers in the same cluster, or can I split them up into smaller logically separated clusters. Eg, if I add another two servers, with a failover gnbd export, do they also have to be part of the same global cluster, if they will be sharing the gnbd device into the same clvm? Or can they have their own 'private' cluster between themselves as well? Regards Chris -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster I'm not an expert, as I have not used gnbd, but I would say you are correct. The two nodes that export, and do not mount, do not need to be a part of the gfs cluster - and really probably should not be. Thay can simply be their own active/passive failover cluster. Another 2 nodes, that export a different volume for the gfs cluster to use should be fine as well. Think of the pairs of gnbd nodes as two slices of an array. -C -C -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3679 bytes Desc: not available URL: From jason at lexxcom.com.au Fri Jan 4 23:01:44 2008 From: jason at lexxcom.com.au (Jason Stewart) Date: Fri, 04 Jan 2008 17:01:44 -0600 Subject: [Linux-cluster] GFS not syncing Message-ID: <477EBAD8.8010508@lexxcom.com.au> I have been spending the last few weeks working on this and after a lot of trial and error have managed to get everything to appear that it is working. 
I am a newbie to all this stuff The problem that I am having is that any changes made in the mounted GFS directory are not being seen by the other node, I have hunted around but can't seen to find any errors or messages in the logs that anything is wrong. I can do and ccs_tool update to update the cluster.conf file, so I am assuming that ccs and cman are running correctly so there must be something with GFS. I am not sure where to look now, I have done some extensive research on google but could not find anything. any information would be handy. From Alain.Moulle at bull.net Fri Jan 4 08:02:58 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Fri, 04 Jan 2008 09:02:58 +0100 Subject: [Linux-cluster] Last tuning on Quorum Disk / question Message-ID: <477DE832.3010108@bull.net> Hi Lon, Finally, I adopt this quorum disk configuration : I just wonder if the interval values for quorum disk with regard to the one for heuristic is the best choice or not ? And which are the rules to fit the good value for interval and tko on heuristic ? (I don't completely understand why your both heuristics avoids suicide if one ping get lots, it seems to be due to tko value but ... ) Thanks Regards And best whiches for 2008 ;-) Alain Moull? > Also - your heuristic should be more like one of the following: > tko="3" > program="ping -t1 -c1 " > score="1"/> > program="ping -t3 -c1 " > score="1"/> >Reason: You don't want a single ICMP packet to determine node fitness. >If that ping gets lost (network being full, or any reason really), the >node will commit suicide. (The man page probably needs updating about >that!) > Lon From orkcu at yahoo.com Fri Jan 4 13:31:03 2008 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Fri, 4 Jan 2008 05:31:03 -0800 (PST) Subject: [Linux-cluster] GFS not syncing In-Reply-To: <477EBAD8.8010508@lexxcom.com.au> Message-ID: <585395.47393.qm@web50609.mail.re2.yahoo.com> --- Jason Stewart wrote: > I have been spending the last few weeks working on > this and after a lot > of trial and error have managed to get everything to > appear that it is > working. I am a newbie to all this stuff > well, in your configuration I can't see where the services are declare, do you have any? also, you are using manual fencing, this is ok for testing purpose but definitly not for production so, can you send the GFS mount options for the FS you are working with? are they inthe fstab? and also can you send the command you used to create the GFS cu roger __________________________________________ RedHat Certified ( RHCE ) Cisco Certified ( CCNA & CCDA ) ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From wferi at niif.hu Fri Jan 4 14:49:24 2008 From: wferi at niif.hu (Ferenc Wagner) Date: Fri, 04 Jan 2008 15:49:24 +0100 Subject: [Linux-cluster] GFS: assertion failure in add_to_queue Message-ID: <87y7b5odcb.fsf@tac.ki.iif.hu> Hi, I'm using a 1-node GFS1 "cluster" with DLM locking and sporadically (say once a week) get the following in the kernel logs (Linux 2.6.23): GFS: fsid=noc:cricket.0: warning: assertion "(tmp_gh->gh_flags & GL_LOCAL_EXCL) || !(gh->gh_flags & GL_LOCAL_EXCL)" failed GFS: fsid=noc:cricket.0: function = add_to_queue GFS: fsid=noc:cricket.0: file = /home/wferi/cluster/cluster-2.01.00/gfs-kernel/src/gfs/glock.c, line = 1420 GFS: fsid=noc:cricket.0: time = 1197666253 The filesystem is under constant reading/writing and seems to operate without any application-visible errors (or at least I coultn't find error messages). Maybe it's no big deal, but something is not quite right. But what? Anybody has an idea? -- Thanks, Feri. From wferi at niif.hu Fri Jan 4 15:06:17 2008 From: wferi at niif.hu (Ferenc Wagner) Date: Fri, 04 Jan 2008 16:06:17 +0100 Subject: [Linux-cluster] GFS performance In-Reply-To: (Kamal Jain's message of "Thu, 3 Jan 2008 09:40:25 -0500") References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> Message-ID: <877iipock6.fsf@tac.ki.iif.hu> Kamal Jain writes: > I am surprised that handling locking for 8 files might cause major > performance degradation with GFS versus iSCSI-direct. > > As for latency, all the devices are directly connected to a Cisco > 3560G switch and on the same VLAN, so I expect Ethernet/layer-2 > latencies to be sub-millisecond. Also, note that the much faster > iSCSI performance was on the same GbE connections between the same > devices and systems, so network throughput and latency are the same. > > GFS overhead, in handling locking (most likely) and any GFS > filesystem overhead are the likely causes IMO. > > Looking forward to any analysis and guidance you may be able to > provide on getting GFS performance closer to iSCSI-direct. I'm really interested in the outcome of this discussion. Meanwhile I can add that 'gfs_controld -l0' and 'gfs_tool settune /mnt demote_secs 600' (as recommended on this list by the kind developers) helped me tremendously dealing with lots of files. -- Regards, Feri. From kjain at aurarianetworks.com Fri Jan 4 15:15:26 2008 From: kjain at aurarianetworks.com (Kamal Jain) Date: Fri, 4 Jan 2008 10:15:26 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: <877iipock6.fsf@tac.ki.iif.hu> References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> <877iipock6.fsf@tac.ki.iif.hu> Message-ID: Feri, Thanks for the information. A number of people have emailed me expressing some level of interest in the outcome of this, so hopefully I will soon be able to do some tuning and performance experiments and report back our results. On the demote_secs tuning parameter, I see you're suggesting 600 seconds, which appears to be longer than the default 300 seconds as stated by Wendy Cheng at http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 -- we're running RHEL4.5. Wouldn't a SHORTER demote period be better for lots of files, whereas perhaps a longer demote period might be more efficient for a smaller number of files being locked for long periods of time? 
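For reference, the two knobs mentioned above are applied per node roughly like this; a minimal sketch, where /mnt/gfs is a placeholder mount point. Note that gfs_tool settune values are per mount and are lost at remount, so they are normally reapplied from an init script after the filesystem is mounted.

# Show the current value for one GFS mount
gfs_tool gettune /mnt/gfs | grep demote_secs

# Change it for that mount; takes effect immediately, not persistent
gfs_tool settune /mnt/gfs demote_secs 600

# The Posix-lock rate limit is an option of gfs_controld itself, so it is
# normally given when the daemon is started (how that is wired into the
# cluster init scripts depends on the release), e.g.:
#   gfs_controld -l0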
On a related note, I converted a couple of the clusters in our lab from GULM to DLM and while performance is not necessarily noticeably improved (though more detailed testing was done after the conversion), we did notice that both clusters became more stable in the DLM configuration. Has anyone here had a similar experience and can shed some light as to why? When we would do long-running application testing on GFS volumes with GULM, after a while many commands that in any way might touch the disks would hang, like "df", "mount" or even "ls". So far with DLM things have been much more stable. No other tuning or adjustment has been done; both times things were default settings. - K -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Ferenc Wagner Sent: Friday, January 04, 2008 10:06 AM To: linux clustering Subject: Re: [Linux-cluster] GFS performance Kamal Jain writes: > I am surprised that handling locking for 8 files might cause major > performance degradation with GFS versus iSCSI-direct. > > As for latency, all the devices are directly connected to a Cisco > 3560G switch and on the same VLAN, so I expect Ethernet/layer-2 > latencies to be sub-millisecond. Also, note that the much faster > iSCSI performance was on the same GbE connections between the same > devices and systems, so network throughput and latency are the same. > > GFS overhead, in handling locking (most likely) and any GFS > filesystem overhead are the likely causes IMO. > > Looking forward to any analysis and guidance you may be able to > provide on getting GFS performance closer to iSCSI-direct. I'm really interested in the outcome of this discussion. Meanwhile I can add that 'gfs_controld -l0' and 'gfs_tool settune /mnt demote_secs 600' (as recommended on this list by the kind developers) helped me tremendously dealing with lots of files. -- Regards, Feri. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From wferi at niif.hu Fri Jan 4 15:34:51 2008 From: wferi at niif.hu (Ferenc Wagner) Date: Fri, 04 Jan 2008 16:34:51 +0100 Subject: [Linux-cluster] GFS performance In-Reply-To: (Kamal Jain's message of "Fri, 4 Jan 2008 10:15:26 -0500") References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> <877iipock6.fsf@tac.ki.iif.hu> Message-ID: <87wsqpmwo4.fsf@tac.ki.iif.hu> Kamal Jain writes: > On the demote_secs tuning parameter, I see you're suggesting 600 > seconds, which appears to be longer than the default 300 seconds as > stated by Wendy Cheng at > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > -- we're running RHEL4.5. Wouldn't a SHORTER demote period be > better for lots of files, whereas perhaps a longer demote period > might be more efficient for a smaller number of files being locked > for long periods of time? It depends on your usage pattern. I had to access lots of files repeatedly, ie. cycling over them periodically by one machine in the cluster. It helped me a LOT to keep those GFS locks cached on that machine, while the others were all right without being lock masters as they ever needed some of the files only, not all of them. 
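One way to see whether a node really is accumulating cached glocks in this way is to watch the per-mount counters while the workload runs; a small sketch, again with /mnt/gfs standing in for the real mount point:

# One-off snapshot of the lock and glock counts for a GFS mount
gfs_tool counters /mnt/gfs

# Re-sample every 10 seconds and watch the counts grow or shrink
watch -n 10 'gfs_tool counters /mnt/gfs'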
> On a related note, I converted a couple of the clusters in our lab > from GULM to DLM and while performance is not necessarily noticeably > improved (though more detailed testing was done after the > conversion), we did notice that both clusters became more stable in > the DLM configuration. I've never tried GULM, so I can't comment on this. -- Regards, Feri. From kjain at aurarianetworks.com Fri Jan 4 15:41:24 2008 From: kjain at aurarianetworks.com (Kamal Jain) Date: Fri, 4 Jan 2008 10:41:24 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: <87wsqpmwo4.fsf@tac.ki.iif.hu> References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> <877iipock6.fsf@tac.ki.iif.hu> <87wsqpmwo4.fsf@tac.ki.iif.hu> Message-ID: Well, in our applications usage we don't keep cycling over the same files over and over again, we run through lots of files and keep a handful open at any point in time, so perhaps shorter demote_secs is good for us. I have not been able to find out about 'gfs_controld -l0' -- where is that set and what does "-l0" do? Thanks, - K -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Ferenc Wagner Sent: Friday, January 04, 2008 10:35 AM To: linux clustering Subject: Re: [Linux-cluster] GFS performance Kamal Jain writes: > On the demote_secs tuning parameter, I see you're suggesting 600 > seconds, which appears to be longer than the default 300 seconds as > stated by Wendy Cheng at > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > -- we're running RHEL4.5. Wouldn't a SHORTER demote period be > better for lots of files, whereas perhaps a longer demote period > might be more efficient for a smaller number of files being locked > for long periods of time? It depends on your usage pattern. I had to access lots of files repeatedly, ie. cycling over them periodically by one machine in the cluster. It helped me a LOT to keep those GFS locks cached on that machine, while the others were all right without being lock masters as they ever needed some of the files only, not all of them. > On a related note, I converted a couple of the clusters in our lab > from GULM to DLM and while performance is not necessarily noticeably > improved (though more detailed testing was done after the > conversion), we did notice that both clusters became more stable in > the DLM configuration. I've never tried GULM, so I can't comment on this. -- Regards, Feri. From wferi at niif.hu Fri Jan 4 16:00:43 2008 From: wferi at niif.hu (Ferenc Wagner) Date: Fri, 04 Jan 2008 17:00:43 +0100 Subject: [Linux-cluster] GFS performance In-Reply-To: (Kamal Jain's message of "Fri, 4 Jan 2008 10:41:24 -0500") References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> <877iipock6.fsf@tac.ki.iif.hu> <87wsqpmwo4.fsf@tac.ki.iif.hu> Message-ID: <87k5mpmvh0.fsf@tac.ki.iif.hu> Kamal Jain writes: > Well, in our applications usage we don't keep cycling over the same > files over and over again, we run through lots of files and keep a > handful open at any point in time, so perhaps shorter demote_secs is > good for us. It there's no single machine which does most of the accesses, then probably so. > I have not been able to find out about 'gfs_controld -l0' -- where > is that set and what does "-l0" do? Try gfs_controld -h for some help. Basically, acquiring of Posix locks (fcntl locks) is artifically throttled on GFS by default. 
If you invoke gfs_controld with the -l0 option, this throttling is turned off. It probably doesn't buy you much unless your application uses this type of locks. I hope I recalled the above correctly. Somebody told me that this default is likely to be changed in the future, tough. -- Regards, Feri. From wcheng at redhat.com Fri Jan 4 16:04:20 2008 From: wcheng at redhat.com (Wendy Cheng) Date: Fri, 04 Jan 2008 11:04:20 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> <877iipock6.fsf@tac.ki.iif.hu> Message-ID: <477E5904.6020804@redhat.com> Kamal Jain wrote: > Feri, > > Thanks for the information. A number of people have emailed me expressing some level of interest in the outcome of this, so hopefully I will soon be able to do some tuning and performance experiments and report back our results. > > On the demote_secs tuning parameter, I see you're suggesting 600 seconds, which appears to be longer than the default 300 seconds as stated by Wendy Cheng at http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 -- we're running RHEL4.5. Wouldn't a SHORTER demote period be better for lots of files, whereas perhaps a longer demote period might be more efficient for a smaller number of files being locked for long periods of time? > This demote_secs tunable is a little bit tricky :) ... What happens here is that, GFS caches glocks that could get accumulated to a huge amount of count. Unless vm releases these inodes (files) associated with these glocks, current GFS internal daemons will do *fruitless* scan trying to remove these glock (but never succeed). If you set the demote_secs to a large number, it will *reduce* the wake-up frequencies of these daemons doing these fruitless works, that, in turns, leaving more CPU cycles for real works. Without glock trimming patch in place, that is a way to tune a system that is constantly touching large amount of files (such as rsync). Ditto for "scand" wake-up internal, making it larger will help the performance in this situation. With the *new* glock trimming patch, we actually remove the memory reference count so glock can be "demoted" and subsequently removed from the system if in idle states. To demote the glock, we need gfs_scand daemon to wake up often - this implies we need smaller demote_secs for it to be effective. > On a related note, I converted a couple of the clusters in our lab from GULM to DLM and while performance is not necessarily noticeably improved (though more detailed testing was done after the conversion), we did notice that both clusters became more stable in the DLM configuration. > This is mostly because DLM is the current default lock manager (with on-going development efforts) while GULM is not actively maintained. -- Wendy From kjain at aurarianetworks.com Fri Jan 4 16:16:47 2008 From: kjain at aurarianetworks.com (Kamal Jain) Date: Fri, 4 Jan 2008 11:16:47 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: <477E5904.6020804@redhat.com> References: <1198770380.4932.23.camel@WSBID06223> <477A71B0.1080804@redhat.com> <477BC969.6050506@redhat.com> <877iipock6.fsf@tac.ki.iif.hu> <477E5904.6020804@redhat.com> Message-ID: Ah ha! I think this is starting to make sense now, Wendy. And thank you for the explanation of why we should be using DLM rather than GULM. 
So without the patch, which we do not have, it might be good to increase demote_secs [per GFS mount] to 600 or even more seconds, and scand_secs to...what's a reasonable/safe value on that? It sounds like without the patch all we're doing -- to paraphrase you -- is reducing the frequency of operations which do no good and cause harm in the form of CPU and I/O resource usage. The patch is built into RHEL 4.6 and 5.1, right? When are those expected to be available (we only care about 4.6 right now) and/or how do we get the standalone patch? Thanks again to everyone for the feedback and information. - K -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wendy Cheng Sent: Friday, January 04, 2008 11:04 AM To: linux clustering Subject: Re: [Linux-cluster] GFS performance Kamal Jain wrote: > Feri, > > Thanks for the information. A number of people have emailed me expressing some level of interest in the outcome of this, so hopefully I will soon be able to do some tuning and performance experiments and report back our results. > > On the demote_secs tuning parameter, I see you're suggesting 600 seconds, which appears to be longer than the default 300 seconds as stated by Wendy Cheng at http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 -- we're running RHEL4.5. Wouldn't a SHORTER demote period be better for lots of files, whereas perhaps a longer demote period might be more efficient for a smaller number of files being locked for long periods of time? > This demote_secs tunable is a little bit tricky :) ... What happens here is that, GFS caches glocks that could get accumulated to a huge amount of count. Unless vm releases these inodes (files) associated with these glocks, current GFS internal daemons will do *fruitless* scan trying to remove these glock (but never succeed). If you set the demote_secs to a large number, it will *reduce* the wake-up frequencies of these daemons doing these fruitless works, that, in turns, leaving more CPU cycles for real works. Without glock trimming patch in place, that is a way to tune a system that is constantly touching large amount of files (such as rsync). Ditto for "scand" wake-up internal, making it larger will help the performance in this situation. With the *new* glock trimming patch, we actually remove the memory reference count so glock can be "demoted" and subsequently removed from the system if in idle states. To demote the glock, we need gfs_scand daemon to wake up often - this implies we need smaller demote_secs for it to be effective. > On a related note, I converted a couple of the clusters in our lab from GULM to DLM and while performance is not necessarily noticeably improved (though more detailed testing was done after the conversion), we did notice that both clusters became more stable in the DLM configuration. > This is mostly because DLM is the current default lock manager (with on-going development efforts) while GULM is not actively maintained. -- Wendy From Paul.McDowell at celera.com Fri Jan 4 17:59:10 2008 From: Paul.McDowell at celera.com (Paul n McDowell) Date: Fri, 4 Jan 2008 12:59:10 -0500 Subject: [Linux-cluster] GFS performance In-Reply-To: <477E5904.6020804@redhat.com> Message-ID: Hi all.. I feel compelled to chime in on this GFS performance thread as we have a three node GFS environment running RHEL4.6 that was suffering from severe memory utilization (100% on a 32GB system) on all nodes and unacceptably poor performance. 
The three nodes serve five GFS file systems which range from 100GB to 1.2TB in size and are home to a diverse combination of very large and very small files. The degradation in performance always coincided with backup process starting, i.e. large numbers of inodes being read and cached, and was so bad that I was considering abandoning our GFS implementation altogether. Basic Unix commands such as df, ls and mkdir either took several minutes to complete or never finished at all. The only way to resolve the problem was to reboot all three production nodes which alleviated the problem until the next backup started. With a recommendation from RedHat support I implemented the tunable GFS parameter that Wendy describes in http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 by setting glock_purge to 50 for all file systems and it has made a dramatic difference. The memory utilization is no longer apparent and overall performance is very acceptable even when backups are running. If you're are not at update 6 yet then I would urge you to upgrade as soon as possible to take advantage of this new feature. Regards, Paul McDowell Celera Wendy Cheng Sent by: linux-cluster-bounces at redhat.com 01/04/2008 11:04 AM Please respond to linux clustering To linux clustering cc Subject Re: [Linux-cluster] GFS performance Kamal Jain wrote: > Feri, > > Thanks for the information. A number of people have emailed me expressing some level of interest in the outcome of this, so hopefully I will soon be able to do some tuning and performance experiments and report back our results. > > On the demote_secs tuning parameter, I see you're suggesting 600 seconds, which appears to be longer than the default 300 seconds as stated by Wendy Cheng at http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 -- we're running RHEL4.5. Wouldn't a SHORTER demote period be better for lots of files, whereas perhaps a longer demote period might be more efficient for a smaller number of files being locked for long periods of time? > This demote_secs tunable is a little bit tricky :) ... What happens here is that, GFS caches glocks that could get accumulated to a huge amount of count. Unless vm releases these inodes (files) associated with these glocks, current GFS internal daemons will do *fruitless* scan trying to remove these glock (but never succeed). If you set the demote_secs to a large number, it will *reduce* the wake-up frequencies of these daemons doing these fruitless works, that, in turns, leaving more CPU cycles for real works. Without glock trimming patch in place, that is a way to tune a system that is constantly touching large amount of files (such as rsync). Ditto for "scand" wake-up internal, making it larger will help the performance in this situation. With the *new* glock trimming patch, we actually remove the memory reference count so glock can be "demoted" and subsequently removed from the system if in idle states. To demote the glock, we need gfs_scand daemon to wake up often - this implies we need smaller demote_secs for it to be effective. > On a related note, I converted a couple of the clusters in our lab from GULM to DLM and while performance is not necessarily noticeably improved (though more detailed testing was done after the conversion), we did notice that both clusters became more stable in the DLM configuration. > This is mostly because DLM is the current default lock manager (with on-going development efforts) while GULM is not actively maintained. 
-- Wendy -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From joseparrella at gmail.com Fri Jan 4 18:20:24 2008 From: joseparrella at gmail.com (=?ISO-8859-1?Q?Jos=E9_Miguel_Parrella_Romero?=) Date: Fri, 04 Jan 2008 13:50:24 -0430 Subject: [Linux-cluster] I/O errors and performance in GFS mounts Message-ID: <477E78E8.5010306@gmail.com> Greetings, I have a two-node cluster based on Itanium2 machines using GFS for shared storage with fibre channel as transport. The whole setup has been working OK for three months now, and I have another two-node setup which is also working OK, except for some fibre issues (see below) // other clustering applications are working OK. I'm using clvmd and I've setup two LV, mkfs'ed them with GFS and mounted them in both nodes without any problems (based, of course, on the cman cluster definitions). I've setup manual fencing since I don't have proper devices to help me with that at the time. Since a couple of days now I've seen a lot of I/O errors with the GFS mounts, for example when using df to look at the available space on local mounts, and of course when ls'ing the shares. Sometimes df also reports incorrect size information (for example only 677 MB. used when the share has circa 60 GB.) This problem only occurs in one of the two nodes at the same time, and it is mostly random. The cluster hosts IMAP (Dovecot) and SMTP (Postfix) services, which turn unusable (except for non-local mail transport in Postfix) when this I/O errors appear. Searching for errors on dmesg and syslog throws several, continual errors such as this one: GFS: can't mount proto = lock_dlm, table = mail:inbox, hostdata = ... Where ... varies from: kernel unaligned access to 0xfffffffffffffffd, ip=0xa000000100187d81 mount[2200]: error during unaligned kernel access to mount[5221]: NaT consumption 2216203124768 [4] I'm aware that unaligned kernel access are not a bug, but rather a well-handled inconsistency, but these one seems to mess with GFS way too much. I fsck'ed the filesystems and this seemed to help a little, but I'm still getting slow times when ls'ing the GFS filesystems. We've chosen GFS over HA NFS, but we're getting this kind of performance problems. Some of our problems are due to fibre issues, for example unexpected LOOP DOWN's, but this time it seems more like a software issue. I'm running kernel 2.6.18 in Debian Etch. I would like to know if some of you have run into this problem. Maybe I'm missing some critical part in my cluster setup. Greetings, Jose From lhh at redhat.com Fri Jan 4 19:23:53 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 04 Jan 2008 14:23:53 -0500 Subject: [Linux-cluster] Last tuning on Quorum Disk / question In-Reply-To: <477DE832.3010108@bull.net> References: <477DE832.3010108@bull.net> Message-ID: <1199474633.16312.10.camel@ayanami.boston.devel.redhat.com> On Fri, 2008-01-04 at 09:02 +0100, Alain Moulle wrote: > Hi Lon, > > Finally, I adopt this quorum disk configuration : > > > > > > > > I just wonder if the interval values for quorum disk with regard to > the one for heuristic is the best choice or not ? * should have quotes around attr values: interval=2 -> interval="2" score=1 -> score="1" * -cX is the number of pings to send. When using -c1, you should use tko="3" or something similar. * -tX is internet time to live - usually the number of router hops. For a local gateway, X should be 1. 
> And which are the rules to fit the good value for interval and tko > on heuristic ? (I don't completely understand why your both heuristics > avoids suicide if one ping get lots, it seems to be due to tko value > but ... ) 1. "Send one ping 172.21.1.12 one time with a max IP TTL of 1. Do this every 2 seconds. If this execution fails 3 times, we're done." 2. "Ping 172.21.1.12 one time. Do this every 2 seconds. If we fail to get a response from this operation, we're done." 3. "Send 3 pings to 172.21.1.12. Do this every 2 seconds. If we fail to get a response from this operation, we are done." ... 1 and 3 are almost equivalent: 3 ping packets must be lost to decide the heuristic is dead. ... 2, however, means that if the ping packet is /ever/ lost, the heuristic is dead. -- Lon From charlieb-linux-cluster at e-smith.com Fri Jan 4 21:18:45 2008 From: charlieb-linux-cluster at e-smith.com (Charlie Brady) Date: Fri, 4 Jan 2008 16:18:45 -0500 (EST) Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) Message-ID: I'm helping a colleague to collect information on an application lockup problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. I'd appreciate advice as to what information to collect next. Packages in use are: kernel-smp-2.6.9-67.EL.i686.rpm dlm-1.0.7-1.i686.rpm dlm-kernel-smp-2.6.9-52.2.i686.rpm GFS-kernel-smp-2.6.9-75.9.i686.rpm GFS-6.1.15-1.i386.rpm ccs-1.0.11-1.i686.rpm cman-1.0.17-0.i686.rpm cman-kernel-smp-2.6.9-53.5.i686.rpm We've reduced the application code to a simple test case. The following code run on each node will soon block, and doesn't receive signals until the peer node is shutdown: ... fl.l_whence=SEEK_SET; fl.l_start=0; fl.l_len=1; while (1) { fl.l_type=F_WRLCK; retval=fcntl(filedes,F_SETLKW,&fl); if (retval==-1) { perror("lock"); exit(1); } // attempt to unlock the index file fl.l_type=F_UNLCK; retval=fcntl(filedes,F_SETLKW,&fl); if (retval==-1) { perror("unlock"); exit(1); } } ... /proc/cluster/dlm_debug on the respectives nodes showed this on most recent run: Node1: 2 FS1 send einval to 2 FS1 send einval to 2 [above line many times] FS1 send einval to 2 FS1 send einval to 2 FS1 grant lock on lockqueue 2 FS1 process_lockqueue_reply id 5400c2 state 0 Node 2: FS1 (31613) req reply einval 3de002b1 fr 1 r 1 7 FS1 (31613) req reply einval 3ea30356 fr 1 r 1 7 FS1 (31613) req reply einval 3f0100d5 fr 1 r 1 7 FS1 (31613) req reply einval 3df10367 fr 1 r 1 7 FS1 (31613) req reply einval 3fa600be fr 1 r 1 7 FS1 (31613) req reply einval 3f430355 fr 1 r 1 7 FS1 (31613) req reply einval 3fd20096 fr 1 r 1 7 FS1 (31613) req reply einval 3fc900d3 fr 1 r 1 7 FS1 (31613) req reply einval 3fe60375 fr 1 r 1 7 FS1 (31613) req reply einval 3f870143 fr 1 r 1 7 FS1 (31613) req reply einval 3f690239 fr 1 r 1 7 FS1 (31613) req reply einval 3eb40379 fr 1 r 1 7 FS1 (31613) req reply einval 3fb00352 fr 1 r 1 7 FS1 (31613) req reply einval 40a002f6 fr 1 r 1 7 FS1 (31613) req reply einval 3fb90265 fr 1 r 1 7 FS1 (31613) req reply einval 400b0326 fr 1 r 1 7 I have lockdump files from each node, but don't know how to interpret them. On shutdown, GFS unmount failed, and kernel panic followed: Turning off quotas: [ OK ] Unmounting file systems: umount2: Device or resource busy umount: /diskarray: device is busy umount2: Device or resource busy umount: /diskarray: device is busy CMAN: No functional network interfaces, leaving cluster CMAN: sendmsg failed: -22 CMAN: we are leaving the cluster. 
WARNING: dlm_emergency_shutdown SM: 00000002 sm_stop: SG still joined SM: 01000004 sm_stop: SG still joined SM: 02000006 sm_stop: SG still joined ds: 007b es: 007b ss: 0068 Process gfs_glockd (pid: 5654, threadinfo=f40d2000 task=f3c4b230) Stack: f8ade2d3 f8bb8000 00000003 f2c4ee80 f8ad98b2 f8c28ede 00000001 f33c0c7c f33c0c60 f8c1ed63 f8c55da0 d4aa4940 f33c0c60 f8c55da0 f33c0c60 f8c1e257 f33c0c60 00000001 f33c0cf4 f8c1e30e f33c0c60 f33c0c7c f8c1e431 00000001 Call Trace: [] lm_dlm_unlock+0x14/0x1c [lock_dlm] [] gfs_lm_unlock+0x2c/0x42 [gfs] [] gfs_glock_drop_th+0xf3/0x12d [gfs] [] rq_demote+0x7f/0x98 [gfs] [] run_queue+0x5a/0xc1 [gfs] [] unlock_on_glock+0x1f/0x28 [gfs] [] gfs_reclaim_glock+0xc3/0x13c [gfs] [] gfs_glockd+0x39/0xde [gfs] [] default_wake_function+0x0/0xc [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0xc [] gfs_glockd+0x0/0xde [gfs] [] kernel_thread_helper+0x5/0xb Code: 73 34 8b 03 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 ff 70 18 68 ef e3 ad f8 e8 de 92 64 c7 83 c4 34 68 d3 e2 ad f8 e8 d1 92 64 c7 <0f> 0b 69 01 1b e2 ad f8 68 d5 e2 ad f8 e8 8c 8a 64 c7 5b 5e 5f <0>Fatal exception: panic in 5 seconds Kernel panic - not syncing: Fatal exception --- Charlie From williamottley at gmail.com Sat Jan 5 19:15:16 2008 From: williamottley at gmail.com (William Ottley) Date: Sat, 5 Jan 2008 14:15:16 -0500 Subject: [Linux-cluster] where would the VIP be? Message-ID: <8108f4850801051115s30314637xe5f81b4646e93c0c@mail.gmail.com> I'm trying to setup a LVS-TUN, which has 3 internet connections. eth0 - public (client) eth1 - public TUN to webserver 1 eth2 - public TUN to webserver 2 and webserver 3 where would the VIP be? eth0:1?, also, do we enable forward to the webservers or just the LVS? I'm also confused about the lo:0. do we do that on the webservers or just do the: /etc/sysctl.conf: net.ipv4.conf.eth0.arp_ignore = 1 net.ipv4.conf.eth0.arp_announce = 2 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.conf.all.arp_announce = 2 sysctl -p thanks for any insight! William -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From chawkins at veracitynetworks.com Sat Jan 5 19:43:52 2008 From: chawkins at veracitynetworks.com (Christopher Hawkins) Date: Sat, 5 Jan 2008 14:43:52 -0500 Subject: [Linux-cluster] where would the VIP be? In-Reply-To: <8108f4850801051115s30314637xe5f81b4646e93c0c@mail.gmail.com> Message-ID: <200801051944.m05Jio1Z017502@mxmail.leaseoptions.com> The VIP should be on eth0, like you said - the clients need to be able to reach it. And on the real servers (web servers), you would do the VIP on lo:0 AND the sysctl.conf settings. Chris -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of William Ottley Sent: Saturday, January 05, 2008 2:15 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] where would the VIP be? I'm trying to setup a LVS-TUN, which has 3 internet connections. eth0 - public (client) eth1 - public TUN to webserver 1 eth2 - public TUN to webserver 2 and webserver 3 where would the VIP be? eth0:1?, also, do we enable forward to the webservers or just the LVS? I'm also confused about the lo:0. 
do we do that on the webservers or just do the: /etc/sysctl.conf: net.ipv4.conf.eth0.arp_ignore = 1 net.ipv4.conf.eth0.arp_announce = 2 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.conf.all.arp_announce = 2 sysctl -p thanks for any insight! William -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From williamottley at gmail.com Sat Jan 5 19:54:44 2008 From: williamottley at gmail.com (William Ottley) Date: Sat, 5 Jan 2008 14:54:44 -0500 Subject: [Linux-cluster] where would the VIP be? In-Reply-To: <200801051944.m05Jio1Z017502@mxmail.leaseoptions.com> References: <8108f4850801051115s30314637xe5f81b4646e93c0c@mail.gmail.com> <200801051944.m05Jio1Z017502@mxmail.leaseoptions.com> Message-ID: <8108f4850801051154k7d13e7efyb3e124ad6dcfb066@mail.gmail.com> Hey Chris! thanks! I use centos 5.1, and I have kernel 2.6.18-53.1.4.el5 on all the machines. SO I'll setup lo:0 to point to the VIP, and use the sysctl.conf... thanks sooo much!! Will On Jan 5, 2008 2:43 PM, Christopher Hawkins wrote: > The VIP should be on eth0, like you said - the clients need to be able to > reach it. And on the real servers (web servers), you would do the VIP on > lo:0 AND the sysctl.conf settings. > > Chris > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of William Ottley > Sent: Saturday, January 05, 2008 2:15 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] where would the VIP be? > > I'm trying to setup a LVS-TUN, which has 3 internet connections. > eth0 - public (client) > eth1 - public TUN to webserver 1 > eth2 - public TUN to webserver 2 and webserver 3 > > where would the VIP be? eth0:1?, also, do we enable forward to the > webservers or just the LVS? > > I'm also confused about the lo:0. do we do that on the webservers or just do > the: > > /etc/sysctl.conf: > net.ipv4.conf.eth0.arp_ignore = 1 > net.ipv4.conf.eth0.arp_announce = 2 > net.ipv4.conf.all.arp_ignore = 1 > net.ipv4.conf.all.arp_announce = 2 > sysctl -p > > > thanks for any insight! > > William > > -- > --------------- > Morpheus: After this, there is no turning back. You take the blue pill > - the story ends, you wake up in your bed and believe whatever you want to > believe. You take the red pill - you stay in Wonderland and I show you how > deep the rabbit-hole goes. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From williamottley at gmail.com Sat Jan 5 22:51:39 2008 From: williamottley at gmail.com (William Ottley) Date: Sat, 5 Jan 2008 17:51:39 -0500 Subject: [Linux-cluster] would this configuration work for lvs-dr? Message-ID: <8108f4850801051451n1bf70ae6nb62187498cac7db8@mail.gmail.com> could someone please let me know if this setup will work for lvs-dr? 
I use pulse / piranha / ipvsadm I just can't get anything to work. and I'm thinking maybe the GW's are what causing the problems. I got help with regards to the forwarding bit, and not to use lo:0 but nothing seems to work. LVS: eth0: 192.168.2.1 / gw: 192.168.2.10 (CIP, direct connect to eth0) eth0:1 192.168.2.100 (VIP) eth1: 192.168.3.1 / gw 192.168.2.1 eth2: 192.168.0.111 / gw 192.168.2.1 sysctl.conf: ipv4.ip_forward = 1 RS#1 IP: 192.168.3.10 GW: 192.168.3.1 sysctl.conf: ipv4.ip_forward = 1 sysctl.conf: net.ipv4.conf.lo.arp_ignore = 1 sysctl.conf: net.ipv4.conf.lo.arp_announce = 2 sysctl.conf: net.ipv4.conf.all.arp_ignore = 1 sysctl.conf: net.ipv4.conf.all.arp_announce = 2 RS#2 IP: 192.168.0.10 GW: 192.168.0.1 sysctl.conf: ipv4.ip_forward = 1 sysctl.conf: net.ipv4.conf.lo.arp_ignore = 1 sysctl.conf: net.ipv4.conf.lo.arp_announce = 2 sysctl.conf: net.ipv4.conf.all.arp_ignore = 1 sysctl.conf: net.ipv4.conf.all.arp_announce = 2 -- --------------- Morpheus: After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes. From mrpquter at yahoo.com Sun Jan 6 14:26:37 2008 From: mrpquter at yahoo.com (Michael Harrison) Date: Sun, 6 Jan 2008 06:26:37 -0800 (PST) Subject: [Linux-cluster] RHEL4 U4 cman heartbeats on multiple interfaces Message-ID: <95038.95673.qm@web54407.mail.yahoo.com> Hi, Is cman capable of being configured with redundant network interfaces ? I have in mind using a private network as primary, and public network connection as secondary hearbeat in case the primary goes down. Thanks! -Mike ____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs From mrpquter at yahoo.com Mon Jan 7 01:00:48 2008 From: mrpquter at yahoo.com (Michael Harrison) Date: Sun, 6 Jan 2008 17:00:48 -0800 (PST) Subject: [Linux-cluster] RHEL4 U4 cman heartbeats on multiple interfaces In-Reply-To: <95038.95673.qm@web54407.mail.yahoo.com> Message-ID: <465896.59881.qm@web54410.mail.yahoo.com> I missed this entry in the faq when I posted my question. Sorry! http://sources.redhat.com/cluster/faq.html#rgm_nicfailover The answer is no, it can't. --- Michael Harrison wrote: > Hi, > > Is cman capable of being configured with redundant network interfaces > ? > I have in mind using a private network as primary, and public network > connection as secondary hearbeat in case the primary goes down. > > Thanks! > -Mike > > > > > ____________________________________________________________________________________ > Never miss a thing. Make Yahoo your home page. > http://www.yahoo.com/r/hs > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. http://www.yahoo.com/r/hs From mrpquter at yahoo.com Mon Jan 7 01:45:33 2008 From: mrpquter at yahoo.com (Michael Harrison) Date: Sun, 6 Jan 2008 17:45:33 -0800 (PST) Subject: [Linux-cluster] freezing a service Message-ID: <394625.68046.qm@web54401.mail.yahoo.com> Hi, Is it possible to freeze a service, so that rgmanager effectively ignores it? In other words, when doing maintenance on a production cluster, it's sometimes necessary to stop the cluster services on that node. 
When rgmanager comes down, I'd like it to leave whatever services are running on the node alone, and not fail them over. I looked at the docs and utilities and didn't find anything like that. Don't know if I missed it somewhere. Thanks, -Mike ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From james at cloud9.co.uk Mon Jan 7 09:11:53 2008 From: james at cloud9.co.uk (James Fidell) Date: Mon, 07 Jan 2008 09:11:53 +0000 Subject: [Linux-cluster] GFS tuning advice sought Message-ID: <4781ECD9.6020903@cloud9.co.uk> I have a 3-node cluster built on CentOS 5.1, fully updated, providing Maildir mail spool filesystems to dovecot-based IMAP servers. As it stands GFS is in its default configuration -- no tuning has been done so far. Mostly, it's working fine. Unfortunately we do have a few people with tens of thousands of emails in single mailboxes who are seeing fairly significant performance problems when fetching their email and in this instance "make your mailbox smaller" isn't an acceptable solution :( Is there any GFS tuning I can do which might help speed up access to these mailboxes? Thanks, James From cluster at defuturo.co.uk Fri Jan 4 17:33:56 2008 From: cluster at defuturo.co.uk (Robert Clark) Date: Fri, 04 Jan 2008 17:33:56 +0000 Subject: [Linux-cluster] GFS: assertion failure in add_to_queue In-Reply-To: <87y7b5odcb.fsf@tac.ki.iif.hu> References: <87y7b5odcb.fsf@tac.ki.iif.hu> Message-ID: <1199468036.2388.5.camel@rutabaga.defuturo.co.uk> On Fri, 2008-01-04 at 15:49 +0100, Ferenc Wagner wrote: > I'm using a 1-node GFS1 "cluster" with DLM locking and sporadically > (say once a week) get the following in the kernel logs (Linux 2.6.23): > > GFS: fsid=noc:cricket.0: warning: assertion "(tmp_gh->gh_flags & GL_LOCAL_EXCL) || !(gh->gh_flags & GL_LOCAL_EXCL)" failed > GFS: fsid=noc:cricket.0: function = add_to_queue > GFS: fsid=noc:cricket.0: file = /home/wferi/cluster/cluster-2.01.00/gfs-kernel/src/gfs/glock.c, line = 1420 > GFS: fsid=noc:cricket.0: time = 1197666253 Could be this: https://bugzilla.redhat.com/show_bug.cgi?id=272301 We're seeing it too. The trigger is a process attempting multiple locks on a single file. Occasionally it seems to cause an oops & panic as well. As far as I know, there's no fix available for it at the moment. 
Robert From wferi at niif.hu Mon Jan 7 10:57:45 2008 From: wferi at niif.hu (Ferenc Wagner) Date: Mon, 07 Jan 2008 11:57:45 +0100 Subject: [Linux-cluster] GFS: assertion failure in add_to_queue In-Reply-To: <1199468036.2388.5.camel@rutabaga.defuturo.co.uk> (Robert Clark's message of "Fri, 04 Jan 2008 17:33:56 +0000") References: <87y7b5odcb.fsf@tac.ki.iif.hu> <1199468036.2388.5.camel@rutabaga.defuturo.co.uk> Message-ID: <873at9q4wm.fsf@tac.ki.iif.hu> Robert Clark writes: > On Fri, 2008-01-04 at 15:49 +0100, Ferenc Wagner wrote: > >> I'm using a 1-node GFS1 "cluster" with DLM locking and sporadically >> (say once a week) get the following in the kernel logs (Linux 2.6.23): >> >> GFS: fsid=noc:cricket.0: warning: assertion "(tmp_gh->gh_flags & GL_LOCAL_EXCL) || !(gh->gh_flags & GL_LOCAL_EXCL)" failed >> GFS: fsid=noc:cricket.0: function = add_to_queue >> GFS: fsid=noc:cricket.0: file = /home/wferi/cluster/cluster-2.01.00/gfs-kernel/src/gfs/glock.c, line = 1420 >> GFS: fsid=noc:cricket.0: time = 1197666253 > > Could be this: > > https://bugzilla.redhat.com/show_bug.cgi?id=272301 > > We're seeing it too. The trigger is a process attempting multiple locks > on a single file. Occasionally it seems to cause an oops & panic as > well. > > As far as I know, there's no fix available for it at the moment. Thanks for the reply. Though that's far from comforting... -- Regards, Feri. From wcheng at redhat.com Mon Jan 7 13:51:44 2008 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 07 Jan 2008 08:51:44 -0500 Subject: [Linux-cluster] GFS tuning advice sought In-Reply-To: <4781ECD9.6020903@cloud9.co.uk> References: <4781ECD9.6020903@cloud9.co.uk> Message-ID: <47822E70.9000507@redhat.com> James Fidell wrote: >I have a 3-node cluster built on CentOS 5.1, fully updated, providing >Maildir mail spool filesystems to dovecot-based IMAP servers. As it >stands GFS is in its default configuration -- no tuning has been done >so far. > >Mostly, it's working fine. Unfortunately we do have a few people with >tens of thousands of emails in single mailboxes who are seeing fairly >significant performance problems when fetching their email and in this >instance "make your mailbox smaller" isn't an acceptable solution :( > >Is there any GFS tuning I can do which might help speed up access to >these mailboxes? > > > > You probably need GFS2 in this case. To fix mail server issues in GFS1 would be too intrusive with current state of development cycle. -- Wendy From isplist at logicore.net Mon Jan 7 14:55:03 2008 From: isplist at logicore.net (isplist at logicore.net) Date: Mon, 7 Jan 2008 08:55:03 -0600 Subject: [Linux-cluster] GFS tuning advice sought In-Reply-To: <47822E70.9000507@redhat.com> Message-ID: <2008178553.393310@leena> >> Is there any GFS tuning I can do which might help speed up access to >> these mailboxes? >> > You probably need GFS2 in this case. To fix mail server issues in GFS1 > would be too intrusive with current state of development cycle. Wendy, I noticed you mention that GFS2 might be best for this. Would this apply for web servers as well? I've been using GFS on RHEL4 for web server cluster sharing. Would I be better to look at GFS2 for performance? 
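For either workload, the big Maildir spool or a shared web tree, there are a couple of low-risk GFS1 knobs worth trying before jumping to GFS2. This is only a sketch: the mount point and device below are placeholders, and the tunables may or may not exist at your package level, so check what gettune reports first.

# mount with noatime to cut metadata write traffic (placeholder paths)
mount -t gfs -o noatime /dev/vg00/share /gfs/share

# see which tunables this GFS build actually exposes
gfs_tool gettune /gfs/share

# if present, trim cached glocks more aggressively (example values only)
gfs_tool settune /gfs/share glock_purge 50
gfs_tool settune /gfs/share demote_secs 200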
Mike From rmaureira at solint.cl Mon Jan 7 19:36:44 2008 From: rmaureira at solint.cl (Robinson Maureira Castillo) Date: Mon, 07 Jan 2008 16:36:44 -0300 Subject: [Linux-cluster] freezing a service In-Reply-To: <394625.68046.qm@web54401.mail.yahoo.com> References: <394625.68046.qm@web54401.mail.yahoo.com> Message-ID: <47827F4C.5050108@solint.cl> Hi there, you can always stop and disable a service using: clusvcadm -d And start and re-enabling it later with: clusvcadm -e Hope it helps. Best regards, Michael Harrison wrote: > Hi, > > Is it possible to freeze a service, so that rgmanager effectively > ignores it? In other words, when doing maintenance on a production > cluster, it's sometimes necessary to stop the cluster services on that > node. When rgmanager comes down, I'd like it to leave whatever services > are running on the node alone, and not fail them over. > > I looked at the docs and utilities and didn't find anything like that. > Don't know if I missed it somewhere. > > Thanks, > -Mike > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Robinson Maureira Castillo Jefe de Soporte SOLINT F: +56 2 4119047 C: +56 9 95994987 From mrpquter at yahoo.com Mon Jan 7 20:04:19 2008 From: mrpquter at yahoo.com (Michael Harrison) Date: Mon, 7 Jan 2008 12:04:19 -0800 (PST) Subject: [Linux-cluster] freezing a service In-Reply-To: <47827F4C.5050108@solint.cl> Message-ID: <302270.15413.qm@web54401.mail.yahoo.com> What I'd like to do is stop the cluster components, leaving the services running. So for example, stop rgmanager, fenced, cman, ccsd for upgrades or whatever, and not have rgmanager fail over an IP address that it would otherwise control. Does it make sense? Cheers, -Mike --- Robinson Maureira Castillo wrote: > Hi there, you can always stop and disable a service using: > > clusvcadm -d > > And start and re-enabling it later with: > > clusvcadm -e > > > Hope it helps. > > Best regards, > > Michael Harrison wrote: > > Hi, > > > > Is it possible to freeze a service, so that rgmanager effectively > > ignores it? In other words, when doing maintenance on a production > > cluster, it's sometimes necessary to stop the cluster services on > that > > node. When rgmanager comes down, I'd like it to leave whatever > services > > are running on the node alone, and not fail them over. > > > > I looked at the docs and utilities and didn't find anything like > that. > > Don't know if I missed it somewhere. > > > > Thanks, > > -Mike > > > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Robinson Maureira Castillo > Jefe de Soporte > SOLINT > > F: +56 2 4119047 > C: +56 9 95994987 > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Never miss a thing. Make Yahoo your home page. 
http://www.yahoo.com/r/hs From gordan at bobich.net Mon Jan 7 21:24:51 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 07 Jan 2008 21:24:51 +0000 Subject: [Linux-cluster] GFS tuning advice sought In-Reply-To: <47822E70.9000507@redhat.com> References: <4781ECD9.6020903@cloud9.co.uk> <47822E70.9000507@redhat.com> Message-ID: <478298A3.7010605@bobich.net> > James Fidell wrote: > >> I have a 3-node cluster built on CentOS 5.1, fully updated, providing >> Maildir mail spool filesystems to dovecot-based IMAP servers. As it >> stands GFS is in its default configuration -- no tuning has been done >> so far. >> >> Mostly, it's working fine. Unfortunately we do have a few people with >> tens of thousands of emails in single mailboxes who are seeing fairly >> significant performance problems when fetching their email and in this >> instance "make your mailbox smaller" isn't an acceptable solution :( >> >> Is there any GFS tuning I can do which might help speed up access to >> these mailboxes? I have implemented similar mail systems in the past, and I hate to tell you this, but if you really have tens of thousands of emails in a single folder and you need to sift through them frequently, even a single-user-server with ext3 won't give you any kind of a sane performance over a WAN. Even on a 100Mb LAN, things will end up timing out, even without clustering. You'll have to split the huge mail folders up into several smaller folders. Gordan From lhh at redhat.com Mon Jan 7 22:48:10 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 07 Jan 2008 17:48:10 -0500 Subject: [Linux-cluster] freezing a service In-Reply-To: <394625.68046.qm@web54401.mail.yahoo.com> References: <394625.68046.qm@web54401.mail.yahoo.com> Message-ID: <1199746090.16312.12.camel@ayanami.boston.devel.redhat.com> On Sun, 2008-01-06 at 17:45 -0800, Michael Harrison wrote: > Hi, > > Is it possible to freeze a service, so that rgmanager effectively > ignores it? In other words, when doing maintenance on a production > cluster, it's sometimes necessary to stop the cluster services on that > node. When rgmanager comes down, I'd like it to leave whatever services > are running on the node alone, and not fail them over. > > I looked at the docs and utilities and didn't find anything like that. > Don't know if I missed it somewhere. In HEAD, yes. If you're not running HEAD... clusvcadm -d rg_test test /etc/cluster/cluster.conf start service [do maintenance here] rg_test test /etc/cluster/cluster.conf stop service clusvcadm -e -- Lon From lhh at redhat.com Mon Jan 7 22:50:10 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 07 Jan 2008 17:50:10 -0500 Subject: [Linux-cluster] freezing a service In-Reply-To: <1199746090.16312.12.camel@ayanami.boston.devel.redhat.com> References: <394625.68046.qm@web54401.mail.yahoo.com> <1199746090.16312.12.camel@ayanami.boston.devel.redhat.com> Message-ID: <1199746210.16312.15.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-01-07 at 17:48 -0500, Lon Hohberger wrote: > On Sun, 2008-01-06 at 17:45 -0800, Michael Harrison wrote: > > Hi, > > > > Is it possible to freeze a service, so that rgmanager effectively > > ignores it? In other words, when doing maintenance on a production > > cluster, it's sometimes necessary to stop the cluster services on that > > node. When rgmanager comes down, I'd like it to leave whatever services > > are running on the node alone, and not fail them over. Also, there's a BZ open about supporting teardown/start w/o doing a failover. 
The 'freeze' function in head doesn't let you stop the cluster services, but it lets you work on the service itself w/o rgmanager checking it or moving it while you're performing maintenance. -- Lon From mrpquter at yahoo.com Mon Jan 7 22:54:45 2008 From: mrpquter at yahoo.com (Michael Harrison) Date: Mon, 7 Jan 2008 14:54:45 -0800 (PST) Subject: [Linux-cluster] freezing a service In-Reply-To: <1199746090.16312.12.camel@ayanami.boston.devel.redhat.com> Message-ID: <650291.92797.qm@web54404.mail.yahoo.com> Ah, thanks. No, for the time being I'm running the bits released for RHEL4U4. Good to know that freezing will be available at sometime in the future. BTW, what will it be called ? Freezing ? -Mike --- Lon Hohberger wrote: > On Sun, 2008-01-06 at 17:45 -0800, Michael Harrison wrote: > > Hi, > > > > Is it possible to freeze a service, so that rgmanager effectively > > ignores it? In other words, when doing maintenance on a production > > cluster, it's sometimes necessary to stop the cluster services on > that > > node. When rgmanager comes down, I'd like it to leave whatever > services > > are running on the node alone, and not fail them over. > > > > I looked at the docs and utilities and didn't find anything like > that. > > Don't know if I missed it somewhere. > > In HEAD, yes. > > If you're not running HEAD... > > clusvcadm -d > rg_test test /etc/cluster/cluster.conf start service > [do maintenance here] > rg_test test /etc/cluster/cluster.conf stop service > clusvcadm -e > > -- Lon > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From lhh at redhat.com Mon Jan 7 23:06:10 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 07 Jan 2008 18:06:10 -0500 Subject: [Linux-cluster] would this configuration work for lvs-dr? In-Reply-To: <8108f4850801051451n1bf70ae6nb62187498cac7db8@mail.gmail.com> References: <8108f4850801051451n1bf70ae6nb62187498cac7db8@mail.gmail.com> Message-ID: <1199747170.16312.32.camel@ayanami.boston.devel.redhat.com> With direct routing, all the nodes must be visible to the outside world using the same route as the director(s). If you're trying to route *through* your director, you need to use NAT (or tun, which I've never used). Direct routing means that you are not using the director as a router, just a load balancer. That is, assuming you have a gateway for all 3 hosts that's @ 192.168.2.254... Director: eth0 192.168.2.1 eth0:0 192.168.2.100 (vip) gateway / default route 192.168.2.254 Real server #1: eth0 192.168.2.2 gateway / default route 192.168.2.254 Real server #2: eth0 192.168.2.3 gateway / gateway route 192.168.2.254 Once that's done, you need to get the real servers to process requests for 192.168.1.100. I wrote this some years ago, but here are two ways of getting it working: http://people.redhat.com/lhh/piranha-direct-routing-howto.txt Depending on how you do it, you will either place 192.168.1.100 as eth0:0 and do an arptables_jf setup, or you will not put 192.168.1.100 on *any* of the real servers and instead use an iptables hack to use a transparent proxy to rewrite outbound packets to be sourced from 192.168.1.100. Some people put the VIP on lo:0 - I have never done that nor could I tell you the advantages or disadvantages. 
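As a rough illustration of the first option (VIP as eth0:0 plus arptables_jf), the real-server side looks something like the lines below, using the example addresses from this thread (VIP 192.168.2.100, real server 192.168.2.2). Treat it as a sketch and check the flags against the arptables_jf man page before relying on it.

# suppress ARP replies for the VIP and rewrite the ARP source on the way out
arptables -A IN -d 192.168.2.100 -j DROP
arptables -A OUT -s 192.168.2.100 -j mangle --mangle-ip-s 192.168.2.2

# then bring the VIP up as an ordinary alias on the real server
ifconfig eth0:0 192.168.2.100 netmask 255.255.255.255 up

The lo:0 variant mentioned earlier in this digest (VIP on lo:0 plus the arp_ignore/arp_announce sysctls) is the other way to get the same effect.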
I've also never played with sysctl.conf settings. -- Lon From wcheng at redhat.com Tue Jan 8 04:17:19 2008 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 07 Jan 2008 23:17:19 -0500 Subject: [Linux-cluster] GFS tuning advice sought In-Reply-To: <2008178553.393310@leena> References: <2008178553.393310@leena> Message-ID: <4782F94F.4070302@redhat.com> isplist at logicore.net wrote: >>>Is there any GFS tuning I can do which might help speed up access to >>>these mailboxes? >>> >>> >>> >>You probably need GFS2 in this case. To fix mail server issues in GFS1 >>would be too intrusive with current state of development cycle. >> >> > >Wendy, > >I noticed you mention that GFS2 might be best for this. Would this apply for >web servers as well? I've been using GFS on RHEL4 for web server cluster >sharing. Would I be better to look at GFS2 for performance? > > > > Not sure about web servers though - I think it depends on access patterns. -- Wendy From gordan at bobich.net Tue Jan 8 07:23:25 2008 From: gordan at bobich.net (Gordan Bobic) Date: Tue, 08 Jan 2008 07:23:25 +0000 Subject: [Linux-cluster] GFS tuning advice sought In-Reply-To: <2008178553.393310@leena> References: <2008178553.393310@leena> Message-ID: <478324ED.2070403@bobich.net> isplist at logicore.net wrote: >>> Is there any GFS tuning I can do which might help speed up access to >>> these mailboxes? >>> >> You probably need GFS2 in this case. To fix mail server issues in GFS1 >> would be too intrusive with current state of development cycle. > > I noticed you mention that GFS2 might be best for this. Would this apply for > web servers as well? I've been using GFS on RHEL4 for web server cluster > sharing. Would I be better to look at GFS2 for performance? Web server disk I/O is likely to be mostly read-only, so I doubt disk performance will ever be your bottleneck. It's bouncing write-locks around that slows clustered file systems down. Gordan From lhh at redhat.com Tue Jan 8 14:36:00 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 08 Jan 2008 09:36:00 -0500 Subject: [Linux-cluster] would this configuration work for lvs-dr? In-Reply-To: <1199747170.16312.32.camel@ayanami.boston.devel.redhat.com> References: <8108f4850801051451n1bf70ae6nb62187498cac7db8@mail.gmail.com> <1199747170.16312.32.camel@ayanami.boston.devel.redhat.com> Message-ID: <1199802960.16312.35.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-01-07 at 18:06 -0500, Lon Hohberger wrote: > Once that's done, you need to get the real servers to process requests > for 192.168.1.100. typo ... 2.100, not 1.100. > Depending on how you do it, you will either place 192.168.1.100 as > eth0:0 and do an arptables_jf setup, or you will not put 192.168.1.100 > on *any* of the real servers and instead use an iptables hack to use a > transparent proxy to rewrite outbound packets to be sourced from > 192.168.1.100. Same here. -- Lon From Alain.Moulle at bull.net Tue Jan 8 14:39:30 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 08 Jan 2008 15:39:30 +0100 Subject: [Linux-cluster] CS5 and dlm-kernel ? Message-ID: <47838B22.5000608@bull.net> Hi I thought there was always a dlm and dlm-kernel likewise in CS4, but it seems rpms don't exist anymore ? So no more kernel module at all with CS5 ? Alain Moull? 
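A quick way to check that on a RHEL5 box, assuming a stock kernel package, is to ask the kernel itself, since the dlm module now lives in-tree rather than in a separate dlm-kernel rpm:

# is the dlm module shipped with the running kernel?
modinfo dlm

# or look at the kernel config directly
grep CONFIG_DLM /boot/config-$(uname -r)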
From msebedio at invap.com.ar Tue Jan 8 15:58:22 2008 From: msebedio at invap.com.ar (Mariel Sebedio) Date: Tue, 08 Jan 2008 12:58:22 -0300 Subject: [Linux-cluster] NTP In-Reply-To: <9F22D67428CC144B93A198ECA35B85BF13B59B0097@PS-MAILBOX.Progressoft.com> References: <9F22D67428CC144B93A198ECA35B85BF13B59B0097@PS-MAILBOX.Progressoft.com> Message-ID: <47839D9E.3080709@invap.com.ar> Hello, I had the same problem and configuring this files with this settings... (Sorry mi English) The server A: Edit /etc/ntp.conf and setting this Step 1 - server XXXXX (IP o name in /etc/host from where take the date-Time) Step 2 - restrict XXXXX mask 255.255.255.255 nomodify notrap noquery Step 3 - restrict XXXXX mask 255.255.255.0 nomodify notrap (in this case restrict the net where is server B, dont put noquery because server B query for time a this server) In the file /etc/ntp/step-tickers mus be contain the IP or name in step 1 Start /etc/init.d/ntpd and in this case Sync whit the server that you defined in Step 1 In Server B Edit /etc/ntp.conf and setting this Step 1 - server XXXXX (IP o name in /etc/host SERVER A) Step 2 - restrict IP o name SERVER A mask 255.255.255.255 nomodify notrap noquery In the file /etc/ntp/step-tickers mus be contain the IP or name in step 1 (SERVER A) Start /etc/init.d/ntpd and in this case Sync whit the SERVER A Good luck!! Mariel Yazan Albakheit wrote: > Dear , > > > > Can you Help me in configuring the NTP between two nodes running > RHEL_AS_V4_U5 . > > > > I Have two Server (A,B) I want server B to take its time from > server A only. > > > > > > Thanks. > > > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > -- Lic. Mariel Sebedio Division Computos y Sistemas INVAP S.E. - www.invap.com.ar From kanderso at redhat.com Tue Jan 8 15:09:18 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Tue, 08 Jan 2008 09:09:18 -0600 Subject: [Linux-cluster] CS5 and dlm-kernel ? In-Reply-To: <47838B22.5000608@bull.net> References: <47838B22.5000608@bull.net> Message-ID: <1199804958.2750.8.camel@dhcp80-204.msp.redhat.com> On Tue, 2008-01-08 at 15:39 +0100, Alain Moulle wrote: > Hi > > I thought there was always a dlm and dlm-kernel likewise in CS4, > but it seems rpms don't exist anymore ? > So no more kernel module at all with CS5 ? > dlm kernel module is part of the upstream and base rhel5 kernel now, doesn't need a separate rpm as it is included as part of the kernel rpm. Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From isplist at logicore.net Tue Jan 8 15:49:08 2008 From: isplist at logicore.net (isplist at logicore.net) Date: Tue, 8 Jan 2008 09:49:08 -0600 Subject: [Linux-cluster] GFS tuning advice sought In-Reply-To: <478324ED.2070403@bobich.net> Message-ID: <2008189498.774713@leena> > Web server disk I/O is likely to be mostly read-only, so I doubt disk > performance will ever be your bottleneck. It's bouncing write-locks > around that slows clustered file systems down. True and other than media, all writes are to the MySQL servers. Still, I wondered since the web servers are all sharing a GFS space for their pages. 
Mike From charlieb-linux-cluster at e-smith.com Tue Jan 8 20:51:36 2008 From: charlieb-linux-cluster at e-smith.com (Charlie Brady) Date: Tue, 8 Jan 2008 15:51:36 -0500 (EST) Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: Message-ID: On Fri, 4 Jan 2008, Charlie Brady wrote: > I'm helping a colleague to collect information on an application lockup > problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. > > I'd appreciate advice as to what information to collect next. Nobody have any advice? --- Charlie From kanderso at redhat.com Tue Jan 8 21:00:42 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Tue, 08 Jan 2008 15:00:42 -0600 Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: References: Message-ID: <1199826042.2774.13.camel@localhost.localdomain> On Tue, 2008-01-08 at 15:51 -0500, Charlie Brady wrote: > On Fri, 4 Jan 2008, Charlie Brady wrote: > > > I'm helping a colleague to collect information on an application lockup > > problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. > > Missed this on my first read. What do you mean by a shared scsi array? What hardware are you using for shared storage? Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Tue Jan 8 21:08:37 2008 From: gordan at bobich.net (Gordan Bobic) Date: Tue, 08 Jan 2008 21:08:37 +0000 Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: References: Message-ID: <4783E655.3080102@bobich.net> Charlie Brady wrote: > On Fri, 4 Jan 2008, Charlie Brady wrote: > >> I'm helping a colleague to collect information on an application lockup >> problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. >> >> I'd appreciate advice as to what information to collect next. > > Nobody have any advice? Shared SCSI as in iSCSI SAN or as in a shared SCSI bus with two machines connected via a SCSI cable? Gordan From snowhare at nihongo.org Tue Jan 8 21:31:58 2008 From: snowhare at nihongo.org (Benjamin Franz) Date: Tue, 8 Jan 2008 13:31:58 -0800 (PST) Subject: [Linux-cluster] fence_apc - Perl based CLI version Message-ID: As others have reported, the current fence_apc shipping with RHEL5.1/CentOS5.1 simply does not work reliably on newer APC firmwares. It breaks under all kinds of conditions (some as simple as 'works on some ports but not on other ports'). Since I *really* need it to work, I hacked together a Perl version (derived from the old fence_apc.pl in CVS) that uses the APC command line interface and dispenses with the 'menu scraping' interface entirely. I don't have any switches here that use a switchnum interface so I couldn't hack anything together for that. But it appears to reliably do what it is supposed to do (at least on my APC7900 switches running AOS 3.3.4): Fence. It would make a great deal of sense for someone to add it to the Luci/CMAN list of supported fences. Maybe "APC Power Device (CLI) / fence_apc_cli"? -- Benjamin Franz "It is moronic to predict without first establishing an error rate for a prediction and keeping track of one?s past record of accuracy." -- Nassim Nicholas Taleb, Fooled By Randomness -------------- next part -------------- #!/usr/bin/perl #########################################3 # CLI APC Fencing. This only works with APC AOS v2.7.0 or later # but it is a LOT simpler and more robust than the old menu scraping code. 
# use strict; use warnings; use Getopt::Std; use Net::Telnet (); # WARNING!! Do not add code bewteen "#BEGIN_VERSION_GENERATION" and # "#END_VERSION_GENERATION" It is generated by the Makefile my ($FENCE_RELEASE_NAME, $REDHAT_COPYRIGHT, $BUILD_DATE); #BEGIN_VERSION_GENERATION $FENCE_RELEASE_NAME=""; $REDHAT_COPYRIGHT=""; $BUILD_DATE=""; #END_VERSION_GENERATION ############################################################################### ############################################################################### ## ## Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. ## Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. ## ## This copyrighted material is made available to anyone wishing to use, ## modify, copy, or redistribute it subject to the terms and conditions ## of the GNU General Public License v.2. ## ############################################################################### ############################################################################### # Get the program name from $0 and strip directory names my $Program_Name = $0; $Program_Name =~ s/.*\///; my $login_prompt = '/ : /'; my $command_prompt = '/APC> $/'; my $debug_log = '/tmp/apclog'; # Location of debugging log when in verbose mode my $telnet_timeout = 2; # Seconds to wait for matching telent response my $open_wait = 5; # Seconds to wait between each telnet attempt my $max_open_tries = 3; # How many telnet attempts to make. Because the # APC can fail repeated login attempts, this number # should be more than 1 my $reboot_duration = 30; # Number of seconds plugs are turned off during a reboot command my $power_off_delay = 0; # Number of seconds to wait before actually turning off a plug my $power_on_delay = 30; # Number of seconds to wait before actually turning on a plug our %Opts = ( 'o' => 'reboot', ); our $SwitchNum; our $Logged_In = 0; our $t = Net::Telnet->new; # Our telnet object instance ### START MAIN ####################################################### if (@ARGV > 0) { getopts("a:hl:n:o:p:qTvV", \%Opts) || fail_usage(); usage() if defined $Opts{'h'}; version() if defined $Opts{'V'}; fail_usage("Unknown parameter.") if (@ARGV > 0); fail_usage("No '-a' flag specified.") unless defined $Opts{'a'}; fail_usage("No '-n' flag specified.") unless defined $Opts{'n'}; fail_usage("No '-l' flag specified.") unless defined $Opts{'l'}; fail_usage("No '-p' flag specified.") unless defined $Opts{'p'}; fail_usage("Unrecognised action '$Opts{'o'}' for '-o' flag") unless $Opts{'o'} =~ /^(Off|On|Reboot)$/i; if ( $Opts{'n'} =~ /(\d+):(\d+)/ ) { $SwitchNum = $1; $Opts{'n'} = $2; } } else { get_options_stdin(); fail("failed: no IP address") unless defined $Opts{'a'}; fail("failed: no plug number") unless defined $Opts{'n'}; fail("failed: no login name") unless defined $Opts{'l'}; fail("failed: no password") unless defined $Opts{'p'}; fail("failed: unrecognised action: $Opts{'o'}") unless $Opts{'o'} =~ /^(Off|On|Reboot)$/i; } my $option = lc($Opts{'o'}); my $plug = $Opts{'n'}; $t->prompt($command_prompt); $t->timeout($telnet_timeout); $t->input_log($debug_log) if $Opts{'v'}; $t->errmode('return'); login(); $t->errmode(\&telnet_error); my $cmd_results = ''; my $ok; if ($option eq 'reboot') { $t->cmd( String => "rebootduration $plug:$reboot_duration", Output => \$cmd_results ); } if ($option eq 'off') { $t->cmd( String => "poweroffdelay $plug:$power_off_delay", Output => \$cmd_results ); } if ($option eq 'on') { $t->cmd( String => "powerondelay $plug:$power_on_delay", Output => \$cmd_results ); 
} $ok = $t->cmd( String => "$option $plug", Output => \$cmd_results ); #print $cmd_results; logout(); exit 0; ### END MAIN ####################################################### sub usage { print <<"EOT"; Usage: $Program_Name [options] Options: -a IP address or hostname of MasterSwitch -h usage -l Login name -n Outlet number to change: [:] -o Action: Reboot (default), Off or On -p Login password -q quiet mode -T Test mode (cancels action) -V version -v Log to file /tmp/apclog EOT exit 0; } sub fail { my ($msg)=@_; print $msg."\n" unless defined $Opts{'q'}; if (defined $t) { # make sure we don't get stuck in a loop due to errors $t->errmode('return'); if ($Logged_In) { logout(); } $t->close(); } exit 1; } sub fail_usage { my ($msg)=@_; print STDERR $msg."\n" if $msg; print STDERR "Please use '-h' for usage.\n"; exit 1; } sub version { print "$Program_Name $FENCE_RELEASE_NAME $BUILD_DATE\n"; print "$REDHAT_COPYRIGHT\n" if ( $REDHAT_COPYRIGHT ); exit 0; } sub login { for (my $i=0; $i<$max_open_tries; $i++) { $t->open($Opts{'a'}); my ($prompt) = $t->waitfor($login_prompt); # Expect 'User Name : ' if ((not defined $prompt) || ($prompt !~ /name/i)) { $t->close(); sleep($open_wait); next; } $t->print($Opts{'l'}); ($prompt) = $t->waitfor($login_prompt); # Expect 'Password : ' if ((not defined $prompt) || ($prompt !~ /password/i )) { $t->close(); sleep($open_wait); next; } # Send password $t->print("$Opts{'p'} -c"); # The appended ' -c' activates the CLI interface my ($dummy, $login_result) = $t->waitfor('/(APC>|(?i:user name|password)\s*:) /'); if ($login_result =~ m/APC> /) { $Logged_In = 1; # send newline to flush prompt $t->print(""); return; } else { fail("invalid username or password ($login_result)"); } } fail("failed: telnet failed: " . $t->errmsg."\n"); } sub logout { $t->cmd("logout"); return; } sub get_options_stdin { my $opt; my $line = 0; while( defined($opt = <>) ) { chomp $opt; # strip leading and trailing whitespace $opt =~ s/^\s*//; $opt =~ s/\s*$//; # skip comments next if ($opt =~ m/^#/); $line += 1; next if ($opt eq ''); my ($name, $val) = split(/\s*=\s*/, $opt); if ( $name eq "" ) { print STDERR "parse error: illegal name in option $line\n"; exit 2; } elsif ($name eq "agent" ) { } # DO NOTHING -- this field is used by fenced elsif ($name eq "ipaddr" ) { $Opts{'a'} = $val; } elsif ($name eq "login" ) { $Opts{'l'} = $val; } elsif ($name eq "option" ) { $Opts{'o'} = $val; } elsif ($name eq "passwd" ) { $Opts{'p'} = $val; } elsif ($name eq "port" ) { $Opts{'n'} = $val; } elsif ($name eq "switch" ) { $SwitchNum = $val; } elsif ($name eq "test" ) { $Opts{'T'} = $val; } elsif ($name eq "verbose" ) { $Opts{'v'} = $val; } } } sub telnet_error { if ($t->errmsg ne "pattern match timed-out") { fail("failed: telnet returned: " . $t->errmsg . "\n"); } else { $t->print(""); } } From swplotner at amherst.edu Tue Jan 8 21:44:05 2008 From: swplotner at amherst.edu (Steffen Plotner) Date: Tue, 8 Jan 2008 16:44:05 -0500 Subject: [Linux-cluster] fence_apc - Perl based CLI version In-Reply-To: Message-ID: <150F55E3591CD042B77ED3DB957854652B610E@mail7.amherst.edu> Hi, I currently use a different method of fencing in our clusters (using iscsi ietd and iptables currently), however we have the APC7900 PDUs and do control them using SNMP: Use SNMP set commands to turn off and on the ports: snmpset -c PowerNet-MIB::rPDUOutletControlOutletCommand. i 2 The 'i' refers to an integer and the digit 2 means to power off the port. That has worked very reliably. 
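Spelled out a little further, the three operations look roughly like this. The community string, PDU hostname and outlet number are placeholders, the command values (1 on, 2 off, 3 reboot) are from the PowerNet MIB as I remember it, and the symbolic name assumes the MIB file is installed where net-snmp can find it, so verify against your firmware first.

# immediateOff outlet 3
snmpset -v1 -c private pdu1 PowerNet-MIB::rPDUOutletControlOutletCommand.3 i 2

# immediateOn outlet 3
snmpset -v1 -c private pdu1 PowerNet-MIB::rPDUOutletControlOutletCommand.3 i 1

# immediateReboot outlet 3
snmpset -v1 -c private pdu1 PowerNet-MIB::rPDUOutletControlOutletCommand.3 i 3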
Steffen > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Benjamin Franz > Sent: Tuesday, January 08, 2008 4:32 PM > To: linux clustering > Subject: [Linux-cluster] fence_apc - Perl based CLI version > > As others have reported, the current fence_apc shipping with > RHEL5.1/CentOS5.1 simply does not work reliably on newer APC > firmwares. It breaks under all kinds of conditions (some as > simple as 'works on some ports but not on other ports'). > > Since I *really* need it to work, I hacked together a Perl > version (derived from the old fence_apc.pl in CVS) that uses > the APC command line interface and dispenses with the 'menu > scraping' interface entirely. > > I don't have any switches here that use a switchnum interface > so I couldn't hack anything together for that. But it appears > to reliably do what it is supposed to do (at least on my > APC7900 switches running AOS > 3.3.4): Fence. > > It would make a great deal of sense for someone to add it to > the Luci/CMAN list of supported fences. Maybe "APC Power > Device (CLI) / fence_apc_cli"? > > -- > Benjamin Franz > > "It is moronic to predict without first establishing an error rate > for a prediction and keeping track of one's past record of > accuracy." > -- Nassim Nicholas Taleb, Fooled By Randomness > From teigland at redhat.com Tue Jan 8 22:56:09 2008 From: teigland at redhat.com (David Teigland) Date: Tue, 8 Jan 2008 16:56:09 -0600 Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: References: Message-ID: <20080108225609.GB27979@redhat.com> On Fri, Jan 04, 2008 at 04:18:45PM -0500, Charlie Brady wrote: > We've reduced the application code to a simple test case. The following > code run on each node will soon block, and doesn't receive signals until > the peer node is shutdown: > > ... > fl.l_whence=SEEK_SET; > fl.l_start=0; > fl.l_len=1; > > while (1) > { > fl.l_type=F_WRLCK; > retval=fcntl(filedes,F_SETLKW,&fl); > if (retval==-1) > { > perror("lock"); > exit(1); > } > // attempt to unlock the index file > fl.l_type=F_UNLCK; > retval=fcntl(filedes,F_SETLKW,&fl); > if (retval==-1) > { > perror("unlock"); > exit(1); > } > } Yes, this stresses a problematic design limitation in the RHEL4 dlm where the dlm master node is ping-ponging all over the place and becomes so unstable that everything comes to a halt. One possible work-around is to modify the program to hold a lock on filedes to keep the master stable, e.g. hold a zero length lock at some unused offset like 0xFFFFFF. Dave From charlieb-linux-cluster at e-smith.com Wed Jan 9 03:39:51 2008 From: charlieb-linux-cluster at e-smith.com (Charlie Brady) Date: Tue, 8 Jan 2008 22:39:51 -0500 (EST) Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: <4783E655.3080102@bobich.net> Message-ID: On Tue, 8 Jan 2008, Gordan Bobic wrote: > Charlie Brady wrote: > > On Fri, 4 Jan 2008, Charlie Brady wrote: > > > >> I'm helping a colleague to collect information on an application lockup > >> problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. > >> > >> I'd appreciate advice as to what information to collect next. > > > > Nobody have any advice? > > Shared SCSI as in iSCSI SAN or as in a shared SCSI bus with two machines > connected via a SCSI cable? The latter. I don't have the details immediately at hand, but it's all HP gear. A pair of DL380s with an external SCSI array (MSAxx), IIRC. 
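Coming back to the question of what to collect while the two processes are stuck, the usual things on a RHEL4 GFS/DLM cluster are the glock dump, the dlm lock dump and kernel stacks. The /proc/cluster paths and the lockspace name below are from memory and are assumptions, so check they exist on your kernel before scripting around them.

# GFS side: dump glock state for the stuck mount (mount point is a placeholder)
gfs_tool lockdump /mnt/gfs > /tmp/glocks.$(hostname)

# DLM side: find the lockspace name, then dump its locks
cat /proc/cluster/services
echo "mylockspace" > /proc/cluster/dlm_locks
cat /proc/cluster/dlm_locks > /tmp/dlmlocks.$(hostname)

# kernel stacks of all tasks, including the blocked fcntl callers (output goes to dmesg/syslog)
echo t > /proc/sysrq-trigger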
--- Charlie From charlieb-linux-cluster at e-smith.com Wed Jan 9 03:43:16 2008 From: charlieb-linux-cluster at e-smith.com (Charlie Brady) Date: Tue, 8 Jan 2008 22:43:16 -0500 (EST) Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: <20080108225609.GB27979@redhat.com> Message-ID: On Tue, 8 Jan 2008, David Teigland wrote: > On Fri, Jan 04, 2008 at 04:18:45PM -0500, Charlie Brady wrote: > > We've reduced the application code to a simple test case. The following > > code run on each node will soon block, and doesn't receive signals until > > the peer node is shutdown: ... > Yes, this stresses a problematic design limitation in the RHEL4 dlm where > the dlm master node is ping-ponging all over the place and becomes so > unstable that everything comes to a halt. One possible work-around is to > modify the program to hold a lock on filedes to keep the master stable, > e.g. hold a zero length lock at some unused offset like 0xFFFFFF. Thanks. I've passed the advice on. -- Charlie From gnobal at gmail.com Wed Jan 9 08:51:52 2008 From: gnobal at gmail.com (Amit Schreiber) Date: Wed, 9 Jan 2008 10:51:52 +0200 Subject: [Linux-cluster] fence_apc - Perl based CLI version In-Reply-To: <150F55E3591CD042B77ED3DB957854652B610E@mail7.amherst.edu> References: <150F55E3591CD042B77ED3DB957854652B610E@mail7.amherst.edu> Message-ID: Hi, There's a fence_apc_snmp.py script available in the cluster code repository: http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/fence/agents/apc/?cvsroot=cluster I tested it a little (replaced /sbin/fence_apc with it - they both have the same CLI parameters) and it seems to work where the fence_apc script shipped with RHEL5 fails. Amit On Jan 8, 2008 11:44 PM, Steffen Plotner wrote: > Hi, > > I currently use a different method of fencing in our clusters (using > iscsi ietd and iptables currently), however we have the APC7900 PDUs and > do control them using SNMP: > > Use SNMP set commands to turn off and on the ports: > > snmpset -c > PowerNet-MIB::rPDUOutletControlOutletCommand. i 2 > > The 'i' refers to an integer and the digit 2 means to power off the > port. > > That has worked very reliably. > > Steffen > > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Benjamin Franz > > Sent: Tuesday, January 08, 2008 4:32 PM > > To: linux clustering > > Subject: [Linux-cluster] fence_apc - Perl based CLI version > > > > As others have reported, the current fence_apc shipping with > > RHEL5.1/CentOS5.1 simply does not work reliably on newer APC > > firmwares. It breaks under all kinds of conditions (some as > > simple as 'works on some ports but not on other ports'). > > > > Since I *really* need it to work, I hacked together a Perl > > version (derived from the old fence_apc.pl in CVS) that uses > > the APC command line interface and dispenses with the 'menu > > scraping' interface entirely. > > > > I don't have any switches here that use a switchnum interface > > so I couldn't hack anything together for that. But it appears > > to reliably do what it is supposed to do (at least on my > > APC7900 switches running AOS > > 3.3.4): Fence. > > > > It would make a great deal of sense for someone to add it to > > the Luci/CMAN list of supported fences. Maybe "APC Power > > Device (CLI) / fence_apc_cli"? 
> > > > -- > > Benjamin Franz > > > > "It is moronic to predict without first establishing an error rate > > for a prediction and keeping track of one's past record of > > accuracy." > > -- Nassim Nicholas Taleb, Fooled By Randomness > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From Alain.Moulle at bull.net Wed Jan 9 14:04:03 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Wed, 09 Jan 2008 15:04:03 +0100 Subject: [Linux-cluster] CS5 : clurgmgrd[28359]: segfault Message-ID: <4784D453.6050005@bull.net> Hi Testing the CS5 on a two-nodes cluster with quorum disk, when I did the test ifdown on the heart-beat interface, I got a segfault in log : Jan 9 09:45:16 s_sys at am1 avahi-daemon[3106]: Interface eth0.IPv6 no longer relevant for mDNS. Jan 9 09:45:18 s_sys at am1 qdiskd[28265]: Heuristic: 'ping -t1 -c1 172.19.1.99' missed (1/3) Jan 9 09:45:25 s_sys at am1 openais[28300]: [TOTEM] The token was lost in the OPERATIONAL state. Jan 9 09:45:25 s_sys at am1 openais[28300]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes). Jan 9 09:45:25 s_sys at am1 openais[28300]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Jan 9 09:45:25 s_sys at am1 openais[28300]: [TOTEM] The network interface is down. Jan 9 09:45:25 s_sys at am1 openais[28300]: [TOTEM] entering GATHER state from 15. Jan 9 09:45:25 s_sys at am1 openais[28300]: [TOTEM] entering GATHER state from 2. Jan 9 09:45:28 s_sys at am1 qdiskd[28265]: Heuristic: 'ping -t1 -c1 172.19.1.99' missed (2/3) Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] entering GATHER state from 0. Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] Creating commit token because I am the rep. Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] Saving state aru 5c high seq received 5c Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] Storing new sequence id for ring 12c Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] entering COMMIT state. Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] entering RECOVERY state. Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] position [0] member 127.0.0.1: Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] previous ring seq 296 rep 172.19.1.78 Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] aru 5c high delivered 5c received flag 1 Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] Did not need to originate any messages in recovery. Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] Sending initial ORF token Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] CLM CONFIGURATION CHANGE Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] New Configuration: Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] r(0) ip(127.0.0.1) Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] Members Left: Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] r(0) ip(172.19.1.79) Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] Members Joined: Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] CLM CONFIGURATION CHANGE Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] New Configuration: Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] r(0) ip(127.0.0.1) Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] Members Left: Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] Members Joined: Jan 9 09:45:30 s_sys at am1 openais[28300]: [SYNC ] This node is within the primary component and will provide service. Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] entering OPERATIONAL state. 
Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] got nodejoin message 172.16.101.91 Jan 9 09:45:30 s_sys at am1 openais[28300]: [EVT ] recovery error node: r(0) ip(127.0.0.1) not found Jan 9 09:45:30 s_kernel at am1 kernel: clurgmgrd[28359]: segfault at 0000000000000000 rip 0000000000408c4a rsp 00007fff04a2c450 error 4 Jan 9 09:45:30 s_sys at am1 gfs_controld[28328]: cluster is down, exiting Jan 9 09:45:30 s_kernel at am1 kernel: dlm: closing connection to node 2 Jan 9 09:45:30 s_kernel at am1 kernel: dlm: closing connection to node 0 Jan 9 09:45:30 s_kernel at am1 kernel: dlm: closing connection to node 1 Jan 9 09:45:30 s_sys at am1 dlm_controld[28322]: cluster is down, exiting Jan 9 09:45:30 s_sys at am1 fenced[28316]: cman_get_nodes error -1 104 Jan 9 09:45:30 s_sys at am1 fenced[28316]: cluster is down, exiting Jan 9 09:45:30 s_sys at am1 clurgmgrd[28358]: Watchdog: Daemon died, rebooting... Jan 9 09:45:30 s_sys at am1 shutdown[18377]: shutting down for system halt Is-it already a known problem ? Thanks Regards Alain Moull? From Alexandre.Racine at mhicc.org Wed Jan 9 16:23:41 2008 From: Alexandre.Racine at mhicc.org (Alexandre Racine) Date: Wed, 9 Jan 2008 11:23:41 -0500 Subject: [Linux-cluster] scsi reservation References: <4784D453.6050005@bull.net> Message-ID: Hi all, I am currently using version 1.0.4 of GFS and the scsi reservation binairies (scsi_reserve, fence_scsi, etc) are not there. Is it suppose to be like this or this is the distro I a using playing games with me (not my choice! It's Gentoo). If it's normal that they are not there, is there a reason for this? Does it work well? Because it's still here : http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/fence/agents/scsi/?cvsroot=cluster Thanks. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2541 bytes Desc: not available URL: From Abdel.Sadek at lsi.com Wed Jan 9 16:45:10 2008 From: Abdel.Sadek at lsi.com (Sadek, Abdel) Date: Wed, 9 Jan 2008 09:45:10 -0700 Subject: [Linux-cluster] RE: scsi reservation In-Reply-To: Message-ID: I believe you may not have the sg3_utils packages installed. I'll first check for that. Thanks. Abdel.. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Alexandre Racine Sent: Wednesday, January 09, 2008 10:24 AM To: linux clustering Subject: scsi reservation Hi all, I am currently using version 1.0.4 of GFS and the scsi reservation binairies (scsi_reserve, fence_scsi, etc) are not there. Is it suppose to be like this or this is the distro I a using playing games with me (not my choice! It's Gentoo). If it's normal that they are not there, is there a reason for this? Does it work well? Because it's still here : http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/fence/agents/scsi/? cvsroot=cluster Thanks. From Alexandre.Racine at mhicc.org Wed Jan 9 17:28:41 2008 From: Alexandre.Racine at mhicc.org (Alexandre Racine) Date: Wed, 9 Jan 2008 12:28:41 -0500 Subject: [Linux-cluster] RE: scsi reservation References: Message-ID: You are right, that package was not installed. So now I installed the package, and recompiled "fence", but "fence_scsi" is still not there in /sbin/ Any more idea? (Thanks for the first hint). 
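Independent of why the build keeps skipping those agents, it is worth confirming that the shared storage honours SCSI-3 persistent reservations at all, since that is what fence_scsi ultimately drives. A rough sanity check with sg3_utils, where the device name is a placeholder for the shared LUN:

# list registered reservation keys on the shared LUN
sg_persist --in --read-keys --device=/dev/sdb

# show the current reservation, if any
sg_persist --in --read-reservation --device=/dev/sdb

If the array reports that persistent reservations are not supported, fence_scsi will not help even once it is built and installed.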
Alexandre Racine Projets sp?ciaux 514-461-1300 poste 3304 alexandre.racine at mhicc.org -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Sadek, Abdel Sent: Wed 2008-01-09 11:45 To: linux clustering Subject: [Linux-cluster] RE: scsi reservation I believe you may not have the sg3_utils packages installed. I'll first check for that. Thanks. Abdel.. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Alexandre Racine Sent: Wednesday, January 09, 2008 10:24 AM To: linux clustering Subject: scsi reservation Hi all, I am currently using version 1.0.4 of GFS and the scsi reservation binairies (scsi_reserve, fence_scsi, etc) are not there. Is it suppose to be like this or this is the distro I a using playing games with me (not my choice! It's Gentoo). If it's normal that they are not there, is there a reason for this? Does it work well? Because it's still here : http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/fence/agents/scsi/? cvsroot=cluster Thanks. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3241 bytes Desc: not available URL: From kanderso at redhat.com Wed Jan 9 19:53:17 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Wed, 09 Jan 2008 13:53:17 -0600 Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: References: Message-ID: <1199908397.3277.44.camel@dhcp80-204.msp.redhat.com> On Tue, 2008-01-08 at 22:39 -0500, Charlie Brady wrote: > On Tue, 8 Jan 2008, Gordan Bobic wrote: > > > Charlie Brady wrote: > > > On Fri, 4 Jan 2008, Charlie Brady wrote: > > > > > >> I'm helping a colleague to collect information on an application lockup > > >> problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. > > >> > > >> I'd appreciate advice as to what information to collect next. > > > > > > Nobody have any advice? > > > > Shared SCSI as in iSCSI SAN or as in a shared SCSI bus with two machines > > connected via a SCSI cable? > > The latter. I don't have the details immediately at hand, but it's all HP > gear. A pair of DL380s with an external SCSI array (MSAxx), IIRC. > If it is a MSA20, MSA30 or MSA500 - they won't work with GFS. Shared SCSI bus isn't really shared, accesses lock the bus such that when one node accesses the storage the other node is locked out. GFS requires the ability to do shared concurrent access to the storage devices. This probably explains the hangs you were seeing. So, either get an iSCSI or fibre channel storage array, or go strictly with a failover storage architecture, such that only one node has the filesystem mounted at any one time. In that case, you don't need gfs anymore, just cluster suite to manage the failover. Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From comaniliut at yahoo.com Wed Jan 9 20:56:06 2008 From: comaniliut at yahoo.com (Coman ILIUT) Date: Wed, 9 Jan 2008 12:56:06 -0800 (PST) Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) Message-ID: <31596.3345.qm@web51308.mail.re2.yahoo.com> We're using an MSA500 actually, so what you're saying is that we're not using the proper hardware for GFS. Can you tell us how bad is this? 
The reason I'm asking is because we are already at the second version of our product using this solution and we did not have any issues before. So we never considered the hardware to be an issue. When we picked this solution, HP presented MSA500 as being able to do concurrent access to files (of course there's some serialization inside, there's only one set of reading heads in the hard disk). Also, HP DL360 have the ILO interface, which is supported by GFS. The difference now is that we are using file locking heavily and we're using files in multi-access mode. Everything seems to work fine, except for the locking. Coman Kevin Anderson wrote: On Tue, 2008-01-08 at 22:39 -0500, Charlie Brady wrote: On Tue, 8 Jan 2008, Gordan Bobic wrote: > Charlie Brady wrote: > > On Fri, 4 Jan 2008, Charlie Brady wrote: > > > >> I'm helping a colleague to collect information on an application lockup > >> problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. > >> > >> I'd appreciate advice as to what information to collect next. > > > > Nobody have any advice? > > Shared SCSI as in iSCSI SAN or as in a shared SCSI bus with two machines > connected via a SCSI cable? The latter. I don't have the details immediately at hand, but it's all HP gear. A pair of DL380s with an external SCSI array (MSAxx), IIRC. If it is a MSA20, MSA30 or MSA500 - they won't work with GFS. Shared SCSI bus isn't really shared, accesses lock the bus such that when one node accesses the storage the other node is locked out. GFS requires the ability to do shared concurrent access to the storage devices. This probably explains the hangs you were seeing. So, either get an iSCSI or fibre channel storage array, or go strictly with a failover storage architecture, such that only one node has the filesystem mounted at any one time. In that case, you don't need gfs anymore, just cluster suite to manage the failover. Kevin -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster Looking for a X-Mas gift? Everybody needs a Flickr Pro Account. http://www.flickr.com/gift/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kanderso at redhat.com Wed Jan 9 21:47:21 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Wed, 09 Jan 2008 15:47:21 -0600 Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: <31596.3345.qm@web51308.mail.re2.yahoo.com> References: <31596.3345.qm@web51308.mail.re2.yahoo.com> Message-ID: <1199915241.3277.46.camel@dhcp80-204.msp.redhat.com> Sorry, Lon gave me updated info about the MSA500. It isn't a parallel shared scsi bus configuration, so might work with gfs. However, we have never run with it before and not sure about the performance characteristics. Kevin On Wed, 2008-01-09 at 12:56 -0800, Coman ILIUT wrote: > We're using an MSA500 actually, so what you're saying is that we're > not using the proper hardware for GFS. > Can you tell us how bad is this? The reason I'm asking is because we > are already at the second version of our product using this solution > and we did not have any issues before. So we never considered the > hardware to be an issue. > > When we picked this solution, HP presented MSA500 as being able to do > concurrent access to files (of course there's some serialization > inside, there's only one set of reading heads in the hard disk). Also, > HP DL360 have the ILO interface, which is supported by GFS. 
> > The difference now is that we are using file locking heavily and we're > using files in multi-access mode. Everything seems to work fine, > except for the locking. > > Coman > > Kevin Anderson wrote: > On Tue, 2008-01-08 at 22:39 -0500, Charlie Brady wrote: > > On Tue, 8 Jan 2008, Gordan Bobic wrote: > Charlie Brady wrote: > > On Fri, 4 Jan 2008, Charlie Brady wrote: > > > >> I'm helping a colleague to collect information on an application lockup > >> problem on a two-node DLM/GFS cluster, with GFS on a shared SCSI array. > >> > >> I'd appreciate advice as to what information to collect next. > > > > > > Nobody have any advice? > > Shared SCSI as in iSCSI SAN or as in a shared SCSI bus with two machines > connected via a SCSI cable? The latter. I don't have the details immediately at hand, but it's all HP gear. A pair of DL380s with an external SCSI array (MSAxx), IIRC. > If it is a MSA20, MSA30 or MSA500 - they won't work with GFS. > Shared SCSI bus isn't really shared, accesses lock the bus > such that when one node accesses the storage the other node is > locked out. GFS requires the ability to do shared concurrent > access to the storage devices. This probably explains the > hangs you were seeing. So, either get an iSCSI or fibre > channel storage array, or go strictly with a failover storage > architecture, such that only one node has the filesystem > mounted at any one time. In that case, you don't need gfs > anymore, just cluster suite to manage the failover. > > Kevin > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > ______________________________________________________________________ > Ask a question on any topic and get answers from real people. Go to > Yahoo! Answers. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Wed Jan 9 21:54:09 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 09 Jan 2008 16:54:09 -0500 Subject: [Linux-cluster] CS5 : clurgmgrd[28359]: segfault In-Reply-To: <4784D453.6050005@bull.net> References: <4784D453.6050005@bull.net> Message-ID: <1199915649.16312.149.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-01-09 at 15:04 +0100, Alain Moulle wrote: > Hi > > Testing the CS5 on a two-nodes cluster with quorum disk, when I did > the test ifdown on the heart-beat interface, I got a segfault in log : > Jan 9 09:45:30 s_sys at am1 openais[28300]: [TOTEM] entering OPERATIONAL state. 
> Jan 9 09:45:30 s_sys at am1 openais[28300]: [CLM ] got nodejoin message 172.16.101.91 > Jan 9 09:45:30 s_sys at am1 openais[28300]: [EVT ] recovery error node: r(0) > ip(127.0.0.1) not found > Jan 9 09:45:30 s_kernel at am1 kernel: clurgmgrd[28359]: segfault at > 0000000000000000 rip 0000000000408c4a rsp 00007fff04a2c450 error 4 > Jan 9 09:45:30 s_sys at am1 gfs_controld[28328]: cluster is down, exiting > Jan 9 09:45:30 s_kernel at am1 kernel: dlm: closing connection to node 2 > Jan 9 09:45:30 s_kernel at am1 kernel: dlm: closing connection to node 0 > Jan 9 09:45:30 s_kernel at am1 kernel: dlm: closing connection to node 1 > Jan 9 09:45:30 s_sys at am1 dlm_controld[28322]: cluster is down, exiting > Jan 9 09:45:30 s_sys at am1 fenced[28316]: cman_get_nodes error -1 104 > Jan 9 09:45:30 s_sys at am1 fenced[28316]: cluster is down, exiting > Jan 9 09:45:30 s_sys at am1 clurgmgrd[28358]: Watchdog: Daemon died, > rebooting... > Jan 9 09:45:30 s_sys at am1 shutdown[18377]: shutting down for system halt > > Is it already a known problem? openais died, causing the dlm to go away and rgmanager to crash - the "nanny" clurgmgrd process rebooted the node. Although the segfault is probably less than ideal, the nanny process killing the node is probably fine since the node needs to be fenced at this point anyway. What should have happened with rgmanager is: * it should have seen a negative quorum transition, * halted cluster services uncleanly, and * waited to be fenced. -- Lon From lhh at redhat.com Wed Jan 9 21:59:46 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 09 Jan 2008 16:59:46 -0500 Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: <1199915241.3277.46.camel@dhcp80-204.msp.redhat.com> References: <31596.3345.qm@web51308.mail.re2.yahoo.com> <1199915241.3277.46.camel@dhcp80-204.msp.redhat.com> Message-ID: <1199915986.16312.155.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-01-09 at 15:47 -0600, Kevin Anderson wrote: > Sorry, Lon gave me updated info about the MSA500. It isn't a parallel > shared scsi bus configuration, so might work with gfs. However, we > have never run with it before and not sure about the performance > characteristics. It's a multi-port SCSI RAID array, but it's not a multi-initiator parallel SCSI bus (which absolutely does not work with GFS: ex: Dell PowerVault 220S). The MSA500 has an on-box RAID controller with multiple SCSI ports, which are attached using SCSI cables to CCISS controllers in the host machines. While CCISS are host-RAID controllers, as I understand it, when talking to MSA500 arrays, they act just like dumb SCSI controllers (that are 10x the cost of regular dumb SCSI controllers, of course!) - and do nothing "intelligent" at all - leaving it up to the MSA controller to handle all the RAID operations. Also, if I'm not mistaken, each port on the MSA RAID controller is actually its own SCSI (well, cciss) bus, so you shouldn't hit typical SCSI bus problems. For example, you should not see bus resets during a reboot of one of the nodes.
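Back on the fcntl hang itself: before concluding the array is at fault, it can help to capture the lock state on both nodes while the application is stuck. A rough sketch of that, assuming a GFS mount point of /mnt/gfs (a placeholder) on the RHEL4-era GFS 6.1 / dlm 1.07 stack used in this thread; the /proc path below is specific to that stack and may not exist on other releases:

  # Dump GFS glock state for the stuck filesystem (run on each node)
  gfs_tool lockdump /mnt/gfs > /tmp/lockdump.$(hostname)

  # Per-filesystem lock counters; see whether they still change while the app hangs
  gfs_tool counters /mnt/gfs

  # Recent DLM debug messages, if the RHEL4 /proc/cluster interface is present
  cat /proc/cluster/dlm_debug

Comparing the lockdumps from the two nodes is usually enough to see which node holds the contested lock and whether the DLM is still making progress.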
-- Lon From lhh at redhat.com Wed Jan 9 22:09:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 09 Jan 2008 17:09:40 -0500 Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: <1199915986.16312.155.camel@ayanami.boston.devel.redhat.com> References: <31596.3345.qm@web51308.mail.re2.yahoo.com> <1199915241.3277.46.camel@dhcp80-204.msp.redhat.com> <1199915986.16312.155.camel@ayanami.boston.devel.redhat.com> Message-ID: <1199916580.16312.164.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-01-09 at 16:59 -0500, Lon Hohberger wrote: > On Wed, 2008-01-09 at 15:47 -0600, Kevin Anderson wrote: > > Sorry, Lon gave me updated info about the MSA500. It isn't a parallel > > shared scsi bus configuration, so might work with gfs. However, we > > have never run with it before and not sure about the performance > > characteristics. > > It's a multi-port SCSI RAID array, but it's not a multi-initiator > parallel SCSI bus (which absolutely does not work with GFS: ex: Dell > PowerVault 220S). > > The MSA500 has an on-box RAID controller with multiple SCSI ports, which > are attached using SCSI cables to CCISS controllers in the host > machines. While CCISS are host-RAID controllers, as I understand it, > when talking to MSA500 arrays, they act just like dumb SCSI controllers > (that are 10x the cost of regular dumb SCSI controllers, of course!) - > and do nothing "intelligent" at all - leaving it up to the MSA > controller to handle all the RAID operations. Hmm, I have an archaic G1 -- apparently the MSA500G2 comes with two plain-Jane SCSI controllers: http://h18004.www1.hp.com/storage/disk_storage/msa_diskarrays/san_arrays/msa500g2/index.html "Two Ultra320 SCSI adapters included in MSA500 G2 package - no additional HBA purchase necessary" More important, however, is: http://h18004.www1.hp.com/storage/disk_storage/msa_diskarrays/san_arrays/index.html (Particularly - Note the "Dual Ultra320 SCSI Channels") Also in the FAQ: http://h18004.www1.hp.com/storage/disk_storage/msa_diskarrays/san_arrays/msa500g2/qa.html#9 "There is a market that is not currently being addressed by current SCSI JBOD or SAN products. There are limited products that offer such high availability features such as failover controllers on the storage enclosure and battery backed cache in an entry level product. The Modular Smart Array 500 G2 addresses this market head-on offering the entry level clustering and shared storage at an affordable price." So, in *theory* - the MSA500G2 should work, but as Kevin said, we have not tested it with GFS. -- Lon From jamesc at exa.com Wed Jan 9 22:16:11 2008 From: jamesc at exa.com (James Chamberlain) Date: Wed, 9 Jan 2008 17:16:11 -0500 (EST) Subject: [Linux-cluster] Instability troubles In-Reply-To: <1199374720.9564.20.camel@ayanami.boston.devel.redhat.com> References: <1199374720.9564.20.camel@ayanami.boston.devel.redhat.com> Message-ID: On Thu, 3 Jan 2008, Lon Hohberger wrote: > On Wed, 2008-01-02 at 17:35 -0500, James Chamberlain wrote: >> Hi all, >> >> I'm having some major stability problems with my three-node CS/GFS cluster. >> Every two or three days, one of the nodes fences another, and I have to >> hard-reboot the entire cluster to recover. I have had this happen twice >> today. I don't know what's triggering the fencing, since all the nodes >> appear to me to be up and running when it happens. In fact, I was logged >> on to node3 just now, running 'top', when node2 fenced it. 
>> >> When they come up, they don't automatically mount their GFS filesystems, >> even with "_netdev" specified as a mount option; however, the node which >> comes up first mounts them all as part of bringing all the services up. >> >> I did notice a couple of disconcerting things earlier today. First, I was >> running "watch clustat". (I prefer to see the time updating, where I >> can't with "clustat -i") > > The time is displayed in RHEL5 CVS version, and will go out with 5.2. > > >> At one point, "clustat" crashed as follows: >> >> Jan 2 15:19:54 node2 kernel: clustat[17720]: segfault at 0000000000000024 >> rip 0000003629e75bc0 rsp 00007fff18827178 error 4 > > A clustat crash is not a cause for a fence operation. That is, this > might be related, but is definitely not the cause of a node being > evicted. > > >> Fairly shortly thereafter, clustat reported that node3 as "Online, >> Estranged, rgmanager". Can anyone shed light on what that means? >> Google's not telling me much. > > Ordinarily, this happens when you have a node join the cluster manually > w/o giving it the configuration file. CMAN would assign it a node ID - > but the node is not in the cluster configuration - so clustat would > display the node as 'Estranged'. > > In your case, I'm not sure what the problem would be. I have a theory (see below). Does it give you any ideas what might have happened here? >> At the moment, all three nodes are running CentOS 5.1, with kernel >> 2.6.18-53.1.4.el5. Can anyone point me in the right direction to resolve >> these problems? I wasn't having trouble like this when I was running a >> CentOS 4 CS/GFS cluster. Is it possible to downgrade, likely via a full >> rebuild of all the nodes, from CentOS 5 CS/GFS to 4? Should I instead >> consider setting up a single node to mount the GFS filesystems and serve >> them out, to get around these fencing issues? > > I'd be interested a core file. Try to reproduce your clustat crash with > 'ulimit -c unlimited' set before running clustat. I haven't seen > clustat crash in a very long time, so I'm interested in the cause. > (Also, after the crash, check to see if ccsd is running...) I'll see what I can do for you. > Maybe it will uncover some other hints as to the cause of the behavior > you saw. > > If ccsd indeed failed for some reason, it would cause fencing to fail as > well because the fence daemon would be unable to read fencing actions. > > Even given all of this, this doesn't explain why the node needed to be > fenced in the first place. Were there any log messages indicating why > the node needed to be fenced? > > The RHEL5 / CentOS5 release of Cluster Suite has a fairly aggressive > node death timeout (5 seconds); maybe increasing it would help. > > > > > ... > I've come up with a theory on what's been going on, and so far, that theory appears to be panning out. At the very least, I haven't had any further crashes (yet). I'm hoping someone can validate it or tell me I need to keep looking. On each of the three nodes in my cluster, eth0 is used for cluster services (NFS) and the cluster's multicast group, and eth1 is used for iSCSI. I noticed that two of the three nodes were using DHCP on eth0, and that the problems always seemed to happen when the cluster was under a heavy load. My DHCP server was configured to give these nodes the same address every time, so they essentially had static addresses - they just used DHCP to get them. I think I spotted that there was a DHCP renewal going on at or just before the fencing started each time. 
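A quick way to check that correlation, assuming syslog is going to /var/log/messages and that dhclient uses the default RHEL/CentOS 5 lease file path (an assumption; the path can vary):

  # Show DHCP renewals alongside cluster membership and fencing events
  grep -E 'dhclient|openais|fenced|clurgmgrd' /var/log/messages

  # Show when the current lease was obtained and when dhclient will try to renew it
  cat /var/lib/dhclient/dhclient-eth0.leases

If the renewal timestamps line up with the evictions, that points at the DHCP client rather than the cluster stack.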
My theory is that, under heavy load, this DHCP renewal process was somehow interfering with either the primary IP address for eth0 or with the cluster's multicast traffic and was causing the affected node(s) to get booted from the cluster. I have since switched all the nodes to use truly static addressing, and have not had a problem in the intervening week. I have not yet tried the "" trick that Lon mentioned, but I'm keeping that handy should problems crop up again. Thanks, James From charlieb-linux-cluster at e-smith.com Wed Jan 9 23:20:07 2008 From: charlieb-linux-cluster at e-smith.com (Charlie Brady) Date: Wed, 9 Jan 2008 18:20:07 -0500 (EST) Subject: [Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL) In-Reply-To: <1199908397.3277.44.camel@dhcp80-204.msp.redhat.com> Message-ID: On Wed, 9 Jan 2008, Kevin Anderson wrote: > If it is a MSA20, MSA30 or MSA500 - they won't work with GFS. Shared > SCSI bus isn't really shared, accesses lock the bus such that when one > node accesses the storage the other node is locked out. But only temporarily, surely. The filesystem should expect some latency, and all I/O is eventually serialised somewhere. > GFS requires the ability to do shared concurrent access to the storage > devices. This probably explains the hangs you were seeing. I doubt it. Both nodes were still able to access the file system. I also think that there shouldn't be any disk I/O behind fcntl(). Am I wrong? --- Charlie From jorge.gonzalez at degesys.com Thu Jan 10 16:18:21 2008 From: jorge.gonzalez at degesys.com (Jorge Gonzalez) Date: Thu, 10 Jan 2008 17:18:21 +0100 Subject: [Linux-cluster] Cluster fails after fencing by DRAC Message-ID: <4786454D.7030204@degesys.com> Hi all! I have a problem with a 3-node cluster. When I run "fence_node node1", node1 reboots via DRAC successfully. When node1 restarts, it then gets frozen: ------------------ starting clvmd: dlm: got connection fron 32 dlm: connecting to 33 dlm: got connection fron 33 [frozen] * cman_tool services shows: type level name id state fence 0 default 0001001f none [31 32 33] dlm 1 clvmd 00010020 none [31 32 33] dlm 1 rgmanager 00020020 none [32 33] It seems rgmanager does not have 31 (?) * clustat shows: Member Status: Quorate Member Name ID Status ------ ---- ---- ------ xenr3u1.domain.com 31 Online xenr3u2.domain.com 32 Online, Local xenr3u3.domain.com 33 Online ------------------- Then I rebooted node1 again: Starting cluster Loading modules DLM ....... done starting ccsd starting cman starting daemons starting fencing [frozen again] after long time starting fencing [done] but cman_tool services fails * cman_tool services shows: type level name id state fence 0 default 0001001f FAIL_ALL_STOPPED [31 32 33] dlm 1 clvmd 00010020 FAIL_STOP_WAIT [31 32 33] dlm 1 rgmanager 00020020 FAIL_STOP_WAIT * clustat shows: Member Status: Quorate Member Name ID Status ------ ---- ---- ------ xenr3u1.domain.com 31 Online xenr3u2.domain.com 32 Online, Local xenr3u3.domain.com 33 Online /etc/init.d/rgmanager restart Shutting down Cluster Service Manager... Waiting for services to stop: [long timeeeeeeee] ---------------------------------- I saw this page translated to English (http://translate.google.com/translate?u=http%3A%2F%2Fken-etsu-tech.blogspot.com%2F2007%2F11%2Fred-hat-cluster-kernel-xen.html&langpair=ja%7Cen&hl=es&ie=UTF-8). It's exactly the same. A kernel bug? clvmd bug?
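One thing worth capturing for the hang above is whether the rebooted node actually rejoins the fence domain before clvmd starts. A sketch of the checks, using only the stock RHEL5/CentOS 5 cluster tools and init scripts (nothing here is specific to this particular cluster):

  # On the node that hangs at "starting clvmd":
  cman_tool nodes        # does every node agree this node is a member?
  cman_tool services     # is any group stuck mid-transition?
  group_tool ls          # groupd's view of the fence, dlm and gfs groups

  # Bring the stack up by hand to see which step blocks
  service cman start       # joins the cluster and the fence domain
  service clvmd start      # only once cman_tool nodes shows full membership
  service rgmanager start

The FAIL_STOP_WAIT and FAIL_ALL_STOPPED states in the output above suggest a stalled group transition, which is worth chasing before looking at clvmd itself.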
Linux xenr3u2 2.6.18-8.1.15.el5xen #1 SMP Mon Oct 22 09:01:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux cman-2.0.64-1.0.1.el5 rgmanager-2.0.24-1.el5.centos lvm2-cluster-2.02.16-3.el5 Sometimes the node starts ok and cman_tool is also ok. * /etc/lvm.conf: devices { dir = "/dev" scan = [ "/dev" ] filter = [ "a/.*/" ] cache = "/etc/lvm/.cache" write_cache_state = 1 sysfs_scan = 1 md_component_detection = 1 } log { verbose = 0 syslog = 1 overwrite = 0 level = 0 indent = 1 command_names = 0 prefix = " " } backup { backup = 1 backup_dir = "/etc/lvm/backup" archive = 1 archive_dir = "/etc/lvm/archive" retain_min = 10 retain_days = 30 } shell { history_size = 100 } global { library_dir = "/usr/lib64" umask = 077 test = 0 activation = 1 proc = "/proc" locking_type = 3 fallback_to_clustered_locking = 1 fallback_to_local_locking = 1 locking_dir = "/var/lock/lvm" } activation { missing_stripe_filler = "/dev/ioerror" reserved_stack = 256 reserved_memory = 8192 process_priority = -18 mirror_region_size = 512 mirror_log_fault_policy = "allocate" mirror_device_fault_policy = "remove" } That's all ;-) Thanks in advance -------------- next part -------------- A non-text attachment was scrubbed... Name: jorge.gonzalez.vcf Type: text/x-vcard Size: 350 bytes Desc: not available URL: From Mathieu.MARY at neufcegetel.fr Fri Jan 11 11:00:09 2008 From: Mathieu.MARY at neufcegetel.fr (MARY, Mathieu) Date: Fri, 11 Jan 2008 12:00:09 +0100 Subject: [Linux-cluster] Cluster fails after fencing by DRAC In-Reply-To: <4786454D.7030204@degesys.com> Message-ID: <20080111102140.157A920B0F8@smtp3.ldcom.fr> Hello, sorry to ask, but is the "none" state a normal state for services? I have issues with cluster services too, and I've been told that this state is not normal and indicates that the nodes didn't join the fence domain, which causes issues with rgmanager too. What do clustat and cman_tool services show at startup? Regards, Mathieu -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jorge Gonzalez Sent: Thursday, 10 January 2008 17:18 To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster fails after fencing by DRAC Hi all! I have a problem with a 3-node cluster. When I run "fence_node node1", node1 reboots via DRAC successfully. When node1 restarts, it then gets frozen: ------------------ starting clvmd: dlm: got connection fron 32 dlm: connecting to 33 dlm: got connection fron 33 [frozen] * cman_tool services shows: type level name id state fence 0 default 0001001f none [31 32 33] dlm 1 clvmd 00010020 none [31 32 33] dlm 1 rgmanager 00020020 none [32 33] It seems rgmanager does not have 31 (?) * clustat shows: Member Status: Quorate Member Name ID Status ------ ---- ---- ------ xenr3u1.domain.com 31 Online xenr3u2.domain.com 32 Online, Local xenr3u3.domain.com 33 Online ------------------- Then I rebooted node1 again: Starting cluster Loading modules DLM ....... done starting ccsd starting cman starting daemons starting fencing [frozen again] after long time starting fencing [done] but cman_tool services fails * cman_tool services shows: type level name id state fence 0 default 0001001f FAIL_ALL_STOPPED [31 32 33] dlm 1 clvmd 00010020 FAIL_STOP_WAIT [31 32 33] dlm 1 rgmanager 00020020 FAIL_STOP_WAIT * clustat shows: Member Status: Quorate Member Name ID Status ------ ---- ---- ------ xenr3u1.domain.com 31 Online xenr3u2.domain.com 32 Online, Local xenr3u3.domain.com 33 Online /etc/init.d/rgmanager restart Shutting down Cluster Service Manager...
Waiting for services to stop: [long timeeeeeeee] ---------------------------------- I saw this page translated to english (http://translate.google.com/translate?u=http%3A%2F%2Fken-etsu-tech.blogspot.com%2F2007%2F11%2Fred-hat-cluster-kernel-xen.html&langpair=ja%7Cen&hl=es&ie=UTF-8). It's exactly the same. A kernel bug? clvmd bug? Linux xenr3u2 2.6.18-8.1.15.el5xen #1 SMP Mon Oct 22 09:01:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux cman-2.0.64-1.0.1.el5 rgmanager-2.0.24-1.el5.centos lvm2-cluster-2.02.16-3.el5 Sometimes the node starts ok and cman_tool is also ok. * /etc/lvm.conf: devices { dir = "/dev" scan = [ "/dev" ] filter = [ "a/.*/" ] cache = "/etc/lvm/.cache" write_cache_state = 1 sysfs_scan = 1 md_component_detection = 1 } log { verbose = 0 syslog = 1 overwrite = 0 level = 0 indent = 1 command_names = 0 prefix = " " } backup { backup = 1 backup_dir = "/etc/lvm/backup" archive = 1 archive_dir = "/etc/lvm/archive" retain_min = 10 retain_days = 30 } shell { history_size = 100 } global { library_dir = "/usr/lib64" umask = 077 test = 0 activation = 1 proc = "/proc" locking_type = 3 fallback_to_clustered_locking = 1 fallback_to_local_locking = 1 locking_dir = "/var/lock/lvm" } activation { missing_stripe_filler = "/dev/ioerror" reserved_stack = 256 reserved_memory = 8192 process_priority = -18 mirror_region_size = 512 mirror_log_fault_policy = "allocate" mirror_device_fault_policy = "remove" } That's all ;-) Thanks in advance From saza_thi at yahoo.com Fri Jan 11 11:56:03 2008 From: saza_thi at yahoo.com (sahai srichock) Date: Fri, 11 Jan 2008 03:56:03 -0800 (PST) Subject: [Linux-cluster] cluster down network Message-ID: <79685.6189.qm@web54202.mail.re2.yahoo.com> I have two node cluster . /etc/cluster/cluster.conf