From mgrac at redhat.com Mon Feb 4 10:13:06 2013
From: mgrac at redhat.com (Marek Grac)
Date: Mon, 04 Feb 2013 11:13:06 +0100
Subject: [Linux-cluster] fence-agents 3.1.12 stable release
Message-ID: <510F89B2.1090904@redhat.com>

Welcome to the fence-agents 3.1.12 release.

This release includes these updates:

* support for Hitachi Compute Blade 2000
* UUID can be entered directly as a port-number (-n / --plug / port=)
* new option to set options for the ssh connection (--ssh-options)
* fix regression in detection of EOL for Dell CMC and Dell DRAC5
* massive code clean-up

The new source tarball can be downloaded here:

https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-3.1.12.tar.xz

To report bugs or issues:

https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users.

Thanks/congratulations to all people that contributed to achieve this milestone.

m,

From queszama at yahoo.in Mon Feb 4 10:42:58 2013
From: queszama at yahoo.in (Zama Ques)
Date: Mon, 4 Feb 2013 18:42:58 +0800 (SGT)
Subject: [Linux-cluster] fence_ipmilan Faiing for 'Administrator' user
Message-ID: <1359974578.29368.YahooMailNeo@web193505.mail.sg3.yahoo.com>

Hi All,

I need help configuring IPMI LAN as a fencing device for my cluster. The servers I am using are HP ProLiant.

Since fence_ipmilan internally uses ipmitool, I was trying to understand the use of ipmitool. For that purpose, I initially created a user named 'admin' using ipmitool.

=====
# ipmitool user list 2
ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
1   Administrator    true    false      true       ADMINISTRATOR
2   admin            true    false      true       USER
3   (Empty User)     true    false      false      NO ACCESS
4   (Empty User)     true    false      false      NO ACCESS
______________________________

]# ipmitool channel getciphers ipmi 2
ID   IANA    Auth Alg        Integrity Alg   Confidentiality Alg
0    N/A     none            none            none
1    N/A     hmac_sha1       none            none
2    N/A     hmac_sha1       hmac_sha1_96    none
3    N/A     hmac_sha1       hmac_sha1_96    aes_cbc_128
=====

Using the 'admin' user, I am able to execute IPMI commands successfully.

=====
]# ipmitool -I lanplus -H 192.168.2.153 -U admin -L USER chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
......
......
-----------------------
]# fence_ipmilan -L USER -a 192.168.2.153 -P lanplus -l admin -p xxxxxxx -T 4 -o status -v
Getting status of IPMI:192.168.2.153...Spawning: '/usr/bin/ipmitool -I lanplus -H '192.168.2.153' -U 'ssdg' -L 'USER' -P '[set]' -v chassis power status'...
Chassis power = On
Done
=======

But the same commands fail if I use the 'Administrator' user.

=====
# ipmitool -I lanplus -H 192.168.2.153 -U Administrator -L ADMINISTRATOR chassis status
Password:
Error: Unable to establish IPMI v2 / RMCP+ session
Error sending Chassis Status command

# ipmitool -I lanplus -H 192.168.2.153 -U Administrator chassis status
Password:
Error: Unable to establish IPMI v2 / RMCP+ session
Error sending Chassis Status command
=======

I am using the default password for the 'Administrator' user, which is printed on a little cardboard card attached to the server.

Kindly guide me on where I went wrong.

Thanks in Advance
Zaman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From lists at alteeve.ca Mon Feb 4 12:53:38 2013 From: lists at alteeve.ca (Digimer) Date: Mon, 04 Feb 2013 07:53:38 -0500 Subject: [Linux-cluster] fence_ipmilan Faiing for 'Administrator' user In-Reply-To: <1359974578.29368.YahooMailNeo@web193505.mail.sg3.yahoo.com> References: <1359974578.29368.YahooMailNeo@web193505.mail.sg3.yahoo.com> Message-ID: <510FAF52.5070100@alteeve.ca> On 02/04/2013 05:42 AM, Zama Ques wrote: > Hi All , > > Need help in configuring IPMI_Lan as fencing device for my cluster . The > servers I am using are of make HP ProLiant > > Since fence_ipmilan internally uses ipmitool , I was trying to > understand the use of ipmitool . For that purpose , I initially created > a user named 'admin' using ipmitool. > > ===== > > |# ipmitool user list 2 > ID Name Callin Link Auth IPMI Msg Channel Priv Limit > 1 Administrator true false true ADMINISTRATOR > 2 admin true false true USER > 3 (Empty User) true false false NO ACCESS > 4 (Empty User) true false false NO ACCESS > ______________________________ > > ]# ipmitool channel getciphers ipmi 2 > ID IANA Auth Alg Integrity Alg Confidentiality Alg > 0 N/A none none none > 1 N/A hmac_sha1 none none > 2 N/A hmac_sha1 hmac_sha1_96 none > 3 N/A hmac_sha1 hmac_sha1_96 aes_cbc_128 > > ===== > > Using the 'admin' user , I am able to execute IPMI commands successfully. > > ===== > ]# ipmitool -I lanplus -H 192.168.2.153 -U admin -L USER chassis status > System Power : on > Power Overload : false > Power Interlock : inactive > Main Power Fault : false > ...... > ...... > ----------------------- > ]# fence_ipmilan -L USER -a 192.168.2.153 -P lanplus -l admin -p xxxxxxx -T 4 -o status -v > Getting status of IPMI:192.168.2.153...Spawning: '/usr/bin/ipmitool -I lanplus -H '192.168.2.153' -U 'ssdg' -L 'USER' -P '[set]' -v chassis power status'... > Chassis power = On > Done > ======= > > > But the same above commands fails if I use the 'Administrator' User. 
>
> =====
> # ipmitool -I lanplus -H 192.168.2.153 -U Administrator -L ADMINISTRATOR chassis status
> Password:
> Error: Unable to establish IPMI v2 / RMCP+ session
> Error sending Chassis Status command
>
> # ipmitool -I lanplus -H 192.168.2.153 -U Administrator chassis status
> Password:
> Error: Unable to establish IPMI v2 / RMCP+ session
> Error sending Chassis Status command
> =======
>
> I am using the default password for 'Administrator' user which is printed on a little cardboard card attached to the server
>
> Kindly guide where I went wrong ?
>
> Thanks in Advance
> Zaman

This appears to be a problem below fence_ipmilan.

My first guess would be that something is lower-casing the "A". Can you create a user "administrator" and if so, does that work? Have you tried putting the user name in double-quotes (no idea if that would make a difference)? ie: '... -U "Administrator" ...'?

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

From rossnick-lists at cybercat.ca Mon Feb 4 15:09:51 2013
From: rossnick-lists at cybercat.ca (Nicolas Ross)
Date: Mon, 04 Feb 2013 10:09:51 -0500
Subject: [Linux-cluster] Active Storage is closing...
Message-ID: <510FCF3F.50709@cybercat.ca>

Hi !

When we first switched to RH Cluster, we were moving away from an Xserve / Xsan solution. We were about to replace our old Xserve RAID PATA disk enclosure with another RAID enclosure from Active Storage (still with Xsan on OS X). The new enclosure was already installed in the rack. We were also about to upgrade our old Xserve G5s to newer Xserves with Intel-based Xeons. Then Apple discontinued the Xserve product, and we switched all of our operations to a Linux-based cluster with RH Cluster Suite, still using the Active Storage RAID enclosure and the Fibre Channel switches we had at the time. This was at the beginning of 2011.
By the end of 2011, we also had a second cluster, using the same technology (Intel servers / QLogic FC switch / Active Storage RAID disk enclosure).

Now, I just heard that Active Storage is closing its doors. I was having trouble with one of the controllers on one of our units. I have contacted my reseller to see if they have any spare parts.

So, for the future, if I were to switch to a new storage array, are there any suggestions I can get from you guys? I am basically searching for a good Fibre Channel RAID enclosure that is well supported by Red Hat and is not Mac-centric or Windows-centric...

Thanks,

From queszama at yahoo.in Tue Feb 5 03:12:49 2013
From: queszama at yahoo.in (Zama Ques)
Date: Tue, 5 Feb 2013 11:12:49 +0800 (SGT)
Subject: [Linux-cluster] fence_ipmilan Faiing for 'Administrator' user
In-Reply-To: <510FAF52.5070100@alteeve.ca>
References: <1359974578.29368.YahooMailNeo@web193505.mail.sg3.yahoo.com> <510FAF52.5070100@alteeve.ca>
Message-ID: <1360033969.458.YahooMailNeo@web193506.mail.sg3.yahoo.com>

________________________________
From: Digimer
To: Zama Ques ; linux clustering
Sent: Monday, 4 February 2013 6:23 PM
Subject: Re: [Linux-cluster] fence_ipmilan Faiing for 'Administrator' user

On 02/04/2013 05:42 AM, Zama Ques wrote:
> Hi All ,
>
> Need help in configuring IPMI_Lan as fencing device for my cluster . The
> servers I am using are of make HP ProLiant
>
> Since fence_ipmilan internally uses ipmitool , I was trying to
> understand the use of ipmitool . For that purpose , I initially created
> a user named 'admin' using ipmitool.
>
> =====
>
> # ipmitool user list 2
> ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
> 1   Administrator    true    false      true       ADMINISTRATOR
> 2   admin            true    false      true       USER
> 3   (Empty User)     true    false      false      NO ACCESS
> 4   (Empty User)     true    false      false      NO ACCESS
> ______________________________
>
> ]# ipmitool channel getciphers ipmi 2
> ID   IANA    Auth Alg        Integrity Alg   Confidentiality Alg
> 0    N/A     none            none            none
> 1    N/A     hmac_sha1       none            none
> 2    N/A     hmac_sha1       hmac_sha1_96    none
> 3    N/A     hmac_sha1       hmac_sha1_96    aes_cbc_128
>
> =====
>
> Using the 'admin' user , I am able to execute IPMI commands successfully.
>
> =====
> ]# ipmitool -I lanplus -H 192.168.2.153 -U admin -L USER chassis status
> System Power         : on
> Power Overload       : false
> Power Interlock      : inactive
> Main Power Fault     : false
> ......
> ......
> -----------------------
> ]# fence_ipmilan -L USER -a 192.168.2.153 -P lanplus -l admin -p xxxxxxx -T 4 -o status -v
> Getting status of IPMI:192.168.2.153...Spawning: '/usr/bin/ipmitool -I lanplus -H '192.168.2.153' -U 'ssdg' -L 'USER' -P '[set]' -v chassis power status'...
> Chassis power = On
> Done
> =======
>
>
> But the same above commands fails if I use the 'Administrator' User.
>
> =====
> # ipmitool -I lanplus -H 192.168.2.153 -U Administrator -L ADMINISTRATOR chassis status
> Password:
> Error: Unable to establish IPMI v2 / RMCP+ session
> Error sending Chassis Status command
>
> # ipmitool -I lanplus -H 192.168.2.153 -U Administrator chassis status
> Password:
> Error: Unable to establish IPMI v2 / RMCP+ session
> Error sending Chassis Status command
> =======
>
> I am using the default password for 'Administrator' user which is printed on a little cardboard card attached to the server
>
> Kindly guide where I went wrong ?
>
> Thanks in Advance
> Zaman
>
> This appears to be a problem below fence_ipmilan.
> My first guess would be that something is lower-casing the "A". Can you
> create a user "administrator" and if so, does that work? Have you tried
> putting the user name in double-quotes (no idea if that would make a
> difference)? ie: '... -U "Administrator" ...'?

Thanks Digimer for the reply.

I was able to verify that the proper letter case is being used for the 'Administrator' user.

====
# fence_ipmilan -L ADMINISTRATOR -a 192.168.2.153 -P lanplus -l Administrator -p "XXX" -T 4 -o status -v
Getting status of IPMI:192.168.2.153...Spawning: '/usr/bin/ipmitool -I lanplus -H '192.168.2.153' -U 'Administrator' -L 'ADMINISTRATOR' -P '[set]' -v chassis power status'...
Chassis power = Unknown
Failed
====

It looks like it was not accepting the default password for the 'Administrator' user.

====
# ipmitool user test 1 20 XXX
Set User Password command failed (user 1): Unknown (0x80)
Failure: password incorrect
# ipmitool user test 1 16 XXX
Set User Password command failed (user 1): Unknown (0x80)
Failure: password incorrect
-----
# ipmitool user test 2 16 xxxx
Success
# ipmitool user test 2 20 xxxx
Success
====

I changed the privilege for the 'admin' user to ADMINISTRATOR so that it can perform fencing.

====
]# ipmitool user list 2
ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
1   Administrator    true    false      true       ADMINISTRATOR
2   admin            true    false      true       ADMINISTRATOR
====

Digimer, can you please let me know whether ADMINISTRATOR level privilege is needed to perform fencing, or whether lower privilege levels can also perform fencing?

===
   1   Callback level
   2   User level
   3   Operator level
   4   Administrator level
===

Thanks
Zaman

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From lists at alteeve.ca Tue Feb 5 04:46:01 2013 From: lists at alteeve.ca (Digimer) Date: Mon, 04 Feb 2013 23:46:01 -0500 Subject: [Linux-cluster] fence_ipmilan Faiing for 'Administrator' user In-Reply-To: <1360033969.458.YahooMailNeo@web193506.mail.sg3.yahoo.com> References: <1359974578.29368.YahooMailNeo@web193505.mail.sg3.yahoo.com> <510FAF52.5070100@alteeve.ca> <1360033969.458.YahooMailNeo@web193506.mail.sg3.yahoo.com> Message-ID: <51108E89.4020109@alteeve.ca> On 02/04/2013 10:12 PM, Zama Ques wrote: > > > ------------------------------------------------------------------------ > *From:* Digimer > *To:* Zama Ques ; linux clustering > > *Sent:* Monday, 4 February 2013 6:23 PM > *Subject:* Re: [Linux-cluster] fence_ipmilan Faiing for 'Administrator' user > > On 02/04/2013 05:42 AM, Zama Ques wrote: >> Hi All , >> >> Need help in configuring IPMI_Lan as fencing device for my cluster . The >> servers I am using are of make HP ProLiant >> >> Since fence_ipmilan internally uses ipmitool , I was trying to >> understand the use of ipmitool . For that purpose , I initially created >> a user named 'admin' using ipmitool. >> >> ===== >> >> |# ipmitool user list 2 >> ID Name Callin Link Auth IPMI Msg Channel Priv Limit >> 1 Administrator true false true ADMINISTRATOR >> 2 admin true false true USER >> 3 (Empty User) true false false NO ACCESS >> 4 (Empty User) true false false NO ACCESS >> ______________________________ >> >> ]# ipmitool channel getciphers ipmi 2 >> ID IANA Auth Alg Integrity Alg Confidentiality Alg >> 0 N/A none none none >> 1 N/A hmac_sha1 none none >> 2 N/A hmac_sha1 hmac_sha1_96 none >> 3 N/A hmac_sha1 hmac_sha1_96 aes_cbc_128 >> >> ===== >> >> Using the 'admin' user , I am able to execute IPMI commands successfully. >> >> ===== >> ]# ipmitool -I lanplus -H 192.168.2.153 -U admin -L USER chassis status >> System Power : on >> Power Overload : false >> Power Interlock : inactive >> Main Power Fault : false >> ...... >> ...... 
>> ----------------------- >> ]# fence_ipmilan -L USER -a 192.168.2.153 -P lanplus -l admin -p > xxxxxxx -T 4 -o status -v >> Getting status of IPMI:192.168.2.153...Spawning: '/usr/bin/ipmitool -I > lanplus -H '192.168.2.153' -U 'ssdg' -L 'USER' -P '[set]' -v chassis > power status'... >> Chassis power = On >> Done >> ======= >> >> >> But the same above commands fails if I use the 'Administrator' User. >> >> ===== >> # ipmitool -I lanplus -H 192.168.2.153 -U Administrator -L > ADMINISTRATOR chassis status >> Password: >> Error: Unable to establish IPMI v2 / RMCP+ >> session >> Error sending Chassis Status command >> >> # ipmitool -I lanplus -H 192.168.2.153 -U Administrator chassis status >> Password: >> Error: Unable to establish IPMI v2 / RMCP+ session >> Error sending Chassis Status command >> ======= >> >> I am using the default password for 'Administrator' user ||which is > printed on a little cardboard card attached to the server >> >> Kindly guide where I went wrong ? >> >> Thanks in Advance >> Zaman >> | > >> This appears to be a problem below fence_ipmilan. > >> My first guess would be that something is lower-casing the "A". Can you >> create a user "administrator" and if so, does that work? Have you tried >> putting the user name in double-quotes (no idea if that would make a >> difference)? ie: '... -U "Administrator" ...'? > > Thanks Digimer for the reply. > > Was able to verify that proper alphabet case is being used for > 'Administrator' user. > > ==== > # fence_ipmilan -L ADMINISTRATOR -a 192.168.2.153 -P lanplus -l > Administrator -p "XXX" -T 4 -o status -v > Getting status of IPMI:192.168.2.153...Spawning: '/usr/bin/ipmitool -I > lanplus -H '192.168.2.153' -U 'Administrator' -L 'ADMINISTRATOR' -P > '[set]' -v chassis power status'... > Chassis power = Unknown > Failed > ==== > > Looks like it was not taking the default password for 'Administrator' user. 
>
> ====
> # ipmitool user test 1 20 XXX
> Set User Password command failed (user 1): Unknown (0x80)
> Failure: password incorrect
> # ipmitool user test 1 16 XXX
> Set User Password command failed (user 1): Unknown (0x80)
> Failure: password incorrect
> -----
> # ipmitool user test 2 16 xxxx
> Success
> # ipmitool user test 2 20 xxxx
> Success
> ====
>
> Changed privilege for 'admin' user to ADMINISTRATOR so that it can
> perform fencing.
>
> ====
> ]# ipmitool user list 2
> ID  Name             Callin  Link Auth  IPMI Msg   Channel Priv Limit
> 1   Administrator    true    false      true       ADMINISTRATOR
> 2   admin            true    false      true       ADMINISTRATOR
> ====
>
> Digimer , can you please let me know whether for performing fencing ,
> ADMINISTRATOR level privilege is needed or lower privilege levels can
> perform fencing ?
>
> ===
>    1   Callback level
>    2   User level
>    3   Operator level
>    4   Administrator level
> ===
> Thanks
> Zaman

It probably depends on your hardware and its implementation. I would guess not though, given how ... dramatic a fence action is.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

From arpittolani at gmail.com Tue Feb 5 13:50:35 2013
From: arpittolani at gmail.com (Arpit Tolani)
Date: Tue, 5 Feb 2013 19:20:35 +0530
Subject: [Linux-cluster] Pls help me in clustring
In-Reply-To:
References:
Message-ID:

Hi Sudheer,

On Tue, Feb 5, 2013 at 2:18 PM, sudheer khanduri wrote:
> Dear Sir,
>
> i found your mail id in redhat site.
>
>
> i have a problem with creating cluster. i am recieveing error on luci web
> console:
>
> error receiving header from 11111.
>

There could be multiple problems; without logs and the exact package versions, I won't be able to comment.

Cross-check the permissions on /tmp; they should be 1777:

# chmod 1777 /tmp

IIRC there is also a bug in RHEL 6.2 (I don't remember the exact version). Upgrade to the latest packages and see if that helps.
For now, check if this issue can be worked around on the cluster nodes with the following commands: mkdir /var/lib/ricci/.libvirt chown ricci.ricci /var/lib/ricci/.libvirt Hope it helps, Also it is always a good idea to send a mail to cluster mailing lists i.e. linux-cluster https://www.redhat.com/mailman/listinfo/linux-cluster Sending a mail to mailing list give more visibility to the issue. I am sending it on your behalf :) > > pls help me. > > -- > Sudheer .K > 91-9958509308 > > 'Try not to become a man of success but rather,Try to become a man of value. > -- Thanks & Regards Arpit Tolani From jpokorny at redhat.com Wed Feb 6 09:21:42 2013 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Wed, 6 Feb 2013 10:21:42 +0100 Subject: [Linux-cluster] Pls help me in clustring In-Reply-To: References: Message-ID: <20130206092142.GA28345@redhat.com> Hello, On 05/02/13 19:20 +0530, Arpit Tolani wrote: > On Tue, Feb 5, 2013 at 2:18 PM, sudheer khanduri > wrote: >> i have a problem with creating cluster. i am recieveing error on luci web >> console: >> >> error receiving header from 11111. >> > > There could be multiple problems, without logs & exact package > version, I wont be able to comment. > > Cross check the permissions on /tmp 1777 > > # chmod 1777 /tmp > > IIRC there is a bug as well with RHEL6.2 (I dont remember exact > version.) Upgrade it to the latest packages & see if that helps. yes, it is probably this libvirt bug (rhbz#798177) being hit. Erratum followed [1], RHEL 6.3 should not be affected. > For now, check if this issue can be worked around on the cluster > nodes with the following commands: > > mkdir /var/lib/ricci/.libvirt > chown ricci.ricci /var/lib/ricci/.libvirt Correct, only note that dot user-group separator seems to be deprecated [2], neither man page nor [3] mentions it. > Hope it helps, Also it is always a good idea to send a mail to cluster > mailing lists i.e. 
linux-cluster > https://www.redhat.com/mailman/listinfo/linux-cluster > > Sending a mail to mailing list give more visibility to the issue. I am > sending it on your behalf :) Sending questions/whatever to ML is probably better idea than "spamming" individuals. [1] http://rhn.redhat.com/errata/RHBA-2012-0419.html [2] http://monkey.org/openbsd/archive/misc/0402/msg00310.html [3] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/chown.html -- Jan From rhayden.public at gmail.com Wed Feb 6 20:39:00 2013 From: rhayden.public at gmail.com (Robert Hayden) Date: Wed, 6 Feb 2013 14:39:00 -0600 Subject: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational? In-Reply-To: References: <50FEDCA2.9080008@redhat.com> Message-ID: On Thu, Jan 24, 2013 at 11:28 AM, Robert Hayden wrote: > On Tue, Jan 22, 2013 at 12:38 PM, Fabio M. Di Nitto wrote: >> >> On 01/22/2013 06:22 PM, Robert Hayden wrote: >> > I am testing RHCS 6.3 and found that the self_fence option for a file >> > system resource will now longer function as expected. Before I log an >> > SR with RH, I was wondering if the design changed between RHEL 5 and >> > RHEL 6. >> > >> > In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will complete a >> > "reboot -fn" command on a self_fence logic. In RHEL 6, there is little >> > to no logic around self_fence in the fs.sh file. >> >> The logic has just been moved to a common file shared by all *fs >> resources (fs-lib) >> >> >> >> > >> > Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6: >> > if [ -n "$umount_failed" ]; then >> > ocf_log err "'umount $mp' failed, error=$ret_val" >> > >> > if [ "$self_fence" ]; then >> > ocf_log alert "umount failed - REBOOTING" >> > sync >> > reboot -fn >> > fi >> > return $FAIL >> > else >> > return $SUCCESS >> > fi >> >> same code, just different file. >> >> > >> > >> > >> > To test in RHEL 6, I simply create a file system (e.g. 
/test/data)
>> > resource with self_fence="1" or self_fence="on" (as added by Conga).
>> > Then mount a small ISO image on top of the file system. This mount will
>> > cause the file system resource to be unable to unmount itself and should
>> > trigger a self_fence scenario.
>> >
>> > Testing RHEL 6, I see the following in /var/log/messages:
>> >
>> > Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data
>> > Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to
>> > processes on /test/data
>> > Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data
>> > Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to
>> > processes on /test/data
>> > Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data"
>> > returned 1 (generic error)
>>
>> Looks like a bug in force_umount option.
>>
>> Please file a ticket with RH GSS.
>
> I will log a ticket in a few days when I can build a simple test case
> for support.
>

I thought I would provide a follow-up for the community.

A (private, sorry) bugzilla has been created:
https://bugzilla.redhat.com/show_bug.cgi?id=908457

For those with Red Hat Network access, a KB article has been created:
https://access.redhat.com/knowledge/solutions/306483

>>
>> As workaround try to disable force_umount.
>
> The workaround of having force_umount=0 and self_fence=1 worked with the
> ISO image mount test.
>
>
>>
>> As far as I can tell, but I haven't verified it:
>>     ocf_log warning "Sending SIGKILL to processes on $mp"
>>     fuser -kvm "$mp"
>>
>>     case $? in
>>     0)
>>         ;;
>>     1)
>>         return $OCF_ERR_GENERIC
>>         ;;
>>     2)
>>         break
>>         ;;
>>     esac
>>
>> the issue is the way fuser error is handled in force_umount path, that
>> would match the log you are posting.
>>
>
> I have learned that "fuser" command will not find the sub-mounted iso
> image that causes the umount to fail. So, my test case using the iso
> image to test self_fence may need to be updated.
> > [root at techval16]# df -k | grep data > /dev/mapper/share16vg-tv16_mq_data > 806288 17200 748128 3% /test/data > 352 352 0 100% /test/data/mnt > [root at techval16]# fuser -kvm /test/data > [root at techval16]# echo $? > 1 > [root at techval16]# umount /test/data > umount: /test/data: device is busy. > (In some cases useful info about processes that use > the device is found by lsof(8) or fuser(1)) > [root at techval16]# > > Unsure if the logic in fs-lib needs to be updated to handle > sub-mounted file systems. That is what the Support Ticket will > determine, I suppose. > >> I think the correct way would be to check if self_fence is enabled or >> not and then return/reboot later on the script. >> >> Fabio >> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster From marcus.jaeger at mcn-tele.com Thu Feb 7 20:19:29 2013 From: marcus.jaeger at mcn-tele.com (=?iso-8859-1?Q?J=E4ger=2C_Marcus?=) Date: Thu, 7 Feb 2013 20:19:29 +0000 Subject: [Linux-cluster] Need help setting up pacemaker-cluster with drbd & mysql Message-ID: <0D02861F37491A4ABA9A9003FE5C583A12D19BCC@EX-01.dom.ain> Hello there, I'm new in setting up pacemaker and need some help. My config is similar to the following howto: http://blog.non-a.net/2011/03/27/cluster_drbd The only modification is use of mysql instead of apache2. At first time try, everything worked fine. Both nodes came up and the ha1-node went in service. If ha1 failed (reboot/shutdown) ha2 took all services. If ha1 came online again it took all resources again, just as excepted. BUT: If ha2 went offline and came online again, resource drbd didn't came online again, cause it detected a split-brain and corosync kept starting and stopping drbd on ha2. (Drbd had to resnyc the whole disk every time it failed.) 
Maybe I fixed that by modifying /etc/drbd.d/global_common.conf and added " disk { fencing resource only; } Don't know exactly, but if ha2 disconnects and connects again no full-sync happens right now. But at the end I'm getting confused. Actually only ha1-node can become Master and if ha1 fails, ha2 does not take the resources and stays slave. I'm now trying almost 2 days to figure out the problem. Google dind't help at all. crm_mon --one-shot -V says: Stack: openais Current DC: mysql-drbd-ha2 - partition with quorum Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b 2 Nodes configured, 2 expected votes 3 Resources configured. ============ Online: [ mysql-drbd-ha1 mysql-drbd-ha2 ] Resource Group: lvm datavg (ocf::heartbeat:LVM): Started mysql-drbd-ha1 fs_mysql (ocf::heartbeat:Filesystem): Started mysql-drbd-ha1 Resource Group: mysql_grp app_ip (ocf::heartbeat:IPaddr): Started mysql-drbd-ha1 app_mysql (lsb:mysql): Started mysql-drbd-ha1 Master/Slave Set: ms_drbd Masters: [ mysql-drbd-ha1 ] Slaves: [ mysql-drbd-ha2 ] Failed actions: drbd:0_promote_0 (node=mysql-drbd-ha2, call=158, rc=1, status=complete): unknown error after crm node standby of ha1-node it says " Node mysql-drbd-ha1: standby Online: [ mysql-drbd-ha2 ] Resource Group: lvm datavg (ocf::heartbeat:LVM): Started mysql-drbd-ha2 fs_mysql (ocf::heartbeat:Filesystem): Started mysql-drbd-ha2 Resource Group: mysql_grp app_ip (ocf::heartbeat:IPaddr): Started mysql-drbd-ha2 app_mysql (lsb:mysql): Started mysql-drbd-ha2 Master/Slave Set: ms_drbd Masters: [ mysql-drbd-ha2 ] Stopped: [ drbd:1 ] " After "crm node online" on ha1 it still is like: " Online: [ mysql-drbd-ha1 mysql-drbd-ha2 ] Resource Group: lvm datavg (ocf::heartbeat:LVM): Started mysql-drbd-ha2 fs_mysql (ocf::heartbeat:Filesystem): Started mysql-drbd-ha2 Resource Group: mysql_grp app_ip (ocf::heartbeat:IPaddr): Started mysql-drbd-ha2 app_mysql (lsb:mysql): Started mysql-drbd-ha2 Master/Slave Set: ms_drbd Masters: [ mysql-drbd-ha2 ] Stopped: [ 
drbd:1 ]
"

Ha1 won't become master again, unless I stop both nodes, clear the crm config and reload it. That's not workable for production use at all.

If you need configs and logs, please write.

Thanks in advance and greetings from Frankfurt/Main, Germany

Marcus

________________________________
mcn tele.com AG
Im Galluspark 17, 60326 Frankfurt
Supervisory Board: Uwe Ruecker (Chair)
Executive Board: Wolfgang Gluecks, Ralf Taegener
Registered office and court of registration: Amtsgericht Frankfurt a.M. - HRB Nr. 89717
Register details: www.mcn-tele.com

This e-mail contains confidential information. If you are not the intended recipient or have received this e-mail in error, please notify the sender immediately and destroy this e-mail.

From epretorious at yahoo.com Sat Feb 9 18:50:17 2013
From: epretorious at yahoo.com (Eric)
Date: Sat, 9 Feb 2013 10:50:17 -0800 (PST)
Subject: [Linux-cluster] Need help setting up pacemaker-cluster with drbd & mysql
In-Reply-To: <0D02861F37491A4ABA9A9003FE5C583A12D19BCC@EX-01.dom.ain>
References: <0D02861F37491A4ABA9A9003FE5C583A12D19BCC@EX-01.dom.ain>
Message-ID: <1360435817.5516.YahooMailNeo@web126006.mail.ne1.yahoo.com>

Have you had a look at the Linbit Tech Guide "MySQL High Availability on the Pacemaker Cluster Stack"?

Eric Pretorious
Truckee, CA

>________________________________
> From: "Jäger, Marcus"
>To: "'linux-cluster at redhat.com'"
>Sent: Thursday, February 7, 2013 12:19 PM
>Subject: [Linux-cluster] Need help setting up pacemaker-cluster with drbd & mysql
>
>Hello there,
>
>I'm new in setting up pacemaker and need some help.
>
>My config is similar to the following howto:
>
>http://blog.non-a.net/2011/03/27/cluster_drbd
>
>The only modification is use of mysql instead of apache2.
>
>At first time try, everything worked fine. Both nodes came up and the ha1-node went in service.
>If ha1 failed (reboot/shutdown) ha2 took all services.
>If ha1 came online again it took all resources again, just as excepted.
>BUT: If ha2 went offline and came online again, resource drbd didn't came online again, cause it detected a split-brain and corosync kept starting and stopping drbd on ha2. (Drbd had to resnyc the whole disk every time it failed.)
>
>Maybe I fixed that by modifying /etc/drbd.d/global_common.conf and added " disk { fencing resource only; } Don't know exactly, but if ha2 disconnects and connects again no full-sync happens right now.
>
>But at the end I'm getting confused.
>Actually only ha1-node can become Master and if ha1 fails, ha2 does not take the resources and stays slave.
>I'm now trying almost 2 days to figure out the problem. Google dind't help at all.
>
>crm_mon --one-shot -V says:
>
>Stack: openais
>Current DC: mysql-drbd-ha2 - partition with quorum
>Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>2 Nodes configured, 2 expected votes
>3 Resources configured.
>============
>
>Online: [ mysql-drbd-ha1 mysql-drbd-ha2 ]
>
>Resource Group: lvm
>     datavg     (ocf::heartbeat:LVM):   Started mysql-drbd-ha1
>     fs_mysql   (ocf::heartbeat:Filesystem):    Started mysql-drbd-ha1
>Resource Group: mysql_grp
>     app_ip     (ocf::heartbeat:IPaddr):        Started mysql-drbd-ha1
>     app_mysql  (lsb:mysql):    Started mysql-drbd-ha1
>Master/Slave Set: ms_drbd
>     Masters: [ mysql-drbd-ha1 ]
>     Slaves: [ mysql-drbd-ha2 ]
>
>Failed actions:
>    drbd:0_promote_0 (node=mysql-drbd-ha2, call=158, rc=1, status=complete): unknown error
>
>after crm node standby of ha1-node it says
>
>"
>Node mysql-drbd-ha1: standby
>Online: [ mysql-drbd-ha2 ]
>
>Resource Group: lvm
>     datavg     (ocf::heartbeat:LVM):   Started mysql-drbd-ha2
>     fs_mysql   (ocf::heartbeat:Filesystem):    Started mysql-drbd-ha2
>Resource Group: mysql_grp
>     app_ip     (ocf::heartbeat:IPaddr):        Started mysql-drbd-ha2
>     app_mysql  (lsb:mysql):    Started mysql-drbd-ha2
>Master/Slave Set: ms_drbd
>     Masters: [ mysql-drbd-ha2 ]
>     Stopped: [ drbd:1 ]
>"
>
>After "crm node online" on ha1 it still is like:
>
>"
>Online: [ mysql-drbd-ha1 mysql-drbd-ha2 ]
>
>Resource Group: lvm
>     datavg     (ocf::heartbeat:LVM):   Started mysql-drbd-ha2
>     fs_mysql   (ocf::heartbeat:Filesystem):    Started mysql-drbd-ha2
>Resource Group: mysql_grp
>     app_ip     (ocf::heartbeat:IPaddr):        Started mysql-drbd-ha2
>     app_mysql  (lsb:mysql):    Started mysql-drbd-ha2
>Master/Slave Set: ms_drbd
>     Masters: [ mysql-drbd-ha2 ]
>     Stopped: [ drbd:1 ]
>"
>
>Ha1 won't become master again, unless I stop both nodes, clear the crm config and reload it. That's not working for productive use at all.
>
>If you need configs and logs, please write.
>
>Thanks in advance and greetings from Frankfurt/Main, Germany
>
>Marcus
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From edorthe at cari.net Mon Feb 11 16:36:29 2013 From: edorthe at cari.net (Erik Dorthe) Date: Mon, 11 Feb 2013 08:36:29 -0800 Subject: [Linux-cluster] Tripp Lite PDU Message-ID: I know there has been an agent submitted for this before (see link at the bottom of this email), but I created an agent for the Tripp Lite SNMP Management Card. This agent uses SNMP to issue commands to a Tripp Lite PDU. I have found that some of the limitations mentioned in the past do not seem to exist anymore, which may be a result of a firmware update (I am using version 12.06.0061). The primary limitation was a delay in the power state change, which had taken up to 35 seconds in the past. I have found no delay over 10 seconds, which improves response time significantly. I have put this agent into testing in a prototype cluster, and my initial results look good. The model I am testing with is a PDUMH20ATNET. If anyone would like to look over the agent or review it for inclusion, please contact me. Previously submitted agent: http://www.redhat.com/archives/linux-cluster/2008-November/msg00215.html Erik Dorthe NOC Engineer CARI.net 858.974.5080 x217 edorthe at cari.net -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Mon Feb 11 17:30:24 2013 From: lists at alteeve.ca (Digimer) Date: Mon, 11 Feb 2013 12:30:24 -0500 Subject: [Linux-cluster] Tripp Lite PDU In-Reply-To: References: Message-ID: <51192AB0.4060105@alteeve.ca> On 02/11/2013 11:36 AM, Erik Dorthe wrote: > I know there has been an agent submitted for this before (see link at > the bottom of this email), but I created an agent for the Tripp Lite > SNMP Management Card. This agent uses SNMP to issue commands to a Tripp > Lite PDU. I have found that some of the limitations mentioned in the > past do not seem to exist anymore, which may be a result of a firmware > update (I am using version 12.06.0061).
> > The primary limitation was a delay in the power state change, which had > taken up to 35 seconds in the past. I have found no delay over 10 > seconds, which improves response time significantly. > > I have put this agent into testing in a prototype cluster, and my > initial results look good. The model I am testing with is a > PDUMH20ATNET. If anyone would like to look over the agent or review it > for inclusion, please contact me. > > Previously submitted agent: > http://www.redhat.com/archives/linux-cluster/2008-November/msg00215.html > > Erik Dorthe > NOC Engineer > CARI.net > 858.974.5080 x217 > edorthe at cari.net > > For what it's worth, I also wrote an agent for the tripplite PDUs (though I no longer own one). Feel free to use/adapt/discard whatever helps support the devices. https://github.com/digimer/fence_tripplite_snmp -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From mgrac at redhat.com Mon Feb 11 18:23:23 2013 From: mgrac at redhat.com (Marek Grac) Date: Mon, 11 Feb 2013 19:23:23 +0100 Subject: [Linux-cluster] Tripp Lite PDU In-Reply-To: References: Message-ID: <5119371B.7000306@redhat.com> On 02/11/2013 05:36 PM, Erik Dorthe wrote: > I know there has been an agent submitted for this before (see link at > the bottom of this email), but I created an agent for the Tripp Lite > SNMP Management Card. This agent uses SNMP to issue commands to a > Tripp Lite PDU. I have found that some of the limitations mentioned in > the past do not seem to exist anymore, which may be a result of a > firmware update (I am using version 12.06.0061). > > The primary limitation was a delay in the power state change, which > had taken up to 35 seconds in the past. I have found no delay over 10 > seconds, which improves response time significantly. > > I have put this agent into testing in a prototype cluster, and my > initial results look good.
The model I am testing with is a > PDUMH20ATNET. If anyone would like to look over the agent or review it > for inclusion, please contact me. Sure, please resend agent to me m, From edorthe at cari.net Mon Feb 11 19:31:02 2013 From: edorthe at cari.net (Erik Dorthe) Date: Mon, 11 Feb 2013 11:31:02 -0800 Subject: [Linux-cluster] Tripp Lite PDU In-Reply-To: <5119371B.7000306@redhat.com> References: <5119371B.7000306@redhat.com> Message-ID: Marek, This is the agent I wrote. I have not done a man page or perldoc for it, but it is documented in the comments. Regards, Erik Dorthe NOC Engineer CARI.net 858.974.5080 x217 edorthe at cari.net On Mon, Feb 11, 2013 at 10:23 AM, Marek Grac wrote: > On 02/11/2013 05:36 PM, Erik Dorthe wrote: > >> I know there has been an agent submitted for this before (see link at the >> bottom of this email), but I created an agent for the Tripp Lite SNMP >> Management Card. This agent uses SNMP to issue commands to a Tripp Lite >> PDU. I have found that some of the limitations mentioned in the past do not >> seem to exist anymore, which may be a result of a firmware update (I am >> using version 12.06.0061). >> >> The primary limitation was a delay in the power state change, which had >> taken up to 35 seconds in the past. I have found no delay over 10 seconds, >> which improves response time significantly. >> >> I have put this agent into testing in a prototype cluster, and my initial >> results look good. The model I am testing with is a PDUMH20ATNET. If anyone >> would like to look over the agent or review it for inclusion, please >> contact me. >> > Sure, please resend agent to me > > m, > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: fence_tripplite_snmp Type: application/octet-stream Size: 6343 bytes Desc: not available URL: From adel.benzarrouk at gmail.com Tue Feb 12 12:34:13 2013 From: adel.benzarrouk at gmail.com (Adel Ben Zarrouk) Date: Tue, 12 Feb 2013 13:34:13 +0100 Subject: [Linux-cluster] Oracle DG Broker and Red Hat Cluster Suite Message-ID: Hello, I have a client where we have installed two different sites and in each site we have installed two nodes cluster using RHEL 6.2 and Red Hat HA adds-on to failover Oracle DB 11GR2 (Site 2 is the disaster recovery for site1). Please is there any solution to integrate DG broker with Red Hat Cluster HA, since Oracle engineers is requesting to have a solution active/active Oracle DB to be able to configure DG broker, or our installation is Active/passive. Any help will be appreciate it. Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From adel.benzarrouk at gmail.com Wed Feb 13 15:36:28 2013 From: adel.benzarrouk at gmail.com (Adel Ben Zarrouk) Date: Wed, 13 Feb 2013 16:36:28 +0100 Subject: [Linux-cluster] Oracle DG Broker and Red Hat Cluster Suite In-Reply-To: References: Message-ID: Hello, I have a client where we have installed two different sites and in each site we have installed two nodes cluster using RHEL 6.2 and Red Hat HA adds-on to failover Oracle DB 11GR2 (Site 2 is the disaster recovery for site1). Please is there any solution to integrate DG broker with Red Hat Cluster HA, since Oracle engineers is requesting to have a solution active/active Oracle DB to be able to configure DG broker, or our installation is Active/passive. Any help will be appreciate it. Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Wed Feb 13 19:05:44 2013 From: fdinitto at redhat.com (Fabio M. 
Di Nitto) Date: Wed, 13 Feb 2013 20:05:44 +0100 Subject: [Linux-cluster] Oracle DG Broker and Red Hat Cluster Suite In-Reply-To: References: Message-ID: <511BE408.1040700@redhat.com> On 2/13/2013 4:36 PM, Adel Ben Zarrouk wrote: > Hello, > > I have a client where we have installed two different sites and in each > site we have installed two nodes cluster using RHEL 6.2 and Red Hat HA > adds-on to failover Oracle DB 11GR2 (Site 2 is the disaster recovery for > site1). > > Please is there any solution to integrate DG broker with Red Hat Cluster > HA, since Oracle engineers is requesting to have a solution > active/active Oracle DB to be able to configure DG broker, or our > installation is Active/passive. > > Any help will be appreciate it. > > Regards Given the special nature of this cluster, please file a full architecture review ticket with Red Hat GSS. Fabio From julian.pawlowski at gmail.com Thu Feb 14 10:03:04 2013 From: julian.pawlowski at gmail.com (Julian Pawlowski) Date: Thu, 14 Feb 2013 11:03:04 +0100 Subject: [Linux-cluster] 100% CPU load of dlm_controld Message-ID: Hello, I am currently investigating an issue with dlm_controld. After we did some performance improvements the cpu load of dlm_controld becomes nearly 100% on all 3 nodes and locking goes down from 45.000/s to 3/s ... I have a feeling this has something to do with plock_rate_limit which we disabled in cluster.conf by We are still on RHEL 6.2 and I'm not sure if there are major improvements in dlm_controld for RHEL 6.3 (looking at the Github repo of dlm there seem to be quite some improvements in general, e.g. fencing). Would anybody have a suggestion what we could test? 
All in all, here are some specs about the systems: - 3 nodes running RHEL 6.2 - 128GB Ram - 64 Cores - FCoE SAN - 3 NIC: 1x SAN, 1x LAN, 1x Cluster LAN - mainly running SAS and related jobs - fencing enabled with fence_ipmilan Other performance related settings: - tuned-adm profile enterprise-storage - echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled - blockdev --setra 1024 (for each FC block device) - vm.dirty_background_ratio = 0 - vm.vfs_cache_pressure = 0 - vm.swappiness = 45 - vm.min_free_kbytes = 1976531 - echo 16384 > /sys/kernel/config/dlm/cluster/lkbtbl_size (set before GFS2 mount) - echo 16384 > /sys/kernel/config/dlm/cluster/rsbtbl_size (set before GFS2 mount) - echo 16384 > /sys/kernel/config/dlm/cluster/dirtbl_size (set before GFS2 mount) With these settings we get quite good performance at the beginning but dlm_controld gets stuck after half an hour or so. I thought about setting plock_rate_limit=500 or something like this. Do you think this would be a better setting instead of using unlimited? Cheers, Julian -------------- next part -------------- An HTML attachment was scrubbed... URL: From farislinux at yahoo.com Sun Feb 17 08:07:12 2013 From: farislinux at yahoo.com (faris) Date: Sun, 17 Feb 2013 00:07:12 -0800 (PST) Subject: [Linux-cluster] (no subject) Message-ID: <1361088432.79753.YahooMailNeo@web120205.mail.ne1.yahoo.com>            http://www.bimaris.es/sytaxm/gr5bujyl/nomxbf14s7xm16mqopskxmm1fsnfo.swpqtfajxlp2fhxglqmq6xflb     Jon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsosic at srce.hr Mon Feb 18 18:05:51 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Mon, 18 Feb 2013 19:05:51 +0100 Subject: [Linux-cluster] Resource tree in service? Message-ID: <51226D7F.1040405@srce.hr> Hi. First of all, I'm running CentOS 6.3. Now, my problem is the following. 
I have this kind of resource tree in my service: This configuration in cluster.conf starts postgresql in a way that database listens only on the first ip address of the service (listen='192.168.50.101'). If I reorder the code to look like this: then postgres can't even start because no IP address was found (listen=''). So, script obviously checks only first level of hierarchy in the cluster.conf. I've examined the script further, and found out that postgres-8.sh uses bash function 'build_ip_list()' from "config-utils.sh" script. That function iterates through ip's, by using ccs_tool, like this: # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[$x]/@ref' $x is integer that equals 1, and is being iterated until ccs_tool returns error. In the first case presented, this is the output that I get: # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[1]/@ref' 192.168.50.101 # echo $? 0 # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[2]/@ref' Query failed: Invalid argument # echo $? 1 Now, for the current state of scripts to work ok, I have to model my service in this manner: Now both IP addresses get figured out by ccs_tool query, and postgresql starts fine with listen='192.168.50.101,10.200.200.101'. My question is, is this a bug, or should I always put all my ip addresses at the root level of service hierarchy, out of dependency tree? And is this a problem for a service, I mean, what is the ordering in which resources are started? In this scenario if the order of execution of resources isn't deterministic but random, there is a possibility for a service to fail, if IP addresses didn't get up before the service started. If that's not the case, and if ip addresses have to be in the dependency relation to other resources, then "config-utils.sh" script should have to be rewritten, in a way that searches for IP addresses deeply in a hierarchy and not only at root level... Thank you for all your comments. 
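The root-level lookup described in the message above can be sketched roughly as follows. This is only an illustration of the loop as the poster describes it, not the actual build_ip_list() from config-utils.sh; the function name collect_ip_refs and the QUERY_CMD override are hypothetical and exist only so the loop can be exercised without a running cluster.

```shell
#!/bin/bash
# Sketch of the root-level IP lookup described above (assumption: this
# mirrors the described behaviour; it is not the shipped build_ip_list()).
# QUERY_CMD is a hypothetical hook for testing; by default the
# "ccs_tool query" command quoted in the message is used.
collect_ip_refs() {
    local service="$1"
    local x=1 ref list=""
    # The path .../service[...]/ip[$x] only selects ip elements that are
    # direct children of the service, so nested IPs are never seen.
    # The loop ends at the first index for which the query fails.
    while ref=$(${QUERY_CMD:-ccs_tool query} \
            "/cluster/rm/service[@name=\"$service\"]/ip[$x]/@ref" 2>/dev/null); do
        list="${list:+$list,}$ref"
        x=$((x + 1))
    done
    echo "$list"
}
```

On a cluster node, `collect_ip_refs pgmaster` would print the comma-separated refs of the IPs found at the root of the service. If no IP sits directly under the service, the first query fails and the result is an empty string, which would be consistent with the listen='' symptom reported above.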
From jsosic at srce.hr Mon Feb 18 18:25:07 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Mon, 18 Feb 2013 19:25:07 +0100 Subject: [Linux-cluster] Resource tree in service? Message-ID: <51227203.7010000@srce.hr> Hi. First of all, I'm running CentOS 6.3. Now, my problem is the following. I have this kind of resource tree in my service: This configuration in cluster.conf starts postgresql in a way that database listens only on the first ip address of the service (listen='192.168.50.101'). If I reorder the code to look like this: then postgres can't even start because no IP address was found (listen=''). So, script obviously checks only first level of hierarchy in the cluster.conf. I've examined the script further, and found out that postgres-8.sh uses bash function 'build_ip_list()' from "config-utils.sh" script. That function iterates through ip's, by using ccs_tool, like this: # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[$x]/@ref' $x is integer that equals 1, and is being iterated until ccs_tool returns error. In the first case presented, this is the output that I get: # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[1]/@ref' 192.168.50.101 # echo $? 0 # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[2]/@ref' Query failed: Invalid argument # echo $? 1 Now, for the current state of scripts to work ok, I have to model my service in this manner: Now both IP addresses get figured out by ccs_tool query, and postgresql starts fine with listen='192.168.50.101,10.200.200.101'. My question is, is this a bug, or should I always put all my ip addresses at the root level of service hierarchy, out of dependency tree? And is this a problem for a service, I mean, what is the ordering in which resources are started? In this scenario if the order of execution of resources isn't deterministic but random, there is a possibility for a service to fail, if IP addresses didn't get up before the service started. 
If that's not the case, and if ip addresses have to be in the dependency relation to other resources, then "config-utils.sh" script should have to be rewritten, in a way that searches for IP addresses deeply in a hierarchy and not only at root level... Thank you for all your comments. From emi2fast at gmail.com Mon Feb 18 20:35:39 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Mon, 18 Feb 2013 21:35:39 +0100 Subject: [Linux-cluster] Resource tree in service? In-Reply-To: <51227203.7010000@srce.hr> References: <51227203.7010000@srce.hr> Message-ID: Hello Jakov rgmanager has an internal order and this order is used to know what the services (start|stod) sequence vim +/special/ /usr/share/cluster/service.sh So sometime force i the hierarchy in the service definetion is not needed 2013/2/18 Jakov Sosic > Hi. > > First of all, I'm running CentOS 6.3. Now, my problem is the following. > I have this kind of resource tree in my service: > > > > > > > > > > > > > This configuration in cluster.conf starts postgresql in a way that > database listens only on the first ip address of the service > (listen='192.168.50.101'). > > If I reorder the code to look like this: > > > > > > > > > > > > > then postgres can't even start because no IP address was found (listen=''). > > So, script obviously checks only first level of hierarchy in the > cluster.conf. > > I've examined the script further, and found out that postgres-8.sh uses > bash function 'build_ip_list()' from "config-utils.sh" script. > > That function iterates through ip's, by using ccs_tool, like this: > > # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[$x]/@ref' > > $x is integer that equals 1, and is being iterated until ccs_tool > returns error. > > In the first case presented, this is the output that I get: > > # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[1]/@ref' > 192.168.50.101 > > # echo $? 
> 0 > > # ccs_tool query '/cluster/rm/service[@name="pgmaster"]/ip[2]/@ref' > Query failed: Invalid argument > > # echo $? > 1 > > > Now, for the current state of scripts to work ok, I have to model my > service in this manner: > > > > > > > > > > Now both IP addresses get figured out by ccs_tool query, and postgresql > starts fine with listen='192.168.50.101,10.200.200.101'. > > > My question is, is this a bug, or should I always put all my ip > addresses at the root level of service hierarchy, out of dependency tree? > > And is this a problem for a service, I mean, what is the ordering in > which resources are started? In this scenario if the order of execution > of resources isn't deterministic but random, there is a possibility for > a service to fail, if IP addresses didn't get up before the service > started. > > If that's not the case, and if ip addresses have to be in the dependency > relation to other resources, then "config-utils.sh" script should have > to be rewritten, in a way that searches for IP addresses deeply in a > hierarchy and not only at root level... > > > > > Thank you for all your comments. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsosic at srce.hr Mon Feb 18 22:53:02 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Mon, 18 Feb 2013 23:53:02 +0100 Subject: [Linux-cluster] Resource tree in service? In-Reply-To: References: <51227203.7010000@srce.hr> Message-ID: <5122B0CE.4080505@srce.hr> On 02/18/2013 09:35 PM, emmanuel segura wrote: > vim +/special/ /usr/share/cluster/service.sh Nice! So if I understand it correctly, ip has attribute start set to 7, while "script" has it 9. That means that IP will always start before script. 
But what about "postgres-8" and other cluster resource agents that are not listed in "specia" section of service.sh? Are they all auto-magically assigned value of 9? Also, not forcing hierarchy doesn't mean that IP recognition within 'config-utils.sh' is not a bug that should be solved... -- Jakov Sosic www.srce.unizg.hr From rmitchel at redhat.com Tue Feb 19 00:23:02 2013 From: rmitchel at redhat.com (Ryan Mitchell) Date: Tue, 19 Feb 2013 11:23:02 +1100 Subject: [Linux-cluster] Resource tree in service? In-Reply-To: <5122B0CE.4080505@srce.hr> References: <51227203.7010000@srce.hr> <5122B0CE.4080505@srce.hr> Message-ID: <5122C5E6.6090905@redhat.com> On 02/19/2013 09:53 AM, Jakov Sosic wrote: > So if I understand it correctly, ip has attribute start set to 7, while > "script" has it 9. That means that IP will always start before script. > > But what about "postgres-8" and other cluster resource agents that are > not listed in "specia" section of service.sh? Are they all > auto-magically assigned value of 9? Best Centos documentation I can find for that is here: http://www.centos.org/docs/5/html/5.2/Cluster_Administration/s1-clust-rsc-sibling-starting-order-CA.html "... a non-typed child resource is started and stopped according to its order in /etc/cluster.cluster.conf. In addition, non-typed child resources are started after all typed child resources have started and are stopped before any typed child resources have stopped." Also see the following 2 pages in the documentation I linked above. This is virtually the same as in the RHEL6 guide and so should apply to Centos6. > Also, not forcing hierarchy doesn't mean that IP recognition within > 'config-utils.sh' is not a bug that should be solved... I recommend raising a bug for this issue. On a slightly different topic, perhaps see if pacemaker can do what you need it to. 
http://clusterlabs.org/doc/ Regards, Ryan Mitchell From JMaxwell at pbp1.com Tue Feb 19 15:32:54 2013 From: JMaxwell at pbp1.com (Maxwell, Jamison [HDS]) Date: Tue, 19 Feb 2013 10:32:54 -0500 Subject: [Linux-cluster] Cannot connect to rgmanager Message-ID: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> I am attempting to create a two node cluster where the only resource required would be a shared IP address, however, after a couple of attempts I continue to fail. I have followed a guide located at http://www.openlogic.com/wazi/bid/188071/ . Everything appears to work fine until I get to the point where I actually add the IP address resource, both cluster nodes appear as online and quorate and the configuration validates, but will not enable the new resource. Below I am including any information that I think may be relevant, but feel free to ask for more. =========================================== [root@ hostname]# cat /etc/cluster/cluster.conf =========================================== [root@ hostname]# clusvcadm -e IP Local machine trying to enable service:IP...Could not connect to resource group manager =========================================== [root@ hostname]# strace clusvcadm -e IP ... connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = -1 ENOENT (No such file or directory) close(5) = 0 write(1, "Could not connect to resource gr"..., 44Could not connect to resource group manager ) = 44 exit_group(1) = ? =========================================== I would most like to call your attention to the line "write(1, " connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = -1 ENOENT (No such file or directory)". There was also a someone who mailed this list with what appears to be the same problem, however, no issue is present in the conversation. The topic is located here: http://www.redhat.com/archives/linux-cluster/2012-August/msg00156.html . 
This is version 6.3, no iptables and no selinux until I can get this working. I greatly appreciate any assistance that can be offered. Jamison Maxwell Sr. Systems Administrator HD Suppy - Facilities Maintenance -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Tue Feb 19 16:03:18 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Tue, 19 Feb 2013 17:03:18 +0100 Subject: [Linux-cluster] Cannot connect to rgmanager In-Reply-To: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> Message-ID: did you started rgmanager? 2013/2/19 Maxwell, Jamison [HDS] > I am attempting to create a two node cluster where the only resource > required would be a shared IP address, however, after a couple of attempts > I continue to fail. I have followed a guide located at > http://www.openlogic.com/wazi/bid/188071/ . Everything appears to work > fine until I get to the point where I actually add the IP address resource, > both cluster nodes appear as online and quorate and the configuration > validates, but will not enable the new resource. 
Below I am including any
> information that I think may be relevant, but feel free to ask for more.
>
> ===========================================
> [root@ hostname]# cat /etc/cluster/cluster.conf
> votes="1">
> votes="1">
> recovery="relocate">
> monitor_link="on" sleeptime="10"/>
> ===========================================
>
> [root@ hostname]# clusvcadm -e IP
> Local machine trying to enable service:IP...Could not connect to resource
> group manager
> ===========================================
>
> [root@ hostname]# strace clusvcadm -e IP
> ...
> connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"},
> 110) = -1 ENOENT (No such file or directory)
> close(5) = 0
> write(1, "Could not connect to resource gr"..., 44Could not connect to
> resource group manager
> ) = 44
> exit_group(1) = ?
> ===========================================
>
> I would most like to call your attention to the line "write(1, "
> connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"},
> 110) = -1 ENOENT (No such file or directory)". There was also a someone
> who mailed this list with what appears to be the same problem, however, no
> issue is present in the conversation. The topic is located here:
> http://www.redhat.com/archives/linux-cluster/2012-August/msg00156.html .
>
> This is version 6.3, no iptables and no selinux until I can get this
> working. I greatly appreciate any assistance that can be offered.
>
>
> Jamison Maxwell
> Sr.
Systems Administrator
> HD Suppy - Facilities Maintenance
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From JMaxwell at pbp1.com Tue Feb 19 16:19:01 2013 From: JMaxwell at pbp1.com (Maxwell, Jamison [HDS]) Date: Tue, 19 Feb 2013 11:19:01 -0500 Subject: [Linux-cluster] Cannot connect to rgmanager In-Reply-To: References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> Message-ID: <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com> Yes, rgmanager, cman, ricci, and modclusterd are started and start automatically in run levels three through five... Jamison Maxwell Sr. Systems Administrator HD Supply - Facilities Maintenance From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of emmanuel segura Sent: Tuesday, February 19, 2013 11:03 AM To: linux clustering Subject: Re: [Linux-cluster] Cannot connect to rgmanager did you started rgmanager? 2013/2/19 Maxwell, Jamison [HDS] > I am attempting to create a two node cluster where the only resource required would be a shared IP address, however, after a couple of attempts I continue to fail. I have followed a guide located at http://www.openlogic.com/wazi/bid/188071/ . Everything appears to work fine until I get to the point where I actually add the IP address resource, both cluster nodes appear as online and quorate and the configuration validates, but will not enable the new resource. Below I am including any information that I think may be relevant, but feel free to ask for more.
=========================================== [root@ hostname]# cat /etc/cluster/cluster.conf =========================================== [root@ hostname]# clusvcadm -e IP Local machine trying to enable service:IP...Could not connect to resource group manager =========================================== [root@ hostname]# strace clusvcadm -e IP ... connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = -1 ENOENT (No such file or directory) close(5) = 0 write(1, "Could not connect to resource gr"..., 44Could not connect to resource group manager ) = 44 exit_group(1) = ? =========================================== I would most like to call your attention to the line "write(1, " connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = -1 ENOENT (No such file or directory)". There was also a someone who mailed this list with what appears to be the same problem, however, no issue is present in the conversation. The topic is located here: http://www.redhat.com/archives/linux-cluster/2012-August/msg00156.html . This is version 6.3, no iptables and no selinux until I can get this working. I greatly appreciate any assistance that can be offered. Jamison Maxwell Sr. Systems Administrator HD Suppy - Facilities Maintenance -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsosic at srce.hr Wed Feb 20 00:19:33 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Wed, 20 Feb 2013 01:19:33 +0100 Subject: [Linux-cluster] Resource tree in service? 
In-Reply-To: <5122C5E6.6090905@redhat.com>
References: <51227203.7010000@srce.hr> <5122B0CE.4080505@srce.hr> <5122C5E6.6090905@redhat.com>
Message-ID: <51241695.9040303@srce.hr>

On 02/19/2013 01:23 AM, Ryan Mitchell wrote:

> Best Centos documentation I can find for that is here:
> http://www.centos.org/docs/5/html/5.2/Cluster_Administration/s1-clust-rsc-sibling-starting-order-CA.html
>
> "... a non-typed child resource is started and stopped according to its
> order in /etc/cluster.cluster.conf. In addition, non-typed child
> resources are started after all typed child resources have started and
> are stopped before any typed child resources have stopped."
>
> Also see the following 2 pages in the documentation I linked above. This
> is virtually the same as in the RHEL6 guide and so should apply to Centos6.

Yeah, that's it... So far I have done a few dozen stop/start/relocate/restart operations on this service, with no problems at all.

> I recommend raising a bug for this issue.
>
> On a slightly different topic, perhaps see if pacemaker can do what you
> need it to. http://clusterlabs.org/doc/

I will raise a bug, but before that I want to make sure this *is a bug*. Maybe I should ask on the cluster-devel list? If it's not intended behaviour, I can write the patch myself and then report it back...

--
Jakov Sosic
www.srce.unizg.hr

From parvez.h.shaikh at gmail.com Fri Feb 22 10:15:36 2013
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Fri, 22 Feb 2013 15:45:36 +0530
Subject: [Linux-cluster] Restarting NTP on cluster nodes
Message-ID:

Hi experts,

I tried to find information about this in the documentation but couldn't.

Is it safe to restart ntpd on cluster nodes (after adding entries in ntpd.conf) while cluster services are still running?

I need to do this on a production setup, but I am not sure about its impact.

Thanks,
Parvez

From shanti.pahari at sierra.sg Fri Feb 22 11:29:30 2013
From: shanti.pahari at sierra.sg (Shanti Pahari)
Date: Fri, 22 Feb 2013 19:29:30 +0800 (SGT)
Subject: [Linux-cluster] Restarting NTP on cluster nodes
In-Reply-To:
References:
Message-ID:

Yes, it won't affect your cluster. I did it several times.

Sent from my iPhone

From Ralph.Grothe at itdz-berlin.de Fri Feb 22 17:29:06 2013
From: Ralph.Grothe at itdz-berlin.de (Ralph.Grothe at itdz-berlin.de)
Date: Fri, 22 Feb 2013 18:29:06 +0100
Subject: [Linux-cluster] Can anyone help tackling issues with custom resource agent for RHCS?
Message-ID:

Hello,

I have written a custom resource agent for Informix servers in accordance with the OCF RA Developer's Guide
http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
but I have some doubts about whether it properly integrates with RHCS's idiosyncrasies.

I would have liked to spare myself the work of finding out all the Red Hat deviations, and would have taken a readily supplied agent for Informix Server resources if only I had found one.

I found that I not only had to write the agent and place it in /usr/share/cluster (as far as I understand, this would have sufficed in a Pacemaker cluster, albeit under a different directory path); I also had to tinker with /usr/share/system-config-cluster/misc/cluster.ng in order to have my extensions pass the XML validity check.

Now my new ifxdb tag is accepted in cluster.conf along with its parametrization, as defined in the agent's meta-data dump as well as in my definitions in cluster.ng, and I was able to successfully commit a ccs_tool update.

These are the attributes and actions my agent supports so far:

# /usr/share/cluster/ifxdb.sh meta-data|grep -E '(parameter|action) name'
interval="15"/>
interval="30"/>

rg_test shows its rules, and I am able to start, stop and monitor (or "status", in RHCS parlance) my Informix resources, e.g.

# rg_test test /etc/cluster/cluster.conf start ifxdb ju_09tcp

I can also enable, stop and relocate every service that contains an instance of my ifxdb resource via clusvcadm.

As I made ample use of the ocf_log function (I prefer to have some meaningful output from the agents in the logs), the agent verbosely reports what it is doing when invoked at the command line through rg_test.

What puzzles me is that I cannot see any of these log entries when my agent is run by clurgmgrd. That it is run at all I can verify because, e.g., on service startup the Informix server instance is successfully started.

What puzzles me even more is that I cannot see any entries in /var/log/messages from the regular monitoring invocations of the agent at the intervals given in the above meta-data dump, as if no resource monitoring checks were performed at all.

And even worse, when I shut down an Informix instance manually (e.g. onmode -ky) it doesn't get restarted by clurgmgrd, even though *no* __independent_subtree attribute is defined in the ifxdb tags of cluster.conf.

So although I can manually start, stop and relocate the affected services through clusvcadm, the whole rgmanager HA treatment seems dysfunctional to me.

Is there anything, anywhere, that I have forgotten to manipulate in order to fully enable my custom resource agent?

Regards,
Ralph

--
Fon +49 30 90222 6481
Fax +49 30 90222 3151

IT-Dienstleistungszentrum Berlin
Anstalt öffentlichen Rechts
PB22 Unix/Linux/vSphere Administration
Berliner Str. 112-115
D-10713 Berlin
Commercial register no.: HRA 36349 B
Registration court: Amtsgericht Charlottenburg
www.itdz-berlin.de
www.itdz.verwalt-berlin.de

ITDZ Berlin is certified as a family-friendly company under the "audit berufundfamilie" programme.
For the sake of the environment, please consider whether printing this e-mail is necessary.

From lists at alteeve.ca Fri Feb 22 18:01:41 2013
From: lists at alteeve.ca (Digimer)
Date: Fri, 22 Feb 2013 13:01:41 -0500
Subject: [Linux-cluster] Can anyone help tackling issues with custom resource agent for RHCS?
In-Reply-To:
References:
Message-ID: <5127B285.40303@alteeve.ca>

Would this help?

https://fedorahosted.org/cluster/wiki/ResourceActions

digimer

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
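
For readers following the agent question in this thread, a minimal sketch of how an rgmanager-style shell agent could be laid out may help. It is not Ralph's ifxdb.sh, which the thread does not show. The function names ifxdb_main, ifxdb_meta_data and ifxdb_is_running, the IFXDB_FAKE_RUNNING variable, the stderr fallback logger and the abbreviated meta-data are all hypothetical. The sketch only illustrates the pattern discussed here: reuse ocf_log if it is defined, otherwise source /usr/share/cluster/ocf-shellfuncs, then dispatch on the action name and return OCF-style exit codes.

```shell
#!/bin/bash
# Hypothetical skeleton of an rgmanager-style agent; NOT the real ifxdb.sh.

# Reuse ocf_log if it is already defined; otherwise try the helper library
# shipped with rgmanager, and only then fall back to plain stderr output.
if ! type ocf_log >/dev/null 2>&1; then
    if [ -r /usr/share/cluster/ocf-shellfuncs ]; then
        . /usr/share/cluster/ocf-shellfuncs
    else
        ocf_log() {
            # $1 = level (info, err, debug, ...), remaining args = message
            local level=$1
            shift
            echo "ifxdb [$level]: $*" >&2
        }
    fi
fi

# Abbreviated, illustrative meta-data. A real agent declares every
# parameter and more attributes per action.
ifxdb_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="ifxdb" version="rgmanager 2.0">
  <parameters>
    <parameter name="name" primary="1">
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="120"/>
    <action name="stop" timeout="120"/>
    <action name="status" interval="30" timeout="20"/>
    <action name="monitor" interval="30" timeout="20"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
}

# Placeholder health check: a real agent would query the database instance.
ifxdb_is_running() {
    [ -n "${IFXDB_FAKE_RUNNING:-}" ]
}

ifxdb_main() {
    # rgmanager passes resource parameters as OCF_RESKEY_<name> variables.
    local inst=${OCF_RESKEY_name:-unknown}
    case "$1" in
        start)
            ocf_log info "starting instance $inst"
            return 0 ;;
        stop)
            ocf_log info "stopping instance $inst"
            return 0 ;;
        status|monitor)
            if ifxdb_is_running; then
                ocf_log debug "instance $inst is running"
                return 0
            fi
            ocf_log err "instance $inst is not running"
            return 7 ;;   # OCF_NOT_RUNNING
        meta-data)
            ifxdb_meta_data
            return 0 ;;
        *)
            return 3 ;;   # OCF_ERR_UNIMPLEMENTED
    esac
}
```

A real agent file would end with something like `ifxdb_main "$@"; exit $?`. That line is left out here so the functions can be sourced and tried by hand.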

From Ralph.Grothe at itdz-berlin.de Sun Feb 24 10:49:18 2013
From: Ralph.Grothe at itdz-berlin.de (Ralph.Grothe at itdz-berlin.de)
Date: Sun, 24 Feb 2013 11:49:18 +0100
Subject: [Linux-cluster] Can anyone help tackling issues with custom resource agent for RHCS?
In-Reply-To: <5127B285.40303@alteeve.ca>
References: <5127B285.40303@alteeve.ca>
Message-ID:

Hello digimer,

I already knew this link and have read the FAQs and other material there.

Unfortunately, many features that our customers demand in their clusters, such as dependencies between cluster services (which they were accustomed to in their former clusters, e.g. Veritas, that are to be migrated to RHCS), are hardly documented anywhere.

But when I posted my query I was mistaken. My ifxdb agent isn't dysfunctional. It really works.

What is still missing is logging: clurgmgrd doesn't log the agent's actions, despite the fact that I used the ocf_log function mentioned earlier (I also check in my agent whether that function is defined at run time and, if not, source /usr/share/cluster/ocf-shellfuncs), and although the agent logs every step whenever I run it through the rg_test utility while the services are disabled.

I have no explanation for why clurgmgrd is so taciturn when it comes to logging output from my ifxdb agent. I think that I have enabled logging up to debug level.

[root at altair:/usr/share/cluster]
# grep rm /etc/cluster/cluster.conf

[root at altair:/usr/share/cluster]
# grep local6 /etc/syslog.conf
local6.* /var/log/clurgmgrd.log

Regards,
Ralph

From emi2fast at gmail.com Sun Feb 24 18:31:31 2013
From: emi2fast at gmail.com (emmanuel segura)
Date: Sun, 24 Feb 2013 19:31:31 +0100
Subject: [Linux-cluster] Cannot connect to rgmanager
In-Reply-To: <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com>
References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com>
Message-ID:

Hello,

Sorry for my late reply. Try configuring your fence devices first, and then start the rgmanager service.

2013/2/19 Maxwell, Jamison [HDS]

> Yes, rgmanager, cman, ricci, and modclusterd are started and start
> automatically in run levels three through five...
>
> Jamison Maxwell
> Sr. Systems Administrator
> HD Supply - Facilities Maintenance
>
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of emmanuel segura
> Sent: Tuesday, February 19, 2013 11:03 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] Cannot connect to rgmanager
>
> Did you start rgmanager?
>
> 2013/2/19 Maxwell, Jamison [HDS]
>
> I am attempting to create a two node cluster where the only resource
> required would be a shared IP address; however, after a couple of attempts
> I continue to fail. I have followed a guide located at
> http://www.openlogic.com/wazi/bid/188071/ . Everything appears to work
> fine until I get to the point where I actually add the IP address resource:
> both cluster nodes appear as online and quorate and the configuration
> validates, but the new resource will not enable. Below I am including any
> information that I think may be relevant, but feel free to ask for more.
>
> ===========================================
> [root@ hostname]# cat /etc/cluster/cluster.conf
> votes="1">
> votes="1">
> recovery="relocate">
> monitor_link="on" sleeptime="10"/>
> ===========================================
> [root@ hostname]# clusvcadm -e IP
> Local machine trying to enable service:IP...Could not connect to resource group manager
> ===========================================
> [root@ hostname]# strace clusvcadm -e IP
> ...
> connect(5, {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = -1 ENOENT (No such file or directory)
> close(5) = 0
> write(1, "Could not connect to resource gr"..., 44Could not connect to resource group manager
> ) = 44
> exit_group(1) = ?
> ===========================================
>
> I would most like to call your attention to the line "connect(5,
> {sa_family=AF_FILE, path="/var/run/cluster/rgmanager.sk"}, 110) = -1
> ENOENT (No such file or directory)". Someone else also mailed this list
> with what appears to be the same problem; however, no resolution is
> present in that conversation. The topic is located here:
> http://www.redhat.com/archives/linux-cluster/2012-August/msg00156.html .
>
> This is version 6.3, with no iptables and no selinux until I can get this
> working. I greatly appreciate any assistance that can be offered.
>
> Jamison Maxwell
> Sr. Systems Administrator
> HD Supply - Facilities Maintenance

--
this is my life and I will live it for as long as God wills

From JMaxwell at pbp1.com Wed Feb 27 22:48:49 2013
From: JMaxwell at pbp1.com (Maxwell, Jamison [HDS])
Date: Wed, 27 Feb 2013 17:48:49 -0500
Subject: [Linux-cluster] Cannot connect to rgmanager
In-Reply-To:
References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com>
Message-ID: <8D3E04735E183443BE64F98A192E5816030F3A2F85@GHDMBX04.hsi.hughessupply.com>

Figured it out! I wasn't allowing multicast on the vlan interface on my switch. I never even thought about this as an issue because I'm not routing; the nodes are right next to each other. However, tcpdump showed that I was sending tons of multicast but not receiving any. As soon as I enabled multicast on the vlan interface, everything came up.

Also, in frustration I dropped cman, rgmanager and modclusterd in favor of corosync + pacemaker, but I believe I would have been as successful with either option. I think I prefer stonith to fencing, though.

Lastly, is there a way to use unicast? I realize that multicast would be greatly preferable in 3+ node clusters, but in this two-node cluster it would be easier to use unicast.

Jamison Maxwell
Sr. Systems Administrator
HD Supply - Facilities Maintenance
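
The tcpdump diagnosis described above (multicast going out, none coming in) can be made a little more mechanical. The following is only a sketch: the helper name count_mcast_sources is hypothetical, and it assumes the default line format of `tcpdump -n` (timestamp, "IP", source.port, ">", destination.port). It tallies packets per source address, so a capture that lists only the local node's address suggests the peer's multicast traffic is not arriving.

```shell
#!/bin/bash
# Hypothetical helper: read "tcpdump -n" text on stdin and print one line
# per source address with the number of packets seen from it.
# Assumes lines such as:
#   12:00:00.000001 IP 10.0.0.1.5404 > 239.192.0.1.5405: UDP, length 119
count_mcast_sources() {
    awk '$2 == "IP" && $4 == ">" {
             src = $3
             sub(/\.[0-9]+$/, "", src)   # strip the trailing .port
             seen[src]++
         }
         END { for (s in seen) print s, seen[s] }' | sort
}

# Possible use on a cluster node (interface name and port are assumptions;
# 5405 is the usual corosync default):
#   tcpdump -n -c 200 -i eth0 udp port 5405 | count_mcast_sources
```

If both nodes appear in the output on both machines, the switch is passing the cluster's multicast traffic in both directions.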

From emi2fast at gmail.com Thu Feb 28 13:36:14 2013
From: emi2fast at gmail.com (emmanuel segura)
Date: Thu, 28 Feb 2013 14:36:14 +0100
Subject: [Linux-cluster] Cannot connect to rgmanager
In-Reply-To: <8D3E04735E183443BE64F98A192E5816030F3A2F85@GHDMBX04.hsi.hughessupply.com>
References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F3A2F85@GHDMBX04.hsi.hughessupply.com>
Message-ID:

Stonith = fencing, so it is the same thing.

To use corosync with unicast UDP, look here: http://www.thatsgeeky.com/2011/12/installing-corosync-on-ec2/

--
this is my life and I will live it for as long as God wills

From JMaxwell at pbp1.com Thu Feb 28 16:20:34 2013
From: JMaxwell at pbp1.com (Maxwell, Jamison [HDS])
Date: Thu, 28 Feb 2013 11:20:34 -0500
Subject: [Linux-cluster] Cannot connect to rgmanager
In-Reply-To:
References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F3A2F85@GHDMBX04.hsi.hughessupply.com>
Message-ID: <8D3E04735E183443BE64F98A192E5816030F3E52D4@GHDMBX04.hsi.hughessupply.com>

> Stonith = fencing, so the same thing

But the name is cooler! "Shoot the other node in the head", that's awesome.

Jamison Maxwell
Sr. Systems Administrator
HD Supply - Facilities Maintenance

From franchu.garcia at gmail.com Thu Feb 28 16:44:09 2013
From: franchu.garcia at gmail.com (Fran Garcia)
Date: Thu, 28 Feb 2013 16:44:09 +0000
Subject: [Linux-cluster] Cannot connect to rgmanager
In-Reply-To: <8D3E04735E183443BE64F98A192E5816030F3E52D4@GHDMBX04.hsi.hughessupply.com>
References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F3A2F85@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F3E52D4@GHDMBX04.hsi.hughessupply.com>
Message-ID:

On Thu, Feb 28, 2013 at 4:20 PM, Maxwell, Jamison [HDS] wrote:
> But the name is cooler! "Shoot the other node in the head", that's awesome.

Especially if you have this pic in your mind when saying stonith:

http://devopsreactions.tumblr.com/post/43391083841/how-stonith-works

SCNR :-)

From kiss.zoltan at bardiauto.hu Thu Feb 28 16:51:09 2013
From: kiss.zoltan at bardiauto.hu (Kiss Zoltán)
Date: Thu, 28 Feb 2013 17:51:09 +0100
Subject: [Linux-cluster] Cannot connect to rgmanager
In-Reply-To:
Message-ID:

Lol :) Made my day, thx!

-
Kiss Zoltán | Bárdi Autó Zrt.
senior system engineer

From lists at alteeve.ca Thu Feb 28 19:36:44 2013
From: lists at alteeve.ca (Digimer)
Date: Thu, 28 Feb 2013 14:36:44 -0500
Subject: [Linux-cluster] Cannot connect to rgmanager
In-Reply-To: <8D3E04735E183443BE64F98A192E5816030F3E52D4@GHDMBX04.hsi.hughessupply.com>
References: <8D3E04735E183443BE64F98A192E5816030F1E84C9@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F1E85C1@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F3A2F85@GHDMBX04.hsi.hughessupply.com> <8D3E04735E183443BE64F98A192E5816030F3E52D4@GHDMBX04.hsi.hughessupply.com>
Message-ID: <512FB1CC.50701@alteeve.ca>

On 02/28/2013 11:20 AM, Maxwell, Jamison [HDS] wrote:
>> Stonith = fencing, so the same thing
>
> But the name is cooler! "Shoot the other node in the head", that's awesome.

True, I suppose, but it implies power fencing only. "Fencing" as a concept, meaning the isolation of a problem node, can also be accomplished by cutting the node off from the SAN/network. This approach is called "fabric fencing".

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
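
Returning to the unicast question raised earlier in this thread: as a rough sketch, and assuming a corosync 1.x release that supports the udpu transport (the form shown in that series' example udpu configuration), the totem section lists each node as a member instead of relying on a multicast group. The helper name gen_udpu_totem and the addresses below are hypothetical; the function only prints a configuration fragment and does not touch any files.

```shell
#!/bin/bash
# Hypothetical helper: print a corosync 1.x style totem section for UDP
# unicast (udpu). Usage: gen_udpu_totem BINDNETADDR MEMBER_IP [MEMBER_IP...]
gen_udpu_totem() {
    local bindnet=$1
    local addr
    shift
    printf 'totem {\n'
    printf '    version: 2\n'
    printf '    transport: udpu\n'
    printf '    interface {\n'
    printf '        ringnumber: 0\n'
    printf '        bindnetaddr: %s\n' "$bindnet"
    printf '        mcastport: 5405\n'
    # One member block per cluster node replaces the multicast address.
    for addr in "$@"; do
        printf '        member {\n'
        printf '            memberaddr: %s\n' "$addr"
        printf '        }\n'
    done
    printf '    }\n'
    printf '}\n'
}
```

For a two-node cluster, `gen_udpu_totem 10.0.0.0 10.0.0.1 10.0.0.2` prints a fragment with two member blocks. The output is meant to be reviewed and merged into corosync.conf by hand, and later corosync releases describe nodes differently, so the installed version's corosync.conf man page should be checked first.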