From mgrac at redhat.com Thu Aug 1 07:39:50 2013
From: mgrac at redhat.com (Marek Grac)
Date: Thu, 01 Aug 2013 09:39:50 +0200
Subject: [Linux-cluster] fence_ipmilan
In-Reply-To: <9A757AF2CA7F204A8F2444FFC5C27C30485F536C@Exchange2010.Skynet.local>
References: <9A757AF2CA7F204A8F2444FFC5C27C30485F536C@Exchange2010.Skynet.local>
Message-ID: <51FA10C6.6090701@redhat.com>

On 07/31/2013 03:57 PM, Johannes Mäulen wrote:
>
> Hi there,
>
> I'm trying to set up a cluster and had issues with 'fence_ipmilan' from
> the package fence-agents.
>
> I'm running debian 7.1 with a 3.2.0-4-amd64 kernel.
>
> 'fence_ipmilan -V' gives 'fence_ipmilan 3.1.5'
>
> My Cluster nodes are running on Supermicro Motherboards with IPMI
> on-board. (To be exact:
> http://www.supermicro.nl/products/motherboard/Xeon/C202_C204/X9SCA-F.cfm )
>
> I've experienced the following behavior:
>
> fence_ipmilan -a xxx.xxx.xxx.xxx -l USER -p PASS -v -o off; echo $?
>
> Powering off machine @ IPMI:xxx.xxx.xxx.xxx...Spawning:
> '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v
> chassis power status'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> ipmilan: Power still on
>
> Failed
>
> 1
>
> More or less in the same moment when I got this message the machine
> went down. So all the commands were working, but not in the expected time.
>
> ( Using Supermicro Mainboards with IPMI onboard,
> http://www.supermicro.nl/products/motherboard/Xeon/C202_C204/X9SCA-F.cfm )
>
> I've played around with available parameters and wasn't able to fix
> this behavior.
>
> So I went into the source
> code (fence-agents-3.1.5/fence/agents/ipmilan/ipmilan.c) and had a look
> at the ipmi_off function.
>
> There was a fixed value of 2 seconds to sleep.
>
> I modified this to use the same parameter as ipmi_on: ipmi->i_power_wait
> instead of 2, so that I can modify this value and test if it has effect
> on my problem.
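To make the change being described concrete, here is a rough, self-contained sketch of the pattern. It is not the fence-agents source: apart from the hardcoded sleep(2), the i_power_wait field and the retry loop, every name below is invented for illustration (the real agent drives ipmitool as shown in the log above).

/* sketch.c -- illustration only, not fence-agents code */
#include <stdio.h>
#include <unistd.h>

struct ipmi {
    int i_power_wait;   /* seconds between "chassis power status" polls */
};

/* stand-ins for running "ipmitool ... chassis power off/status" */
static int chassis_power_off(void)    { return 0; }
static int chassis_power_is_off(void) { static int polls; return ++polls > 3; }

static int ipmi_off(struct ipmi *ipmi)
{
    int retries = 12;

    chassis_power_off();
    while (retries--) {
        /* was: sleep(2);  -- too short for a slow BMC */
        sleep(ipmi->i_power_wait);
        if (chassis_power_is_off())
            return 0;            /* powered off -- success */
        chassis_power_off();     /* re-issue, as in the log above */
    }
    return 1;                    /* "ipmilan: Power still on" */
}

int main(void)
{
    struct ipmi ipmi = { .i_power_wait = 1 };  /* the run below uses -T 10 */
    printf("ipmi_off returned %d\n", ipmi_off(&ipmi));
    return 0;
}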
>
> Now when I use the modified version of fence_ipmilan the output looks
> like:
>
> ./fence_ipmilan -a xxx.xxx.xxx.xxx -l USER -p PASS -T 10 -v -o off ; echo $?
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power off'...
>
> Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P
> 'PASS' -v chassis power status'...
>
> Done
>
> 0
>
> So I think this fixed my problem, and I think it might help other
> users experiencing the same issues.
>
> Kind regards
>

Thanks for reporting, I'm quite surprised that you are using such an old version. This issue was fixed in 3.1.8 (26 Mar 2012).

m,

From mgrac at redhat.com Thu Aug 1 07:42:22 2013
From: mgrac at redhat.com (Marek Grac)
Date: Thu, 01 Aug 2013 09:42:22 +0200
Subject: [Linux-cluster] fence_drac5 timeouts
In-Reply-To: References:
Message-ID: <51FA115E.1050602@redhat.com>

On 07/26/2013 04:29 AM, ch urnd wrote:
> I'm trying to get fence_drac5 working on a cluster I'm setting up of
> two Dell R410's. The primary issue I'm seeing is timeouts. The
> fence does seem to work as the other node will get shut down, but the
> script always exits 1.
>

Which version do you use, please? This looks very likely to be the bug resolved in https://git.fedorahosted.org/cgit/fence-agents.git/commit/?id=4bd62484e17cc63b27a103c744ec11fb00610b48 where autodetect of EOL was not working properly on DRAC devices when using ssh.

m,

> Here's the output:
>
> # fence_drac5 -a 192.168.1.100 --power-timeout 30 -x -l root -p calvin
> -c 'admin1->' -o reboot
> Connection timed out
>
> # fence_drac5 -a 192.168.1.100 --power-timeout 30 -v -x -l root -p
> calvin -c 'admin1->' -o reboot
> root at 192.168.1.100's password:
> /admin1-> racadm serveraction powerstatus
> Server power status: ON
> /admin1->
> /admin1-> racadm serveraction powerdown
> Server power operation successful
> /admin1->Traceback (most recent call last):
> File "/usr/sbin/fence_drac5", line 154, in <module>
> main()
> File "/usr/sbin/fence_drac5", line 137, in main
> result = fence_action(conn, options, set_power_status,
> get_power_status, get_list_devices)
> File "/usr/share/fence/fencing.py", line 838, in fence_action
> if wait_power_status(tn, options, get_power_fn) == 0:
> File "/usr/share/fence/fencing.py", line 744, in wait_power_status
> if get_power_fn(tn, options) != options["-o"]:
> File "/usr/sbin/fence_drac5", line 38, in get_power_status
> status = re.compile("(^|: )(ON|OFF|Powering ON|Powering OFF)\s*$",
> re.IGNORECASE | re.MULTILINE).search(conn.before).group(2)
> AttributeError: 'NoneType' object has no attribute 'group'
>
> Even though I pass "-o reboot", it still powers off. It does the same
> even if I don't pass that option.
>
> I added --power-timeout 30 in the latest test to see if that'd help
> but no dice. Doesn't work without it either.
>
> I have tried fence_ipmilan & it works great, but the iDRAC interfaces
> are somewhat exposed & need to use SSH for security reasons, which
> limits me to fence_drac5.
>
> Thanks.
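To make the traceback above easier to read, here is a small self-contained illustration of the failure mode. The regular expression is the one shown in the traceback; the two sample buffers are hypothetical, and this is not the agent's actual I/O code:

import re

# Pattern fence_drac5 uses to pull the power state out of the text that
# arrived before the prompt (copied from the traceback above).
STATUS = re.compile(r"(^|: )(ON|OFF|Powering ON|Powering OFF)\s*$",
                    re.IGNORECASE | re.MULTILINE)

clean = "Server power status: ON\r\n/admin1->"    # line ending present
mangled = "Server power status: ON\r/admin1->"    # hypothetical: EOL never seen

print(STATUS.search(clean).group(2))   # prints "ON"

match = STATUS.search(mangled)         # nothing matches, so match is None ...
print(match)                           # prints None
# ... and None.group(2) raises exactly the
# "AttributeError: 'NoneType' object has no attribute 'group'" shown above.

That is consistent with the end-of-line autodetection problem the linked commit addresses: once the line endings from the DRAC are recognised correctly, the status line matches and group(2) returns the power state.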
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From Maeulen at awp-shop.de Thu Aug 1 11:35:59 2013
From: Maeulen at awp-shop.de (Johannes Mäulen)
Date: Thu, 1 Aug 2013 11:35:59 +0000
Subject: [Linux-cluster] fence_ipmilan
In-Reply-To: <51FA10C6.6090701@redhat.com>
References: <9A757AF2CA7F204A8F2444FFC5C27C30485F536C@Exchange2010.Skynet.local> <51FA10C6.6090701@redhat.com>
Message-ID: <9A757AF2CA7F204A8F2444FFC5C27C30485F58C9@Exchange2010.Skynet.local>

Hi there,

thanks for your reply. This version is in the debian stable repositories, I've installed from there. http://packages.debian.org/search?keywords=fence-agents

Next time I know better.

Kind regards

From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Marek Grac
Sent: Thursday, 1 August 2013 09:40
To: linux-cluster at redhat.com
Subject: Re: [Linux-cluster] fence_ipmilan

On 07/31/2013 03:57 PM, Johannes Mäulen wrote:

Hi there,

I'm trying to set up a cluster and had issues with 'fence_ipmilan' from the package fence-agents.

I'm running debian 7.1 with a 3.2.0-4-amd64 kernel.

'fence_ipmilan -V' gives 'fence_ipmilan 3.1.5'

My Cluster nodes are running on Supermicro Motherboards with IPMI on-board. (To be exact: http://www.supermicro.nl/products/motherboard/Xeon/C202_C204/X9SCA-F.cfm )

I've experienced the following behavior:

fence_ipmilan -a xxx.xxx.xxx.xxx -l USER -p PASS -v -o off; echo $?

Powering off machine @ IPMI:xxx.xxx.xxx.xxx...Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...

ipmilan: Power still on

Failed

1

More or less in the same moment when I got this message the machine went down. So all the commands were working, but not in the expected time.
( Using Supermicro Mainboards with IPMI onboard, http://www.supermicro.nl/products/motherboard/Xeon/C202_C204/X9SCA-F.cfm )

I've played around with available parameters and wasn't able to fix this behavior.

So I went into the source code (fence-agents-3.1.5/fence/agents/ipmilan/ipmilan.c) and had a look at the ipmi_off function.

There was a fixed value of 2 seconds to sleep.

I modified this to use the same parameter as ipmi_on: ipmi->i_power_wait instead of 2, so that I can modify this value and test if it has effect on my problem.

Now when I use the modified version of fence_ipmilan the output looks like:

./fence_ipmilan -a xxx.xxx.xxx.xxx -l USER -p PASS -T 10 -v -o off ; echo $?

Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power off'...
Spawning: '/usr/bin/ipmitool -I lan -H 'xxx.xxx.xxx.xxx' -U 'USER' -P 'PASS' -v chassis power status'...

Done

0

So I think this fixed my problem, and I think it might help other users experiencing the same issues.

Kind regards

Thanks for reporting, I'm quite surprised that you are using such an old version. This issue was fixed in 3.1.8 (26 Mar 2012).

m,

-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6310 bytes Desc: not available URL:

From churnd at gmail.com Thu Aug 1 12:46:25 2013
From: churnd at gmail.com (ch urnd)
Date: Thu, 1 Aug 2013 08:46:25 -0400
Subject: [Linux-cluster] fence_drac5 timeouts
In-Reply-To: <51FA115E.1050602@redhat.com>
References: <51FA115E.1050602@redhat.com>
Message-ID:

$ fence_drac5 -V
3.1.5 (built Fri Feb 22 06:44:39 UTC 2013)
Copyright (C) Red Hat, Inc. 2004-2010 All rights reserved.

It's the version that came in the CentOS repos.

How do I get the fixed version?

On Thu, Aug 1, 2013 at 3:42 AM, Marek Grac wrote:
> On 07/26/2013 04:29 AM, ch urnd wrote:
> > I'm trying to get fence_drac5 working on a cluster I'm setting up of two
> Dell R410's. The primary issue I'm seeing is timeouts. The fence does
> seem to work as the other node will get shut down, but the script always
> exits 1.
>
> Which version do you use, please?
>
> This looks very likely to be the bug resolved in
> https://git.fedorahosted.org/cgit/fence-agents.git/commit/?id=4bd62484e17cc63b27a103c744ec11fb00610b48
> where autodetect of EOL was not working properly on DRAC devices when using
> ssh.
> > m, > > > Here's the output: > > # fence_drac5 -a 192.168.1.100 --power-timeout 30 -x -l root -p calvin > -c 'admin1->' -o reboot > Connection timed out > > # fence_drac5 -a 192.168.1.100 --power-timeout 30 -v -x -l root -p > calvin -c 'admin1->' -o reboot > root at 192.168.1.100's password: > /admin1-> racadm serveraction powerstatus > Server power status: ON > /admin1-> > /admin1-> racadm serveraction powerdown > Server power operation successful > /admin1->Traceback (most recent call last): > File "/usr/sbin/fence_drac5", line 154, in > main() > File "/usr/sbin/fence_drac5", line 137, in main > result = fence_action(conn, options, set_power_status, > get_power_status, get_list_devices) > File "/usr/share/fence/fencing.py", line 838, in fence_action > if wait_power_status(tn, options, get_power_fn) == 0: > File "/usr/share/fence/fencing.py", line 744, in wait_power_status > if get_power_fn(tn, options) != options["-o"]: > File "/usr/sbin/fence_drac5", line 38, in get_power_status > status = re.compile("(^|: )(ON|OFF|Powering ON|Powering OFF)\s*$", > re.IGNORECASE | re.MULTILINE).search(conn.before).group(2) > AttributeError: 'NoneType' object has no attribute 'group' > > > > Even though I pass "-o reboot", it still powers off. It does the same > even if I don't pass that option. > > I added --power-timeout 30 in the latest test to see if that'd help but > no dice. Doesn't work without it either. > > I have tried fence_ipmilan & it works great, but the iDRAC interfaces > are somewhat exposed & need to use SSH for security reasons, which limits > me to fence_drac5. > > Thanks. > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From goutam.baul at rp-sg.in Fri Aug 2 12:34:41 2013 From: goutam.baul at rp-sg.in (Goutam Baul) Date: Fri, 2 Aug 2013 18:04:41 +0530 Subject: [Linux-cluster] Red Hat Cluster Suit and DRBD Message-ID: <00a901ce8f7c$a98119d0$fc834d70$@rp-sg.in> Dear List, We have deployed mailing solution over a two node cluster using Red Hat Cluster Suit. The OS is RHEL 6.1 (64 bit) at our data center. We are having an identical setup at our DR site. We need to replicate the mail store and the mail queue to the DR site. For this we are planning to implement DRBD 8.4.3 (community version). We have created the DRBD related RPMs using the rpmbuild command and this has generated the following packages: drbd-heartbeat-8.4.3-2.el6.x86_64 drbd-xen-8.4.3-2.el6.x86_64 drbd-km-debuginfo-8.4.3-2.el6.x86_64 drbd-udev-8.4.3-2.el6.x86_64 drbd-8.4.3-2.el6.x86_64 drbd-pacemaker-8.4.3-2.el6.x86_64 drbd-debuginfo-8.4.3-2.el6.x86_64 drbd-utils-8.4.3-2.el6.x86_64 drbd-km-2.6.32_131.0.15.el6.x86_64-8.4.3-2.el6.x86_64 drbd-bash-completion-8.4.3-2.el6.x86_64 We are finding that the packages drbd-heartbeat-8.4.3-2.el6.x86_64 and drbd-pacemaker-8.4.3-2.el6.x86_64 are "MUST TO INSTALL" for the installation of the package drbd-8.4.3-2.el6.x86_64. Our question is whether these packages would interfere in any way with the Red Hat Cluster Suit. Will anyone kindly clear our doubts please? With regards, Goutam -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From linuxtovishesh at gmail.com Fri Aug 2 14:21:06 2013
From: linuxtovishesh at gmail.com (Vishesh kumar)
Date: Fri, 2 Aug 2013 07:21:06 -0700
Subject: [Linux-cluster] Red Hat Cluster Suit and DRBD
In-Reply-To: <00a901ce8f7c$a98119d0$fc834d70$@rp-sg.in>
References: <00a901ce8f7c$a98119d0$fc834d70$@rp-sg.in>
Message-ID:

No, these packages will not interfere with RHCS.

Thanks

On Fri, Aug 2, 2013 at 5:34 AM, Goutam Baul wrote:
> Dear List,
>
> We have deployed mailing solution over a two node cluster using Red Hat
> Cluster Suit. The OS is RHEL 6.1 (64 bit) at our data center. We are having
> an identical setup at our DR site. We need to replicate the mail store and
> the mail queue to the DR site. For this we are planning to implement DRBD
> 8.4.3 (community version). We have created the DRBD related RPMs using the
> rpmbuild command and this has generated the following packages:
>
> drbd-heartbeat-8.4.3-2.el6.x86_64
> drbd-xen-8.4.3-2.el6.x86_64
> drbd-km-debuginfo-8.4.3-2.el6.x86_64
> drbd-udev-8.4.3-2.el6.x86_64
> drbd-8.4.3-2.el6.x86_64
> drbd-pacemaker-8.4.3-2.el6.x86_64
> drbd-debuginfo-8.4.3-2.el6.x86_64
> drbd-utils-8.4.3-2.el6.x86_64
> drbd-km-2.6.32_131.0.15.el6.x86_64-8.4.3-2.el6.x86_64
> drbd-bash-completion-8.4.3-2.el6.x86_64
>
> We are finding that the packages drbd-heartbeat-8.4.3-2.el6.x86_64 and
> drbd-pacemaker-8.4.3-2.el6.x86_64 are "MUST TO INSTALL" for the
> installation of the package drbd-8.4.3-2.el6.x86_64.
>
> Our question is whether these packages would interfere in any way with the
> Red Hat Cluster Suit. Will anyone kindly clear our doubts please?
>
> With regards,
>
> Goutam
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

--
http://linuxmantra.com
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From Colin.Simpson at iongeo.com Fri Aug 2 15:16:49 2013
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Fri, 2 Aug 2013 15:16:49 +0000
Subject: [Linux-cluster] Red Hat Cluster Suit and DRBD
In-Reply-To: References: <00a901ce8f7c$a98119d0$fc834d70$@rp-sg.in>
Message-ID: <1375456609.24061.17.camel@bhac.iouk.ioroot.tld>

RH do support the use of DRBD with RH Cluster Suite, I guess in limited circumstances (and not all the packages you see).

But you have to have a contract with Linbit to support the DRBD side so RH can escalate to them if issues are on their side and not the Red Hat side.

There was a long RFE Bugzilla on getting DRBD into RHEL 6.

https://bugzilla.redhat.com/show_bug.cgi?id=585309

, plus a solutions article:

https://access.redhat.com/site/solutions/32085

Thanks

Colin

On Fri, 2013-08-02 at 07:21 -0700, Vishesh kumar wrote:
> No, these packages will not interfere with RHCS.
>
> Thanks
>
> On Fri, Aug 2, 2013 at 5:34 AM, Goutam Baul
> wrote:
> Dear List,
>
> We have deployed mailing solution over a two node cluster
> using Red Hat Cluster Suit. The OS is RHEL 6.1 (64 bit) at our
> data center. We are having an identical setup at our DR site.
> We need to replicate the mail store and the mail queue to the
> DR site. For this we are planning to implement DRBD 8.4.3
> (community version).
We have created the DRBD related RPMs > using the rpmbuild command and this has generated the > following packages: > > > > drbd-heartbeat-8.4.3-2.el6.x86_64 > > drbd-xen-8.4.3-2.el6.x86_64 > > drbd-km-debuginfo-8.4.3-2.el6.x86_64 > > drbd-udev-8.4.3-2.el6.x86_64 > > drbd-8.4.3-2.el6.x86_64 > > drbd-pacemaker-8.4.3-2.el6.x86_64 > > drbd-debuginfo-8.4.3-2.el6.x86_64 > > drbd-utils-8.4.3-2.el6.x86_64 > > drbd-km-2.6.32_131.0.15.el6.x86_64-8.4.3-2.el6.x86_64 > > drbd-bash-completion-8.4.3-2.el6.x86_64 > > > > We are finding that the packages > drbd-heartbeat-8.4.3-2.el6.x86_64 and > drbd-pacemaker-8.4.3-2.el6.x86_64 are ?MUST TO INSTALL? for > the installation of the package drbd-8.4.3-2.el6.x86_64. > > > > Our question is whether these packages would interfere in any > way with the Red Hat Cluster Suit. Will anyone kindly clear > our doubts please? > > > > With regards, > > > > Goutam > > > > > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > http://linuxmantra.com ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. From lists at alteeve.ca Fri Aug 2 15:37:37 2013 From: lists at alteeve.ca (Digimer) Date: Fri, 02 Aug 2013 11:37:37 -0400 Subject: [Linux-cluster] Red Hat Cluster Suit and DRBD In-Reply-To: <1375456609.24061.17.camel@bhac.iouk.ioroot.tld> References: <00a901ce8f7c$a98119d0$fc834d70$@rp-sg.in> <1375456609.24061.17.camel@bhac.iouk.ioroot.tld> Message-ID: <51FBD241.6090607@alteeve.ca> On 02/08/13 11:16, Colin Simpson wrote: > RH do support the use of DRBD with RH Cluster Suite, I guess in limited > circumstances (and not all the packages you see). > > But you have to have a contract with Linbit to support the DRBD side so > RH can escalate to them if issues are on their side and not the Red Hat > Side. > > There was a long RFE Bugzilla on getting DRBD into RHEL 6. > > https://bugzilla.redhat.com/show_bug.cgi?id=585309 > > , plus a solutions article: > > https://access.redhat.com/site/solutions/32085 > > Thanks > > Colin To expand a little on Colin's answer; Red Hat will support an environment with DRBD installed, but they do not support DRBD itself. If there is an issue that is identified as a DRBD issue, they will forward you on to Linbit for support (different support contract needed). -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From topumirza at gmail.com Sun Aug 4 12:31:28 2013 From: topumirza at gmail.com (topu mirza) Date: Sun, 4 Aug 2013 18:31:28 +0600 Subject: [Linux-cluster] Add. 
Sense: Logical unit not ready, manual intervention required Message-ID: Dear All some time automatically switch resources one node to another node Error Message Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e 7f fc 20 00 00 08 00 Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready [current] Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical unit not ready, manual intervention required Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e 7f fc 28 00 00 10 00 Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready [current] Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical unit not ready, manual intervention required Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e 7f fc 28 00 00 08 00 Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready [current] Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical unit not ready, manual intervention required Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e 7f fc 38 00 00 10 00 Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready [current] Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical unit not ready, manual intervention required Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e 7f fc 38 00 00 08 00 Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready [current] Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical unit not ready, manual intervention required Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e 7f fc 48 00 00 08 00 Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready [current] Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical unit not ready, manual intervention required Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e 7f fc 48 00 00 08 00 Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready -- Thanks, Topu Mirza -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Mon Aug 5 07:26:21 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Mon, 5 Aug 2013 09:26:21 +0200 Subject: [Linux-cluster] Add. Sense: Logical unit not ready, manual intervention required In-Reply-To: References: Message-ID: Hello Is not a cluster problem, are you using san or iscsi? 
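For anyone hitting the same messages, a few generic checks help show whether the LUN itself has gone away, whichever transport it turns out to be. These are illustrative only: sdf is taken from the log above, multipath -ll only applies if dm-multipath is in use, and sg_turs comes from the sg3_utils package.

# Does the kernel still consider the device usable?
cat /sys/block/sdf/device/state      # "running" vs "offline"

# If dm-multipath is in use, are any paths to the LUN still active?
multipath -ll

# Ask the device directly whether it is ready (TEST UNIT READY).
sg_turs -v /dev/sdf

"Logical unit not ready, manual intervention required" is sense data reported by the storage target itself, so the answer usually lies on the array or iSCSI side rather than in the cluster software.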
2013/8/4 topu mirza > > Dear All > some time automatically switch resources one node to another node > > > Error Message > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e > 7f fc 20 00 00 08 00 > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready > [current] > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical > unit not ready, manual intervention required > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e > 7f fc 28 00 00 10 00 > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready > [current] > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical > unit not ready, manual intervention required > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e > 7f fc 28 00 00 08 00 > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready > [current] > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical > unit not ready, manual intervention required > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e > 7f fc 38 00 00 10 00 > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready > [current] > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical > unit not ready, manual intervention required > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e > 7f fc 38 00 00 08 00 > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready > [current] > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical > unit not ready, manual intervention required > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e > 7f fc 48 00 00 08 00 > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Sense Key : Not Ready > [current] > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Add. Sense: Logical > unit not ready, manual intervention required > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] CDB: Read(10): 28 00 3e > 7f fc 48 00 00 08 00 > Jul 26 00:50:00 rxdb2 kernel: sd 2:0:0:157: [sdf] Device not ready > -- > Thanks, > Topu Mirza > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From mgrac at redhat.com Mon Aug 5 08:12:41 2013
From: mgrac at redhat.com (Marek Grac)
Date: Mon, 05 Aug 2013 10:12:41 +0200
Subject: [Linux-cluster] fence_drac5 timeouts
In-Reply-To: References: <51FA115E.1050602@redhat.com>
Message-ID: <51FF5E79.6060702@redhat.com>

On 08/01/2013 02:46 PM, ch urnd wrote:
> $ fence_drac5 -V
> 3.1.5 (built Fri Feb 22 06:44:39 UTC 2013)
> Copyright (C) Red Hat, Inc. 2004-2010 All rights reserved.
>
> It's the version that came in the CentOS repos.
>
> How do I get the fixed version?
>

Feel free to use the version from Fedora.

m,

From jfriesse at redhat.com Mon Aug 5 11:05:01 2013
From: jfriesse at redhat.com (Jan Friesse)
Date: Mon, 05 Aug 2013 13:05:01 +0200
Subject: [Linux-cluster] corosync and token, token_retransmit, token_retransmit_before_loss_const confusion
In-Reply-To: <51F8731E.10008@jonesmail.me>
References: <51F8731E.10008@jonesmail.me>
Message-ID: <51FF86DD.2040007@redhat.com>

Russell,

Russell Jones wrote:
> Hi all,
>
> I am trying to understand how the corosync token, token_retransmit, and
> token_retransmits_before_loss_const variables all tie in together.
>

Definitely look at the corosync.conf man page.

Summary:
token: How long to wait until a token is received. When not received, start forming a new cluster.

token_retransmit is automatically computed from token_retransmits_before_loss_const: It's used for making membership more stable. If a token is not received in the given time, the previous token is retransmitted. So if a token was lost on the line (and because of UDP that's possible), it may be retransmitted. This value is SMALLER than token (usually 1/4 of token), so it means 4 tokens are sent before the node tries to recreate membership.

Generally, don't modify token_retransmit and token_retransmits_before_loss_const. Just modify token if you have big latency. Some setups (very rarely) also need to modify send_join and join.
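As a concrete illustration of that advice (the value is an example only, not a recommendation), raising only the token timeout in corosync.conf looks like this; on a cman-based RHCS cluster the same knob is usually exposed as <totem token="10000"/> in cluster.conf instead:

totem {
        version: 2
        # Raise only the token timeout (milliseconds). token_retransmit and
        # the retransmit count are then derived automatically, so they
        # normally should not be set by hand.
        token: 10000
}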
> I have a standard RHCS v3 cluster set up and running. The token timeout
> is set to 10000. When testing it seems to detect failed members pretty
> consistently within 10 seconds. What I am not understanding is *when* a
> node is declared dead, and a fence call is actually made. The man pages
> show that the cluster is reconfigured when the "token" time is reached,
> and also when token_retransmits_before_loss_const is reached. This is
> confusing :-)

As I said, the formula is token/token_retransmits_before_loss_const = token_retransmit. So just set token if you need something special. If you set token_retransmit incorrectly, it may take precedence or token may take precedence (whichever is smaller).

> Which one is it that will reform the cluster? Both? When does one take
> precedence over the other?
>

Both. The smaller one.

> Thanks!
>

Regards,
Honza

From russell at jonesmail.me Mon Aug 5 23:20:02 2013
From: russell at jonesmail.me (Russell Jones)
Date: Mon, 05 Aug 2013 18:20:02 -0500
Subject: [Linux-cluster] corosync and token, token_retransmit, token_retransmit_before_loss_const confusion
In-Reply-To: <51FF86DD.2040007@redhat.com>
References: <51F8731E.10008@jonesmail.me> <51FF86DD.2040007@redhat.com>
Message-ID: <52003322.8080809@jonesmail.me>

That was very helpful, thank you!

On 8/5/2013 6:05 AM, Jan Friesse wrote:
> Russell,
>
> Russell Jones wrote:
>> Hi all,
>>
>> I am trying to understand how the corosync token, token_retransmit, and
>> token_retransmits_before_loss_const variables all tie in together.
>>
> Definitely look at the corosync.conf man page.
>
> Summary:
> token: How long to wait until a token is received. When not received, start
> forming a new cluster.
>
> token_retransmit is automatically computed from
> token_retransmits_before_loss_const: It's used for making membership
> more stable. If a token is not received in the given time, the previous
> token is retransmitted. So if a token was lost on the line (and because
> of UDP that's possible), it may be retransmitted. This value is SMALLER
> than token (usually 1/4 of token), so it means 4 tokens are sent before
> the node tries to recreate membership.
>
> Generally, don't modify token_retransmit and
> token_retransmits_before_loss_const. Just modify token if you have big
> latency. Some setups (very rarely) also need to modify send_join and join.
>
>
>> I have a standard RHCS v3 cluster set up and running. The token timeout
>> is set to 10000. When testing it seems to detect failed members pretty
>> consistently within 10 seconds. What I am not understanding is *when* a
>> node is declared dead, and a fence call is actually made. The man pages
>> show that the cluster is reconfigured when the "token" time is reached,
>> and also when token_retransmits_before_loss_const is reached. This is
>> confusing :-)
> As I said, the formula is token/token_retransmits_before_loss_const =
> token_retransmit. So just set token if you need something special. If
> you set token_retransmit incorrectly, it may take precedence or
> token may take precedence (whichever is smaller).
>>
>> Which one is it that will reform the cluster? Both? When does one take
>> precedence over the other?
>>
> Both. The smaller one.
>
>> Thanks!
>>
> Regards,
> Honza
>

From sdake at redhat.com Tue Aug 6 18:36:50 2013
From: sdake at redhat.com (Steven Dake)
Date: Tue, 06 Aug 2013 11:36:50 -0700
Subject: [Linux-cluster] corosync and token, token_retransmit, token_retransmit_before_loss_const confusion
In-Reply-To: <51F8731E.10008@jonesmail.me>
References: <51F8731E.10008@jonesmail.me>
Message-ID: <52014242.9090303@redhat.com>

On 07/30/2013 07:14 PM, Russell Jones wrote:
> Hi all,
>
> I am trying to understand how the corosync token, token_retransmit, and
> token_retransmits_before_loss_const variables all tie in together.
>
> I have a standard RHCS v3 cluster set up and running. The token
> timeout is set to 10000. When testing it seems to detect failed
> members pretty consistently within 10 seconds. What I am not
> understanding is *when* a node is declared dead, and a fence call is
> actually made. The man pages show that the cluster is reconfigured
> when the "token" time is reached, and also when
> token_retransmits_before_loss_const is reached. This is confusing :-)
>

I agree, after reading the man page, it appears a bit confusing. Better wording for token_retransmits_before_loss_const would be:

token_retransmits_before_loss_const
This value identifies how many token retransmits should be attempted. If no token is received by the next processor in the ring before token expires, a new configuration will be formed. If this value is set, retransmit and hold will be automatically calculated from retransmits_before_loss and token. The default is 4 retransmissions.

I've submitted an upstream manual page change.

> Which one is it that will reform the cluster? Both? When does one take
> precedence over the other?
>
> Thanks!
>

From dc12078 at gmail.com Wed Aug 7 14:23:51 2013
From: dc12078 at gmail.com (D C)
Date: Wed, 7 Aug 2013 10:23:51 -0400
Subject: [Linux-cluster] DRBD on RHEL6 Cluster?
Message-ID:

I'm trying to get drbd set up on a new Centos6 cluster.
Everything seems to be OK in my cluster.conf, except whenever I add the drbd resource, it stops working. I also noticed I don't see anything in /usr/share/cluster/ for drbd. Am I missing a package maybe?

ccs_config_validate fails with:

[root at e-clust-01 cluster]# ccs_config_validate
Relax-NG validity error : Extra element rm in interleave
tempfile:20: element rm: Relax-NG validity error : Element cluster failed to validate content
Configuration fails to validate

and when I use rg_test it always skips over the drbd resource, and anything nested inside it.

cluster.conf: