From mkparam at gmail.com Sat Sep 1 04:23:25 2012
From: mkparam at gmail.com (PARAM KRISH)
Date: Sat, 1 Sep 2012 09:53:25 +0530
Subject: [Linux-cluster] Services getting stuck on node
In-Reply-To:
References:
Message-ID:

Hi

I just started using Red Hat Cluster two weeks ago, so I don't claim to be
an expert. Looking at this error, I would recommend checking
/var/log/cluster/fenced.log, and also trying commands like "fence_tool ls"
and "fence_tool dump" and checking their output for any errors.

Alternatively, if you have time to investigate, run "service rgmanager stop",
make sure rgmanager is really not running, then start it in the foreground
with "rgmanager -f" and watch what it reports while you reproduce the same
scenario.

Other than that, your /var/log/messages and /var/log/cluster/*.log files
should tell you something about what is going on.
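Something along these lines is usually enough for a first look (all stock
RHEL 6 cluster commands; nothing here is specific to your cluster):

  # Check the fence domain and fenced's view of things:
  fence_tool ls
  fence_tool dump
  tail -n 100 /var/log/cluster/fenced.log

  # Then stop rgmanager and run it in the foreground, so its output goes
  # straight to your terminal while you reproduce the failover:
  service rgmanager stop
  rgmanager -f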
Param

On Sat, Sep 1, 2012 at 4:03 AM, Colin Simpson wrote:
> Hi
>
> I had a strange issue this afternoon. One of my cluster nodes died
> (possible hw fault or driver issue). But the other node failed to take a
> number of its services (2 node cluster), when it was successfully fenced.
>
> The clustat indicated that the services were still on the original node
> (started) but the top lines correctly stated that the node was "offline".
> The rgmanager log says for this event:
>
> Aug 31 17:19:30 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:30 rgmanager [ip] Local ping to 10.10.1.45 succeeded
> Aug 31 17:19:37 rgmanager State change: bld1uxn1i DOWN
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.46, Level 10
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.45, Level 0
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.33, Level 0
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.46 present on bond0
> Aug 31 17:19:49 rgmanager [ip] Checking 10.10.1.43, Level 0
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.45 present on bond0
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.33 present on bond0
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager [ip] 10.10.1.43 present on bond0
> Aug 31 17:19:49 rgmanager Taking over service service:nfsdprj from down member bld1uxn1i
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager #47: Failed changing service status
> Aug 31 17:19:49 rgmanager Taking over service service:httpd from down member bld1uxn1i
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager [ip] Link for bond0: Detected
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager #47: Failed changing service status
> Aug 31 17:19:49 rgmanager [ip] Local ping to 10.10.1.46 succeeded
> Aug 31 17:19:49 rgmanager [ip] Link detected on bond0
> Aug 31 17:19:49 rgmanager #13: Service service:nfsdprj failed to stop cleanly
> Aug 31 17:19:49 rgmanager #13: Service service:httpd failed to stop cleanly
>
> A couple of other services did successfully switch after this.
>
> I have seen this a few times (randomly) on various clusters since around
> the time of upgrading to 6.3 from 6.2 (services refusing to cleanly stop on
> a node). It's hard to reproduce, and when down we usually just want a
> restart as fast as possible (thereby limiting time for debugging).
>
> How can I see what is causing the "#47: Failed changing service status", or
> is there any more debugging we can turn on in rgmanager to help with this?
>
> Or better still, has anyone else seen anything like this?
>
> Thanks
>
> Colin

From emi2fast at gmail.com Sat Sep 1 10:04:39 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Sat, 1 Sep 2012 12:04:39 +0200
Subject: [Linux-cluster] Services getting stuck on node
In-Reply-To:
References:
Message-ID:

Hello Colin

Maybe your services don't switch because this happened:
======================================================
Aug 31 17:19:49 rgmanager #13: Service service:nfsdprj failed to stop cleanly
Aug 31 17:19:49 rgmanager #13: Service service:httpd failed to stop cleanly
======================================================

To debug the service stop, you can use:

  rg_test test /etc/cluster/cluster.conf stop service
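A concrete run, using the service names from the log above. Note that
rg_test really executes the resource agent operations (it is not a dry run),
so only do this where the service is not live on another node:

  rg_test test /etc/cluster/cluster.conf stop service nfsdprj
  rg_test test /etc/cluster/cluster.conf start service nfsdprj

  # rg_test can also print the resource tree it builds from cluster.conf,
  # which helps spot missing or misordered resources:
  rg_test test /etc/cluster/cluster.conf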
It would also be easier to help if you can show us your cluster.conf.

Thanks :-)

2012/9/1 Colin Simpson
> Hi
>
> I had a strange issue this afternoon. One of my cluster nodes died
> (possible hw fault or driver issue). But the other node failed to take a
> number of its services (2 node cluster), when it was successfully fenced.
>
> [...]

--
esta es mi vida e me la vivo hasta que dios quiera

From Colin.Simpson at iongeo.com Sat Sep 1 12:56:47 2012
From: Colin.Simpson at iongeo.com (Colin Simpson)
Date: Sat, 1 Sep 2012 12:56:47 +0000
Subject: [Linux-cluster] Services getting stuck on node
In-Reply-To:
References:
Message-ID:

Thanks for getting back. I'll try debugging the shutdown with that command,
though I think "failed to stop cleanly" is far from clear about what it
actually means.

The node the services were running on had gone (it was fenced), so there was
nothing to stop before starting them on this node.

Thanks

Colin
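For the extra rgmanager debugging asked about earlier, one route (written
from memory, so the attribute names are worth verifying against
cluster.conf(5) on a 6.3 box) is to raise rgmanager's log level through the
logging section of cluster.conf and then follow its own log during the next
failover:

  # Sketch only; verify the exact schema in cluster.conf(5):
  #   <logging>
  #     <logging_daemon name="rgmanager" debug="on"/>
  #   </logging>
  # After bumping config_version in cluster.conf, validate and propagate it:
  ccs_config_validate
  cman_tool version -r
  # Then watch rgmanager's log while reproducing the failed relocation:
  tail -f /var/log/cluster/rgmanager.log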
________________________________
From: linux-cluster-bounces at redhat.com [linux-cluster-bounces at redhat.com]
on behalf of emmanuel segura [emi2fast at gmail.com]
Sent: 01 September 2012 11:04
To: linux clustering
Subject: Re: [Linux-cluster] Services getting stuck on node

Hello Colin

Maybe your services don't switch because this happened:

[...]

________________________________

This email and any files transmitted with it are confidential and are
intended solely for the use of the individual or entity to whom they are
addressed. If you are not the original recipient or the person responsible
for delivering the email to the intended recipient, be advised that you have
received this email in error, and that any use, dissemination, forwarding,
printing, or copying of this email is strictly prohibited. If you received
this email in error, please immediately notify the sender and delete the
original.
From d_joshi84 at yahoo.com Sat Sep 1 15:57:38 2012
From: d_joshi84 at yahoo.com (joshi dhaval)
Date: Sat, 1 Sep 2012 23:57:38 +0800 (SGT)
Subject: [Linux-cluster] Understanding Fencing
In-Reply-To: <1346084054.38090.YahooMailClassic@web190405.mail.sg3.yahoo.com>
Message-ID: <1346515058.79506.YahooMailClassic@web190404.mail.sg3.yahoo.com>

Hello,

I tried to read some documents on fencing, but I am still a bit confused by
the technology. (I don't want to buy any extra hardware just for fencing.)

We are using HP DL 380 G6 and G7 servers in our environment, so the only way
I can see fencing being possible in my environment is HP iLO.

What is a PDU? Do I need to purchase a separate device to enable fencing
using a PDU?

Is IPMI the same as HP iLO?

For the above hardware, what do you suggest are the most reliable fencing
techniques I should use?

Is a crossover cable connection possible just to check heartbeats, the way
VCS has GAB and LLT?

I am planning to configure a 2-node cluster first; once I have confidence I
will move to a 4 or 5 node cluster.

Regards,
Dhaval

From lists at alteeve.ca Sat Sep 1 16:28:18 2012
From: lists at alteeve.ca (Digimer)
Date: Sat, 01 Sep 2012 12:28:18 -0400
Subject: [Linux-cluster] Understanding Fencing
In-Reply-To: <1346515058.79506.YahooMailClassic@web190404.mail.sg3.yahoo.com>
References: <1346515058.79506.YahooMailClassic@web190404.mail.sg3.yahoo.com>
Message-ID: <504237A2.8050409@alteeve.ca>

A side note first, then I will answer in-line. When possible, please start a
new email to a mailing list instead of hitting "reply" on an existing
message and deleting the content. Threading breaks in a lot of people's
email clients when an email isn't new.

On 09/01/2012 11:57 AM, joshi dhaval wrote:
> Hello,
>
> I tried to read some documents on fencing, but I am still a bit confused
> by the technology. (I don't want to buy any extra hardware just for
> fencing.)

Was this one of the things you read?

https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing

> We are using HP DL 380 G6 and G7 servers in our environment, so the only
> way I can see fencing being possible in my environment is HP iLO.

Yes, you can use fence_ilo with that. I have done so myself and cover how to
set it up here:

https://alteeve.ca/w/Configuring_HP_iLO_2_on_EL6

and how to use it as a fence device here:

https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Example_.3Cfencedevice....3E_Tag_For_HP_iLO
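Before wiring iLO into cluster.conf, it is worth checking that each node can
drive the other node's iLO from the command line. A minimal sketch with a
placeholder address and credentials, using the standard -a/-l/-p/-o options
of the fence agents:

  fence_ilo -a 10.0.0.21 -l fenceuser -p secret -o status

Once the device is configured in cluster.conf, fence_node exercises the
whole configured fence method, but be aware it really does fence (power
cycle) the target; the node name here is a placeholder:

  fence_node node2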
> What is a PDU? Do I need to purchase a separate device to enable fencing
> using a PDU?

A PDU (Power Distribution Unit) is, by itself, just another name for a power
bar, though it generally refers to rack-mounted power bars. In fencing,
though, we use a version called a "switched PDU". These are power bars with
a network connection. They allow you to connect remotely and turn each
outlet on and off independently of the other ports. They also offer power
monitoring and so on, but that's outside fencing.

So in fencing, if, for example, the power supply failed, then the server
would power down and take the IPMI or iLO interface with it (see below).
Without any power at all, the IPMI will not reply as it will also have no
power. We know in this case that the node is gone, but the other nodes
don't. All they know is that they can't talk to the node or its IPMI/iLO
interfaces, which could just as well be a network outage leaving the node
alive. In this case, the cluster can call the switched PDUs and ask them to
turn off the outlet(s) feeding the server. When the PDUs say "ok, they're
off", *then* the cluster can safely say "ok, now I know it has to be off"
and can begin recovery.

> Is IPMI the same as HP iLO?

No, but they are similar. I have a short write-up of it here:

https://alteeve.ca/w/IPMI

IPMI is a generic way for a server to offer "out of band" management. That
is just a fancy way of saying "you can check on the state of the server even
when the server is powered off". The piece of hardware inside your server
that provides IPMI is called a "BMC" (Baseboard Management Controller).
Think of it like a little, separate computer sitting on your server's
motherboard. It draws its power from the host, and it can read the host's
sensors (power state, fans, temperatures, etc.), but it is still a totally
separate device.

In fencing, if one node stops responding (say because the OS crashed),
another node in the cluster will call the victim's IPMI interface and say
"please power off the host". The BMC then, effectively, pushes and holds the
power button until the host shuts down. Then the IPMI device tells the
caller that the power off was successful. The cluster then knows the state
of the victim (it is powered off now) so it can safely recover.

As for the difference between IPMI and iLO: most major hardware vendors took
IPMI and added a bunch of features on top of it, then renamed it to
something of their own. So HP called theirs "iLO", IBM called theirs "RSA",
Dell called theirs "DRAC" and so on. These are all very similar to IPMI
(some are similar enough that stock IPMI tools work with them).

> For the above hardware, what do you suggest are the most reliable fencing
> techniques I should use?

I would use 'fence_ilo'.

> Is a crossover cable connection possible just to check heartbeats, the way
> VCS has GAB and LLT?

I don't know VCS or LLT so I can't comment. In RHCS, we use "corosync" for
cluster membership. By default, it uses a multicast group for passing
messages around the cluster and for detecting a node's death. It's similar
to what I think you mean by "heartbeat". It is advised that you use a proper
switch, though I do not believe it is required.
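If you want to see the membership corosync/cman have actually formed on a
RHEL 6 node, the stock status tools give a quick view (the output is
cluster-specific and omitted here):

  cman_tool status        # quorum state, votes, node count
  cman_tool nodes         # the member list as cman sees it
  corosync-cfgtool -s     # totem ring status straight from corosync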
> I am planning to configure a 2-node cluster first; once I have confidence
> I will move to a 4 or 5 node cluster.

Then definitely use a proper switch, not back to back.

> Regards,
> Dhaval

A final comment: in clustering, a failed fence action will leave the cluster
in a state where it does not know the condition of a member. Given the
dangers of making an assumption, the cluster would rather block (hang) than
proceed in a way that could cause damage. This is why fencing is so
critical: it restores the cluster to a known state after a fault.

If you use only iLO for fencing (and many people do only use IPMI, iLO,
etc.), then you will be fine most of the time. For me personally, this is
not good enough. If for any reason the other node(s) can't reach the IPMI or
iLO interface, the fence action will fail and the cluster will hang. With a
switched PDU, you have a backup fence device that protects you against this
by providing an alternate method of confirming the node's state. Thus, by
adding a switched PDU to your cluster, you remove another single point of
failure.

digimer

--
Digimer
Papers and Projects: https://alteeve.ca
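To make the switched-PDU point concrete: the PDU simply becomes a second
fence method, and like iLO it can be sanity-checked by hand first. A hedged
sketch using the generic fence_apc agent; the address, credentials and plug
number are placeholders:

  fence_apc -a 10.0.0.31 -l apc -p secret -n 8 -o status   # plug 8 feeds node2
  fence_apc -a 10.0.0.31 -l apc -p secret -n 8 -o reboot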
From kveri at kveri.com Sun Sep 2 00:11:31 2012
From: kveri at kveri.com (Kveri)
Date: Sun, 2 Sep 2012 02:11:31 +0200
Subject: [Linux-cluster] GFS2
Message-ID: <59266966-EEC3-48E0-9703-1F9A3B9FB595@kveri.com>

Hello,

we're using gfs2 on drbd, and we created the cluster in an incomplete state
(only 1 node). When doing

  dd if=/dev/zero of=/gfs_partition/file

we get filesystem freezes every 1-2 minutes for 10-20 seconds. I mean every
filesystem on that machine freezes; doing ls /etc hangs in D state for 10-20
seconds. Sometimes this hang lasts for more than 2 minutes and a hung task
message gets logged in dmesg. iotop shows the gfs2_logd and flush-XXX:X
kernel processes taking 99% of IO resources.

GFS is mounted with rw,noatime,nodiratime,hostdata=jid=0 options.

gettune options:
quota_warn_period = 10
quota_quantum = 60
max_readahead = 262144
complain_secs = 10
statfs_slow = 0
quota_simul_sync = 64
statfs_quantum = 30
quota_scale = 1.0000 (1, 1)
new_files_jdata = 0

Server is kernel 3.2.0-25 64bit.

What could be the problem?

Thank you.
Martin

From kveri at kveri.com Sun Sep 2 15:02:01 2012
From: kveri at kveri.com (Kveri)
Date: Sun, 2 Sep 2012 17:02:01 +0200
Subject: [Linux-cluster] gfs2_logd eating 99% io, random filesystem freezes
Message-ID: <1B898DB9-53D1-4982-8954-0F7DB2C2387F@kveri.com>

Hello,

we're using gfs2 on drbd, and we created the cluster in an incomplete state
(only 1 node). When doing

  dd if=/dev/zero of=/gfs_partition/file

we get filesystem freezes every 1-2 minutes for 10-20 seconds. I mean every
filesystem on that machine freezes; doing ls /etc hangs in D state for 10-20
seconds. Sometimes this hang lasts for more than 2 minutes and a hung task
message gets logged in dmesg. iotop shows the gfs2_logd and flush-XXX:X
kernel processes taking 99% of IO resources.

GFS is mounted with rw,noatime,nodiratime,hostdata=jid=0 options.

gettune options:
quota_warn_period = 10
quota_quantum = 60
max_readahead = 262144
complain_secs = 10
statfs_slow = 0
quota_simul_sync = 64
statfs_quantum = 30
quota_scale = 1.0000 (1, 1)
new_files_jdata = 0

Server is kernel 3.2.0-25 64bit.

Dmesg error (we did echo 1 > /proc/sys/kernel/hung_task_timeout_secs, but we
also tested it with 120 secs):

[ 818.882147] INFO: task ls:3531 blocked for more than 1 seconds.
[ 818.882479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 818.882929] ls D ffff8803639364e0 0 3531 3269 0x00000000
[ 818.882932] ffff88033c789c58 0000000000000082 ffff88033c789be8 ffff8801e9c33780
[ 818.882936] ffff88033c789fd8 ffff88033c789fd8 ffff88033c789fd8 0000000000013780
[ 818.882940] ffff8801e5a72e00 ffff8801e5b32e00 0000000000000286 ffff88033c789ce0
[ 818.882943] Call Trace:
[ 818.882950] [] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
[ 818.882953] [] schedule+0x3f/0x60
[ 818.882959] [] gfs2_glock_holder_wait+0xe/0x20 [gfs2]
[ 818.882963] [] __wait_on_bit+0x5f/0x90
[ 818.882965] [] ? _raw_spin_lock+0xe/0x20
[ 818.882972] [] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
[ 818.882975] [] out_of_line_wait_on_bit+0x7c/0x90
[ 818.882978] [] ? autoremove_wake_function+0x40/0x40
[ 818.882985] [] gfs2_glock_wait+0x47/0x90 [gfs2]
[ 818.882992] [] gfs2_glock_nq+0x318/0x440 [gfs2]
[ 818.882998] [] ? kmem_cache_free+0x2f/0x110
[ 818.883007] [] gfs2_getattr+0xbb/0xf0 [gfs2]
[ 818.883015] [] ? gfs2_getattr+0xb2/0xf0 [gfs2]
[ 818.883020] [] vfs_getattr+0x4e/0x80
[ 818.883023] [] vfs_fstatat+0x4e/0x70
[ 818.883026] [] vfs_lstat+0x1e/0x20
[ 818.883029] [] sys_newlstat+0x1a/0x40
[ 818.883033] [] ? mntput+0x1f/0x30
[ 818.883036] [] ? path_put+0x22/0x30
[ 818.883039] [] ? sys_lgetxattr+0x5b/0x70
[ 818.883042] [] system_call_fastpath+0x16/0x1b

What could be the problem?

Thank you.
Martin

From td3201 at gmail.com Tue Sep 4 15:01:25 2012
From: td3201 at gmail.com (Terry)
Date: Tue, 4 Sep 2012 10:01:25 -0500
Subject: [Linux-cluster] NFS locks and failing over services
Message-ID:

Hello,

I am running an NFS cluster with 3 exports distributed across 2 nodes. When
I try to relocate an NFS export, it fails. I then have to disable and enable
it on the other node. Does anyone have any tricks to get around this issue?
I am sure it is due to file locking.

Here's the config:

[the cluster.conf XML did not survive in the archive; only blanked-out
markup remains]
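For the relocation itself, the usual juggling is done with clustat and
clusvcadm. A sketch only; the service and node names below are placeholders,
since the real cluster.conf is missing from the archive:

  clustat                                     # where rgmanager thinks things are
  clusvcadm -r service:nfs_export1 -m node2   # attempt a clean relocate
  # The disable/enable fallback described above:
  clusvcadm -d service:nfs_export1
  clusvcadm -e service:nfs_export1 -m node2

When a relocate dies in the stop phase, the rgmanager log normally names the
resource that refused to stop; with NFS exports that is often the filesystem
failing to unmount underneath a busy nfsd, which is why the fs resource's
force_unmount option is usually worth a look, depending on how the service
is actually built.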