From rohara at redhat.com Mon Jul 2 14:17:41 2012 From: rohara at redhat.com (Ryan O'Hara) Date: Mon, 02 Jul 2012 09:17:41 -0500 Subject: [Linux-cluster] Options for fencing at the node level . In-Reply-To: <1340937126.95735.YahooMailNeo@web193003.mail.sg3.yahoo.com> References: <1340937126.95735.YahooMailNeo@web193003.mail.sg3.yahoo.com> Message-ID: <4FF1AD85.5060500@redhat.com> On 06/28/2012 09:32 PM, Zama Ques wrote: > Hi All , > > I need to setup HA clustering using redhat cluster suite on two nodes , primary concern being high availability . Before trying it on production , I am trying to configure the setup on two desktop machines . For storage , I am creating a partition and sharing the partition as a iscsi target on a third machine . Would like to know what are the options for fencing available at the node level . I tried going through the conga interface for creating a > shared fence device , I could see one option is using GNBD . virtual machine fencing is there in the list but that is for xen based HA > cluster . scsi fencing is there , but as far as what I understand it does not support iscsi target as of now. Manual fencing is also there , and I am planning to use that , but would like to know is there any other options are available for fencing at node level ? SCSI fencing will work with iscsi if the iscsi target is SPC-3 compliant. The target must also support the preempt-and-abort SCSI subcommand. It really depends on what iscsi target you are using. I've used fence_scsi with iscsi a few times and it has worked, but I know that some iscsi targets have problems. Ryan From urgrue at bulbous.org Mon Jul 2 17:08:52 2012 From: urgrue at bulbous.org (urgrue) Date: Mon, 02 Jul 2012 19:08:52 +0200 Subject: [Linux-cluster] CLVM in a 3-node cluster Message-ID: <4FF1D5A4.3060105@bulbous.org> I'm trying to set up a 3-node cluster with clvm. Problem is, one node can't access the storage, and I'm getting: Error locking on node node3: Volume group for uuid not found: whenever I try to activate the LVs on one of the working nodes. This can't be "by design", can it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Mon Jul 2 17:14:10 2012 From: lists at alteeve.ca (Digimer) Date: Mon, 02 Jul 2012 13:14:10 -0400 Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: <4FF1D5A4.3060105@bulbous.org> References: <4FF1D5A4.3060105@bulbous.org> Message-ID: <4FF1D6E2.7010209@alteeve.ca> On 07/02/2012 01:08 PM, urgrue wrote: > I'm trying to set up a 3-node cluster with clvm. Problem is, one node > can't access the storage, and I'm getting: > Error locking on node node3: Volume group for uuid not found: > whenever I try to activate the LVs on one of the working nodes. > > This can't be "by design", can it? Does pvscan show the right device? Are all nodes in the cluster? What does 'cman_tool status' and 'dlm_tool ls' show? -- Digimer Papers and Projects: https://alteeve.com From urgrue at bulbous.org Mon Jul 2 21:39:07 2012 From: urgrue at bulbous.org (urgrue) Date: Mon, 02 Jul 2012 23:39:07 +0200 Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: <4FF1D6E2.7010209@alteeve.ca> References: <4FF1D5A4.3060105@bulbous.org> <4FF1D6E2.7010209@alteeve.ca> Message-ID: <4FF214FB.7000906@bulbous.org> On 2/7/12 19:14, Digimer wrote: > On 07/02/2012 01:08 PM, urgrue wrote: >> I'm trying to set up a 3-node cluster with clvm. 
Problem is, one node >> can't access the storage, and I'm getting: >> Error locking on node node3: Volume group for uuid not found: >> whenever I try to activate the LVs on one of the working nodes. >> >> This can't be "by design", can it? > > Does pvscan show the right device? Are all nodes in the cluster? What > does 'cman_tool status' and 'dlm_tool ls' show? > Sorry, I realize now I was misleading, let me clarify: The third node cannot access the storage, this is by design. I have three datacenters but only two have access to the active storage. The third datacenter only has an async copy, and will only activate (manually) in case of a massive disaster (failure of both the other datacenters). So I deliberately have a failover domain with only node1 and node2. node3's function is to provide quorum, but also be able to be activated (manually is fine) in case of a massive disaster. In other words node3 is part of the cluster, but it can't see the storage during normal operation. Looking at it another way, it's kind of as if we had a 3-node cluster where one node had an HBA failure but is otherwise working. Surely node1 and node2 should be able to continue running the services? So my question is, do I have an error somehwere, or is clvm really actually not able to function without all nodes being active and able to access storage? From emi2fast at gmail.com Mon Jul 2 22:40:10 2012 From: emi2fast at gmail.com (emmanuel segura) Date: Tue, 3 Jul 2012 00:40:10 +0200 Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: <4FF214FB.7000906@bulbous.org> References: <4FF1D5A4.3060105@bulbous.org> <4FF1D6E2.7010209@alteeve.ca> <4FF214FB.7000906@bulbous.org> Message-ID: So my question is, do I have an error somehwere, or is clvm really actually not able to function without all nodes being active and able to access storage? Clvm need to be in a quorate cluster for work & if you use clvm in one node of the cluster i think the should has access to the storage your using the 3node to provide the quorum? esample: if one node of your two primary nodes goes down the it's still quorute, but if two node goes down and you are no using a quorum disk, you lose the quorum state I don't know why you use a node to privide the quorum, if you are use SAN why not use a lun for use as quorum disk All nodes in the cluster should has access to the storag 2012/7/2 urgrue > On 2/7/12 19:14, Digimer wrote: > >> On 07/02/2012 01:08 PM, urgrue wrote: >> >>> I'm trying to set up a 3-node cluster with clvm. Problem is, one node >>> can't access the storage, and I'm getting: >>> Error locking on node node3: Volume group for uuid not found: >>> whenever I try to activate the LVs on one of the working nodes. >>> >>> This can't be "by design", can it? >>> >> >> Does pvscan show the right device? Are all nodes in the cluster? What >> does 'cman_tool status' and 'dlm_tool ls' show? >> >> > Sorry, I realize now I was misleading, let me clarify: > The third node cannot access the storage, this is by design. I have three > datacenters but only two have access to the active storage. The third > datacenter only has an async copy, and will only activate (manually) in > case of a massive disaster (failure of both the other datacenters). > So I deliberately have a failover domain with only node1 and node2. > node3's function is to provide quorum, but also be able to be activated > (manually is fine) in case of a massive disaster. > In other words node3 is part of the cluster, but it can't see the storage > during normal operation. 
> Looking at it another way, it's kind of as if we had a 3-node cluster > where one node had an HBA failure but is otherwise working. Surely node1 > and node2 should be able to continue running the services? > So my question is, do I have an error somehwere, or is clvm really > actually not able to function without all nodes being active and able to > access storage? > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/**mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam at dotsec.com Mon Jul 2 23:17:42 2012 From: sam at dotsec.com (Sam Wilson) Date: Tue, 03 Jul 2012 09:17:42 +1000 Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: References: <4FF1D5A4.3060105@bulbous.org> <4FF1D6E2.7010209@alteeve.ca> <4FF214FB.7000906@bulbous.org> Message-ID: <4FF22C16.5010300@dotsec.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 As I understand it, you could have the node as a quorum only node by running only corosync on it. However for DR it seems to me like you would actually want the storage replicated to Node3. So it seems logical to me that clvmd would have to be running on it. Cheers, Sam -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iF4EAREIAAYFAk/yLBUACgkQFdt86iEfl/e3wgD9FMJl355ta20pJfdSvfSDuJDU DK7jt6idjCAg1LNpFYIA/RswrmTCxdzWXETw1ny4WBOxKo5tDwYmKUBKq5UOdcuU =HNtS -----END PGP SIGNATURE----- From fdinitto at redhat.com Tue Jul 3 04:04:08 2012 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 03 Jul 2012 06:04:08 +0200 Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: <4FF214FB.7000906@bulbous.org> References: <4FF1D5A4.3060105@bulbous.org> <4FF1D6E2.7010209@alteeve.ca> <4FF214FB.7000906@bulbous.org> Message-ID: <4FF26F38.3040705@redhat.com> On 07/02/2012 11:39 PM, urgrue wrote: > On 2/7/12 19:14, Digimer wrote: >> On 07/02/2012 01:08 PM, urgrue wrote: >>> I'm trying to set up a 3-node cluster with clvm. Problem is, one node >>> can't access the storage, and I'm getting: >>> Error locking on node node3: Volume group for uuid not found: >>> whenever I try to activate the LVs on one of the working nodes. >>> >>> This can't be "by design", can it? >> >> Does pvscan show the right device? Are all nodes in the cluster? What >> does 'cman_tool status' and 'dlm_tool ls' show? >> > > Sorry, I realize now I was misleading, let me clarify: > The third node cannot access the storage, this is by design. I have > three datacenters but only two have access to the active storage. The > third datacenter only has an async copy, and will only activate > (manually) in case of a massive disaster (failure of both the other > datacenters). > So I deliberately have a failover domain with only node1 and node2. > node3's function is to provide quorum, but also be able to be activated > (manually is fine) in case of a massive disaster. > In other words node3 is part of the cluster, but it can't see the > storage during normal operation. > Looking at it another way, it's kind of as if we had a 3-node cluster > where one node had an HBA failure but is otherwise working. Surely node1 > and node2 should be able to continue running the services? > So my question is, do I have an error somehwere, or is clvm really > actually not able to function without all nodes being active and able to > access storage? CLVM requires a consistent view of the storage from all nodes in the cluster. This is by design. 
A storage failure during operations (aka you start with all nodes able to access the storage and then downgrade) is handle correctly. Fabio From urgrue at bulbous.org Tue Jul 3 11:06:57 2012 From: urgrue at bulbous.org (urgrue) Date: Tue, 03 Jul 2012 14:06:57 +0300 Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: <4FF26F38.3040705@redhat.com> References: <4FF1D5A4.3060105@bulbous.org> <4FF1D6E2.7010209@alteeve.ca> <4FF214FB.7000906@bulbous.org> <4FF26F38.3040705@redhat.com> Message-ID: <1341313617.20197.140661097201337.09C0B36C@webmail.messagingengine.com> On Tue, Jul 3, 2012, at 06:04, Fabio M. Di Nitto wrote: > CLVM requires a consistent view of the storage from all nodes in the > cluster. This is by design. > > A storage failure during operations (aka you start with all nodes able > to access the storage and then downgrade) is handle correctly. Ok, I understand. I find it a little curious though, since I don't see what the risk is in allowing startup as long as the cluster is quorate. Imagine you have a multi-node cluster that suffers a total outage - a wider infrastructure problem or some kind for example - and upon recovery one node is still out of the cluster for whatever reason. It's pretty common in my experience that larger outages result in many smaller resulting issues that take a while to clean-up. From queszama at yahoo.in Thu Jul 5 14:12:11 2012 From: queszama at yahoo.in (Zama Ques) Date: Thu, 5 Jul 2012 22:12:11 +0800 (SGT) Subject: [Linux-cluster] Options for fencing at the node level . In-Reply-To: <4FF1AD85.5060500@redhat.com> References: <1340937126.95735.YahooMailNeo@web193003.mail.sg3.yahoo.com> <4FF1AD85.5060500@redhat.com> Message-ID: <1341497531.6847.YahooMailNeo@web193001.mail.sg3.yahoo.com> ________________________________ From: Ryan O'Hara To: linux-cluster at redhat.com Sent: Monday, 2 July 2012 7:47 PM Subject: Re: [Linux-cluster] Options for fencing at the node level . On 06/28/2012 09:32 PM, Zama Ques wrote: > Hi All , > > I need to setup HA clustering using redhat cluster suite on two nodes , primary concern being high availability . Before trying it on production , I am trying to configure the setup on two desktop machines . For storage , I am creating a partition and sharing the partition as a iscsi target on a third machine . Would like to know what are the options for fencing available at the node level? .? I tried going through the conga interface for creating a > shared fence device , I could see one option is using GNBD . virtual machine fencing is there in the list but that is for xen based HA > cluster . scsi fencing is there , but as far as what I understand it does not support iscsi target as of now. Manual fencing is also there , and I am planning to use that , but would like to? know is there any other options are available for fencing at node level ? ?> SCSI fencing will work with iscsi if the iscsi target is SPC-3 compliant. The target must also support the preempt-and-abort SCSI subcommand. > It really depends on what iscsi target you are using. I've used fence_scsi with iscsi a few times and it has worked, but I know that some iscsi >targets have problems. Was actually trying to do the setup on two desktop nodes before doing it on production . So for that , has thought of using a third node and configure one of the partition on that node as iscsi target? and share it among the cluster nodes. Can we use fence_scsi to fence such linux based iscsi targets ? 
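A quick way to answer that for a particular target is to query it directly from an initiator before configuring anything in the cluster. A minimal sketch, assuming sg3_utils is installed and that /dev/sdb stands in for the shared iSCSI LUN (substitute the real device); if the target implements SPC-3 persistent reservations these should succeed, which is the behaviour fence_scsi depends on:

# Does the LUN advertise SPC-3 persistent reservation capabilities?
sg_persist --in --report-capabilities --device=/dev/sdb

# List any reservation keys currently registered (fence_scsi registers one key per node).
sg_persist --in --read-keys --device=/dev/sdb

If these commands fail or report no capabilities, the target is unlikely to work with fence_scsi no matter how the cluster is configured.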
Thanks Zaman -------------- next part -------------- An HTML attachment was scrubbed... URL: From queszama at yahoo.in Thu Jul 5 14:37:43 2012 From: queszama at yahoo.in (Zama Ques) Date: Thu, 5 Jul 2012 22:37:43 +0800 (SGT) Subject: [Linux-cluster] cman service stucks during booting of cluster node Message-ID: <1341499063.54762.YahooMailNeo@web193002.mail.sg3.yahoo.com> Hi All, I am facing some issues with startup of? cluster nodes after configuring a node two cluster using xen virtualization and redhat cluster suite. The issue is that when i fence any of the cluster nodes using fence_xvm or by using conga interface ,? the cluster host while booting up gets stucked at starting the fencing component of the cman service . The boot process got halts there . Same happens when I reboot the host. But if do chkconfig cman off and start the cman service after the host completely boots , then? cman service start successfully without any delay including the fencing component . So , my understanding is that? there is some dependency for fencing component of cman service which is available after the host boots up . I am using xen fencing and iptables is disabled on both the nodes.? Please provide suggestions/steps how to troubleshoot this. Thanks Zaman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ali.bendriss at gmail.com Tue Jul 10 08:45:17 2012 From: ali.bendriss at gmail.com (Ali Bendriss) Date: Tue, 10 Jul 2012 10:45:17 +0200 Subject: [Linux-cluster] gfs2 quota tools Message-ID: <201207101045.17891.ali.bendriss@gmail.com> Hello, It's look like recent version of GFS2 use the standard linux quota tools, but I've tried the mainstream quota-tools (ver 4.00) without success. Which version sould be used ? thanks -- Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue Jul 10 09:20:34 2012 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 10 Jul 2012 10:20:34 +0100 Subject: [Linux-cluster] gfs2 quota tools In-Reply-To: <201207101045.17891.ali.bendriss@gmail.com> References: <201207101045.17891.ali.bendriss@gmail.com> Message-ID: <1341912034.2717.0.camel@menhir> Hi, On Tue, 2012-07-10 at 10:45 +0200, Ali Bendriss wrote: > Hello, > > It's look like recent version of GFS2 use the standard linux quota > tools, > > but I've tried the mainstream quota-tools (ver 4.00) without success. > > Which version sould be used ? > > thanks > The quota tools should work with GFS2. Can you explain which kernel version you were using and what exactly didn't work? What mount options did you use? Steve. From ali.bendriss at gmail.com Tue Jul 10 10:11:38 2012 From: ali.bendriss at gmail.com (Ali Bendriss) Date: Tue, 10 Jul 2012 12:11:38 +0200 Subject: [Linux-cluster] gfs2 quota tools In-Reply-To: <1341912034.2717.0.camel@menhir> References: <201207101045.17891.ali.bendriss@gmail.com> <1341912034.2717.0.camel@menhir> Message-ID: <201207101211.38846.ali.bendriss@gmail.com> > Hi, > > On Tue, 2012-07-10 at 10:45 +0200, Ali Bendriss wrote: > > Hello, > > > > It's look like recent version of GFS2 use the standard linux quota > > tools, > > > > but I've tried the mainstream quota-tools (ver 4.00) without success. > > > > Which version sould be used ? > > > > thanks > > The quota tools should work with GFS2. Can you explain which kernel > version you were using and what exactly didn't work? What mount options > did you use? > > Steve. 
Sorry for the missing information: I'm running slackware with kernel : 3.4.3 cluster : 3.1.92 gfsutils: 3.1.4 The file system I want to use the quota with is /dev/mapper/shared-desktop on /home/csamba/desktop type gfs2 (rw,noatime,nodiratime,hostdata=jid=0,quota=on) first I was using gfs2_quota, I was able to init and set the quota for users but get command was wrong after (when the limit is reached). in ex: du -h /home/csamba/desktop/abendriss 19M /home/csamba/desktop/abendriss # gfs2_quota get -f /home/csamba/desktop/ -u abendriss -m user PARIS8\abendriss: limit: 20.0 warn: 0.0 value: 40810.8 gfs2_quota init -f /home/csamba/desktop/ -u abendriss -m mismatch: user 3000272: scan = 8, quotafile = 16 mismatch: user 3000208: scan = 8, quotafile = 16 mismatch: user 3000335: scan = 8, quotafile = 16 root at minnie:/# gfs2_quota get -f /home/csamba/desktop/ -u abendriss -m user PARIS8\abendriss: limit: 20.0 warn: 0.0 value: 18.0 Each time I need to call init to get the real value back. I was thinking that the value were updated each 60s but on my system it's not the case. The I tried then the quota-tools 4.00 (from source) and get: root at minnie:/var/tmp/quota-4/quota-tools# ./quotacheck -v -c -u /home/csamba/desktop/ quotacheck: Scanning /dev/dm-10 [/home/csamba/desktop] done quotacheck: Cannot stat old user quota file on: No such file or directory. Usage will not be substracted. quotacheck: Old group file name could not been determined. Usage will not be substracted. quotacheck: Checked 1102 directories and 1 files quotacheck: Cannot turn user quotas off on /dev/dm-10: Function not implemented Kernel won't know about changes quotacheck did. thanks, -- Ali -------------- next part -------------- An HTML attachment was scrubbed... URL: From akinoztopuz at yahoo.com Wed Jul 11 09:57:16 2012 From: akinoztopuz at yahoo.com (=?utf-8?B?QUtJTiDDv2ZmZmZmZmZmZmZkNlpUT1BVWg==?=) Date: Wed, 11 Jul 2012 02:57:16 -0700 (PDT) Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: <4FF26F38.3040705@redhat.com> References: <4FF1D5A4.3060105@bulbous.org> <4FF1D6E2.7010209@alteeve.ca> <4FF214FB.7000906@bulbous.org> <4FF26F38.3040705@redhat.com> Message-ID: <1342000636.45149.YahooMailNeo@web125802.mail.ne1.yahoo.com> Hi ? I have 2-nodes cluster without quorum disks.? noticed a problem at below: ? ? when I want to move resources to other node it is failed?? to relocate services to other node and again services?? run the orginal node. ? but when I want to restart node it is ok ? could you have any ideas? ________________________________ From: Fabio M. Di Nitto To: linux-cluster at redhat.com Sent: Tuesday, July 3, 2012 7:04 AM Subject: Re: [Linux-cluster] CLVM in a 3-node cluster On 07/02/2012 11:39 PM, urgrue wrote: > On 2/7/12 19:14, Digimer wrote: >> On 07/02/2012 01:08 PM, urgrue wrote: >>> I'm trying to set up a 3-node cluster with clvm. Problem is, one node >>> can't access the storage, and I'm getting: >>> Error locking on node node3: Volume group for uuid not found: >>> whenever I try to activate the LVs on one of the working nodes. >>> >>> This can't be "by design", can it? >> >> Does pvscan show the right device? Are all nodes in the cluster? What >> does 'cman_tool status' and 'dlm_tool ls' show? >> > > Sorry, I realize now I was misleading, let me clarify: > The third node cannot access the storage, this is by design. I have > three datacenters but only two have access to the active storage. 
The > third datacenter only has an async copy, and will only activate > (manually) in case of a massive disaster (failure of both the other > datacenters). > So I deliberately have a failover domain with only node1 and node2. > node3's function is to provide quorum, but also be able to be activated > (manually is fine) in case of a massive disaster. > In other words node3 is part of the cluster, but it can't see the > storage during normal operation. > Looking at it another way, it's kind of as if we had a 3-node cluster > where one node had an HBA failure but is otherwise working. Surely node1 > and node2 should be able to continue running the services? > So my question is, do I have an error somehwere, or is clvm really > actually not able to function without all nodes being active and able to > access storage? CLVM requires a consistent view of the storage from all nodes in the cluster. This is by design. A storage failure during operations (aka you start with all nodes able to access the storage and then downgrade) is handle correctly. Fabio -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Wed Jul 11 11:07:44 2012 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 11 Jul 2012 12:07:44 +0100 Subject: [Linux-cluster] gfs2 quota tools In-Reply-To: <201207101211.38846.ali.bendriss@gmail.com> References: <201207101045.17891.ali.bendriss@gmail.com> <1341912034.2717.0.camel@menhir> <201207101211.38846.ali.bendriss@gmail.com> Message-ID: <1342004864.2700.28.camel@menhir> Hi, On Tue, 2012-07-10 at 12:11 +0200, Ali Bendriss wrote: > > Hi, > > > > > > On Tue, 2012-07-10 at 10:45 +0200, Ali Bendriss wrote: > > > > Hello, > > > > > > > > It's look like recent version of GFS2 use the standard linux quota > > > > tools, > > > > > > > > but I've tried the mainstream quota-tools (ver 4.00) without > success. > > > > > > > > Which version sould be used ? > > > > > > > > thanks > > > > > > The quota tools should work with GFS2. Can you explain which kernel > > > version you were using and what exactly didn't work? What mount > options > > > did you use? > > > > > > Steve. > > Sorry for the missing information: > > I'm running slackware with > > kernel : 3.4.3 > > cluster : 3.1.92 > > gfsutils: 3.1.4 > > The file system I want to use the quota with is > > /dev/mapper/shared-desktop on /home/csamba/desktop type gfs2 > (rw,noatime,nodiratime,hostdata=jid=0,quota=on) > That looks ok... > first I was using gfs2_quota, I was able to init and set the quota for > users > > but get command was wrong after (when the limit is reached). > > in ex: > > du -h /home/csamba/desktop/abendriss > > 19M /home/csamba/desktop/abendriss > > # gfs2_quota get -f /home/csamba/desktop/ -u abendriss -m > > user PARIS8\abendriss: limit: 20.0 warn: 0.0 value: 40810.8 > > gfs2_quota init -f /home/csamba/desktop/ -u abendriss -m > > mismatch: user 3000272: scan = 8, quotafile = 16 > > mismatch: user 3000208: scan = 8, quotafile = 16 > > mismatch: user 3000335: scan = 8, quotafile = 16 > > root at minnie:/# gfs2_quota get -f /home/csamba/desktop/ -u abendriss -m > > user PARIS8\abendriss: limit: 20.0 warn: 0.0 value: 18.0 > > Each time I need to call init to get the real value back. I was > thinking that the value were updated each 60s but on my system it's > not the case. 
> The GFS2 quota system is such that it is possible, depending on circumstances to sometimes exceed the quota limits. There are settings which allow you to bound the error in time and space, with the tradeoff being that the more accurate the quotas, the greater the overhead of the quota management system. That said, the number of blocks should be correct given a sync of the quota data on the node in question, in any case. Did you sync the quota data before examining the quota file? > The I tried then the quota-tools 4.00 (from source) and get: > > root at minnie:/var/tmp/quota-4/quota-tools# ./quotacheck -v -c > -u /home/csamba/desktop/ > > quotacheck: Scanning /dev/dm-10 [/home/csamba/desktop] done > > quotacheck: Cannot stat old user quota file on: No such file or > directory. Usage will not be substracted. > > quotacheck: Old group file name could not been determined. Usage will > not be substracted. > > quotacheck: Checked 1102 directories and 1 files > > quotacheck: Cannot turn user quotas off on /dev/dm-10: Function not > implemented > This is true. You can't turn quotas on and off using the quota tools, but only by using the mount arguments (and mount -o remount). I don't think that should be required in order to run quotacheck, but Abhi can probably confirm whether that is the case or not, Steve. > Kernel won't know about changes quotacheck did. > > thanks, > > -- > > Ali > From urgrue at bulbous.org Wed Jul 11 11:26:51 2012 From: urgrue at bulbous.org (urgrue) Date: Wed, 11 Jul 2012 14:26:51 +0300 Subject: [Linux-cluster] Third node unable to join cluster Message-ID: <1342006011.15912.140661100612353.756F3A4D@webmail.messagingengine.com> I have a third node unable to join my cluster (RHEL 6.3). It fails at 'joining fence domain'. Though I suspect that's a bit of a red herring. The log isn't telling me much, even though I've increased verbosity. Can someone point me in the right direction as to how to debug? The error: Joining fence domain... fence_tool: waiting for fenced to join the fence group. fence_tool: fenced not running, no lockfile >From fenced.log: Jul 11 13:17:54 fenced fenced 3.0.12.1 started Jul 11 13:17:55 fenced cpg_join fenced:daemon ... And then the only errors/warning I see in corosync.log: Jul 11 13:17:54 corosync [CMAN ] daemon: About to process command Jul 11 13:17:54 corosync [CMAN ] memb: command to process is 90 Jul 11 13:17:54 corosync [CMAN ] memb: command return code is 0 Jul 11 13:17:54 corosync [CMAN ] daemon: Returning command data. length = 440 Jul 11 13:17:54 corosync [CMAN ] daemon: sending reply 40000090 to fd 18 Jul 11 13:17:54 corosync [CMAN ] daemon: read 0 bytes from fd 18 Jul 11 13:17:54 corosync [CMAN ] daemon: Freed 0 queued messages Jul 11 13:17:54 corosync [TOTEM ] Received ringid(10.128.32.22:28272) seq 61 Jul 11 13:17:54 corosync [TOTEM ] Delivering 2 to 61 Jul 11 13:17:54 corosync [TOTEM ] Delivering 2 to 61 Jul 11 13:17:54 corosync [TOTEM ] FAILED TO RECEIVE Jul 11 13:17:54 corosync [TOTEM ] entering GATHER state from 6. 
Jul 11 13:17:54 corosync [CONFDB] lib_init_fn: conn=0xd78100 Jul 11 13:17:54 corosync [CONFDB] exit_fn for conn=0xd78100 Jul 11 13:17:54 corosync [CONFDB] lib_init_fn: conn=0xd78100 Jul 11 13:17:54 corosync [CONFDB] exit_fn for conn=0xd78100 Jul 11 13:17:54 corosync [CONFDB] lib_init_fn: conn=0xd78100 Jul 11 13:17:54 corosync [CONFDB] exit_fn for conn=0xd78100 Jul 11 13:17:54 corosync [CMAN ] daemon: read 20 bytes from fd 18 Jul 11 13:17:59 corosync [CMAN ] daemon: About to process command Jul 11 13:17:59 corosync [CMAN ] memb: command to process is 90 Jul 11 13:17:59 corosync [CMAN ] memb: cmd_get_node failed: id=0, name='^?' Jul 11 13:17:59 corosync [CMAN ] memb: command return code is -2 Jul 11 13:17:59 corosync [CMAN ] daemon: Returning command data. length = 0 Jul 11 13:17:59 corosync [CMAN ] daemon: sending reply 40000090 to fd 23 Jul 11 13:17:59 corosync [CMAN ] daemon: read 0 bytes from fd 23 Jul 11 13:17:59 corosync [CMAN ] daemon: Freed 0 queued messages Jul 11 13:17:59 corosync [CMAN ] daemon: read 20 bytes from fd 23 Jul 11 13:17:59 corosync [CMAN ] daemon: client command is 5 Jul 11 13:17:59 corosync [CMAN ] daemon: About to process command Jul 11 13:17:59 corosync [CMAN ] memb: command to process is 5 Jul 11 13:17:59 corosync [CMAN ] daemon: Returning command data. length = 0 Jul 11 13:17:59 corosync [CMAN ] daemon: sending reply 40000005 to fd 23 Back in fenced.log: Jul 11 13:18:05 fenced daemon cpg_join error retrying Jul 11 13:18:15 fenced daemon cpg_join error retrying Jul 11 13:18:21 fenced daemon cpg_join error 2 Jul 11 13:18:23 fenced cpg_leave fenced:daemon ... Jul 11 13:18:23 fenced daemon cpg_leave error 9 And in /var/log/messages: Jul 11 13:17:50 server3 corosync[31116]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 Jul 11 13:17:50 server3 corosync[31116]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. Jul 11 13:17:50 server3 corosync[31116]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[1]: 3 Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[1]: 3 Jul 11 13:17:50 server3 ntpd[1747]: synchronized to 10.135.136.17, stratum 1 Jul 11 13:17:50 server3 corosync[31116]: [CPG ] chosen downlist: sender r(0) ip(10.130.32.32) ; members(old:0 left:0) Jul 11 13:17:50 server3 corosync[31116]: [MAIN ] Completed service synchronization, ready to provide service. Jul 11 13:17:50 server3 corosync[31116]: [TOTEM ] A processor joined or left the membership and a new membership was formed. Jul 11 13:17:50 server3 corosync[31116]: [CMAN ] quorum regained, resuming activity Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] This node is within the primary component and will provide service. 
Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[2]: 2 3 Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[2]: 2 3 Jul 11 13:17:54 server3 corosync[31116]: [TOTEM ] FAILED TO RECEIVE Jul 11 13:17:54 server3 fenced[31174]: fenced 3.0.12.1 started Jul 11 13:17:55 server3 dlm_controld[31192]: dlm_controld 3.0.12.1 started Jul 11 13:18:05 server3 dlm_controld[31192]: daemon cpg_join error retrying Jul 11 13:18:05 server3 fenced[31174]: daemon cpg_join error retrying Jul 11 13:18:05 server3 gfs_controld[31264]: gfs_controld 3.0.12.1 started Jul 11 13:18:15 server3 dlm_controld[31192]: daemon cpg_join error retrying Jul 11 13:18:15 server3 fenced[31174]: daemon cpg_join error retrying Jul 11 13:18:15 server3 gfs_controld[31264]: daemon cpg_join error retrying Jul 11 13:18:19 server3 abrtd: Directory 'ccpp-2012-07-11-13:18:18-31116' creation detected Jul 11 13:18:19 server3 abrt[31313]: Saved core dump of pid 31116 (/usr/sbin/corosync) to /var/spool/abrt/ccpp-2012-07-11-13:18:18-31116 (47955968 Jul 11 13:18:21 server3 dlm_controld[31192]: daemon cpg_join error 2 Jul 11 13:18:21 server3 gfs_controld[31264]: daemon cpg_join error 2 Jul 11 13:18:21 server3 fenced[31174]: daemon cpg_join error 2 Jul 11 13:18:23 server3 kernel: dlm: closing connection to node 3 Jul 11 13:18:23 server3 kernel: dlm: closing connection to node 2 Jul 11 13:18:23 server3 dlm_controld[31192]: daemon cpg_leave error 9 Jul 11 13:18:23 server3 gfs_controld[31264]: daemon cpg_leave error 9 Jul 11 13:18:23 server3 fenced[31174]: daemon cpg_leave error 9 Jul 11 13:18:30 server3 abrtd: Sending an email... Jul 11 13:18:30 server3 abrtd: Email was sent to: root at localhost Jul 11 13:18:30 server3 abrtd: Duplicate: UUID Jul 11 13:18:30 server3 abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2012-07-06-10:30:40-22107 Jul 11 13:18:30 server3 abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2012-07-06-10:30:40-22107 Jul 11 13:18:30 server3 abrtd: Deleting problem directory ccpp-2012-07-11-13:18:18-31116 (dup of ccpp-2012-07-06-10:30:40-22107) Any tips much appreciated. From lists at alteeve.ca Wed Jul 11 14:04:09 2012 From: lists at alteeve.ca (Digimer) Date: Wed, 11 Jul 2012 10:04:09 -0400 Subject: [Linux-cluster] CLVM in a 3-node cluster In-Reply-To: <1342000636.45149.YahooMailNeo@web125802.mail.ne1.yahoo.com> References: <4FF1D5A4.3060105@bulbous.org> <4FF1D6E2.7010209@alteeve.ca> <4FF214FB.7000906@bulbous.org> <4FF26F38.3040705@redhat.com> <1342000636.45149.YahooMailNeo@web125802.mail.ne1.yahoo.com> Message-ID: <4FFD87D9.3010109@alteeve.ca> Please start a new thread, with a new subject, and include your cluster.conf file please. Digimer On 07/11/2012 05:57 AM, AKIN ?ffffffffffd6ZTOPUZ wrote: > Hi > > I have 2-nodes cluster without quorum disks.? noticed a problem at below: > > > when I want to move resources to other node it is failed to relocate > services to other node and again services run the orginal node. > > but when I want to restart node it is ok > > could you have any ideas? > > *From:* Fabio M. Di Nitto > *To:* linux-cluster at redhat.com > *Sent:* Tuesday, July 3, 2012 7:04 AM > *Subject:* Re: [Linux-cluster] CLVM in a 3-node cluster > > On 07/02/2012 11:39 PM, urgrue wrote: >> On 2/7/12 19:14, Digimer wrote: >>> On 07/02/2012 01:08 PM, urgrue wrote: >>>> I'm trying to set up a 3-node cluster with clvm. 
Problem is, one node >>>> can't access the storage, and I'm getting: >>>> Error locking on node node3: Volume group for uuid not found: >>>> whenever I try to activate the LVs on one of the working nodes. >>>> >>>> This can't be "by design", can it? >>> >>> Does pvscan show the right device? Are all nodes in the cluster? What >>> does 'cman_tool status' and 'dlm_tool ls' show? >>> >> >> Sorry, I realize now I was misleading, let me clarify: >> The third node cannot access the storage, this is by design. I have >> three datacenters but only two have access to the active storage. The >> third datacenter only has an async copy, and will only activate >> (manually) in case of a massive disaster (failure of both the other >> datacenters). >> So I deliberately have a failover domain with only node1 and node2. >> node3's function is to provide quorum, but also be able to be activated >> (manually is fine) in case of a massive disaster. >> In other words node3 is part of the cluster, but it can't see the >> storage during normal operation. >> Looking at it another way, it's kind of as if we had a 3-node cluster >> where one node had an HBA failure but is otherwise working. Surely node1 >> and node2 should be able to continue running the services? >> So my question is, do I have an error somehwere, or is clvm really >> actually not able to function without all nodes being active and able to >> access storage? > > CLVM requires a consistent view of the storage from all nodes in the > cluster. This is by design. > > A storage failure during operations (aka you start with all nodes able > to access the storage and then downgrade) is handle correctly. > > Fabio > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.com From urgrue at bulbous.org Thu Jul 12 07:57:28 2012 From: urgrue at bulbous.org (urgrue) Date: Thu, 12 Jul 2012 10:57:28 +0300 Subject: [Linux-cluster] Third node unable to join cluster In-Reply-To: <1342006011.15912.140661100612353.756F3A4D@webmail.messagingengine.com> References: <1342006011.15912.140661100612353.756F3A4D@webmail.messagingengine.com> Message-ID: <1342079848.3795.140661101025673.1412A83D@webmail.messagingengine.com> Solved. It seems the issue was that it was a two-node cluster and adding the third means the cluster has to reconfigure itself from a 2-node to a 3-node cluster which requires a restart of the cluster. I would've expected it could give a clear error message regarding this but seems it just silently fails instead. On Wed, Jul 11, 2012, at 14:26, urgrue wrote: > I have a third node unable to join my cluster (RHEL 6.3). It fails at > 'joining fence domain'. Though I suspect that's a bit of a red herring. > The log isn't telling me much, even though I've increased verbosity. Can > someone point me in the right direction as to how to debug? > > The error: > Joining fence domain... fence_tool: waiting for fenced to join the > fence group. > fence_tool: fenced not running, no lockfile > > >From fenced.log: > Jul 11 13:17:54 fenced fenced 3.0.12.1 started > Jul 11 13:17:55 fenced cpg_join fenced:daemon ... 
> > And then the only errors/warning I see in corosync.log: > Jul 11 13:17:54 corosync [CMAN ] daemon: About to process command > Jul 11 13:17:54 corosync [CMAN ] memb: command to process is 90 > Jul 11 13:17:54 corosync [CMAN ] memb: command return code is 0 > Jul 11 13:17:54 corosync [CMAN ] daemon: Returning command data. length > = 440 > Jul 11 13:17:54 corosync [CMAN ] daemon: sending reply 40000090 to fd > 18 > Jul 11 13:17:54 corosync [CMAN ] daemon: read 0 bytes from fd 18 > Jul 11 13:17:54 corosync [CMAN ] daemon: Freed 0 queued messages > Jul 11 13:17:54 corosync [TOTEM ] Received ringid(10.128.32.22:28272) > seq 61 > Jul 11 13:17:54 corosync [TOTEM ] Delivering 2 to 61 > Jul 11 13:17:54 corosync [TOTEM ] Delivering 2 to 61 > Jul 11 13:17:54 corosync [TOTEM ] FAILED TO RECEIVE > Jul 11 13:17:54 corosync [TOTEM ] entering GATHER state from 6. > Jul 11 13:17:54 corosync [CONFDB] lib_init_fn: conn=0xd78100 > Jul 11 13:17:54 corosync [CONFDB] exit_fn for conn=0xd78100 > Jul 11 13:17:54 corosync [CONFDB] lib_init_fn: conn=0xd78100 > Jul 11 13:17:54 corosync [CONFDB] exit_fn for conn=0xd78100 > Jul 11 13:17:54 corosync [CONFDB] lib_init_fn: conn=0xd78100 > Jul 11 13:17:54 corosync [CONFDB] exit_fn for conn=0xd78100 > Jul 11 13:17:54 corosync [CMAN ] daemon: read 20 bytes from fd 18 > > > Jul 11 13:17:59 corosync [CMAN ] daemon: About to process command > Jul 11 13:17:59 corosync [CMAN ] memb: command to process is 90 > Jul 11 13:17:59 corosync [CMAN ] memb: cmd_get_node failed: id=0, > name='^?' > Jul 11 13:17:59 corosync [CMAN ] memb: command return code is -2 > Jul 11 13:17:59 corosync [CMAN ] daemon: Returning command data. length > = 0 > Jul 11 13:17:59 corosync [CMAN ] daemon: sending reply 40000090 to fd > 23 > Jul 11 13:17:59 corosync [CMAN ] daemon: read 0 bytes from fd 23 > Jul 11 13:17:59 corosync [CMAN ] daemon: Freed 0 queued messages > Jul 11 13:17:59 corosync [CMAN ] daemon: read 20 bytes from fd 23 > Jul 11 13:17:59 corosync [CMAN ] daemon: client command is 5 > Jul 11 13:17:59 corosync [CMAN ] daemon: About to process command > Jul 11 13:17:59 corosync [CMAN ] memb: command to process is 5 > Jul 11 13:17:59 corosync [CMAN ] daemon: Returning command data. length > = 0 > Jul 11 13:17:59 corosync [CMAN ] daemon: sending reply 40000005 to fd > 23 > > Back in fenced.log: > Jul 11 13:18:05 fenced daemon cpg_join error retrying > Jul 11 13:18:15 fenced daemon cpg_join error retrying > Jul 11 13:18:21 fenced daemon cpg_join error 2 > Jul 11 13:18:23 fenced cpg_leave fenced:daemon ... > Jul 11 13:18:23 fenced daemon cpg_leave error 9 > > And in /var/log/messages: > Jul 11 13:17:50 server3 corosync[31116]: [SERV ] Service engine > loaded: corosync cluster quorum service v0.1 > Jul 11 13:17:50 server3 corosync[31116]: [MAIN ] Compatibility mode > set to whitetank. Using V1 and V2 of the synchronization engine. > Jul 11 13:17:50 server3 corosync[31116]: [TOTEM ] A processor joined > or left the membership and a new membership was formed. > Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[1]: 3 > Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[1]: 3 > Jul 11 13:17:50 server3 ntpd[1747]: synchronized to 10.135.136.17, > stratum 1 > Jul 11 13:17:50 server3 corosync[31116]: [CPG ] chosen downlist: > sender r(0) ip(10.130.32.32) ; members(old:0 left:0) > Jul 11 13:17:50 server3 corosync[31116]: [MAIN ] Completed service > synchronization, ready to provide service. 
> Jul 11 13:17:50 server3 corosync[31116]: [TOTEM ] A processor joined > or left the membership and a new membership was formed. > Jul 11 13:17:50 server3 corosync[31116]: [CMAN ] quorum regained, > resuming activity > Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] This node is within > the primary component and will provide service. > Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[2]: 2 3 > Jul 11 13:17:50 server3 corosync[31116]: [QUORUM] Members[2]: 2 3 > Jul 11 13:17:54 server3 corosync[31116]: [TOTEM ] FAILED TO RECEIVE > Jul 11 13:17:54 server3 fenced[31174]: fenced 3.0.12.1 started > Jul 11 13:17:55 server3 dlm_controld[31192]: dlm_controld 3.0.12.1 > started > Jul 11 13:18:05 server3 dlm_controld[31192]: daemon cpg_join error > retrying > Jul 11 13:18:05 server3 fenced[31174]: daemon cpg_join error retrying > Jul 11 13:18:05 server3 gfs_controld[31264]: gfs_controld 3.0.12.1 > started > Jul 11 13:18:15 server3 dlm_controld[31192]: daemon cpg_join error > retrying > Jul 11 13:18:15 server3 fenced[31174]: daemon cpg_join error retrying > Jul 11 13:18:15 server3 gfs_controld[31264]: daemon cpg_join error > retrying > Jul 11 13:18:19 server3 abrtd: Directory > 'ccpp-2012-07-11-13:18:18-31116' creation detected > Jul 11 13:18:19 server3 abrt[31313]: Saved core dump of pid 31116 > (/usr/sbin/corosync) to /var/spool/abrt/ccpp-2012-07-11-13:18:18-31116 > (47955968 > Jul 11 13:18:21 server3 dlm_controld[31192]: daemon cpg_join error 2 > Jul 11 13:18:21 server3 gfs_controld[31264]: daemon cpg_join error 2 > Jul 11 13:18:21 server3 fenced[31174]: daemon cpg_join error 2 > Jul 11 13:18:23 server3 kernel: dlm: closing connection to node 3 > Jul 11 13:18:23 server3 kernel: dlm: closing connection to node 2 > Jul 11 13:18:23 server3 dlm_controld[31192]: daemon cpg_leave error 9 > Jul 11 13:18:23 server3 gfs_controld[31264]: daemon cpg_leave error 9 > Jul 11 13:18:23 server3 fenced[31174]: daemon cpg_leave error 9 > Jul 11 13:18:30 server3 abrtd: Sending an email... > Jul 11 13:18:30 server3 abrtd: Email was sent to: root at localhost > Jul 11 13:18:30 server3 abrtd: Duplicate: UUID > Jul 11 13:18:30 server3 abrtd: DUP_OF_DIR: > /var/spool/abrt/ccpp-2012-07-06-10:30:40-22107 > Jul 11 13:18:30 server3 abrtd: Problem directory is a duplicate of > /var/spool/abrt/ccpp-2012-07-06-10:30:40-22107 > Jul 11 13:18:30 server3 abrtd: Deleting problem directory > ccpp-2012-07-11-13:18:18-31116 (dup of ccpp-2012-07-06-10:30:40-22107) > > > Any tips much appreciated. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From akinoztopuz at yahoo.com Thu Jul 12 08:20:36 2012 From: akinoztopuz at yahoo.com (=?utf-8?B?QUtJTiDDv2ZmZmZmZmZmZmZkNlpUT1BVWg==?=) Date: Thu, 12 Jul 2012 01:20:36 -0700 (PDT) Subject: [Linux-cluster] service relocate problem in 2 nodes cluster Message-ID: <1342081236.45360.YahooMailNeo@web125802.mail.ne1.yahoo.com> ????Hello ? I have 2 nodes clsuter without quorum disk. ? I saw a problem when I moved to services to other node. ? disk? loyout is iscsi . ? I th?nk problem is about gfs. when I stop service in node1? and related file systems(included in service) are unmounted from that node and I want to mount it on node2 manually ?, I?am tak?ng a message about resource busy.? ? [root at clsn2 ~]# mount -t gfs2? /dev/mapper/SAPClusterVG_d7-SAPClusterLV_d7 /usr/sap/PRO/ASCS01 /sbin/mount.gfs2: /dev/mapper/SAPClusterVG_d7-SAPClusterLV_d7 already mounted or /usr/sap/PRO/ASCS01 busy ? ? Could you have any ideas? ? ? 
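For an "already mounted or ... busy" error like the one above, it is usually worth confirming what is actually holding the volume before suspecting GFS2 itself; rgmanager may have re-mounted it, or the LV may still be open after a failed stop. A minimal diagnostic sketch using the device and mount point from the message (standard LVM2/util-linux tools assumed; the volume group name is inferred from the mapper name):

# Is the filesystem already mounted on this node?
grep gfs2 /proc/mounts

# Is the device-mapper device still open? (Open count > 0 means something still holds it.)
dmsetup info /dev/mapper/SAPClusterVG_d7-SAPClusterLV_d7

# Is any process sitting on the mount point?
fuser -vm /usr/sap/PRO/ASCS01

# Is the LV active on this node at all? It has to be activated through clvmd before mounting.
lvs -o lv_name,lv_attr SAPClusterVG_d7

Also remember that a manual GFS2 mount needs the cluster to be quorate and the fence/dlm/gfs daemons running on the node doing the mount.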
cluster.conf is below:

[cluster.conf attachment: the XML body was stripped by the archive's HTML-to-text conversion; only the <?xml version="1.0"?> declaration and indentation survive]

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From carlopmart at gmail.com Thu Jul 12 09:27:49 2012
From: carlopmart at gmail.com (C. L. Martinez)
Date: Thu, 12 Jul 2012 11:27:49 +0200
Subject: [Linux-cluster] Problems using fence_virt as a fence agent for two kvm guests
Message-ID:

Hi all,

I have installed two KVM guests (CentOS 6.3) to do some tests using RHCS under a CentOS 6.3 KVM host. As a fence device I am trying to use fence_virt, but it doesn't work for me.

fence_virt.conf on the KVM host is:

fence_virtd {
        listener = "multicast";
        backend = "libvirt";
}

listeners {
        multicast {
                key_file = "/etc/fence_virt.key";
                interface = "siemif";
                address = "225.0.0.12";
                family = "ipv4";
        }
}

backends {
        libvirt {
                uri = "qemu:///system";
        }
}

fence_virt.key is located under the /etc directory:

-r-------- 1 root root 18 Jul 12 09:48 /etc/fence_virt.key

cluster.conf on both KVM guest nodes is (the XML element tags were stripped by the archive; only these attribute fragments remain):

multicast_address="225.0.0.12" key_file="/etc/cluster/fence_virt.key" name="kvm_cosnode01"/>
multicast_address="225.0.0.12" key_file="/etc/cluster/fence_virt.key" name="kvm_cosnode02"/>

Of course, fence_virt.key is copied under the /etc/cluster directory on both nodes.

On cosnode01 I see this error:

fenced[4074]: fence cosnode02.domain.local dev 0.0 agent fence_virt result: error from agent
fenced[4074]: fence cosnode02.domain.loca failed

What am I doing wrong? Do I need to modify libvirtd.conf to listen on the siemif interface?

Thanks.
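With fence_virt it usually pays to test each hop by hand before letting fenced drive it. A rough sketch, assuming the stock fence-virt packages and the key file, multicast address and node names quoted above (the guest-side domain name used here is a guess; use whatever "virsh list" reports on the host):

# On the KVM host: is the daemon running, and does the configuration parse?
service fence_virtd status
fence_virtd -c          # interactive re-configuration; rewrites /etc/fence_virt.conf

# On a guest: ask the host over multicast which domains it can see and fence.
fence_xvm -o list -k /etc/cluster/fence_virt.key -a 225.0.0.12

# On a guest: dry-run a status query against the other VM's libvirt domain name.
fence_xvm -o status -H cosnode02 -k /etc/cluster/fence_virt.key -a 225.0.0.12

If the list operation times out, the usual suspects are the listener interface on the host (it must be the bridge that actually carries the guests' traffic), a key file that differs between host and guests, or multicast being filtered in between. It is also worth checking that the device entry for each node in cluster.conf carries a port= (or domain=) attribute naming the libvirt domain, since without it the agent does not know which VM to act on.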
From a_mdl at mail.ru Thu Jul 12 09:52:41 2012 From: a_mdl at mail.ru (=?UTF-8?B?RGVuaXMgIE1lZHZlZGV2?=) Date: Thu, 12 Jul 2012 13:52:41 +0400 Subject: [Linux-cluster] =?utf-8?q?2-node_or_degraded_3-nodes=3F?= Message-ID: <1342086761.686739545@f323.mail.ru> If I will plan to add more nodes later, but have only 2 right now, is it better to make 2-nodes cluster or degraded 3 nodes? I recently heard that you cannot add more nodes to 2-nodes cluster without a clusterwide reboot. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Thu Jul 12 10:32:24 2012 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 12 Jul 2012 12:32:24 +0200 Subject: [Linux-cluster] 2-node or degraded 3-nodes? In-Reply-To: <1342086761.686739545@f323.mail.ru> References: <1342086761.686739545@f323.mail.ru> Message-ID: <4FFEA7B8.4090401@redhat.com> On 7/12/2012 11:52 AM, Denis Medvedev wrote: > If I will plan to add more nodes later, but have only 2 right now, > is it better to make 2-nodes cluster or degraded 3 nodes? > I recently heard that you cannot add more nodes to 2-nodes cluster > without a clusterwide reboot. Both have advantages and disadvantages. In your position, I would make a 2 node cluster and the schedule downtime to add more nodes later on. The downtime will give you time to test the new nodes, test service relocation, fencing and so on... that no matter how good you are as sysadmin, it?s good practice to do before placing the cluster in production anyway. Fabio From linuxis4me at gmail.com Thu Jul 12 16:19:50 2012 From: linuxis4me at gmail.com (linux admin) Date: Thu, 12 Jul 2012 21:49:50 +0530 Subject: [Linux-cluster] Cluster documents Message-ID: Hi, Can somebody provide me the document or SOP to make a HA-Cluster . I am new in the Clustering filed .I want to learn HA-Cluster please provide me configuration steps to make the Cluster. -- Thanks Ranveer singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Jul 12 16:35:27 2012 From: lists at alteeve.ca (Digimer) Date: Thu, 12 Jul 2012 12:35:27 -0400 Subject: [Linux-cluster] Cluster documents In-Reply-To: References: Message-ID: <4FFEFCCF.6060707@alteeve.ca> On 07/12/2012 12:19 PM, linux admin wrote: > > Hi, > > Can somebody provide me the document or SOP to make a HA-Cluster . I am > new in the Clustering filed .I want to learn HA-Cluster > > please provide me configuration steps to make the Cluster. > -- > Thanks > Ranveer singh "Cluster" is a very broad term. What exactly are you trying to make highly available? What OS/distro? If it's for VMs on RHEL / Centos; https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial Digimer -- Digimer Papers and Projects: https://alteeve.com From washer at trlp.com Thu Jul 12 16:49:32 2012 From: washer at trlp.com (James Washer) Date: Thu, 12 Jul 2012 09:49:32 -0700 Subject: [Linux-cluster] Cluster documents In-Reply-To: References: Message-ID: Have you read the Redhat Cluster documentation? It's a good place to start. On Thu, Jul 12, 2012 at 9:19 AM, linux admin wrote: > > Hi, > > Can somebody provide me the document or SOP to make a HA-Cluster . I am > new in the Clustering filed .I want to learn HA-Cluster > > please provide me configuration steps to make the Cluster. 
> -- > Thanks > Ranveer singh > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- - jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From delete at fedoraproject.org Thu Jul 12 22:38:17 2012 From: delete at fedoraproject.org (Matias Kreder) Date: Thu, 12 Jul 2012 19:38:17 -0300 Subject: [Linux-cluster] gfs_fsck estimation Message-ID: Hi, I'm trying to find a method to estimate the time that gfs_fsck will take in a specific server. I have seen a lot of different results. Do you know of any method/procedure already written? If not, which variables should I consider to make an estimation? I'm thinking on considering: - filesystem size - number of Journals - CPU speed/number and memory capacity Any thoughts? Regards Matias Kreder From rpeterso at redhat.com Fri Jul 13 12:18:23 2012 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 13 Jul 2012 08:18:23 -0400 (EDT) Subject: [Linux-cluster] gfs_fsck estimation In-Reply-To: Message-ID: <82f3f5b9-e04f-4619-b41b-f9a049d5b403@zmail12.collab.prod.int.phx2.redhat.com> ----- Original Message ----- | Hi, | | I'm trying to find a method to estimate the time that gfs_fsck will | take in a specific server. I have seen a lot of different results. | Do you know of any method/procedure already written? If not, which | variables should I consider to make an estimation? | I'm thinking on considering: | - filesystem size | - number of Journals | - CPU speed/number and memory capacity | | Any thoughts? | | Regards | Matias Kreder Hi Matias, I don't think it's possible to estimate the run time of gfs_fsck. If the file system is clean, it should be doable, but the problem is that different kinds of corruption cause especially long delays, and that corruption is unpredictable. Another thing to be aware of: Starting with RHEL6.3, the fsck.gfs2 is now able to analyze and repair GFS1 file systems as well as GFS2, and it is orders of magnitude faster. It's also much more accurate in its analysis and more able to repair corruption that gfs_fsck would just give up and throw away. Regards, Bob Peterson Red Hat File Systems From carlopmart at gmail.com Fri Jul 13 12:26:35 2012 From: carlopmart at gmail.com (C. L. Martinez) Date: Fri, 13 Jul 2012 14:26:35 +0200 Subject: [Linux-cluster] Problems using fence_virt as a fence agent for two kvm guests In-Reply-To: References: Message-ID: On Thu, Jul 12, 2012 at 11:27 AM, C. L. Martinez wrote: > Hi all, > > I have installed two kvm guests (CentOS 6.3) to do some tests using > RHCS under a CentOS 6.3 kvm host. As a fence device I am trying to use > fence_virt, but it doesn't works for me. 
> > fence_virt.conf in kvm host is: > > fence_virtd { > listener = "multicast"; > backend = "libvirt"; > } > > listeners { > multicast { > key_file = "/etc/fence_virt.key"; > interface = "siemif"; > address = "225.0.0.12"; > family = "ipv4"; > } > } > > backends { > libvirt { > uri = "qemu:///system"; > } > } > > fence_virt.key is located under /etc directory: > > -r-------- 1 root root 18 Jul 12 09:48 /etc/fence_virt.key > > cluster.conf on both kvm guest nodes is: > > > > > > > > > > > > > > > > > > > > > > multicast_address="225.0.0.12" key_file="/etc/cluster/fence_virt.key" > name="kvm_cosnode01"/> > multicast_address="225.0.0.12" key_file="/etc/cluster/fence_virt.key" > name="kvm_cosnode02"/> > > > > > > > > > > > > restricted="1"> > > > > restricted="1"> > > > > > > > > of course, fence_virt.key is copied under /etc/cluster dir in both nodes. > > In cosnode01 I see this error: > > fenced[4074]: fence cosnode02.domain.local dev 0.0 agent fence_virt > result: error from agent > fenced[4074]: fence cosnode02.domain.loca failed > > What am I doing wrong?? Do I need to modify libvirtd.conf to listen in > siemif interface?? > > Thanks. Please, any help?? From mkreder at gmail.com Fri Jul 13 16:04:36 2012 From: mkreder at gmail.com (Matias Kreder) Date: Fri, 13 Jul 2012 13:04:36 -0300 Subject: [Linux-cluster] gfs_fsck estimation In-Reply-To: <82f3f5b9-e04f-4619-b41b-f9a049d5b403@zmail12.collab.prod.int.phx2.redhat.com> References: <82f3f5b9-e04f-4619-b41b-f9a049d5b403@zmail12.collab.prod.int.phx2.redhat.com> Message-ID: On Fri, Jul 13, 2012 at 9:18 AM, Bob Peterson wrote: > ----- Original Message ----- > | Hi, > | > | I'm trying to find a method to estimate the time that gfs_fsck will > | take in a specific server. I have seen a lot of different results. > | Do you know of any method/procedure already written? If not, which > | variables should I consider to make an estimation? > | I'm thinking on considering: > | - filesystem size > | - number of Journals > | - CPU speed/number and memory capacity > | > | Any thoughts? > | > | Regards > | Matias Kreder > > Hi Matias, > > I don't think it's possible to estimate the run time of gfs_fsck. > If the file system is clean, it should be doable, but the problem is > that different kinds of corruption cause especially long delays, > and that corruption is unpredictable. > > Another thing to be aware of: Starting with RHEL6.3, the fsck.gfs2 > is now able to analyze and repair GFS1 file systems as well as GFS2, > and it is orders of magnitude faster. It's also much more accurate in > its analysis and more able to repair corruption that gfs_fsck would > just give up and throw away. > > Regards, > > Bob Peterson > Red Hat File Systems > Bob, Thanks for the explanation. I didn't give you the full scenario. I'm looking to estimate the time of fsck before GFS to GFS2 conversion so I can assume that filesystems are clean prior to the fsck as they are mounted and non-corrupted filesystems. Regards Matias Kreder > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Fri Jul 13 16:17:53 2012 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 13 Jul 2012 12:17:53 -0400 (EDT) Subject: [Linux-cluster] gfs_fsck estimation In-Reply-To: Message-ID: <3884b756-a978-4ce4-9afb-92b39c1ea97d@zmail12.collab.prod.int.phx2.redhat.com> | Bob, | | Thanks for the explanation. I didn't give you the full scenario. 
I'm | looking to estimate the time of fsck before GFS to GFS2 conversion so | I can assume that filesystems are clean prior to the fsck as they are | mounted and non-corrupted filesystems. | | Regards | Matias Kreder Hi Matias, If you're on RHEL6.3 or migrating to RHEL6.3, you can move the storage, then run the new fsck.gfs2 before doing the gfs2_convert. Save you some time. :) Regards, Bob Peterson Red Hat File Systems From mkreder at gmail.com Fri Jul 13 16:36:32 2012 From: mkreder at gmail.com (Matias Kreder) Date: Fri, 13 Jul 2012 13:36:32 -0300 Subject: [Linux-cluster] gfs_fsck estimation In-Reply-To: <3884b756-a978-4ce4-9afb-92b39c1ea97d@zmail12.collab.prod.int.phx2.redhat.com> References: <3884b756-a978-4ce4-9afb-92b39c1ea97d@zmail12.collab.prod.int.phx2.redhat.com> Message-ID: On Fri, Jul 13, 2012 at 1:17 PM, Bob Peterson wrote: > | Bob, > | > | Thanks for the explanation. I didn't give you the full scenario. I'm > | looking to estimate the time of fsck before GFS to GFS2 conversion so > | I can assume that filesystems are clean prior to the fsck as they are > | mounted and non-corrupted filesystems. > | > | Regards > | Matias Kreder > > Hi Matias, > > If you're on RHEL6.3 or migrating to RHEL6.3, you can move the storage, > then run the new fsck.gfs2 before doing the gfs2_convert. Save you some > time. :) > > Regards, > > Bob Peterson > Red Hat File Systems > Bob, Unfortunately we will not be migrating to RHEL6 yet but we are migrating from GFS to GFS2 on RHEL5 to gain the GFS benefits. Any thoughts in how to estimate the fsck time? Regards Matias Kreder > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jeff.sturm at eprize.com Fri Jul 13 16:51:29 2012 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 13 Jul 2012 16:51:29 +0000 Subject: [Linux-cluster] gfs_fsck estimation In-Reply-To: References: <3884b756-a978-4ce4-9afb-92b39c1ea97d@zmail12.collab.prod.int.phx2.redhat.com> Message-ID: > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] > On Behalf Of Matias Kreder > Sent: Friday, July 13, 2012 12:37 PM > > Unfortunately we will not be migrating to RHEL6 yet but we are migrating from GFS to > GFS2 on RHEL5 to gain the GFS benefits. > > Any thoughts in how to estimate the fsck time? If your SAN supports LUN snapshots, you could try gfs_fsck on a snapshot first, and see how long it runs. -Jeff From jvdiago at gmail.com Mon Jul 16 16:03:34 2012 From: jvdiago at gmail.com (Javier Vela) Date: Mon, 16 Jul 2012 18:03:34 +0200 Subject: [Linux-cluster] Strange behaviours in two-node cluster Message-ID: Hi, two weeks ago I asked for some help building a two-node cluster with HA-LVM. After some e-mails, finally I got my cluster working. The problem now is that sometimes, and in some clusters (I have three clusters with the same configuration), I got very strange behaviours. #1 Openais detects some problem and shutdown itself. The network is Ok, is a virtual device in vmware, shared with the other cluster hearbet networks, and only happens in one cluster. The error messages: Jul 16 08:50:32 node1 openais[3641]: [TOTEM] FAILED TO RECEIVE Jul 16 08:50:32 node1 openais[3641]: [TOTEM] entering GATHER state from 6. Jul 16 08:50:36 node1 openais[3641]: [TOTEM] entering GATHER state from 0 Do you know what can I check in order to solve the problem? I don't know from where I should start. What makes Openais to not receive messages? 
#2 I'm getting a lot of rgmanager errors when rgmanager tries to change the service status, e.g. when I run clusvcadm -d <service>. It always happens when both nodes are up; if I shut down one node, the command finishes successfully. Before executing the command I always check the status with clustat and everything looks OK, yet I get:

clurgmgrd[5667]: #52: Failed changing RG status

Again, what can I check in order to detect problems with rgmanager that clustat and cman_tool don't show?

#3 Sometimes, not always, a node that has been fenced cannot rejoin the cluster after the reboot. With clustat I can see that there is quorum:

[root at node2 ~]# clustat
Cluster Status test_cluster @ Mon Jul 16 05:46:57 2012
Member Status: Quorate

 Member Name                                ID   Status
 ------ ----                                ---- ------
 node1-hb                                      1 Offline
 node2-hb                                      2 Online, Local, rgmanager
 /dev/disk/by-path/pci-0000:02:01.0-scsi-      0 Online, Quorum Disk

 Service Name        Owner (Last)        State
 ------- ----        ----- ------        -----
 service:test        node2-hb            started

The log shows how node2 fenced node1:

node2 messages
Jul 13 04:00:31 node2 fenced[4219]: node1 not a cluster member after 0 sec post_fail_delay
Jul 13 04:00:31 node2 fenced[4219]: fencing node "node1"
Jul 13 04:00:36 node2 clurgmgrd[4457]: Waiting for node #1 to be fenced
Jul 13 04:01:04 node2 fenced[4219]: fence "node1" success
Jul 13 04:01:06 node2 clurgmgrd[4457]: Node #1 fenced; continuing

But the node that tries to join the cluster says that there isn't quorum. It finally ends up inquorate, without seeing the other node or the quorum disk.

node1 messages
Jul 16 05:48:19 node1 ccsd[4207]: Error while processing connect: Connection refused
Jul 16 05:48:19 node1 ccsd[4207]: Cluster is not quorate. Refusing connection.

Do the three errors have something in common? What should I check? I've discarded the cluster configuration as the cause because the cluster does work, and the errors don't appear on all the nodes. The most annoying error currently is #1: every 10-15 minutes openais fails and a node gets fenced. I attach the cluster.conf.

Thanks in advance.

Regards,
Javi
-------------- next part -------------- An HTML attachment was scrubbed... URL:
-------------- next part --------------
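On issue #1 above, "FAILED TO RECEIVE" from TOTEM normally means the interconnect is losing multicast traffic rather than openais itself misbehaving, and on VMware-hosted guests that is the first thing worth ruling out. A minimal sketch of the checks, assuming omping is available on both nodes and that node1-hb/node2-hb resolve to the heartbeat addresses; the token value at the end is only an illustrative starting point, not a recommendation:

# Run on each node at the same time: verifies two-way unicast and multicast delivery
# on the heartbeat network.
omping -c 20 node1-hb node2-hb

# Shows the multicast address, node addresses and flags cman/openais is really using.
cman_tool status

# If the network itself checks out, short scheduling pauses of the VMs can still make
# totem miss messages; a longer token in cluster.conf sometimes helps, e.g.
#   <totem token="30000"/>
# placed directly under <cluster>. Bump config_version and then push the change with:
ccs_tool update /etc/cluster/cluster.conf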