From queszama at yahoo.in Thu Jan 3 10:00:46 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 3 Jan 2013 18:00:46 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster Message-ID: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> Hi All , Need few clarification regarding GFS. I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? Will be great if somebody can clarify by doubts. Thanks in Advance Zaman From swhiteho at redhat.com Thu Jan 3 10:16:38 2013 From: swhiteho at redhat.com (Steven Whitehouse) Date: Thu, 03 Jan 2013 10:16:38 +0000 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> Message-ID: <1357208199.2696.1.camel@menhir> Hi, On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > Hi All , > > > Need few clarification regarding GFS. > > > I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . > > Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? > > Will be great if somebody can clarify by doubts. > > > Thanks in Advance > Zaman > > If you want to use GFS2 without a cluster, then you'll only be able to use it from a single node (just like if you were using ext3 for example). If you want to use GFS2 as intended, with multiple nodes accessing the same filesystem, then you'll need to set up a cluster in order to do so, Steve. From rainer.hartwig.schubert at gmail.com Thu Jan 3 13:21:27 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Thu, 3 Jan 2013 14:21:27 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out Message-ID: Hi, I have created a small CMAN-Cluster with 3 Nodes and a CLVM configuration. Now, I want to add a new node (mynode4). CMAN works fine, cman_tool shows all members: # cman_tool nodes Node Sts Inc Joined Name 1 M 408 2013-01-03 14:00:57 mynode1 2 M 408 2013-01-03 14:00:57 mynode2 3 M 408 2013-01-03 14:00:57 mynode3 4 M 404 2013-01-03 14:00:56 mynode4 cman_tool services (on mynode4) fence domain member count 4 victim count 0 victim now 0 master nodeid 1 wait state none members 1 2 3 4 corosync: corosync-cfgtool -s Printing ring status. Local node ID 4 RING ID 0 id = 10.10.10.13 status = ring 0 active with no faults Everything looks fine, from my site. No I will start clvmd :~# /etc/init.d/clvm start Starting Cluster LVM Daemon: clvm clvmd startup timed out The CLVM runs into a time out. 
My System: cat /etc/debian_version 6.0.6 # lvm version LVM version: 2.02.66(2) (2010-05-20) Library version: 1.02.48 (2010-05-20) Driver version: 4.22.0 dpkg -l |grep clvm ii clvm 2.02.66-5 Cluster LVM Daemon for lvm2 dpkg -l |grep cman ii cman 3.0.12-2 Red Hat cluster suite - cluster manager ii libcman3 3.0.12-2 Red Hat cluster suite - cluster manager libraries Have anybody a idea, what running false? best regards From queszama at yahoo.in Thu Jan 3 13:37:35 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 3 Jan 2013 21:37:35 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357208199.2696.1.camel@menhir> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> Message-ID: <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> ----- Original Message ----- From: Steven Whitehouse To: Zama Ques ; linux clustering Cc: Sent: Thursday, 3 January 2013 3:46 PM Subject: Re: [Linux-cluster] GFS without creating a cluster Hi, On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > Hi All , > > > Need few clarification regarding GFS. > > > I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . > > Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? > > Will be great if somebody can clarify by doubts. > > > Thanks in Advance > Zaman > > > If you want to use GFS2 without a cluster, then you'll only be able to > use it from a single node (just like if you were using ext3 for > example). If you want to use GFS2 as intended, with multiple nodes > accessing the same filesystem, then you'll need to set up a cluster in > order to do so, Thanks Steve for the reply . As you said setting up a cluster is needed to use GFS2 with multiple nodes, does that mean that I need to create cluster.conf or running cluster services (cman etc) should be fine for setting up GFS2. Not sure whether cman will run without creating cluster.conf Assuming that I need to setup cluster.conf in order to use GFS2 , that means if there are two nodes in the cluster with GFS2 as file system resource , GFS2 will be mounted on only one host based on failover domain policy . But our requirement is like that GFS2 should be mounted on both servers at the same time? . Based on my little understanding of GFS , looks to me that I will not be able to achieve this using GFS2 or there are some way to achieve this ? Please clarify on this. Thanks in Advance Zaman
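
For the concurrent-mount question above: a GFS2 filesystem is formatted once with the DLM lock manager and one journal per node, and is then mounted on all cluster nodes at the same time. The commands below are only a sketch; the cluster name, volume group, logical volume and mount point are placeholders, not values taken from this thread.

# On one node only: format the shared LV with one journal per node.
# "mycluster" must match the cluster name set in cluster.conf.
mkfs.gfs2 -p lock_dlm -t mycluster:shared_fs -j 2 /dev/vg_shared/lv_data

# On every node: mount it by hand, or list it in /etc/fstab and let the
# gfs2 init script mount it once cman is up.
mount -t gfs2 /dev/vg_shared/lv_data /data

# Example /etc/fstab line for every node:
/dev/vg_shared/lv_data  /data  gfs2  defaults,noatime  0 0

Because the lock table name embeds the cluster name, cman (with working fencing, as the later replies stress) still has to be configured and running, even though rgmanager and failover domains are not involved.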
From torajveersingh at gmail.com Thu Jan 3 15:08:10 2013 From: torajveersingh at gmail.com (Rajveer Singh) Date: Thu, 3 Jan 2013 20:38:10 +0530 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> Message-ID: On Thu, Jan 3, 2013 at 7:07 PM, Zama Ques wrote: > > > > > > ----- Original Message ----- > From: Steven Whitehouse > To: Zama Ques ; linux clustering < > linux-cluster at redhat.com> > Cc: > Sent: Thursday, 3 January 2013 3:46 PM > Subject: Re: [Linux-cluster] GFS without creating a cluster > > Hi, > > On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > > Hi All , > > > > > > Need few clarification regarding GFS. > > > > > > I need to create a shared file system for our servers . The servers will > write to the shared file system at the same time and there is no > requirement for a cluster . > > > > Planning to use GFS but GFS requires cluster software to be running . My > confusion here is If I just run the cluster software ( cman etc ) without > creating a cluster , will I be able to configure and run GFS2. Also , is it > possible to write to a GFS file system from many servers at the same time ? > > > > Will be great if somebody can clarify by doubts. > > > > > > Thanks in Advance > > Zaman > > > > > > > If you want to use GFS2 without a cluster, then you'll only be able to > > use it from a single node (just like if you were using ext3 for > > example). If you want to use GFS2 as intended, with multiple nodes > > accessing the same filesystem, then you'll need to set up a cluster in > > order to do so, > > Thanks Steve for the reply .
As you said setting up a cluster is needed to > use GFS2 with multiple nodes, does that mean that I need to create > cluster.conf or running cluster services (cman etc) should be fine for > setting up GFS2. Not sure whether cman will run without creating > cluster.conf > > Assuming that I need to setup cluster.conf in order to use GFS2 , that > means if there are two nodes in the cluster with GFS2 as file system > resource , GFS2 will be mounted on only one host based on failover domain > policy . But our requirement is like that GFS2 should be mounted on both > servers at the same time . Based on my little understanding of GFS , looks > to me that I will not be able to achieve this using GFS2 or there are some > way to achieve this ? > > Please clarify on this. > > Hi Zama, As steve said, you must have to configure proper cluster to use GFS2 filesystem and mounted on multiple nodes at the same time so that all can access it. You do not need to configure GFS2 filesystem to be managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab file as like normal ext3 filesystem. I hope, it answers your question. Regards, Rajveer Singh -------------- next part -------------- An HTML attachment was scrubbed... URL: From queszama at yahoo.in Thu Jan 3 15:22:35 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 3 Jan 2013 23:22:35 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> Message-ID: <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> ________________________________ From: Rajveer Singh To: Zama Ques ; linux clustering Cc: Steven Whitehouse Sent: Thursday, 3 January 2013 8:38 PM Subject: Re: [Linux-cluster] GFS without creating a cluster On Thu, Jan 3, 2013 at 7:07 PM, Zama Ques wrote: > > > > >----- Original Message ----- >From: Steven Whitehouse >To: Zama Ques ; linux clustering >Cc: >Sent: Thursday, 3 January 2013 3:46 PM >Subject: Re: [Linux-cluster] GFS without creating a cluster > >Hi, > >On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: >> Hi All , >> >> >> Need few clarification regarding GFS. >> >> >> I need to create a shared file system for our servers . The servers will write to the shared file system at the same time and there is no requirement for a cluster . >> >> Planning to use GFS but GFS requires cluster software to be running . My confusion here is If I just run the cluster software ( cman etc ) without creating a cluster , will I be able to configure and run GFS2. Also , is it possible to write to a GFS file system from many servers at the same time ? >> >> Will be great if somebody can clarify by doubts. >> >> >> Thanks in Advance >> Zaman >> >> > >> If you want to use GFS2 without a cluster, then you'll only be able to >> use it from a single node (just like if you were using ext3 for >> example). If you want to use GFS2 as intended, with multiple nodes >> accessing the same filesystem, then you'll need to set up a cluster in >> order to do so, > >Thanks Steve for the reply . As you said setting up a cluster is needed to use GFS2 with multiple nodes, does that mean that I need to create cluster.conf or running cluster services (cman etc) should be fine for setting up GFS2. 
Not sure whether cman will run without creating cluster.conf > >Assuming that I need to setup cluster.conf in order to use GFS2 , that means if there are two nodes in the cluster with GFS2 as file system resource , GFS2 will be mounted on only one host based on failover domain policy . But our requirement is like that GFS2 should be mounted on both servers at the same time? . Based on my little understanding of GFS , looks to me that I will not be able to achieve this using GFS2 or there are some way to achieve this ? > >Please clarify on this. > > ?> Hi Zama, > As steve said, you must have to configure proper cluster to use GFS2 filesystem and mounted on multiple nodes at the same time so that all can > access it. You do not need to configure GFS2 filesystem to be managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab file as like > normal ext3 filesystem. > I hope, it answers your question. Thanks Rajveer for clarifying . I think I am clear now . Will now try to configure GFS2. Thanks Zaman -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Jan 3 17:17:43 2013 From: lists at alteeve.ca (Digimer) Date: Thu, 03 Jan 2013 12:17:43 -0500 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> Message-ID: <50E5BD37.9080804@alteeve.ca> On 01/03/2013 10:22 AM, Zama Ques wrote: > > > ------------------------------------------------------------------------ > *From:* Rajveer Singh > *To:* Zama Ques ; linux clustering > > *Cc:* Steven Whitehouse > *Sent:* Thursday, 3 January 2013 8:38 PM > *Subject:* Re: [Linux-cluster] GFS without creating a cluster > > > > On Thu, Jan 3, 2013 at 7:07 PM, Zama Ques > wrote: > > > > > > > ----- Original Message ----- > From: Steven Whitehouse > > To: Zama Ques >; linux > clustering > > Cc: > Sent: Thursday, 3 January 2013 3:46 PM > Subject: Re: [Linux-cluster] GFS without creating a cluster > > Hi, > > On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: > > Hi All , > > > > > > Need few clarification regarding GFS. > > > > > > I need to create a shared file system for our servers . The > servers will write to the shared file system at the same time and > there is no requirement for a cluster . > > > > Planning to use GFS but GFS requires cluster software to be > running . My confusion here is If I just run the cluster software ( > cman etc ) without creating a cluster , will I be able to configure > and run GFS2. Also , is it possible to write to a GFS file system > from many servers at the same time ? > > > > Will be great if somebody can clarify by doubts. > > > > > > Thanks in Advance > > Zaman > > > > > > > If you want to use GFS2 without a cluster, then you'll only be able to > > use it from a single node (just like if you were using ext3 for > > example). If you want to use GFS2 as intended, with multiple nodes > > accessing the same filesystem, then you'll need to set up a cluster in > > order to do so, > > Thanks Steve for the reply . As you said setting up a cluster is > needed to use GFS2 with multiple nodes, does that mean that I need > to create cluster.conf or running cluster services (cman etc) should > be fine for setting up GFS2. 
Not sure whether cman will run without > creating cluster.conf > > Assuming that I need to setup cluster.conf in order to use GFS2 , > that means if there are two nodes in the cluster with GFS2 as file > system resource , GFS2 will be mounted on only one host based on > failover domain policy . But our requirement is like that GFS2 > should be mounted on both servers at the same time . Based on my > little understanding of GFS , looks to me that I will not be able to > achieve this using GFS2 or there are some way to achieve this ? > > Please clarify on this. > > > Hi Zama, >> As steve said, you must have to configure proper cluster to use GFS2 > filesystem and mounted on multiple nodes at the same time so that all > can > access it. You do not need to configure GFS2 filesystem to be > managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab > file as like > normal ext3 filesystem. >> I hope, it answers your question. > > Thanks Rajveer for clarifying . I think I am clear now . Will now try to > configure GFS2. > > > Thanks > Zaman Note that you will also need proper fencing setup (usually using the nodes' IPMI interface). Without properly configured, tested fencing, the first time a node fails the GFS2 partition will hang (by design). The reason the cluster is needed is that the access to the shared storage and file system has to be coordinated between the nodes so that one node doesn't step on the other. This is possible thanks to DLM; distributed lock manager. DLM uses the cluster communications, hence the need for the cluster. Note also that you need shared storage, obviously. iSCSI or DRBD if you only have two nodes. Please take a look at this link. It explains in details how this works; https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Thu Jan 3 17:22:02 2013 From: lists at alteeve.ca (Digimer) Date: Thu, 03 Jan 2013 12:22:02 -0500 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: Message-ID: <50E5BE3A.1090006@alteeve.ca> On 01/03/2013 08:21 AM, Rainer Schubert wrote: > Hi, > > I have created a small CMAN-Cluster with 3 Nodes and a CLVM > configuration. Now, I want to add a new node (mynode4). CMAN works > fine, cman_tool shows all members: > > # cman_tool nodes > Node Sts Inc Joined Name > 1 M 408 2013-01-03 14:00:57 mynode1 > 2 M 408 2013-01-03 14:00:57 mynode2 > 3 M 408 2013-01-03 14:00:57 mynode3 > 4 M 404 2013-01-03 14:00:56 mynode4 > > > cman_tool services (on mynode4) > > fence domain > member count 4 > victim count 0 > victim now 0 > master nodeid 1 > wait state none > members 1 2 3 4 > > > corosync: > > corosync-cfgtool -s > Printing ring status. > Local node ID 4 > RING ID 0 > id = 10.10.10.13 > status = ring 0 active with no faults > > Everything looks fine, from my site. No I will start clvmd > > :~# /etc/init.d/clvm start > Starting Cluster LVM Daemon: clvm clvmd startup timed out > > The CLVM runs into a time out. 
> > My System: > > cat /etc/debian_version > 6.0.6 > > # lvm version > LVM version: 2.02.66(2) (2010-05-20) > Library version: 1.02.48 (2010-05-20) > Driver version: 4.22.0 > > dpkg -l |grep clvm > ii clvm 2.02.66-5 > Cluster LVM Daemon for lvm2 > > dpkg -l |grep cman > ii cman 3.0.12-2 > Red Hat cluster suite - cluster manager > ii libcman3 3.0.12-2 > Red Hat cluster suite - cluster manager libraries > > Have anybody a idea, what running false? > > best regards > Can you post your cluster.conf please? Obfuscate as little as you can please. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From radu.rendec at mindbit.ro Thu Jan 3 20:47:11 2013 From: radu.rendec at mindbit.ro (Radu Rendec) Date: Thu, 03 Jan 2013 22:47:11 +0200 Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> Message-ID: <1357246031.9208.127.camel@localhost> On Thu, 2013-01-03 at 21:37 +0800, Zama Ques wrote: > Thanks Steve for the reply . As you said setting up a cluster is > needed to use GFS2 with multiple nodes, does that mean that I need to > create cluster.conf or running cluster services (cman etc) should be > fine for setting up GFS2. Not sure whether cman will run without > creating cluster.conf > > Assuming that I need to setup cluster.conf in order to use GFS2 , that > means if there are two nodes in the cluster with GFS2 as file system > resource , GFS2 will be mounted on only one host based on failover > domain policy . But our requirement is like that GFS2 should be > mounted on both servers at the same time . Based on my little > understanding of GFS , looks to me that I will not be able to achieve > this using GFS2 or there are some way to achieve this ? Hi, This may be a little bit off-topic for this list (as it focuses on the clustering suite) but if all you need is a shared filesystem (without the clustering) you may want to take a look at glusterfs (www.gluster.org). Cheers, Radu From rainer.hartwig.schubert at gmail.com Fri Jan 4 07:26:16 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Fri, 4 Jan 2013 08:26:16 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: <50E5BE3A.1090006@alteeve.ca> References: <50E5BE3A.1090006@alteeve.ca> Message-ID: Hi, my cluster.conf: 2013/1/3 Digimer : > On 01/03/2013 08:21 AM, Rainer Schubert wrote: >> Hi, >> >> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >> configuration. Now, I want to add a new node (mynode4). CMAN works >> fine, cman_tool shows all members: >> >> # cman_tool nodes >> Node Sts Inc Joined Name >> 1 M 408 2013-01-03 14:00:57 mynode1 >> 2 M 408 2013-01-03 14:00:57 mynode2 >> 3 M 408 2013-01-03 14:00:57 mynode3 >> 4 M 404 2013-01-03 14:00:56 mynode4 >> >> >> cman_tool services (on mynode4) >> >> fence domain >> member count 4 >> victim count 0 >> victim now 0 >> master nodeid 1 >> wait state none >> members 1 2 3 4 >> >> >> corosync: >> >> corosync-cfgtool -s >> Printing ring status. >> Local node ID 4 >> RING ID 0 >> id = 10.10.10.13 >> status = ring 0 active with no faults >> >> Everything looks fine, from my site. 
No I will start clvmd >> >> :~# /etc/init.d/clvm start >> Starting Cluster LVM Daemon: clvm clvmd startup timed out >> >> The CLVM runs into a time out. >> >> My System: >> >> cat /etc/debian_version >> 6.0.6 >> >> # lvm version >> LVM version: 2.02.66(2) (2010-05-20) >> Library version: 1.02.48 (2010-05-20) >> Driver version: 4.22.0 >> >> dpkg -l |grep clvm >> ii clvm 2.02.66-5 >> Cluster LVM Daemon for lvm2 >> >> dpkg -l |grep cman >> ii cman 3.0.12-2 >> Red Hat cluster suite - cluster manager >> ii libcman3 3.0.12-2 >> Red Hat cluster suite - cluster manager libraries >> >> Have anybody a idea, what running false? >> >> best regards >> > > Can you post your cluster.conf please? Obfuscate as little as you can > please. > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? From emi2fast at gmail.com Fri Jan 4 08:14:07 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 4 Jan 2013 09:14:07 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> Message-ID: Hello I think you need the fecing 2013/1/4 Rainer Schubert > Hi, > > my cluster.conf: > > > > > > > > > > > > > > > > > 2013/1/3 Digimer : > > On 01/03/2013 08:21 AM, Rainer Schubert wrote: > >> Hi, > >> > >> I have created a small CMAN-Cluster with 3 Nodes and a CLVM > >> configuration. Now, I want to add a new node (mynode4). CMAN works > >> fine, cman_tool shows all members: > >> > >> # cman_tool nodes > >> Node Sts Inc Joined Name > >> 1 M 408 2013-01-03 14:00:57 mynode1 > >> 2 M 408 2013-01-03 14:00:57 mynode2 > >> 3 M 408 2013-01-03 14:00:57 mynode3 > >> 4 M 404 2013-01-03 14:00:56 mynode4 > >> > >> > >> cman_tool services (on mynode4) > >> > >> fence domain > >> member count 4 > >> victim count 0 > >> victim now 0 > >> master nodeid 1 > >> wait state none > >> members 1 2 3 4 > >> > >> > >> corosync: > >> > >> corosync-cfgtool -s > >> Printing ring status. > >> Local node ID 4 > >> RING ID 0 > >> id = 10.10.10.13 > >> status = ring 0 active with no faults > >> > >> Everything looks fine, from my site. No I will start clvmd > >> > >> :~# /etc/init.d/clvm start > >> Starting Cluster LVM Daemon: clvm clvmd startup timed out > >> > >> The CLVM runs into a time out. > >> > >> My System: > >> > >> cat /etc/debian_version > >> 6.0.6 > >> > >> # lvm version > >> LVM version: 2.02.66(2) (2010-05-20) > >> Library version: 1.02.48 (2010-05-20) > >> Driver version: 4.22.0 > >> > >> dpkg -l |grep clvm > >> ii clvm 2.02.66-5 > >> Cluster LVM Daemon for lvm2 > >> > >> dpkg -l |grep cman > >> ii cman 3.0.12-2 > >> Red Hat cluster suite - cluster manager > >> ii libcman3 3.0.12-2 > >> Red Hat cluster suite - cluster manager libraries > >> > >> Have anybody a idea, what running false? > >> > >> best regards > >> > > > > Can you post your cluster.conf please? Obfuscate as little as you can > > please. > > > > -- > > Digimer > > Papers and Projects: https://alteeve.ca/w/ > > What if the cure for cancer is trapped in the mind of a person without > > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emi2fast at gmail.com Fri Jan 4 08:14:41 2013 From: emi2fast at gmail.com (emmanuel segura) Date: Fri, 4 Jan 2013 09:14:41 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> Message-ID: Sorry for my bad english I think you need the fencing 2013/1/4 emmanuel segura > Hello > > I think you need the fecing > > > 2013/1/4 Rainer Schubert > >> Hi, >> >> my cluster.conf: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> 2013/1/3 Digimer : >> > On 01/03/2013 08:21 AM, Rainer Schubert wrote: >> >> Hi, >> >> >> >> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >> >> configuration. Now, I want to add a new node (mynode4). CMAN works >> >> fine, cman_tool shows all members: >> >> >> >> # cman_tool nodes >> >> Node Sts Inc Joined Name >> >> 1 M 408 2013-01-03 14:00:57 mynode1 >> >> 2 M 408 2013-01-03 14:00:57 mynode2 >> >> 3 M 408 2013-01-03 14:00:57 mynode3 >> >> 4 M 404 2013-01-03 14:00:56 mynode4 >> >> >> >> >> >> cman_tool services (on mynode4) >> >> >> >> fence domain >> >> member count 4 >> >> victim count 0 >> >> victim now 0 >> >> master nodeid 1 >> >> wait state none >> >> members 1 2 3 4 >> >> >> >> >> >> corosync: >> >> >> >> corosync-cfgtool -s >> >> Printing ring status. >> >> Local node ID 4 >> >> RING ID 0 >> >> id = 10.10.10.13 >> >> status = ring 0 active with no faults >> >> >> >> Everything looks fine, from my site. No I will start clvmd >> >> >> >> :~# /etc/init.d/clvm start >> >> Starting Cluster LVM Daemon: clvm clvmd startup timed out >> >> >> >> The CLVM runs into a time out. >> >> >> >> My System: >> >> >> >> cat /etc/debian_version >> >> 6.0.6 >> >> >> >> # lvm version >> >> LVM version: 2.02.66(2) (2010-05-20) >> >> Library version: 1.02.48 (2010-05-20) >> >> Driver version: 4.22.0 >> >> >> >> dpkg -l |grep clvm >> >> ii clvm 2.02.66-5 >> >> Cluster LVM Daemon for lvm2 >> >> >> >> dpkg -l |grep cman >> >> ii cman 3.0.12-2 >> >> Red Hat cluster suite - cluster manager >> >> ii libcman3 3.0.12-2 >> >> Red Hat cluster suite - cluster manager libraries >> >> >> >> Have anybody a idea, what running false? >> >> >> >> best regards >> >> >> > >> > Can you post your cluster.conf please? Obfuscate as little as you can >> > please. >> > >> > -- >> > Digimer >> > Papers and Projects: https://alteeve.ca/w/ >> > What if the cure for cancer is trapped in the mind of a person without >> > access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > > -- > esta es mi vida e me la vivo hasta que dios quiera -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Fri Jan 4 14:55:24 2013 From: lists at alteeve.ca (Digimer) Date: Fri, 04 Jan 2013 09:55:24 -0500 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> Message-ID: <50E6ED5C.1090003@alteeve.ca> As Emmanuel said, you need fencing. Please read this: https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing digimer On 01/04/2013 02:26 AM, Rainer Schubert wrote: > Hi, > > my cluster.conf: > > > > > > > > > > > > > > > > > 2013/1/3 Digimer : >> On 01/03/2013 08:21 AM, Rainer Schubert wrote: >>> Hi, >>> >>> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >>> configuration. Now, I want to add a new node (mynode4). 
CMAN works >>> fine, cman_tool shows all members: >>> >>> # cman_tool nodes >>> Node Sts Inc Joined Name >>> 1 M 408 2013-01-03 14:00:57 mynode1 >>> 2 M 408 2013-01-03 14:00:57 mynode2 >>> 3 M 408 2013-01-03 14:00:57 mynode3 >>> 4 M 404 2013-01-03 14:00:56 mynode4 >>> >>> >>> cman_tool services (on mynode4) >>> >>> fence domain >>> member count 4 >>> victim count 0 >>> victim now 0 >>> master nodeid 1 >>> wait state none >>> members 1 2 3 4 >>> >>> >>> corosync: >>> >>> corosync-cfgtool -s >>> Printing ring status. >>> Local node ID 4 >>> RING ID 0 >>> id = 10.10.10.13 >>> status = ring 0 active with no faults >>> >>> Everything looks fine, from my site. No I will start clvmd >>> >>> :~# /etc/init.d/clvm start >>> Starting Cluster LVM Daemon: clvm clvmd startup timed out >>> >>> The CLVM runs into a time out. >>> >>> My System: >>> >>> cat /etc/debian_version >>> 6.0.6 >>> >>> # lvm version >>> LVM version: 2.02.66(2) (2010-05-20) >>> Library version: 1.02.48 (2010-05-20) >>> Driver version: 4.22.0 >>> >>> dpkg -l |grep clvm >>> ii clvm 2.02.66-5 >>> Cluster LVM Daemon for lvm2 >>> >>> dpkg -l |grep cman >>> ii cman 3.0.12-2 >>> Red Hat cluster suite - cluster manager >>> ii libcman3 3.0.12-2 >>> Red Hat cluster suite - cluster manager libraries >>> >>> Have anybody a idea, what running false? >>> >>> best regards >>> >> >> Can you post your cluster.conf please? Obfuscate as little as you can >> please. >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From mkathuria at tuxtechnologies.co.in Sat Jan 5 07:23:36 2013 From: mkathuria at tuxtechnologies.co.in (Manish Kathuria) Date: Sat, 5 Jan 2013 12:53:36 +0530 Subject: [Linux-cluster] Packet loss after configuring Ethernet bonding In-Reply-To: <509DD694.1000900@alteeve.ca> References: <1352514375.40862.YahooMailNeo@web193003.mail.sg3.yahoo.com> <509DC1E9.9090704@alteeve.ca> <1352520739.40244.YahooMailNeo@web193002.mail.sg3.yahoo.com> <509DD694.1000900@alteeve.ca> Message-ID: On Sat, Nov 10, 2012 at 9:52 AM, Digimer wrote: > On 11/09/2012 11:12 PM, Zama Ques wrote: >>> Need help on resolving a issue related to implementing High Availability at network level . I understand that this is not the right forum to ask this question , but since it is related to HA and Linux , I am asking here and I feel somebody here will have answer to the issues I am facing . >>> >>> I am trying to implement Ethernet Bonding , Both the interface in my server are connected to two different network switches . 
>>> >>> My configuration is as follows: >>> >>> ======== >>> # cat /proc/net/bonding/bond0 >>> >>> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009) >>> >>> Bonding Mode: adaptive load balancing Primary Slave: None Currently >>> Active Slave: eth0 MII Status: up MII Polling Interval (ms): 0 Up Delay >>> (ms): 0 Down Delay (ms): 0 >>> >>> Slave Interface: eth0 MII Status: up Speed: 1000 Mbps Duplex: full Link >>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:10 Slave queue ID: 0 >>> >>> Slave Interface: eth1 MII Status: up Speed: 1000 Mbps Duplex: full Link >>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:14 Slave queue ID: 0 >>> ------------ >>> # cat /sys/class/net/bond0/bonding/mode >>> >>> balance-alb 6 >>> >>> >>> # cat /sys/class/net/bond0/bonding/miimon >>> 0 >>> >>> ============ >>> >>> >>> The issue for me is that I am seeing packet loss after configuring bonding . Tried connecting both the interface to the same switch , but still seeing the packet loss . Also , tried changing miimon value to 100 , but still seeing the packet loss. >>> >>> What I am missing in the configuration ? Any help will be highly appreciated in resolving the problem . >>> >>> >>> >>> Thanks >>> Zaman >> >> > You didn't share any details on your configuration, but I will assume >>> you are using corosync. >> >>> The only supported bonding mode is Active/Passive (mode=1). I've >>> personally tried all modes, out of curiosity, and all had problems. The >>> short of it is that if you need more that 1 gbit of performance, buy >>> faster cards. >> >>> If you are interested in what I use, it's documented here: >> >>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network >> >>> I've used this setup in several production clusters and have tested >>> failure are recovery extensively. It's proven very stable. :) >> >> >> Thanks Digimer for the quick response and pointing me to the link . I am yet to reach cluster configuration , initially trying to understand ethernet bonding before going into cluster configuration. So , option for me is only to use Active/Passive bonding mode in case of clustered environment. >> Few more clarifications needed , Can we use other bonding modes in non clustered environment . I am seeing packet loss in other modes . Also , the support of using only mode=1 in cluster environment is it a restriction of RHEL Cluster suite or it is by design . >> >> Will be great if you clarify these queries . >> >> Thanks in Advance >> Zaman > > Corosync is the only actively developed/supported (HA) cluster > communications and membership tool. It's used on all modern distros for > clustering and the requirement for mode=1 is with it. As such, it > doesn't matter which OS you are on, it's the only mode that will work > (reliably). > > The problem is that corosync needs to detect state changes quickly. It > does this using the totem protocol (which serves other purposes), which > passes a token around the nodes in the cluster. If a node is sent a > token and the token is not returned within a time-out period, it is > declared lost and a new token is dispatched. Once too many failures > occur in a row, the node is declared lost and it is ejected from the > cluster. This process is detailed in the link above under the "Concept; > Fencing" section. > > With all modes other than mode=1, the failure recovery and/or the > restoration of a link in the bond causes a sufficient disruption to > cause a node to be declared lost. 
As I mentioned, this matches my > experience in testing the other modes. It isn't an arbitrary rule. > > As for non-clustered traffic; the usefulness of other bond modes depends > entirely on the traffic you are pushing over it. Personally, I am > focused on HA in clusters, so I only use mode=1, regardless of the > traffic designed for it. > > digimer I was dealing with an issue where network performance had to be improved in a high availability cluster and while going through the archives I saw this thread. Would this condition of bonding mode being 1 (or active backup) also apply when we have different interfaces for cluster communication and service networks ? In such a scenario, can't we have the bonding mode for the cluster communication network interfaces as 1 and the bonding mode for the interfaces on service network as 0 or 5 (or any other suitable mode) ? Thanks, -- Manish From lists at alteeve.ca Sat Jan 5 19:20:10 2013 From: lists at alteeve.ca (Digimer) Date: Sat, 05 Jan 2013 14:20:10 -0500 Subject: [Linux-cluster] Packet loss after configuring Ethernet bonding In-Reply-To: References: <1352514375.40862.YahooMailNeo@web193003.mail.sg3.yahoo.com> <509DC1E9.9090704@alteeve.ca> <1352520739.40244.YahooMailNeo@web193002.mail.sg3.yahoo.com> <509DD694.1000900@alteeve.ca> Message-ID: <50E87CEA.6090609@alteeve.ca> On 01/05/2013 02:23 AM, Manish Kathuria wrote: > On Sat, Nov 10, 2012 at 9:52 AM, Digimer wrote: >> On 11/09/2012 11:12 PM, Zama Ques wrote: > >>>> Need help on resolving a issue related to implementing High Availability at network level . I understand that this is not the right forum to ask this question , but since it is related to HA and Linux , I am asking here and I feel somebody here will have answer to the issues I am facing . >>>> >>>> I am trying to implement Ethernet Bonding , Both the interface in my server are connected to two different network switches . >>>> >>>> My configuration is as follows: >>>> >>>> ======== >>>> # cat /proc/net/bonding/bond0 >>>> >>>> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009) >>>> >>>> Bonding Mode: adaptive load balancing Primary Slave: None Currently >>>> Active Slave: eth0 MII Status: up MII Polling Interval (ms): 0 Up Delay >>>> (ms): 0 Down Delay (ms): 0 >>>> >>>> Slave Interface: eth0 MII Status: up Speed: 1000 Mbps Duplex: full Link >>>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:10 Slave queue ID: 0 >>>> >>>> Slave Interface: eth1 MII Status: up Speed: 1000 Mbps Duplex: full Link >>>> Failure Count: 0 Permanent HW addr: e4:e1:5b:d0:11:14 Slave queue ID: 0 >>>> ------------ >>>> # cat /sys/class/net/bond0/bonding/mode >>>> >>>> balance-alb 6 >>>> >>>> >>>> # cat /sys/class/net/bond0/bonding/miimon >>>> 0 >>>> >>>> ============ >>>> >>>> >>>> The issue for me is that I am seeing packet loss after configuring bonding . Tried connecting both the interface to the same switch , but still seeing the packet loss . Also , tried changing miimon value to 100 , but still seeing the packet loss. >>>> >>>> What I am missing in the configuration ? Any help will be highly appreciated in resolving the problem . >>>> >>>> >>>> >>>> Thanks >>>> Zaman >>> >>> > You didn't share any details on your configuration, but I will assume >>>> you are using corosync. >>> >>>> The only supported bonding mode is Active/Passive (mode=1). I've >>>> personally tried all modes, out of curiosity, and all had problems. The >>>> short of it is that if you need more that 1 gbit of performance, buy >>>> faster cards. 
>>> >>>> If you are interested in what I use, it's documented here: >>> >>>> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network >>> >>>> I've used this setup in several production clusters and have tested >>>> failure are recovery extensively. It's proven very stable. :) >>> >>> >>> Thanks Digimer for the quick response and pointing me to the link . I am yet to reach cluster configuration , initially trying to understand ethernet bonding before going into cluster configuration. So , option for me is only to use Active/Passive bonding mode in case of clustered environment. >>> Few more clarifications needed , Can we use other bonding modes in non clustered environment . I am seeing packet loss in other modes . Also , the support of using only mode=1 in cluster environment is it a restriction of RHEL Cluster suite or it is by design . >>> >>> Will be great if you clarify these queries . >>> >>> Thanks in Advance >>> Zaman >> >> Corosync is the only actively developed/supported (HA) cluster >> communications and membership tool. It's used on all modern distros for >> clustering and the requirement for mode=1 is with it. As such, it >> doesn't matter which OS you are on, it's the only mode that will work >> (reliably). >> >> The problem is that corosync needs to detect state changes quickly. It >> does this using the totem protocol (which serves other purposes), which >> passes a token around the nodes in the cluster. If a node is sent a >> token and the token is not returned within a time-out period, it is >> declared lost and a new token is dispatched. Once too many failures >> occur in a row, the node is declared lost and it is ejected from the >> cluster. This process is detailed in the link above under the "Concept; >> Fencing" section. >> >> With all modes other than mode=1, the failure recovery and/or the >> restoration of a link in the bond causes a sufficient disruption to >> cause a node to be declared lost. As I mentioned, this matches my >> experience in testing the other modes. It isn't an arbitrary rule. >> >> As for non-clustered traffic; the usefulness of other bond modes depends >> entirely on the traffic you are pushing over it. Personally, I am >> focused on HA in clusters, so I only use mode=1, regardless of the >> traffic designed for it. >> >> digimer > > I was dealing with an issue where network performance had to be > improved in a high availability cluster and while going through the > archives I saw this thread. > > Would this condition of bonding mode being 1 (or active backup) also > apply when we have different interfaces for cluster communication and > service networks ? In such a scenario, can't we have the bonding mode > for the cluster communication network interfaces as 1 and the bonding > mode for the interfaces on service network as 0 or 5 (or any other > suitable mode) ? > > Thanks, > -- > Manish That should be fine. Note though that if you use your other network as a backup totem ring, and for some reason corosync fails over to that ring, it will fail back again if a member in the non-mode=1 bond hiccups or fails. I've not tested this though, of course, so there might be a gotcha I don't know about. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
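
To put the mode=1 advice above in concrete terms, an active-backup bond on a RHEL-style system is usually defined with ifcfg files along the following lines; the device names, IP address and option values here are illustrative, not taken from the thread.

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.20.0.1
NETMASK=255.255.255.0
# mode=1 is active-backup; miimon polls the link state every 100 ms
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 primary=eth0"

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is the same apart from DEVICE)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

Once the bond is up, /proc/net/bonding/bond0 (the file quoted earlier in this thread) should report "Bonding Mode: fault-tolerance (active-backup)" with one active slave and the other interface standing by.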
From queszama at yahoo.in Sun Jan 6 02:35:30 2013 From: queszama at yahoo.in (Zama Ques) Date: Sun, 6 Jan 2013 10:35:30 +0800 (SGT) Subject: [Linux-cluster] GFS without creating a cluster In-Reply-To: <50E5BD37.9080804@alteeve.ca> References: <1357207246.26063.YahooMailNeo@web193504.mail.sg3.yahoo.com> <1357208199.2696.1.camel@menhir> <1357220255.13131.YahooMailNeo@web193506.mail.sg3.yahoo.com> <1357226555.85245.YahooMailNeo@web193503.mail.sg3.yahoo.com> <50E5BD37.9080804@alteeve.ca> Message-ID: <1357439730.4360.YahooMailNeo@web193506.mail.sg3.yahoo.com> ________________________________ From: Digimer To: Zama Ques ; linux clustering Cc: Rajveer Singh Sent: Thursday, 3 January 2013 10:47 PM Subject: Re: [Linux-cluster] GFS without creating a cluster On 01/03/2013 10:22 AM, Zama Ques wrote: >? ? ----- Original Message ----- >? ? From: Steven Whitehouse ? ? > >? ? To: Zama Ques >; linux >? ? clustering > >? ? Cc: >? ? Sent: Thursday, 3 January 2013 3:46 PM >? ? Subject: Re: [Linux-cluster] GFS without creating a cluster > >? ? Hi, > >? ? On Thu, 2013-01-03 at 18:00 +0800, Zama Ques wrote: >? ? > Hi All , >? ? > >? ? > >? ? > Need few clarification regarding GFS. >? ? > >? ? > >? ? > I need to create a shared file system for our servers . The >? ? servers will write to the shared file system at the same time and >? ? there is no requirement for a cluster . >? ? > >? ? > Planning to use GFS but GFS requires cluster software to be >? ? running . My confusion here is If I just run the cluster software ( >? ? cman etc ) without creating a cluster , will I be able to configure >? ? and run GFS2. Also , is it possible to write to a GFS file system >? ? from many servers at the same time ? >? ? > >? ? > Will be great if somebody can clarify by doubts. >? ? > >? ? > >? ? > Thanks in Advance >? ? > Zaman >? ? > >? ? > > >? ? > If you want to use GFS2 without a cluster, then you'll only be able to >? ? > use it from a single node (just like if you were using ext3 for >? ? > example). If you want to use GFS2 as intended, with multiple nodes >? ? > accessing the same filesystem, then you'll need to set up a cluster in >? ? > order to do so, > >? ? Thanks Steve for the reply . As you said setting up a cluster is >? ? needed to use GFS2 with multiple nodes, does that mean that I need >? ? to create cluster.conf or running cluster services (cman etc) should >? ? be fine for setting up GFS2. Not sure whether cman will run without >? ? creating cluster.conf > >? ? Assuming that I need to setup cluster.conf in order to use GFS2 , >? ? that means if there are two nodes in the cluster with GFS2 as file >? ? system resource , GFS2 will be mounted on only one host based on >? ? failover domain policy . But our requirement is like that GFS2 >? ? should be mounted on both servers at the same time? . Based on my >? ? little understanding of GFS , looks to me that I will not be able to >? ? achieve this using GFS2 or there are some way to achieve this ? > >? ? Please clarify on this. > >? > Hi Zama, >> As steve said, you must have to configure proper cluster to use GFS2 > filesystem and mounted on multiple nodes at the same time so that all > can > access it. You do not need to configure GFS2 filesystem to be > managed by cluster i.e. rgmanager. but just make the entry in /etc/fstab > file as like > normal ext3 filesystem. >> I hope, it answers your question. > > Thanks Rajveer for clarifying . I think I am clear now . Will now try to > configure GFS2. 
> > > Thanks > Zaman > Note that you will also need proper fencing setup (usually using the > nodes' IPMI interface). Without properly configured, tested fencing, the > first time a node fails the GFS2 partition will hang (by design). > The reason the cluster is needed is that the access to the shared > storage and file system has to be coordinated between the nodes so that > one node doesn't step on the other. This is possible thanks to DLM; > distributed lock manager. DLM uses the cluster communications, hence the > need for the cluster. > Note also that you need shared storage, obviously. iSCSI or DRBD if you > only have two nodes. ? > Please take a look at this link. It explains in details how this works; >? https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing Thanks Digimer for pointing the need of proper fencing setup . After configuring GFS , I did power down on one of the node and could see that the GFS mount point got hung on the other host as you have pointed out . Will now try to add fencing to the cluster. We are using HP Storage works for shared storage and accessing space from it using multipathing. Thanks Zaman Thanks Digimer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainer.hartwig.schubert at gmail.com Mon Jan 7 07:55:44 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Mon, 7 Jan 2013 08:55:44 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: <50E6ED5C.1090003@alteeve.ca> References: <50E5BE3A.1090006@alteeve.ca> <50E6ED5C.1090003@alteeve.ca> Message-ID: Hi, thank you for the fast answer. Now I have one question: - It is possible to integrate the fencing-service on a working cluster? I have working virtualisation enviroment with 50 VMs, so i can't take them down. best regards 2013/1/4 Digimer : > As Emmanuel said, you need fencing. > > Please read this: > > https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing > > digimer > > On 01/04/2013 02:26 AM, Rainer Schubert wrote: >> Hi, >> >> my cluster.conf: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> 2013/1/3 Digimer : >>> On 01/03/2013 08:21 AM, Rainer Schubert wrote: >>>> Hi, >>>> >>>> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >>>> configuration. Now, I want to add a new node (mynode4). CMAN works >>>> fine, cman_tool shows all members: >>>> >>>> # cman_tool nodes >>>> Node Sts Inc Joined Name >>>> 1 M 408 2013-01-03 14:00:57 mynode1 >>>> 2 M 408 2013-01-03 14:00:57 mynode2 >>>> 3 M 408 2013-01-03 14:00:57 mynode3 >>>> 4 M 404 2013-01-03 14:00:56 mynode4 >>>> >>>> >>>> cman_tool services (on mynode4) >>>> >>>> fence domain >>>> member count 4 >>>> victim count 0 >>>> victim now 0 >>>> master nodeid 1 >>>> wait state none >>>> members 1 2 3 4 >>>> >>>> >>>> corosync: >>>> >>>> corosync-cfgtool -s >>>> Printing ring status. >>>> Local node ID 4 >>>> RING ID 0 >>>> id = 10.10.10.13 >>>> status = ring 0 active with no faults >>>> >>>> Everything looks fine, from my site. No I will start clvmd >>>> >>>> :~# /etc/init.d/clvm start >>>> Starting Cluster LVM Daemon: clvm clvmd startup timed out >>>> >>>> The CLVM runs into a time out. 
>>>> >>>> My System: >>>> >>>> cat /etc/debian_version >>>> 6.0.6 >>>> >>>> # lvm version >>>> LVM version: 2.02.66(2) (2010-05-20) >>>> Library version: 1.02.48 (2010-05-20) >>>> Driver version: 4.22.0 >>>> >>>> dpkg -l |grep clvm >>>> ii clvm 2.02.66-5 >>>> Cluster LVM Daemon for lvm2 >>>> >>>> dpkg -l |grep cman >>>> ii cman 3.0.12-2 >>>> Red Hat cluster suite - cluster manager >>>> ii libcman3 3.0.12-2 >>>> Red Hat cluster suite - cluster manager libraries >>>> >>>> Have anybody a idea, what running false? >>>> >>>> best regards >>>> >>> >>> Can you post your cluster.conf please? Obfuscate as little as you can >>> please. >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? From misch at schwartzkopff.org Mon Jan 7 08:59:25 2013 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Mon, 07 Jan 2013 09:59:25 +0100 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E6ED5C.1090003@alteeve.ca> Message-ID: <2810431.8ivETSkJLk@nb003> Am Montag, 7. Januar 2013, 08:55:44 schrieb Rainer Schubert: > Hi, > > thank you for the fast answer. Now I have one question: > > - It is possible to integrate the fencing-service on a working cluster? > > I have working virtualisation enviroment with 50 VMs, so i can't take > them down. > > best regards Hi, yes you can update your setup while the cluster is running. See the doc of cman. Please be careful when setting up and testing the fencing while your cluster provides services. Greetings, -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 M?nchen Tel: (0163) 172 50 98 Fax: (089) 620 304 13 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Mon Jan 7 15:46:55 2013 From: lists at alteeve.ca (Digimer) Date: Mon, 07 Jan 2013 10:46:55 -0500 Subject: [Linux-cluster] CMAN & CLVM clvmd startup timed out In-Reply-To: References: <50E5BE3A.1090006@alteeve.ca> <50E6ED5C.1090003@alteeve.ca> Message-ID: <50EAEDEF.3050505@alteeve.ca> Technically, yes. Practically, no. You need to know if your configuration is working. The best way to do that is to simulate a failure and watch to make sure that the fence actions happen. I would strongly recommend scheduling down time to do this. digimer On 01/07/2013 02:55 AM, Rainer Schubert wrote: > Hi, > > thank you for the fast answer. Now I have one question: > > - It is possible to integrate the fencing-service on a working cluster? > > I have working virtualisation enviroment with 50 VMs, so i can't take > them down. > > best regards > > > > 2013/1/4 Digimer : >> As Emmanuel said, you need fencing. >> >> Please read this: >> >> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing >> >> digimer >> >> On 01/04/2013 02:26 AM, Rainer Schubert wrote: >>> Hi, >>> >>> my cluster.conf: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> 2013/1/3 Digimer : >>>> On 01/03/2013 08:21 AM, Rainer Schubert wrote: >>>>> Hi, >>>>> >>>>> I have created a small CMAN-Cluster with 3 Nodes and a CLVM >>>>> configuration. Now, I want to add a new node (mynode4). 
CMAN works >>>>> fine, cman_tool shows all members: >>>>> >>>>> # cman_tool nodes >>>>> Node Sts Inc Joined Name >>>>> 1 M 408 2013-01-03 14:00:57 mynode1 >>>>> 2 M 408 2013-01-03 14:00:57 mynode2 >>>>> 3 M 408 2013-01-03 14:00:57 mynode3 >>>>> 4 M 404 2013-01-03 14:00:56 mynode4 >>>>> >>>>> >>>>> cman_tool services (on mynode4) >>>>> >>>>> fence domain >>>>> member count 4 >>>>> victim count 0 >>>>> victim now 0 >>>>> master nodeid 1 >>>>> wait state none >>>>> members 1 2 3 4 >>>>> >>>>> >>>>> corosync: >>>>> >>>>> corosync-cfgtool -s >>>>> Printing ring status. >>>>> Local node ID 4 >>>>> RING ID 0 >>>>> id = 10.10.10.13 >>>>> status = ring 0 active with no faults >>>>> >>>>> Everything looks fine, from my site. No I will start clvmd >>>>> >>>>> :~# /etc/init.d/clvm start >>>>> Starting Cluster LVM Daemon: clvm clvmd startup timed out >>>>> >>>>> The CLVM runs into a time out. >>>>> >>>>> My System: >>>>> >>>>> cat /etc/debian_version >>>>> 6.0.6 >>>>> >>>>> # lvm version >>>>> LVM version: 2.02.66(2) (2010-05-20) >>>>> Library version: 1.02.48 (2010-05-20) >>>>> Driver version: 4.22.0 >>>>> >>>>> dpkg -l |grep clvm >>>>> ii clvm 2.02.66-5 >>>>> Cluster LVM Daemon for lvm2 >>>>> >>>>> dpkg -l |grep cman >>>>> ii cman 3.0.12-2 >>>>> Red Hat cluster suite - cluster manager >>>>> ii libcman3 3.0.12-2 >>>>> Red Hat cluster suite - cluster manager libraries >>>>> >>>>> Have anybody a idea, what running false? >>>>> >>>>> best regards >>>>> >>>> >>>> Can you post your cluster.conf please? Obfuscate as little as you can >>>> please. >>>> >>>> -- >>>> Digimer >>>> Papers and Projects: https://alteeve.ca/w/ >>>> What if the cure for cancer is trapped in the mind of a person without >>>> access to education? >> >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From tc3driver at gmail.com Mon Jan 7 19:25:14 2013 From: tc3driver at gmail.com (Bill G.) Date: Mon, 7 Jan 2013 11:25:14 -0800 Subject: [Linux-cluster] Please settle a bet for me Message-ID: Hi list, We are having a discussion about clustering on RHEL 5.2 and 5.4. Knowing that there are no supported fence devices for VMWare 4.1 and the given versions of RHEL. As far as I can tell there must be a fence device for any type of automatic fail over, but coworkers are insisting that if a clustered process dies it can and will automatically start/fail over... it just won't if there is a hardware failure. They are also saying that you will be able to move processes from one node to another without fencing. I am insisting that clustering does not work without an fence device. Please settle this :) -- Thanks, Bill G. tc3driver at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From arpittolani at gmail.com Mon Jan 7 19:50:18 2013 From: arpittolani at gmail.com (Arpit Tolani) Date: Tue, 8 Jan 2013 01:20:18 +0530 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: References: Message-ID: Hello On Tue, Jan 8, 2013 at 12:55 AM, Bill G. wrote: > Hi list, > > We are having a discussion about clustering on RHEL 5.2 and 5.4. > > Knowing that there are no supported fence devices for VMWare 4.1 and the > given versions of RHEL. 
> > As far as I can tell there must be a fence device for any type of automatic > fail over, but coworkers are insisting that if a clustered process dies it > can and will automatically start/fail over... it just won't if there is a > hardware failure. They are also saying that you will be able to move > processes from one node to another without fencing. > > I am insisting that clustering does not work without an fence device. > > Please settle this :) > Yes, fencing mandatory with cluster, Without fencing your cluster will fail to work. Refer https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Concept.3B_Fencing If you have Red Hat Support, Check below Kbase. https://access.redhat.com/knowledge/solutions/15575 Even manual fencing is not supported i.e. fence_manual. Hope that helps. Regards Arpit Tolani From sam at dotsec.com Mon Jan 7 22:29:28 2013 From: sam at dotsec.com (Sam Wilson) Date: Tue, 08 Jan 2013 08:29:28 +1000 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: References: Message-ID: <50EB4C48.8020209@dotsec.com> Hi Bill, As far as I have experienced under pacemaker this is true in most cases. EG: Two nodes running a master/slave httpd will fail over without fencing. However, if for example your nodes are also using GFS2 and something goes wrong then you will find your filesystem locked by DLM which will obviously break fail over for services on that filesystem. In short, best to configure fencing unless this is a lab environment your willing to break! Cheers, Sam From lists at alteeve.ca Tue Jan 8 01:24:34 2013 From: lists at alteeve.ca (Digimer) Date: Mon, 07 Jan 2013 20:24:34 -0500 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: <50EB4C48.8020209@dotsec.com> References: <50EB4C48.8020209@dotsec.com> Message-ID: <50EB7552.30903@alteeve.ca> On 01/07/2013 05:29 PM, Sam Wilson wrote: > Hi Bill, > > As far as I have experienced under pacemaker this is true in most cases. > EG: Two nodes running a master/slave httpd will fail over without fencing. > > However, if for example your nodes are also using GFS2 and something > goes wrong then you will find your filesystem locked by DLM which will > obviously break fail over for services on that filesystem. > > In short, best to configure fencing unless this is a lab environment > your willing to break! > > Cheers, > > Sam DLM absolutely requires fencing, but even without it, a production cluster without fencing is a bad day waiting to happen. Please always use fencing... It will save you far more headache in the long run. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From ccaulfie at redhat.com Tue Jan 8 09:22:30 2013 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 08 Jan 2013 09:22:30 +0000 Subject: [Linux-cluster] Please settle a bet for me In-Reply-To: References: Message-ID: <50EBE556.1030309@redhat.com> On 07/01/13 19:25, Bill G. wrote: > Hi list, > > We are having a discussion about clustering on RHEL 5.2 and 5.4. > > Knowing that there are no supported fence devices for VMWare 4.1 and the > given versions of RHEL. > > As far as I can tell there must be a fence device for any type of > automatic fail over, but coworkers are insisting that if a clustered > process dies it can and will automatically start/fail over... it just > won't if there is a hardware failure. They are also saying that you will > be able to move processes from one node to another without fencing. 
> > I am insisting that clustering does not work without an fence device. > > Please settle this :) Clustering will work without a fence device but no-one will support you doing it. In the vast majority of cases you are risking your data. In particular cases you can run without fencing if you really, really, really know what you are doing (ie are on the dev team) and have a particular workload. But if you misjudge the installation ... see above :) Chrissie From rossnick-lists at cybercat.ca Wed Jan 9 15:50:25 2013 From: rossnick-lists at cybercat.ca (Nicolas Ross) Date: Wed, 09 Jan 2013 10:50:25 -0500 Subject: [Linux-cluster] Moving Physical extents from one PV to another in a clustered environement. In-Reply-To: <50C9FFEF.6030303@cybercat.ca> References: <50C8C89F.9080200@cybercat.ca> <20121212183937.GI14097@squishy.elizium.za.net> <50C8DB71.7010009@cybercat.ca> <20121212202905.GJ14097@squishy.elizium.za.net> <50C94177.8070706@cybercat.ca> <00386E1F-FEC6-4CD2-8BB8-8C61A48E17DE@gmail.com> <50C9E757.4080005@cybercat.ca> <50C9E91A.1080807@redhat.com> <50C9FFEF.6030303@cybercat.ca> Message-ID: <50ED91C1.7020207@cybercat.ca> >> You will need to install 'cmirror' package(s),and start cmirror service >> on all cluster nodes >> >> # service cmirror start >> >> After that pvmove should work > No it didn't. I posted in a previous email what it did, It complains > that it cannot lock the vg. Just a quick note on this issue. Yesterday, I installed the latest versions of the kernel and rebooted the whole cluster, and now I can move my lv to another PV ! So cmirrord was indeed the solutiuon. Regards, From gounini.geekarea at gmail.com Fri Jan 11 12:02:18 2013 From: gounini.geekarea at gmail.com (GouNiNi Geekarea) Date: Fri, 11 Jan 2013 13:02:18 +0100 (CET) Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> Message-ID: <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Hello everyone, I didn't find any simple solution to send emails to alerte when rgmanager decides to relocate services. Do you know simple solution other than create a script ressource? Regards, From robejrm at gmail.com Fri Jan 11 14:15:05 2013 From: robejrm at gmail.com (Juan Ramon Martin Blanco) Date: Fri, 11 Jan 2013 15:15:05 +0100 Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> References: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Message-ID: On Fri, Jan 11, 2013 at 1:02 PM, GouNiNi Geekarea < gounini.geekarea at gmail.com> wrote: > Hello everyone, > > I didn't find any simple solution to send emails to alerte when rgmanager > decides to relocate services. > Do you know simple solution other than create a script ressource? > > Regards, > > You can parse cluster logs (if in debug mode) and send mail if something like: "clurgmgrd[30752]: Sent remote-start request to 2" or "attempting to relocate" happens on them. I.e: you can use rsyslog ommail http://www.rsyslog.com/doc/ommail.html Greetings, Juanra > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gounini.geekarea at gmail.com Fri Jan 11 14:05:09 2013 From: gounini.geekarea at gmail.com (GouNiNi Geekarea) Date: Fri, 11 Jan 2013 15:05:09 +0100 (CET) Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: References: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Message-ID: <1681853954.12472.1357913109168.JavaMail.root@geekarea.fr> Good idea, does it mean there is nothing built in rgmanager ? ----- Mail original ----- > De: "Juan Ramon Martin Blanco" > ?: "linux clustering" > Envoy?: Vendredi 11 Janvier 2013 15:15:05 > Objet: Re: [Linux-cluster] [rgmanager] sending email on relocate > > > > On Fri, Jan 11, 2013 at 1:02 PM, GouNiNi Geekarea < > gounini.geekarea at gmail.com > wrote: > > > Hello everyone, > > I didn't find any simple solution to send emails to alerte when > rgmanager decides to relocate services. > Do you know simple solution other than create a script ressource? > > Regards, > > You can parse cluster logs (if in debug mode) and send mail if > something like: > "clurgmgrd[30752]: Sent remote-start request to 2" > or > "attempting to relocate" > happens on them. > > I.e: you can use rsyslog ommail > http://www.rsyslog.com/doc/ommail.html > > Greetings, > Juanra > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From christiangrassi at gmail.com Sat Jan 12 00:24:20 2013 From: christiangrassi at gmail.com (Christian Grassi) Date: Sat, 12 Jan 2013 01:24:20 +0100 Subject: [Linux-cluster] Corosync softlookup In-Reply-To: References: Message-ID: Hi all, I have a three node cluster which run KVM guests a services. The system run fine for some months but the suddenly it started to have soft lockups as you can se below and the nodes get fenced. The guests use clvm with raw lv as back end, and the config files are on shared gfs2 file systems. Any idea which could be the cause ? A attache also my cluster.conf Any idea is welcome Regards Chris Pid: 136556, comm: corosync Not tainted 2.6.32-279.el6.x86_64 #1 HP ProLiant DL980 G7 RIP: 0010:[] [] wait_for_rqlock+0x2e/0x40 RSP: 0018:ffff881c12231ee8 EFLAGS: 00000206 RAX: 00000000e52ae4c7 RBX: ffff881c12231ee8 RCX: ffff882070e16680 RDX: 00000000e52ae4c7 RSI: ffff882070e11960 RDI: 0000000000000000 RBP: ffffffff8100bc0e R08: 0000000000000000 R09: dead000000200200 R10: ffff881c125830c0 R11: 00000000000000d2 R12: 0000000000000282 R13: ffffffff81aa5700 R14: ffff882070e11960 R15: ffff881c12583438 FS: 0000000000000000(0000) GS:ffff882070e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000035a489a490 CR3: 0000000001a85000 CR4: 00000000000026e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process corosync (pid: 136556, threadinfo ffff881c12230000, task ffff881c12582aa0) Stack: ffff881c12231f68 ffffffff8107091b ffff881c12231f78 ffff881c12231f28 ffff881faf1d5660 ffff881c12582f68 ffff881c12582f68 0000000000000000 ffff881c12231f28 ffff881c12231f28 ffff881c12231f78 00007f9ce339d440 Call Trace: [] ? do_exit+0x5ab/0x870 [] ? sys_exit+0x17/0x20 [] ? system_call_fastpath+0x16/0x1b Code: e5 0f 1f 44 00 00 48 c7 c0 80 66 01 00 65 48 8b 0c 25 b0 e0 00 00 0f ae f0 48 01 c1 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 01 89 c2 fa 10 66 39 c2 75 f2 c9 c3 0f 1f 84 00 00 00 00 00 55 48 89 Call Trace: [] ? do_exit+0x5ab/0x870 [] ? sys_exit+0x17/0x20 [] ? 
system_call_fastpath+0x16/0x1b BUG: soft lockup - CPU#90 stuck for 67s! [multipathd:141345] Modules linked in: iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables gfs2 dlm configfs autofs4 sunrpc bridge bonding 8021q garp stp llc ipv6 ext2 vhost_net macvtap macvlan tun kvm_intel kvm microcode serio_raw power_meter be2net bnx2 netxen_nic iTCO_wdt iTCO_vendor_support hpilo hpwdt sg i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: application/octet-stream Size: 7783 bytes Desc: not available URL: From jsosic at srce.hr Mon Jan 14 01:47:56 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Mon, 14 Jan 2013 02:47:56 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs Message-ID: <50F363CC.3020308@srce.hr> Hi. I'm using CentOS 6, and have a problem with ccs & ricci. At first use, ccs asks for password for each node. After that, ~/.ccs is generated with cert in it. 1. I've found how to generate private key in ~/.ccs from the code in ccs python executable (/usr/sbin/ccs). 2. I've also found how to generate CA in /var/lib/ricci/certs => code for that can be found in init script of ricci (/etc/init.d/ricci). But what I am missing is how to use the user key/certificate from step 1 and sign it into CA in step 2? I'm building puppet module which will autoconfigure whole cluster from bare metal to working state. So far my only problem is updating cluster.conf, for which I need fully working ricci CA and user certificates in /root/.ccs of every node... So, any ideas are welcome. -- Jakov Sosic www.srce.unizg.hr From Ralph.Grothe at itdz-berlin.de Wed Jan 16 09:19:19 2013 From: Ralph.Grothe at itdz-berlin.de (Ralph.Grothe at itdz-berlin.de) Date: Wed, 16 Jan 2013 10:19:19 +0100 Subject: [Linux-cluster] [rgmanager] sending email on relocate In-Reply-To: <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> References: <597884669.12276.1357905595375.JavaMail.root@geekarea.fr> <1060367169.12284.1357905738867.JavaMail.root@geekarea.fr> Message-ID: I have implemented this on our RHCS clusters where such a feature, as e.g. sending out a notification SMS text message when a service does relocate, was requested by the users/admins of this service, by simply adding a script function to the RHCS resource agent (RA) code (i.e. mostly a custom script of RHCS RA type "script", very similar to a SysV init script) and placing an invocation statement to this function in the RA script's start and stop blocks with applicable subject line and body text. I found the SWAKS client ( http://www.jetmore.org/john/code/swaks/ ) nifty to this end because it's easy to use and offers (almost?) complete control over SMTP communication. If I feel the urge to dig deeper I usually use the CPAN module Mail::Sender ( http://search.cpan.org/~jenda/Mail-Sender-0.8.22/Sender.pm ) with a few helper modules should they be required. 
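As an illustration of that approach, here is a minimal sketch of such a mail hook inside a "script"-type resource agent, using swaks as mentioned above. The service name, mail hub and addresses are made up for the example, and this is not the actual code referred to above:

#!/bin/bash
# Hypothetical notification helper for a "script"-type RA: mail a short
# note whenever rgmanager runs the start or stop action on this node.
notify_relocate() {
    local action="$1"                # "start" or "stop"
    local svc="myservice"            # example service name (assumption)
    swaks --server mailhub.example.com \
          --from "cluster-$(hostname -s)@example.com" \
          --to oncall@example.com \
          --header "Subject: [RHCS] ${svc} ${action} on $(hostname -s)" \
          --body "rgmanager ran '${action}' for ${svc} at $(date -R)" \
          >/dev/null 2>&1 || true    # never let a mail failure break the RA
}

case "$1" in
    start)
        notify_relocate start
        # ... real start logic of the script resource goes here ...
        ;;
    stop)
        notify_relocate stop
        # ... real stop logic goes here ...
        ;;
esac

The same hook could call a small Perl helper built on Mail::Sender instead of swaks; the only hard requirement is that it returns quickly and never changes the exit status of the start or stop action.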
On the other hand I have Nagios service checks for most of our cluster services (not only Linux clusters) that would trigger a notification or other event handler on critical state changes and let Nagios do a centralized notifying. Often that's the only way to get out messages anyway (e.g. where clusters operate in shielded LANs) > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > GouNiNi Geekarea > Sent: Friday, January 11, 2013 13:02 > To: linux clustering > Subject: [Linux-cluster] [rgmanager] sending email on relocate > > Hello everyone, > > I didn't find any simple solution to send emails to alerte > when rgmanager decides to relocate services. > Do you know simple solution other than create a script ressource? > > Regards, > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jpokorny at redhat.com Wed Jan 16 13:14:56 2013 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Wed, 16 Jan 2013 14:14:56 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <50F363CC.3020308@srce.hr> References: <50F363CC.3020308@srce.hr> Message-ID: <20130116131456.GA6079@redhat.com> Hello Jakov, On 14/01/13 02:47 +0100, Jakov Sosic wrote: > Hi. > > I'm using CentOS 6, and have a problem with ccs & ricci. > > At first use, ccs asks for password for each node. After that, ~/.ccs is > generated with cert in it. > > 1. I've found how to generate private key in ~/.ccs from the code in ccs > python executable (/usr/sbin/ccs). > > 2. I've also found how to generate CA in /var/lib/ricci/certs => code for > that can be found in init script of ricci (/etc/init.d/ricci). > > But what I am missing is how to use the user key/certificate from step 1 and > sign it into CA in step 2? The point here is that once the public certificate of ccs is recognized by ricci as authorized by supplying the password within the initial session, any other other session will be passwordless, based only on the "proved" client's certificate. Your intention seems to be to skip the initial phase involving password, is it the case? This should be doable by forcing ccs to generate its certificate by doing some NO-OP, then copying (scp?) the public part to the predefined destination at the machine with ricci installed, e.g.: [root at client1]# ccs -h localhost -p IGNOREME --getconf &>/dev/null [root at client1]# PUBLIC_CERT=~/.ccs/cacert/pem [root at client1]# RICCI_CLIENTS=/var/lib/ricci/certs/clients [root at client1]# UNIQUE_SUFFIX=$(hostname | sha1sum | cut -b1-6) [root at client1]# RICCI_CERT=${RICCI_CLIENTS}/client_cert_${UNIQUE_SUFFIX} [root at client1]# scp $PUBLIC_CERT riccihost:$RICCI_CERT Please note that 'sha1sum' command in the above example is only used to minimize possible collision at certificate filenames coming from other machines (under highly unprobable circumstances, collision can still happen) that will possibly run the same sequence, and otherwise does not guarantee any anonymity of the certificate within the ricci's certs/clients directory. Surely, the first step can be substituted by either using pregenerated certificate + key on the locations expected by ccs (~/.ccs) or generating them explicitly (e.g., by "openssl req") as part of the process. The point is that css-local and ricci-tracked certificate (one of presumably many) matches. 
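For completeness, a rough sketch of the "pregenerated certificate" variant with openssl req. The file names under ~/.ccs (cacert.pem for the certificate, privkey.pem for the key) are an assumption here, so please check them against /usr/sbin/ccs on your release:

# Create the key pair that ccs would otherwise generate on first use
# (file names under ~/.ccs are assumed, verify against the ccs source).
mkdir -p ~/.ccs && chmod 700 ~/.ccs
openssl req -new -x509 -nodes -days 1825 -newkey rsa:2048 \
    -keyout ~/.ccs/privkey.pem \
    -out ~/.ccs/cacert.pem \
    -subj "/CN=$(hostname)-ccs"
chmod 600 ~/.ccs/privkey.pem

# Then, exactly as above, install the public part on the ricci side so
# that no password is ever asked for:
scp ~/.ccs/cacert.pem \
    riccihost:/var/lib/ricci/certs/clients/client_cert_$(hostname -s)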
> I'm building puppet module which will autoconfigure whole cluster from bare > metal to working state. So far my only problem is updating cluster.conf, for > which I need fully working ricci CA and user certificates in /root/.ccs of > every node... By any chance, are you willing to share the module or its skeleton to the community? > So, any ideas are welcome. Hope the above helps. -- Jan From epretorious at yahoo.com Fri Jan 18 04:59:04 2013 From: epretorious at yahoo.com (Eric) Date: Thu, 17 Jan 2013 20:59:04 -0800 (PST) Subject: [Linux-cluster] HA iSCSI with DRBD Message-ID: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> I've been attempting to follow the recipe laid-out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2 but can't quite get the details right: > crm configure property stonith-enabled=false > crm configure property no-quorum-policy=ignore > > crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s > > crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s > crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true > > crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s > crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s > crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s > crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s > crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s > crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s > > crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254 > crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start > crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master > crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0... > resource r0 { > ??? volume 0 { > ??? ??? device /dev/drbd0 ; > ??? ??? disk /dev/sda7 ; > ??? ??? meta-disk internal ; > ??? } > ??? volume 1 { > ??? ??? device /dev/drbd1 ; > ??? ??? disk /dev/sda8 ; > ??? ??? meta-disk internal ; > ??? } > ??? volume 2 { > ??? ??? device /dev/drbd2 ; > ??? ??? disk /dev/sda9 ; > ??? ??? meta-disk internal ; > ??? } > ??? volume 3 { > ??? ??? device /dev/drbd3 ; > ??? ??? disk /dev/sda10 ; > ??? ??? meta-disk internal ; > ??? } > ??? on san1 { > ??? ??? address 192.168.1.1:7789 ; > ??? } > ??? on san2 { > ??? ??? address 192.168.1.2:7789 ; > ??? 
} > } But the shared IP address won't start nor will the LUN's: > san1:~ # crm_mon -1 > ============ > Last updated: Thu Jan 17 20:55:55 2013 > Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1 > Stack: openais > Current DC: san1 - partition with quorum > Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf > 2 Nodes configured, 2 expected votes > 9 Resources configured. > ============ > > Online: [ san1 san2 ] > >? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] > ???? Masters: [ san1 ] > ???? Slaves: [ san2 ] >? Resource Group: g_iSCSI-san1 > ???? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san1 > ???? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped > ???? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped > > Failed actions: > ??? p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error > ??? p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error What am I doing wrong? TIA, Eric Pretorious Truckee, CA -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsosic at srce.hr Fri Jan 18 18:44:16 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Fri, 18 Jan 2013 19:44:16 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <20130116131456.GA6079@redhat.com> References: <50F363CC.3020308@srce.hr> <20130116131456.GA6079@redhat.com> Message-ID: <50F99800.6070102@srce.hr> On 01/16/2013 02:14 PM, Jan Pokorn? wrote: > The point here is that once the public certificate of ccs is recognized by > ricci as authorized by supplying the password within the initial session, > any other other session will be passwordless, based only on the "proved" > client's certificate. > > Your intention seems to be to skip the initial phase involving password, > is it the case? This should be doable by forcing ccs to generate its > certificate by doing some NO-OP, then copying (scp?) the public part > to the predefined destination at the machine with ricci installed, > e.g.: > > [root at client1]# ccs -h localhost -p IGNOREME --getconf &>/dev/null > [root at client1]# PUBLIC_CERT=~/.ccs/cacert/pem > [root at client1]# RICCI_CLIENTS=/var/lib/ricci/certs/clients > [root at client1]# UNIQUE_SUFFIX=$(hostname | sha1sum | cut -b1-6) > [root at client1]# RICCI_CERT=${RICCI_CLIENTS}/client_cert_${UNIQUE_SUFFIX} > [root at client1]# scp $PUBLIC_CERT riccihost:$RICCI_CERT Thank you for your explanation. I've figured that out later myself :) So, instead of using the sha1sum to avoide collisions, I use nodename. 
So my client_cert names look like this: client_cert_mynode1 client_cert_mynode2 ... Is this ok? Or should I obfuscate name for some reason... > Surely, the first step can be substituted by either using pregenerated > certificate + key on the locations expected by ccs (~/.ccs) or > generating them explicitly (e.g., by "openssl req") as part > of the process. The point is that css-local and ricci-tracked > certificate (one of presumably many) matches. I've done this by pre-generating the certificates on my puppet master. >> I'm building puppet module which will autoconfigure whole cluster from bare >> metal to working state. So far my only problem is updating cluster.conf, for >> which I need fully working ricci CA and user certificates in /root/.ccs of >> every node... > > By any chance, are you willing to share the module or its skeleton > to the community? Offcourse, as soon as I'm happy with the code and the level of functionality. We have around dozen clusters, I'm developing this module on a new one that's supposed to go into production soon. Other clusters use older module which really doesn't solve any of this, so as soon as my code is stable we'll push other clusters to new puppet module. After that, I will publish it. So expect it in another week or two. > Hope the above helps. Yeah, it really did help. But, for some strange reason it seems that ccs_sync doesn't use certificates, but instead it asks for password... My idea was to use ccs_sync to propagate new cluster.conf. So, puppet puts cluster.conf in /etc/cluster.conf, and after that runs ccs_sync -f /etc/cluster.conf But unfortunately, ccs_sync doesn't seem to recognize the certificates as ccs does :( Any idea on this one? pgsql01-xc # ccs -h pgsql01-xc --getversion 2 pgsql01-xc # ccs_sync -f /etc/cluster.conf You have not authenticated to the ricci daemon on pgsql01-xc Password: I'm digging into source code to try to get some sense of it :-/ From jsosic at srce.hr Fri Jan 18 19:44:19 2013 From: jsosic at srce.hr (Jakov Sosic) Date: Fri, 18 Jan 2013 20:44:19 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <50F99800.6070102@srce.hr> References: <50F363CC.3020308@srce.hr> <20130116131456.GA6079@redhat.com> <50F99800.6070102@srce.hr> Message-ID: <50F9A613.30902@srce.hr> On 01/18/2013 07:44 PM, Jakov Sosic wrote: > On 01/16/2013 02:14 PM, Jan Pokorn? wrote: > >> The point here is that once the public certificate of ccs is recognized by >> ricci as authorized by supplying the password within the initial session, >> any other other session will be passwordless, based only on the "proved" >> client's certificate. >> >> Your intention seems to be to skip the initial phase involving password, >> is it the case? This should be doable by forcing ccs to generate its >> certificate by doing some NO-OP, then copying (scp?) the public part >> to the predefined destination at the machine with ricci installed, >> e.g.: >> >> [root at client1]# ccs -h localhost -p IGNOREME --getconf &>/dev/null >> [root at client1]# PUBLIC_CERT=~/.ccs/cacert/pem >> [root at client1]# RICCI_CLIENTS=/var/lib/ricci/certs/clients >> [root at client1]# UNIQUE_SUFFIX=$(hostname | sha1sum | cut -b1-6) >> [root at client1]# RICCI_CERT=${RICCI_CLIENTS}/client_cert_${UNIQUE_SUFFIX} >> [root at client1]# scp $PUBLIC_CERT riccihost:$RICCI_CERT > > Thank you for your explanation. I've figured that out later myself :) > > So, instead of using the sha1sum to avoide collisions, I use nodename. 
> So my client_cert names look like this: > > client_cert_mynode1 > client_cert_mynode2 > ... > > Is this ok? Or should I obfuscate name for some reason... > > >> Surely, the first step can be substituted by either using pregenerated >> certificate + key on the locations expected by ccs (~/.ccs) or >> generating them explicitly (e.g., by "openssl req") as part >> of the process. The point is that css-local and ricci-tracked >> certificate (one of presumably many) matches. > > I've done this by pre-generating the certificates on my puppet master. > > >>> I'm building puppet module which will autoconfigure whole cluster from bare >>> metal to working state. So far my only problem is updating cluster.conf, for >>> which I need fully working ricci CA and user certificates in /root/.ccs of >>> every node... >> >> By any chance, are you willing to share the module or its skeleton >> to the community? > > Offcourse, as soon as I'm happy with the code and the level of > functionality. We have around dozen clusters, I'm developing this module > on a new one that's supposed to go into production soon. Other clusters > use older module which really doesn't solve any of this, so as soon as > my code is stable we'll push other clusters to new puppet module. After > that, I will publish it. So expect it in another week or two. > > >> Hope the above helps. > > Yeah, it really did help. But, for some strange reason it seems that > ccs_sync doesn't use certificates, but instead it asks for password... > > My idea was to use ccs_sync to propagate new cluster.conf. So, puppet > puts cluster.conf in /etc/cluster.conf, and after that runs ccs_sync -f > /etc/cluster.conf > > But unfortunately, ccs_sync doesn't seem to recognize the certificates > as ccs does :( Any idea on this one? > > pgsql01-xc # ccs -h pgsql01-xc --getversion > 2 > > pgsql01-xc # ccs_sync -f /etc/cluster.conf > You have not authenticated to the ricci daemon on pgsql01-xc > Password: > > > I'm digging into source code to try to get some sense of it :-/ It seems that ccs_sync run as root uses /var/lib/ricci/cacert.pem as it's own client certificate... Do you think if it's OK to use same client certificate for root user (/root/.ccs/cacert.pem) and for ricci user (/var/lib/ricci/cacert.pem) on the same machine? That way I wouldn't need to generate additional certificates for root user but just use existing ones. As it seems ccs_sync already uses them... From epretorious at yahoo.com Fri Jan 18 20:40:49 2013 From: epretorious at yahoo.com (Eric) Date: Fri, 18 Jan 2013 12:40:49 -0800 (PST) Subject: [Linux-cluster] [RESOLVED] HA iSCSI with DRBD In-Reply-To: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> References: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> Message-ID: <1358541649.85444.YahooMailNeo@web126004.mail.ne1.yahoo.com> After rebooting both nodes, I checked the cluster status again and found this: Code: > san1:~ # crm_mon -1 > ============ > Last updated: Fri Jan 18 11:51:28 2013 > Last change: Fri Jan 18 09:00:03 2013 by root via cibadmin on san2 > Stack: openais > Current DC: san2 - partition with quorum > Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf > 2 Nodes configured, 2 expected votes > 9 Resources configured. > ============ > > Online: [ san1 san2 ] > >? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >????? Masters: [ san2 ] >????? Slaves: [ san1 ] >? Resource Group: g_iSCSI-san1 >????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >????? p_iSCSI-san1_0??? 
(ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped > > Failed actions: >???? p_iSCSI-san1_4_start_0 (node=san1, call=25, rc=1, status=complete): unknown error >???? p_iSCSI-san1_4_start_0 (node=san2, call=30, rc=1, status=complete): unknown error ...and that's when it occured to me: There are only four volumes defined in the DRBD cofiguration (0, 1, 2, & 3) - not five (0, 1, 2, 3, & 4)! i.e., The p_iSCSI-san1_4 primitive was failing (because there is no volume /dev/drbd4) and that, in turn, was holding up theresource group g_iSCSI-san1 and causing all of the other primitives [e.g., p_IP-1_254] to fail too! So, I deleted p_iSCSI-san1_4 from the CIB and the cluster began working as designed: > san2:~ # ll /dev/drbd* > brw-rw---- 1 root disk 147, 0 Jan 18 11:47 /dev/drbd0 > brw-rw---- 1 root disk 147, 1 Jan 18 11:47 /dev/drbd1 > brw-rw---- 1 root disk 147, 2 Jan 18 11:47 /dev/drbd2 > brw-rw---- 1 root disk 147, 3 Jan 18 11:47 /dev/drbd3 > > ... > > san2:~ # crm_mon -1 > ============ > Last updated: Fri Jan 18 11:53:03 2013 > Last change: Fri Jan 18 11:52:58 2013 by root via cibadmin on san2 > Stack: openais > Current DC: san2 - partition with quorum > Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf > 2 Nodes configured, 2 expected votes > 8 Resources configured. > ============ > > Online: [ san1 san2 ] > >? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >????? Masters: [ san2 ] >????? Slaves: [ san1 ] >? Resource Group: g_iSCSI-san1 >????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >????? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Started san2 From the iSCSI client (xen2): > xen2:~ # iscsiadm -m discovery -t st -p 192.168.1.254 > 192.168.1.254:3260,1 iqn.2012-11.com.example.san1:sda > 192.168.0.2:3260,1 iqn.2012-11.com.example.san1:sda > 192.168.1.2:3260,1 iqn.2012-11.com.example.san1:sda Problem fixed! 
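For anyone hitting the same symptom: one way to take such a stale primitive out with the crm shell looks roughly like this (a sketch, not necessarily the exact steps taken here; the primitive has to be removed from the group definition before it can be deleted):

crm configure edit g_iSCSI-san1      # remove p_iSCSI-san1_4 from the group members
crm configure delete p_iSCSI-san1_4  # then drop the primitive definition itself
crm resource cleanup g_iSCSI-san1    # clear the recorded start failures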
Eric Pretorious Truckee, CA >________________________________ > From: Eric >To: linux clustering >Sent: Thursday, January 17, 2013 8:59 PM >Subject: [Linux-cluster] HA iSCSI with DRBD > > >I've been attempting to follow the recipe laid-out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2 but can't quite get the details right: > > >> crm configure property stonith-enabled=false >> crm configure property no-quorum-policy=ignore >> >> crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s >> >> crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s >> crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true >> >> crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s >> crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s >> crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s >> >> crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254 >> crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start >> crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master >> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 > > >IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0... > > >> resource r0 { >> ??? volume 0 { >> ??? ??? device /dev/drbd0 ; >> ??? ??? disk /dev/sda7 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? volume 1 { >> ??? ??? device /dev/drbd1 ; >> ??? ??? disk /dev/sda8 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? volume 2 { >> ??? ??? device /dev/drbd2 ; >> ??? ??? disk /dev/sda9 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? volume 3 { >> ??? ??? device /dev/drbd3 ; >> ??? ??? disk /dev/sda10 ; >> ??? ??? meta-disk internal ; >> ??? } >> ??? on san1 { >> ??? ??? address 192.168.1.1:7789 ; >> ??? } >> ??? on san2 { >> ??? ??? address 192.168.1.2:7789 ; >> ??? } >> } > > > >But the shared IP address won't start nor will the LUN's: > > >> san1:~ # crm_mon -1 >> ============ >> Last updated: Thu Jan 17 20:55:55 2013 >> Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1 >> Stack: openais >> Current DC: san1 - partition with quorum >> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >> 2 Nodes configured, 2 expected votes >> 9 Resources configured. >> ============ >> >> Online: [ san1 san2 ] >> >>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >> ???? 
Masters: [ san1 ] >> ???? Slaves: [ san2 ] >>? Resource Group: g_iSCSI-san1 >> ???? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san1 >> ???? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >> ???? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped >> >> Failed actions: >> ??? p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error >> ??? p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error > > > >What am I doing wrong? > > > >TIA, >Eric Pretorious >Truckee, CA > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpokorny at redhat.com Fri Jan 18 22:04:44 2013 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Fri, 18 Jan 2013 23:04:44 +0100 Subject: [Linux-cluster] Problem with automating ricci & ccs In-Reply-To: <50F9A613.30902@srce.hr> References: <50F363CC.3020308@srce.hr> <20130116131456.GA6079@redhat.com> <50F99800.6070102@srce.hr> <50F9A613.30902@srce.hr> Message-ID: <20130118220444.GA13473@redhat.com> On 18/01/13 20:44 +0100, Jakov Sosic wrote: > On 01/18/2013 07:44 PM, Jakov Sosic wrote: >> >> Thank you for your explanation. I've figured that out later myself :) >> >> So, instead of using the sha1sum to avoide collisions, I use nodename. >> So my client_cert names look like this: >> >> client_cert_mynode1 >> client_cert_mynode2 >> ... >> >> Is this ok? Or should I obfuscate name for some reason... Indeed, you can go with whatever naming convention you like, only location is important. Honestly, I wasn't sure so sticked with the internal naming convention of ricci, which does not hurt either (note that during the cluster lifetime, you can, e.g., re-add a node under different name so this descriptive naming can get out of sync, but not a big deal). >>> Surely, the first step can be substituted by either using pregenerated >>> certificate + key on the locations expected by ccs (~/.ccs) or >>> generating them explicitly (e.g., by "openssl req") as part >>> of the process. The point is that css-local and ricci-tracked >>> certificate (one of presumably many) matches. >> >> I've done this by pre-generating the certificates on my puppet master. >> >> >>>> I'm building puppet module which will autoconfigure whole cluster >>>> from bare metal to working state. 
So far my only problem is >>>> updating cluster.conf, for which I need fully working ricci CA >>>> and user certificates in /root/.ccs of every node... >>> >>> By any chance, are you willing to share the module or its skeleton >>> to the community? >> >> Offcourse, as soon as I'm happy with the code and the level of >> functionality. We have around dozen clusters, I'm developing this module >> on a new one that's supposed to go into production soon. Other clusters >> use older module which really doesn't solve any of this, so as soon as >> my code is stable we'll push other clusters to new puppet module. After >> that, I will publish it. So expect it in another week or two. Cool, thanks. >>> Hope the above helps. >> >> Yeah, it really did help. But, for some strange reason it seems that >> ccs_sync doesn't use certificates, but instead it asks for password... See below. >> My idea was to use ccs_sync to propagate new cluster.conf. So, >> puppet puts cluster.conf in /etc/cluster.conf, and after that runs >> ccs_sync -f /etc/cluster.conf >> >> But unfortunately, ccs_sync doesn't seem to recognize the certificates >> as ccs does :( Any idea on this one? >> >> pgsql01-xc # ccs -h pgsql01-xc --getversion >> 2 >> >> pgsql01-xc # ccs_sync -f /etc/cluster.conf >> You have not authenticated to the ricci daemon on pgsql01-xc >> Password: >> >> >> I'm digging into source code to try to get some sense of it :-/ > > It seems that ccs_sync run as root uses /var/lib/ricci/cacert.pem as > it's own client certificate... (/var/lib/ricci/certs/cacert.pem) Yes, but it is a little bit more complicated. When ricci is run for the first time (prerequisite [1] to run either "cman_tool version ..." or ccs_sync directly [*]), it generates its OpenSSL certificate (/var/lib/ricci/certs/cacert.pem) + key, which are then 1:1 cloned to PKCS#12 format and put into NSS certificate DB (in the same dir) and this is what ccs_sync uses to obtain its client certificate. > Do you think if it's OK to use same client certificate for root user > (/root/.ccs/cacert.pem) and for ricci user (/var/lib/ricci/cacert.pem) > on the same machine? As long as you do not need per-client granularity (e.g., to forcibly revoke/remove particular certificate from /var/lib/ricci/certs/clients; btw. custom named cert files here would actually prove useful as otherwise one would have to do a tedious certificate-content-matching task to identify a correct victim)... > That way I wouldn't need to generate additional certificates for root > user but just use existing ones. As it seems ccs_sync already uses them... See above, really depends on the level of permissions management you want to achieve (in extreme, there can be a single certificate for everything, but I wouldn't recommend this). Other possibility, although suffering from the similar global permission issue, is to use local certificate authority whose certificates (every and each) will be automatically considered as trusted. It looks like that to you utilize this, you would need to append the certificate of this CA to /var/lib/ricci/certs/auth_CAs.pem. 
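A very rough sketch of that CA variant, in case it helps. The file names local-ccs-CA.pem / local-ccs-CA.key are invented for the example, only the auth_CAs.pem path comes from ricci itself, and the ~/.ccs file names are again an assumption:

# One-off, on an admin host: create a small local CA.
openssl req -new -x509 -nodes -days 3650 -newkey rsa:2048 \
    -keyout local-ccs-CA.key -out local-ccs-CA.pem \
    -subj "/CN=local-ccs-CA"

# On each node running ricci: mark certificates issued by that CA as
# trusted (assumption: ricci needs a restart to re-read the file).
cat local-ccs-CA.pem >> /var/lib/ricci/certs/auth_CAs.pem
service ricci restart

# On the ccs side: have the client certificate signed by that CA instead
# of being self-signed (paths under ~/.ccs assumed as before).
openssl req -new -nodes -newkey rsa:2048 \
    -keyout ~/.ccs/privkey.pem -out ccs.csr -subj "/CN=$(hostname)-ccs"
openssl x509 -req -in ccs.csr -CA local-ccs-CA.pem -CAkey local-ccs-CA.key \
    -CAcreateserial -days 1825 -out ~/.ccs/cacert.pem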
As this path is not commonly used, this is for the braver ones :) [*] in fact, this prerequisite can be avoided (a/ by specifying "-c" option to ccs_sync AND b/ explicitly listing other nodes as arguments, but again, these nodes have to be running ricci), however this a degenerate case and best if forgotten [1] http://www.redhat.com/archives/linux-cluster/2010-November/msg00163.html -- Jan From DJCapstick1 at uclan.ac.uk Mon Jan 21 16:18:59 2013 From: DJCapstick1 at uclan.ac.uk (David John Capstick) Date: Mon, 21 Jan 2013 16:18:59 +0000 Subject: [Linux-cluster] Course of action if Cluster Manager cannot stop a Percona Mysql application/service Message-ID: Hi, I am investigating a problem that occurred some time ago with a two node cluster. It would appear that rgmanager was unable to stop the application (percona mysql) cleanly according to /var/log/messages. After a while it would appear that rgmanager did start the service again. Does this mean that despite the messages it was indeed able to shut the service down first ? If a service cannot be stopped cleanly I would have thought that rgmanager does not try and start it again - is this view wrong ? Also the logs show that rgmanager tried to stop the service at 05:06:04 but how do you discover why this action was taken ? I have included an excerpt of /var/log/messages. Many Thanks David Nov 17 22:43:03 db1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="2202" x-info="http://www.rsyslog.com"] rsyslogd was HUPed Nov 20 05:06:04 db1 rgmanager[11672]: Stopping service service:mysql-master Nov 20 05:06:04 db1 rgmanager[14368]: [mysqld] Stopping Service mysqld:mysql-master Nov 20 05:06:26 db1 rgmanager[14463]: [mysqld] Stopping Service mysqld:mysql-master > Failed - Application Is Still Running Nov 20 05:06:26 db1 rgmanager[14485]: [mysqld] Stopping Service mysqld:mysql-master > Failed Nov 20 05:06:26 db1 rgmanager[11672]: stop on mysqld "mysql-master" returned 1 (generic error) Nov 20 05:06:26 db1 rgmanager[14559]: [fs] unmounting /srv/mysql-master/mnt Nov 20 05:06:31 db1 rgmanager[14637]: [fs] unmounting /srv/mysql-master/mnt Nov 20 05:06:37 db1 rgmanager[14713]: [fs] unmounting /srv/mysql-master/mnt Nov 20 05:06:37 db1 rgmanager[14758]: [fs] 'umount /srv/mysql-master/mnt' failed, error=1 Nov 20 05:06:37 db1 rgmanager[11672]: stop on fs "mysql-master" returned 1 (generic error) Nov 20 05:06:37 db1 rgmanager[14811]: [ip] Removing IPv4 address 192.168.249.120/24 from eth0 Nov 20 05:06:38 db1 ntpd[8006]: Deleting interface #28 eth0, 192.168.249.120#123, interface stats: received=0, sent=0, dropped=0, active_time=5767950 secs Nov 20 05:06:47 db1 rgmanager[11672]: #12: RG service:mysql-master failed to stop; intervention required Nov 20 05:06:47 db1 rgmanager[11672]: Service service:mysql-master is failed Nov 20 05:07:32 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. Nov 20 05:07:32 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:09:46 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. Nov 20 05:09:46 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:10:37 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. Nov 20 05:10:37 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:11:06 db1 rgmanager[11672]: #43: Service service:mysql-master has failed; can not start. 
Nov 20 05:11:06 db1 rgmanager[11672]: #13: Service service:mysql-master failed to stop cleanly Nov 20 05:16:50 db1 rgmanager[11672]: Starting stopped service service:mysql-master Nov 20 05:16:50 db1 rgmanager[15291]: [ip] Adding IPv4 address 192.168.249.120/24 to eth0 Nov 20 05:16:53 db1 ntpd[8006]: Listening on interface #29 eth0, 192.168.249.120#123 Enabled Nov 20 05:16:53 db1 rgmanager[15516]: [mysqld] Checking Existence Of File /var/run/cluster/mysqld/mysqld:mysql-master.pid [mysqld:mysql-master] > Failed Nov 20 05:16:54 db1 rgmanager[15538]: [mysqld] Monitoring Service mysqld:mysql-master > Service Is Not Running Nov 20 05:16:54 db1 rgmanager[15560]: [mysqld] Starting Service mysqld:mysql-master Nov 20 05:16:58 db1 rgmanager[11672]: Service service:mysql-master started Nov 20 10:42:01 db1 auditd[7280]: Audit daemon rotating log files -------------- next part -------------- An HTML attachment was scrubbed... URL: From pankajgundare at gmail.com Tue Jan 22 04:58:23 2013 From: pankajgundare at gmail.com (Pankaj) Date: Tue, 22 Jan 2013 10:28:23 +0530 Subject: [Linux-cluster] Cluster restarting..... Message-ID: Hi, I Have two node of production server , but now a days when I am starting cman service on both node simultaneously one node(secondary ) restarted automatically , please give me solution. Thanks Pankaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From washer at trlp.com Tue Jan 22 05:32:44 2013 From: washer at trlp.com (James Washer) Date: Mon, 21 Jan 2013 21:32:44 -0800 Subject: [Linux-cluster] Cluster restarting..... In-Reply-To: References: Message-ID: Did you look at the logs? On Mon, Jan 21, 2013 at 8:58 PM, Pankaj wrote: > Hi, > > I Have two node of production server , but now a days when I am starting > cman service on both node simultaneously one node(secondary ) restarted > automatically , please give me solution. > > Thanks > Pankaj > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- - jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sagar.Shimpi at tieto.com Tue Jan 22 05:59:14 2013 From: Sagar.Shimpi at tieto.com (Sagar.Shimpi at tieto.com) Date: Tue, 22 Jan 2013 07:59:14 +0200 Subject: [Linux-cluster] Cluster restarting..... In-Reply-To: References: Message-ID: Can you send me the logs of both the nodes ? Regards, Sagar Shimpi, Senior Technical Specialist, OSS Labs Tieto email sagar.shimpi at tieto.com, Wing 1, Cluster D, EON Free Zone, Plot No. 1, Survery # 77, MIDC Kharadi Knowledge Park, Pune 411014, India, www.tieto.com www.tieto.in TIETO. Knowledge. Passion. Results. From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Pankaj Sent: Tuesday, January 22, 2013 10:28 AM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster restarting..... Hi, I Have two node of production server , but now a days when I am starting cman service on both node simultaneously one node(secondary ) restarted automatically , please give me solution. Thanks Pankaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Tue Jan 22 15:59:54 2013 From: lists at alteeve.ca (Digimer) Date: Tue, 22 Jan 2013 10:59:54 -0500 Subject: [Linux-cluster] Cluster restarting..... 
In-Reply-To: References: Message-ID: <50FEB77A.3050907@alteeve.ca> On 01/21/2013 11:58 PM, Pankaj wrote: > Hi, > > I Have two node of production server , but now a days when I am starting > cman service on both node simultaneously one node(secondary ) restarted > automatically , please give me solution. > > Thanks > Pankaj You need to share more information about what cluster you are running. I suspect what happened is that the "post_join_delay", which is a default of 6 seconds, expired so the first node fenced the second node. You can change this to, say, "30" seconds to give yourself more time to start cman on the other node. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From rhayden.public at gmail.com Tue Jan 22 17:22:25 2013 From: rhayden.public at gmail.com (Robert Hayden) Date: Tue, 22 Jan 2013 11:22:25 -0600 Subject: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational? Message-ID: I am testing RHCS 6.3 and found that the self_fence option for a file system resource will now longer function as expected. Before I log an SR with RH, I was wondering if the design changed between RHEL 5 and RHEL 6. In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will complete a "reboot -fn" command on a self_fence logic. In RHEL 6, there is little to no logic around self_fence in the fs.sh file. Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6: if [ -n "$umount_failed" ]; then ocf_log err "'umount $mp' failed, error=$ret_val" if [ "$self_fence" ]; then ocf_log alert "umount failed - REBOOTING" sync reboot -fn fi return $FAIL else return $SUCCESS fi To test in RHEL 6, I simply create a file system (e.g. /test/data) resource with self_fence="1" or self_fence="on" (as added by Conga). Then mount a small ISO image on top of the file system. This mount will cause the file system resource to be unable to unmount itself and should trigger a self_fence scenario. Testing RHEL 6, I see the following in /var/log/messages: Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to processes on /test/data Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to processes on /test/data Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data" returned 1 (generic error) Jan 21 16:41:05 techval16 rgmanager[61929]: #12: RG service:fstest_node16 failed to stop; intervention required Jan 21 16:41:05 techval16 rgmanager[61929]: Service service:fstest_node16 is failed Thanks Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Tue Jan 22 18:38:26 2013 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 22 Jan 2013 19:38:26 +0100 Subject: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational? In-Reply-To: References: Message-ID: <50FEDCA2.9080008@redhat.com> On 01/22/2013 06:22 PM, Robert Hayden wrote: > I am testing RHCS 6.3 and found that the self_fence option for a file > system resource will now longer function as expected. Before I log an > SR with RH, I was wondering if the design changed between RHEL 5 and RHEL 6. > > In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will complete a > "reboot -fn" command on a self_fence logic. In RHEL 6, there is little > to no logic around self_fence in the fs.sh file. 
The logic has just been moved to a common file shared by all *fs resources (fs-lib) > > Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6: > if [ -n "$umount_failed" ]; then > ocf_log err "'umount $mp' failed, error=$ret_val" > > if [ "$self_fence" ]; then > ocf_log alert "umount failed - REBOOTING" > sync > reboot -fn > fi > return $FAIL > else > return $SUCCESS > fi same code, just different file. > > > > To test in RHEL 6, I simply create a file system (e.g. /test/data) > resource with self_fence="1" or self_fence="on" (as added by Conga). > Then mount a small ISO image on top of the file system. This mount will > cause the file system resource to be unable to unmount itself and should > trigger a self_fence scenario. > > Testing RHEL 6, I see the following in /var/log/messages: > > Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data > Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to > processes on /test/data > Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data > Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to > processes on /test/data > Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data" > returned 1 (generic error) Looks like a bug in force_umount option. Please file a ticket with RH GSS. As workaround try to disable force_umount. As far as I can tell, but I haven't verify it: ocf_log warning "Sending SIGKILL to processes on $mp" fuser -kvm "$mp" case $? in 0) ;; 1) return $OCF_ERR_GENERIC ;; 2) break ;; esac the issue is the was fuser error is handled in force_umount path, that would match the log you are posting. I think the correct way would be to check if self_fence is enabled or not and then return/reboot later on the script. Fabio From epretorious at yahoo.com Wed Jan 23 01:05:57 2013 From: epretorious at yahoo.com (Eric) Date: Tue, 22 Jan 2013 17:05:57 -0800 (PST) Subject: [Linux-cluster] [RESOLVED] HA iSCSI with DRBD In-Reply-To: <1358541649.85444.YahooMailNeo@web126004.mail.ne1.yahoo.com> References: <1358485144.16911.YahooMailNeo@web126001.mail.ne1.yahoo.com> <1358541649.85444.YahooMailNeo@web126004.mail.ne1.yahoo.com> Message-ID: <1358903157.86396.YahooMailNeo@web126001.mail.ne1.yahoo.com> I realized, quite accidentally, that any downtime on either of the nodes (e.g., a reboot) causes corruption/inconsistencies in the DRBD resources because the DRBD node that was the DRBD primary (i.e., the preferred-primary) will forcefully become primary again when the node returns [thereby discarding modifications made on the older primary]. Therefore, in order to prevent this from happening, it's probably best to REMOVE the final primitive from each group: > crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 > crm configure location l_iSCSI-san1+DRBD-r1 p_IP-1_253 10240: san2 This will prevent Pacemaker from promoting the younger primary and overwriting the modifications made on the older primary [when the preferred-primary node returns]. The DRBD resources can be moved manually... > crm resource move p_IP-1_254 san1 > crm resource move p_IP-1_253 san2 ...in order to distribute the workload between san1 & san2. Thoughts? Suggestions? 
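One side note on the manual-move approach, offered as a suggestion rather than anything from the Linbit guide: crm resource move works by injecting a location constraint, so it should be cleared again once things have settled, and automatic fail-back can also be damped by making running resources sticky instead of deleting the preference constraints outright. Roughly:

# "move" pins the resource to the target node with a generated
# constraint; clear it afterwards or it stays in the CIB:
crm resource move p_IP-1_254 san1
crm resource unmove p_IP-1_254

# Alternative: keep the location preferences but give running resources
# enough stickiness to outweigh them (the value has to exceed the 10240
# score used in the location constraints above to have any effect):
crm configure rsc_defaults resource-stickiness=20480

Whichever way this is handled at the Pacemaker level, data that diverges on the old primary during an outage is normally protected against with fencing and DRBD's own fence-peer handlers rather than with scores alone.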
Eric Pretorious Truckee, CA >________________________________ > From: Eric >To: linux clustering >Sent: Friday, January 18, 2013 12:40 PM >Subject: Re: [Linux-cluster] [RESOLVED] HA iSCSI with DRBD > > >After rebooting both nodes, I checked the cluster status again and found this: >Code: > >> san1:~ # crm_mon -1 >> ============ >> Last updated: Fri Jan 18 11:51:28 2013 >> Last change: Fri Jan 18 09:00:03 2013 by root via cibadmin on san2 >> Stack: openais >> Current DC: san2 - partition with quorum >> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >> 2 Nodes configured, 2 expected votes >> 9 Resources configured. >> ============ >> >> Online: [ san1 san2 ] >> >>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >>????? Masters: [ san2 ] >>????? Slaves: [ san1 ] >>? Resource Group: g_iSCSI-san1 >>????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >>????? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped >> >> Failed actions: >>???? p_iSCSI-san1_4_start_0 (node=san1, call=25, rc=1, status=complete): unknown error >>???? p_iSCSI-san1_4_start_0 (node=san2, call=30, rc=1, status=complete): unknown error > >...and that's when it occured to me: There are only four volumes defined in the DRBD cofiguration (0, 1, 2, & 3) - not five (0, 1, 2, 3, & 4)! i.e., The p_iSCSI-san1_4 primitive was failing (because there is no volume /dev/drbd4) and that, in turn, was holding up theresource group g_iSCSI-san1 and causing all of the other primitives [e.g., p_IP-1_254] to fail too! > >So, I deleted p_iSCSI-san1_4 from the CIB and the cluster began working as designed: > >> san2:~ # ll /dev/drbd* >> brw-rw---- 1 root disk 147, 0 Jan 18 11:47 /dev/drbd0 >> brw-rw---- 1 root disk 147, 1 Jan 18 11:47 /dev/drbd1 >> brw-rw---- 1 root disk 147, 2 Jan 18 11:47 /dev/drbd2 >> brw-rw---- 1 root disk 147, 3 Jan 18 11:47 /dev/drbd3 >> >> ... >> > >> san2:~ # crm_mon -1 >> ============ >> Last updated: Fri Jan 18 11:53:03 2013 >> Last change: Fri Jan 18 11:52:58 2013 by root via cibadmin on san2 >> Stack: openais >> Current DC: san2 - partition with quorum >> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >> 2 Nodes configured, 2 expected votes >> 8 Resources configured. >> ============ >> >> Online: [ san1 san2 ] >> >>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >>????? Masters: [ san2 ] >>????? Slaves: [ san1 ] >>? Resource Group: g_iSCSI-san1 >>????? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san2 >>????? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Started san2 >>????? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Started san2 > >From the iSCSI client (xen2): > >> xen2:~ # iscsiadm -m discovery -t st -p 192.168.1.254 >> 192.168.1.254:3260,1 iqn.2012-11.com.example.san1:sda >> 192.168.0.2:3260,1 iqn.2012-11.com.example.san1:sda >> 192.168.1.2:3260,1 iqn.2012-11.com.example.san1:sda > > >Problem fixed! 
> > >Eric Pretorious >Truckee, CA > > > >>________________________________ >> From: Eric >>To: linux clustering >>Sent: Thursday, January 17, 2013 8:59 PM >>Subject: [Linux-cluster] HA iSCSI with DRBD >> >> >>I've been attempting to follow the recipe laid-out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2 but can't quite get the details right: >> >> >>> crm configure property stonith-enabled=false >>> crm configure property no-quorum-policy=ignore >>> >>> crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s >>> >>> crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s >>> crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true >>> >>> crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s >>> crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s >>> >>> crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254 >>> crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start >>> crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master >>> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1 >> >> >>IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0... >> >> >>> resource r0 { >>> ??? volume 0 { >>> ??? ??? device /dev/drbd0 ; >>> ??? ??? disk /dev/sda7 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? volume 1 { >>> ??? ??? device /dev/drbd1 ; >>> ??? ??? disk /dev/sda8 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? volume 2 { >>> ??? ??? device /dev/drbd2 ; >>> ??? ??? disk /dev/sda9 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? volume 3 { >>> ??? ??? device /dev/drbd3 ; >>> ??? ??? disk /dev/sda10 ; >>> ??? ??? meta-disk internal ; >>> ??? } >>> ??? on san1 { >>> ??? ??? address 192.168.1.1:7789 ; >>> ??? } >>> ??? on san2 { >>> ??? ??? address 192.168.1.2:7789 ; >>> ??? } >>> } >> >> >> >>But the shared IP address won't start nor will the LUN's: >> >> >>> san1:~ # crm_mon -1 >>> ============ >>> Last updated: Thu Jan 17 20:55:55 2013 >>> Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1 >>> Stack: openais >>> Current DC: san1 - partition with quorum >>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf >>> 2 Nodes configured, 2 expected votes >>> 9 Resources configured. 
>>> ============ >>> >>> Online: [ san1 san2 ] >>> >>>? Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0] >>> ???? Masters: [ san1 ] >>> ???? Slaves: [ san2 ] >>>? Resource Group: g_iSCSI-san1 >>> ???? p_iSCSI-san1??? (ocf::heartbeat:iSCSITarget):??? Started san1 >>> ???? p_iSCSI-san1_0??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_1??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_2??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_3??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_iSCSI-san1_4??? (ocf::heartbeat:iSCSILogicalUnit):??? Stopped >>> ???? p_IP-1_254??? (ocf::heartbeat:IPaddr2):??? Stopped >>> >>> Failed actions: >>> ??? p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error >>> ??? p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error >> >> >> >>What am I doing wrong? >> >> >> >>TIA, >>Eric Pretorious >>Truckee, CA >> >>-- >>Linux-cluster mailing list >>Linux-cluster at redhat.com >>https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From topumirza at gmail.com Thu Jan 24 16:03:47 2013 From: topumirza at gmail.com (topu mirza) Date: Thu, 24 Jan 2013 22:03:47 +0600 Subject: [Linux-cluster] Cluster restarting..... In-Reply-To: <50FEB77A.3050907@alteeve.ca> References: <50FEB77A.3050907@alteeve.ca> Message-ID: set multicast address 224.0.0.1 and fence_daemon post_fail_delay=45 post_join_delay=60 Thanks Topu Mirza On Tue, Jan 22, 2013 at 9:59 PM, Digimer wrote: > On 01/21/2013 11:58 PM, Pankaj wrote: > > Hi, > > > > I Have two node of production server , but now a days when I am starting > > cman service on both node simultaneously one node(secondary ) restarted > > automatically , please give me solution. > > > > Thanks > > Pankaj > > You need to share more information about what cluster you are running. I > suspect what happened is that the "post_join_delay", which is a default > of 6 seconds, expired so the first node fenced the second node. You can > change this to, say, "30" seconds to give yourself more time to start > cman on the other node. > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Mirza Jubayar Siddiq -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Tom_Dryden at BUDCO.com Thu Jan 24 17:16:10 2013 From: Tom_Dryden at BUDCO.com (Dryden, Tom) Date: Thu, 24 Jan 2013 12:16:10 -0500 Subject: [Linux-cluster] LDAP as a service Message-ID: <1E7F581BEF7B8444A6D29997EECCC66C0838B556@BPMC-G0-EX1.budcotdc.net> Greeting All I am looking into implementing a 389 directory server in a clustered/GFS environment. Can anyone out there provide a pointer to information on implementing 389-directory server as a clustered service that can be relocated? Thanks in advance Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From rhayden.public at gmail.com Thu Jan 24 17:28:38 2013 From: rhayden.public at gmail.com (Robert Hayden) Date: Thu, 24 Jan 2013 11:28:38 -0600 Subject: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational? In-Reply-To: <50FEDCA2.9080008@redhat.com> References: <50FEDCA2.9080008@redhat.com> Message-ID: On Tue, Jan 22, 2013 at 12:38 PM, Fabio M. Di Nitto wrote: > > On 01/22/2013 06:22 PM, Robert Hayden wrote: > > I am testing RHCS 6.3 and found that the self_fence option for a file > > system resource will now longer function as expected. Before I log an > > SR with RH, I was wondering if the design changed between RHEL 5 and > > RHEL 6. > > > > In RHEL 5, I see logic in /usr/share/cluster/fs.sh that will complete a > > "reboot -fn" command on a self_fence logic. In RHEL 6, there is little > > to no logic around self_fence in the fs.sh file. > > The logic has just been moved to a common file shared by all *fs > resources (fs-lib) > > > > > > > Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6: > > if [ -n "$umount_failed" ]; then > > ocf_log err "'umount $mp' failed, error=$ret_val" > > > > if [ "$self_fence" ]; then > > ocf_log alert "umount failed - REBOOTING" > > sync > > reboot -fn > > fi > > return $FAIL > > else > > return $SUCCESS > > fi > > same code, just different file. > > > > > > > > > To test in RHEL 6, I simply create a file system (e.g. /test/data) > > resource with self_fence="1" or self_fence="on" (as added by Conga). > > Then mount a small ISO image on top of the file system. This mount will > > cause the file system resource to be unable to unmount itself and should > > trigger a self_fence scenario. > > > > Testing RHEL 6, I see the following in /var/log/messages: > > > > Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data > > Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to > > processes on /test/data > > Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data > > Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to > > processes on /test/data > > Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data" > > returned 1 (generic error) > > Looks like a bug in force_umount option. > > Please file a ticket with RH GSS. I will log a ticket in a few days when I can build a simple test case for support. > > As workaround try to disable force_umount. The workaround of have force_umount=0 and self_fence=1 worked with the ISO image mount test. > > As far as I can tell, but I haven't verify it: > ocf_log warning "Sending SIGKILL to processes on $mp" > fuser -kvm "$mp" > > case $? in > 0) > ;; > 1) > return $OCF_ERR_GENERIC > ;; > 2) > break > ;; > esac > > the issue is the was fuser error is handled in force_umount path, that > would match the log you are posting. > I have learned that "fuser" command will not find the sub-mounted iso image that causes the umount to fail. 
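(The nested mount only shows up in the mount table, not in the open-file listings, so to see what is actually blocking the umount something along these lines is needed rather than fuser/lsof:

[root at techval16]# grep /test/data /proc/mounts

which lists both /test/data and the iso sitting on /test/data/mnt, while fuser/lsof stay silent.)
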
So, my test case using the iso image to test self_fence may need to be updated. [root at techval16]# df -k | grep data /dev/mapper/share16vg-tv16_mq_data 806288 17200 748128 3% /test/data 352 352 0 100% /test/data/mnt [root at techval16]# fuser -kvm /test/data [root at techval16]# echo $? 1 [root at techval16]# umount /test/data umount: /test/data: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) [root at techval16]# Unsure if the logic in fs-lib needs to be updated to handle sub-mounted file systems. That is what the Support Ticket will determine, I suppose. > I think the correct way would be to check if self_fence is enabled or > not and then return/reboot later on the script. > > Fabio > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From arpittolani at gmail.com Thu Jan 24 18:38:49 2013 From: arpittolani at gmail.com (Arpit Tolani) Date: Fri, 25 Jan 2013 00:08:49 +0530 Subject: [Linux-cluster] LDAP as a service In-Reply-To: <1E7F581BEF7B8444A6D29997EECCC66C0838B556@BPMC-G0-EX1.budcotdc.net> References: <1E7F581BEF7B8444A6D29997EECCC66C0838B556@BPMC-G0-EX1.budcotdc.net> Message-ID: Hello On Thu, Jan 24, 2013 at 10:46 PM, Dryden, Tom wrote: > Greeting All > > > > I am looking into implementing a 389 directory server in a clustered/GFS > environment. > > Can anyone out there provide a pointer to information on implementing > 389-directory server as a clustered service that can be relocated? > > Why do you want to configure LDAP server on cluster ? Most of the ldap clients (nss_ldap, SSSD) can talk to multiple LDAP server & can failover when primary is down. > > Thanks in advance > > Tom > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Regards Arpit Tolani From stephen.krampach at lmco.com Thu Jan 24 18:43:16 2013 From: stephen.krampach at lmco.com (Krampach, Stephen) Date: Thu, 24 Jan 2013 18:43:16 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures Message-ID: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K -------------- next part -------------- An HTML attachment was scrubbed... URL: From epretorious at yahoo.com Thu Jan 24 20:50:35 2013 From: epretorious at yahoo.com (Eric) Date: Thu, 24 Jan 2013 12:50:35 -0800 (PST) Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> Message-ID: <1359060635.97933.YahooMailNeo@web126006.mail.ne1.yahoo.com> Would `crm node standby` [on each of the nodes] be too simple? Eric Pretorious Truckee, CA >________________________________ > From: "Krampach, Stephen" >To: "linux-cluster at redhat.com" >Sent: Thursday, January 24, 2013 10:43 AM >Subject: [Linux-cluster] Cluster Shut Down Procedures > > > >I hate to ask simple questions however, I?ve been perusing >books and blogs for two hours and have no definitive procedure; >? >We are having a power outage. What is the procedure to completely >shut down and power off a Red Hat 6.3 cluster? >? >Thanks in advance! 
Steve K >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swegner at celltrak.com Thu Jan 24 21:14:14 2013 From: swegner at celltrak.com (Steve Wegner) Date: Thu, 24 Jan 2013 15:14:14 -0600 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> Message-ID: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam.scheblein at marquette.edu Thu Jan 24 21:17:37 2013 From: adam.scheblein at marquette.edu (Scheblein, Adam) Date: Thu, 24 Jan 2013 21:17:37 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> Message-ID: <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner wrote: > Could it be as simple as ? service rgmanager stop ? on each node, then normal shutdown? > > > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen > Sent: Thursday, January 24, 2013 12:43 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] Cluster Shut Down Procedures > > I hate to ask simple questions however, I?ve been perusing > books and blogs for two hours and have no definitive procedure; > > We are having a power outage. What is the procedure to completely > shut down and power off a Red Hat 6.3 cluster? > > Thanks in advance! Steve K > PRIVILEGED & CONFIDENTIAL > The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3203 bytes Desc: not available URL: From stephen.krampach at lmco.com Thu Jan 24 21:34:11 2013 From: stephen.krampach at lmco.com (Krampach, Stephen) Date: Thu, 24 Jan 2013 21:34:11 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> Message-ID: <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> I'm really not sure. I've never heard of the css command and man css does not show results. What I've read on some blogs thus far is; because the cluster is going down in totality, you need to tell the system to ignore the quorum, stop the fencing and then leave the cluster however, I have not heard anyone corroborate this info. I hate being the newbie. umount /mnt - Unmounts a GFS file system IF required vgchange -aln - Deactivates LVM volumes (locally) killall clvmd - Stops the CLVM daemon fence_tool leave - Leaves the fence domain (stops fenced) cman_tool leave remove -w - Leaves the cluster Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:18 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner > wrote: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adam.scheblein at marquette.edu Thu Jan 24 21:41:44 2013 From: adam.scheblein at marquette.edu (Scheblein, Adam) Date: Thu, 24 Jan 2013 21:41:44 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> Message-ID: <040102649A06724CA48186EAA4A2FBB80B9663@ITS-EXMBITS1.marqnet.mu.edu> CCS became a good tool starting in rhel 6.x, prior to that I never used it Here is the man page: http://linux.die.net/man/8/ccs From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 3:34 PM To: linux clustering Subject: Re: [Linux-cluster] Cluster Shut Down Procedures I'm really not sure. I've never heard of the css command and man css does not show results. What I've read on some blogs thus far is; because the cluster is going down in totality, you need to tell the system to ignore the quorum, stop the fencing and then leave the cluster however, I have not heard anyone corroborate this info. I hate being the newbie. umount /mnt - Unmounts a GFS file system IF required vgchange -aln - Deactivates LVM volumes (locally) killall clvmd - Stops the CLVM daemon fence_tool leave - Leaves the fence domain (stops fenced) cman_tool leave remove -w - Leaves the cluster Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:18 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner wrote: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux- cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4266 bytes Desc: not available URL: From Tom_Dryden at BUDCO.com Thu Jan 24 21:57:28 2013 From: Tom_Dryden at BUDCO.com (Dryden, Tom) Date: Thu, 24 Jan 2013 16:57:28 -0500 Subject: [Linux-cluster] LDAP as a service Message-ID: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> Good Afternoon, There are a couple of reasons to implement LDAP on a cluster. 1. I have a cluster with GFS partitions available. 2. Want to avoid the cost putting up 2 more machines for master - master LDAP operation. 3. Want to avoid the timeout the client experiences when the primary is unavailable. My thought is to have the LADP data stored on a GFS partition while the LDAP server process and IP address are managed as a service. In this configuration the process can move between nodes with no impact to the clients. Thanks Tom Message: 3 Date: Fri, 25 Jan 2013 00:08:49 +0530 From: Arpit Tolani To: linux clustering Subject: Re: [Linux-cluster] LDAP as a service Message-ID: Content-Type: text/plain; charset=ISO-8859-1 Hello On Thu, Jan 24, 2013 at 10:46 PM, Dryden, Tom wrote: > Greeting All > > > > I am looking into implementing a 389 directory server in a > clustered/GFS environment. > > Can anyone out there provide a pointer to information on implementing > 389-directory server as a clustered service that can be relocated? > > Why do you want to configure LDAP server on cluster ? Most of the ldap clients (nss_ldap, SSSD) can talk to multiple LDAP server & can failover when primary is down. > > Thanks in advance > > Tom > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Regards Arpit Tolani From stephen.krampach at lmco.com Thu Jan 24 22:22:08 2013 From: stephen.krampach at lmco.com (Krampach, Stephen) Date: Thu, 24 Jan 2013 22:22:08 +0000 Subject: [Linux-cluster] Cluster Shut Down Procedures In-Reply-To: <040102649A06724CA48186EAA4A2FBB80B9663@ITS-EXMBITS1.marqnet.mu.edu> References: <9F9C54F90C94584AAADAD6877B09C42D0F592F75@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B953E@ITS-EXMBITS1.marqnet.mu.edu> <9F9C54F90C94584AAADAD6877B09C42D0F59504D@HDXDSP51.us.lmco.com> <040102649A06724CA48186EAA4A2FBB80B9663@ITS-EXMBITS1.marqnet.mu.edu> Message-ID: <9F9C54F90C94584AAADAD6877B09C42D0F595088@HDXDSP51.us.lmco.com> OH - I saw on http://www.sourceware.org/cluster/conga/ that it is not in the standard Fedora distribution. Unfortunately, I will need to get authorization prior to installing that. :( Q. Why is Conga not in the Fedora 6 distribution? A. Development for Conga started after the freeze for inclusion in FC6. We have prepared RPMs to run with Fedora 6 on our Downloads page. - Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:42 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures CCS became a good tool starting in rhel 6.x, prior to that I never used it Here is the man page: http://linux.die.net/man/8/ccs From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 3:34 PM To: linux clustering Subject: Re: [Linux-cluster] Cluster Shut Down Procedures I'm really not sure. I've never heard of the css command and man css does not show results. 
What I've read on some blogs thus far is; because the cluster is going down in totality, you need to tell the system to ignore the quorum, stop the fencing and then leave the cluster however, I have not heard anyone corroborate this info. I hate being the newbie. umount /mnt - Unmounts a GFS file system IF required vgchange -aln - Deactivates LVM volumes (locally) killall clvmd - Stops the CLVM daemon fence_tool leave - Leaves the fence domain (stops fenced) cman_tool leave remove -w - Leaves the cluster Steve K From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Scheblein, Adam Sent: Thursday, January 24, 2013 1:18 PM To: linux clustering Subject: EXTERNAL: Re: [Linux-cluster] Cluster Shut Down Procedures I typically do a ccs --stopall, shutdown, startup, then because stopall disables cluster autostart i do a ccs --startall. Adam On Jan 24, 2013, at 3:14 PM, Steve Wegner > wrote: Could it be as simple as " service rgmanager stop " on each node, then normal shutdown? From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Krampach, Stephen Sent: Thursday, January 24, 2013 12:43 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Cluster Shut Down Procedures I hate to ask simple questions however, I've been perusing books and blogs for two hours and have no definitive procedure; We are having a power outage. What is the procedure to completely shut down and power off a Red Hat 6.3 cluster? Thanks in advance! Steve K PRIVILEGED & CONFIDENTIAL The information contained in this email message is intended only for use of the person or entity to whom it is addressed. The contained information is CONFIDENTIAL and LEGALLY PRIVILEGED and exempt from disclosure under applicable laws. If you read this message and are not the addressee, you are notified that use, dissemination or reproduction of this message is prohibited. If you have received this message in error, please notify the sender immediately. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From ricks at alldigital.com Thu Jan 24 22:49:45 2013 From: ricks at alldigital.com (Rick Stevens) Date: Thu, 24 Jan 2013 14:49:45 -0800 Subject: [Linux-cluster] LDAP as a service In-Reply-To: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> References: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> Message-ID: <5101BA89.90506@alldigital.com> On 01/24/2013 01:57 PM, Dryden, Tom issued this missive: > > Good Afternoon, > > There are a couple of reasons to implement LDAP on a cluster. > 1. I have a cluster with GFS partitions available. Good. > 2. Want to avoid the cost putting up 2 more machines for master - > master LDAP operation. Master-master LDAP replication is not hard to do and you're still going to have two machines running LDAP. Perhaps not simultaneously, but you will still have two machines. > 3. Want to avoid the timeout the client experiences when the primary is > unavailable. This is what the TIMEOUT and SIZELIMIT and NETWORK_TIMEOUT variables in the various incarnations of the ldap.conf file are for. The defaults do make things sluggish if a primary goes down, but you can tweak that. > My thought is to have the LADP data stored on a GFS partition while the > LDAP server process and IP address are managed as a service. 
In this > configuration the process can move between nodes with no impact to the > clients. Personally, I think you're over complicating things and unless you have a ridiculously big LDAP database that you don't want to replicate, I don't think you're really buying anything here. We run several master- master LDAP clusters here--even with one replicating across the country (California <--> Florida). Works fine. That being said, as with most FOSS stuff, there's more than one way to skin a mule. Do as you wish. ---------------------------------------------------------------------- - Rick Stevens, Systems Engineer, AllDigital ricks at alldigital.com - - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - - - - All generalizations are false. - ---------------------------------------------------------------------- From kkovachev at varna.net Fri Jan 25 09:47:49 2013 From: kkovachev at varna.net (Kaloyan Kovachev) Date: Fri, 25 Jan 2013 11:47:49 +0200 Subject: [Linux-cluster] LDAP as a service In-Reply-To: <5101BA89.90506@alldigital.com> References: <1E7F581BEF7B8444A6D29997EECCC66C0838B560@BPMC-G0-EX1.budcotdc.net> <5101BA89.90506@alldigital.com> Message-ID: <562ee94a434cdad2daf8cb973e46e51f@mx.varna.net> Hi, there should be openldap resource in your cluster, but if not you can always use a script resource or write your own. On Thu, 24 Jan 2013 14:49:45 -0800, Rick Stevens wrote: > On 01/24/2013 01:57 PM, Dryden, Tom issued this missive: >> >> Good Afternoon, >> >> There are a couple of reasons to implement LDAP on a cluster. >> 1. I have a cluster with GFS partitions available. > > Good. > >> 2. Want to avoid the cost putting up 2 more machines for master - >> master LDAP operation. > > Master-master LDAP replication is not hard to do and you're still going > to have two machines running LDAP. Perhaps not simultaneously, but you > will still have two machines. > >> 3. Want to avoid the timeout the client experiences when the primary is >> unavailable. > > This is what the TIMEOUT and SIZELIMIT and NETWORK_TIMEOUT variables in > the various incarnations of the ldap.conf file are for. The defaults do > make things sluggish if a primary goes down, but you can tweak that. > >> My thought is to have the LADP data stored on a GFS partition while the >> LDAP server process and IP address are managed as a service. In this >> configuration the process can move between nodes with no impact to the >> clients. > > Personally, I think you're over complicating things and unless you have > a ridiculously big LDAP database that you don't want to replicate, I > don't think you're really buying anything here. We run several master- > master LDAP clusters here--even with one replicating across the country > (California <--> Florida). Works fine. > > That being said, as with most FOSS stuff, there's more than one way to > skin a mule. Do as you wish. > ---------------------------------------------------------------------- > - Rick Stevens, Systems Engineer, AllDigital ricks at alldigital.com - > - AIM/Skype: therps2 ICQ: 22643734 Yahoo: origrps2 - > - - > - All generalizations are false. 
- > ---------------------------------------------------------------------- From Ralf.Aumueller at informatik.uni-stuttgart.de Tue Jan 29 07:13:56 2013 From: Ralf.Aumueller at informatik.uni-stuttgart.de (Ralf Aumueller) Date: Tue, 29 Jan 2013 08:13:56 +0100 Subject: [Linux-cluster] Share cluster interconnect switch hardware with second cluster Message-ID: <510776B4.9040404@informatik.uni-stuttgart.de> Hello, we have a two node cluster (CentOS6) configured and running. The cluster-interconnect is over two network switches (unmanaged. Reserved for the cluster-interconnect). Now we want to install a second two node cluster. Is it possible to share the switches for the cluster-interconnect of the new cluster? Do I have to set something special in /etc/cluster/cluster.conf of the new cluster? Thanx and best regards, Ralf From lists at alteeve.ca Tue Jan 29 07:17:36 2013 From: lists at alteeve.ca (Digimer) Date: Tue, 29 Jan 2013 02:17:36 -0500 Subject: [Linux-cluster] Share cluster interconnect switch hardware with second cluster In-Reply-To: <510776B4.9040404@informatik.uni-stuttgart.de> References: <510776B4.9040404@informatik.uni-stuttgart.de> Message-ID: <51077790.4060908@alteeve.ca> On 01/29/2013 02:13 AM, Ralf Aumueller wrote: > Hello, > > we have a two node cluster (CentOS6) configured and running. The > cluster-interconnect is over two network switches (unmanaged. Reserved for the > cluster-interconnect). > Now we want to install a second two node cluster. Is it possible to share the > switches for the cluster-interconnect of the new cluster? Do I have to set > something special in /etc/cluster/cluster.conf of the new cluster? > > Thanx and best regards, > > Ralf > It's fine. Each cluster will use a different multicast group. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From misch at schwartzkopff.org Tue Jan 29 08:53:58 2013 From: misch at schwartzkopff.org (Michael Schwartzkopff) Date: Tue, 29 Jan 2013 09:53:58 +0100 Subject: [Linux-cluster] Share cluster interconnect switch hardware with second cluster In-Reply-To: <510776B4.9040404@informatik.uni-stuttgart.de> References: <510776B4.9040404@informatik.uni-stuttgart.de> Message-ID: <1899303.vecLKUaSl7@nb003> Am Dienstag, 29. Januar 2013, 08:13:56 schrieb Ralf Aumueller: > Hello, > > we have a two node cluster (CentOS6) configured and running. The > cluster-interconnect is over two network switches (unmanaged. Reserved for > the cluster-interconnect). > Now we want to install a second two node cluster. Is it possible to share > the switches for the cluster-interconnect of the new cluster? Do I have to > set something special in /etc/cluster/cluster.conf of the new cluster? > > Thanx and best regards, > > Ralf To be sure configure a different multicast group on the new cluster. -- Dr. Michael Schwartzkopff Guardinistr. 63 81375 M?nchen Tel: (0163) 172 50 98 Fax: (089) 620 304 13 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmatchett at cfl.rr.com Tue Jan 29 20:58:18 2013 From: jmatchett at cfl.rr.com (jmatchett at cfl.rr.com) Date: Tue, 29 Jan 2013 15:58:18 -0500 Subject: [Linux-cluster] Cluster and Fencing on different subnetworks? Message-ID: <20130129205818.X4LBR.64959.root@cdptpa-web33-z01.mail.rr.com> I have a RHEL6.3 cluster with RHCS and DRBD. 
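(For reference, rhcs_fence is called through DRBD's fence-peer handler, i.e. roughly the following in drbd.conf - the resource name and script path below are just placeholders, use whatever your install actually has:

resource r0 {   # resource name and handler path are placeholders
    disk     { fencing resource-and-stonith ; }
    handlers { fence-peer "/usr/lib/drbd/rhcs_fence" ; }
}

so DRBD runs the handler whenever it loses the connection to the peer and needs it fenced before continuing.)
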
When I kill the master node, DRBD on the slave calls rhcs_fence, but the script thinks it fails and returns a (1), since the fence device was not on the same subnet as defined by the clusternode name in the cluster.conf. The fencing actually does occur, but when the fenced node reboots and it tries to come back in, the new master DRBD always reports Primary/Unknown. This requires a reboot of both nodes. Is this by design or a problem? I switched back to Obliterate-peer.sh and the problem goes away. Here is an excerpt from my cluster.conf. I ##10.10.10.x Best regards John Matchett From ksorensen at nordija.com Wed Jan 30 11:31:22 2013 From: ksorensen at nordija.com (Kristian =?ISO-8859-1?Q?Gr=F8nfeldt_S=F8rensen?=) Date: Wed, 30 Jan 2013 12:31:22 +0100 Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount Message-ID: <1359545482.15913.38.camel@kriller.nordija.dk> Hi, I'm setting up a two-node cluster sharing a single GFS2 filesystem backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM involved). I am experiencing more or less the same as the OP in this thread: http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html I have an activemq-5.6.0 instance on each server that tries to lock a file on the GFS2-filesystem (using ).
Exception in thread "main" java.io.IOException: Function not implemented at sun.nio.ch.FileChannelImpl.lock0(Native Method) at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) at FileLockTest.main(FileLockTest.java:15) If I run this on the other server (where the GFS2 fs was not unmounted and mounted again), it works correctly. Any ideas to what happens, and why? BR Kristian S?rensen From rpeterso at redhat.com Wed Jan 30 13:17:25 2013 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 30 Jan 2013 08:17:25 -0500 (EST) Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount In-Reply-To: <1359545482.15913.38.camel@kriller.nordija.dk> Message-ID: <106620543.15884204.1359551845224.JavaMail.root@redhat.com> ----- Original Message ----- | Hi, | | I'm setting up a two-node cluster sharing a single GFS2 filesystem | backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM | involved). | | I am experiencing more or less the same as the OP in this thread: | http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html | | I have an activemq-5.6.0 instance on each server that tries to lock a | file on the GFS2-filesystem (using ). | | When i start the cluster, everything works as expected. The first | activemq instance that starts up acquires the lock, the lock is | released | when the activemq exits, and the second instance takes the lock. | | The problem shows when I unmount and subsequently mount the GFS2 | filesystem again on one of the nodes, or reboot one of the nodes | (after | having started at least one activemq instance.) | The I start seeing statements like this in the activemq log files: | | Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... | waiting 10 seconds for the database to be unlocked. Reason: | java.io.IOException: Function not implemented | | org.apache.activemq.store.kahadb.MessageDatabase | | strace -f while that message is logged gives the following: | | [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", | {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 | [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", | {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 | [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", | O_RDWR|O_CREAT, 0666) = 133 | [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 | [pid 3549] fcntl(133, F_GETFD) = 0 | [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0 | [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 | [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 | [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, | start=0, len=1}) = -1 ENOSYS (Function not implemented) | [pid 3549] dup2(138, 133) = 133 | [pid 3549] close(133) | | As you can see, the "Function not implemented" originates from the | F_SETLK fnctl that the JVM does. | The only way to recover from this state seems to be by unmounting the | GFS2-filesystem on both nodes, then mounting it again again on both | nodes. | | I've tried to isolate this by using a simpler testcase than starting | two | activemq instances. I ended up using the java sample from | http://www.javabeat.net/2007/10/locking-files-using-java/ . 
| | I haven't managed to get the system in to a state where F_SETLK | returns | "Function no implemented" by only using the above FileLockTest class, | (I | need activemq in order to trigger the situation) but when the system | is | in that state, I can run FileLockTest, and it will print out the | following stacktrace. | | Exception in thread "main" java.io.IOException: Function not | implemented | at sun.nio.ch.FileChannelImpl.lock0(Native Method) | at | sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) | at | java.nio.channels.FileChannel.tryLock(FileChannel.java:962) | at FileLockTest.main(FileLockTest.java:15) | | | If I run this on the other server (where the GFS2 fs was not | unmounted | and mounted again), it works correctly. | | Any ideas to what happens, and why? | | BR | Kristian S?rensen Hi Kristian, After doing some simple checks (which shouldn't be your problem) GFS2 passes all posix lock requests down to the DLM for further processing. I'm not sure what DLM does with them from there, but I believe the requests are processed by user space, i.e. openais, etc., depending on what version you're running. I recommend checking "dmesg" to see if there are any pertinent errors logged there. You could also check /var/log/messages to see if user space logged any complaints. Also, you might want to do this command to check for pertinent errors: group_tool dump gfs (Now, if it was an flock rather than a posix lock, I could help you because flocks are handled by GFS2 and not just passed on to DLM). Regards, Bob Peterson Red Hat File Systems From swhiteho at redhat.com Wed Jan 30 13:34:41 2013 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 30 Jan 2013 13:34:41 +0000 Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount In-Reply-To: <1359545482.15913.38.camel@kriller.nordija.dk> References: <1359545482.15913.38.camel@kriller.nordija.dk> Message-ID: <1359552881.2719.12.camel@menhir> Hi, On Wed, 2013-01-30 at 12:31 +0100, Kristian Gr?nfeldt S?rensen wrote: > Hi, > > I'm setting up a two-node cluster sharing a single GFS2 filesystem > backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM > involved). > > I am experiencing more or less the same as the OP in this thread: > http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html > Well I'm not so sure about that. We never found out what the issue was in that case, but in your case it seems that you are doing something which should work. Also, in the msg00136 case it seems that the lock request didn't work at all, whereas in your case it appears that it does work until a umount/mount of one node - at least if I've understood it correctly. Which kernel and userspace are you using? It would be a good plan to report this as a bug (or via support if you are a supported customer and are using RHEL) as it should work correctly, Steve. > I have an activemq-5.6.0 instance on each server that tries to lock a > file on the GFS2-filesystem (using ). > > When i start the cluster, everything works as expected. The first > activemq instance that starts up acquires the lock, the lock is released > when the activemq exits, and the second instance takes the lock. > > The problem shows when I unmount and subsequently mount the GFS2 > filesystem again on one of the nodes, or reboot one of the nodes (after > having started at least one activemq instance.) > The I start seeing statements like this in the activemq log files: > > Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... 
waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: Function not implemented | org.apache.activemq.store.kahadb.MessageDatabase > > strace -f while that message is logged gives the following: > > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", O_RDWR|O_CREAT, 0666) = 133 > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > [pid 3549] fcntl(133, F_GETFD) = 0 > [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0 > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = -1 ENOSYS (Function not implemented) > [pid 3549] dup2(138, 133) = 133 > [pid 3549] close(133) > > As you can see, the "Function not implemented" originates from the > F_SETLK fnctl that the JVM does. > The only way to recover from this state seems to be by unmounting the > GFS2-filesystem on both nodes, then mounting it again again on both > nodes. > > I've tried to isolate this by using a simpler testcase than starting two > activemq instances. I ended up using the java sample from > http://www.javabeat.net/2007/10/locking-files-using-java/ . > > I haven't managed to get the system in to a state where F_SETLK returns > "Function no implemented" by only using the above FileLockTest class, (I > need activemq in order to trigger the situation) but when the system is > in that state, I can run FileLockTest, and it will print out the > following stacktrace. > > Exception in thread "main" java.io.IOException: Function not implemented > at sun.nio.ch.FileChannelImpl.lock0(Native Method) > at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) > at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) > at FileLockTest.main(FileLockTest.java:15) > > > If I run this on the other server (where the GFS2 fs was not unmounted > and mounted again), it works correctly. > > Any ideas to what happens, and why? > > BR > Kristian S?rensen > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rainer.hartwig.schubert at gmail.com Wed Jan 30 15:22:36 2013 From: rainer.hartwig.schubert at gmail.com (Rainer Schubert) Date: Wed, 30 Jan 2013 16:22:36 +0100 Subject: [Linux-cluster] clvmd not running on node mynode3 Message-ID: Hi, I can't get a new LVM blockdevice or do a simple resize: lvresize /dev/vm-storage/windowsserver1 -L +20G Extending logical volume windowsserver1 to 90.00 GiB clvmd not running on node mynode3 Unable to drop cached metadata for VG vm-storage The node mynode is correctly running. What can I do in this situation? 
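My first thought is to check directly on mynode3 whether the daemon is actually up and still sees all cluster members, i.e. something like:

# pidof clvmd
# cman_tool nodes
# /etc/init.d/clvm restart

but I am not sure whether restarting clvmd there is safe while logical volumes are in use, hence the question.
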
Best regards From ksorensen at nordija.com Wed Jan 30 16:11:09 2013 From: ksorensen at nordija.com (Kristian =?ISO-8859-1?Q?Gr=F8nfeldt_S=F8rensen?=) Date: Wed, 30 Jan 2013 17:11:09 +0100 Subject: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount In-Reply-To: <1359552881.2719.12.camel@menhir> References: <1359545482.15913.38.camel@kriller.nordija.dk> <1359552881.2719.12.camel@menhir> Message-ID: <1359562269.15913.57.camel@kriller.nordija.dk> On Wed, 2013-01-30 at 13:34 +0000, Steven Whitehouse wrote: > Hi, > > On Wed, 2013-01-30 at 12:31 +0100, Kristian Gr?nfeldt S?rensen wrote: > > Hi, > > > > I'm setting up a two-node cluster sharing a single GFS2 filesystem > > backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM > > involved). > > > > I am experiencing more or less the same as the OP in this thread: > > http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html > > > > Well I'm not so sure about that. We never found out what the issue was > in that case, but in your case it seems that you are doing something > which should work. Also, in the msg00136 case it seems that the lock > request didn't work at all, whereas in your case it appears that it does > work until a umount/mount of one node - at least if I've understood it > correctly. Correct. And I am able to bring the system into a working state by unmounting the file system from all nodes at the same time, and mounting it again. > Which kernel and userspace are you using? It's Debian testing - kernel is from experimental ( 3.7.1-1~experimental.2), since I had problems deleting files with the gfs2-module included in the default Debian testing kernel (3.2.x). cman + libdlm3 is v3.0.12 corosync is v1.4.2 Let me know if you need version numbers of other stuff. > It would be a good plan to report this as a bug (or via support if you > are a supported customer and are using RHEL) as it should work > correctly, OK will probably file a bug report then. It's at least encouraging to hear that it should work:-) /Kristian > Steve. > > > > I have an activemq-5.6.0 instance on each server that tries to lock a > > file on the GFS2-filesystem (using ). > > > > When i start the cluster, everything works as expected. The first > > activemq instance that starts up acquires the lock, the lock is released > > when the activemq exits, and the second instance takes the lock. > > > > The problem shows when I unmount and subsequently mount the GFS2 > > filesystem again on one of the nodes, or reboot one of the nodes (after > > having started at least one activemq instance.) > > The I start seeing statements like this in the activemq log files: > > > > Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... waiting 10 seconds for the database to be unlocked. 
Reason: java.io.IOException: Function not implemented | org.apache.activemq.store.kahadb.MessageDatabase > > > > strace -f while that message is logged gives the following: > > > > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > > [pid 3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0 > > [pid 3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", O_RDWR|O_CREAT, 0666) = 133 > > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > > [pid 3549] fcntl(133, F_GETFD) = 0 > > [pid 3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0 > > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > > [pid 3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > > [pid 3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = -1 ENOSYS (Function not implemented) > > [pid 3549] dup2(138, 133) = 133 > > [pid 3549] close(133) > > > > As you can see, the "Function not implemented" originates from the > > F_SETLK fnctl that the JVM does. > > The only way to recover from this state seems to be by unmounting the > > GFS2-filesystem on both nodes, then mounting it again again on both > > nodes. > > > > I've tried to isolate this by using a simpler testcase than starting two > > activemq instances. I ended up using the java sample from > > http://www.javabeat.net/2007/10/locking-files-using-java/ . > > > > I haven't managed to get the system in to a state where F_SETLK returns > > "Function no implemented" by only using the above FileLockTest class, (I > > need activemq in order to trigger the situation) but when the system is > > in that state, I can run FileLockTest, and it will print out the > > following stacktrace. > > > > Exception in thread "main" java.io.IOException: Function not implemented > > at sun.nio.ch.FileChannelImpl.lock0(Native Method) > > at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871) > > at java.nio.channels.FileChannel.tryLock(FileChannel.java:962) > > at FileLockTest.main(FileLockTest.java:15) > > > > > > If I run this on the other server (where the GFS2 fs was not unmounted > > and mounted again), it works correctly. > > > > Any ideas to what happens, and why? > > > > BR > > Kristian S?rensen > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > From queszama at yahoo.in Wed Jan 30 16:29:23 2013 From: queszama at yahoo.in (Zama Ques) Date: Thu, 31 Jan 2013 00:29:23 +0800 (SGT) Subject: [Linux-cluster] GFS2 File System mount failing Message-ID: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com> I am facing few issues while creating a GFS2 file system . GFS2 file creation is successful , but? it is failing while trying to mount the file system . It is failing with the following error : === [root at eser~]# /etc/init.d/gfs2 start Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster error mounting lockproto lock_dlm ?????????????????????????????????????????????????????????? [FAILED] ---------- [root at eser ~]# tail -f /var/log/messages Jan 30 15:50:27 eser modcluster: Updating cluster version Jan 30 15:50:27 eser corosync[7121]:?? [QUORUM] Members[2]: 1 2 Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data Jan 30 15:50:29 eserrgmanager[7379]: Stopping changed resources. Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources. 
From queszama at yahoo.in  Wed Jan 30 16:29:23 2013
From: queszama at yahoo.in (Zama Ques)
Date: Thu, 31 Jan 2013 00:29:23 +0800 (SGT)
Subject: [Linux-cluster] GFS2 File System mount failing
Message-ID: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>

I am facing a few issues while creating a GFS2 file system. GFS2 file
system creation is successful, but it is failing while trying to mount
the file system.

It is failing with the following error:

===
[root at eser ~]# /etc/init.d/gfs2 start
Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster
error mounting lockproto lock_dlm
                                                           [FAILED]
----------
[root at eser ~]# tail -f /var/log/messages
Jan 30 15:50:27 eser modcluster: Updating cluster version
Jan 30 15:50:27 eser corosync[7121]:   [QUORUM] Members[2]: 1 2
Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring
Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data
Jan 30 15:50:29 eser rgmanager[7379]: Stopping changed resources.
Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources.
Jan 30 15:50:29 eser rgmanager[7379]: Starting changed resources.
Jan 30 15:56:21 eser gfs_controld[7254]: join: fs requires cluster="mycluster" current="sharedweb"
Jan 30 16:02:43 eser gfs_controld[7254]: join: fs requires cluster="mycluster" current="sharedweb"
Jan 30 18:46:48 eser gfs_controld[7254]: join: fs requires cluster="mycluster" current="sharedweb"
==========

"sharedweb" is the cluster which I created earlier, and I created the
GFS2 file system using that cluster name. I then deleted the "sharedweb"
cluster and created a new cluster called "mycluster", but while mounting
the GFS2 partition with the new cluster it is showing the error mentioned
above.

I created the new GFS2 file system using the command shown below:

mkfs.gfs2 -t mycluster:mygfs2 -p lock_dlm -j 2 /dev/mapper/mpathcp1

My cluster config is as follows:

=====
# cat /etc/cluster/cluster.conf
(cluster.conf contents not preserved - the XML was lost when the list
archive scrubbed the HTML mail)
===

Please suggest how to resolve the issue.

Thanks
Zaman
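The mismatch in the gfs_controld lines above (fs requires cluster="mycluster", current="sharedweb") can be confirmed directly before changing anything: the lock table name ("clustername:fsname") given to mkfs.gfs2 -t is stored in the filesystem superblock, while the name the node is actually running under is reported by cman_tool. A minimal check, assuming gfs2-utils is installed and using the device from the mkfs command above:

# lock table recorded in the GFS2 superblock at mkfs time
gfs2_tool sb /dev/mapper/mpathcp1 table

# cluster name the running cman/corosync stack was started with
cman_tool status | grep "Cluster Name"

# the mount only succeeds when the part before the ":" matches the running cluster name
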
From swhiteho at redhat.com  Wed Jan 30 16:53:31 2013
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 30 Jan 2013 16:53:31 +0000
Subject: [Linux-cluster] GFS2 File System mount failing
In-Reply-To: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>
References: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>
Message-ID: <1359564811.2719.25.camel@menhir>

Hi,

On Thu, 2013-01-31 at 00:29 +0800, Zama Ques wrote:
> I am facing a few issues while creating a GFS2 file system. GFS2 file
> system creation is successful, but it is failing while trying to mount
> the file system.
>
> It is failing with the following error:
>
> ===
> [root at eser ~]# /etc/init.d/gfs2 start
> Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster
> error mounting lockproto lock_dlm
>                                                            [FAILED]
> ----------
>
Did you restart the cluster daemons after you changed the config file? It
looks like it is still looking at the old data from the messages you've
posted,

Steve.

> [root at eser ~]# tail -f /var/log/messages
> Jan 30 15:50:27 eser modcluster: Updating cluster version
> Jan 30 15:50:27 eser corosync[7121]:   [QUORUM] Members[2]: 1 2
> Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring
> Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data
> Jan 30 15:50:29 eser rgmanager[7379]: Stopping changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Starting changed resources.
> Jan 30 15:56:21 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 16:02:43 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 18:46:48 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
>
> ==========
>
> "sharedweb" is the cluster which I created earlier, and I created the
> GFS2 file system using that cluster name. I then deleted the "sharedweb"
> cluster and created a new cluster called "mycluster", but while mounting
> the GFS2 partition with the new cluster it is showing the error
> mentioned above.
>
> I created the new GFS2 file system using the command shown below:
>
> mkfs.gfs2 -t mycluster:mygfs2 -p lock_dlm -j 2 /dev/mapper/mpathcp1
>
> My cluster config is as follows:
>
> =====
> # cat /etc/cluster/cluster.conf
> (cluster.conf contents not preserved in the archive)
> ===
>
> Please suggest how to resolve the issue.
>
> Thanks
> Zaman
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

From queszama at yahoo.in  Thu Jan 31 01:56:21 2013
From: queszama at yahoo.in (Zama Ques)
Date: Thu, 31 Jan 2013 09:56:21 +0800 (SGT)
Subject: [Linux-cluster] GFS2 File System mount failing
In-Reply-To: <1359564811.2719.25.camel@menhir>
References: <1359563363.57630.YahooMailNeo@web193504.mail.sg3.yahoo.com>
	<1359564811.2719.25.camel@menhir>
Message-ID: <1359597381.66958.YahooMailNeo@web193503.mail.sg3.yahoo.com>

Thanks Steve. Restarting cman resolved the issue (the restart-and-remount
sequence is sketched after the quoted thread below).

Thanks
Zaman

________________________________
From: Steven Whitehouse
To: Zama Ques; linux clustering
Sent: Wednesday, 30 January 2013 10:23 PM
Subject: Re: [Linux-cluster] GFS2 File System mount failing

Hi,

On Thu, 2013-01-31 at 00:29 +0800, Zama Ques wrote:
> I am facing a few issues while creating a GFS2 file system. GFS2 file
> system creation is successful, but it is failing while trying to mount
> the file system.
>
> It is failing with the following error:
>
> ===
> [root at eser ~]# /etc/init.d/gfs2 start
> Mounting GFS2 filesystem (/sharedweb): fs is for a different cluster
> error mounting lockproto lock_dlm
>                                                            [FAILED]
> ----------
>
> Did you restart the cluster daemons after you changed the config file? It
> looks like it is still looking at the old data from the messages you've
> posted,
>
> [root at eser ~]# tail -f /var/log/messages
> Jan 30 15:50:27 eser modcluster: Updating cluster version
> Jan 30 15:50:27 eser corosync[7121]:   [QUORUM] Members[2]: 1 2
> Jan 30 15:50:28 eser rgmanager[7379]: Reconfiguring
> Jan 30 15:50:28 eser rgmanager[7379]: Loading Service Data
> Jan 30 15:50:29 eser rgmanager[7379]: Stopping changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Restarting changed resources.
> Jan 30 15:50:29 eser rgmanager[7379]: Starting changed resources.
> Jan 30 15:56:21 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 16:02:43 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
> Jan 30 18:46:48 eser gfs_controld[7254]: join: fs requires
> cluster="mycluster" current="sharedweb"
>
> ==========
>
> "sharedweb" is the cluster which I created earlier, and I created the
> GFS2 file system using that cluster name. I then deleted the "sharedweb"
> cluster and created a new cluster called "mycluster", but while mounting
> the GFS2 partition with the new cluster it is showing the error
> mentioned above.
>
> I created the new GFS2 file system using the command shown below:
>
> mkfs.gfs2 -t mycluster:mygfs2 -p lock_dlm -j 2 /dev/mapper/mpathcp1
>
> My cluster config is as follows:
>
> =====
> # cat /etc/cluster/cluster.conf
> (cluster.conf contents not preserved in the archive)
> ===
>
> Please suggest how to resolve the issue.
>
> Thanks
> Zaman
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
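For reference, the restart that resolved the "fs is for a different cluster" error comes down to something like the following on each node. This is only a sketch based on the thread, not an exact transcript: it assumes the RHEL 6-style init scripts seen above, rgmanager is stopped first simply because it appears in the logs, and the exact set of services layered on top of cman may differ on other setups.

# make cman re-read cluster.conf and come back up under the new cluster name
/etc/init.d/rgmanager stop
/etc/init.d/cman restart
/etc/init.d/rgmanager start

# the node should now report "mycluster" rather than "sharedweb"
cman_tool status | grep "Cluster Name"

# with the names matching, the GFS2 mount goes through
/etc/init.d/gfs2 start      # or: mount -t gfs2 /dev/mapper/mpathcp1 /sharedweb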