From kjalleda at gmail.com Mon May 1 03:07:29 2006 From: kjalleda at gmail.com (Kishore Jalleda) Date: Sun, 30 Apr 2006 23:07:29 -0400 Subject: [Linux-cluster] MySQL on GFS benchmarks In-Reply-To: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl> References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl> Message-ID: <78aaf6710604302007s434bc328uf0fe3530e92877a2@mail.gmail.com> No matter what you do, a standalone server would be faster than a clustered architecture, if it is GFS over SAN, or even the MySQL cluster, due to obvious reasons of latency invloved. Anyway what exactly are you trying to build with MySQL, I mean what kind of performance you want from MySQL, may be you could try Replication or if you want good scalability, then you would be better off with the MySQL cluster. Kishore Jalleda http://kjalleda.googlepages.com/projects On 4/26/06, Sander van Beek - Elexis wrote: > Hi all, > > We did a quick benchmark on our 2 node rhel4 testcluster with gfs and > a gnbd storage server. The results were very sad. One of the nodes > (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when > running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node > GFS over GNBD setup and inserts on both nodes at the same time, we > only could do 80 inserts per second. I'm very interested in the > perfomance others got in a similar setup. Would the performance > increase when we use software based iscsi instead of gnbd? > Or should we simply buy SAN equipment? Does anyone have statistics to > compare a standalone mysql setup to a small gfs cluster using a san? > > > With best regards, > Sander van Beek > > --------------------------------------- > > Ing. S. van Beek > Elexis > Marketing 9 > 6921 RE Duiven > The Netherlands > > Tel: +31 (0)26 7110329 > Mob: +31 (0)6 28395109 > Fax: +31 (0)318 611112 > Email: sander at elexis.nl > Web: http://www.elexis.nl > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sander at elexis.nl Mon May 1 13:02:03 2006 From: sander at elexis.nl (Sander van Beek - Elexis) Date: Mon, 01 May 2006 15:02:03 +0200 Subject: [Linux-cluster] MySQL on GFS benchmarks In-Reply-To: References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl> Message-ID: <7.0.1.0.0.20060501150057.0232a040@elexis.nl> Hi, Im trying (software based) iSCSI right now, could post my benchmarks if other people are interested. Best regards, Sander At 00:00 1-5-2006, you wrote: >Sander, > >It depends, if you are looking for performance, definately SAN. >iSCSI might have a better performance over GNBD. >I found this on google >http://www.bwbug.org/docs/RedHat-GNBD-Ethernet-SAN.pdf > >It has some detais about GFS on SAN and on GNBD, It might help though.3 >Good Luck and keep us posted. > >Att. >FTM > > >On 4/26/06, Sander van Beek - Elexis ><sander at elexis.nl> wrote: >Hi all, > >We did a quick benchmark on our 2 node rhel4 testcluster with gfs and >a gnbd storage server. The results were very sad. One of the nodes >(p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when >running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node >GFS over GNBD setup and inserts on both nodes at the same time, we >only could do 80 inserts per second. I'm very interested in the >perfomance others got in a similar setup. Would the performance >increase when we use software based iscsi instead of gnbd? >Or should we simply buy SAN equipment? 
Does anyone have statistics to >compare a standalone mysql setup to a small gfs cluster using a san? > > >With best regards, >Sander van Beek > >--------------------------------------- > >Ing. S. van Beek >Elexis >Marketing 9 >6921 RE Duiven >The Netherlands > >Tel: +31 (0)26 7110329 >Mob: +31 (0)6 28395109 >Fax: +31 (0)318 611112 >Email: sander at elexis.nl >Web: http://www.elexis.nl > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > > > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster >No virus found in this incoming message. >Checked by AVG Free Edition. >Version: 7.1.385 / Virus Database: 268.5.1/327 - Release Date: 28-4-2006 Met vriendelijke groet, Sander van Beek --------------------------------------- Ing. S. van Beek Elexis Marketing 9 6921 RE Duiven Tel: +31 (0)26 7110329 Mob: +31 (0)6 28395109 Fax: +31 (0)318 611112 Email: sander at elexis.nl Web: http://www.elexis.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From sander at elexis.nl Mon May 1 13:08:34 2006 From: sander at elexis.nl (Sander van Beek - Elexis) Date: Mon, 01 May 2006 15:08:34 +0200 Subject: [Linux-cluster] MySQL on GFS benchmarks In-Reply-To: <78aaf6710604302007s434bc328uf0fe3530e92877a2@mail.gmail.co m> References: <7.0.1.0.0.20060426135049.022f99d8@elexis.nl> <78aaf6710604302007s434bc328uf0fe3530e92877a2@mail.gmail.com> Message-ID: <7.0.1.0.0.20060501150206.022e2600@elexis.nl> Hi, Ofcourse I understand that the performance will be less because of extra overhead. My goal is to be as close to standalone server performance as possible, with a certain budget in mind. One of the demands I have is that the cluster solution I'm building is transparent to clients, highly available, load balanced, and has to be scalable up to 2-8 servers. Both replication and mysql-cluster are not fully transparent, mysql on gfs can be I Think. I'll keep the list updated when I get more benchmarks. Best regards, Sander At 05:07 1-5-2006, you wrote: >No matter what you do, a standalone server would be faster than a >clustered architecture, if it is GFS over SAN, or even the MySQL >cluster, due to obvious reasons of latency invloved. Anyway what >exactly are you trying to build with MySQL, I mean what kind of >performance you want from MySQL, may be you could try Replication or >if you want good scalability, then you would be better off with the >MySQL cluster. > >Kishore Jalleda >http://kjalleda.googlepages.com/projects > > > >On 4/26/06, Sander van Beek - Elexis ><sander at elexis.nl > wrote: > > Hi all, > > > > We did a quick benchmark on our 2 node rhel4 testcluster with gfs and > > a gnbd storage server. The results were very sad. One of the nodes > > (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when > > running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node > > GFS over GNBD setup and inserts on both nodes at the same time, we > > only could do 80 inserts per second. I'm very interested in the > > perfomance others got in a similar setup. Would the performance > > increase when we use software based iscsi instead of gnbd? > > Or should we simply buy SAN equipment? Does anyone have statistics to > > compare a standalone mysql setup to a small gfs cluster using a san? > > > > > > With best regards, > > Sander van Beek > > > > --------------------------------------- > > > > Ing. S. 
van Beek > > Elexis > > Marketing 9 > > 6921 RE Duiven > > The Netherlands > > > > Tel: +31 (0)26 7110329 > > Mob: +31 (0)6 28395109 > > Fax: +31 (0)318 611112 > > Email: sander at elexis.nl > > Web: http://www.elexis.nl > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster >No virus found in this incoming message. >Checked by AVG Free Edition. >Version: 7.1.385 / Virus Database: 268.5.1/327 - Release Date: 28-4-2006 Met vriendelijke groet, Sander van Beek --------------------------------------- Ing. S. van Beek Elexis Marketing 9 6921 RE Duiven Tel: +31 (0)26 7110329 Mob: +31 (0)6 28395109 Fax: +31 (0)318 611112 Email: sander at elexis.nl Web: http://www.elexis.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From jparsons at redhat.com Mon May 1 13:32:48 2006 From: jparsons at redhat.com (James Parsons) Date: Mon, 01 May 2006 09:32:48 -0400 Subject: [Linux-cluster] 2nd try: fencing? In-Reply-To: <20060430005245.GA76504@monsterjam.org> References: <20060429015851.GB66106@monsterjam.org> <1146278162.5933.12.camel@mechanism.localnet> <20060430005245.GA76504@monsterjam.org> Message-ID: <44560E00.20806@redhat.com> Jason wrote: >>What you use for a fence method all depends on your hardware. If you >>give a quick explanation of your hardware setup, we might be able to >>help you pick a fence device that will work with what you have already. >>Or if you don't have anything that could be used to block access, you >>might have to buy some network power switches. >> >> > >right now, all I have is 2 dell servers in a rack with identical configs. (dual ethernet >controllers and 1 separate controller for the heartbeat). > Do your Dell servers have Drac support? RHCS supports Drac 4/I and DracIII/MC. >Both are running >linux-ha and are both connected to a dell powervault 220S storage array which is configured so >that both hosts can access the drives concurrently (cluster mode). Im following the instructions >at >http://www.gyrate.org/archives/9 and am at step 17.. which says to configure CCS. > >I guess we could get an APC power switch, but what would you folks suggest? i.e. what model for >just a 2 cluster node (each server has 2 power supplies). Or is there a better way? > An AP7900 would probably work for you. If you use system-config-cluster to configure your cluster, it will detect that you are fencing each node twice with a 'power switch' type of fence on the same level and set the appropriate attributes for you in the cluster.conf file. -J From 14117614 at sun.ac.za Mon May 1 14:01:18 2006 From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>) Date: Mon, 1 May 2006 16:01:18 +0200 Subject: [Linux-cluster] < fecing with out any hardware? > References: <20060429015851.GB66106@monsterjam.org> <1146278162.5933.12.camel@mechanism.localnet> <20060430005245.GA76504@monsterjam.org> <44560E00.20806@redhat.com> Message-ID: <2C04D2F14FD8254386851063BC2B67065E08B2@STBEVS01.stb.sun.ac.za> Hi... I was wondering if it would be a good idea to fence without any type of hardware, besides the pc's. At the moment I have about 6 machines that I want to have a gfs on these 6 machines but due to budget constraints I cant afford to by hardware. How is it possible to fence without the use of hardware, besides manual fencing. 
These machines are your basic desktop pc's, each has a 80gig HD and a 3Ghz P4 processor. They are all connected by a 1 Gigabit switch. Would if be possible to use GFS or is there any other variant. They all run FC5( Fedora Core 5 ). I need some sort of GFS, because we intend on setting up a mysql clustering system as well. Any ideas would be greatly appreciated. He who has a why to live can bear with almost any how. Friedrich Nietzsche -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of James Parsons Sent: Mon 2006/05/01 03:32 PM To: linux clustering Subject: Re: [Linux-cluster] 2nd try: fencing? Jason wrote: >>What you use for a fence method all depends on your hardware. If you >>give a quick explanation of your hardware setup, we might be able to >>help you pick a fence device that will work with what you have already. >>Or if you don't have anything that could be used to block access, you >>might have to buy some network power switches. >> >> > >right now, all I have is 2 dell servers in a rack with identical configs. (dual ethernet >controllers and 1 separate controller for the heartbeat). > Do your Dell servers have Drac support? RHCS supports Drac 4/I and DracIII/MC. >Both are running >linux-ha and are both connected to a dell powervault 220S storage array which is configured so >that both hosts can access the drives concurrently (cluster mode). Im following the instructions >at >http://www.gyrate.org/archives/9 and am at step 17.. which says to configure CCS. > >I guess we could get an APC power switch, but what would you folks suggest? i.e. what model for >just a 2 cluster node (each server has 2 power supplies). Or is there a better way? > An AP7900 would probably work for you. If you use system-config-cluster to configure your cluster, it will detect that you are fencing each node twice with a 'power switch' type of fence on the same level and set the appropriate attributes for you in the cluster.conf file. -J -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4078 bytes Desc: not available URL: From ehimmel at burlingtontelecom.com Mon May 1 14:18:14 2006 From: ehimmel at burlingtontelecom.com (Evan Himmel) Date: Mon, 01 May 2006 14:18:14 -0000 Subject: [Linux-cluster] Cluster Suite Message-ID: <43E0D124.4090004@burlingtontelecom.com> I am installing Cluster Suite and GFS to RHEL Update 3. What I noticed is that the modules are installing for kernel 2.6.9-22.ELsmp not the current kernel 2.6.9-34.ELsmp. Is there something I am missing? -- Evan __________________________________________________________________________________________________________________________________________________ Attention! This electronic message contains information that may be legally confidential and/or privileged. The information is intended solely for the individual or entity named above and access by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it. 
From jparsons at redhat.com Mon May 1 14:36:11 2006 From: jparsons at redhat.com (James Parsons) Date: Mon, 01 May 2006 10:36:11 -0400 Subject: [Linux-cluster] < fecing with out any hardware? > In-Reply-To: <2C04D2F14FD8254386851063BC2B67065E08B2@STBEVS01.stb.sun.ac.za> References: <20060429015851.GB66106@monsterjam.org> <1146278162.5933.12.camel@mechanism.localnet> <20060430005245.GA76504@monsterjam.org> <44560E00.20806@redhat.com> <2C04D2F14FD8254386851063BC2B67065E08B2@STBEVS01.stb.sun.ac.za> Message-ID: <44561CDB.4010506@redhat.com> Pool Lee, Mr <14117614 at sun.ac.za> wrote: >Hi... > >I was wondering if it would be a good idea to fence without any type of hardware, besides the pc's. > >At the moment I have about 6 machines that I want to have a gfs on these 6 machines but due to budget constraints I cant afford to by hardware. How is it possible to fence without the use of hardware, besides manual fencing. > >These machines are your basic desktop pc's, each has a 80gig HD and a 3Ghz P4 processor. They are all connected by a 1 Gigabit switch. > >Would if be possible to use GFS or is there any other variant. They all run FC5( Fedora Core 5 ). I need some sort of GFS, because we intend on setting up a mysql clustering system as well. > >Any ideas would be greatly appreciated. > If you plan on doing anything with your cluster other than just tinkering...that is, if you intend to do real work with it, then you just need fencing. It is a requirement for a sound cluster/GFS environment. Here is a WTI unit http://cgi.ebay.com/WTI-NPS-115-Telnet-Dial-Up-Network-Power-Switch_W0QQitemZ9717474832QQcategoryZ86723QQssPageNameZWDVWQQrdZ1QQcmdZViewItem You can also find used APC switches there as well. You just need fencing. -J From Bowie_Bailey at BUC.com Mon May 1 15:42:39 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Mon, 1 May 2006 11:42:39 -0400 Subject: [Linux-cluster] RE: < fecing with out any hardware? > Message-ID: <4766EEE585A6D311ADF500E018C154E30213398C@bnifex.cis.buc.com> Pool Lee, Mr <14117614 at sun.ac.za> wrote: > I was wondering if it would be a good idea to fence without any type > of hardware, besides the pc's. > > At the moment I have about 6 machines that I want to have a gfs on > these 6 machines but due to budget constraints I cant afford to by > hardware. How is it possible to fence without the use of hardware, > besides manual fencing. > > These machines are your basic desktop pc's, each has a 80gig HD and a > 3Ghz P4 processor. They are all connected by a 1 Gigabit switch. > > Would if be possible to use GFS or is there any other variant. They > all run FC5( Fedora Core 5 ). I need some sort of GFS, because we > intend on setting up a mysql clustering system as well. > > Any ideas would be greatly appreciated. You need to have some sort of fencing. You can use manual fencing, but it doesn't work well with production systems. What happens is that any problem in the cluster causes everything to come to a dead stop and wait for you to fix the problem and then let the cluster know it's ok to continue operation. The only real option for a production system is some sort of hardware or software that allows for the cluster to fence misbehaving nodes on it's own. The cheapest is a power switch. 
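To give an idea of what the configuration side looks like, the relevant pieces of
cluster.conf for an APC power switch are roughly the following. This is only a sketch
from memory - the cluster name, node names, IP address, login and port numbers are all
made up, and the exact attributes should be double-checked against your fence_apc
version before relying on it:

    <?xml version="1.0"?>
    <cluster name="testcluster" config_version="1">
      <!-- two_node lets a 2-node cluster keep quorum when one member dies -->
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="node1" votes="1">
          <fence>
            <method name="1">
              <device name="apc1" port="1"/>
            </method>
          </fence>
        </clusternode>
        <clusternode name="node2" votes="1">
          <fence>
            <method name="1">
              <device name="apc1" port="2"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice agent="fence_apc" name="apc1"
                     ipaddr="192.168.1.100" login="apc" passwd="apc"/>
      </fencedevices>
    </cluster>

Each node's fence method points at the device (and outlet) that can power-cycle that
particular node, and the surviving node is the one that triggers it.
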
-- Bowie From cfeist at redhat.com Mon May 1 16:14:22 2006 From: cfeist at redhat.com (Chris Feist) Date: Mon, 01 May 2006 11:14:22 -0500 Subject: [Linux-cluster] Cluster Suite In-Reply-To: <43E0D124.4090004@burlingtontelecom.com> References: <43E0D124.4090004@burlingtontelecom.com> Message-ID: <445633DE.3010808@redhat.com> Where are you getting the Cluster Suite & GFS Rpms for? The latest versions are built against the latest (2.6.9-34) kernel. You should be able to find them on RHN. Thanks! Chris Evan Himmel wrote: > I am installing Cluster Suite and GFS to RHEL Update 3. What I noticed > is that the modules are installing for kernel 2.6.9-22.ELsmp not the > current kernel 2.6.9-34.ELsmp. Is there something I am missing? > From mwill at penguincomputing.com Mon May 1 16:55:11 2006 From: mwill at penguincomputing.com (Michael Will) Date: Mon, 1 May 2006 09:55:11 -0700 Subject: [Linux-cluster] MySQL on GFS benchmarks Message-ID: <433093DF7AD7444DA65EFAFE3987879C107DE3@jellyfish.highlyscyld.com> I would be interested in any numbers you could provide, but make sure to also state exactly what the underlying hardware is, i.e. node model, cpu speed, ram speed and size, ethernet switch, disk model etc. Michael ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Sander van Beek - Elexis Sent: Monday, May 01, 2006 6:09 AM To: linux clustering Subject: Re: [Linux-cluster] MySQL on GFS benchmarks Hi, Ofcourse I understand that the performance will be less because of extra overhead. My goal is to be as close to standalone server performance as possible, with a certain budget in mind. One of the demands I have is that the cluster solution I'm building is transparent to clients, highly available, load balanced, and has to be scalable up to 2-8 servers. Both replication and mysql-cluster are not fully transparent, mysql on gfs can be I Think. I'll keep the list updated when I get more benchmarks. Best regards, Sander At 05:07 1-5-2006, you wrote: No matter what you do, a standalone server would be faster than a clustered architecture, if it is GFS over SAN, or even the MySQL cluster, due to obvious reasons of latency invloved. Anyway what exactly are you trying to build with MySQL, I mean what kind of performance you want from MySQL, may be you could try Replication or if you want good scalability, then you would be better off with the MySQL cluster. Kishore Jalleda http://kjalleda.googlepages.com/projects On 4/26/06, Sander van Beek - Elexis wrote: > Hi all, > > We did a quick benchmark on our 2 node rhel4 testcluster with gfs and > a gnbd storage server. The results were very sad. One of the nodes > (p3 1ghz, 512 mb) could run +/- 2400 insert queries per second when > running mysqld-max 5.0.20 on a local ext3 filesystem. With a 2 node > GFS over GNBD setup and inserts on both nodes at the same time, we > only could do 80 inserts per second. I'm very interested in the > perfomance others got in a similar setup. Would the performance > increase when we use software based iscsi instead of gnbd? > Or should we simply buy SAN equipment? Does anyone have statistics to > compare a standalone mysql setup to a small gfs cluster using a san? > > > With best regards, > Sander van Beek > > --------------------------------------- > > Ing. S. 
van Beek > Elexis > Marketing 9 > 6921 RE Duiven > The Netherlands > > Tel: +31 (0)26 7110329 > Mob: +31 (0)6 28395109 > Fax: +31 (0)318 611112 > Email: sander at elexis.nl > Web: http://www.elexis.nl > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.385 / Virus Database: 268.5.1/327 - Release Date: 28-4-2006 Met vriendelijke groet, Sander van Beek --------------------------------------- Ing. S. van Beek Elexis Marketing 9 6921 RE Duiven Tel: +31 (0)26 7110329 Mob: +31 (0)6 28395109 Fax: +31 (0)318 611112 Email: sander at elexis.nl Web: http://www.elexis.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From omer at faruk.net Mon May 1 19:11:18 2006 From: omer at faruk.net (Omer Faruk Sen) Date: Mon, 1 May 2006 22:11:18 +0300 (EEST) Subject: [Linux-cluster] HP msa20 and rhcs? Message-ID: <1333.85.101.156.147.1146510678.squirrel@85.101.156.147> Hi, I need a cheap shared storage and wanted to know if anyone in this list used HP MSA20 (shared SCSI) on rhcs? I want to setup a 2 node cluster with shared storage and cheapest HP shared storage seems to be MSA20.. Best Reagrads, -- Omer Faruk Sen http://www.faruk.net From 14117614 at sun.ac.za Mon May 1 19:45:55 2006 From: 14117614 at sun.ac.za (Pool Lee, Mr <14117614@sun.ac.za>) Date: Mon, 1 May 2006 21:45:55 +0200 Subject: [Linux-cluster] RE: < fecing with out any hardware? > References: <4766EEE585A6D311ADF500E018C154E30213398C@bnifex.cis.buc.com> Message-ID: <2C04D2F14FD8254386851063BC2B67065E08B4@STBEVS01.stb.sun.ac.za> Hi.. What about software fencing? Is it really nesasary to be hardware! Is there a difference between lutre/cfs, the product that sun uses, and gfs? I'm planning to do mostly numerical work with the cluster and thus I would like all the machines to be able to retrieve data, as if it was local on the machine. NFS is very limited in this regard because we intend on using vast arrays of matrices, that can be up to 1-2 Gig. I was hoping to implement GFS since all the machines are already setup, without the hardware fencing though. Kind Regards Lee He who has a why to live can bear with almost any how. Friedrich Nietzsche -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Bowie Bailey Sent: Mon 2006/05/01 05:42 PM To: linux clustering Subject: [Linux-cluster] RE: < fecing with out any hardware? > Pool Lee, Mr <14117614 at sun.ac.za> wrote: > I was wondering if it would be a good idea to fence without any type > of hardware, besides the pc's. > > At the moment I have about 6 machines that I want to have a gfs on > these 6 machines but due to budget constraints I cant afford to by > hardware. How is it possible to fence without the use of hardware, > besides manual fencing. > > These machines are your basic desktop pc's, each has a 80gig HD and a > 3Ghz P4 processor. They are all connected by a 1 Gigabit switch. > > Would if be possible to use GFS or is there any other variant. They > all run FC5( Fedora Core 5 ). I need some sort of GFS, because we > intend on setting up a mysql clustering system as well. > > Any ideas would be greatly appreciated. You need to have some sort of fencing. You can use manual fencing, but it doesn't work well with production systems. 
What happens is that any problem in the cluster causes everything to come to a dead stop and wait for you to fix the problem and then let the cluster know it's ok to continue operation. The only real option for a production system is some sort of hardware or software that allows for the cluster to fence misbehaving nodes on it's own. The cheapest is a power switch. -- Bowie -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3942 bytes Desc: not available URL: From Bowie_Bailey at BUC.com Mon May 1 20:15:43 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Mon, 1 May 2006 16:15:43 -0400 Subject: [Linux-cluster] RE: < fecing with out any hardware? > Message-ID: <4766EEE585A6D311ADF500E018C154E30268496F@bnifex.cis.buc.com> Pool Lee, Mr <14117614 at sun.ac.za> wrote: > > What about software fencing? Is it really nesasary to be hardware! > > Is there a difference between lutre/cfs, the product that sun uses, > and gfs? > > I'm planning to do mostly numerical work with the cluster and thus I > would like all the machines to be able to retrieve data, as if it > was local on the machine. NFS is very limited in this regard because > we intend on using vast arrays of matrices, that can be up to 1-2 > Gig. > > I was hoping to implement GFS since all the machines are already > setup, without the hardware fencing though. The thing with fencing is that you have to choose a method which is supported by your configuration. These are the basic ways to fence a cluster: Manual fencing - nothing special needed, but it doesn't work well in a production environment. Power fencing - Forcibly reboots a misbehaving node. Requires a compatible power switch. Network fencing - Blocks the misbehaving node's access to the cluster resources. Requires a compatible switch (usually used with fiber switches). Software fencing - Notifies storage management software to block access to the misbehaving node. Requires compatible storage configuration. I believe this is only supported with GNBD storage servers. Your choices are limited by your configuration. The only options that can be used with any configuration are manual and power. I don't know about the differences between the RedHat Clustering and lutre/cfs. I DO know that any type of clustering will require fencing of some sort. -- Bowie From fgp at phlo.org Mon May 1 20:30:45 2006 From: fgp at phlo.org (Florian G. Pflug) Date: Mon, 01 May 2006 22:30:45 +0200 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <1146147436.12841.13.camel@merlin.Mines.EDU> References: <44507BE8.20402@adelpha-lan.org> <1146144991.2984.127.camel@ayanami.boston.redhat.com> <4450CA79.9020400@adelpha-lan.org> <1146147436.12841.13.camel@merlin.Mines.EDU> Message-ID: <44566FF5.1000602@phlo.org> Matthew B. Brookover wrote: > I have not used this tool in a while, but it did work on my system. > > I would not trust this version to fence properly. Using system does not > allow the exit status of iptables to be checked for errors. System only > reports the status of the ssh command, not the command that is called on > the remote host. I believe ssh 'forwards' that exit-code of the remote-command - at least the version that comes with debian/sarge does. > ssh fgp at dev '/bin/false'; echo $? gives: 1 > ssh fgp at dev '/bin/true'; echo $? gives: 0 At least on my machine.. 
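So as long as the agent checks $? after the ssh call, it really is checking the result
of the remote iptables command (ssh itself returns 255 if the connection fails, which is
also non-zero, so either way a failure is visible). A rough, untested sketch of that
pattern - the storage host name, node IP convention and firewall rule are all invented:

    #!/bin/bash
    # minimal fencing-by-firewall sketch: block a failed node's access to the
    # storage server by inserting an iptables DROP rule there over ssh
    NODE_IP="$1"               # address of the node to fence (placeholder convention)
    STORAGE="storage1"         # gnbd/iscsi server (placeholder)

    ssh root@"$STORAGE" "iptables -I INPUT -s $NODE_IP -j DROP"
    rc=$?                      # exit status of the remote iptables (255 = ssh itself failed)
    if [ "$rc" -ne 0 ]; then
        echo "fence of $NODE_IP failed, iptables/ssh returned $rc" >&2
        exit 1
    fi
    exit 0
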
greetings, Florian Pflug From mbrookov at mines.edu Mon May 1 20:45:30 2006 From: mbrookov at mines.edu (Matthew B. Brookover) Date: Mon, 01 May 2006 14:45:30 -0600 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <44566FF5.1000602@phlo.org> References: <44507BE8.20402@adelpha-lan.org> <1146144991.2984.127.camel@ayanami.boston.redhat.com> <4450CA79.9020400@adelpha-lan.org> <1146147436.12841.13.camel@merlin.Mines.EDU> <44566FF5.1000602@phlo.org> Message-ID: <1146516331.16843.9.camel@merlin.Mines.EDU> Hmm, nice to know that. I must have been thinking of rsh or something else. I would encourage you to test carefully. Matt On Mon, 2006-05-01 at 22:30 +0200, Florian G. Pflug wrote: > Matthew B. Brookover wrote: > > I have not used this tool in a while, but it did work on my system. > > > > I would not trust this version to fence properly. Using system does not > > allow the exit status of iptables to be checked for errors. System only > > reports the status of the ssh command, not the command that is called on > > the remote host. > I believe ssh 'forwards' that exit-code of the remote-command - at > least the version that comes with debian/sarge does. > > > ssh fgp at dev '/bin/false'; echo $? > gives: > 1 > > > ssh fgp at dev '/bin/true'; echo $? > gives: > 0 > > At least on my machine.. > > greetings, Florian Pflug -------------- next part -------------- An HTML attachment was scrubbed... URL: From placid at adelpha-lan.org Mon May 1 20:47:52 2006 From: placid at adelpha-lan.org (Castang Jerome) Date: Mon, 01 May 2006 22:47:52 +0200 Subject: [Linux-cluster] iSCSI fence agent In-Reply-To: <1146516331.16843.9.camel@merlin.Mines.EDU> References: <44507BE8.20402@adelpha-lan.org> <1146144991.2984.127.camel@ayanami.boston.redhat.com> <4450CA79.9020400@adelpha-lan.org> <1146147436.12841.13.camel@merlin.Mines.EDU> <44566FF5.1000602@phlo.org> <1146516331.16843.9.camel@merlin.Mines.EDU> Message-ID: <445673F8.1060507@adelpha-lan.org> Matthew B. Brookover a ?crit : > Hmm, nice to know that. I must have been thinking of rsh or something > else. I would encourage you to test carefully. > > Matt It seems to work.. -- Jerome CASTANG Tel: 06.85.74.33.02 mail: jerome.castang at adelpha-lan.org --------------------------------------------- RTFM ! From vcmarti at sph.emory.edu Mon May 1 21:03:49 2006 From: vcmarti at sph.emory.edu (Vernard Martin) Date: Mon, 01 May 2006 17:03:49 -0400 Subject: [Linux-cluster] Cluster Suite In-Reply-To: <445633DE.3010808@redhat.com> References: <43E0D124.4090004@burlingtontelecom.com> <445633DE.3010808@redhat.com> Message-ID: <445677B5.5070708@sph.emory.edu> Chris Feist wrote: > Where are you getting the Cluster Suite & GFS Rpms for? The latest > versions are built against the latest (2.6.9-34) kernel. You should > be able to find them on RHN. I was trying to install RHCS & GFS as well but could only find the RHEL3 version on RHN. Am I just looking in the wrong spot? I found the SRPMs on the redhat site at ftp://ftp.redhat.com/pub/redhat/linux/enterprise/4/en/RHCS/x86_64/SRPMS which apparently was last built again the 2.6.9-11 kernle as that is what it was looking for in the .spec files for cman-kernel and dlm-kernel. am I looking in the wrong spot? If so, where is the correct spot? 
-- Vernard Martin (vcmarti at sph.emory.edu) Applications Developer/Analyst Information Services -- School of Public Health -- Emory University From cfeist at redhat.com Mon May 1 22:12:51 2006 From: cfeist at redhat.com (Chris Feist) Date: Mon, 01 May 2006 17:12:51 -0500 Subject: [Linux-cluster] Cluster Suite In-Reply-To: <445677B5.5070708@sph.emory.edu> References: <43E0D124.4090004@burlingtontelecom.com> <445633DE.3010808@redhat.com> <445677B5.5070708@sph.emory.edu> Message-ID: <445687E3.7030906@redhat.com> If you pay for Cluster Suite Entitlements for RHEL4, there should be a Cluster Suite channel under the main RHEL4 channel. Otherwise you can download the upgraded SRPMS here: ftp://ftp.redhat.com/pub/redhat/linux/updates/enterprise/4AS/en/RHGFS/SRPMS/ Thanks, Chris Vernard Martin wrote: > Chris Feist wrote: >> Where are you getting the Cluster Suite & GFS Rpms for? The latest >> versions are built against the latest (2.6.9-34) kernel. You should >> be able to find them on RHN. > I was trying to install RHCS & GFS as well but could only find the RHEL3 > version on RHN. Am I just looking in the wrong spot? > > I found the SRPMs on the redhat site at > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/4/en/RHCS/x86_64/SRPMS > which apparently was last built again the > 2.6.9-11 kernle as that is what it was looking for in the .spec files > for cman-kernel and dlm-kernel. > > am I looking in the wrong spot? If so, where is the correct spot? > > > From prolay123 at yahoo.com Tue May 2 08:29:17 2006 From: prolay123 at yahoo.com (prolay chatterjee) Date: Tue, 2 May 2006 01:29:17 -0700 (PDT) Subject: [Linux-cluster] Slow data writing rate Message-ID: <20060502082917.3246.qmail@web60711.mail.yahoo.com> Hi, I am administrating one site having IBM X255 server with IBM FASt T500 storage with fiber optic link using Qlogic 2300 controler.The site was set up few years back with RHEL AS 2.1.Now it is observed that when cluster service is up writing data in a cluster partition is very slow (in KBs) as it was expected to be at least in terms of 100MBs.The Qlogic speed is 2 GBPS.It also found that whenever cluster service is st oped and cluster partition mounted manually with general mount command data writing speed is nearly 1GBPS.Please suggest the solution of this problem. Regards, Prolay Chatterjee --------------------------------- Love cheap thrills? Enjoy PC-to-Phone calls to 30+ countries for just 2?/min with Yahoo! Messenger with Voice. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.vit at exadron.com Tue May 2 11:50:37 2006 From: m.vit at exadron.com (Vit Matteo) Date: Tue, 02 May 2006 13:50:37 +0200 Subject: [Linux-cluster] problem with fencing Message-ID: <4457478D.1050007@exadron.com> Hi, I've a system with 4 nodes and I can sucessfully mount a gfs partition. But if I shutdown one node, I can't access to the gfs partition (If I try to write something, the program that access to the gfs partition hangs). I find in the logs Apr 28 13:37:40 c0-21 fenced[4863]: fencing node "c0-28" Apr 28 13:37:40 c0-21 fenced[4863]: fence "c0-28" failed where c0-28 is the node powered off. If I power on c0-28, then I can access the gfs partition. Is it correct ? Someone with the same problem ? I use cluster-1.01.00 built from source with a 2.6.14 kernel. Every node has one vote. The fence agent is fence_manual. 
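My guess so far: GFS has to block until the failed node has actually been fenced, and
with fence_manual that means waiting for a manual acknowledgement rather than the
cluster fencing on its own, so the hang until c0-28 returns may simply be expected
behaviour. If I read the docs right, the acknowledgement is something along the lines of:

    fence_ack_manual -n c0-28

(syntax from memory - fenced's syslog message should show the exact command it expects).
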
Matteo Vit From vlaurenz at advance.net Tue May 2 23:35:14 2006 From: vlaurenz at advance.net (Vito Laurenza) Date: Tue, 02 May 2006 19:35:14 -0400 Subject: [Linux-cluster] RHEL4 and Cluster Suite Message-ID: <4457ECB2.7020705@advance.net> Hello all, I'm new to Cluster Suite and I was wondering if there was a tutorial of some kind regarding the cluster.conf file. I've read the Red Hat docs and they suggest using the GUI to configure, but I'm running strictly command line here and need to know how to properly write the XML. I've only come across a couple of samples and was hoping someone could give give me (or point me to) a complete run down of valid tags and attributes. Any help would be appreciated. :::: Vito Laurenza From gforte at leopard.us.udel.edu Tue May 2 23:46:43 2006 From: gforte at leopard.us.udel.edu (Greg Forte) Date: Tue, 02 May 2006 19:46:43 -0400 Subject: [Linux-cluster] RHEL4 and Cluster Suite In-Reply-To: <4457ECB2.7020705@advance.net> References: <4457ECB2.7020705@advance.net> Message-ID: <4457EF63.4050601@leopard.us.udel.edu> Agreed! I asked about this months ago, don't think I ever got a straight answer. 'course I suppose technically we could go wade through the code that reads the file to figure it out ourselves ... I'd rather see a document, though. Maybe if I get unlazy I'll go do just that and write one. Unless someone's got one handy ... Vito, I wouldn't personally recommend the gui, anyway; I don't find it to be very robust, and you'll be better off learning to do it by hand in the long run. -g p.s. just to pick a nit, you can be "strictly command line" on a box and still run the gui tools remotely from another machine; you just need to have the X11, etc. packages installed but set the default run level to 3 in /etc/inittab. Vito Laurenza wrote: > Hello all, > I'm new to Cluster Suite and I was wondering if there was a tutorial of > some kind regarding the cluster.conf file. I've read the Red Hat docs > and they suggest using the GUI to configure, but I'm running strictly > command line here and need to know how to properly write the XML. I've > only come across a couple of samples and was hoping someone could give > give me (or point me to) a complete run down of valid tags and > attributes. Any help would be appreciated. > > :::: Vito Laurenza > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From vlaurenz at advance.net Wed May 3 00:08:54 2006 From: vlaurenz at advance.net (Vito Laurenza) Date: Tue, 02 May 2006 20:08:54 -0400 Subject: [Linux-cluster] RHEL4 and Cluster Suite In-Reply-To: <4457EF63.4050601@leopard.us.udel.edu> References: <4457ECB2.7020705@advance.net> <4457EF63.4050601@leopard.us.udel.edu> Message-ID: <4457F496.80203@advance.net> Greg, I'm glad I'm not the only one who can't find the info. What I meant by "strictly command line" is that I have no desire to use the GUI. :) Let me know if you find anything. I'll keep you posted as well. :::: Vito Laurenza Greg Forte wrote: > Agreed! I asked about this months ago, don't think I ever got a > straight answer. 'course I suppose technically we could go wade through > the code that reads the file to figure it out ourselves ... I'd rather > see a document, though. Maybe if I get unlazy I'll go do just that and > write one. Unless someone's got one handy ... > > Vito, I wouldn't personally recommend the gui, anyway; I don't find it > to be very robust, and you'll be better off learning to do it by hand in > the long run. 
> > -g > > p.s. just to pick a nit, you can be "strictly command line" on a box and > still run the gui tools remotely from another machine; you just need to > have the X11, etc. packages installed but set the default run level to 3 > in /etc/inittab. > > Vito Laurenza wrote: >> Hello all, >> I'm new to Cluster Suite and I was wondering if there was a tutorial >> of some kind regarding the cluster.conf file. I've read the Red Hat >> docs and they suggest using the GUI to configure, but I'm running >> strictly command line here and need to know how to properly write the >> XML. I've only come across a couple of samples and was hoping someone >> could give give me (or point me to) a complete run down of valid tags >> and attributes. Any help would be appreciated. >> >> :::: Vito Laurenza >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From Alain.Moulle at bull.net Wed May 3 06:37:03 2006 From: Alain.Moulle at bull.net (Alain Moulle) Date: Wed, 03 May 2006 08:37:03 +0200 Subject: [Linux-cluster] CS4 Update 2 / fencing and dump Message-ID: <44584F8F.3020408@bull.net> Hi I'm facing a big problem using the CS4 with fence_ipmilan : when a node is "crashing", the other node is fencing it poweroff/poweron whereas the node was entering a dump process ... the reason is that with many systems, when a machine dumps, the state is still "RUNNING", and there is never a state "DUMPING". So with this fencing method, I would never have a dump to analyse a problem. Did someone face this problem and has an idea or a workaround ? Thanks Alain Moull? From christoph.thommen at bl.ch Wed May 3 06:54:46 2006 From: christoph.thommen at bl.ch (Thommen, Christoph FKD) Date: Wed, 3 May 2006 08:54:46 +0200 Subject: [Linux-cluster] which fence device? Message-ID: <553B0E9C0C87D24A876E6B14FFE373D6BFD753@faimbx01.bl.ch> Hi, I'm looking for a power fencing switch for my 3-4 cluster nodes... which switch do you use, can someone recommend one to me? Thanks for your response Greets Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlopmart at gmail.com Wed May 3 10:32:10 2006 From: carlopmart at gmail.com (carlopmart) Date: Wed, 03 May 2006 12:32:10 +0200 Subject: [Linux-cluster] Recommended HP servers for cluster suite In-Reply-To: <445123AE.4000204@redhat.com> References: <4450F472.8050205@gmail.com> <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de> <445123AE.4000204@redhat.com> Message-ID: <445886AA.5000608@gmail.com> And what about Porliant DL 380?? Thanks James Parsons wrote: > rainer at ultra-secure.de wrote: > >> Quoting carlopmart : >> >>> Hi all, >>> >>> Somebody can recommends me some HP servers to use with Redhat >>> Cluster Suite for RHEL 4?? My requeriments are: >>> >>> - 4GB RAM >>> - Scsi disks >>> - Two CPUs >>> - iLO support for RHCS fence agent. >>> >>> I don't need shred storage. >> >> >> Blades. >> bl20p can be had very cheap nowadays, but should be enough for most >> tasks. >> Downside: only two internal disks, the rest is via SAN (or iSCSI). > > I want to add a vote for the proliant bl* series. It uses iLO...not the > older Riloe cards, which have been problematic now and then. 
> > -J > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- CL Martinez carlopmart {at} gmail {d0t} com From sanelson at gmail.com Wed May 3 10:45:46 2006 From: sanelson at gmail.com (Steve Nelson) Date: Wed, 3 May 2006 11:45:46 +0100 Subject: [Linux-cluster] Recommended HP servers for cluster suite In-Reply-To: <445886AA.5000608@gmail.com> References: <4450F472.8050205@gmail.com> <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de> <445123AE.4000204@redhat.com> <445886AA.5000608@gmail.com> Message-ID: On 5/3/06, carlopmart wrote: > And what about Porliant DL 380?? I use dl380s for low-end, dl580s and now 585s for upper-end clusters. Very very happy with them. S. From Timothy.Lin at noaa.gov Wed May 3 10:50:41 2006 From: Timothy.Lin at noaa.gov (Timothy Lin) Date: Wed, 03 May 2006 06:50:41 -0400 Subject: [Linux-cluster] Recommended HP servers for cluster suite In-Reply-To: <445886AA.5000608@gmail.com> References: <4450F472.8050205@gmail.com> <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de> <445123AE.4000204@redhat.com> <445886AA.5000608@gmail.com> Message-ID: <44588B01.6000302@noaa.gov> Blades are a lot more expensive than comparable 1U/2U boxes. Great if you are running out of space. another good thing is, with proper SAN planning and boot-from san setup, you can do drop-in replacement when a blade fails. we have good experience putting clustersuite on BL35p ( the half height blades ) , but GFS is another story. (Might have something to do with MSA1500 we have, Redhat and HP are blaming each other on that issue) iLO is pretty nice, but I think VMware consoles are easier to use :) Now if i can just figure out how to make GFS work properly in ESX server .... Tim. >>> >>> >>> Blades. >>> bl20p can be had very cheap nowadays, but should be enough for most >>> tasks. >>> Downside: only two internal disks, the rest is via SAN (or iSCSI). >> >> >> I want to add a vote for the proliant bl* series. It uses iLO...not >> the older Riloe cards, which have been problematic now and then. >> >> -J >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > From cosimo at streppone.it Wed May 3 10:53:03 2006 From: cosimo at streppone.it (Cosimo Streppone) Date: Wed, 03 May 2006 12:53:03 +0200 Subject: [Linux-cluster] Recommended HP servers for cluster suite In-Reply-To: References: <4450F472.8050205@gmail.com> <20060427215714.3rwrvufugw0gckgw@www.ultra-secure.de> <445123AE.4000204@redhat.com> <445886AA.5000608@gmail.com> Message-ID: <44588B8F.1090609@streppone.it> Steve Nelson wrote: > On 5/3/06, carlopmart wrote: > >> And what about Porliant DL 380?? > > I use dl380s for low-end, dl580s and now 585s for upper-end clusters. > Very very happy with them. Do you use iLO port for fencing? Please can you explain your iLO configuration? I have some doubts on how to configure fencing. Example: you have nodes A,B and iLO devices Ai, Bi Fencing device for node A should be Ai or Bi? I had also troubles on installation of additional perl modules required to make fence_ilo agent work: despite having IO::Socket::SSL and Net::SSL::something correctly installed, it keeps throwing error messages and I can't seem to correctly startup fenced. 
-- Cosimo From kanderso at redhat.com Wed May 3 15:56:58 2006 From: kanderso at redhat.com (Kevin Anderson) Date: Wed, 03 May 2006 10:56:58 -0500 Subject: [Linux-cluster] Red Hat Summit - Cluster and Storage Talks Message-ID: <1146671819.2876.56.camel@dhcp80-204.msp.redhat.com> Hi all, Caution - this is a shameless plug for some of the cluster developers. First, I would like to apologize but we have been too focused on getting the new cluster infrastructure integrated, and pushing GFS2/DLM upstream, that we have not organized a cluster summit for this year. However, some of the key architects of the cluster components are going to be speaking at this years Red Hat Summit ( http://www.redhat.com/promo/summit/ ) at the end of the month of May in Nashville. Dave Teigland, Steven Whitehouse, Steven Dake and Jim Parsons are all on the schedule to speak. * Dave Teigland will be covering the evolution and exposure of the cluster components APIs including DLM, CMAN and CCS. * Steve Whitehouse will describe the changes between GFS and GFS2, reasons behind the changes and share some details about the new layout. * Steven Dake is going to cover the openais project, the integration of totem protocol into the core cluster infrastructure and cover the new high availability APIs that openais provides and some direction on where it is heading. * Jim Parsons will describe the Conga project, which is going to provide the new management interfaces and infrastructure to make cluster and storage administration much simpler. All of these presentations are currently scheduled for May 31, the first day of the Red Hat Summit, and will include some Q&A time. We have not set up any formal cluster group discussions, but if people were to have an interest, I would imagine that there would be ample opportunities to find a local establishment where all of these guys would be to have an informal get together of cluster developers. It is not often that all of these guys are in the same country at the same time, so hopefully we can take advantage of it. So, check out the web site for the Red Hat Summit, we are under the Cluster and Storage track. If you sign up (sorry, but there is a fee to attend), either let me know or respond to the linux-cluster mailing list. If enough cluster developers are interested, we can be more specific about where we will be hanging out, rather than just strolling all over Nashville. Thanks and hope to see you in Nashville. Kevin Anderson Director, Cluster and Storage Development Red Hat kanderso at redhat.com Red Hat Summit http://www.redhat.com/promo/summit/ Cluster Track http://www.redhat.com/promo/summit/tracks/#cluster From m_list at eshine.de Wed May 3 15:57:20 2006 From: m_list at eshine.de (Arnd) Date: Wed, 03 May 2006 17:57:20 +0200 Subject: [Linux-cluster] RE: < fecing with out any hardware? > In-Reply-To: <4766EEE585A6D311ADF500E018C154E30268496F@bnifex.cis.buc.com> References: <4766EEE585A6D311ADF500E018C154E30268496F@bnifex.cis.buc.com> Message-ID: <4458D2E0.3080802@eshine.de> Bowie Bailey wrote: > Pool Lee, Mr <14117614 at sun.ac.za> wrote: >> What about software fencing? Is it really nesasary to be hardware! > > Your choices are limited by your configuration. The only options that > can be used with any configuration are manual and power. I was testing a few possibilities of fencing. GFS expects from the fencing script the status "0" to decide if it was successfull. So you can specify any script by your own in the cluster.conf. 
(Please correct me, if I'm wrong) This script can be an automatic login to the failed server (ssh, rlogin, serial console) which can execute any remote operation (for example unload the module of the SAN-device) or causing an kernel panic (which is the fencing-method in ocfs2 ;-) ). Your fencing-script must assure that the failed host doesn't have access to the filesystem anymore! Arnd From Bowie_Bailey at BUC.com Wed May 3 16:08:19 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Wed, 3 May 2006 12:08:19 -0400 Subject: [Linux-cluster] RE: < fecing with out any hardware? > Message-ID: <4766EEE585A6D311ADF500E018C154E302684987@bnifex.cis.buc.com> Arnd wrote: > Bowie Bailey wrote: > > Pool Lee, Mr <14117614 at sun.ac.za> wrote: > > > What about software fencing? Is it really nesasary to be hardware! > > > > Your choices are limited by your configuration. The only options > > that can be used with any configuration are manual and power. > > I was testing a few possibilities of fencing. GFS expects from the > fencing script the status "0" to decide if it was successfull. So you > can specify any script by your own in the cluster.conf. (Please > correct me, if I'm wrong) > > This script can be an automatic login to the failed server (ssh, > rlogin, serial console) which can execute any remote operation (for > example unload the module of the SAN-device) or causing an kernel > panic (which is the fencing-method in ocfs2 ;-) ). > > Your fencing-script must assure that the failed host doesn't have > access to the filesystem anymore! I'm not an expert on the topic. I just use the built-in stuff. But my understanding is that you can write your own fence script without too much trouble. You just have to be careful and make sure that it is bulletproof. If your script relies on an ssh connection to the failed server and the failed server is not responding to ssh, then the fencing fails and the entire cluster must stop and wait for manual intervention. -- Bowie From vlaurenz at advance.net Wed May 3 21:16:39 2006 From: vlaurenz at advance.net (Vito Laurenza) Date: Wed, 03 May 2006 17:16:39 -0400 Subject: [Linux-cluster] RHEL4 and Cluster Suite In-Reply-To: <4457ECB2.7020705@advance.net> References: <4457ECB2.7020705@advance.net> Message-ID: <44591DB7.9070706@advance.net> ...Also... How do I configure Cluster Suite to notify (via email) on Heartbeat events, fence events, etc? Thanks! :::: Vito Laurenza Vito Laurenza wrote: > Hello all, > I'm new to Cluster Suite and I was wondering if there was a tutorial of > some kind regarding the cluster.conf file. I've read the Red Hat docs > and they suggest using the GUI to configure, but I'm running strictly > command line here and need to know how to properly write the XML. I've > only come across a couple of samples and was hoping someone could give > give me (or point me to) a complete run down of valid tags and > attributes. Any help would be appreciated. 
> > :::: Vito Laurenza > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From herta.vandeneynde at cc.kuleuven.be Thu May 4 07:25:57 2006 From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde) Date: Thu, 04 May 2006 09:25:57 +0200 Subject: [Linux-cluster] umount failed - device is busy In-Reply-To: <434C0FA7.9000803@cc.kuleuven.be> References: <434A7ADE.108@cc.kuleuven.be> <434A8FE6.40508@cc.kuleuven.be> <1128963722.4680.21.camel@ayanami.boston.redhat.com> <434AC9DE.50606@cc.kuleuven.be> <1128978146.4680.37.camel@ayanami.boston.redhat.com> <434ADB8C.9010508@cc.kuleuven.be> <1129043197.4680.85.camel@ayanami.boston.redhat.com> <434BDECD.2060303@cc.kuleuven.be> <1129054711.4680.119.camel@ayanami.boston.redhat.com> <434C0FA7.9000803@cc.kuleuven.be> Message-ID: <4459AC85.7020308@cc.kuleuven.be> Herta Van den Eynde wrote: > Lon Hohberger wrote: > >> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote: >> >> >>> Bit of extra information: the system that was running the services >>> got STONITHed by the other cluster member shortly before midnight. >>> The services all failed over nicely, but the situation remains: if I >>> try to stop or relocate a service, I get a "device is busy". >>> I suppose that rules out an intermittent issue. >>> >>> There's no mounts below mounts. >> >> >> >> Drat. >> >> Nfsd is the most likely candidate for holding the reference. >> Unfortunately, this is not something I can track down; you will have to >> either file a support request and/or a Bugzilla. When you get a chance, >> you should definitely try stopping nfsd and seeing if that clears the >> mystery references (allowing you to unmount). If the problem comes from >> nfsd, it should not be terribly difficult to track down. >> >> Also, you should not need to recompile your kernel to probe all the LUNs >> per device; just edit /etc/modules.conf: >> >> options scsi_mod max_scsi_luns=128 >> >> ... then run mkinitrd to rebuild the initrd image. >> >> -- Lon > > Next maintenance window is 4 weeks away, so I won't be able to test the > nfsd hypothesis anytime soon. In the meantime, I'll file a support > request. I'll keep you posted. > > At least the unexpected STONITH confirms that the failover still works. > > The /etc/modules.conf tip is a big time saver. Rebuilding the modules > takes forever. > > Thanks, Lon. > > Herta Apologies for not updating this sooner. (Thanks for remindeing me, Owen.) During a later maintenance window, I shut down the cluster services, but it wasn't until I stopped the nfsd, that the filesystems could actually be unmounted, which seems to confirm Lon's theory about nfsd being the likely candidate for holding the reference. I found a note elsewhere on the web where someone worked around the problem by stopping nfsd, stopping the service, restarting nfsd, and relocating the service. Disadvantage being that all nfs services experience a minor interrupt at the time. Anyway, my problem disappeared during the latest maintenance window. Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL -> nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so I'm not 100% sure which of the two fixed it, and curious though I am, I simply don't have the time to start reading the code. If anyone has further insights, I'd love to read about it, though. 
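For anyone who runs into the same thing: fuser/lsof only show userspace processes, so
references held inside the kernel by nfsd never show up there - which is why stopping
nfsd is the real test. The sequence that worked here looked roughly like this (the
mount point is a placeholder, and stopping nfs briefly interrupts all NFS clients):

    fuser -vm /mnt/shared     # userspace holders only; knfsd will not be listed
    service nfs stop          # releases the kernel nfsd references
    umount /mnt/shared        # should now succeed
    service nfs start
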
Kind regards, Herta Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From mark at wormgoor.com Thu May 4 18:16:29 2006 From: mark at wormgoor.com (Mark Wormgoor) Date: Thu, 04 May 2006 20:16:29 +0200 Subject: [Linux-cluster] Sharing disk using gnbd Message-ID: <445A44FD.4040604@wormgoor.com> Hi, I have a small network with 3 machines. All machines are FC5 as of yesterday. One machine is a server and has most of my storage. I'm currently sharing the disks using NFS, but am researching better ways of sharing my disks. My main reason for doing this is that I would like Posix semantics, but better performance over NFS would be a nice benefit. I think my main options are gnbd (GFS), iscsi and ata-over-ethernet. Since GFS is best supported in Fedora, that was my first attempt. However, when going through the docs, I noticed that I could not mount the disk on the server itsself. 1. If you use GFS on the disk and mount it like that on the server, you have to share it using gnbd with nocache, which is a huge performance hit. 2. According to the gnbd docs, you should never import the disks on the machine they are exported on, so that's out as well. Can this be true? Is gnbd unusable if you want to use the disk on the server? On the other hand, GFS is a bit overkill, since I don't need the clustering; I just want to share my disk. However, for aoe and iscsi, I think there is no way of sharing the file system between multiple systems, which would make them unusable. Besides, I could not find rpms for aoe, and for iscsi I could only find the server rpm, not the client. Is there a solution? Am I stuck with NFS? Kind regards, Mark From saju8 at rediffmail.com Thu May 4 19:06:51 2006 From: saju8 at rediffmail.com (saju john) Date: 4 May 2006 19:06:51 -0000 Subject: [Linux-cluster] Centralized Cron Message-ID: <20060504190651.30345.qmail@webmail10.rediffmail.com> Dear All, Is there any way to make a centalized cron while using Redhat HA cluster with Sahred storage. I mean to put the crontab entry for a particular user on shared storage, so that when the cluster shifts, on the other node cron should read from the cron file in shared storage. This setup has the advantage that we don't need to manullay update the cron entry in both nodes. I tried two ways , but not success a) Make a soft link from /var/spool/cron/ to /path/to/shared/storage/. This will work as long as I didn't make any changes to existing crontab. Once I make changes to crontab, the link is removed and file is created at /var/spool/cron/ b) Soft link the cron directory in /var/spool to /path/to/shared/storage/cron. This is working till the cluster shift. The cron is getting dead when the cluster shifts as it lose the /var/spool/cron link's destination driectory which will be mapped to the other node Thanks in advance Saju John -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bowie_Bailey at BUC.com Thu May 4 19:31:49 2006 From: Bowie_Bailey at BUC.com (Bowie Bailey) Date: Thu, 4 May 2006 15:31:49 -0400 Subject: [Linux-cluster] Sharing disk using gnbd Message-ID: <4766EEE585A6D311ADF500E018C154E3026849A1@bnifex.cis.buc.com> Mark Wormgoor wrote: > Hi, > > I have a small network with 3 machines. All machines are FC5 as of > yesterday. One machine is a server and has most of my storage. I'm > currently sharing the disks using NFS, but am researching better ways > of sharing my disks. 
My main reason for doing this is that I would > like Posix semantics, but better performance over NFS would be a nice > benefit. > > I think my main options are gnbd (GFS), iscsi and ata-over-ethernet. > Since GFS is best supported in Fedora, that was my first attempt. > However, when going through the docs, I noticed that I could not mount > the disk on the server itsself. > 1. If you use GFS on the disk and mount it like that on the server, > you have to share it using gnbd with nocache, which is a huge > performance hit. > 2. According to the gnbd docs, you should never import the disks on > the machine they are exported on, so that's out as well. > Can this be true? Is gnbd unusable if you want to use the disk on the > server? On the other hand, GFS is a bit overkill, since I don't need > the clustering; I just want to share my disk. > > However, for aoe and iscsi, I think there is no way of sharing the > file system between multiple systems, which would make them unusable. > Besides, I could not find rpms for aoe, and for iscsi I could only > find the server rpm, not the client. You DO need the clustering. That is what you are doing with GNBD/iSCSI/AoE. You are allowing multiple computers to read/write directly to the storage media. This requires GFS and a cluster to manage access and prevent the storage from becoming corrupted. With NFS, the clients only access the storage through the NFS server, so do not need this. AoE and iSCSI can be natively shared with as many computers as you can connect up to the storage network. I don't know where you can find the iSCSI drivers, but for AoE, you can get them from http://www.coraid.com/support/linux/. It's not an rpm, but a small, easily compiled tarball. There may be an rpm version somewhere, but I've always just compiled it myself. I can't comment on the limitations of GNBD. I've never used it myself, so I'm not sure. -- Bowie From eric at bootseg.com Thu May 4 19:43:36 2006 From: eric at bootseg.com (Eric Kerin) Date: Thu, 04 May 2006 15:43:36 -0400 Subject: [Linux-cluster] Centralized Cron In-Reply-To: <20060504190651.30345.qmail@webmail10.rediffmail.com> References: <20060504190651.30345.qmail@webmail10.rediffmail.com> Message-ID: <1146771816.3407.24.camel@auh5-0479.corp.jabil.org> On Thu, 2006-05-04 at 19:06 +0000, saju john wrote: > > > Dear All, > > Is there any way to make a centalized cron while using Redhat HA > cluster with Sahred storage. I mean to put the crontab entry for a > particular user on shared storage, so that when the cluster shifts, on > the other node cron should read from the cron file in shared storage. > > This setup has the advantage that we don't need to manullay update the > cron entry in both nodes. > > I tried two ways , but not success > What I'm currently doing is creating wrapper scripts that check to see if the clustered filesystem is mounted, then if it does, execute the job. This script is then placed in crontab. The downside is that I have to update the crontab on all cluster nodes, as well as copy the wrapper script to each node. I've been toying with the idea of making an rgmanager aware cron, but haven't worked out enough details of how it would work to write something up for comments. 
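A minimal sketch of that wrapper idea; the mount point, job path and schedule below are placeholders, not anything from Eric's actual setup:

#!/bin/sh
# run-on-active-node.sh -- only run the given job on the node that
# currently has the clustered filesystem mounted.
MOUNTPOINT=/mnt/shared
if grep -q " $MOUNTPOINT " /proc/mounts; then
    exec "$@"
fi
# otherwise exit quietly; the node that owns the filesystem will run it

# example crontab entry, installed identically on every node:
# 0 2 * * * /usr/local/bin/run-on-active-node.sh /mnt/shared/scripts/nightly-job.sh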
Thanks, Eric Kerin eric at bootseg.com From herta.vandeneynde at cc.kuleuven.be Thu May 4 23:25:59 2006 From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde) Date: Fri, 05 May 2006 01:25:59 +0200 Subject: [Linux-cluster] umount failed - device is busy In-Reply-To: <4459AC85.7020308@cc.kuleuven.be> References: <434A7ADE.108@cc.kuleuven.be> <434A8FE6.40508@cc.kuleuven.be> <1128963722.4680.21.camel@ayanami.boston.redhat.com> <434AC9DE.50606@cc.kuleuven.be> <1128978146.4680.37.camel@ayanami.boston.redhat.com> <434ADB8C.9010508@cc.kuleuven.be> <1129043197.4680.85.camel@ayanami.boston.redhat.com> <434BDECD.2060303@cc.kuleuven.be> <1129054711.4680.119.camel@ayanami.boston.redhat.com> <434C0FA7.9000803@cc.kuleuven.be> <4459AC85.7020308@cc.kuleuven.be> Message-ID: <445A8D87.5030900@cc.kuleuven.be> Herta Van den Eynde wrote: > Herta Van den Eynde wrote: > >> Lon Hohberger wrote: >> >>> On Tue, 2005-10-11 at 17:48 +0200, Herta Van den Eynde wrote: >>> >>> >>>> Bit of extra information: the system that was running the services >>>> got STONITHed by the other cluster member shortly before midnight. >>>> The services all failed over nicely, but the situation remains: if >>>> I try to stop or relocate a service, I get a "device is busy". >>>> I suppose that rules out an intermittent issue. >>>> >>>> There's no mounts below mounts. >>> >>> >>> >>> >>> Drat. >>> >>> Nfsd is the most likely candidate for holding the reference. >>> Unfortunately, this is not something I can track down; you will have to >>> either file a support request and/or a Bugzilla. When you get a chance, >>> you should definitely try stopping nfsd and seeing if that clears the >>> mystery references (allowing you to unmount). If the problem comes from >>> nfsd, it should not be terribly difficult to track down. >>> >>> Also, you should not need to recompile your kernel to probe all the LUNs >>> per device; just edit /etc/modules.conf: >>> >>> options scsi_mod max_scsi_luns=128 >>> >>> ... then run mkinitrd to rebuild the initrd image. >>> >>> -- Lon >> >> >> Next maintenance window is 4 weeks away, so I won't be able to test >> the nfsd hypothesis anytime soon. In the meantime, I'll file a >> support request. I'll keep you posted. >> >> At least the unexpected STONITH confirms that the failover still works. >> >> The /etc/modules.conf tip is a big time saver. Rebuilding the modules >> takes forever. >> >> Thanks, Lon. >> >> Herta > > > Apologies for not updating this sooner. (Thanks for remindeing me, Owen.) > > During a later maintenance window, I shut down the cluster services, but > it wasn't until I stopped the nfsd, that the filesystems could actually > be unmounted, which seems to confirm Lon's theory about nfsd being the > likely candidate for holding the reference. > > I found a note elsewhere on the web where someone worked around the > problem by stopping nfsd, stopping the service, restarting nfsd, and > relocating the service. Disadvantage being that all nfs services > experience a minor interrupt at the time. > > Anyway, my problem disappeared during the latest maintenance window. > Both nfs-utils and clumanager were updated (nfs-utils-1.0.6-42EL -> > nfs-utils-1.0.6-43EL, clumanager-1.2.28-1 -> clumanager-1.2.31-1), so > I'm not 100% sure which of the two fixed it, and curious though I am, I > simply don't have the time to start reading the code. If anyone has > further insights, I'd love to read about it, though. 
> > Kind regards, > > Herta Someone reported off line that they are experiencing the same problem while running the same versions we currently are. So just for completeness sake: expecting problems, I also upped the clumanager log levels during the last maintenance window. They are now at: clumembd loglevel="6" cluquorumd loglevel="6" clurmtabd loglevel="7" clusvcmgrd loglevel="6" clulockd loglevel="6" Come to think of it, I probably loosened the log levels during the maintenance window when our problems began (I wanted to reduce the size of the logs). Not sure how - or even if - this might affect things, though. Kind regards, Herta Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm From uli.schroeder at gmx.net Fri May 5 06:25:25 2006 From: uli.schroeder at gmx.net (Uli Schroeder) Date: Fri, 5 May 2006 08:25:25 +0200 (MEST) Subject: [Linux-cluster] GFS: assertion "x <= length" failed Message-ID: <8991.1146810325@www070.gmx.net> Hi everyone, I encounter the following problem with GFS unter RHEL4. Anyone familiar with this one. Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: fatal: assertion "x <= length" failed Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: function = blkalloc_internal Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: file = /usr/src/build/574066-ia64/BUILD/gfs-kernel-2.6.9-35/src/gfs/rgrp.c, line = 1450 Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: time = 1146227066 Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: about to withdraw from the cluster Apr 28 14:24:26 tstserver kernel: GFS: fsid=clmdb1:gfs05_0.1: waiting for outstanding I/O What can I do against it? Anytime the error occurs all I get when I try to access a directory on that volume is an "Input/Output error". The failure occurs regularly and can only be resolved by booting the system. Interestingly the error doesn't apply to all GFS volumes on a server. One volume regularly fails while the other is up and running all the time. There was no difference in setting them up. Anyway the could be observed on different servers. Best regards, Uli -- Echte DSL-Flatrate dauerhaft f?r 0,- Euro*! "Feel free" mit GMX DSL! http://www.gmx.net/de/go/dsl From cjk at techma.com Fri May 5 12:50:50 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Fri, 5 May 2006 08:50:50 -0400 Subject: [Linux-cluster] Recommended HP servers for cluster suite Message-ID: iLO fencing works just fine. You might be missing perl-Crypt-SSLeay which is required for iLO fencing. You need to put all the fence_ilo options in the "fence.ccs" and the option ' action = reboot (or off) for in the fence section of your nodes.ccs file on RHEL3+GFS6.0x. If you are using RHEL4 + GFS 6.1, then it is simpler since the config is expected to be in the same file etc. In either case, you need to make sure the web access to the iLO port is working and that you have a valid account in the iLO config (the built in Administrator account will work) Also, if you are using RHEL4 and an updated iLO firmware, you need to disable power management for the machine due to a change in the way the iLO powers off the machine. It seems to try a nice shutdown by sending the machine into runlevel 6 instead of just pulling the carpet out from under it. My suggestion to the fencing agent coders would be to issue a "power reset" instead of a "power off" as a reset will in fact pull the plug, and is much faster (and thereby "safer") than a power off command. 
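To make the RHEL4-style configuration a little more concrete, the relevant cluster.conf fragment for iLO fencing looks roughly like the sketch below. The node name, iLO hostname and credentials are placeholders, and attribute names can differ between fence_ilo versions, so check the fence_ilo man page. The point to note is that each node's fence method references that node's own iLO (node A is fenced through A's iLO, not B's):

<clusternode name="node1">
    <fence>
        <method name="1">
            <device name="node1-ilo"/>
        </method>
    </fence>
</clusternode>
<!-- ... -->
<fencedevices>
    <fencedevice agent="fence_ilo" name="node1-ilo" hostname="ilo-node1.example.com" login="Administrator" passwd="changeme"/>
</fencedevices>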
Also, you can always add more than one fencing method to each node. For instace, you can fence the machine at the fibre port as well. I believe you need to manually enable the port again once you have determined that there is no problem etc. Any specific problem you are having? Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone Sent: Wednesday, May 03, 2006 6:53 AM To: linux clustering Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite Steve Nelson wrote: > On 5/3/06, carlopmart wrote: > >> And what about Porliant DL 380?? > > I use dl380s for low-end, dl580s and now 585s for upper-end clusters. > Very very happy with them. Do you use iLO port for fencing? Please can you explain your iLO configuration? I have some doubts on how to configure fencing. Example: you have nodes A,B and iLO devices Ai, Bi Fencing device for node A should be Ai or Bi? I had also troubles on installation of additional perl modules required to make fence_ilo agent work: despite having IO::Socket::SSL and Net::SSL::something correctly installed, it keeps throwing error messages and I can't seem to correctly startup fenced. -- Cosimo -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From proftpd at rodriges.spb.ru Sat May 6 14:36:14 2006 From: proftpd at rodriges.spb.ru (proftpd at rodriges.spb.ru) Date: Sat, 06 May 2006 18:36:14 +0400 Subject: [Linux-cluster] mount at other disk Message-ID: Hello. I'm using Vtrack as iSCSI target and 2 RHEL4 hosts as iSCSI initiators. 2 RHEL in a cluster and have /dev/sda as iSCSI attached targed. I make CLVM2 #pvcreate /dev/sda #vgcreate test /dev/sda #lvcreate -n test -L10G test and GFS #gfs_mkfs -p lock_dlm -t alpha:a -j 8 /dev/test/test and successfully mount /dev/test/test at both machine. All OK. But then i'm increase the size of target at Vtrack. After I remount iSCSI at 2 RHEL, i see that insteed /dev/sda target become /dev/sdb!!! Of course, LVM2 wants to see /dev/sda as PV. So I can't use data. What can I do to mount iSCSI targer always as /dev/sda? From Jon.Stanley at savvis.net Sun May 7 03:05:35 2006 From: Jon.Stanley at savvis.net (Stanley, Jon) Date: Sat, 6 May 2006 22:05:35 -0500 Subject: [Linux-cluster] RE: < fecing with out any hardware? > Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901CE00B9@s228130hz1ew08.apptix-01.savvis.net> > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bowie Bailey > Sent: Wednesday, May 03, 2006 11:08 AM > To: linux clustering > Subject: RE: [Linux-cluster] RE: < fecing with out any hardware? > > > > > This script can be an automatic login to the failed server (ssh, > > rlogin, serial console) which can execute any remote operation (for > > example unload the module of the SAN-device) or causing an kernel > > panic (which is the fencing-method in ocfs2 ;-) ). > > If you have such a script, it cannot be guaranteed to be successful. If the server is so misbehaving that it will not respond to ssh, then all bets are off and this will never succeed. The question that I have is that there is functionality in the SCSI-3 spec for Persistent Group Reservations. Basically, what happens is that each system that wants access to a disk puts a "reservation" and "registration" on it. A commercial clustering solution (Symantec) uses this feature in order to do it's I/O fencing. 
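Purely to illustrate the mechanism (this is not an existing fence agent), the register/reserve/eject steps can be driven by hand with sg_persist from the sg3_utils package. The device name and keys below are placeholders, and type 5 is the "Write Exclusive, Registrants Only" reservation described next:

sg_persist --out --register --param-sark=0xa1 /dev/sdb                        # node A registers its key
sg_persist --out --reserve --param-rk=0xa1 --prout-type=5 /dev/sdb            # take a WERO reservation
sg_persist --in --read-keys /dev/sdb                                          # list registered keys
sg_persist --out --preempt --param-rk=0xa1 --param-sark=0xb2 --prout-type=5 /dev/sdb   # eject a dead node's key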
The initial reservation on the disk is "Write Exclusive Registrants Only", meaning that if you are not registered to be on the disk, you cannot write to it. When the node comes up, upon synchronizing with all of the other nodes, etc, it puts it's key onto the disk. It can then write to the disk, without any problem. When the node dies, the surviving node(s) see that, and eject the dead node, making it physically impossible to write to the disk. This of course requires support from the array to do it (it's a SCSI-3 standard, but not all arrays implement it), thereby limiting the choice of storage to mid-to-high-end enterprise arrays. The question is why can't we use that as a fence mechanism, and do away with the hardware poweroff stuff, if the array supports it? Of course the hardware poweroff stuff could be left in for older/lower end arrays, etc, but I think that options are a Good Thing(TM). From hirantha at vcs.informatics.lk Mon May 8 04:54:03 2006 From: hirantha at vcs.informatics.lk (Hirantha Wijayawardena) Date: Mon, 8 May 2006 10:54:03 +0600 Subject: [Linux-cluster] Recommended HP servers for cluster suite In-Reply-To: Message-ID: <20060508045815.58D7F27C43@ux-mail.informatics.lk> Hi Corey I have something to get clarify with you about 'Disabling the Power Management' (I'm not quite sure whether my question is lacks my knowledge on HP servers) If you disable the power Management, is it possible to boot/reboot/shutdown the server from web-based SIM utility? I believe there should be interaction between iLO and Power Management. I hope my second question is posted and replied already - rebooting Vs power off. Sending node to runlevel 6 - What if server hung long time before network service down! Will other node wait or takeover the package ran on that hung server - this will crash right? Thanks in advance - Hirantha -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. Sent: Friday, May 05, 2006 6:51 PM To: linux clustering Subject: RE: [Linux-cluster] Recommended HP servers for cluster suite iLO fencing works just fine. You might be missing perl-Crypt-SSLeay which is required for iLO fencing. You need to put all the fence_ilo options in the "fence.ccs" and the option ' action = reboot (or off) for in the fence section of your nodes.ccs file on RHEL3+GFS6.0x. If you are using RHEL4 + GFS 6.1, then it is simpler since the config is expected to be in the same file etc. In either case, you need to make sure the web access to the iLO port is working and that you have a valid account in the iLO config (the built in Administrator account will work) Also, if you are using RHEL4 and an updated iLO firmware, you need to disable power management for the machine due to a change in the way the iLO powers off the machine. It seems to try a nice shutdown by sending the machine into runlevel 6 instead of just pulling the carpet out from under it. My suggestion to the fencing agent coders would be to issue a "power reset" instead of a "power off" as a reset will in fact pull the plug, and is much faster (and thereby "safer") than a power off command. Also, you can always add more than one fencing method to each node. For instace, you can fence the machine at the fibre port as well. I believe you need to manually enable the port again once you have determined that there is no problem etc. Any specific problem you are having? 
Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone Sent: Wednesday, May 03, 2006 6:53 AM To: linux clustering Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite Steve Nelson wrote: > On 5/3/06, carlopmart wrote: > >> And what about Porliant DL 380?? > > I use dl380s for low-end, dl580s and now 585s for upper-end clusters. > Very very happy with them. Do you use iLO port for fencing? Please can you explain your iLO configuration? I have some doubts on how to configure fencing. Example: you have nodes A,B and iLO devices Ai, Bi Fencing device for node A should be Ai or Bi? I had also troubles on installation of additional perl modules required to make fence_ilo agent work: despite having IO::Socket::SSL and Net::SSL::something correctly installed, it keeps throwing error messages and I can't seem to correctly startup fenced. -- Cosimo -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From knelson at glasshouse.com Mon May 8 10:15:31 2006 From: knelson at glasshouse.com (Kevin Nelson) Date: Mon, 8 May 2006 11:15:31 +0100 Subject: [Linux-cluster] CLVMD Message-ID: Setting up a cluster using RedHat ES4 Update 3. Installed cluster software (now running system config cluster 1.0.25), then installed GFS (6.1). Device mapper and LVM2 were part of the RedHat install. I have a cluster up and running, I can create a volume group, a logical volume and then a GFS volume. Read write fine. What I cannot do is share it in the cluster, CLVMD is not installed, the only information I can find to install is as part of the LVM2 make which allows you to make LVM2 with cluster options and CLVM but this fails on the ./configure Would appreciate any help if possible or if you need any further information let me know. Thank you Kevin Nelson Systems Integration Consultant GlassHouse Technologies (UK) THE GLOBAL LEADER IN INDEPENDENT STORAGE SERVICES Tel: +44 1932 428812 Mob: +44 7767 302108 Fax: +44 2392 498853 http://www.glasshouse.com Mailto:knelson at glasshouse.com This message is intended only for the use of the individual or entity to which it is addressed and may contain information that is privileged, confidential and exempt from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please accept our apology. We should be obliged if you would telephone the sender on the above number or email them by return. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pcaulfie at redhat.com Mon May 8 12:10:11 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 08 May 2006 13:10:11 +0100 Subject: [Linux-cluster] CLVMD In-Reply-To: References: Message-ID: <445F3523.2080109@redhat.com> Kevin Nelson wrote: > Setting up a cluster using RedHat ES4 Update 3. Installed cluster > software (now running system config cluster 1.0.25), then installed GFS > (6.1). Device mapper and LVM2 were part of the RedHat install. 
I have a > cluster up and running, I can create a volume group, a logical volume > and then a GFS volume. Read write fine. What I cannot do is share it in > the cluster, CLVMD is not installed, the only information I can find to > install is as part of the LVM2 make which allows you to make LVM2 with > cluster options and CLVM but this fails on the ./configure > > Would appreciate any help if possible or if you need any further > information let me know. > Thank you > If you just need clvmd then you'll find it in the lvm2-cluster package. If you really want to build it from sources then you'll need to post the configure errors here. I suspect it's just some dependant packages that are missing. -- patrick From lhh at redhat.com Mon May 8 13:49:21 2006 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 08 May 2006 09:49:21 -0400 Subject: [Linux-cluster] RE: < fecing with out any hardware? > In-Reply-To: <2C04D2F14FD8254386851063BC2B67065E08B4@STBEVS01.stb.sun.ac.za> References: <4766EEE585A6D311ADF500E018C154E30213398C@bnifex.cis.buc.com> <2C04D2F14FD8254386851063BC2B67065E08B4@STBEVS01.stb.sun.ac.za> Message-ID: <1147096161.11396.29.camel@ayanami.boston.redhat.com> Sorry for the late response. On Mon, 2006-05-01 at 21:45 +0200, Pool Lee, Mr <14117614 at sun.ac.za> wrote: > Hi.. > > What about software fencing? Is it really nesasary to be hardware! Fencing basically is using a device which is not directly controlled by cluster nodes to ensure a given node is cut off from performing I/O, thereby corrupting shared data. > Is there a difference between lutre/cfs, the product that sun uses, and gfs? I have not read much about Sun's product(s), but GFS is significantly different architecturally from Lustre. http://lustre.org/architecture.html GFS has no metadata or data servers per se (though, when using gulm, you have a 'lock' server); all nodes are accessing the same block devices directly. > I'm planning to do mostly numerical work with the cluster and thus I would like all the machines to be able to > retrieve data, as if it was local on the machine. NFS is very limited in this regard because we intend on using vast arrays of matrices, that can be up to 1-2 Gig. You can use GFS and export the same NFS volume from multiple servers if you need to, which helps eliminate the single-NFS-server bottleneck. (In this case, you only need to set up fencing for the GFS cluster.) Or, you can connect all the nodes in your cluster to the same block devices and use GFS directly. > I was hoping to implement GFS since all the machines are already setup, without the hardware fencing though. Well, you /can/ do this, but if a node hangs and comes back to life, plan on rebooting the entire cluster, recreating the file system from scratch, and restarting your calculations. -- Lon From lhh at redhat.com Mon May 8 14:11:13 2006 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 08 May 2006 10:11:13 -0400 Subject: [Linux-cluster] HP msa20 and rhcs? In-Reply-To: <1333.85.101.156.147.1146510678.squirrel@85.101.156.147> References: <1333.85.101.156.147.1146510678.squirrel@85.101.156.147> Message-ID: <1147097473.11396.46.camel@ayanami.boston.redhat.com> On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote: > Hi, > > I need a cheap shared storage and wanted to know if anyone in this list > used HP MSA20 (shared SCSI) on rhcs? I want to setup a 2 node cluster > with shared storage and cheapest HP shared storage seems to be MSA20.. I've never used one; maybe someone else has. 
General rules of thumb when using SCSI shared storage: * If it requires a specific controller to make the RAID work, it probably is not a good bet, regardless of what the marketing literature would have you believe. * If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and still has some way of accessing the array management tools (ex: a serial port) for configuring/presenting the LUNs, it should generally "just work". There are undoubtedly exceptions to these rules. I'm not at all familiar with the MSA20. I do, however, have a MSA500 which has been working fine. The MSA500 needs CCISS controllers to talk to the on-board MSA controller and configure the LUNs during bootup. After that, the CCISS controllers act as "dumb" SCSI cards when talking to the MSA500, or so it seems. It's been working fine in a 2-node failover cluster for a couple of years, but I have not tried it with GFS. Whether that means the MSA20 will work... I do not know. It might ;) -- Lon From teigland at redhat.com Mon May 8 14:19:02 2006 From: teigland at redhat.com (David Teigland) Date: Mon, 8 May 2006 09:19:02 -0500 Subject: [Linux-cluster] RE: < fecing with out any hardware? > In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B901CE00B9@s228130hz1ew08.apptix-01.savvis.net> References: <9A6FE0FCC2B29846824C5CD81C6647B901CE00B9@s228130hz1ew08.apptix-01.savvis.net> Message-ID: <20060508141902.GB21898@redhat.com> On Sat, May 06, 2006 at 10:05:35PM -0500, Stanley, Jon wrote: > The question that I have is that there is functionality in the SCSI-3 > spec for Persistent Group Reservations. Basically, what happens is that > each system that wants access to a disk puts a "reservation" and > "registration" on it. A commercial clustering solution (Symantec) uses > this feature in order to do it's I/O fencing. > > The initial reservation on the disk is "Write Exclusive Registrants > Only", meaning that if you are not registered to be on the disk, you > cannot write to it. When the node comes up, upon synchronizing with all > of the other nodes, etc, it puts it's key onto the disk. It can then > write to the disk, without any problem. When the node dies, the > surviving node(s) see that, and eject the dead node, making it > physically impossible to write to the disk. > > This of course requires support from the array to do it (it's a SCSI-3 > standard, but not all arrays implement it), thereby limiting the choice > of storage to mid-to-high-end enterprise arrays. > > The question is why can't we use that as a fence mechanism, and do away > with the hardware poweroff stuff, if the array supports it? Of course > the hardware poweroff stuff could be left in for older/lower end arrays, > etc, but I think that options are a Good Thing(TM). You could definately use persistent reservations to do fencing, we just don't have a fencing agent written for it yet. It's one of those things that no one ever quite gets the time to do. It's something that would be _really_ nice to have and would spare a lot of people a lot of hassle. Dave From cjk at techma.com Mon May 8 15:36:59 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Mon, 8 May 2006 11:36:59 -0400 Subject: [Linux-cluster] Recommended HP servers for cluster suite Message-ID: I believe you can simply turn acpid off to disable the power management. As far as SIM and shutdown goes, you should still be able to. The power management stuff is just that part that intercepts front panel power button hits and sends the computer into shutdown instead of a hard power off. 
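If acpid is indeed what is turning the iLO's virtual button press into a graceful shutdown, disabling it on RHEL4 is a two-liner with the stock init scripts:

service acpid stop
chkconfig acpid off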
It's useful for headless machines so you don't have to remote login, or connect a head just to power down a machine. The problem is that the iLO function for powering off the machine, is the equivelent of a button push, and therefore sends the machine into init 6. As far as taking a while for the machine to shutdown, that's why in my message I suggested doing a "power reset" rather than an "power off" since a power reset actually pulls the carpet out from under the machine no matter what. Regards Corey >I have something to get clarify with you about 'Disabling the Power Management' (I'm not quite sure whether >my question is lacks my knowledge on HP servers) >If you disable the power Management, is it possible to boot/reboot/shutdown the server from web-based SIM >utility? I believe there should be interaction between iLO and Power Management. >I hope my second question is posted and replied already - rebooting Vs power off. Sending node to runlevel 6 >- What if server hung long time before network service down! Will other node wait or takeover the package ran on that hung server - this will crash right? -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. Sent: Friday, May 05, 2006 6:51 PM To: linux clustering Subject: RE: [Linux-cluster] Recommended HP servers for cluster suite iLO fencing works just fine. You might be missing perl-Crypt-SSLeay which is required for iLO fencing. You need to put all the fence_ilo options in the "fence.ccs" and the option ' action = reboot (or off) for in the fence section of your nodes.ccs file on RHEL3+GFS6.0x. If you are using RHEL4 + GFS 6.1, then it is simpler RHEL3+since the config is expected to be in the same file etc. In either case, you need to make sure the web access to the iLO port is working and that you have a valid account in the iLO config (the built in Administrator account will work) Also, if you are using RHEL4 and an updated iLO firmware, you need to disable power management for the machine due to a change in the way the iLO powers off the machine. It seems to try a nice shutdown by sending the machine into runlevel 6 instead of just pulling the carpet out from under it. My suggestion to the fencing agent coders would be to issue a "power reset" instead of a "power off" as a reset will in fact pull the plug, and is much faster (and thereby "safer") than a power off command. Also, you can always add more than one fencing method to each node. For instace, you can fence the machine at the fibre port as well. I believe you need to manually enable the port again once you have determined that there is no problem etc. Any specific problem you are having? Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone Sent: Wednesday, May 03, 2006 6:53 AM To: linux clustering Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite Steve Nelson wrote: > On 5/3/06, carlopmart wrote: > >> And what about Porliant DL 380?? > > I use dl380s for low-end, dl580s and now 585s for upper-end clusters. > Very very happy with them. Do you use iLO port for fencing? Please can you explain your iLO configuration? I have some doubts on how to configure fencing. Example: you have nodes A,B and iLO devices Ai, Bi Fencing device for node A should be Ai or Bi? 
I had also troubles on installation of additional perl modules required to make fence_ilo agent work: despite having IO::Socket::SSL and Net::SSL::something correctly installed, it keeps throwing error messages and I can't seem to correctly startup fenced. -- Cosimo -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cjk at techma.com Mon May 8 15:43:46 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Mon, 8 May 2006 11:43:46 -0400 Subject: [Linux-cluster] HP msa20 and rhcs? Message-ID: The MSA20 is only a disk shelf. You'd need to have it conneted to a raid controller which is built into the DL360 and above, or simply access the individual drives themselves. It does allow multi-initiator connections, but I think it's more along the lines of having multiple paths to an MSA500 which is a two node non-fibre SAN if it can even be considered a SAN since no fibre is involved. It's not more than a high end external storage device. Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger Sent: Monday, May 08, 2006 10:11 AM To: omer at faruk.net; linux clustering Subject: Re: [Linux-cluster] HP msa20 and rhcs? On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote: > Hi, > > I need a cheap shared storage and wanted to know if anyone in this > list used HP MSA20 (shared SCSI) on rhcs? I want to setup a 2 node > cluster with shared storage and cheapest HP shared storage seems to be MSA20.. I've never used one; maybe someone else has. General rules of thumb when using SCSI shared storage: * If it requires a specific controller to make the RAID work, it probably is not a good bet, regardless of what the marketing literature would have you believe. * If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and still has some way of accessing the array management tools (ex: a serial port) for configuring/presenting the LUNs, it should generally "just work". There are undoubtedly exceptions to these rules. I'm not at all familiar with the MSA20. I do, however, have a MSA500 which has been working fine. The MSA500 needs CCISS controllers to talk to the on-board MSA controller and configure the LUNs during bootup. After that, the CCISS controllers act as "dumb" SCSI cards when talking to the MSA500, or so it seems. It's been working fine in a 2-node failover cluster for a couple of years, but I have not tried it with GFS. Whether that means the MSA20 will work... I do not know. It might ;) -- Lon -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cosimo at streppone.it Mon May 8 19:31:14 2006 From: cosimo at streppone.it (Cosimo Streppone) Date: Mon, 08 May 2006 21:31:14 +0200 Subject: [Linux-cluster] Recommended HP servers for cluster suite In-Reply-To: References: Message-ID: <445F9C82.2030801@streppone.it> Kovacs, Corey J. wrote: > iLO fencing works just fine. > [...] > If you are using RHEL4 + GFS 6.1, then it is simpler since the > config is expected to be in the same file etc. > > [...] I seem to have got past the SSL modules installation, so that is not the problem. 
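For anyone else stuck at the module stage, a quick sanity check before wiring fence_ilo into the cluster might look like this. The iLO address and credentials are placeholders, and the exact agent options should be confirmed with fence_ilo -h, since they vary between releases:

rpm -q perl-Crypt-SSLeay
perl -e 'use IO::Socket::SSL; use Net::SSLeay;' && echo "SSL modules OK"
fence_ilo -a ilo-node1.example.com -l Administrator -p changeme -o status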
Thanks for sharing your experience, but I admit I still haven't understood when fencing takes place. What are the conditions that trigger fencing? > Any specific problem you are having? Yes. The main problem is that I'm now beginning to find my way through RHCS4. :-) Other random problems that I had: - oom-killer kernel thread killed my ccs daemon, causing the entire two-node cluster to suddenly become unmanageable; - start/stop of shared filesystem resources (SAN) is causing errors and is therefore not managed properly; - don't know how to properly configure heartbeat; I know these are not iLO problems. In fact, I'm trying to solve one problem at a time, and don't know if iLO fencing can be the cause of these problems. I need to do some more researching. I'll be back with more useful info. -- Cosimo From cjk at techma.com Mon May 8 20:13:37 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Mon, 8 May 2006 16:13:37 -0400 Subject: [Linux-cluster] Recommended HP servers for cluster suite Message-ID: Cosimo, fencing takes place any time a condition exists where "the cluster" cannot communicate with a node, or cannot guarantee the state of a particular node. Others can surely do the question more justice, but in a nutshell that's it. To test this, simply pull the network cable from a node. The others will not be able to status it and it will get fenced. Same thing happens if you call 'fence_node node01' or whatever your nodes are named. The machine will actually be booted _twice_: once from the command, then again when the cluster decides it's no longer talking. I think the fence command should at least have an option to inform the cluster that a node was fenced, but it's not a big deal. If oom is killing your cluster nodes, I think you're out of luck. GFS can gobble memory from my experience. More is better. Also, in GFS 6.0x there is a bug that causes system RAM to be exhausted by GFS locks. The newest release has a tunable parameter "inoded_purge" which allows you to tune a periodic percentage of locks to try and purge. This helped me a LOT. I was having nodes hang because they could not fork. BTW, if the GFS folks are reading this, I'd like to make a suggestion. I have not gone code diving yet, but it seems that if the mechanism for a node to respond actually spawned a thread or something else that required the system to be able to fork, then systems that are starved of memory would indeed get fenced, since the "OK" response would not get back to the cluster. I realize that doesn't FIX anything per se, but it would prevent the system from hanging for any length of time. On the start/stop of SAN resources, what exactly do you mean? It sounds like you are talking about what happens when the qlogic drivers load and unload. If that's the case, you need to properly set up zoning on your fibre switch. The load/unload of the qlogic drivers causes a scsi reset to be sent along the bus, which in the case of fibre channel is every device in the fibre mesh. You need to set up individual zones for your storage ports, then zones which include the host ports and the storage together. So on a 5 node cluster, you'd end up with 6 zones: one for storage, and 5 host/storage combos, then make them all part of the active config. That way any scsi resets are not seen by other nodes' HBAs. I had problems that were causing nodes to go down due to lost connections to the storage from the scsi resets, not good.... Heartbeat should not need any tweaking if everything else is working.
Not to say you can't tune it to your situation, just that it should be fine with default settings while you get things stable. Hope this helps Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cosimo Streppone Sent: Monday, May 08, 2006 3:31 PM To: linux clustering Subject: Re: [Linux-cluster] Recommended HP servers for cluster suite Kovacs, Corey J. wrote: > iLO fencing works just fine. > [...] > If you are using RHEL4 + GFS 6.1, then it is simpler since the > config is expected to be in the same file etc. > > [...] I seem to have got past the SSL modules installation, so that is not the problem. Thanks for sharing your experience, but I admit I still haven't understood when fencing takes place. What are the conditions that trigger fencing? > Any specific problem you are having? Yes. The main problem is that I'm now beginning to find my way through RHCS4. :-) Other random problems that I had: - oom-killer kernel thread killed my ccs daemon, causing the entire two-node cluster to suddenly become unmanageable; - start/stop of shared filesystem resources (SAN) is causing errors and is therefore not managed properly; - don't know how to properly configure heartbeat; I know these are not iLO problem. In fact, I'm trying to solve one problem at a time, and don't know if iLO fencing can be the cause of these problems. I need to do some more researching. I'll be back with more useful info. -- Cosimo -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jason at monsterjam.org Tue May 9 01:32:13 2006 From: jason at monsterjam.org (Jason) Date: Mon, 8 May 2006 21:32:13 -0400 Subject: [Linux-cluster] question about creating partitions and gfs Message-ID: <20060509013213.GA91908@monsterjam.org> so still following instructions at http://www.gyrate.org/archives/9 im at the part that says "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" in my config, I have the dell PERC 4/DC cards, and I believe the logical drive showed up as /dev/sdb so do I need to create a partition on this logical drive with fdisk first before I run ccs_tool create /root/cluster /dev/sdb1 or am I totally off track here? i did ccs_tool create /root/cluster /dev/sdb and it seemed to work fine, but doesnt seem right.. Jason From cosimo at streppone.it Tue May 9 09:09:34 2006 From: cosimo at streppone.it (Cosimo Streppone) Date: Tue, 09 May 2006 11:09:34 +0200 Subject: [Linux-cluster] Interesting cluster case after a node hardware failure Message-ID: <44605C4E.7010103@streppone.it> I'm through an interesting case that I don't fully understand. I found some log messages that I never saw before, that are quite worrying. I report them here for quick reference, then I'm going to include full log extract with comments on what I think has happened. Please correct me when I'm wrong. This is on RedHat Enterprise ES 4 Update 3 with RHCS 4, with hand-modified init scripts to make them work with CS4. node2 kernel: CMAN: too many transition restarts - will die node2 kernel: CMAN: we are leaving the cluster. Inconsistent cluster view node2 kernel: WARNING: dlm_emergency_shutdown node2 kernel: WARNING: dlm_emergency_shutdown node2 kernel: SM: 00000001 sm_stop: SG still joined node2 kernel: SM: 01000003 sm_stop: SG still joined node2 kernel: SM: 03000002 sm_stop: SG still joined node2 clurgmgrd[2820]: #67: Shutting down uncleanly node2 ccsd[2473]: Cluster manager shutdown. 
Attemping to reconnect... node2 ccsd[2473]: Cluster is not quorate. Refusing connection. node2 ccsd[2473]: Error while processing connect: Connection refused node2 ccsd[2473]: Invalid descriptor specified (-111). node2 ccsd[2473]: Someone may be attempting something evil. node2 ccsd[2473]: Error while processing get: Invalid request descriptor node2 ccsd[2473]: Invalid descriptor specified (-111). node2 ccsd[2473]: Someone may be attempting something evil. node2 ccsd[2473]: Error while processing get: Invalid request descriptor node2 ccsd[2473]: Invalid descriptor specified (-21). node2 ccsd[2473]: Someone may be attempting something evil. node2 ccsd[2473]: Error while processing disconnect: Invalid request descriptor node2 clurgmgrd: [2820]: Executing /etc/rc.d/init.d/xinetd stop Cluster is composed of two nodes (node1 and node2), two HP DL360 machines with iLO devices configured for fencing but not connected for now. There is one service only which has several shared resources attached (fs, init scripts, and 1 ip address). As I said, I attached an extract of "messages" log that shows a series of events which led to malfunctioning clustered service. Please can anyone shed some light on this? Thank you for any suggestion. Trace begins. Node1 dies of hardware failure. ---------8<--------------------8<--------------- May 8 15:30:20 node2 kernel: CMAN: removing node node1 from the cluster : Missed too many heartbeats May 8 15:30:20 node2 fenced[2540]: node1 not a cluster member after 0 sec post_fail_delay May 8 15:30:20 node2 fenced[2540]: fencing node "node1" May 8 15:30:23 node2 fenced[2540]: agent "fence_ilo" reports: connect: No route to host at /opt/perl/lib/site_perl/5.8.6/linux/Net/SSL May 8 15:30:23 node2 fenced[2540]: fence "node1" failed Fencing could never work here because iLO interface of node1 was down (and *not* connected, FWIW). [...] May 8 15:31:55 node2 kernel: CMAN: node node1 rejoining May 8 15:31:59 node2 clurgmgrd[2820]: Magma Event: Membership Change May 8 15:31:59 node2 clurgmgrd[2820]: State change: node1 DOWN May 8 15:31:59 node2 clurgmgrd: [2820]: Executing /etc/rc.d/init.d/xinetd status Cluster recognizes the node1 is down. Ok. I didn't understand the "rejoining" though. Node1 was down. [...] May 8 15:36:40 node2 kernel: CMAN: too many transition restarts - will die May 8 15:36:40 node2 kernel: CMAN: we are leaving the cluster. Inconsistent cluster view May 8 15:36:40 node2 kernel: WARNING: dlm_emergency_shutdown May 8 15:36:40 node2 kernel: WARNING: dlm_emergency_shutdown May 8 15:36:40 node2 kernel: SM: 00000001 sm_stop: SG still joined May 8 15:36:40 node2 kernel: SM: 01000003 sm_stop: SG still joined May 8 15:36:40 node2 kernel: SM: 03000002 sm_stop: SG still joined May 8 15:36:40 node2 clurgmgrd[2820]: #67: Shutting down uncleanly May 8 15:36:40 node2 ccsd[2473]: Cluster manager shutdown. Attemping to reconnect... May 8 15:36:40 node2 ccsd[2473]: Cluster is not quorate. Refusing connection. May 8 15:36:40 node2 ccsd[2473]: Error while processing connect: Connection refused May 8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111). May 8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil. May 8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor May 8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111). May 8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil. 
May 8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor May 8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-21). May 8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil. May 8 15:36:40 node2 ccsd[2473]: Error while processing disconnect: Invalid request descriptor May 8 15:36:40 node2 clurgmgrd: [2820]: Executing /etc/rc.d/init.d/xinetd stop From now on, all services are being shut down by the cluster resource manager daemon. But what could have happened that triggered a `dlm_emergency_shutdown'? May 8 15:36:40 node2 ccsd[2473]: Cluster is not quorate. Refusing connection. May 8 15:36:40 node2 ccsd[2473]: Error while processing connect: Connection refused May 8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111). May 8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil. May 8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor May 8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-111). May 8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil. May 8 15:36:40 node2 ccsd[2473]: Error while processing get: Invalid request descriptor May 8 15:36:40 node2 ccsd[2473]: Invalid descriptor specified (-21). May 8 15:36:40 node2 ccsd[2473]: Someone may be attempting something evil. May 8 15:36:40 node2 ccsd[2473]: Error while processing disconnect: Invalid request descriptor [...] All services were shut down, shared ip address was released and SAN volume unmounted. May 8 15:37:26 node2 ccsd[2473]: Unable to connect to cluster infrastructure after 60 seconds. ... May 8 15:41:26 node2 ccsd[2473]: Unable to connect to cluster infrastructure after 300 seconds. ... The morning after, the node2 was rebooted. The shut down is not clean, but rebooting has restored the cluster in a consistent state. Node1 is not accessible due to complete hardware failure. May 9 09:08:05 node2 fenced: Stopping fence domain: May 9 09:08:05 node2 fenced: shutdown failed May 9 09:08:05 node2 fenced: ESC[60G May 9 09:08:05 node2 fenced: May 9 09:08:05 node2 rc: Stopping fenced: failed May 9 09:08:05 node2 lock_gulmd: Stopping lock_gulmd: May 9 09:08:05 node2 lock_gulmd: shutdown succeeded May 9 09:08:05 node2 lock_gulmd: ESC[60G May 9 09:08:05 node2 lock_gulmd: May 9 09:08:05 node2 rc: Stopping lock_gulmd: succeeded May 9 09:08:05 node2 cman: Stopping cman: May 9 09:08:08 node2 cman: failed to stop cman failed May 9 09:08:08 node2 cman: ESC[60G May 9 09:08:08 node2 cman: May 9 09:08:08 node2 rc: Stopping cman: failed May 9 09:08:08 node2 ccsd: Stopping ccsd: May 9 09:08:08 node2 ccsd[2473]: Stopping ccsd, SIGTERM received. May 9 09:08:09 node2 ccsd: shutdown succeeded May 9 09:08:09 node2 ccsd: ESC[60G May 9 09:08:09 node2 ccsd: May 9 09:08:09 node2 rc: Stopping ccsd: succeeded May 9 09:08:09 node2 irqbalance: irqbalance shutdown succeeded May 9 09:08:09 node2 multipathd: mpath0: stop event checker thread May 9 09:08:09 node2 multipathd: multipathd shutdown succeeded May 9 09:08:09 node2 kernel: Kernel logging (proc) stopped. May 9 09:08:09 node2 kernel: Kernel log daemon terminating. May 9 09:08:10 node2 syslog: klogd shutdown succeeded May 9 09:08:10 node2 exiting on signal 15 ---------8<--------------------8<--------------- End of trace. ? 
-- Cosimo From pcaulfie at redhat.com Tue May 9 09:22:43 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 09 May 2006 10:22:43 +0100 Subject: [Linux-cluster] Interesting cluster case after a node hardware failure In-Reply-To: <44605C4E.7010103@streppone.it> References: <44605C4E.7010103@streppone.it> Message-ID: <44605F63.8080404@redhat.com> Cosimo Streppone wrote: > I'm through an interesting case that I don't fully understand. > I found some log messages that I never saw before, that are quite > worrying. I report them here for quick reference, then I'm going > to include full log extract with comments on what I think has happened. > Please correct me when I'm wrong. > > This is on RedHat Enterprise ES 4 Update 3 with > RHCS 4, with hand-modified init scripts to make them work with CS4. > > node2 kernel: CMAN: too many transition restarts - will die > node2 kernel: CMAN: we are leaving the cluster. Inconsistent cluster view This is a known bug and I'm currently testing a fix for it. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777 -- patrick From omer at faruk.net Tue May 9 11:04:33 2006 From: omer at faruk.net (Omer Faruk Sen) Date: Tue, 9 May 2006 14:04:33 +0300 (EEST) Subject: [Linux-cluster] HP msa20 and rhcs? In-Reply-To: References: Message-ID: <57497.193.140.74.2.1147172673.squirrel@193.140.74.2> Since msa20 allows multi-initiator connections than I think it works with dl380 since it has an external scsi port. Also I have heard that msa20 can have an internal raid controller so ACU can configure it. By the way I have doubts to use shared scsi in RHCS. Does anyone use it (I am sure there are) and those who use it recommend it for a cheap but RELIABLE cluster storage with RHCS? > The MSA20 is only a disk shelf. You'd need to have it conneted to a raid > controller > which is built into the DL360 and above, or simply access the individual > drives > themselves. It does allow multi-initiator connections, but I think it's > more > along > the lines of having multiple paths to an MSA500 which is a two node > non-fibre > SAN > if it can even be considered a SAN since no fibre is involved. It's not > more > than > a high end external storage device. > > > > Corey > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger > Sent: Monday, May 08, 2006 10:11 AM > To: omer at faruk.net; linux clustering > Subject: Re: [Linux-cluster] HP msa20 and rhcs? > > On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote: >> Hi, >> >> I need a cheap shared storage and wanted to know if anyone in this >> list used HP MSA20 (shared SCSI) on rhcs? I want to setup a 2 node >> cluster with shared storage and cheapest HP shared storage seems to be > MSA20.. > > I've never used one; maybe someone else has. > > General rules of thumb when using SCSI shared storage: > > * If it requires a specific controller to make the RAID work, it probably > is > not a good bet, regardless of what the marketing literature would have you > believe. > > * If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and > still > has some way of accessing the array management tools (ex: a serial > port) for configuring/presenting the LUNs, it should generally "just > work". > > There are undoubtedly exceptions to these rules. I'm not at all familiar > with the MSA20. > > I do, however, have a MSA500 which has been working fine. 
The MSA500 > needs > CCISS controllers to talk to the on-board MSA controller and configure the > LUNs during bootup. After that, the CCISS controllers act as "dumb" SCSI > cards when talking to the MSA500, or so it seems. It's been working fine > in > a 2-node failover cluster for a couple of years, but I have not tried it > with > GFS. > > Whether that means the MSA20 will work... I do not know. It might ;) > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Omer Faruk Sen http://www.faruk.net From cjk at techma.com Tue May 9 12:07:09 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Tue, 9 May 2006 08:07:09 -0400 Subject: [Linux-cluster] HP msa20 and rhcs? Message-ID: No, it is an external disk shelf (enclosure) that happens to have a raid (SATA) module. If the unit is connected to a 6400 series HP controller, then the controller acts like a dumb scsi card, and the "onboard" raid is used. If it's plugged into an MSA1500, then the onboard controller is bypassed and the MSA1500 (san) is the controller. The Docs also say it is it's connectivity is for a _single_ host so no failover for a cluster. It's not a good "enterprise" solution anyway as SATA drives are not considered a "high performance" drive anwyay. Spend some cash and get a SAN and real SCSI drives. Spend some more and get a bigger SAN and Fibre Attatched SCSI drives. much faster and can actually take a prolonged beating. I never said it didn't work with a DL380, I said it will work with a DL360 and above. I'd say DL380 is _above_ DL360. :) http://h18004.www1.hp.com/products/servers/proliantstorage/sharedstorage/sacl uster/msa20/index.html http://h18004.www1.hp.com/products/quickspecs/11942_na/11942_na.html Anyway, it supports multiple "types" of raid cards and the starter kit gets you a shelf and a raid card.. It is NOT meant for connecting two computers to.. Now back to the regularly scheduled program.... Corey -----Original Message----- From: Omer Faruk Sen [mailto:omer at faruk.net] Sent: Tuesday, May 09, 2006 7:05 AM To: Kovacs, Corey J. Cc: linux clustering Subject: RE: [Linux-cluster] HP msa20 and rhcs? Since msa20 allows multi-initiator connections than I think it works with dl380 since it has an external scsi port. Also I have heard that msa20 can have an internal raid controller so ACU can configure it. By the way I have doubts to use shared scsi in RHCS. Does anyone use it (I am sure there are) and those who use it recommend it for a cheap but RELIABLE cluster storage with RHCS? > The MSA20 is only a disk shelf. You'd need to have it conneted to a > raid controller which is built into the DL360 and above, or simply > access the individual drives themselves. It does allow multi-initiator > connections, but I think it's more along the lines of having multiple > paths to an MSA500 which is a two node non-fibre SAN if it can even be > considered a SAN since no fibre is involved. It's not more than a high > end external storage device. > > > > Corey > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger > Sent: Monday, May 08, 2006 10:11 AM > To: omer at faruk.net; linux clustering > Subject: Re: [Linux-cluster] HP msa20 and rhcs? > > On Mon, 2006-05-01 at 22:11 +0300, Omer Faruk Sen wrote: >> Hi, >> >> I need a cheap shared storage and wanted to know if anyone in this >> list used HP MSA20 (shared SCSI) on rhcs? 
I want to setup a 2 node >> cluster with shared storage and cheapest HP shared storage seems to >> be > MSA20.. > > I've never used one; maybe someone else has. > > General rules of thumb when using SCSI shared storage: > > * If it requires a specific controller to make the RAID work, it > probably is not a good bet, regardless of what the marketing > literature would have you believe. > > * If it works with a plain-jane SCSI (ex- an Adaptec 2940U2W) card and > still has some way of accessing the array management tools (ex: a > serial > port) for configuring/presenting the LUNs, it should generally "just > work". > > There are undoubtedly exceptions to these rules. I'm not at all > familiar with the MSA20. > > I do, however, have a MSA500 which has been working fine. The MSA500 > needs CCISS controllers to talk to the on-board MSA controller and > configure the LUNs during bootup. After that, the CCISS controllers > act as "dumb" SCSI cards when talking to the MSA500, or so it seems. > It's been working fine in a 2-node failover cluster for a couple of > years, but I have not tried it with GFS. > > Whether that means the MSA20 will work... I do not know. It might ;) > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Omer Faruk Sen http://www.faruk.net From cjk at techma.com Tue May 9 12:16:07 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Tue, 9 May 2006 08:16:07 -0400 Subject: [Linux-cluster] question about creating partitions and gfs Message-ID: Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. Do you have a shared storage device? If /dev/sdb1 is not a shared device, then I think you might need to take a step back and get a hold of a SAN of some type. If you are just playing around, there are ways to get some firewire drives to accept two hosts and act like a cheap shared devices. There are docs on the Oracle site documenting the process of setting up the drive and the kernel. Note, that you'll only be able to use two nodes using the firewire idea. Also, you should specify a partition for the command below. That partition can be very small. Something on the order of 10MB sounds right. Even that is probably way too big. Then use the rest for GFS storage pools. Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason Sent: Monday, May 08, 2006 9:32 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] question about creating partitions and gfs so still following instructions at http://www.gyrate.org/archives/9 im at the part that says "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" in my config, I have the dell PERC 4/DC cards, and I believe the logical drive showed up as /dev/sdb so do I need to create a partition on this logical drive with fdisk first before I run ccs_tool create /root/cluster /dev/sdb1 or am I totally off track here? i did ccs_tool create /root/cluster /dev/sdb and it seemed to work fine, but doesnt seem right.. 
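Following Corey's advice, the partitioning step might look roughly like this. Only the ccs_tool line comes from the guide being followed; the partition layout and the partition-table re-read are assumptions:

fdisk /dev/sdb                  # create a small /dev/sdb1 (~10MB, type 83) for the CCS archive,
                                # and a /dev/sdb2 with the remainder for GFS
blockdev --rereadpt /dev/sdb    # run this on the other node too, so it sees the new partition table
ccs_tool create /root/cluster /dev/sdb1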
Jason -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From hkubota at gmx.net Tue May 9 14:52:46 2006 From: hkubota at gmx.net (Harald Kubota) Date: Tue, 09 May 2006 23:52:46 +0900 Subject: [Linux-cluster] Centralized Cron In-Reply-To: <20060504190651.30345.qmail@webmail10.rediffmail.com> References: <20060504190651.30345.qmail@webmail10.rediffmail.com> Message-ID: <4460ACBE.3070505@gmx.net> saju john wrote: > > > Is there any way to make a centalized cron > at work (more clusters, all Veritas though) we use autosys, which is basically a program scheduler to start programs on other machines. cron jobs are for jobs which are for each node (e.g. sending a health status to a central server once a day), thus bound to a physical machine. autosys is for everything which can move (cluster service groups, e.g. running a DB cleanup job on the node which runs the DB). Since each service group has its own IP address and DNS entry, autosys simply connect to that IP address and executes a script (and does more like checking the status, handling timeouts, sending out alarms if anything went wrong etc.). Since autosys is commercial software and required quite some infrastructure, a simpler approach is to set up one machine (maybe cluster it ;-) which starts jobs on other machines according to a list it maintains. Harald From mwill at penguincomputing.com Tue May 9 15:03:54 2006 From: mwill at penguincomputing.com (Michael Will) Date: Tue, 9 May 2006 08:03:54 -0700 Subject: [Linux-cluster] Centralized Cron Message-ID: <433093DF7AD7444DA65EFAFE3987879C125CCF@jellyfish.highlyscyld.com> Or use cron on the headnode to submit jobs to the clusterscheduler if that does not support recurring timed jobs... -----Original Message----- From: Harald Kubota [mailto:hkubota at gmx.net] Sent: Tue May 09 07:53:08 2006 To: linux clustering Subject: Re: [Linux-cluster] Centralized Cron saju john wrote: > > > Is there any way to make a centalized cron > at work (more clusters, all Veritas though) we use autosys, which is basically a program scheduler to start programs on other machines. cron jobs are for jobs which are for each node (e.g. sending a health status to a central server once a day), thus bound to a physical machine. autosys is for everything which can move (cluster service groups, e.g. running a DB cleanup job on the node which runs the DB). Since each service group has its own IP address and DNS entry, autosys simply connect to that IP address and executes a script (and does more like checking the status, handling timeouts, sending out alarms if anything went wrong etc.). Since autosys is commercial software and required quite some infrastructure, a simpler approach is to set up one machine (maybe cluster it ;-) which starts jobs on other machines according to a list it maintains. Harald -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From nick at sqrt.co.uk Tue May 9 16:25:19 2006 From: nick at sqrt.co.uk (Nick Burrett) Date: Tue, 09 May 2006 09:25:19 -0700 Subject: [Linux-cluster] Centralized Cron In-Reply-To: <20060504190651.30345.qmail@webmail10.rediffmail.com> References: <20060504190651.30345.qmail@webmail10.rediffmail.com> Message-ID: <4460C26F.6070108@sqrt.co.uk> saju john wrote: > > > Dear All, > > Is there any way to make a centalized cron while using Redhat HA cluster > with Sahred storage. 
I mean to put the crontab entry for a particular > user on shared storage, so that when the cluster shifts, on the other > node cron should read from the cron file in shared storage. If you want some form of high availability cron, you could try to leverage the Condor application to suit your needs. If you link your cron applications against the Condor libraries, then you get process check pointing and all sorts of other wonderful stuff. > This setup has the advantage that we don't need to manullay update the > cron entry in both nodes. > b) Soft link the cron directory in /var/spool to > /path/to/shared/storage/cron. This is working till the cluster shift. > The cron is getting dead when the cluster shifts as it lose the > /var/spool/cron link's destination driectory which will be mapped to the > other node You need to add in a heartbeat trigger. The cron daemon runs on one server only. When that server goes offline, then start the cron daemon on the backup server. This is a terrible solution though. Nick. From jason at monsterjam.org Wed May 10 00:23:12 2006 From: jason at monsterjam.org (Jason) Date: Tue, 9 May 2006 20:23:12 -0400 Subject: [Linux-cluster] question about creating partitions and gfs In-Reply-To: References: Message-ID: <20060510002312.GA4927@monsterjam.org> yes, both boxes are connected to the storage, its a dell powervault 220S configured for cluster mode. [root at tf1 cluster]# fdisk -l /dev/sdb Disk /dev/sdb: 146.5 GB, 146548981760 bytes 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2433 19543041 83 Linux [root at tf1 cluster]# [root at tf2 cluster]# fdisk -l /dev/sdb Disk /dev/sdb: 146.5 GB, 146548981760 bytes 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2433 19543041 83 Linux [root at tf2 cluster]# so both sides see the storage. on tf1, I can start ccsd fine, but on tf2, I cant, and I see May 8 22:00:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 8 22:00:21 tf2 ccsd: startup failed May 9 20:17:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:21 tf2 ccsd: startup failed May 9 20:17:30 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:30 tf2 ccsd: startup failed [root at tf2 cluster]# in the logs Jason On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote: > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. Do you > have a shared storage device? If /dev/sdb1 is not a shared device, then I > think > you might need to take a step back and get a hold of a SAN of some type. If > you > are just playing around, there are ways to get some firewire drives to accept > > two hosts and act like a cheap shared devices. There are docs on the Oracle > site documenting the process of setting up the drive and the kernel. Note, > that > you'll only be able to use two nodes using the firewire idea. > > Also, you should specify a partition for the command below. That partition > can > be very small. Something on the order of 10MB sounds right. Even that is > probably > way too big. Then use the rest for GFS storage pools. 
> > > Corey > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > Sent: Monday, May 08, 2006 9:32 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] question about creating partitions and gfs > > so still following instructions at > http://www.gyrate.org/archives/9 > im at the part that says > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > in my config, I have the dell PERC 4/DC cards, and I believe the logical > drive showed up as /dev/sdb > > so do I need to create a partition on this logical drive with fdisk first > before I run > > ccs_tool create /root/cluster /dev/sdb1 > > or am I totally off track here? > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work fine, but > doesnt seem right.. > > Jason > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ From saju8 at rediffmail.com Wed May 10 04:17:16 2006 From: saju8 at rediffmail.com (saju john) Date: 10 May 2006 04:17:16 -0000 Subject: [Linux-cluster] Centralized Cron Message-ID: <20060510041716.8037.qmail@webmail50.rediffmail.com> Dear All, Thanks for all replay. What i need exactly is How to make the Cron centralized and NOT how to make it not running on backup node. Without having a centralized cron I need to edit the cron file in all nodes.This is the difficulty which I am facing. Any suggession will be valuable. Thank You, Saju John On Tue, 09 May 2006 Nick Burrett wrote : >saju john wrote: >> >>Dear All, >> >>Is there any way to make a centalized cron while using Redhat HA cluster with Sahred storage. I mean to put the crontab entry for a particular user on shared storage, so that when the cluster shifts, on the other node cron should read from the cron file in shared storage. > >If you want some form of high availability cron, you could try to leverage the Condor application to suit your needs. If you link your cron applications against the Condor libraries, then you get process check pointing and all sorts of other wonderful stuff. > > >>This setup has the advantage that we don't need to manullay update the cron entry in both nodes. > >>b) Soft link the cron directory in /var/spool to /path/to/shared/storage/cron. This is working till the cluster shift. The cron is getting dead when the cluster shifts as it lose the /var/spool/cron link's destination driectory which will be mapped to the other node > >You need to add in a heartbeat trigger. The cron daemon runs on one server only. When that server goes offline, then start the cron daemon on the backup server. This is a terrible solution though. > > >Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjk at techma.com Wed May 10 12:30:58 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Wed, 10 May 2006 08:30:58 -0400 Subject: [Linux-cluster] question about creating partitions and gfs Message-ID: Jason, couple of questions.... (And I assume you are working with RHEL3+GFS6.0x) 1. Are you actually using raw devices? if so, why? 2. 
Does the device /dev/raw/raw64 actually exist on tf2? GFS does not use raw devices for anything. The standard Redhat Cluster suite does, but not GFS. GFS uses "storage pools". Also, if memory servs me right, later versions of GFS for RHEL3 need to be told what pools to use in the "/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and "found" the pools, but no longer I believe. Hope this helps. If not, can you give more details about your config? Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason Sent: Tuesday, May 09, 2006 8:23 PM To: linux clustering Subject: Re: [Linux-cluster] question about creating partitions and gfs yes, both boxes are connected to the storage, its a dell powervault 220S configured for cluster mode. [root at tf1 cluster]# fdisk -l /dev/sdb Disk /dev/sdb: 146.5 GB, 146548981760 bytes 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2433 19543041 83 Linux [root at tf1 cluster]# [root at tf2 cluster]# fdisk -l /dev/sdb Disk /dev/sdb: 146.5 GB, 146548981760 bytes 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2433 19543041 83 Linux [root at tf2 cluster]# so both sides see the storage. on tf1, I can start ccsd fine, but on tf2, I cant, and I see May 8 22:00:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 8 22:00:21 tf2 ccsd: startup failed May 9 20:17:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:21 tf2 ccsd: startup failed May 9 20:17:30 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:30 tf2 ccsd: startup failed [root at tf2 cluster]# in the logs Jason On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote: > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. > Do you have a shared storage device? If /dev/sdb1 is not a shared > device, then I think you might need to take a step back and get a hold > of a SAN of some type. If you are just playing around, there are ways > to get some firewire drives to accept > > two hosts and act like a cheap shared devices. There are docs on the > Oracle site documenting the process of setting up the drive and the > kernel. Note, that you'll only be able to use two nodes using the > firewire idea. > > Also, you should specify a partition for the command below. That > partition can be very small. Something on the order of 10MB sounds > right. Even that is probably way too big. Then use the rest for GFS > storage pools. > > > Corey > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > Sent: Monday, May 08, 2006 9:32 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] question about creating partitions and gfs > > so still following instructions at > http://www.gyrate.org/archives/9 > im at the part that says > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > in my config, I have the dell PERC 4/DC cards, and I believe the > logical drive showed up as /dev/sdb > > so do I need to create a partition on this logical drive with fdisk > first before I run > > ccs_tool create /root/cluster /dev/sdb1 > > or am I totally off track here? 
> > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work > fine, but doesnt seem right.. > > Jason > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cjk at techma.com Wed May 10 12:33:04 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Wed, 10 May 2006 08:33:04 -0400 Subject: [Linux-cluster] question about creating partitions and gfs Message-ID: Jason, I just realized what the problem is. You need to apply the config to a "pool" not a normal device. What do your pooll definitions look like? The one you created for the config is where you need to point ccs_tool at to activate the config... Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. Sent: Wednesday, May 10, 2006 8:31 AM To: linux clustering Subject: RE: [Linux-cluster] question about creating partitions and gfs Jason, couple of questions.... (And I assume you are working with RHEL3+GFS6.0x) 1. Are you actually using raw devices? if so, why? 2. Does the device /dev/raw/raw64 actually exist on tf2? GFS does not use raw devices for anything. The standard Redhat Cluster suite does, but not GFS. GFS uses "storage pools". Also, if memory servs me right, later versions of GFS for RHEL3 need to be told what pools to use in the "/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and "found" the pools, but no longer I believe. Hope this helps. If not, can you give more details about your config? Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason Sent: Tuesday, May 09, 2006 8:23 PM To: linux clustering Subject: Re: [Linux-cluster] question about creating partitions and gfs yes, both boxes are connected to the storage, its a dell powervault 220S configured for cluster mode. [root at tf1 cluster]# fdisk -l /dev/sdb Disk /dev/sdb: 146.5 GB, 146548981760 bytes 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2433 19543041 83 Linux [root at tf1 cluster]# [root at tf2 cluster]# fdisk -l /dev/sdb Disk /dev/sdb: 146.5 GB, 146548981760 bytes 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sdb1 1 2433 19543041 83 Linux [root at tf2 cluster]# so both sides see the storage. on tf1, I can start ccsd fine, but on tf2, I cant, and I see May 8 22:00:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 8 22:00:21 tf2 ccsd: startup failed May 9 20:17:21 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:21 tf2 ccsd: startup failed May 9 20:17:30 tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:30 tf2 ccsd: startup failed [root at tf2 cluster]# in the logs Jason On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. 
wrote: > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. > Do you have a shared storage device? If /dev/sdb1 is not a shared > device, then I think you might need to take a step back and get a hold > of a SAN of some type. If you are just playing around, there are ways > to get some firewire drives to accept > > two hosts and act like a cheap shared devices. There are docs on the > Oracle site documenting the process of setting up the drive and the > kernel. Note, that you'll only be able to use two nodes using the > firewire idea. > > Also, you should specify a partition for the command below. That > partition can be very small. Something on the order of 10MB sounds > right. Even that is probably way too big. Then use the rest for GFS > storage pools. > > > Corey > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > Sent: Monday, May 08, 2006 9:32 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] question about creating partitions and gfs > > so still following instructions at > http://www.gyrate.org/archives/9 > im at the part that says > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > in my config, I have the dell PERC 4/DC cards, and I believe the > logical drive showed up as /dev/sdb > > so do I need to create a partition on this logical drive with fdisk > first before I run > > ccs_tool create /root/cluster /dev/sdb1 > > or am I totally off track here? > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work > fine, but doesnt seem right.. > > Jason > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jason at monsterjam.org Wed May 10 13:07:58 2006 From: jason at monsterjam.org (Jason) Date: Wed, 10 May 2006 09:07:58 -0400 Subject: [Linux-cluster] question about creating partitions and gfs In-Reply-To: References: Message-ID: <20060510130758.GA48550@monsterjam.org> On Wed, May 10, 2006 at 08:30:58AM -0400, Kovacs, Corey J. wrote: > Jason, couple of questions.... (And I assume you are working with > RHEL3+GFS6.0x) [root at tf1 cluster]# cat /etc/redhat-release Red Hat Enterprise Linux AS release 3 (Taroon Update 7) [root at tf1 cluster]# [root at tf1 cluster]# rpm -qa | grep -i gfs GFS-modules-smp-6.0.2.30-0 GFS-devel-6.0.2.30-0 GFS-debuginfo-6.0.2.30-0 GFS-6.0.2.30-0 GFS-modules-6.0.2.30-0 [root at tf1 cluster]# > > > 1. Are you actually using raw devices? if so, why? not intentionally.. ;) > 2. Does the device /dev/raw/raw64 actually exist on tf2? [root at tf2 cluster]# !ls ls -al /dev/raw/raw64 crw-rw---- 1 root disk 162, 64 Jun 24 2004 /dev/raw/raw64 [root at tf2 cluster]# [root at tf1 cluster]# !ls ls -al /dev/raw/raw64 crw-rw---- 1 root disk 162, 64 Jun 24 2004 /dev/raw/raw64 [root at tf1 cluster]# so theyre both there.. > > > GFS does not use raw devices for anything. 
The standard Redhat Cluster suite > does, but not GFS. GFS uses "storage pools". Also, if memory servs me right, > later versions of GFS for RHEL3 need to be told what pools to use in the > "/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and > "found" the pools, but no longer I believe. > in /etc/sysconfig/gfs on both boxes, I have CCS_ARCHIVE="/dev/sdb1" (everything else is commented out) regards, Jason From lhh at redhat.com Wed May 10 14:15:38 2006 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 10 May 2006 10:15:38 -0400 Subject: [Linux-cluster] question about creating partitions and gfs In-Reply-To: <20060509013213.GA91908@monsterjam.org> References: <20060509013213.GA91908@monsterjam.org> Message-ID: <1147270538.11396.72.camel@ayanami.boston.redhat.com> On Mon, 2006-05-08 at 21:32 -0400, Jason wrote: > so still following instructions at > http://www.gyrate.org/archives/9 > im at the part that says > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > in my config, I have the dell PERC 4/DC cards, and I believe the logical drive showed up as > /dev/sdb > > so do I need to create a partition on this logical drive with fdisk first before I run Yes. > i did ccs_tool create /root/cluster /dev/sdb > and it seemed to work fine, but doesnt seem right.. Well, you could do that, but you're using the entire logical drive for the configuration archive -- which is probably not what you want. -- Lon From stephen.willey at framestore-cfc.com Tue May 9 13:14:38 2006 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Tue, 09 May 2006 14:14:38 +0100 Subject: [Linux-cluster] GFS resiliancy Message-ID: <446095BE.4010706@framestore-cfc.com> If the GFS filesystem is built over several discs LVMed together, how will it behave in the event of a failure of one of those discs (when setup with a linear striped LVM)? Is there any way of securing the filesystem against the failure of any one physical volume? Thanks, Stephen From stephen.willey at framestore-cfc.com Tue May 9 13:18:12 2006 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Tue, 09 May 2006 14:18:12 +0100 Subject: [Linux-cluster] Size limits of the various components Message-ID: <44609694.7060609@framestore-cfc.com> We're testing GFS on 64 bit servers/64 bit RHEL4 and need to know how big LVM2 and GFS will scale. Can anyone tell me the maximum sizes of these component parts: GFS filesystem (C)LVM2 logical volume (C)LVM2 volume group (C)LVM2 physical volumes We're considering building a filesystem that may need to scale to 100Tb or more and I've found various different answers on this list and elsewhere. Stephen From stephen.willey at framestore-cfc.com Tue May 9 13:23:43 2006 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Tue, 09 May 2006 14:23:43 +0100 Subject: [Linux-cluster] Slow dfs Message-ID: <446097DF.4080005@framestore-cfc.com> I saw a question a while back from Jeffrey Bethke about speeding up df operations. We're considering building a large GFS filesystem and the 11Tb filesystem that we have now can take a very long time to return from a df (either regular or using gfs_tool). Once it's run once it appears to cache the information and runs quite quickly, but we're concerned about how this will scale once we get up to 100 or so Tb. Waiting ages for a df to return isn't ideal... 
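For anyone wanting to quantify this, a quick sketch, assuming a GFS mount at /mnt/gfs (the mount point is an example, not from this message). In GFS the free-space counters live in the per-resource-group headers, so an uncached df has to walk all of them, which is the usual explanation for the first run getting slower as the filesystem grows:

time df -h /mnt/gfs           # standard statfs path
time gfs_tool df /mnt/gfs     # GFS's own view of the same counters

Running each twice and comparing the timings separates the one-off resource group walk from the cost of the command itself.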
Stephen From eric at bootseg.com Wed May 10 15:59:56 2006 From: eric at bootseg.com (Eric Kerin) Date: Wed, 10 May 2006 11:59:56 -0400 Subject: [Linux-cluster] GFS resiliancy In-Reply-To: <446095BE.4010706@framestore-cfc.com> References: <446095BE.4010706@framestore-cfc.com> Message-ID: <1147276797.3533.6.camel@auh5-0479.corp.jabil.org> On Tue, 2006-05-09 at 14:14 +0100, Stephen Willey wrote: > If the GFS filesystem is built over several discs LVMed together, how > will it behave in the event of a failure of one of those discs (when > setup with a linear striped LVM)? > > Is there any way of securing the filesystem against the failure of any > one physical volume? > With GFS this needs to be handled at the storage layer, you'll need a storage subsystem that supports some form of RAID (4/5/1+0/etc) to keep disk failures from destroying your filesystem. You can't use software RAID, or LVM, since they aren't currently cluster aware. Thanks, Eric Kerin eric at bootseg.com From teigland at redhat.com Wed May 10 16:30:14 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 10 May 2006 11:30:14 -0500 Subject: [Linux-cluster] Slow dfs In-Reply-To: <446097DF.4080005@framestore-cfc.com> References: <446097DF.4080005@framestore-cfc.com> Message-ID: <20060510163014.GB26524@redhat.com> On Tue, May 09, 2006 at 02:23:43PM +0100, Stephen Willey wrote: > I saw a question a while back from Jeffrey Bethke about speeding up df > operations. > > We're considering building a large GFS filesystem and the 11Tb > filesystem that we have now can take a very long time to return from a > df (either regular or using gfs_tool). > > Once it's run once it appears to cache the information and runs quite > quickly, but we're concerned about how this will scale once we get up to > 100 or so Tb. Waiting ages for a df to return isn't ideal... It'll get slower as the fs grows. In GFS2 df will have no delay at all regardless of fs size -- tradeoff is that it's "fuzzy", not perfectly accurate. Dave From jlbeti at dsic.upv.es Wed May 10 16:33:25 2006 From: jlbeti at dsic.upv.es (Jose Luis Beti) Date: Wed, 10 May 2006 18:33:25 +0200 Subject: [Linux-cluster] Are raw partitions needed in RHCS4? Message-ID: <1147278805.2595.46.camel@superlopez.dsic.upv.es> Hi all, RHCS4 manual talks about crating 2 raw partitions if we are using shared storage, but after creating them, they are not used any more. Anyone could tell me if it's necessary to create raw partitions? Thanks in advanced. Sorry if the question has been answer before. Jose Luis. -- ------------------------------------------------ Jose Luis Beti Departament de Sistemes Informatics i Computacio Universitat Politecnica de Valencia Telefon: 963877355 Extensio: 73553 From cjk at techma.com Wed May 10 17:36:59 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Wed, 10 May 2006 13:36:59 -0400 Subject: [Linux-cluster] Are raw partitions needed in RHCS4? Message-ID: RHCS4? or RHCS3? RHCS3 uses them, not 4. If it's in the docs for 4 it's a mistake. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jose Luis Beti Sent: Wednesday, May 10, 2006 12:33 PM To: linux clustering Subject: [Linux-cluster] Are raw partitions needed in RHCS4? Hi all, RHCS4 manual talks about crating 2 raw partitions if we are using shared storage, but after creating them, they are not used any more. Anyone could tell me if it's necessary to create raw partitions? Thanks in advanced. Sorry if the question has been answer before. 
Jose Luis. -- ------------------------------------------------ Jose Luis Beti Departament de Sistemes Informatics i Computacio Universitat Politecnica de Valencia Telefon: 963877355 Extensio: 73553 -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jlbeti at dsic.upv.es Wed May 10 18:45:33 2006 From: jlbeti at dsic.upv.es (Jose Luis Beti) Date: Wed, 10 May 2006 20:45:33 +0200 Subject: [Linux-cluster] Are raw partitions needed in RHCS4? In-Reply-To: References: Message-ID: <1147286733.2595.59.camel@superlopez.dsic.upv.es> I was talking about RHCS4. Thanks again. El mi?, 10-05-2006 a las 13:36 -0400, Kovacs, Corey J. escribi?: -- ------------------------------------------------ Jose Luis Beti Departament de Sistemes Informatics i Computacio Universitat Politecnica de Valencia Telefon: 963877355 Extensio: 73553 From gstaltari at arnet.net.ar Wed May 10 19:12:28 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 10 May 2006 16:12:28 -0300 Subject: [Linux-cluster] lot of scsi devices bug Message-ID: <44623B1C.2040108@arnet.net.ar> Hi, this is maybe a udev bug, but it affected me when I was creating a lv in a cluster, so it could help some with this configuration. When I added some scsi disk (SAN) to the cluster nodes (more than 64 SCSI devices), udev created the device node for capi20 instead of sdbm. This produced a bad behavior in lvm when I was trying to create the vg's and lv's, it started to give errors like: Error locking on node node-06: Internal lvm error, check syslog Error locking on node node-05: Internal lvm error, check syslog Error locking on node node-04: Internal lvm error, check syslog Error locking on node node-01: Internal lvm error, check syslog Error locking on node node-02: Internal lvm error, check syslog Error locking on node node-03: Internal lvm error, check syslog Failed to activate new LV. When I commented out this lines SYSFS{dev}="68:0", NAME="capi20" SYSFS{dev}="191:[0-9]*", NAME="capi/%n" KERNEL=="capi*", MODE="0660" in /etc/udev/rules.d/50-udev.rules, everything worked again. I hope this could help, German Staltari From gstaltari at arnet.net.ar Wed May 10 20:54:32 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 10 May 2006 17:54:32 -0300 Subject: [Linux-cluster] lot of scsi devices bug In-Reply-To: <44623B1C.2040108@arnet.net.ar> References: <44623B1C.2040108@arnet.net.ar> Message-ID: <44625308.5010402@arnet.net.ar> German Staltari wrote: > Hi, this is maybe a udev bug, but it affected me when I was creating a > lv in a cluster, so it could help some with this configuration. > When I added some scsi disk (SAN) to the cluster nodes (more than 64 > SCSI devices), udev created the device node for capi20 instead of > sdbm. This produced a bad behavior in lvm when I was trying to create > the vg's and lv's, it started to give errors like: > > Error locking on node node-06: Internal lvm error, check syslog > Error locking on node node-05: Internal lvm error, check syslog > Error locking on node node-04: Internal lvm error, check syslog > Error locking on node node-01: Internal lvm error, check syslog > Error locking on node node-02: Internal lvm error, check syslog > Error locking on node node-03: Internal lvm error, check syslog > Failed to activate new LV. > > When I commented out this lines > > SYSFS{dev}="68:0", NAME="capi20" > SYSFS{dev}="191:[0-9]*", NAME="capi/%n" > KERNEL=="capi*", MODE="0660" > > in /etc/udev/rules.d/50-udev.rules, everything worked again. 
> > I hope this could help, > German Staltari > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > Forgot to add: FC4 system, totally updated. From jason at monsterjam.org Thu May 11 00:53:21 2006 From: jason at monsterjam.org (Jason) Date: Wed, 10 May 2006 20:53:21 -0400 Subject: [Linux-cluster] question about creating partitions and gfs In-Reply-To: References: Message-ID: <20060511005321.GA45370@monsterjam.org> ummm I was thinking that was the answer too, but I have no idea what the "pool" device is.. how can I tell? Jason On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote: > Jason, I just realized what the problem is. You need to apply the config to a > "pool" > not a normal device. What do your pooll definitions look like? The one you > created > for the config is where you need to point ccs_tool at to activate the > config... > > > Corey > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. > Sent: Wednesday, May 10, 2006 8:31 AM > To: linux clustering > Subject: RE: [Linux-cluster] question about creating partitions and gfs > > Jason, couple of questions.... (And I assume you are working with > RHEL3+GFS6.0x) > > > 1. Are you actually using raw devices? if so, why? > 2. Does the device /dev/raw/raw64 actually exist on tf2? > > > GFS does not use raw devices for anything. The standard Redhat Cluster suite > does, but not GFS. GFS uses "storage pools". Also, if memory servs me right, > later versions of GFS for RHEL3 need to be told what pools to use in the > "/etc/sysconfig/gfs" config file. Used to be that GFS just did a scan and > "found" the pools, but no longer I believe. > > Hope this helps. If not, can you give more details about your config? > > > > Corey > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > Sent: Tuesday, May 09, 2006 8:23 PM > To: linux clustering > Subject: Re: [Linux-cluster] question about creating partitions and gfs > > yes, both boxes are connected to the storage, its a dell powervault 220S > configured for cluster mode. > > [root at tf1 cluster]# fdisk -l /dev/sdb > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 > = 8225280 bytes > > Device Boot Start End Blocks Id System > /dev/sdb1 1 2433 19543041 83 Linux > [root at tf1 cluster]# > > [root at tf2 cluster]# fdisk -l /dev/sdb > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of 16065 * 512 > = 8225280 bytes > > Device Boot Start End Blocks Id System > /dev/sdb1 1 2433 19543041 83 Linux > [root at tf2 cluster]# > > > so both sides see the storage. > > on tf1, I can start ccsd fine, but on tf2, I cant, and I see May 8 22:00:21 > tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or > address May 8 22:00:21 tf2 ccsd: startup failed May 9 20:17:21 tf2 ccsd: > Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 > 20:17:21 tf2 ccsd: startup failed May 9 20:17:30 tf2 ccsd: Unable to open > /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:30 tf2 > ccsd: startup failed > [root at tf2 cluster]# > > in the logs > > Jason > > > > > On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. 
wrote: > > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. > > Do you have a shared storage device? If /dev/sdb1 is not a shared > > device, then I think you might need to take a step back and get a hold > > of a SAN of some type. If you are just playing around, there are ways > > to get some firewire drives to accept > > > > two hosts and act like a cheap shared devices. There are docs on the > > Oracle site documenting the process of setting up the drive and the > > kernel. Note, that you'll only be able to use two nodes using the > > firewire idea. > > > > Also, you should specify a partition for the command below. That > > partition can be very small. Something on the order of 10MB sounds > > right. Even that is probably way too big. Then use the rest for GFS > > storage pools. > > > > > > Corey > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > > Sent: Monday, May 08, 2006 9:32 PM > > To: linux-cluster at redhat.com > > Subject: [Linux-cluster] question about creating partitions and gfs > > > > so still following instructions at > > http://www.gyrate.org/archives/9 > > im at the part that says > > > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > > > in my config, I have the dell PERC 4/DC cards, and I believe the > > logical drive showed up as /dev/sdb > > > > so do I need to create a partition on this logical drive with fdisk > > first before I run > > > > ccs_tool create /root/cluster /dev/sdb1 > > > > or am I totally off track here? > > > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work > > fine, but doesnt seem right.. > > > > Jason > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > ================================================ > | Jason Welsh jason at monsterjam.org | > | http://monsterjam.org DSS PGP: 0x5E30CC98 | > | gpg key: http://monsterjam.org/gpg/ | > ================================================ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ From mathieu.avila at seanodes.com Thu May 11 06:44:35 2006 From: mathieu.avila at seanodes.com (Mathieu Avila) Date: Thu, 11 May 2006 08:44:35 +0200 Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file Message-ID: <4462DD53.3060602@seanodes.com> Hello, In GFS 6.0, has anybody experienced using a CCA archive on a local file instead of a shared volume or server ? More precisely, if using this method correctly by managing the consistency of the configuration file over all nodes, is there any greater risk of data corruption than with a shared volume archive or a server ? 
I am in the special case where i don't want to manage another shared volume, and this option, although less documented, seems better to me. Documentation only tells that it is "less recommanded". Thanks in advance, -- Mathieu From carlopmart at gmail.com Thu May 11 08:04:35 2006 From: carlopmart at gmail.com (carlopmart) Date: Thu, 11 May 2006 10:04:35 +0200 Subject: [Linux-cluster] Postgresql under RHCS4 Message-ID: <4462F013.40201@gmail.com> Hi all, Somebody have tried to setup two nodes with Postgresql under RHCS4?. Is it possible to do this without shared storage like mysql cluster feature does? Thank you very much. -- CL Martinez carlopmart {at} gmail {d0t} com From devrim at gunduz.org Thu May 11 08:18:58 2006 From: devrim at gunduz.org (Devrim GUNDUZ) Date: Thu, 11 May 2006 11:18:58 +0300 (EEST) Subject: [Linux-cluster] Postgresql under RHCS4 In-Reply-To: <4462F013.40201@gmail.com> References: <4462F013.40201@gmail.com> Message-ID: Hi, On Thu, 11 May 2006, carlopmart wrote: > Somebody have tried to setup two nodes with Postgresql under RHCS4?. Is it > possible to do this without shared storage like mysql cluster feature does? If you want to run an active/passive cluster, then go on with RHCS+ext{2,3}(or GFS). PostgreSQL cannot run on active/active cluster systems, natively. However, you might give PgCluster a try. Even though it is not the best way, it is worth trying. Slony-II will be implementing multimaster replication feature, but it is still under development. BTW, GFS is not a prerequisite, if you run an active/passive cluster; however I used it, in order to prevent a spof. Regards, -- Devrim GUNDUZ devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org From nattaponv at hotmail.com Thu May 11 08:21:43 2006 From: nattaponv at hotmail.com (nattapon viroonsri) Date: Thu, 11 May 2006 08:21:43 +0000 Subject: [Linux-cluster] fence_manual problem Message-ID: I use rhcs 4 on rhel 4.0 I have setup 2 node cluster use manual fenceing node name = cluster1 , cluster2 It failover completely if i stop service for each nod But when i try to disconnected cable from cluster1 both node try to fence each other and have following log: May 11 15:50:26 cluster2 fenced[2183]: fencing node "cluster1" May 11 15:50:26 cluster2 fenced[2183]: agent "fence_manual" reports: failed: fence_manual no node name May 11 15:50:26 cluster2 fenced[2183]: fence "cluster1" failed I try to run "fence_ack_manual -n node1" but it's out put show that have no file "/tmp/fence_manual.fifo" so i create "/tmp/fence_manual.fifo" manually and re run fence_ack_manual it show "done" but in the logfile still the same as if no thing happen and service still not failover Nattapon, Regard _________________________________________________________________ Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ From nattaponv at hotmail.com Thu May 11 08:46:53 2006 From: nattaponv at hotmail.com (nattapon viroonsri) Date: Thu, 11 May 2006 08:46:53 +0000 Subject: [Linux-cluster] manual_fence problem Message-ID: I use rhcs 4 on rhel 4.0 I have setup 2 node cluster use manual fenceing node name = cluster1 , cluster2 It failover completely if i stop service for each nod But when i try to disconnected cable from cluster1 both node try to fence each other and have following log: May 11 15:50:26 cluster2 fenced[2183]: fencing node "cluster1" May 11 15:50:26 cluster2 fenced[2183]: agent "fence_manual" reports: failed: fence_manual no node name May 11 15:50:26 cluster2 fenced[2183]: fence "cluster1" failed >From system-config-cluster menu it have no parameter to specify node name for manual fencing but in command line can. so I try to run "fence_ack_manual -n node1" but it's out put show that have no file "/tmp/fence_manual.fifo" after i create "/tmp/fence_manual.fifo" manually and re run fence_ack_manual it show "done" but in the logfile still the same as if no thing happen and service still not failover Nattapon, Regard _________________________________________________________________ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ From cjk at techma.com Thu May 11 11:16:14 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Thu, 11 May 2006 07:16:14 -0400 Subject: [Linux-cluster] question about creating partitions and gfs In-Reply-To: <20060511005321.GA45370@monsterjam.org> Message-ID: Jason, the docs should run through the creation of the pool devices. They can be a bit of a labrynth though, so here is an example called "pool_cca.cfg". <----cut here----> poolname pool_cca #name of the pool/volume to create subpools 1 #how many subpools make up this pool/volume (always starts as 1) subpool 0 128 1 gfs_data #first subpool, zero indexed, 128k stripe, 1 devices pooldevice 0 0 /dev/sdb1 #physical device for pool 0, device 0 (again, zero indexed) <-end cut here --> Additional pools just need a different "poolname" and "pooldevice". NOTE, the cluster nodes need to be "seeing" the devices listed as pooldevices the same way. node1 sees the second physical disk as /dev/sdb, then third as /dev/sdc and so on. Now, if you make /dev/sdb1 about 10MB, you'll have enough space to create a cluster config pool. Then to actually use it, you need to do the following... pool_tool -c pool_cca.cfg then you can issue ... service pool start on all nodes. Just make sure all nodes have a clean view of the partition table (reboot, or issue partprobe). Once you have the cca pool created and activated, you can apply the cluster config to it... ccs_tool create /path/to/configs/ /dev/pool/pool_cca Then do a "service ccsd start" on all nodes followed by "service lock_gulmd start" on all nodes.. To check to see if things are working...do... gulm_tool nodelist nameofalockserver and you should see a list of your nodes and some info about each one. That's should be enough to get you started. to add storage for actual gfs filesystems, simply create more pools. you can also expand pools by adding subpools after creation. It's sort of a poor mans volume management if you will. It can be done to a running system and the filesystem on top of it can be expaned live as well. Anyway, hope this helps... 
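To round out the recipe above, here is roughly what the data side could look like once the cca pool exists. The pool name, the /dev/sdb2 partition, the two journals and the cluster name "alpha" are placeholders for illustration; only the order of operations follows the steps listed above.

<----cut here---->
poolname pool_gfs01              #data pool for the first GFS filesystem
subpools 1
subpool 0 128 1 gfs_data         #128k stripe, 1 device
pooldevice 0 0 /dev/sdb2         #the big partition left over after the cca slice
<-end cut here -->

pool_tool -c pool_gfs01.cfg      # label the pool, run once from one node
service pool start               # activate pools on every node
gfs_mkfs -p lock_gulm -t alpha:gfs01 -j 2 /dev/pool/pool_gfs01
                                 # "alpha" must match the cluster name in your CCS files,
                                 # and use one journal per node that will mount it
mount -t gfs /dev/pool/pool_gfs01 /mnt/gfs01
                                 # ccsd and lock_gulmd must already be running on the node

Growing later works the same way described above: add a subpool to the pool config and expand the filesystem on top of it; see pool_tool(8) and gfs_grow(8) for the exact invocations.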
Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason Sent: Wednesday, May 10, 2006 8:53 PM To: linux clustering Subject: Re: [Linux-cluster] question about creating partitions and gfs ummm I was thinking that was the answer too, but I have no idea what the "pool" device is.. how can I tell? Jason On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote: > Jason, I just realized what the problem is. You need to apply the > config to a "pool" > not a normal device. What do your pooll definitions look like? The > one you created for the config is where you need to point ccs_tool at > to activate the config... > > > Corey > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. > Sent: Wednesday, May 10, 2006 8:31 AM > To: linux clustering > Subject: RE: [Linux-cluster] question about creating partitions and > gfs > > Jason, couple of questions.... (And I assume you are working with > RHEL3+GFS6.0x) > > > 1. Are you actually using raw devices? if so, why? > 2. Does the device /dev/raw/raw64 actually exist on tf2? > > > GFS does not use raw devices for anything. The standard Redhat Cluster > suite does, but not GFS. GFS uses "storage pools". Also, if memory > servs me right, later versions of GFS for RHEL3 need to be told what > pools to use in the "/etc/sysconfig/gfs" config file. Used to be that > GFS just did a scan and "found" the pools, but no longer I believe. > > Hope this helps. If not, can you give more details about your config? > > > > Corey > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > Sent: Tuesday, May 09, 2006 8:23 PM > To: linux clustering > Subject: Re: [Linux-cluster] question about creating partitions and > gfs > > yes, both boxes are connected to the storage, its a dell powervault > 220S configured for cluster mode. > > [root at tf1 cluster]# fdisk -l /dev/sdb > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of > 16065 * 512 = 8225280 bytes > > Device Boot Start End Blocks Id System > /dev/sdb1 1 2433 19543041 83 Linux > [root at tf1 cluster]# > > [root at tf2 cluster]# fdisk -l /dev/sdb > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of > 16065 * 512 = 8225280 bytes > > Device Boot Start End Blocks Id System > /dev/sdb1 1 2433 19543041 83 Linux > [root at tf2 cluster]# > > > so both sides see the storage. > > on tf1, I can start ccsd fine, but on tf2, I cant, and I see May 8 > 22:00:21 > tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or > address May 8 22:00:21 tf2 ccsd: startup failed May 9 20:17:21 tf2 ccsd: > Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address > May 9 > 20:17:21 tf2 ccsd: startup failed May 9 20:17:30 tf2 ccsd: Unable to > open > /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:30 > tf2 > ccsd: startup failed > [root at tf2 cluster]# > > in the logs > > Jason > > > > > On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote: > > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. > > Do you have a shared storage device? If /dev/sdb1 is not a shared > > device, then I think you might need to take a step back and get a > > hold of a SAN of some type. 
If you are just playing around, there > > are ways to get some firewire drives to accept > > > > two hosts and act like a cheap shared devices. There are docs on the > > Oracle site documenting the process of setting up the drive and the > > kernel. Note, that you'll only be able to use two nodes using the > > firewire idea. > > > > Also, you should specify a partition for the command below. That > > partition can be very small. Something on the order of 10MB sounds > > right. Even that is probably way too big. Then use the rest for GFS > > storage pools. > > > > > > Corey > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > > Sent: Monday, May 08, 2006 9:32 PM > > To: linux-cluster at redhat.com > > Subject: [Linux-cluster] question about creating partitions and gfs > > > > so still following instructions at > > http://www.gyrate.org/archives/9 > > im at the part that says > > > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > > > in my config, I have the dell PERC 4/DC cards, and I believe the > > logical drive showed up as /dev/sdb > > > > so do I need to create a partition on this logical drive with fdisk > > first before I run > > > > ccs_tool create /root/cluster /dev/sdb1 > > > > or am I totally off track here? > > > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work > > fine, but doesnt seem right.. > > > > Jason > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > ================================================ > | Jason Welsh jason at monsterjam.org | > | http://monsterjam.org DSS PGP: 0x5E30CC98 | > | gpg key: http://monsterjam.org/gpg/ | > ================================================ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cjk at techma.com Thu May 11 11:20:12 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Thu, 11 May 2006 07:20:12 -0400 Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file In-Reply-To: <4462DD53.3060602@seanodes.com> Message-ID: I've used it (in testing) and I think the main reasons it's "less recommended" is that it's easier to keep a single copy consistant on shared storage as it's automatic. However the method you mention must be kept consistant manually. I don't think there is any greater risk for data loss other than that. Others may know more tho... 
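To make the "kept consistent manually" part concrete, a small sketch, assuming the archive lives in a local file such as /etc/gfs/cca and that the other nodes are node2 and node3 — the path and node names are invented for the example, and the exact local-file setup should be checked against the GFS 6.0 docs:

for node in node2 node3; do
    scp /etc/gfs/cca root@$node:/etc/gfs/cca    # push the identical archive everywhere
done
md5sum /etc/gfs/cca                             # compare this sum on every node before
                                                # restarting ccsd anywhere

However the archive is produced, the requirement is simply that every node reads byte-identical configuration; the shared-volume method gets that for free, which is presumably why the docs recommend it.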
Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Mathieu Avila Sent: Thursday, May 11, 2006 2:45 AM To: linux clustering Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file Hello, In GFS 6.0, has anybody experienced using a CCA archive on a local file instead of a shared volume or server ? More precisely, if using this method correctly by managing the consistency of the configuration file over all nodes, is there any greater risk of data corruption than with a shared volume archive or a server ? I am in the special case where i don't want to manage another shared volume, and this option, although less documented, seems better to me. Documentation only tells that it is "less recommanded". Thanks in advance, -- Mathieu -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From stephen.willey at framestore-cfc.com Thu May 11 14:29:24 2006 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Thu, 11 May 2006 15:29:24 +0100 Subject: [Linux-cluster] gfs_fsck problems (not doing get_get_meta_buffer) Message-ID: <44634A44.9060309@framestore-cfc.com> gfs_fsck seems to break my filesystem! Here's the sequence of events (everything acts as expected unless I state otherwise): pvcreate /dev/sda; pvcreate /dev/sdb vgcreate gfs_vg /dev/sda /dev/sdb vgdisplay lvcreate -l 4171379 gfs_vg -n gfs_lv (the extents number obviously gleaned from vgdisplay) vgchange -aly gfs_mkfs -p lock_dlm -t mycluster:gfs1 -j 8 /dev/gfs_vg/gfs_lv mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 df -h /mnt/disk2 cd /mnt/disk2 touch 1 2 3 4 5 6 7 8 9 10 ls -lh cd .. umount /mnt/disk2 gfs_fsck -nvv /dev/gfs_vg/gfs_lv (output below - notice I'm running it read-only) Initializing fsck Initializing lists... Initializing special inodes... (file.c:45) readi: Offset (640) is >= the file size (640). (super.c:208) 8 journals found. ATTENTION -- not doing gfs_get_meta_buffer... mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 cd /mnt/disk2 (successful) ls -lh (successful) cd .. umount /mnt/disk2 gfs_fsck -vv /dev/gfs_vg/gfs_lv (output below) Initializing fsck Initializing lists... (bio.c:140) Writing to 65536 - 16 4096 Initializing special inodes... (file.c:45) readi: Offset (640) is >= the file size (640). (super.c:208) 8 journals found. ATTENTION -- not doing gfs_get_meta_buffer... mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 (output below) mount: No such file or directory The syslog shows: Lock_Harness 2.6.9-34.R5.2 (built May 11 2006 14:15:58) installed May 11 15:12:43 gfstest1 kernel: GFS 2.6.9-34.R5.2 (built May 11 2006 14:16:10) installed May 11 15:12:43 gfstest1 kernel: GFS: Trying to join cluster "fsck_dlm", "mycluster:gfs1" May 11 15:12:43 gfstest1 kernel: lock_harness: can't find protocol fsck_dlm May 11 15:12:43 gfstest1 kernel: GFS: can't mount proto = fsck_dlm, table = mycluster:gfs1, hostdata = May 11 15:12:43 gfstest1 mount: mount: No such file or directory May 11 15:12:43 gfstest1 gfs: Mounting GFS filesystems: failed If I use the following to change the lock method, I can mount it again: gfs_tool sb /dev/gfs_vg/gfs_lv proto lock_dlm but shortly after I'll sometimes get I/O errors on the drive not letting me cd into it or ls or df. fsck isn't supposed to break clean filesystems so does anyone have any ideas? FYI - The other machines in the cluster were at no point mounting the filesystem during this exercise. 
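One extra check that fits the workaround already described above: the fsck_dlm value in the mount error suggests gfs_fsck marks the superblock's locking protocol while it runs and did not put it back. With the filesystem unmounted on every node, the field can be read and, if necessary, restored (the device path is the one from this report):

gfs_tool sb /dev/gfs_vg/gfs_lv proto              # print the protocol currently recorded
gfs_tool sb /dev/gfs_vg/gfs_lv proto lock_dlm     # put it back if it still reads fsck_dlm

That only fixes the stale superblock field, of course, not whatever caused the later I/O errors.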
Stephen From lhh at redhat.com Thu May 11 14:29:39 2006 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 11 May 2006 10:29:39 -0400 Subject: [Linux-cluster] GFS 6.0 - ccsd configuration file In-Reply-To: References: Message-ID: <1147357779.11396.115.camel@ayanami.boston.redhat.com> On Thu, 2006-05-11 at 07:20 -0400, Kovacs, Corey J. wrote: > I've used it (in testing) and I think the main reasons it's "less > recommended" > is that it's easier to keep a single copy consistant on shared storage as > it's > automatic. However the method you mention must be kept consistant manually. I > don't think there is any greater risk for data loss other than that. Others > may know more tho... That's correct. -- Lon From lhh at redhat.com Thu May 11 14:31:53 2006 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 11 May 2006 10:31:53 -0400 Subject: [Linux-cluster] fence_manual problem In-Reply-To: References: Message-ID: <1147357913.11396.118.camel@ayanami.boston.redhat.com> On Thu, 2006-05-11 at 08:21 +0000, nattapon viroonsri wrote: > I use rhcs 4 on rhel 4.0 > I have setup 2 node cluster use manual fenceing > node name = cluster1 , cluster2 > It failover completely if i stop service for each nod > But when i try to disconnected cable from cluster1 both node try to fence > each other and > have following log: > > May 11 15:50:26 cluster2 fenced[2183]: fencing node "cluster1" > May 11 15:50:26 cluster2 fenced[2183]: agent "fence_manual" reports: failed: > fence_manual no node name > May 11 15:50:26 cluster2 fenced[2183]: fence "cluster1" failed > > I try to run "fence_ack_manual -n node1" but it's out put show that have > no file "/tmp/fence_manual.fifo" > so i create "/tmp/fence_manual.fifo" manually and re run fence_ack_manual > it show "done" > > but in the logfile still the same as if no thing happen and service still > not failover I think the UI is supposed to provide a nodename="name_of_node" in the fence device reference under the given node, but doesn't. I also thought it was fixed recently *scratches head*... -- Lon From roman.tobjasz at 7bulls.com Thu May 11 15:00:34 2006 From: roman.tobjasz at 7bulls.com (Roman Tobjasz) Date: Thu, 11 May 2006 17:00:34 +0200 Subject: [Linux-cluster] IP resource Message-ID: <20060511150034.GA32599@warszawa.7bulls.com> I configured two node cluster. On each node I created bonding device (bond0) as primary network interface. On the 1st node bond0 I assigned IP address 192.168.1.100 (network 192.168.1.0, netmask 255.255.255.0). On the 2nd node bond0 I assigned IP address 192.168.1.101 (network and netmask like above). Next I created IP address 172.16.10.10 as a resource and added it to a service. Service doesn't start. If I change resource IP to 192.168.1.200 then service starts corectly. Is it possible to set up resource IP which isn't from this same network as primary network interface ? Best regards. From cjk at techma.com Thu May 11 16:09:25 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Thu, 11 May 2006 12:09:25 -0400 Subject: [Linux-cluster] cman/dlm errors Message-ID: There seem to be lot bugs filed against cman and dlm in bugzilla post update 3 and I believe I am seeing some of the same problems. In particular, after some time running, if I manually kick one of the nodes in any way (fence, pull the cord, whatever) it ends up taking one of the remaining nodes with it due to a problem in the membership routines of cman. 
This is also a major pain since this will happen due to what appear to be dlm issues and a node will fall on it's face anwyay, then bring down another member. We are exporting 6 gfs filesystems via nfs which is where the dlm problem seems to stem from. So, I have two questions for the redhat cluster folks.... 1. When is the errata scheduled to come out that covers the latest round of bugs? 2. When the next version of GFS is released, will the new architecture replace the current one for RHEL4 or will it be a RHEL5 only version? I believe the former is true, but I'd like to here it from the redhat folks. >From what I've read lately from all the "features" documents flying around (Thanks for those by the way) things look much better on the horizon then they are right now. Regards Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason at monsterjam.org Fri May 12 01:51:49 2006 From: jason at monsterjam.org (Jason) Date: Thu, 11 May 2006 21:51:49 -0400 Subject: [Linux-cluster] question about creating partitions and gfs In-Reply-To: References: <20060511005321.GA45370@monsterjam.org> Message-ID: <20060512015149.GB64851@monsterjam.org> ok, so reading the docs and your example, they reference /dev/sdb1 this is still the 10 meg partition that i create with fdisk.. right? then what about the rest of the disk? do I need to reference it as a pooldevice as well? i.e. /dev/sdb1 <-10 meg partition /dev/sdb2 <--- rest of logical disk ?? Jason On Thu, May 11, 2006 at 07:16:14AM -0400, Kovacs, Corey J. wrote: > Jason, the docs should run through the creation of the pool devices. They can > be > a bit of a labrynth though, so here is an example called "pool_cca.cfg". > > > <----cut here----> > poolname pool_cca #name of the pool/volume to create > subpools 1 #how many subpools make up this > pool/volume (always starts as 1) > subpool 0 128 1 gfs_data #first subpool, zero indexed, 128k stripe, 1 > devices > pooldevice 0 0 /dev/sdb1 #physical device for pool 0, device 0 (again, > zero indexed) > <-end cut here --> > > Additional pools just need a different "poolname" and "pooldevice". > > NOTE, the cluster nodes need to be "seeing" the devices listed as pooldevices > the same > way. node1 sees the second physical disk as /dev/sdb, then third as /dev/sdc > and so on. > > > Now, if you make /dev/sdb1 about 10MB, you'll have enough space to create a > cluster > config pool. Then to actually use it, you need to do the following... > > pool_tool -c pool_cca.cfg > > then you can issue ... > > service pool start > > on all nodes. Just make sure all nodes have a clean view of the partition > table (reboot, or issue partprobe). > > Once you have the cca pool created and activated, you can apply the cluster > config > to it... > > ccs_tool create /path/to/configs/ /dev/pool/pool_cca > > Then do a "service ccsd start" on all nodes followed by "service lock_gulmd > start" > on all nodes.. > > To check to see if things are working...do... > > gulm_tool nodelist nameofalockserver > > and you should see a list of your nodes and some info about each one. > > That's should be enough to get you started. to add storage for actual gfs > filesystems, simply > create more pools. you can also expand pools by adding subpools after > creation. It's sort of > a poor mans volume management if you will. It can be done to a running system > and the filesystem > on top of it can be expaned live as well. > > > Anyway, hope this helps... 
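to put my question above in config terms: is the idea that the rest of the disk just gets a second file along the same lines, something like this (the pool name is only my guess)

  poolname pool_gfs01           # data pool for the actual GFS filesystem
  subpools 1
  subpool 0 128 1 gfs_data      # one subpool, 128k stripe, 1 device
  pooldevice 0 0 /dev/sdb2      # the big partition

and then I run pool_tool -c on that file and point gfs_mkfs at /dev/pool/pool_gfs01?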
> > > Corey > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > Sent: Wednesday, May 10, 2006 8:53 PM > To: linux clustering > Subject: Re: [Linux-cluster] question about creating partitions and gfs > > ummm I was thinking that was the answer too, but I have no idea what the > "pool" device is.. > how can I tell? > > Jason > > > On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote: > > Jason, I just realized what the problem is. You need to apply the > > config to a "pool" > > not a normal device. What do your pooll definitions look like? The > > one you created for the config is where you need to point ccs_tool at > > to activate the config... > > > > > > Corey > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. > > Sent: Wednesday, May 10, 2006 8:31 AM > > To: linux clustering > > Subject: RE: [Linux-cluster] question about creating partitions and > > gfs > > > > Jason, couple of questions.... (And I assume you are working with > > RHEL3+GFS6.0x) > > > > > > 1. Are you actually using raw devices? if so, why? > > 2. Does the device /dev/raw/raw64 actually exist on tf2? > > > > > > GFS does not use raw devices for anything. The standard Redhat Cluster > > suite does, but not GFS. GFS uses "storage pools". Also, if memory > > servs me right, later versions of GFS for RHEL3 need to be told what > > pools to use in the "/etc/sysconfig/gfs" config file. Used to be that > > GFS just did a scan and "found" the pools, but no longer I believe. > > > > Hope this helps. If not, can you give more details about your config? > > > > > > > > Corey > > > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > > Sent: Tuesday, May 09, 2006 8:23 PM > > To: linux clustering > > Subject: Re: [Linux-cluster] question about creating partitions and > > gfs > > > > yes, both boxes are connected to the storage, its a dell powervault > > 220S configured for cluster mode. > > > > [root at tf1 cluster]# fdisk -l /dev/sdb > > > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of > > 16065 * 512 = 8225280 bytes > > > > Device Boot Start End Blocks Id System > > /dev/sdb1 1 2433 19543041 83 Linux > > [root at tf1 cluster]# > > > > [root at tf2 cluster]# fdisk -l /dev/sdb > > > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of > > 16065 * 512 = 8225280 bytes > > > > Device Boot Start End Blocks Id System > > /dev/sdb1 1 2433 19543041 83 Linux > > [root at tf2 cluster]# > > > > > > so both sides see the storage. > > > > on tf1, I can start ccsd fine, but on tf2, I cant, and I see May 8 > > 22:00:21 > > tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or > > address May 8 22:00:21 tf2 ccsd: startup failed May 9 20:17:21 tf2 ccsd: > > Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address > > May 9 > > 20:17:21 tf2 ccsd: startup failed May 9 20:17:30 tf2 ccsd: Unable to > > open > > /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 20:17:30 > > tf2 > > ccsd: startup failed > > [root at tf2 cluster]# > > > > in the logs > > > > Jason > > > > > > > > > > On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. 
wrote: > > > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. > > > Do you have a shared storage device? If /dev/sdb1 is not a shared > > > device, then I think you might need to take a step back and get a > > > hold of a SAN of some type. If you are just playing around, there > > > are ways to get some firewire drives to accept > > > > > > two hosts and act like a cheap shared devices. There are docs on the > > > Oracle site documenting the process of setting up the drive and the > > > kernel. Note, that you'll only be able to use two nodes using the > > > firewire idea. > > > > > > Also, you should specify a partition for the command below. That > > > partition can be very small. Something on the order of 10MB sounds > > > right. Even that is probably way too big. Then use the rest for GFS > > > storage pools. > > > > > > > > > Corey > > > > > > -----Original Message----- > > > From: linux-cluster-bounces at redhat.com > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > > > Sent: Monday, May 08, 2006 9:32 PM > > > To: linux-cluster at redhat.com > > > Subject: [Linux-cluster] question about creating partitions and gfs > > > > > > so still following instructions at > > > http://www.gyrate.org/archives/9 > > > im at the part that says > > > > > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > > > > > in my config, I have the dell PERC 4/DC cards, and I believe the > > > logical drive showed up as /dev/sdb > > > > > > so do I need to create a partition on this logical drive with fdisk > > > first before I run > > > > > > ccs_tool create /root/cluster /dev/sdb1 > > > > > > or am I totally off track here? > > > > > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work > > > fine, but doesnt seem right.. 
> > > > > > Jason > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > ================================================ > > | Jason Welsh jason at monsterjam.org | > > | http://monsterjam.org DSS PGP: 0x5E30CC98 | > > | gpg key: http://monsterjam.org/gpg/ | > > ================================================ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > ================================================ > | Jason Welsh jason at monsterjam.org | > | http://monsterjam.org DSS PGP: 0x5E30CC98 | > | gpg key: http://monsterjam.org/gpg/ | > ================================================ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ From nitin.prakash at qsoftindia.com Fri May 12 02:43:01 2006 From: nitin.prakash at qsoftindia.com (Nitin) Date: Fri, 12 May 2006 08:13:01 +0530 Subject: [Linux-cluster] Re: Redhat Cluster Message-ID: <1147401781.4033.4.camel@localhost.localdomain> Dear All, I installed redhat cluster suite on 2 node cluster, i configured NFS service by using NFS druid. But we i am going to start the service buy clicking start cluster locally only one node it is showing started but when i go to other node and start the cluster both nodes are restarted. When i start cluster buy issuing command clumanager in both the node again the nodes are restarting. Please tell the solution for this problem. Regards Nitin. P From cjk at techma.com Fri May 12 11:21:44 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Fri, 12 May 2006 07:21:44 -0400 Subject: [Linux-cluster] question about creating partitions and gfs In-Reply-To: <20060512015149.GB64851@monsterjam.org> Message-ID: Yes, you need to create pool devices for all things gfs, the first of which is the cluster configuration archive. You'll need to make more for actual GFS filesystems you want to create. You can think of pools as cluster aware volumes. Just as in LVM, pools relate to volumes which relate to "presented devices". Make sense? Good luck! Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason Sent: Thursday, May 11, 2006 9:52 PM To: linux clustering Subject: Re: [Linux-cluster] question about creating partitions and gfs ok, so reading the docs and your example, they reference /dev/sdb1 this is still the 10 meg partition that i create with fdisk.. right? then what about the rest of the disk? do I need to reference it as a pooldevice as well? i.e. /dev/sdb1 <-10 meg partition /dev/sdb2 <--- rest of logical disk ?? 
Jason On Thu, May 11, 2006 at 07:16:14AM -0400, Kovacs, Corey J. wrote: > Jason, the docs should run through the creation of the pool devices. > They can be a bit of a labrynth though, so here is an example called > "pool_cca.cfg". > > > <----cut here----> > poolname pool_cca #name of the pool/volume to create > subpools 1 #how many subpools make up this > pool/volume (always starts as 1) > subpool 0 128 1 gfs_data #first subpool, zero indexed, 128k stripe, 1 > devices > pooldevice 0 0 /dev/sdb1 #physical device for pool 0, device 0 (again, > zero indexed) > <-end cut here --> > > Additional pools just need a different "poolname" and "pooldevice". > > NOTE, the cluster nodes need to be "seeing" the devices listed as > pooldevices the same way. node1 sees the second physical disk as > /dev/sdb, then third as /dev/sdc and so on. > > > Now, if you make /dev/sdb1 about 10MB, you'll have enough space to > create a cluster config pool. Then to actually use it, you need to do > the following... > > pool_tool -c pool_cca.cfg > > then you can issue ... > > service pool start > > on all nodes. Just make sure all nodes have a clean view of the > partition table (reboot, or issue partprobe). > > Once you have the cca pool created and activated, you can apply the > cluster config to it... > > ccs_tool create /path/to/configs/ /dev/pool/pool_cca > > Then do a "service ccsd start" on all nodes followed by "service > lock_gulmd start" > on all nodes.. > > To check to see if things are working...do... > > gulm_tool nodelist nameofalockserver > > and you should see a list of your nodes and some info about each one. > > That's should be enough to get you started. to add storage for actual > gfs filesystems, simply create more pools. you can also expand pools > by adding subpools after creation. It's sort of a poor mans volume > management if you will. It can be done to a running system and the > filesystem on top of it can be expaned live as well. > > > Anyway, hope this helps... > > > Corey > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > Sent: Wednesday, May 10, 2006 8:53 PM > To: linux clustering > Subject: Re: [Linux-cluster] question about creating partitions and > gfs > > ummm I was thinking that was the answer too, but I have no idea what > the "pool" device is.. > how can I tell? > > Jason > > > On Wed, May 10, 2006 at 08:33:04AM -0400, Kovacs, Corey J. wrote: > > Jason, I just realized what the problem is. You need to apply the > > config to a "pool" > > not a normal device. What do your pooll definitions look like? The > > one you created for the config is where you need to point ccs_tool > > at to activate the config... > > > > > > Corey > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. > > Sent: Wednesday, May 10, 2006 8:31 AM > > To: linux clustering > > Subject: RE: [Linux-cluster] question about creating partitions and > > gfs > > > > Jason, couple of questions.... (And I assume you are working with > > RHEL3+GFS6.0x) > > > > > > 1. Are you actually using raw devices? if so, why? > > 2. Does the device /dev/raw/raw64 actually exist on tf2? > > > > > > GFS does not use raw devices for anything. The standard Redhat > > Cluster suite does, but not GFS. GFS uses "storage pools". 
Also, if > > memory servs me right, later versions of GFS for RHEL3 need to be > > told what pools to use in the "/etc/sysconfig/gfs" config file. Used > > to be that GFS just did a scan and "found" the pools, but no longer I believe. > > > > Hope this helps. If not, can you give more details about your config? > > > > > > > > Corey > > > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > > Sent: Tuesday, May 09, 2006 8:23 PM > > To: linux clustering > > Subject: Re: [Linux-cluster] question about creating partitions and > > gfs > > > > yes, both boxes are connected to the storage, its a dell powervault > > 220S configured for cluster mode. > > > > [root at tf1 cluster]# fdisk -l /dev/sdb > > > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of > > 16065 * 512 = 8225280 bytes > > > > Device Boot Start End Blocks Id System > > /dev/sdb1 1 2433 19543041 83 Linux > > [root at tf1 cluster]# > > > > [root at tf2 cluster]# fdisk -l /dev/sdb > > > > Disk /dev/sdb: 146.5 GB, 146548981760 bytes > > 255 heads, 63 sectors/track, 17816 cylinders Units = cylinders of > > 16065 * 512 = 8225280 bytes > > > > Device Boot Start End Blocks Id System > > /dev/sdb1 1 2433 19543041 83 Linux > > [root at tf2 cluster]# > > > > > > so both sides see the storage. > > > > on tf1, I can start ccsd fine, but on tf2, I cant, and I see May 8 > > 22:00:21 > > tf2 ccsd: Unable to open /dev/sdb1 (/dev/raw/raw64): No such device > > or address May 8 22:00:21 tf2 ccsd: startup failed May 9 20:17:21 tf2 ccsd: > > Unable to open /dev/sdb1 (/dev/raw/raw64): No such device or address > > May 9 > > 20:17:21 tf2 ccsd: startup failed May 9 20:17:30 tf2 ccsd: Unable > > to open > > /dev/sdb1 (/dev/raw/raw64): No such device or address May 9 > > 20:17:30 > > tf2 > > ccsd: startup failed > > [root at tf2 cluster]# > > > > in the logs > > > > Jason > > > > > > > > > > On Tue, May 09, 2006 at 08:16:07AM -0400, Kovacs, Corey J. wrote: > > > Jason, if IIRC, the dells internal disks show up as /dev/sd* devices. > > > Do you have a shared storage device? If /dev/sdb1 is not a shared > > > device, then I think you might need to take a step back and get a > > > hold of a SAN of some type. If you are just playing around, there > > > are ways to get some firewire drives to accept > > > > > > two hosts and act like a cheap shared devices. There are docs on > > > the Oracle site documenting the process of setting up the drive > > > and the kernel. Note, that you'll only be able to use two nodes > > > using the firewire idea. > > > > > > Also, you should specify a partition for the command below. That > > > partition can be very small. Something on the order of 10MB sounds > > > right. Even that is probably way too big. Then use the rest for > > > GFS storage pools. 
> > > > > > > > > Corey > > > > > > -----Original Message----- > > > From: linux-cluster-bounces at redhat.com > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jason > > > Sent: Monday, May 08, 2006 9:32 PM > > > To: linux-cluster at redhat.com > > > Subject: [Linux-cluster] question about creating partitions and > > > gfs > > > > > > so still following instructions at > > > http://www.gyrate.org/archives/9 > > > im at the part that says > > > > > > "# ccs_tool create /root/cluster /dev/iscsi/bus0/target0/lun0/part1" > > > > > > in my config, I have the dell PERC 4/DC cards, and I believe the > > > logical drive showed up as /dev/sdb > > > > > > so do I need to create a partition on this logical drive with > > > fdisk first before I run > > > > > > ccs_tool create /root/cluster /dev/sdb1 > > > > > > or am I totally off track here? > > > > > > i did ccs_tool create /root/cluster /dev/sdb and it seemed to work > > > fine, but doesnt seem right.. > > > > > > Jason > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > ================================================ > > | Jason Welsh jason at monsterjam.org | > > | http://monsterjam.org DSS PGP: 0x5E30CC98 | > > | gpg key: http://monsterjam.org/gpg/ | > > ================================================ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > ================================================ > | Jason Welsh jason at monsterjam.org | > | http://monsterjam.org DSS PGP: 0x5E30CC98 | > | gpg key: http://monsterjam.org/gpg/ | > ================================================ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From cosimo at streppone.it Fri May 12 12:54:10 2006 From: cosimo at streppone.it (Cosimo Streppone) Date: Fri, 12 May 2006 14:54:10 +0200 Subject: [Linux-cluster] RHCS4 heartbeat configuration Message-ID: <44648572.6030105@streppone.it> I read the CS4 manual, but I can't seem to find a way to configure the heartbeat behaviour. I have a 2-nodes cluster and I'd like to set up main network interface on eth0 and heartbeat interface on eth1 (or serial port? or both?). Do I need to run the piranha_gui? I'm using a different httpd, not that shipped with RHEL, so I'm having a hard time running piranha_gui... Is it possible to manually configure it? In this case, what is the configuration file for heartbeat? I found an empty lvs.cf but I don't know what it is. Maybe I'm asking too many questions... 
If I missed the obvious, please point me to the right manual section where this is explained. Thank you. -- Cosimo From lhh at redhat.com Fri May 12 14:04:20 2006 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 12 May 2006 10:04:20 -0400 Subject: [Linux-cluster] IP resource In-Reply-To: <20060511150034.GA32599@warszawa.7bulls.com> References: <20060511150034.GA32599@warszawa.7bulls.com> Message-ID: <1147442660.11396.134.camel@ayanami.boston.redhat.com> On Thu, 2006-05-11 at 17:00 +0200, Roman Tobjasz wrote: > I configured two node cluster. > On each node I created bonding device (bond0) as primary network > interface. > On the 1st node bond0 I assigned IP address 192.168.1.100 (network > 192.168.1.0, netmask 255.255.255.0). > On the 2nd node bond0 I assigned IP address 192.168.1.101 (network and > netmask like above). > Next I created IP address 172.16.10.10 as a resource and added it to a > service. > Service doesn't start. If I change resource IP to 192.168.1.200 > then service starts corectly. > > Is it possible to set up resource IP which isn't from this same > network as primary network interface ? Not currently. The IP address selects its interface based on existing IP addresses. Ex. If you have eth0 on 192.168.0.0/16 and eth1 on 172.16.0.0/16, and add an IP 192.168.1.2, it will go on eth0. If you add IP 172.16.1.2, it will go on eth1. Why is it done this way? It's done this way because cluster nodes are not assumed to have all NICs assigned the same ways. For example, if you tell an IP to always bind to eth0, and another node has eth0 on another network (but eth1 on the correct network), the IP will be added to the wrong interface. The link will be up, but the service will be completely inaccessible to clients. The easy solution, I think, is to just add an IP to your bond0 interface which is on the subnet, even if you shut off all traffic to that interface using iptables. -- Lon From lhh at redhat.com Fri May 12 14:11:10 2006 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 12 May 2006 10:11:10 -0400 Subject: [Linux-cluster] RHCS4 heartbeat configuration In-Reply-To: <44648572.6030105@streppone.it> References: <44648572.6030105@streppone.it> Message-ID: <1147443070.11396.136.camel@ayanami.boston.redhat.com> On Fri, 2006-05-12 at 14:54 +0200, Cosimo Streppone wrote: > I read the CS4 manual, but I can't seem to find > a way to configure the heartbeat behaviour. > > I have a 2-nodes cluster and I'd like to set up > main network interface on eth0 and heartbeat > interface on eth1 (or serial port? or both?). > > Do I need to run the piranha_gui? Nope -- not specifically, but it's a whole lot easier. > I'm using a different httpd, not that shipped with RHEL, > so I'm having a hard time running piranha_gui... That isn't surprising. ;) > Is it possible to manually configure it? man 5 lvs.cf > In this case, what is the configuration file for heartbeat? For piranha, /etc/sysconfig/ha/lvs.cf -- Lon From lhh at redhat.com Fri May 12 14:11:46 2006 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 12 May 2006 10:11:46 -0400 Subject: [Linux-cluster] Re: Redhat Cluster In-Reply-To: <1147401781.4033.4.camel@localhost.localdomain> References: <1147401781.4033.4.camel@localhost.localdomain> Message-ID: <1147443106.11396.138.camel@ayanami.boston.redhat.com> On Fri, 2006-05-12 at 08:13 +0530, Nitin wrote: > I installed redhat cluster suite on 2 node cluster, i configured > NFS service by using NFS druid. 
But we i am going to start the service > buy clicking start cluster locally only one node it is showing started > but when i go to other node and start the cluster both nodes are > restarted. When i start cluster buy issuing command clumanager in both > the node again the nodes are restarting. > > Please tell the solution for this problem. What version of clumanager is it? -- Lon From vlaurenz at advance.net Fri May 12 20:11:26 2006 From: vlaurenz at advance.net (Vito Laurenza) Date: Fri, 12 May 2006 16:11:26 -0400 Subject: [Linux-cluster] CS4 & RHEL4: File System Errors Message-ID: <4464EBEE.8030802@advance.net> I'm having some issues with Cluster Suite 4 on RHEL4 u3. I have a two node cluster with shared storage on a SAN. The service is running in an Active-Passive environment hence only one node should be accessing the file system at a time. I *was* having a problem where a manual failover (clusvcadm -r ) would fail time to time because the unmounting the file system failed due to being "busy". I then added force_unmount=1 to my fs tag in cluster.conf to ensure the file system was unmounted. This seemed to solve the failover failures, however, I began to get journal errors on my shared storage. 1. Are these errors cropping up because of the forced unmount? 2. How can I ensure that a mount or unmount done by CS is clean? Thanks in advance. :::: Vito Laurenza :: Systems Administrator :: Advance Internet :: 201.793.1807 :: vlaurenz at advance.net From cosimo at streppone.it Fri May 12 21:36:56 2006 From: cosimo at streppone.it (Cosimo Streppone) Date: Fri, 12 May 2006 23:36:56 +0200 Subject: [Linux-cluster] RHCS4 heartbeat configuration In-Reply-To: <1147443070.11396.136.camel@ayanami.boston.redhat.com> References: <44648572.6030105@streppone.it> <1147443070.11396.136.camel@ayanami.boston.redhat.com> Message-ID: <4464FFF8.3010000@streppone.it> Lon Hohberger wrote: > On Fri, 2006-05-12 at 14:54 +0200, Cosimo Streppone wrote: > >>I read the CS4 manual, but I can't seem to find >>a way to configure the heartbeat behaviour. > > man 5 lvs.cf Don't know why I didn't think of firing up man... :-) Thanks for your assistance. -- Cosimo From jason at monsterjam.org Sat May 13 02:34:16 2006 From: jason at monsterjam.org (Jason) Date: Fri, 12 May 2006 22:34:16 -0400 Subject: [Linux-cluster] question about creating partitions and gfs (getting closer!) In-Reply-To: References: <20060512015149.GB64851@monsterjam.org> Message-ID: <20060513023416.GA62167@monsterjam.org> woohoo! I got it figgered out.. Ive got /dev/sdb1 (10 megs) /dev/sdb2 (rest of disk) make the pools, did the ccs_tool create , did service ccsd start did service lock_gulmd start (but had to figger out my DNS issues first ;) now im at the point where I do gfs_mkfs -p lock_gulm -t bla bla and so now im doing [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 8 /dev/pool/pool0 gfs_mkfs: Partition too small for number/size of journals [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 4 /dev/pool/pool0 gfs_mkfs: Partition too small for number/size of journals [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 2 /dev/pool/pool0 gfs_mkfs: Partition too small for number/size of journals [root at tf1 cluster]# and cant figure out why its giving me grief heres my pools config. 
poolname pool0 #name of the pool/volume to create subpools 1 #how many subpools make up this subpool 0 128 2 gfs_data #first subpool, zero indexed, 128k stripe, 1 pooldevice 0 0 /dev/sdb1 #physical device for pool 0, device 0 (again, zero indexed) pooldevice 0 1 /dev/sdb2 #physical device for pool 0, device 1 (again, zero indexed) regards, Jason From sunjw at onewaveinc.com Sat May 13 04:10:50 2006 From: sunjw at onewaveinc.com (=?GB2312?B?y++/oc6w?=) Date: Sat, 13 May 2006 12:10:50 +0800 Subject: [Linux-cluster] gfs withdrawed in function blkalloc_internal Message-ID: Hi,all I have a test cluster with 3 nodes which are nd09, nd10 and nd12. The cluster software is the newest branch of STABLE, the kernel is 2.6.15. In nd12: I have 11 process to sequentially write to the GFS without speed limit, each process will remove an oldest file after write finish of a newest file. 1 process to do 'ls' of the whole GFS. 200 thread to concurrently read 200 files which are written by the above processes. 5 process to do 'df' of the GFS with 0.5 second interval. In nd10: I have 1 process to write. 200 thread to read the same files in nd12. 1 process to do 'ls'. 5 process to do 'df'. In nd09: 200 thread to read the same files in nd12. 1 process to do 'ls'. 5 process to do 'df'. After about 10 hours of the test, gfs withdrawed in node nd10 and nd12, the messages were: <-- May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: fatal: assertion "x <= length" failed May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: function = blkalloc_internal May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: file = /home/sunjw/projects/cluster.STABLE/gfs- kernel/src/gfs/rgrp.c , line = 1458 May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: time = 1147476646 May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: about to withdraw from the cluster May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: waiting for outstanding I/O May 13 07:30:47 nd12 kernel: GFS: fsid=test:gfs-dm1.2: telling LM to withdraw May 13 07:30:49 nd12 kernel: lock_dlm: withdraw abandoned memory May 13 07:30:49 nd12 kernel: GFS: fsid=test:gfs-dm1.2: withdrawn May 13 07:30:54 nd10 kernel: GFS: fsid=test:gfs-dm1.1: jid=2: Trying to acquire journal lock... May 13 07:30:54 nd10 kernel: GFS: fsid=test:gfs-dm1.1: jid=2: Busy May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: fatal: assertion "x <= length" failed May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: function = blkalloc_internal May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: file = /home/sunjw/projects/cluster.STABLE/gfs- kernel/src/gfs/rgrp.c , line = 1458 May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: time = 1147477010 May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: about to withdraw from the cluster May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: waiting for outstanding I/O May 13 07:36:51 nd10 kernel: GFS: fsid=test:gfs-dm1.1: telling LM to withdraw May 13 07:36:54 nd10 kernel: lock_dlm: withdraw abandoned memory May 13 07:36:54 nd10 kernel: GFS: fsid=test:gfs-dm1.1: withdrawn May 13 01:20:05 nd09 kernel: dlm: gfs-dm1: process_lockqueue_reply id 62f203f3 state 0 May 13 01:41:09 nd09 kernel: dlm: gfs-dm1: process_lockqueue_reply id 6fa600de state 0 May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Trying to acquire journal lock... May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Looking at journal... May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Acquiring the transaction lock... 
May 13 07:28:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Replaying journal... May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Replayed 160 of 532 blocks May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: replays = 160, skips = 99, sames = 273 May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Journal replayed in 1s May 13 07:28:48 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=2: Done May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Trying to acquire journal lock... May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Looking at journal... May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Acquiring the transaction lock... May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Replaying journal... May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Replayed 6 of 71 blocks May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: replays = 6, skips = 4, sames = 61 May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Journal replayed in 1s May 13 07:34:47 nd09 kernel: GFS: fsid=test:gfs-dm1.0: jid=1: Done --> The clock of 3 nodes are not in synchronization. What should be the problem? Thanks for any reply, Luckey From jason at monsterjam.org Sat May 13 17:49:09 2006 From: jason at monsterjam.org (Jason) Date: Sat, 13 May 2006 13:49:09 -0400 Subject: [Linux-cluster] question about creating partitions and gfs (getting closer!) In-Reply-To: <20060513023416.GA62167@monsterjam.org> References: <20060512015149.GB64851@monsterjam.org> <20060513023416.GA62167@monsterjam.org> Message-ID: <20060513174909.GB49184@monsterjam.org> ok, figured that out too.. http://www.redhat.com/archives/linux-cluster/2005-January/msg00032.html is what helped. one last newbie question.. (i hope) I had to mount my new gfs filesystem manually with mount -t gfs /dev/pool/gfs1 /mnt/gfs/ the service gfs start did nothing.. returned a prompt seemingly without doing anything.. no errors, nothing in syslog.. nuthing.. hopefully Ill figger this out too. Jason On Fri, May 12, 2006 at 10:34:16PM -0400, Jason wrote: > woohoo! > I got it figgered out.. > Ive got > /dev/sdb1 (10 megs) > /dev/sdb2 (rest of disk) > make the pools, did the ccs_tool create , > did service ccsd start > did service lock_gulmd start (but had to figger out my DNS issues first ;) > now im at the point where I do > gfs_mkfs -p lock_gulm -t bla bla > > and so now im doing > > [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 8 /dev/pool/pool0 > gfs_mkfs: Partition too small for number/size of journals > [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 4 /dev/pool/pool0 > gfs_mkfs: Partition too small for number/size of journals > [root at tf1 cluster]# gfs_mkfs -p lock_gulm -t progressive:gfs1 -j 2 /dev/pool/pool0 > gfs_mkfs: Partition too small for number/size of journals > [root at tf1 cluster]# > > and cant figure out why its giving me grief > > heres my pools config. 
> > poolname pool0 #name of the pool/volume to create > subpools 1 #how many subpools make up this > subpool 0 128 2 gfs_data #first subpool, zero indexed, 128k stripe, 1 > pooldevice 0 0 /dev/sdb1 #physical device for pool 0, device 0 (again, zero indexed) > pooldevice 0 1 /dev/sdb2 #physical device for pool 0, device 1 (again, zero indexed) > > > regards, > Jason > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jason at monsterjam.org Sat May 13 18:03:24 2006 From: jason at monsterjam.org (Jason) Date: Sat, 13 May 2006 14:03:24 -0400 Subject: [Linux-cluster] question about creating partitions and gfs (RESOLVED) In-Reply-To: <20060513174909.GB49184@monsterjam.org> References: <20060512015149.GB64851@monsterjam.org> <20060513023416.GA62167@monsterjam.org> <20060513174909.GB49184@monsterjam.org> Message-ID: <20060513180324.GC49184@monsterjam.org> had an entry in the /etc/fstab that it didnt like.. now I have /dev/pool/gfs1 /mnt/gfs gfs noatime 0 0 in the fstab.. that look sane? Jason From jason at monsterjam.org Sat May 13 19:07:36 2006 From: jason at monsterjam.org (Jason) Date: Sat, 13 May 2006 15:07:36 -0400 Subject: [Linux-cluster] question about rebooting master server Message-ID: <20060513190736.GE49184@monsterjam.org> so I have both servers tf1, and tf2 connected to shared storage Dell 220S with 6.0.2. They both seem to access the shared storage fine, but if I reboot the node thats the master, the slave cannot access the shared storage until the master comes back up.. heres the info from the logs. (reboot of tf2, and this is the log on tf1) May 13 14:55:29 tf1 heartbeat: [5333]: info: local resource transition completed. May 13 14:55:35 tf1 kernel: lock_gulm: Checking for journals for node "tf2.localdomain" May 13 14:55:35 tf1 lock_gulmd_core[5007]: Master Node has logged out. May 13 14:55:35 tf1 kernel: lock_gulm: Checking for journals for node "tf2.localdomain" May 13 14:55:36 tf1 lock_gulmd_core[5007]: I see no Masters, So I am Arbitrating until enough Slaves talk to me. May 13 14:55:36 tf1 lock_gulmd_core[5007]: Could not send quorum update to slave tf1.localdomain May 13 14:55:36 tf1 lock_gulmd_LTPX[5014]: New Master at tf1.localdomain:192.168.1.5 May 13 14:55:57 tf1 lock_gulmd_core[5007]: Timeout (15000000) on fd:6 (tf2.localdomain:192.168.1.6) May 13 14:56:32 tf1 last message repeated 2 times May 13 14:57:40 tf1 last message repeated 4 times May 13 14:58:31 tf1 last message repeated 3 times May 13 14:58:45 tf1 lock_gulmd_core[5007]: Now have Slave quorum, going full Master. May 13 14:58:45 tf1 lock_gulmd_core[5007]: New Client: idx:2 fd:6 from (192.168.1.6:tf2.localdomain) May 13 14:58:45 tf1 lock_gulmd_LT000[5010]: New Client: idx 2 fd 7 from (192.168.1.5:tf1.localdomain) May 13 14:58:45 tf1 lock_gulmd_LTPX[5014]: Logged into LT000 at tf1.localdomain:192.168.1.5 May 13 14:58:45 tf1 lock_gulmd_LTPX[5014]: Finished resending to LT000 May 13 14:58:46 tf1 lock_gulmd_LT000[5010]: Attached slave tf2.localdomain:192.168.1.6 idx:3 fd:8 (soff:3 connected:0x8) May 13 14:58:46 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Trying to acquire journal lock... May 13 14:58:46 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Looking at journal... May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Done May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Trying to acquire journal lock... 
May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Busy May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Trying to acquire journal lock... May 13 14:58:47 tf1 kernel: GFS: fsid=progressive:gfs1.0: jid=1: Busy May 13 14:58:47 tf1 lock_gulmd_LT000[5010]: New Client: idx 4 fd 9 from (192.168.1.6:tf2.localdomain) is this normal? I would assume that when the master was rebooted, the other node should still be able to access the storage with no problems. regards, Jason From kanderso at redhat.com Sat May 13 19:37:23 2006 From: kanderso at redhat.com (Kevin Anderson) Date: Sat, 13 May 2006 14:37:23 -0500 Subject: [Linux-cluster] question about rebooting master server In-Reply-To: <20060513190736.GE49184@monsterjam.org> References: <20060513190736.GE49184@monsterjam.org> Message-ID: <1147549043.3077.3.camel@localhost.localdomain> On Sat, 2006-05-13 at 15:07 -0400, Jason wrote: > so I have both servers tf1, and tf2 connected to shared storage Dell 220S with 6.0.2. > They both seem to access the shared storage fine, but if I reboot the node thats the master, > the slave cannot access the shared storage until the master comes back up.. > heres the info from the logs. > > is this normal? I would assume that when the master was rebooted, the other node should still be able to > access the storage with no problems. > Yes it is normal. The gulm lock manager requires a minimum of 3 nodes in order to be able to determine who is master. With only two nodes running and you lose one, the remaining node has no way to determine that you are not in a split brain situation. So, the lock manager waits until quorum is restored. For a two node cluster, you need to be running the GFS 6.1 and DLM for a lock manager on a 2.6 kernel. Kevin From jason at monsterjam.org Sat May 13 19:46:33 2006 From: jason at monsterjam.org (Jason) Date: Sat, 13 May 2006 15:46:33 -0400 Subject: [Linux-cluster] question about rebooting master server In-Reply-To: <1147549043.3077.3.camel@localhost.localdomain> References: <20060513190736.GE49184@monsterjam.org> <1147549043.3077.3.camel@localhost.localdomain> Message-ID: <20060513194633.GG49184@monsterjam.org> > Yes it is normal. The gulm lock manager requires a minimum of 3 nodes > in order to be able to determine who is master. With only two nodes > running and you lose one, the remaining node has no way to determine > that you are not in a split brain situation. So, the lock manager waits > until quorum is restored. For a two node cluster, you need to be > running the GFS 6.1 and DLM for a lock manager on a 2.6 kernel. aww man that blows.. ok, so assuming I get this Red Hat Enterprise Linux AS release 3 up to the 2.6 kernel and reinstall from the srpms at ftp.redhat.com:/pub/redhat/linux/enterprise/4/en/RHGFS/i386/SRPMS does anyone forsee any problems?? I mean running the 6.1 GFS on the 2.6 kernel on a base Red Hat Enterprise Linux AS release 3 box? Jason From ookami at gmx.de Sat May 13 22:36:13 2006 From: ookami at gmx.de (Wolfgang Pauli) Date: Sat, 13 May 2006 16:36:13 -0600 Subject: [Linux-cluster] cman ignores interface setting on ipv4 Message-ID: <200605131636.13466.ookami@gmx.de> I am running the fc4 and installed the cluster tools with yum. I think my cman version is 1.0.0, is it possible that this version still ignores the interface settings. Does anybody know, where to get newer versions of the cluster software without compiling it myself? Thanks! 
> Hi, > > The current ipv4 code in the stable branch for cman completely ignores > the interface="" attribute for multicast. I've attached a minimal patch > that fixes that. > > I've only done minimal testing (ie it works here).. it will probably > break if there is no interface set, etc.. Have fun ;) > > -- > Olivier Cr?te > ocrete max-t com > Maximum Throughput Inc. > > Index: cman/cman_tool/join.c > =================================================================== > RCS file: /cvs/cluster/cluster/cman/cman_tool/join.c,v > retrieving revision 1.12.2.7.4.1 > diff -u -r1.12.2.7.4.1 join.c > --- cman/cman_tool/join.c 31 May 2005 15:08:24 -0000 > 1.12.2.7.4.1 +++ cman/cman_tool/join.c 19 Jul 2005 22:14:45 -0000 > @@ -79,6 +79,7 @@ > int ret; > int he_errno; > uint32_t bcast; > + struct ifreq ifr; > > memset(&mcast_sin, 0, sizeof(mcast_sin)); > mcast_sin.sin_family = AF_INET; > @@ -148,11 +149,14 @@ > > /* Join the multicast group */ > if (bhe) { > - struct ip_mreq mreq; > + struct ip_mreqn mreq; > char mcast_opt; > > memcpy(&mreq.imr_multiaddr, bhe->h_addr, bhe->h_length); > - memcpy(&mreq.imr_interface, he->h_addr, he->h_length); > + // memcpy(&mreq.imr_address, he->h_addr, he->h_length); > + mreq.imr_ifindex = if_nametoindex(comline->interfaces[num]); > + printf("num %d index %d if %s mcastname %s nodename %s\n", num, > mreq.imr_ifindex, comline->interfaces[num], comline->multicast_names[num], > comline->nodenames[num]); + > if (setsockopt(mcast_sock, SOL_IP, IP_ADD_MEMBERSHIP, (void > *)&mreq, sizeof(mreq))) die("Unable to join multicast group %s: %s\n", > comline->multicast_names[num], strerror(errno)); > > @@ -162,6 +166,11 @@ > > mcast_opt = 0; > setsockopt(mcast_sock, SOL_IP, IP_MULTICAST_LOOP, (void > *)&mcast_opt, sizeof(mcast_opt)); + > + if (setsockopt(mcast_sock, SOL_IP, IP_MULTICAST_IF, (void *)&mreq, > sizeof(mreq))) { + die("Unable to set multicast interface > %s\n", comline->interfaces[num]); + } > + > } > > /* Local socket */ > @@ -169,6 +178,17 @@ > if (local_sock < 0) > die("Can't open local socket: %s", strerror(errno)); > > + strcpy(ifr.ifr_name, comline->interfaces[num]); > + ifr.ifr_addr.sa_family = AF_INET; > + > + if (ioctl(local_sock, SIOCGIFADDR, &ifr ) < 0) > + die("Can't find IP ADDR for interface: %s", strerror(errno)); > + > + > + > + memcpy(&local_sin.sin_addr, &((struct sockaddr_in > *)&ifr.ifr_addr)->sin_addr, + sizeof(local_sin.sin_addr)); > + > if (bind(local_sock, (struct sockaddr *)&local_sin, > sizeof(local_sin))) die("Cannot bind local address: %s", strerror(errno)); From ookami at gmx.de Sat May 13 23:23:18 2006 From: ookami at gmx.de (Wolfgang Pauli) Date: Sat, 13 May 2006 17:23:18 -0600 Subject: [Linux-cluster] magma and rgmanager compile error Message-ID: <200605131723.18491.ookami@gmx.de> I checked out the stable cluster software from cvs and got these two compile errors: first error: magma/lib/message.c: In function ?connect_nb?: message.c:270: warning: pointer targets in passing argument 5 of ?getsockopt? differ in signedness diff message.c message.c.~1.9.2.2.~ < int ret, flags = 1, err; < unsigned int l; --- > int ret, flags = 1, err, l; 2nd error: In file included from clulog.c:49: ../../include/clulog.h:49: error: multiple storage classes in declaration specifiers clulog.c:67: error: static declaration of ?loglevel? follows non-static declaration ../../include/clulog.h:49: error: previous declaration of ?loglevel? 
was here make[2]: *** [clulog.o] Error 1 rgmanager/src/clulib: diff clulog.c clulog.c.~1.2.2.1.~ 67c67 < int loglevel = LOGLEVEL_DFLT; --- > static int loglevel = LOGLEVEL_DFLT; Cheers, wolfgang From pauli at grey.colorado.edu Sun May 14 04:59:01 2006 From: pauli at grey.colorado.edu (Wolfgang Pauli) Date: Sat, 13 May 2006 22:59:01 -0600 Subject: [Linux-cluster] multicast howto In-Reply-To: References: Message-ID: <200605132259.01964.pauli@grey.colorado.edu> OK. I still did not get it to work. But in the meantime I simplified the setup so that we only have two subnets divided only by the head node. I was hoping that somebody could give me some hints how I can get this to work, so here is a detailed description of our setup: We have a computing cluster running myrinet and all this nodes are on a private subnet, in the middle there is our headnode with two ethernet devices one connected to the myrinet guys and the other one to our labmachines. This second ethernet device is now also on the same subnet as the labmachines. I was hoping that we won't need the multicast setup anymore. Currently the headnode is exporting a nfs filesystem, but we want to switch it to gfs. I head the multihome setup from the wiki working, but dlm does not support it, so I canceled that. We can not use ethernet bonding (i guess), because than the nodes would not find the headnode anymore as it would then be on a different subnet. I thought this is like a standard setup, but it seems to be strange. I wish there was more documentation. Thanks for any hints. wolfgang From roman.tobjasz at 7bulls.com Thu May 11 06:54:57 2006 From: roman.tobjasz at 7bulls.com (Roman Tobjasz) Date: Thu, 11 May 2006 08:54:57 +0200 Subject: [Linux-cluster] IP resource Message-ID: <20060511065457.GD24151@warszawa.7bulls.com> I configured two node cluster. On each node I created bonding device (bond0) as primary network interface. On the 1st node bond0 I assigned IP address 192.168.1.100 (network 192.168.1.0, netmask 255.255.255.0). On the 2nd node bond0 I assigned IP address 192.168.1.101 (network and netmask like above). Next I created IP address 172.16.10.10 as a resource and added it to a service. Service doesn't start. If I change resource IP to 192.168.1.200 then service starts corectly. Is it possible to set up resource IP which isn't from this same network as primary network interface ? Best regards. From pcaulfie at redhat.com Mon May 15 07:45:14 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 15 May 2006 08:45:14 +0100 Subject: [Linux-cluster] cman ignores interface setting on ipv4 In-Reply-To: <200605131636.13466.ookami@gmx.de> References: <200605131636.13466.ookami@gmx.de> Message-ID: <4468318A.9030901@redhat.com> Wolfgang Pauli wrote: > I am running the fc4 and installed the cluster tools with yum. I think my cman > version is 1.0.0, is it possible that this version still ignores the > interface settings. Does anybody know, where to get newer versions of the > cluster software without compiling it myself? > Yes, 1.0.0 has that bug. The easiest way to get that version to use the interface you want is to use the host name assigned to only that interface. 
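In other words, give the interface you want cman to use a name of its own -- an /etc/hosts entry per address is enough, along these lines (addresses made up):

  192.168.0.1    node1      # public interface
  10.0.0.1       node1a     # cluster interface

cman then binds to whichever address the node name in cluster.conf resolves to.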
eg if you have two interfaces: eth0 node1 eth1 node1a then use node1a as the nodename in cluster.conf rather than node1 -- patrick From stephen.willey at framestore-cfc.com Mon May 15 10:46:22 2006 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Mon, 15 May 2006 11:46:22 +0100 Subject: [Linux-cluster] Re: gfs_fsck problems (not doing get_get_meta_buffer) In-Reply-To: <44634A44.9060309@framestore-cfc.com> References: <44634A44.9060309@framestore-cfc.com> Message-ID: <44685BFE.3010301@framestore-cfc.com> Having looked into this a bit, it appears that gfs_fsck doesn't like large drives. It works fine on a 137Gb drive but fails instantly with the below symptoms on a 10Tb RAID. Is it still the case that GFS is not scalable to very large filesystems? Stephen Stephen Willey wrote: > gfs_fsck seems to break my filesystem! > > Here's the sequence of events (everything acts as expected unless I > state otherwise): > > pvcreate /dev/sda; pvcreate /dev/sdb > vgcreate gfs_vg /dev/sda /dev/sdb > vgdisplay > lvcreate -l 4171379 gfs_vg -n gfs_lv (the extents number obviously > gleaned from vgdisplay) > vgchange -aly > gfs_mkfs -p lock_dlm -t mycluster:gfs1 -j 8 /dev/gfs_vg/gfs_lv > > mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 > df -h /mnt/disk2 > cd /mnt/disk2 > touch 1 2 3 4 5 6 7 8 9 10 > ls -lh > > cd .. > umount /mnt/disk2 > gfs_fsck -nvv /dev/gfs_vg/gfs_lv (output below - notice I'm running it > read-only) > > Initializing fsck > Initializing lists... > Initializing special inodes... > (file.c:45) readi: Offset (640) is >= the file size (640). > (super.c:208) 8 journals found. > ATTENTION -- not doing gfs_get_meta_buffer... > > mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 > cd /mnt/disk2 (successful) > ls -lh (successful) > > cd .. > umount /mnt/disk2 > gfs_fsck -vv /dev/gfs_vg/gfs_lv (output below) > > Initializing fsck > Initializing lists... > (bio.c:140) Writing to 65536 - 16 4096 > Initializing special inodes... > (file.c:45) readi: Offset (640) is >= the file size (640). > (super.c:208) 8 journals found. > ATTENTION -- not doing gfs_get_meta_buffer... > > mount -t gfs /dev/gfs_vg/gfs_lv /mnt/disk2 (output below) > > mount: No such file or directory > > The syslog shows: > > Lock_Harness 2.6.9-34.R5.2 (built May 11 2006 14:15:58) installed > May 11 15:12:43 gfstest1 kernel: GFS 2.6.9-34.R5.2 (built May 11 2006 > 14:16:10) installed > May 11 15:12:43 gfstest1 kernel: GFS: Trying to join cluster "fsck_dlm", > "mycluster:gfs1" > May 11 15:12:43 gfstest1 kernel: lock_harness: can't find protocol fsck_dlm > May 11 15:12:43 gfstest1 kernel: GFS: can't mount proto = fsck_dlm, > table = mycluster:gfs1, hostdata = > May 11 15:12:43 gfstest1 mount: mount: No such file or directory > May 11 15:12:43 gfstest1 gfs: Mounting GFS filesystems: failed > > If I use the following to change the lock method, I can mount it again: > > gfs_tool sb /dev/gfs_vg/gfs_lv proto lock_dlm > > but shortly after I'll sometimes get I/O errors on the drive not letting > me cd into it or ls or df. > > fsck isn't supposed to break clean filesystems so does anyone have any > ideas? > > FYI - The other machines in the cluster were at no point mounting the > filesystem during this exercise. 
> > Stephen > From stephen.willey at framestore-cfc.com Mon May 15 10:50:23 2006 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Mon, 15 May 2006 11:50:23 +0100 Subject: [Linux-cluster] Re: Size limits of the various components In-Reply-To: <44609694.7060609@framestore-cfc.com> References: <44609694.7060609@framestore-cfc.com> Message-ID: <44685CEF.9070600@framestore-cfc.com> I've had no replies to this but following the recent failure of gfs_fsck I'm guessing GFS still doesn't scale well. Or am I missing something? Stephen Stephen Willey wrote: > We're testing GFS on 64 bit servers/64 bit RHEL4 and need to know how > big LVM2 and GFS will scale. > > Can anyone tell me the maximum sizes of these component parts: > > GFS filesystem > (C)LVM2 logical volume > (C)LVM2 volume group > (C)LVM2 physical volumes > > We're considering building a filesystem that may need to scale to 100Tb > or more and I've found various different answers on this list and elsewhere. > > Stephen > From Jon.Stanley at savvis.net Mon May 15 13:21:08 2006 From: Jon.Stanley at savvis.net (Stanley, Jon) Date: Mon, 15 May 2006 08:21:08 -0500 Subject: [Linux-cluster] Re: Size limits of the various components Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901DE25D5@s228130hz1ew08.apptix-01.savvis.net> > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stephen Willey > Sent: Monday, May 15, 2006 6:50 AM > To: linux-cluster at redhat.com > Cc: Daire Byrne > Subject: [Linux-cluster] Re: Size limits of the various components > > I've had no replies to this but following the recent failure > of gfs_fsck > I'm guessing GFS still doesn't scale well. > > Or am I missing something? > > Stephen > All of this being said, I've found that a filesystem of any type really doesn't scale well beyond 500GB. Not a technioal limitation really, but rather one imposed by backup limitations - at the restore rates that we see (using tape), that would take over a *year* to restore a 100TB filesystem. By splitting it, you have the advantage of being able to use multiple tape drives and simultaneous restore sessions. I'm assuming that either the system is not backed up and the data is not critical, or you have some other method for restoring the filesystem should it go south???? From awone at arrow.com Mon May 15 13:11:12 2006 From: awone at arrow.com (Allen Wone) Date: Mon, 15 May 2006 13:11:12 +0000 (UTC) Subject: [Linux-cluster] Re: gfs withdrawed in function blkalloc_internal References: Message-ID: Have you gotten any resolution on this? I am having the same issue. From teigland at redhat.com Mon May 15 20:14:37 2006 From: teigland at redhat.com (David Teigland) Date: Mon, 15 May 2006 15:14:37 -0500 Subject: [Linux-cluster] Re: gfs withdrawed in function blkalloc_internal In-Reply-To: References: Message-ID: <20060515201437.GB9050@redhat.com> On Sat, May 13, 2006 at 12:10:50PM +0800, ?????? wrote: > The clock of 3 nodes are not in synchronization. > What should be the problem? I can't explain the assertion, it wouldn't be caused by the clocks. Unsynchronized clocks can slow down gfs significantly, though, by causing constant inode atime updating/locking. 
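If it does turn out to be atime traffic, the simple workarounds are to keep the node clocks in sync (ntp) and to mount with noatime, e.g.

  mount -t gfs -o noatime /dev/your_vol /mnt/point    # substitute your own device and mountpoint

or put noatime in the fstab options. There is also an atime_quantum tunable that can be raised via gfs_tool settune, but check the exact name against the gfs_tool you have installed.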
Dave From teigland at redhat.com Mon May 15 20:21:54 2006 From: teigland at redhat.com (David Teigland) Date: Mon, 15 May 2006 15:21:54 -0500 Subject: [Linux-cluster] Re: gfs_fsck problems (not doing get_get_meta_buffer) In-Reply-To: <44685BFE.3010301@framestore-cfc.com> References: <44634A44.9060309@framestore-cfc.com> <44685BFE.3010301@framestore-cfc.com> Message-ID: <20060515202154.GC9050@redhat.com> On Mon, May 15, 2006 at 11:46:22AM +0100, Stephen Willey wrote: > Having looked into this a bit, it appears that gfs_fsck doesn't like > large drives. > > It works fine on a 137Gb drive but fails instantly with the below > symptoms on a 10Tb RAID. > > Is it still the case that GFS is not scalable to very large filesystems? It's probably more a case of no one ever even trying trying fsck on a fs that large given how long it would probably take. Dave From mwill at penguincomputing.com Mon May 15 15:29:01 2006 From: mwill at penguincomputing.com (Michael Will) Date: Mon, 15 May 2006 08:29:01 -0700 Subject: [Linux-cluster] Re: gfs_fsck problems (not doingget_get_meta_buffer) Message-ID: <433093DF7AD7444DA65EFAFE3987879C125D05@jellyfish.highlyscyld.com> Is the fs code closer to ext than to say xfs? -----Original Message----- From: David Teigland [mailto:teigland at redhat.com] Sent: Mon May 15 08:23:42 2006 To: Stephen Willey Cc: Daire Byrne; linux-cluster at redhat.com Subject: Re: [Linux-cluster] Re: gfs_fsck problems (not doingget_get_meta_buffer) On Mon, May 15, 2006 at 11:46:22AM +0100, Stephen Willey wrote: > Having looked into this a bit, it appears that gfs_fsck doesn't like > large drives. > > It works fine on a 137Gb drive but fails instantly with the below > symptoms on a 10Tb RAID. > > Is it still the case that GFS is not scalable to very large filesystems? It's probably more a case of no one ever even trying trying fsck on a fs that large given how long it would probably take. Dave -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen.willey at framestore-cfc.com Mon May 15 15:33:03 2006 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Mon, 15 May 2006 16:33:03 +0100 Subject: [Linux-cluster] Re: gfs_fsck problems (not doing get_get_meta_buffer) In-Reply-To: <20060515202154.GC9050@redhat.com> References: <44634A44.9060309@framestore-cfc.com> <44685BFE.3010301@framestore-cfc.com> <20060515202154.GC9050@redhat.com> Message-ID: <44689F2F.6000002@framestore-cfc.com> I filed a bug with redhat on this and it was a duplicate of this bug: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186125 Stephen David Teigland wrote: > On Mon, May 15, 2006 at 11:46:22AM +0100, Stephen Willey wrote: >> Having looked into this a bit, it appears that gfs_fsck doesn't like >> large drives. >> >> It works fine on a 137Gb drive but fails instantly with the below >> symptoms on a 10Tb RAID. >> >> Is it still the case that GFS is not scalable to very large filesystems? > > It's probably more a case of no one ever even trying trying fsck on a fs > that large given how long it would probably take. 
> > Dave
>

From teigland at redhat.com Mon May 15 20:40:45 2006
From: teigland at redhat.com (David Teigland)
Date: Mon, 15 May 2006 15:40:45 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not doingget_get_meta_buffer)
In-Reply-To: <433093DF7AD7444DA65EFAFE3987879C125D05@jellyfish.highlyscyld.com>
References: <433093DF7AD7444DA65EFAFE3987879C125D05@jellyfish.highlyscyld.com>
Message-ID: <20060515204045.GD9050@redhat.com>

On Mon, May 15, 2006 at 08:29:01AM -0700, Michael Will wrote:
> Is the fs code closer to ext than to, say, xfs?

I wouldn't say GFS code is close to anything.

Dave

From jmendler at ucla.edu Mon May 15 15:46:24 2006
From: jmendler at ucla.edu (Jordan Mendler)
Date: Mon, 15 May 2006 08:46:24 -0700
Subject: [Linux-cluster] RHEL AS4 Server Farming options
Message-ID: <1147707984.1177.9.camel@localhost.localdomain>

I am looking to build a server farm of 5-10 RHEL AS4 web servers that handle, at any given time, anywhere from 30-50 different web domains and their sites (using AOLServer). I am in the preliminary stages of researching this, and so far the only program I have heard about is Linux Virtual Server. Can anyone tell me whether there are other software options to consider for this project aside from LVS, and what the pros and cons of LVS are versus the alternatives? Also, if anyone has pointers, tips, praise or criticism, or other good information, I would love to hear it now, before I start building the farm and it is too late. Lastly, any good websites or other documentation aside from the RHAS v2.1 LVS section and LVS's webpage would be greatly appreciated.

Thanks,
Jordan

From Jon.Stanley at savvis.net Mon May 15 15:55:48 2006
From: Jon.Stanley at savvis.net (Stanley, Jon)
Date: Mon, 15 May 2006 10:55:48 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not doingget_get_meta_buffer)
Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B901DE28E3@s228130hz1ew08.apptix-01.savvis.net>

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stephen Willey
> Sent: Monday, May 15, 2006 11:33 AM
> To: David Teigland
> Cc: Daire Byrne; linux-cluster at redhat.com
> Subject: Re: [Linux-cluster] Re: gfs_fsck problems (not
> doingget_get_meta_buffer)
>
> I filed a bug with redhat on this and it was a duplicate of this bug:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186125
>
> Stephen
>

To which mere mortals do not have access :-(

From rpeterso at redhat.com Mon May 15 16:05:26 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Mon, 15 May 2006 11:05:26 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not doing get_get_meta_buffer)
Message-ID: <1147709126.16950.45.camel@technetium.msp.redhat.com>

Stephen Willey wrote:
> gfs_fsck seems to break my filesystem!

This is a known problem, documented in Bugzilla as bz 186125:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186125

There is a hotfix available for it that may be downloaded from:
http://seg.rdu.redhat.com/scripts/hotfix/edit.pl?id=980

Regards,
Bob Peterson
Red Hat Cluster Suite

From hson at ludd.luth.se Mon May 15 16:22:29 2006
From: hson at ludd.luth.se (=?ISO-8859-1?Q?Roger_H=E5kansson?=)
Date: Mon, 15 May 2006 18:22:29 +0200
Subject: [Linux-cluster] RH Cluster Suite 4 + GFS + iptables
In-Reply-To: <442D46C3.6050807@dsic.upv.es>
References: <442D46C3.6050807@dsic.upv.es>
Message-ID: <4468AAC5.4000403@ludd.luth.se>

Jose Luis Beti wrote:
> Hi everybody,
> I'm new in the list, so I apologize if this question has been answered
> before.
>
> Could anyone explain how to configure IPTABLES so that Redhat Cluster
> Suite RHEL4 + GFS works correctly?
> What ports and protocols (tcp, udp) should I configure?
>

I've been searching through the archives, and this question pops up now and again, but I can't find any answers except some old ones from 2004, when similar questions were answered in a bunch of separate mails. But those answers were related to RHEL3, and from what I can see, some things have changed.

This was the answer back then:

"Gulm uses the following by default:
 40040 core
 40042 ltpx
 41040 lt000
 if you set lt_partitions to >1 then 41041 lt001 (and up to whatever you set lt_partitions to.)"
"CCS is 50006 and 50005"
"the gnbd server uses 14243"
"34001 - 34004 for clumanager"
"Also 1228 / 1229 for broadcast / multicast heartbeating"

From the output of netstat, these are the listening ports I can see on my CentOS4 setup:

Unknown processes ($NODENAME is the IP of the cluster node, and $BROADCAST is the broadcast address of the cluster node's network):
 TCP $NODENAME:21064
 UDP $NODENAME:6809
 UDP $BROADCAST:6809

clurgmgrd:
 TCP *:41966
 TCP *:41968
 TCP *:41967
 TCP *:41969

ccsd:
 TCP localhost:50006
 UDP *:50007
 TCP *:50008
 TCP *:50009

Also, I can see an active tcp connection between $nodeA:21064<->$nodeB:32774 and $nodeB:21064<->$nodeA:32773.

--
Roger Håkansson

From rpeterso at redhat.com Mon May 15 16:41:53 2006
From: rpeterso at redhat.com (Robert S Peterson)
Date: Mon, 15 May 2006 11:41:53 -0500
Subject: [Linux-cluster] Re: gfs_fsck problems (not doingget_get_meta_buffer)
In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B901DE28E3@s228130hz1ew08.apptix-01.savvis.net>
References: <9A6FE0FCC2B29846824C5CD81C6647B901DE28E3@s228130hz1ew08.apptix-01.savvis.net>
Message-ID: <1147711313.16950.53.camel@technetium.msp.redhat.com>

On Mon, 2006-05-15 at 10:55 -0500, Stanley, Jon wrote:
> To which mere mortals do not have access :-(

Hi Jon,

The fix was to fs_bmap.c. The CVS source tree has the fixes for the RHEL4, STABLE and HEAD branches, which should take care of most people, RHEL4, Fedora, or otherwise. And that is public. Here's a link:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs/gfs_fsck/fs_bmap.c?cvsroot=cluster

No one has requested a fix for RHEL3, so one hasn't been done. In addition, I tried to open up this bugzilla for better viewing.

Regards,
Bob Peterson
Red Hat Cluster Suite

From pauli at grey.colorado.edu Mon May 15 17:15:36 2006
From: pauli at grey.colorado.edu (Wolfgang Pauli)
Date: Mon, 15 May 2006 11:15:36 -0600
Subject: [Linux-cluster] cman ignores interface setting on ipv4
In-Reply-To: <4468318A.9030901@redhat.com>
References: <200605131636.13466.ookami@gmx.de> <4468318A.9030901@redhat.com>
Message-ID: <200605151115.36176.pauli@grey.colorado.edu>

Thanks, that worked. Patrick said I cannot use the multihomed setup with dlm; can I do that with gulm?

wolfi

From rmm-linux-cluster at z.odi.ac Mon May 15 20:13:37 2006
From: rmm-linux-cluster at z.odi.ac (Ross Mellgren)
Date: Mon, 15 May 2006 16:13:37 -0400
Subject: [Linux-cluster] Periodic hang of file system accesses using GFS/GNBD (gnbd (pid 12082: du) got signal 1)
Message-ID: <4468E0F1.9090108@z.odi.ac>

Hi,

I have a two-node cluster where each node exports filesystems to the other node, e.g.
nodeA:
 2tb array /dev/sdc
 LVM PV/VG/LV created
 /dev/nodea_sdc_vg/lvol0 mounted on /array/nodea
 /dev/sdc is exported via gnbd
 nodeb gnbd (/dev/sdc) device is imported
 /dev/nodeb_sdc_vg/lvol0 mounted on /array/nodeb

nodeB:
 2tb array /dev/sdc
 LVM PV/VG/LV created
 /dev/nodeb_sdc_vg/lvol0 mounted on /array/nodeb
 /dev/sdc is exported via gnbd
 nodea gnbd (/dev/sdc) device is imported
 /dev/nodea_sdc_vg/lvol0 mounted on /array/nodea

Everything seemed to work fine when I set it up. I ran some bonnie++ tests with pretty vigorous settings, on each node against its local GFS and on each node against the remote GFS, and the same simultaneously; everything worked fine.

I've now put 200+gb of data on it, and I'm encountering a problem where normal processes like find, du, or ls hang against nodeb's array while on nodea. Messages like the following appear in the dmesg on nodea (note that I have not used kill on any of these processes, so I'm not kill -9'ing them to get this):

gnbd (pid 12082: du) got signal 9
gnbd0: Send control failed (result -4)
gnbd0: Receive control failed (result -32)
gnbd0: shutting down socket
exitting GNBD_DO_IT ioctl
resending requests
gnbd (pid 12082: du) got signal 1
gnbd0: Send control failed (result -4)
gnbd (pid 20598: find) got signal 9
gnbd0: Send control failed (result -4)
gnbd (pid 4238: diff) got signal 9
gnbd0: Send control failed (result -4)
gnbd0: Receive control failed (result -32)
gnbd0: shutting down socket
exitting GNBD_DO_IT ioctl
resending requests

Looking at the code with my limited knowledge of kernel programming, it looks like this means that a SIGKILL/SIGSEGV got trapped during the sock_sendmsg/sock_recvmsg?

It's pretty easy to get this problem to manifest. I can clear the hang by doing gnbd_export -O -R on the server (nodeb) and re-exporting. The client (nodea) automatically picks up the disconnect/reconnect and SIGKILLs the hung process. After this has happened a bunch of times, it looks like the GFS has gotten a little corrupted -- I ran gfs_fsck -y -v on it and it cleaned up a bunch of fsck bitmap mismatches.

It doesn't look like network connectivity is being lost at all between the two nodes, but I can't be absolutely sure a single packet didn't get dropped here or there.

Any help would be greatly appreciated!

-Ross

Vital statistics of the systems (both are running an identical kernel + GFS/GNBD/CMAN/etc modules, compiled on one and copied to the other):

Linux nodea 2.6.12.6 #2 SMP Fri Apr 14 19:59:14 EDT 2006 i686 i686 i386 GNU/Linux
cman-kernel-2.6.11.5-20050601.152643.FC4
dlm-kernel-2.6.11.5-20050601.152643.FC4
gfs-kernel-2.6.11.8-20050601.152643.FC4
gnbd-kernel-2.6.11.2-20050420.133124.FC4

Both boxes are dual xeons 2.8ghz with 4gb ram each (but with the BIOS memory mapping issue that prevents us from seeing all 4gb, so really 3.3gb). The arrays are SATA arrays on top of Areca cards -- one box has dual ARC-1120's and the other has a single ARC-1160, split up using LVM.

From basv at sara.nl Tue May 16 12:53:23 2006
From: basv at sara.nl (Bas van der Vlies)
Date: Tue, 16 May 2006 14:53:23 +0200
Subject: [Linux-cluster] Module gfs_kernel does not compile from CVS stable
Message-ID: <4469CB43.1080509@sara.nl>

Hello,

Due to the new build software, I always get an error that it cannot find <cluster/cnxman.h>. It is in /usr/include/cluster/cnxman.h but not in the kernel source directory. A simple solution is to make a link in the kernel source directory to the one in /usr/include/cluster:

 - cd /usr/src/linux/include
 - ln -s /usr/include/cluster .
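(The symlink simply makes the userspace headers visible as <cluster/...> under the kernel include path, so the module build can resolve <cluster/cnxman.h>.)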
This solves my problem, but maybe there is a better solution?

Regards
--
********************************************************************
*                                                                  *
* Bas van der Vlies                     e-mail: basv at sara.nl      *
* SARA - Academic Computing Services    phone:  +31 20 592 8012    *
* Kruislaan 415                         fax:    +31 20 6683167     *
* 1098 SJ Amsterdam                                                *
*                                                                  *
********************************************************************

From system_admin at pah156.warszawa.sdi.tpnet.pl Tue May 16 13:06:10 2006
From: system_admin at pah156.warszawa.sdi.tpnet.pl (Czeslaw M)
Date: Tue, 16 May 2006 15:06:10 +0200
Subject: [Linux-cluster] Web services 2 node cluster
Message-ID: <20060516130610.GA5613@pah156.warszawa.sdi.tpnet.pl>

Good day everyone. I have read the archives but could not find an answer for my case.

Situation: a 2 node cluster with 5-6 web daemons (Apache) running on virtual IPs; the system is RedHat Enterprise 4 (Nahant Update 2). cluster.conf looks like:

---------------- cut ----------------