From Alain.Moulle at bull.net Tue Aug 1 06:24:29 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 01 Aug 2006 08:24:29 +0200
Subject: [Linux-cluster] 2-node fencing question
Message-ID: <44CEF39D.8000001@bull.net>
> Also is there a way to configure fence_ipmilan in cluster.xml to reboot
> rather than stop the server? fence_ipmilan by itself takes the -o
> option (on,off,reboot)
I use fence_ipmilan (with CS4 Update 2); it does a
poweroff first AND then a poweron ... except if it does not get
the off status after the poweroff (check the agent ipmilan.c).
Alain Moullé
From zachacker at ibh.de Tue Aug 1 06:38:18 2006
From: zachacker at ibh.de (Zachacker, Maik)
Date: Tue, 1 Aug 2006 08:38:18 +0200
Subject: [Linux-cluster] 2-node fencing question
Message-ID: <0DDD325898FC3C4C88393E540965B37F1E59DB@dcdwa.ibh>
>> Also is there a way to configure fence_ipmilan in cluster.xml to reboot
>> rather than stop the server? fence_ipmilan by itself takes the -o
>> option (on,off,reboot)
>
> I use fence_ipmilan (with CS4 Update 2); it does a
> poweroff first AND then a poweron ... except if it does not get
> the off status after the poweroff (check the agent ipmilan.c).
I use fence_ilo and fence_apc (CS4U3) - both first poweroff and then
poweron too. This is only a problem in a two-node configuration because
both nodes send the poweroff command and none of them can send the
poweron command because both are down.
Most fence devices have an option or action tag that is not
available via the cluster configuration tool. It can be used to force
a reboot (default) or a poweroff.
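For illustration only (the exact attribute name differs per agent and release,
so check the agent's man page before relying on it), such an entry in
cluster.conf could look roughly like this:

  <clusternode name="node1" votes="1">
    <fence>
      <method name="1">
        <!-- "option" (or "action" on some agents) is passed to the agent,
             much like -o on its command line -->
        <device name="ipmi-node1" option="reboot"/>
      </method>
    </fence>
  </clusternode>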
Maik Zachacker
--
Maik Zachacker
IBH Prof. Dr. Horn GmbH, Dresden, Germany
From zhiwei at linuxone.myftp.org Tue Aug 1 08:02:17 2006
From: zhiwei at linuxone.myftp.org (zhiwei)
Date: Tue, 01 Aug 2006 16:02:17 +0800
Subject: [Linux-cluster] Re: lvm2 liblvm2clusterlock.so on fc5 (Jeff Hardy)
In-Reply-To: <20060728141302.E2A9573A0E@hormel.redhat.com>
References: <20060728141302.E2A9573A0E@hormel.redhat.com>
Message-ID: <1154419337.6987.13.camel@alpha01.mcs.com>
> Message: 4
> Date: Thu, 27 Jul 2006 13:52:43 -0400
> From: Jeff Hardy
> Subject: [Linux-cluster] lvm2 liblvm2clusterlock.so on fc5
> To: linux clustering
> Message-ID: <1154022763.2789.120.camel at fritzdesk.potsdam.edu>
> Content-Type: text/plain
>
> I apologize if this has been answered already or appeared in release
> notes somewhere, but I cannot find it. FC4 had the lvm2-cluster package
> to provide the clvm locking library. This was removed in FC5 (as
> indicated in the release notes).
>
> Is this still necessary for a clvm setup:
>
> In /etc/lvm/lvm.conf:
> locking_type = 2
> locking_library = "/lib/liblvm2clusterlock.so"
>
> And if so, where does one find this now?
>
You can obtain the lvm2 source code from RHCS and recompile it to enable
the clvmd option. clvmd is needed to manage the shared storage and to share
the LVM metadata among cluster members.
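As a rough sketch only (build details and the volume group name "myvg" are
assumptions, not exact steps), the pieces that end up in place afterwards look
like this:

  # /etc/lvm/lvm.conf, as in the original question
  #   locking_type = 2
  #   locking_library = "/lib/liblvm2clusterlock.so"

  service clvmd start      # start the cluster LVM daemon on every node
  vgchange -cy myvg        # mark the shared volume group as clustered
  vgchange -ay myvg        # activate it; clvmd coordinates the locking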
Zhiwei
From stephen.willey at framestore-cfc.com Tue Aug 1 10:40:20 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Tue, 01 Aug 2006 11:40:20 +0100
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
Message-ID: <44CF2F94.4000003@framestore-cfc.com>
We fscked the filesystem because we'd started seeing the following
errors following a power failure.
GFS: fsid=nearlineA:gfs1.0: fatal: invalid metadata block
GFS: fsid=nearlineA:gfs1.0: bh = 2644310219 (type: exp=4, found=5)
GFS: fsid=nearlineA:gfs1.0: function = gfs_get_meta_buffer
GFS: fsid=nearlineA:gfs1.0: file =
/usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 1223
GFS: fsid=nearlineA:gfs1.0: time = 1154425344
GFS: fsid=nearlineA:gfs1.0: about to withdraw from the cluster
GFS: fsid=nearlineA:gfs1.0: waiting for outstanding I/O
GFS: fsid=nearlineA:gfs1.0: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=nearlineA:gfs1.0: withdrawn
And another instance:
GFS: fsid=nearlineA:gfs1.1: fatal: filesystem consistency error
GFS: fsid=nearlineA:gfs1.1: inode = 2384574146/2384574146
GFS: fsid=nearlineA:gfs1.1: function = dir_e_del
GFS: fsid=nearlineA:gfs1.1: file =
/usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dir.c, line = 1495
GFS: fsid=nearlineA:gfs1.1: time = 1154393717
GFS: fsid=nearlineA:gfs1.1: about to withdraw from the cluster
GFS: fsid=nearlineA:gfs1.1: waiting for outstanding I/O
GFS: fsid=nearlineA:gfs1.1: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=nearlineA:gfs1.1: withdrawn
Running gfs_fsck -vvv -y /dev/gfs1_vg/gfs1_lv
Returns the following after chewing all the physical and swap RAM. The
machines have 4Gb of RAM and 2Gb of swap. We can increase the swap
size, but is this just going to keep running out of RAM?
We're running on x86_64 so it can use as much memory as it likes. The
filesystem is roughly 45Tb.
Initializing fsck
Initializing lists...
Initializing special inodes...
Setting block ranges...
Creating a block list of size 11105160192...
Unable to allocate bitmap of size 1388145025
Segmentation fault
[root at ns1a ~]# gfs_fsck -vvv -y /dev/gfs1_vg/gfs1_lv
Initializing fsck
Initializing lists...
(bio.c:140) Writing to 65536 - 16 4096
Initializing special inodes...
(file.c:45) readi: Offset (640) is >= the file size (640).
(super.c:208) 8 journals found.
(file.c:45) readi: Offset (7116576) is >= the file size (7116576).
(super.c:265) 74131 resource groups found.
Setting block ranges...
Creating a block list of size 11105160192...
(bitmap.c:68) Allocated bitmap of size 5552580097 with 2 chunks per byte
Unable to allocate bitmap of size 1388145025
(block_list.c:72) - block_list_create()
Segmentation fault
--
Stephen Willey
Senior Systems Engineer, Framestore-CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
From Alain.Moulle at bull.net Tue Aug 1 11:06:54 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 01 Aug 2006 13:06:54 +0200
Subject: [Linux-cluster] CS4 Update 2 / is this problem fixed in a more recent update?
Message-ID: <44CF35CE.1060700@bull.net>
Hi
We are facing a big problem of split-brain, due to the fact
that the clurgmgrd process/daemon from the Red Hat Cluster Suite unexpectedly
disappeared (still for an unknown reason ...) on one node of the HA pair. This
caused the clurgmgrd on the other node to become aware of this and then simply
to re-start the application service without effective fencing/migration.
This seems to be abnormal behavior, doesn't it?
Is there already a fix available in a more recent Update?
Have you any suggestions about this?
Thanks a lot
Alain Moullé
From kent2004 at gmail.com Tue Aug 1 13:23:47 2006
From: kent2004 at gmail.com (Kent Chen)
Date: Tue, 1 Aug 2006 21:23:47 +0800
Subject: [Linux-cluster] hung when 3rd nodes mounting the gfs using dlm
Message-ID:
I connect 4 Sun X4100s (2 AMD dual-core CPUs, 2G RAM) to a Sun Storage 3510 with
a SilkWorm 200E FC switch.
The OS is RHEL 4 U3 for x86_64.
I made 2 GFS filesystems, one called Alpha:gfs1, the other called Alpha:gfs2.
All seems good when only 2 nodes mount the GFS.
Once the 3rd node mounts the GFS, the mount command hangs.
Has anyone encountered a similar problem?
Is it a bug in GFS?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rpeterso at redhat.com Tue Aug 1 16:38:27 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Tue, 01 Aug 2006 11:38:27 -0500
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44CF2F94.4000003@framestore-cfc.com>
References: <44CF2F94.4000003@framestore-cfc.com>
Message-ID: <44CF8383.3040208@redhat.com>
Stephen Willey wrote:
> We fscked the filesystem because we'd started seeing the following
> errors following a power failure.
> (snip)
> We're running on x86_64 so it can use as much memory as it likes. The
> filesystem is roughly 45Tb.
>
Hi Stephen,
Yes, this is a problem with gfs_fsck. The problem is, it tries to
allocate memory
for bitmaps based on the size of the file system. The bitmap structures
are used
throughout the code, so they're not optional. I'll have to figure out
how to
do this a better way. Thanks for opening the bugzilla (200883). I'll
work on it.
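(A rough back-of-the-envelope check, using only the numbers from the failing
run quoted above, shows the scale of the problem:

  5552580097 bytes (~5.2 GiB)  the bitmap that was allocated
                               (2 blocks per byte over 11105160192 blocks)
  1388145025 bytes (~1.3 GiB)  the bitmap whose allocation failed
                               (roughly 1 bit per block)

i.e. about 6.5 GiB of bitmaps against 4 GiB of RAM plus 2 GiB of swap.)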
Regards,
Bob Peterson
Red Hat Cluster Suite
From stephen.willey at framestore-cfc.com Tue Aug 1 16:32:45 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Tue, 01 Aug 2006 17:32:45 +0100
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44CF8383.3040208@redhat.com>
References: <44CF2F94.4000003@framestore-cfc.com> <44CF8383.3040208@redhat.com>
Message-ID: <44CF822D.7070705@framestore-cfc.com>
Robert Peterson wrote:
>
> Hi Stephen,
>
> Yes, this is a problem with gfs_fsck. The problem is, it tries to
> allocate memory
> for bitmaps based on the size of the file system. The bitmap structures
> are used
> throughout the code, so they're not optional. I'll have to figure out
> how to
> do this a better way. Thanks for opening the bugzilla (200883). I'll
> work on it.
>
> Regards,
>
> Bob Peterson
> Red Hat Cluster Suite
The fsck is now running after we added the 137Gb swap drive. It appears
to consistently chew about 4Gb of RAM (sometimes higher) but it is
working (for now).
Any ballpark idea of how long it'll take to fsck a 45Tb FS? I know
that's a "how long is a piece of string" question, but are we talking
hours/days/weeks?
Stephen
From teigland at redhat.com Tue Aug 1 16:35:57 2006
From: teigland at redhat.com (David Teigland)
Date: Tue, 1 Aug 2006 11:35:57 -0500
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44CF2F94.4000003@framestore-cfc.com>
References: <44CF2F94.4000003@framestore-cfc.com>
Message-ID: <20060801163557.GD5976@redhat.com>
On Tue, Aug 01, 2006 at 11:40:20AM +0100, Stephen Willey wrote:
> We fscked the filesystem because we'd started seeing the following
> errors following a power failure.
>
> GFS: fsid=nearlineA:gfs1.0: fatal: invalid metadata block
> GFS: fsid=nearlineA:gfs1.0: bh = 2644310219 (type: exp=4, found=5)
> GFS: fsid=nearlineA:gfs1.0: function = gfs_get_meta_buffer
> GFS: fsid=nearlineA:gfs1.0: file =
> /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 1223
> GFS: fsid=nearlineA:gfs1.0: time = 1154425344
> GFS: fsid=nearlineA:gfs1.0: about to withdraw from the cluster
> GFS: fsid=nearlineA:gfs1.0: waiting for outstanding I/O
> GFS: fsid=nearlineA:gfs1.0: telling LM to withdraw
> lock_dlm: withdraw abandoned memory
> GFS: fsid=nearlineA:gfs1.0: withdrawn
>
> And another instance:
>
> GFS: fsid=nearlineA:gfs1.1: fatal: filesystem consistency error
> GFS: fsid=nearlineA:gfs1.1: inode = 2384574146/2384574146
> GFS: fsid=nearlineA:gfs1.1: function = dir_e_del
> GFS: fsid=nearlineA:gfs1.1: file =
> /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dir.c, line = 1495
> GFS: fsid=nearlineA:gfs1.1: time = 1154393717
> GFS: fsid=nearlineA:gfs1.1: about to withdraw from the cluster
> GFS: fsid=nearlineA:gfs1.1: waiting for outstanding I/O
> GFS: fsid=nearlineA:gfs1.1: telling LM to withdraw
> lock_dlm: withdraw abandoned memory
> GFS: fsid=nearlineA:gfs1.1: withdrawn
What kind of fencing are you using in the cluster? Trying to understand
how this might have happened.
Dave
From stephen.willey at framestore-cfc.com Tue Aug 1 16:40:48 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Tue, 01 Aug 2006 17:40:48 +0100
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <20060801163557.GD5976@redhat.com>
References: <44CF2F94.4000003@framestore-cfc.com>
<20060801163557.GD5976@redhat.com>
Message-ID: <44CF8410.7040507@framestore-cfc.com>
David Teigland wrote:
>
> What kind of fencing are you using in the cluster? Trying to understand
> how this might have happened.
>
> Dave
>
We're using STONITH through HP/Compaq ILO. We believe that the
corruption was almost certainly caused during a building-wide power
failure though.
That'll teach us to double-check the UPS setup.
--
Stephen Willey
Senior Systems Engineer, Framestore-CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
From rpeterso at redhat.com Tue Aug 1 17:53:02 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Tue, 01 Aug 2006 12:53:02 -0500
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44CF822D.7070705@framestore-cfc.com>
References: <44CF2F94.4000003@framestore-cfc.com> <44CF8383.3040208@redhat.com>
<44CF822D.7070705@framestore-cfc.com>
Message-ID: <44CF94FE.3070407@redhat.com>
Stephen Willey wrote:
> The fsck is now running after we added the 137Gb swap drive. It appears
> to consistently chew about 4Gb of RAM (sometimes higher) but it is
> working (for now).
>
> Any ballpark idea of how long it'll take to fsck a 45Tb FS? I know
> that's a "how long is a piece of string" question, but are we talking
> hours/days/weeks?
>
> Stephen
>
Hi Stephen,
I don't know how long it will take to fsck a 45TB fs, but it wouldn't
surprise me if
it took several days. It also varies because of hardware differences,
and of course
if you're going to swap, that might slow it down too.
Any way you look at it, 45TB is a lot of data to go through with a
fine-tooth comb like gfs_fsck does.
The latest RHEL4 U3 version (and up) and recent STABLE
and HEAD versions (in CVS) now give you a percent complete number every
second during the more lengthy passes, such as pass5.
When it finishes, can you post something on the list to let us know?
We've tried to kick around ideas on how to improve the speed, such as
(1) adding an option to only focus on areas where the journals are dirty,
(2) introducing multiple threads to process the different RGs, and even
(3) trying to get multiple nodes in the cluster to team up and do different
areas of the file system. None of these have been implemented yet
because of higher priorities. Since this is an open-source project, anyone
could step in and do these. Volunteers?
Regards,
Bob Peterson
Red Hat Cluster Suite
From Leonardo.Mello at planejamento.gov.br Tue Aug 1 16:24:01 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Tue, 1 Aug 2006 13:24:01 -0300
Subject: [Linux-cluster] hung when 3rd nodes mounting the gfs using dlm
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B55@corp-bsa-mp01.planejamento.gov.br>
What command did you use to create the GFS filesystem?
GFS needs one journal for each server that mounts the filesystem. If you created the filesystem with only 2 journals, you won't be able to use more than two machines.
The journal is used, among other things, to restore the filesystem in case of a server failure.
If GFS in its current architecture let you mount on more servers than the number of journals you have, it would damage the filesystem; maybe this is one of the reasons GFS blocks access from more servers than there are journals.
01 - To specify the number of journals at filesystem creation:
use the option -j number with mkfs.gfs, where number is the number of machines.
For 4 machines the option will be: -j 4 (see the combined sketch below).
02 - To increase the number of journals in a filesystem that has already been created (possibly your case):
for this task there is the tool gfs_jadd; see its man page.
To use this tool, you need to mount the GFS filesystem on the machine that will add the journals.
gfs_jadd -j number_to_increase /gfs/filesystem/mount/point
number_to_increase is how many journals you want to add to that filesystem; by default it is 1. In your case with four servers (you already have 2 journals) it could be:
gfs_jadd -j 2 /gfs/filesystem/mount/point
Sometimes gfs_jadd doesn't work because there isn't enough space on the disk for the journal creation. In that case the best solution I know of is to re-format the filesystem, specifying the correct number of journals.
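A combined sketch of both commands (the device path, mount point and the
Alpha:gfs1 lock table name are only placeholders; the -t value must match the
cluster name in your cluster.conf):

  # create a filesystem with 4 journals, using DLM locking
  gfs_mkfs -p lock_dlm -t Alpha:gfs1 -j 4 /dev/vgname/lvname

  # or add 2 journals to an existing, mounted filesystem
  gfs_jadd -j 2 /gfs/filesystem/mount/point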
Best Regards
Leonardo Rodrigues de Mello
-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Kent Chen
Sent: ter 1/8/2006 10:23
To: linux-cluster at redhat.com
Cc:
Subject: [Linux-cluster] hung when 3rd nodes mounting the gfs using dlm
I connect 4 Sun X4100s (2 AMD dual-core CPUs, 2G RAM) to a Sun Storage 3510 with
a SilkWorm 200E FC switch.
The OS is RHEL 4 U3 for x86_64.
I made 2 GFS filesystems, one called Alpha:gfs1, the other called Alpha:gfs2.
All seems good when only 2 nodes mount the GFS.
Once the 3rd node mounts the GFS, the mount command hangs.
Has anyone encountered a similar problem?
Is it a bug in GFS?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 3755 bytes
Desc: not available
URL:
From mykleb at no.ibm.com Tue Aug 1 20:16:42 2006
From: mykleb at no.ibm.com (Jan-Frode Myklebust)
Date: Tue, 1 Aug 2006 22:16:42 +0200
Subject: [Linux-cluster] Re: E-Mail Cluster
References: <44CE15B1.9010603@fiocruz.br>
Message-ID:
On 2006-07-31, Nicholas Anderson wrote:
> I'm new to clustering and was wondering what would be the best solution
> when clustering an email server.
>
> Today we've 1 server with a storage where all mailboxes (mbox format)
For clustering, I think it would be better to use Maildir-format
for the mailboxes. Then you'll avoid any locking problems on the
mailboxes. New messages can be delivered on one machine while other
messages in the same mail-folder are being deleted on another machine.
If your users are only accessing their email by pop/imap, moving to
Maildir shouldn't be any issue.
> and home dirs are stored.
> I'm planning to use 3 or 4 nodes running imap, pop and smtp, all of them
> sharing users' data.
>
> Should I use NFS or GFS?
NFS is very much a single point of failure.. so definitely a clusterfs/GFS.
If you can move to Maildir, you should be able to run any number of
servers where each server is running all services (imap, pop and smtp),
and incoming traffic is routed to a random server through e.g. round
robin DNS.
To handle single-node downtime/crash, you'll just need to move the
ip-address to an available node. Easily achievable through e.g.
heartbeat from linux-ha.org, and probably also RH Cluster Suite..
-jf
From hyperbaba at neobee.net Wed Aug 2 06:49:24 2006
From: hyperbaba at neobee.net (Vladimir Grujic)
Date: Wed, 2 Aug 2006 08:49:24 +0200
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44CF822D.7070705@framestore-cfc.com>
References: <44CF2F94.4000003@framestore-cfc.com> <44CF8383.3040208@redhat.com>
<44CF822D.7070705@framestore-cfc.com>
Message-ID: <200608020849.25035.hyperbaba@neobee.net>
On Tuesday 01 August 2006 18:32, Stephen Willey wrote:
> Robert Peterson wrote:
> > Hi Stephen,
> >
> > Yes, this is a problem with gfs_fsck. The problem is, it tries to
> > allocate memory
> > for bitmaps based on the size of the file system. The bitmap structures
> > are used
> > throughout the code, so they're not optional. I'll have to figure out
> > how to
> > do this a better way. Thanks for opening the bugzilla (200883). I'll
> > work on it.
> >
> > Regards,
> >
> > Bob Peterson
> > Red Hat Cluster Suite
>
> The fsck is now running after we added the 137Gb swap drive. It appears
> to consistently chew about 4Gb of RAM (sometimes higher) but it is
> working (for now).
>
> Any ballpark idea of how long it'll take to fsck a 45Tb FS? I know
> that's a "how long is a piece of string" question, but are we talking
> hours/days/weeks?
>
> Stephen
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
It took 55 hours for all 7 passes on my 1TB partition (with a lot of files on
it). The partition resided on RAID 10 SATA storage. Does anyone else have
execution times for gfs_fsck?
From stephen.willey at framestore-cfc.com Wed Aug 2 09:51:05 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Wed, 02 Aug 2006 10:51:05 +0100
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <200608020849.25035.hyperbaba@neobee.net>
References: <44CF2F94.4000003@framestore-cfc.com>
<44CF8383.3040208@redhat.com> <44CF822D.7070705@framestore-cfc.com>
<200608020849.25035.hyperbaba@neobee.net>
Message-ID: <44D07589.7090507@framestore-cfc.com>
> On Tuesday 01 August 2006 18:32, Stephen Willey wrote:
>> The fsck is now running after we added the 137Gb swap drive. It appears
>> to consistently chew about 4Gb of RAM (sometimes higher) but it is
>> working (for now).
>>
>> Any ballpark idea of how long it'll take to fsck a 45Tb FS? I know
>> that's a "how long is a piece of string" question, but are we talking
>> hours/days/weeks?
>>
>> Stephen
>>
Is there any way we can determine the progress during all passes? At
the moment all we're seeing is lines like the following:
(pass1.c:213) Setting 557096777 to data block
Is this simply counting through the number of blocks in the
filesystem? If so, how do we get the number of blocks in the
filesystem while the fsck is running?
We use this FS for backups and we're currently determining whether we'd
be better off just wiping it and re-syncing all our data (which would
take a couple of days, not several) so unless we can get a reliable
indication of how long this will take, we probably won't finish it.
--
Stephen Willey
Senior Systems Engineer, Framestore-CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
From f.hackenberger at mediatransfer.com Wed Aug 2 11:03:15 2006
From: f.hackenberger at mediatransfer.com (Falk Hackenberger - MediaTransfer AG Netresearch & Consulting)
Date: Wed, 02 Aug 2006 13:03:15 +0200
Subject: [Linux-cluster] clurgmgrd stops service without reason
Message-ID: <44D08673.3010207@mediatransfer.com>
Hello,
we have a running CS4 config with 2 nodes.
For debugging, one node is offline,
so it is running on only 1 node.
Now we have the problem that clurgmgrd stops the services which it
provides without a recognizable reason.
We have log_level 7 on the cman and rm lines in cluster.conf,
but the reason for stopping the service is not recognizable.
I see log entries such as:
--snip--
Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing
/exports/imap/checkimapstartup.sh status
Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing
/exports/subversion/etc/rc.d/init.d/svnserver status
Aug 1 17:31:28 kain clurgmgrd: [4780]: Checking 192.168.0.223,
Level 0
Aug 1 17:31:28 kain clurgmgrd: [4780]: 192.168.0.223 present on
eth0
Aug 1 17:31:28 kain clurgmgrd: [4780]: Link for eth0: Detected
Aug 1 17:31:28 kain clurgmgrd: [4780]: Link detected on eth0
Aug 1 17:31:37 kain clurgmgrd[4780]: Stopping service storage
--snap--
How do I tell clurgmgrd that it should log the reason for stopping the
service?
Any other hints?
thanks falk
From Michael.Roethlein at ri-solution.com Wed Aug 2 12:09:29 2006
From: Michael.Roethlein at ri-solution.com (Röthlein Michael (RI-Solution))
Date: Wed, 2 Aug 2006 14:09:29 +0200
Subject: [Linux-cluster] Tracing gfs problems
Message-ID: <992633B6A0E42B49BC5A41C10A8C841B030E29B8@MUCEX004.root.local>
Hello,
In the past, hangs occurred, resulting in reboots of our 4-node cluster. The real problem is that there aren't any traces in the log files of the nodes.
Is there a possibility to raise the verbosity of GFS?
Thanks
Michael
From rpeterso at redhat.com Wed Aug 2 14:27:32 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Wed, 02 Aug 2006 09:27:32 -0500
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44D07589.7090507@framestore-cfc.com>
References: <44CF2F94.4000003@framestore-cfc.com> <44CF8383.3040208@redhat.com> <44CF822D.7070705@framestore-cfc.com> <200608020849.25035.hyperbaba@neobee.net>
<44D07589.7090507@framestore-cfc.com>
Message-ID: <44D0B654.5010508@redhat.com>
Stephen Willey wrote:
> Is there any way we can determine the progress during all passes? At
> the moment all we're seeing is lines like the following:
>
> (pass1.c:213) Setting 557096777 to data block
>
> Is this simply counting through the number of blocks in the
> filesystem? If so, how do we get the number of blocks in the
> filesystem while the fsck is running?
>
> We use this FS for backups and we're currently determining whether we'd
> be better off just wiping it and re-syncing all our data (which would
> take a couple of days, not several) so unless we can get a reliable
> indication of how long this will take, we probably won't finish it.
Hi Stephen,
The latest gfs_fsck will report the percent complete for passes 1 and 5,
which take the longest. It sounds like you're running it in verbose mode
(i.e. with -v) which is going to do a lot of unnecessary I/O to stdout and
will slow it down considerably. If you're redirecting stdout, you can
do a 'grep "percent complete" /your/stdout | tail' or something similar to
figure out how far along it is with that pass.
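For example (the output file name here is just an illustration), one way to run
it and poll the progress would be:

  gfs_fsck -y /dev/gfs1_vg/gfs1_lv > /var/tmp/gfs_fsck.out 2>&1 &
  grep "percent complete" /var/tmp/gfs_fsck.out | tail -1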
Only passes 1 and 5 go block-by-block and therefore it's easy to figure
out how far they've gotten. For the other passes, it would be difficult to
estimate their progress, and probably not worth the overhead in terms
of time the computer would have to spend figuring it out.
You can get it to go faster by restarting it without the -v, but then it
will
have to re-do all the work it's already done to this point.
Based on what you've told me, it probably will take longer to fsck than
you're willing to wait.
Regards,
Bob Peterson
Red Hat Cluster Suite
From stephen.willey at framestore-cfc.com Wed Aug 2 14:19:56 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Wed, 02 Aug 2006 15:19:56 +0100
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44D0B654.5010508@redhat.com>
References: <44CF2F94.4000003@framestore-cfc.com> <44CF8383.3040208@redhat.com> <44CF822D.7070705@framestore-cfc.com> <200608020849.25035.hyperbaba@neobee.net> <44D07589.7090507@framestore-cfc.com>
<44D0B654.5010508@redhat.com>
Message-ID: <44D0B48C.4080004@framestore-cfc.com>
Robert Peterson wrote:
> Hi Stephen,
>
> The latest gfs_fsck will report the percent complete for passes 1 and 5,
> which take the longest. It sounds like you're running it in verbose mode
> (i.e. with -v) which is going to do a lot of unnecessary I/O to stdout and
> will slow it down considerably. If you're redirecting stdout, you can
> do a 'grep "percent complete" /your/stdout | tail' or something similar to
> figure out how far along it is with that pass.
>
> Only passes 1 and 5 go block-by-block and therefore it's easy to figure
> out how far they've gotten. For the other passes, it would be difficult to
> estimate their progress, and probably not worth the overhead in terms
> of time the computer would have to spend figuring it out.
>
> You can get it to go faster by restarting it without the -v, but then it
> will
> have to re-do all the work it's already done to this point.
>
> Based on what you've told me, it probably will take longer to fsck than
> you're willing to wait.
> Regards,
We have restarted it without the -v flags and it does appear to be
progressing much faster. We'll give it a while...
--
Stephen Willey
Senior Systems Engineer, Framestore-CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
From danwest at comcast.net Wed Aug 2 15:50:16 2006
From: danwest at comcast.net (danwest at comcast.net)
Date: Wed, 02 Aug 2006 15:50:16 +0000
Subject: [Linux-cluster] 2-node fencing question
Message-ID: <080220061550.6837.44D0C9B800021AD200001AB522007347489B9C0A99020E0B@comcast.net>
It seems like a significant problem to have fence_ipmilan issue a power-off followed by a power-on with a 2-node cluster. As described, both nodes power off and are then unable to issue the required power-on. Does anyone know a solution to this? This seems to make a 2-node cluster with ipmi fencing pointless. It looks like fence_ipmilan needs to support sending a cycle instead of a poweroff then a poweron.
According to fence_ipmilan.c it looks like cycle is not an option although it is an option for ipmitool. (ipmitool -H -U -P chassis power cycle)
From fence_ipmilan.c:
switch(op) {
case ST_POWERON:
    snprintf(arg, sizeof(arg),
             "%s chassis power on", cmd);
    break;
case ST_POWEROFF:
    snprintf(arg, sizeof(arg),
             "%s chassis power off", cmd);
    break;
case ST_STATUS:
    snprintf(arg, sizeof(arg),
             "%s chassis power status", cmd);
    break;
}
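A hypothetical additional case (ST_CYCLE and the plumbing to select it are not
in the shipped agent; this is only a sketch of the idea) could look like:

case ST_CYCLE:
    /* one IPMI request that cycles the chassis, instead of the
       separate off / status / on sequence */
    snprintf(arg, sizeof(arg),
             "%s chassis power cycle", cmd);
    break;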
Thanks,
Dan
-------------- Original message ----------------------
From: "Zachacker, Maik"
> >> Also is there a way to configure fence_ipmilan in cluster.xml to reboot
> >> rather than stop the server? fence_ipmilan by itself takes the -o
> >> option (on,off,reboot)
> >
> > I use fence_ipmilan (with CS4 Update 2); it does a
> > poweroff first AND then a poweron ... except if it does not get
> > the off status after the poweroff (check the agent ipmilan.c).
>
> I use fence_ilo and fence_apc (CS4U3) - both first poweroff and then
> poweron too. This is only a problem in a two-node configuration because
> both nodes send the poweroff command and none of them can send the
> poweron command because both are down.
>
> Most fence devices have an option or action tag that is not
> available via the cluster configuration tool. It can be used to force
> a reboot (default) or a poweroff.
>
>
>
>
> Maik Zachacker
> --
> Maik Zachacker
> IBH Prof. Dr. Horn GmbH, Dresden, Germany
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From nicholas at fiocruz.br Wed Aug 2 18:43:47 2006
From: nicholas at fiocruz.br (Nicholas Anderson)
Date: Wed, 02 Aug 2006 15:43:47 -0300
Subject: [Linux-cluster] Re: E-Mail Cluster
In-Reply-To:
References: <44CE15B1.9010603@fiocruz.br>
Message-ID: <44D0F263.509@fiocruz.br>
Hi Jan,
I'm searching in google how to convert from mbox to maildir using
sendmail/procmail .... i have 3000+ users and something like 70GB of
emails and I'll have to test it very well before doing in the
production server ....
As soon as I get these things working fine, I'll try GFS and the other
cluster stuff .....
I'm thinking of doing something like you said ... 3 nodes running
imap/pop/smtp sharing one filesystem, probably with GFS, where user data
will be stored.....
I was running Slackware, but now I'm thinking about something like Red Hat
or CentOS (will depend on our budget :-) ) for the nodes ....
It'll be easier to keep them up2dated :-)
Any new tips are welcome :-)
thanks,
Nick
Jan-Frode Myklebust wrote:
> On 2006-07-31, Nicholas Anderson wrote:
>
> For clustering, I think it would be better to use Maildir-format
> for the mailboxes. Then you'll avoid any locking problems on the
> mailboxes. New messages can be delivered on one machine while other
> messages in the same mail-folder are being deleted on another machine.
>
> If your users are only accessing their email by pop/imap, moving to
> Maildir shouldn't be any issue.
>
> NFS is very much a single point of failure.. so definitely a clusterfs/GFS.
> If you can move to Maildir, you should be able to run any number of
> servers where each server is running all services (imap, pop and smtp),
> and incoming traffic is routed to a random server through e.g. round
> robin DNS.
>
> To handle single-node downtime/crash, you'll just need to move the
> ip-address to an available node. Easily achievable through e.g.
> heartbeat from linux-ha.org, and probably also RH Cluster Suite..
>
>
> -jf
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
Nicholas Anderson
Administrador de Sistemas Unix
LPIC-1 Certified
Rede Fiocruz
e-mail: nicholas at fiocruz.br
From mykleb at no.ibm.com Wed Aug 2 19:43:33 2006
From: mykleb at no.ibm.com (Jan-Frode Myklebust)
Date: Wed, 2 Aug 2006 21:43:33 +0200
Subject: [Linux-cluster] Re: E-Mail Cluster
References: <44CE15B1.9010603@fiocruz.br>
<44D0F263.509@fiocruz.br>
Message-ID:
On 2006-08-02, Nicholas Anderson wrote:
>
> I'm searching in google how to convert from mbox to maildir using
> sendmail/procmail ....
At my previous job we changed from exim/uw-imap on mbox
to exim/dovecot on maildir a couple of years ago. Didn't use
a cluster-fs, only SCSI-based disk failover. For about 500 users.
Right now I'm setting up a similar solution to yours... trying
to support up to 200,000 users on a 5-node cluster, using IBM GPFS.
If sendmail is using procmail to do final mailbox-delivery, I
think the configuration change should be primarily putting a '/'
at the end of the path, as that should instruct procmail to
do maildir-style delivery. At least that's how I've been doing
it in my ~/.procmailrc. Ref. 'man procmailrc'.
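For instance, a one-line ~/.procmailrc along those lines (the path is just an
example) would be:

  # trailing '/' => maildir-style delivery instead of mbox
  DEFAULT=$HOME/Maildir/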
> i have 3000+ users and something like 70GB of
> emails and I'll have to test it very well before doing in the
> production server ....
Sure.. There are a few mbox2maildir converters.. You should probably
try a few of them and verify that they all give the same result.
Another thing to check is that your cluster-fs handles your load
well. My main concern would be how well GFS performs on
maildir-style folders, as most cluster-fs's I've seen are optimized
for large file streaming I/O. If possible, try to keep a lot of
file-metadata in cache so that you don't have to go to disk every
time someone checks their maildir for new messages.
-jf
From riaan at obsidian.co.za Wed Aug 2 22:27:32 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Thu, 03 Aug 2006 00:27:32 +0200
Subject: [Linux-cluster] Re: E-Mail Cluster
In-Reply-To:
References: <44CE15B1.9010603@fiocruz.br>
<44D0F263.509@fiocruz.br>
Message-ID: <44D126D4.2080303@obsidian.co.za>
Jan-Frode Myklebust wrote:
> On 2006-08-02, Nicholas Anderson wrote:
>> I'm searching in google how to convert from mbox to maildir using
>> sendmail/procmail ....
>
> At my previous job we changed from exim/uw-imap on mbox,
> to exim/dovecot on maildir a couple of years ago. Didn't use
> a cluster-fs, only SCSI-based disk failover. For about 500 users.
>
> Right now I'm setting up a similar solution to your... trying
> to support up to 200.000 users on a 5 node cluster, using IBM GPFS.
>
> If sendmail is using procmail to do final mailbox-delivery, I
> think the configuration change should be primarily putting a '/'
> at the end of the path, as that should instruct procmail to
> do maildir-style delivery. At least that's how I've been doing
> it in my ~/.procmailrc. Ref. 'man procmailrc'.
>
>> i have 3000+ users and something like 70GB of
>> emails and I'll have to test it very well before doing in the
>> production server ....
>
> Sure.. There are a few mbox2maildir converters.. You should probably
> try a few of them and verify that they all give the same result.
>
> Another thing to check is that your cluster-fs handles your load
> well. My main concern would be how well GFS performs on
> maildir-style folders, as most cluster-fs's I've seen are optimized
> for large file streaming I/O. If possible, try to keep a lot of
> file-metadata in cache so that you don't have to go to disk every
> time someone checks their maildir for new messages.
>
We are running 700 000 users on a 2.5 GFS, 4 nodes, with POP, IMAP
(direct access and SquirrelMail) and SMTP. To make things worse, we use
NFS between our GFS nodes and our mail servers.
We initially had huge performance problems in our setup, which I wrote
in this message:
http://www.redhat.com/archives/linux-cluster/2006-July/msg00136.html
We ended up bumping the spindle count from 36 to 60 and then to 114,
without it making a noticeable difference.
Our main killer was SquirrelMail over IMAP (the solution is primarily a
webmail-based one).
Our performance problems were solved by the following:
- Removing the folder-size plugin (built-in) and the mail quota plugin (3rd
party) reduced the traffic between the IMAP servers and the storage backend by 40%.
- Implementing an IMAP proxy (www.imapproxy.org). This is giving us a 1 to 14
hit ratio. The storage, which could not keep up previously, is now
humming along fine.
Our initial mistake was to try and optimise on the FS layer (there
weren't any real performance optimizations in our setup to be made) and
throw hardware at the problem, instead of suspecting and optimizing our
application. Despite GFS not being designed for lots of small files, and
not recommended for use with NFS, with the above changes, it performs
more than adequately. We hope to see another performance gain once we
get rid of the NFS and have our mail servers access the GFS directly.
Riaan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riaan.vcf
Type: text/x-vcard
Size: 310 bytes
Desc: not available
URL:
From Hansjoerg.Maurer at dlr.de Thu Aug 3 07:17:31 2006
From: Hansjoerg.Maurer at dlr.de (Hansjörg Maurer)
Date: Thu, 03 Aug 2006 09:17:31 +0200
Subject: [Linux-cluster] Re: E-Mail Cluster
In-Reply-To:
References: <44CE15B1.9010603@fiocruz.br>
<44D0F263.509@fiocruz.br>
Message-ID: <44D1A30B.4060200@dlr.de>
Hi
we had some problems with cyrus-imap on top of gpfs a year ago
concerning mmap files in a simple failover environment.
(see the gpfs-mailinglist)
But with recent versions it works.
Greetings
Hansjoerg
>
>Right now I'm setting up a similar solution to your... trying
>to support up to 200.000 users on a 5 node cluster, using IBM GPFS.
>
>If sendmail is using procmail to do final mailbox-delivery, I
>think the configuration change should be primarily putting a '/'
>at the end of the path, as that should instruct procmail to
>do maildir-style delivery. At least that's how I've been doing
>it in my ~/.procmailrc. Ref. 'man procmailrc'.
>
>
>
>>i have 3000+ users and something like 70GB of
>>emails and I'll have to test it very well before doing in the
>>production server ....
>>
>>
>
>Sure.. There are a few mbox2maildir converters.. You should probably
>try a few of them and verify that they all give the same result.
>
>Another thing to check is that your cluster-fs handles your load
>well. My main concern would be how well GFS performs on
>maildir-style folders, as most cluster-fs's I've seen are optimized
>for large file streaming I/O. If possible, try to keep a lot of
>file-metadata in cache so that you don't have to go to disk every
>time someone checks their maildir for new messages.
>
>
> -jf
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
--
_________________________________________________________________
Dr. Hansjoerg Maurer | LAN- & System-Manager
|
Deutsches Zentrum | DLR Oberpfaffenhofen
f. Luft- und Raumfahrt e.V. |
Institut f. Robotik |
Postfach 1116 | Muenchner Strasse 20
82230 Wessling | 82234 Wessling
Germany |
|
Tel: 08153/28-2431 | E-mail: Hansjoerg.Maurer at dlr.de
Fax: 08153/28-1134 | WWW: http://www.robotic.dlr.de/
__________________________________________________________________
There are 10 types of people in this world,
those who understand binary and those who don't.
From mykleb at no.ibm.com Thu Aug 3 11:42:45 2006
From: mykleb at no.ibm.com (Jan-Frode Myklebust)
Date: Thu, 3 Aug 2006 13:42:45 +0200
Subject: [Linux-cluster] Re: E-Mail Cluster
References: <44CE15B1.9010603@fiocruz.br>
<44D0F263.509@fiocruz.br>
<44D1A30B.4060200@dlr.de>
Message-ID:
On 2006-08-03, Hansjörg Maurer wrote:
>
> we had some problems with cyrus-imap on top of gpfs a year ago
> concerning mmap files in a simple failover environment.
> (see the gpfs-mailinglist)
> But with recent versions it works.
Yes, I remember your posting.. and AFAICT you solved it by turning
off mmap in Cyrus.
https://lists.sdsc.edu/mailman/private.cgi/gpfs-general/2005q4/000040.html
Did you consider GFS for this project ? Or are you now looking
at GFS for the same project ?
We're using courier-imap, which as far as I can tell doesn't use mmap,
so we shouldn't hit this problem. Otherwise there is always the
mmap-invalidate patch that should solve this...
-jf
From singh.rajeshwar at gmail.com Thu Aug 3 12:41:08 2006
From: singh.rajeshwar at gmail.com (Rajesh singh)
Date: Thu, 3 Aug 2006 18:11:08 +0530
Subject: [Linux-cluster] fencing agent
Message-ID:
Hi all,
We are in the process of procuring a fencing device, and our hardware vendor
has suggested we use the fencing device mentioned in the URL below.
http://www.supermicro.com/products/accessories/addon/AOC-IPMI20-E.cfm
My setup is that I am using 2-node AMD servers on RHEL4 U2 in clustered
mode.
I am not using GFS, but I am putting in a fencing device.
My query is: can I use the AOC-IPMI20-E card as a fencing device?
regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From riaan at obsidian.co.za Thu Aug 3 14:21:28 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Thu, 03 Aug 2006 16:21:28 +0200
Subject: [Linux-cluster] fencing agent
In-Reply-To:
References:
Message-ID: <44D20668.6070901@obsidian.co.za>
Rajesh singh wrote:
> Hi all,
> We are in the process of procuring a fencing device, and our hardware
> vendor has suggested we use the fencing device mentioned in
> the URL below.
> http://www.supermicro.com/products/accessories/addon/AOC-IPMI20-E.cfm
> My setup is that I am using 2-node AMD servers on RHEL4 U2 in
> clustered mode.
> I am not using GFS, but I am putting in a fencing device.
> My query is: can I use the AOC-IPMI20-E card as a fencing device?
>
> regards
According to the above URL, this card supports IPMI 2, which means that
it should work with the fence_ipmilan fencing module in RHCS 4.
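As a quick sanity check before wiring it into cluster.conf, you could exercise
the BMC by hand (the address and credentials below are placeholders), roughly:

  # raw IPMI status query with ipmitool
  ipmitool -H 192.168.1.50 -U admin -P secret chassis power status

  # the same BMC driven through the fencing agent (this really reboots the node!)
  fence_ipmilan -a 192.168.1.50 -l admin -p secret -o reboot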
Riaan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: riaan.vcf
Type: text/x-vcard
Size: 310 bytes
Desc: not available
URL:
From mwill at penguincomputing.com Thu Aug 3 15:29:36 2006
From: mwill at penguincomputing.com (Michael Will)
Date: Thu, 3 Aug 2006 08:29:36 -0700
Subject: [Linux-cluster] fencing agent
Message-ID: <433093DF7AD7444DA65EFAFE3987879C125E83@jellyfish.highlyscyld.com>
Or you buy systems that come with ipmi on the mainboard.
-----Original Message-----
From: Rajesh singh [mailto:singh.rajeshwar at gmail.com]
Sent: Thu Aug 03 07:01:25 2006
To: linux-cluster at redhat.com
Subject: [Linux-cluster] fencing agent
Hi all,
We are in the process of procuring a fencing device, and our hardware vendor
has suggested we use the fencing device mentioned in the URL below.
http://www.supermicro.com/products/accessories/addon/AOC-IPMI20-E.cfm
My setup is that I am using 2-node AMD servers on RHEL4 U2 in clustered
mode.
I am not using GFS, but I am putting in a fencing device.
My query is: can I use the AOC-IPMI20-E card as a fencing device?
regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From raycharles_man at yahoo.com Thu Aug 3 15:55:04 2006
From: raycharles_man at yahoo.com (Ray Charles)
Date: Thu, 3 Aug 2006 08:55:04 -0700 (PDT)
Subject: [Linux-cluster] Logging for cluster errors.
Message-ID: <20060803155505.26305.qmail@web32108.mail.mud.yahoo.com>
Hi,
Easy question.
When I run system-config-cluster I am able to see the
gui. But in the event there's an error while using
the gui where does that get logged?
I've seen an error from the gui and it says check
logging. I didn't see a /var/log/cluster/
-TIA
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
From rpeterso at redhat.com Thu Aug 3 16:20:03 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 03 Aug 2006 11:20:03 -0500
Subject: [Linux-cluster] Logging for cluster errors.
In-Reply-To: <20060803155505.26305.qmail@web32108.mail.mud.yahoo.com>
References: <20060803155505.26305.qmail@web32108.mail.mud.yahoo.com>
Message-ID: <44D22233.5030403@redhat.com>
Ray Charles wrote:
>
> Hi,
>
> Easy question.
>
> When I run system-config-cluster I am able to see the
> gui. But in the event there's an error while using
> the gui where does that get logged?
>
> I've seen an error from the gui and it says check
> logging. I didn't see a /var/log/cluster/
>
> -TIA
>
Hi Ray,
Usually the messages go into /var/log/messages.
Many of the messages can be redirected to other places by
changing the cluster.conf file, so the code won't tell you
specifically where to look.
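As an illustration only (attribute support varies by release; log_level is what
was mentioned earlier in this thread, while log_facility is an assumption here),
the rgmanager logging is set on the rm element, e.g.:

  <rm log_level="7" log_facility="local4">
    ...
  </rm>

with a matching local4.* entry in /etc/syslog.conf pointing at the file you want.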
Regards,
Bob peterson
Red Hat Cluster Suite
From jparsons at redhat.com Thu Aug 3 17:06:26 2006
From: jparsons at redhat.com (James Parsons)
Date: Thu, 03 Aug 2006 13:06:26 -0400
Subject: [Linux-cluster] Logging for cluster errors.
In-Reply-To: <44D22233.5030403@redhat.com>
References: <20060803155505.26305.qmail@web32108.mail.mud.yahoo.com>
<44D22233.5030403@redhat.com>
Message-ID: <44D22D12.2060501@redhat.com>
Robert Peterson wrote:
> Ray Charles wrote:
>
>>
>> Hi,
>>
>> Easy question.
>>
>> When I run system-config-cluster I am able to see the
>> gui. But in the event there's an error while using
>> the gui where does that get logged?
>> I've seen an error from the gui and it says check
>> logging. I didn't see a /var/log/cluster/
>>
>> -TIA
>>
>
> Hi Ray,
>
> Usually the messages go into /var/log/messages.
> Many of the messages can be redirected to other places by
> changing the cluster.conf file, so the code won't tell you
> specifically where to look.
>
> Regards,
>
> Bob peterson
> Red Hat Cluster Suite
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
Ray,
What is the nature of the error that you are seeing?
-J
From rpeterso at redhat.com Thu Aug 3 18:44:31 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Thu, 03 Aug 2006 13:44:31 -0500
Subject: [Linux-cluster] Tracing gfs problems
In-Reply-To: <992633B6A0E42B49BC5A41C10A8C841B030E29B8@MUCEX004.root.local>
References: <992633B6A0E42B49BC5A41C10A8C841B030E29B8@MUCEX004.root.local>
Message-ID: <44D2440F.40408@redhat.com>
Röthlein Michael (RI-Solution) wrote:
> Hello,
>
> In the past, hangs occurred, resulting in reboots of our 4-node cluster. The real problem is that there aren't any traces in the log files of the nodes.
>
> Is there a possibility to raise the verbosity of GFS?
>
> Thanks
>
> Michael
>
Hi Michael,
Right now, there's no way to increase the level of verbosity or logging
in the gfs kernel code, but
I'm not sure that would help you anyway. The lockup could be in any
part of the kernel:
GFS, The DLM/Gulm locking infrastructure, or any other part for that
matter. It could also be
hardware related or running out of memory, etc.
Your best bet may be to temporarily disable fencing so that the hung
node(s) don't get fenced
as soon as it happens, for example by changing it to manual fencing, and
then when it hangs,
check for dmesgs on the console, syslog messages in /var/log/messages
and if you can't get
a command prompt, use the "magic sysreq" key to dump out what each
module, thread and
process is doing.
If that doesn't tell you where the problem is, you can send the info to
this list or create a
bugzilla for the problem and attach the output from the sysrq, along
with details on what
release of code you're using, your cluster.conf, etc.
Here are simple instructions for using the "magic sysrq" in case you're
unfamiliar:
1. Turn it on by doing:
echo "1" > /proc/sys/kernel/sysrq
2. Recreate your kernel hang
3. If you're at the system console with a keyboard, do alt-sysrq t (task
list)
If you have a telnet console instead, do ctrl-] to get telnet> prompt
telnet> send brk (send a break char)
t (task list)
If you don't have a keyboard or telnet, but do have a shell:
echo "t" > /proc/sysrq-trigger
If you're doing it from a minicom, use: f followed by t
(For other types of serial consoles, you have to get it to send a break,
then letter t)
4. The task info will be dumped to the console, so hopefully you have
a way to save that off.
Regards,
Bob Peterson
Red Hat Cluster Suite
From lhh at redhat.com Thu Aug 3 18:36:39 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 03 Aug 2006 14:36:39 -0400
Subject: [Linux-cluster] CS4 Update 2 / is this problem fixed in a more recent update?
In-Reply-To: <44CF35CE.1060700@bull.net>
References: <44CF35CE.1060700@bull.net>
Message-ID: <1154630199.28677.18.camel@ayanami.boston.redhat.com>
On Tue, 2006-08-01 at 13:06 +0200, Alain Moulle wrote:
> Hi
>
> We are facing a big problem of split-brain, due to the fact
> that the clurgmgrd process/daemon from the Red Hat Cluster Suite unexpectedly
> disappeared (still for an unknown reason ...) on one node of the HA pair. This
> caused the clurgmgrd on the other node to become aware of this and then simply
> to re-start the application service without effective fencing/migration.
>
> This seems to be abnormal behavior, doesn't it?
>
> Is there already a fix available in a more recent Update?
Fixed in U4 beta; there were two problems:
(a) a segfault, and
(b) missing inclusion of Stanko Kupcevic's self-monitoring in clurgmgrd.
-- Lon
From lhh at redhat.com Thu Aug 3 18:38:44 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 03 Aug 2006 14:38:44 -0400
Subject: [Linux-cluster] clurgmgrd stops service without reason
In-Reply-To: <44D08673.3010207@mediatransfer.com>
References: <44D08673.3010207@mediatransfer.com>
Message-ID: <1154630324.28677.21.camel@ayanami.boston.redhat.com>
On Wed, 2006-08-02 at 13:03 +0200, Falk Hackenberger - MediaTransfer AG
Netresearch & Consulting wrote:
> --snip--
> Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing
> /exports/imap/checkimapstartup.sh status
> Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing
> /exports/subversion/etc/rc.d/init.d/svnserver status
> Aug 1 17:31:28 kain clurgmgrd: [4780]: Checking 192.168.0.223,
> Level 0
> Aug 1 17:31:28 kain clurgmgrd: [4780]: 192.168.0.223 present on
> eth0
> Aug 1 17:31:28 kain clurgmgrd: [4780]: Link for eth0: Detected
> Aug 1 17:31:28 kain clurgmgrd: [4780]: Link detected on eth0
> Aug 1 17:31:37 kain clurgmgrd[4780]: Stopping service storage
> --snap--
>
> How do I tell clurgmgrd that it should log the reason for stopping the
> service?
Something must be returning an error code where it should not be; can
you post your service XML blob?
-- Lon
From lhh at redhat.com Thu Aug 3 18:39:29 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 03 Aug 2006 14:39:29 -0400
Subject: [Linux-cluster] fencing agent
In-Reply-To: <44D20668.6070901@obsidian.co.za>
References:
<44D20668.6070901@obsidian.co.za>
Message-ID: <1154630369.28677.23.camel@ayanami.boston.redhat.com>
On Thu, 2006-08-03 at 16:21 +0200, Riaan van Niekerk wrote:
> According to the above URL, this card supports IPMI 2, which means that
> it should work with the fence_ipmilan fencing module in RHCS 4.
It should work with RHCS4, since we're just calling ipmitool.
-- Lon
From lhh at redhat.com Thu Aug 3 19:25:46 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 03 Aug 2006 15:25:46 -0400
Subject: [Linux-cluster] 2-node fencing question
In-Reply-To: <080220061550.6837.44D0C9B800021AD200001AB522007347489B9C0A99020E0B@comcast.net>
References: <080220061550.6837.44D0C9B800021AD200001AB522007347489B9C0A99020E0B@comcast.net>
Message-ID: <1154633146.28677.70.camel@ayanami.boston.redhat.com>
Sorry I didn't see this earlier!
On Wed, 2006-08-02 at 15:50 +0000, danwest at comcast.net wrote:
> It seems like a significant problem to have fence_ipmilan issue a power-off followed by a power-on with a 2 node cluster.
Generally, the chances of this occurring are very, very small, though
not impossible.
However, it could very well be that IPMI hardware modules are slow
enough at processing requests that this could pose a problem. What
hardware has this happened on? Was ACPI disabled on boot in the host OS
(it should be; see below)?
> This seems to make a 2-node cluster with ipmi fencing pointless.
I'm pretty sure that the 'both-nodes-off problem' can only occur if all of
the following criteria are met:
(a) while using separate NICs for IPMI and cluster traffic (the
recommended configuration),
(b) in the event of a network partition, such that both nodes can not
see each other but can see each other's IPMI port, and
(c) if both nodes send their power-off packets at or near the exact same
time.
The time window for (c) increases significantly (5+ seconds) if the
cluster nodes are enabling ACPI power events on boot. This is one of
the reasons why booting with acpi=off is required when using IPMI, iLO,
or other integrated power management solutions.
If booting with acpi=off, does the problem persist?
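For reference, enabling that just means appending acpi=off to the kernel line
in /boot/grub/grub.conf (the kernel version and root device below are
placeholders):

  kernel /vmlinuz-2.6.9-34.ELsmp ro root=/dev/VolGroup00/LogVol00 acpi=off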
> It looks like fence_ipmilan needs to support sending a cycle instead of a poweroff than a poweron?
The reason fence_ipmilan functions this way (off, status, on) is that
we require confirmation that the node has lost power. I am not
sure that it is possible to confirm the node has rebooted using IPMI.
Arguably, it also might not be necessary to make such a confirmation in
this particular case.
> According to fence_ipmilan.c it looks like cycle is not an option although it is an option for ipmitool. (ipmitool -H -U -P chassis power cycle)
Looks like you're on the right track.
-- Lon
From nicholas at fiocruz.br Thu Aug 3 20:27:03 2006
From: nicholas at fiocruz.br (Nicholas Anderson)
Date: Thu, 03 Aug 2006 17:27:03 -0300
Subject: [Linux-cluster] Re: E-Mail Cluster
In-Reply-To: <44D126D4.2080303@obsidian.co.za>
References: <44CE15B1.9010603@fiocruz.br> <44D0F263.509@fiocruz.br>
<44D126D4.2080303@obsidian.co.za>
Message-ID: <44D25C17.8000004@fiocruz.br>
Hi again all .....
I guess i'm starting to understand how the things should work ....
I was reading about GFS and all the documents that i found suppose that
you have a storage with a SAN and 2 or more machines connected through
FC to the SAN.
Well, it seems to me that in this case the storage or the SAN switch
is still a single point of failure, right? If the storage or SAN
goes down, the whole service will be offline, right?
I thought that with GFS i could do something like a "Parallel FS" where
2 (or more) machines would have the same data in their disks, but this
data would be synchronized in realtime ....
Am I totally a noob, or is there really a way to make FSs work in
parallel, synchronizing in realtime?
I'd like to do this without having a SAN (cause i don't have one :-) and
i have only 1 storage ) and without leaving a single-point-of-failure.
Let me try to explain exactly what I'm thinking ...
3 servers, each one with a 300GB SCSI disk (local, no FC) to be
synchronized with the others (through GFS?? mounted and shared as a
/data f.ex.), and one 36GB disk only for the OS.
All the servers would have smtp(sendmail with spamassassin and clamav),
imap and pop3 services running, and probably a squirrelmail.
Is it possible to do this? Is it possible to get this data synchronized
in realtime ?
Thanks again for your really really important answers, and sorry for
asking so many noob questions :-)
Nick
Riaan van Niekerk wrote:
>
> We are running 700 000 users on a 2.5 GFS, 4 nodes, with POP, IMAP
> (direct access and SquirrelMail) and SMTP. To make things worse, we
> use NFS between our GFS nodes and our mail servers.
>
> We initially had huge performance problems in our setup, which I wrote
> in this message:
> http://www.redhat.com/archives/linux-cluster/2006-July/msg00136.html
>
> We ended up bumping the spindle count from 36 to 60 and then to 114,
> without it making a noticeable difference.
>
> Our main killer was Squirrelmail over IMAP (the solution is primarily
> a webmail-based one)
> Our performance problems were solved by the following:
> - removing the folder-size plugin (built-in) and mail quota plugin
> (3rd party) reduced the traffic between IMAP servers and storage
> backend by 40%.
> - Implement imap proxy (www.imapproxy.org). This is giving us a 1 to
> 14 hit ratio. This storage which could not keep up previously, is now
> humming along fine.
>
> Our initial mistake was to try and optimise on the FS layer (there
> weren't any real performance optimizations in our setup to be made) and
> throw hardware at the problem, instead of suspecting and optimizing
> our application. Despite GFS not being designed for lots of small
> files, and not recommended for use with NFS, with the above changes,
> it performs more than adequately. We hope to see another performance
> gain once we get rid of the NFS and have our mail servers access the
> GFS directly.
>
> Riaan
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
--
Nicholas Anderson
Administrador de Sistemas Unix
LPIC-1 Certified
Rede Fiocruz
e-mail: nicholas at fiocruz.br
From rainer at ultra-secure.de Thu Aug 3 22:53:51 2006
From: rainer at ultra-secure.de (Rainer Duffner)
Date: Fri, 04 Aug 2006 00:53:51 +0200
Subject: [Linux-cluster] Re: E-Mail Cluster
In-Reply-To: <44D25C17.8000004@fiocruz.br>
References: <44CE15B1.9010603@fiocruz.br> <44D0F263.509@fiocruz.br> <44D126D4.2080303@obsidian.co.za>
<44D25C17.8000004@fiocruz.br>
Message-ID: <44D27E7F.6000706@ultra-secure.de>
Nicholas Anderson wrote:
> Hi again all .....
>
> I guess i'm starting to understand how the things should work ....
>
> I was reading about GFS and all the documents that i found suppose
> that you have a storage with a SAN and 2 or more machines connected
> through FC to the SAN.
> Well, it seems to me that in this case the storage or the SAN switch
> still being one single-point-of-failure right? If the storage or SAN
> goes down, the whole service will be offline right ?
First of all, you (should) have redundant FC switches (multipathing).
Then, your storage has (should have) multiple controllers, e.g. the HP EVA
series.
If that isn't enough, there are solutions to mirror the storage at the
hardware level.
Usually, this is in the
"if-you-have-to-ask-it's-probably-too-expensive-for-you-anyway" price range
and thus only used where the (lack of) downtime is worth the investment.
>
> I thought that with GFS i could do something like a "Parallel FS"
> where 2 (or more) machines would have the same data in their disks,
> but this data would be synchronized in realtime ....
> am i totally noob or there really has a way to make FS's work in
> parallel, synchronizing in realtime?
> I'd like to do this without having a SAN (cause i don't have one :-)
> and i have only 1 storage ) and without leaving a
> single-point-of-failure.
>
> Let me try to explain exactly what I'm thinking ...
>
> 3 servers, each one with a 300GB SCSI disk (local, no FC) to be
> synchronized with the others (through GFS?? mounted and shared as a
> /data f.ex.), and one 36GB disk only for the SO.
> All the servers would have smtp(sendmail with spamassassin and
> clamav), imap and pop3 services running, and probably a squirrelmail.
>
You can have a master/slave solution with DRBD.
> Is it possible to do this? Is it possible to get this data
> synchronized in realtime ?
I don't think so.
Well, Google has sort-of a solution via their "Google Filesystem". But
not for you or me. :-(
>
> Thanks again for your really really important answers, and sorry for
> asking so much noob questions :-)
>
IMO, hardware is very reliable these days (if you choose wisely). Things
like DRBD seem (to me) only useful in very special cases - and I would
fear that DRBD might create more problems than it solves.
In your special case (email), if you can't afford a SAN, get a used
NetApp and store the maildirs there (qmail-style maildirs). Then
NFS-mount them on the "cluster-nodes".
The NetApp is reliable enough for these scenarios and depending on the
exact model, already contains a lot of redundancy in itself.
cheers,
Rainer
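As a rough illustration of that NFS-mount step (filer name, export path, mount point and options are only an example), each node's /etc/fstab would carry a line along these lines:

  filer01:/vol/maildirs  /var/spool/maildirs  nfs  rw,hard,intr,tcp,rsize=32768,wsize=32768  0 0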
From nanfang.xun at sunnexchina.com Fri Aug 4 00:37:21 2006
From: nanfang.xun at sunnexchina.com (Nanfang.Xun)
Date: Fri, 04 Aug 2006 08:37:21 +0800
Subject: [Linux-cluster] Linux-cluster mailing list submissions
Message-ID: <1154651841.3512.20.camel@ns.xunting.net>
From yfttyfs at gmail.com Fri Aug 4 02:32:07 2006
From: yfttyfs at gmail.com (y f)
Date: Fri, 4 Aug 2006 10:32:07 +0800
Subject: [Linux-cluster] Linux-cluster mailing list submissions
In-Reply-To: <1154651841.3512.20.camel@ns.xunting.net>
References: <1154651841.3512.20.camel@ns.xunting.net>
Message-ID: <78fcc84a0608031932j7522df50wf6cd36b28a81ff67@mail.gmail.com>
Hi, Xun,
Do you also like Cluster as a guy of metal products company ?
Wish you a nice day !
/yf
On 8/4/06, Nanfang.Xun wrote:
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From nicholas at fiocruz.br Fri Aug 4 02:34:30 2006
From: nicholas at fiocruz.br (Nicholas Anderson)
Date: Thu, 3 Aug 2006 23:34:30 -0300 (BRT)
Subject: [Linux-cluster] Re: E-Mail Cluster
In-Reply-To: <44D27E7F.6000706@ultra-secure.de>
References: <44CE15B1.9010603@fiocruz.br> <44D0F263.509@fiocruz.br> <44D126D4.2080303@obsidian.co.za>
<44D25C17.8000004@fiocruz.br> <44D27E7F.6000706@ultra-secure.de>
Message-ID: <61582.201.51.123.23.1154658870.squirrel@www.redefiocruz.fiocruz.br>
> First of all, you (should) have redundant FC-switches (mulipathing).
> Then, your storage has (should have) multiple controllers. Eg. HP EVA
> series.
> If that isn't enough, there are solution to mirror the storage at the
> hardware-level.
> Usually, this is in the
> "if-you-have-to-ask-it's-probably-too-expensive-for-you-anyway"-pricerange
> and thus only used where the (lack of) downtime is worth the investment.
oops, money is the problem :-P
I work for a government institution ..... in Brazil :-P
> IMO, hardware is very reliable these days (if you choose wisely). Things
> like DRBD seem (to me) only useful in very special cases - and I would
> fear that DRBD might create more problems than it solves.
> In your special case (email), if you can't afford a SAN, get a used
> NetApp and store the maildirs there (qmail-style maildirs). Then
> NFS-mount them on the "cluster-nodes".
> The NetApp is reliable enough for these scenarios and depending on the
> exact model, already contains a lot of redundancy in itself.
I already thought about this ..... it's a possibility ....
Thanks for the answer ...
cheers
Nick
--
Nicholas Anderson
Administrador de Sistemas Unix
LPIC-1 Certified
Rede Fiocruz
e-mail: nicholas at fiocruz.br
From Leonardo.Mello at planejamento.gov.br Fri Aug 4 11:31:01 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Fri, 4 Aug 2006 08:31:01 -0300
Subject: [Linux-cluster] gfs support for extended security attributes
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B5E@corp-bsa-mp01.planejamento.gov.br>
GFS doesn't support SELinux attributes; currently you MUST DISABLE SELinux to use GFS + Cluster Suite.
I don't know if there is any plan to support it. Maybe a developer or someone from Red Hat can answer you better. :-D
Best Regards
Leonardo Rodrigues de Mello
-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of David Caplan
Sent: Fri 21/7/2006 17:16
To: linux-cluster at redhat.com
Cc:
Subject: [Linux-cluster] gfs support for extended security attributes
Does the current release of GFS support extended security attributes for
use with SELinux? If not, are there any plans for support?
Thanks,
David
--
__________________________________
David Caplan 410 290 1411 x105
dac at tresys.com
Tresys Technology, LLC
8840 Stanford Blvd., Suite 2100
Columbia, MD 21045
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From kanderso at redhat.com Fri Aug 4 14:12:49 2006
From: kanderso at redhat.com (Kevin Anderson)
Date: Fri, 04 Aug 2006 09:12:49 -0500
Subject: [Linux-cluster] gfs support for extended security attributes
In-Reply-To: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B5E@corp-bsa-mp01.planejamento.gov.br>
References: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B5E@corp-bsa-mp01.planejamento.gov.br>
Message-ID: <1154700769.2783.3.camel@dhcp80-204.msp.redhat.com>
The gfs-kernel code in the HEAD of the cvs tree now has SELinux extended
attribute support integrated into the code. The upstream gfs2
code in the -mm kernel has SELinux support as well.
The gfs code in HEAD is targeted at the Fedora Core 6 and RHEL5
releases.
Kevin
On Fri, 2006-08-04 at 08:31 -0300, Leonardo Rodrigues de Mello wrote:
> The gfs doesn't support SELinux attributes, currently you MUST DISABLE SELinux to use GFS+Cluster Suite.
>
> I don't know if there is any plan to support it. maybe one developer or someone from redhat can anwser you better. :-D
>
> Best Regards
> Leonardo Rodrigues de Mello
>
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com on behalf of David Caplan
> Sent: sex 21/7/2006 17:16
> To: linux-cluster at redhat.com
> Cc:
> Subject: [Linux-cluster] gfs support for extended security attributes
>
> Does the current release of GFS support extended security attributes for
> use with SELinux? If not, are there any plans for support?
>
> Thanks,
> David
>
> --
> __________________________________
>
> David Caplan 410 290 1411 x105
> dac at tresys.com
> Tresys Technology, LLC
> 8840 Stanford Blvd., Suite 2100
> Columbia, MD 21045
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From gforte at leopard.us.udel.edu Fri Aug 4 14:32:25 2006
From: gforte at leopard.us.udel.edu (Greg Forte)
Date: Fri, 04 Aug 2006 10:32:25 -0400
Subject: [Linux-cluster] what causes "magma send einval to ..."?
Message-ID: <44D35A79.700@leopard.us.udel.edu>
I had a cluster node chugging along seemingly fine last night, then the
following two lines appear in /var/log/messages:
Aug 3 22:20:07 hostname kernel: al to 1
Aug 3 22:20:07 hostname kernel: Magma send einval to 1
And about 20 seconds later the other node fenced this one.
I'm guessing that that fragmented message means that there's some sort
of kernel flakiness going on, or that the box got overloaded (no way to
tell, unfortunately - any recommendations on monitoring tools to track
and log load level?), but that's just a guess.
-g
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE
From lhh at redhat.com Fri Aug 4 15:11:12 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 04 Aug 2006 11:11:12 -0400
Subject: [Linux-cluster] CS4 Update 2 / is this problem fix more recent
update ?
In-Reply-To: <44D3251A.5080001@bull.net>
References: <44D3251A.5080001@bull.net>
Message-ID: <1154704272.28677.90.camel@ayanami.boston.redhat.com>
On Fri, 2006-08-04 at 12:44 +0200, Alain Moulle wrote:
> Hi Ron,
>
> could you provide me the defects numbers and/or linked patches ?
Here's the current list of pending fixes:
http://bugzilla.redhat.com/bugzilla/buglist.cgi?component=rgmanager&bug_status=MODIFIED&bug_status=FAILS_QA&bug_status=ON_QA
The patch for internal self-monitoring was simply a backport from the
HEAD branch. I've attached a hand-edited patch which should enable the
self-monitoring bit.
Additionally, there was a segfault fixed in U3. Here's the errata
advisory, which contains links to bugzillas:
https://rhn.redhat.com/errata/RHBA-2006-0241.html
-- Lon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: watchdog.diff
Type: text/x-patch
Size: 4064 bytes
Desc: not available
URL:
From raycharles_man at yahoo.com Fri Aug 4 15:43:41 2006
From: raycharles_man at yahoo.com (Ray Charles)
Date: Fri, 4 Aug 2006 08:43:41 -0700 (PDT)
Subject: [Linux-cluster] Logging for cluster errors.
In-Reply-To: <44D22D12.2060501@redhat.com>
Message-ID: <20060804154341.47391.qmail@web32114.mail.mud.yahoo.com>
Yes,
The error pops up, say, when I want to disable a service
and re-enable it. At the moment it's a failed service, so
when I go to disable it in the GUI I get the error and
it directs me to check the logs. The error is not
explicit at all, just "ERROR" and a directive to check
the logs.
-Ray
--- James Parsons wrote:
> Robert Peterson wrote:
>
> > Ray Charles wrote:
> >
> >>
> >> Hi,
> >>
> >> Easy question.
> >>
> >> When I run system-config-cluster I am able to see
> the
> >> gui. But in the event there's an error while
> using
> >> the gui where does that get logged?
> >> I've seen an error from the gui and it says check
> >> logging. I didn't see a /var/log/cluster/
> >>
> >> -TIA
> >>
> >
> > Hi Ray,
> >
> > Usually the messages go into /var/log/messages.
> > Many of the messages can be redirected to other
> places by
> > changing the cluster.conf file, so the code won't
> tell you
> > specifically where to look.
> >
> > Regards,
> >
> > Bob peterson
> > Red Hat Cluster Suite
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> >
>
https://www.redhat.com/mailman/listinfo/linux-cluster
>
> Ray,
>
> What is the nature of the error that you are seeing?
>
> -J
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
>
https://www.redhat.com/mailman/listinfo/linux-cluster
>
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
From Leonardo.Mello at planejamento.gov.br Fri Aug 4 17:20:48 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Fri, 4 Aug 2006 14:20:48 -0300
Subject: [Linux-cluster] what causes "magma send einval to ..."?
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B61@corp-bsa-mp01.planejamento.gov.br>
There were some discussions about this on the list in October 2005.
http://www.google.com/search?q=%22Magma+send+einval+to%22&hl=en&lr=&filter=0
One entry in Bugzilla related to this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=169693
Best Regards
Leonardo Rodrigues de Mello
-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Greg Forte
Sent: Fri 4/8/2006 11:32
To: linux clustering
Cc:
Subject: [Linux-cluster] what causes "magma send einval to ..."?
I had a cluster node chugging along seemingly fine last night, then the
following two lines appear in /var/log/messages:
Aug 3 22:20:07 hostname kernel: al to 1
Aug 3 22:20:07 hostname kernel: Magma send einval to 1
And about 20 seconds later the other node fenced this one.
I'm guessing that that fragmented message means that there's some sort
of kernel flakiness going on, or that the box got overloaded (no way to
tell, unfortunately - any recommendations on monitoring tools to track
and log load level?), but that's just a guess.
-g
Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From rohara at redhat.com Fri Aug 4 21:57:16 2006
From: rohara at redhat.com (Ryan O'Hara)
Date: Fri, 04 Aug 2006 16:57:16 -0500
Subject: [Linux-cluster] gfs support for extended security attributes
In-Reply-To: <6FE441CD9F0C0C479F2D88F959B01588298BCA@exchange.columbia.tresys.com>
References: <6FE441CD9F0C0C479F2D88F959B01588298BCA@exchange.columbia.tresys.com>
Message-ID: <44D3C2BC.2070904@redhat.com>
David,
Sorry for the delay.
The current release of GFS (in RHEL3 and RHEL4) does not support SELinux
extended attributes.
The code for GFS(1) in cvs HEAD does have support for SELinux. I added
this code recently. This should make its way into our released version
of GFS in the near future.
GFS2, which is currently in development and being pushed upstream, also
has SELinux extended attribute support.
So to answer your questions... No, our current release does not support
SELinux. Yes, we do plan to support it, and the code is in place.
Note that anyone who wants to try using GFS/GFS2 with SELinux
attributes may need to make relevant changes to the policy. With that
said, I do know for certain that the Rawhide packages have a policy
that defines gfs and gfs2 as supported filesystems.
Ryan
David Caplan wrote:
>
> Does the current release of GFS support extended security attributes for
> use with SELinux? If not, are there any plans for support?
>
> Thanks,
> David
>
> --
> __________________________________
>
> David Caplan 410 290 1411 x105
> dac at tresys.com
> Tresys Technology, LLC
> 8840 Stanford Blvd., Suite 2100
> Columbia, MD 21045
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
From riaan at obsidian.co.za Sat Aug 5 22:19:56 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Sun, 06 Aug 2006 00:19:56 +0200
Subject: [Linux-cluster] 2-node fencing question
In-Reply-To: <1154633146.28677.70.camel@ayanami.boston.redhat.com>
References: <080220061550.6837.44D0C9B800021AD200001AB522007347489B9C0A99020E0B@comcast.net>
<1154633146.28677.70.camel@ayanami.boston.redhat.com>
Message-ID: <44D5198C.2090603@obsidian.co.za>
> However, it could very well be that IPMI hardware modules are slow
> enough at processing requests that this could pose a problem. What
> hardware has this happened on? Was ACPI disabled on boot in the host OS
> (it should be; see below)?
>
>
snip
>
> The time window for (c) increases significantly (5+ seconds) if the
> cluster nodes are enabling ACPI power events on boot. This is one of
> the reasons why booting with acpi=off is required when using IPMI, iLO,
> or other integrated power management solutions.
>
> If booting with acpi=off, does the problem persist?
>
Lon - is the requirement for disabling acpi when using integrated fence
devices documented anywhere?
I have searched far and wide on the nature of acpi=off (whether it is good or
bad, recommended by Red Hat or anyone out there). Yours is the strongest
argument against enabling ACPI that I have found, but not for reasons I would
have expected.
My impression of acpi=off is that it borders on a magical cure-all for
boot/installation problems (in part due to bad ACPI implementations by
server/firmware vendors), but that it also acts as some kind of safe mode
(e.g. HT is disabled, IRQ routing changes, etc.), which may have an adverse
effect on system performance.
Are you aware of any negative effects, performance or otherwise, which
acpi=off will cause? E.g. if the only adverse effect of acpi=off is
hyperthreading being disabled, users who want it back can do so using acpi=ht.
Riaan
note: IMHO, a Knowledge Base article on the use of acpi=off (and its
variants), both for general RHEL installations and pertaining to RHCS/GFS
implementations, would be very welcome.
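For reference, a minimal sketch of how acpi=off is usually passed on a RHEL 4 box using GRUB (kernel version and root device below are placeholders; acpi=ht would go in the same place if only hyperthreading should be preserved), in /boot/grub/grub.conf:

  kernel /vmlinuz-2.6.9-42.ELsmp ro root=/dev/VolGroup00/LogVol00 acpi=off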
From riaan at obsidian.co.za Sat Aug 5 22:53:13 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Sun, 06 Aug 2006 00:53:13 +0200
Subject: [Linux-cluster] CS4 Update 2 / is this problem fix more recent
update ?
In-Reply-To: <1154704272.28677.90.camel@ayanami.boston.redhat.com>
References: <44D3251A.5080001@bull.net>
<1154704272.28677.90.camel@ayanami.boston.redhat.com>
Message-ID: <44D52159.8090004@obsidian.co.za>
Lon Hohberger wrote:
> On Fri, 2006-08-04 at 12:44 +0200, Alain Moulle wrote:
>> Hi Ron,
>>
>> could you provide me the defects numbers and/or linked patches ?
>
> Here's the current list of pending fixes:
>
> http://bugzilla.redhat.com/bugzilla/buglist.cgi?component=rgmanager&bug_status=MODIFIED&bug_status=FAILS_QA&bug_status=ON_QA
>
Lon
With RHEL 4 update 4 just around the corner, what is the planned release
schedule for RHCS 4 update 4 / GFS 6.1 update 4? Since these are not
even in beta yet, does that mean that CS/GFS customers will have to wait
for the CS/GFS versions of update 4 before they can go to RHEL 4 update 4?
tnx
Riaan
From f.hackenberger at mediatransfer.de Mon Aug 7 07:16:56 2006
From: f.hackenberger at mediatransfer.de (Falk Hackenberger - MediaTransfer AG Netresearch & Consulting)
Date: Mon, 07 Aug 2006 09:16:56 +0200
Subject: [Linux-cluster] clurgmgrd stops service without reason
In-Reply-To: <1154630324.28677.21.camel@ayanami.boston.redhat.com>
References: <44D08673.3010207@mediatransfer.com>
<1154630324.28677.21.camel@ayanami.boston.redhat.com>
Message-ID: <44D6E8E8.4090903@mediatransfer.de>
Lon Hohberger wrote:
> On Wed, 2006-08-02 at 13:03 +0200, Falk Hackenberger - MediaTransfer AG
> Netresearch & Consulting wrote:
>
>>--snip--
>>Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing
>>/exports/imap/checkimapstartup.sh status
>>Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing
>>/exports/subversion/etc/rc.d/init.d/svnserver status
>>Aug 1 17:31:28 kain clurgmgrd: [4780]: Checking 192.168.0.223,
>>Level 0
>>Aug 1 17:31:28 kain clurgmgrd: [4780]: 192.168.0.223 present on
>>eth0
>>Aug 1 17:31:28 kain clurgmgrd: [4780]: Link for eth0: Detected
>>Aug 1 17:31:28 kain clurgmgrd: [4780]: Link detected on eth0
>>Aug 1 17:31:37 kain clurgmgrd[4780]: Stopping service storage
>>--snap--
>>
>>how to say to clurgmgrd, that he should log the reason for stoping the
>>service?
>
> Something must be returning an error code where it should not be; can
> you post your service XML blob?
it is very long and a little bit complex, as I know... ;-)
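One low-tech way to narrow this down (just a sketch; the script paths are taken from the log excerpt above) is to run each status action by hand and check its exit code, since rgmanager treats any non-zero return from a status check as a failure and will stop or recover the service:

  sh /exports/imap/checkimapstartup.sh status; echo "imap status exit=$?"
  sh /exports/subversion/etc/rc.d/init.d/svnserver status; echo "svn status exit=$?"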
From neohill at gmail.com Mon Aug 7 07:50:07 2006
From: neohill at gmail.com (Neo Hill)
Date: Mon, 7 Aug 2006 09:50:07 +0200
Subject: [Linux-cluster] DRBD in Active-active mode
Message-ID:
Hi everybody,
I am still looking for information or documents regarding DRBD in
active-active mode.
Could anyone help me?
Thanks a lot.
Neo hill
From Alain.Moulle at bull.net Mon Aug 7 09:29:33 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 07 Aug 2006 11:29:33 +0200
Subject: [Linux-cluster] CS4 Update 2 / is this problem fix more recent
update ?
In-Reply-To: <1154704272.28677.90.camel@ayanami.boston.redhat.com>
References: <44D3251A.5080001@bull.net>
<1154704272.28677.90.camel@ayanami.boston.redhat.com>
Message-ID: <44D707FD.6080602@bull.net>
Hi Lon
I've tried to patch the U2 version with this patch, but it requires
a nodeevent.c which apparently did not exist in CS4 U2 (that Makefile patch
adds a nodeevent.o as well as the watchdog.o).
Does that mean that this patch definitely cannot be applied
to rgmanager (1.9.38) from CS4 U2?
Thanks
Alain Moullé
Lon Hohberger wrote:
> On Fri, 2006-08-04 at 12:44 +0200, Alain Moulle wrote:
>
>>Hi Ron,
>>
>>could you provide me the defects numbers and/or linked patches ?
>
>
> Here's the current list of pending fixes:
>
> http://bugzilla.redhat.com/bugzilla/buglist.cgi?component=rgmanager&bug_status=MODIFIED&bug_status=FAILS_QA&bug_status=ON_QA
>
> The patch for internal self-monitoring was simply a backport from the
> HEAD branch. I've attached a hand-edited patch which should enable the
> self-monitoring bit.
>
> Additionally, there was a segfault fixed in U3. Here's the errata
> advisory, which contains links to bugzillas:
>
> https://rhn.redhat.com/errata/RHBA-2006-0241.html
>
> -- Lon
From joe.devman at yahoo.fr Mon Aug 7 12:08:00 2006
From: joe.devman at yahoo.fr (Joe)
Date: Mon, 07 Aug 2006 14:08:00 +0200
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44CF94FE.3070407@redhat.com>
References: <44CF2F94.4000003@framestore-cfc.com>
<44CF8383.3040208@redhat.com> <44CF822D.7070705@framestore-cfc.com>
<44CF94FE.3070407@redhat.com>
Message-ID: <44D72D20.2060705@yahoo.fr>
Robert Peterson wrote:
> We've tried to kick around ideas on how to improve the speed, such as
> (1) adding an option to only focus on areas where the journals are dirty,
> (2) introducing multiple threads to process the different RGs, and even
> (3) trying to get multiple nodes in the cluster to team up and do
> different
> areas of the file system. None of these have been implemented yet
> because of higher priorities. Since this is an open-source project,
> anyone
> could step in and do these. Volunteers?
>
I've tried to look at the code many times. But, as a clustered file
system is a complex thing, it gets hard to figure out what it's all
about. I tried to find some "big picture" documentation, at least for
the on-disk layout. The nearest thing I've found is
http://opengfs.sourceforge.net/docs.php , which is the documentation
written at the time OpenGFS forked from Sistina's code. Although the
principles may still be the same (or not?), the code has obviously
changed and the on-disk layout may not be the same either.
So, is there some sort of documentation about the principles behind
GFS (not a design doc; I've read
/usr/src/linux/Documentation/stable_api_nonsense.txt)? It would much
help anybody who wishes to dive into the code to do so more efficiently...
Thanks !
From lhh at redhat.com Mon Aug 7 14:27:27 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 07 Aug 2006 10:27:27 -0400
Subject: [Linux-cluster] DRBD in Active-active mode
In-Reply-To:
References:
Message-ID: <1154960847.21204.35.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-07 at 09:50 +0200, Neo Hill wrote:
> Hi everybody,
>
> I am still looking on information or documents regarding DRBD in
> active-active mode.
>
> Does anyone could help me ?
>
> Thanks a lot.
Fairly certain this is not possible, unless something has changed
recently. That is, you cannot use DRBD as a distributed concurrently
writable mirror; only one node can be the master of a DRBD device at a
time.
You can do this with GNBD + Cluster Mirroring, though.
-- Lon
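In case it helps, a very rough sketch of the GNBD side (device, export name and hostname are placeholders): each storage node exports a local block device, the other nodes import it, and the imported devices can then be mirrored with CLVM:

  gnbd_serv                                # start the GNBD server daemon on the exporting node
  gnbd_export -d /dev/sdb1 -e mail_disk1   # export the local disk under the name mail_disk1
  gnbd_import -i storage-node1             # on the importing nodes: fetch all exports from that host
  # the device then appears as /dev/gnbd/mail_disk1 and can be used as a CLVM physical volume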
From lhh at redhat.com Mon Aug 7 14:30:41 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 07 Aug 2006 10:30:41 -0400
Subject: [Linux-cluster] CS4 Update 2 / is this problem fix more recent
update ?
In-Reply-To: <44D707FD.6080602@bull.net>
References: <44D3251A.5080001@bull.net>
<1154704272.28677.90.camel@ayanami.boston.redhat.com>
<44D707FD.6080602@bull.net>
Message-ID: <1154961041.21204.40.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-07 at 11:29 +0200, Alain Moulle wrote:
> Hi Lon
>
> I've tried to patch the U2 version with this patch, but it requires
> a nodeevent.c which apparently did not exist in CS4 U2 (that Makefile patch
> adds a nodeevent.o as well as the watchdog.o) .
> Does that mean that this patch can definetly not be applied
> on rgmanager (1.9.38) from CS4 U2 ?
Take it out of the patched Makefile. Nodeevent.c shouldn't be required
to make the watchdog work.
-- Lon
From chawkins at bplinux.com Mon Aug 7 14:33:31 2006
From: chawkins at bplinux.com (Christopher Hawkins)
Date: Mon, 7 Aug 2006 10:33:31 -0400
Subject: [Linux-cluster] DRBD in Active-active mode
In-Reply-To: <1154960847.21204.35.camel@ayanami.boston.redhat.com>
Message-ID: <200608071418.k77EIq1X000664@mail2.ontariocreditcorp.com>
On Mon, 2006-08-07 at 09:50 +0200, Neo Hill wrote:
>> Hi everybody,
>>
>> I am still looking on information or documents regarding DRBD in
>> active-active mode.
>>
>> Does anyone could help me ?
>>
>> Thanks a lot.
>Fairly certain this is not possible, unless something has changed recently.
>That is, you cannot use DRBD as adistributed concurrently writable mirror;
>only one node can be the master of a DRBD device at a time.
>You can do this with GNBD + Cluster Mirroring, though.
>-- Lon
Lon,
GNBD + Cluster Mirroring? Are you referring to clvm2, or is there another
package out there I haven't heard of?
Thanks,
Chris
From riaan at obsidian.co.za Mon Aug 7 15:03:32 2006
From: riaan at obsidian.co.za (Riaan van Niekerk)
Date: Mon, 07 Aug 2006 17:03:32 +0200
Subject: [Linux-cluster] DRBD in Active-active mode
In-Reply-To: <1154960847.21204.35.camel@ayanami.boston.redhat.com>
References:
<1154960847.21204.35.camel@ayanami.boston.redhat.com>
Message-ID: <44D75644.8000901@obsidian.co.za>
Lon Hohberger wrote:
> On Mon, 2006-08-07 at 09:50 +0200, Neo Hill wrote:
>> Hi everybody,
>>
>> I am still looking on information or documents regarding DRBD in
>> active-active mode.
>>
>> Does anyone could help me ?
>>
>> Thanks a lot.
>
> Fairly certain this is not possible, unless something has changed
> recently. That is, you cannot use DRBD as a distributed concurrently
> writable mirror; only one node can be the master of a DRBD device at a
> time.
>
> You can do this with GNBD + Cluster Mirroring, though.
>
> -- Lon
According to the DRBD FAQ:
http://www.linux-ha.org/DRBD/FAQ#head-ec4ab5a57e15232e9ac4e12775de5a1b328aeff5
Why does DRBD not allow concurrent access from all nodes? I'd like to
use it with GFS/OCFS2...
Actually, DRBD v8 (which is still in pre-release state at the
time of this writing) supports this. You need to net {
allow-two-primaries; } ...
I have not tried this myself though, but would like to hear the
experiences of anyone who has tried this.
Riaan
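For anyone curious what that looks like in practice, a minimal sketch of a DRBD 0.8 resource with the option enabled (resource name, hostnames, devices and addresses are placeholders):

  resource r0 {
      net {
          allow-two-primaries;   # both nodes may be primary at once (needed for GFS/OCFS2)
      }
      on nodeA {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.10.1:7788;
          meta-disk internal;
      }
      on nodeB {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   192.168.10.2:7788;
          meta-disk internal;
      }
  }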
From Alain.Moulle at bull.net Mon Aug 7 15:07:40 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Mon, 07 Aug 2006 17:07:40 +0200
Subject: [Linux-cluster] CS4 Update 2 / is this problem fix more recent
update ?
In-Reply-To: <1154961041.21204.40.camel@ayanami.boston.redhat.com>
References: <44D3251A.5080001@bull.net>
<1154704272.28677.90.camel@ayanami.boston.redhat.com>
<44D707FD.6080602@bull.net>
<1154961041.21204.40.camel@ayanami.boston.redhat.com>
Message-ID: <44D7573C.5020209@bull.net>
Lon Hohberger wrote:
> On Mon, 2006-08-07 at 11:29 +0200, Alain Moulle wrote:
>
>>Hi Lon
>>
>>I've tried to patch the U2 version with this patch, but it requires
>>a nodeevent.c which apparently did not exist in CS4 U2 (that Makefile patch
>>adds a nodeevent.o as well as the watchdog.o) .
>>Does that mean that this patch can definetly not be applied
>>on rgmanager (1.9.38) from CS4 U2 ?
>
>
> Take it out of the patched Makefile. Nodeevent.c shouldn't be required
> to make the watchdog work.
>
> -- Lon
Build ok. Thanks.
Could you explain exactly the benefit of this watchdog work ?
Thanks
Alain
>
>
From rpeterso at redhat.com Mon Aug 7 15:55:10 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Mon, 07 Aug 2006 10:55:10 -0500
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44D72D20.2060705@yahoo.fr>
References: <44CF2F94.4000003@framestore-cfc.com> <44CF8383.3040208@redhat.com> <44CF822D.7070705@framestore-cfc.com> <44CF94FE.3070407@redhat.com>
<44D72D20.2060705@yahoo.fr>
Message-ID: <44D7625E.9090305@redhat.com>
Joe wrote:
> I've tried to look at the code many times. But, as a clustered file
> system is a complex thing, it gets hard to figure out what it's all
> about. I tried to find a "big picture" documentation, at least for
> on-disk layout. The only nearest thing i've found is :
> http://opengfs.sourceforge.net/docs.php , which is the documentation
> written at the time OpenGFS forked from Cistina's code. Although
> principles may still be the sames (or not ?), the code has obviously
> changed and on-disk layout may not be the same, too.
> So, is there some sort of documentation about the principles found in
> GFS (not a design doc, i've read
> /usr/src/linux/Documentation/stable_api_nonsense.txt) ? This would
> much help anybody who wishes to enter the code to do it more
> efficientely...
>
> Thanks !
Hi Joe,
I agree that there isn't much good design information out there
regarding GFS.
That might be because it started out as a proprietary product before Red
Hat open-sourced it.
There are some comments in the kernel's gfs_ondisk.h include:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/gfs_ondisk.h?cvsroot=cluster
Perhaps I'll start working on a GFS1/2 design whitepaper based on some
of the information I've gathered.
Regards,
Bob Peterson
Red Hat Cluster Suite
From hardyjm at potsdam.edu Mon Aug 7 18:11:06 2006
From: hardyjm at potsdam.edu (Jeff Hardy)
Date: Mon, 07 Aug 2006 14:11:06 -0400
Subject: [Linux-cluster] lvm2 liblvm2clusterlock.so on fc5
In-Reply-To: <1154022763.2789.120.camel@fritzdesk.potsdam.edu>
References: <1154022763.2789.120.camel@fritzdesk.potsdam.edu>
Message-ID: <1154974266.1797.124.camel@fritzdesk.potsdam.edu>
On Thu, 2006-07-27 at 13:52 -0400, Jeff Hardy wrote:
> I apologize if this has been answered already or appeared in release
> notes somewhere, but I cannot find it. FC4 had the lvm2-cluster package
> to provide the clvm locking library. This was removed in FC5 (as
> indicated in the release notes).
>
> Is this still necessary for a clvm setup:
>
> In /etc/lvm/lvm.conf:
> locking_type = 2
> locking_library = "/lib/liblvm2clusterlock.so"
>
> And if so, where does one find this now?
>
> Thank you.
>
>
Well, though absent in FC5, I just recently saw a message somewhere
indicating the lvm2-cluster package was back in FC6 testing. Anyone
have any idea why this was dropped for FC5? I built off of the lvm2
source rpm, using a modified lvm2-cluster spec file from FC4. Looks ok.
If anyone has reason to believe this is a really bad idea, or wants the
srpm or rpm, feel free to drop me a line.
--
Jeff Hardy
Systems Analyst
hardyjm at potsdam.edu
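For anyone following along, once the locking library is back in place the remaining steps are roughly the sketch below (the volume group name is a placeholder; this assumes the cluster infrastructure is already up):

  service clvmd start        # start the cluster LVM daemon on every node
  vgchange -c y shared_vg    # flag the volume group as clustered
  vgdisplay shared_vg        # the "Clustered" attribute should now read "yes"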
From agk at redhat.com Mon Aug 7 18:28:24 2006
From: agk at redhat.com (Alasdair G Kergon)
Date: Mon, 7 Aug 2006 19:28:24 +0100
Subject: [Linux-cluster] lvm2 liblvm2clusterlock.so on fc5
In-Reply-To: <1154974266.1797.124.camel@fritzdesk.potsdam.edu>
References: <1154022763.2789.120.camel@fritzdesk.potsdam.edu>
<1154974266.1797.124.camel@fritzdesk.potsdam.edu>
Message-ID: <20060807182824.GP18633@agk.surrey.redhat.com>
On Mon, Aug 07, 2006 at 02:11:06PM -0400, Jeff Hardy wrote:
> have any idea why this was dropped for FC5?
It got disabled early on because it wouldn't build (depended on cluster
infrastructure that wasn't there) and when that got resolved, nobody
remembered to reenable it.
As you noticed, we've got it back into fc6/devel and we're trying
to get it approved for fc5 updates.
Alasdair
--
agk at redhat.com
From Leonardo.Mello at planejamento.gov.br Mon Aug 7 19:09:00 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Mon, 7 Aug 2006 16:09:00 -0300
Subject: RES: [Linux-cluster] DRBD in Active-active mode
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B6E@corp-bsa-mp01.planejamento.gov.br>
Hi everyone,
Lon, you are right for the stable version of DRBD, version 0.7. But DRBD actually has support for an active-active setup in the development version 0.8. There are significant changes between these versions; the entire roadmap can be read at:
http://svn.drbd.org/drbd/trunk/ROADMAP
I have done some investigation and tests with DRBD 0.8 in an active-active setup with two nodes, with OCFS2 and with GFS. This was for a project I was doing related to Oracle RAC 10g.
I have produced documentation in Portuguese that shows how to set up and use DRBD in active-active mode with OCFS2. The link is:
http://guialivre.governoeletronico.gov.br/seminario/index.php/DocumentacaoTecnologiasDRBDOCFS2
I have discovered in my investigation that OCFS2 is more unstable than GFS. I had several kernel panics with OCFS2 under high load on the machine, but none with GFS.
I have the installation of GFS documented, along with a performance test I did some time ago, at:
http://guialivre.governoeletronico.gov.br/mediawiki/index.php/TestesGFS
(here I use clvm and gnbd)
The problem with DRBD is that currently you can use just two machines; if you want to use more, you need the commercial version, DRBD+.
Best Regards
Leonardo Rodrigues de Mello
-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Lon Hohberger
Sent: Mon 7/8/2006 11:27
To: linux clustering
Cc:
Subject: Re: [Linux-cluster] DRBD in Active-active mode
On Mon, 2006-08-07 at 09:50 +0200, Neo Hill wrote:
> Hi everybody,
>
> I am still looking on information or documents regarding DRBD in
> active-active mode.
>
> Does anyone could help me ?
>
> Thanks a lot.
Fairly certain this is not possible, unless something has changed
recently. That is, you cannot use DRBD as a distributed concurrently
writable mirror; only one node can be the master of a DRBD device at a
time.
You can do this with GNBD + Cluster Mirroring, though.
-- Lon
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From Leonardo.Mello at planejamento.gov.br Mon Aug 7 19:14:19 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Mon, 7 Aug 2006 16:14:19 -0300
Subject: RES: [Linux-cluster] DRBD in Active-active mode
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B6F@corp-bsa-mp01.planejamento.gov.br>
Sorry for the typos and English errors in the last message.
I believe I need more coffee.
Leonardo Rodrigues de Mello
-----Original Message-----
From: Leonardo Rodrigues de Mello on behalf of Leonardo Rodrigues de Mello
Sent: Mon 7/8/2006 16:09
To: linux clustering
Cc:
Subject: RES: [Linux-cluster] DRBD in Active-active mode
Hi everyone,
Lon you are right for the stable version of DRBD the version 0.7. But DRBD actualy has support for active-active setup in the development version 0.8. There is significant changes between this versions, the entire roadmap can be read at:
http://svn.drbd.org/drbd/trunk/ROADMAP
I have done some investigations and tests with DRBD 0.8 in active-active setup with two nodes and OCFS2 and with GFS. This was for one project i was doing related to oracle Rac 10g.
I have produced one documentation in portuguese that shows how to setup and use drbd in active-active with ocfs2. the link is:
http://guialivre.governoeletronico.gov.br/seminario/index.php/DocumentacaoTecnologiasDRBDOCFS2
I have discovered in my investigations that ocfs2 is more unstable that GFS. I have a several kernel panics with ocfs2 under high loads on the machine, but no one with GFS.
I have the instalation of GFS documented at, one performance test i have done some time ago:
http://guialivre.governoeletronico.gov.br/mediawiki/index.php/TestesGFS
(here i use clvm, and gnbd)
The problem of drbd is that actualy you can use just two machines, if you want to use more you need to use the commercial version drbd+.
Best Regards
Leonardo Rodrigues de Mello
-----Mensagem original-----
De: linux-cluster-bounces at redhat.com em nome de Lon Hohberger
Enviada: seg 7/8/2006 11:27
Para: linux clustering
Cc:
Assunto: Re: [Linux-cluster] DRBD in Active-active mode
On Mon, 2006-08-07 at 09:50 +0200, Neo Hill wrote:
> Hi everybody,
>
> I am still looking on information or documents regarding DRBD in
> active-active mode.
>
> Does anyone could help me ?
>
> Thanks a lot.
Fairly certain this is not possible, unless something has changed
recently. That is, you cannot use DRBD as a distributed concurrently
writable mirror; only one node can be the master of a DRBD device at a
time.
You can do this with GNBD + Cluster Mirroring, though.
-- Lon
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From haiwu.us at gmail.com Mon Aug 7 20:21:39 2006
From: haiwu.us at gmail.com (hai wu)
Date: Mon, 7 Aug 2006 15:21:39 -0500
Subject: [Linux-cluster] 2-node cluster and fence_drac
Message-ID:
Hi,
For a 2-node cluster (RHEL4), does it require the use of a power switch, or
would fence_drac be good enough for the setup? Would fence_drac work
properly in a 2-node cluster?
Thanks,
Hai
From brad at seatab.com Mon Aug 7 22:07:29 2006
From: brad at seatab.com (Brad Dameron)
Date: Mon, 07 Aug 2006 15:07:29 -0700
Subject: [Linux-cluster] GFS 6.1 kernel warning.
Message-ID: <1154988449.19157.20.camel@serpent.office.seatab.com>
We have two rather large servers running GFS in a production
environment and have been getting errors since the start. First, our
configuration:
2 - Quad 880 Opteron Servers with 64GB RAM
1 - Infortrend 2GB SAN
OS - SuSe 10.0 Professional (Kernel 2.6.13-15.8-smp x86_64)
The cluster network is on a GigE connection. This link is shared and used for
other purposes, but does not carry much traffic.
Here is the error message:
Aug 7 14:40:45 CServer01 kernel: GFS: fsid=Cluster01:gfs1.1: warning:
assertion "gfs_glock_is_locked_by_me(ip->i_gl)" failed
Aug 7 14:40:45 CServer01 kernel: GFS: fsid=Cluster01:gfs1.1: function
= gfs_readpage
Aug 7 14:40:45 CServer01 kernel: GFS: fsid=Cluster01:gfs1.1: file
= /usr/src/gfs/src/cluster-1.02.00/gfs-kernel/src/gfs/ops_address.c,
line = 283
Aug 7 14:40:45 CServer01 kernel: GFS: fsid=Cluster01:gfs1.1: time =
1154986845
This appears to occur when both machines try to access the same
files/directory. The warnings happen at a rate of about 10-15 an hour. Does
anyone know if this is critical, or is there a way to turn these off if they
are not an issue? There is definitely a big performance issue when using GFS
with very CPU-intense applications. When the first server is using all 8 CPU
cores for processing, the second server's IO response slows to a
crawl. Any sysctl tweaks to help improve the performance would be appreciated.
Thanks,
Brad Dameron
SeaTab Software
www.seatab.com
From Alain.Moulle at bull.net Tue Aug 8 07:53:27 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 08 Aug 2006 09:53:27 +0200
Subject: [Linux-cluster] CS4 Update 4 / two questions
Message-ID: <44D842F7.7080805@bull.net>
Hi
1/ About the return of the quorum disk functionality: is it mandatory
to configure it, or is it possible to run CS4 U4 without it as
a first step?
(this question is only to know how to manage an eventual update from U2
(currently in production without a quorum disk configured) to U4)
2/ Is there beta documentation on CS4 U4 downloadable somewhere?
Thanks
Alain
From pcaulfie at redhat.com Tue Aug 8 08:04:37 2006
From: pcaulfie at redhat.com (Patrick Caulfield)
Date: Tue, 08 Aug 2006 09:04:37 +0100
Subject: [Linux-cluster] CS4 Update 4 / two questions
In-Reply-To: <44D842F7.7080805@bull.net>
References: <44D842F7.7080805@bull.net>
Message-ID: <44D84595.3000304@redhat.com>
Alain Moulle wrote:
> Hi
>
> 1/ About the return of quorum disk functionnality : is it mandatory
> to configure it, or is it possible to run the CS4 U4 without it in
> a first step ?
>
> (this question only to know how to manage eventual update from U2 (currently in
> production without quorum disk configured) to U4 )
>
Quorum disk is completely optional, even in a two-node system.
--
patrick
From Alain.Moulle at bull.net Tue Aug 8 13:08:38 2006
From: Alain.Moulle at bull.net (Alain Moulle)
Date: Tue, 08 Aug 2006 15:08:38 +0200
Subject: [Linux-cluster] CS4 Update 4/ about __NR_gettid and syscall
Message-ID: <44D88CD6.7030800@bull.net>
Hi
In CS4 Update 4, there are several places where a syscall is
dependent on __NR_gettid being defined or not, for example in qdisk/gettid.c:
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/unistd.h>
/* Patch from Adam Conrad / Ubuntu: Don't use _syscall macro */
#ifdef __NR_gettid
pid_t gettid (void)
{
return syscall(__NR_gettid);
}
#else
#warn "gettid not available -- substituting with pthread_self()"
#include <pthread.h>
pid_t gettid (void)
{
return (pid_t)pthread_self();
}
#endif
and also in :
magma-plugins-1.0.9/gulm/gulm.c
rgmanager-1.9.52/src/clulib/gettid
And in fact, I get a compilation error if the syscall branch is chosen by the ifdef,
so I wonder what to do about that, what __NR_gettid means, etc.
Any advice?
Thanks
Alain
From Leonardo.Mello at planejamento.gov.br Wed Aug 9 15:04:58 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Wed, 9 Aug 2006 12:04:58 -0300
Subject: [Linux-cluster] cs-deploy-gfs
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B78@corp-bsa-mp01.planejamento.gov.br>
Hi everyone,
Does anyone know what happened with the development of cs-deploy-gfs?
Is the version in cvs the latest version?
Have there been any improvements since the initial version?
Has this software been abandoned?
I don't have much time, but I want to help with its development: do the things in the TODO, and other things like porting it to Debian-like systems or, better, porting it to use smartpm (http://labix.org/smart), helping with internationalization, and translation into Brazilian Portuguese.
Best Regards
Leonardo Rodrigues de Mello
From Leonardo.Mello at planejamento.gov.br Wed Aug 9 15:16:16 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Wed, 9 Aug 2006 12:16:16 -0300
Subject: cs-deploy-tool (WAS: [Linux-cluster] cs-deploy-gfs)
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B79@corp-bsa-mp01.planejamento.gov.br>
sorry,
the application name is cs-deploy-tool, not cs-deploy-gfs.
Best Regards
Leonardo Rodrigues de Mello
-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Leonardo Rodrigues de Mello
Sent: Wed 9/8/2006 12:04
To: linux-cluster at redhat.com
Cc:
Subject: [Linux-cluster] cs-deploy-gfs
Hi everyone,
Does anyone know what happened with the development of cs-deploy-gfs ?
The version in cvs is the lastest version ?
There is any improvements since the initial version ?
This software was abandoned ?
I don't have much time but I want help in the development of it, do the things in TODO, and others like porting it to systems debian-like or better, port it to use smartpm (http://labix.org/smart), help with the internacionalization, and translation to portuguese brazil.
Best Regards
Leonardo Rodrigues de Mello
From jparsons at redhat.com Wed Aug 9 15:16:41 2006
From: jparsons at redhat.com (James Parsons)
Date: Wed, 09 Aug 2006 11:16:41 -0400
Subject: cs-deploy-tool (WAS: [Linux-cluster] cs-deploy-gfs)
In-Reply-To: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B79@corp-bsa-mp01.planejamento.gov.br>
References: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B79@corp-bsa-mp01.planejamento.gov.br>
Message-ID: <44D9FC59.5050408@redhat.com>
Leonardo Rodrigues de Mello wrote:
>sorry,
>the application name is cs-deploy-tool, not cs-deploy-gfs.
>
>Best Regards
>Leonardo Rodrigues de Mello
>
>
>-----Mensagem original-----
>De: linux-cluster-bounces at redhat.com em nome de Leonardo Rodrigues de Mello
>Enviada: qua 9/8/2006 12:04
>Para: linux-cluster at redhat.com
>Cc:
>Assunto: [Linux-cluster] cs-deploy-gfs
>
>Hi everyone,
>
>Does anyone know what happened with the development of cs-deploy-gfs ?
>
>The version in cvs is the lastest version ?
>
>There is any improvements since the initial version ?
>
>This software was abandoned ?
>
>
The functionality available in cs-deploy-tool will be available in a new
management interface for clusters and storage called Conga, targeted
for RHEL5 and (hopefully) RHEL4.5.
-J
From stephen.willey at framestore-cfc.com Wed Aug 9 15:33:29 2006
From: stephen.willey at framestore-cfc.com (Stephen Willey)
Date: Wed, 09 Aug 2006 16:33:29 +0100
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44D7625E.9090305@redhat.com>
References: <44CF2F94.4000003@framestore-cfc.com> <44CF8383.3040208@redhat.com> <44CF822D.7070705@framestore-cfc.com> <44CF94FE.3070407@redhat.com> <44D72D20.2060705@yahoo.fr>
<44D7625E.9090305@redhat.com>
Message-ID: <44DA0049.8030505@framestore-cfc.com>
So ya know...
Once we'd added a 137GB swap drive, it took 48 hours to run all stages
of gfs_fsck on a 42TB filesystem without any -v options.
That was on a dual Opteron 275 (4GB RAM) with 4Gb FC to 6 SATA RAIDs in
CLVM.
--
Stephen Willey
Senior Systems Engineer, Framestore-CFC
+44 (0)207 344 8000
http://www.framestore-cfc.com
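For anyone planning a similar run, adding temporary swap before a big gfs_fsck is only a few commands; the device and logical volume paths below are placeholders:

  mkswap /dev/sdz1                 # initialise the spare disk/partition as swap
  swapon /dev/sdz1                 # enable it before the check
  gfs_fsck -y /dev/gfs_vg/gfs_lv   # run the full check (filesystem unmounted on all nodes)
  swapoff /dev/sdz1                # release the extra swap afterwards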
From lhh at redhat.com Wed Aug 9 15:34:39 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 09 Aug 2006 11:34:39 -0400
Subject: [Linux-cluster] CS4 Update 4 / two questions
In-Reply-To: <44D842F7.7080805@bull.net>
References: <44D842F7.7080805@bull.net>
Message-ID: <1155137679.21204.144.camel@ayanami.boston.redhat.com>
On Tue, 2006-08-08 at 09:53 +0200, Alain Moulle wrote:
> Hi
>
> 1/ About the return of quorum disk functionnality : is it mandatory
> to configure it,
Not required in the least.
-- Lon
From lhh at redhat.com Wed Aug 9 15:35:16 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 09 Aug 2006 11:35:16 -0400
Subject: [Linux-cluster] CS4 Update 2 / is this problem fix more recent
update ?
In-Reply-To: <44D7573C.5020209@bull.net>
References: <44D3251A.5080001@bull.net>
<1154704272.28677.90.camel@ayanami.boston.redhat.com>
<44D707FD.6080602@bull.net>
<1154961041.21204.40.camel@ayanami.boston.redhat.com>
<44D7573C.5020209@bull.net>
Message-ID: <1155137716.21204.146.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-07 at 17:07 +0200, Alain Moulle wrote:
> Build ok. Thanks.
> Could you explain exactly the benefit of this watchdog work ?
> Thanks
> Alain
If rgmanager crashes, the node gets rebooted.
-- Lon
From lhh at redhat.com Wed Aug 9 15:37:07 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 09 Aug 2006 11:37:07 -0400
Subject: RES: [Linux-cluster] DRBD in Active-active mode
In-Reply-To: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B6E@corp-bsa-mp01.planejamento.gov.br>
References: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B6E@corp-bsa-mp01.planejamento.gov.br>
Message-ID: <1155137827.21204.149.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-07 at 16:09 -0300, Leonardo Rodrigues de Mello wrote:
> Hi everyone,
>
> Lon you are right for the stable version of DRBD the version 0.7. But DRBD actualy has support for active-active setup in the development version 0.8. There is significant changes between this versions, the entire roadmap can be read at:
> http://svn.drbd.org/drbd/trunk/ROADMAP
Awesome. :)
>
> I have done some investigations and tests with DRBD 0.8 in active-active setup with two nodes and OCFS2 and with GFS. This was for one project i was doing related to oracle Rac 10g.
>
> I have produced one documentation in portuguese that shows how to setup and use drbd in active-active with ocfs2. the link is:
> http://guialivre.governoeletronico.gov.br/seminario/index.php/DocumentacaoTecnologiasDRBDOCFS2
>
> I have discovered in my investigations that ocfs2 is more unstable that GFS. I have a several kernel panics with ocfs2 under high loads on the machine, but no one with GFS.
>
> I have the instalation of GFS documented at, one performance test i have done some time ago:
> http://guialivre.governoeletronico.gov.br/mediawiki/index.php/TestesGFS
> (here i use clvm, and gnbd)
>
>
> The problem of drbd is that actualy you can use just two machines, if you want to use more you need to use the commercial version drbd+.
Wow, great information. Thanks!
-- Lon
From lhh at redhat.com Wed Aug 9 15:46:12 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 09 Aug 2006 11:46:12 -0400
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To:
References:
Message-ID: <1155138372.21204.158.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-07 at 15:21 -0500, hai wu wrote:
> Hi,
> For a 2-node cluster (RHEL4), does it require the use of power switch
> or fence_drac would be good enough for the setup? Would fence_drac
> work properly in a 2-node cluster?
> Thanks,
> Hai
fence_drac would be fine, but you need to understand that with DRAC (or
any integrated power management which receives power from the machine),
if host power is completely lost, fencing will fail - causing the
cluster to stop.
This failure is indistinguishable from DRAC + host losing network at the
same time (ex: the ethernet switch fails).
Generally, these machines have redundant power, so losing power all at
once is less likely.
So, DRAC is fine, but there are failure cases where it is less than
optimal, particularly in machines without redundant power supplies.
-- Lon
From lhh at redhat.com Wed Aug 9 15:46:38 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 09 Aug 2006 11:46:38 -0400
Subject: [Linux-cluster] cs-deploy-gfs
In-Reply-To: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B78@corp-bsa-mp01.planejamento.gov.br>
References: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B78@corp-bsa-mp01.planejamento.gov.br>
Message-ID: <1155138398.21204.160.camel@ayanami.boston.redhat.com>
On Wed, 2006-08-09 at 12:04 -0300, Leonardo Rodrigues de Mello wrote:
> Hi everyone,
>
> Does anyone know what happened with the development of cs-deploy-gfs ?
I think that it's being replaced with Conga.
-- Lon
From Leonardo.Mello at planejamento.gov.br Wed Aug 9 16:54:03 2006
From: Leonardo.Mello at planejamento.gov.br (Leonardo Rodrigues de Mello)
Date: Wed, 9 Aug 2006 13:54:03 -0300
Subject: RES: cs-deploy-tool (WAS: [Linux-cluster] cs-deploy-gfs)
Message-ID: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B7A@corp-bsa-mp01.planejamento.gov.br>
Thanks for the answers :-D
But I believe that, beyond the things Conga and cs-deploy-tool have in common, cs-deploy-tool has its place for a simple, painless installation of Cluster Suite with basic services in a network environment.
The point that counts for me is that I can go anywhere with my laptop and, just with knowledge of the IP numbers and root passwords, set up a cluster in 10 minutes or less.
To use Conga I need to configure and install zope on one machine, install the agents on the servers that will be in the cluster, and configure zope to see the agents; it's more complicated and demands more work for the simple task of cluster installation and basic initial configuration.
Conga is a great and complex initiative for managing, deploying, administering, and other things for production cluster environments. If I need to choose one tool just to deploy Cluster Suite, I will choose cs-deploy-tool.
If I need to manage and be the administrator of a cluster, of course I will need the power of Conga. :-D
This long message is just to ask: can I implement the changes I proposed in the first message? If yes, to whom should I send them?
best regards
Leonardo Rodrigues de Mello
-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of James Parsons
Sent: Wed 9/8/2006 12:16
To: linux clustering
Cc:
Subject: Re: cs-deploy-tool (WAS: [Linux-cluster] cs-deploy-gfs)
Leonardo Rodrigues de Mello wrote:
>sorry,
>the application name is cs-deploy-tool, not cs-deploy-gfs.
>
>Best Regards
>Leonardo Rodrigues de Mello
>
>
>-----Mensagem original-----
>De: linux-cluster-bounces at redhat.com em nome de Leonardo Rodrigues de Mello
>Enviada: qua 9/8/2006 12:04
>Para: linux-cluster at redhat.com
>Cc:
>Assunto: [Linux-cluster] cs-deploy-gfs
>
>Hi everyone,
>
>Does anyone know what happened with the development of cs-deploy-gfs ?
>
>The version in cvs is the lastest version ?
>
>There is any improvements since the initial version ?
>
>This software was abandoned ?
>
>
The functionality available in cs-deploy-tool will be available in a new
management interface for clusters and storage called Conga, targetted
for RHEL5 and (hopefully) RHEL4.5
-J
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From jparsons at redhat.com Wed Aug 9 18:17:44 2006
From: jparsons at redhat.com (James Parsons)
Date: Wed, 09 Aug 2006 14:17:44 -0400
Subject: RES: cs-deploy-tool (WAS: [Linux-cluster] cs-deploy-gfs)
In-Reply-To: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B7A@corp-bsa-mp01.planejamento.gov.br>
References: <1DDCE5B29CB5BC42BC2BFC39E3F1C8A3255B7A@corp-bsa-mp01.planejamento.gov.br>
Message-ID: <44DA26C8.6020701@redhat.com>
Leonardo Rodrigues de Mello wrote:
>Thanks for the anwsers :-D
>
>But I believe beside the fact that conga and cs-deploy-tool share some things in common. cs-deploy-tool has it place for simple and no pain instalation of cluster suite with basic services in one network environment.
>
>The point that count for me is that i can get anywhere with my laptop and just with the knowledge of IP numbers and root passwords setup one cluster in 10 minutes or less.
>
>To use conga i need go configure and install zope in one machine, install the agents in the servers that will be in the cluster, configure the zope to see the agents, its more complicated and demands more work for the simple task of cluster instalation and basic initial configuration.
>
HOLD IT HOLD IT! I have to make an urgent correction to your response
above :)
Conga requires absolutely NO configuration of zope. In fact, zope is so
far beneath the sheets that you will be able to install your OWN default
instance of zope, and there will be no interaction with Conga.
After installing the Conga server component, the admin enters the IP
addresses of the machines/cluster nodes to be managed just like you do
with cs-deploy-tool. There is no other configuration work necessary.
It is true that you will need the agent installed on the machines that
you wish to manage. cs-deploy-tool does not use an agent, but rather
logs in through an ssh session with the user-provided root password in
order to get things set up. cs-deploy-tool is not going away, and your
patches are welcome. You can send them to me and I will forward them to
Stan Kupcevic who wrote and maintains cs-deploy-tool. Thanks for your
involvement, Leonardo.
-J
>
>
>
>Conga is one great initiative and complex initiative for managing, deploy, administration, and others thinks for production cluster enviroments. if i need to choose one tool to just deploy cluster suite. i will choose cs-deploy-tool.
>
>If i need to manage, and be the administrator of a cluster, of course i will need the power of conga. :-D
>
>this long message is just to ask: Can I implement the changes i had proposed in the first message ? if, yes to who i will send they ?
>
>
>best regards
>Leonardo Rodrigues de Mello
>-----Mensagem original-----
>De: linux-cluster-bounces at redhat.com em nome de James Parsons
>Enviada: qua 9/8/2006 12:16
>Para: linux clustering
>Cc:
>Assunto: Re: cs-deploy-tool (WAS: [Linux-cluster] cs-deploy-gfs)
>
>Leonardo Rodrigues de Mello wrote:
>
>
>
>>sorry,
>>the application name is cs-deploy-tool, not cs-deploy-gfs.
>>
>>Best Regards
>>Leonardo Rodrigues de Mello
>>
>>
>>-----Mensagem original-----
>>De: linux-cluster-bounces at redhat.com em nome de Leonardo Rodrigues de Mello
>>Enviada: qua 9/8/2006 12:04
>>Para: linux-cluster at redhat.com
>>Cc:
>>Assunto: [Linux-cluster] cs-deploy-gfs
>>
>>Hi everyone,
>>
>>Does anyone know what happened with the development of cs-deploy-gfs ?
>>
>>The version in cvs is the lastest version ?
>>
>>There is any improvements since the initial version ?
>>
>>This software was abandoned ?
>>
>>
>>
>>
>The functionality available in cs-deploy-tool will be available in a new
>management interface for clusters and storage called Conga, targetted
>for RHEL5 and (hopefully) RHEL4.5
>
>-J
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
>
>
>------------------------------------------------------------------------
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
From haiwu.us at gmail.com Wed Aug 9 18:44:39 2006
From: haiwu.us at gmail.com (hai wu)
Date: Wed, 9 Aug 2006 13:44:39 -0500
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To: <1155138372.21204.158.camel@ayanami.boston.redhat.com>
References:
<1155138372.21204.158.camel@ayanami.boston.redhat.com>
Message-ID:
Thanks Lon. We got redundant power here.
How can I test fence_drac? How do I simulate a failure on one node and
know for sure that fencing kicks in and restarts the failed node in the
cluster?
Thanks,
Hai
On 8/9/06, Lon Hohberger wrote:
>
> On Mon, 2006-08-07 at 15:21 -0500, hai wu wrote:
> > Hi,
> > For a 2-node cluster (RHEL4), does it require the use of power switch
> > or fence_drac would be good enough for the setup? Would fence_drac
> > work properly in a 2-node cluster?
> > Thanks,
> > Hai
>
> fence_drac would be fine, but you need to understand that with DRAC (or
> any integrated power management which receives power from the machine)
> that if host power is completely lost, fencing will fail - causing the
> cluster to stop.
>
> This failure is indistinguishable from DRAC + host losing network at the
> same time (ex: the ethernet switch fails).
>
> Generally, these machines have redundant power, so losing power all at
> once is less likely.
>
> So, DRAC is fine, but there are failure cases where it is less than
> optimal, particularly in machines without redundant power supplies.
>
> -- Lon
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From lhh at redhat.com Wed Aug 9 19:05:32 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 09 Aug 2006 15:05:32 -0400
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To:
References:
<1155138372.21204.158.camel@ayanami.boston.redhat.com>
Message-ID: <1155150332.21204.202.camel@ayanami.boston.redhat.com>
On Wed, 2006-08-09 at 13:44 -0500, hai wu wrote:
> Thanks Lon. We got redundant power here.
>
> How can I test this fence_drac? How to simulate a failure on one node
> and know for sure that it does kick in and restarts the failed node in
> the cluster?
After both nodes join the cluster, try doing 'reboot -fn' on the node.
Oh, also, you should be booting with acpi=off when using integrated
power management.
-- Lon
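A minimal sketch of that test, assuming a hypothetical pair of nodes node1
and node2 (the grub kernel line below is only illustrative; adjust it to your
own installation):
# on node2, watch for fence activity while node1 "fails"
tail -f /var/log/messages
# on node1, force an immediate reboot without syncing or notifying the cluster
reboot -fn
# fenced on node2 should then log that it is fencing node1 via fence_drac
# to boot with ACPI disabled, append acpi=off to the kernel line in
# /boot/grub/grub.conf on each node, e.g.:
#   kernel /vmlinuz-2.6.9-42.EL ro root=/dev/VolGroup00/LogVol00 acpi=off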
From haiwu.us at gmail.com Wed Aug 9 20:39:50 2006
From: haiwu.us at gmail.com (hai wu)
Date: Wed, 9 Aug 2006 15:39:50 -0500
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To: <1155150332.21204.202.camel@ayanami.boston.redhat.com>
References:
<1155138372.21204.158.camel@ayanami.boston.redhat.com>
<1155150332.21204.202.camel@ayanami.boston.redhat.com>
Message-ID:
I got the following errors after "reboot -fn" on erd-tt-eproof1, which
script do I need to change?
Aug 9 15:35:40 erd-tt-eproof2 kernel: CMAN: removing node erd-tt-eproof1 from the cluster : Missed too many heartbeats
Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: erd-tt-eproof1 not a cluster member after 0 sec post_fail_delay
Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: fencing node "erd-tt-eproof1"
Aug 9 15:35:42 erd-tt-eproof2 fenced[3437]: agent "fence_drac" reports: WARNING: unable to detect DRAC version ' Dell Embedded Remote Access Controller (ERA) Firmware Version 3.31 (Build 07.15) ' WARNING: unsupported DRAC version '__unknown__' failed: unable to determine power state
This is DRAC on Dell PE2650.
Thanks,
Hai
On 8/9/06, Lon Hohberger wrote:
>
> On Wed, 2006-08-09 at 13:44 -0500, hai wu wrote:
> > Thanks Lon. We got redundant power here.
> >
> > How can I test this fence_drac? How to simulate a failure on one node
> > and know for sure that it does kick in and restarts the failed node in
> > the cluster?
>
> After both nodes join the cluster, try doing 'reboot -fn' on the node.
>
> Oh, also, you should be booting with acpi=off when using integrated
> power management.
>
> -- Lon
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From jparsons at redhat.com Wed Aug 9 20:54:36 2006
From: jparsons at redhat.com (James Parsons)
Date: Wed, 09 Aug 2006 16:54:36 -0400
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To:
References: <1155138372.21204.158.camel@ayanami.boston.redhat.com> <1155150332.21204.202.camel@ayanami.boston.redhat.com>
Message-ID: <44DA4B8C.1050602@redhat.com>
hai wu wrote:
> I got the following errors after "reboot -fn" on erd-tt-eproof1, which
> script do I need to change?
>
> Aug 9 15:35:40 erd-tt-eproof2 kernel: CMAN: removing node
> erd-tt-eproof1 from t
> he cluster : Missed too many heartbeats
> Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: erd-tt-eproof1 not a
> cluster member
> after 0 sec post_fail_delay
> Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: fencing node "erd-tt-eproof1"
> Aug 9 15:35:42 erd-tt-eproof2 fenced[3437]: agent "fence_drac"
> reports: WARNING
> : unable to detect DRAC version ' Dell Embedded Remote Access
> Controller (ERA) F
> irmware Version 3.31 (Build 07.15) ' WARNING: unsupported DRAC version
> '__unknow
> n__' failed: unable to determine power state
>
> This is DRAC on Dell PE2650.
> Thanks,
> Hai
Do you know what DRAC version you are using? Can you please telnet into
the drac port and find out what it says when it starts your session?
Thanks,
-J
From haiwu.us at gmail.com Wed Aug 9 21:04:40 2006
From: haiwu.us at gmail.com (hai wu)
Date: Wed, 9 Aug 2006 16:04:40 -0500
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To: <44DA4B8C.1050602@redhat.com>
References:
<1155138372.21204.158.camel@ayanami.boston.redhat.com>
<1155150332.21204.202.camel@ayanami.boston.redhat.com>
<44DA4B8C.1050602@redhat.com>
Message-ID:
I got the following prompts after telneting to the drac port, maybe a simple
upgrade for the firmware would fix this issue:
Dell Embedded Remote Access Controller (ERA)
Firmware Version 3.31 (Build 07.15)
Login:
Thanks,
Hai
On 8/9/06, James Parsons wrote:
>
> hai wu wrote:
>
> > I got the following errors after "reboot -fn" on erd-tt-eproof1, which
> > script do I need to change?
> >
> > Aug 9 15:35:40 erd-tt-eproof2 kernel: CMAN: removing node
> > erd-tt-eproof1 from t
> > he cluster : Missed too many heartbeats
> > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: erd-tt-eproof1 not a
> > cluster member
> > after 0 sec post_fail_delay
> > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: fencing node
> "erd-tt-eproof1"
> > Aug 9 15:35:42 erd-tt-eproof2 fenced[3437]: agent "fence_drac"
> > reports: WARNING
> > : unable to detect DRAC version ' Dell Embedded Remote Access
> > Controller (ERA) F
> > irmware Version 3.31 (Build 07.15) ' WARNING: unsupported DRAC version
> > '__unknow
> > n__' failed: unable to determine power state
> >
> > This is DRAC on Dell PE2650.
> > Thanks,
> > Hai
>
> Do you know what DRAC version you are using? Can you please telnet into
> the drac port and find out what it says when it starts your session?
>
> Thanks,
>
> -J
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From jparsons at redhat.com Wed Aug 9 22:47:45 2006
From: jparsons at redhat.com (James Parsons)
Date: Wed, 09 Aug 2006 18:47:45 -0400
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To:
References: <1155138372.21204.158.camel@ayanami.boston.redhat.com> <1155150332.21204.202.camel@ayanami.boston.redhat.com> <44DA4B8C.1050602@redhat.com>
Message-ID: <44DA6611.6080602@redhat.com>
hai wu wrote:
> I got the following prompts after telneting to the drac port, maybe a
> simple upgrade for the firmware would fix this issue:
>
> Dell Embedded Remote Access Controller (ERA)
> Firmware Version 3.31 (Build 07.15)
> Login:
>
> Thanks,
> Hai
Oops. Sorry. Unsupported version. If you want, you could hack the agent
script (it is in perl) and get it to accept that version and just *see*
if it works -- it might. I tried looking for documentation for that
firmware rev and couldn't google any. If you know of some, drop me a
line and maybe we can get something working - or at least know if it
*will ever* work. :)
-J
BTW, the agent supports DRAC III/XT, DRAC MC, and DRAC 4/I.
>
> On 8/9/06, *James Parsons* > wrote:
>
> hai wu wrote:
>
> > I got the following errors after "reboot -fn" on erd-tt-eproof1,
> which
> > script do I need to change?
> >
> > Aug 9 15:35:40 erd-tt-eproof2 kernel: CMAN: removing node
> > erd-tt-eproof1 from t
> > he cluster : Missed too many heartbeats
> > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: erd-tt-eproof1 not a
> > cluster member
> > after 0 sec post_fail_delay
> > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: fencing node
> "erd-tt-eproof1"
> > Aug 9 15:35:42 erd-tt-eproof2 fenced[3437]: agent "fence_drac"
> > reports: WARNING
> > : unable to detect DRAC version ' Dell Embedded Remote Access
> > Controller (ERA) F
> > irmware Version 3.31 (Build 07.15) ' WARNING: unsupported DRAC
> version
> > '__unknow
> > n__' failed: unable to determine power state
> >
> > This is DRAC on Dell PE2650.
> > Thanks,
> > Hai
>
> Do you know what DRAC version you are using? Can you please telnet
> into
> the drac port and find out what it says when it starts your session?
>
> Thanks,
>
> -J
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
>------------------------------------------------------------------------
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
From joni at philox.eu Thu Aug 10 07:04:31 2006
From: joni at philox.eu (Jonathan Salomon)
Date: Thu, 10 Aug 2006 09:04:31 +0200
Subject: [Linux-cluster] patch 2.6 kernel without modules
Message-ID: <44DADA7F.8040505@philox.eu>
Hi all,
I want to use GFS for a webcluster with shared data through an iSCSI SAN.
The cluster nodes are diskless and boot through PXE by downloading a
kernel and rootfs that is stored in RAM. I have built a custom minimal
Linux system with LFS (http://linuxfromscratch.org) to keep the image as
small as possible (the smallest Fedora/RedHat I could get by stripping
RPMs was still 650MB).
I would like to work without kernel modules and therefore I would like
to know whether it is possible to patch the 2.6 kernel to include GFS
'statically' (i.e. no kernel modules). As far as I can tell the
cluster-1.02.00 package I downloaded builds kernel modules.
In addition I would like to know what the minimal requirements are for using
GFS. The load balancing itself will be done on other machines with a
different setup. Hence I would like to refrain from installing any of
that functionality on the cluster nodes. From reading the docs I get the
impression GFS needs a whole lot of clustering packages.
Thanks!
Jonathan
From sbhagat at redhat.com Thu Aug 10 08:11:21 2006
From: sbhagat at redhat.com (Subodh Bhagat)
Date: Thu, 10 Aug 2006 13:41:21 +0530
Subject: [Linux-cluster] Red Hat Cluster Service and Informix with 1.5 GB
memory allocation
Message-ID: <44DAEA29.1060903@redhat.com>
Dear all,
This issue is with one of our major customers, IBM Global Services. They
are implementing a 3 node cluster and configuring Informix database for
failover. The specifications of the three nodes are as follows:
ADBM01 2.4.21-40.ELsmp i686 AS release 3 (Taroon
Update 8) clumanager-1.2.31-1-i386
ADBM02 2.4.21-40.ELsmp i686 AS release 3 (Taroon
Update 8) clumanager-1.2.31-1-i386
ADBM03 2.4.21-40.ELhugemem i686 AS release 3 (Taroon Update
8) clumanager-1.2.26.1-1-i386
Informix version: IBM Informix Dynamic Server 10.00.UC4 On Linux Intel
Informix runs with over 1.8GB of memory allocated to it on the server when the
clustering agents are turned off. It also works with a memory allocation of
less than 1.5 GB in the cluster environment. But in the cluster
environment, the node is rebooted if >=1.5 GB is allocated.
At the Informix end, the SHMBASE parameter would help only if there were a
memory allocation issue between Linux and Informix. But as Informix runs
with over 1.8GB of memory allocated to it on the server when the clustering
agents are turned off, altering SHMBASE may not help resolve this
issue. The issue is most definitely between the Red Hat Cluster Service
and Informix with a high memory allocation.
* We have suggested that the customer set up all the nodes in the cluster
identically with respect to the kernel version, cluster suite versions,
and OS versions.
* Any idea what else can be done here?
Please suggest.
--
Regards,
Subodh Bhagat,
Technical Engineer,
Red Hat India Pvt. Ltd.
1st Floor, 'C' Wing,
Fortune 2000,
Bandra Kurla Complex,
Bandra (East), Mumbai 400051
----------------------------
Mobile: +91-9323968930
Technical Support: +91-9322952612
Tel: +91-22-39878888 (Board Line)
Fax: +91-22-39878899
From mark at sparkyone.com Thu Aug 10 11:35:13 2006
From: mark at sparkyone.com (Mark Reynolds)
Date: Thu, 10 Aug 2006 12:35:13 +0100 (BST)
Subject: [Linux-cluster] clurgmgrd stops service without reason
Message-ID: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
Hi,
Have you been able to resolve this issue? I have the exact same symptoms
on a Red Hat cluster (rgmanager version 1.9.46).
I receive a message "stopping service fileserver" and the node
shuts down and ends up rebooting as it can't unmount a partition.
What worries me is that this has happened 3 times in 2 weeks with no
obvious reason, as the server is working fine up until that point.
The relevant section of my cluster.conf is
Any thoughts or updates greatly appreciated as this is occurring on a
production server.
Regards
Mark Reynolds
> > On Wed, 2006-08-02 at 13:03 +0200, Falk Hackenberger - MediaTransfer AG
> > Netresearch & Consulting wrote:
> >
> >>--snip--
> >>Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing /exports/imap/checkimapstartup.sh status
> >>Aug 1 17:31:28 kain clurgmgrd: [4780]: Executing /exports/subversion/etc/rc.d/init.d/svnserver status
> >>Aug 1 17:31:28 kain clurgmgrd: [4780]: Checking 192.168.0.223, Level 0
> >>Aug 1 17:31:28 kain clurgmgrd: [4780]: 192.168.0.223 present on eth0
> >>Aug 1 17:31:28 kain clurgmgrd: [4780]: Link for eth0: Detected
> >>Aug 1 17:31:28 kain clurgmgrd: [4780]: Link detected on eth0
> >>Aug 1 17:31:37 kain clurgmgrd[4780]: Stopping service storage
> >>--snap--
> >>
> >>how to say to clurgmgrd, that he should log the reason for stoping the
> >>service?
> >
> > Something must be returning an error code where it should not be; can
> > you post your service XML blob?
>
>it is very long and a little bit complex as i know... ;-)
>
>recovery="restart">
From rico_tsang at macroview.com Thu Aug 10 12:59:05 2006
From: rico_tsang at macroview.com (Rico Tsang)
Date: Thu, 10 Aug 2006 20:59:05 +0800
Subject: [Linux-cluster] Two-node cluster fencing each other
Message-ID: <61E6BBD96354E1419428314BA80EA8B95D0C1D@exchsvr.macroview.com>
Hi
This is my first trial of Red Hat Cluster Suite and GFS on RHEL4. I'm trying to set up a two-node cluster. I've configured the Dell DRAC as the fencing device for both nodes. When I disconnect the network interface of one of the nodes, both nodes try to fence each other.
How can I prevent this?
Suppose I would like to check whether the router can be pinged before I fence the peer node. Is such a configuration possible with Red Hat Cluster Suite?
Regards,
Rico
From f.hackenberger at mediatransfer.com Thu Aug 10 15:04:14 2006
From: f.hackenberger at mediatransfer.com (Falk Hackenberger - MediaTransfer AG Netresearch & Consulting)
Date: Thu, 10 Aug 2006 17:04:14 +0200
Subject: [Linux-cluster] clurgmgrd stops service without reason
In-Reply-To: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
References: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
Message-ID: <44DB4AEE.2000908@mediatransfer.com>
Mark Reynolds wrote:
> Have you been able to resolve this issue?
no way until now...
> I have the exact same symptoms
> on a RedHat cluster (rgmanger version 1.9.46).
>
> I receive a message " stopping service fileserver" and the node
> shutsdown and ends up rebooting as it cant unmount a partition.
>
> What worries me is that this has happened 3 times in 2 weeks with no
> obvous reason as the server is working fine up until that point.
same thing here, but not only 3 times in 2 weeks - more like once every 3 days...
From Matthew.Patton.ctr at osd.mil Thu Aug 10 19:03:11 2006
From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E)
Date: Thu, 10 Aug 2006 15:03:11 -0400
Subject: [Linux-cluster] patch 2.6 kernel without modules
Message-ID:
Classification: UNCLASSIFIED
> small as possible (the smallest Fedora/RedHat I could get by
> stripping RPMs was still 650MB).
wow, really? my more or less stripped RH4 is 163 RPMs and a footprint of
~350mb and that is with man pages and all the other junk in /usr/share.
> I would like to work without kernel modules
I'm with you there. however, I would suggest building a minimalist kernel
with module support and putting all the standard things compiled in and
using modules for extraneous stuff like GFS and cluster services. The reason
is GFS etc are a moving target and it's a LOT easier to track the steady
stream of bugfixes by just deploying new modules than it is to keep
recompiling a full kernel.
unless your webfarm is doing something unusual, NFS will be vastly simpler,
easier, and probably a lot faster. I think recent Linux NFS is much less
sucky than it used to be.
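A rough sketch of the modules-for-GFS approach, assuming you build the
cluster-1.02.00 modules out of tree against your kernel source (the configure
option and paths below are from memory, so check the usage notes in the
tarball; the module names are the usual RHEL4/GFS 6.1 ones):
# build and install the out-of-tree kernel modules and userland tools
cd cluster-1.02.00
./configure --kernel_src=/usr/src/linux
make && make install
# on each diskless node, load the stack after boot
depmod -a
modprobe cman
modprobe dlm
modprobe lock_dlm
modprobe gfs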
From Zelikov_Mikhail at emc.com Thu Aug 10 20:04:11 2006
From: Zelikov_Mikhail at emc.com (Zelikov_Mikhail at emc.com)
Date: Thu, 10 Aug 2006 16:04:11 -0400
Subject: [Linux-cluster] Magma; Magma-plugins
Message-ID: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF2C@CORPUSMX40B.corp.emc.com>
I was looking for any documentation on the Magma API or any Red Hat cluster
suite API. So far I have only been able to find a man page on clu_connect,
clu_disconnect and clu_get_event. However, there are many more functions defined in magma.h
and magma-build.h. It also looks like there is a draft version of the Magma man
page at the end of magma.h.
I am interested in writing a cluster aware application as well as defining
my own cluster resource.
Any help would be really appreciated.
Thank you,
Mike
From teigland at redhat.com Thu Aug 10 20:07:41 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 10 Aug 2006 15:07:41 -0500
Subject: [Linux-cluster] Re: [Cluster-devel] Magma; Magma-plugins
In-Reply-To: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF2C@CORPUSMX40B.corp.emc.com>
References: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF2C@CORPUSMX40B.corp.emc.com>
Message-ID: <20060810200741.GB20622@redhat.com>
On Thu, Aug 10, 2006 at 04:04:11PM -0400, Zelikov_Mikhail at emc.com wrote:
> I was looking for any documentation on Magma API or any Red Hat cluster
> suite API.
Magma was only a temporary lib used in RHEL4, we're not using it any
longer. You should look at libcman or any of the other libraries
available in openais, http://developer.osdl.org/dev/openais/
Dave
From Zelikov_Mikhail at emc.com Thu Aug 10 20:19:03 2006
From: Zelikov_Mikhail at emc.com (Zelikov_Mikhail at emc.com)
Date: Thu, 10 Aug 2006 16:19:03 -0400
Subject: [Linux-cluster] RE: [Cluster-devel] Magma; Magma-plugins
Message-ID: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF5D@CORPUSMX40B.corp.emc.com>
Dave, thank you for the reply.
Magma is used in the latest release of CS and GFS. I was wondering if I use
this (openais) API will it work within the currently existing cluster
infrastructure on RHEL4.3? Or is it the future supported API?
Mike
-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com]
Sent: Thursday, August 10, 2006 4:08 PM
To: Zelikov, Mikhail
Cc: linux-cluster at redhat.com; cluster-devel at redhat.com
Subject: Re: [Cluster-devel] Magma; Magma-plugins
On Thu, Aug 10, 2006 at 04:04:11PM -0400, Zelikov_Mikhail at emc.com wrote:
> I was looking for any documentation on Magma API or any Red Hat
> cluster suite API.
Magma was only a temporary lib used in RHEL4, we're not using it any longer.
You should look at libcman or any of the other libraries available in
openais, http://developer.osdl.org/dev/openais/
Dave
From lhh at redhat.com Thu Aug 10 20:28:22 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 10 Aug 2006 16:28:22 -0400
Subject: [Linux-cluster] Re: [Cluster-devel] Magma; Magma-plugins
In-Reply-To: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF2C@CORPUSMX40B.corp.emc.com>
References: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF2C@CORPUSMX40B.corp.emc.com>
Message-ID: <1155241702.21204.267.camel@ayanami.boston.redhat.com>
On Thu, 2006-08-10 at 16:04 -0400, Zelikov_Mikhail at emc.com wrote:
> I was looking for any documentation on Magma API or any Red Hat
> cluster suite API. So far I was only able to find a man page on
> clu_connect, clu_disconnect and clu_get_event. However there many more
> defined in magma.h and magma-build.h. It also looks like there is a
> draft version of Magma man page at the end of magma.h.
> I am interested in writing a cluster aware application as well as
> defining my own cluster resource.
> Any help would be really appreciated.
> Thank you,
> Mike
There really isn't much.
Originally, it was used to provide an API which worked on CMAN and GuLM
clusters. We've since decided that we did not need two infrastructures,
so we no longer have Magma (in -head; it's still in STABLE and RHEL4
branches).
You should use the CMAN API and the DLM API instead of Magma if you need
compatibility between current stable/RHEL4 and future release(s).
-- Lon
From lhh at redhat.com Thu Aug 10 20:29:44 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 10 Aug 2006 16:29:44 -0400
Subject: [Linux-cluster] RE: [Cluster-devel] Magma; Magma-plugins
In-Reply-To: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF5D@CORPUSMX40B.corp.emc.com>
References: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF5D@CORPUSMX40B.corp.emc.com>
Message-ID: <1155241784.21204.270.camel@ayanami.boston.redhat.com>
On Thu, 2006-08-10 at 16:19 -0400, Zelikov_Mikhail at emc.com wrote:
> Dave, thank you for the reply.
> Magma is used in the latest release of CS and GFS. I was wondering if I use
> this (openais) API will it work within the currently existing cluster
> infrastructure on RHEL4.3? Or is it the future supported API?
> Mike
It will work on RHEL4, but it will not work on RHEL5, FC5/6 or any
future release.
The CMAN and DLM APIs work on all of the above.
-- Lon
From teigland at redhat.com Thu Aug 10 20:30:19 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 10 Aug 2006 15:30:19 -0500
Subject: [Linux-cluster] Re: [Cluster-devel] Magma; Magma-plugins
In-Reply-To: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF5D@CORPUSMX40B.corp.emc.com>
References: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF5D@CORPUSMX40B.corp.emc.com>
Message-ID: <20060810203019.GA25666@redhat.com>
On Thu, Aug 10, 2006 at 04:19:03PM -0400, Zelikov_Mikhail at emc.com wrote:
> Dave, thank you for the reply.
> Magma is used in the latest release of CS and GFS. I was wondering if I use
> this (openais) API will it work within the currently existing cluster
> infrastructure on RHEL4.3? Or is it the future supported API?
RHEL4 doesn't include openais, but libcman is available on both RHEL4 and
RHEL5 (with some minor changes IIRC).
> -----Original Message-----
> From: David Teigland [mailto:teigland at redhat.com]
> Sent: Thursday, August 10, 2006 4:08 PM
> To: Zelikov, Mikhail
> Cc: linux-cluster at redhat.com; cluster-devel at redhat.com
> Subject: Re: [Cluster-devel] Magma; Magma-plugins
>
> On Thu, Aug 10, 2006 at 04:04:11PM -0400, Zelikov_Mikhail at emc.com wrote:
> > I was looking for any documentation on Magma API or any Red Hat
> > cluster suite API.
>
> Magma was only a temporary lib used in RHEL4, we're not using it any longer.
> You should look at libcman or any of the other libraries available in
> openais, http://developer.osdl.org/dev/openais/
>
> Dave
From lhh at redhat.com Thu Aug 10 20:33:40 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 10 Aug 2006 16:33:40 -0400
Subject: [Linux-cluster] Two-node cluster fencing each other
In-Reply-To: <61E6BBD96354E1419428314BA80EA8B95D0C1D@exchsvr.macroview.com>
References: <61E6BBD96354E1419428314BA80EA8B95D0C1D@exchsvr.macroview.com>
Message-ID: <1155242020.21204.274.camel@ayanami.boston.redhat.com>
On Thu, 2006-08-10 at 20:59 +0800, Rico Tsang wrote:
> Hi
>
> This is my first trial on using Red Hat Cluster Suite and GFS on RHEL4. I'm trying to setup a two-node cluster. I've configured the Dell DRAC as the fencing device for both nodes. When I disconnect the network interface of one of the node, both nodes will try to fence each other.
>
> How can I prevent this?
Don't do that.
:)
They're *supposed* to try to fence each other in this case. However,
the one with the disconnected network jack will lose - because it should
not be able to talk to DRAC.
> Suppose, I would like to check whether the router can be pinged before I fence the peer node. Is it a possible configuration in using Red Hat Cluster Suite?
You can use qdisk to add any sort of heuristic you want for determining
node fitness and liveness. See the qdisk man page from the RHEL4
branch.
-- Lon
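For reference, the kind of thing the qdisk man page describes is a quorumd
block in cluster.conf with one or more heuristic children; the label, ping
target, scores and intervals below are purely illustrative, so check qdisk(5)
for the actual attribute set:
<quorumd interval="1" tko="10" votes="1" label="qdisk">
    <heuristic program="ping -c1 -w1 192.168.0.1" score="1" interval="2"/>
</quorumd>
The idea is that a node which cannot ping the router fails the heuristic and
is declared unfit, so it is the one that ends up being fenced.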
From Zelikov_Mikhail at emc.com Thu Aug 10 20:40:10 2006
From: Zelikov_Mikhail at emc.com (Zelikov_Mikhail at emc.com)
Date: Thu, 10 Aug 2006 16:40:10 -0400
Subject: [Linux-cluster] RE: [Cluster-devel] Magma; Magma-plugins
Message-ID: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF9F@CORPUSMX40B.corp.emc.com>
I can see that I have libcman and libdlm installed as part of the cman and
dlm packages. I looked at the cluster, dlm and cman project home pages - the
APIs are mentioned but there is no documentation I could find. Am I looking
in the wrong place?
Mike
-----Original Message-----
From: Lon Hohberger [mailto:lhh at redhat.com]
Sent: Thursday, August 10, 2006 4:30 PM
To: Zelikov, Mikhail
Cc: teigland at redhat.com; cluster-devel at redhat.com; linux-cluster at redhat.com
Subject: RE: [Cluster-devel] Magma; Magma-plugins
On Thu, 2006-08-10 at 16:19 -0400, Zelikov_Mikhail at emc.com wrote:
> Dave, thank you for the reply.
> Magma is used in the latest release of CS and GFS. I was wondering if
> I use this (openais) API will it work within the currently existing
> cluster infrastructure on RHEL4.3? Or is it the future supported API?
> Mike
It will work on RHEL4, but it will not work on RHEL5, FC5/6 or any future
release.
The CMAN and DLM APIs work on all of the above.
-- Lon
From lhh at redhat.com Thu Aug 10 20:42:41 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 10 Aug 2006 16:42:41 -0400
Subject: [Linux-cluster] clurgmgrd stops service without reason
In-Reply-To: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
References: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
Message-ID: <1155242561.21204.283.camel@ayanami.boston.redhat.com>
On Thu, 2006-08-10 at 12:35 +0100, Mark Reynolds wrote:
> Hi,
>
> Have you been able to resolve this issue? I have the exact same symptoms
> on a RedHat cluster (rgmanger version 1.9.46).
>
> I receive a message " stopping service fileserver" and the node
> shutsdown and ends up rebooting as it cant unmount a partition.
>
> What worries me is that this has happened 3 times in 2 weeks with no
> obvous reason as the server is working fine up until that point.
>
> The relevant section of my cluster.conf is
>
>
> force_unmount="1" fsid="11439" fstype="ext3"
> mountpoint="/mnt/live" name="live" options="noatime"
> self_fence="1"/>
> force_unmount="1" fsid="53676" fstype="ext3"
> mountpoint="/mnt/backup" name="backup" options="noatime"
> self_fence="1"/>
>
>
>
>
>
>
> Any thoughts or updates greatly appreciated as this is occuring on a
> production server.
Well, your log messages and XML don't match.
There's a recent bugzilla noting that rgmanager lacks sufficient error
reporting for several resource agents.
I will make a couple of updates to the resource agents shortly (e.g.
today or tomorrow), and you can drop them in (on an already-running
cluster, without restarting rgmanager). It should, then, provide you
the information as to what part is failing. I would suspect that it is
either the Samba script or the NFS script that is returning an error,
based on the previously noted log messages.
-- Lon
From teigland at redhat.com Thu Aug 10 20:51:34 2006
From: teigland at redhat.com (David Teigland)
Date: Thu, 10 Aug 2006 15:51:34 -0500
Subject: [Linux-cluster] Re: [Cluster-devel] Magma; Magma-plugins
In-Reply-To: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF9F@CORPUSMX40B.corp.emc.com>
References: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CF9F@CORPUSMX40B.corp.emc.com>
Message-ID: <20060810205134.GB25666@redhat.com>
On Thu, Aug 10, 2006 at 04:40:10PM -0400, Zelikov_Mikhail at emc.com wrote:
> I can see that I have libcman and libdlm installed as the part of cman and
> dlm packages. I looked at the cluster, dlm and cman project home pages - the
> APIs are mentioned but there is no documentation I could find. Am I looking
> at the wrong place?
Download the source code from the "cluster" cvs tree. The API's in the
RHEL4 cvs branch will be slightly different that those in the cvs HEAD
(for RHEL5). Look at
cluster/cman/lib/libcman.h
cluster/dlm/lib/libdlm.h
cluster/dlm/doc/*
> -----Original Message-----
> From: Lon Hohberger [mailto:lhh at redhat.com]
> Sent: Thursday, August 10, 2006 4:30 PM
> To: Zelikov, Mikhail
> Cc: teigland at redhat.com; cluster-devel at redhat.com; linux-cluster at redhat.com
> Subject: RE: [Cluster-devel] Magma; Magma-plugins
>
> On Thu, 2006-08-10 at 16:19 -0400, Zelikov_Mikhail at emc.com wrote:
> > Dave, thank you for the reply.
> > Magma is used in the latest release of CS and GFS. I was wondering if
> > I use this (openais) API will it work within the currently existing
> > cluster infrastructure on RHEL4.3? Or is it the future supported API?
> > Mike
>
> It will work on RHEL4, but it will not work on RHEL5, FC5/6 or any future
> release.
>
> The CMAN and DLM APIs work on all of the above.
>
> -- Lon
From Zelikov_Mikhail at emc.com Thu Aug 10 20:57:16 2006
From: Zelikov_Mikhail at emc.com (Zelikov_Mikhail at emc.com)
Date: Thu, 10 Aug 2006 16:57:16 -0400
Subject: [Linux-cluster] RE: [Cluster-devel] Magma; Magma-plugins
In-Reply-To: <20060810205134.GB25666@redhat.com>
Message-ID: <9B2FEC4CE7E80B4A965F1D9ADF22B1730437CFE9@CORPUSMX40B.corp.emc.com>
Will do. Thank you!
Mike
-----Original Message-----
From: David Teigland [mailto:teigland at redhat.com]
Sent: Thursday, August 10, 2006 4:52 PM
To: Zelikov, Mikhail
Cc: lhh at redhat.com; cluster-devel at redhat.com; linux-cluster at redhat.com
Subject: Re: [Cluster-devel] Magma; Magma-plugins
On Thu, Aug 10, 2006 at 04:40:10PM -0400, Zelikov_Mikhail at emc.com wrote:
> I can see that I have libcman and libdlm installed as the part of cman
> and dlm packages. I looked at the cluster, dlm and cman project home
> pages - the APIs are mentioned but there is no documentation I could
> find. Am I looking at the wrong place?
Download the source code from the "cluster" cvs tree. The API's in the
RHEL4 cvs branch will be slightly different that those in the cvs HEAD
(for RHEL5). Look at
cluster/cman/lib/libcman.h
cluster/dlm/lib/libdlm.h
cluster/dlm/doc/*
> -----Original Message-----
> From: Lon Hohberger [mailto:lhh at redhat.com]
> Sent: Thursday, August 10, 2006 4:30 PM
> To: Zelikov, Mikhail
> Cc: teigland at redhat.com; cluster-devel at redhat.com;
> linux-cluster at redhat.com
> Subject: RE: [Cluster-devel] Magma; Magma-plugins
>
> On Thu, 2006-08-10 at 16:19 -0400, Zelikov_Mikhail at emc.com wrote:
> > Dave, thank you for the reply.
> > Magma is used in the latest release of CS and GFS. I was wondering
> > if I use this (openais) API will it work within the currently
> > existing cluster infrastructure on RHEL4.3? Or is it the future
supported API?
> > Mike
>
> It will work on RHEL4, but it will not work on RHEL5, FC5/6 or any
> future release.
>
> The CMAN and DLM APIs work on all of the above.
>
> -- Lon
From lhh at redhat.com Fri Aug 11 15:52:25 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 11 Aug 2006 11:52:25 -0400
Subject: [Linux-cluster] [PATCH] Update several resource agents to log errors
Message-ID: <1155311545.21204.290.camel@ayanami.boston.redhat.com>
Patch is against HEAD; should apply to most branches (except for the
xenvm.sh part)
-- Lon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: resource-agent-error.patch
Type: text/x-patch
Size: 4711 bytes
Desc: not available
URL:
From lhh at redhat.com Fri Aug 11 18:01:34 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 11 Aug 2006 14:01:34 -0400
Subject: [Linux-cluster] Red Hat Cluster Service and Informix with 1.5
GB memory allocation
In-Reply-To: <44DAEA29.1060903@redhat.com>
References: <44DAEA29.1060903@redhat.com>
Message-ID: <1155319294.21204.331.camel@ayanami.boston.redhat.com>
On Thu, 2006-08-10 at 13:41 +0530, Subodh Bhagat wrote:
> Dear all,
>
> This issue is with one of our major customers, IBM Global Services.
> They are implementing a 3 node cluster and configuring Informix
> database for failover. The specifications of the three nodes are as
> follows:
>
> ADBM01 2.4.21-40.ELsmp i686 AS release 3 (Taroon
> Update 8) clumanager-1.2.31-1-i386
> ADBM02 2.4.21-40.ELsmp i686 AS release 3 (Taroon
> Update 8) clumanager-1.2.31-1-i386
> ADBM03 2.4.21-40.ELhugemem i686 AS release 3 (Taroon Update
> 8) clumanager-1.2.26.1-1-i386
>
> Informix version: IBM Informix Dynamic Server 10.00.UC4 On Linux Intel
>
> Informix runs with over 1.8GB MEM allocated to it on the server when
> the clustering agents are turned off. Also it works with Mem
> allocation of less that 1.5 GB in cluster environment. But when in
> cluster environment, the node is rebooted if >=1.5 GB is allocated.
>
> At Informix end, the SHMBASE parameter would help only if there was a
> memory allocation issue between Linux and Informix. But as Informix
> runs with over 1.8GB MEM allocated to it on the server when the
> clustering agents are turned off, altering SHMBASE may not help
> resolving this issue. The issue most definitely be between the Red Hat
> Cluster Service and Informix with a high mem allocation.
>
> * We have suggested the customer to setup all the nodes in cluster as
> identical with respect to the kernel version and cluster suite
> versions and OS versions.
RHCS3 supports rolling upgrades, so there shouldn't be an issue with the
1.2.26.1 + 1.2.31 versions being mixed. It certainly wouldn't hurt to make
them identical, though.
> * Any idea, what else can be done here?
You can try setting realtime priority in RHCS; see the cludb man page.
Also, increase the failover time. Both will decrease the chance that
clumanager gets a 'false' transition.
-- Lon
From sunsadm at gmail.com Fri Aug 11 20:27:20 2006
From: sunsadm at gmail.com (sun sadm)
Date: Fri, 11 Aug 2006 22:27:20 +0200
Subject: [Linux-cluster] Re: USB flash drive no longer mounted read/write
under RHEL
In-Reply-To: <538FF2EE40374C48BD9D7BA20F776ADB0166D82D@nnc.co.uk>
References: <538FF2EE40374C48BD9D7BA20F776ADB0166D82D@nnc.co.uk>
Message-ID:
On 8/11/06, Cannon, Andrew wrote:
> Hi all,
> According to the mtab it is mounted read/write,
> the properties box comes up in Gnome as saying that there are rw permissions
> on the drive, but that I am not the owner so I can't change any of the
> permissions. I can't even write to it as root.
>
> Yet, I can plug it into my Windows box and write to it to my heart's
> content. Any ideas on why it has suddenly decided to become read only
> (there are no switches on it) and what I can do to fix it?
>
> Thanks
>
> Andrew
Hi colleague,
please share your comments about this issue. We have a similar problem
in a cluster environment where SAN disks become read-only (mtab and mount
say rw). We absolutely don't know why this happens. How can we
troubleshoot this?
We opened a ticket with EMC (our SAN provider) and they claim it's a RHEL problem.
Nico
From rpeterso at redhat.com Fri Aug 11 21:21:25 2006
From: rpeterso at redhat.com (Robert Peterson)
Date: Fri, 11 Aug 2006 16:21:25 -0500
Subject: [Linux-cluster] Re: USB flash drive no longer mounted read/write
under RHEL
In-Reply-To:
References: <538FF2EE40374C48BD9D7BA20F776ADB0166D82D@nnc.co.uk>
Message-ID: <44DCF4D5.5010103@redhat.com>
sun sadm wrote:
> Hi colleague,
>
> please share your comments about this issue. We have a similar problem
> in cluster environment where SAN disks get read only (mtab and mount
> say rw). We absolutely don't know why this happen. How can we
> troubleshoot this?
>
> We opened a ticket at EMC (our SAN provider) and they claim its a RHEL
> problem.
>
> Nico
Hi Nico,
Have you spoken to Red Hat Tech Support about this? I doubt whether your
problem is related to the USB flash drive issue, although it's hard to
rule it out without more info.
Before we can help, we need to know more about your situation, such as
what file system you are using, what version of the Cluster Suite you are
using, and what messages appear in /var/log/messages from all nodes in the cluster.
If it's on a SAN in a cluster, I'm assuming it's GFS, in which case you
can do a few things: (1) check for errors in /var/log/messages, (2) check
/proc/mounts to see if the kernel also thinks that it's mounted RW, and
(3) make sure the cluster bit is on for the volume group: run the vgs
command and check that it has a "c" in the flags, e.g. "wz--nc" and not "wz--n-".
If the data on the drive is expendable and you can afford to lose it, you
can do some experiments writing data to the raw device. For example,
unmount the file system from ALL nodes and do something like:
(save off some raw data from the lv)
dd if=/dev/your_vg/your_lv0 of=/tmp/test1 bs=1024 count=1
(write the saved data back to the lv)
dd if=/tmp/test1 of=/dev/your_vg/your_lv0 bs=1024 count=1
If that doesn't work, try writing to the SCSI device directly.
Assuming that /dev/sda is part of your vg on the SAN:
(save off some raw data from the SCSI device)
dd if=/dev/sda of=/tmp/test1 bs=1024 count=1
(write the saved data back to the raw device)
dd if=/tmp/test1 of=/dev/sda bs=1024 count=1
NOTE: These commands are dangerous and should not be attempted
on production machines with live data.
I think some fibre channel SANs can be configured to restrict access
to the data, so you may have to check that as well. Just some ideas.
Regards,
Bob Peterson
Red Hat Cluster Suite
From sanjay at userspace.com Sat Aug 12 01:50:44 2006
From: sanjay at userspace.com (Sanjaya Joshi)
Date: Fri, 11 Aug 2006 18:50:44 -0700
Subject: [Linux-cluster] 6-processor 64-bit cluster available for sale
Message-ID: <44DD33F4.3030502@userspace.com>
Apologize if this is not the venue for this post...
Self-Contained 6-processor 64-bit cluster available for sale.
Ideal for training or small groups.
This is a cluster built from "best-of-breed" components and most of the system
is less than 6 months old. The Operating System and MPI engine is installed and
ready to go!
Please contact me if you are interested.
Details below:
--
NODE 1:
CPU: Opteron 248 2.6 Ghz Single Core Retail; Quantity: 2
Mainboard: Make/Model: TYAN "Thunder K8S Pro(S2882UG3NR)" AMD-8000 Chipset Server
Motherboard for Dual AMD Socket 940 CPU -RETAIL
Quantity: 1
MEMORY: RAM - Make/Model: Kingston 184 Pin 512MB ECC Registered DDR PC-2700 -
Retail; Quantity: 2
HARD DISK: 80GB PATA
SAMSUNG 80GB 7200RPM IDE Hard Drive, Model SP0802N, OEM Drive only
CASE: 19" 2U rackmount case
Make/Model: I-STAR 2U Stylish Rackmount Server Chassis (Black) With
I-Star 350W Power Supply, Model "D-200 Storm Series" -RETAIL
Quantity: 1
--
NODE 2:
CPU: Opteron 265 1.8 Ghz Dual Core OEM; Quantity: 2
Mainboard: Tyan S2892G3NR Dual Socket 940/nForce Pro.2200 Motherboard; Quantity: 1
MEMORY: RAM - Make/Model: Kingston 184 Pin 512MB ECC Registered DDR PC-2700 -
Retail; Quantity: 2
HARD DISK
SAMSUNG 80GB 7200RPM IDE Hard Drive, Model SP0802N, OEM Drive only
CASE 19" 2U rackmount case
Make/Model: I-STAR 2U Stylish Rackmount Server Chassis (Black) With
I-Star 350W Power Supply, Model "D-200 Storm Series" -RETAIL
Quantity: 1
--
NODE 3:
CPU: Opteron 248 2.6 Ghz Single Core Retail; Quantity: 2
Mainboard: Tyan S2892G3NR Dual Socket 940/nForce Pro.2200 Motherboard
Quantity: 1
MEMORY:RAM - Make/Model: Kingston 184 Pin 512MB ECC Registered DDR PC-2700 -
Retail; Quantity: 2
HARD DISK: 200GB 7,400rpm SATA drive
CASE 19" 3U rackmount case
Make/Model: Antec 3U25ATX550EPS-XR BLACK
Quantity: 1
--
RACK
ISC, Glass front, wheels (and can hold 5x 2U cases or 3x 2U cases and 1x 3U Case)
SWITCH DETAILS
8 port Gigabit switch
LCD MONITOR
15 inch Generic
KEYBOARD: Generic
--
OPERATING SYSTEM:
Fedora Core 4 with MPI
Installed and tested for all nodes.
PLEASE NOTE: RAILS INCLUDED FOR RACK MOUNT OF CASE, BUT NOT INSTALLED WITHIN RACK.
CLUSTER READY AND OPERATIONAL
--
Most components are less than about 6 months old
SALE PRICE: $4,995
Available for inspection locally in WA.
Sales tax and shipping/pickup is responsibility of purchaser
If sales tax is exempt, need the exemption certificate
System around 200 pounds weight.
From joni at philox.eu Sat Aug 12 09:55:16 2006
From: joni at philox.eu (Philox / Jonathan Salomon)
Date: Sat, 12 Aug 2006 11:55:16 +0200
Subject: [Linux-cluster] patch 2.6 kernel without modules
In-Reply-To:
References:
Message-ID: <44DDA584.4040108@philox.eu>
Patton, Matthew F, CTR, OSD-PA&E wrote:
> unless your webfarm is doing something unusual, NFS will be vastly
> simpler, easier, and probably a lot faster. I think recent Linux NFS
> is a much less sucky than it used to be.
>
As my webfarm is serving webpages, I don't think my webfarm is doing
something unusual ;) But seriously why would NFS be faster? I have a
Dell EMC AX-150 storage machine that speaks iSCSI to my cluster nodes,
which are equipped with iSCSI HBA's, so they are directly connected to
the data storage. If I were to connect only one machine to the data
storage and export the data through NFS on that machine, it seems to me
that creating this extra layer would only slow it down!?
Could you elaborate why you think/know NFS would be faster?
Thanks!
Jonathan
From troels at arvin.dk Mon Aug 14 07:37:35 2006
From: troels at arvin.dk (Troels Arvin)
Date: Mon, 14 Aug 2006 09:37:35 +0200
Subject: [Linux-cluster] Two clurgmgrd processes
Message-ID:
Hello,
I've upgraded our RHEL 4+Cluster Suite installation to Update 4. Before
the upgrade, there was always one and only one clurgmgrd process; after
the update, there seem to be two. Is this expected, or does it indicate a
potential problem?
--
Greetings from Troels Arvin
From mels.kooijman at gmail.com Mon Aug 14 07:53:54 2006
From: mels.kooijman at gmail.com (Mels Kooijman)
Date: Mon, 14 Aug 2006 09:53:54 +0200
Subject: [Linux-cluster] Cmnd failed-retry the same path
Message-ID: <7a02458e0608140053x2a5f9f29s6cc4b7d1da53580c@mail.gmail.com>
We have a 2 node cluster, with qlogic san drives and GFS filesystem.
When the system boot we get a message in dmesg:
qla2300 0000:03:0b.0:
QLogic Fibre Channel HBA Driver: 8.01.02-d4
QLogic QLA2340 -
ISP2312: PCI-X (133 MHz) @ 0000:03:0b.0 hdma+, host#=1, fw=3.03.18 IPX
Vendor: IBM Model: 1815 FAStT Rev: 0914
Type: Direct-Access ANSI SCSI revision: 03
qla2300 0000:03:0b.0: scsi(1:0:0:1): Enabled tagged queuing, queue depth 32.
Vendor: IBM Model: 1815 FAStT Rev: 0914
Type: Direct-Access ANSI SCSI revision: 03
qla2300 0000:03:0b.0: scsi(1:0:0:2): Enabled tagged queuing, queue depth 32.
Vendor: IBM Model: 1815 FAStT Rev: 0914
Type: Direct-Access ANSI SCSI revision: 03
qla2300 0000:03:0b.0: scsi(1:0:0:3): Enabled tagged queuing, queue depth 32.
Vendor: IBM Model: 1815 FAStT Rev: 0914
Type: Direct-Access ANSI SCSI revision: 03
qla2300 0000:03:0b.0: scsi(1:0:0:4): Enabled tagged queuing, queue depth 32.
scsi2 : mpp virtual bus adaptor :version:09.01.B5.30,timestamp:Tue Apr 18
08:34:11 CDT 2006
Vendor: IBM Model: VirtualDisk Rev: 0914
Type: Direct-Access ANSI SCSI revision: 03
scsi(2:0:0:0): Enabled tagged queuing, queue depth 30.
Vendor: IBM Model: VirtualDisk Rev: 0914
Type: Direct-Access ANSI SCSI revision: 03
scsi(2:0:0:1): Enabled tagged queuing, queue depth 30.
SCSI device sdb: 104857600 512-byte hdwr sectors (53687 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 104857600 512-byte hdwr sectors (53687 MB)
SCSI device sdb: drive cache: write back
sdb:<4>493 [RAIDarray.mpp]DS4800_AM:1:0:1 Cmnd failed-retry the same path.
vcmnd SN 680 pdev H1:C0:T0:L1 0x06/0x8b/0x02 0x08000002 mpp_status:1
We see the last line in messages when both systems read from the GFS
filesystem; the read performance is very low at that moment.
We are using linuxrdac-09.01.B5.30.
Does anyone have a solution?
Regards
Mels
From raj4linux at gmail.com Mon Aug 14 08:00:00 2006
From: raj4linux at gmail.com (rajesh mishra)
Date: Mon, 14 Aug 2006 13:30:00 +0530
Subject: [Linux-cluster] Two-node cluster fencing each other
In-Reply-To: <61E6BBD96354E1419428314BA80EA8B95D0C1D@exchsvr.macroview.com>
References: <61E6BBD96354E1419428314BA80EA8B95D0C1D@exchsvr.macroview.com>
Message-ID: <5a8d914c0608140100l56d7d3eu4314237e1b791b7a@mail.gmail.com>
For trial purposes you can use gnbd for gfs. That will be easier to set
up. For that you need to have 3 machines.
You can read the setup instructions in the downloaded source (cluster/doc/min-gfs.txt).
With Regards
RajSun.
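A very rough sketch of the min-gfs.txt style setup, with made-up hostnames and
devices (check cluster/doc/min-gfs.txt in the source for the real steps):
# on the storage/server node, export a local block device over gnbd
gnbd_serv
gnbd_export -d /dev/sdb1 -e testgfs
# on each of the two client nodes, import it
gnbd_import -i storage-node
# the device then appears as /dev/gnbd/testgfs and can be used for gfs_mkfs and mount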
On 8/10/06, Rico Tsang wrote:
>
> Hi
>
> This is my first trial on using Red Hat Cluster Suite and GFS on
> RHEL4. I'm trying to setup a two-node cluster. I've configured the Dell
> DRAC as the fencing device for both nodes. When I disconnect the network
> interface of one of the node, both nodes will try to fence each other.
>
> How can I prevent this?
>
> Suppose, I would like to check whether the router can be pinged before I
> fence the peer node. Is it a possible configuration in using Red Hat
> Cluster Suite?
>
>
>
> Regards,
> Rico
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From raj4linux at gmail.com Mon Aug 14 08:02:07 2006
From: raj4linux at gmail.com (rajesh mishra)
Date: Mon, 14 Aug 2006 13:32:07 +0530
Subject: [Linux-cluster] Two-node cluster fencing each other
In-Reply-To: <5a8d914c0608140100l56d7d3eu4314237e1b791b7a@mail.gmail.com>
References: <61E6BBD96354E1419428314BA80EA8B95D0C1D@exchsvr.macroview.com>
<5a8d914c0608140100l56d7d3eu4314237e1b791b7a@mail.gmail.com>
Message-ID: <5a8d914c0608140102j4239575cs96c3eae8b3fb58a4@mail.gmail.com>
Ooops, spelling mistake..
For trial purposes you can use gnbd for gfs. That will be easier to set
up. For that you need to have *3 machines*.
You can read the setup instructions in the downloaded source (cluster/doc/min-gfs.txt).
With Regards
RajSun.
On 8/14/06, rajesh mishra wrote:
>
> For the trail purpose u can use gnbd for gfs. That will be more easy to
> set up. For that u need to have 3-machones.
> U can read the setup instruction form downloaded source (cluster/doc/min-
> gfs.txt).
>
>
> With Regards
> RajSun.
>
>
> On 8/10/06, Rico Tsang wrote:
> >
> > Hi
> >
> > This is my first trial on using Red Hat Cluster Suite and GFS on
> > RHEL4. I'm trying to setup a two-node cluster. I've configured the Dell
> > DRAC as the fencing device for both nodes. When I disconnect the network
> > interface of one of the node, both nodes will try to fence each other.
> >
> > How can I prevent this?
> >
> > Suppose, I would like to check whether the router can be pinged before I
> > fence the peer node. Is it a possible configuration in using Red Hat
> > Cluster Suite?
> >
> >
> >
> > Regards,
> > Rico
> >
> >
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
>
From f.hackenberger at mediatransfer.com Mon Aug 14 08:33:02 2006
From: f.hackenberger at mediatransfer.com (Falk Hackenberger - MediaTransfer AG Netresearch & Consulting)
Date: Mon, 14 Aug 2006 10:33:02 +0200
Subject: [Linux-cluster] clurgmgrd stops service without reason
In-Reply-To: <1155242561.21204.283.camel@ayanami.boston.redhat.com>
References: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
<1155242561.21204.283.camel@ayanami.boston.redhat.com>
Message-ID: <44E0353E.5090507@mediatransfer.com>
Lon Hohberger wrote:
> Well, your log messages and XML don't match.
>
> There's a recent bugzilla noting that rgmanager lacks sufficient error
> reporting for several resource agents.
which bug id?
> I will make a couple of updates to the resource agents shortly (e.g.
> today or tomorrow), and you can drop them in (on an already-running
> cluster, without restarting rgmanager). It should, then, provide you
> the information as to what part is failing.
which rgmanager version will have this bugfix?
falk
From joe.devman at yahoo.fr Mon Aug 7 07:08:13 2006
From: joe.devman at yahoo.fr (Joe)
Date: Mon, 07 Aug 2006 09:08:13 +0200
Subject: [Linux-cluster] gfs_fsck fails on large filesystem
In-Reply-To: <44CF94FE.3070407@redhat.com>
References: <44CF2F94.4000003@framestore-cfc.com>
<44CF8383.3040208@redhat.com> <44CF822D.7070705@framestore-cfc.com>
<44CF94FE.3070407@redhat.com>
Message-ID: <44D6E6DD.7050407@yahoo.fr>
Robert Peterson wrote:
> We've tried to kick around ideas on how to improve the speed, such as
> (1) adding an option to only focus on areas where the journals are dirty,
> (2) introducing multiple threads to process the different RGs, and even
> (3) trying to get multiple nodes in the cluster to team up and do
> different
> areas of the file system. None of these have been implemented yet
> because of higher priorities. Since this is an open-source project,
> anyone
> could step in and do these. Volunteers?
>
I've tried to look at the code many times. But, as a clustered file
system is a complex thing, it gets hard to figure out what it's all
about. I tried to find some "big picture" documentation, at least for the
on-disk layout. The nearest thing I've found is
http://opengfs.sourceforge.net/docs.php , which is the documentation
written at the time OpenGFS forked from Sistina's code. Although the
principles may still be the same (or not?), the code has obviously
changed, and the on-disk layout may not be the same either.
So, is there some sort of documentation about these principles for the
current GFS (not a design doc, I've read
/usr/src/linux/Documentation/stable_api_nonsense.txt)? This would much
help anybody who wishes to get into the code and do it more efficiently...
--
Mathieu
From lhh at redhat.com Mon Aug 14 15:21:38 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 14 Aug 2006 11:21:38 -0400
Subject: [Linux-cluster] Two clurgmgrd processes
In-Reply-To:
References:
Message-ID: <1155568898.21204.362.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-14 at 09:37 +0200, Troels Arvin wrote:
> Hello,
>
> I've upgraded our RHEL 4+Cluster Suite installation to Update 4. Before
> the upgrade, there was always one and only one clurgmgrd process; after
> the update, there seems to be two. Is this expected, or does it indicate a
> potential problem?
>
Expected.
One is a watchdog daemon (all it does is monitor the other one); it
reboots the node to clean up any resources if the main one crashes.
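If you want to check, something like this (a quick sketch, assuming the
usual procps ps) lists both processes with their parent PIDs, which makes
the watchdog/worker relationship obvious:

  # show PID, parent PID and command line for every clurgmgrd process
  ps -o pid,ppid,args -C clurgmgrd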
-- Lon
From lhh at redhat.com Mon Aug 14 15:30:55 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 14 Aug 2006 11:30:55 -0400
Subject: [Linux-cluster] clurgmgrd stops service without reason
In-Reply-To: <44E0353E.5090507@mediatransfer.com>
References: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
<1155242561.21204.283.camel@ayanami.boston.redhat.com>
<44E0353E.5090507@mediatransfer.com>
Message-ID: <1155569455.21204.374.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-14 at 10:33 +0200, Falk Hackenberger - MediaTransfer AG
Netresearch & Consulting wrote:
> Lon Hohberger wrote:
> > Well, your log messages and XML don't match.
> >
> > There's a recent bugzilla noting that rgmanager lacks sufficient error
> > reporting for several resource agents.
> which bug id?
199678, but it looks like it's marked private. You can file another one
if you would like.
> > I will make a couple of updates to the resource agents shortly (e.g.
> > today or tomorrow), and you can drop them in (on an already-running
> > cluster, without restarting rgmanager). It should, then, provide you
> > the information as to what part is failing.
>
> which rgmanager version should have this bugfix?
None yet; I submitted a patch on Friday, so others might leave feedback.
-- Lon
From haiwu.us at gmail.com Mon Aug 14 15:38:45 2006
From: haiwu.us at gmail.com (hai wu)
Date: Mon, 14 Aug 2006 10:38:45 -0500
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To: <44DA6611.6080602@redhat.com>
References:
<1155138372.21204.158.camel@ayanami.boston.redhat.com>
<1155150332.21204.202.camel@ayanami.boston.redhat.com>
<44DA4B8C.1050602@redhat.com>
<44DA6611.6080602@redhat.com>
Message-ID:
The one I am using is ERA, which is almost the same as DRAC III/XT, except
that it is a DRAC card embedded on the motherboard, whereas the DRAC III/XT
is a separate add-in DRAC card. But the telnet prompt does not match any
pattern in the /sbin/fence_drac file, so I just added the following to the
file, which works now.
Thanks,
Hai
# The ERA login banner spans two lines ("... (ERA)\nFirmware Version ..."),
# so look for it first and treat the card as a DRAC III/XT.
if (/Dell Embedded Remote Access Controller \(ERA\)\nFirmware Version/m)
{
    $drac_version = $DRAC_VERSION_III_XT;
} else {
    # existing detection: the banner names the DRAC model in parentheses
    if (/.*\((DRAC[^)]*)\)/m)
    {
        print "detected drac version '$1'\n" if $verbose;
        $drac_version = $1 unless defined $drac_version;
        print "WARNING: detected drac version '$1' but using "
            . "user defined version '$drac_version'\n"
            if ($drac_version ne $1);
    }
    else
    {
        print "WARNING: unable to detect DRAC version '$_'\n";
        $drac_version = $DRAC_VERSION_UNKNOWN;
    }
}
On 8/9/06, James Parsons wrote:
>
> hai wu wrote:
>
> > I got the following prompts after telneting to the drac port, maybe a
> > simple upgrade for the firmware would fix this issue:
> >
> > Dell Embedded Remote Access Controller (ERA)
> > Firmware Version 3.31 (Build 07.15)
> > Login:
> >
> > Thanks,
> > Hai
>
> Oops. Sorry. Unsupported version. If you want, you could hack the agent
> script (it is in perl) and get it to accept that version and just *see*
> if it works -- it might. I tried looking for documentation for that
> firmware rev and couldn't google any. If you know of some, drop me a
> line and maybe we can get something working - or at least know if it
> *will ever* work. :)
>
> -J
>
> BTW, the agent supports DRAC III/XT, DRAC MC, and DRAC 4/I.
>
> >
> > On 8/9/06, *James Parsons* > > wrote:
> >
> > hai wu wrote:
> >
> > > I got the following errors after "reboot -fn" on erd-tt-eproof1,
> > which
> > > script do I need to change?
> > >
> > > Aug 9 15:35:40 erd-tt-eproof2 kernel: CMAN: removing node
> > > erd-tt-eproof1 from t
> > > he cluster : Missed too many heartbeats
> > > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: erd-tt-eproof1 not a
> > > cluster member
> > > after 0 sec post_fail_delay
> > > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: fencing node
> > "erd-tt-eproof1"
> > > Aug 9 15:35:42 erd-tt-eproof2 fenced[3437]: agent "fence_drac"
> > > reports: WARNING
> > > : unable to detect DRAC version ' Dell Embedded Remote Access
> > > Controller (ERA) F
> > > irmware Version 3.31 (Build 07.15) ' WARNING: unsupported DRAC
> > version
> > > '__unknow
> > > n__' failed: unable to determine power state
> > >
> > > This is DRAC on Dell PE2650.
> > > Thanks,
> > > Hai
> >
> > Do you know what DRAC version you are using? Can you please telnet
> > into
> > the drac port and find out what it says when it starts your session?
> >
> > Thanks,
> >
> > -J
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> >
> >------------------------------------------------------------------------
> >
> >--
> >Linux-cluster mailing list
> >Linux-cluster at redhat.com
> >https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jparsons at redhat.com Mon Aug 14 15:54:27 2006
From: jparsons at redhat.com (James Parsons)
Date: Mon, 14 Aug 2006 11:54:27 -0400
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To:
References: <1155138372.21204.158.camel@ayanami.boston.redhat.com> <1155150332.21204.202.camel@ayanami.boston.redhat.com> <44DA4B8C.1050602@redhat.com> <44DA6611.6080602@redhat.com>
Message-ID: <44E09CB3.5080802@redhat.com>
hai wu wrote:
> The one I am using is ERA, which is almost the same as DRAC III/XT,
> except it is an embeded DRAC card on motherboard, compared to DRAC
> III/XT, which is a seperate added DRAC card. But the telnet prompt
> does not have the matching pattern in /sbin/fence_drac file, so I just
> added the following to the file, which works now.
>
> Thanks,
> Hai
Hai,
So, you are saying that if you are using a DRAC ERA card, and you set
the $drac_version in the script to $DRAC_VERSION_III_XT, then the fence
agent works? If so, we'll patch the agent accordingly. Please confirm.
-J
>
> if (/Dell Embedded Remote Access Controller \(ERA\)\nFirmware
> Version/m)
> {
> $drac_version = $DRAC_VERSION_III_XT;
> } else {
> if (/.*\((DRAC[^)]*)\)/m)
> {
> print "detected drac version '$1'\n" if $verbose;
> $drac_version = $1 unless defined $drac_version;
>
> print "WARNING: detected drac version '$1' but using "
> . "user defined version '$drac_version'\n"
> if ($drac_version ne $1);
> }
> else
> {
> print "WARNING: unable to detect DRAC version '$_'\n";
> $drac_version = $DRAC_VERSION_UNKNOWN;
> }
> }
>
>
>
> On 8/9/06, *James Parsons* > wrote:
>
> hai wu wrote:
>
> > I got the following prompts after telneting to the drac port,
> maybe a
> > simple upgrade for the firmware would fix this issue:
> >
> > Dell Embedded Remote Access Controller (ERA)
> > Firmware Version 3.31 (Build 07.15)
> > Login:
> >
> > Thanks,
> > Hai
>
> Oops. Sorry. Unsupported version. If you want, you could hack the
> agent
> script (it is in perl) and get it to accept that version and just
> *see*
> if it works -- it might. I tried looking for documentation for that
> firmware rev and couldn't google any. If you know of some, drop me a
> line and maybe we can get something working - or at least know if it
> *will ever* work. :)
>
> -J
>
> BTW, the agent supports DRAC III/XT, DRAC MC, and DRAC 4/I.
>
> >
> > On 8/9/06, *James Parsons* < jparsons at redhat.com
>
> > >> wrote:
> >
> > hai wu wrote:
> >
> > > I got the following errors after "reboot -fn" on
> erd-tt-eproof1,
> > which
> > > script do I need to change?
> > >
> > > Aug 9 15:35:40 erd-tt-eproof2 kernel: CMAN: removing node
> > > erd-tt-eproof1 from t
> > > he cluster : Missed too many heartbeats
> > > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]:
> erd-tt-eproof1 not a
> > > cluster member
> > > after 0 sec post_fail_delay
> > > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: fencing node
> > "erd-tt-eproof1"
> > > Aug 9 15:35:42 erd-tt-eproof2 fenced[3437]: agent
> "fence_drac"
> > > reports: WARNING
> > > : unable to detect DRAC version ' Dell Embedded Remote Access
> > > Controller (ERA) F
> > > irmware Version 3.31 (Build 07.15) ' WARNING: unsupported DRAC
> > version
> > > '__unknow
> > > n__' failed: unable to determine power state
> > >
> > > This is DRAC on Dell PE2650.
> > > Thanks,
> > > Hai
> >
> > Do you know what DRAC version you are using? Can you please
> telnet
> > into
> > the drac port and find out what it says when it starts your
> session?
> >
> > Thanks,
> >
> > -J
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> >
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> > < https://www.redhat.com/mailman/listinfo/linux-cluster>
> >
> >
> >------------------------------------------------------------------------
> >
> >--
> >Linux-cluster mailing list
> > Linux-cluster at redhat.com
> >https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>------------------------------------------------------------------------
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
From haiwu.us at gmail.com Mon Aug 14 16:37:21 2006
From: haiwu.us at gmail.com (hai wu)
Date: Mon, 14 Aug 2006 11:37:21 -0500
Subject: [Linux-cluster] 2-node cluster and fence_drac
In-Reply-To: <44E09CB3.5080802@redhat.com>
References:
<1155138372.21204.158.camel@ayanami.boston.redhat.com>
<1155150332.21204.202.camel@ayanami.boston.redhat.com>
<44DA4B8C.1050602@redhat.com>
<44DA6611.6080602@redhat.com>
<44E09CB3.5080802@redhat.com>
Message-ID:
Yes, the fence agent works after that. I tested it with DRAC firmware v3.20
and v3.35.
According to Dell:
  Dell Remote Access Controller - ERA and DRAC III/XT, v.3.20, A00 was released on 12/8/2004
  Dell Remote Access Controller - ERA and DRAC III/XT, v.3.30, A00 was released on 6/9/2005
  Dell Remote Access Controller - ERA and DRAC III/XT, v.3.31, A00 was released on 07/25/2005
  Dell Remote Access Controller - ERA and DRAC III/XT, v.3.35, A00 was released on 12/25/2005
I am sure it would work for v3.30 and v3.31 in this case as well.
Thanks,
Hai
On 8/14/06, James Parsons wrote:
>
> hai wu wrote:
>
> > The one I am using is ERA, which is almost the same as DRAC III/XT,
> > except it is an embeded DRAC card on motherboard, compared to DRAC
> > III/XT, which is a seperate added DRAC card. But the telnet prompt
> > does not have the matching pattern in /sbin/fence_drac file, so I just
> > added the following to the file, which works now.
> >
> > Thanks,
> > Hai
>
> Hai,
>
> So, you are saying that if you are using a DRAC ERA card, and you set
> the $drac_version in the script to $DRAC_VERSION_III_XT, then the fence
> agent works? If so, we'll patch the agent accordingly. Please confirm.
>
> -J
>
> >
> > if (/Dell Embedded Remote Access Controller \(ERA\)\nFirmware
> > Version/m)
> > {
> > $drac_version = $DRAC_VERSION_III_XT;
> > } else {
> > if (/.*\((DRAC[^)]*)\)/m)
> > {
> > print "detected drac version '$1'\n" if $verbose;
> > $drac_version = $1 unless defined $drac_version;
> >
> > print "WARNING: detected drac version '$1' but using "
> > . "user defined version '$drac_version'\n"
> > if ($drac_version ne $1);
> > }
> > else
> > {
> > print "WARNING: unable to detect DRAC version '$_'\n";
> > $drac_version = $DRAC_VERSION_UNKNOWN;
> > }
> > }
> >
> >
> >
> > On 8/9/06, *James Parsons* > > wrote:
> >
> > hai wu wrote:
> >
> > > I got the following prompts after telneting to the drac port,
> > maybe a
> > > simple upgrade for the firmware would fix this issue:
> > >
> > > Dell Embedded Remote Access Controller (ERA)
> > > Firmware Version 3.31 (Build 07.15)
> > > Login:
> > >
> > > Thanks,
> > > Hai
> >
> > Oops. Sorry. Unsupported version. If you want, you could hack the
> > agent
> > script (it is in perl) and get it to accept that version and just
> > *see*
> > if it works -- it might. I tried looking for documentation for that
> > firmware rev and couldn't google any. If you know of some, drop me a
> > line and maybe we can get something working - or at least know if it
> > *will ever* work. :)
> >
> > -J
> >
> > BTW, the agent supports DRAC III/XT, DRAC MC, and DRAC 4/I.
> >
> > >
> > > On 8/9/06, *James Parsons* < jparsons at redhat.com
> >
> > > >> wrote:
> > >
> > > hai wu wrote:
> > >
> > > > I got the following errors after "reboot -fn" on
> > erd-tt-eproof1,
> > > which
> > > > script do I need to change?
> > > >
> > > > Aug 9 15:35:40 erd-tt-eproof2 kernel: CMAN: removing node
> > > > erd-tt-eproof1 from t
> > > > he cluster : Missed too many heartbeats
> > > > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]:
> > erd-tt-eproof1 not a
> > > > cluster member
> > > > after 0 sec post_fail_delay
> > > > Aug 9 15:35:40 erd-tt-eproof2 fenced[3437]: fencing node
> > > "erd-tt-eproof1"
> > > > Aug 9 15:35:42 erd-tt-eproof2 fenced[3437]: agent
> > "fence_drac"
> > > > reports: WARNING
> > > > : unable to detect DRAC version ' Dell Embedded Remote
> Access
> > > > Controller (ERA) F
> > > > irmware Version 3.31 (Build 07.15) ' WARNING: unsupported
> DRAC
> > > version
> > > > '__unknow
> > > > n__' failed: unable to determine power state
> > > >
> > > > This is DRAC on Dell PE2650.
> > > > Thanks,
> > > > Hai
> > >
> > > Do you know what DRAC version you are using? Can you please
> > telnet
> > > into
> > > the drac port and find out what it says when it starts your
> > session?
> > >
> > > Thanks,
> > >
> > > -J
> > >
> > > --
> > > Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > >
> > > https://www.redhat.com/mailman/listinfo/linux-cluster
> > > < https://www.redhat.com/mailman/listinfo/linux-cluster>
> > >
> > >
> >
> >------------------------------------------------------------------------
> > >
> > >--
> > >Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > >https://www.redhat.com/mailman/listinfo/linux-cluster
> > >
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> >
> >------------------------------------------------------------------------
> >
> >--
> >Linux-cluster mailing list
> >Linux-cluster at redhat.com
> >https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dist-list at LEXUM.UMontreal.CA Mon Aug 14 17:28:34 2006
From: dist-list at LEXUM.UMontreal.CA (FM)
Date: Mon, 14 Aug 2006 13:28:34 -0400
Subject: [Linux-cluster] cluster suite and webfarm + GFS ???
Message-ID: <44E0B2C2.2010808@lexum.umontreal.ca>
Hello,
After a week of trial and error, and a lot of reading, I still cannot
figure out how to configure this:
3 web servers (behind director servers using piranha_gui) connected to a
GFS file system (on a LUN).
What I am trying to achieve:
3 servers connected to a GFS file system. The web root is on the GFS
file system. httpd is up on all servers for failover/load balancing (httpd
is monitored by the director servers).
The problem is the link between httpd and GFS: if I lose the GFS link, httpd
will still respond to the director, but the information will be missing.
That's where, I suppose, Cluster Suite comes in. But I am unable to have
httpd+gfs activated on each node at the same time!
I tried:
1 failover domain per node + 1 service per failover domain + 1 resource
1 failover domain per node + 1 service per failover domain + 1 resource
per service
All failed to achieve my goal! (A sketch of the second layout is at the end
of this message.)
Does anyone have a clue? I suppose I'm not the first to try this.
Thanks !
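For reference, a minimal sketch of the second layout (one restricted failover
domain per node, one service per domain, one resource per service). The node
names, device path and mount point below are placeholders, not my real
values:

  <rm>
    <failoverdomains>
      <failoverdomain name="only-web1" restricted="1" ordered="0">
        <failoverdomainnode name="web1" priority="1"/>
      </failoverdomain>
      <!-- same pattern for web2 and web3 -->
    </failoverdomains>
    <resources>
      <clusterfs name="docroot" fstype="gfs" device="/dev/vg0/web"
                 mountpoint="/var/www/html"/>
    </resources>
    <service name="httpd-web1" domain="only-web1" autostart="1">
      <clusterfs ref="docroot"/>
      <script name="httpd" file="/etc/init.d/httpd"/>
    </service>
    <!-- same pattern for httpd-web2 and httpd-web3 -->
  </rm>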
From lhh at redhat.com Mon Aug 14 17:33:52 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 14 Aug 2006 13:33:52 -0400
Subject: [Linux-cluster] cluster suite and webfarm + GFS ???
In-Reply-To: <44E0B2C2.2010808@lexum.umontreal.ca>
References: <44E0B2C2.2010808@lexum.umontreal.ca>
Message-ID: <1155576832.21204.377.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-14 at 13:28 -0400, FM wrote:
> Hello,
> After 1 week of trial and error, and a lot of reading, I still cannot
> figure out how-to configure :
> 3 webservers (behind director servers using piranha_gui) connected to a
> GFS file system (on a LUN).
>
> What I am trying to achieve :
> 3 servers connected to a GFS file system. the web root is on the GFS
> file system. HTTPD up on all servers for failover/load balancing (httpd
> is monitored by the director servers).
> The prob is the link httpd/gfs. if I loose the GFS link, httpd will
> still response to the director, but the information will be missing.
> That's where, I suppose, cluster suite takes place. But I am unable to
> have httpd+gfs activated on each node at the same time !
>
> I tried :
> 1 failover domain per node + 1service per failover domain + 1 resource
> 1 failover domain per node + 1service per failover domain + 1 resource
> per service
>
> All failed to achieve my goal !
>
> Can someone have a clue ? I suppose I'm not the first to try this.
Could you post one of your configurations?
-- Lon
From dist-list at LEXUM.UMontreal.CA Mon Aug 14 17:45:24 2006
From: dist-list at LEXUM.UMontreal.CA (FM)
Date: Mon, 14 Aug 2006 13:45:24 -0400
Subject: [Linux-cluster] cluster suite and webfarm + GFS ???
In-Reply-To: <1155576832.21204.377.camel@ayanami.boston.redhat.com>
References: <44E0B2C2.2010808@lexum.umontreal.ca>
<1155576832.21204.377.camel@ayanami.boston.redhat.com>
Message-ID: <44E0B6B4.4070709@lexum.umontreal.ca>
Sure, here is my latest cluster.conf.
Note that in this one there is no GFS service; I am trying to fix my httpd
problem (httpd up on all nodes) before bringing GFS into the game :-)
thanks !
------------------------------------
Frédéric Médery
Administrateur Système /
System Administrator
LexUM, Université de Montréal
email : mederyf at lexum.umontreal.ca
tel. : (514) 343-6111 p. 1-3288
fax. : (514) 343-7359
------------------------------------
Lon Hohberger wrote:
> On Mon, 2006-08-14 at 13:28 -0400, FM wrote:
>
>> Hello,
>> After 1 week of trial and error, and a lot of reading, I still cannot
>> figure out how-to configure :
>> 3 webservers (behind director servers using piranha_gui) connected to a
>> GFS file system (on a LUN).
>>
>> What I am trying to achieve :
>> 3 servers connected to a GFS file system. the web root is on the GFS
>> file system. HTTPD up on all servers for failover/load balancing (httpd
>> is monitored by the director servers).
>> The prob is the link httpd/gfs. if I loose the GFS link, httpd will
>> still response to the director, but the information will be missing.
>> That's where, I suppose, cluster suite takes place. But I am unable to
>> have httpd+gfs activated on each node at the same time !
>>
>> I tried :
>> 1 failover domain per node + 1service per failover domain + 1 resource
>> 1 failover domain per node + 1service per failover domain + 1 resource
>> per service
>>
>> All failed to achieve my goal !
>>
>> Can someone have a clue ? I suppose I'm not the first to try this.
>>
>
> Could you post one of your configurations?
>
> -- Lon
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
From adel at opennet.ae Mon Aug 14 17:30:34 2006
From: adel at opennet.ae (Adel Ben Zarrouk)
Date: Mon, 14 Aug 2006 21:30:34 +0400
Subject: [Linux-cluster] RH GFS 6.0 and fencing configuration
Message-ID: <200608142130.35116.adel@opennet.ae>
Dear All,
I am trying to solve a problem with one of the fencing methods, which is
supposed to work without any problem (iLO).
I installed two GFS nodes with RH AS3 U7 and the latest GFS 6.0 update, but
the problem is that I don't have sufficient hardware, such as power switches
(WTI) or an additional lock server. I installed two HP DL380s attached to a
SAN fibre channel storage (QLogic), with ONE ordinary PC as the lock server.
It seems to work fine, but I don't see how to configure the fencing device,
since I only have iLO.
I tried to test the scenario where one of the servers goes down; below is
the result:
At startup everything works perfectly and all the modules load, except that
the daemon lock_gulmd does not come up: it looks for the lock server, and
the lock server looks for the fencing device and the method to be used;
after a certain timeout, it fails.
Does anybody have an idea how I can solve this problem without additional
hardware?
Regards
--Adel
--
Adel Ben Zarrouk
Opennet MEA FZ LLC
Tel: +971 4 390 1943
Fax: +971 4 390 4767
http://www.opennet.ae/
From Matthew.Patton.ctr at osd.mil Mon Aug 14 22:17:00 2006
From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E)
Date: Mon, 14 Aug 2006 18:17:00 -0400
Subject: [Linux-cluster] cluster suite and webfarm + GFS ???
Message-ID:
Classification: UNCLASSIFIED
This doesn't solve your problem, but why not just leave all httpd processes
running on all the boxes and put some smarts in the load balancer so that,
if a page fetch comes up empty or invalid, it marks that server as offline?
That way, if GFS goes wandering off, cluster services doesn't have to figure
out whether httpd is valid or not. Obviously cluster services still needs to
figure out that GFS went screwy...
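With piranha, for example, the nanny health checks can do exactly that: a
send/expect pair on the virtual server makes the director pull a real server
out of rotation when the page fetch fails. A rough sketch of the relevant
lvs.cf fragment (the addresses, names and test page are placeholders; adjust
whatever piranha_gui generated for you):

  virtual webfarm {
      active = 1
      address = 192.168.0.10 eth0:1
      port = 80
      send = "GET /health.html HTTP/1.0\r\n\r\n"
      expect = "HTTP"
      scheduler = wlc
      protocol = tcp
      timeout = 6
      reentry = 15
      server web1 {
          address = 192.168.0.11
          active = 1
          weight = 1
      }
  }

If the test page lives on the GFS mount, a wedged GFS shows up as a failed
fetch and the node is pulled out until it recovers.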
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mailing-lists at hughesjr.com Tue Aug 15 09:36:52 2006
From: mailing-lists at hughesjr.com (Johnny Hughes)
Date: Tue, 15 Aug 2006 04:36:52 -0500
Subject: [Linux-cluster] Missing GFS-6.0.2.34-2.src.rpm
Message-ID: <1155634612.28367.21.camel@myth.home.local>
Why has the latest SRPM from here:
http://rhn.redhat.com/errata/RHBA-2006-0593.html
not yet been posted to:
http://ftp.redhat.com/pub/redhat/linux/updates/enterprise/3AS/en/RHGFS/SRPMS/ ?
Does anybody know where GFS-6.0.2.34-2.src.rpm is?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:
From lhh at redhat.com Tue Aug 15 17:12:52 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 15 Aug 2006 13:12:52 -0400
Subject: [Linux-cluster] clurgmgrd stops service without reason
In-Reply-To: <1155569455.21204.374.camel@ayanami.boston.redhat.com>
References: <35159.82.70.162.86.1155209713.squirrel@www.easilymail.co.uk>
<1155242561.21204.283.camel@ayanami.boston.redhat.com>
<44E0353E.5090507@mediatransfer.com>
<1155569455.21204.374.camel@ayanami.boston.redhat.com>
Message-ID: <1155661972.24719.2.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-14 at 11:30 -0400, Lon Hohberger wrote:
> On Mon, 2006-08-14 at 10:33 +0200, Falk Hackenberger - MediaTransfer AG
> Netresearch & Consulting wrote:
> > Lon Hohberger wrote:
> > > Well, your log messages and XML don't match.
> > >
> > > There's a recent bugzilla noting that rgmanager lacks sufficient error
> > > reporting for several resource agents.
> > which bug id?
>
> 199678, but it looks like it's marked private. You can file another one
> if you would like.
I've filed this bugzilla which is viewable by all:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202637
Add yourself to it, and you can track how progress goes. It should be
done this week, I hope.
-- Lon
From lhh at redhat.com Tue Aug 15 17:22:12 2006
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 15 Aug 2006 13:22:12 -0400
Subject: [Linux-cluster] cluster suite and webfarm + GFS ???
In-Reply-To: <44E0B6B4.4070709@lexum.umontreal.ca>
References: <44E0B2C2.2010808@lexum.umontreal.ca>
<1155576832.21204.377.camel@ayanami.boston.redhat.com>
<44E0B6B4.4070709@lexum.umontreal.ca>
Message-ID: <1155662533.24719.12.camel@ayanami.boston.redhat.com>
On Mon, 2006-08-14 at 13:45 -0400, FM wrote:
> sure here is my latest cluster.conf.
> Note that on this one the is no gfs service. I trying to fixe my httpd
> prob (httpd up on all nodes) before adding gfs in the game :-)
> thanks !
> [The quoted cluster.conf was stripped by the list archiver; only attribute
> fragments survive: two cluster nodes (nodename="lecce" and
> nodename="cagliari") and two restricted failover domains (restricted="1"),
> each with a single member node at priority="1".]