From mcollins at flmnh.ufl.edu  Tue Dec  1 21:38:23 2009
From: mcollins at flmnh.ufl.edu (Matthew Collins)
Date: Tue, 01 Dec 2009 16:38:23 -0500
Subject: [Linux-cluster] Limiting the number of VMs that start at once
Message-ID: <4B158CCF.3040801@flmnh.ufl.edu>

Is there a structure for staggering the starting of resources when 
failing over to another node? The problem I'm having is that when one 
node fails and its Xen VMs start on another node in the failover domain, 
that second node's load is so high it can't respond to tokens or qdisk 
requests in a timely fashion and it gets fenced.

This is kind of specific to VM resources which have high startup costs 
so I was going to hack the vm.sh script. Does anyone have a better idea? 
Would anyone want my hacks when I'm done?

-- 
Matt Collins
Systems Administrator
Florida Museum of Natural History



From rmicmirregs at gmail.com  Tue Dec  1 23:33:00 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Wed, 02 Dec 2009 00:33:00 +0100
Subject: [Linux-cluster] Qdisk with multiple heuristics?
Message-ID: <1259710380.6571.14.camel@mecatol>

Hi all,

As it can be found in qdiskd man page, it is allowed to use up to 10
different heuristics in one cluster.

How is this specified into cluster.conf? I'm trying to make it work with
the following piece of cluster.conf file:

<quorumd device="/dev/mpath/quorum" interval="6" min_score="1" tko="10"
votes="3">
	<heuristic interval="3" program="ping -c3 -t2 10.240.199.1" score="1"/>
	<heuristic interval="3" program="ping -c3 -t2 10.240.199.47"
score="1"/>
</quorumd>

My objective is to have 2 (or more) different heuristics which keep this
node alive even if only one heuristic is OK. The cluster.conf file was
created with system-config-cluster and later was edited by hand.

The qdisk and heuristics are not working:
1.- system-config-cluster shows me a warning about an error related to
some options not allowed into quorumd. I'm sorry i cannot be more
specific right now, I could attach the exact message tomorrow.

2.- The cluster is operational, but using "clustat" i don't see the
qdisk with its votes in the node list. The qdisk process is neither
shown in the process list on the system.

Is there somethin wrong?

I'm using RHEL5.3 with:
cman-2.0.98-1.el5.x86_64
openais-0.80.3-22.el5.x86_64
rgmanager-2.0.46-1.el5.x86_64


Thanks in advance. Cheers,

Rafael


-- 
Rafael Mic? Miranda



From maniac.nl at gmail.com  Wed Dec  2 10:09:48 2009
From: maniac.nl at gmail.com (Mark Janssen)
Date: Wed, 2 Dec 2009 11:09:48 +0100
Subject: [Linux-cluster] GFS - Small files - Performance
In-Reply-To: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
Message-ID: <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com>

2009/11/30 Leonardo D'Angelo Gon?alves <leonardodg2084 at gmail.com>:
> Hi
>
> I have a GFS cluster on RHEL4.8 which one filesystem (10G) with? various
> directories and sub-directories and small files about 5Kb. When I run the
> command "du-sh" in the directory it generates about 1500 IOPS on the disks,
> for GFS it takes time about 5 minutes and 2 second for ext3 filesyem. Could
> someone help me with this problem. follows below the output of gfs_tool
> Why for GFS it takes 5 minutes and ext3 2 seconds ? Is there any relation ?

Try setting statfs_fast to '1'. This should speed up commands like 'df'.

gfs_tool settune <mount> statfs_fast 1

Do note that when you resize your filesystem you have to turn it back
off, and then back on again to update the size of your filesystem.

-- 
Mark Janssen  --  maniac(at)maniac.nl  --  pgp: 0x357D2178 |   ,''`.  |
Unix / Linux Open-Source and Internet Consultant @ Snow.nl |  : :' :  |
Maniac.nl      MarkJanssen.nl      NerdNet.nl      Unix.nl |  `. `'   |
Skype: markmjanssen ICQ: 129696007 irc: FooBar on undernet |    `-    |



From brem.belguebli at gmail.com  Wed Dec  2 10:49:30 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Wed, 2 Dec 2009 11:49:30 +0100
Subject: [Linux-cluster] Qdisk with multiple heuristics?
In-Reply-To: <1259710380.6571.14.camel@mecatol>
References: <1259710380.6571.14.camel@mecatol>
Message-ID: <29ae894c0912020249q198d8410sc9007a6398757ea4@mail.gmail.com>

Hi Rafael,

Concerning your second point, have you initialized your
/dev/mpath/quorum device with mkqdisk ?

Also, the qdisk daemon must be running if you want it to be
operationnal in your cluster.

In my setup, everything is started manually, no automatic boot time
cluster start (safest option IMHO), and I use the following stepping:

1) start qdisk (service qdiskd start)
2) start cman (service cman start)
3) start rgmanager (service rgmanager start)
4) wait untill the cluster is quorate (a shell loop) before starting clvmd
5) start clvmd

 Output of clustat:


Cluster Status for rhcl1 @ Wed Dec  2 11:20:20 2009
Member Status: Quorate

 Member Name                                           ID   Status
 ------ ----                                                     ---- ------
 node1.mydom                                        1 Online, Local, rgmanager
 node2.mydom                                        2 Online, rgmanager
 node3.mydom                                        3 Online, rgmanager
 /dev/iscsi/storage.quorum                        0 Online, Quorum
Disk                       <-- Qorum disk started...

 Service Name                                            Owner (Last)
                                          State
....


[root at node1 ~]# ps -edf | grep qdisk
root      4409     1  0 Nov26 ?        00:04:00 qdiskd -Q


Concerning your point 1, you may address this by giving a different
score to each heuristic, but I clearly don't know if this is what it
intends to.

Brem
Regards


2009/12/2 Rafael Mic? Miranda <rmicmirregs at gmail.com>:
> Hi all,
>
> As it can be found in qdiskd man page, it is allowed to use up to 10
> different heuristics in one cluster.
>
> How is this specified into cluster.conf? I'm trying to make it work with
> the following piece of cluster.conf file:
>
> <quorumd device="/dev/mpath/quorum" interval="6" min_score="1" tko="10"
> votes="3">
> ? ? ? ?<heuristic interval="3" program="ping -c3 -t2 10.240.199.1" score="1"/>
> ? ? ? ?<heuristic interval="3" program="ping -c3 -t2 10.240.199.47"
> score="1"/>
> </quorumd>
>
> My objective is to have 2 (or more) different heuristics which keep this
> node alive even if only one heuristic is OK. The cluster.conf file was
> created with system-config-cluster and later was edited by hand.
>
> The qdisk and heuristics are not working:
> 1.- system-config-cluster shows me a warning about an error related to
> some options not allowed into quorumd. I'm sorry i cannot be more
> specific right now, I could attach the exact message tomorrow.
>
> 2.- The cluster is operational, but using "clustat" i don't see the
> qdisk with its votes in the node list. The qdisk process is neither
> shown in the process list on the system.
>
> Is there somethin wrong?
>
> I'm using RHEL5.3 with:
> cman-2.0.98-1.el5.x86_64
> openais-0.80.3-22.el5.x86_64
> rgmanager-2.0.46-1.el5.x86_64
>
>
> Thanks in advance. Cheers,
>
> Rafael
>
>
> --
> Rafael Mic? Miranda
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From leonardodg2084 at gmail.com  Wed Dec  2 10:50:56 2009
From: leonardodg2084 at gmail.com (=?ISO-8859-1?Q?Leonardo_D=27Angelo_Gon=E7alves?=)
Date: Wed, 2 Dec 2009 08:50:56 -0200
Subject: [Linux-cluster] GFS - Small files - Performance
In-Reply-To: <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com>
References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
	<531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com>
Message-ID: <3170ac020912020250l56177bd4p420e3e714756c3dd@mail.gmail.com>

Hi..

So.. I set up this configuration but, don`t resolve my problem.

ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 100
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 3
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
glock_purge = 50
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000   (1, 1)
quota_enforce = 1
quota_account = 1
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
statfs_fast = 1
seq_readahead = 0


2009/12/2 Mark Janssen <maniac.nl at gmail.com>

> 2009/11/30 Leonardo D'Angelo Gon?alves <leonardodg2084 at gmail.com>:
> > Hi
> >
> > I have a GFS cluster on RHEL4.8 which one filesystem (10G) with  various
> > directories and sub-directories and small files about 5Kb. When I run the
> > command "du-sh" in the directory it generates about 1500 IOPS on the
> disks,
> > for GFS it takes time about 5 minutes and 2 second for ext3 filesyem.
> Could
> > someone help me with this problem. follows below the output of gfs_tool
> > Why for GFS it takes 5 minutes and ext3 2 seconds ? Is there any relation
> ?
>
> Try setting statfs_fast to '1'. This should speed up commands like 'df'.
>
> gfs_tool settune <mount> statfs_fast 1
>
> Do note that when you resize your filesystem you have to turn it back
> off, and then back on again to update the size of your filesystem.
>
> --
> Mark Janssen  --  maniac(at)maniac.nl  --  pgp: 0x357D2178 |   ,''`.  |
> Unix / Linux Open-Source and Internet Consultant @ Snow.nl |  : :' :  |
> Maniac.nl      MarkJanssen.nl      NerdNet.nl      Unix.nl |  `. `'   |
> Skype: markmjanssen ICQ: 129696007 irc: FooBar on undernet |    `-    |
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091202/fa120ad0/attachment.htm>

From frank at si.ct.upc.edu  Wed Dec  2 11:53:46 2009
From: frank at si.ct.upc.edu (frank)
Date: Wed, 02 Dec 2009 12:53:46 +0100
Subject: [Linux-cluster] GFS performance test
Message-ID: <4B16554A.50002@si.ct.upc.edu>

Hi,
after seeing some posts related to GFS performance, we have decided to 
test our two-node GFS filesystem with ping_pong program.
We are worried about the results.

Running the program in only one node, without parameters, we get between 
800000 locks/sec and 900000 locks/sec
Running the program in both nodes over the same file on the shared 
filesystem, the lock rate did not drop and it is the same in both nodes! 
What does this mean? Is there any problem with locks ?

Just for you info, GFS filesystem is /mnt/gfs and what I run in both 
nodes is:

./ping_pong /mnt/gfs/tmp/test.dat 3

Thanks for your help.

Frank


-- 
Aquest missatge ha estat analitzat per MailScanner
a la cerca de virus i d'altres continguts perillosos,
i es considera que est? net.
For all your IT requirements visit: http://www.transtec.co.uk



From dan at quah.ro  Wed Dec  2 12:09:20 2009
From: dan at quah.ro (Dan Candea)
Date: Wed, 2 Dec 2009 14:09:20 +0200
Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync
In-Reply-To: <531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com>
References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
	<531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com>
Message-ID: <200912021409.21001.dan@quah.ro>


hello

randomly , during a nightly backup with rsync I receive the error below on a 3 
node setup with cluster2.  because of the withdraw I can't unmount without a 
reboot.

does someone have a clue?


GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed
GFS2: fsid=data:FSdata.0:   function = gfs2_meta_inval, file = 
fs/gfs2/meta_io.c, line = 110
GFS2: fsid=data:FSdata.0: about to withdraw this file system
GFS2: fsid=data:FSdata.0: telling LM to withdraw
GFS2: fsid=data:FSdata.0: withdrawn
Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1
Call Trace:
 [<ffffffffa008e4ea>] 0xffffffffa008e4ea
 [<ffffffff8025ecee>] 0xffffffff8025ecee
 [<ffffffffa0091307>] 0xffffffffa0091307
 [<ffffffffa008f640>] 0xffffffffa008f640
 [<ffffffffa000fc18>] 0xffffffffa000fc18
 [<ffffffffa000bfe8>] 0xffffffffa000bfe8
 [<ffffffff8022605c>] 0xffffffff8022605c
 [<ffffffffa008f060>] 0xffffffffa008f060
 [<ffffffffa008e5cb>] 0xffffffffa008e5cb
 [<ffffffffa00912f3>] 0xffffffffa00912f3
 [<ffffffffa0077a9b>] 0xffffffffa0077a9b
 [<ffffffffa0076a03>] 0xffffffffa0076a03
 [<ffffffffa00771f7>] 0xffffffffa00771f7
 [<ffffffff8023b43e>] 0xffffffff8023b43e
 [<ffffffff8023b571>] 0xffffffff8023b571
 [<ffffffff8023eee5>] 0xffffffff8023eee5
 [<ffffffff8023eee5>] 0xffffffff8023eee5
 [<ffffffff8023b4d8>] 0xffffffff8023b4d8
 [<ffffffff8023e794>] 0xffffffff8023e794
 [<ffffffff802035e9>] 0xffffffff802035e9
 [<ffffffff8023e72b>] 0xffffffff8023e72b
 [<ffffffff802035df>] 0xffffffff802035df


regards
-- 
Dan C?ndea
Does God Play Dice?



From dan at quah.ro  Wed Dec  2 12:15:01 2009
From: dan at quah.ro (Dan Candea)
Date: Wed, 2 Dec 2009 14:15:01 +0200
Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync
Message-ID: <200912021415.01701.dan@quah.ro>

hello

randomly , during a nightly backup with rsync I receive the error below on a 3 
node setup with cluster2.  because of the withdraw I can't unmount without a 
reboot.

does someone have a clue?


GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed
GFS2: fsid=data:FSdata.0:   function = gfs2_meta_inval, file = 
fs/gfs2/meta_io.c, line = 110
GFS2: fsid=data:FSdata.0: about to withdraw this file system
GFS2: fsid=data:FSdata.0: telling LM to withdraw
GFS2: fsid=data:FSdata.0: withdrawn
Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1
Call Trace:
 [<ffffffffa008e4ea>] 0xffffffffa008e4ea
 [<ffffffff8025ecee>] 0xffffffff8025ecee
 [<ffffffffa0091307>] 0xffffffffa0091307
 [<ffffffffa008f640>] 0xffffffffa008f640
 [<ffffffffa000fc18>] 0xffffffffa000fc18
 [<ffffffffa000bfe8>] 0xffffffffa000bfe8
 [<ffffffff8022605c>] 0xffffffff8022605c
 [<ffffffffa008f060>] 0xffffffffa008f060
 [<ffffffffa008e5cb>] 0xffffffffa008e5cb
 [<ffffffffa00912f3>] 0xffffffffa00912f3
 [<ffffffffa0077a9b>] 0xffffffffa0077a9b
 [<ffffffffa0076a03>] 0xffffffffa0076a03
 [<ffffffffa00771f7>] 0xffffffffa00771f7
 [<ffffffff8023b43e>] 0xffffffff8023b43e
 [<ffffffff8023b571>] 0xffffffff8023b571
 [<ffffffff8023eee5>] 0xffffffff8023eee5
 [<ffffffff8023eee5>] 0xffffffff8023eee5
 [<ffffffff8023b4d8>] 0xffffffff8023b4d8
 [<ffffffff8023e794>] 0xffffffff8023e794
 [<ffffffff802035e9>] 0xffffffff802035e9
 [<ffffffff8023e72b>] 0xffffffff8023e72b
 [<ffffffff802035df>] 0xffffffff802035df


regards
-- 
Dan C?ndea
Does God Play Dice?



From swhiteho at redhat.com  Wed Dec  2 12:48:06 2009
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 02 Dec 2009 12:48:06 +0000
Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync
In-Reply-To: <200912021409.21001.dan@quah.ro>
References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
	<531e3e4c0912020209q5a617654g20daba32154c79b2@mail.gmail.com>
	<200912021409.21001.dan@quah.ro>
Message-ID: <1259758086.6052.959.camel@localhost.localdomain>

Hi,

On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote:
> hello
> 
> randomly , during a nightly backup with rsync I receive the error below on a 3 
> node setup with cluster2.  because of the withdraw I can't unmount without a 
> reboot.
> 
> does someone have a clue?
> 
> 
> GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed
> GFS2: fsid=data:FSdata.0:   function = gfs2_meta_inval, file = 
> fs/gfs2/meta_io.c, line = 110
> GFS2: fsid=data:FSdata.0: about to withdraw this file system
> GFS2: fsid=data:FSdata.0: telling LM to withdraw
> GFS2: fsid=data:FSdata.0: withdrawn
> Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1
I don't recognise this kernel version, which distro is it from?

Can you reproduce this issue? I've heard of an issue involving rsync,
but having now tried various different rsync commands, I've not been
able to reproduce anything that fails.

> Call Trace:
>  [<ffffffffa008e4ea>] 0xffffffffa008e4ea
>  [<ffffffff8025ecee>] 0xffffffff8025ecee
>  [<ffffffffa0091307>] 0xffffffffa0091307
>  [<ffffffffa008f640>] 0xffffffffa008f640
>  [<ffffffffa000fc18>] 0xffffffffa000fc18
>  [<ffffffffa000bfe8>] 0xffffffffa000bfe8
>  [<ffffffff8022605c>] 0xffffffff8022605c
>  [<ffffffffa008f060>] 0xffffffffa008f060
>  [<ffffffffa008e5cb>] 0xffffffffa008e5cb
>  [<ffffffffa00912f3>] 0xffffffffa00912f3
>  [<ffffffffa0077a9b>] 0xffffffffa0077a9b
>  [<ffffffffa0076a03>] 0xffffffffa0076a03
>  [<ffffffffa00771f7>] 0xffffffffa00771f7
>  [<ffffffff8023b43e>] 0xffffffff8023b43e
>  [<ffffffff8023b571>] 0xffffffff8023b571
>  [<ffffffff8023eee5>] 0xffffffff8023eee5
>  [<ffffffff8023eee5>] 0xffffffff8023eee5
>  [<ffffffff8023b4d8>] 0xffffffff8023b4d8
>  [<ffffffff8023e794>] 0xffffffff8023e794
>  [<ffffffff802035e9>] 0xffffffff802035e9
>  [<ffffffff8023e72b>] 0xffffffff8023e72b
>  [<ffffffff802035df>] 0xffffffff802035df
> 
This set of numbers is pretty useless without being translated into
symbols. On the other hand the assertion which you've hit is GFS2
complaining that its requested that the pages relating to an inode to be
invalidated, but there are some that have not been removed after that
invalidation. So in this particular case it doesn't matter,

Steve.

> 
> regards



From mm at yuhu.biz  Wed Dec  2 12:54:45 2009
From: mm at yuhu.biz (Marian Marinov)
Date: Wed, 2 Dec 2009 14:54:45 +0200
Subject: [Linux-cluster] Searching for speakers
Message-ID: <200912021454.53872.mm@yuhu.biz>

Hello,
sorry for the off topic e-mail, but I'm organizing the biggest FOSS conference 
in Bulgaria - OpenFest. And I'm curious if any one of you guys is interested 
in coming to Bulgaria next year and speaking about CLVM or the Cluster project 
as a whole?

Next year's OpenFest will be held in Sofia, Bulgaria at 6-7 of November.

If you are interested, please contact me.

Again sorry for the off-topic mail.

-- 
Best regards,
Marian Marinov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091202/cc2d3467/attachment.sig>

From rvandolson at esri.com  Wed Dec  2 14:58:43 2009
From: rvandolson at esri.com (Ray Van Dolson)
Date: Wed, 2 Dec 2009 06:58:43 -0800
Subject: [Linux-cluster] GFS performance test
In-Reply-To: <4B16554A.50002@si.ct.upc.edu>
References: <4B16554A.50002@si.ct.upc.edu>
Message-ID: <20091202145842.GA16292@esri.com>

On Wed, Dec 02, 2009 at 03:53:46AM -0800, frank wrote:
> Hi,
> after seeing some posts related to GFS performance, we have decided to 
> test our two-node GFS filesystem with ping_pong program.
> We are worried about the results.
> 
> Running the program in only one node, without parameters, we get between 
> 800000 locks/sec and 900000 locks/sec
> Running the program in both nodes over the same file on the shared 
> filesystem, the lock rate did not drop and it is the same in both nodes! 
> What does this mean? Is there any problem with locks ?
> 
> Just for you info, GFS filesystem is /mnt/gfs and what I run in both 
> nodes is:
> 
> ./ping_pong /mnt/gfs/tmp/test.dat 3
> 
> Thanks for your help.
> 

Wow, that doesn't sound right at all (or at least not consistent with
results I've gotten :)

Can you provide details of your setup, and perhaps your cluster.conf
file?  Have you done any other GFS tuning?  Are we talking GFS1 or
GFS2?

I get in the 3000-5000 locks/sec range with my GFS2 filesystem (using
nodiratime,noatime and reducing the lock limit to 0 from 100 in my
cluster.conf file).

The numbers you provide I'd expect to see on a local filesystem.

Ray



From swhiteho at redhat.com  Wed Dec  2 15:14:21 2009
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 02 Dec 2009 15:14:21 +0000
Subject: [Linux-cluster] GFS performance test
In-Reply-To: <20091202145842.GA16292@esri.com>
References: <4B16554A.50002@si.ct.upc.edu> <20091202145842.GA16292@esri.com>
Message-ID: <1259766861.6052.963.camel@localhost.localdomain>

Hi,

On Wed, 2009-12-02 at 06:58 -0800, Ray Van Dolson wrote:
> On Wed, Dec 02, 2009 at 03:53:46AM -0800, frank wrote:
> > Hi,
> > after seeing some posts related to GFS performance, we have decided to 
> > test our two-node GFS filesystem with ping_pong program.
> > We are worried about the results.
> > 
> > Running the program in only one node, without parameters, we get between 
> > 800000 locks/sec and 900000 locks/sec
> > Running the program in both nodes over the same file on the shared 
> > filesystem, the lock rate did not drop and it is the same in both nodes! 
> > What does this mean? Is there any problem with locks ?
> > 
> > Just for you info, GFS filesystem is /mnt/gfs and what I run in both 
> > nodes is:
> > 
> > ./ping_pong /mnt/gfs/tmp/test.dat 3
> > 
> > Thanks for your help.
> > 
> 
> Wow, that doesn't sound right at all (or at least not consistent with
> results I've gotten :)
> 
> Can you provide details of your setup, and perhaps your cluster.conf
> file?  Have you done any other GFS tuning?  Are we talking GFS1 or
> GFS2?
> 
> I get in the 3000-5000 locks/sec range with my GFS2 filesystem (using
> nodiratime,noatime and reducing the lock limit to 0 from 100 in my
> cluster.conf file).
> 
> The numbers you provide I'd expect to see on a local filesystem.
> 
> Ray
> 
If you are mounting with lock_nolock, then the locks are the same as for
any other local filesystem, so you'll see it works much faster than any
clustered arrangement. If the lock rate appears to be that high in the
cluster, maybe the localflocks mount parameter has been specified which
means that the locking will be done locally on each node, and is not
being done across the cluster. Ray's figures sound much more reasonable,

Steve.




From marcos.david at efacec.com  Wed Dec  2 16:09:12 2009
From: marcos.david at efacec.com (Marcos David)
Date: Wed, 02 Dec 2009 16:09:12 +0000
Subject: [Linux-cluster] Random clurgmrgd crashes
In-Reply-To: <4B1656EF.50301@efacec.com>
References: <1259710380.6571.14.camel@mecatol> <4B16544A.5060408@efacec.com>
	<4B1656EF.50301@efacec.com>
Message-ID: <4B169128.3040903@efacec.com>

Hi,

I'm experiencing random crashes on clurgmgrd on a 4 node RHEL5.3  cluster.
This is a big problem since it is happening on the production cluster....

The corefile backtrace gives:

Core was generated by `clurgmgrd -d'.
Program terminated with signal 6, Aborted.
[New process 2495]
#0  0x0068b402 in __kernel_vsyscall ()
(gdb) bt
#0  0x0068b402 in __kernel_vsyscall ()
#1  0x001da211 in select () from /lib/libc.so.6
#2  0x08051f6a in event_loop ()
#3  0x08052d10 in main ()
(gdb)
	

Can anyone help me out with this?

Thanks in advance.









From marcos.david at efacec.com  Wed Dec  2 16:14:31 2009
From: marcos.david at efacec.com (Marcos David)
Date: Wed, 02 Dec 2009 16:14:31 +0000
Subject: [Linux-cluster] Random clurgmrgd crashes
Message-ID: <4B169267.7080000@efacec.com>

(Previous message went into the wrong thread... sorry).

Hi,

I'm experiencing random crashes on clurgmgrd on a 4 node RHEL5.3  cluster.
This is a big problem since it is happening on the production cluster....

The corefile backtrace gives:

Core was generated by `clurgmgrd -d'.
Program terminated with signal 6, Aborted.
[New process 2495]
#0  0x0068b402 in __kernel_vsyscall ()
(gdb) bt
#0  0x0068b402 in __kernel_vsyscall ()
#1  0x001da211 in select () from /lib/libc.so.6
#2  0x08051f6a in event_loop ()
#3  0x08052d10 in main ()
(gdb)
	

Can anyone help me out with this?

Thanks in advance.




From dan at quah.ro  Wed Dec  2 16:25:12 2009
From: dan at quah.ro (Dan Candea)
Date: Wed, 2 Dec 2009 18:25:12 +0200
Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync
In-Reply-To: <1259758086.6052.959.camel@localhost.localdomain>
References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
	<200912021409.21001.dan@quah.ro>
	<1259758086.6052.959.camel@localhost.localdomain>
Message-ID: <200912021825.12575.dan@quah.ro>

On Wednesday 02 December 2009 14:48, Whitehouse Steven wrote:
--
Hi,

On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote:
> hello
> 
> randomly , during a nightly backup with rsync I receive the error below on a 
3 
> node setup with cluster2.  because of the withdraw I can't unmount without a 
> reboot.
> 
> does someone have a clue?
> 
> 
> GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed
> GFS2: fsid=data:FSdata.0:   function = gfs2_meta_inval, file = 
> fs/gfs2/meta_io.c, line = 110
> GFS2: fsid=data:FSdata.0: about to withdraw this file system
> GFS2: fsid=data:FSdata.0: telling LM to withdraw
> GFS2: fsid=data:FSdata.0: withdrawn
> Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1
I don't recognise this kernel version, which distro is it from?

its a kernel with grsecurity applied from gentoo


Can you reproduce this issue? I've heard of an issue involving rsync,
but having now tried various different rsync commands, I've not been
able to reproduce anything that fails.


I'll try to reproduce it after the reboot, which I have to do it by night, but 
I'm not sure I'll make something of it, cause the error is spontaneous, while 
the rsync is ran each day.

> Call Trace:
>  [<ffffffffa008e4ea>] 0xffffffffa008e4ea
>  [<ffffffff8025ecee>] 0xffffffff8025ecee
>  [<ffffffffa0091307>] 0xffffffffa0091307
>  [<ffffffffa008f640>] 0xffffffffa008f640
>  [<ffffffffa000fc18>] 0xffffffffa000fc18
>  [<ffffffffa000bfe8>] 0xffffffffa000bfe8
>  [<ffffffff8022605c>] 0xffffffff8022605c
>  [<ffffffffa008f060>] 0xffffffffa008f060
>  [<ffffffffa008e5cb>] 0xffffffffa008e5cb
>  [<ffffffffa00912f3>] 0xffffffffa00912f3
>  [<ffffffffa0077a9b>] 0xffffffffa0077a9b
>  [<ffffffffa0076a03>] 0xffffffffa0076a03
>  [<ffffffffa00771f7>] 0xffffffffa00771f7
>  [<ffffffff8023b43e>] 0xffffffff8023b43e
>  [<ffffffff8023b571>] 0xffffffff8023b571
>  [<ffffffff8023eee5>] 0xffffffff8023eee5
>  [<ffffffff8023eee5>] 0xffffffff8023eee5
>  [<ffffffff8023b4d8>] 0xffffffff8023b4d8
>  [<ffffffff8023e794>] 0xffffffff8023e794
>  [<ffffffff802035e9>] 0xffffffff802035e9
>  [<ffffffff8023e72b>] 0xffffffff8023e72b
>  [<ffffffff802035df>] 0xffffffff802035df
> 
This set of numbers is pretty useless without being translated into
symbols. On the other hand the assertion which you've hit is GFS2
complaining that its requested that the pages relating to an inode to be
invalidated, but there are some that have not been removed after that
invalidation. So in this particular case it doesn't matter,



Here are you saying that it could be an inconsistency in the FS?

Steve.

> 
> regards

thank you

-- 
Dan C?ndea
Does God Play Dice?



From swhiteho at redhat.com  Wed Dec  2 16:46:08 2009
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 02 Dec 2009 16:46:08 +0000
Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync
In-Reply-To: <200912021825.12575.dan@quah.ro>
References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
	<200912021409.21001.dan@quah.ro>
	<1259758086.6052.959.camel@localhost.localdomain>
	<200912021825.12575.dan@quah.ro>
Message-ID: <1259772368.6052.968.camel@localhost.localdomain>

Hi,

On Wed, 2009-12-02 at 18:25 +0200, Dan Candea wrote:
> On Wednesday 02 December 2009 14:48, Whitehouse Steven wrote:
> --
> Hi,
> 
> On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote:
> > hello
> > 
> > randomly , during a nightly backup with rsync I receive the error below on a 
> 3 
> > node setup with cluster2.  because of the withdraw I can't unmount without a 
> > reboot.
> > 
> > does someone have a clue?
> > 
> > 
> > GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed
> > GFS2: fsid=data:FSdata.0:   function = gfs2_meta_inval, file = 
> > fs/gfs2/meta_io.c, line = 110
> > GFS2: fsid=data:FSdata.0: about to withdraw this file system
> > GFS2: fsid=data:FSdata.0: telling LM to withdraw
> > GFS2: fsid=data:FSdata.0: withdrawn
> > Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1
> I don't recognise this kernel version, which distro is it from?
> 
> its a kernel with grsecurity applied from gentoo
> 
> 
> Can you reproduce this issue? I've heard of an issue involving rsync,
> but having now tried various different rsync commands, I've not been
> able to reproduce anything that fails.
> 
> 
> I'll try to reproduce it after the reboot, which I have to do it by night, but 
> I'm not sure I'll make something of it, cause the error is spontaneous, while 
> the rsync is ran each day.
> 
Ok. I suspect though that whatever the issue, it has probably been fixed
in more recent kernels, .28 is pretty old now so I'd suggest upgrading
your kernel as one possible solution. I'd be surprised if that doesn't
fix your issue.


[various number removed for brevity]
> >  [<ffffffff802035df>] 0xffffffff802035df
> > 
> This set of numbers is pretty useless without being translated into
> symbols. On the other hand the assertion which you've hit is GFS2
> complaining that its requested that the pages relating to an inode to be
> invalidated, but there are some that have not been removed after that
> invalidation. So in this particular case it doesn't matter,
> 
> 
> 
> Here are you saying that it could be an inconsistency in the FS?
> 
No, its more likely to be an issue in the code. It doesn't look like the
fs is damaged at all, in fact that bug trap is there to prevent damage
to the fs in this particular case,

Steve.




From dan at quah.ro  Wed Dec  2 16:44:54 2009
From: dan at quah.ro (Dan Candea)
Date: Wed, 2 Dec 2009 18:44:54 +0200
Subject: [Linux-cluster] gfs2 assertion "!mapping->nrpages" failed on rsync
In-Reply-To: <1259772368.6052.968.camel@localhost.localdomain>
References: <3170ac020911300654g33fbd14fpa6361b358ba7cbb2@mail.gmail.com>
	<200912021825.12575.dan@quah.ro>
	<1259772368.6052.968.camel@localhost.localdomain>
Message-ID: <200912021844.54205.dan@quah.ro>

On Wednesday 02 December 2009 18:46, Whitehouse Steven wrote:
--
Hi,

On Wed, 2009-12-02 at 18:25 +0200, Dan Candea wrote:
> On Wednesday 02 December 2009 14:48, Whitehouse Steven wrote:
> --
> Hi,
> 
> On Wed, 2009-12-02 at 14:09 +0200, Dan Candea wrote:
> > hello
> > 
> > randomly , during a nightly backup with rsync I receive the error below on 
a 
> 3 
> > node setup with cluster2.  because of the withdraw I can't unmount without 
a 
> > reboot.
> > 
> > does someone have a clue?
> > 
> > 
> > GFS2: fsid=data:FSdata.0: fatal: assertion "!mapping->nrpages" failed
> > GFS2: fsid=data:FSdata.0:   function = gfs2_meta_inval, file = 
> > fs/gfs2/meta_io.c, line = 110
> > GFS2: fsid=data:FSdata.0: about to withdraw this file system
> > GFS2: fsid=data:FSdata.0: telling LM to withdraw
> > GFS2: fsid=data:FSdata.0: withdrawn
> > Pid: 4643, comm: glock_workqueue Not tainted 2.6.28-hardened-r9 #1
> I don't recognise this kernel version, which distro is it from?
> 
> its a kernel with grsecurity applied from gentoo
> 
> 
> Can you reproduce this issue? I've heard of an issue involving rsync,
> but having now tried various different rsync commands, I've not been
> able to reproduce anything that fails.
> 
> 
> I'll try to reproduce it after the reboot, which I have to do it by night, 
but 
> I'm not sure I'll make something of it, cause the error is spontaneous, 
while 
> the rsync is ran each day.
> 
Ok. I suspect though that whatever the issue, it has probably been fixed
in more recent kernels, .28 is pretty old now so I'd suggest upgrading
your kernel as one possible solution. I'd be surprised if that doesn't
fix your issue.


ok, thank you.  I'll try a kernel upgrade.


[various number removed for brevity]
> >  [<ffffffff802035df>] 0xffffffff802035df
> > 
> This set of numbers is pretty useless without being translated into
> symbols. On the other hand the assertion which you've hit is GFS2
> complaining that its requested that the pages relating to an inode to be
> invalidated, but there are some that have not been removed after that
> invalidation. So in this particular case it doesn't matter,
> 
> 
> 
> Here are you saying that it could be an inconsistency in the FS?
> 
No, its more likely to be an issue in the code. It doesn't look like the
fs is damaged at all, in fact that bug trap is there to prevent damage
to the fs in this particular case,

Steve.



-- 
Dan C?ndea
Does God Play Dice?



From rmicmirregs at gmail.com  Wed Dec  2 16:52:35 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Wed, 02 Dec 2009 17:52:35 +0100
Subject: [Linux-cluster] Qdisk with multiple heuristics?
In-Reply-To: <29ae894c0912020249q198d8410sc9007a6398757ea4@mail.gmail.com>
References: <1259710380.6571.14.camel@mecatol>
	<29ae894c0912020249q198d8410sc9007a6398757ea4@mail.gmail.com>
Message-ID: <1259772755.6568.5.camel@mecatol>

Hi Brem,

Thanks for you answer. 

The problem was the qdiskd service not being started by CMAN. In my
previous configuration, it was started by the CMAN startup script
(located in init.d, it would start qdiskd if necessary) but this time
the qdiskd service was configured to not start in system start-up (with
chkconfig) so CMAN did not start it either. A strange behaviour/design
decision, in my opinion.

Now everything is solved and the multiple heuristic is working (I see
the 2 ping processes working). I only need to check the "score"
configuration to see if it is working properly. I plan to do it
tomorrow.

Cheers,

Rafael



El mi?, 02-12-2009 a las 11:49 +0100, brem belguebli escribi?:
> Hi Rafael,
> 
> Concerning your second point, have you initialized your
> /dev/mpath/quorum device with mkqdisk ?
> 
> Also, the qdisk daemon must be running if you want it to be
> operationnal in your cluster.
> 
> In my setup, everything is started manually, no automatic boot time
> cluster start (safest option IMHO), and I use the following stepping:
> 
> 1) start qdisk (service qdiskd start)
> 2) start cman (service cman start)
> 3) start rgmanager (service rgmanager start)
> 4) wait untill the cluster is quorate (a shell loop) before starting clvmd
> 5) start clvmd
> 
>  Output of clustat:
> 
> 
> Cluster Status for rhcl1 @ Wed Dec  2 11:20:20 2009
> Member Status: Quorate
> 
>  Member Name                                           ID   Status
>  ------ ----                                                     ---- ------
>  node1.mydom                                        1 Online, Local, rgmanager
>  node2.mydom                                        2 Online, rgmanager
>  node3.mydom                                        3 Online, rgmanager
>  /dev/iscsi/storage.quorum                        0 Online, Quorum
> Disk                       <-- Qorum disk started...
> 
>  Service Name                                            Owner (Last)
>                                           State
> ....
> 
> 
> [root at node1 ~]# ps -edf | grep qdisk
> root      4409     1  0 Nov26 ?        00:04:00 qdiskd -Q
> 
> 
> Concerning your point 1, you may address this by giving a different
> score to each heuristic, but I clearly don't know if this is what it
> intends to.
> 
> Brem
> Regards
> 
> 
> 2009/12/2 Rafael Mic? Miranda <rmicmirregs at gmail.com>:
> > Hi all,
> >
> > As it can be found in qdiskd man page, it is allowed to use up to 10
> > different heuristics in one cluster.
> >
> > How is this specified into cluster.conf? I'm trying to make it work with
> > the following piece of cluster.conf file:
> >
> > <quorumd device="/dev/mpath/quorum" interval="6" min_score="1" tko="10"
> > votes="3">
> >        <heuristic interval="3" program="ping -c3 -t2 10.240.199.1" score="1"/>
> >        <heuristic interval="3" program="ping -c3 -t2 10.240.199.47"
> > score="1"/>
> > </quorumd>
> >
> > My objective is to have 2 (or more) different heuristics which keep this
> > node alive even if only one heuristic is OK. The cluster.conf file was
> > created with system-config-cluster and later was edited by hand.
> >
> > The qdisk and heuristics are not working:
> > 1.- system-config-cluster shows me a warning about an error related to
> > some options not allowed into quorumd. I'm sorry i cannot be more
> > specific right now, I could attach the exact message tomorrow.
> >
> > 2.- The cluster is operational, but using "clustat" i don't see the
> > qdisk with its votes in the node list. The qdisk process is neither
> > shown in the process list on the system.
> >
> > Is there somethin wrong?
> >
> > I'm using RHEL5.3 with:
> > cman-2.0.98-1.el5.x86_64
> > openais-0.80.3-22.el5.x86_64
> > rgmanager-2.0.46-1.el5.x86_64
> >
> >
> > Thanks in advance. Cheers,
> >
> > Rafael
> >
> >
> > --
> > Rafael Mic? Miranda
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-- 
Rafael Mic? Miranda



From rvandolson at esri.com  Thu Dec  3 20:42:57 2009
From: rvandolson at esri.com (Ray Van Dolson)
Date: Thu, 3 Dec 2009 12:42:57 -0800
Subject: [Linux-cluster] GFS2 and backups (performance tuning)
Message-ID: <20091203204257.GA15314@esri.com>

We have a two node cluster primarily acting as an NFS serving
environment.  Our backup infrastructure here uses NetBackup and,
unfortunately, NetBackup has no PPC client (we're running on IBM JS20
blades) so we're approaching the backup strategy in two different ways:

  - Run netbackup client from another machine and point it to NFS share
    on one of our two cluster nodes
  - Run rsyncd on our cluster nodes and rsync from a remote machine.
    NetBackup then backs up that machine.

The GFS2 filesystem in our cluster only is storing about 90GB of data,
but has about one million files (inodes used reported via df -i) on it.

(For the curious, this is a home directory server and we do break
thinsg up under a top level hierarchy of a folder for each first letter
of a username).

The NetBackup over NFS route is extremely slow and spikes the load up
on whichever server is being backed up from.  We made the following
adjustments to try and improve performance:

  - Set the following in our cluster.conf file:

    <dlm plock_ownership="1" plock_rate_limit="0"/>
    <gfs_controld plock_rate_limit="0"/>

    ping_pong will give me about 3-5k locks/sec now.
  
  - Mounted filesystem with noatime,nodiratime,quota=off

This seems to have helped a bit, but things are still taking a long
time.  I should note here that I tried running ping_pong to one of our
cluster nodes via one of its NFS exports of the GFS2 filesystem.  While
I can get 3000-5000 locks/sec locally, over NFS it was about... 2 or 3
(not thousand, literally 2 or 3).  tcpdump of the NLM port shows the
NFS lock manager on the node responding NLM_BLOCK most of the time.
I'm not sure if GFS2 or our NFS daemon is to blame... in any case...

.. I've set up rsyncd on the cluster nodes and am sync'ing from a
remote server now (all of this via Gigabit ethernet).  I'm over an hour
in and the client is still generatin the file list.  strace confirms
that rsync --daemon is still trolling through, generating a list of
files on the filesystem...

I've done a blktrace dump on my GFS2 filesystem's block device and can
clearly see glock_workqueue showing up the most by far.  However, I
don't know what else I can glean from these results.

Anyone have any tips or suggestions on improving either our NFS locking
or rsync --daemon performance beyond what I've already tried?  It might
almost be quicker for us to do a full backup each time than to spend
hours building file lists for differential backups :)

Details of our setup:

  - IBM DS4300 Storage (12 drive RAID5 + 2 spares)
    - Exposed as two LUNs (one per controller)
    - Don't believe this array does hardware snapshots :(
  - Two (2) IBM JS20 Blades (PPC)
    - QLogic ISP2312 2Gb HBA's
    - RHEL 5.4 Advanced Platform PPC
    - multipathd
    - clvm aggregates two LUNs
    - GFS2 on top of clvm
      - Configured with quotas originally, but disabled later by
        mounting quota=off
      - Mounted with noatime,nodiratime,quota=off

  # gfs2_tool gettune /domus1
  new_files_directio = 0
  new_files_jdata = 0
  quota_scale = 1.0000   (1, 1)
  logd_secs = 1
  recoverd_secs = 60
  statfs_quantum = 30
  stall_secs = 600
  quota_cache_secs = 300
  quota_simul_sync = 64
  statfs_slow = 0
  complain_secs = 10
  max_readahead = 262144
  quota_quantum = 60
  quota_warn_period = 10
  jindex_refresh_secs = 60
  log_flush_secs = 60
  incore_log_blocks = 1024

  # gfs2_tool getargs /domus1
  data 2
  suiddir 0
  quota 0
  posix_acl 1
  upgrade 0
  debug 0
  localflocks 0
  localcaching 0
  ignore_local_fs 0
  spectator 0
  hostdata jid=1:id=196610:first=0
  locktable 
  lockproto 

Thanks in advance for any advice.

Ray



From allen at isye.gatech.edu  Thu Dec  3 22:30:29 2009
From: allen at isye.gatech.edu (Allen Belletti)
Date: Thu, 03 Dec 2009 17:30:29 -0500
Subject: [Linux-cluster] GFS2: processes stuck in "just schedule"
In-Reply-To: <20091203204257.GA15314@esri.com>
References: <20091203204257.GA15314@esri.com>
Message-ID: <4B183C05.1060101@isye.gatech.edu>

Hi All,

After Steve and the RedHat guys dug into my nasty crashdump (thanks 
all!), I believe I'm down to the last GFS2 problem on our mail cluster, 
but it's a common one.

I've always had trouble with processes getting stuck on GFS2 access and 
queuing up.  Since the 5.4 upgrade and moving the proper GFS2 kernel 
module, it's changed but not gone away.  Ever few days now, I'm seeing 
processes getting stuck with WCHAN=just_schedule.  Once this starts 
happening, both cluster nodes will accumulate them rapidly which 
eventually brings IO to a halt.  The only way I've found to escape is 
via a reboot, sometimes of one, sometimes of both nodes.

Since there's no crash, I don't get any useful debug information.  
Outside of this one repeating glitch, performance is great and all is 
well.  If anyone can suggest ways of gathering more data about the 
problem, or possible solutions, I would be grateful.

Thanks,
Allen




From no-reply at dropbox.com  Fri Dec  4 01:41:14 2009
From: no-reply at dropbox.com (Dropbox)
Date: Fri, 04 Dec 2009 01:41:14 +0000
Subject: [Linux-cluster] Jorge Palma has invited you to Dropbox
Message-ID: <20091204014114.C4E7E46180B@mailman.dropbox.com>

We're excited to let you know that Jorge Palma has invited you to Dropbox!

Jorge Palma has been using Dropbox to sync and share files online and across computers, and thought you might want it too.

Visit http://www.dropbox.com/link/20.yzjZ2HAsSs/NjYwMDc0ODg3 to get started.

- The Dropbox Team

____________________________________________________ 
To stop receiving invites from Dropbox, please go to http://www.dropbox.com/bl/180e8afc7eea/linux-cluster%40redhat.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091204/324a2d1a/attachment.htm>

From baishuwei at gmail.com  Fri Dec  4 03:06:05 2009
From: baishuwei at gmail.com (Bai Shuwei)
Date: Fri, 4 Dec 2009 11:06:05 +0800
Subject: [Linux-cluster] LUN/LUN Masking
Message-ID: <f3566d60912031906v78f40992u349c16245149e349@mail.gmail.com>

HI, everyone:
   I am a begginer on FC-SAN. On my machine i have installed
HBAs(QLogic2342) and SLCI tools. How I can setup the lun masking to
forbidden/allow hosts to access special LUN/Disk? Do I need some other
speccial tools to do it? Thanks all.

Best Regards

Bai SHuwei

-- 
Love other people, as same as love yourself!
Don't think all the time, do it by your hands!

Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/
E-Mail: baishuwei at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091204/d2fd2204/attachment.htm>

From fajar at fajar.net  Fri Dec  4 04:26:31 2009
From: fajar at fajar.net (Fajar A. Nugraha)
Date: Fri, 4 Dec 2009 11:26:31 +0700
Subject: [Linux-cluster] LUN/LUN Masking
In-Reply-To: <f3566d60912031906v78f40992u349c16245149e349@mail.gmail.com>
References: <f3566d60912031906v78f40992u349c16245149e349@mail.gmail.com>
Message-ID: <7207d96f0912032026n7ef04c7ahc86b24ca482e3326@mail.gmail.com>

On Fri, Dec 4, 2009 at 10:06 AM, Bai Shuwei <baishuwei at gmail.com> wrote:
> HI, everyone:
> ?? I am a begginer on FC-SAN. On my machine i have installed
> HBAs(QLogic2342) and SLCI tools. How I can setup the lun masking to
> forbidden/allow hosts to access special LUN/Disk? Do I need some other
> speccial tools to do it? Thanks all.

AFAIK LUN masking is done on storage side, not client side.

-- 
Fajar



From cthulhucalling at gmail.com  Fri Dec  4 05:43:26 2009
From: cthulhucalling at gmail.com (Ian Hayes)
Date: Thu, 3 Dec 2009 21:43:26 -0800
Subject: [Linux-cluster] LUN/LUN Masking
In-Reply-To: <f3566d60912031906v78f40992u349c16245149e349@mail.gmail.com>
References: <f3566d60912031906v78f40992u349c16245149e349@mail.gmail.com>
Message-ID: <36df569a0912032143u6bbf9e0fh5f01496738b51e33@mail.gmail.com>

It depends on who your san vendor is, but its done on the storage side
usually through the management console. all the ones I've used filter by the
wwn of the host bus adapters. You may also want to consider zoning your
hba's at the switch level.

On Dec 3, 2009 7:06 PM, "Bai Shuwei" <baishuwei at gmail.com> wrote:

HI, everyone:
   I am a begginer on FC-SAN. On my machine i have installed
HBAs(QLogic2342) and SLCI tools. How I can setup the lun masking to
forbidden/allow hosts to access special LUN/Disk? Do I need some other
speccial tools to do it? Thanks all.

Best Regards

Bai SHuwei

-- 
Love other people, as same as love yourself!
Don't think all the time, do it by your hands!

Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/
E-Mail: baishuwei at gmail.com

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091203/04e0bd2d/attachment.htm>

From swhiteho at redhat.com  Fri Dec  4 09:39:01 2009
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Fri, 04 Dec 2009 09:39:01 +0000
Subject: [Linux-cluster] GFS2: processes stuck in "just schedule"
In-Reply-To: <4B183C05.1060101@isye.gatech.edu>
References: <20091203204257.GA15314@esri.com>
	<4B183C05.1060101@isye.gatech.edu>
Message-ID: <1259919541.2489.8.camel@localhost>

Hi,

On Thu, 2009-12-03 at 17:30 -0500, Allen Belletti wrote:
> Hi All,
> 
> After Steve and the RedHat guys dug into my nasty crashdump (thanks 
> all!), I believe I'm down to the last GFS2 problem on our mail cluster, 
> but it's a common one.
> 
> I've always had trouble with processes getting stuck on GFS2 access and 
> queuing up.  Since the 5.4 upgrade and moving the proper GFS2 kernel 
> module, it's changed but not gone away.  Ever few days now, I'm seeing 
> processes getting stuck with WCHAN=just_schedule.  Once this starts 
> happening, both cluster nodes will accumulate them rapidly which 
> eventually brings IO to a halt.  The only way I've found to escape is 
> via a reboot, sometimes of one, sometimes of both nodes.
> 
> Since there's no crash, I don't get any useful debug information.  
> Outside of this one repeating glitch, performance is great and all is 
> well.  If anyone can suggest ways of gathering more data about the 
> problem, or possible solutions, I would be grateful.
> 
> Thanks,
> Allen
> 
> 
This would be typical for what happens when there is contention on a
glock between two (or more) nodes. There is a mechanism which is
supposed to try and mitigate the issue (by allowing each node to hold on
to a glock for a minimum period of time which is designed to ensure that
some work is done each time a node acquires a glock) but if your storage
is particularly slow, and/or possibly depending upon the exact I/O
pattern, it may not always be 100% effective.

In the first instance though, see if you can find an inode which is
being contended from both nodes as that will most likely be the culprit,

Steve.




From swhiteho at redhat.com  Fri Dec  4 09:44:30 2009
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Fri, 04 Dec 2009 09:44:30 +0000
Subject: [Linux-cluster] GFS2 and backups (performance tuning)
In-Reply-To: <20091203204257.GA15314@esri.com>
References: <20091203204257.GA15314@esri.com>
Message-ID: <1259919870.2489.10.camel@localhost>

Hi,

I'd suggest filing a bug in the first instance. I can't see anything
obviously wrong with what you are doing. The fcntl() locks go via the
dlm and dlm_controld not via the glock_workqueues, so I don't think that
is likely to be the issue,

Steve.

On Thu, 2009-12-03 at 12:42 -0800, Ray Van Dolson wrote:
> We have a two node cluster primarily acting as an NFS serving
> environment.  Our backup infrastructure here uses NetBackup and,
> unfortunately, NetBackup has no PPC client (we're running on IBM JS20
> blades) so we're approaching the backup strategy in two different ways:
> 
>   - Run netbackup client from another machine and point it to NFS share
>     on one of our two cluster nodes
>   - Run rsyncd on our cluster nodes and rsync from a remote machine.
>     NetBackup then backs up that machine.
> 
> The GFS2 filesystem in our cluster only is storing about 90GB of data,
> but has about one million files (inodes used reported via df -i) on it.
> 
> (For the curious, this is a home directory server and we do break
> thinsg up under a top level hierarchy of a folder for each first letter
> of a username).
> 
> The NetBackup over NFS route is extremely slow and spikes the load up
> on whichever server is being backed up from.  We made the following
> adjustments to try and improve performance:
> 
>   - Set the following in our cluster.conf file:
> 
>     <dlm plock_ownership="1" plock_rate_limit="0"/>
>     <gfs_controld plock_rate_limit="0"/>
> 
>     ping_pong will give me about 3-5k locks/sec now.
>   
>   - Mounted filesystem with noatime,nodiratime,quota=off
> 
> This seems to have helped a bit, but things are still taking a long
> time.  I should note here that I tried running ping_pong to one of our
> cluster nodes via one of its NFS exports of the GFS2 filesystem.  While
> I can get 3000-5000 locks/sec locally, over NFS it was about... 2 or 3
> (not thousand, literally 2 or 3).  tcpdump of the NLM port shows the
> NFS lock manager on the node responding NLM_BLOCK most of the time.
> I'm not sure if GFS2 or our NFS daemon is to blame... in any case...
> 
> .. I've set up rsyncd on the cluster nodes and am sync'ing from a
> remote server now (all of this via Gigabit ethernet).  I'm over an hour
> in and the client is still generatin the file list.  strace confirms
> that rsync --daemon is still trolling through, generating a list of
> files on the filesystem...
> 
> I've done a blktrace dump on my GFS2 filesystem's block device and can
> clearly see glock_workqueue showing up the most by far.  However, I
> don't know what else I can glean from these results.
> 
> Anyone have any tips or suggestions on improving either our NFS locking
> or rsync --daemon performance beyond what I've already tried?  It might
> almost be quicker for us to do a full backup each time than to spend
> hours building file lists for differential backups :)
> 
> Details of our setup:
> 
>   - IBM DS4300 Storage (12 drive RAID5 + 2 spares)
>     - Exposed as two LUNs (one per controller)
>     - Don't believe this array does hardware snapshots :(
>   - Two (2) IBM JS20 Blades (PPC)
>     - QLogic ISP2312 2Gb HBA's
>     - RHEL 5.4 Advanced Platform PPC
>     - multipathd
>     - clvm aggregates two LUNs
>     - GFS2 on top of clvm
>       - Configured with quotas originally, but disabled later by
>         mounting quota=off
>       - Mounted with noatime,nodiratime,quota=off
> 
>   # gfs2_tool gettune /domus1
>   new_files_directio = 0
>   new_files_jdata = 0
>   quota_scale = 1.0000   (1, 1)
>   logd_secs = 1
>   recoverd_secs = 60
>   statfs_quantum = 30
>   stall_secs = 600
>   quota_cache_secs = 300
>   quota_simul_sync = 64
>   statfs_slow = 0
>   complain_secs = 10
>   max_readahead = 262144
>   quota_quantum = 60
>   quota_warn_period = 10
>   jindex_refresh_secs = 60
>   log_flush_secs = 60
>   incore_log_blocks = 1024
> 
>   # gfs2_tool getargs /domus1
>   data 2
>   suiddir 0
>   quota 0
>   posix_acl 1
>   upgrade 0
>   debug 0
>   localflocks 0
>   localcaching 0
>   ignore_local_fs 0
>   spectator 0
>   hostdata jid=1:id=196610:first=0
>   locktable 
>   lockproto 
> 
> Thanks in advance for any advice.
> 
> Ray
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster




From frank at si.ct.upc.edu  Fri Dec  4 12:06:52 2009
From: frank at si.ct.upc.edu (frank)
Date: Fri, 04 Dec 2009 13:06:52 +0100
Subject: [Linux-cluster] GFS performance test
In-Reply-To: <20091202163200.DCB0A8E14CA@hormel.redhat.com>
References: <20091202163200.DCB0A8E14CA@hormel.redhat.com>
Message-ID: <4B18FB5C.3090500@si.ct.upc.edu>

Hi Ray,
thank for your answer.
We are using GFS1 on a Red Hat 5.4 cluster. GFS filesystem is mounted on 
/mnt/gfs, and when we created such filesystem we used parameter "-p 
lock_dlm". Anyway, look at this output :

[root at parmenides ~]# gfs_tool getsb /mnt/gfs
  .........................
   no_addr = 26
   sb_lockproto = lock_dlm
   sb_locktable = hr-pm:gfs01
   no_formal_ino = 24
   no_addr = 24
   ...............

For you information my cluster.conf file is:

-------------------------------------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster config_version="4" name="hr-pm">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="zipi" nodeid="1" votes="1">
<fence>
<method name="1">
<device modulename="" name="DRAC_heraclito"/>
</method>
</fence>
</clusternode>
<clusternode name="zape" nodeid="2" votes="1">
<fence>
<method name="1">
<device modulename="" name="DRAC_parmenides"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_drac" ipaddr="10.0.0.207" login="root" 
name="DRAC_heraclito" passwd="*****"/>
<fencedevice agent="fence_drac" ipaddr="10.0.0.208" login="root" 
name="DRAC_parmenides" passwd="******"/>
<fencedevice agent="fence_ipmilan" auth="md5" ipaddr="10.0.0.207" 
login="root" name="IPMILan_heraclito" passwd="*"/>
<fencedevice agent="fence_ipmilan" auth="md5" ipaddr="10.0.0.208" 
login="root" name="IPMILan_parmenides" passwd="*"/>
</fencedevices>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>
-------------------------------------------------------------------------------------------------------------
Shared disk is a LUN on a fibber channel SAN.
The most surprising thing is that we have another similar cluster, and 
there we get "98 locks/sec" always, starting the ping_pong in one or in 
both nodes. Buf! I'm lost! What is happening?

Frank


> Date: Wed, 2 Dec 2009 06:58:43 -0800 From: Ray Van Dolson 
> <rvandolson at esri.com> Subject: Re: [Linux-cluster] GFS performance 
> test To: linux-cluster at redhat.com Message-ID: 
> <20091202145842.GA16292 at esri.com> Content-Type: text/plain; 
> charset=us-ascii On Wed, Dec 02, 2009 at 03:53:46AM -0800, frank wrote:
>> >  Hi,
>> >  after seeing some posts related to GFS performance, we have decided to
>> >  test our two-node GFS filesystem with ping_pong program.
>> >  We are worried about the results.
>> >  
>> >  Running the program in only one node, without parameters, we get between
>> >  800000 locks/sec and 900000 locks/sec
>> >  Running the program in both nodes over the same file on the shared
>> >  filesystem, the lock rate did not drop and it is the same in both nodes!
>> >  What does this mean? Is there any problem with locks ?
>> >  
>> >  Just for you info, GFS filesystem is /mnt/gfs and what I run in both
>> >  nodes is:
>> >  
>> >  ./ping_pong /mnt/gfs/tmp/test.dat 3
>> >  
>> >  Thanks for your help.
>> >  
>>      
> Wow, that doesn't sound right at all (or at least not consistent with
> results I've gotten:)
>
> Can you provide details of your setup, and perhaps your cluster.conf
> file?  Have you done any other GFS tuning?  Are we talking GFS1 or
> GFS2?
>
> I get in the 3000-5000 locks/sec range with my GFS2 filesystem (using
> nodiratime,noatime and reducing the lock limit to 0 from 100 in my
> cluster.conf file).
>
> The numbers you provide I'd expect to see on a local filesystem.
>
> Ray
>    


-- 
Aquest missatge ha estat analitzat per MailScanner
a la cerca de virus i d'altres continguts perillosos,
i es considera que est? net.
For all your IT requirements visit: http://www.transtec.co.uk



From rvandolson at esri.com  Fri Dec  4 15:19:16 2009
From: rvandolson at esri.com (Ray Van Dolson)
Date: Fri, 4 Dec 2009 07:19:16 -0800
Subject: [Linux-cluster] GFS2 and backups (performance tuning)
In-Reply-To: <1259919870.2489.10.camel@localhost>
References: <20091203204257.GA15314@esri.com>
	<1259919870.2489.10.camel@localhost>
Message-ID: <20091204151916.GA899@esri.com>

On Fri, Dec 04, 2009 at 01:44:30AM -0800, Steven Whitehouse wrote:
> Hi,
> 
> I'd suggest filing a bug in the first instance. I can't see anything
> obviously wrong with what you are doing. The fcntl() locks go via the
> dlm and dlm_controld not via the glock_workqueues, so I don't think that
> is likely to be the issue,
> 
> Steve.

Thanks Steve.  I'll go the bug + SR route.

Ray



From allen at isye.gatech.edu  Fri Dec  4 19:26:39 2009
From: allen at isye.gatech.edu (Allen Belletti)
Date: Fri, 04 Dec 2009 14:26:39 -0500
Subject: [Linux-cluster] GFS2: processes stuck in "just schedule"
In-Reply-To: <1259919541.2489.8.camel@localhost>
References: <20091203204257.GA15314@esri.com>	<4B183C05.1060101@isye.gatech.edu>
	<1259919541.2489.8.camel@localhost>
Message-ID: <4B19626F.3060405@isye.gatech.edu>



On 12/04/2009 04:39 AM, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2009-12-03 at 17:30 -0500, Allen Belletti wrote:
>    
>> Hi All,
>>
>> After Steve and the RedHat guys dug into my nasty crashdump (thanks
>> all!), I believe I'm down to the last GFS2 problem on our mail cluster,
>> but it's a common one.
>>
>> I've always had trouble with processes getting stuck on GFS2 access and
>> queuing up.  Since the 5.4 upgrade and moving the proper GFS2 kernel
>> module, it's changed but not gone away.  Ever few days now, I'm seeing
>> processes getting stuck with WCHAN=just_schedule.  Once this starts
>> happening, both cluster nodes will accumulate them rapidly which
>> eventually brings IO to a halt.  The only way I've found to escape is
>> via a reboot, sometimes of one, sometimes of both nodes.
>>
>> Since there's no crash, I don't get any useful debug information.
>> Outside of this one repeating glitch, performance is great and all is
>> well.  If anyone can suggest ways of gathering more data about the
>> problem, or possible solutions, I would be grateful.
>>
>> Thanks,
>> Allen
>>
>>
>>      
> This would be typical for what happens when there is contention on a
> glock between two (or more) nodes. There is a mechanism which is
> supposed to try and mitigate the issue (by allowing each node to hold on
> to a glock for a minimum period of time which is designed to ensure that
> some work is done each time a node acquires a glock) but if your storage
> is particularly slow, and/or possibly depending upon the exact I/O
> pattern, it may not always be 100% effective.
>
> In the first instance though, see if you can find an inode which is
> being contended from both nodes as that will most likely be the culprit,
>    
We've got a 3-4 year old Sun 3510 FC array shared between the two 
nodes.  The utilization on it is generally quite reasonable, so I doubt 
that this would qualify as "particularly slow".  Also, the very busiest 
times for the mail system are usually during the night rsync backups and 
it rarely if ever gets wedged during those times.

Can you give me some hints as to how I might go about finding a inode 
that's being contended for by both nodes?  I assume that would be useful 
to confirm what the problem is at least.

Thanks,
Allen

-- 
Allen Belletti
allen at isye.gatech.edu                             404-894-6221 Phone
Industrial and Systems Engineering                404-385-2988 Fax
Georgia Institute of Technology



From gbmiglia at yahoo.it  Mon Dec  7 17:41:01 2009
From: gbmiglia at yahoo.it (gilberto migliavacca)
Date: Mon, 07 Dec 2009 18:41:01 +0100
Subject: [Linux-cluster] redhat cluster and resource agent
Message-ID: <4B1D3E2D.8040109@yahoo.it>

Hi

I'm a newbie in the red hat cluster configuration and
I don't know if this is the right mailing list for my
question.

I have to use my own resource agent script and I have
to say to the cluster that the related service must be
run just on single server.

I other words I want to drive 2 nodes with 4 instances
of the same application (2 instances per node).

the infostructure is somehting like:

node_1
/opt/myapp_11/bin/myapp.sh
/opt/myapp_12/bin/myapp.sh
node_2
/opt/myapp_21/bin/myapp.sh
/opt/myapp_22/bin/myapp.sh


My idea is to create 4 services in the /etc/cluster/cluster.conf
but I don't know how to related the service with a
given machine and a related path on the given machine


for my understanding I think I cannot use the Conga GUI (neither
the system-config-cluster) and I have to edit manually the
/etc/cluster/cluster.conf

could anyone help to write the XML section in the <rm> tag?

something like

<rm>
     <service autostart="0" exclusive="0" name="srv11">
             <myapp name="res11"
                   myapp_home="/opt/myapp_11"
                   shutdown_wait="0"/>
     </service>
     <service autostart="0" exclusive="0" name="srv12">
             <myapp name="res12"
                   myapp_home="/opt/myapp_12"
                   shutdown_wait="0"/>
     </service>
     <service autostart="0" exclusive="0" name="srv21">
             <myapp name="res21"
                   myapp_home="/opt/myapp_21"
                   shutdown_wait="0"/>
     </service>
     <service autostart="0" exclusive="0" name="srv22">
             <myapp name="res22"
                   myapp_home="/opt/myapp_22"
                   shutdown_wait="0"/>
     </service>
</rm>


As you can see I don't know how to specify the node

thanks in advance

gilberto



From rmicmirregs at gmail.com  Mon Dec  7 23:16:01 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Tue, 08 Dec 2009 00:16:01 +0100
Subject: [Linux-cluster] redhat cluster and resource agent
In-Reply-To: <4B1D3E2D.8040109@yahoo.it>
References: <4B1D3E2D.8040109@yahoo.it>
Message-ID: <1260227761.6606.9.camel@mecatol>

Hi Gilberto,

What you need to specify where to run each service is the Failover
Domain of each service.

Some info:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Cluster_Administration/s1-config-failover-domain-CA.html
http://sources.redhat.com/cluster/wiki/FailoverDomains

You have 4 different services, so I would use 2 or 4 different Failover
Domains to achieve your objective, depending on the availability of
running each of your services in your cluster nodes.

First you will need to define de Failover Domains:

Failover Domain X
Restricted domain: yes
Ordered: yes
Node A - Priority 1
Node B - Priority 2

And so on.

Then you'll need to set the Failover Domain for each of the services,
for example:
Service 1 -> FailoverDomain1
Service 2 -> FailoverDomain2
Service 3 -> FailoverDomain3
Service 4 -> FailoverDomain4

This can be all done with system-config-cluster, but using a resource
made by yourself into cluster.conf will give you some errors.

It should be similar to this:

<rm>
	<failoverdomains>
		<failoverdomain name="failoverdomain1" ordered="1"
restricted="1"><failoverdomainnode name="ulises"
priority="1"/><failoverdomainnode name="aquiles"
priority="2"/></failoverdomain>
		<failoverdomain name="failoverdomain2" ordered="1"
restricted="1"><failoverdomainnode name="caligula"
priority="1"/><failoverdomainnode name="ulises"
priority="2"/></failoverdomain>
	</failoverdomains>
	<resources>
[I think you need your definition of your myapp resources here]
	<myapp name="res11" myapp_home="/opt/myapp_11" shutdown_wait="0"/>
[...and so on]
	</resources>
[and then start the definition of your services]
	<service autostart="0" exclusive="0" name="srv11">
             <myapp ref="res11"/>
	</service>
[... and so on]
</rm>

Another question is: is your script usable by CMAN?

I hope this helps. Cheers,

Rafael

El lun, 07-12-2009 a las 18:41 +0100, gilberto migliavacca escribi?:
> Hi
> 
> I'm a newbie in the red hat cluster configuration and
> I don't know if this is the right mailing list for my
> question.
> 
> I have to use my own resource agent script and I have
> to say to the cluster that the related service must be
> run just on single server.
> 
> I other words I want to drive 2 nodes with 4 instances
> of the same application (2 instances per node).
> 
> the infostructure is somehting like:
> 
> node_1
> /opt/myapp_11/bin/myapp.sh
> /opt/myapp_12/bin/myapp.sh
> node_2
> /opt/myapp_21/bin/myapp.sh
> /opt/myapp_22/bin/myapp.sh
> 
> 
> My idea is to create 4 services in the /etc/cluster/cluster.conf
> but I don't know how to related the service with a
> given machine and a related path on the given machine
> 
> 
> for my understanding I think I cannot use the Conga GUI (neither
> the system-config-cluster) and I have to edit manually the
> /etc/cluster/cluster.conf
> 
> could anyone help to write the XML section in the <rm> tag?
> 
> something like
> 
> <rm>
>      <service autostart="0" exclusive="0" name="srv11">
>              <myapp name="res11"
>                    myapp_home="/opt/myapp_11"
>                    shutdown_wait="0"/>
>      </service>
>      <service autostart="0" exclusive="0" name="srv12">
>              <myapp name="res12"
>                    myapp_home="/opt/myapp_12"
>                    shutdown_wait="0"/>
>      </service>
>      <service autostart="0" exclusive="0" name="srv21">
>              <myapp name="res21"
>                    myapp_home="/opt/myapp_21"
>                    shutdown_wait="0"/>
>      </service>
>      <service autostart="0" exclusive="0" name="srv22">
>              <myapp name="res22"
>                    myapp_home="/opt/myapp_22"
>                    shutdown_wait="0"/>
>      </service>
> </rm>
> 
> 
> As you can see I don't know how to specify the node
> 
> thanks in advance
> 
> gilberto
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
-- 
Rafael Mic? Miranda



From fdinitto at redhat.com  Tue Dec  8 00:09:30 2009
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Tue, 08 Dec 2009 01:09:30 +0100
Subject: [Linux-cluster] Cluster 3.0.6 stable release
Message-ID: <4B1D993A.5010402@redhat.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The cluster team and its community are proud to announce the 3.0.6
stable release from the STABLE3 branch.

This release contains a few major bug fixes. We strongly recommend
people to update your clusters.

IMPORTANT NOTE:
- - fence_xvm has now been obsoleted. fence_xvmd is provided as backward
compatibility tool. The new replacement can be downloaded here:
http://fence-virt.sourceforge.net/ and it also includes a fence_xvm
compatibility mode.

In order to build the 3.0.6 release you will need:

- - corosync 1.1.2
- - openais 1.1.1
- - linux kernel 2.6.31

The new source tarball can be downloaded here:

ftp://sources.redhat.com/pub/cluster/releases/cluster-3.0.6.tar.gz
https://fedorahosted.org/releases/c/l/cluster/cluster-3.0.6.tar.gz

To report bugs or issues:

   https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

   Join us on IRC (irc.freenode.net #linux-cluster) and share your
   experience  with other sysadministrators or power users.

Thanks/congratulations to all people that contributed to achieve this
great milestone.

Happy clustering,
Fabio

Under the hood (from 3.0.5):

Abhijith Das (3):
      Revert "gfs2_convert: Fix rgrp conversion to allow re-converts"
      gfs2_convert: Fix rgrp conversion to allow re-converts
      gfs2_convert: Fix conversion of inodes with different heights on
gfs1 and gfs2

Bob Peterson (2):
      GFS2: fsck.gfs2 should fix the system statfs file
      GFS kernel panic, suid + nfsd with posix ACLs enabled

Christine Caulfield (3):
      cman: Look for group_tool in SBINDIR rather than PATH
      cman: Make consensus twice token timeout
      Revert "cman: Look for group_tool in SBINDIR rather than PATH"

David Teigland (3):
      group_tool: remove "groupd not running"
      dlm_controld: set rmem for sctp
      cman: remove set_networking_params

Fabio M. Di Nitto (7):
      rgmanager: make init script LSB compliant
      cman init: make init script LSB compliant
      rgmanager: init script should create lock file
      cman init: update help text
      rgmanager init: update help text
      rgmanage init: no need to re-init variables around
      fence_xvm: obsole in favour of fence_virt

Federico Simoncelli (1):
      resource-agents: Fix vm.sh return codes

Lon Hohberger (4):
      resource-agents: Add "path" support to virsh mode
      resource-agents: Fix some path support bugs in vm.sh
      resource-agents: Fix vm.sh migration failure handling
      config: Update Schemas for new fence_scsi

Marek 'marx' Grac (1):
      fence: RSB fence agents changed interface a bit

Ryan O'Hara (4):
      Remove fence_scsi_test.pl and update Makefile.
      New fence_scsi with config options.
      Change location of key file to /var/lib/cluster/fence_scsi.key
      Update fence_scsi man page.

 cman/daemon/cman-preconfig.c         |    8 +-
 cman/init.d/cman.in                  |   74 ++---
 config/plugins/ldap/99cluster.ldif   |   30 +-
 config/plugins/ldap/ldap-base.csv    |    5 +-
 config/tools/xml/cluster.rng.in      |   25 ++-
 fence/agents/rsb/fence_rsb.py        |   17 +-
 fence/agents/scsi/Makefile           |    4 +-
 fence/agents/scsi/fence_scsi.pl      |  712
++++++++++++++++++++++------------
 fence/agents/scsi/fence_scsi_test.pl |  236 -----------
 fence/agents/xvm/Makefile            |   51 +--
 fence/agents/xvm/fence_xvm.c         |  380 ------------------
 fence/agents/xvm/ip_lookup.c         |  307 ---------------
 fence/agents/xvm/ip_lookup.h         |   22 -
 fence/man/fence_scsi.8               |  148 ++++----
 gfs-kernel/src/gfs/eattr.c           |  107 +++---
 gfs-kernel/src/gfs/ops_file.c        |    5 +-
 gfs2/convert/gfs2_convert.c          |    3 +-
 gfs2/fsck/main.c                     |   70 ++++-
 gfs2/libgfs2/libgfs2.h               |    3 +-
 gfs2/libgfs2/structures.c            |   60 ++--
 gfs2/mkfs/main_mkfs.c                |    3 +-
 group/dlm_controld/action.c          |   88 +++++
 group/tool/main.c                    |    1 -
 rgmanager/init.d/rgmanager.in        |   14 +-
 rgmanager/src/resources/vm.sh        |  164 +++++++--
 25 files changed, 1044 insertions(+), 1493 deletions(-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksdmTgACgkQhCzbekR3nhiyRgCfdBL4GpYG48HZaULWaaP6EvrG
s+YAoJ2OLEKHjkHBAO+AkJs264y8kyUe
=vdD4
-----END PGP SIGNATURE-----



From avi at myphonebook.co.in  Wed Dec  9 07:03:59 2009
From: avi at myphonebook.co.in (avi at myphonebook.co.in)
Date: Wed, 09 Dec 2009 12:33:59 +0530 (IST)
Subject: [Linux-cluster] Cluster configuration enquiry
Message-ID: <1260342239.14007@myphonebook.co.in>

Hi

I am a newbie to clustering in Linux. Just wanted some advice. 

My requirements are as under: 

I am hosting several domains and dynamic/static websites. I need load balancing and redundancy.

Hardware : 3 systems ( one public IP address ).
 
 

outside world                    internal lan
---------------------> node A ------------------> node B   
public IP address        | 
                         | internal lan
                         |
                         v
                        node C


The cluster will use LVS-NAT and mysql clustering on gigabit ethernet.

node A: two interfaces with a public ip and an internal lan IP. It will host the mysql management node and LVS. 

node B and node C: apache + mysql storage nodes. connected to node A on internal IP.

LVS with persistence will make sure that user sessions are honored. Mysql cluster will make sure that the databases are up to date, on both nodes B and C. 

I do not plan to use GFS, because I do not want to invest in a SAN right now.

Any ideas or comments?

From pradhanparas at gmail.com  Wed Dec  9 18:54:10 2009
From: pradhanparas at gmail.com (Paras pradhan)
Date: Wed, 9 Dec 2009 12:54:10 -0600
Subject: [Linux-cluster] changing heartbeat interface
Message-ID: <8b711df40912091054m3a8c5d7ax42d7cd0143898fde@mail.gmail.com>

hi,

I believe its not recommend but just curious to know about the consequences
of changing  the heartbeat of the cluster to the 2nd interface of the
cluster nodes. In this case  if the network switch fails , then cluster will
still be quorate since they will be connected each other with the 2nd
interfaces of the nodes and will not be fenced.

Thanks
Paras.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091209/49bf0f69/attachment.htm>

From rvandolson at esri.com  Wed Dec  9 19:08:28 2009
From: rvandolson at esri.com (Ray Van Dolson)
Date: Wed, 9 Dec 2009 11:08:28 -0800
Subject: [Linux-cluster] Backup strategies for large-ish GFS2 filesystems.
Message-ID: <20091209190828.GA8880@esri.com>

How do those of you with large-ish GFS2 filesystems (and multiple
nodes) handle backups?  I'm specifically thinking of people running
mailspools and such with many files.

I'd be interested in hearing your space usage, inode usage and how long
it takes you to do a full and diff backup to see if the numbers we're
seeing are reasonable.

Thanks!
Ray



From johannes.russek at io-consulting.net  Thu Dec 10 11:03:48 2009
From: johannes.russek at io-consulting.net (jr)
Date: Thu, 10 Dec 2009 12:03:48 +0100
Subject: [Linux-cluster] Backup strategies for large-ish GFS2 filesystems.
In-Reply-To: <20091209190828.GA8880@esri.com>
References: <20091209190828.GA8880@esri.com>
Message-ID: <1260443028.15239.2.camel@dell-jr.intern.win-rar.com>

Hello Ray,
unfortunately we only have a very small gfs volume running, but how are
you doing backups? Are you doing snapshots and mounting them with
lockproto=lock_nolock?
regards,
Johannes

Am Mittwoch, den 09.12.2009, 11:08 -0800 schrieb Ray Van Dolson:
> How do those of you with large-ish GFS2 filesystems (and multiple
> nodes) handle backups?  I'm specifically thinking of people running
> mailspools and such with many files.
> 
> I'd be interested in hearing your space usage, inode usage and how long
> it takes you to do a full and diff backup to see if the numbers we're
> seeing are reasonable.
> 
> Thanks!
> Ray
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From gbmiglia at yahoo.it  Thu Dec 10 15:15:32 2009
From: gbmiglia at yahoo.it (gilberto migliavacca)
Date: Thu, 10 Dec 2009 16:15:32 +0100
Subject: [Linux-cluster] redhat cluster and resource agent
In-Reply-To: <1260227761.6606.9.camel@mecatol>
References: <4B1D3E2D.8040109@yahoo.it> <1260227761.6606.9.camel@mecatol>
Message-ID: <4B211094.5010501@yahoo.it>

Thanks for helping me.
now the configuration seems ok;
but I have another problem, I'll open a new thred

gilberto



Rafael Mic? Miranda wrote:
> Hi Gilberto,
> 
> What you need to specify where to run each service is the Failover
> Domain of each service.
> 
> Some info:
> 
> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Cluster_Administration/s1-config-failover-domain-CA.html
> http://sources.redhat.com/cluster/wiki/FailoverDomains
> 
> You have 4 different services, so I would use 2 or 4 different Failover
> Domains to achieve your objective, depending on the availability of
> running each of your services in your cluster nodes.
> 
> First you will need to define de Failover Domains:
> 
> Failover Domain X
> Restricted domain: yes
> Ordered: yes
> Node A - Priority 1
> Node B - Priority 2
> 
> And so on.
> 
> Then you'll need to set the Failover Domain for each of the services,
> for example:
> Service 1 -> FailoverDomain1
> Service 2 -> FailoverDomain2
> Service 3 -> FailoverDomain3
> Service 4 -> FailoverDomain4
> 
> This can be all done with system-config-cluster, but using a resource
> made by yourself into cluster.conf will give you some errors.
> 
> It should be similar to this:
> 
> <rm>
> 	<failoverdomains>
> 		<failoverdomain name="failoverdomain1" ordered="1"
> restricted="1"><failoverdomainnode name="ulises"
> priority="1"/><failoverdomainnode name="aquiles"
> priority="2"/></failoverdomain>
> 		<failoverdomain name="failoverdomain2" ordered="1"
> restricted="1"><failoverdomainnode name="caligula"
> priority="1"/><failoverdomainnode name="ulises"
> priority="2"/></failoverdomain>
> 	</failoverdomains>
> 	<resources>
> [I think you need your definition of your myapp resources here]
> 	<myapp name="res11" myapp_home="/opt/myapp_11" shutdown_wait="0"/>
> [...and so on]
> 	</resources>
> [and then start the definition of your services]
> 	<service autostart="0" exclusive="0" name="srv11">
>              <myapp ref="res11"/>
> 	</service>
> [... and so on]
> </rm>
> 
> Another question is: is your script usable by CMAN?
> 
> I hope this helps. Cheers,
> 
> Rafael
> 
> El lun, 07-12-2009 a las 18:41 +0100, gilberto migliavacca escribi?:
>> Hi
>>
>> I'm a newbie in the red hat cluster configuration and
>> I don't know if this is the right mailing list for my
>> question.
>>
>> I have to use my own resource agent script and I have
>> to say to the cluster that the related service must be
>> run just on single server.
>>
>> I other words I want to drive 2 nodes with 4 instances
>> of the same application (2 instances per node).
>>
>> the infostructure is somehting like:
>>
>> node_1
>> /opt/myapp_11/bin/myapp.sh
>> /opt/myapp_12/bin/myapp.sh
>> node_2
>> /opt/myapp_21/bin/myapp.sh
>> /opt/myapp_22/bin/myapp.sh
>>
>>
>> My idea is to create 4 services in the /etc/cluster/cluster.conf
>> but I don't know how to related the service with a
>> given machine and a related path on the given machine
>>
>>
>> for my understanding I think I cannot use the Conga GUI (neither
>> the system-config-cluster) and I have to edit manually the
>> /etc/cluster/cluster.conf
>>
>> could anyone help to write the XML section in the <rm> tag?
>>
>> something like
>>
>> <rm>
>>      <service autostart="0" exclusive="0" name="srv11">
>>              <myapp name="res11"
>>                    myapp_home="/opt/myapp_11"
>>                    shutdown_wait="0"/>
>>      </service>
>>      <service autostart="0" exclusive="0" name="srv12">
>>              <myapp name="res12"
>>                    myapp_home="/opt/myapp_12"
>>                    shutdown_wait="0"/>
>>      </service>
>>      <service autostart="0" exclusive="0" name="srv21">
>>              <myapp name="res21"
>>                    myapp_home="/opt/myapp_21"
>>                    shutdown_wait="0"/>
>>      </service>
>>      <service autostart="0" exclusive="0" name="srv22">
>>              <myapp name="res22"
>>                    myapp_home="/opt/myapp_22"
>>                    shutdown_wait="0"/>
>>      </service>
>> </rm>
>>
>>
>> As you can see I don't know how to specify the node
>>
>> thanks in advance
>>
>> gilberto
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster



From gbmiglia at yahoo.it  Thu Dec 10 15:26:30 2009
From: gbmiglia at yahoo.it (gilberto migliavacca)
Date: Thu, 10 Dec 2009 16:26:30 +0100
Subject: [Linux-cluster] how to start/stop a service
Message-ID: <4B211326.3060102@yahoo.it>

Hi

I have the following configuration: 2 nodes with the same
application "fun".
This is the /etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster alias="FunCluster" config_version="33" name="FunCluster">

   <cman expected_votes="1" two_node="1"/>

     <clusternodes>
         <clusternode name="redhat02.fun.uk" nodeid="1" votes="1">
           <fence/>
         </clusternode>
         <clusternode name="redhat03.fun.uk" nodeid="2" votes="1">
           <fence/>
         </clusternode>
     </clusternodes>

     <fencedevices/>

     <rm>

       <failoverdomains>
           <failoverdomain name="red2" nofailback="1" ordered="1" 
restricted="1">
                <failoverdomainnode name="redhat02.fun.uk" priority="1"/>
           </failoverdomain>
           <failoverdomain name="red3" nofailback="1" ordered="1" 
restricted="1">
                <failoverdomainnode name="redhat03.fun.uk" priority="1"/>
           </failoverdomain>
       </failoverdomains>

       <service autostart="0" domain="red2" name="fun11" recovery="restart">
         <fun name="res11" fun_home="/opt/fun1_1"/>
       </service>

       <service autostart="0" domain="red3" name="fun22" recovery="restart">
         <fun name="res22" fun_home="/opt/fun2_2/"/>
       </service>

     </rm>

     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>

</cluster>



Now I'd like to manage (start/stop) manually both instances
from the node redhat02.fun.uk;

I'm using the command line tool but when I run

clusvcadm -e fun11 -m redhat02.fun.uk

the application starts correctly

when I run

clusvcadm -e fun22 -m redhat03.fun.uk

the output says:

Member redhat03.fun.uk trying to enable service:fun22...Success
service:fun22 is now running on redhat03.fun.uk

but the service is not up and running on the redhat03.fun.uk


can anybody  help me with this issue?

thanks in advance

gilberto



From kkovachev at varna.net  Thu Dec 10 15:27:44 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Thu, 10 Dec 2009 17:27:44 +0200
Subject: [Linux-cluster] validity error
Message-ID: <20091210151736.M20864@varna.net>

Hello,
 after upgrading to 3.0.6 i get:

Starting cman... Relax-NG validity error : Extra element cman in interleave

but cluster.conf should be correct and was working so far without problems.
The coresponding section in <cluster> is:

<cman expected_votes="7">
   <multicast addr="239.192.11.81" keyfile="/etc/cluster/cman_authkey"/>
</cman>

how should i change it to pass the validity check?



From ccaulfie at redhat.com  Thu Dec 10 16:12:33 2009
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Thu, 10 Dec 2009 16:12:33 +0000
Subject: [Linux-cluster] validity error
In-Reply-To: <20091210151736.M20864@varna.net>
References: <20091210151736.M20864@varna.net>
Message-ID: <4B211DF1.9030305@redhat.com>

On 10/12/09 15:27, Kaloyan Kovachev wrote:
> Hello,
>   after upgrading to 3.0.6 i get:
>
> Starting cman... Relax-NG validity error : Extra element cman in interleave
>
> but cluster.conf should be correct and was working so far without problems.
> The coresponding section in<cluster>  is:
>
> <cman expected_votes="7">
>     <multicast addr="239.192.11.81" keyfile="/etc/cluster/cman_authkey"/>
> </cman>
>
> how should i change it to pass the validity check?

Remove the keyfile="" attribute. cman ignores it anyway :-)

If you need to specify an encrpytion key it should go into the <totem> 
part of cluster.conf.


Chrissie



From rvandolson at esri.com  Thu Dec 10 16:25:09 2009
From: rvandolson at esri.com (Ray Van Dolson)
Date: Thu, 10 Dec 2009 08:25:09 -0800
Subject: [Linux-cluster] Backup strategies for large-ish GFS2 filesystems.
In-Reply-To: <1260443028.15239.2.camel@dell-jr.intern.win-rar.com>
References: <20091209190828.GA8880@esri.com>
	<1260443028.15239.2.camel@dell-jr.intern.win-rar.com>
Message-ID: <20091210162508.GA24895@esri.com>

On Thu, Dec 10, 2009 at 03:03:48AM -0800, jr wrote:
> Hello Ray,
> unfortunately we only have a very small gfs volume running, but how are
> you doing backups? Are you doing snapshots and mounting them with
> lockproto=lock_nolock?
> regards,
> Johannes

That would be ideal -- unfortunately our underlying storage hardware
(IBM DS4300/FASt600) does not support snapshots.  If cLVM supported
snapshots I'd jump on going that route in a millisecond... :)

We've tried three methods (1) NetBackup to exposed NFS export of GFS2
filesystem; (2) rsync from remote machine to rsyncd on GFS2 node; (3)
rsync from remote machine to NFS export of GFS2 filesystem.

Option 1 is the slowest (6+ hours), 2 is somewhat better (3 hours) and
3 has been our best bet so far (82 minutes).  This is using the
--size-only argument to rsycn in an effort to avoid reading mtime on an
inode.  Probably not much gain though as it appears stat() is called
anyways.

I'm kind of surprised that rsync to NFS is faster than rsync --daemon.

I have been testing with our GFS2 filesystem mounted in spectator mode
on the passive node, but I don't think it's really making much
difference.

It would be nice if GFS2 had some backup-friendly type options for
caching some of this information about all our inodes.  I mean,
obviously it does -- but some knobs we could easily turn on a node
we intend to run backups from that, given ample amount of memory, cache
all the stat() information for 24+ hour periods...

Or maybe some cluster filesystem friendly backup tools as I see these
problems exist on OCFS2 and Lustre as well... 

Thanks for the reply.

> 
> Am Mittwoch, den 09.12.2009, 11:08 -0800 schrieb Ray Van Dolson:
> > How do those of you with large-ish GFS2 filesystems (and multiple
> > nodes) handle backups?  I'm specifically thinking of people running
> > mailspools and such with many files.
> > 
> > I'd be interested in hearing your space usage, inode usage and how long
> > it takes you to do a full and diff backup to see if the numbers we're
> > seeing are reasonable.
> > 
> > Thanks!
> > Ray
> > 

Ray



From gbmiglia at yahoo.it  Thu Dec 10 16:45:02 2009
From: gbmiglia at yahoo.it (gilberto migliavacca)
Date: Thu, 10 Dec 2009 17:45:02 +0100
Subject: [Linux-cluster] SOLVED - how to start/stop a service
In-Reply-To: <4B211326.3060102@yahoo.it>
References: <4B211326.3060102@yahoo.it>
Message-ID: <4B21258E.5050107@yahoo.it>

there was an error in the log due a incorrect settings for a given 
property in the "metadafile".

in that case the configuration was not applied correctly.

Now I fixed the problem in the "metadafile" and the

clusvcadm command works properly

gilberto

gilberto migliavacca wrote:
> Hi
> 
> I have the following configuration: 2 nodes with the same
> application "fun".
> This is the /etc/cluster/cluster.conf
> 
> <?xml version="1.0"?>
> <cluster alias="FunCluster" config_version="33" name="FunCluster">
> 
>   <cman expected_votes="1" two_node="1"/>
> 
>     <clusternodes>
>         <clusternode name="redhat02.fun.uk" nodeid="1" votes="1">
>           <fence/>
>         </clusternode>
>         <clusternode name="redhat03.fun.uk" nodeid="2" votes="1">
>           <fence/>
>         </clusternode>
>     </clusternodes>
> 
>     <fencedevices/>
> 
>     <rm>
> 
>       <failoverdomains>
>           <failoverdomain name="red2" nofailback="1" ordered="1" 
> restricted="1">
>                <failoverdomainnode name="redhat02.fun.uk" priority="1"/>
>           </failoverdomain>
>           <failoverdomain name="red3" nofailback="1" ordered="1" 
> restricted="1">
>                <failoverdomainnode name="redhat03.fun.uk" priority="1"/>
>           </failoverdomain>
>       </failoverdomains>
> 
>       <service autostart="0" domain="red2" name="fun11" recovery="restart">
>         <fun name="res11" fun_home="/opt/fun1_1"/>
>       </service>
> 
>       <service autostart="0" domain="red3" name="fun22" recovery="restart">
>         <fun name="res22" fun_home="/opt/fun2_2/"/>
>       </service>
> 
>     </rm>
> 
>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> 
> </cluster>
> 
> 
> 
> Now I'd like to manage (start/stop) manually both instances
> from the node redhat02.fun.uk;
> 
> I'm using the command line tool but when I run
> 
> clusvcadm -e fun11 -m redhat02.fun.uk
> 
> the application starts correctly
> 
> when I run
> 
> clusvcadm -e fun22 -m redhat03.fun.uk
> 
> the output says:
> 
> Member redhat03.fun.uk trying to enable service:fun22...Success
> service:fun22 is now running on redhat03.fun.uk
> 
> but the service is not up and running on the redhat03.fun.uk
> 
> 
> can anybody  help me with this issue?
> 
> thanks in advance
> 
> gilberto
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



From kkovachev at varna.net  Fri Dec 11 09:48:02 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Fri, 11 Dec 2009 11:48:02 +0200
Subject: [Linux-cluster] validity error
In-Reply-To: <4B211DF1.9030305@redhat.com>
References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com>
Message-ID: <20091211093427.M4078@varna.net>

On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote
> On 10/12/09 15:27, Kaloyan Kovachev wrote:
> > Hello,
> >   after upgrading to 3.0.6 i get:
> >
> > Starting cman... Relax-NG validity error : Extra element cman in interleave
> >
> > but cluster.conf should be correct and was working so far without problems.
> > The coresponding section in<cluster>  is:
> >
> > <cman expected_votes="7">
> >     <multicast addr="239.192.11.81" keyfile="/etc/cluster/cman_authkey"/>
> > </cman>
> >
> > how should i change it to pass the validity check?
> 
> Remove the keyfile="" attribute. cman ignores it anyway :-)
> 

I am sure it was working with RHCM v2, so it seems i will need to rewrite the
config for V3, as i get another error now about specifying multicast interface
for clusternode and there will be others for sure

> If you need to specify an encrpytion key it should go into the <totem> 
> part of cluster.conf.
> 

looking at cluster.rng keyfile is valid for the cman block. May i just move it
there or i should create <totem>

> Chrissie
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From ccaulfie at redhat.com  Fri Dec 11 09:58:38 2009
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Fri, 11 Dec 2009 09:58:38 +0000
Subject: [Linux-cluster] validity error
In-Reply-To: <20091211093427.M4078@varna.net>
References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com>
	<20091211093427.M4078@varna.net>
Message-ID: <4B2217CE.3060007@redhat.com>

On 11/12/09 09:48, Kaloyan Kovachev wrote:
> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote
>> On 10/12/09 15:27, Kaloyan Kovachev wrote:
>>> Hello,
>>>    after upgrading to 3.0.6 i get:
>>>
>>> Starting cman... Relax-NG validity error : Extra element cman in interleave
>>>
>>> but cluster.conf should be correct and was working so far without problems.
>>> The coresponding section in<cluster>   is:
>>>
>>> <cman expected_votes="7">
>>>      <multicast addr="239.192.11.81" keyfile="/etc/cluster/cman_authkey"/>
>>> </cman>
>>>
>>> how should i change it to pass the validity check?
>>
>> Remove the keyfile="" attribute. cman ignores it anyway :-)
>>
>
> I am sure it was working with RHCM v2, so it seems i will need to rewrite the
> config for V3, as i get another error now about specifying multicast interface
> for clusternode and there will be others for sure

Yes, it would work fine under v2. In fact it's working now - you're just 
getting a warning message (I hope!). We have added a lot more checks to 
the configuration to try and help invalid configurations from being run 
and causing trouble.


>> If you need to specify an encrpytion key it should go into the<totem>
>> part of cluster.conf.
>>
>
> looking at cluster.rng keyfile is valid for the cman block. May i just move it
> there or i should create<totem>

I would just remove it. It's not doing anything, so if you move it to 
<totem> you will change the encryption key used by the cluster and have 
to reboot all your nodes to get them communicating again.

Chrissie



From kkovachev at varna.net  Fri Dec 11 10:21:41 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Fri, 11 Dec 2009 12:21:41 +0200
Subject: [Linux-cluster] validity error
In-Reply-To: <4B2217CE.3060007@redhat.com>
References: <20091210151736.M20864@varna.net> <4B211DF1.9030305@redhat.com>
	<20091211093427.M4078@varna.net> <4B2217CE.3060007@redhat.com>
Message-ID: <20091211100852.M47131@varna.net>

On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote
> On 11/12/09 09:48, Kaloyan Kovachev wrote:
> > On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote
> >> On 10/12/09 15:27, Kaloyan Kovachev wrote:
> >>> Hello,
> >>>    after upgrading to 3.0.6 i get:
> >>>
> >>> Starting cman... Relax-NG validity error : Extra element cman in interleave
> >>>
> >>> but cluster.conf should be correct and was working so far without problems.
> >>> The coresponding section in<cluster>   is:
> >>>
> >>> <cman expected_votes="7">
> >>>      <multicast addr="239.192.11.81" keyfile="/etc/cluster/cman_authkey"/>
> >>> </cman>
> >>>
> >>> how should i change it to pass the validity check?
> >>
> >> Remove the keyfile="" attribute. cman ignores it anyway :-)
> >>
> >
> > I am sure it was working with RHCM v2, so it seems i will need to rewrite the
> > config for V3, as i get another error now about specifying multicast interface
> > for clusternode and there will be others for sure
> 
> Yes, it would work fine under v2. In fact it's working now - you're just 
> getting a warning message (I hope!). We have added a lot more checks to 
> the configuration to try and help invalid configurations from being run 
> and causing trouble.

when starting the cluster i get just warnings, but updating the config and
using cman_tool version -r cman doesn't reload it, so i am forced to fix my
errors :)

> 
> >> If you need to specify an encrpytion key it should go into the<totem>
> >> part of cluster.conf.
> >>
> >
> > looking at cluster.rng keyfile is valid for the cman block. May i just move it
> > there or i should create<totem>
> 
> I would just remove it. It's not doing anything, so if you move it to 
> <totem> you will change the encryption key used by the cluster and have 
> to reboot all your nodes to get them communicating again.
> 

The cluster is not a production one, so it is OK and am looking for the
correct end result. My question was actually 'Is encription key valid/used
only from <totem> section or in <cman> too as described in cluster.rng file'.

Multicast and keyfile are present in both <cman> and <totem> ... i guess
<totem> is the preferred one for future compatibility?

> Chrissie
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From ccaulfie at redhat.com  Fri Dec 11 10:24:49 2009
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Fri, 11 Dec 2009 10:24:49 +0000
Subject: [Linux-cluster] validity error
In-Reply-To: <20091211100852.M47131@varna.net>
References: <20091210151736.M20864@varna.net>
	<4B211DF1.9030305@redhat.com>	<20091211093427.M4078@varna.net>
	<4B2217CE.3060007@redhat.com> <20091211100852.M47131@varna.net>
Message-ID: <4B221DF1.2030300@redhat.com>

On 11/12/09 10:21, Kaloyan Kovachev wrote:
> On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote
>> On 11/12/09 09:48, Kaloyan Kovachev wrote:
>>> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote
>>>> On 10/12/09 15:27, Kaloyan Kovachev wrote:
>>>>> Hello,
>>>>>     after upgrading to 3.0.6 i get:
>>>>>
>>>>> Starting cman... Relax-NG validity error : Extra element cman in interleave
>>>>>
>>>>> but cluster.conf should be correct and was working so far without problems.
>>>>> The coresponding section in<cluster>    is:
>>>>>
>>>>> <cman expected_votes="7">
>>>>>       <multicast addr="239.192.11.81" keyfile="/etc/cluster/cman_authkey"/>
>>>>> </cman>
>>>>>
>>>>> how should i change it to pass the validity check?
>>>>
>>>> Remove the keyfile="" attribute. cman ignores it anyway :-)
>>>>
>>>
>>> I am sure it was working with RHCM v2, so it seems i will need to rewrite the
>>> config for V3, as i get another error now about specifying multicast interface
>>> for clusternode and there will be others for sure
>>
>> Yes, it would work fine under v2. In fact it's working now - you're just
>> getting a warning message (I hope!). We have added a lot more checks to
>> the configuration to try and help invalid configurations from being run
>> and causing trouble.
>
> when starting the cluster i get just warnings, but updating the config and
> using cman_tool version -r cman doesn't reload it, so i am forced to fix my
> errors :)
>
>>
>>>> If you need to specify an encrpytion key it should go into the<totem>
>>>> part of cluster.conf.
>>>>
>>>
>>> looking at cluster.rng keyfile is valid for the cman block. May i just move it
>>> there or i should create<totem>
>>
>> I would just remove it. It's not doing anything, so if you move it to
>> <totem>  you will change the encryption key used by the cluster and have
>> to reboot all your nodes to get them communicating again.
>>
>
> The cluster is not a production one, so it is OK and am looking for the
> correct end result. My question was actually 'Is encription key valid/used
> only from<totem>  section or in<cman>  too as described in cluster.rng file'.
>
> Multicast and keyfile are present in both<cman>  and<totem>  ... i guess
> <totem>  is the preferred one for future compatibility?
>

Confusingly, multicast must be part of <cman> and keyfile should be part 
of <totem>.

That's just how it is, sorry ;-)

Chrissie



From kkovachev at varna.net  Fri Dec 11 11:21:36 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Fri, 11 Dec 2009 13:21:36 +0200
Subject: [Linux-cluster] validity error
In-Reply-To: <4B221DF1.2030300@redhat.com>
References: <20091210151736.M20864@varna.net>
	<4B211DF1.9030305@redhat.com>	<20091211093427.M4078@varna.net>
	<4B2217CE.3060007@redhat.com> <20091211100852.M47131@varna.net>
	<4B221DF1.2030300@redhat.com>
Message-ID: <20091211105109.M90980@varna.net>

On Fri, 11 Dec 2009 10:24:49 +0000, Christine Caulfield wrote
> On 11/12/09 10:21, Kaloyan Kovachev wrote:
> > On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote
> >> On 11/12/09 09:48, Kaloyan Kovachev wrote:
> >>> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote
> >>>> On 10/12/09 15:27, Kaloyan Kovachev wrote:
> >>>>> Hello,
> >>>>>     after upgrading to 3.0.6 i get:
> >>>>>
> >>>>> Starting cman... Relax-NG validity error : Extra element cman in
interleave
> >>>>>
> >>>>> but cluster.conf should be correct and was working so far without
problems.
> >>>>> The coresponding section in<cluster>    is:
> >>>>>
> >>>>> <cman expected_votes="7">
> >>>>>       <multicast addr="239.192.11.81"
keyfile="/etc/cluster/cman_authkey"/>
> >>>>> </cman>
> >>>>>
> >>>>> how should i change it to pass the validity check?
> >>>>
> >>>> Remove the keyfile="" attribute. cman ignores it anyway :-)
> >>>>
> >>>
> >>> I am sure it was working with RHCM v2, so it seems i will need to
rewrite the
> >>> config for V3, as i get another error now about specifying multicast
interface
> >>> for clusternode and there will be others for sure
> >>
> >> Yes, it would work fine under v2. In fact it's working now - you're just
> >> getting a warning message (I hope!). We have added a lot more checks to
> >> the configuration to try and help invalid configurations from being run
> >> and causing trouble.
> >
> > when starting the cluster i get just warnings, but updating the config and
> > using cman_tool version -r cman doesn't reload it, so i am forced to fix my
> > errors :)
> >
> >>
> >>>> If you need to specify an encrpytion key it should go into the<totem>
> >>>> part of cluster.conf.
> >>>>
> >>>
> >>> looking at cluster.rng keyfile is valid for the cman block. May i just
move it
> >>> there or i should create<totem>
> >>
> >> I would just remove it. It's not doing anything, so if you move it to
> >> <totem>  you will change the encryption key used by the cluster and have
> >> to reboot all your nodes to get them communicating again.
> >>
> >
> > The cluster is not a production one, so it is OK and am looking for the
> > correct end result. My question was actually 'Is encription key valid/used
> > only from<totem>  section or in<cman>  too as described in cluster.rng file'.
> >
> > Multicast and keyfile are present in both<cman>  and<totem>  ... i guess
> > <totem>  is the preferred one for future compatibility?
> >
> 
> Confusingly, multicast must be part of <cman> and keyfile should be part 
> of <totem>.
> 
> That's just how it is, sorry ;-)
> 

Thanks. The validation schema should be updated then, as it allows keyfile in
cman too (fixed in my copy and could provide a patch).

I still can't find how do i specify per node interface. There is interface
allowed only in <totem> section while most of the nodes have (and use) bond0,
for one of them i need to specify different interface for cluster
communication. Is per node interface attribute missed from the validation
schema or is removed completely?

> Chrissie
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From ccaulfie at redhat.com  Fri Dec 11 11:36:53 2009
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Fri, 11 Dec 2009 11:36:53 +0000
Subject: [Linux-cluster] validity error
In-Reply-To: <20091211105109.M90980@varna.net>
References: <20091210151736.M20864@varna.net>	<4B211DF1.9030305@redhat.com>	<20091211093427.M4078@varna.net>	<4B2217CE.3060007@redhat.com>
	<20091211100852.M47131@varna.net>	<4B221DF1.2030300@redhat.com>
	<20091211105109.M90980@varna.net>
Message-ID: <4B222ED5.4040805@redhat.com>

On 11/12/09 11:21, Kaloyan Kovachev wrote:
> On Fri, 11 Dec 2009 10:24:49 +0000, Christine Caulfield wrote
>> On 11/12/09 10:21, Kaloyan Kovachev wrote:
>>> On Fri, 11 Dec 2009 09:58:38 +0000, Christine Caulfield wrote
>>>> On 11/12/09 09:48, Kaloyan Kovachev wrote:
>>>>> On Thu, 10 Dec 2009 16:12:33 +0000, Christine Caulfield wrote
>>>>>> On 10/12/09 15:27, Kaloyan Kovachev wrote:
>>>>>>> Hello,
>>>>>>>      after upgrading to 3.0.6 i get:
>>>>>>>
>>>>>>> Starting cman... Relax-NG validity error : Extra element cman in
> interleave
>>>>>>>
>>>>>>> but cluster.conf should be correct and was working so far without
> problems.
>>>>>>> The coresponding section in<cluster>     is:
>>>>>>>
>>>>>>> <cman expected_votes="7">
>>>>>>>        <multicast addr="239.192.11.81"
> keyfile="/etc/cluster/cman_authkey"/>
>>>>>>> </cman>
>>>>>>>
>>>>>>> how should i change it to pass the validity check?
>>>>>>
>>>>>> Remove the keyfile="" attribute. cman ignores it anyway :-)
>>>>>>
>>>>>
>>>>> I am sure it was working with RHCM v2, so it seems i will need to
> rewrite the
>>>>> config for V3, as i get another error now about specifying multicast
> interface
>>>>> for clusternode and there will be others for sure
>>>>
>>>> Yes, it would work fine under v2. In fact it's working now - you're just
>>>> getting a warning message (I hope!). We have added a lot more checks to
>>>> the configuration to try and help invalid configurations from being run
>>>> and causing trouble.
>>>
>>> when starting the cluster i get just warnings, but updating the config and
>>> using cman_tool version -r cman doesn't reload it, so i am forced to fix my
>>> errors :)
>>>
>>>>
>>>>>> If you need to specify an encrpytion key it should go into the<totem>
>>>>>> part of cluster.conf.
>>>>>>
>>>>>
>>>>> looking at cluster.rng keyfile is valid for the cman block. May i just
> move it
>>>>> there or i should create<totem>
>>>>
>>>> I would just remove it. It's not doing anything, so if you move it to
>>>> <totem>   you will change the encryption key used by the cluster and have
>>>> to reboot all your nodes to get them communicating again.
>>>>
>>>
>>> The cluster is not a production one, so it is OK and am looking for the
>>> correct end result. My question was actually 'Is encription key valid/used
>>> only from<totem>   section or in<cman>   too as described in cluster.rng file'.
>>>
>>> Multicast and keyfile are present in both<cman>   and<totem>   ... i guess
>>> <totem>   is the preferred one for future compatibility?
>>>
>>
>> Confusingly, multicast must be part of<cman>  and keyfile should be part
>> of<totem>.
>>
>> That's just how it is, sorry ;-)
>>
>
> Thanks. The validation schema should be updated then, as it allows keyfile in
> cman too (fixed in my copy and could provide a patch).

Hmm. I am totally wrong. Very sorry. keyfile IS allowed in cluster3, it 
overrides the one assigned in totem. In which case I'm not sure why it's 
failing to validate on your system.

The schema is a bit of a work-in-progress at the moment, which is why it 
warns rather than fails if it finds an error. Did it work when you 
removed keyfile ?


> I still can't find how do i specify per node interface. There is interface
> allowed only in<totem>  section while most of the nodes have (and use) bond0,
> for one of them i need to specify different interface for cluster
> communication. Is per node interface attribute missed from the validation
> schema or is removed completely?
>

cman always binds to the address of the host given as a node name. see

http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_heartbeat_nic


Chrissie



From kkovachev at varna.net  Fri Dec 11 12:32:39 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Fri, 11 Dec 2009 14:32:39 +0200
Subject: [Linux-cluster] validity error
In-Reply-To: <4B222ED5.4040805@redhat.com>
References: <20091210151736.M20864@varna.net>	<4B211DF1.9030305@redhat.com>	<20091211093427.M4078@varna.net>	<4B2217CE.3060007@redhat.com>
	<20091211100852.M47131@varna.net>	<4B221DF1.2030300@redhat.com>
	<20091211105109.M90980@varna.net> <4B222ED5.4040805@redhat.com>
Message-ID: <20091211115838.M23490@varna.net>

On Fri, 11 Dec 2009 11:36:53 +0000, Christine Caulfield wrote

<snip>

> 
> Hmm. I am totally wrong. Very sorry. keyfile IS allowed in cluster3, it 
> overrides the one assigned in totem. In which case I'm not sure why it's 
> failing to validate on your system.
> 

according to the validation file (cluster.rng) it should be an attribute of
cman, while in my case it was attribute of multicast subelement and is not
allowed there

> The schema is a bit of a work-in-progress at the moment, which is why it 
> warns rather than fails if it finds an error. Did it work when you 
> removed keyfile ?
> 

Yes i know and trying to help with one more config case (hence my email here
in the first place). I have replaced it with <totem keyfile=".."> and passed
this warning, but still can't pass the validation because of rm ... still
looking for the reason.

> > I still can't find how do i specify per node interface. There is interface
> > allowed only in<totem>  section while most of the nodes have (and use) bond0,
> > for one of them i need to specify different interface for cluster
> > communication. Is per node interface attribute missed from the validation
> > schema or is removed completely?
> >
> 
> cman always binds to the address of the host given as a node name. see
> 
> http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_heartbeat_nic
> 

I have used the answer of "How can I configure my RHEL4 cluster to use
multicast rather than broadcast?" few lines below (with V2 initially), so it
is safe to just remove this in my conf now and that step was passed.

> Chrissie
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From kkovachev at varna.net  Fri Dec 11 13:34:26 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Fri, 11 Dec 2009 15:34:26 +0200
Subject: [Linux-cluster] validity error
In-Reply-To: <20091211115838.M23490@varna.net>
References: <20091210151736.M20864@varna.net>	<4B211DF1.9030305@redhat.com>	<20091211093427.M4078@varna.net>	<4B2217CE.3060007@redhat.com>
	<20091211100852.M47131@varna.net>	<4B221DF1.2030300@redhat.com>
	<20091211105109.M90980@varna.net> <4B222ED5.4040805@redhat.com>
	<20091211115838.M23490@varna.net>
Message-ID: <20091211130955.M1600@varna.net>

Update

On Fri, 11 Dec 2009 14:32:39 +0200, Kaloyan Kovachev wrote
> On Fri, 11 Dec 2009 11:36:53 +0000, Christine Caulfield wrote
> 
> <snip>
> 
> > 
> > Hmm. I am totally wrong. Very sorry. keyfile IS allowed in cluster3, it 
> > overrides the one assigned in totem. In which case I'm not sure why it's 
> > failing to validate on your system.
> >
> 
> according to the validation file (cluster.rng) it should be an attribute of
> cman, while in my case it was attribute of multicast subelement and is not
> allowed there
> 
> > The schema is a bit of a work-in-progress at the moment, which is why it 
> > warns rather than fails if it finds an error. Did it work when you 
> > removed keyfile ?
> >
> 
> Yes i know and trying to help with one more config case (hence my email here
> in the first place). I have replaced it with <totem keyfile=".."> and passed
> this warning, but still can't pass the validation because of rm ... still
> looking for the reason.
> 

After i have added validation section for our custom service resource and
reloaded the config the nodes were still communicating with each other.
Then i have restarted one of the nodes and it didn't join the cluster, but
created a new one as do other restarted nodes did later. It seems the keyfile
was not active with the old config and not activated on just reload

> > > I still can't find how do i specify per node interface. There is interface
> > > allowed only in<totem>  section while most of the nodes have (and use)
bond0,
> > > for one of them i need to specify different interface for cluster
> > > communication. Is per node interface attribute missed from the validation
> > > schema or is removed completely?
> > >
> > 
> > cman always binds to the address of the host given as a node name. see
> > 
> > http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_heartbeat_nic
> >
> 
> I have used the answer of "How can I configure my RHEL4 cluster to use
> multicast rather than broadcast?" few lines below (with V2 initially), so it
> is safe to just remove this in my conf now and that step was passed.
> 
> > Chrissie
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From baishuwei at gmail.com  Sun Dec 13 04:46:48 2009
From: baishuwei at gmail.com (Bai Shuwei)
Date: Sun, 13 Dec 2009 12:46:48 +0800
Subject: [Linux-cluster] How to set lun masking
Message-ID: <f3566d60912122046w2c2ae576w2abcfaced911d709@mail.gmail.com>

Hi, All:

The bellow is my system architecture

HOST0 ---|
HOST1 ---| --switch --| server (disk0, disk1, disk2 or LUN0, LUN1, LUN2).
HOST2 ---|

I want to assign LUN0 to HOST0, LUN1 to HOST1, and LUN2 to HOST2. There is
only one QLogic HBA on the server. In my zone i build, all HOST can see all
LUNs. So I want to know how to make it possible to make one to one.? I have
installed CSCT and scli tools on my server. How i configure my server to
make the network work?

THanks all!

Bai SHuwei

-- 
Love other people, as same as love yourself!
Don't think all the time, do it by your hands!

Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/
E-Mail: baishuwei at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091213/eb0a4d61/attachment.htm>

From baishuwei at gmail.com  Sun Dec 13 07:05:47 2009
From: baishuwei at gmail.com (Bai Shuwei)
Date: Sun, 13 Dec 2009 15:05:47 +0800
Subject: [Linux-cluster] LUN/LUN Masking
In-Reply-To: <7207d96f0912032026n7ef04c7ahc86b24ca482e3326@mail.gmail.com>
References: <f3566d60912031906v78f40992u349c16245149e349@mail.gmail.com>
	<7207d96f0912032026n7ef04c7ahc86b24ca482e3326@mail.gmail.com>
Message-ID: <f3566d60912122305y777b03ao7bf8e0b694489845@mail.gmail.com>

On Fri, Dec 4, 2009 at 12:26 PM, Fajar A. Nugraha <fajar at fajar.net> wrote:

> On Fri, Dec 4, 2009 at 10:06 AM, Bai Shuwei <baishuwei at gmail.com> wrote:
> > HI, everyone:
> >    I am a begginer on FC-SAN. On my machine i have installed
> > HBAs(QLogic2342) and SLCI tools. How I can setup the lun masking to
> > forbidden/allow hosts to access special LUN/Disk? Do I need some other
> > speccial tools to do it? Thanks all.
>
> AFAIK LUN masking is done on storage side, not client side.
>
> How I configure my storage to meet the requirement? Or which tool can help
me?

Thanks!

> --
> Fajar
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
Love other people, as same as love yourself!
Don't think all the time, do it by your hands!

Personal URL: http://dslab.lzu.edu.cn:8080/members/baishw/
E-Mail: baishuwei at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091213/9713c76b/attachment.htm>

From arturogf at gmail.com  Mon Dec 14 08:49:04 2009
From: arturogf at gmail.com (Arturo Gonzalez Ferrer)
Date: Mon, 14 Dec 2009 09:49:04 +0100
Subject: [Linux-cluster] Adding a new node to rh cluster + GFS2
Message-ID: <d2aeadbf0912140049j29f3158dt9c860b3ee083a406@mail.gmail.com>

Dear all,

I'm in trouble with adding a new node to an existing cluster of three nodes
(so I want to have four), because it somehow doesn't let me access the
cluster infrastructure.

These 3 nodes were set up as http servers, sharing a GFS2 volume (physical:
vg_cluster, logical: lv_cluster) where data is stored.

I want to set up the new node to access the same GFS2 volume, with the idea
of exporting the data via NFS, so that a remote backup library can be
configured to backup nightly the data, by connecting to the new node.

I've tried a lot of things, always getting same kind of errors.

Running "cman_tool status" on any of the 3 nodes i get:

Version: 6.2.0
Config Version: 70
Cluster Name: campusvirtual
Cluster Id: 45794
Cluster Member: Yes
Cluster Generation: 1136
Membership state: Cluster-Member
Nodes: 3
Expected votes: 4
Total votes: 3
Quorum: 3
Active subsystems: 9
Flags: Dirty
Ports Bound: 0 11 177
Node name: cev01
Node ID: 2
Multicast addresses: 239.192.178.149
Node addresses: 150.214.243.20


while running "cman_tool status" on the new node:

Version: 6.2.0
Config Version: 70
Cluster Name: campusvirtual
Cluster Id: 45794
Cluster Member: Yes
Cluster Generation: 1124
Membership state: Cluster-Member
Nodes: 1
Expected votes: 4
Total votes: 1
Quorum: 3 Activity blocked
Active subsystems: 2
Flags:
Ports Bound: 0
Node name: cevstream.ugr.es
Node ID: 4
Multicast addresses: 239.192.178.149
Node addresses: 150.214.243.19


Running "fence_tool_dump" on the three nodes:

[root at cev01 ~]# fence_tool dump
dump read: Success
1260778939 our_nodeid 2 our_name cev01.ugr.es
1260778939 listen 4 member 5 groupd 7
1260778964 client 3: join default
1260778964 delay post_join 3s post_fail 0s
1260778964 added 4 nodes from ccs
1260778964 setid default 65538
1260778964 start default 1 members 2
1260778964 do_recovery stop 0 start 1 finish 0
1260778964 node "cevstream.ugr.es" not a cman member, cn 1
1260778964 add first victim cevstream.ugr.es
1260778965 node "cevstream.ugr.es" not a cman member, cn 1
1260778966 node "cevstream.ugr.es" not a cman member, cn 1
1260778967 node "cevstream.ugr.es" not a cman member, cn 1
1260778967 delay of 3s leaves 1 victims
1260778967 node "cevstream.ugr.es" not a cman member, cn 1
1260778967 node "cevstream.ugr.es" has not been fenced
1260778967 fencing node cevstream.ugr.es
1260778971 finish default 1
1260778971 stop default
1260778971 start default 2 members 3 2
1260778971 do_recovery stop 1 start 2 finish 1
1260778971 finish default 2
1260778971 stop default
1260778971 start default 3 members 1 3 2
1260778971 do_recovery stop 2 start 3 finish 2
1260778971 finish default 3
1260779876 client 3: dump

while running it in the new node:

[root at cevstream ~]# fence_tool dump
fence_tool: can't communicate with fenced

I get a lot of errors telling me that cluster is not quorate:

Dec 14 09:39:20 cevstream ccsd[3668]: Cluster is not quorate.  Refusing
connection.
Dec 14 09:39:20 cevstream ccsd[3668]: Error while processing connect:
Connection refused

Printing the superblock on any of the three nodes:

[root at cev01 ~]# gfs2_tool sb /dev/vg_cluster/lv_cluster all
  mh_magic = 0x01161970
  mh_type = 1
  mh_format = 100
  sb_fs_format = 1801
  sb_multihost_format = 1900
  sb_bsize = 4096
  sb_bsize_shift = 12
  no_formal_ino = 2
  no_addr = 23
  no_formal_ino = 1
  no_addr = 22
  sb_lockproto = lock_dlm
  sb_locktable = campusvirtual:gfs_cluster01
  uuid = C6A9FBB4-A881-2128-2AB8-1AB8547C7F30


I've tried something i saw in some forums, deactivating and even removing
the logical volume (with lvremove), because supposedly the new node could
need this operation in order to access the gfs2 volume.

Running lvcreate on the new node, with all the other nodes deactivates and
removed, i still get the error:

[root at cevstream ~]# lvcreate -l 100%FREE -n lv_cluster vg_cluster
  connect() failed on local socket: Conexi?n rehusada
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  Skipping clustered volume group vg_cluster

Find attached the configuration of cluster.conf.

I'm pretty desperate with this situation, i really don't know how to deal
with the adition of a new node.

Best regards,
Arturo.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091214/a24fda90/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: application/octet-stream
Size: 3463 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091214/a24fda90/attachment.obj>

From ccaulfie at redhat.com  Mon Dec 14 09:03:58 2009
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Mon, 14 Dec 2009 09:03:58 +0000
Subject: [Linux-cluster] Adding a new node to rh cluster + GFS2
In-Reply-To: <d2aeadbf0912140049j29f3158dt9c860b3ee083a406@mail.gmail.com>
References: <d2aeadbf0912140049j29f3158dt9c860b3ee083a406@mail.gmail.com>
Message-ID: <4B25FF7E.5000203@redhat.com>

On 14/12/09 08:49, Arturo Gonzalez Ferrer wrote:
> Dear all,
>
> I'm in trouble with adding a new node to an existing cluster of three
> nodes (so I want to have four), because it somehow doesn't let me access
> the cluster infrastructure.
>
> These 3 nodes were set up as http servers, sharing a GFS2 volume
> (physical: vg_cluster, logical: lv_cluster) where data is stored.
>
> I want to set up the new node to access the same GFS2 volume, with the
> idea of exporting the data via NFS, so that a remote backup library can
> be configured to backup nightly the data, by connecting to the new node.
>
> I've tried a lot of things, always getting same kind of errors.
>
> Running "cman_tool status" on any of the 3 nodes i get:
>
> Version: 6.2.0
> Config Version: 70
> Cluster Name: campusvirtual
> Cluster Id: 45794
> Cluster Member: Yes
> Cluster Generation: 1136
> Membership state: Cluster-Member
> Nodes: 3
> Expected votes: 4
> Total votes: 3
> Quorum: 3
> Active subsystems: 9
> Flags: Dirty
> Ports Bound: 0 11 177
> Node name: cev01
> Node ID: 2
> Multicast addresses: 239.192.178.149
> Node addresses: 150.214.243.20
>
>
> while running "cman_tool status" on the new node:
>
> Version: 6.2.0
> Config Version: 70
> Cluster Name: campusvirtual
> Cluster Id: 45794
> Cluster Member: Yes
> Cluster Generation: 1124
> Membership state: Cluster-Member
> Nodes: 1

This is the key. The new node can't see the network traffic of the other 
three. The most likely explanation for this is iptables blocking the 
traffic.

But check other network connections and settings too - It's almost 
certainly a network configuration problem. The multicast and node 
addresses look fine to me.

Chrissie



From arturogf at gmail.com  Mon Dec 14 09:09:21 2009
From: arturogf at gmail.com (Arturo Gonzalez Ferrer)
Date: Mon, 14 Dec 2009 10:09:21 +0100
Subject: [Linux-cluster] Adding a new node to rh cluster + GFS2
In-Reply-To: <4B25FF7E.5000203@redhat.com>
References: <d2aeadbf0912140049j29f3158dt9c860b3ee083a406@mail.gmail.com>
	<4B25FF7E.5000203@redhat.com>
Message-ID: <d2aeadbf0912140109x176d83abjfba3d0b4ee874750@mail.gmail.com>

2009/12/14 Christine Caulfield <ccaulfie at redhat.com>

> On 14/12/09 08:49, Arturo Gonzalez Ferrer wrote:
>
>> Dear all,
>>
>> I'm in trouble with adding a new node to an existing cluster of three
>> nodes (so I want to have four), because it somehow doesn't let me access
>> the cluster infrastructure.
>>
>> These 3 nodes were set up as http servers, sharing a GFS2 volume
>> (physical: vg_cluster, logical: lv_cluster) where data is stored.
>>
>> I want to set up the new node to access the same GFS2 volume, with the
>> idea of exporting the data via NFS, so that a remote backup library can
>> be configured to backup nightly the data, by connecting to the new node.
>>
>> I've tried a lot of things, always getting same kind of errors.
>>
>> Running "cman_tool status" on any of the 3 nodes i get:
>>
>> Version: 6.2.0
>> Config Version: 70
>> Cluster Name: campusvirtual
>> Cluster Id: 45794
>> Cluster Member: Yes
>> Cluster Generation: 1136
>> Membership state: Cluster-Member
>> Nodes: 3
>> Expected votes: 4
>> Total votes: 3
>> Quorum: 3
>> Active subsystems: 9
>> Flags: Dirty
>> Ports Bound: 0 11 177
>> Node name: cev01
>> Node ID: 2
>> Multicast addresses: 239.192.178.149
>> Node addresses: 150.214.243.20
>>
>>
>> while running "cman_tool status" on the new node:
>>
>> Version: 6.2.0
>> Config Version: 70
>> Cluster Name: campusvirtual
>> Cluster Id: 45794
>> Cluster Member: Yes
>> Cluster Generation: 1124
>> Membership state: Cluster-Member
>> Nodes: 1
>>
>
> This is the key. The new node can't see the network traffic of the other
> three. The most likely explanation for this is iptables blocking the
> traffic.
>
> But check other network connections and settings too - It's almost
> certainly a network configuration problem. The multicast and node addresses
> look fine to me.
>

The iptables is deactivated in the new node:

[root at cevstream ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

as well as the selinux, is deactivated also.

Any other idea? I don't see the flag "dirty" in the new node. It has no
service associated, i don't know if this can means anything...

Cheers,
Arturo.


>
> Chrissie
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091214/ddff9c3f/attachment.htm>

From brettcave at gmail.com  Mon Dec 14 09:32:11 2009
From: brettcave at gmail.com (Brett Cave)
Date: Mon, 14 Dec 2009 11:32:11 +0200
Subject: [Linux-cluster] how to specify a fence method with "ccs_tool
	addnode"
Message-ID: <c0773fd30912140132l11e0fb3fs115af8f54346bd54@mail.gmail.com>

how would I go about specifying a fence method with ccs_tool? Can find much
documentation on fencing methods.

ccs_tool addfence myfence ....
ccs_tool addnode mynode -n X -f myfence

the above uses "single" fencing method, whereas I would like to specify
"fabric". What impact does specifying different method's have?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091214/24aea1aa/attachment.htm>

From brettcave at gmail.com  Mon Dec 14 09:36:09 2009
From: brettcave at gmail.com (Brett Cave)
Date: Mon, 14 Dec 2009 11:36:09 +0200
Subject: [Linux-cluster] more info on fencing
Message-ID: <c0773fd30912140136h30ec8705y7c36e74075d5be3d@mail.gmail.com>

I am using ilo fencing to reset servers, so this is more power fencing than
fabric fencing?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091214/35cb870f/attachment.htm>

From rmicmirregs at gmail.com  Mon Dec 14 22:15:09 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Mon, 14 Dec 2009 23:15:09 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
Message-ID: <1260828909.6558.24.camel@mecatol>

Hi all,

I was wondering if there is a way to achieve a "quorum disk over a RAID
software device" working CMAN cluster.

Explanation:

A) Environment
- 6 x different servers used as cluster nodes, with dual FC HBA
- 2 x different fabrics, each build with 3 FC SAN switches
- 2 x storage arrays, with 23 270GB LUNs of data each. 
- 1x Qdisk: a 24th LUN located in one of the storage arrays

B) Objectives
- All the 6 nodes must be able to mount and use any of the 2x23 LUNs of
data in the final configuration. Already done.
- Usage of a Qdisk for a last-man-standing configuration. Already done
(1 vote each node and 5 votes in the Qdisk device)

C) Flaws 
- Qdisk is located in ONE storage array. If there is a failure in that
storage array, 5 votes are lost. With only one cluster node failing
there won't be quorum. This means that with 5 nodes and an storage array
operative I will lose quorum.

D) Possible Fixes
- Using 2 quorum disks: Not implemented yet
http://sources.redhat.com/cluster/wiki/MultiQuorumDisk

- Using an LVM-Mirror device as a Qdisk and creating additional LUNs for
mirror and log in both storage arrays: if the Qdisk is a Clustered
Logical Volume, it won't be available in the CMAN start phase due CLVMD
(and CMIRROR) is needed to have access to clustered logical volumes and
CLVMD won't be running if CMAN is not running yet.
Question: is it really necessary to use a Clustered Logical Volume for
the Qdisk? Is there any problem in NOT using a clustered volume? 

- Using an Software Raid (MDRAID) device as a Qdisk and creating an
additional LUN in the second storage array: 
Each cluster node will use the MD device as de Qdisk. Do you see any
problem in this proposal? 


E) Possible Flaws
- With LVM-Mirror: what would happen if one of the underlying disks of
the Qdisk fails in only a part of the cluster nodes? You can imagine in
a lun-masking problem of the storage array controller or in an admin
making a mistake, which would result in some nodes losing the access to
one of the disks.
What would happen when the disk when it's fully on-line again?

- With MDRAID: same questions.


Of course, any idea os proposal is welcome. Thanks in advance. Cheers,

Rafael

-- 
Rafael Mic? Miranda



From raju.rajsand at gmail.com  Tue Dec 15 03:27:05 2009
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Tue, 15 Dec 2009 08:57:05 +0530
Subject: [Linux-cluster] more info on fencing
In-Reply-To: <c0773fd30912140136h30ec8705y7c36e74075d5be3d@mail.gmail.com>
References: <c0773fd30912140136h30ec8705y7c36e74075d5be3d@mail.gmail.com>
Message-ID: <8786b91c0912141927m48f244bdr3d613bc284292c9a@mail.gmail.com>

Greetings,

I am not an expert in cluster.

On Mon, Dec 14, 2009 at 3:06 PM, Brett Cave <brettcave at gmail.com> wrote:
> I am using ilo fencing to reset servers, so this is more power fencing than
> fabric fencing?

Following is my understanding of fencing:

There are three type of fencing (excluding manual):
1. Power fencing -- using IP enabled power strips
2. In-band fencing -- using RSA, ILO, IPMI and the such -- sorta power fencing
3. Storage fencing -- using the SAN fabric Switch

Please correct me if I am wrong

Regards

Rajagopal



From brettcave at gmail.com  Tue Dec 15 09:30:41 2009
From: brettcave at gmail.com (Brett Cave)
Date: Tue, 15 Dec 2009 11:30:41 +0200
Subject: [Linux-cluster] more info on fencing
In-Reply-To: <8786b91c0912141927m48f244bdr3d613bc284292c9a@mail.gmail.com>
References: <c0773fd30912140136h30ec8705y7c36e74075d5be3d@mail.gmail.com>
	<8786b91c0912141927m48f244bdr3d613bc284292c9a@mail.gmail.com>
Message-ID: <c0773fd30912150130y1b311bd7p2cc6aa90784a5497@mail.gmail.com>

On Tue, Dec 15, 2009 at 5:27 AM, Rajagopal Swaminathan <
raju.rajsand at gmail.com> wrote:

> Greetings,
>
> I am not an expert in cluster.
>
> On Mon, Dec 14, 2009 at 3:06 PM, Brett Cave <brettcave at gmail.com> wrote:
> > I am using ilo fencing to reset servers, so this is more power fencing
> than
> > fabric fencing?
>
> Following is my understanding of fencing:
>
> There are three type of fencing (excluding manual):
> 1. Power fencing -- using IP enabled power strips
> 2. In-band fencing -- using RSA, ILO, IPMI and the such -- sorta power
> fencing
> 3. Storage fencing -- using the SAN fabric Switch
>

Ah, ok - in-band fencing it is, thanks. With regards to my previous post, I
assumed that the name applied to the fencing method of a node had some sort
of impact, but from what i can see, it seems to just be for reference.

I used to maintain consistency in the config file (which i editted
manually):
<clusternode ...><fence><method name="fabric">....</></>
the method name was originally fabric as i was going to use san fabric
switching, but this did not work - ILO fencing works well for us however, so
"inband" would be a more relevant name.

we are making changes that now result in configuration being updated via
ccs_tool, which doesn't seem to provide an parameter to configure the method
name, and defaults to "single" - so this is inconsistent with naming, but
with no real functional affects on the config.

thanks again.

>
> Please correct me if I am wrong
>
> Regards
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091215/1665917e/attachment.htm>

From jakov.sosic at srce.hr  Tue Dec 15 10:58:34 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Tue, 15 Dec 2009 11:58:34 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260828909.6558.24.camel@mecatol>
References: <1260828909.6558.24.camel@mecatol>
Message-ID: <1260874714.9719.1.camel@localhost>

On Mon, 2009-12-14 at 23:15 +0100, Rafael Mic? Miranda wrote:

> - Using an LVM-Mirror device as a Qdisk and creating additional LUNs for
> mirror and log in both storage arrays: if the Qdisk is a Clustered
> Logical Volume,

But is it possible to have clustered LVM-mirror? And if so, how? I would
be very interested in something like that...


Sorry that I haven't helped you out with this one, but if there is
possibility to have mirrored volumes I would be very interested...
Because it would solve lot of my problems...



-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From apfaffeneder at pfaffeneder.org  Tue Dec 15 14:31:24 2009
From: apfaffeneder at pfaffeneder.org (Andreas Pfaffeneder)
Date: Tue, 15 Dec 2009 15:31:24 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260828909.6558.24.camel@mecatol>
References: <1260828909.6558.24.camel@mecatol>
Message-ID: <4B279DBC.4090102@pfaffeneder.org>

Hi Rafael,

Am 14.12.2009 23:15, schrieb Rafael Mic? Miranda:
> Hi all,
>
> I was wondering if there is a way to achieve a "quorum disk over a RAID
> software device" working CMAN cluster.
>
>    
in a similar situation I am using a raid-1 device (built with mdadm 
prior to the startup of cman/rgmanager) which consists of two luns, one 
in each location. This works pretty well as quorum-device.

Andreas



From brem.belguebli at gmail.com  Tue Dec 15 16:21:34 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Tue, 15 Dec 2009 17:21:34 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <4B279DBC.4090102@pfaffeneder.org>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org>
Message-ID: <29ae894c0912150821w607b9edcv75443186035b0c5c@mail.gmail.com>

Hi,

The problem you could encounter is the network and storage split brain.

If your Qdsik LUNs were hosted by 2 arrays located in 2 different
rooms or site, each room hosting half the nodes of your cluster, in
case a SAN and network partition occurs between the 2 rooms, you'll
find yourself in a perfect storage and network split brain.

Each room having the same number of nodes and accessing one leg of
your qdisk, each qdisk leg being seen "alive" by the nodes in the
room.

Brem


2009/12/15 Andreas Pfaffeneder <apfaffeneder at pfaffeneder.org>:
> Hi Rafael,
>
> Am 14.12.2009 23:15, schrieb Rafael Mic? Miranda:
>>
>> Hi all,
>>
>> I was wondering if there is a way to achieve a "quorum disk over a RAID
>> software device" working CMAN cluster.
>>
>>
>
> in a similar situation I am using a raid-1 device (built with mdadm prior to
> the startup of cman/rgmanager) which consists of two luns, one in each
> location. This works pretty well as quorum-device.
>
> Andreas
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From jakov.sosic at srce.hr  Tue Dec 15 16:26:54 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Tue, 15 Dec 2009 17:26:54 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <4B279DBC.4090102@pfaffeneder.org>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org>
Message-ID: <1260894414.1878.1.camel@localhost>

On Tue, 2009-12-15 at 15:31 +0100, Andreas Pfaffeneder wrote:

> in a similar situation I am using a raid-1 device (built with mdadm 
> prior to the startup of cman/rgmanager) which consists of two luns, one 
> in each location. This works pretty well as quorum-device.

So you have to create mdraid on every node of the cluster? But, is that
legitimate way of doing things - because mdraid isn't cluster aware?
It's like having a LVM without using clustered volumes... It's ok as
long as you don't change metadata...

What about mdraid?



-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From rmicmirregs at gmail.com  Tue Dec 15 18:51:16 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Tue, 15 Dec 2009 19:51:16 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260874714.9719.1.camel@localhost>
References: <1260828909.6558.24.camel@mecatol>
	<1260874714.9719.1.camel@localhost>
Message-ID: <1260903076.7153.1.camel@mecatol>

Hi Jakov,

El mar, 15-12-2009 a las 11:58 +0100, Jakov Sosic escribi?:
> On Mon, 2009-12-14 at 23:15 +0100, Rafael Mic? Miranda wrote:
> 
> > - Using an LVM-Mirror device as a Qdisk and creating additional LUNs for
> > mirror and log in both storage arrays: if the Qdisk is a Clustered
> > Logical Volume,
> 
> But is it possible to have clustered LVM-mirror? And if so, how? I would
> be very interested in something like that...
> 
> 
> Sorry that I haven't helped you out with this one, but if there is
> possibility to have mirrored volumes I would be very interested...
> Because it would solve lot of my problems...
> 
> 
> 

Maybe this helps:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirrored_volumes.html
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirvol_create_ex.html

The point is, as I said, I cannot use a clustered logical volume (I mean
a logical volume over a clustered volume group) because it wont be
available as CMAN starts.

Cheers,

Rafael

-- 
Rafael Mic? Miranda



From rmicmirregs at gmail.com  Tue Dec 15 19:01:00 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Tue, 15 Dec 2009 20:01:00 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <4B279DBC.4090102@pfaffeneder.org>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org>
Message-ID: <1260903660.7153.12.camel@mecatol>

Hi Andreas

El mar, 15-12-2009 a las 15:31 +0100, Andreas Pfaffeneder escribi?:
> Hi Rafael,
> 
> Am 14.12.2009 23:15, schrieb Rafael Mic? Miranda:
> > Hi all,
> >
> > I was wondering if there is a way to achieve a "quorum disk over a RAID
> > software device" working CMAN cluster.
> >
> >    
> in a similar situation I am using a raid-1 device (built with mdadm 
> prior to the startup of cman/rgmanager) which consists of two luns, one 
> in each location. This works pretty well as quorum-device.
> 
> Andreas
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

Today I tried this approach, but I have no previous experience with
MDADM. The problem I found was, how do you manage multipath on the
different LUNs without device-mapper-multipath?

As I see in the system logs the MD driver loads before
device-mapper-multipath is working, and it uses the devices (/dev/sdX)
to assemble the RAID devices (/dev/mdX) before the devices are available
through device-mapper-multipath (/dev/mapper/quorumdiskX) at system boot
time. This even makes the device-mapper-multipath devices not getting
built when multipathd starts after that. Does this happen to you? 

So, I tried to configure a multipath device with MDADM and after that
use the built devices to assemble a RAID1 MD device. This was the config
in mdadm.conf I used:

DEVICE /dev/sd*

ARRAY /dev/md1 metadata=1.1 level=multipath num-devices=2
name=multipath01 
ARRAY /dev/md2 metadata=1.1 level=multipath num-devices=2
name=multipath02
ARRAY /dev/md3 metadata=1.2 level=raid1 num-devices=2 name=quorum
devices=/dev/md1,/dev/md2

Devices where all with the "Linux raid auto" partition mark. When I
built the MDs first everything seemed to work, but after a system boot
the /dev/md3 device was not built, so the quorum disk would not be
available for CMAN.

How do you implement this with MDADM?

Thanks in advance,

Rafael

-- 
Rafael Mic? Miranda



From rmicmirregs at gmail.com  Tue Dec 15 19:10:16 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Tue, 15 Dec 2009 20:10:16 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <29ae894c0912150821w607b9edcv75443186035b0c5c@mail.gmail.com>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org>
	<29ae894c0912150821w607b9edcv75443186035b0c5c@mail.gmail.com>
Message-ID: <1260904216.7153.22.camel@mecatol>

Hi Brem

El mar, 15-12-2009 a las 17:21 +0100, brem belguebli escribi?:
> Hi,
> 
> The problem you could encounter is the network and storage split brain.
> 
> If your Qdsik LUNs were hosted by 2 arrays located in 2 different
> rooms or site, each room hosting half the nodes of your cluster, in
> case a SAN and network partition occurs between the 2 rooms, you'll
> find yourself in a perfect storage and network split brain.
> 
> Each room having the same number of nodes and accessing one leg of
> your qdisk, each qdisk leg being seen "alive" by the nodes in the
> room.
> 
> Brem
> 

> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

I thought about this. In my situation:

- All the nodes are in the same site.
- All the nodes are connected to the two storage arrays via the same FC
switches in a symmetric way. 
- All the nodes have their network interfaces connected to the same
couple of Ethernet Switches in a symmetric way via bonding.

I think the probability of failing exactly the devices that should fail
(5 exact FC ports in one FC switch, another 5 on the another FC switch
and a "split" in the Ethernet switches themselves exactly dividing the
nodes in groups of 3) is pretty small.

I see you exposed your point with the idea of a multi-site cluster with
the 2 qdisk LUNs placed in different sites and cluster nodes in both of
them, but this is not the case. But that is, in fact, a really
interesting scenario :)

Thanks for your interest. Cheers,

Rafael

-- 
Rafael Mic? Miranda



From rmicmirregs at gmail.com  Tue Dec 15 19:23:23 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Tue, 15 Dec 2009 20:23:23 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260894414.1878.1.camel@localhost>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost>
Message-ID: <1260905003.7153.34.camel@mecatol>

Hi Jacov

El mar, 15-12-2009 a las 17:26 +0100, Jakov Sosic escribi?:
> On Tue, 2009-12-15 at 15:31 +0100, Andreas Pfaffeneder wrote:
> 
> > in a similar situation I am using a raid-1 device (built with mdadm 
> > prior to the startup of cman/rgmanager) which consists of two luns, one 
> > in each location. This works pretty well as quorum-device.
> 
> So you have to create mdraid on every node of the cluster? But, is that
> legitimate way of doing things - because mdraid isn't cluster aware?
> It's like having a LVM without using clustered volumes... It's ok as
> long as you don't change metadata...
> 
> What about mdraid?
> 
> 
> 

As I see, in this situation of the usage of the shared storage volume as
a Qdisk there is no problem of the system being "not cluster aware". I
mean: a usual qdisk is a LUN with a "clustered" filesystem, to say it in
some way, in which all the cluster nodes can write an read at the same
time.

If you don't plan to change the LVM metadata of the qdisk (I don't) I
think this will be feasible. The same should happen with the MDADM
variant.

Today I configured a not-clustered volume group and then I built a
mirrored logical volume over it and configured it as a Qdisk. Then I
started CMAN and it worked OK using the LVM-mirror qdisk. 

Tomorrow (I hope) I'll do some tests to see what happens if only one of
the nodes loses one of the LUNs which build the LVM-mirror volume, and
what happens when the LUN is back.

Thanks for your interest. Cheers,

Rafael

-- 
Rafael Mic? Miranda



From apfaffeneder at pfaffeneder.org  Tue Dec 15 19:38:26 2009
From: apfaffeneder at pfaffeneder.org (Andreas Pfaffeneder)
Date: Tue, 15 Dec 2009 20:38:26 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260903660.7153.12.camel@mecatol>
References: <1260828909.6558.24.camel@mecatol>	<4B279DBC.4090102@pfaffeneder.org>
	<1260903660.7153.12.camel@mecatol>
Message-ID: <4B27E5B2.20005@pfaffeneder.org>

Am 15.12.2009 20:01, schrieb Rafael Mic? Miranda:
> in a similar situation I am using a raid-1 device (built with mdadm
>> prior to the startup of cman/rgmanager) which consists of two luns, one
>> in each location. This works pretty well as quorum-device.
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>      
> Today I tried this approach, but I have no previous experience with
> MDADM. The problem I found was, how do you manage multipath on the
> different LUNs without device-mapper-multipath?
> [
>    
[...]
> How do you implement this with MDADM?
>
>    
with a custom init-script which runs after multipathd but before 
cman/rgmanager.

Andreas



From brem.belguebli at gmail.com  Tue Dec 15 20:15:30 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Tue, 15 Dec 2009 21:15:30 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260905003.7153.34.camel@mecatol>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost>
	<1260905003.7153.34.camel@mecatol>
Message-ID: <29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com>

Hi Rafael,

I can already predict what is going to happen during your test

I one of your nodes looses only 1 leg of your mirrored qdisk (either
with mdadm or lvm), the qdisk will still be active from the point of
view of this particular node, so nothing will happen.

What you should consider is

1) reducing the scsi timeout of the lun which is by default around 60
seconds (see udev rules)
2) if your qdisk lun is configured to multipath, don't configure it
with queue_if_no_path or mdadm will never see if one of the legs came
to be unavail.

Brem

2009/12/15 Rafael Mic? Miranda <rmicmirregs at gmail.com>:
> Hi Jacov
>
> El mar, 15-12-2009 a las 17:26 +0100, Jakov Sosic escribi?:
>> On Tue, 2009-12-15 at 15:31 +0100, Andreas Pfaffeneder wrote:
>>
>> > in a similar situation I am using a raid-1 device (built with mdadm
>> > prior to the startup of cman/rgmanager) which consists of two luns, one
>> > in each location. This works pretty well as quorum-device.
>>
>> So you have to create mdraid on every node of the cluster? But, is that
>> legitimate way of doing things - because mdraid isn't cluster aware?
>> It's like having a LVM without using clustered volumes... It's ok as
>> long as you don't change metadata...
>>
>> What about mdraid?
>>
>>
>>
>
> As I see, in this situation of the usage of the shared storage volume as
> a Qdisk there is no problem of the system being "not cluster aware". I
> mean: a usual qdisk is a LUN with a "clustered" filesystem, to say it in
> some way, in which all the cluster nodes can write an read at the same
> time.
>
> If you don't plan to change the LVM metadata of the qdisk (I don't) I
> think this will be feasible. The same should happen with the MDADM
> variant.
>
> Today I configured a not-clustered volume group and then I built a
> mirrored logical volume over it and configured it as a Qdisk. Then I
> started CMAN and it worked OK using the LVM-mirror qdisk.
>
> Tomorrow (I hope) I'll do some tests to see what happens if only one of
> the nodes loses one of the LUNs which build the LVM-mirror volume, and
> what happens when the LUN is back.
>
> Thanks for your interest. Cheers,
>
> Rafael
>
> --
> Rafael Mic? Miranda
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From jakov.sosic at srce.hr  Wed Dec 16 00:02:19 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Wed, 16 Dec 2009 01:02:19 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260903076.7153.1.camel@mecatol>
References: <1260828909.6558.24.camel@mecatol>
	<1260874714.9719.1.camel@localhost> <1260903076.7153.1.camel@mecatol>
Message-ID: <1260921739.2754.1.camel@localhost>

On Tue, 2009-12-15 at 19:51 +0100, Rafael Mic? Miranda wrote:

> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirrored_volumes.html
> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirvol_create_ex.html

Thank you...

This seems as a good replacement for DRBD, except that after one side of
the mirror failse, whole logical volume would be synced from the start
(because I presume there is no wfbitmap like in drbd)?


-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From kkovachev at varna.net  Wed Dec 16 11:41:11 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Wed, 16 Dec 2009 13:41:11 +0200
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260921739.2754.1.camel@localhost>
References: <1260828909.6558.24.camel@mecatol>
	<1260874714.9719.1.camel@localhost>
	<1260903076.7153.1.camel@mecatol>
	<1260921739.2754.1.camel@localhost>
Message-ID: <20091216104820.M24883@varna.net>

On Wed, 16 Dec 2009 01:02:19 +0100, Jakov Sosic wrote
> On Tue, 2009-12-15 at 19:51 +0100, Rafael [UTF-8?]Mic?? Miranda wrote:
> 
> > 
[1]
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirrored_volumes.html
> >
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirvol_create_ex.html
> 
> Thank you...
> 
> This seems as a good replacement for DRBD, except that after one side of
> the mirror failse, whole logical volume would be synced from the start
> (because I presume there is no wfbitmap like in drbd)?
> 

from [1] "An LVM mirror divides the device being copied into regions that are
typically 512KB in size. LVM maintains a small log which it uses to keep track
of which regions are in sync with the mirror or mirrors. This log can be kept
on disk, which will keep it persistent across reboots, or it can be maintained
in memory." - so they shouldn't be synced from start

About the 6 node cluster - do you really need to have it operational with just
a single node? If this is not mandatory it might be better to use different
votes for the nodes to break the tie instead of mirrored qdisk (one more place
for split brain) ... like 3 nodes with 2 votes and the others with 3 votes or
a combination with non mirrored qdisk (with 4 votes)

> -- 
> |    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
> =================================================================
> |                                                               |
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From Alain.Moulle at bull.net  Wed Dec 16 13:53:27 2009
From: Alain.Moulle at bull.net (Alain.Moulle)
Date: Wed, 16 Dec 2009 14:53:27 +0100
Subject: [Linux-cluster] Question about openais rrp_mode
Message-ID: <4B28E657.3090400@bull.net>

Hi,
> man openais.conf : rrp_mode
>               This specifies the mode of redundant ring, which may  
> be  none,
>               active,  or  passive.  Active replication offers 
> slightly lower
>               latency from transmit to delivery in  faulty  network  
> environ-
>               ments  but  with  less  performance.   Passive  
> replication may
>               nearly double the speed of the totem protocol if  the  
> protocol
>               doesn't  become  cpu bound.
Not completely clear for me:  does that mean that "active mode" makes it
send the totems systematically on both networks, and "passive mode" makes
it send on the first interface ringnumber (in openais.conf) and only on
the second interface rignnumber if the first is broken ?
Could someone give more precise information ?
or where can I find more information about this ?

And by the way, is there any issue to use to set a first interface 
ringnumber
on Ethernet (eth0) and a second on IP/Infiniband ?

Thanks for your response.
Alain Moull?



From gianluca.cecchi at gmail.com  Wed Dec 16 14:51:06 2009
From: gianluca.cecchi at gmail.com (Gianluca Cecchi)
Date: Wed, 16 Dec 2009 15:51:06 +0100
Subject: [Linux-cluster] actions to be taken when changing fence devices ip
	address
Message-ID: <561c252c0912160651y79ca70fk618173542a249464@mail.gmail.com>

Hello,
I'm using RHEL 5.4 based cluster.
I'm using fence_ilo fence device and I'm going to change ip address
for the iLO of one node of the cluster.

Is this action supposed to be made near-online, in the sense that I
have not to shutdown all the cluster nodes?
Idea would be:
1) services remains on node where iLO ip doesn't change
2) shutdown and change iLO ip of the other node (actually it is a
server swap maintaining its disks)
3) on first node change cluster.conf and issue of
    ccs_tool update /etc/cluster/cluster.conf
    cman_tool version -r <new_version_number>
4) power on of second node

will action 4) give an automatic join with new config of node 2 to the
cluster? Or do I have to make anything with fenced to reload its
config?

My question arises from past experience with need of changing qdisk
parameters in cluster.conf: this requires a qdiskd restart, with steps
in 3) not being sufficient...
Do I have to restart fenced? In this case does this produce any
problem/relocation?

Thanks in advance,
Gianluca



From jakov.sosic at srce.hr  Wed Dec 16 17:50:09 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Wed, 16 Dec 2009 18:50:09 +0100
Subject: [Linux-cluster] 2 pptp links on two hosts
Message-ID: <1260985809.2168.6.camel@localhost>

Hi.

I have two pptp links on two hosts. Hosts are frontends (gateways,
firewalls, NAT) for some network. Hosts also must be gateways for all
the VLANs.

Now, two things in this case can fail - one host, or for example it's
pptp route, in which case again gateway and static routes should be
transferred to the secondary node.

Is there a way to solve this with RHCS, or is there any more appropriate
software for this kind of failover?

My initial idea when I heard the problem was to write something like
init script, which in status part pings some address behind PPTP link,
and if ping is OK, than service is considered OK. Now, if for some
reason ping fails, status wouldn't be 0, and RHCS would apply relocate
policy, stop the script on primary and start it on secondary. stop
function would delete all the routes, and start would set appropriate
static routes. RHCS itself would take care of the floating address of
the gateway.

I wonder if you have any experience with this kind of setup, or any
ideas if this could be done in any better way?

Thank you.




-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From rmicmirregs at gmail.com  Wed Dec 16 18:50:26 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Wed, 16 Dec 2009 19:50:26 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <20091216104820.M24883@varna.net>
References: <1260828909.6558.24.camel@mecatol>
	<1260874714.9719.1.camel@localhost> <1260903076.7153.1.camel@mecatol>
	<1260921739.2754.1.camel@localhost> <20091216104820.M24883@varna.net>
Message-ID: <1260989426.6687.3.camel@mecatol>

Hi Kaloyan

El mi?, 16-12-2009 a las 13:41 +0200, Kaloyan Kovachev escribi?:

> About the 6 node cluster - do you really need to have it operational with just
> a single node? If this is not mandatory it might be better to use different
> votes for the nodes to break the tie instead of mirrored qdisk (one more place
> for split brain) ... like 3 nodes with 2 votes and the others with 3 votes or
> a combination with non mirrored qdisk (with 4 votes)
> 
> >

Well, this is a thing I have to think about. Maybe only one node cannot
give the full service due to load and performance reasons, but I think
the Qdisk is a must in the service for availability reasons. I'll take
note on your recommendation and maybe i change the votes to make the
minimal number of nodes higher, possibly 2.

Thanks!

Rafael

-- 
Rafael Mic? Miranda



From rmicmirregs at gmail.com  Wed Dec 16 19:09:04 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Wed, 16 Dec 2009 20:09:04 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost>
	<1260905003.7153.34.camel@mecatol>
	<29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com>
Message-ID: <1260990544.6687.23.camel@mecatol>

Hi Brem

El mar, 15-12-2009 a las 21:15 +0100, brem belguebli escribi?:
> Hi Rafael,
> 
> I can already predict what is going to happen during your test
> 
> I one of your nodes looses only 1 leg of your mirrored qdisk (either
> with mdadm or lvm), the qdisk will still be active from the point of
> view of this particular node, so nothing will happen.
> 
> What you should consider is
> 
> 1) reducing the scsi timeout of the lun which is by default around 60
> seconds (see udev rules)
> 2) if your qdisk lun is configured to multipath, don't configure it
> with queue_if_no_path or mdadm will never see if one of the legs came
> to be unavail.
> 
> Brem

> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

I made some tests today. 

A) With MDADM mirrored LUNs:
I built the MD device over the multipathd devices and used it as a
quorum disk. It seemed to work, but in a test during the intentioned
failure of a LUN on a single machine the node failed to access the
quorum device, so it was evicted by the rest of the nodes. I have to
take a closer look to this because in other attempts it didn't happen, I
think this is realated with the device timeouts, retries and queues.

B) With non-clustered LVM-Mirrored LUNs:
Seems to work too, but there are some strange behaviours. During the
intentioned failure of a LUN on a single machine the node did not see
the failure at the LVM layer of one device not being reachable, but the
multipath daemon was marking the device as failed. In other attempts it
worked right.

Also I have to check, as you commented, the values at the udev rules and
multipath.conf file:

device {
vendor			"HP"
product			"MSA VOLUME" 
path_grouping_policy	group_by_prio
getuid_callout		"/sbin/scsi_id -g -u -s /block/%n"
path_checket		tur
patch_selector		"round_robin 0"
prio_callout		"/sbin/mpath_prio_alua /dev/%n"
rr_weight		uniform
failback		immediate
hardware_handler	"0"
no_path_retry		12
rr_min_io		100
}

Note: this is my testing scenario. The production environment is not
using MSA storage arrays.

I'm thinking in reducing the "no_path_retry" to a smaller value or even
to "fail". With the current value (equivalent to "queue_if_no_path" of
12 regarding RHEL docs) MDADM saw the failure of the device, so this is
more or less working. 
I'm interested too in the "flush_on_last_del" parameter, have you ever
tried it?

Thanks in advance. Cheers,

Rafael

-- 
Rafael Mic? Miranda



From brem.belguebli at gmail.com  Wed Dec 16 19:13:02 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Wed, 16 Dec 2009 20:13:02 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260989426.6687.3.camel@mecatol>
References: <1260828909.6558.24.camel@mecatol>
	<1260874714.9719.1.camel@localhost> <1260903076.7153.1.camel@mecatol>
	<1260921739.2754.1.camel@localhost> <20091216104820.M24883@varna.net>
	<1260989426.6687.3.camel@mecatol>
Message-ID: <29ae894c0912161113u76528a3em5532005a6407b177@mail.gmail.com>

Rafael,

What ou have to take care about is the following.

Imagine your SAN admin modifies the wrong zoning while doing his job,
making the qdisk (both legs) unavailable for your nodes, and at this
time you have one node off because of maintenance operation, your
whole cluster would go down.

Brem

2009/12/16 Rafael Mic? Miranda <rmicmirregs at gmail.com>:
> Hi Kaloyan
>
> El mi?, 16-12-2009 a las 13:41 +0200, Kaloyan Kovachev escribi?:
>
>> About the 6 node cluster - do you really need to have it operational with just
>> a single node? If this is not mandatory it might be better to use different
>> votes for the nodes to break the tie instead of mirrored qdisk (one more place
>> for split brain) ... like 3 nodes with 2 votes and the others with 3 votes or
>> a combination with non mirrored qdisk (with 4 votes)
>>
>> >
>
> Well, this is a thing I have to think about. Maybe only one node cannot
> give the full service due to load and performance reasons, but I think
> the Qdisk is a must in the service for availability reasons. I'll take
> note on your recommendation and maybe i change the votes to make the
> minimal number of nodes higher, possibly 2.
>
> Thanks!
>
> Rafael
>
> --
> Rafael Mic? Miranda
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From brem.belguebli at gmail.com  Wed Dec 16 19:41:26 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Wed, 16 Dec 2009 20:41:26 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <1260990544.6687.23.camel@mecatol>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost>
	<1260905003.7153.34.camel@mecatol>
	<29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com>
	<1260990544.6687.23.camel@mecatol>
Message-ID: <29ae894c0912161141t1085baf7t6bbba32a82820bc1@mail.gmail.com>

In my multipath setup I use the following :

polling_interval        3 (checks the storage every 3 seconds)
no_path_retry     5 (will check 5 times the path if failure happens on
it, making it  last scsi_timer (/sys/block/sdXX/device/timeout) + 5*3
secondes )

path_grouping_policy    multibus (to load-balance accross all paths,
group_by_prio may be recommended with MSA if it is an active/passive
array?)

>From my experience,
no_path_retry, when using mirror (md or LVM) could be put to fail
instead of 5 in my case.

Concerning the flush_on_last_del, it just means that for a given LUN,
when there is only one path remaining, if it comes to fail, what
behaviour to adopt.

Same consideration, if using mirror, just fail.

The thing to take into account is the interval at which your qdisk
process accesses the qdisk lun, if configured to a high value (let's
imagine every 65 seconds) it'll take (worst case) 60 seconds of scsi
timeout (default) + 12 times default polling interval  (30 seconds if
I'm not wrong) + 5 seconds=  425 seconds.....

Brem

2009/12/16 Rafael Mic? Miranda <rmicmirregs at gmail.com>:
> Hi Brem
>
> El mar, 15-12-2009 a las 21:15 +0100, brem belguebli escribi?:
>> Hi Rafael,
>>
>> I can already predict what is going to happen during your test
>>
>> I one of your nodes looses only 1 leg of your mirrored qdisk (either
>> with mdadm or lvm), the qdisk will still be active from the point of
>> view of this particular node, so nothing will happen.
>>
>> What you should consider is
>>
>> 1) reducing the scsi timeout of the lun which is by default around 60
>> seconds (see udev rules)
>> 2) if your qdisk lun is configured to multipath, don't configure it
>> with queue_if_no_path or mdadm will never see if one of the legs came
>> to be unavail.
>>
>> Brem
>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> I made some tests today.
>
> A) With MDADM mirrored LUNs:
> I built the MD device over the multipathd devices and used it as a
> quorum disk. It seemed to work, but in a test during the intentioned
> failure of a LUN on a single machine the node failed to access the
> quorum device, so it was evicted by the rest of the nodes. I have to
> take a closer look to this because in other attempts it didn't happen, I
> think this is realated with the device timeouts, retries and queues.
>
> B) With non-clustered LVM-Mirrored LUNs:
> Seems to work too, but there are some strange behaviours. During the
> intentioned failure of a LUN on a single machine the node did not see
> the failure at the LVM layer of one device not being reachable, but the
> multipath daemon was marking the device as failed. In other attempts it
> worked right.
>
> Also I have to check, as you commented, the values at the udev rules and
> multipath.conf file:
>
> device {
> vendor ? ? ? ? ? ? ? ? ?"HP"
> product ? ? ? ? ? ? ? ? "MSA VOLUME"
> path_grouping_policy ? ?group_by_prio
> getuid_callout ? ? ? ? ?"/sbin/scsi_id -g -u -s /block/%n"
> path_checket ? ? ? ? ? ?tur
> patch_selector ? ? ? ? ?"round_robin 0"
> prio_callout ? ? ? ? ? ?"/sbin/mpath_prio_alua /dev/%n"
> rr_weight ? ? ? ? ? ? ? uniform
> failback ? ? ? ? ? ? ? ?immediate
> hardware_handler ? ? ? ?"0"
> no_path_retry ? ? ? ? ? 12
> rr_min_io ? ? ? ? ? ? ? 100
> }
>
> Note: this is my testing scenario. The production environment is not
> using MSA storage arrays.
>
> I'm thinking in reducing the "no_path_retry" to a smaller value or even
> to "fail". With the current value (equivalent to "queue_if_no_path" of
> 12 regarding RHEL docs) MDADM saw the failure of the device, so this is
> more or less working.
> I'm interested too in the "flush_on_last_del" parameter, have you ever
> tried it?
>
> Thanks in advance. Cheers,
>
> Rafael
>
> --
> Rafael Mic? Miranda
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From siedler at hrd-asia.com  Thu Dec 17 07:41:13 2009
From: siedler at hrd-asia.com (Wolf Siedler)
Date: Thu, 17 Dec 2009 15:41:13 +0800
Subject: [Linux-cluster] Cluster config. advice for sought
Message-ID: <4B29E099.6090105@hrd-asia.com>

Dear all:

I am new to this list and cluster technology. Anyway, I managed to get a
cluster set up based on CentOS 5 with two nodes which worked very well
for several months.
Even several CentOS update rounds (all within version 5) worked flawlessly.
The cluster contains three paravirtualized Xen-based virtual machines in
an iSCSI storage vault. Even failover and failback worked perfectly.
Cluster control/management was handled by a separate standalone PC
running Conga.

Both cluster nodes and the adminpc are running CentOS5. After another
CentOS upgrade round in October, the cluster wouldn't start anymore. We
got that solved (cman would't start, but a newer openais package -
0.80.6 - let us overcome that by manual update), but now the virtual
machines always get started on all nodes simultaneously. Furthermore,
something in Conga setup also seems to have broken: The Conga
webinterface at the separate adminpc can still be accessed, but fails
when probing storage (broken ricci/luci communication?)
This never happened before the upgrade and we had changed neither
hardware nor software configuration during the update. Unfortunately, I
don't have access to the testing system anymore (but we *did* a lot of
testing before putting the system in production use).

I would appreciate if more experienced persons could review our
configuration and point out any errors or improvements:

The cluster has two nodes (station1, station2) and one standalone PC for
administration running Conga (adminpc). The nodes are standard Dell 1950
servers.
Main storage location is a Dell storage vault which is accessed via
iSCSI and mounted on both nodes as /rootfs/. The file system is GFS2.
Furthermore, it provides a quorum partition.
Fencing is handled via the included DRAC remote access boards.
There are three paravirtualized Xen-based virtual machines
(vm_mailserver, vm_ldapserver, vm_adminserver). Their container files 
are located at /rootfs/vmadminserver etc. The VMs are supposed to start
distributed on station1 (vm_mailserver) and station2 (vm_ldapserver,
vm_adminserver).

Software versions (identical on both nodes):
kernel 2.6.18-164.el5xen
openais-0.80.6-8.el5
cman-2.0.115-1.el5
rgmanager-2.0.52-1.el5.centos
xen-3.0.3-80.el5-3.3
xen-libs-3.0.3-80.el5-3.3
luci-0.12.1-7.3.el5.centos.1
ricci-0.12.1-7.3.el5.centos.1
gfs2-utils-0.1.62-1.el5

Before the CentOS update, the working cluster.conf was:
===quote nonworking cluster.conf===
<?xml version="1.0"?>
<cluster alias="example_cluster_1" config_version="81"
name="example_cluster_1">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
    <clusternodes>
        <clusternode name="station1.example.com" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="station1_fenced"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="station2.example.com" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="station2_fenced"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="3" two_node="0"/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.91"
login="ipmi_admin" name="station1_fenced" operation="off" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.92"
login="ipmi_admin" name="station2_fenced" operation="off" passwd="secret"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="bias-station1" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station1.example.com"
priority="1"/>
            </failoverdomain>
            <failoverdomain name="bias-station2" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station2.example.com"
priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources/>
        <vm autostart="1" domain="bias-station1" exclusive="0"
migrate="live" name="vm_mailserver" path="/rootfs" recovery="restart"/>
        <vm autostart="1" domain="bias-station2" exclusive="0"
migrate="live" name="vm_ldapserver" path="/rootfs" recovery="restart"/>
        <vm autostart="1" domain="bias-station2" exclusive="0"
migrate="live" name="vm_adminserver" path="/rootfs" recovery="restart"/>
    </rm>
    <quorumd interval="3" label="xen_qdisk" min_score="1" tko="23"
votes="1"/>
</cluster>
===unquote nonworking cluster.conf===

A explained, this configuration worked flawlessly for 10 months.
Only after the CentOS update, it started the virtual machines
simultaneously on both station1 *and* station2 and not distributed as
per the <vm ...> directive. We temporarily worked arounf this problem by
changing the autostart parameter to <vm autostart="0" ...>.
At least this brought our cluster back to running, but we lost the
desired automatic restart should a system hang. And failover also
doesn't seem to work anymore.

I read several messages on this list where users seem to have had a
similar problem. It seems to me as if I had missed the use_virsh="0"
statement.

Hence my question: Is the following a valid cluster.conf for such a
setup (distributed VMs, automatic start, failover/failback):
===quote===
<?xml version="1.0"?>
<cluster alias="example_cluster_1" config_version="81"
name="example_cluster_1">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
    <clusternodes>
        <clusternode name="station1.example.com" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="station1_fenced"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="station2.example.com" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="station2_fenced"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="3" two_node="0"/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.91"
login="ipmi_admin" name="station1_fenced" operation="off" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.92"
login="ipmi_admin" name="station2_fenced" operation="off" passwd="secret"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="bias-station1" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station1.example.com"
priority="1"/>
            </failoverdomain>
            <failoverdomain name="bias-station2" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station2.example.com"
priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources/>
        <vm autostart="1" use_virsh="0" domain="bias-station1"
exclusive="0" migrate="live" name="vm_mailserver" path="/rootfs"
recovery="restart"/>
        <vm autostart="1" use_virsh="0" domain="bias-station2"
exclusive="0" migrate="live" name="vm_ldapserver" path="/rootfs"
recovery="restart"/>
        <vm autostart="1" use_virsh="0" domain="bias-station2"
exclusive="0" migrate="live" name="vm_adminserver" path="/rootfs"
recovery="restart"/>
    </rm>
    <quorumd interval="3" label="xen_qdisk" min_score="1" tko="23"
votes="1"/>
</cluster>
===unquote===

I am open to further updates/testing and will gladly provide additional
details should if needed.
But as this setup also contains production systems, I want to avoid any
fundamental mistakes/oversights.

Needless to say, I would appreciate any feedback/suggestions!

Regards,
Wolf



From siedler at hrd-asia.com  Thu Dec 17 10:14:43 2009
From: siedler at hrd-asia.com (Wolf Siedler)
Date: Thu, 17 Dec 2009 18:14:43 +0800
Subject: [Linux-cluster] Re: Cluster config. advice for sought (2)
In-Reply-To: <4B29E099.6090105@hrd-asia.com>
References: <4B29E099.6090105@hrd-asia.com>
Message-ID: <4B2A0493.6040409@hrd-asia.com>

As follow up to my earlier question:

===
Station1 - output clustat:
Cluster Status for example_cluster_1 @ Thu Dec 17 18:07:44 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 station1.example.com                                               1
Online, Local, rgmanager
 station2.example.com                                               2
Online, rgmanager
 /dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1        0
Online, Quorum Disk

 Service Name                                                     Owner
(Last)                                                     State        
 ------- ----                                                     -----
------                                                     -----        
 vm:vm_adminserver                                               
(none)                                                          
disabled     
 vm:vm_ldapserver                                                
(none)                                                          
disabled     
 vm:vm_mailserver                                                
(none)                                                           disabled
===
Station1 - output xm li:
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      768     1 r----- 410641.5
vm_mailserver                              3     2047     4 -b---- 833206.1
===
Station2 - output clustat:
Cluster Status for example_cluster_1 @ Thu Dec 17 17:37:15 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 station1.example.com                                               1
Online, rgmanager
 station2.example.com                                               2
Online, Local, rgmanager
 /dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1        0
Online, Quorum Disk

 Service Name                                                     Owner
(Last)                                                     State        
 ------- ----                                                     -----
------                                                     -----        
 vm:vm_adminserver                                               
(none)                                                          
disabled     
 vm:vm_ldapserver                                                
(none)                                                          
disabled     
 vm:vm_mailserver                                                
(none)                                                          
disabled     
===
Station2 - output xm li:
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      768     1 r----- 384845.0
vm_adminserver                             6     1023     1 -b----  76745.5
vm_ldapserver                              4     1023     1 -b----  22685.6
===

Hope this provides better insight.

Regards,
Wolf



From brem.belguebli at gmail.com  Thu Dec 17 10:22:50 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Thu, 17 Dec 2009 11:22:50 +0100
Subject: [Linux-cluster] Re: Cluster config. advice for sought (2)
In-Reply-To: <4B2A0493.6040409@hrd-asia.com>
References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com>
Message-ID: <29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com>

I think it is better if you post your cluster.conf.

Try to look in linux-cluster archive, your problem looks similar to
some others that were posted around October/November.

There were things to check with use_virsh, path etc... in the cluster.conf...



2009/12/17 Wolf Siedler <siedler at hrd-asia.com>:
> As follow up to my earlier question:
>
> ===
> Station1 - output clustat:
> Cluster Status for example_cluster_1 @ Thu Dec 17 18:07:44 2009
> Member Status: Quorate
>
> ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status
> ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------
> ?station1.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1
> Online, Local, rgmanager
> ?station2.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2
> Online, rgmanager
> ?/dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1 ? ? ? ?0
> Online, Quorum Disk
>
> ?Service Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Owner
> (Last) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? State
> ?------- ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -----
> ------ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -----
> ?vm:vm_adminserver
> (none)
> disabled
> ?vm:vm_ldapserver
> (none)
> disabled
> ?vm:vm_mailserver
> (none) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? disabled
> ===
> Station1 - output xm li:
> Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ID Mem(MiB) VCPUs State ? Time(s)
> Domain-0 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0 ? ? ?768 ? ? 1 r----- 410641.5
> vm_mailserver ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?3 ? ? 2047 ? ? 4 -b---- 833206.1
> ===
> Station2 - output clustat:
> Cluster Status for example_cluster_1 @ Thu Dec 17 17:37:15 2009
> Member Status: Quorate
>
> ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status
> ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------
> ?station1.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1
> Online, rgmanager
> ?station2.example.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2
> Online, Local, rgmanager
> ?/dev/disk/by-id/scsi-36002219000a2f28b00000420494880a3-part1 ? ? ? ?0
> Online, Quorum Disk
>
> ?Service Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Owner
> (Last) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? State
> ?------- ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -----
> ------ ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -----
> ?vm:vm_adminserver
> (none)
> disabled
> ?vm:vm_ldapserver
> (none)
> disabled
> ?vm:vm_mailserver
> (none)
> disabled
> ===
> Station2 - output xm li:
> Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ID Mem(MiB) VCPUs State ? Time(s)
> Domain-0 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0 ? ? ?768 ? ? 1 r----- 384845.0
> vm_adminserver ? ? ? ? ? ? ? ? ? ? ? ? ? ? 6 ? ? 1023 ? ? 1 -b---- ?76745.5
> vm_ldapserver ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?4 ? ? 1023 ? ? 1 -b---- ?22685.6
> ===
>
> Hope this provides better insight.
>
> Regards,
> Wolf
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From siedler at hrd-asia.com  Thu Dec 17 11:22:19 2009
From: siedler at hrd-asia.com (Wolf Siedler)
Date: Thu, 17 Dec 2009 19:22:19 +0800
Subject: [Linux-cluster] 
	Re: cluster.conf, was: Cluster config. advice sought
In-Reply-To: <29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com>
References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com>
	<29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com>
Message-ID: <4B2A146B.5050705@hrd-asia.com>

Dear Brem,

Thanks for taking time to look at my problem.

> Try to look in linux-cluster archive, your problem looks similar to
> some others that were posted around October/November.
> There were things to check with use_virsh, path etc... in the
> cluster.conf...

I did and this was actually the reason for my original question (I am
definitely open for testing, but there is one production VM running in
the cluster. Which in turn limits my access for configuration changes
and restarts.):
After studying the thread you described, I came up with this cluster.conf:
===quote===
<?xml version="1.0"?>
<cluster alias="example_cluster_1" config_version="81"
name="example_cluster_1">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
    <clusternodes>
        <clusternode name="station1.example.com" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="station1_fenced"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="station2.example.com" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="station2_fenced"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="3" two_node="0"/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.91"
login="ipmi_admin" name="station1_fenced" operation="off" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.92"
login="ipmi_admin" name="station2_fenced" operation="off" passwd="secret"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="bias-station1" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station1.example.com"
priority="1"/>
            </failoverdomain>
            <failoverdomain name="bias-station2" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station2.example.com"
priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources/>
        <vm autostart="1" use_virsh="0" domain="bias-station1"
exclusive="0" migrate="live" name="vm_mailserver" path="/rootfs"
recovery="restart"/>
        <vm autostart="1" use_virsh="0" domain="bias-station2"
exclusive="0" migrate="live" name="vm_ldapserver" path="/rootfs"
recovery="restart"/>
        <vm autostart="1" use_virsh="0" domain="bias-station2"
exclusive="0" migrate="live" name="vm_adminserver" path="/rootfs"
recovery="restart"/>
    </rm>
    <quorumd interval="3" label="xen_qdisk" min_score="1" tko="23"
votes="1"/>
</cluster>
===unquote===

You will notice that I already included use_virsh.
Does this cluster.conf look OK?

As said before, I would highly appreciate any advice/suggestion you
would be willing to give.

Regards,
Wolf



From brem.belguebli at gmail.com  Thu Dec 17 12:42:04 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Thu, 17 Dec 2009 13:42:04 +0100
Subject: [Linux-cluster] Re: cluster.conf, was: Cluster config. advice 
	sought
In-Reply-To: <4B2A146B.5050705@hrd-asia.com>
References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com>
	<29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com>
	<4B2A146B.5050705@hrd-asia.com>
Message-ID: <29ae894c0912170442y22ff7076ob61bcfbc0960f6ee@mail.gmail.com>

Hi Wolf,

I have no xen setup to tell you exactly if the cluster.conf you posted
should be fine.

I do understand that this cluster.conf comes from what you think it
should be after reading the different posts, and it is not the one you
have in production right now, right ?

To test it without disturbing your prod setup, as the use_virsh, path
parameters are VM based, you may create a test VM with these
parameters and see if you get the same behaviour.

 Brem
2009/12/17 Wolf Siedler <siedler at hrd-asia.com>:
> Dear Brem,
>
> Thanks for taking time to look at my problem.
>
>> Try to look in linux-cluster archive, your problem looks similar to
>> some others that were posted around October/November.
>> There were things to check with use_virsh, path etc... in the
>> cluster.conf...
>
> I did and this was actually the reason for my original question (I am
> definitely open for testing, but there is one production VM running in
> the cluster. Which in turn limits my access for configuration changes
> and restarts.):
> After studying the thread you described, I came up with this cluster.conf:
> ===quote===
> <?xml version="1.0"?>
> <cluster alias="example_cluster_1" config_version="81"
> name="example_cluster_1">
> ? ?<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
> ? ?<clusternodes>
> ? ? ? ?<clusternode name="station1.example.com" nodeid="1" votes="1">
> ? ? ? ? ? ?<fence>
> ? ? ? ? ? ? ? ?<method name="1">
> ? ? ? ? ? ? ? ? ? ?<device name="station1_fenced"/>
> ? ? ? ? ? ? ? ?</method>
> ? ? ? ? ? ?</fence>
> ? ? ? ?</clusternode>
> ? ? ? ?<clusternode name="station2.example.com" nodeid="2" votes="1">
> ? ? ? ? ? ?<fence>
> ? ? ? ? ? ? ? ?<method name="1">
> ? ? ? ? ? ? ? ? ? ?<device name="station2_fenced"/>
> ? ? ? ? ? ? ? ?</method>
> ? ? ? ? ? ?</fence>
> ? ? ? ?</clusternode>
> ? ?</clusternodes>
> ? ?<cman expected_votes="3" two_node="0"/>
> ? ?<fencedevices>
> ? ? ? ?<fencedevice agent="fence_ipmilan" ipaddr="172.16.10.91"
> login="ipmi_admin" name="station1_fenced" operation="off" passwd="secret"/>
> ? ? ? ?<fencedevice agent="fence_ipmilan" ipaddr="172.16.10.92"
> login="ipmi_admin" name="station2_fenced" operation="off" passwd="secret"/>
> ? ?</fencedevices>
> ? ?<rm>
> ? ? ? ?<failoverdomains>
> ? ? ? ? ? ?<failoverdomain name="bias-station1" nofailback="0"
> ordered="0" restricted="0">
> ? ? ? ? ? ? ? ?<failoverdomainnode name="station1.example.com"
> priority="1"/>
> ? ? ? ? ? ?</failoverdomain>
> ? ? ? ? ? ?<failoverdomain name="bias-station2" nofailback="0"
> ordered="0" restricted="0">
> ? ? ? ? ? ? ? ?<failoverdomainnode name="station2.example.com"
> priority="1"/>
> ? ? ? ? ? ?</failoverdomain>
> ? ? ? ?</failoverdomains>
> ? ? ? ?<resources/>
> ? ? ? ?<vm autostart="1" use_virsh="0" domain="bias-station1"
> exclusive="0" migrate="live" name="vm_mailserver" path="/rootfs"
> recovery="restart"/>
> ? ? ? ?<vm autostart="1" use_virsh="0" domain="bias-station2"
> exclusive="0" migrate="live" name="vm_ldapserver" path="/rootfs"
> recovery="restart"/>
> ? ? ? ?<vm autostart="1" use_virsh="0" domain="bias-station2"
> exclusive="0" migrate="live" name="vm_adminserver" path="/rootfs"
> recovery="restart"/>
> ? ?</rm>
> ? ?<quorumd interval="3" label="xen_qdisk" min_score="1" tko="23"
> votes="1"/>
> </cluster>
> ===unquote===
>
> You will notice that I already included use_virsh.
> Does this cluster.conf look OK?
>
> As said before, I would highly appreciate any advice/suggestion you
> would be willing to give.
>
> Regards,
> Wolf
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From pradhanparas at gmail.com  Thu Dec 17 18:33:49 2009
From: pradhanparas at gmail.com (Paras pradhan)
Date: Thu, 17 Dec 2009 12:33:49 -0600
Subject: [Linux-cluster] conga issue?
Message-ID: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com>

I am trying to configure a cluster using conga in RH5.4. Luci version is
0.12.2-6.el5_4.1. It is responding really really slow. When I log on inside
the congo and click the tabs, it takes ages to show me the page/link that I
want to. Sometimes it reports it is unable to communicate with cluster
nodes. What might be this issue?


Thanks
Paras.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091217/ec0aaf4f/attachment.htm>

From brem.belguebli at gmail.com  Thu Dec 17 18:49:09 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Thu, 17 Dec 2009 19:49:09 +0100
Subject: [Linux-cluster] conga issue?
In-Reply-To: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com>
References: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com>
Message-ID: <29ae894c0912171049q6d7a865ja97fae23efd3f753@mail.gmail.com>

I personnaly gave up trying to use it, as it is very slow

Particularly the storage tab is completely unusable if you have
mutipath devices or more than a few disks.

There was something about the /etc/hosts entries that was supposed to
resolve the overall slowlyness (I can't find back the thread it was
about) but it didn't have any kind of effects in my setup.

Brem

2009/12/17 Paras pradhan <pradhanparas at gmail.com>:
> I am trying to configure a cluster using conga in RH5.4. Luci version is
> 0.12.2-6.el5_4.1. It is responding really really slow. When I log on inside
> the congo and click the tabs, it takes ages to show me the page/link that I
> want to. Sometimes it reports it is unable to communicate with cluster
> nodes. What might be this issue?
>
> Thanks
> Paras.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From pradhanparas at gmail.com  Thu Dec 17 18:53:44 2009
From: pradhanparas at gmail.com (Paras pradhan)
Date: Thu, 17 Dec 2009 12:53:44 -0600
Subject: [Linux-cluster] conga issue?
In-Reply-To: <29ae894c0912171049q6d7a865ja97fae23efd3f753@mail.gmail.com>
References: <8b711df40912171033u66034db1p7c6f35816b20cc79@mail.gmail.com>
	<29ae894c0912171049q6d7a865ja97fae23efd3f753@mail.gmail.com>
Message-ID: <8b711df40912171053v44ab326atef5ec43addafc8b5@mail.gmail.com>

On Thu, Dec 17, 2009 at 12:49 PM, brem belguebli
<brem.belguebli at gmail.com>wrote:

> I personnaly gave up trying to use it, as it is very slow
>
> Particularly the storage tab is completely unusable if you have
> mutipath devices or more than a few disks.
>

Yes you are correct. I have multipath device mapper. I created the storage
using conga. But now I the storage tab is completely unsuable.

>
> There was something about the /etc/hosts entries that was supposed to
> resolve the overall slowlyness (I can't find back the thread it was
> about) but it didn't have any kind of effects in my setup.
>

I have played a bit with /etc/hosts but no luck to me as well.

>
> Brem
>
> 2009/12/17 Paras pradhan <pradhanparas at gmail.com>:
> > I am trying to configure a cluster using conga in RH5.4. Luci version is
> > 0.12.2-6.el5_4.1. It is responding really really slow. When I log on
> inside
> > the congo and click the tabs, it takes ages to show me the page/link that
> I
> > want to. Sometimes it reports it is unable to communicate with cluster
> > nodes. What might be this issue?
> >
> > Thanks
> > Paras.
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


Thanks
Paras.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091217/6866e478/attachment.htm>

From siedler at hrd-asia.com  Fri Dec 18 01:24:25 2009
From: siedler at hrd-asia.com (Wolf Siedler)
Date: Fri, 18 Dec 2009 09:24:25 +0800
Subject: [Linux-cluster] Re: cluster.conf
In-Reply-To: <29ae894c0912170442y22ff7076ob61bcfbc0960f6ee@mail.gmail.com>
References: <4B29E099.6090105@hrd-asia.com>
	<4B2A0493.6040409@hrd-asia.com>	<29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com>	<4B2A146B.5050705@hrd-asia.com>
	<29ae894c0912170442y22ff7076ob61bcfbc0960f6ee@mail.gmail.com>
Message-ID: <4B2AD9C9.3080006@hrd-asia.com>

Hi Brem,

> I do understand that this cluster.conf comes from what you think it
> should be after reading the different posts, and it is not the one you
> have in production right now, right ?

Yes.
However, except for
use_virsh="0"
it is exactly the one we used in production until the problematic CentOS
update.

> I have no xen setup to tell you exactly if the cluster.conf you posted
> should be fine
I had noticed that. But anyway, if you don't spot any major
misconfigurations in the original cluster.conf (as quoted below), then
I'll give it a try with the included use_virsh parameter.

Thanks for your feedback and regards,
Wolf

PS:
Just to clarify, this is the exact cluster.conf we used until the
update-related problem:
===quote===
<?xml version="1.0"?>
<cluster alias="example_cluster_1" config_version="81"
name="example_cluster_1">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
    <clusternodes>
        <clusternode name="station1.example.com" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="station1_fenced"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="station2.example.com" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="station2_fenced"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="3" two_node="0"/>
    <fencedevices>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.91"
login="ipmi_admin" name="station1_fenced" operation="off" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" ipaddr="172.16.10.92"
login="ipmi_admin" name="station2_fenced" operation="off" passwd="secret"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="bias-station1" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station1.example.com"
priority="1"/>
            </failoverdomain>
            <failoverdomain name="bias-station2" nofailback="0"
ordered="0" restricted="0">
                <failoverdomainnode name="station2.example.com"
priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources/>
        <vm autostart="1" domain="bias-station1"
exclusive="0" migrate="live" name="vm_mailserver" path="/rootfs"
recovery="restart"/>
        <vm autostart="1" domain="bias-station2"
exclusive="0" migrate="live" name="vm_ldapserver" path="/rootfs"
recovery="restart"/>
        <vm autostart="1" domain="bias-station2"
exclusive="0" migrate="live" name="vm_adminserver" path="/rootfs"
recovery="restart"/>
    </rm>
    <quorumd interval="3" label="xen_qdisk" min_score="1" tko="23"
votes="1"/>
</cluster>
===unquote===



From baishuwei at gmail.com  Fri Dec 18 05:17:38 2009
From: baishuwei at gmail.com (Bai Shuwei)
Date: Thu, 17 Dec 2009 21:17:38 -0800 (PST)
Subject: [Linux-cluster] Invitation to connect on LinkedIn
Message-ID: <1756103043.258655.1261113458016.JavaMail.app@ech3-cdn05.prod>

LinkedIn
------------

Bai Shuwei requested to add you as a connection on LinkedIn:
------------------------------------------

Marian,

I'd like to add you to my professional network on LinkedIn.

- Bai

Accept invitation from Bai Shuwei
http://www.linkedin.com/e/ulDuieLaAX544oVCOYcgj_GaXIys4TuLMXGmOx/blk/I1669366669_2/pmpxnSRJrSdvj4R5fnhv9ClRsDgZp6lQs6lzoQ5AomZIpn8_cBYVdzoSdzcVdzoNiiYUc31xu5pBuiYUdzwVdjwUcPALrCBxbOYWrSlI/EML_comm_afe/

View invitation from Bai Shuwei
http://www.linkedin.com/e/ulDuieLaAX544oVCOYcgj_GaXIys4TuLMXGmOx/blk/I1669366669_2/39vejoSdzoPejoSckALqnpPbOYWrSlI/svi/

------------------------------------------

Why might connecting with Bai Shuwei be a good idea?

People Bai Shuwei knows can discover your profile:
Connecting to Bai Shuwei will attract the attention of LinkedIn users. See who's been viewing your profile:

http://www.linkedin.com/e/wvp/inv18_wvmp/

 
------
(c) 2009, LinkedIn Corporation

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091217/e94702d4/attachment.htm>

From jakov.sosic at srce.hr  Fri Dec 18 16:17:55 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Fri, 18 Dec 2009 17:17:55 +0100
Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast?
Message-ID: <1261153075.1918.8.camel@localhost>

Hi.

How can I force openais on RHEL 5.4 to use broadcast? I've found this in
documentation:

	OpenAIS now provides broadcast network communication in addition
	to multicast. This functionality is considered Technology
	Preview for standalone usage of OpenAIS and for usage with the
	Cluster Suite. Note, however, that the functionality for
	configuring OpenAIS to use broadcast is not integrated into the
	cluster management tools and must be configured manually.


I've found in cman(5) that openais settings from /etc/ais/openais.conf
are ignored if openais is started by ccs_tool, and that I have to set
properties for totem in cluster.conf. But how could I do that? Beacause
there is no example in the man page :(





-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From ccook at pandora.com  Fri Dec 18 17:35:52 2009
From: ccook at pandora.com (Christopher Strider Cook)
Date: Fri, 18 Dec 2009 09:35:52 -0800
Subject: [Linux-cluster] cluster3 - service fails, doesn't failover/fence
Message-ID: <4B2BBD78.5000900@pandora.com>

I've got an otherwise working fine two node + qdisk cluster3 (3.0.0) 
setup running under Debian with 2.6.30 kern. In the past it has fenced 
and failed over properly to recover from a failed node.
But, yesterday one of the status checks returned a 1 and the subsequent 
automatic start/stop of the service also returned non-good. This set my 
cluster service into a 'failed' state and all related components were 
stopped. Everything was resolved with a manual service disable and enable.

Should the secondary have fenced in this case or is that reserved for 
only when communications in the cluster fail? I would have thought that 
it would have tried to start the service at least. A clustat on either 
machine showed the service "failed' and nothing was logged on the 
non-active node.

Since a failover (rather then a give up) would be the proper thing, I'm 
assuming a config issue. Any pointers?



<?xml version="1.0"?>
<cluster name="alpha" config_version="42">

<cman two_node="0" expected_votes="3">
</cman>

<clusternodes>
<clusternode name="wonder-p" votes="1" nodeid="1">
       <fence>
               <method name="single">
                       <device name="pwr01" option="off"/>
                       <device name="pwr02" option="off"/>
                       <device name="pwr01" option="on"/>
                       <device name="pwr02" option="on"/>
               </method>
       </fence>
</clusternode>
<clusternode name="nicks-p" votes="1" nodeid="2">
       <fence>
               <method name="single">
                       <device name="pwr03" option="off"/>
                       <device name="pwr04" option="off"/>
                       <device name="pwr03" option="on"/>
                       <device name="pwr04" option="on"/>
               </method>
       </fence>
</clusternode>
</clusternodes>

<quorumd interval="1" tko="10" votes="1" label="quorumdisk">
       <heuristic program="ping 172.25.19.254 -c1 -t1" score="1" 
interval="2" tko="3"/>
</quorumd>

<fence_daemon post_join_delay="20">
</fence_daemon>

<fencedevices>
       <fencedevice agent="fence_apc_snmp" ipaddr="pdu-paul-2-2" 
port="4" name="pwr01" udpport="161" />
       <fencedevice agent="fence_apc_snmp" ipaddr="pdu-paul-2-3" 
port="4" name="pwr02" udpport="161" />
       <fencedevice agent="fence_apc_snmp" ipaddr="pdu-paul-2-2" 
port="3" name="pwr03" udpport="161" />
       <fencedevice agent="fence_apc_snmp" ipaddr="pdu-paul-2-3" 
port="3" name="pwr04" udpport="161" />
</fencedevices>

<rm>

  <failoverdomains>
          <failoverdomain name="mailcluster" restricted="1" ordered="0" >
               <failoverdomainnode name="wonder-p" priority="1"/>
               <failoverdomainnode name="nicks-p" priority="1"/>
          </failoverdomain>
  </failoverdomains>

  <service name="MailHost" autostart="1" domain="mailcluster" >
          <script name="MailHost-early" 
file="/etc/cluster/MailHost-misc-early" />
          <fs name="mailhome" mountpoint="/home" device="LABEL=home" 
fstype="ext4" force_unmount="1" active_monitor="1" 
options="defaults,noatime,nodiratime" />
          <fs name="mailcluster" mountpoint="/var/cluster" 
device="LABEL=cluster" fstype="ext3" force_unmount="1" 
active_monitor="1" options="defaults" />
          <ip address="172.25.16.58" monitor_link="1" />
          <script name="saslauthd" file="/etc/cluster/saslauthd-cluster" />
          <script name="postfix" file="/etc/cluster/postfix-cluster" />
          <script name="dovecot" file="/etc/cluster/dovecot-wrapper" 
__independent_subtree="1" />
          <script name="mailman" file="/etc/cluster/mailman-wrapper" 
__independent_subtree="1" />
          <script name="apache2-mailhost" 
file="/etc/cluster/apache2-mailhost" __independent_subtree="1" />
          <script name="usermin" file="/etc/init.d/usermin" 
__independent_subtree="1" />
          <script name="MailHost-late" 
file="/etc/cluster/MailHost-misc-late" />
  </service>

</rm>
</cluster>

Dec 15 12:37:00 bash Executing /etc/cluster/postfix-cluster status
Dec 15 12:37:00 bash Executing /etc/cluster/dovecot-wrapper status
Dec 15 12:37:00 bash Executing /etc/cluster/mailman-wrapper status
Dec 15 12:37:00 bash Executing /etc/cluster/apache2-mailhost status
Dec 15 12:37:00 bash Executing /etc/init.d/usermin status
Dec 15 12:37:00 bash script:usermin: status of /etc/init.d/usermin 
failed (return
ed 1)
Dec 15 12:37:01 bash Executing /etc/cluster/MailHost-misc-late status
Dec 15 12:37:01 bash Executing /etc/init.d/usermin stop
Dec 15 12:37:03 bash Executing /etc/init.d/usermin start
Dec 15 12:37:19 bash script:usermin: start of /etc/init.d/usermin failed 
(returne
d 98)
Dec 15 12:37:20 bash Executing /etc/cluster/MailHost-misc-late stop
Dec 15 12:37:21 bash Executing /etc/init.d/usermin stop
Dec 15 12:37:21 bash script:usermin: stop of /etc/init.d/usermin failed 
(returned
1)
Dec 15 12:37:21 bash Executing /etc/cluster/apache2-mailhost stop
Dec 15 12:37:24 bash Executing /etc/cluster/mailman-wrapper stop
Dec 15 12:37:42 bash Executing /etc/cluster/dovecot-wrapper stop
Dec 15 12:37:43 bash Executing /etc/cluster/postfix-cluster stop
Dec 15 12:37:56 bash Executing /etc/cluster/saslauthd-cluster stop
Dec 15 12:38:07 bash Executing /etc/cluster/MailHost-misc-early stop
Dec 15 12:38:08 bash Removing IPv4 address 172.25.16.58/22 from eth0
Dec 15 12:38:21 bash unmounting /var/cluster
Dec 15 12:38:21 bash Forcefully unmounting /var/cluster
Dec 15 12:38:22 bash killing process 6844 (daemon atd /var/cluster)
Dec 15 12:38:22 bash killing process 4274 (root bash /var/cluster)
Dec 15 12:38:22 bash killing process 6836 (root cron /var/cluster)
Dec 15 12:38:30 bash unmounting /var/cluster
Dec 15 12:38:32 bash unmounting /home
Dec 15 12:38:32 bash Forcefully unmounting /home
Dec 15 12:38:33 bash killing process 27678 (root bacula-fd /home)
Dec 15 12:38:41 bash unmounting /home
Dec 15 12:50:08 bash Executing /etc/cluster/MailHost-misc-late stop
Dec 15 12:50:08 bash Executing /etc/init.d/usermin stop
Dec 15 12:50:08 bash script:usermin: stop of /etc/init.d/usermin failed 
(returned
1)
Dec 15 12:50:08 bash Executing /etc/cluster/apache2-mailhost stop
Dec 15 12:50:09 bash Executing /etc/cluster/mailman-wrapper stop
Dec 15 12:50:09 bash script:mailman: stop of 
/etc/cluster/mailman-wrapper failed
(returned 1)
Dec 15 12:50:09 bash Executing /etc/cluster/dovecot-wrapper stop
Dec 15 12:50:09 bash Executing /etc/cluster/postfix-cluster stop
Dec 15 12:50:09 bash Executing /etc/cluster/saslauthd-cluster stop
Dec 15 12:50:10 bash Executing /etc/cluster/MailHost-misc-early stop
Dec 15 12:50:10 bash 172.25.16.58 is not configured
Dec 15 12:50:10 bash /dev/dm-1 is not mounted
Dec 15 12:50:10 bash /dev/dm-0 is not mounted
Dec 15 12:50:20 bash Unknown file system type 'ext4' for device 
/dev/dm-0.  Assum
ing fsck is required.
Dec 15 12:50:20 bash Running fsck on /dev/dm-0
Dec 15 12:50:21 bash mounting /dev/dm-0 on /home
Dec 15 12:50:21 bash mount -t ext4 -o defaults,noatime,nodiratime 
/dev/dm-0 /home
Dec 15 12:50:22 bash quotaon not found in /bin:/sbin:/usr/bin:/usr/sbin
Dec 15 12:50:22 bash mounting /dev/dm-1 on /var/cluster
Dec 15 12:50:23 bash mount -t ext3 -o defaults /dev/dm-1 /var/cluster
Dec 15 12:50:23 bash quotaon not found in /bin:/sbin:/usr/bin:/usr/sbin
Dec 15 12:50:23 bash Link for eth0: Detected
Dec 15 12:50:23 bash Adding IPv4 address 172.25.16.58/22 to eth0
Dec 15 12:50:23 bash Sending gratuitous ARP: 172.25.16.58 
00:30:48:c6:de:24 brd f
f:ff:ff:ff:ff:ff
Dec 15 12:50:24 bash Executing /etc/cluster/MailHost-misc-early start
... startup continues fine



From brem.belguebli at gmail.com  Fri Dec 18 21:46:28 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Fri, 18 Dec 2009 22:46:28 +0100
Subject: [Linux-cluster] Re: cluster.conf
In-Reply-To: <4B2AD9C9.3080006@hrd-asia.com>
References: <4B29E099.6090105@hrd-asia.com> <4B2A0493.6040409@hrd-asia.com>
	<29ae894c0912170222k6480988av4608450236ff1356@mail.gmail.com>
	<4B2A146B.5050705@hrd-asia.com>
	<29ae894c0912170442y22ff7076ob61bcfbc0960f6ee@mail.gmail.com>
	<4B2AD9C9.3080006@hrd-asia.com>
Message-ID: <29ae894c0912181346hdbce8b1se70576ac2b3c535@mail.gmail.com>

2009/12/18 Wolf Siedler <siedler at hrd-asia.com>:
> Hi Brem,
>
>> I do understand that this cluster.conf comes from what you think it
>> should be after reading the different posts, and it is not the one you
>> have in production right now, right ?
>
> Yes.
> However, except for
> use_virsh="0"
> it is exactly the one we used in production until the problematic CentOS
> update.
>
>> I have no xen setup to tell you exactly if the cluster.conf you posted
>> should be fine
> I had noticed that. But anyway, if you don't spot any major
> misconfigurations in the original cluster.conf (as quoted below), then
> I'll give it a try with the included use_virsh parameter.
>
Best thing to do is to create a new VM with the use_virsh and path
parameters and test if it is behaving as desired .

> Thanks for your feedback and regards,
> Wolf
>
Brem
> PS:
> Just to clarify, this is the exact cluster.conf we used until the
> update-related problem:
> ===quote===
> <?xml version="1.0"?>
> <cluster alias="example_cluster_1" config_version="81"
> name="example_cluster_1">
> ? ?<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
> ? ?<clusternodes>
> ? ? ? ?<clusternode name="station1.example.com" nodeid="1" votes="1">
> ? ? ? ? ? ?<fence>
> ? ? ? ? ? ? ? ?<method name="1">
> ? ? ? ? ? ? ? ? ? ?<device name="station1_fenced"/>
> ? ? ? ? ? ? ? ?</method>
> ? ? ? ? ? ?</fence>
> ? ? ? ?</clusternode>
> ? ? ? ?<clusternode name="station2.example.com" nodeid="2" votes="1">
> ? ? ? ? ? ?<fence>
> ? ? ? ? ? ? ? ?<method name="1">
> ? ? ? ? ? ? ? ? ? ?<device name="station2_fenced"/>
> ? ? ? ? ? ? ? ?</method>
> ? ? ? ? ? ?</fence>
> ? ? ? ?</clusternode>
> ? ?</clusternodes>
> ? ?<cman expected_votes="3" two_node="0"/>
> ? ?<fencedevices>
> ? ? ? ?<fencedevice agent="fence_ipmilan" ipaddr="172.16.10.91"
> login="ipmi_admin" name="station1_fenced" operation="off" passwd="secret"/>
> ? ? ? ?<fencedevice agent="fence_ipmilan" ipaddr="172.16.10.92"
> login="ipmi_admin" name="station2_fenced" operation="off" passwd="secret"/>
> ? ?</fencedevices>
> ? ?<rm>
> ? ? ? ?<failoverdomains>
> ? ? ? ? ? ?<failoverdomain name="bias-station1" nofailback="0"
> ordered="0" restricted="0">
> ? ? ? ? ? ? ? ?<failoverdomainnode name="station1.example.com"
> priority="1"/>
> ? ? ? ? ? ?</failoverdomain>
> ? ? ? ? ? ?<failoverdomain name="bias-station2" nofailback="0"
> ordered="0" restricted="0">
> ? ? ? ? ? ? ? ?<failoverdomainnode name="station2.example.com"
> priority="1"/>
> ? ? ? ? ? ?</failoverdomain>
> ? ? ? ?</failoverdomains>
> ? ? ? ?<resources/>
> ? ? ? ?<vm autostart="1" domain="bias-station1"
> exclusive="0" migrate="live" name="vm_mailserver" path="/rootfs"
> recovery="restart"/>
> ? ? ? ?<vm autostart="1" domain="bias-station2"
> exclusive="0" migrate="live" name="vm_ldapserver" path="/rootfs"
> recovery="restart"/>
> ? ? ? ?<vm autostart="1" domain="bias-station2"
> exclusive="0" migrate="live" name="vm_adminserver" path="/rootfs"
> recovery="restart"/>
> ? ?</rm>
> ? ?<quorumd interval="3" label="xen_qdisk" min_score="1" tko="23"
> votes="1"/>
> </cluster>
> ===unquote===
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From jakov.sosic at srce.hr  Sat Dec 19 22:40:33 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Sat, 19 Dec 2009 23:40:33 +0100
Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast?
In-Reply-To: <1261153075.1918.8.camel@localhost>
References: <1261153075.1918.8.camel@localhost>
Message-ID: <1261262433.2565.3.camel@localhost>

On Fri, 2009-12-18 at 17:17 +0100, Jakov Sosic wrote:

> I've found in cman(5) that openais settings from /etc/ais/openais.conf
> are ignored if openais is started by ccs_tool, and that I have to set
> properties for totem in cluster.conf. But how could I do that? Beacause
> there is no example in the man page :(

I see no one is answering this one... So maybe I should put another
questions at stake.

Who is running aisexec? ccsd? And where can I get source of the current
cluster suite? I'll take a look into the source I will hopefully find a
way to set up broadcasting?


-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From i.sasportas at repliweb.com  Sun Dec 20 07:56:10 2009
From: i.sasportas at repliweb.com (Itshak Sasportas)
Date: Sun, 20 Dec 2009 09:56:10 +0200
Subject: [Linux-cluster] Linux-cluster Digest, Vol 68, Issue 14
In-Reply-To: <mailman.25.1261069204.11104.linux-cluster@redhat.com>
References: <mailman.25.1261069204.11104.linux-cluster@redhat.com>
Message-ID: <C3F774F20F8A1C45B7D4D9F9826E3DFAD123CF@exchil.rwint.com>

Hi!
How can I unsubscribe ?
Best Regards,
Itshak


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of
linux-cluster-request at redhat.com
Sent: Thursday, December 17, 2009 7:00 PM
To: linux-cluster at redhat.com
Subject: Linux-cluster Digest, Vol 68, Issue 14

Send Linux-cluster mailing list submissions to
	linux-cluster at redhat.com

To subscribe or unsubscribe via the World Wide Web, visit
	https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
	linux-cluster-request at redhat.com

You can reach the person managing the list at
	linux-cluster-owner at redhat.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Linux-cluster digest..."


Today's Topics:

   1. 2 pptp links on two hosts (Jakov Sosic)


----------------------------------------------------------------------

Message: 1
Date: Wed, 16 Dec 2009 18:50:09 +0100
From: Jakov Sosic <jakov.sosic at srce.hr>
To: linux-cluster at redhat.com
Subject: [Linux-cluster] 2 pptp links on two hosts
Message-ID: <1260985809.2168.6.camel at localhost>
Content-Type: text/plain; charset="UTF-8"

Hi.

I have two pptp links on two hosts. Hosts are frontends (gateways,
firewalls, NAT) for some network. Hosts also must be gateways for all
the VLANs.

Now, two things in this case can fail - one host, or for example it's
pptp route, in which case again gateway and static routes should be
transferred to the secondary node.

Is there a way to solve this with RHCS, or is there any more appropriate
software for this kind of failover?

My initial idea when I heard the problem was to write something like
init script, which in status part pings some address behind PPTP link,
and if ping is OK, than service is considered OK. Now, if for some
reason ping fails, status wouldn't be 0, and RHCS would apply relocate
policy, stop the script on primary and start it on secondary. stop
function would delete all the routes, and start would set appropriate
static routes. RHCS itself would take care of the floating address of
the gateway.

I wonder if you have any experience with this kind of setup, or any
ideas if this could be done in any better way?

Thank you.




-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




------------------------------

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 68, Issue 14
*********************************************



From sdake at redhat.com  Sun Dec 20 21:36:27 2009
From: sdake at redhat.com (Steven Dake)
Date: Sun, 20 Dec 2009 14:36:27 -0700
Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast?
In-Reply-To: <1261262433.2565.3.camel@localhost>
References: <1261153075.1918.8.camel@localhost>
	<1261262433.2565.3.camel@localhost>
Message-ID: <1261344987.2628.7.camel@localhost.localdomain>

On Sat, 2009-12-19 at 23:40 +0100, Jakov Sosic wrote:
> On Fri, 2009-12-18 at 17:17 +0100, Jakov Sosic wrote:
> 
> > I've found in cman(5) that openais settings from /etc/ais/openais.conf
> > are ignored if openais is started by ccs_tool, and that I have to set
> > properties for totem in cluster.conf. But how could I do that? Beacause
> > there is no example in the man page :(
> 
> I see no one is answering this one... So maybe I should put another
> questions at stake.
> 
> Who is running aisexec? ccsd? And where can I get source of the current
> cluster suite? I'll take a look into the source I will hopefully find a
> way to set up broadcasting?
> 
> 

The developers get alot of mail.  It was probably missed.  My apologies.

When using cman, you can specify totem parameters in the cluster.conf
file.  The man page on rhel5.4 may be out of date.  I can't recommend
broadcast, but to use it in cman, add <totem broadcast="yes"> inside the
cluster block.  From the current man page:

Other openais parameters
              When  openais  is  started by cman (cman_tool runs aisexec), the
              openais.conf file  is  not  used.   Many  of  the  configuration
              parameters  listed  in  openais.conf  can be set in cluster.conf
              (CCS) instead.  Cman  will  read  openais  parameters  from  the
              following sections in cluster.conf and load them into openais:

                <cluster>
                  <totem />
                  <logging />
                  <event />
                  <aisexec />
                  <group />
                </cluster>

              See  the  openais.conf(5)  man page for more information on keys
              that are valid for these sections.  Note that  settings  in  the
              <clusternodes>  section  will  override settings in the sections
              above, and options on the cman_tool command line  will  override
              both.   In  particular,  settings  like  bindnetaddr, mcastaddr,
              mcastport and nodeid  will  always  be  replaced  by  values  in
              <clusternodes>.

              Cman  uses different defaults for some of the openais parameters
              listed in openais.conf(5).  If you wish  to  use  a  non-default
              setting,  they can be configured in cluster.conf as shown above.
              Cman uses the following default values:

                <totem
                  vsftype="none"
                  token="10000"
                  token_retransmits_before_loss_const="20"
                  join="60"
                  consensus="4800"
                  rrp_mode="none"
                  <!-- or rrp_mode="active" if altnames are present >
                />
                <logging syslog_facility="local4" />
                <aisexec user="root" group="root" />

              Here?s how to set the token timeout to five seconds:

                <totem token="5000"/>

              And this is how to add extra openais logging options to CMAN and
              CPG:

                <logging to_stderr="yes">
                  <logger ident="CPG" debug="on" to_stderr="yes">
                  </logger>
                  <logger ident="CMAN" debug="on" to_stderr="yes">
                  </logger>
                </logging>



From jakov.sosic at srce.hr  Mon Dec 21 00:06:14 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Mon, 21 Dec 2009 01:06:14 +0100
Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast?
In-Reply-To: <1261344987.2628.7.camel@localhost.localdomain>
References: <1261153075.1918.8.camel@localhost>
	<1261262433.2565.3.camel@localhost>
	<1261344987.2628.7.camel@localhost.localdomain>
Message-ID: <1261353974.1663.17.camel@localhost>

On Sun, 2009-12-20 at 14:36 -0700, Steven Dake wrote:

> The developers get alot of mail.  It was probably missed.  My apologies.

No problems... you are not paid to answer all the questions posted here.
Thank you in any case for your quick answer!


> When using cman, you can specify totem parameters in the cluster.conf
> file.  The man page on rhel5.4 may be out of date.  I can't recommend
> broadcast, but to use it in cman, add <totem broadcast="yes"> inside the
> cluster block.  From the current man page:

That's the part I was searching for. It's not documented anywhere... I
tried with:

<totem>
   <interface broadcast="yes"/>
</totem>

because that's the analogy to syntax of openais.conf.

Also, I've read RHEL 5.4 release documents, in which it states that
openais now supports broadcast, but that it's a "technology preview".
Also, there's no mention of broadcast in cman. I've read about setting
up cman, with all the totem/logging/event/aisexec stuff but no sign of
broadcast at all. Maybe you should put a note, and the example you
provided, to the man pages too?


And about not recommending broadcast - I know multicast is superior for
cluster membership usage, but I'm just pushed to the wall with Cisco
switches which don't support pim sparse-dense-mode, with current IOS,
and require license upgrade which would result in additional costs :(

Also from my experience with RHEL v4, broadcast worked as a charm, and I
rarely if ever had problems with it.

As I say, I use multicast if I can, but I think this addon to 5.4 is a
great thing because now RHCS can be run on any kind of switches no
matter if the support the IGMP snooping / pim-sparse-dense or not.



Once again, I will try this solution, and I will report back my results.
Thank you.



-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From jakov.sosic at srce.hr  Mon Dec 21 00:15:26 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Mon, 21 Dec 2009 01:15:26 +0100
Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast?
In-Reply-To: <1261344987.2628.7.camel@localhost.localdomain>
References: <1261153075.1918.8.camel@localhost>
	<1261262433.2565.3.camel@localhost>
	<1261344987.2628.7.camel@localhost.localdomain>
Message-ID: <1261354526.1663.19.camel@localhost>

On Sun, 2009-12-20 at 14:36 -0700, Steven Dake wrote:
> <totem broadcast="yes">

Nope, it does not work as expected :(

Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] AIS Executive Service
RELEASE 'subrev 1887 version 0.80.6' 
Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] Copyright (C) 2002-2006
MontaVista Software, Inc and contributors. 
Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] Copyright (C) 2006 Red
Hat, Inc. 
Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] AIS Executive Service:
started and ready to provide service. 
Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] Using default multicast
address of 239.192.213.177 


I've tried with both:
<totem broadcast="yes"></totem>
and
<totem broadcast="yes"/>




-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From peter.tiggerdine at uq.edu.au  Mon Dec 21 03:26:38 2009
From: peter.tiggerdine at uq.edu.au (Peter Tiggerdine)
Date: Mon, 21 Dec 2009 13:26:38 +1000
Subject: [Linux-cluster] Rebooting qdisk master causes quorum to dissolve.
Message-ID: <DB208538359CE54C920E89A5038023EAC0313A@UQEXMB5.soe.uq.edu.au>

Hi,

I have a five node cluster with a shared quorum disk without heuristics.
Because of the a hardware problem I need to move the services off the
host in question and replace some ram. The services moved without a
hitch, but soon as I rebooted the nodes the cluster came down.

The relevant configuration is 

<cluster alias="Services" config_version="150" name="Services">
        <quorumd interval="5" tko="12" device="/dev/emcpowere" votes="3"
log_level="9" log_facility="local4" status_file="/qdisk_status"/>
        <fence_daemon clean_start="1" post_fail_delay="15"
post_join_delay="30"/>
        <cman deadnode_timeout="90" expected_nodes="4"/> 

The relevant logs are below from an adjacent node:

Dec 21 11:40:15 io2 clurgmgrd[7271]: <notice> Member 1 shutting down 
Dec 21 11:40:40 io2 qdiskd[6820]: <info> Node 1 shutdown 
Dec 21 11:40:47 io2 openais[6801]: [CMAN ] lost contact with quorum
device 
Dec 21 11:40:47 io2 openais[6801]: [CMAN ] quorum lost, blocking
activity 
Dec 21 11:40:47 io2 clurgmgrd[7271]: <emerg> #1: Quorum Dissolved 
Dec 21 11:40:47 io2 kernel: dlm: closing connection to node 1

Have I configured this in-correctly or is the a known problem with
rebooting the qdisk master? It's just occurred to me that I did lock the
resource groups to prevent the moved services from returning to the
node.

Thanks in-advance and look forward to your replies, 

Peter Tiggerdine
HPC & eResearch Specialist
High Performance Computing Group
Information Technology Services
University of Queensland




From a.alawi at auckland.ac.nz  Mon Dec 21 05:07:54 2009
From: a.alawi at auckland.ac.nz (Abraham Alawi)
Date: Mon, 21 Dec 2009 18:07:54 +1300
Subject: [Linux-cluster] large-scale ( size: +10TB,
	users: +500 ) file server (Samba & NFS) using RHCS (CLVM + GFS) +
	CTDB + CoRAID (AoE) as backend storage?
Message-ID: <104A2F73-255D-4FE5-8E84-7DE4A87C322A@auckland.ac.nz>

Has anyone successfully setup a production large-scale ( size: +10TB, users: +500, concurrent/active users: +50 ) file server (Samba & NFS) using RHCS (CLVM + GFS) + CTDB + CoRAID (AoE) as backend storage? I'd be thankful if someone can share their experience with that sort of setup.  The setup I've done works but I'm not confident enough to put it in production, it doesn't consistently cope well under high load, sometimes it's RHCS (fencing, rgmanager, gfs, clvm) related misbehavior, other times it's CTDB related. I'm using the latest versions of RHCS & CTDB & AoE. 

This is the basic layout:
Nodes: 3 (identical IBM blades)
Fencing: IBM blade fence
FS: GFS (GFS2 seems to be less reliable even without CTDB)
Service Network: eth0
RHCS (multicasting) & CoRAID/AoE Network: eth1 (isolated from the service network)

RHCS handles the availability of CTDB through rgmanager, three services ensures the running of three services exclusively:
ctdb{1-3}: clvm --> gfs --> ctdb

CTDB handles the IP failover + Samba + NFS

I'm also interested to know if someone had a production CTDB with other cluster file systems like GPFS or OCFS or Lustre. 


Cheers,

  -- Abraham

''''''''''''''''''''''''''''''''''''''''''''''''''''''
Abraham Alawi

Unix/Linux Systems Administrator
Science IT
University of Auckland
e: a.alawi at auckland.ac.nz
p: +64-9-373 7599, ext#: 87572

''''''''''''''''''''''''''''''''''''''''''''''''''''''




From ccaulfie at redhat.com  Mon Dec 21 08:25:38 2009
From: ccaulfie at redhat.com (Christine Caulfield)
Date: Mon, 21 Dec 2009 08:25:38 +0000
Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast?
In-Reply-To: <1261354526.1663.19.camel@localhost>
References: <1261153075.1918.8.camel@localhost>	<1261262433.2565.3.camel@localhost>	<1261344987.2628.7.camel@localhost.localdomain>
	<1261354526.1663.19.camel@localhost>
Message-ID: <4B2F3102.6080000@redhat.com>

On 21/12/09 00:15, Jakov Sosic wrote:
> On Sun, 2009-12-20 at 14:36 -0700, Steven Dake wrote:
>> <totem broadcast="yes">
>
> Nope, it does not work as expected :(
>
> Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] AIS Executive Service
> RELEASE 'subrev 1887 version 0.80.6'
> Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] Copyright (C) 2002-2006
> MontaVista Software, Inc and contributors.
> Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] Copyright (C) 2006 Red
> Hat, Inc.
> Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] AIS Executive Service:
> started and ready to provide service.
> Dec 21 01:13:46 gate2 openais[24487]: [MAIN ] Using default multicast
> address of 239.192.213.177
>
>
> I've tried with both:
> <totem broadcast="yes"></totem>
> and
> <totem broadcast="yes"/>
>


If you're using cman you need to tell cman to enable broadcast (because 
it affects its internals too). So the correct key is

<cman broadcast="yes"/>

Chrissie



From jakov.sosic at srce.hr  Mon Dec 21 11:09:24 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Mon, 21 Dec 2009 12:09:24 +0100
Subject: [Linux-cluster] Rebooting qdisk master causes quorum to
 dissolve.
In-Reply-To: <DB208538359CE54C920E89A5038023EAC0313A@UQEXMB5.soe.uq.edu.au>
References: <DB208538359CE54C920E89A5038023EAC0313A@UQEXMB5.soe.uq.edu.au>
Message-ID: <1261393764.2280.2.camel@localhost>

On Mon, 2009-12-21 at 13:26 +1000, Peter Tiggerdine wrote:
> Hi,
> 
> I have a five node cluster with a shared quorum disk without heuristics.
> Because of the a hardware problem I need to move the services off the
> host in question and replace some ram. The services moved without a
> hitch, but soon as I rebooted the nodes the cluster came down.
> 
> The relevant configuration is 
> 
> <cluster alias="Services" config_version="150" name="Services">
>         <quorumd interval="5" tko="12" device="/dev/emcpowere" votes="3"
> log_level="9" log_facility="local4" status_file="/qdisk_status"/>
>         <fence_daemon clean_start="1" post_fail_delay="15"
> post_join_delay="30"/>
>         <cman deadnode_timeout="90" expected_nodes="4"/> 

Try something like this:


<cman quorum_dev_poll="25000"/>
<totem token="25000"/>

<quorumd interval="2" tko="10" votes="2" label="One2Play-SAS-qdisk"
status_file="/tmp/qdisk" stop_cman="1"/>

I think your token timeout and cman quorum_dev_poll should be few
seconds bigger than interval * tko (which is in my case 2x10=20 secs,
and other values are 25 secs).

This means that one node will be fenced after 25 seconds.




-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From jakov.sosic at srce.hr  Mon Dec 21 11:22:57 2009
From: jakov.sosic at srce.hr (Jakov Sosic)
Date: Mon, 21 Dec 2009 12:22:57 +0100
Subject: [Linux-cluster] openais on RHEL 5.4 and broadcast?
In-Reply-To: <4B2F3102.6080000@redhat.com>
References: <1261153075.1918.8.camel@localhost>
	<1261262433.2565.3.camel@localhost>
	<1261344987.2628.7.camel@localhost.localdomain>
	<1261354526.1663.19.camel@localhost>  <4B2F3102.6080000@redhat.com>
Message-ID: <1261394577.2280.8.camel@localhost>

On Mon, 2009-12-21 at 08:25 +0000, Christine Caulfield wrote:

> If you're using cman you need to tell cman to enable broadcast (because 
> it affects its internals too). So the correct key is
> 
> <cman broadcast="yes"/>

I've put it in both token and cman sections of XML file, but still no
progress. Aisexec is started with multicast address again - for some
reason. I've tried with forcing -4 option to ccsd
(via /etc/sysconfig/cman variable), but again that does not help. Idea
for that came from ccsd(8) which says:

-4     Use IPv4 for inter-node communication.  By default, IPv6 is
tried, then IPv4.

If  you  are  using IPv4, the default action is to use broadcast.
Specifying this option will cause multicast to be used in that
instance.


I guess I'll take a look at the source.


Thank you for your assistance.



-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
|                                                               |




From rmicmirregs at gmail.com  Mon Dec 21 20:05:03 2009
From: rmicmirregs at gmail.com (Rafael =?ISO-8859-1?Q?Mic=F3?= Miranda)
Date: Mon, 21 Dec 2009 21:05:03 +0100
Subject: [Linux-cluster] Quorum disk over RAID software device
In-Reply-To: <29ae894c0912161141t1085baf7t6bbba32a82820bc1@mail.gmail.com>
References: <1260828909.6558.24.camel@mecatol>
	<4B279DBC.4090102@pfaffeneder.org> <1260894414.1878.1.camel@localhost>
	<1260905003.7153.34.camel@mecatol>
	<29ae894c0912151215g433305ebncfde15dd10e124ea@mail.gmail.com>
	<1260990544.6687.23.camel@mecatol>
	<29ae894c0912161141t1085baf7t6bbba32a82820bc1@mail.gmail.com>
Message-ID: <1261425904.7365.7.camel@mecatol>

Hi Brem

El mi?, 16-12-2009 a las 20:41 +0100, brem belguebli escribi?:
> In my multipath setup I use the following :
> 
> polling_interval        3 (checks the storage every 3 seconds)
> no_path_retry     5 (will check 5 times the path if failure happens on
> it, making it  last scsi_timer (/sys/block/sdXX/device/timeout) + 5*3
> secondes )
> 
> path_grouping_policy    multibus (to load-balance accross all paths,
> group_by_prio may be recommended with MSA if it is an active/passive
> array?)
> 
> >From my experience,
> no_path_retry, when using mirror (md or LVM) could be put to fail
> instead of 5 in my case.
> 
> Concerning the flush_on_last_del, it just means that for a given LUN,
> when there is only one path remaining, if it comes to fail, what
> behaviour to adopt.
> 
> Same consideration, if using mirror, just fail.
> 
> The thing to take into account is the interval at which your qdisk
> process accesses the qdisk lun, if configured to a high value (let's
> imagine every 65 seconds) it'll take (worst case) 60 seconds of scsi
> timeout (default) + 12 times default polling interval  (30 seconds if
> I'm not wrong) + 5 seconds=  425 seconds.....
> 
> Brem
> 
> 2009/12/16 Rafael Mic? Miranda <rmicmirregs at gmail.com>:

> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

After some testings I have to drop the idea.

With both solutions (MDADM RAID  software and LVM Mirror) I have some
inconsistencies. When a disk of the pair fails sometimes it leaves the
qdisk device unreachable. I have tested it with different multipath
options (fail_if_no_queue with fail value) and I find the behaviour not
predictable.

I'm leaving the idea and I'll go back to a 6 nodes + 1 qdisk device
architecture.

Thanks to all. Cheers,

Rafael

-- 
Rafael Mic? Miranda



From mdiesburg at gmail.com  Tue Dec 22 17:23:33 2009
From: mdiesburg at gmail.com (Marty Diesburg)
Date: Tue, 22 Dec 2009 11:23:33 -0600
Subject: [Linux-cluster] Mysql.sh error.
Message-ID: <5370ab990912220923t5564be7do8a624929421992b9@mail.gmail.com>

Hi all,

I am new to the list and have an issue with the Mysql service.  It is
running, but when I run the commands /usr/share/cluster/mysql.sh restart, or
/usr/share/cluster/mysql.sh status I get
the following errors.  I am using mysql as a database for an email server
with Dovecot, Qmail, and Vpopmail.

Thanks and Happy Holidays!

Marty Diesburg
Adv. Tech
Independence Telcom



<?xml version="1.0"?>
<cluster alias="mailcluster" config_version="52" name="mailcluster">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="mailcluster-node0" nodeid="1" votes="1">
<fence>
<method name="1">
<device lanplus="" name="node0-ipmi"/>
</method>
</fence>
</clusternode>
<clusternode name="mailcluster-node1" nodeid="2" votes="1">
<fence>
<method name="1">
<device lanplus="" name="node1-ipmi"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" auth="password" ipaddr="24.149.30.43"
login="admin" name="node0-ipmi" passwd="admin"/>
<fencedevice agent="fence_ipmilan" auth="password" ipaddr="24.149.30.44"
login="admin" name="node1-ipmi" passwd="admin"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="mailcluster" ordered="0" restricted="1">
<failoverdomainnode name="mailcluster-node1" priority="1"/>
<failoverdomainnode name="mailcluster-node0" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<fs device="/dev/storage0/production" force_fsck="0" force_unmount="0"
fsid="37651" fstype="ext3" mountpoint="/production" name="production"
options="" self_fence="1"/>
<fs device="/dev/storage0/backup" force_fsck="0" force_unmount="0"
fsid="46184" fstype="ext3" mountpoint="/backup" name="backup" options=""
self_fence="1"/>
<ip address="24.149.30.40" monitor_link="1"/>
<mysql config_file="/production/mysql/etc/my.cnf"
listen_address="24.149.30.40" mysql_options="" name="productionsql"
shutdown_wait=""/>
<script file="/production/qmail/bin/qmailctl" name="qmail"/>
</resources>
<service autostart="1" domain="mailcluster" name="mailcluster"
recovery="restart">
<ip ref="24.149.30.40">
<fs ref="backup"/>
<fs ref="production"/>
<mysql ref="productionsql">
<fs ref="production"/>
</mysql>
<script ref="qmail">
<fs ref="production"/>
</script>
</ip>
</service>
</rm>
</cluster>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091222/85a43ef7/attachment.htm>

From mdiesburg at gmail.com  Tue Dec 22 17:27:29 2009
From: mdiesburg at gmail.com (Marty Diesburg)
Date: Tue, 22 Dec 2009 11:27:29 -0600
Subject: [Linux-cluster] Mysql.sh error.
Message-ID: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com>

Sorry, for the double-post, ---great way to start on the list :).  Below has
the error message as well "Failed - Invalid Name Of Service".

Hi all,

I am new to the list and have an issue with the Mysql service.  It is
running, but when I run the commands /usr/share/cluster/mysql.sh restart, or
/usr/share/cluster/mysql.sh status I get
the following errors.  I am using mysql as a database for an email server
with Dovecot, Qmail, and Vpopmail.

<debug>  Verifying Configuration Of default
<error>  Verifying Configuration Of default > Failed - Invalid Name Of
Service
<debug>  Monitoring Service default
<debug>  Monitoring Service default > Service Is Running




Thanks and Happy Holidays!

Marty Diesburg
Adv. Tech
Independence Telcom



<?xml version="1.0"?>
<cluster alias="mailcluster" config_version="52" name="mailcluster">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="mailcluster-node0" nodeid="1" votes="1">
<fence>
<method name="1">
<device lanplus="" name="node0-ipmi"/>
</method>
</fence>
</clusternode>
<clusternode name="mailcluster-node1" nodeid="2" votes="1">
<fence>
<method name="1">
<device lanplus="" name="node1-ipmi"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" auth="password" ipaddr="24.149.30.43"
login="admin" name="node0-ipmi" passwd="admin"/>
<fencedevice agent="fence_ipmilan" auth="password" ipaddr="24.149.30.44"
login="admin" name="node1-ipmi" passwd="admin"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="mailcluster" ordered="0" restricted="1">
<failoverdomainnode name="mailcluster-node1" priority="1"/>
<failoverdomainnode name="mailcluster-node0" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<fs device="/dev/storage0/production" force_fsck="0" force_unmount="0"
fsid="37651" fstype="ext3" mountpoint="/production" name="production"
options="" self_fence="1"/>
<fs device="/dev/storage0/backup" force_fsck="0" force_unmount="0"
fsid="46184" fstype="ext3" mountpoint="/backup" name="backup" options=""
self_fence="1"/>
<ip address="24.149.30.40" monitor_link="1"/>
<mysql config_file="/production/mysql/etc/my.cnf"
listen_address="24.149.30.40" mysql_options="" name="productionsql"
shutdown_wait=""/>
<script file="/production/qmail/bin/qmailctl" name="qmail"/>
</resources>
<service autostart="1" domain="mailcluster" name="mailcluster"
recovery="restart">
<ip ref="24.149.30.40">
<fs ref="backup"/>
<fs ref="production"/>
<mysql ref="productionsql">
<fs ref="production"/>
</mysql>
<script ref="qmail">
<fs ref="production"/>
</script>
</ip>
</service>
</rm>
</cluster>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091222/14ece5f7/attachment.htm>

From lhh at redhat.com  Tue Dec 22 18:38:13 2009
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 22 Dec 2009 13:38:13 -0500
Subject: [Linux-cluster] Mysql.sh error.
In-Reply-To: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com>
References: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com>
Message-ID: <1261507093.26419.83.camel@localhost.localdomain>

On Tue, 2009-12-22 at 11:27 -0600, Marty Diesburg wrote:
> Sorry, for the double-post, ---great way to start on the list :).
>  Below has the error message as well "Failed - Invalid Name Of
> Service".


> I am new to the list and have an issue with the Mysql service.  It is
> running, but when I run the commands /usr/share/cluster/mysql.sh
> restart, or /usr/share/cluster/mysql.sh status I get
> the following errors.  I am using mysql as a database for an email
> server with Dovecot, Qmail, and Vpopmail.
> 
> 
> <debug>  Verifying Configuration Of default
> <error>  Verifying Configuration Of default > Failed - Invalid Name Of
> Service
> <debug>  Monitoring Service default
> <debug>  Monitoring Service default > Service Is Running

Try:

rg_test test /etc/cluster/cluster.conf status mysql productionsql

For restarting, use 'clusvcadm -R mailcluster".  If you need to work on
your mysql instance while the rest of your application is running, you
need to do:

  clusvcadm -Z mailcluster
  rg_test test /etc/cluster/cluster.conf stop mysql productionsql
  [do stuff]
  rg_test test /etc/cluster/cluster.conf start mysql productionsql
  clusvcadm -U mailcluster

Your service can be simplified a lot, as well:

<service autostart="1" domain="mailcluster"
       name="mailcluster" recovery="restart">
  <fs ref="backup"/>
  <fs ref="production"/>
  <ip ref="24.149.30.40"/>
  <mysql ref="productionsql"/>
  <script ref="qmail"/>
</service>

-- Lon



From lhh at redhat.com  Tue Dec 22 21:49:35 2009
From: lhh at redhat.com (Lon Hohberger)
Date: Tue, 22 Dec 2009 16:49:35 -0500
Subject: [Linux-cluster] Rebooting qdisk master causes quorum to
 dissolve.
In-Reply-To: <DB208538359CE54C920E89A5038023EAC0313A@UQEXMB5.soe.uq.edu.au>
References: <DB208538359CE54C920E89A5038023EAC0313A@UQEXMB5.soe.uq.edu.au>
Message-ID: <1261518575.6351.67.camel@localhost.localdomain>

On Mon, 2009-12-21 at 13:26 +1000, Peter Tiggerdine wrote:
> Hi,
> 
> I have a five node cluster with a shared quorum disk without heuristics.
> Because of the a hardware problem I need to move the services off the
> host in question and replace some ram. The services moved without a
> hitch, but soon as I rebooted the nodes the cluster came down.
> 
> The relevant configuration is 
> 
> <cluster alias="Services" config_version="150" name="Services">
>         <quorumd interval="5" tko="12" device="/dev/emcpowere" votes="3"
> log_level="9" log_facility="local4" status_file="/qdisk_status"/>
>         <fence_daemon clean_start="1" post_fail_delay="15"
> post_join_delay="30"/>
>         <cman deadnode_timeout="90" expected_nodes="4"/> 

http://kbase.redhat.com/faq/docs/DOC-2882

-- Lon




From frank at si.ct.upc.edu  Wed Dec 23 12:24:29 2009
From: frank at si.ct.upc.edu (frank)
Date: Wed, 23 Dec 2009 13:24:29 +0100
Subject: [Linux-cluster] lock_dlm but local flocks = true?
Message-ID: <4B320BFD.6000106@si.ct.upc.edu>

Hi,
I'm running RHEL 5.4 with a GFS1 filesystem (over a SAN)
After checking the "ping_pong" test on the gfs filesystem I noticed 
counters was too much high, like doing the test with a local filesystem. 
But gfs filesystem is mounted with "lock_dlm" option. The only strange 
thing I see is in "gfs_tool df" output because it shows "Local flocks = 
TRUE" instead of "Local flocks = FALSE"

How is this possible? Can I change this in some way?
Thanks for your help.


--------------------------------------------------------------------------------
[root at heraclito Cluster]# gfs_tool df /mnt/gfs
/mnt/gfs:
   SB lock proto = "lock_dlm"
   SB lock table = "hr-pm:gfs01"
   SB ondisk format = 1309
   SB multihost format = 1401
   Block size = 4096
   Journals = 2
   Resource Groups = 4096
   Mounted lock proto = "lock_dlm"
   Mounted lock table = "hr-pm:gfs01"
   Mounted host data = "jid=0:id=65537:first=1"
   Journal number = 0
   Lock module flags = 0
   Local flocks = TRUE
   Local caching = FALSE
   Oopses OK = FALSE
--------------------------------------------------------------------------------

-- 
Aquest missatge ha estat analitzat per MailScanner
a la cerca de virus i d'altres continguts perillosos,
i es considera que est? net.
For all your IT requirements visit: http://www.transtec.co.uk



From swhiteho at redhat.com  Wed Dec 23 13:15:46 2009
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 23 Dec 2009 13:15:46 +0000
Subject: [Linux-cluster] lock_dlm but local flocks = true?
In-Reply-To: <4B320BFD.6000106@si.ct.upc.edu>
References: <4B320BFD.6000106@si.ct.upc.edu>
Message-ID: <1261574146.2219.4.camel@localhost>

Hi,

On Wed, 2009-12-23 at 13:24 +0100, frank wrote:
> Hi,
> I'm running RHEL 5.4 with a GFS1 filesystem (over a SAN)
> After checking the "ping_pong" test on the gfs filesystem I noticed 
> counters was too much high, like doing the test with a local filesystem. 
> But gfs filesystem is mounted with "lock_dlm" option. The only strange 
> thing I see is in "gfs_tool df" output because it shows "Local flocks = 
> TRUE" instead of "Local flocks = FALSE"
> 
> How is this possible? Can I change this in some way?
> Thanks for your help.
> 
> 
You need to avoid specifying the localflocks mount parameter if you want
the flocks to be true cluster locks,

Steve.




From diamondiona at gmail.com  Wed Dec 23 14:32:12 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 23 Dec 2009 22:32:12 +0800
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
Message-ID: <dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>

On Wed, Dec 23, 2009 at 6:43 PM, Diamond Li <diamondiona at gmail.com> wrote:
> [root at wplccdlvm446 ~]# uname -r
> 2.6.18-164.el5
>
>
> On Wed, Dec 23, 2009 at 6:41 PM, Diamond Li <diamondiona at gmail.com> wrote:
>> same result, every I added nosync parameter.
>>
>> [root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
>> -L 800M ? vg00
>>
>> On Wed, Dec 23, 2009 at 6:28 PM, Diamond Li <diamondiona at gmail.com> wrote:
>>> Hello ?everyone,
>>>
>>> I am trying to create a mirror LVM on cluster, but lvcreate hangs
>>> every time and I have to use kill from another terminal. According to
>>> release note, this should be supported since ?5.3
>>>
>>> Any words from wisdoms?
>>>
>>> [root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
>>> #lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
>>> core -L 800M vg00
>>> #lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
>>> #config/config.c:950 ? ? ? Setting global/locking_type to 3
>>> #locking/locking.c:253 ? ? ? Cluster locking selected.
>>> #activate/activate.c:363 ? ? ? Getting target version for linear
>>> #ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
>>> #ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>> #activate/activate.c:363 ? ? ? Getting target version for striped
>>> #ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>> #lvcreate.c:318 ? ? Setting logging type to core
>>> #config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
>>> #lvcreate.c:997 ? ? Finding volume group "vg00"
>>> #locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>>>
>>> [root at wplccdlvm446 ~]# cat /etc/redhat-release
>>> Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>>>
>>
>



From diamondiona at gmail.com  Wed Dec 23 14:38:34 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 23 Dec 2009 22:38:34 +0800
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
Message-ID: <dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>

Hello  everyone,

 I am trying to create a mirror LVM on cluster, but lvcreate hangs
 every time and I have to use kill from another terminal. According to
 release note, this should be supported since  5.3

 Any words from wisdoms?

 [root at wplccdlvm446 ~]# lvcreate -vvvv  -m1 --mirrorlog core -L 800M   vg00
 #lvmcmdline.c:987         Processing: lvcreate -vvvv -m1 --mirrorlog
 core -L 800M vg00
 #lvmcmdline.c:990         O_DIRECT will be used
 #config/config.c:950       Setting global/locking_type to 3
 #locking/locking.c:253       Cluster locking selected.
 #activate/activate.c:363       Getting target version for linear
 #ioctl/libdm-iface.c:1672         dm version   OF   [16384]
 #ioctl/libdm-iface.c:1672         dm versions   OF   [16384]
 #activate/activate.c:363       Getting target version for striped
 #ioctl/libdm-iface.c:1672         dm versions   OF   [16384]
 #lvcreate.c:318     Setting logging type to core
 #config/config.c:950       Setting activation/mirror_region_size to 512
 #lvcreate.c:997     Finding volume group "vg00"
 #locking/cluster_locking.c:458       Locking VG V_vg00 PW B (0x4)

 [root at wplccdlvm446 ~]# cat /etc/redhat-release
 Red Hat Enterprise Linux Server release 5.4 (Tikanga)

[root at wplccdlvm446 ~]# uname -r
 2.6.18-164.el5
 same result, every I added nosync parameter.

 [root at wplccdlvm446 ~]# lvcreate -vvvv  --nosync -m1 --mirrorlog core
 -L 800M   vg00



From frank at si.ct.upc.edu  Wed Dec 23 14:53:10 2009
From: frank at si.ct.upc.edu (frank)
Date: Wed, 23 Dec 2009 15:53:10 +0100
Subject: [Linux-cluster] lock_dlm but local flocks = true?
In-Reply-To: <mailman.4086.1261579137.11679.linux-cluster@redhat.com>
References: <mailman.4086.1261579137.11679.linux-cluster@redhat.com>
Message-ID: <4B322ED6.6090607@si.ct.upc.edu>

Hi Steve, thanks for your answer
but I have not put the "localflocks" mount parameter anywhere. Look at 
"gfs_tool df" output:

# gfs_tool df /mnt/gfs
/mnt/gfs:
   SB lock proto = "lock_dlm"
   SB lock table = "H-N:gfs01"
   SB ondisk format = 1309
   SB multihost format = 1401
   Block size = 4096
   Journals = 2
   Resource Groups = 200
   Mounted lock proto = "lock_dlm"
   Mounted lock table = "H-N:gfs01"
   Mounted host data = "jid=0:id=196610:first=1"
   Journal number = 0
   Lock module flags = 0
   Local flocks = TRUE
   Local caching = FALSE
   Oopses OK = FALSE

it says 'Mounted lock proto = "lock_dlm" ' because that is what I did. 
So why is it using "local flocks"?

Frank
> Date: Wed, 23 Dec 2009 13:15:46 +0000 From: Steven Whitehouse 
> <swhiteho at redhat.com> To: linux clustering <linux-cluster at redhat.com> 
> Subject: Re: [Linux-cluster] lock_dlm but local flocks = true? 
> Message-ID: <1261574146.2219.4.camel at localhost> Content-Type: 
> text/plain; charset="UTF-8" Hi, On Wed, 2009-12-23 at 13:24 +0100, 
> frank wrote:
>> >  Hi,
>> >  I'm running RHEL 5.4 with a GFS1 filesystem (over a SAN)
>> >  After checking the "ping_pong" test on the gfs filesystem I noticed
>> >  counters was too much high, like doing the test with a local filesystem.
>> >  But gfs filesystem is mounted with "lock_dlm" option. The only strange
>> >  thing I see is in "gfs_tool df" output because it shows "Local flocks =
>> >  TRUE" instead of "Local flocks = FALSE"
>> >  
>> >  How is this possible? Can I change this in some way?
>> >  Thanks for your help.
>> >  
>> >  
>>      
> You need to avoid specifying the localflocks mount parameter if you want
> the flocks to be true cluster locks,
>
> Steve.
>    


-- 
Aquest missatge ha estat analitzat per MailScanner
a la cerca de virus i d'altres continguts perillosos,
i es considera que est? net.
For all your IT requirements visit: http://www.transtec.co.uk



From swhiteho at redhat.com  Wed Dec 23 15:15:28 2009
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 23 Dec 2009 15:15:28 +0000
Subject: [Linux-cluster] lock_dlm but local flocks = true?
In-Reply-To: <4B322ED6.6090607@si.ct.upc.edu>
References: <mailman.4086.1261579137.11679.linux-cluster@redhat.com>
	<4B322ED6.6090607@si.ct.upc.edu>
Message-ID: <1261581328.14393.113.camel@localhost.localdomain>

Hi,

On Wed, 2009-12-23 at 15:53 +0100, frank wrote:
> Hi Steve, thanks for your answer
> but I have not put the "localflocks" mount parameter anywhere. Look at 
> "gfs_tool df" output:
> 
> # gfs_tool df /mnt/gfs
> /mnt/gfs:
>    SB lock proto = "lock_dlm"
>    SB lock table = "H-N:gfs01"
>    SB ondisk format = 1309
>    SB multihost format = 1401
>    Block size = 4096
>    Journals = 2
>    Resource Groups = 200
>    Mounted lock proto = "lock_dlm"
>    Mounted lock table = "H-N:gfs01"
>    Mounted host data = "jid=0:id=196610:first=1"
>    Journal number = 0
>    Lock module flags = 0
>    Local flocks = TRUE
>    Local caching = FALSE
>    Oopses OK = FALSE
> 
> it says 'Mounted lock proto = "lock_dlm" ' because that is what I did. 
> So why is it using "local flocks"?
> 
I don't know. What does it say in /proc/mounts? (or what was your mount
command line?)

Steve.




From diamondiona at gmail.com  Thu Dec 24 12:40:34 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Thu, 24 Dec 2009 20:40:34 +0800
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
Message-ID: <dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>

Could someone  kindly help me to get through because I haven been
blocked for very long time.


On Wed, Dec 23, 2009 at 10:38 PM, Diamond Li <diamondiona at gmail.com> wrote:
> Hello ?everyone,
>
> ?I am trying to create a mirror LVM on cluster, but lvcreate hangs
> ?every time and I have to use kill from another terminal. According to
> ?release note, this should be supported since ?5.3
>
> ?Any words from wisdoms?
>
> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
> ?#lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
> ?core -L 800M vg00
> ?#lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
> ?#config/config.c:950 ? ? ? Setting global/locking_type to 3
> ?#locking/locking.c:253 ? ? ? Cluster locking selected.
> ?#activate/activate.c:363 ? ? ? Getting target version for linear
> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
> ?#activate/activate.c:363 ? ? ? Getting target version for striped
> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
> ?#lvcreate.c:318 ? ? Setting logging type to core
> ?#config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
> ?#lvcreate.c:997 ? ? Finding volume group "vg00"
> ?#locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>
> ?[root at wplccdlvm446 ~]# cat /etc/redhat-release
> ?Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>
> [root at wplccdlvm446 ~]# uname -r
> ?2.6.18-164.el5
> ?same result, every I added nosync parameter.
>
> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
> ?-L 800M ? vg00
>



From brem.belguebli at gmail.com  Fri Dec 25 00:04:40 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Fri, 25 Dec 2009 01:04:40 +0100
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
Message-ID: <29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>

try to run cmirror (service cmirror start) before doing any clvm with mirror



2009/12/24 Diamond Li <diamondiona at gmail.com>:
> Could someone ?kindly help me to get through because I haven been
> blocked for very long time.
>
>
> On Wed, Dec 23, 2009 at 10:38 PM, Diamond Li <diamondiona at gmail.com> wrote:
>> Hello ?everyone,
>>
>> ?I am trying to create a mirror LVM on cluster, but lvcreate hangs
>> ?every time and I have to use kill from another terminal. According to
>> ?release note, this should be supported since ?5.3
>>
>> ?Any words from wisdoms?
>>
>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
>> ?#lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
>> ?core -L 800M vg00
>> ?#lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
>> ?#config/config.c:950 ? ? ? Setting global/locking_type to 3
>> ?#locking/locking.c:253 ? ? ? Cluster locking selected.
>> ?#activate/activate.c:363 ? ? ? Getting target version for linear
>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>> ?#activate/activate.c:363 ? ? ? Getting target version for striped
>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>> ?#lvcreate.c:318 ? ? Setting logging type to core
>> ?#config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
>> ?#lvcreate.c:997 ? ? Finding volume group "vg00"
>> ?#locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>>
>> ?[root at wplccdlvm446 ~]# cat /etc/redhat-release
>> ?Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>>
>> [root at wplccdlvm446 ~]# uname -r
>> ?2.6.18-164.el5
>> ?same result, every I added nosync parameter.
>>
>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
>> ?-L 800M ? vg00
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From diamondiona at gmail.com  Mon Dec 28 02:55:14 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Mon, 28 Dec 2009 10:55:14 +0800
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
	<29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
Message-ID: <dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>

thanks for your reply, I have started cmirror:
[root at wplccdlvm445 ~]# service cmirror status
cmirror is running.
[root at wplccdlvm445 ~]# service clvmd status
clvmd (pid 5392) is running...

[root at wplccdlvm445 ~]# service cman  status
cman is running.
[root at wplccdlvm445 ~]# clustat
Cluster Status for clearcase @ Mon Dec 28 10:54:49 2009
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 wplccdlvm445.cn.ibm.com                               1 Online, Local
 wplccdlvm446.cn.ibm.com                               2 Online

[root at wplccdlvm445 ~]# lvcreate  -vvv -m1 --corelog -L 800M   vg100
        Processing: lvcreate -vvv -m1 --corelog -L 800M vg100
        O_DIRECT will be used
      Setting global/locking_type to 3
      Cluster locking selected.
      Getting target version for linear
        dm version   OF   [16384]
        dm versions   OF   [16384]
      Getting target version for striped
        dm versions   OF   [16384]
    Setting logging type to core
      Setting activation/mirror_region_size to 512
    Finding volume group "vg100"
      Locking VG V_vg100 PW B (0x4)



On Fri, Dec 25, 2009 at 8:04 AM, brem belguebli
<brem.belguebli at gmail.com> wrote:
> try to run cmirror (service cmirror start) before doing any clvm with mirror
>
>
>
> 2009/12/24 Diamond Li <diamondiona at gmail.com>:
>> Could someone ?kindly help me to get through because I haven been
>> blocked for very long time.
>>
>>
>> On Wed, Dec 23, 2009 at 10:38 PM, Diamond Li <diamondiona at gmail.com> wrote:
>>> Hello ?everyone,
>>>
>>> ?I am trying to create a mirror LVM on cluster, but lvcreate hangs
>>> ?every time and I have to use kill from another terminal. According to
>>> ?release note, this should be supported since ?5.3
>>>
>>> ?Any words from wisdoms?
>>>
>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
>>> ?#lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
>>> ?core -L 800M vg00
>>> ?#lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
>>> ?#config/config.c:950 ? ? ? Setting global/locking_type to 3
>>> ?#locking/locking.c:253 ? ? ? Cluster locking selected.
>>> ?#activate/activate.c:363 ? ? ? Getting target version for linear
>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>> ?#activate/activate.c:363 ? ? ? Getting target version for striped
>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>> ?#lvcreate.c:318 ? ? Setting logging type to core
>>> ?#config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
>>> ?#lvcreate.c:997 ? ? Finding volume group "vg00"
>>> ?#locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>>>
>>> ?[root at wplccdlvm446 ~]# cat /etc/redhat-release
>>> ?Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>>>
>>> [root at wplccdlvm446 ~]# uname -r
>>> ?2.6.18-164.el5
>>> ?same result, every I added nosync parameter.
>>>
>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
>>> ?-L 800M ? vg00
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From raju.rajsand at gmail.com  Mon Dec 28 04:54:14 2009
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Mon, 28 Dec 2009 10:24:14 +0530
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
	<29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
	<dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
Message-ID: <8786b91c0912272054q642ab5e2wf64f167355fa9641@mail.gmail.com>

Greetings,

I am not an expert.

But I do remember that some features of LVM like snapshot,mirroring
are not applicable when in clustered mode...

Of course I am not sure if these limitations have been removed and if
so sinch whuich version of what component of the stack.

Regards,

Rajagopal



From diamondiona at gmail.com  Mon Dec 28 05:45:15 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Mon, 28 Dec 2009 13:45:15 +0800
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <8786b91c0912272054q642ab5e2wf64f167355fa9641@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
	<29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
	<dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
	<8786b91c0912272054q642ab5e2wf64f167355fa9641@mail.gmail.com>
Message-ID: <dd23a5e0912272145k37a24269u75b8ccc9a8fbe7bf@mail.gmail.com>

since 5.3, it should support, at least from release note:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Release_Notes/index.html#id787533
<snip from 5.3 release note>

LVM-based Cluster mirroring
    With this update, the ability to create LVM mirrors in a cluster
environment (i.e. while using CLVM) is now available in Red Hat
Enterprise Linux It provides for simultaneous access from multiple
cluster machines, like when using a cluster-aware file system. This
solution is compatible with existing single-machine mirrors. When
switching a mirrored logical volume between single-machine and
cluster-aware, no resynchronization is necessary.

<snip from 5.3 release note>

On Mon, Dec 28, 2009 at 12:54 PM, Rajagopal Swaminathan
<raju.rajsand at gmail.com> wrote:
> Greetings,
>
> I am not an expert.
>
> But I do remember that some features of LVM like snapshot,mirroring
> are not applicable when in clustered mode...
>
> Of course I am not sure if these limitations have been removed and if
> so sinch whuich version of what component of the stack.
>
> Regards,
>
> Rajagopal
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From brem.belguebli at gmail.com  Mon Dec 28 07:22:02 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Mon, 28 Dec 2009 08:22:02 +0100
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
	<29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
	<dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
Message-ID: <29ae894c0912272322u71f09eb4v353747051625be86@mail.gmail.com>

did it work ?

2009/12/28 Diamond Li <diamondiona at gmail.com>:
> thanks for your reply, I have started cmirror:
> [root at wplccdlvm445 ~]# service cmirror status
> cmirror is running.
> [root at wplccdlvm445 ~]# service clvmd status
> clvmd (pid 5392) is running...
>
> [root at wplccdlvm445 ~]# service cman ?status
> cman is running.
> [root at wplccdlvm445 ~]# clustat
> Cluster Status for clearcase @ Mon Dec 28 10:54:49 2009
> Member Status: Quorate
>
> ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status
> ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------
> ?wplccdlvm445.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 Online, Local
> ?wplccdlvm446.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2 Online
>
> [root at wplccdlvm445 ~]# lvcreate ?-vvv -m1 --corelog -L 800M ? vg100
> ? ? ? ?Processing: lvcreate -vvv -m1 --corelog -L 800M vg100
> ? ? ? ?O_DIRECT will be used
> ? ? ?Setting global/locking_type to 3
> ? ? ?Cluster locking selected.
> ? ? ?Getting target version for linear
> ? ? ? ?dm version ? OF ? [16384]
> ? ? ? ?dm versions ? OF ? [16384]
> ? ? ?Getting target version for striped
> ? ? ? ?dm versions ? OF ? [16384]
> ? ?Setting logging type to core
> ? ? ?Setting activation/mirror_region_size to 512
> ? ?Finding volume group "vg100"
> ? ? ?Locking VG V_vg100 PW B (0x4)
>
>
>
> On Fri, Dec 25, 2009 at 8:04 AM, brem belguebli
> <brem.belguebli at gmail.com> wrote:
>> try to run cmirror (service cmirror start) before doing any clvm with mirror
>>
>>
>>
>> 2009/12/24 Diamond Li <diamondiona at gmail.com>:
>>> Could someone ?kindly help me to get through because I haven been
>>> blocked for very long time.
>>>
>>>
>>> On Wed, Dec 23, 2009 at 10:38 PM, Diamond Li <diamondiona at gmail.com> wrote:
>>>> Hello ?everyone,
>>>>
>>>> ?I am trying to create a mirror LVM on cluster, but lvcreate hangs
>>>> ?every time and I have to use kill from another terminal. According to
>>>> ?release note, this should be supported since ?5.3
>>>>
>>>> ?Any words from wisdoms?
>>>>
>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
>>>> ?#lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
>>>> ?core -L 800M vg00
>>>> ?#lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
>>>> ?#config/config.c:950 ? ? ? Setting global/locking_type to 3
>>>> ?#locking/locking.c:253 ? ? ? Cluster locking selected.
>>>> ?#activate/activate.c:363 ? ? ? Getting target version for linear
>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>> ?#activate/activate.c:363 ? ? ? Getting target version for striped
>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>> ?#lvcreate.c:318 ? ? Setting logging type to core
>>>> ?#config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
>>>> ?#lvcreate.c:997 ? ? Finding volume group "vg00"
>>>> ?#locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>>>>
>>>> ?[root at wplccdlvm446 ~]# cat /etc/redhat-release
>>>> ?Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>>>>
>>>> [root at wplccdlvm446 ~]# uname -r
>>>> ?2.6.18-164.el5
>>>> ?same result, every I added nosync parameter.
>>>>
>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
>>>> ?-L 800M ? vg00
>>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From diamondiona at gmail.com  Mon Dec 28 08:04:40 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Mon, 28 Dec 2009 16:04:40 +0800
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <29ae894c0912272322u71f09eb4v353747051625be86@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
	<29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
	<dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
	<29ae894c0912272322u71f09eb4v353747051625be86@mail.gmail.com>
Message-ID: <dd23a5e0912280004w11565736se059d75e74ad216@mail.gmail.com>

not at all, it hangs again.

On Mon, Dec 28, 2009 at 3:22 PM, brem belguebli
<brem.belguebli at gmail.com> wrote:
> did it work ?
>
> 2009/12/28 Diamond Li <diamondiona at gmail.com>:
>> thanks for your reply, I have started cmirror:
>> [root at wplccdlvm445 ~]# service cmirror status
>> cmirror is running.
>> [root at wplccdlvm445 ~]# service clvmd status
>> clvmd (pid 5392) is running...
>>
>> [root at wplccdlvm445 ~]# service cman ?status
>> cman is running.
>> [root at wplccdlvm445 ~]# clustat
>> Cluster Status for clearcase @ Mon Dec 28 10:54:49 2009
>> Member Status: Quorate
>>
>> ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status
>> ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------
>> ?wplccdlvm445.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 Online, Local
>> ?wplccdlvm446.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2 Online
>>
>> [root at wplccdlvm445 ~]# lvcreate ?-vvv -m1 --corelog -L 800M ? vg100
>> ? ? ? ?Processing: lvcreate -vvv -m1 --corelog -L 800M vg100
>> ? ? ? ?O_DIRECT will be used
>> ? ? ?Setting global/locking_type to 3
>> ? ? ?Cluster locking selected.
>> ? ? ?Getting target version for linear
>> ? ? ? ?dm version ? OF ? [16384]
>> ? ? ? ?dm versions ? OF ? [16384]
>> ? ? ?Getting target version for striped
>> ? ? ? ?dm versions ? OF ? [16384]
>> ? ?Setting logging type to core
>> ? ? ?Setting activation/mirror_region_size to 512
>> ? ?Finding volume group "vg100"
>> ? ? ?Locking VG V_vg100 PW B (0x4)
>>
>>
>>
>> On Fri, Dec 25, 2009 at 8:04 AM, brem belguebli
>> <brem.belguebli at gmail.com> wrote:
>>> try to run cmirror (service cmirror start) before doing any clvm with mirror
>>>
>>>
>>>
>>> 2009/12/24 Diamond Li <diamondiona at gmail.com>:
>>>> Could someone ?kindly help me to get through because I haven been
>>>> blocked for very long time.
>>>>
>>>>
>>>> On Wed, Dec 23, 2009 at 10:38 PM, Diamond Li <diamondiona at gmail.com> wrote:
>>>>> Hello ?everyone,
>>>>>
>>>>> ?I am trying to create a mirror LVM on cluster, but lvcreate hangs
>>>>> ?every time and I have to use kill from another terminal. According to
>>>>> ?release note, this should be supported since ?5.3
>>>>>
>>>>> ?Any words from wisdoms?
>>>>>
>>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
>>>>> ?#lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
>>>>> ?core -L 800M vg00
>>>>> ?#lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
>>>>> ?#config/config.c:950 ? ? ? Setting global/locking_type to 3
>>>>> ?#locking/locking.c:253 ? ? ? Cluster locking selected.
>>>>> ?#activate/activate.c:363 ? ? ? Getting target version for linear
>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>>> ?#activate/activate.c:363 ? ? ? Getting target version for striped
>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>>> ?#lvcreate.c:318 ? ? Setting logging type to core
>>>>> ?#config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
>>>>> ?#lvcreate.c:997 ? ? Finding volume group "vg00"
>>>>> ?#locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>>>>>
>>>>> ?[root at wplccdlvm446 ~]# cat /etc/redhat-release
>>>>> ?Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>>>>>
>>>>> [root at wplccdlvm446 ~]# uname -r
>>>>> ?2.6.18-164.el5
>>>>> ?same result, every I added nosync parameter.
>>>>>
>>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
>>>>> ?-L 800M ? vg00
>>>>>
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From diamondiona at gmail.com  Mon Dec 28 08:32:36 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Mon, 28 Dec 2009 16:32:36 +0800
Subject: [Linux-cluster] Configuration Between Real Servers and GFS Cluster
Message-ID: <dd23a5e0912280032vc941c14y619efb3e52a8eccc@mail.gmail.com>

Hello everyone,

I am trying to setup a test file server environment using the
following architecture:
   layer1: active and standby LVS router
   layer2:  real servers, providing ftp, samba, and http service to end users
   layer3:  GFS servers
   layer4:  iSCSI target servers.

I am planning to use nfs connecting real and GFS servers, and real
servers pvovide ftp, samba, and http services to outside world.


What truly confusing me is the configuration between real servers and
GFS cluster.

There is a reference from redhat, addressing this isse:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Configuration_Example_-_NFS_Over_GFS/index.html

But my concern is: if I use HA solution, for instance I have 5 GFS
servers, but only one of them is serving nfs service. That means 1
server is overloaded, the other are starving in meantime.

Is there any better solutions to make 5 GFS servers provide nfs
service concurrently?

Wish to hear  from words from wisdom!



From gordan at bobich.net  Mon Dec 28 08:50:16 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Mon, 28 Dec 2009 08:50:16 +0000
Subject: [Linux-cluster] Configuration Between Real Servers and GFS
	Cluster
In-Reply-To: <dd23a5e0912280032vc941c14y619efb3e52a8eccc@mail.gmail.com>
References: <dd23a5e0912280032vc941c14y619efb3e52a8eccc@mail.gmail.com>
Message-ID: <4B387148.4010603@bobich.net>

Diamond Li wrote:

> But my concern is: if I use HA solution, for instance I have 5 GFS
> servers, but only one of them is serving nfs service. That means 1
> server is overloaded, the other are starving in meantime.
> 
> Is there any better solutions to make 5 GFS servers provide nfs
> service concurrently?

There is nothing to stop you from having NFS exports run on all of the 
nodes simultaneously. Divide up your clients into groups so that each 
group mounts a different NFS server, and have the floating IP of the NFS 
servers as a fail-over resource.

There are, however, a lot of potential performance issues you will 
likely encounter in such a setup (attribute caching on NFS, lock caching 
on GFS, etc.).

Gordan



From brem.belguebli at gmail.com  Mon Dec 28 08:51:32 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Mon, 28 Dec 2009 09:51:32 +0100
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <dd23a5e0912280004w11565736se059d75e74ad216@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230241k63fa7f77i9946baa3a46d537d@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
	<29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
	<dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
	<29ae894c0912272322u71f09eb4v353747051625be86@mail.gmail.com>
	<dd23a5e0912280004w11565736se059d75e74ad216@mail.gmail.com>
Message-ID: <29ae894c0912280051x11b0b93ahb3e57b6232cf7566@mail.gmail.com>

In my setup, cmirror is started before clvm, it may be the reason.

2009/12/28 Diamond Li <diamondiona at gmail.com>:
> not at all, it hangs again.
>
> On Mon, Dec 28, 2009 at 3:22 PM, brem belguebli
> <brem.belguebli at gmail.com> wrote:
>> did it work ?
>>
>> 2009/12/28 Diamond Li <diamondiona at gmail.com>:
>>> thanks for your reply, I have started cmirror:
>>> [root at wplccdlvm445 ~]# service cmirror status
>>> cmirror is running.
>>> [root at wplccdlvm445 ~]# service clvmd status
>>> clvmd (pid 5392) is running...
>>>
>>> [root at wplccdlvm445 ~]# service cman ?status
>>> cman is running.
>>> [root at wplccdlvm445 ~]# clustat
>>> Cluster Status for clearcase @ Mon Dec 28 10:54:49 2009
>>> Member Status: Quorate
>>>
>>> ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status
>>> ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------
>>> ?wplccdlvm445.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 Online, Local
>>> ?wplccdlvm446.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2 Online
>>>
>>> [root at wplccdlvm445 ~]# lvcreate ?-vvv -m1 --corelog -L 800M ? vg100
>>> ? ? ? ?Processing: lvcreate -vvv -m1 --corelog -L 800M vg100
>>> ? ? ? ?O_DIRECT will be used
>>> ? ? ?Setting global/locking_type to 3
>>> ? ? ?Cluster locking selected.
>>> ? ? ?Getting target version for linear
>>> ? ? ? ?dm version ? OF ? [16384]
>>> ? ? ? ?dm versions ? OF ? [16384]
>>> ? ? ?Getting target version for striped
>>> ? ? ? ?dm versions ? OF ? [16384]
>>> ? ?Setting logging type to core
>>> ? ? ?Setting activation/mirror_region_size to 512
>>> ? ?Finding volume group "vg100"
>>> ? ? ?Locking VG V_vg100 PW B (0x4)
>>>
>>>
>>>
>>> On Fri, Dec 25, 2009 at 8:04 AM, brem belguebli
>>> <brem.belguebli at gmail.com> wrote:
>>>> try to run cmirror (service cmirror start) before doing any clvm with mirror
>>>>
>>>>
>>>>
>>>> 2009/12/24 Diamond Li <diamondiona at gmail.com>:
>>>>> Could someone ?kindly help me to get through because I haven been
>>>>> blocked for very long time.
>>>>>
>>>>>
>>>>> On Wed, Dec 23, 2009 at 10:38 PM, Diamond Li <diamondiona at gmail.com> wrote:
>>>>>> Hello ?everyone,
>>>>>>
>>>>>> ?I am trying to create a mirror LVM on cluster, but lvcreate hangs
>>>>>> ?every time and I have to use kill from another terminal. According to
>>>>>> ?release note, this should be supported since ?5.3
>>>>>>
>>>>>> ?Any words from wisdoms?
>>>>>>
>>>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
>>>>>> ?#lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
>>>>>> ?core -L 800M vg00
>>>>>> ?#lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
>>>>>> ?#config/config.c:950 ? ? ? Setting global/locking_type to 3
>>>>>> ?#locking/locking.c:253 ? ? ? Cluster locking selected.
>>>>>> ?#activate/activate.c:363 ? ? ? Getting target version for linear
>>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
>>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>>>> ?#activate/activate.c:363 ? ? ? Getting target version for striped
>>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>>>> ?#lvcreate.c:318 ? ? Setting logging type to core
>>>>>> ?#config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
>>>>>> ?#lvcreate.c:997 ? ? Finding volume group "vg00"
>>>>>> ?#locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>>>>>>
>>>>>> ?[root at wplccdlvm446 ~]# cat /etc/redhat-release
>>>>>> ?Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>>>>>>
>>>>>> [root at wplccdlvm446 ~]# uname -r
>>>>>> ?2.6.18-164.el5
>>>>>> ?same result, every I added nosync parameter.
>>>>>>
>>>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
>>>>>> ?-L 800M ? vg00
>>>>>>
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>>
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From diamondiona at gmail.com  Mon Dec 28 09:33:50 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Mon, 28 Dec 2009 17:33:50 +0800
Subject: [Linux-cluster] lvcreate hangs always
In-Reply-To: <29ae894c0912280051x11b0b93ahb3e57b6232cf7566@mail.gmail.com>
References: <dd23a5e0912230228y291a2548qb8c7e4b51c06d047@mail.gmail.com>
	<dd23a5e0912230243s6761f43m162a9f9cd9c7cced@mail.gmail.com>
	<dd23a5e0912230632ieeeba3cs9c445d7e3a795d09@mail.gmail.com>
	<dd23a5e0912230638q66888a9dp1b8f0bcf5911d2fa@mail.gmail.com>
	<dd23a5e0912240440s55466ab3n3a04236aad6731c3@mail.gmail.com>
	<29ae894c0912241604r6b5f5bbo225e9d8ae7f9890@mail.gmail.com>
	<dd23a5e0912271855v56cfdfccm570085f2d303030c@mail.gmail.com>
	<29ae894c0912272322u71f09eb4v353747051625be86@mail.gmail.com>
	<dd23a5e0912280004w11565736se059d75e74ad216@mail.gmail.com>
	<29ae894c0912280051x11b0b93ahb3e57b6232cf7566@mail.gmail.com>
Message-ID: <dd23a5e0912280133u123c6f0cn42a3dae81329d2ac@mail.gmail.com>

it hangs as always(after locking VG). From redhat document, it said
clvmd should start before cmirror, or the sequence does not matter at
all.

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Logical_Volume_Manager_Administration/mirvol_create_ex.html


[root at wplccdlvm445 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                           [  OK  ]
[root at wplccdlvm445 ~]# service cmirror start
Loading clustered mirror log module:                       [  OK  ]
Starting clustered mirror log server:                      [  OK  ]
[root at wplccdlvm445 ~]# service clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   0 logical volume(s) in volume group "vg100" now active
  2 logical volume(s) in volume group "VolGroup00" now active
                                                           [  OK  ]


[root at wplccdlvm445 ~]# lvcreate -vvv -m1 --corelog -L 800M   vg100
        Processing: lvcreate -vvv -m1 --corelog -L 800M vg100
        O_DIRECT will be used
      Setting global/locking_type to 3
      Cluster locking selected.
      Getting target version for linear
        dm version   OF   [16384]
        dm versions   OF   [16384]
      Getting target version for striped
        dm versions   OF   [16384]
    Setting logging type to core
      Setting activation/mirror_region_size to 512
    Finding volume group "vg100"
      Locking VG V_vg100 PW B (0x4)
        Opened /dev/ramdisk RW O_DIRECT
        /dev/ramdisk: block size is 4096 bytes
      /dev/ramdisk: No label detected
        Closed /dev/ramdisk
        Opened /dev/root RW O_DIRECT
        /dev/root: block size is 4096 bytes
      /dev/root: No label detected
        Closed /dev/root
        Opened /dev/ram RW O_DIRECT
        /dev/ram: block size is 4096 bytes
      /dev/ram: No label detected
        Closed /dev/ram
        Opened /dev/sda1 RW O_DIRECT
        /dev/sda1: block size is 1024 bytes
      /dev/sda1: No label detected
        Closed /dev/sda1
        Opened /dev/VolGroup00/LogVol01 RW O_DIRECT
        /dev/VolGroup00/LogVol01: block size is 4096 bytes
      /dev/VolGroup00/LogVol01: No label detected
        Closed /dev/VolGroup00/LogVol01
        Opened /dev/ram2 RW O_DIRECT
        /dev/ram2: block size is 4096 bytes
     /dev/ram2: No label detected
        Closed /dev/ram2
        Opened /dev/sda2 RW O_DIRECT
        /dev/sda2: block size is 512 bytes
      /dev/sda2: lvm2 label detected
        lvmcache: /dev/sda2: now in VG #orphans_lvm2 (#orphans_lvm2)
        /dev/sda2: Found metadata at 6656 size 1150 (in area at 4096
size 192512) for VolGroup00 (iQuMpX-6Un5-Q4jX-UtXp-MHZg-AQAC-wK0FXS)
        lvmcache: /dev/sda2: now in VG VolGroup00 with 1 mdas
        lvmcache: /dev/sda2: setting VolGroup00 VGID to
iQuMpX6Un5Q4jXUtXpMHZgAQACwK0FXS
        lvmcache: /dev/sda2: VG VolGroup00: Set creation host to
localhost.localdomain.
        Closed /dev/sda2
        Opened /dev/ram3 RW O_DIRECT
        /dev/ram3: block size is 4096 bytes
      /dev/ram3: No label detected
        Closed /dev/ram3
        Opened /dev/ram4 RW O_DIRECT
        /dev/ram4: block size is 4096 bytes
      /dev/ram4: No label detected
        Closed /dev/ram4
        Opened /dev/ram5 RW O_DIRECT
        /dev/ram5: block size is 4096 bytes
      /dev/ram5: No label detected
        Closed /dev/ram5
        Opened /dev/ram6 RW O_DIRECT
        /dev/ram6: block size is 4096 bytes
      /dev/ram6: No label detected
        Closed /dev/ram6
        Opened /dev/ram7 RW O_DIRECT
        /dev/ram7: block size is 4096 bytes
      /dev/ram7: No label detected
        Closed /dev/ram7
        Opened /dev/ram8 RW O_DIRECT
        /dev/ram8: block size is 4096 bytes
      /dev/ram8: No label detected
        Closed /dev/ram8
        Opened /dev/ram9 RW O_DIRECT
        /dev/ram9: block size is 4096 bytes
      /dev/ram9: No label detected
        Closed /dev/ram9
        Opened /dev/ram10 RW O_DIRECT
        /dev/ram10: block size is 4096 bytes
      /dev/ram10: No label detected
        Closed /dev/ram10
        Opened /dev/ram11 RW O_DIRECT
        /dev/ram11: block size is 4096 bytes
      /dev/ram11: No label detected
        Closed /dev/ram11
        Opened /dev/ram12 RW O_DIRECT
        /dev/ram12: block size is 4096 bytes
      /dev/ram12: No label detected
        Closed /dev/ram12
        Opened /dev/ram13 RW O_DIRECT
        /dev/ram13: block size is 4096 bytes
      /dev/ram13: No label detected
        Closed /dev/ram13
        Opened /dev/ram14 RW O_DIRECT
        /dev/ram14: block size is 4096 bytes
      /dev/ram14: No label detected
        Closed /dev/ram14
        Opened /dev/ram15 RW O_DIRECT
        /dev/ram15: block size is 4096 bytes
      /dev/ram15: No label detected
        Closed /dev/ram15
        Opened /dev/sdb RW O_DIRECT
        /dev/sdb: block size is 4096 bytes
      /dev/sdb: lvm2 label detected
        lvmcache: /dev/sdb: now in VG #orphans_lvm2 (#orphans_lvm2)
        /dev/sdb: Found metadata at 4608 size 854 (in area at 4096
size 192512) for vg100 (IZRA48-hz68-x145-4YOd-Q8vo-pGT2-O7V5PW)
        lvmcache: /dev/sdb: now in VG vg100 with 1 mdas
        lvmcache: /dev/sdb: setting vg100 VGID to
IZRA48hz68x1454YOdQ8vopGT2O7V5PW
        lvmcache: /dev/sdb: VG vg100: Set creation host to
wplccdlvm446.cn.ibm.com.
        Opened /dev/sdc RW O_DIRECT
        /dev/sdc: block size is 4096 bytes
      /dev/sdc: lvm2 label detected
        lvmcache: /dev/sdc: now in VG #orphans_lvm2 (#orphans_lvm2)
        /dev/sdc: Found metadata at 4608 size 854 (in area at 4096
size 192512) for vg100 (IZRA48-hz68-x145-4YOd-Q8vo-pGT2-O7V5PW)
        lvmcache: /dev/sdc: now in VG vg100
(IZRA48hz68x1454YOdQ8vopGT2O7V5PW) with 1 mdas
        Using cached label for /dev/sdb
        Using cached label for /dev/sdc
        Using cached label for /dev/sdb
        Using cached label for /dev/sdc
        Read vg100 metadata (1) from /dev/sdb at 4608 size 854
        Using cached label for /dev/sdb
        Using cached label for /dev/sdc
        Read vg100 metadata (1) from /dev/sdc at 4608 size 854
        /dev/sdb 0:      0    255: NULL(0:0)
        /dev/sdc 0:      0    255: NULL(0:0)
    Archiving volume group "vg100" metadata (seqno 1).
    Creating logical volume lvol0
        Allowing allocation on /dev/sdb start PE 0 length 255
        Allowing allocation on /dev/sdc start PE 0 length 255
        Allowing allocation on /dev/sdb start PE 200 length 55
        Allowing allocation on /dev/sdc start PE 0 length 255
        Parallel PVs at LE 0 length 200: /dev/sdb
    Creating logical volume lvol0_mimage_0
        Getting device info for vg100-lvol0
        dm info
LVM-IZRA48hz68x1454YOdQ8vopGT2O7V5PWxsPfOhRBpGDV6wbmcITMUwAmtX0yeVex
NF   [16384]
        dm info
IZRA48hz68x1454YOdQ8vopGT2O7V5PWxsPfOhRBpGDV6wbmcITMUwAmtX0yeVex NF
[16384]
        dm info vg100-lvol0  NF   [16384]
  cluster request failed: Invalid argument
      Inserting layer lvol0_mimage_0 for lvol0
      Stack lvol0:0[0] on LV lvol0_mimage_0:0
      Adding lvol0:0 as an user of lvol0_mimage_0
    Creating logical volume lvol0_mimage_1
      Remove lvol0:0[0] from the top of LV lvol0_mimage_0:0
      lvol0:0 is no longer a user of lvol0_mimage_0
      Stack lvol0:0[0] on LV lvol0_mimage_0:0
      Adding lvol0:0 as an user of lvol0_mimage_0
      Stack lvol0:0[1] on LV lvol0_mimage_1:0
      Adding lvol0:0 as an user of lvol0_mimage_1
        /dev/sdb 0:      0    200: lvol0_mimage_0(0:0)
        /dev/sdb 1:    200     55: NULL(0:0)
        /dev/sdc 0:      0    200: lvol0_mimage_1(0:0)
        /dev/sdc 1:    200     55: NULL(0:0)
      Locking VG P_vg100 PW B (0x4)



On Mon, Dec 28, 2009 at 4:51 PM, brem belguebli
<brem.belguebli at gmail.com> wrote:
> In my setup, cmirror is started before clvm, it may be the reason.
>
> 2009/12/28 Diamond Li <diamondiona at gmail.com>:
>> not at all, it hangs again.
>>
>> On Mon, Dec 28, 2009 at 3:22 PM, brem belguebli
>> <brem.belguebli at gmail.com> wrote:
>>> did it work ?
>>>
>>> 2009/12/28 Diamond Li <diamondiona at gmail.com>:
>>>> thanks for your reply, I have started cmirror:
>>>> [root at wplccdlvm445 ~]# service cmirror status
>>>> cmirror is running.
>>>> [root at wplccdlvm445 ~]# service clvmd status
>>>> clvmd (pid 5392) is running...
>>>>
>>>> [root at wplccdlvm445 ~]# service cman ?status
>>>> cman is running.
>>>> [root at wplccdlvm445 ~]# clustat
>>>> Cluster Status for clearcase @ Mon Dec 28 10:54:49 2009
>>>> Member Status: Quorate
>>>>
>>>> ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status
>>>> ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------
>>>> ?wplccdlvm445.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 Online, Local
>>>> ?wplccdlvm446.cn.ibm.com ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 2 Online
>>>>
>>>> [root at wplccdlvm445 ~]# lvcreate ?-vvv -m1 --corelog -L 800M ? vg100
>>>> ? ? ? ?Processing: lvcreate -vvv -m1 --corelog -L 800M vg100
>>>> ? ? ? ?O_DIRECT will be used
>>>> ? ? ?Setting global/locking_type to 3
>>>> ? ? ?Cluster locking selected.
>>>> ? ? ?Getting target version for linear
>>>> ? ? ? ?dm version ? OF ? [16384]
>>>> ? ? ? ?dm versions ? OF ? [16384]
>>>> ? ? ?Getting target version for striped
>>>> ? ? ? ?dm versions ? OF ? [16384]
>>>> ? ?Setting logging type to core
>>>> ? ? ?Setting activation/mirror_region_size to 512
>>>> ? ?Finding volume group "vg100"
>>>> ? ? ?Locking VG V_vg100 PW B (0x4)
>>>>
>>>>
>>>>
>>>> On Fri, Dec 25, 2009 at 8:04 AM, brem belguebli
>>>> <brem.belguebli at gmail.com> wrote:
>>>>> try to run cmirror (service cmirror start) before doing any clvm with mirror
>>>>>
>>>>>
>>>>>
>>>>> 2009/12/24 Diamond Li <diamondiona at gmail.com>:
>>>>>> Could someone ?kindly help me to get through because I haven been
>>>>>> blocked for very long time.
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 23, 2009 at 10:38 PM, Diamond Li <diamondiona at gmail.com> wrote:
>>>>>>> Hello ?everyone,
>>>>>>>
>>>>>>> ?I am trying to create a mirror LVM on cluster, but lvcreate hangs
>>>>>>> ?every time and I have to use kill from another terminal. According to
>>>>>>> ?release note, this should be supported since ?5.3
>>>>>>>
>>>>>>> ?Any words from wisdoms?
>>>>>>>
>>>>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?-m1 --mirrorlog core -L 800M ? vg00
>>>>>>> ?#lvmcmdline.c:987 ? ? ? ? Processing: lvcreate -vvvv -m1 --mirrorlog
>>>>>>> ?core -L 800M vg00
>>>>>>> ?#lvmcmdline.c:990 ? ? ? ? O_DIRECT will be used
>>>>>>> ?#config/config.c:950 ? ? ? Setting global/locking_type to 3
>>>>>>> ?#locking/locking.c:253 ? ? ? Cluster locking selected.
>>>>>>> ?#activate/activate.c:363 ? ? ? Getting target version for linear
>>>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm version ? OF ? [16384]
>>>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>>>>> ?#activate/activate.c:363 ? ? ? Getting target version for striped
>>>>>>> ?#ioctl/libdm-iface.c:1672 ? ? ? ? dm versions ? OF ? [16384]
>>>>>>> ?#lvcreate.c:318 ? ? Setting logging type to core
>>>>>>> ?#config/config.c:950 ? ? ? Setting activation/mirror_region_size to 512
>>>>>>> ?#lvcreate.c:997 ? ? Finding volume group "vg00"
>>>>>>> ?#locking/cluster_locking.c:458 ? ? ? Locking VG V_vg00 PW B (0x4)
>>>>>>>
>>>>>>> ?[root at wplccdlvm446 ~]# cat /etc/redhat-release
>>>>>>> ?Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>>>>>>>
>>>>>>> [root at wplccdlvm446 ~]# uname -r
>>>>>>> ?2.6.18-164.el5
>>>>>>> ?same result, every I added nosync parameter.
>>>>>>>
>>>>>>> ?[root at wplccdlvm446 ~]# lvcreate -vvvv ?--nosync -m1 --mirrorlog core
>>>>>>> ?-L 800M ? vg00
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Linux-cluster mailing list
>>>>>> Linux-cluster at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>>>
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>>
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From diamondiona at gmail.com  Mon Dec 28 09:48:18 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Mon, 28 Dec 2009 17:48:18 +0800
Subject: [Linux-cluster] Configuration Between Real Servers and GFS
	Cluster
In-Reply-To: <4B387148.4010603@bobich.net>
References: <dd23a5e0912280032vc941c14y619efb3e52a8eccc@mail.gmail.com>
	<4B387148.4010603@bobich.net>
Message-ID: <dd23a5e0912280148w46706accxb53be4da771f7035@mail.gmail.com>

thanks for your reply. I am trying to understand what you mean. for
instance, I have 6 clients and 3 GFS servers, they are divided into 3
different groups
    client1            GFS server1
    client2

    client3            GFS server2
    client4

    client 5           GFS server3
    client6

So my question is how to balance the workload between these 3 GFS
servers since they are hosting identical data?

anyone successfully implemented this using CTDB+GFS?




On Mon, Dec 28, 2009 at 4:50 PM, Gordan Bobic <gordan at bobich.net> wrote:
> Diamond Li wrote:
>
>> But my concern is: if I use HA solution, for instance I have 5 GFS
>> servers, but only one of them is serving nfs service. That means 1
>> server is overloaded, the other are starving in meantime.
>>
>> Is there any better solutions to make 5 GFS servers provide nfs
>> service concurrently?
>
> There is nothing to stop you from having NFS exports run on all of the nodes
> simultaneously. Divide up your clients into groups so that each group mounts
> a different NFS server, and have the floating IP of the NFS servers as a
> fail-over resource.
>
> There are, however, a lot of potential performance issues you will likely
> encounter in such a setup (attribute caching on NFS, lock caching on GFS,
> etc.).
>
> Gordan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From bohdan at harazd.net  Mon Dec 28 09:52:44 2009
From: bohdan at harazd.net (Bohdan Sydor)
Date: Mon, 28 Dec 2009 10:52:44 +0100
Subject: [Linux-cluster] Running KVM VM as a cluster service in RHCS
Message-ID: <4B387FEC.3070002@harazd.net>

Hello,

In Release Notes to RHEL 5.4 you can read that

``running the cluster suite in conjunction with KVM hypervisor is
considered to be a Technology Preview''. I installed and ran a clustered
vm service under KVM hypervisor for tests and it worked fine.

I'm planning to deploy a clustered KVM vm service in production. It is
going to be a two-node cluster with a DB service on one node and KVM vm
service (Windows 2003 or 2088 --- Terminal services) on the other.

My question is: has anyone been using KVM virtual machines as a cluster
service in production and are there any caveats? Thanks for any suggestions.

Regards

-- 
Bohdan Sydor



From gordan at bobich.net  Mon Dec 28 10:39:15 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Mon, 28 Dec 2009 10:39:15 +0000
Subject: [Linux-cluster] Configuration Between Real Servers and
	GFS	Cluster
In-Reply-To: <dd23a5e0912280148w46706accxb53be4da771f7035@mail.gmail.com>
References: <dd23a5e0912280032vc941c14y619efb3e52a8eccc@mail.gmail.com>	<4B387148.4010603@bobich.net>
	<dd23a5e0912280148w46706accxb53be4da771f7035@mail.gmail.com>
Message-ID: <4B388AD3.40200@bobich.net>

Diamond Li wrote:
> thanks for your reply. I am trying to understand what you mean. for
> instance, I have 6 clients and 3 GFS servers, they are divided into 3
> different groups
>     client1            GFS server1
>     client2
> 
>     client3            GFS server2
>     client4
> 
>     client 5           GFS server3
>     client6
> 
> So my question is how to balance the workload between these 3 GFS
> servers since they are hosting identical data?

By the diagram above, you're already load balancing it. Have clients 1,2 
mount off server1's floating IP, clients 3,4 off server2, clients 5,6 
off server3. If one of the servers fails it's IP fails over to one of 
the surviving servers, and you'll get 4 clients on one server and 2 on 
the other.

Note, however, that if different clients are accessing files in the same 
directories, you'll get lock contention and locks will end up bouncing 
between the servers, which will seriously hurt performance.

Gordan



From diamondiona at gmail.com  Mon Dec 28 14:26:20 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Mon, 28 Dec 2009 22:26:20 +0800
Subject: [Linux-cluster] Configuration Between Real Servers and GFS
	Cluster
In-Reply-To: <4B388AD3.40200@bobich.net>
References: <dd23a5e0912280032vc941c14y619efb3e52a8eccc@mail.gmail.com>
	<4B387148.4010603@bobich.net>
	<dd23a5e0912280148w46706accxb53be4da771f7035@mail.gmail.com>
	<4B388AD3.40200@bobich.net>
Message-ID: <dd23a5e0912280626x54b01083sec8be26f364f36f2@mail.gmail.com>

let's imagine,  in this diagram, client1 and client2 are sending
numerous requests to server1, but in meantime, there is no request
from client3 and client4, that means server1 is overloaded,  server2
is starving. So that is why I say we can't balance workload between 3
servers.



On Mon, Dec 28, 2009 at 6:39 PM, Gordan Bobic <gordan at bobich.net> wrote:
> Diamond Li wrote:
>>
>> thanks for your reply. I am trying to understand what you mean. for
>> instance, I have 6 clients and 3 GFS servers, they are divided into 3
>> different groups
>> ? ?client1 ? ? ? ? ? ?GFS server1
>> ? ?client2
>>
>> ? ?client3 ? ? ? ? ? ?GFS server2
>> ? ?client4
>>
>> ? ?client 5 ? ? ? ? ? GFS server3
>> ? ?client6
>>
>> So my question is how to balance the workload between these 3 GFS
>> servers since they are hosting identical data?
>
> By the diagram above, you're already load balancing it. Have clients 1,2
> mount off server1's floating IP, clients 3,4 off server2, clients 5,6 off
> server3. If one of the servers fails it's IP fails over to one of the
> surviving servers, and you'll get 4 clients on one server and 2 on the
> other.
>
> Note, however, that if different clients are accessing files in the same
> directories, you'll get lock contention and locks will end up bouncing
> between the servers, which will seriously hurt performance.
>
> Gordan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From gordan at bobich.net  Mon Dec 28 14:49:52 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Mon, 28 Dec 2009 14:49:52 +0000
Subject: [Linux-cluster] Configuration Between Real Servers and
	GFS	Cluster
In-Reply-To: <dd23a5e0912280626x54b01083sec8be26f364f36f2@mail.gmail.com>
References: <dd23a5e0912280032vc941c14y619efb3e52a8eccc@mail.gmail.com>	<4B387148.4010603@bobich.net>	<dd23a5e0912280148w46706accxb53be4da771f7035@mail.gmail.com>	<4B388AD3.40200@bobich.net>
	<dd23a5e0912280626x54b01083sec8be26f364f36f2@mail.gmail.com>
Message-ID: <4B38C590.6060401@bobich.net>

On 28/12/2009 14:26, Diamond Li wrote:
> let's imagine,  in this diagram, client1 and client2 are sending
> numerous requests to server1, but in meantime, there is no request
> from client3 and client4, that means server1 is overloaded,  server2
> is starving. So that is why I say we can't balance workload between3
> servers.

That sounds like something you need to deal with on the application 
level. You can't balance NFS (or CIFS, or any similar system) "per request".

The closest to what you're asking for I can think of is something like 
WebDAV (http://en.wikipedia.org/wiki/WebDAV) on the servers, with davfs2 
(http://savannah.nongnu.org/projects/davfs2) on the clients. You could 
then put a load balancer between the clients and the servers, or use DNS 
based load balancing to achieve more even load distribution.

Gordan



From diamondiona at gmail.com  Tue Dec 29 08:59:19 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Tue, 29 Dec 2009 16:59:19 +0800
Subject: [Linux-cluster] any best practice for number of new GFS journal to
	add?
Message-ID: <dd23a5e0912290059qcba8058m7f7f16acf0c07b47@mail.gmail.com>

hello everyone,

I feel a little bit confused about gfs_jadd. my understanding is we do
not have to add new journal after we extend file system size because
current size may be enough,  is it true ?

is there any best pratice for when should we add new journal, the
right number of new added journal, and journal size? for instance, if
my file system grows from 2G to 4G, what would be the right answer for
my questions, and it grows from 2T to 4T, etc.

best regards!



From gordan at bobich.net  Tue Dec 29 10:22:47 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Tue, 29 Dec 2009 10:22:47 +0000
Subject: [Linux-cluster] GFS: inode changed under us
Message-ID: <4B39D877.2070600@bobich.net>

I'm seeing this logged to the console. Is it dangerous? Does it imply 
corruption is happening?

The setup is a RHEL 5.4 host (KVM) with two RHEL 5.4 guests running 
OpenVZ. The two (KVM) guests are mounting a shared image via an IDE 
device with GFS on it. The two (KVM) guests are running one OpenVZ guest 
each and the root of the OpenVZ guests is shared and on the GFS volume.

Gordan



From alfredo.moralejo at roche.com  Tue Dec 29 14:10:12 2009
From: alfredo.moralejo at roche.com (Moralejo, Alfredo)
Date: Tue, 29 Dec 2009 15:10:12 +0100
Subject: [Linux-cluster] Making fence to leave the JOIN_START_WAIT status
Message-ID: <C64734E4E1C80E49955AD539DB2FBC3A079D668E@rkamsem703.emea.roche.com>

Hi,

I'm building a two node cluster (three nodes in future), and I'd like to prevent nodes from start whenever other nodes does not join the cluster. However, I don't like the idea of fencing a node is not part of the cluster in an uncontrolled way when other node is forming a new cluster (let's say, while patching the system). I've found a nice way to do so by configuring clean_start to 0 and post_join_delay to "-1". With this configuration fence daemon will not try to fence the nodes out of the cluster and will wait until the rest of nodes join the cluster in this status:

[root at jimenez1 admin]# group_tool
type             level name       id       state
fence            0     default    00010001 JOIN_START_WAIT
[1]
dlm              1     rgmanager  00020001 none
[1]

[root at jimenez1 admin]# clustat
Cluster Status for ext_test @ Tue Dec 29 15:01:54 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 hb-jimenez1                                                         1 Online, Local
 hb-jimenez2                                                         2 Offline

This is the behavior I want, however, there is still a problem. I'd like to be able to manually start the cluster in that node by means of a command or something like that. I've seen in the past I could use fence_tool with -c option to leave the JOIN_START_WAIT status and let the cluster start with only one node. However, this options does not seem to be available in my version (I'm using RHEL 5.3) and cman version cman-2.0.115-1.el5_4.9.

With this configuration, is there any way to start the cluster manually in only one node without reconfiguring the cluster? Is there any other configuration that allows me to have the same functionality?

Best regards,


Alfredo Moralejo
Business Platforms Engineering - OS Servers - UNIX Senior Specialist
F. Hoffmann-La Roche Ltd.
Global Informatics Group Infrastructure
Josefa Valc?rcel, 40
28027 Madrid SPAIN

Phone: +34 91 305 97 87

alfredo.moralejo at roche.com<mailto:alfredo.moralejo at roche.com>

Confidentiality Note: This message is intended only for the use of the named recipient(s) and may contain confidential and/or proprietary information. If you are not the intended recipient, please contact the sender and delete this message. Any unauthorized use of the information contained in this message is prohibited.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091229/15b7f80f/attachment.htm>

From cthulhucalling at gmail.com  Tue Dec 29 15:23:25 2009
From: cthulhucalling at gmail.com (Ian Hayes)
Date: Tue, 29 Dec 2009 07:23:25 -0800
Subject: [Linux-cluster] any best practice for number of new GFS journal
	to add?
In-Reply-To: <dd23a5e0912290059qcba8058m7f7f16acf0c07b47@mail.gmail.com>
References: <dd23a5e0912290059qcba8058m7f7f16acf0c07b47@mail.gmail.com>
Message-ID: <36df569a0912290723s55bb5680tb232d2fde46ca41@mail.gmail.com>

The number of journals you have is dependant on the number of cluster
members using that GFS filesystem, not the size of it. As I understand it,
you may have to tweak the size of the journals if you're doing a lot of I/O.
At a minimum, you're going to need N journals, where N is the number of
cluster members using the GFS volume. I generally create N+1 journals in
case I need to add a host quickly and I might forget to add a journal
otherwise.


On Tue, Dec 29, 2009 at 12:59 AM, Diamond Li <diamondiona at gmail.com> wrote:

> hello everyone,
>
> I feel a little bit confused about gfs_jadd. my understanding is we do
> not have to add new journal after we extend file system size because
> current size may be enough,  is it true ?
>
> is there any best pratice for when should we add new journal, the
> right number of new added journal, and journal size? for instance, if
> my file system grows from 2G to 4G, what would be the right answer for
> my questions, and it grows from 2T to 4T, etc.
>
> best regards!
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091229/29cb5b87/attachment.htm>

From matt.iavarone at gmail.com  Tue Dec 29 15:43:27 2009
From: matt.iavarone at gmail.com (Matt Iavarone)
Date: Tue, 29 Dec 2009 08:43:27 -0700
Subject: [Linux-cluster] any best practice for number of new GFS journal
	to add?
In-Reply-To: <dd23a5e0912290059qcba8058m7f7f16acf0c07b47@mail.gmail.com>
References: <dd23a5e0912290059qcba8058m7f7f16acf0c07b47@mail.gmail.com>
Message-ID: <29f82b420912290743wa9a3450sf5a68cf7bec6270b@mail.gmail.com>

What I got from Red Hat was the number of nodes + 2.  I can't speak to
the logic of that, though.

On Tue, Dec 29, 2009 at 1:59 AM, Diamond Li <diamondiona at gmail.com> wrote:
> hello everyone,
>
> I feel a little bit confused about gfs_jadd. my understanding is we do
> not have to add new journal after we extend file system size because
> current size may be enough, ?is it true ?
>
> is there any best pratice for when should we add new journal, the
> right number of new added journal, and journal size? for instance, if
> my file system grows from 2G to 4G, what would be the right answer for
> my questions, and it grows from 2T to 4T, etc.
>
> best regards!
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From mgrac at redhat.com  Tue Dec 29 17:42:27 2009
From: mgrac at redhat.com (Marek Grac)
Date: Tue, 29 Dec 2009 18:42:27 +0100
Subject: [Linux-cluster] Mysql.sh error.
In-Reply-To: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com>
References: <5370ab990912220927i38092060t7d439dea109e6e00@mail.gmail.com>
Message-ID: <4B3A3F83.3080302@redhat.com>

Hi,

On 12/22/2009 06:27 PM, Marty Diesburg wrote:
> Sorry, for the double-post, ---great way to start on the list :). 
>  Below has the error message as well "Failed - Invalid Name Of Service".
>
> Hi all,
>
> I am new to the list and have an issue with the Mysql service.  It is 
> running, but when I run the commands /usr/share/cluster/mysql.sh 
> restart, or /usr/share/cluster/mysql.sh status I get
> the following errors.  I am using mysql as a database for an email 
> server with Dovecot, Qmail, and Vpopmail.
>
> <debug>  Verifying Configuration Of default
> <error>  Verifying Configuration Of default > Failed - Invalid Name Of 
> Service
> <debug>  Monitoring Service default
> <debug>  Monitoring Service default > Service Is Running

Yours $OCF_RESKEY_* variables are not set, so resource agent does not 
know name of the service. Resource agents are not used directly but via 
rgmanager, so you should start it and your service will be started 
automatically. If you need run in it debugging mode, you can try 
'clurgmgrd -fd'.

> <mysql config_file="/production/mysql/etc/my.cnf" 
> listen_address="24.149.30.40" mysql_options="" name="productionsql" 
> shutdown_wait=""/>
> <script file="/production/qmail/bin/qmailctl" name="qmail"/>
> </resources>
> <service autostart="1" domain="mailcluster" name="mailcluster" 
> recovery="restart">
> <ip ref="24.149.30.40">
> <fs ref="backup"/>
> <fs ref="production"/>
> <mysql ref="productionsql">
> <fs ref="production"/>
> </mysql>
> <script ref="qmail">
> <fs ref="production"/>
> </script>
> </ip>
> </service>
> </rm>
> </cluster>
>
marx,
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091229/14c2a277/attachment.htm>

From td3201 at gmail.com  Tue Dec 29 19:30:15 2009
From: td3201 at gmail.com (Terry)
Date: Tue, 29 Dec 2009 13:30:15 -0600
Subject: [Linux-cluster] cannot add 3rd node to running cluster
Message-ID: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com>

Hello,

I have a working 2 node cluster that I am trying to add a third node
to.   I am trying to use Red Hat's conga (luci) to add the node in but
I have also tried command line as well with no luck.  I cannot start
cman.  cman_tool does not give any errors when I try to join the
cluster either, even with -d.  I am not sure where to take this at
this point.  Here are my package versions:

cman-2.0.115-1.el5_4.9
rgmanager-2.0.52-1.el5_4.3
modcluster-0.12.1-2.el5
luci-0.12.2-6.el5_4.1
ricci-0.12.2-6.el5_4.1

I would really appreciate some help.

Thanks,
Terry



From jwellband at gmail.com  Tue Dec 29 23:20:42 2009
From: jwellband at gmail.com (Jason W.)
Date: Tue, 29 Dec 2009 18:20:42 -0500
Subject: [Linux-cluster] cannot add 3rd node to running cluster
In-Reply-To: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com>
References: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com>
Message-ID: <74e9d01e0912291520l3bc36ac4yc7a17b1f96fa123d@mail.gmail.com>

On Tue, Dec 29, 2009 at 2:30 PM, Terry <td3201 at gmail.com> wrote:
> Hello,
>
> I have a working 2 node cluster that I am trying to add a third node
> to. ? I am trying to use Red Hat's conga (luci) to add the node in but

If you have two node cluster with two_node=1 in cluster.conf - such as
two nodes with no quorum device to break a tie - you'll need to bring
the cluster down, change two_node to 0 on both nodes (and rev the
cluster version at the top of cluster.conf), bring the cluster up and
then add the third node.

For troubleshooting any cluster issue, take a look at syslog
(/var/log/messages by default). It can help to watch it on a
centralized syslog server that all of your nodes forward logs to.

-- 
HTH, YMMV, HANW :)

Jason

The path to enlightenment is /usr/bin/enlightenment.



From xishipan at gmail.com  Wed Dec 30 01:44:08 2009
From: xishipan at gmail.com (Xishi PAN)
Date: Wed, 30 Dec 2009 09:44:08 +0800
Subject: [Linux-cluster] changing heartbeat interface
In-Reply-To: <8b711df40912091054m3a8c5d7ax42d7cd0143898fde@mail.gmail.com>
References: <8b711df40912091054m3a8c5d7ax42d7cd0143898fde@mail.gmail.com>
Message-ID: <aacd930b0912291744k323ec4a9ib3de258cab2f0330@mail.gmail.com>

Hi,
    Would you like to try channel bonding?

Thanks.
G.P

On Thu, Dec 10, 2009 at 2:54 AM, Paras pradhan <pradhanparas at gmail.com>wrote:

> hi,
>
> I believe its not recommend but just curious to know about the consequences
> of changing  the heartbeat of the cluster to the 2nd interface of the
> cluster nodes. In this case  if the network switch fails , then cluster will
> still be quorate since they will be connected each other with the 2nd
> interfaces of the nodes and will not be fenced.
>
> Thanks
> Paras.
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



-- 
Stay Fabulous,
Xishi PAN
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091230/96117f38/attachment.htm>

From diamondiona at gmail.com  Wed Dec 30 03:36:02 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 30 Dec 2009 11:36:02 +0800
Subject: [Linux-cluster] can not mount GFS, "no such device"
Message-ID: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>

Hello, everyone:

I failed to mount GFS with error message "no such device". But I have
confirmed that the device exists and all relevant kernel modules have
been loaded.

I am using RH5.4 and no any customization at all.

Would someone kindly help?

[root at wplccdlvm445 ~]# mount -t gfs -v /dev/vg100/lvol0 /gfs
/sbin/mount.gfs: mount /dev/mapper/vg100-lvol0 /gfs
/sbin/mount.gfs: parse_opts: opts = "rw"
/sbin/mount.gfs:   clear flag 1 for "rw", flags = 0
/sbin/mount.gfs: parse_opts: flags = 0
/sbin/mount.gfs: parse_opts: extra = ""
/sbin/mount.gfs: parse_opts: hostdata = ""
/sbin/mount.gfs: parse_opts: lockproto = ""
/sbin/mount.gfs: parse_opts: locktable = ""
/sbin/mount.gfs: message to gfs_controld: asking to join mountgroup:
/sbin/mount.gfs: write "join /gfs gfs lock_dlm clearcase:gfs rw
/dev/mapper/vg100-lvol0"
/sbin/mount.gfs: message from gfs_controld: response to join request:
/sbin/mount.gfs: lock_dlm_join: read "0"
/sbin/mount.gfs: message from gfs_controld: mount options:
/sbin/mount.gfs: lock_dlm_join: read "hostdata=jid=0:id=327681:first=1"
/sbin/mount.gfs: lock_dlm_join: hostdata: "hostdata=jid=0:id=327681:first=1"
/sbin/mount.gfs: lock_dlm_join: extra_plus: "hostdata=jid=0:id=327681:first=1"
/sbin/mount.gfs: mount(2) failed error -1 errno 19
/sbin/mount.gfs: lock_dlm_mount_result: write "mount_result /gfs gfs -1"
/sbin/mount.gfs: message to gfs_controld: asking to leave mountgroup:
/sbin/mount.gfs: lock_dlm_leave: write "leave /gfs gfs 19"
/sbin/mount.gfs: message from gfs_controld: response to leave request:
/sbin/mount.gfs: lock_dlm_leave: read "0"
/sbin/mount.gfs: error mounting /dev/mapper/vg100-lvol0 on /gfs: No such device

[root at wplccdlvm445 ~]# ls /dev/mapper/vg100-lvol0
/dev/mapper/vg100-lvol0

[root at wplccdlvm445 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
[root at wplccdlvm445 ~]# uname -r
2.6.18-164.el5

[root at wplccdlvm445 ~]# lsmod |grep gfs
gfs2                  347529  1 lock_dlm
configfs               28753  2 dlm
[root at wplccdlvm445 ~]# lsmod |grep dl
lock_dlm               20193  0
gfs2                  347529  1 lock_dlm
dlm                   113749  11 lock_dlm
configfs               28753  2 dlm



From diamondiona at gmail.com  Wed Dec 30 06:11:52 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 30 Dec 2009 14:11:52 +0800
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
Message-ID: <dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>

after I use mkfs.gfs2, it works. However, I did not see any document
to mention this command,  always gfs_mkfs.

in my humble opnion, redhat has a log way to provide real enterprise
solution, both from software quality and documentation.



On Wed, Dec 30, 2009 at 11:36 AM, Diamond Li <diamondiona at gmail.com> wrote:
> Hello, everyone:
>
> I failed to mount GFS with error message "no such device". But I have
> confirmed that the device exists and all relevant kernel modules have
> been loaded.
>
> I am using RH5.4 and no any customization at all.
>
> Would someone kindly help?
>
> [root at wplccdlvm445 ~]# mount -t gfs -v /dev/vg100/lvol0 /gfs
> /sbin/mount.gfs: mount /dev/mapper/vg100-lvol0 /gfs
> /sbin/mount.gfs: parse_opts: opts = "rw"
> /sbin/mount.gfs: ? clear flag 1 for "rw", flags = 0
> /sbin/mount.gfs: parse_opts: flags = 0
> /sbin/mount.gfs: parse_opts: extra = ""
> /sbin/mount.gfs: parse_opts: hostdata = ""
> /sbin/mount.gfs: parse_opts: lockproto = ""
> /sbin/mount.gfs: parse_opts: locktable = ""
> /sbin/mount.gfs: message to gfs_controld: asking to join mountgroup:
> /sbin/mount.gfs: write "join /gfs gfs lock_dlm clearcase:gfs rw
> /dev/mapper/vg100-lvol0"
> /sbin/mount.gfs: message from gfs_controld: response to join request:
> /sbin/mount.gfs: lock_dlm_join: read "0"
> /sbin/mount.gfs: message from gfs_controld: mount options:
> /sbin/mount.gfs: lock_dlm_join: read "hostdata=jid=0:id=327681:first=1"
> /sbin/mount.gfs: lock_dlm_join: hostdata: "hostdata=jid=0:id=327681:first=1"
> /sbin/mount.gfs: lock_dlm_join: extra_plus: "hostdata=jid=0:id=327681:first=1"
> /sbin/mount.gfs: mount(2) failed error -1 errno 19
> /sbin/mount.gfs: lock_dlm_mount_result: write "mount_result /gfs gfs -1"
> /sbin/mount.gfs: message to gfs_controld: asking to leave mountgroup:
> /sbin/mount.gfs: lock_dlm_leave: write "leave /gfs gfs 19"
> /sbin/mount.gfs: message from gfs_controld: response to leave request:
> /sbin/mount.gfs: lock_dlm_leave: read "0"
> /sbin/mount.gfs: error mounting /dev/mapper/vg100-lvol0 on /gfs: No such device
>
> [root at wplccdlvm445 ~]# ls /dev/mapper/vg100-lvol0
> /dev/mapper/vg100-lvol0
>
> [root at wplccdlvm445 ~]# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.4 (Tikanga)
> [root at wplccdlvm445 ~]# uname -r
> 2.6.18-164.el5
>
> [root at wplccdlvm445 ~]# lsmod |grep gfs
> gfs2 ? ? ? ? ? ? ? ? ?347529 ?1 lock_dlm
> configfs ? ? ? ? ? ? ? 28753 ?2 dlm
> [root at wplccdlvm445 ~]# lsmod |grep dl
> lock_dlm ? ? ? ? ? ? ? 20193 ?0
> gfs2 ? ? ? ? ? ? ? ? ?347529 ?1 lock_dlm
> dlm ? ? ? ? ? ? ? ? ? 113749 ?11 lock_dlm
> configfs ? ? ? ? ? ? ? 28753 ?2 dlm
>



From gordan at bobich.net  Wed Dec 30 06:44:23 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Wed, 30 Dec 2009 06:44:23 +0000
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>
Message-ID: <4B3AF6C7.2080203@bobich.net>

Diamond Li wrote:
> after I use mkfs.gfs2, it works. However, I did not see any document
> to mention this command,  always gfs_mkfs.

I'm not sure what you're doing differntly (you omitted the FS creation 
command in your previous email), but this works just fine for me:

gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb
mount /mnt/gfs

The fstab line is:
/dev/hdb   /mnt/gfs   gfs   defaults,noatime,nodiratime   0 0

Just tested it on a scratch VM.

I'm assuming you have your cluster.conf configured right and the cman 
service (which provides fenced, groupd, etc.) has started without any 
errors? Again, you haven't posted your cluster.conf so it's impossible 
to tell.

You also haven't specified whether your intention is to use gfs or gfs2. 
They are not the same.

> in my humble opnion, redhat has a log way to provide real enterprise
> solution, both from software quality and documentation.

There doesn't seem to be enough in this thread to persuade me that the 
cause of problems isn't user error. :)

Gordan



From diamondiona at gmail.com  Wed Dec 30 06:49:40 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 30 Dec 2009 14:49:40 +0800
Subject: [Linux-cluster] lvextend hangs up
Message-ID: <dd23a5e0912292249t6f645062x199efc75ee81e5dc@mail.gmail.com>

hello, everyone,

it is frustrated to see lvextend hanging up when I am trying to extend
a mirror  logical volume: no error message, log, can't exit using
CTRL+C. :-(

anyone has similar experience?

[root at wplccdlvm446 gfs]# lvextend -d -L +1G -m1  /dev/vg100/lvol0

[root at wplccdlvm446 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
[root at wplccdlvm446 ~]# uname -r
2.6.18-164.el5



From diamondiona at gmail.com  Wed Dec 30 06:53:54 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 30 Dec 2009 14:53:54 +0800
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <4B3AF6C7.2080203@bobich.net>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>
	<4B3AF6C7.2080203@bobich.net>
Message-ID: <dd23a5e0912292253p3e22119cs9a946e2a3cf2808@mail.gmail.com>

thanks Gordan,  looks like we are in the same timezone, here  is the
command, same as previous one except for using mkfs.gfs2 instead of
gfs_mkfs

 mkfs.gfs2 -t clearcase:gfs -p lock_dlm -j 6 /dev/vg100/lvol0

[root at wplccdlvm445 gfs]# clustat
Cluster Status for clearcase @ Wed Dec 30 14:56:37 2009
Member Status: Quorate

 Member Name                                       ID   Status
 ------ ----                                       ---- ------
 wplccdlvm445.cn.ibm.com                               1 Online, Local
 wplccdlvm446.cn.ibm.com                               2 Online


On Wed, Dec 30, 2009 at 2:44 PM, Gordan Bobic <gordan at bobich.net> wrote:
> Diamond Li wrote:
>>
>> after I use mkfs.gfs2, it works. However, I did not see any document
>> to mention this command, ?always gfs_mkfs.
>
> I'm not sure what you're doing differntly (you omitted the FS creation
> command in your previous email), but this works just fine for me:
>
> gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb
> mount /mnt/gfs
>
> The fstab line is:
> /dev/hdb ? /mnt/gfs ? gfs ? defaults,noatime,nodiratime ? 0 0
>
> Just tested it on a scratch VM.
>
> I'm assuming you have your cluster.conf configured right and the cman
> service (which provides fenced, groupd, etc.) has started without any
> errors? Again, you haven't posted your cluster.conf so it's impossible to
> tell.
>
> You also haven't specified whether your intention is to use gfs or gfs2.
> They are not the same.
>
>> in my humble opnion, redhat has a log way to provide real enterprise
>> solution, both from software quality and documentation.
>
> There doesn't seem to be enough in this thread to persuade me that the cause
> of problems isn't user error. :)
>
> Gordan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From cthulhucalling at gmail.com  Wed Dec 30 07:01:03 2009
From: cthulhucalling at gmail.com (Ian Hayes)
Date: Tue, 29 Dec 2009 23:01:03 -0800
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <4B3AF6C7.2080203@bobich.net>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>
	<4B3AF6C7.2080203@bobich.net>
Message-ID: <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>

On Tue, Dec 29, 2009 at 10:44 PM, Gordan Bobic <gordan at bobich.net> wrote:

> Diamond Li wrote:
>
>> after I use mkfs.gfs2, it works. However, I did not see any document
>> to mention this command,  always gfs_mkfs.
>>
>
> I'm not sure what you're doing differntly (you omitted the FS creation
> command in your previous email), but this works just fine for me:
>
> gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb
> mount /mnt/gfs
>
> The fstab line is:
> /dev/hdb   /mnt/gfs   gfs   defaults,noatime,nodiratime   0 0
>


I had a similar problem in my Redhat Clustering and Storage Management class
the other week. I believe the problem was with a couple of mistakes I made
while playing around in one of the labs. I know once it was because I was
trying to mount the block device instead of the logical volume.

 in my humble opnion, redhat has a log way to provide real enterprise
> solution, both from software quality and documentation.
>


> There doesn't seem to be enough in this thread to persuade me that the
> cause of problems isn't user error. :)
>

IIRC, gfs2 is still under development and considered experimental. There's
tons of documentation for production-quality GFS and I imagine once gfs2
gets more mainlined, this will be the case also.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091229/2f14c192/attachment.htm>

From diamondiona at gmail.com  Wed Dec 30 07:14:59 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 30 Dec 2009 15:14:59 +0800
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>
	<4B3AF6C7.2080203@bobich.net>
	<36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>
Message-ID: <dd23a5e0912292314v767d6cf7o3550d4f887f092e8@mail.gmail.com>

Since I have no idea about you guys OS version. From my version,
RH5.4, system is using
gfs2 kernel module, so I guess I have to use mkfs.gfs2 to create gfs2
file system.  However, I didn't see any RH5.4 document pointing
this(or I missed it out).

If you guys have the same configuration, that means GFS tools is
unstable because the only change I did is using different command.

Same as the problem I encountered using lvcreate, on the first day it
always hangs up, but in next morning, it executed successfully without
any changes. It sounds impossible but this is the truth.

[root at wplccdlvm445 gfs]# lsmod |grep -i gfs
gfs2                  347529  2 lock_dlm
configfs               28753  2 dlm

On Wed, Dec 30, 2009 at 3:01 PM, Ian Hayes <cthulhucalling at gmail.com> wrote:
> On Tue, Dec 29, 2009 at 10:44 PM, Gordan Bobic <gordan at bobich.net> wrote:
>>
>> Diamond Li wrote:
>>>
>>> after I use mkfs.gfs2, it works. However, I did not see any document
>>> to mention this command, ?always gfs_mkfs.
>>
>> I'm not sure what you're doing differntly (you omitted the FS creation
>> command in your previous email), but this works just fine for me:
>>
>> gfs_mkfs -j 2 -p lock_dlm -t test:root /dev/hdb
>> mount /mnt/gfs
>>
>> The fstab line is:
>> /dev/hdb ? /mnt/gfs ? gfs ? defaults,noatime,nodiratime ? 0 0
>
>
> I had a similar problem in my Redhat Clustering and Storage Management class
> the other week. I believe the problem was with a couple of mistakes I made
> while playing around in one of the labs. I know once it was because I was
> trying to mount the block device instead of the logical volume.
>
>> in my humble opnion, redhat has a log way to provide real enterprise
>> solution, both from software quality and documentation.
>
>>
>> There doesn't seem to be enough in this thread to persuade me that the
>> cause of problems isn't user error. :)
>
> IIRC, gfs2 is still under development and considered experimental. There's
> tons of documentation for production-quality GFS and I imagine once gfs2
> gets more mainlined, this will be the case also.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From gordan at bobich.net  Wed Dec 30 07:23:40 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Wed, 30 Dec 2009 07:23:40 +0000
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>	<4B3AF6C7.2080203@bobich.net>
	<36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>
Message-ID: <4B3AFFFC.2000903@bobich.net>

Ian Hayes wrote:

> I had a similar problem in my Redhat Clustering and Storage Management 
> class the other week. I believe the problem was with a couple of 
> mistakes I made while playing around in one of the labs. I know once it 
> was because I was trying to mount the block device instead of the 
> logical volume.

I'm assuming you mean that you were mkfs-ing one and then trying to 
mount the other. I'm vehemently against putting everything on lvm just 
for the sake of it, but I've never had a problem with mkfs-ing or 
mount-ing either, as long as it's consistent. I tend not to partition 
iSCSI and DRBD volumes, so I know that working direct with the whole 
block device works just fine.

>     in my humble opnion, redhat has a log way to provide real enterprise
>     solution, both from software quality and documentation.
> 
> 
> 
>     There doesn't seem to be enough in this thread to persuade me that
>     the cause of problems isn't user error. :)
> 
> 
> IIRC, gfs2 is still under development and considered experimental. 
> There's tons of documentation for production-quality GFS and I imagine 
> once gfs2 gets more mainlined, this will be the case also. 

Don't quite me on this, but I'm pretty sure GFS2 is deemed stable as of 
RHEL 5.4 (or was it 5.3?). Having said that, I haven't yet deployed any 
GFS2 volumes in production, and don't plan on doing so imminently, so 
draw whatever conclusions you see fit from that. ;)

Gordan



From gordan at bobich.net  Wed Dec 30 07:31:03 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Wed, 30 Dec 2009 07:31:03 +0000
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <dd23a5e0912292314v767d6cf7o3550d4f887f092e8@mail.gmail.com>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>	<4B3AF6C7.2080203@bobich.net>	<36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>
	<dd23a5e0912292314v767d6cf7o3550d4f887f092e8@mail.gmail.com>
Message-ID: <4B3B01B7.10907@bobich.net>

Diamond Li wrote:
> Since I have no idea about you guys OS version. From my version,
> RH5.4, system is using
> gfs2 kernel module, so I guess I have to use mkfs.gfs2 to create gfs2
> file system.  However, I didn't see any RH5.4 document pointing
> this(or I missed it out).

I suspect most documentation still doesn't mention GFS2 since it is 
still quite new and not far on it's maturity curve. GFS1, OTOH, has been 
around for a long time and is what is expected to be in production at 
the moment. FYI, I use RHEL/CentOS 5.x on my systems, most are now 
updated to 5.4 (as was the example I ran to test what you reported).

There are two separate kernel modules: gfs and gfs2. gfs requires gfs2 
(some of the low level dependencies were moved there a long time ago), 
but working with GFS1 requires the gfs kernel module. If you haven't got 
gfs loaded (but do have gfs2 loaded) that would explain why you were 
having difficulties mounting a GFS1 file system (but GFS2 worked fine).

Your lsmod information seems consistent with this theory.

> If you guys have the same configuration, that means GFS tools is
> unstable because the only change I did is using different command.

See previous paragraph for gfs vs. gfs2.

> Same as the problem I encountered using lvcreate, on the first day it
> always hangs up, but in next morning, it executed successfully without
> any changes. It sounds impossible but this is the truth.

Just to make sure - I take it you are aware that lvm (the non-cluster 
version) is different to clvm (the cluster-aware version)? You aren't 
using the non-cluster lvm for a cluster volume, are you?

Gordan



From cthulhucalling at gmail.com  Wed Dec 30 08:07:18 2009
From: cthulhucalling at gmail.com (Ian Hayes)
Date: Wed, 30 Dec 2009 00:07:18 -0800
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <4B3AFFFC.2000903@bobich.net>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>
	<4B3AF6C7.2080203@bobich.net>
	<36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>
	<4B3AFFFC.2000903@bobich.net>
Message-ID: <36df569a0912300007v3f7ff0b9p87ffe405987099dc@mail.gmail.com>

On Tue, Dec 29, 2009 at 11:23 PM, Gordan Bobic <gordan at bobich.net> wrote:

> Ian Hayes wrote:
>
>  I had a similar problem in my Redhat Clustering and Storage Management
>> class the other week. I believe the problem was with a couple of mistakes I
>> made while playing around in one of the labs. I know once it was because I
>> was trying to mount the block device instead of the logical volume.
>>
>
> I'm assuming you mean that you were mkfs-ing one and then trying to mount
> the other. I'm vehemently against putting everything on lvm just for the
> sake of it, but I've never had a problem with mkfs-ing or mount-ing either,
> as long as it's consistent. I tend not to partition iSCSI and DRBD volumes,
> so I know that working direct with the whole block device works just fine.
>

Well, the good thing about being in a RH class is that you can do all kinds
of sick, twisted evil things just to see what happens. I've also made the
mistake of doing things like not changing the locking_type in lvm.conf to 3
and forgetting to start clvmd. Any of those can lead to strange and exciting
times with GFS.


IIRC, gfs2 is still under development and considered experimental. There's
>> tons of documentation for production-quality GFS and I imagine once gfs2
>> gets more mainlined, this will be the case also.
>>
>
> Don't quite me on this, but I'm pretty sure GFS2 is deemed stable as of
> RHEL 5.4 (or was it 5.3?). Having said that, I haven't yet deployed any GFS2
> volumes in production, and don't plan on doing so imminently, so draw
> whatever conclusions you see fit from that. ;)


We're fine with GFS where we are. I've done some benchmarking on GFS2 and
it's performance didn't come anywhere near what we could do with GFS.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091230/ddd15a95/attachment.htm>

From diamondiona at gmail.com  Wed Dec 30 08:20:50 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 30 Dec 2009 16:20:50 +0800
Subject: [Linux-cluster] can not mount GFS, "no such device"
In-Reply-To: <36df569a0912300007v3f7ff0b9p87ffe405987099dc@mail.gmail.com>
References: <dd23a5e0912291936q308d8f14x401e7d49cdcf912@mail.gmail.com>
	<dd23a5e0912292211o7168a20csf086d7a4264bfc65@mail.gmail.com>
	<4B3AF6C7.2080203@bobich.net>
	<36df569a0912292301iffae905n7cf75f3c95f0b1a1@mail.gmail.com>
	<4B3AFFFC.2000903@bobich.net>
	<36df569a0912300007v3f7ff0b9p87ffe405987099dc@mail.gmail.com>
Message-ID: <dd23a5e0912300020m571f6d03n5aed66b1dd7a857e@mail.gmail.com>

I have started clvmd  on all nodes, and changed locking_type. anyway,
I will keep an eye on this random error.

On Wed, Dec 30, 2009 at 4:07 PM, Ian Hayes <cthulhucalling at gmail.com> wrote:
>
>
> On Tue, Dec 29, 2009 at 11:23 PM, Gordan Bobic <gordan at bobich.net> wrote:
>>
>> Ian Hayes wrote:
>>
>>> I had a similar problem in my Redhat Clustering and Storage Management
>>> class the other week. I believe the problem was with a couple of mistakes I
>>> made while playing around in one of the labs. I know once it was because I
>>> was trying to mount the block device instead of the logical volume.
>>
>> I'm assuming you mean that you were mkfs-ing one and then trying to mount
>> the other. I'm vehemently against putting everything on lvm just for the
>> sake of it, but I've never had a problem with mkfs-ing or mount-ing either,
>> as long as it's consistent. I tend not to partition iSCSI and DRBD volumes,
>> so I know that working direct with the whole block device works just fine.
>
> Well, the good thing about being in a RH class is that you can do all kinds
> of sick, twisted evil things just to see what happens. I've also made the
> mistake of doing things like not changing the locking_type in lvm.conf to 3
> and forgetting to start clvmd. Any of those can lead to strange and exciting
> times with GFS.
>
>
>>> IIRC, gfs2 is still under development and considered experimental.
>>> There's tons of documentation for production-quality GFS and I imagine once
>>> gfs2 gets more mainlined, this will be the case also.
>>
>> Don't quite me on this, but I'm pretty sure GFS2 is deemed stable as of
>> RHEL 5.4 (or was it 5.3?). Having said that, I haven't yet deployed any GFS2
>> volumes in production, and don't plan on doing so imminently, so draw
>> whatever conclusions you see fit from that. ;)
>
> We're fine with GFS where we are. I've done some benchmarking on GFS2 and
> it's performance didn't come anywhere near what we could do with GFS.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From diamondiona at gmail.com  Wed Dec 30 09:41:42 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Wed, 30 Dec 2009 17:41:42 +0800
Subject: [Linux-cluster] CTDB configuration files are missing
Message-ID: <dd23a5e0912300141x2c7efd87gb8c56b0409858bfa@mail.gmail.com>

Hello, everyone,
it may not the right group to ask CTDB question, but if someone
happens to know the answer, I would appreciate. after I compiled and
installed ctdb, I did not see configure file  /etc/sysconfig/ctdb, but
there is no errors during installation.

It should be created during installation, right?

one more question, is there easy to build CTDB rpm package?

[root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb
ls: /etc/sysconfig/ctdb: No such file or directory

installation steps:
cd ctdb
   ./autogen.sh
   ./configure
   make
   make install



[root at wplccdlvm445 ctdb]# make install |less
ctdb will be compiled with flags:
  CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I.
-I./lib/talloc -Ilib/tdb/include -I./lib/re
place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\"
-DLOGDIR=\"/usr/local/var/log\" -DUSE_
MMAP=1  -I./lib/replace -Wall -Wshadow -Wstrict-prototypes
-Wpointer-arith -Wcast-qual -Wcast-align
-Wwrite-strings
  LIBS =
mkdir -p //usr/local/lib/pkgconfig
mkdir -p //usr/local/bin
mkdir -p //usr/local/sbin
mkdir -p //usr/local/include
mkdir -p //usr/local/etc/ctdb
mkdir -p //usr/local/etc/ctdb/events.d
mkdir -p //usr/share/doc/ctdb
/usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig
/usr/bin/install -c -m 755 bin/ctdb //usr/local/bin
/usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin
/usr/bin/install -c -m 755 bin/smnotify //usr/local/bin
/usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin
/usr/bin/install -c -m 644 include/ctdb.h //usr/local/include
/usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include
# for samba3
/usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb
/usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb
/usr/bin/install -c -m 644 config/events.d/README
//usr/share/doc/ctdb/README.eventscripts
/usr/bin/install -c -m 644 doc/recovery-process.txt
//usr/share/doc/ctdb/recovery-process.txt
/usr/bin/install -c -m 755 config/events.d/00.ctdb //usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/01.reclock
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/10.interface
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/11.natgw
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/11.routing
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 644 config/events.d/20.multipathd
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 644 config/events.d/31.clamd
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/40.vsftpd
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/41.httpd
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/50.samba
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/60.nfs //usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/61.nfstickle
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/70.iscsi
//usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 config/events.d/91.lvs //usr/local/etc/ctdb/events.d
/usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin
/usr/bin/install -c -m 755 tools/onnode //usr/local/bin
if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi
if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1
//usr/local/man/man1; fi
if [ -f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1
//usr/local/man/man1; fi
if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1
//usr/local/man/man1; fi
if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m
755 config/notify.sh //usr/local/etc/ctdb; fi



From gordan at bobich.net  Wed Dec 30 09:59:30 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Wed, 30 Dec 2009 09:59:30 +0000
Subject: [Linux-cluster] CTDB configuration files are missing
In-Reply-To: <dd23a5e0912300141x2c7efd87gb8c56b0409858bfa@mail.gmail.com>
References: <dd23a5e0912300141x2c7efd87gb8c56b0409858bfa@mail.gmail.com>
Message-ID: <4B3B2482.9080906@bobich.net>

Never used CTDB myself, but as far as RPMs go, they are available in the 
epel yum repository.

rpm -Uvh \
http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-3.noarch.rpm

yum install ctdb

should to get you going without building your own.

Gordan

Diamond Li wrote:
> Hello, everyone,
> it may not the right group to ask CTDB question, but if someone
> happens to know the answer, I would appreciate. after I compiled and
> installed ctdb, I did not see configure file  /etc/sysconfig/ctdb, but
> there is no errors during installation.
> 
> It should be created during installation, right?
> 
> one more question, is there easy to build CTDB rpm package?
> 
> [root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb
> ls: /etc/sysconfig/ctdb: No such file or directory
> 
> installation steps:
> cd ctdb
>    ./autogen.sh
>    ./configure
>    make
>    make install
> 
> 
> 
> [root at wplccdlvm445 ctdb]# make install |less
> ctdb will be compiled with flags:
>   CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I.
> -I./lib/talloc -Ilib/tdb/include -I./lib/re
> place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\"
> -DLOGDIR=\"/usr/local/var/log\" -DUSE_
> MMAP=1  -I./lib/replace -Wall -Wshadow -Wstrict-prototypes
> -Wpointer-arith -Wcast-qual -Wcast-align
> -Wwrite-strings
>   LIBS =
> mkdir -p //usr/local/lib/pkgconfig
> mkdir -p //usr/local/bin
> mkdir -p //usr/local/sbin
> mkdir -p //usr/local/include
> mkdir -p //usr/local/etc/ctdb
> mkdir -p //usr/local/etc/ctdb/events.d
> mkdir -p //usr/share/doc/ctdb
> /usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig
> /usr/bin/install -c -m 755 bin/ctdb //usr/local/bin
> /usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin
> /usr/bin/install -c -m 755 bin/smnotify //usr/local/bin
> /usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin
> /usr/bin/install -c -m 644 include/ctdb.h //usr/local/include
> /usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include
> # for samba3
> /usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb
> /usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb
> /usr/bin/install -c -m 644 config/events.d/README
> //usr/share/doc/ctdb/README.eventscripts
> /usr/bin/install -c -m 644 doc/recovery-process.txt
> //usr/share/doc/ctdb/recovery-process.txt
> /usr/bin/install -c -m 755 config/events.d/00.ctdb //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/01.reclock
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/10.interface
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/11.natgw
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/11.routing
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 644 config/events.d/20.multipathd
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 644 config/events.d/31.clamd
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/40.vsftpd
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/41.httpd
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/50.samba
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/60.nfs //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/61.nfstickle
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/70.iscsi
> //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 config/events.d/91.lvs //usr/local/etc/ctdb/events.d
> /usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin
> /usr/bin/install -c -m 755 tools/onnode //usr/local/bin
> if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi
> if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1
> //usr/local/man/man1; fi
> if [ -f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1
> //usr/local/man/man1; fi
> if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1
> //usr/local/man/man1; fi
> if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m
> 755 config/notify.sh //usr/local/etc/ctdb; fi
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From crosa at redhat.com  Wed Dec 30 10:08:45 2009
From: crosa at redhat.com (Cleber Rosa)
Date: Wed, 30 Dec 2009 05:08:45 -0500 (EST)
Subject: [Linux-cluster] CTDB configuration files are missing
In-Reply-To: <dd23a5e0912300141x2c7efd87gb8c56b0409858bfa@mail.gmail.com>
Message-ID: <1581333230.47421262167725041.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

Hi Li, 

AFAIK there are ctdb packages on RHEL's supplementary channel (to support the samba 3x package). 

CR.
--- 
Cleber Rodrigues < crosa at redhat.com > 
Solutions Architect - Red Hat, Inc. 
Mobile: +55 61 9185.3454 

----- Mensagem original ----- 
De: "Diamond Li" <diamondiona at gmail.com> 
Para: "linux clustering" <linux-cluster at redhat.com> 
Enviadas: Quarta-feira, 30 de Dezembro de 2009 7:41:42 (GMT-0300) Auto-Detected 
Assunto: [Linux-cluster] CTDB configuration files are missing 

Hello, everyone, 
it may not the right group to ask CTDB question, but if someone 
happens to know the answer, I would appreciate. after I compiled and 
installed ctdb, I did not see configure file /etc/sysconfig/ctdb, but 
there is no errors during installation. 

It should be created during installation, right? 

one more question, is there easy to build CTDB rpm package? 

[root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb 
ls: /etc/sysconfig/ctdb: No such file or directory 

installation steps: 
cd ctdb 
./autogen.sh 
./configure 
make 
make install 



[root at wplccdlvm445 ctdb]# make install |less 
ctdb will be compiled with flags: 
CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I. 
-I./lib/talloc -Ilib/tdb/include -I./lib/re 
place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\" 
-DLOGDIR=\"/usr/local/var/log\" -DUSE_ 
MMAP=1 -I./lib/replace -Wall -Wshadow -Wstrict-prototypes 
-Wpointer-arith -Wcast-qual -Wcast-align 
-Wwrite-strings 
LIBS = 
mkdir -p //usr/local/lib/pkgconfig 
mkdir -p //usr/local/bin 
mkdir -p //usr/local/sbin 
mkdir -p //usr/local/include 
mkdir -p //usr/local/etc/ctdb 
mkdir -p //usr/local/etc/ctdb/events.d 
mkdir -p //usr/share/doc/ctdb 
/usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig 
/usr/bin/install -c -m 755 bin/ctdb //usr/local/bin 
/usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin 
/usr/bin/install -c -m 755 bin/smnotify //usr/local/bin 
/usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin 
/usr/bin/install -c -m 644 include/ctdb.h //usr/local/include 
/usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include 
# for samba3 
/usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb 
/usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb 
/usr/bin/install -c -m 644 config/events.d/README 
//usr/share/doc/ctdb/README.eventscripts 
/usr/bin/install -c -m 644 doc/recovery-process.txt 
//usr/share/doc/ctdb/recovery-process.txt 
/usr/bin/install -c -m 755 config/events.d/00.ctdb //usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/01.reclock 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/10.interface 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/11.natgw 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/11.routing 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 644 config/events.d/20.multipathd 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 644 config/events.d/31.clamd 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/40.vsftpd 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/41.httpd 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/50.samba 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/60.nfs //usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/61.nfstickle 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/70.iscsi 
//usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 config/events.d/91.lvs //usr/local/etc/ctdb/events.d 
/usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin 
/usr/bin/install -c -m 755 tools/onnode //usr/local/bin 
if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi 
if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1 
//usr/local/man/man1; fi 
if [ -f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1 
//usr/local/man/man1; fi 
if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1 
//usr/local/man/man1; fi 
if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m 
755 config/notify.sh //usr/local/etc/ctdb; fi 

-- 
Linux-cluster mailing list 
Linux-cluster at redhat.com 
https://www.redhat.com/mailman/listinfo/linux-cluster 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091230/0226b111/attachment.htm>

From michael.lense at convergys.com  Wed Dec 30 13:32:56 2009
From: michael.lense at convergys.com (michael.lense at convergys.com)
Date: Wed, 30 Dec 2009 08:32:56 -0500
Subject: [Linux-cluster] Network Bonding in Clustered Environment ??
Message-ID: <1F33592152DAAB43A67411276617D3C5F53EB3A931@CDCMW10E.na.convergys.com>

Red Hat Linux-Clustering


I am currently setting up a two node cluster for a Database Environment...

I have Network Bonding setup on the two nodes and was reading in one document that Red Hat uses eth0 as the default heartbeat...

Is there something I need to do to have it setup to us a certain bond0.xxx VLan ?? and if so how would I do this ??

bond0     Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:205491095 errors:0 dropped:0 overruns:0 frame:0
          TX packets:213210619 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:95394558805 (88.8 GiB)  TX bytes:173234191604 (161.3 GiB)

bond0.211 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet addr:10.195.27.5  Bcast:10.195.27.31  Mask:255.255.255.224
          inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:200489472 errors:0 dropped:0 overruns:0 frame:0
          TX packets:213200258 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:90408105703 (84.1 GiB)  TX bytes:171522012810 (159.7 GiB)

bond0.211:1 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet addr:10.195.27.16  Bcast:10.195.27.31  Mask:255.255.255.224
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0.212 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet addr:135.108.71.5  Bcast:135.108.71.31  Mask:255.255.255.224
          inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:523160 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10850 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:79853140 (76.1 MiB)  TX bytes:2559699 (2.4 MiB)

bond0.212:1 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet addr:135.108.71.16  Bcast:135.108.71.31  Mask:255.255.255.224
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0.213 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet addr:192.168.65.5  Bcast:192.168.65.31  Mask:255.255.255.224
          inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:367458 errors:0 dropped:0 overruns:0 frame:0
          TX packets:938 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:63975254 (61.0 MiB)  TX bytes:39788 (38.8 KiB)

bond0.213:1 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet addr:192.168.65.16  Bcast:192.168.65.31  Mask:255.255.255.224
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0.215 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          inet addr:192.168.15.5  Bcast:192.168.15.255  Mask:255.255.255.0
          inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:39936 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7984970 (7.6 MiB)  TX bytes:1106 (1.0 KiB)

eth0      Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:106617306 errors:0 dropped:0 overruns:0 frame:0
          TX packets:106605309 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:49644091276 (46.2 GiB)  TX bytes:86616008153 (80.6 GiB)
          Interrupt:225 Memory:d6000000-d6012100

eth1      Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:98873789 errors:0 dropped:0 overruns:0 frame:0
          TX packets:106605310 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:45750467529 (42.6 GiB)  TX bytes:86618183451 (80.6 GiB)
          Interrupt:233 Memory:d8000000-d8012100

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:63014709 errors:0 dropped:0 overruns:0 frame:0
          TX packets:63014709 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4440888070 (4.1 GiB)  TX bytes:4440888070 (4.1 GiB)

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="echpdb" config_version="46" name="echpdb">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
           <clusternode name="echpdb2.dcd.convergys.com" nodeid="1" votes="1">
              <fence>
                 <method name="1">
                    <device name="echpdb2-drac"/>
                 </method>
              </fence>
           </clusternode>
           <clusternode name="echpdb1.dcd.convergys.com" nodeid="2" votes="1">
              <fence>
                 <method name="1">
                    <device name="echpdb1-drac"/>
                 </method>
              </fence>
           </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
           <fencedevice agent="fence_drac" ipaddr="10.195.27.70" login="root" name="echpdb1-drac" passwd="calvin"/>
           <fencedevice agent="fence_drac" ipaddr="10.195.27.71" login="root" name="echpdb2-drac" passwd="calvin"/>
        </fencedevices>
        <rm>
           <failoverdomains>
              <failoverdomain name="echpdb" nofailback="1" ordered="1" restricted="1">
                 <failoverdomainnode name="echpdb2.dcd.convergys.com" priority="2"/>
                 <failoverdomainnode name="echpdb1.dcd.convergys.com" priority="1"/>
              </failoverdomain>
           </failoverdomains>
           <resources>
              <script file="/etc/cluster/echpdb/echpdb.ksh" name="ECHP"/>
           </resources>
           <service autostart="0" domain="echpdb" exclusive="0" name="echpdb" recovery="relocate">
              <script ref="ECHP"/>
           </service>
        </rm>
</cluster>
#

Thanks
Mike

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20091230/2d6ed7b0/attachment.htm>

From kkovachev at varna.net  Wed Dec 30 14:02:22 2009
From: kkovachev at varna.net (Kaloyan Kovachev)
Date: Wed, 30 Dec 2009 16:02:22 +0200
Subject: [Linux-cluster] Network Bonding in Clustered Environment ??
In-Reply-To: <1F33592152DAAB43A67411276617D3C5F53EB3A931@CDCMW10E.na.convergys.com>
References: <1F33592152DAAB43A67411276617D3C5F53EB3A931@CDCMW10E.na.convergys.com>
Message-ID: <20091230135908.M74811@varna.net>

On Wed, 30 Dec 2009 08:32:56 -0500, michael.lense wrote 
> Red Hat Linux-Clustering  
>    
>    
> I am currently setting up a two node cluster for a Database Environment? 
>    
> I have Network Bonding setup on the two nodes and was reading in one
document that Red Hat uses eth0 as the default heartbeat? 
>    
> Is there something I need to do to have it setup to us a certain bond0.xxx
VLan ?? and if so how would I do this ?? 

All you need to do is to use a hostname in cluster.conf which points to an IP
belonging to that interface: if in 192.168.15.0/24 then bond0.215 will be used

>    
> bond0     Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>           RX packets:205491095 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:213210619 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:0 
>           RX bytes:95394558805 (88.8 GiB)  TX bytes:173234191604 (161.3 GiB) 
>    
> bond0.211 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet addr:10.195.27.5  Bcast:10.195.27.31  Mask:255.255.255.224 
>           inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>           RX packets:200489472 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:213200258 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:0 
>           RX bytes:90408105703 (84.1 GiB)  TX bytes:171522012810 (159.7 GiB) 
>    
> bond0.211:1 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet addr:10.195.27.16  Bcast:10.195.27.31  Mask:255.255.255.224 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>    
> bond0.212 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet addr:135.108.71.5  Bcast:135.108.71.31  Mask:255.255.255.224 
>           inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>           RX packets:523160 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:10850 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:0 
>           RX bytes:79853140 (76.1 MiB)  TX bytes:2559699 (2.4 MiB) 
>    
> bond0.212:1 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet addr:135.108.71.16  Bcast:135.108.71.31  Mask:255.255.255.224 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>    
> bond0.213 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet addr:192.168.65.5  Bcast:192.168.65.31  Mask:255.255.255.224 
>           inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>           RX packets:367458 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:938 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:0 
>           RX bytes:63975254 (61.0 MiB)  TX bytes:39788 (38.8 KiB) 
>    
> bond0.213:1 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet addr:192.168.65.16  Bcast:192.168.65.31  Mask:255.255.255.224 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>    
> bond0.215 Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           inet addr:192.168.15.5  Bcast:192.168.15.255  Mask:255.255.255.0 
>           inet6 addr: fe80::226:b9ff:fe34:460b/64 Scope:Link 
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1 
>           RX packets:39936 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:17 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:0 
>           RX bytes:7984970 (7.6 MiB)  TX bytes:1106 (1.0 KiB) 
>    
> eth0      Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1 
>           RX packets:106617306 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:106605309 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:1000 
>           RX bytes:49644091276 (46.2 GiB)  TX bytes:86616008153 (80.6 GiB) 
>           Interrupt:225 Memory:d6000000-d6012100 
>    
> eth1      Link encap:Ethernet  HWaddr 00:26:B9:34:46:0B 
>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1 
>           RX packets:98873789 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:106605310 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:1000 
>           RX bytes:45750467529 (42.6 GiB)  TX bytes:86618183451 (80.6 GiB) 
>           Interrupt:233 Memory:d8000000-d8012100 
>    
> lo        Link encap:Local Loopback 
>           inet addr:127.0.0.1  Mask:255.0.0.0 
>            inet6 addr: ::1/128 Scope:Host 
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1 
>           RX packets:63014709 errors:0 dropped:0 overruns:0 frame:0 
>           TX packets:63014709 errors:0 dropped:0 overruns:0 carrier:0 
>           collisions:0 txqueuelen:0 
>           RX bytes:4440888070 (4.1 GiB)  TX bytes:4440888070 (4.1 GiB) 
>    
> # cat /etc/cluster/cluster.conf 
> <?xml version="1.0"?> 
> <cluster alias="echpdb" config_version="46" name="echpdb"> 
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/> 
>         <clusternodes> 
>            <clusternode name="echpdb2.dcd.convergys.com" nodeid="1" votes="1"> 
>                <fence> 
>                   <method name="1"> 
>                       <device name="echpdb2-drac"/> 
>                   </method> 
>                </fence> 
>            </clusternode> 
>            <clusternode name="echpdb1.dcd.convergys.com" nodeid="2" votes="1"> 
>                <fence> 
>                   <method name="1"> 
>                      <device name="echpdb1-drac"/> 
>                   </method> 
>                </fence> 
>            </clusternode> 
>         </clusternodes> 
>         <cman expected_votes="1" two_node="1"/> 
>         <fencedevices> 
>            <fencedevice agent="fence_drac" ipaddr="10.195.27.70"
login="root" name="echpdb1-drac" passwd="calvin"/> 
>            <fencedevice agent="fence_drac" ipaddr="10.195.27.71"
login="root" name="echpdb2-drac" passwd="calvin"/> 
>         </fencedevices> 
>         <rm> 
>            <failoverdomains> 
>                <failoverdomain name="echpdb" nofailback="1" ordered="1"
restricted="1"> 
>                   <failoverdomainnode name="echpdb2.dcd.convergys.com"
priority="2"/> 
>                   <failoverdomainnode name="echpdb1.dcd.convergys.com"
priority="1"/> 
>                </failoverdomain> 
>            </failoverdomains> 
>            <resources> 
>                <script file="/etc/cluster/echpdb/echpdb.ksh" name="ECHP"/> 
>            </resources> 
>            <service autostart="0" domain="echpdb" exclusive="0"
name="echpdb" recovery="relocate"> 
>                <script ref="ECHP"/> 
>            </service> 
>         </rm> 
> </cluster> 
> # 
>    
> Thanks 
> Mike 
>  



From brem.belguebli at gmail.com  Wed Dec 30 15:38:42 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Wed, 30 Dec 2009 16:38:42 +0100
Subject: [Linux-cluster] qdisk max_error_cycles setting
Message-ID: <29ae894c0912300738g16c2a808u6aa1afe38270cce3@mail.gmail.com>

Hi,

It looks like the quorumd max_error_cycles parameter it not taken into account.

Here's the test I'm doing:

A 3 nodes cluster (RHEL 5.4) with a iscsi qdisk lun from a RHEL 5.4
target server.

All 3 cluster nodes have the following cqdisk configuration:

<quorumd device="/dev/iscsi/storage.quorum" interval="1"
log_facility="local5" log_level="7" tko="10" votes="1"
max_error_cycles="10">

When I block access from the 3 nodes to the target server (iptables
rule that prevents all ip flows from the 3 nodes to the target
server), I see the Quorum disk go offline but qdisk never gets stopped
and keeps on retrying the qdisk device despite the fact that I
instructed it to abort after 10 cycles (max_error_cycles=10).

Am I misunderstanding the max_error_cycles definition in the qdisk man page ?

Regards

PS: As consequence of not being killed after this max-error_cycles,
qdisk  keeps on growing (memory usage virtual size) and if the
situation lasts too long OOM killer gets involved.....



From td3201 at gmail.com  Wed Dec 30 16:13:57 2009
From: td3201 at gmail.com (Terry)
Date: Wed, 30 Dec 2009 10:13:57 -0600
Subject: [Linux-cluster] cannot add 3rd node to running cluster
In-Reply-To: <74e9d01e0912291520l3bc36ac4yc7a17b1f96fa123d@mail.gmail.com>
References: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com>
	<74e9d01e0912291520l3bc36ac4yc7a17b1f96fa123d@mail.gmail.com>
Message-ID: <8ee061010912300813i7fd29c70hd81bf691d574df0c@mail.gmail.com>

On Tue, Dec 29, 2009 at 5:20 PM, Jason W. <jwellband at gmail.com> wrote:
> On Tue, Dec 29, 2009 at 2:30 PM, Terry <td3201 at gmail.com> wrote:
>> Hello,
>>
>> I have a working 2 node cluster that I am trying to add a third node
>> to. ? I am trying to use Red Hat's conga (luci) to add the node in but
>
> If you have two node cluster with two_node=1 in cluster.conf - such as
> two nodes with no quorum device to break a tie - you'll need to bring
> the cluster down, change two_node to 0 on both nodes (and rev the
> cluster version at the top of cluster.conf), bring the cluster up and
> then add the third node.
>
> For troubleshooting any cluster issue, take a look at syslog
> (/var/log/messages by default). It can help to watch it on a
> centralized syslog server that all of your nodes forward logs to.
>
> --
> HTH, YMMV, HANW :)
>
> Jason
>
> The path to enlightenment is /usr/bin/enlightenment.

Thank you for the response.  /var/log/messages doesn't have any
errors.  It says cman started then says can't connect to cluster
infrastructure after a few seconds.  My cluster does not have the
two_node=1 config now.  Conga took that out for me.  That bit me last
night because I needed to put that back in.



From brem.belguebli at gmail.com  Wed Dec 30 17:20:37 2009
From: brem.belguebli at gmail.com (brem belguebli)
Date: Wed, 30 Dec 2009 18:20:37 +0100
Subject: [Linux-cluster] Submitting lvm-cluster resource script
In-Reply-To: <1254778350.6511.16.camel@mecatol>
References: <1254778350.6511.16.camel@mecatol>
Message-ID: <29ae894c0912300920p5261b37dr831e99316d7e8abe@mail.gmail.com>

Hello,

Have you guys had a look at Rafael's resource that would allow to
manage CLVM VG/LV exclusive activation in an active/passive fashion ?

Regards

2009/10/5 Rafael Mic? Miranda <rmicmirregs at gmail.com>:
> Hi,
>
> I previously submitted this to the list in the following thread:
>
> https://www.redhat.com/archives/cluster-devel/2009-June/msg00020.html
>
> I just want to submit a new version, the diff patch against the previous
> version (which can be found in the older thread) and a readme file to
> help people in the deployment and usage of the lvm-cluster resource
> script.
>
> md5sums:
>
> fe6bf8a73ab059231172210f299f943d ?lvm-cluster.sh
> 5c6679317e90af5c3cf933d972df20bb ?patch-v1
> f1dcd80729066bf57cc6743a16fcb261 ?README.txt
>
> Cheers,
>
> Rafael
>
> --
> Rafael Mic? Miranda
>



From diamondiona at gmail.com  Thu Dec 31 00:41:32 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Thu, 31 Dec 2009 08:41:32 +0800
Subject: [Linux-cluster] CTDB configuration files are missing
In-Reply-To: <4B3B2482.9080906@bobich.net>
References: <dd23a5e0912300141x2c7efd87gb8c56b0409858bfa@mail.gmail.com>
	<4B3B2482.9080906@bobich.net>
Message-ID: <dd23a5e0912301641s36463446s595eae6f0ef0a45c@mail.gmail.com>

thanks everyone, what I am looking for 32 bit package, but RHN only
provides 64 bits.


On Wed, Dec 30, 2009 at 5:59 PM, Gordan Bobic <gordan at bobich.net> wrote:
> Never used CTDB myself, but as far as RPMs go, they are available in the
> epel yum repository.
>
> rpm -Uvh \
> http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-3.noarch.rpm
>
> yum install ctdb
>
> should to get you going without building your own.
>
> Gordan
>
> Diamond Li wrote:
>>
>> Hello, everyone,
>> it may not the right group to ask CTDB question, but if someone
>> happens to know the answer, I would appreciate. after I compiled and
>> installed ctdb, I did not see configure file ?/etc/sysconfig/ctdb, but
>> there is no errors during installation.
>>
>> It should be created during installation, right?
>>
>> one more question, is there easy to build CTDB rpm package?
>>
>> [root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb
>> ls: /etc/sysconfig/ctdb: No such file or directory
>>
>> installation steps:
>> cd ctdb
>> ? ./autogen.sh
>> ? ./configure
>> ? make
>> ? make install
>>
>>
>>
>> [root at wplccdlvm445 ctdb]# make install |less
>> ctdb will be compiled with flags:
>> ?CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I.
>> -I./lib/talloc -Ilib/tdb/include -I./lib/re
>> place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\"
>> -DLOGDIR=\"/usr/local/var/log\" -DUSE_
>> MMAP=1 ?-I./lib/replace -Wall -Wshadow -Wstrict-prototypes
>> -Wpointer-arith -Wcast-qual -Wcast-align
>> -Wwrite-strings
>> ?LIBS =
>> mkdir -p //usr/local/lib/pkgconfig
>> mkdir -p //usr/local/bin
>> mkdir -p //usr/local/sbin
>> mkdir -p //usr/local/include
>> mkdir -p //usr/local/etc/ctdb
>> mkdir -p //usr/local/etc/ctdb/events.d
>> mkdir -p //usr/share/doc/ctdb
>> /usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig
>> /usr/bin/install -c -m 755 bin/ctdb //usr/local/bin
>> /usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin
>> /usr/bin/install -c -m 755 bin/smnotify //usr/local/bin
>> /usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin
>> /usr/bin/install -c -m 644 include/ctdb.h //usr/local/include
>> /usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include
>> # for samba3
>> /usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb
>> /usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb
>> /usr/bin/install -c -m 644 config/events.d/README
>> //usr/share/doc/ctdb/README.eventscripts
>> /usr/bin/install -c -m 644 doc/recovery-process.txt
>> //usr/share/doc/ctdb/recovery-process.txt
>> /usr/bin/install -c -m 755 config/events.d/00.ctdb
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/01.reclock
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/10.interface
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/11.natgw
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/11.routing
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 644 config/events.d/20.multipathd
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 644 config/events.d/31.clamd
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/40.vsftpd
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/41.httpd
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/50.samba
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/60.nfs
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/61.nfstickle
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/70.iscsi
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 config/events.d/91.lvs
>> //usr/local/etc/ctdb/events.d
>> /usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin
>> /usr/bin/install -c -m 755 tools/onnode //usr/local/bin
>> if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi
>> if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1
>> //usr/local/man/man1; fi
>> if [ -f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1
>> //usr/local/man/man1; fi
>> if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1
>> //usr/local/man/man1; fi
>> if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m
>> 755 config/notify.sh //usr/local/etc/ctdb; fi
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



From gordan at bobich.net  Thu Dec 31 02:19:37 2009
From: gordan at bobich.net (Gordan Bobic)
Date: Thu, 31 Dec 2009 02:19:37 +0000
Subject: [Linux-cluster] CTDB configuration files are missing
In-Reply-To: <dd23a5e0912301641s36463446s595eae6f0ef0a45c@mail.gmail.com>
References: <dd23a5e0912300141x2c7efd87gb8c56b0409858bfa@mail.gmail.com>	<4B3B2482.9080906@bobich.net>
	<dd23a5e0912301641s36463446s595eae6f0ef0a45c@mail.gmail.com>
Message-ID: <4B3C0A39.4030000@bobich.net>

Epel repository has 32-bit packages. I beileve the same epel-release rpm 
file will work. If not, use the one on the same site only replace x86_64 
with i386 in the URL.

Gordan

Diamond Li wrote:
> thanks everyone, what I am looking for 32 bit package, but RHN only
> provides 64 bits.
> 
> 
> On Wed, Dec 30, 2009 at 5:59 PM, Gordan Bobic <gordan at bobich.net> wrote:
>> Never used CTDB myself, but as far as RPMs go, they are available in the
>> epel yum repository.
>>
>> rpm -Uvh \
>> http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-3.noarch.rpm
>>
>> yum install ctdb
>>
>> should to get you going without building your own.
>>
>> Gordan
>>
>> Diamond Li wrote:
>>> Hello, everyone,
>>> it may not the right group to ask CTDB question, but if someone
>>> happens to know the answer, I would appreciate. after I compiled and
>>> installed ctdb, I did not see configure file  /etc/sysconfig/ctdb, but
>>> there is no errors during installation.
>>>
>>> It should be created during installation, right?
>>>
>>> one more question, is there easy to build CTDB rpm package?
>>>
>>> [root at wplccdlvm445 config]# ls -l /etc/sysconfig/ctdb
>>> ls: /etc/sysconfig/ctdb: No such file or directory
>>>
>>> installation steps:
>>> cd ctdb
>>>   ./autogen.sh
>>>   ./configure
>>>   make
>>>   make install
>>>
>>>
>>>
>>> [root at wplccdlvm445 ctdb]# make install |less
>>> ctdb will be compiled with flags:
>>>  CFLAGS = -g -I./include -Iinclude -Ilib -Ilib/util -I.
>>> -I./lib/talloc -Ilib/tdb/include -I./lib/re
>>> place -DVARDIR=\"/usr/local/var\" -DETCDIR=\"/usr/local/etc\"
>>> -DLOGDIR=\"/usr/local/var/log\" -DUSE_
>>> MMAP=1  -I./lib/replace -Wall -Wshadow -Wstrict-prototypes
>>> -Wpointer-arith -Wcast-qual -Wcast-align
>>> -Wwrite-strings
>>>  LIBS =
>>> mkdir -p //usr/local/lib/pkgconfig
>>> mkdir -p //usr/local/bin
>>> mkdir -p //usr/local/sbin
>>> mkdir -p //usr/local/include
>>> mkdir -p //usr/local/etc/ctdb
>>> mkdir -p //usr/local/etc/ctdb/events.d
>>> mkdir -p //usr/share/doc/ctdb
>>> /usr/bin/install -c -m 644 ctdb.pc //usr/local/lib/pkgconfig
>>> /usr/bin/install -c -m 755 bin/ctdb //usr/local/bin
>>> /usr/bin/install -c -m 755 bin/ctdbd //usr/local/sbin
>>> /usr/bin/install -c -m 755 bin/smnotify //usr/local/bin
>>> /usr/bin/install -c -m 755 bin/ping_pong //usr/local/bin
>>> /usr/bin/install -c -m 644 include/ctdb.h //usr/local/include
>>> /usr/bin/install -c -m 644 include/ctdb_private.h //usr/local/include
>>> # for samba3
>>> /usr/bin/install -c -m 644 config/functions //usr/local/etc/ctdb
>>> /usr/bin/install -c -m 755 config/statd-callout //usr/local/etc/ctdb
>>> /usr/bin/install -c -m 644 config/events.d/README
>>> //usr/share/doc/ctdb/README.eventscripts
>>> /usr/bin/install -c -m 644 doc/recovery-process.txt
>>> //usr/share/doc/ctdb/recovery-process.txt
>>> /usr/bin/install -c -m 755 config/events.d/00.ctdb
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/01.reclock
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/10.interface
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/11.natgw
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/11.routing
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 644 config/events.d/20.multipathd
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 644 config/events.d/31.clamd
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/40.vsftpd
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/41.httpd
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/50.samba
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/60.nfs
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/61.nfstickle
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/70.iscsi
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 config/events.d/91.lvs
>>> //usr/local/etc/ctdb/events.d
>>> /usr/bin/install -c -m 755 tools/ctdb_diagnostics //usr/local/bin
>>> /usr/bin/install -c -m 755 tools/onnode //usr/local/bin
>>> if [ -f doc/ctdb.1 ];then /usr/bin/install -c -d //usr/local/man/man1; fi
>>> if [ -f doc/ctdb.1 ];then /usr/bin/install -c -m 644 doc/ctdb.1
>>> //usr/local/man/man1; fi
>>> if [ -f doc/ctdbd.1 ];then /usr/bin/install -c -m 644 doc/ctdbd.1
>>> //usr/local/man/man1; fi
>>> if [ -f doc/onnode.1 ];then /usr/bin/install -c -m 644 doc/onnode.1
>>> //usr/local/man/man1; fi
>>> if [ ! -f //usr/local/etc/ctdb/notify.sh ];then /usr/bin/install -c -m
>>> 755 config/notify.sh //usr/local/etc/ctdb; fi
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



From diamondiona at gmail.com  Thu Dec 31 06:11:46 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Thu, 31 Dec 2009 14:11:46 +0800
Subject: [Linux-cluster] lvextend hangs up
In-Reply-To: <dd23a5e0912292249t6f645062x199efc75ee81e5dc@mail.gmail.com>
References: <dd23a5e0912292249t6f645062x199efc75ee81e5dc@mail.gmail.com>
Message-ID: <dd23a5e0912302211u59cff135w95aeecd5b1ec914@mail.gmail.com>

after I reboot clvmd, it works now. but after a while, all lvm
commands hung again. So I have to reboot machine,  unfortunately, OS
hung as well, I have to press "reset" button.

I did run lvmconf --enable-cluster, clvmd is not stable or I missed something?

[root at wplccdlvm445 gfs]# uname -r
2.6.18-164.el5

[root at wplccdlvm445 gfs]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)


[root at wplccdlvm445 gfs]# lvm dumpconfig
  devices {
        dir="/dev"
        scan="/dev"
        preferred_names=[]
        filter="a/.*/"
        cache_dir="/etc/lvm/cache"
        cache_file_prefix=""
        write_cache_state=1
        sysfs_scan=1
        md_component_detection=1
        ignore_suspended_devices=0
  }
  activation {
        missing_stripe_filler="/dev/ioerror"
        reserved_stack=256
        reserved_memory=8192
        process_priority=-18
        mirror_region_size=512
        readahead="auto"
        mirror_log_fault_policy="allocate"
        mirror_device_fault_policy="remove"
  }
  global {
        library_dir="/usr/lib"
        umask=63
        test=0
        units="h"
        activation=1
        proc="/proc"
        locking_type=3
        fallback_to_clustered_locking=1
        fallback_to_local_locking=1
        locking_dir="/var/lock/lvm"
  }
  shell {
        history_size=100
  }
  backup {
        backup=1
        backup_dir="/etc/lvm/backup"
        archive=1
        archive_dir="/etc/lvm/archive"
        retain_min=10
        retain_days=30
  }
  log {
        verbose=0
        syslog=1
        overwrite=0
        level=0
        indent=1
        command_names=0
        prefix="  "
  }




On Wed, Dec 30, 2009 at 2:49 PM, Diamond Li <diamondiona at gmail.com> wrote:
> hello, everyone,
>
> it is frustrated to see lvextend hanging up when I am trying to extend
> a mirror ?logical volume: no error message, log, can't exit using
> CTRL+C. :-(
>
> anyone has similar experience?
>
> [root at wplccdlvm446 gfs]# lvextend -d -L +1G -m1 ?/dev/vg100/lvol0
>
> [root at wplccdlvm446 ~]# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.4 (Tikanga)
> [root at wplccdlvm446 ~]# uname -r
> 2.6.18-164.el5
>



From diamondiona at gmail.com  Thu Dec 31 06:55:00 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Thu, 31 Dec 2009 14:55:00 +0800
Subject: [Linux-cluster] gfs2_grow does not work
Message-ID: <dd23a5e0912302255h4f11f77ay1152bb7639530acf@mail.gmail.com>

Hello,

I am trying to grow a gfs2 file system, unfortunately  it does not work.

anyone has similar issues or I always have bad luck?

[root at wplccdlvm446 gfs]# mount

/dev/mapper/vg100-lvol0 on /gfs type gfs2 (rw,hostdata=jid=0:id=131074:first=1)

[root at wplccdlvm446 gfs]# gfs2_grow -v /gfs
Initializing lists...
gfs2_grow: Couldn't mount /tmp/.gfs2meta : Invalid argument

[root at wplccdlvm446 gfs]# ls -a /tmp/.gfs2meta/
.  ..


[root at wplccdlvm446 gfs]# uname -r
2.6.18-164.el5

[root at wplccdlvm446 gfs]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)



From diamondiona at gmail.com  Thu Dec 31 07:16:44 2009
From: diamondiona at gmail.com (Diamond Li)
Date: Thu, 31 Dec 2009 15:16:44 +0800
Subject: [Linux-cluster] gfs2_grow does not work
In-Reply-To: <dd23a5e0912302255h4f11f77ay1152bb7639530acf@mail.gmail.com>
References: <dd23a5e0912302255h4f11f77ay1152bb7639530acf@mail.gmail.com>
Message-ID: <dd23a5e0912302316x5b8c1fb2y61ff503acb0dfd3f@mail.gmail.com>

from system log, I can see the erorr message:

Dec 31 15:04:56 wplccdlvm446 kernel: GFS2: gfs2 mount does not exist

but I have mounted gfs2 file system under /gfs folder and I can do
operations such as mkdir, rm, successfully.



On Thu, Dec 31, 2009 at 2:55 PM, Diamond Li <diamondiona at gmail.com> wrote:
> Hello,
>
> I am trying to grow a gfs2 file system, unfortunately ?it does not work.
>
> anyone has similar issues or I always have bad luck?
>
> [root at wplccdlvm446 gfs]# mount
>
> /dev/mapper/vg100-lvol0 on /gfs type gfs2 (rw,hostdata=jid=0:id=131074:first=1)
>
> [root at wplccdlvm446 gfs]# gfs2_grow -v /gfs
> Initializing lists...
> gfs2_grow: Couldn't mount /tmp/.gfs2meta : Invalid argument
>
> [root at wplccdlvm446 gfs]# ls -a /tmp/.gfs2meta/
> . ?..
>
>
> [root at wplccdlvm446 gfs]# uname -r
> 2.6.18-164.el5
>
> [root at wplccdlvm446 gfs]# cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.4 (Tikanga)
>



From td3201 at gmail.com  Thu Dec 31 16:13:49 2009
From: td3201 at gmail.com (Terry)
Date: Thu, 31 Dec 2009 10:13:49 -0600
Subject: [Linux-cluster] cannot add 3rd node to running cluster
In-Reply-To: <8ee061010912300813i7fd29c70hd81bf691d574df0c@mail.gmail.com>
References: <8ee061010912291130n68f0bad6l496f71df2cd703ac@mail.gmail.com>
	<74e9d01e0912291520l3bc36ac4yc7a17b1f96fa123d@mail.gmail.com>
	<8ee061010912300813i7fd29c70hd81bf691d574df0c@mail.gmail.com>
Message-ID: <8ee061010912310813g3f45bf6ekfc52c3d5420a5826@mail.gmail.com>

On Wed, Dec 30, 2009 at 10:13 AM, Terry <td3201 at gmail.com> wrote:
> On Tue, Dec 29, 2009 at 5:20 PM, Jason W. <jwellband at gmail.com> wrote:
>> On Tue, Dec 29, 2009 at 2:30 PM, Terry <td3201 at gmail.com> wrote:
>>> Hello,
>>>
>>> I have a working 2 node cluster that I am trying to add a third node
>>> to. ? I am trying to use Red Hat's conga (luci) to add the node in but
>>
>> If you have two node cluster with two_node=1 in cluster.conf - such as
>> two nodes with no quorum device to break a tie - you'll need to bring
>> the cluster down, change two_node to 0 on both nodes (and rev the
>> cluster version at the top of cluster.conf), bring the cluster up and
>> then add the third node.
>>
>> For troubleshooting any cluster issue, take a look at syslog
>> (/var/log/messages by default). It can help to watch it on a
>> centralized syslog server that all of your nodes forward logs to.
>>
>> --
>> HTH, YMMV, HANW :)
>>
>> Jason
>>
>> The path to enlightenment is /usr/bin/enlightenment.
>
> Thank you for the response. ?/var/log/messages doesn't have any
> errors. ?It says cman started then says can't connect to cluster
> infrastructure after a few seconds. ?My cluster does not have the
> two_node=1 config now. ?Conga took that out for me. ?That bit me last
> night because I needed to put that back in.
>

CMAN still will not start and gives no debug information.  Anyone know
why cman_tool -d join would not print any output at all?
Troubleshooting this is kind of a nightmare.  I verified that two_node
is not in play.